diff --git a/drivers/gpu/drm/msm/registers/adreno/a2xx.xml b/drivers/gpu/drm/msm/registers/adreno/a2xx.xml
new file mode 100644
index 000000000000..22caddaa0db9
--- /dev/null
+++ b/drivers/gpu/drm/msm/registers/adreno/a2xx.xml
@@ -0,0 +1,1865 @@
+ note: only 0x3f worth of valid register values for VS_REGS and
+ PS_REGS, but high bit is set to indicate '0 registers used':
+ Texture state dwords
diff --git a/drivers/gpu/drm/msm/registers/adreno/a3xx.xml b/drivers/gpu/drm/msm/registers/adreno/a3xx.xml
new file mode 100644
index 000000000000..6717abc0a897
--- /dev/null
+++ b/drivers/gpu/drm/msm/registers/adreno/a3xx.xml
@@ -0,0 +1,1751 @@
+ The pair of MEM_SIZE/ADDR registers get programmed
+ in sequence with the size/addr of each buffer.
+ aka clip_halfz
+ range of -8.0 to 8.0
+ range of -512.0 to 512.0
+ RENDER_MODE is RB_RESOLVE_PASS for gmem->mem, otherwise RB_RENDER_PASS
+ render targets - 1
+ Pitch (actually, appears to be pitch in bytes, so really is a stride)
+ in GMEM, so pitch of the current tile.
+ offset into GMEM (or system memory address in bypass mode)
+ actually, appears to be pitch in bytes, so really is a stride
+ Z_READ_ENABLE bit is set for zfunc other than GL_ALWAYS or GL_NEVER
+ seems to be always set to 0x00000000
+ DEPTH_BASE is offset in GMEM to depth/stencil buffer, ie
+ bin_w * bin_h / 1024 (possibly rounded up to a multiple of
+ something? e.g. 39 becomes 40, 78 becomes 80, 75 becomes
+ 80, so maybe it needs to be a multiple of 8?)
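The rounding guess in the DEPTH_BASE note can be checked against the three observed values. This is a minimal Python sketch, assuming the rounding is a plain align-up to a multiple of 8. That is only the comment's guess, not confirmed hardware behaviour, and `align_up` is a hypothetical helper name.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


# (unrounded value, value actually programmed) pairs quoted in the note.
OBSERVED = [(39, 40), (78, 80), (75, 80)]

# Assumption: alignment of 8. All three observations are consistent with it.
all_match = all(align_up(raw, 8) == programmed for raw, programmed in OBSERVED)
print("align-up to 8 matches all observations:", all_match)
```

Three data points do not rule out other rules, so this only shows that a multiple of 8 is consistent with what was seen.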
+ Pitch of depth buffer or combined depth+stencil buffer
+ in z24s8 cases.
+ seems to be always set to 0x00000000
+ Base address for stencil when not using interleaved depth/stencil
+ pitch of stencil buffer when not using interleaved depth/stencil
+ seems to be set to 0x00000002 during binning pass
+ X/Y offset of current bin
+ seems to be where firmware writes BIN_DATA_ADDR from
+ CP_SET_BIN_DATA packet.. probably should be called
+ PC_BIN_BASE (just using name from yamato for now)
+ probably should be PC_BIN_SIZE
+ SIZE is current pipe width * height (in tiles)
+ N is some sort of slot # between 0..(SIZE-1). In case
+ multiple tiles use same pipe, each tile gets unique slot #
+ STRIDE_IN_VPC: ALIGN(next_outloc - 8, 4) / 4
+ (but, in cases where you'd expect 1, the blob driver uses
+ 2, so possibly 0 (no varying) or minimum of 2)
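The STRIDE_IN_VPC formula above can be written out as a small Python sketch. The names `align_up` and `stride_in_vpc` are hypothetical. The sketch implements only the literal formula and does not model the observation that the blob driver seems to use 2 where 1 would be expected.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def stride_in_vpc(next_outloc: int) -> int:
    """Literal reading of: ALIGN(next_outloc - 8, 4) / 4.

    next_outloc is the first unused varying location. Varying
    locations appear to start at 8, so 8 means "no varyings".
    """
    return align_up(next_outloc - 8, 4) // 4


# One vec4 varying occupies locations 8..11, so next_outloc is 12.
print(stride_in_vpc(12))
```

Under this reading a single vec4 varying gives 1, which is exactly the case where the note says the blob driver programs 2 instead.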
+ indexed by dimension
+ indexed by dimension, global_size / local_size
+ TOTALATTRTOVS is # of attributes to vertex shader, in register
+ slots (e.g. vec4+vec3 -> 7)
+ STRMDECINSTRCNT is # of VFD_DECODE_INSTR registers valid
+ MAXSTORAGE could be # of attributes/vbo's
+ SHIFTCNT appears to be size in bytes, e.g. FLOAT_32_32_32 is 12, and BYTE_8 is 1
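The two SHIFTCNT examples are consistent with the value being the element size in bytes. This is a small Python sketch of that reading. `format_size_bytes` is a hypothetical helper, and the byte interpretation is an inference from the two quoted values only.

```python
def format_size_bytes(component_bits: list[int]) -> int:
    """Total size of one vertex element in bytes, given per-component bit widths."""
    return sum(component_bits) // 8


# FLOAT_32_32_32: three 32-bit components.
print(format_size_bytes([32, 32, 32]))
# BYTE_8: one 8-bit component.
print(format_size_bytes([8]))
```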
+ From register spec:
+ start offset in on chip RAM,
+ 128bit aligned
+ The full/half register footprint is in units of four components,
+ so if r0.x is used, that counts as all of r0.[xyzw] as used.
+ There are separate full/half register footprint values as the
+ full and half registers are independent (not overlapping).
+ Presumably the thread scheduler hardware allocates the full/half
+ register names from the actual physical register file and
+ handles the register renaming.
+ From regspec:
+ SP_FS_CTRL_REG0.FS_LENGTH [31:24]: FS length, unit = 256bits.
+ If bit31 is 1, it means overflow
+ or any long shader.
+ These seem to be offsets for storage of the varyings.
+ Always seems to start from 8, possibly loc 0 and 4
+ are for gl_Position and gl_PointSize?
+ SP_VS_OBJ_START_REG contains pointer to the vertex shader program,
+ immediately followed by the binning shader program (although I
+ guess that is probably just re-using the same gpu buffer)
+ The size of memory that ldp/stp can address, in 128 byte increments.
+ The full/half register footprint is in units of four components,
+ so if r0.x is used, that counts as all of r0.[xyzw] as used.
+ There are separate full/half register footprint values as the
+ full and half registers are independent (not overlapping).
+ Presumably the thread scheduler hardware allocates the full/half
+ register names from the actual physical register file and
+ handles the register renaming.
+ From regspec:
+ SP_FS_CTRL_REG0.FS_LENGTH [31:24]: FS length, unit = 256bits.
+ If bit31 is 1, it means overflow
+ or any long shader.
+ SP_FS_OBJ_START_REG contains pointer to fragment shader program
+ seems to be one bit per scalar, '1' for flat, '0' for smooth
+ seems to be one bit per scalar, '1' for flat, '0' for smooth
+ render targets - 1
+ Configures the mapping between VSC_PIPE buffer and
+ bin, X/Y specify the bin index in the horiz/vert
+ direction (0,0 is upper left, 0,1 is leftmost bin
+ on second row, and so on). W/H specify the number
+ of bins assigned to this VSC_PIPE in the horiz/vert
+ dimension.
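One way to read the VSC_PIPE mapping is as a rectangle of bins. The Python sketch below assumes a pipe covers a contiguous W x H block whose upper-left bin is (X, Y). `bins_for_pipe` is a hypothetical helper, not part of any driver.

```python
def bins_for_pipe(x: int, y: int, w: int, h: int) -> list[tuple[int, int]]:
    """List the (x, y) bin indices covered by one pipe, row by row.

    (0, 0) is the upper-left bin and (0, 1) is the leftmost bin
    on the second row, matching the convention in the note.
    """
    return [(bx, by) for by in range(y, y + h) for bx in range(x, x + w)]


# A pipe at X=0, Y=0 that is 2 bins wide and 2 bins high.
print(bins_for_pipe(0, 0, 2, 2))
```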
+ seems to be set to 0x00000001 during binning pass
+ seems to be always set to 0x00000001
+ seems to be always set to 0x00000001
+ seems to be always set to 0x00000001
+ seems to be always set to 0x00000003
+ seems to be always set to 0x00000001
+ Texture sampler dwords
+ Texture constant dwords
+ INDX is index of texture address(es) in MIPMAP state block
+ Pitch in bytes (so actually stride)
+ SWAP bit is set for BGRA instead of RGBA
diff --git a/drivers/gpu/drm/msm/registers/adreno/a4xx.xml b/drivers/gpu/drm/msm/registers/adreno/a4xx.xml
new file mode 100644
index 000000000000..69a9f9b02bc9
--- /dev/null
+++ b/drivers/gpu/drm/msm/registers/adreno/a4xx.xml
@@ -0,0 +1,2409 @@
+ Pitch (actually, appears to be pitch in bytes, so really is a stride)
+ in GMEM, so pitch of the current tile.
+ actually, appears to be pitch in bytes, so really is a stride
+ Z_READ_ENABLE bit is set for zfunc other than GL_ALWAYS or GL_NEVER
+ DEPTH_BASE is offset in GMEM to depth/stencil buffer, ie
+ bin_w * bin_h / 1024 (possibly rounded up to a multiple of
+ something? e.g. 39 becomes 40, 78 becomes 80, 75 becomes
+ 80, so maybe it needs to be a multiple of 8?)
+ stride of depth/stencil buffer
+ ???
+ Base address for stencil when not using interleaved depth/stencil
+ pitch of stencil buffer when not using interleaved depth/stencil
+ The full/half register footprint is in units of four components,
+ so if r0.x is used, that counts as all of r0.[xyzw] as used.
+ There are separate full/half register footprint values as the
+ full and half registers are independent (not overlapping).
+ Presumably the thread scheduler hardware allocates the full/half
+ register names from the actual physical register file and
+ handles the register renaming.
+ These seem to be offsets for storage of the varyings.
+ Always seems to start from 8, possibly loc 0 and 4
+ are for gl_Position and gl_PointSize?
+ From register spec:
+ start offset in on chip RAM,
+ 128bit aligned
+ These seem to be offsets for storage of the varyings.
+ Always seems to start from 8, possibly loc 0 and 4
+ are for gl_Position and gl_PointSize?
+ These seem to be offsets for storage of the varyings.
+ Always seems to start from 8, possibly loc 0 and 4
+ are for gl_Position and gl_PointSize?
+ Configures the mapping between VSC_PIPE buffer and
+ bin, X/Y specify the bin index in the horiz/vert
+ direction (0,0 is upper left, 0,1 is leftmost bin
+ on second row, and so on). W/H specify the number
+ of bins assigned to this VSC_PIPE in the horiz/vert
+ dimension.
+ TOTALATTRTOVS is # of attributes to vertex shader, in register
+ slots (e.g. vec4+vec3 -> 7)
+ BYPASSATTROVS seems to count varyings that are just directly
+ assigned from attributes (ie, "vFoo = aFoo;")
+ STRMDECINSTRCNT is # of VFD_DECODE_INSTR registers valid
+ MAXSTORAGE could be # of attributes/vbo's
+ SHIFTCNT appears to be size in bytes, e.g. FLOAT_32_32_32 is 12, and BYTE_8 is 1
+ SIZE is current pipe width * height (in tiles)
+ N is some sort of slot # between 0..(SIZE-1). In case
+ multiple tiles use same pipe, each tile gets unique slot #
+ in groups of 4x vec4, blob only uses values
+ 0, 1, 2, 4, 6, 8
+ Texture sampler dwords
+ Texture constant dwords
+ Pitch in bytes (so actually stride)
+ Pitch in bytes (so actually stride)
diff --git a/drivers/gpu/drm/msm/registers/adreno/adreno_common.xml b/drivers/gpu/drm/msm/registers/adreno/adreno_common.xml
new file mode 100644
index 000000000000..218ec8bb966e
--- /dev/null
+++ b/drivers/gpu/drm/msm/registers/adreno/adreno_common.xml
@@ -0,0 +1,400 @@
+ Registers in common between a2xx and a3xx
+Address mode for a5xx+
+ Line mode for a5xx+
+ Note that Bresenham lines are only supported with MSAA disabled.
+ Blob (v615) seems to only use SAM and I wasn't able to coerce
+ it to produce any other command.
+ Probably valid for a4xx+ but not enabled or tested on anything
+ but a6xx.
+ Produces garbage
+ Causes reads from an invalid address
+ Results in color being zero
diff --git a/drivers/gpu/drm/msm/registers/adreno/adreno_pm4.xml b/drivers/gpu/drm/msm/registers/adreno/adreno_pm4.xml
new file mode 100644
index 000000000000..cab01af55d22
--- /dev/null
+++ b/drivers/gpu/drm/msm/registers/adreno/adreno_pm4.xml
@@ -0,0 +1,2268 @@
+ Flushes dirty data from UCHE, and also writes a GPU timestamp to
+ the address if one is provided.
+ If A6XX_RB_SAMPLE_COUNT_CONTROL.copy is true, writes OQ Z passed
+ sample counts to RB_SAMPLE_COUNT_ADDR. This writes to main
+ memory, skipping UCHE.
+ Writes the GPU timestamp to the address that follows, once RB
+ access and flushes are complete.
+ Invalidates depth attachment data from the CCU. We assume this
+ happens in the last stage.
+ Invalidates color attachment data from the CCU. We assume this
+ happens in the last stage.
+ Flushes the small cache used by CP_EVENT_WRITE::BLIT (which,
+ along with its registers, would be better named RESOLVE).
+ Flushes depth attachment data from the CCU. We assume this
+ happens in the last stage.
+ Flushes color attachment data from the CCU. We assume this
+ happens in the last stage.
+ 2D blit to resolve GMEM to system memory (skipping CCU) at the
+ end of a render pass. Compare to CP_BLIT's BLIT_OP_SCALE for
+ more general blitting.
+ Clears based on GRAS_LRZ_CNTL configuration, could clear
+ fast-clear buffer or LRZ direction.
+ LRZ direction is stored at lrz_fc_offset + 0x200, has 1 byte which
+ could be expressed by enum:
+ CUR_DIR_GE = 0x1
+ CUR_DIR_LE = 0x2
+ Clear of direction means setting the direction to CUR_DIR_UNSET.
+ Invalidates UCHE.
+ Doesn't seem to do anything
+ initialize CP's micro-engine
+ skip N 32-bit words to get to the next packet
+ indirect buffer dispatch. prefetch parser uses this packet
+ type to determine whether to pre-fetch the IB
+ Takes the same arguments as CP_INDIRECT_BUFFER, but jumps to
+ another buffer at the same level. Must be at the end of IB, and
+ doesn't work with draw state IB's.
+ indirect buffer dispatch. same as IB, but init is pipelined
+ Waits for the IDLE state of the engine before further drawing.
+ This is pipelined, so the CP may continue.
+ wait until a register or memory location is a specific value
+ wait until a register location is equal to a specific value
+ wait until a register location is >= a specific value
+ wait until a read completes
+ wait until all base/size writes from an IB_PFD packet have completed
+ register read/modify/write
+ Set binning configuration registers
+ reads register in chip and writes to memory
+ write N 32-bit words to memory
+ write CP_PROG_COUNTER value to memory
+ conditional execution of a sequence of packets
+ conditional write to memory or register
+ generate an event that creates a write to memory when completed
+ generate a VS|PS_done event
+ generate a cache flush done event
+ generate a z_pass done event
+ not sure the real name, but this seems to be what is used for
+ opencl, instead of CP_DRAW_INDX..
+ initiate fetch of index buffer and draw
+ draw using supplied indices in packet
+ initiate fetch of index buffer and binIDs and draw
+ initiate fetch of bin IDs and draw using supplied indices
+ begin/end initiator for viz query extent processing
+ fetch state sub-blocks and initiate shader code DMAs
+ load constant into chip and to memory
+ load sequencer instruction memory (pointer-based)
+ load sequencer instruction memory (code embedded in packet)
+ load constants from a location in memory
+ selective invalidation of state pointers
+ dynamically changes shader instruction memory partition
+ sets the 64-bit BIN_MASK register in the PFP
+ sets the 64-bit BIN_SELECT register in the PFP
+ updates the current context, if needed
+ generate interrupt from the command stream
+ copy sequencer instruction memory to system memory
+ sets draw initiator flags register in PFP, gets bitwise-ORed into
+ every draw initiator
+ sets the register protection mode
+ load high level sequencer command
+ Conditionally load an IB based on a flag, prefetch enabled
+ Conditionally load an IB based on a flag, prefetch disabled
+ Load a buffer with pre-fetch enabled
+ Set bin (?)
+ test 2 memory locations against the specified dword values
+ Write register, ignoring context state for context sensitive registers
+ Record the real-time when this packet is processed by PFP
+ PFP waits until the FIFO between the PFP and the ME is empty
+ Used a bit like CP_SET_CONSTANT on a2xx, but can write multiple
+ groups of registers. Looks like it can be used to create state
+ objects in GPU memory, and on state change only emit pointer
+ (via CP_SET_DRAW_STATE), which should be nice for reducing CPU
+ overhead:
+ (A4x) save PM4 stream pointers to execute upon a visible draw
+ Enable or disable predication globally. Also resets the
+ predicate to "passing" and the local bit to enabled when
+ enabling global predication.
+ Enable or disable predication locally. Unlike globally enabling
+ predication, this packet doesn't touch any other state.
+ Predication only happens when enabled globally and locally and a
+ predicate has been set. This should be used for internal draws
+ which aren't supposed to use the predication state:
+ ... do draw...
+ Latch a draw predicate into the internal register.
+ for A4xx
+ Write to register with address that does not fit into type-0 pkt
+ copy from ME scratch RAM to a register
+ Copy from REG to ME scratch RAM
+ Wait for memory writes to complete
+ Conditional execution based on register comparison
+ Memory to REG copy
+ for a5xx
+ Tells CP the current mode of GPU operation
+ Instruct CP to set a few internal CP registers
+ Enables IB2 skipping. If both GLOBAL and LOCAL are 1 and
+ nothing is left in the visibility stream, then
+ CP_INDIRECT_BUFFER will be skipped, and draws will early return
+ from their IB.
+ General purpose 2D blit engine for image transfers and mipmap
+ generation. Reads through UCHE, writes through the CCU cache in
+ the PS stage.
+ Write CP_CONTEXT_SWITCH_*_INFO from CP to the following dwords,
+ and forcibly switch to the indicated context.
+ These first appear in a650_sqe.bin. They can in theory be used
+ to loop any sequence of IB1 commands, but in practice they are
+ used to loop over bins. There is a fixed-size per-iteration
+ prefix, used to set per-bin state, and then the following IB1
+ commands are executed until CP_END_BIN which are always the same
+ for each iteration and usually contain a list of
+ CP_INDIRECT_BUFFER calls to IB2 commands which setup state and
+ execute restore/draw/save commands. This replaces the previous
+ technique of just repeating the CP_INDIRECT_BUFFER calls and
+ "unrolling" the loop.
+ Make next dword 1 to disable preemption, 0 to re-enable it.
+ Can clear BV/BR counters, or wait until one catches up to another
+ Clears, adds to local, or adds to global timestamp
+ Write to a scratch memory that is read by CP_REG_TEST with
+ SOURCE_SCRATCH_MEM set. It's not the same scratch as scratch registers.
+ However it uses the same memory space.
+ Executes an array of fixed-size command buffers where each
+ buffer is assumed to have one draw call, skipping buffers with
+ non-visible draw calls.
+ Reset various on-chip state used for synchronization
+ Load state, a3xx (and later?)
+ inline with the CP_LOAD_STATE packet
+ in buffer pointed to by EXT_SRC_ADDR
+ Load state, a4xx+
+ Load state, a6xx+
+ SS6_UBO used by the a6xx vulkan blob with tessellation constants
+ in this case, EXT_SRC_ADDR is (ubo_id shl 16 | offset)
+ to load constants from a UBO loaded with DST_OFF = 14 and offset 0,
+ EXT_SRC_ADDR = 0xe0000
+ (offset is a guess, should be in bytes given that maxUniformBufferRange=64k)
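The SS6_UBO encoding described above can be reproduced in a few lines of Python. `ss6_ubo_ext_src_addr` is a hypothetical helper. The 16-bit limit on the offset is an assumption taken from the `ubo_id shl 16` layout, and the source itself flags the offset field as a guess.

```python
def ss6_ubo_ext_src_addr(ubo_id: int, offset: int) -> int:
    """Pack a UBO id and an offset as (ubo_id << 16) | offset."""
    if not 0 <= offset < (1 << 16):
        # Assumption: the offset has to fit below the ubo_id field.
        raise ValueError("offset must fit in 16 bits")
    return (ubo_id << 16) | offset


# The case quoted in the note: UBO 14 at offset 0.
print(hex(ss6_ubo_ext_src_addr(14, 0)))
```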
+ DST_OFF same as in CP_LOAD_STATE6 - vec4 VS const at this offset will
+ be updated for each draw to {draw_id, first_vertex, first_instance, 0}
+ value of 0 disables it
+ Read a 64-bit value at the given address and
+ test if it equals/doesn't equal 0.
+ value at offset 0 always seems to be 0x00000000..
+ Like CP_SET_BIN_DATA5, but set the pointers as offsets from the
+ pointers stored in VSC_PIPE_{DATA,DATA2,SIZE}_ADDRESS. Useful
+ for Vulkan where these values aren't known when the command
+ stream is recorded.
+ Modifies DST_REG using two sources that can either be registers
+ or immediates. If SRC1_ADD is set, then do the following:
+ $dst = (($dst & $src0) rot $rotate) + $src1
+ Otherwise:
+ $dst = (($dst & $src0) rot $rotate) | $src1
+ Here "rot" means rotate left.
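The two update rules above can be modelled in Python as follows. This is a sketch assuming 32-bit registers with wrap-around on the add. `rotl32` and `cp_reg_rmw` are hypothetical names, and the packet's actual field encoding is not modelled.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value: int, amount: int) -> int:
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    value &= MASK32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def cp_reg_rmw(dst: int, src0: int, src1: int, rotate: int, src1_add: bool) -> int:
    """Return the new DST_REG value for the described read-modify-write."""
    rotated = rotl32(dst & src0, rotate)
    if src1_add:
        # SRC1_ADD set: add src1 (assumed to wrap at 32 bits).
        return (rotated + src1) & MASK32
    # SRC1_ADD clear: OR src1 in.
    return (rotated | src1) & MASK32


# Mask the low nibble, rotate it left by 4, then add 1.
print(hex(cp_reg_rmw(0xFF, 0x0F, 0x1, 4, src1_add=True)))
```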
+ Like CP_REG_TO_MEM, but the memory address to write to can be
+ offset using either one or two registers or scratch
+ registers.
+ Like CP_REG_TO_MEM, but the memory address to write to can be
+ offset using a DWORD in memory.
+ Wait until a memory value is greater than or equal to the
+ reference, using signed comparison.
+ This uses the same internal comparison as CP_COND_WRITE,
+ but waits until the comparison is true instead. It busy-loops in
+ the CP for the given number of cycles before trying again.
+ Waits for REG0 to not be 0 or REG1 to not equal REF
+ Tell CP the current operation mode, indicates save and restore procedure
+ Set internal CP registers, used to indicate context save data addresses
+ Tests bit in specified register and sets predicate for CP_COND_REG_EXEC.
+ So:
+ opcode: CP_REG_TEST (39) (2 dwords)
+ { REG = 0xc10 | BIT = 0 }
+ 0000: 70b90001 00000c10
+ opcode: CP_COND_REG_EXEC (47) (3 dwords)
+ 0000: 70c70002 10000000 00000004
+ opcode: CP_INDIRECT_BUFFER (3f) (4 dwords)
+ Will execute the CP_INDIRECT_BUFFER only if b0 in the register at
+ offset 0x0c10 is 1
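The header dwords in the dump above can be picked apart with a short Python sketch. The field layout used here is an assumption, chosen because it reproduces the opcodes and sizes printed in the dump: packet type in bits 31:28, a 7-bit opcode in bits 22:16, and the payload dword count in the low 14 bits. `decode_pkt7_header` is a hypothetical helper.

```python
def decode_pkt7_header(header: int) -> tuple[int, int, int]:
    """Split a header dword into (packet type, opcode, payload dword count).

    Assumed layout: type in [31:28], opcode in [22:16], count in [13:0].
    Bits 23 and 15 are ignored here.
    """
    pkt_type = (header >> 28) & 0xF
    opcode = (header >> 16) & 0x7F
    count = header & 0x3FFF
    return pkt_type, opcode, count


for header in (0x70B90001, 0x70C70002):
    pkt_type, opcode, count = decode_pkt7_header(header)
    # The dump's "(N dwords)" figure includes the header itself.
    print(f"type={pkt_type} opcode={opcode:#x} total_dwords={count + 1}")
```

With this layout the first header decodes to opcode 0x39 with one payload dword, and the second to opcode 0x47 with two, matching the "(2 dwords)" and "(3 dwords)" annotations once the header is counted.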
+ Executes the following DWORDs of commands if the dword at ADDR0
+ is not equal to 0 and the dword at ADDR1 is less than REF
+ (signed comparison).
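The condition described above combines a non-zero test with a signed less-than. This Python sketch assumes 32-bit dwords. `to_signed32` and `cond_exec_passes` are hypothetical names.

```python
def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def cond_exec_passes(dword_at_addr0: int, dword_at_addr1: int, ref: int) -> bool:
    """True when the following DWORDs of commands would be executed."""
    return dword_at_addr0 != 0 and to_signed32(dword_at_addr1) < to_signed32(ref)


# 0xFFFFFFFF is -1 when treated as signed, so it is less than 0.
print(cond_exec_passes(1, 0xFFFFFFFF, 0))
```

The signed reinterpretation matters: with an unsigned comparison 0xFFFFFFFF would never be less than 0.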
+ Used by the userspace driver to set various IB's which are
+ executed during context save/restore for handling
+ state that isn't restored by the
+ context switch routine itself.
+ Executed unconditionally when switching back to the context.
+ Executed when switching back after switching
+ away during execution of
+ a CP_SET_MARKER packet with RM6_YIELD as the
+ payload *and* the normal save routine was
+ bypassed for a shorter one. I think this is
+ connected to the "skipsaverestore" bit set by
+ the kernel when preempting.
+ Executed when switching away from the context,
+ except for context switches initiated via
+ This can only be set by the RB (i.e. the kernel)
+ and executes with protected mode off, but
+ is otherwise similar to SAVE_IB.
+ Note, kgsl calls this CP_KMD_AMBLE_TYPE
+ Keep shadow copies of these registers and only set them
+ when drawing, avoiding redundant writes:
+ - VPC_CNTL_0
+ Track RB_RENDER_CNTL, and insert a WFI in the following
+ situation:
+ - There is a write that disables binning
+ - There was a draw with binning left enabled, but in
+ BYPASS mode
+ Presumably this is a hang workaround?
+ Do a mysterious CP_EVENT_WRITE 0x3f when the low bit of
+ the data to write is 0. Used by the Vulkan blob with
+ PC_MULTIVIEW_CNTL, but this isn't predicated on particular
+ register(s) like the others.
+ GRAS_LRZ_DEPTH_VIEW with previous values, and if one of
+ the following is true:
+ - GRAS_LRZ_CNTL::GREATER has changed
+ - GRAS_LRZ_CNTL::DIR has changed, the old value is not
+ CUR_DIR_GE, and the new value is not CUR_DIR_DISABLED
+ - GRAS_LRZ_DEPTH_VIEW has changed
+ then it does a LRZ_FLUSH with GRAS_LRZ_CNTL::ENABLE
+ forced to 1.
+ Only exists in a650_sqe.fw.
+ Note that the SMMU's definition of TTBRn can take different forms
+ depending on the pgtable format. But a5xx+ only uses aarch64
+ format.
+ Unused, does not apply to aarch64 pgtable format
+ Size of prefix for each bin. For each bin index i, the
+ prefix commands at PREFIX_ADDR + i * PREFIX_DWORDS are
+ executed in an IB2 before the IB1 commands following
+ this packet.
+ Number of dwords after this packet until CP_END_BIN
+ Best guess is that it is a faster way to fetch all the VSC_STATE registers
+ and keep them in a local scratch memory instead of fetching every time
+ when skipping IBs.
+ Scratch memory size is 48 dwords