=====================
DRM Memory Management
=====================

Modern Linux systems require a large amount of graphics memory to store
frame buffers, textures, vertices and other graphics-related data. Given
the very dynamic nature of much of that data, managing graphics memory
efficiently is thus crucial for the graphics stack and plays a central
role in the DRM infrastructure.

The DRM core includes two memory managers, namely Translation Table Maps
(TTM) and Graphics Execution Manager (GEM). TTM was the first DRM memory
manager to be developed and tried to be a one-size-fits-all
solution. It provides a single userspace API to accommodate the needs of
all hardware, supporting both Unified Memory Architecture (UMA) devices
and devices with dedicated video RAM (i.e. most discrete video cards).
This resulted in a large, complex piece of code that turned out to be
hard to use for driver development.

GEM started as an Intel-sponsored project in reaction to TTM's
complexity. Its design philosophy is completely different: instead of
providing a solution to every graphics memory-related problem, GEM
identified common code between drivers and created a support library to
share it. GEM has simpler initialization and execution requirements than
TTM, but has no video RAM management capabilities and is thus limited to
UMA devices.

The Translation Table Manager (TTM)
===================================

TTM design background and information belongs here.

TTM initialization
------------------

**Warning**

This section is outdated.

Drivers wishing to support TTM must pass a filled :c:type:`ttm_bo_driver
<ttm_bo_driver>` structure to ttm_bo_device_init(), together with an
initialized global reference to the memory manager. The ttm_bo_driver
structure contains several fields with function pointers for
initializing the TTM, allocating and freeing memory, waiting for command
completion and fence synchronization, and memory migration.

The :c:type:`struct drm_global_reference <drm_global_reference>` is made
up of several fields:

.. code-block:: c

   struct drm_global_reference {
           enum ttm_global_types global_type;
           size_t size;
           void *object;
           int (*init) (struct drm_global_reference *);
           void (*release) (struct drm_global_reference *);
   };

There should be one global reference structure for your memory manager
as a whole, and there will be others for each object created by the
memory manager at runtime. Your global TTM should have a type of
TTM_GLOBAL_TTM_MEM. The size field for the global object should be
sizeof(struct ttm_mem_global), and the init and release hooks should
point at your driver-specific init and release routines, which probably
eventually call ttm_mem_global_init() and ttm_mem_global_release(),
respectively.

Once your global TTM accounting structure is set up and initialized by
calling ttm_global_item_ref() on it, you need to create a buffer
object TTM to provide a pool for buffer object allocation by clients and
the kernel itself. The type of this object should be
TTM_GLOBAL_TTM_BO, and its size should be sizeof(struct ttm_bo_global).
Again, driver-specific init and release functions may
be provided, likely eventually calling ttm_bo_global_ref_init() and
ttm_bo_global_ref_release(), respectively. Also, like the previous
object, ttm_global_item_ref() is used to create an initial reference
count for the TTM, which will call your initialization function.

See the radeon_ttm.c file for an example of usage.
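
The reference-counting contract described above, where taking the first
reference runs the init hook and dropping the last one runs the release
hook, can be illustrated with a small userspace model. This is a
hypothetical sketch, not the kernel's drm_global or TTM code: the toy\_
names, the added refcount field and the use of calloc() for the global
object are assumptions made for the example.

.. code-block:: c

   #include <stddef.h>
   #include <stdlib.h>

   /* Hypothetical model of a global reference: size/object/init/release
    * mirror struct drm_global_reference, refcount is added for the sketch. */
   struct toy_global_ref {
       size_t size;
       void *object;
       int (*init)(struct toy_global_ref *ref);
       void (*release)(struct toy_global_ref *ref);
       int refcount;
   };

   /* Taking the first reference allocates the object and calls init(). */
   int toy_global_item_ref(struct toy_global_ref *ref)
   {
       if (ref->refcount == 0) {
           ref->object = calloc(1, ref->size);
           if (!ref->object)
               return -1;
           if (ref->init(ref) != 0) {
               free(ref->object);
               ref->object = NULL;
               return -1;
           }
       }
       ref->refcount++;
       return 0;
   }

   /* Dropping the last reference calls release() and frees the object. */
   void toy_global_item_unref(struct toy_global_ref *ref)
   {
       if (--ref->refcount == 0) {
           ref->release(ref);
           free(ref->object);
           ref->object = NULL;
       }
   }

   /* Stand-ins for driver hooks that, in a real driver, would call
    * ttm_mem_global_init() and ttm_mem_global_release(). */
   struct toy_mem_global {
       int initialized;
   };

   int toy_mem_init(struct toy_global_ref *ref)
   {
       struct toy_mem_global *glob = ref->object;

       glob->initialized = 1;
       return 0;
   }

   void toy_mem_release(struct toy_global_ref *ref)
   {
       struct toy_mem_global *glob = ref->object;

       glob->initialized = 0;
   }

A driver following this section would fill one such reference with the
size of its accounting object and its init and release hooks; only the
first ref and the last unref actually touch the object.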

The Graphics Execution Manager (GEM)
====================================

The GEM design approach has resulted in a memory manager that doesn't
provide full coverage of all (or even all common) use cases in its
userspace or kernel API. GEM exposes a set of standard memory-related
operations to userspace and a set of helper functions to drivers, and
lets drivers implement hardware-specific operations with their own
private API.

The GEM userspace API is described in the `GEM - the Graphics Execution
Manager <http://lwn.net/Articles/283798/>`__ article on LWN. While
slightly outdated, the article provides a good overview of the GEM API
principles. Buffer allocation and read and write operations, described
as part of the common GEM API, are currently implemented using
driver-specific ioctls.

GEM is data-agnostic. It manages abstract buffer objects without knowing
what individual buffers contain. APIs that require knowledge of buffer
contents or purpose, such as buffer allocation or synchronization
primitives, are thus outside of the scope of GEM and must be implemented
using driver-specific ioctls.

On a fundamental level, GEM involves several operations:

- Memory allocation and freeing
- Command execution
- Aperture management at command execution time

Buffer object allocation is relatively straightforward and largely
provided by Linux's shmem layer, which provides memory to back each
object.

Device-specific operations, such as command execution, pinning, buffer
read & write, mapping, and domain ownership transfers are left to
driver-specific ioctls.

GEM Initialization
------------------

Drivers that use GEM must set the DRIVER_GEM bit in the :c:type:`struct
drm_driver <drm_driver>` driver_features field. The DRM core will then
automatically initialize the GEM core before calling the load operation.
Behind the scenes, this will create a DRM Memory Manager object which
provides an address space pool for object allocation.

In a KMS configuration, drivers need to allocate and initialize a
command ring buffer following core GEM initialization if required by the
hardware. UMA devices usually have what is called a "stolen" memory
region, which provides space for the initial framebuffer and large,
contiguous memory regions required by the device. This space is
typically not managed by GEM, and must be initialized separately into
its own DRM MM object.

GEM Objects Creation
--------------------

GEM splits creation of GEM objects and allocation of the memory that
backs them into two distinct operations.

GEM objects are represented by an instance of :c:type:`struct
drm_gem_object <drm_gem_object>`. Drivers usually need to
extend GEM objects with private information and thus create a
driver-specific GEM object structure type that embeds an instance of
:c:type:`struct drm_gem_object <drm_gem_object>`.

To create a GEM object, a driver allocates memory for an instance of its
specific GEM object type and initializes the embedded :c:type:`struct
drm_gem_object <drm_gem_object>` with a call to drm_gem_object_init().
The function takes a pointer to the DRM device, a pointer to the GEM
object and the buffer object size in bytes.
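
The embedding pattern above can be sketched in plain C. The kernel
recovers the driver structure from the embedded object with
container_of(); the sketch re-creates that macro with offsetof() so it
compiles in userspace. All toy\_ and my\_ names are hypothetical, and
toy_gem_object_init() only mimics the role of drm_gem_object_init(); it
does not create any shmem backing.

.. code-block:: c

   #include <stddef.h>
   #include <stdlib.h>

   /* Same idea as the kernel's container_of(): get the outer structure
    * from a pointer to one of its members. */
   #define toy_container_of(ptr, type, member) \
       ((type *)((char *)(ptr) - offsetof(type, member)))

   /* Hypothetical stand-in for struct drm_gem_object. */
   struct toy_gem_object {
       size_t size;
       int refcount;
   };

   /* Hypothetical driver-specific object embedding the core object. */
   struct my_bo {
       unsigned int tiling_mode;      /* private driver state */
       struct toy_gem_object base;    /* embedded core object */
   };

   /* Plays the role of drm_gem_object_init() in this sketch. */
   void toy_gem_object_init(struct toy_gem_object *obj, size_t size)
   {
       obj->size = size;
       obj->refcount = 1;             /* initial reference for the creator */
   }

   /* Allocate the driver type, then initialize the embedded core part. */
   struct my_bo *my_bo_create(size_t size)
   {
       struct my_bo *bo = calloc(1, sizeof(*bo));

       if (!bo)
           return NULL;
       toy_gem_object_init(&bo->base, size);
       return bo;
   }

   /* Core callbacks only see the embedded object; convert back. */
   struct my_bo *to_my_bo(struct toy_gem_object *obj)
   {
       return toy_container_of(obj, struct my_bo, base);
   }

Embedding the core object as a member, rather than pointing to it, keeps
the core and driver state in a single allocation, and any callback that
receives the core object reaches the driver structure with pointer
arithmetic alone.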

GEM uses shmem to allocate anonymous pageable memory.
drm_gem_object_init() will create an shmfs file of the
requested size and store it into the :c:type:`struct
drm_gem_object <drm_gem_object>` filp field. The memory is
used as either main storage for the object when the graphics hardware
uses system memory directly or as a backing store otherwise.

Drivers are responsible for allocating the actual physical pages by
calling shmem_read_mapping_page_gfp() for each page.
Note that they can decide to allocate pages when initializing the GEM
object, or to delay allocation until the memory is needed (for instance
when a page fault occurs as a result of a userspace memory access or
when the driver needs to start a DMA transfer involving the memory).
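
To illustrate the deferred-allocation option, the following userspace
model keeps a table of page pointers and fills an entry only the first
time that page is requested. It is a hypothetical sketch: calloc()
stands in for the per-page shmem_read_mapping_page_gfp() call a kernel
driver would make, and the toy\_ names and the 4096-byte page size are
assumptions.

.. code-block:: c

   #include <stddef.h>
   #include <stdlib.h>

   #define TOY_PAGE_SIZE 4096

   /* Hypothetical object whose backing pages are allocated lazily. */
   struct toy_lazy_object {
       size_t num_pages;
       void **pages;                  /* NULL entries are not allocated yet */
   };

   int toy_lazy_object_init(struct toy_lazy_object *obj, size_t size)
   {
       obj->num_pages = (size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
       obj->pages = calloc(obj->num_pages, sizeof(*obj->pages));
       return obj->pages ? 0 : -1;
   }

   /* Return page @index, allocating it on first use (for instance from a
    * fault handler or just before a DMA transfer). */
   void *toy_get_page(struct toy_lazy_object *obj, size_t index)
   {
       if (index >= obj->num_pages)
           return NULL;
       if (!obj->pages[index])
           obj->pages[index] = calloc(1, TOY_PAGE_SIZE);
       return obj->pages[index];
   }

   size_t toy_count_allocated(const struct toy_lazy_object *obj)
   {
       size_t i, n = 0;

       for (i = 0; i < obj->num_pages; i++)
           if (obj->pages[i])
               n++;
       return n;
   }

   void toy_lazy_object_fini(struct toy_lazy_object *obj)
   {
       size_t i;

       for (i = 0; i < obj->num_pages; i++)
           free(obj->pages[i]);
       free(obj->pages);
       obj->pages = NULL;
   }

A driver preferring eager allocation would simply call the equivalent of
toy_get_page() for every index at creation time.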

Anonymous pageable memory allocation is not always desired, for instance
when the hardware requires physically contiguous system memory as is
often the case in embedded devices. Drivers can create GEM objects with
no shmfs backing (called private GEM objects) by initializing them with a call
to drm_gem_private_object_init() instead of drm_gem_object_init(). Storage for
private GEM objects must be managed by drivers.

GEM Objects Lifetime
--------------------

All GEM objects are reference-counted by the GEM core. References can be
acquired and released by calling drm_gem_object_get() and drm_gem_object_put()
respectively. The caller must hold the :c:type:`struct drm_device <drm_device>`
struct_mutex lock when calling drm_gem_object_put(). As a convenience, GEM
provides the drm_gem_object_put_unlocked() function, which can be called
without holding the lock.

When the last reference to a GEM object is released, the GEM core calls
the :c:type:`struct drm_driver <drm_driver>` gem_free_object_unlocked
operation. That operation is mandatory for GEM-enabled drivers and must
free the GEM object and all associated resources.

The operation's prototype is
``void (*gem_free_object_unlocked) (struct drm_gem_object *obj);``.
Drivers are responsible for freeing all GEM object resources. This includes the
resources created by the GEM core, which need to be released with
drm_gem_object_release().
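
The lifetime rules above amount to reference counting with a
driver-supplied destructor. The userspace model below is a hypothetical
illustration: the kernel uses an atomic struct kref rather than a plain
int, and toy_gem_object_release() only stands in for
drm_gem_object_release(). The toy\_ and my\_ names are assumptions made
for the example.

.. code-block:: c

   #include <stdlib.h>

   struct toy_gem_object;

   /* Hypothetical driver hook, in the role of gem_free_object_unlocked. */
   typedef void (*toy_free_fn)(struct toy_gem_object *obj);

   struct toy_gem_object {
       int refcount;                  /* plain int: not thread-safe */
       int core_resources;            /* stands in for core-owned state */
       toy_free_fn free_object;
   };

   void toy_gem_object_get(struct toy_gem_object *obj)
   {
       obj->refcount++;
   }

   /* Dropping the last reference invokes the driver's free hook. */
   void toy_gem_object_put(struct toy_gem_object *obj)
   {
       if (--obj->refcount == 0)
           obj->free_object(obj);
   }

   /* Stand-in for drm_gem_object_release(): undo the core's setup. */
   void toy_gem_object_release(struct toy_gem_object *obj)
   {
       obj->core_resources = 0;
   }

   int toy_freed_count;

   /* The driver hook releases core resources, then frees the object. */
   void my_free_object(struct toy_gem_object *obj)
   {
       toy_gem_object_release(obj);
       free(obj);
       toy_freed_count++;
   }

The hook releases the core's resources before freeing the memory that
contains them, mirroring the requirement that drivers call
drm_gem_object_release() from their free operation.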

GEM Objects Naming
------------------

Communication between userspace and the kernel refers to GEM objects
using local handles, global names or, more recently, file descriptors.
All of those are 32-bit integer values; the usual Linux kernel limits
apply to the file descriptors.

GEM handles are local to a DRM file. Applications get a handle to a GEM
object through a driver-specific ioctl, and can use that handle to refer
to the GEM object in other standard or driver-specific ioctls. Closing a
DRM file handle frees all its GEM handles and dereferences the
associated GEM objects.

To create a handle for a GEM object, drivers call drm_gem_handle_create(). The
function takes a pointer to the DRM file and the GEM object and returns a
locally unique handle through a pointer argument. When the handle is no longer
needed, drivers delete it with a call to drm_gem_handle_delete(). Finally, the
GEM object associated with a handle can be retrieved by a call to
drm_gem_object_lookup().

Handles don't take ownership of GEM objects; they only take a reference
to the object that will be dropped when the handle is destroyed. To
avoid leaking GEM objects, drivers must make sure they drop the
reference(s) they own (such as the initial reference taken at object
creation time) as appropriate, without any special consideration for the
handle. For example, in the particular case of combined GEM object and
handle creation in the implementation of the dumb_create operation,
drivers must drop the initial reference to the GEM object before
returning the handle.
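
The ownership rule for handles, and the dumb_create case in particular,
can be modelled with a small per-file handle table. This is a
hypothetical userspace sketch, not drm_gem_handle_create(): the
fixed-size array, the toy\_ names and the convention that handle 0 is
never handed out are assumptions for the example.

.. code-block:: c

   #include <stdlib.h>

   #define TOY_MAX_HANDLES 16

   struct toy_obj {
       int refcount;
   };

   /* Hypothetical per-file handle table; slot 0 is never used. */
   struct toy_file {
       struct toy_obj *handles[TOY_MAX_HANDLES];
   };

   void toy_obj_put(struct toy_obj *obj)
   {
       if (--obj->refcount == 0)
           free(obj);
   }

   /* A handle holds its own reference; it does not own the object. */
   int toy_handle_create(struct toy_file *file, struct toy_obj *obj,
                         unsigned int *handle)
   {
       unsigned int i;

       for (i = 1; i < TOY_MAX_HANDLES; i++) {
           if (!file->handles[i]) {
               obj->refcount++;
               file->handles[i] = obj;
               *handle = i;
               return 0;
           }
       }
       return -1;
   }

   /* Deleting a handle drops the reference the handle held. */
   void toy_handle_delete(struct toy_file *file, unsigned int handle)
   {
       if (handle > 0 && handle < TOY_MAX_HANDLES && file->handles[handle]) {
           toy_obj_put(file->handles[handle]);
           file->handles[handle] = NULL;
       }
   }

   /* dumb_create-style flow: create the object, publish a handle, then
    * drop the creation reference so only the handle's reference remains. */
   int toy_dumb_create(struct toy_file *file, unsigned int *handle)
   {
       struct toy_obj *obj = calloc(1, sizeof(*obj));
       int ret;

       if (!obj)
           return -1;
       obj->refcount = 1;             /* initial reference */
       ret = toy_handle_create(file, obj, handle);
       toy_obj_put(obj);              /* dropped on success and failure */
       return ret;
   }

Dropping the creation reference unconditionally keeps the error path
simple: if publishing the handle fails, that same put frees the object.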

GEM names are similar in purpose to handles but are not local to DRM
files. They can be passed between processes to reference a GEM object
globally. Names can't be used directly to refer to objects in the DRM
API; applications must convert handles to names and names to handles
using the DRM_IOCTL_GEM_FLINK and DRM_IOCTL_GEM_OPEN ioctls
respectively. The conversion is handled by the DRM core without any
driver-specific support.
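
Names, by contrast, live in a table shared by every file on the device.
The sketch below is a hypothetical model of the FLINK and OPEN
conversions; the toy\_ names, the fixed table size and the choice to hand
out names sequentially are assumptions, and no reference counting is
modelled. Sequential numbering also shows why such global names are easy
to guess.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   #define TOY_MAX_NAMES 8

   struct toy_obj {
       uint32_t name;                 /* 0 until the object is flinked */
   };

   /* Hypothetical device-global name table, shared by all files. */
   struct toy_name_table {
       struct toy_obj *objs[TOY_MAX_NAMES];
       uint32_t next_name;            /* 0 is reserved for "no name" */
   };

   /* FLINK-like: give the object a global name, reusing an existing one. */
   uint32_t toy_flink(struct toy_name_table *t, struct toy_obj *obj)
   {
       if (obj->name)
           return obj->name;
       if (t->next_name == 0)
           t->next_name = 1;
       if (t->next_name >= TOY_MAX_NAMES)
           return 0;                  /* table full */
       obj->name = t->next_name++;
       t->objs[obj->name] = obj;
       return obj->name;
   }

   /* OPEN-like: resolve a name; a real ioctl would then create a handle. */
   struct toy_obj *toy_open_name(struct toy_name_table *t, uint32_t name)
   {
       if (name == 0 || name >= TOY_MAX_NAMES)
           return NULL;
       return t->objs[name];
   }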

GEM also supports buffer sharing with dma-buf file descriptors through
PRIME. GEM-based drivers must use the provided helper functions to
implement the exporting and importing correctly. See
:ref:`prime_buffer_sharing`. Since sharing file descriptors is inherently
more secure than the easily guessable and global GEM names, it is the
preferred buffer sharing mechanism. Sharing buffers through GEM names is
only supported for legacy userspace. Furthermore, PRIME also allows
cross-device buffer sharing since it is based on dma-bufs.

GEM Objects Mapping
-------------------

Because mapping operations are fairly heavyweight, GEM favours
read/write-like access to buffers, implemented through driver-specific
ioctls, over mapping buffers to userspace. However, when random access
to the buffer is needed (to perform software rendering for instance),
direct access to the object can be more efficient.

The mmap system call can't be used directly to map GEM objects, as they
don't have their own file handle. Two alternative methods currently
co-exist to map GEM objects to userspace. The first method uses a
driver-specific ioctl to perform the mapping operation, calling
do_mmap() under the hood. This is often considered
dubious, seems to be discouraged for new GEM-enabled drivers, and will
thus not be described here.

The second method uses the mmap system call on the DRM file handle,
``void *mmap(void *addr, size_t length, int prot, int flags, int fd,
off_t offset);``. DRM identifies the GEM object to be mapped by a fake
offset passed through the mmap offset argument. Prior to being mapped, a
GEM object must thus be associated with a fake offset. To do so, drivers
must call drm_gem_create_mmap_offset() on the object.

Once allocated, the fake offset value must be passed to the application
in a driver-specific way and can then be used as the mmap offset
argument.
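
The fake-offset mechanism can be sketched as follows: each object gets
a unique, page-aligned range in an offset space belonging to the device,
and the mmap path translates the offset it receives back into the
object. This is a hypothetical userspace model, not the kernel's VMA
offset manager: the bump allocator, the exact-start lookup, the fixed
capacity and the toy\_ names are assumptions.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   #define TOY_PAGE_SHIFT 12
   #define TOY_MAX_NODES 8

   /* Hypothetical object with a fake mmap offset (0 means none yet). */
   struct toy_obj {
       size_t size;
       uint64_t fake_offset;
   };

   /* Hypothetical per-device fake offset space, bump-allocated. */
   struct toy_offset_mgr {
       uint64_t next_page;            /* next free page in the fake space */
       struct toy_obj *nodes[TOY_MAX_NODES];
       size_t count;
   };

   /* Role of drm_gem_create_mmap_offset() in this sketch: reserve a
    * page-aligned range for the object, once. */
   int toy_create_mmap_offset(struct toy_offset_mgr *mgr, struct toy_obj *obj)
   {
       uint64_t pages = (obj->size + (1u << TOY_PAGE_SHIFT) - 1) >> TOY_PAGE_SHIFT;

       if (obj->fake_offset)
           return 0;                  /* already has an offset */
       if (mgr->count == TOY_MAX_NODES)
           return -1;
       if (mgr->next_page == 0)
           mgr->next_page = 1;        /* keep offset 0 unused */
       obj->fake_offset = mgr->next_page << TOY_PAGE_SHIFT;
       mgr->next_page += pages;
       mgr->nodes[mgr->count++] = obj;
       return 0;
   }

   /* What an mmap handler does first: map the offset back to an object. */
   struct toy_obj *toy_lookup_offset(struct toy_offset_mgr *mgr, uint64_t offset)
   {
       size_t i;

       for (i = 0; i < mgr->count; i++)
           if (mgr->nodes[i]->fake_offset == offset)
               return mgr->nodes[i];
       return NULL;
   }

Userspace would receive fake_offset through a driver-specific ioctl and
pass it unchanged as the offset argument of mmap() on the DRM file
descriptor.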

The GEM core provides a helper method drm_gem_mmap() to
handle object mapping. The method can be set directly as the mmap file
operation handler. It will look up the GEM object based on the offset
value and set the VMA operations to the :c:type:`struct drm_driver
<drm_driver>` gem_vm_ops field. Note that drm_gem_mmap() doesn't map memory to
userspace, but relies on the driver-provided fault handler to map pages
individually.

To use drm_gem_mmap(), drivers must fill the :c:type:`struct drm_driver
<drm_driver>` gem_vm_ops field with a pointer to VM operations.

The VM operations are described by a :c:type:`struct vm_operations_struct
<vm_operations_struct>`, made up of several fields, the more interesting
ones being:

.. code-block:: c

   struct vm_operations_struct {
           void (*open)(struct vm_area_struct *area);
           void (*close)(struct vm_area_struct *area);
           vm_fault_t (*fault)(struct vm_fault *vmf);
   };

The open and close operations must update the GEM object reference
count. Drivers can use the drm_gem_vm_open() and drm_gem_vm_close() helper
functions directly as open and close handlers.

The fault operation handler is responsible for mapping individual pages
to userspace when a page fault occurs. Depending on the memory
allocation scheme, drivers can allocate pages at fault time, or can
decide to allocate memory for the GEM object at the time the object is
created.

Drivers that want to map the GEM object upfront instead of handling page
faults can implement their own mmap file operation handler.

For platforms without an MMU, the GEM core provides a helper method
drm_gem_cma_get_unmapped_area(). The mmap() routines will call this to get a
proposed address for the mapping.

To use drm_gem_cma_get_unmapped_area(), drivers must fill the
:c:type:`struct file_operations <file_operations>` get_unmapped_area field with
a pointer to drm_gem_cma_get_unmapped_area().

More detailed information about get_unmapped_area can be found in
Documentation/nommu-mmap.txt.

Memory Coherency
----------------

When mapped to the device or used in a command buffer, backing pages for
an object are flushed to memory and marked write-combined so as to be
coherent with the GPU. Likewise, if the CPU accesses an object after the
GPU has finished rendering to the object, then the object must be made
coherent with the CPU's view of memory, usually involving GPU cache
flushing of various kinds. This core CPU<->GPU coherency management is
provided by a device-specific ioctl, which evaluates an object's current
domain and performs any necessary flushing or synchronization to put the
object into the desired coherency domain (note that the object may be
busy, i.e. an active render target; in that case, setting the domain
blocks the client and waits for rendering to complete before performing
any necessary flushing operations).
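
The domain-transition logic described above can be summarised as a
small state machine. The following is a hypothetical userspace model
rather than any driver's set-domain ioctl: the two-domain enum, the busy
flag and the counters recording flushes and waits are assumptions chosen
to make the behaviour observable.

.. code-block:: c

   #include <stdbool.h>

   /* Hypothetical coherency domains for the sketch. */
   enum toy_domain { TOY_DOMAIN_CPU, TOY_DOMAIN_GPU };

   struct toy_obj {
       enum toy_domain domain;
       bool busy;                     /* GPU still rendering to it */
       int flushes;                   /* flushes performed so far */
       int waits;                     /* times we blocked on the GPU */
   };

   /* Stand-in for waiting until outstanding rendering completes. */
   static void toy_wait_idle(struct toy_obj *obj)
   {
       obj->waits++;
       obj->busy = false;
   }

   /* Move @obj into @target, waiting and flushing only when needed. */
   void toy_set_domain(struct toy_obj *obj, enum toy_domain target)
   {
       if (obj->domain == target)
           return;                    /* already coherent: nothing to do */
       if (obj->busy)
           toy_wait_idle(obj);        /* block until rendering completes */
       obj->flushes++;                /* flush caches of the old domain */
       obj->domain = target;
   }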

Command Execution
-----------------

Perhaps the most important GEM function for GPU devices is providing a
command execution interface to clients. Client programs construct
command buffers containing references to previously allocated memory
objects, and then submit them to GEM. At that point, GEM takes care to
bind all the objects into the GTT, execute the buffer, and provide
necessary synchronization between clients accessing the same buffers.
This often involves evicting some objects from the GTT and re-binding
others (a fairly expensive operation), and providing relocation support
which hides fixed GTT offsets from clients. Clients must take care not
to submit command buffers that reference more objects than can fit in
the GTT; otherwise, GEM will reject them and no rendering will occur.
Similarly, if several objects in the buffer require fence registers to
be allocated for correct rendering (e.g. 2D blits on pre-965 chips),
care must be taken not to require more fence registers than are
available to the client. Such resource management should be abstracted
from the client in libdrm.
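
As an illustration of the kind of pre-flight check that client-side
code could run before submitting a batch, the sketch below verifies that
the referenced objects fit in the aperture and do not need more fence
registers than are available. It is hypothetical: the structure, the
function name and the sizes are assumptions, and it is not taken from
libdrm or any driver.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical buffer reference in a command submission. */
   struct toy_bo_ref {
       size_t size;                   /* bytes needed in the GTT */
       int needs_fence;               /* e.g. tiled 2D blit operand */
   };

   /* Return 0 if all objects can be bound at once, -1 otherwise. */
   int toy_check_batch(const struct toy_bo_ref *refs, size_t count,
                       size_t aperture_size, unsigned int fence_regs)
   {
       size_t total = 0, i;
       unsigned int fences = 0;

       for (i = 0; i < count; i++) {
           /* total never exceeds aperture_size, so this cannot wrap */
           if (refs[i].size > aperture_size - total)
               return -1;             /* would overflow the GTT */
           total += refs[i].size;
           if (refs[i].needs_fence && ++fences > fence_regs)
               return -1;             /* not enough fence registers */
       }
       return 0;
   }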

GEM Function Reference
----------------------

.. kernel-doc:: include/drm/drm_gem.h
   :internal:

.. kernel-doc:: drivers/gpu/drm/drm_gem.c
   :export:

GEM CMA Helper Functions Reference
----------------------------------

.. kernel-doc:: drivers/gpu/drm/drm_gem_cma_helper.c
   :doc: cma helpers

.. kernel-doc:: include/drm/drm_gem_cma_helper.h
   :internal:

.. kernel-doc:: drivers/gpu/drm/drm_gem_cma_helper.c
   :export:

VRAM Helper Function Reference
==============================

.. kernel-doc:: drivers/gpu/drm/drm_vram_helper_common.c
   :doc: overview

.. kernel-doc:: include/drm/drm_gem_vram_helper.h
   :internal:

GEM VRAM Helper Functions Reference
-----------------------------------

.. kernel-doc:: drivers/gpu/drm/drm_gem_vram_helper.c
   :doc: overview

.. kernel-doc:: include/drm/drm_gem_vram_helper.h
   :internal:

.. kernel-doc:: drivers/gpu/drm/drm_gem_vram_helper.c
   :export:

GEM TTM Helper Functions Reference
-----------------------------------

.. kernel-doc:: drivers/gpu/drm/drm_gem_ttm_helper.c
   :doc: overview

.. kernel-doc:: drivers/gpu/drm/drm_gem_ttm_helper.c
   :export:

VMA Offset Manager
==================

.. kernel-doc:: drivers/gpu/drm/drm_vma_manager.c
   :doc: vma offset manager

.. kernel-doc:: include/drm/drm_vma_manager.h
   :internal:

.. kernel-doc:: drivers/gpu/drm/drm_vma_manager.c
   :export:

.. _prime_buffer_sharing:

PRIME Buffer Sharing
====================

PRIME is the cross-device buffer sharing framework in DRM, originally
created for the OPTIMUS range of multi-GPU platforms. To userspace, PRIME
buffers are dma-buf based file descriptors.

Overview and Lifetime Rules
---------------------------

.. kernel-doc:: drivers/gpu/drm/drm_prime.c
   :doc: overview and lifetime rules

PRIME Helper Functions
----------------------

.. kernel-doc:: drivers/gpu/drm/drm_prime.c
   :doc: PRIME Helpers

PRIME Function References
-------------------------

.. kernel-doc:: include/drm/drm_prime.h
   :internal:

.. kernel-doc:: drivers/gpu/drm/drm_prime.c
   :export:

DRM MM Range Allocator
======================

Overview
--------

.. kernel-doc:: drivers/gpu/drm/drm_mm.c
   :doc: Overview

LRU Scan/Eviction Support
-------------------------

.. kernel-doc:: drivers/gpu/drm/drm_mm.c
   :doc: lru scan roster

DRM MM Range Allocator Function References
------------------------------------------

.. kernel-doc:: include/drm/drm_mm.h
   :internal:

.. kernel-doc:: drivers/gpu/drm/drm_mm.c
   :export:

DRM Cache Handling
==================

.. kernel-doc:: drivers/gpu/drm/drm_cache.c
   :export:

DRM Sync Objects
===========================

.. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
   :doc: Overview

.. kernel-doc:: include/drm/drm_syncobj.h
   :internal:

.. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
   :export:

GPU Scheduler
=============

Overview
--------

.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
   :doc: Overview

Scheduler Function References
-----------------------------

.. kernel-doc:: include/drm/gpu_scheduler.h
   :internal:

.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
   :export: