.. SPDX-License-Identifier: GPL-2.0

=====================
Physical Memory Model
=====================

Physical memory in a system may be addressed in different ways. The
simplest case is when the physical memory starts at address 0 and
spans a contiguous range up to the maximal address. This range may,
however, contain small holes that are not accessible to the CPU.
There may also be several contiguous ranges at completely distinct
addresses. On NUMA systems, in addition, different memory banks are
attached to different CPUs.

Linux abstracts this diversity using one of the two memory models:
FLATMEM and SPARSEMEM. Each architecture defines what
memory models it supports, what the default memory model is and
whether it is possible to manually override that default.

All the memory models track the status of physical page frames using
struct page arranged in one or more arrays.

Regardless of the selected memory model, there exists a one-to-one
mapping between the physical page frame number (PFN) and the
corresponding `struct page`.

Each memory model defines :c:func:`pfn_to_page` and :c:func:`page_to_pfn`
helpers that allow the conversion from PFN to `struct page` and vice
versa.

FLATMEM
=======

The simplest memory model is FLATMEM. This model is suitable for
non-NUMA systems with contiguous, or mostly contiguous, physical
memory.

In the FLATMEM memory model, there is a global `mem_map` array that
maps the entire physical memory. For most architectures, the holes
have entries in the `mem_map` array. The `struct page` objects
corresponding to the holes are never fully initialized.

To allocate the `mem_map` array, architecture-specific setup code should
call the :c:func:`free_area_init` function. However, the array is not
usable until the call to :c:func:`memblock_free_all`, which hands all the
memory to the page allocator.

An architecture may free parts of the `mem_map` array that do not cover the
actual physical pages. In such a case, the architecture-specific
:c:func:`pfn_valid` implementation should take the holes in the
`mem_map` into account.

With FLATMEM, the conversion between a PFN and the `struct page` is
straightforward: `PFN - ARCH_PFN_OFFSET` is an index into the
`mem_map` array.

The `ARCH_PFN_OFFSET` defines the first page frame number for
systems with physical memory starting at an address other than 0.
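
The FLATMEM conversion can be illustrated with a small user-space
sketch. It is not kernel code: the `struct page` stand-in, the
`ARCH_PFN_OFFSET` value, the array size and the `flat_*` helper names
are assumptions made only for this example.

.. code-block:: c

    #include <stddef.h>

    /* Stand-in for the kernel's struct page; the real one is much larger. */
    struct page {
        unsigned long flags;
    };

    /* Hypothetical layout: memory starts at PFN 0x80000 and has 16 frames. */
    #define ARCH_PFN_OFFSET 0x80000UL
    #define NR_FRAMES       16UL

    /* The single global array that describes every page frame. */
    static struct page mem_map[NR_FRAMES];

    /* PFN -> struct page: subtract the offset and index into mem_map. */
    static struct page *flat_pfn_to_page(unsigned long pfn)
    {
        return mem_map + (pfn - ARCH_PFN_OFFSET);
    }

    /* struct page -> PFN: the inverse pointer arithmetic. */
    static unsigned long flat_page_to_pfn(const struct page *page)
    {
        return (unsigned long)(page - mem_map) + ARCH_PFN_OFFSET;
    }

Because both directions are plain pointer arithmetic on one array, the
conversion costs a single addition or subtraction. The sketch does no
range checking; in the kernel that role belongs to :c:func:`pfn_valid`.
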

SPARSEMEM
=========

SPARSEMEM is the most versatile memory model available in Linux and it
is the only memory model that supports several advanced features such
as hot-plug and hot-remove of the physical memory, alternative memory
maps for non-volatile memory devices and deferred initialization of
the memory map for larger systems.

The SPARSEMEM model presents the physical memory as a collection of
sections. A section is represented with struct mem_section
that contains `section_mem_map` that is, logically, a pointer to an
array of struct pages. However, it is stored together with some extra
information that aids section management. The section size and the
maximal number of sections are specified using the `SECTION_SIZE_BITS`
and `MAX_PHYSMEM_BITS` constants defined by each architecture that
supports SPARSEMEM. While `MAX_PHYSMEM_BITS` is the actual width of a
physical address that an architecture supports, the
`SECTION_SIZE_BITS` is an arbitrary value.

The maximal number of sections is denoted `NR_MEM_SECTIONS` and
defined as

.. math::

   NR\_MEM\_SECTIONS = 2 ^ {(MAX\_PHYSMEM\_BITS - SECTION\_SIZE\_BITS)}
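
As a worked example of this formula, the following user-space sketch
plugs in two hypothetical values. The numbers 27 and 46 and the helper
function names are assumptions chosen for illustration; the real
constants are defined by each architecture.

.. code-block:: c

    /* Hypothetical architecture constants, chosen only for illustration. */
    #define SECTION_SIZE_BITS 27    /* one section spans 2^27 bytes */
    #define MAX_PHYSMEM_BITS  46    /* physical addresses are 46 bits wide */

    #define SECTIONS_SHIFT    (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
    #define NR_MEM_SECTIONS   (1UL << SECTIONS_SHIFT)

    /* Size of a single section in bytes: 2^27 = 128 MiB. */
    static unsigned long section_size_bytes(void)
    {
        return 1UL << SECTION_SIZE_BITS;
    }

    /* Maximal number of sections: 2^(46 - 27) = 2^19 = 524288. */
    static unsigned long nr_mem_sections(void)
    {
        return NR_MEM_SECTIONS;
    }

With these values a larger `SECTION_SIZE_BITS` would give fewer but
bigger sections, which is the trade-off behind choosing that constant.
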

The `mem_section` objects are arranged in a two-dimensional array
called `mem_sections`. The size and placement of this array depend
on `CONFIG_SPARSEMEM_EXTREME` and the maximal possible number of
sections:

* When `CONFIG_SPARSEMEM_EXTREME` is disabled, the `mem_sections`
  array is static and has `NR_MEM_SECTIONS` rows. Each row holds a
  single `mem_section` object.
* When `CONFIG_SPARSEMEM_EXTREME` is enabled, the `mem_sections`
  array is dynamically allocated. Each row contains `PAGE_SIZE` worth of
  `mem_section` objects and the number of rows is calculated to fit
  all the memory sections.
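
The row arithmetic for the `CONFIG_SPARSEMEM_EXTREME` case can be
sketched in user space as follows. The macro names are modeled on the
kernel's, but the `PAGE_SIZE` and `NR_MEM_SECTIONS` values, the
two-field `struct mem_section` stand-in and the helper function names
are assumptions made for this example.

.. code-block:: c

    #include <stddef.h>

    #define PAGE_SIZE        4096UL         /* assumed page size */
    #define NR_MEM_SECTIONS  (1UL << 19)    /* assumed number of sections */

    /* Stand-in for the kernel structure; only its size matters here. */
    struct mem_section {
        unsigned long section_mem_map;
        void *usage;
    };

    /* One row holds as many mem_section objects as fit in a page. */
    #define SECTIONS_PER_ROOT (PAGE_SIZE / sizeof(struct mem_section))

    /* Number of rows, rounded up so that every section has a slot. */
    #define NR_SECTION_ROOTS \
        ((NR_MEM_SECTIONS + SECTIONS_PER_ROOT - 1) / SECTIONS_PER_ROOT)

    /* Row that holds the given section number. */
    static unsigned long section_nr_to_root(unsigned long nr)
    {
        return nr / SECTIONS_PER_ROOT;
    }

    /* Position of the given section number inside its row. */
    static unsigned long section_nr_in_root(unsigned long nr)
    {
        return nr % SECTIONS_PER_ROOT;
    }

Splitting the array this way means that rows for absent sections need
not be allocated, which is the point of the dynamic variant.
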

The architecture setup code should call sparse_init() to
initialize the memory sections and the memory maps.

With SPARSEMEM there are two possible ways to convert a PFN to the
corresponding `struct page`: "classic sparse" and "sparse
vmemmap". The selection is made at build time and it is determined by
the value of `CONFIG_SPARSEMEM_VMEMMAP`.

The classic sparse encodes the section number of a page in page->flags
and uses the high bits of a PFN to access the section that maps that page
frame. Inside a section, the PFN is the index to the array of pages.
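
A simplified user-space model of the classic sparse lookup is shown
below. It makes several assumptions that differ from the kernel: the
shift values are hypothetical, `flags` holds nothing but the section
number, the per-section map is a plain pointer indexed by the low PFN
bits rather than an encoded `section_mem_map`, and the `sparse_*` helper
names are invented for this sketch.

.. code-block:: c

    #include <stddef.h>
    #include <stdlib.h>

    struct page {
        unsigned long flags;    /* model: holds only the section number */
    };

    #define PAGE_SHIFT        12    /* assumed */
    #define SECTION_SIZE_BITS 27    /* assumed */
    #define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
    #define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)
    #define NR_SECTIONS       4UL   /* tiny table for the example */

    struct mem_section {
        struct page *map;       /* NULL when the section is a hole */
    };

    static struct mem_section sections[NR_SECTIONS];

    /* The high bits of a PFN select the section. */
    static unsigned long pfn_to_section_nr(unsigned long pfn)
    {
        return pfn >> PFN_SECTION_SHIFT;
    }

    /* Allocate the memory map of one section and tag its pages. */
    static int sparse_add_section(unsigned long nr)
    {
        struct page *map = calloc(PAGES_PER_SECTION, sizeof(*map));

        if (!map)
            return -1;
        for (unsigned long i = 0; i < PAGES_PER_SECTION; i++)
            map[i].flags = nr;
        sections[nr].map = map;
        return 0;
    }

    /* PFN -> struct page, or NULL if no section maps that frame. */
    static struct page *sparse_pfn_to_page(unsigned long pfn)
    {
        unsigned long nr = pfn_to_section_nr(pfn);

        if (nr >= NR_SECTIONS || !sections[nr].map)
            return NULL;
        return sections[nr].map + (pfn & (PAGES_PER_SECTION - 1));
    }

    /* struct page -> PFN, using the section number stored in flags. */
    static unsigned long sparse_page_to_pfn(const struct page *page)
    {
        unsigned long nr = page->flags;

        return (nr << PFN_SECTION_SHIFT) +
               (unsigned long)(page - sections[nr].map);
    }

Only sections that were added consume memory for their map, so a PFN
that falls into a hole simply has no `struct page` in this model. The
sketch never frees the maps it allocates.
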

The sparse vmemmap uses a virtually mapped memory map to optimize
pfn_to_page and page_to_pfn operations. There is a global `struct
page *vmemmap` pointer that points to a virtually contiguous array of
`struct page` objects. A PFN is an index to that array and the
offset of the `struct page` from `vmemmap` is the PFN of that
page.
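
The vmemmap conversion reduces to pointer arithmetic on a single base
pointer, as the following user-space sketch shows. An ordinary static
array stands in for the virtually mapped range; the array, its size and
the `vmemmap_*` helper names are assumptions made for this example.

.. code-block:: c

    #include <stddef.h>

    struct page {
        unsigned long flags;
    };

    #define NR_FRAMES 1024UL    /* assumed size of the modeled range */

    /* Stand-in for the virtual range that an architecture would reserve. */
    static struct page vmemmap_backing[NR_FRAMES];

    /* Base of the virtually contiguous array of struct page objects. */
    static struct page *const vmemmap = vmemmap_backing;

    /* PFN -> struct page: the PFN is the index into the array. */
    static struct page *vmemmap_pfn_to_page(unsigned long pfn)
    {
        return vmemmap + pfn;
    }

    /* struct page -> PFN: the offset from the base is the PFN. */
    static unsigned long vmemmap_page_to_pfn(const struct page *page)
    {
        return (unsigned long)(page - vmemmap);
    }

Compared with the classic sparse lookup, no section table is consulted,
which is where the optimization comes from.
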

To use vmemmap, an architecture has to reserve a range of virtual
addresses that will map the physical pages containing the memory
map and make sure that `vmemmap` points to that range. In addition,
the architecture should implement the :c:func:`vmemmap_populate` method
that will allocate the physical memory and create page tables for the
virtual memory map. If an architecture does not have any special
requirements for the vmemmap mappings, it can use the default
:c:func:`vmemmap_populate_basepages` provided by the generic memory
management.

The virtually mapped memory map allows storing `struct page` objects
for persistent memory devices in pre-allocated storage on those
devices. This storage is represented with struct vmem_altmap
that is eventually passed to vmemmap_populate() through a long chain
of function calls. The vmemmap_populate() implementation may use the
`vmem_altmap` along with the :c:func:`vmemmap_alloc_block_buf` helper to
allocate the memory map on the persistent memory device.

ZONE_DEVICE
===========

The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer
`struct page` `mem_map` services for device driver identified physical
address ranges. The "device" aspect of `ZONE_DEVICE` relates to the fact
that the page objects for these address ranges are never marked online,
and that a reference must be taken against the device, not just the page,
to keep the memory pinned for active use. `ZONE_DEVICE`, via
:c:func:`devm_memremap_pages`, performs just enough memory hotplug to
turn on :c:func:`pfn_to_page`, :c:func:`page_to_pfn`, and
:c:func:`get_user_pages` service for the given range of pfns. Since the
page reference count never drops below 1, the page is never tracked as
free memory and the page's `struct list_head lru` space is repurposed
for back referencing to the host device / driver that mapped the memory.

While `SPARSEMEM` presents memory as a collection of sections,
optionally collected into memory blocks, `ZONE_DEVICE` users need a
finer granularity for populating the `mem_map`. Given that
`ZONE_DEVICE` memory is never marked online, its memory ranges are
consequently never exposed through the sysfs memory
hotplug API on memory block boundaries. The implementation relies on
this lack of user-API constraint to allow sub-section sized memory
ranges to be specified to :c:func:`arch_add_memory`, the top-half of
memory hotplug. Sub-section support allows for 2MB as the cross-arch
common alignment granularity for :c:func:`devm_memremap_pages`.
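
The 2MB granularity can be expressed as a simple alignment check. The
sketch below is a user-space illustration under stated assumptions: the
constant mirrors the 2MB figure above, and the helper name is invented
for this example rather than taken from the kernel.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* 2^21 bytes = 2MB, the sub-section granularity described above. */
    #define SUBSECTION_SHIFT 21
    #define SUBSECTION_SIZE  (UINT64_C(1) << SUBSECTION_SHIFT)

    /*
     * Return true when a physical range starts on a 2MB boundary and
     * its size is a non-zero multiple of 2MB.
     */
    static bool range_is_subsection_aligned(uint64_t start, uint64_t size)
    {
        const uint64_t mask = SUBSECTION_SIZE - 1;

        return size != 0 && (start & mask) == 0 && (size & mask) == 0;
    }

A range such as 4MB starting at the 2MB boundary passes this check,
while a range that starts at 1MB or that is only 4KB long does not.
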

The users of `ZONE_DEVICE` are:

* pmem: Map platform persistent memory to be used as a direct-I/O target
  via DAX mappings.
* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
  event callbacks to allow a device-driver to coordinate memory management
  events related to device-memory, typically GPU memory. See
  Documentation/mm/hmm.rst.
* p2pdma: Create `struct page` objects to allow peer devices in a
  PCI/-E topology to coordinate direct-DMA operations between themselves,
  i.e. bypass host memory.