Change names for interleave ways macros to clearly indicate which
variable is encoded and which is the actual ways value.
ways == interleave ways
eiw == encoded interleave ways
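For reference, the mapping these names distinguish follows the spec's
interleave-ways encoding (1 << eiw for eiw 0..4, 3 << (eiw - 8) for eiw
8..10). A minimal userspace sketch; the helper name here is illustrative,
not necessarily the kernel's:

    #include <stdio.h>

    /* Decode an encoded interleave ways (eiw) value to the actual ways. */
    static int eiw_to_ways(unsigned int eiw, unsigned int *ways)
    {
            if (eiw <= 4)
                    *ways = 1u << eiw;          /* 1, 2, 4, 8, 16 */
            else if (eiw >= 8 && eiw <= 10)
                    *ways = 3u << (eiw - 8);    /* 3, 6, 12 */
            else
                    return -1;                  /* reserved encoding */
            return 0;
    }

    int main(void)
    {
            unsigned int eiw, ways;

            for (eiw = 0; eiw <= 10; eiw++)
                    if (!eiw_to_ways(eiw, &ways))
                            printf("eiw=%u -> ways=%u\n", eiw, ways);
            return 0;
    }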
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167027516228.3124679.11265039496968588580.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Change names for granularity macros to clearly indicate which
variable is encoded and which is the actual granularity.
granularity == interleave granularity
eig == encoded interleave granularity
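The granularity encoding is similar: granularity = 256 << eig. A small
sketch under the same caveats:

    #include <stdio.h>

    /* Decode an encoded interleave granularity (eig) to bytes. */
    static unsigned int eig_to_granularity(unsigned int eig)
    {
            return 256u << eig;
    }

    int main(void)
    {
            unsigned int eig;

            for (eig = 0; eig <= 6; eig++)    /* 256 bytes .. 16 KB */
                    printf("eig=%u -> granularity=%u\n", eig,
                           eig_to_granularity(eig));
            return 0;
    }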
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167027493237.3124429.8948852388671827664.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pick up support for "XOR" interleave math when parsing ACPI CFMWS window
structures. Fix up conflicts with the RCH emulation already pending in
cxl/next.
tl;dr: Clean up an unnecessary export and enable cxl_test.
An RCD (Restricted CXL Device), in contrast to a typical CXL device in
a VH topology, obtains its component registers from the bottom half of
the associated CXL host bridge RCRB (Root Complex Register Block). In
turn this means that cxl_rcrb_to_component() needs to be called from
devm_cxl_add_endpoint().
Presently devm_cxl_add_endpoint() is part of the CXL core, but the only
user is the CXL mem module. Move it from cxl_core to cxl_mem to not only
get rid of an unnecessary export, but to also enable its call out to
cxl_rcrb_to_component(), in a subsequent patch, to be mocked by
cxl_test. Recall that cxl_test can only mock exported symbols, and since
cxl_rcrb_to_component() is itself inside the core, all callers must be
outside of cxl_core to allow cxl_test to mock it.
Reviewed-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993045072.1882361.13944923741276843683.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When the CFMWS is using XOR math, parse the corresponding
CXIMS structure and store the xormaps in the root decoder
structure. Use the xormaps in a new lookup, cxl_hb_xor(),
to find a targets entry in the host bridge interleave
target list.
Defined in CXL Specification 3.0, Section 9.17.1
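To illustrate the general shape of the xormap math (a simplified model,
not the kernel's cxl_hb_xor() itself): each map entry selects a set of
HPA bits whose parity yields one bit of the host-bridge index.

    #include <stdio.h>
    #include <stdint.h>

    /* Bit i of the host-bridge index is the XOR-reduction (parity) of
     * the HPA bits selected by xormaps[i]. */
    static unsigned int hb_index(uint64_t hpa, const uint64_t *xormaps,
                                 int nr_maps)
    {
            unsigned int index = 0;
            int i;

            for (i = 0; i < nr_maps; i++)
                    index |= (unsigned int)__builtin_parityll(hpa & xormaps[i]) << i;
            return index;
    }

    int main(void)
    {
            /* example maps for a 4-way host-bridge interleave */
            const uint64_t maps[] = { 0x100, 0x200 };
            uint64_t hpa;

            for (hpa = 0; hpa < 0x800; hpa += 0x100)
                    printf("hpa=%#llx -> hb=%u\n",
                           (unsigned long long)hpa, hb_index(hpa, maps, 2));
            return 0;
    }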
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/5794813acdf7b67cfba3609c6aaff46932fa38d0.1669847017.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add nominal error handling that tears down CXL.mem in response to error
notifications that imply a device reset. Given some CXL.mem may be
operating as System RAM, there is a high likelihood that these error
events are fatal. However, if the system survives the notification the
expectation is that the driver behavior is equivalent to a hot-unplug
and re-plug of an endpoint.
Note that this does not change the mask values from the default. That
awaits CXL _OSC support to determine whether platform firmware is in
control of the mask registers.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166974413966.1608150.15522782911404473932.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The RAS Capability Structure has some ancillary information that may be
relevant with respect to AER events: the link and protocol error status
registers. Map the RAS Capability Registers in support of defining a
'struct pci_error_handlers' instance for the cxl_pci driver.
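The shape of that instance, as a sketch (handler names and bodies are
illustrative, not the final driver code):

    #include <linux/pci.h>

    static pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
                                               pci_channel_state_t state)
    {
            /* consult the mapped RAS registers, tear down CXL.mem */
            return PCI_ERS_RESULT_NEED_RESET;
    }

    static const struct pci_error_handlers cxl_error_handlers = {
            .error_detected = cxl_error_detected,
            /* .slot_reset / .resume would re-attach, like a re-plug */
    };

    /* referenced from the pci_driver definition via .err_handler */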
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166974412803.1608150.7096566580400947001.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The RAS Capability Structure is a CXL Component register capability
block. Unlike the HDM Decoder Capability, it will be referenced by the
cxl_pci driver in response to PCIe AER events. Due to this it is no
longer the case that cxl_map_component_regs() can assume that it should
map all component registers. Plumb a bitmask of capability ids to map
through cxl_map_component_regs().
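A simplified model of the new calling convention (capability ID values
per the CXL component register capability assignments; the loop is
illustrative, not the kernel routine):

    #include <stdio.h>

    #define BIT(n)            (1u << (n))
    #define CAP_ID_RAS        0x2   /* CXL RAS Capability */
    #define CAP_ID_HDM        0x5   /* CXL HDM Decoder Capability */

    /* Map only the discovered capabilities that the caller asked for. */
    static void map_component_regs(unsigned int found_ids, unsigned int map_mask)
    {
            unsigned int id;

            for (id = 0; id < 16; id++) {
                    if (!(found_ids & BIT(id)))
                            continue;   /* not present in hardware */
                    if (!(map_mask & BIT(id)))
                            continue;   /* caller not interested */
                    printf("mapping capability id %#x\n", id);
            }
    }

    int main(void)
    {
            /* cxl_pci-style caller that only needs the RAS block */
            map_component_regs(BIT(CAP_ID_RAS) | BIT(CAP_ID_HDM),
                               BIT(CAP_ID_RAS));
            return 0;
    }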
For symmetry cxl_probe_device_regs() is updated to populate @id in
'struct cxl_reg_map' even though cxl_map_device_regs() does not have a
need to map a subset of the device registers per caller.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166974412214.1608150.11487843455070795378.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A downstream port must be connected to a component register block.
For restricted hosts the base address is determined from the RCRB. The
RCRB is provided by the host's CEDT CHBS entry. Rework the CEDT parser to
get the RCRB and add code to extract the component register block from
it.
RCRB's BAR[0..1] point to the component block containing CXL subsystem
component registers. MEMBAR extraction follows the PCI base spec here,
esp. 64-bit extraction and memory range alignment (6.0, 7.5.1.2.1). The
RCRB base address is cached in the cxl_dport per-host bridge so that the
upstream port component registers can be retrieved later by an RCD
(RCIEP) associated with the host bridge.
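A userspace sketch of that extraction, assuming standard PCI BAR
semantics (mask off the low indicator bits, fold in the upper dword for
64-bit BARs):

    #include <stdio.h>
    #include <stdint.h>

    #define PCI_BASE_ADDRESS_SPACE_IO    0x01
    #define PCI_BASE_ADDRESS_MEM_TYPE_64 0x04
    #define PCI_BASE_ADDRESS_MEM_MASK    (~0x0fULL)

    /* Combine RCRB BAR[0..1] into the component register block base. */
    static uint64_t component_reg_phys(uint32_t bar0, uint32_t bar1)
    {
            uint64_t base;

            if (bar0 & PCI_BASE_ADDRESS_SPACE_IO)
                    return 0;       /* not a memory BAR */

            base = bar0 & PCI_BASE_ADDRESS_MEM_MASK;
            if (bar0 & PCI_BASE_ADDRESS_MEM_TYPE_64)
                    base |= (uint64_t)bar1 << 32;
            return base;
    }

    int main(void)
    {
            /* 64-bit memory BAR example: low dword 0xe004, high dword 0x1 */
            printf("base=%#llx\n",
                   (unsigned long long)component_reg_phys(0xe004, 0x1));
            return 0;
    }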
Note: Right now the component register block is used for HDM decoder
capability only which is optional for RCDs. If unsupported by the RCD,
the HDM init will fail. It is future work to bypass it in this case.
Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/Y4dsGZ24aJlxSfI1@rric.localdomain
[djbw: introduce devm_cxl_add_rch_dport()]
Link: https://lore.kernel.org/r/166993044524.1882361.2539922887413208807.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A "DPA invalidation event" is any scenario where the contents of a DPA
(Device Physical Address) is modified in a way that is incoherent with
CPU caches, or if the HPA (Host Physical Address) to DPA association
changes due to a remapping event.
PMEM security events like Unlock and Passphrase Secure Erase already
manage caches through LIBNVDIMM, so that leaves HPA to DPA remap events
that need cache management by the CXL core. Those only happen when the
boot time CXL configuration has changed. That event occurs when
userspace attaches an endpoint decoder to a region configuration, and
that region is subsequently activated.
The implication of not invalidating caches between remap events is that
reads from the region at different points in time may return different
results due to stale cached data from the previous HPA to DPA mapping.
Without a guarantee that the region contents after cxl_region_probe()
are written before being read (a layering-violation assumption that
cxl_region_probe() can not make) the CXL subsystem needs to ensure that
reads that precede writes see consistent results.
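The resulting flow amounts to a guard plus a flush before region
activation; roughly, assuming the memregion API this work relies on:

    /* sketch: refuse to activate a region if writeback invalidation is
     * unavailable, otherwise flush before the new HPA->DPA map goes live */
    if (!cpu_cache_has_invalidate_memregion())
            return -ENXIO;
    cpu_cache_invalidate_memregion(IORES_DESC_CXL);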
A CONFIG_CXL_REGION_INVALIDATION_TEST option is added to support debug
and unit testing of the CXL implementation in QEMU or other environments
where cpu_cache_has_invalidate_memregion() returns false. This may prove
too restrictive for QEMU where the HDM decoders are emulated, but in
that case the CXL subsystem needs some new mechanism / indication that
the HDM decoder is emulated and not a passthrough of real hardware.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166993222098.1995348.16604163596374520890.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that cxl_nvdimm and cxl_pmem_region objects are torn down
synchronously with the removal of either the bridge or an endpoint, the
cxl_pmem_wq infrastructure can be jettisoned.
Tested-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993042335.1882361.17022872468068436287.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The three objects 'struct cxl_nvdimm_bridge', 'struct cxl_nvdimm', and
'struct cxl_pmem_region' manage CXL persistent memory resources. The
bridge represents base platform resources, the nvdimm represents one or
more endpoints, and the region is a collection of nvdimms that
contribute to an assembled address range.
Their relationship is such that a region is torn down if any component
endpoints are removed. All regions and endpoints are torn down if the
foundational bridge device goes down.
A workqueue was deployed to manage these interdependencies, but it is
difficult to reason about, and fragile. A recent attempt to take the CXL
root device lock in the cxl_mem driver was reported by lockdep as
colliding with the flush_work() in the cxl_pmem flows.
Instead of the workqueue, arrange for all pmem/nvdimm devices to be torn
down immediately and hierarchically. A similar change is made to both
the 'cxl_nvdimm' and 'cxl_pmem_region' objects. For bisect-ability both
changes are made in the same patch which unfortunately makes the patch
bigger than desired.
Arrange for cxl_memdev and cxl_region to register a cxl_nvdimm and
cxl_pmem_region as a devres release action of the bridge device.
Additionally, include a devres release action of the cxl_memdev or
cxl_region device that triggers the bridge's release action if an endpoint
exits before the bridge, i.e. either unplugging the bridge or unplugging
an endpoint results in the same cleanup actions.
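The pattern is a plain devres action on the bridge; a minimal sketch
(helper names are illustrative):

    /* tear down the nvdimm proxy when the bridge device is released */
    static void unregister_nvdimm(void *cxl_nvd)
    {
            /* ...delete the cxl_nvdimm's LIBNVDIMM object... */
    }

    static int bind_to_bridge(struct device *bridge, void *cxl_nvd)
    {
            /* runs unregister_nvdimm(cxl_nvd) at bridge release time */
            return devm_add_action_or_reset(bridge, unregister_nvdimm, cxl_nvd);
    }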
To keep the patch smaller the cleanup of the now defunct workqueue
infrastructure is saved for a follow-on patch.
Tested-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993041773.1882361.16444301376147207609.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that a cxl_nvdimm object can only experience ->remove() via an
unregistration event (because the cxl_nvdimm bind attributes are
suppressed), additional cleanups are possible.
It is already the case that the removal of a cxl_memdev object triggers
->remove() on any associated region. With that mechanism in place there
is no need for the cxl_nvdimm removal to trigger the same. Just rely on
cxl_region_detach() to tear down the whole cxl_pmem_region.
Tested-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993041215.1882361.6321535567798911286.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When reviewing the CFMWS parsing code that deals with the HDM decoders,
I noticed a couple of magic numbers. This commit replaces them with
constants defined by the CXL 3.0 specification.
v2:
- Change references to CXL 3.0 specification (David)
- CXL_DECODER_MAX_GRANULARITY_ORDER -> CXL_DECODER_MAX_ENCODED_IG (Dan)
Signed-off-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20220829220249.243888-1-a.manzanares@samsung.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The function devm_cxl_iomap_block() is only used in the core code. It
is declared in two header files: drivers/cxl/core/core.h and
drivers/cxl/cxl.h. Remove the unused declaration in drivers/cxl/cxl.h.
Fix a build error in regs.c, found by the kernel test robot, by
including "core.h" there.
Signed-off-by: Robert Richter <rrichter@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20221018132341.76259-2-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When programming port decode targets, the algorithm wants to ensure that
two devices are compatible to be programmed as peers beneath a given
port. A compatible peer is a target that shares the same dport, and
where that target's interleave position also routes it to the same
dport. Compatibility is determined by the device's interleave position
being >= the distance. For example, if a given dport can only map every
Nth position, then positions less than N away from the last target
programmed are incompatible.
The @distance for the host-bridge's cxl_port in a simple dual-ported
host-bridge configuration with 2 direct-attached devices is 1, i.e. an
x2 region divided by 2 dports to reach 2 region targets.
An x4 region under an x2 host-bridge would need 2 intervening switches
where the @distance at the host bridge level is 2 (x4 region divided by
2 switches to reach 4 devices).
However, the distance between peers underneath a single-ported
host-bridge is always zero because there is no limit to the number of
devices that can be mapped. In other words, there are no decoders to
program in a passthrough configuration; all descendants are mapped, and
distance only starts to matter for the intervening descendant ports of
the passthrough port.
Add tracking for the number of dports mapped to a port, and use that to
detect the passthrough case for calculating @distance.
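A simplified model of the compatibility rule (not the kernel's
check_last_peer(), but the same idea): once a position reuses a dport,
the peer at `pos - distance` must be on that same dport.

    #include <stdio.h>

    /* distance == 0 models the single-ported passthrough case where any
     * assignment is valid. */
    static int positions_valid(const int *pos_to_dport, int n, int distance)
    {
            int pos, prev;

            if (distance == 0)
                    return 1;
            for (pos = 0; pos < n; pos++) {
                    for (prev = 0; prev < pos; prev++)
                            if (pos_to_dport[prev] == pos_to_dport[pos])
                                    break;
                    if (prev == pos)
                            continue;   /* first use of this dport */
                    if (pos < distance ||
                        pos_to_dport[pos - distance] != pos_to_dport[pos])
                            return 0;
            }
            return 1;
    }

    int main(void)
    {
            /* x4 region, distance 2: positions 0,2 -> dport0; 1,3 -> dport1 */
            const int good[] = { 0, 1, 0, 1 };
            const int bad[]  = { 0, 0, 1, 1 };

            printf("good: %d, bad: %d\n",
                   positions_valid(good, 4, 2), positions_valid(bad, 4, 2));
            return 0;
    }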
Cc: <stable@vger.kernel.org>
Reported-by: Bobo WL <lmw.bobo@gmail.com>
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com
Fixes: 27b3f8d13830 ("cxl/region: Program target lists")
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/166752185440.947915.6617495912508299445.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When a cxl_nvdimm object goes through a ->remove() event (device
physically removed, nvdimm-bridge disabled, or nvdimm device disabled),
then any associated regions must also be disabled. As highlighted by the
cxl-create-region.sh test [1], a single device may host multiple
regions, but the driver was only tracking one region at a time. This
leads to a situation where only the last enabled region per nvdimm
device is cleaned up properly. Other regions are leaked, and this also
causes cxl_memdev reference leaks.
Fix the tracking by allowing cxl_nvdimm objects to track multiple region
associations.
Cc: <stable@vger.kernel.org>
Link: https://github.com/pmem/ndctl/blob/main/test/cxl-create-region.sh [1]
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Fixes: 04ad63f086d1 ("cxl/region: Introduce cxl_pmem_region objects")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/166752183647.947915.2045230911503793901.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Sphinx reported undescribed parameters in cxl_region_params struct:
./drivers/cxl/cxl.h:376: warning: Function parameter or member 'targets' not described in 'cxl_region_params'
./drivers/cxl/cxl.h:376: warning: Function parameter or member 'nr_targets' not described in 'cxl_region_params'
Describe these members.
Fixes: b9686e8c8e39 ("cxl/region: Enable the assignment of endpoint decoders to regions")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220804075448.98241-3-bagasdotme@gmail.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The kernel enforces that region granularity is >= the top-level
interleave-granularity for the given CXL window. However, when the CXL
window interleave is x1, i.e. non-interleaved at the host bridge level,
the specified granularity does not matter. Override the window's
specified granularity to the CXL minimum so that any valid region
granularity is >= the root granularity.
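The override reduces to a one-liner; a sketch assuming
CXL_DECODER_MIN_GRANULARITY is 256:

    #define CXL_DECODER_MIN_GRANULARITY 256

    /* An x1 window imposes no real granularity constraint, so report the
     * minimum and let any valid region granularity satisfy >= root. */
    static unsigned int root_ig(unsigned int window_ways, unsigned int window_ig)
    {
            return window_ways == 1 ? CXL_DECODER_MIN_GRANULARITY : window_ig;
    }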
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/165853776917.2430596.16823264262010844458.stgit@dwillia2-xfh.jf.intel.com
[djbw: add CXL_DECODER_MIN_GRANULARITY per vishal]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The "ways" variable comes from the user. The ways_to_cxl() function
has an upper bound but it doesn't check for negatives. Make
the "ways" variable an unsigned int to fix this bug.
Fixes: 80d10a6cee05 ("cxl/region: Add interleave geometry attributes")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/Yueo3NV2hFCXx1iV@kili
[djbw: fixup interleave_ways_store() to only accept unsigned input]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The LIBNVDIMM subsystem is a platform agnostic representation of system
NVDIMM / persistent memory resources. To date, the CXL subsystem's
interaction with LIBNVDIMM has been to register an nvdimm-bridge device
and cxl_nvdimm objects to proxy CXL capabilities into existing LIBNVDIMM
subsystem mechanics.
With regions the approach is the same. Create a new cxl_pmem_region
object to proxy CXL region details into a LIBNVDIMM definition. With
this enabling LIBNVDIMM can partition CXL persistent memory regions with
legacy namespace labels. A follow-on patch will add CXL region label and
CXL namespace label support to persist region configurations across
driver reload / system-reset events.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784340111.1758207.3036498385188290968.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Be careful to only disable cxl_pmem objects related to a given
cxl_nvdimm_bridge. Otherwise, offline_nvdimm_bus() reaches across CXL
domains and disables more than is expected.
Fixes: 21083f51521f ("cxl/pmem: Register 'pmem' / cxl_nvdimm devices")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784339569.1758207.1557084545278004577.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The CXL region driver is responsible for routing fully formed CXL
regions to libnvdimm for persistent memory regions, to device-dax for
volatile memory regions, or for acting as an enumeration placeholder if
the region was set up and configuration-locked by platform firmware.
In the platform-firmware-setup case the expectation is that the region is
already accounted for in the system memory map, i.e. already enabled as
"System RAM".
For now, just attach to CXL regions in the CXL_CONFIG_COMMIT state, and
take no further action.
Given this driver is just a small / simple router, include it in the
core rather than its own module.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-18-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
After all the soft validation of the region has completed, convey the
region configuration to hardware while being careful to commit decoders
in specification mandated order. In addition to programming the endpoint
decoder base-address, interleave ways and granularity, the switch
decoder target lists are also established.
While the kernel can enforce spec-mandated commit order, it can not
enforce spec-mandated reset order. For example, the kernel can't stop
someone from removing an endpoint device that is occupying decoderN in a
switch decoder where decoderN+1 is also committed. To reset decoderN,
decoderN+1 must be torn down first. That "tear down the world"
implementation is saved for a follow-on patch.
Callback operations are provided for the 'commit' and 'reset'
operations. While those callbacks may prove useful for CXL accelerators
(Type-2 devices with memory) the primary motivation is to enable a
simple way for cxl_test to intercept those operations.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784338418.1758207.14659830845389904356.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Once the region's interleave geometry (ways, granularity, size) is
established and all the endpoint decoder targets are assigned, the next
phase is to program all the intermediate decoders. Specifically, each
CXL switch in the path between the endpoint and its CXL host-bridge
(including the logical switch internal to the host-bridge) needs to have
its decoders programmed and the target list order assigned.
The difficulty in this implementation lies in determining which endpoint
decoder ordering combinations are valid. Consider the cxl_test case of 2
host bridges, each of those host-bridges attached to 2 switches, and
each of those switches attached to 2 endpoints for a potential 8-way
interleave. The x2 interleave at the host-bridge level requires that all
even numbered endpoint decoder positions be located on the "left" hand
side of the topology tree, and the odd numbered positions on the other.
The endpoints that are peers on the same switch need to have a position
that can be routed with a dedicated address bit per-endpoint. See
check_last_peer() for the details.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784337827.1758207.132121746122685208.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL regions (interleave sets) are made up of a set of memory devices
where each device maps a portion of the interleave with one of its
decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure).
As endpoint decoders are identified by a provisioning tool they can be
added to a region provided the region interleave properties are set
(ways, granularity, HPA) and DPA has been assigned to the decoder.
The attach event triggers several validation checks, for example:
- is the DPA sized appropriately for the region
- is the decoder reachable via the host-bridges identified by the
region's root decoder
- is the device already active in a different region position slot
- are there already regions with a higher HPA active on a given port
(per CXL 2.0 8.2.5.12.20 Committing Decoder Programming)
...and the attach event affords an opportunity to collect data and
resources relevant to later programming the target lists in switch
decoders, for example:
- allocate a decoder at each cxl_port in the decode chain
- for a given switch port, how many of the region's endpoints are hosted
through that port
- how many unique targets (next hops) does a port need to map to reach
those endpoints
The act of reconciling this information and deploying it to the decoder
configuration is saved for a follow-on patch.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The ACPI CXL Fixed Memory Window Structure (CFMWS) defines multiple
methods to determine which host bridge provides access to a given
endpoint relative to that device's position in the interleave. The
"Interleave Arithmetic" defines either a "standard modulo" /
round-random algorithm, or "xormap" based algorithm which can be defined
as a non-linear transform. Given that there are already more options
beyond "standard modulo" and that "xormap" may turn out to be ACPI CXL
specific, provide a callback for the region provisioning code to map
endpoint positions back to expected host bridge id (cxl_dport target).
For now just support the simple modulo math case and save the xormap for
a follow-on change.
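For the modulo case the position-to-host-bridge mapping is trivial; a
sketch:

    #include <stdio.h>

    /* Modulo math: region position -> index into the root decoder's
     * host-bridge target list. */
    static int hb_modulo(int pos, int nr_targets)
    {
            return pos % nr_targets;
    }

    int main(void)
    {
            int pos;

            /* x4 region across 2 host bridges: positions alternate */
            for (pos = 0; pos < 4; pos++)
                    printf("pos %d -> host bridge %d\n", pos, hb_modulo(pos, 2));
            return 0;
    }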
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-14-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The region provisioning process involves allocating DPA to a set of
endpoint decoders, and HPA plus the region geometry to a region device.
Then the decoder is assigned to the region. At this point several
validation steps can be performed to validate that the decoder is
suitable to participate in the region.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/165784336184.1758207.16403282029203949622.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
After a region's interleave parameters (ways and granularity) are set,
add a way for regions to allocate HPA (host physical address space) from
the free capacity in their parent root-decoder. The allocator for this
capacity reuses the 'struct resource' based allocator used for
CONFIG_DEVICE_PRIVATE.
Once the tuple of "ways, granularity, [uuid], and size" is set the
region configuration transitions to the CXL_CONFIG_INTERLEAVE_ACTIVE
state which is a precursor to allowing endpoint decoders to be added to
a region.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784335630.1758207.420216490941955417.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add ABI to allow the number of devices that comprise a region to be
set as well as the interleave granularity for the region.
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
[djbw: reword changelog]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-11-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The process of provisioning a region involves triggering the creation of
a new region object, pouring in the configuration, and then binding that
configured object to the region driver to start its operation. For
persistent memory regions the CXL specification mandates that it be
identified by a uuid. Add an ABI for userspace to specify a region's
uuid.
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
[djbw: simplify locking]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784334465.1758207.8224025435884752570.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL 2.0 allows for dynamic provisioning of new memory regions (system
physical address resources like "System RAM" and "Persistent Memory").
Whereas DDR and PMEM resources are conveyed statically at boot, CXL
allows for assembling and instantiating new regions from the available
capacity of CXL memory expanders in the system.
Sysfs with an "echo $region_name > $create_region_attribute" interface
is chosen as the mechanism to initiate the provisioning process. This
was chosen over ioctl() and netlink() to keep the configuration
interface entirely in a pseudo-fs interface, and it was chosen over
configfs since, aside from this one creation event, the interface is
read-mostly. I.e. configfs supports cases where an object is designed to
be provisioned each boot, like an iSCSI storage target, and CXL region
creation is mostly for PMEM regions which are created usually once
per-lifetime of a server instance. This is an improvement over nvdimm
that pre-created "seed" devices that tended to confuse users looking to
determine which devices are active and which are idle.
Recall that the major change that CXL brings over previous persistent
memory architectures is the ability to dynamically define new regions.
Compare that to drivers like 'nfit' where the region configuration is
statically defined by platform firmware.
Regions are created as a child of a root decoder that encompasses an
address space with constraints. When created through sysfs, the root
decoder is explicit. When created from an LSA's region structure a root
decoder will possibly need to be inferred by the driver.
Upon region creation through sysfs, a vacant region is created with a
unique name. Regions have a number of attributes that must be configured
before the region can be bound to the driver, where HDM decoder
programming is completed.
An example of creating a new region:
- Allocate a new region name:
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
- Create a new region by name:
while
region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region
do true; done
- Region now exists in sysfs:
stat -t /sys/bus/cxl/devices/decoder0.0/$region
- Delete the region, and name:
echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com
[djbw: simplify locking, reword changelog]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The port scanning algorithm in devm_cxl_enumerate_ports() walks up the
topology and adds cxl_port objects starting from the root down to the
endpoint. When those ports are initially created they know all their
dports, but they do not know the downstream cxl_port instance that
represents the next descendant in the topology. Rework create_endpoint()
into devm_cxl_add_endpoint(), which enumerates the downstream cxl_port
topology into each port's 'struct cxl_ep' record for each endpoint for
which the port is an ancestor.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-7-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reduce the complexity and the overhead of walking the topology to
determine endpoint connectivity to root decoder interleave
configurations.
Note that cxl_detach_ep(), after it determines that the last @ep has
departed and decides to delete the port, now needs to walk the dport
array with the device_lock() held to remove entries. Previously,
list_splice_init() could be used to atomically delete all dport entries
at once and then perform entry tear-down outside the lock. There is no
list_splice_init() equivalent for the xarray.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784331647.1758207.6345820282285119339.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for region provisioning that needs to walk the topology
by endpoints, use an xarray to record endpoint interest in a given port.
In addition to being more space and time efficient it also reduces the
complexity of the implementation by moving locking internal to the
xarray implementation. It also allows for a single cxl_ep reference to
be recorded in multiple xarrays.
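The registration and walk then reduce to standard xarray calls; a sketch
(field names are illustrative):

    /* record interest: key the cxl_ep by its endpoint device pointer */
    rc = xa_insert(&port->endpoints, (unsigned long)ep_dev, ep, GFP_KERNEL);

    /* later, walk every endpoint that registered against this port */
    xa_for_each(&port->endpoints, index, ep) {
            /* ... */
    }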
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-2-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
At the time that cxl_port instances are being created, cache the dport
from the parent port that points to this new child port. This will be
useful for region provisioning when walking the tree to calculate
decoder targets, and saves rewalking the dport list after the fact to
build this information.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-1-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Recall that the primary role of the cxl_mem driver is to probe if the
given endpoint is connected to a CXL port topology. In that process it
walks its device ancestry to its PCI root port. If that root port is
also a CXL root port then the probe process adds cxl_port object
instances at each switch in the path between the root and the endpoint. As
those cxl_port instances are added, or if a previous enumeration
attempt already created the port, a 'struct cxl_ep' instance is
registered with that port to track the endpoints interested in that
port.
At the time the cxl_ep is registered the downstream egress path from the
port to the endpoint is known. Take the opportunity to record that
information as it will be needed for dynamic programming of decoder
targets during region provisioning.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784329944.1758207.15203961796832072116.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The CXL specification enforces that endpoint decoders are committed in
hw instance id order. In preparation for adding dynamic DPA allocation,
record the hw instance id in endpoint decoders, and enforce allocations
to occur in hw instance id order.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784328827.1758207.9627538529944559954.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Recall that the Device Physical Address (DPA) space of a CXL Memory
Expander is potentially partitioned into a volatile and persistent
portion. A decoder maps a Host Physical Address (HPA) range to a DPA
range and that translation depends on the value of all previous (lower
instance number) decoders before the current one.
In preparation for allowing dynamic provisioning of regions, decoders
need an ABI to indicate which DPA partition a decoder targets. This ABI
needs to be prepared for the possibility that some other agent committed
and locked a decoder that spans the partition boundary.
Add 'decoderX.Y/mode' to endpoint decoders that indicates which
partition 'ram' / 'pmem' the decoder targets, or 'mixed' if the decoder
currently spans the partition boundary.
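A simplified model of the mode determination (plain ranges stand in for
the device's partition resources):

    #include <stdio.h>
    #include <stdint.h>

    struct range { uint64_t start, end; };

    static int contains(const struct range *outer, const struct range *inner)
    {
            return inner->start >= outer->start && inner->end <= outer->end;
    }

    /* classify a decoder's DPA span against the ram/pmem partitions */
    static const char *decoder_mode(const struct range *ram,
                                    const struct range *pmem,
                                    const struct range *dpa)
    {
            if (contains(ram, dpa))
                    return "ram";
            if (contains(pmem, dpa))
                    return "pmem";
            return "mixed";
    }

    int main(void)
    {
            const struct range ram  = { 0x00000000, 0x0fffffff };
            const struct range pmem = { 0x10000000, 0x1fffffff };
            const struct range span = { 0x0ff00000, 0x100fffff };

            printf("mode: %s\n", decoder_mode(&ram, &pmem, &span)); /* mixed */
            return 0;
    }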
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165603881967.551046.6007594190951596439.stgit@dwillia2-xfh
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for provisioning CXL regions, add accounting for the DPA
space consumed by existing regions / decoders. Recall, a CXL region is a
memory range comprised from one or more endpoint devices contributing a
mapping of their DPA into HPA space through a decoder.
Record the DPA ranges covered by committed decoders at initial probe of
endpoint ports relative to a per-device resource tree of the DPA type
(pmem or volatile-ram).
The cxl_dpa_rwsem semaphore is introduced to globally synchronize DPA
state across all endpoints and their decoders at once. The vast majority
of DPA operations are reads as region creation is expected to be as rare
as disk partitioning and volume creation. The device_lock() for this
synchronization is specifically avoided for concern of entangling with
sysfs attribute removal.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784327682.1758207.7914919426043855876.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Previously the target routing specifics of switch decoders and platform
CXL window resource tracking of root decoders were factored out of
'struct cxl_decoder'. While switch decoders translate from SPA to
downstream ports, endpoint decoders translate from SPA to DPA.
This patch, 3 of 3, adds a 'struct cxl_endpoint_decoder' that tracks an
endpoint-specific Device Physical Address (DPA) resource. For now this
just defines ->dpa_res, a follow-on patch will handle requesting DPA
resource ranges from a device-DPA resource tree.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784327088.1758207.15502834501671201192.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Previously the target routing specifics of switch decoders were factored
out of 'struct cxl_decoder' into 'struct cxl_switch_decoder'.
This patch, 2 of 3, adds a 'struct cxl_root_decoder' as a superset of a
switch decoder that also tracks the associated CXL window platform
resource.
Note that the reason the resource for a given root decoder needs to be
looked up after the fact (i.e. after cxl_parse_cfmws() and
add_cxl_resource()) is because add_cxl_resource() may have merged CXL
windows in order to keep them at the top of the resource tree / decode
hierarchy.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784326541.1758207.9915663937394448341.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Currently 'struct cxl_decoder' contains the superset of attributes
needed for all decoder types. Before more type-specific attributes are
added to the common definition, reorganize 'struct cxl_decoder' into type
specific objects.
This patch, the first of three, factors out a cxl_switch_decoder type.
See the new kdoc for what a 'struct cxl_switch_decoder' represents in a
CXL topology.
Co-developed-by: Ben Widawsky <bwidawsk@kernel.org>
Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/165784325340.1758207.5064717153608954960.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The per-device CDAT data provides performance data that is relevant for
mapping which CXL devices can participate in which CXL ranges by QTG
(QoS Throttling Group) (per ECN: CXL 2.0 CEDT CFMWS & QTG_DSM) [1]. The
QTG association specified in the ECN is advisory. Until the
cxl_acpi driver grows support for invoking the QTG _DSM method the CDAT
data is only of interest to userspace that may need it for debug
purposes.
Search the DOE mailboxes available, query CDAT data, cache the data and
make it available via a sysfs binary attribute per endpoint at:
/sys/bus/cxl/devices/endpointX/CDAT
...similar to other ACPI-structured table data in
/sys/firmware/ACPI/tables. The CDAT is relative to 'struct cxl_port'
objects since switches in addition to endpoints can host a CDAT
instance. Switch CDAT support is not implemented.
This does not support table updates at runtime. It will always provide
whatever was there when first cached. It is also the case that table
updates are not expected outside of explicit DPA address map affecting
commands like Set Partition with the immediate flag set. Given that the
driver does not support Set Partition with the immediate flag set there
is no current need for update support.
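Reading the attribute from userspace is a plain binary-file read; for
example (the endpoint instance name varies per system):

    #include <stdio.h>

    int main(int argc, char **argv)
    {
            /* e.g. /sys/bus/cxl/devices/endpoint2/CDAT */
            const char *path = argc > 1 ? argv[1] :
                    "/sys/bus/cxl/devices/endpoint2/CDAT";
            unsigned char buf[4096];
            size_t n;
            FILE *f = fopen(path, "rb");

            if (!f) {
                    perror(path);
                    return 1;
            }
            while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
                    fwrite(buf, 1, n, stdout);  /* raw CDAT table */
            fclose(f);
            return 0;
    }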
Link: https://www.computeexpresslink.org/spec-landing [1]
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
[djbw: drop in-kernel parsing infra for now, and other minor fixups]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220719205249.566684-7-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
While there is a need to go from a LIBNVDIMM 'struct nvdimm' to a CXL
'struct cxl_nvdimm', there is no use case to go the other direction.
Likely this is a leftover from an early version of the referenced commit
before it implemented devm for releasing the created nvdimm.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20220624041950.559155-19-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>