linux/include
Robert Jennings a90ab95a95 powerpc/pseries: vio bus support for CMO
This is a large patch but the normal code path is not affected.  For
non-pSeries platforms the code is ifdef'ed out and for non-CMO enabled
pSeries systems this does not affect the normal code path.  Devices that
do not perform DMA operations do not need modification with this patch.
The function get_desired_dma was renamed from get_io_entitlement for
clarity.

Overview

Cooperative Memory Overcommitment (CMO) allows for a set of OS partitions
to be run with less RAM than the aggregate needs of the group of
partitions.  The firmware will balance memory between the partitions
and page in/out memory as needed.  Based on the number and type of IO
adapters present, each partition is allocated an amount of memory for
DMA operations and this allocation will be guaranteed to the partition;
this is referred to as the partition's 'entitlement'.

Partitions running in a CMO environment can only have virtual IO devices
present.  The VIO bus layer will manage the IO entitlement for the system.
Accounting, at a system and per-device level, is tracked in the VIO bus
code and exposed via sysfs.  A set of dma_ops functions are added to
the bus to allow for this accounting.

Bus initialization

At initialization, the bus will calculate the minimum needs of the system
based on providing each device present with a standard minimum entitlement
along with a spare allocation for the bus to handle hotplug events.
If the minimum needs cannot be met the system boot will be halted.
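
The initialization check described above can be sketched as follows.
This is a minimal illustration, not the code in the patch: the names
sketch_min_needs() and sketch_boot_can_continue() and the value of
SKETCH_MIN_ENT are hypothetical, and the spare is assumed to be exactly
one minimum entitlement in size.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical per-device minimum entitlement in bytes.  The value is
 * illustrative only and is not taken from the patch. */
#define SKETCH_MIN_ENT ((size_t)64 * 1024)

/* Minimum the bus needs: one minimum entitlement per device present,
 * plus one spare of the same size to handle hotplug events. */
static size_t sketch_min_needs(size_t ndevices)
{
	return (ndevices + 1) * SKETCH_MIN_ENT;
}

/* Returns false when the system entitlement cannot cover the minimum
 * needs, which is the case where boot would be halted. */
static bool sketch_boot_can_continue(size_t system_entitlement,
				     size_t ndevices)
{
	return system_entitlement >= sketch_min_needs(ndevices);
}
```

With three devices present, the sketch requires four minimum
entitlements: three for the devices and one spare.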

Device changes

The significant changes for devices while running under CMO are that the
devices must specify how much dedicated IO entitlement they desire and
must also handle DMA mapping errors that can occur due to constrained
IO memory.  The virtual IO drivers are modified to silence errors when
DMA mappings fail for CMO and handle these failures gracefully.

Each device will be guaranteed a minimum entitlement that can always
be mapped.  Devices will specify how much entitlement they desire and
the VIO bus will attempt to provide for this.  Devices can change their
desired entitlement level at any point in time to address particular needs
(via vio_cmo_set_dev_desired()), not just at device probe time.

VIO bus changes

The system will have a particular entitlement level available from which
it can provide memory to the devices.  The bus defines two pools of memory
within this entitlement, the reserve and excess pools.  Each device is
provided with its own entitlement no less than a system-defined minimum
entitlement and no greater than what the device has specified as its
desired entitlement.  The entitlement provided to devices comes from the
reserve pool.  The reserve pool can also contain a spare allocation as
large as the system-defined minimum entitlement which is used for device
hotplug events.  Any entitlement not needed to fulfill the needs of the
reserve pool is placed in the excess pool.  Each device is guaranteed
that it can map up to its entitled level; additional mappings are possible
as long as there is unmapped memory in the excess pool.
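
The mapping guarantee above can be sketched as simple accounting.  This
is an assumption-laden illustration rather than the dma_ops code in the
patch: struct sketch_bus, struct sketch_dev, sketch_map() and
sketch_unmap() are hypothetical names, locking is omitted, and callers
are assumed never to unmap more than they have mapped.

```c
#include <stddef.h>

/* Hypothetical bus-wide state: size of the excess pool and how much of
 * it is currently mapped by devices. */
struct sketch_bus {
	size_t excess_size;
	size_t excess_used;
};

/* Hypothetical per-device state: guaranteed entitlement and the amount
 * of IO memory currently mapped. */
struct sketch_dev {
	size_t entitled;
	size_t allocated;
};

/* Account for a mapping of 'size' bytes.  The part that fits under the
 * device's entitlement always succeeds; anything beyond it must come
 * from unmapped memory in the excess pool.  Returns 0 on success and
 * -1 when the mapping must fail. */
static int sketch_map(struct sketch_bus *bus, struct sketch_dev *dev,
		      size_t size)
{
	size_t reserve_free = 0;
	size_t from_excess;

	if (dev->allocated < dev->entitled)
		reserve_free = dev->entitled - dev->allocated;
	from_excess = size > reserve_free ? size - reserve_free : 0;

	if (from_excess > bus->excess_size - bus->excess_used)
		return -1;

	dev->allocated += size;
	bus->excess_used += from_excess;
	return 0;
}

/* Undo a mapping.  Memory held above the entitlement is returned to
 * the excess pool first. */
static void sketch_unmap(struct sketch_bus *bus, struct sketch_dev *dev,
			 size_t size)
{
	size_t over = dev->allocated > dev->entitled ?
		      dev->allocated - dev->entitled : 0;
	size_t to_excess = size < over ? size : over;

	dev->allocated -= size;
	bus->excess_used -= to_excess;
}
```

In this sketch a failed mapping leaves all counters unchanged, so a
driver can retry later once other mappings have been released.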

Bus probe

As the system starts, each device is given an entitlement equal only
to the system-defined minimum entitlement.  The reserve pool is equal
to the sum of these entitlements, plus a spare allocation.  The VIO bus
also tracks the aggregate desired entitlement of all the devices.  If the
aggregate desired entitlement is greater than the size of the reserve
pool, IO memory that devices unmap will be reserved and a balance
operation will be scheduled for some time in the future.

Entitlement balancing

The balance function tries to fairly distribute entitlement between the
devices in the system with the goal of providing each device with its
desired amount of entitlement.  Devices using more than what would be
ideal will have their entitled set-point adjusted; this will effectively
set a goal for lower IO memory usage as future mappings can fail and
deallocations will trigger a balance operation to distribute the newly
unmapped memory.  A fair distribution of entitlement can take several
balance operations to achieve.  Entitlement changes and device DLPAR
events will alter the state of CMO and will trigger balance operations.
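
One way to picture a fair distribution is an even split of the
available entitlement among the devices still below their desired
level.  The sketch below is not the balance algorithm from the patch:
sketch_balance() is a hypothetical name, it computes the target in a
single call rather than converging over several balance operations, and
it ignores memory that devices currently have mapped.

```c
#include <stddef.h>

/* Give every device 'min_ent', then split 'avail' evenly among the
 * devices still below their desired level, repeating until either the
 * available entitlement or the remaining demand is exhausted. */
static void sketch_balance(size_t *entitled, const size_t *desired,
			   size_t n, size_t min_ent, size_t avail)
{
	size_t i;

	for (i = 0; i < n; i++)
		entitled[i] = min_ent;

	while (avail > 0) {
		size_t needy = 0;
		size_t chunk;

		for (i = 0; i < n; i++)
			if (entitled[i] < desired[i])
				needy++;
		if (needy == 0)
			break;

		/* Even share for this round, at least one byte so that
		 * every round makes progress. */
		chunk = avail / needy;
		if (chunk == 0)
			chunk = 1;

		for (i = 0; i < n && avail > 0; i++) {
			size_t give;

			if (entitled[i] >= desired[i])
				continue;
			give = desired[i] - entitled[i];
			if (give > chunk)
				give = chunk;
			if (give > avail)
				give = avail;
			entitled[i] += give;
			avail -= give;
		}
	}
}
```

A device that desires no more than the minimum keeps the minimum, and
entitlement it does not need flows to devices that desire more.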

Hotplug events

The VIO bus allows for changes in system entitlement at run-time via
'vio_cmo_entitlement_update()'.  When devices are added the hotplug
device event will be preceded by a system entitlement increase and this
is reversed when devices are removed.

The following changes are made to the VIO bus layer for CMO:
 * add IO memory accounting per device structure.
 * add IO memory entitlement query function to driver structure.
 * during vio bus probe, if CMO is enabled, check that driver has
   memory entitlement query function defined.  Fail if function not defined.
 * fail to register driver if io entitlement function not defined.
 * create set of dma_ops at vio level for CMO that will track allocations
   and return DMA failures once entitlement is reached.  Entitlement will
   be limited by overall system entitlement.  Devices will have a reserved
   quantity of memory that is guaranteed, the rest can be used as available.
 * expose entitlement, current allocation, desired allocation, and the
   allocation error counter for devices to the user through sysfs
 * provide mechanism for changing a device's desired entitlement at run
   time, as both an exported function and a sysfs tunable
 * track any DMA failures for entitled IO memory for each vio device.
 * check entitlement against available system entitlement on device add
 * track entitlement metrics (high water mark, current usage)
 * provide function to reset high water mark
 * provide minimum and desired entitlement numbers at a bus level
 * provide drivers with a minimum guaranteed entitlement
 * balance available entitlement between devices to satisfy their needs
 * handle system entitlement changes and device hotplug

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-25 15:44:43 +10:00
..
acpi Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 2008-07-16 17:25:46 -07:00
asm-alpha Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-arm Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-avr32 Merge branch 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc 2008-07-24 12:24:40 -07:00
asm-blackfin Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-cris Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-frv Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-generic Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 2008-07-16 17:25:46 -07:00
asm-h8300 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-ia64 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-m32r Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-m68k Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-m68knommu Merge branch 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc 2008-07-24 12:24:40 -07:00
asm-mips Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-mn10300 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-parisc Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-powerpc powerpc/pseries: vio bus support for CMO 2008-07-25 15:44:43 +10:00
asm-s390 Merge branch 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc 2008-07-24 12:24:40 -07:00
asm-sh Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-sparc Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
asm-sparc64 Remove asm/semaphore.h 2008-07-24 08:31:12 -04:00
asm-um Merge branch 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc 2008-07-24 12:24:40 -07:00
asm-v850 remove the v850 port 2008-07-24 10:47:24 -07:00
asm-x86 x86-64: Clean up 'save/restore_i387()' usage 2008-07-24 16:12:40 -07:00
asm-xtensa Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 2008-07-24 14:55:09 -07:00
crypto crypto: hash - Move ahash functions into crypto/hash.h 2008-07-10 20:35:18 +08:00
drm drm/radeon: fixup issue with radeon and PAT support. 2008-07-15 15:48:05 +10:00
keys
linux ELF loader support for auxvec base platform string 2008-07-25 15:44:39 +10:00
math-emu
media V4L/DVB (8395): saa7134: Fix Kbuild dependency of ir-kbd-i2c 2008-07-20 07:29:03 -03:00
mtd
net ipv6: icmp6_dst_gc return change 2008-07-22 14:35:50 -07:00
pcmcia
rdma RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event 2008-07-22 14:14:23 -07:00
rxrpc
scsi driver core: remove KOBJ_NAME_LEN define 2008-07-21 21:54:52 -07:00
sound ALSA: Release v1.0.17 2008-07-14 09:54:43 +02:00
video neofb: drop the xtimings structure 2008-07-24 10:47:41 -07:00
xen xen: implement Xen-specific spinlocks 2008-07-16 11:15:53 +02:00
Kbuild drm: reorganise drm tree to be more future proof. 2008-07-14 10:45:01 +10:00