Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next
Pull sparc updates from David Miller:
 "Of note is the addition of a driver for the Data Analytics
  Accelerator, and some small cleanups"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
  oradax: Fix return value check in dax_attach()
  sparc: vDSO: remove an extra tab
  sparc64: drop unneeded compat include
  sparc64: Oracle DAX driver
  sparc64: Oracle DAX infrastructure
commit
ba49097e1d
1433
Documentation/sparc/oradax/dax-hv-api.txt
Normal file
File diff suppressed because it is too large
429
Documentation/sparc/oradax/oracle-dax.txt
Normal file
@ -0,0 +1,429 @@
Oracle Data Analytics Accelerator (DAX)
---------------------------------------

DAX is a coprocessor which resides on the SPARC M7 (DAX1) and M8
(DAX2) processor chips, and has direct access to the CPU's L3 caches
as well as physical memory. It can perform several operations on data
streams with various input and output formats. A driver provides a
transport mechanism and has limited knowledge of the various opcodes
and data formats. A user space library provides high level services
and translates these into low level commands which are then passed
into the driver and subsequently the Hypervisor and the coprocessor.
The library is the recommended way for applications to use the
coprocessor, and the driver interface is not intended for general use.
This document describes the general flow of the driver, its
structures, and its programmatic interface. It also provides example
code sufficient to write user or kernel applications that use DAX
functionality.

The user library is open source and available at:
    https://oss.oracle.com/git/gitweb.cgi?p=libdax.git

The Hypervisor interface to the coprocessor is described in detail in
the accompanying document, dax-hv-api.txt, which is a plain text
excerpt of the (Oracle internal) "UltraSPARC Virtual Machine
Specification" version 3.0.20+15, dated 2017-09-25.


High Level Overview
-------------------

A coprocessor request is described by a Command Control Block
(CCB). The CCB contains an opcode and various parameters. The opcode
specifies what operation is to be done, and the parameters specify
options, flags, sizes, and addresses. The CCB (or an array of CCBs)
is passed to the Hypervisor, which handles queueing and scheduling of
requests to the available coprocessor execution units. A status code
returned indicates if the request was submitted successfully or if
there was an error. One of the addresses given in each CCB is a
pointer to a "completion area", which is a 128 byte memory block that
is written by the coprocessor to provide execution status. No
interrupt is generated upon completion; the completion area must be
polled by software to find out when a transaction has finished, but
the M7 and later processors provide a mechanism to pause the virtual
processor until the completion status has been updated by the
coprocessor. This is done using the monitored load and mwait
instructions, which are described in more detail later. The DAX
coprocessor was designed so that after a request is submitted, the
kernel is no longer involved in the processing of it. The polling is
done at the user level, which results in almost zero latency between
completion of a request and resumption of execution of the requesting
thread.


Addressing Memory
-----------------

The kernel does not have access to physical memory in the Sun4v
architecture, as there is an additional level of memory virtualization
present. This intermediate level is called "real" memory, and the
kernel treats this as if it were physical. The Hypervisor handles the
translations between real memory and physical so that each logical
domain (LDOM) can have a partition of physical memory that is isolated
from that of other LDOMs. When the kernel sets up a virtual mapping,
it specifies a virtual address and the real address to which it should
be mapped.

The DAX coprocessor can only operate on physical memory, so before a
request can be fed to the coprocessor, all the addresses in a CCB must
be converted into physical addresses. The kernel cannot do this since
it has no visibility into physical addresses. So a CCB may contain
either the virtual or real addresses of the buffers or a combination
of them. An "address type" field is available for each address that
may be given in the CCB. In all cases, the Hypervisor will translate
all the addresses to physical before dispatching to hardware. Address
translations are performed using the context of the process initiating
the request.


The Driver API
--------------

An application makes requests to the driver via the write() system
call, and gets results (if any) via read(). The completion areas are
made accessible via mmap(), and are read-only for the application.

The request may either be an immediate command or an array of CCBs to
be submitted to the hardware.

Each open instance of the device is exclusive to the thread that
opened it, and must be used by that thread for all subsequent
operations. The driver open function creates a new context for the
thread and initializes it for use. This context contains pointers and
values used internally by the driver to keep track of submitted
requests. The completion area buffer is also allocated, and this is
large enough to contain the completion areas for many concurrent
requests. When the device is closed, any outstanding transactions are
flushed and the context is cleaned up.

On a DAX1 system (M7), the device will be called "oradax1", while on a
DAX2 system (M8) it will be "oradax2". If an application requires one
or the other, it should simply attempt to open the appropriate
device. Only one of the devices will exist on any given system, so the
name can be used to determine what the platform supports.

The immediate commands are CCB_DEQUEUE, CCB_KILL, and CCB_INFO. For
all of these, success is indicated by a return value from write()
equal to the number of bytes given in the call. Otherwise -1 is
returned and errno is set.

CCB_DEQUEUE

Tells the driver to clean up resources associated with past
requests. Since no interrupt is generated upon the completion of a
request, the driver must be told when it may reclaim resources. No
further status information is returned, so the user should not
subsequently call read().

CCB_KILL

Kills a CCB during execution. The CCB is guaranteed to not continue
executing once this call returns successfully. On success, read() must
be called to retrieve the result of the action.

CCB_INFO

Retrieves information about a currently executing CCB. Note that some
Hypervisors might return 'notfound' when the CCB is in 'inprogress'
state. To ensure a CCB in the 'notfound' state will never be executed,
CCB_KILL must be invoked on that CCB. Upon success, read() must be
called to retrieve the details of the action.
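
The write()-then-read() pattern used by CCB_INFO (and likewise by
CCB_KILL) can be sketched as follows. This is only an illustration:
the helper name dax_query_ccb() is hypothetical, and the structures
are local copies of the ones in arch/sparc/include/uapi/asm/oradax.h
so that the fragment compiles on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Local mirrors of the oradax.h definitions (assumption: same layout). */
#define CCB_INFO 1

struct dax_command {
    uint16_t command;    /* CCB_KILL/INFO/DEQUEUE */
    uint16_t ca_offset;  /* offset into mmapped completion area */
};

struct ccb_info_result {
    uint16_t state;      /* state of enqueued ccb */
    uint16_t inst_num;   /* dax instance number of enqueued ccb */
    uint16_t q_num;      /* queue number of enqueued ccb */
    uint16_t q_pos;      /* ccb position in queue */
};

/*
 * Hypothetical helper: query the CCB whose completion area starts
 * ca_offset bytes into the mapping. Returns 0 and fills *info on
 * success, or -1 if either system call did not transfer the full
 * structure (errno is left as set by the failing call).
 */
int dax_query_ccb(int fd, uint16_t ca_offset, struct ccb_info_result *info)
{
    struct dax_command cmd = { .command = CCB_INFO, .ca_offset = ca_offset };

    assert(info != NULL);

    /* Success is a return value equal to the number of bytes given. */
    if (write(fd, &cmd, sizeof(cmd)) != (ssize_t)sizeof(cmd))
        return -1;

    /* CCB_INFO requires a follow-up read() to fetch the details. */
    if (read(fd, info, sizeof(*info)) != (ssize_t)sizeof(*info))
        return -1;

    return 0;
}
```

A caller would compare info.state against the DAX_CCB_* values from
oradax.h. CCB_DEQUEUE differs in that no read() should follow it.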

Submission of an array of CCBs for execution

A write() whose length is a multiple of the CCB size is treated as a
submit operation. The file offset is treated as the index of the
completion area to use, and may be set via lseek() or using the
pwrite() system call. If -1 is returned then errno is set to indicate
the error. Otherwise, the return value is the length of the array that
was actually accepted by the coprocessor. If the accepted length is
equal to the requested length, then the submission was completely
successful and there is no further status needed; hence, the user
should not subsequently call read(). Partial acceptance of the CCB
array is indicated by a return value less than the requested length,
and read() must be called to retrieve further status information. The
status will reflect the error caused by the first CCB that was not
accepted, and status_data will provide additional data in some cases.
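
The three possible outcomes of a submit can be made explicit with a
small classifier. This is a sketch under the rules stated above; the
enum and function names are hypothetical and are not part of the
driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define CCB_SIZE 64     /* a CCB is eight 64-bit words */

/* Hypothetical names describing what the caller must do next. */
enum submit_outcome {
    SUBMIT_FAILED,      /* write() returned -1: inspect errno */
    SUBMIT_PARTIAL,     /* fewer bytes accepted: call read() for status */
    SUBMIT_COMPLETE     /* everything accepted: do not call read() */
};

/* Classify the return value of write()/pwrite() for a CCB array. */
enum submit_outcome dax_classify_submit(ssize_t ret, size_t requested)
{
    /* A submit is a write whose length is a multiple of the CCB size. */
    assert(requested != 0 && requested % CCB_SIZE == 0);

    if (ret < 0)
        return SUBMIT_FAILED;
    if ((size_t)ret < requested)
        return SUBMIT_PARTIAL;
    return SUBMIT_COMPLETE;
}

/* Number of whole CCBs the coprocessor accepted. */
size_t dax_ccbs_accepted(ssize_t ret)
{
    return ret > 0 ? (size_t)ret / CCB_SIZE : 0;
}
```

Only SUBMIT_PARTIAL calls for a read() into struct ccb_exec_result;
the reported status then refers to the first CCB that was not
accepted.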

MMAP

The mmap() function provides access to the completion area allocated
in the driver. Note that the completion area is not writeable by the
user process, and the mmap call must not specify PROT_WRITE.
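
Locating one completion area inside the mapping might look like the
sketch below. It assumes the mapping is a densely packed array of
128-byte completion areas indexed by the file offset used at submit
time; DAX_CA_SIZE, DAX_CA_COUNT and dax_ca_at() are hypothetical
names, while DAX_MMAP_LEN is taken from oradax.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DAX_MMAP_LEN (16 * 1024)    /* from oradax.h */
#define DAX_CA_SIZE  128            /* completion area size (assumed stride) */
#define DAX_CA_COUNT (DAX_MMAP_LEN / DAX_CA_SIZE)

/*
 * Return a pointer to the idx-th completion area in the read-only
 * mapping, or NULL when idx lies outside the mapped region.
 */
const volatile uint8_t *dax_ca_at(const volatile uint8_t *base, size_t idx)
{
    assert(base != NULL);

    if (idx >= DAX_CA_COUNT)
        return NULL;

    return base + idx * DAX_CA_SIZE;
}
```

The pointer is const because the mapping is read-only, and volatile
because the coprocessor updates the status byte behind the compiler's
back.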


Completion of a Request
-----------------------

The first byte in each completion area is the command status which is
updated by the coprocessor hardware. Software may take advantage of
new M7/M8 processor capabilities to efficiently poll this status byte.
First, a "monitored load" is achieved via a Load from Alternate Space
(ldxa, lduba, etc.) with ASI 0x84 (ASI_MONITOR_PRIMARY). Second, a
"monitored wait" is achieved via the mwait instruction (a write to
%asr28). This instruction is like pause in that it suspends execution
of the virtual processor for the given number of nanoseconds, but in
addition will terminate early when one of several events occurs. If the
block of data containing the monitored location is modified, then the
mwait terminates. This causes software to resume execution immediately
(without a context switch or kernel to user transition) after a
transaction completes. Thus the latency between transaction completion
and resumption of execution may be just a few nanoseconds.


Application Life Cycle of a DAX Submission
------------------------------------------

 - open dax device
 - call mmap() to get the completion area address
 - allocate a CCB and fill in the opcode, flags, parameters, addresses, etc.
 - submit CCB via write() or pwrite()
 - go into a loop executing monitored load + monitored wait and
   terminate when the command status indicates the request is complete
   (CCB_KILL or CCB_INFO may be used any time as necessary)
 - perform a CCB_DEQUEUE
 - call munmap() for completion area
 - close the dax device


Memory Constraints
------------------

The DAX hardware operates only on physical addresses. Therefore, it is
not aware of virtual memory mappings and the discontiguities that may
exist in the physical memory that a virtual buffer maps to. There is
no I/O TLB or any scatter/gather mechanism. All buffers, whether input
or output, must reside in a physically contiguous region of memory.

The Hypervisor translates all addresses within a CCB to physical
before handing off the CCB to DAX. The Hypervisor determines the
virtual page size for each virtual address given, and uses this to
program a size limit for each address. This prevents the coprocessor
from reading or writing beyond the bound of the virtual page, even
though it is accessing physical memory directly. A simpler way of
saying this is that a DAX operation will never "cross" a virtual page
boundary. If an 8k virtual page is used, then the data is strictly
limited to 8k. If a user's buffer is larger than 8k, then a larger
page size must be used, or the transaction size will be truncated to
8k.

Huge pages. A user may allocate huge pages using standard interfaces.
Memory buffers residing on huge pages may be used to achieve much
larger DAX transaction sizes, but the rules must still be followed,
and no transaction will cross a page boundary, even a huge page. A
major caveat is that Linux on SPARC presents 8MB as one of the huge
page sizes. SPARC does not actually provide an 8MB hardware page size,
and this size is synthesized by pasting together two 4MB pages. The
reasons for this are historical, and it creates an issue because only
half of this 8MB page can actually be used for any given buffer in a
DAX request, and it must be either the first half or the second half;
it cannot be a 4MB chunk in the middle, since that crosses a
(hardware) page boundary. Note that this entire issue may be hidden by
higher level libraries.
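
The page-boundary rule can be checked before a CCB is built. The
sketch below is an illustration only; dax_fits_in_page() is a
hypothetical helper, and the caller is assumed to know the hardware
page size backing the buffer (for example 8k for ordinary pages, or
4MB for each half of a synthesized 8MB huge page).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if the buffer [addr, addr + len) lies entirely inside
 * one page of the given size. page_size must be a power of two.
 */
bool dax_fits_in_page(uintptr_t addr, size_t len, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    /* Bytes left between addr and the end of the page containing it. */
    size_t room = page_size - (size_t)(addr & (page_size - 1));

    return len != 0 && len <= room;
}
```

For a buffer on a synthesized 8MB huge page, passing a page_size of
4MB rejects any buffer that straddles the midpoint, which matches the
"first half or second half" rule above.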


CCB Structure
-------------
A CCB is an array of 8 64-bit words. Several of these words provide
command opcodes, parameters, flags, etc., and the rest are addresses
for the completion area, output buffer, and various inputs:

    struct ccb {
        u64 control;
        u64 completion;
        u64 input0;
        u64 access;
        u64 input1;
        u64 op_data;
        u64 output;
        u64 table;
    };

See libdax/common/sys/dax1/dax1_ccb.h for a detailed description of
each of these fields, and see dax-hv-api.txt for a complete description
of the Hypervisor API available to the guest OS (i.e., the Linux kernel).
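
In user space the same layout can be expressed with fixed-width types
and checked at compile time. The sketch below is an illustration; the
helpers ccb_array_bytes() and ccb_array_alloc() are hypothetical, and
treating DAX_MAX_CCBS (from oradax.h) as the per-write limit is an
assumption based on DAX_CCB_BUF_MAXLEN.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct ccb {
    uint64_t control;
    uint64_t completion;
    uint64_t input0;
    uint64_t access;
    uint64_t input1;
    uint64_t op_data;
    uint64_t output;
    uint64_t table;
};

#define DAX_MAX_CCBS 15     /* from oradax.h */

/* The driver treats any multiple of 64 bytes as a CCB array. */
static_assert(sizeof(struct ccb) == 64, "a CCB is eight 64-bit words");

/* Length in bytes of an n-element CCB array, or 0 if n is out of range. */
size_t ccb_array_bytes(size_t n)
{
    return (n == 0 || n > DAX_MAX_CCBS) ? 0 : n * sizeof(struct ccb);
}

/* Allocate a zeroed CCB array; the caller releases it with free(). */
struct ccb *ccb_array_alloc(size_t n)
{
    size_t bytes = ccb_array_bytes(n);

    return bytes ? calloc(1, bytes) : NULL;
}
```

Zeroing the array leaves the completion word at 0, which is the value
user code passes so that the driver can fill in the address.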

The first word (control) is examined by the driver for the following:
 - CCB version, which must be consistent with hardware version
 - Opcode, which must be one of the documented allowable commands
 - Address types, which must be set to "virtual" for all the addresses
   given by the user, thereby ensuring that the application can
   only access memory that it owns


Example Code
------------

The DAX is accessible to both user and kernel code. The kernel code
can make hypercalls directly while the user code must use wrappers
provided by the driver. The setup of the CCB is nearly identical for
both; the only difference is in preparation of the completion area. An
example of user code is given now, with kernel code afterwards.

In order to program using the driver API, the file
arch/sparc/include/uapi/asm/oradax.h must be included.

First, the proper device must be opened. For M7 it will be
/dev/oradax1 and for M8 it will be /dev/oradax2. The simplest
procedure is to attempt to open both, as only one will succeed:

    fd = open("/dev/oradax1", O_RDWR);
    if (fd < 0)
        fd = open("/dev/oradax2", O_RDWR);
    if (fd < 0)
        /* No DAX found */

Next, the completion area must be mapped:

    completion_area = mmap(NULL, DAX_MMAP_LEN, PROT_READ, MAP_SHARED, fd, 0);

All input and output buffers must be fully contained in one hardware
page, since as explained above, the DAX is strictly constrained by
virtual page boundaries. In addition, the output buffer must be
64-byte aligned and its size must be a multiple of 64 bytes because
the coprocessor writes in units of cache lines.
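
One way to satisfy the alignment and size rules for the output buffer
is sketched below, using C11 aligned_alloc(). The names
DAX_OUT_ALIGN, dax_round_up_64() and dax_alloc_output() are
hypothetical. The sketch does not by itself guarantee that the buffer
stays within one hardware page; that must still be arranged or
checked separately.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DAX_OUT_ALIGN 64    /* cache line size, per the text above */

/* Round n up to the next multiple of 64 bytes. */
size_t dax_round_up_64(size_t n)
{
    return (n + (DAX_OUT_ALIGN - 1)) & ~(size_t)(DAX_OUT_ALIGN - 1);
}

/*
 * Allocate a zeroed, 64-byte aligned output buffer large enough for
 * nbytes of results. The padded length is stored in *alloc_len.
 * Returns NULL on failure; release the buffer with free().
 */
void *dax_alloc_output(size_t nbytes, size_t *alloc_len)
{
    assert(alloc_len != NULL);

    /* Reject empty requests and sizes that would overflow when padded. */
    if (nbytes == 0 || nbytes > SIZE_MAX - (DAX_OUT_ALIGN - 1))
        return NULL;

    size_t len = dax_round_up_64(nbytes);
    void *buf = aligned_alloc(DAX_OUT_ALIGN, len);

    if (buf == NULL)
        return NULL;

    memset(buf, 0, len);
    *alloc_len = len;
    return buf;
}
```

For a bit-vector result of nbits bits, a caller would request
(nbits + 7) / 8 bytes and let the helper pad that to whole cache
lines.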

This example demonstrates the DAX Scan command, which takes as input a
vector and a match value, and produces a bitmap as the output. For
each input element that matches the value, the corresponding bit is
set in the output.

In this example, the input vector consists of a series of single bits,
and the match value is 0. So each 0 bit in the input will produce a 1
in the output, and vice versa, which produces an output bitmap which
is the input bitmap inverted.
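
A software reference for this particular case is useful for checking
the coprocessor's result. The sketch below is not the DAX algorithm,
only a model of the expected output for 1-bit elements matched
against 0. It assumes bits are packed most-significant-bit first
within each byte and clears any unused bits in the final byte; both
are assumptions, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write the expected Scan result for nbits single-bit elements with
 * match value 0: every 0 bit in the input becomes 1 and vice versa.
 * out must hold at least (nbits + 7) / 8 bytes.
 */
void scan_zero_bits_reference(const uint8_t *in, uint8_t *out, size_t nbits)
{
    size_t full = nbits / 8;    /* whole bytes covered by the input */
    size_t rem = nbits % 8;     /* bits used in the final partial byte */

    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < full; i++)
        out[i] = (uint8_t)~in[i];

    if (rem != 0) {
        /* Keep only the top rem bits (MSB-first packing assumed). */
        uint8_t mask = (uint8_t)(0xFF << (8 - rem));

        out[full] = (uint8_t)(~in[full] & mask);
    }
}
```

Comparing the first (nbits + 7) / 8 bytes of the hardware output
against this model, after masking unused trailing bits, gives a quick
sanity check of a Scan submission.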

For details of all the parameters and bits used in this CCB, please
refer to section 36.2.1.3 of the DAX Hypervisor API document, which
describes the Scan command in detail.

    ccb->control =      /* Table 36.1, CCB Header Format */
          (2L << 48)    /* command = Scan Value */
        | (3L << 40)    /* output address type = primary virtual */
        | (3L << 34)    /* primary input address type = primary virtual */
                        /* Section 36.2.1, Query CCB Command Formats */
        | (1 << 28)     /* 36.2.1.1.1 primary input format = fixed width bit packed */
        | (0 << 23)     /* 36.2.1.1.2 primary input element size = 0 (1 bit) */
        | (8 << 10)     /* 36.2.1.1.6 output format = bit vector */
        | (0 << 5)      /* 36.2.1.3 First scan criteria size = 0 (1 byte) */
        | (31 << 0);    /* 36.2.1.3 Disable second scan criteria */

    ccb->completion = 0;    /* Completion area address, to be filled in by driver */

    ccb->input0 = (unsigned long) input;    /* primary input address */

    ccb->access =       /* Section 36.2.1.2, Data Access Control */
          (2 << 24)     /* Primary input length format = bits */
        | (nbits - 1);  /* number of bits in primary input stream, minus 1 */

    ccb->input1 = 0;    /* secondary input address, unused */

    ccb->op_data = 0;   /* scan criteria (value to be matched) */

    ccb->output = (unsigned long) output;   /* output address */

    ccb->table = 0;     /* table address, unused */

The CCB submission is a write() or pwrite() system call to the
driver. If the call fails, then a read() must be used to retrieve the
status:

    if (pwrite(fd, ccb, 64, 0) != 64) {
        struct ccb_exec_result status;
        read(fd, &status, sizeof(status));
        /* bail out */
    }

After a successful submission of the CCB, the completion area may be
polled to determine when the DAX is finished. Detailed information on
the contents of the completion area can be found in section 36.2.2 of
the DAX HV API document.

    while (1) {
        /* Monitored Load */
        __asm__ __volatile__("lduba [%1] 0x84, %0\n"
                             : "=r" (status)
                             : "r"  (completion_area));

        if (status)     /* 0 indicates command in progress */
            break;

        /* MWAIT */
        __asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
    }

A completion area status of 1 indicates successful completion of the
CCB and validity of the output bitmap, which may be used immediately.
All other non-zero values indicate error conditions which are
described in section 36.2.2.

    if (completion_area[0] != 1) {  /* section 36.2.2, 1 = command ran and succeeded */
        /* completion_area[0] contains the completion status */
        /* completion_area[1] contains an error code, see 36.2.2 */
    }

After the completion area has been processed, the driver must be
notified that it can release any resources associated with the
request. This is done via the dequeue operation:

    struct dax_command cmd;
    cmd.command = CCB_DEQUEUE;
    if (write(fd, &cmd, sizeof(cmd)) != sizeof(cmd)) {
        /* bail out */
    }

Finally, normal program cleanup should be done, i.e., unmapping the
completion area, closing the dax device, freeing memory, etc.

[Kernel example]

The only difference in using the DAX in kernel code is the treatment
of the completion area. Unlike user applications which mmap the
completion area allocated by the driver, kernel code must allocate its
own memory to use for the completion area, and this address and its
type must be given in the CCB:

    ccb->control |=     /* Table 36.1, CCB Header Format */
        (3L << 32);     /* completion area address type = primary virtual */

    ccb->completion = (unsigned long) completion_area;  /* Completion area address */

The dax submit hypercall is made directly. The flags used in the
ccb_submit call are documented in the DAX HV API in section 36.3.1.

    #include <asm/hypervisor.h>

    hv_rv = sun4v_ccb_submit((unsigned long)ccb, 64,
                             HV_CCB_QUERY_CMD |
                             HV_CCB_ARG0_PRIVILEGED | HV_CCB_ARG0_TYPE_PRIMARY |
                             HV_CCB_VA_PRIVILEGED,
                             0, &bytes_accepted, &status_data);

    if (hv_rv != HV_EOK) {
        /* hv_rv is an error code, status_data contains */
        /* potential additional status, see 36.3.1.1 */
    }

After the submission, the completion area polling code is identical to
that in user land:

    while (1) {
        /* Monitored Load */
        __asm__ __volatile__("lduba [%1] 0x84, %0\n"
                             : "=r" (status)
                             : "r"  (completion_area));

        if (status)     /* 0 indicates command in progress */
            break;

        /* MWAIT */
        __asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
    }

    if (completion_area[0] != 1) {  /* section 36.2.2, 1 = command ran and succeeded */
        /* completion_area[0] contains the completion status */
        /* completion_area[1] contains an error code, see 36.2.2 */
    }

The output bitmap is ready for consumption immediately after the
completion status indicates success.
@ -76,6 +76,10 @@
#define HV_ETOOMANY             15 /* Too many items specified */
#define HV_ECHANNEL             16 /* Invalid LDC channel */
#define HV_EBUSY                17 /* Resource busy */
#define HV_EUNAVAILABLE         23 /* Resource or operation not
                                    * currently available, but may
                                    * become available in the future
                                    */

/* mach_exit()
 * TRAP: HV_FAST_TRAP
@ -941,6 +945,139 @@ unsigned long sun4v_mmu_map_perm_addr(unsigned long vaddr,
 */
#define HV_FAST_MEM_SYNC                0x32

/* Coprocessor services
 *
 * M7 and later processors provide an on-chip coprocessor which
 * accelerates database operations, and is known internally as
 * DAX.
 */

/* ccb_submit()
 * TRAP: HV_FAST_TRAP
 * FUNCTION: HV_CCB_SUBMIT
 * ARG0: address of CCB array
 * ARG1: size (in bytes) of CCB array being submitted
 * ARG2: flags
 * ARG3: reserved
 * RET0: status (success or error code)
 * RET1: size (in bytes) of CCB array that was accepted (might be less
 *       than arg1)
 * RET2: status data
 *       if status == ENOMAP or ENOACCESS, identifies the VA in question
 *       if status == EUNAVAILABLE, unavailable code
 * RET3: reserved
 *
 * ERRORS: EOK          successful submission (check size)
 *         EWOULDBLOCK  could not finish submissions, try again
 *         EBADALIGN    array not 64B aligned or size not 64B multiple
 *         ENORADDR     invalid RA for array or in CCB
 *         ENOMAP       could not translate address (see status data)
 *         EINVAL       invalid ccb or arguments
 *         ETOOMANY     too many ccbs with all-or-nothing flag
 *         ENOACCESS    guest has no access to submit ccbs or address
 *                      in CCB does not have correct permissions (check
 *                      status data)
 *         EUNAVAILABLE ccb operation could not be performed at this
 *                      time (check status data)
 *                      Status data codes:
 *                        0 - exact CCB could not be executed
 *                        1 - CCB opcode cannot be executed
 *                        2 - CCB version cannot be executed
 *                        3 - vcpu cannot execute CCBs
 *                        4 - no CCBs can be executed
 */

#define HV_CCB_SUBMIT                   0x34
#ifndef __ASSEMBLY__
unsigned long sun4v_ccb_submit(unsigned long ccb_buf,
                               unsigned long len,
                               unsigned long flags,
                               unsigned long reserved,
                               void *submitted_len,
                               void *status_data);
#endif

/* flags (ARG2) */
#define HV_CCB_QUERY_CMD                BIT(1)
#define HV_CCB_ARG0_TYPE_REAL           0UL
#define HV_CCB_ARG0_TYPE_PRIMARY        BIT(4)
#define HV_CCB_ARG0_TYPE_SECONDARY      BIT(5)
#define HV_CCB_ARG0_TYPE_NUCLEUS        GENMASK(5, 4)
#define HV_CCB_ARG0_PRIVILEGED          BIT(6)
#define HV_CCB_ALL_OR_NOTHING           BIT(7)
#define HV_CCB_QUEUE_INFO               BIT(8)
#define HV_CCB_VA_REJECT                0UL
#define HV_CCB_VA_SECONDARY             BIT(13)
#define HV_CCB_VA_NUCLEUS               GENMASK(13, 12)
#define HV_CCB_VA_PRIVILEGED            BIT(14)
#define HV_CCB_VA_READ_ADI_DISABLE      BIT(15) /* DAX2 only */

/* ccb_info()
 * TRAP: HV_FAST_TRAP
 * FUNCTION: HV_CCB_INFO
 * ARG0: real address of CCB completion area
 * RET0: status (success or error code)
 * RET1: info array
 *       - RET1[0]: CCB state
 *       - RET1[1]: dax unit
 *       - RET1[2]: queue number
 *       - RET1[3]: queue position
 *
 * ERRORS: EOK          operation successful
 *         EBADALIGN    address not 64B aligned
 *         ENORADDR     RA in address not valid
 *         EINVAL       CA not valid
 *         EWOULDBLOCK  info not available for this CCB currently, try
 *                      again
 *         ENOACCESS    guest cannot use dax
 */

#define HV_CCB_INFO                     0x35
#ifndef __ASSEMBLY__
unsigned long sun4v_ccb_info(unsigned long ca,
                             void *info_arr);
#endif

/* info array byte offsets (RET1) */
#define CCB_INFO_OFFSET_CCB_STATE       0
#define CCB_INFO_OFFSET_DAX_UNIT        2
#define CCB_INFO_OFFSET_QUEUE_NUM       4
#define CCB_INFO_OFFSET_QUEUE_POS       6

/* CCB state (RET1[0]) */
#define HV_CCB_STATE_COMPLETED          0
#define HV_CCB_STATE_ENQUEUED           1
#define HV_CCB_STATE_INPROGRESS         2
#define HV_CCB_STATE_NOTFOUND           3

/* ccb_kill()
 * TRAP: HV_FAST_TRAP
 * FUNCTION: HV_CCB_KILL
 * ARG0: real address of CCB completion area
 * RET0: status (success or error code)
 * RET1: CCB kill status
 *
 * ERRORS: EOK          operation successful
 *         EBADALIGN    address not 64B aligned
 *         ENORADDR     RA in address not valid
 *         EINVAL       CA not valid
 *         EWOULDBLOCK  kill not available for this CCB currently, try
 *                      again
 *         ENOACCESS    guest cannot use dax
 */

#define HV_CCB_KILL                     0x36
#ifndef __ASSEMBLY__
unsigned long sun4v_ccb_kill(unsigned long ca,
                             void *kill_status);
#endif

/* CCB kill status (RET1) */
#define HV_CCB_KILL_COMPLETED           0
#define HV_CCB_KILL_DEQUEUED            1
#define HV_CCB_KILL_KILLED              2
#define HV_CCB_KILL_NOTFOUND            3

/* Time of day services.
 *
 * The hypervisor maintains the time of day on a per-domain basis.
@ -3355,6 +3492,7 @@ unsigned long sun4v_m7_set_perfreg(unsigned long reg_num,
#define HV_GRP_SDIO_ERR                 0x0109
#define HV_GRP_REBOOT_DATA              0x0110
#define HV_GRP_ATU                      0x0111
#define HV_GRP_DAX                      0x0113
#define HV_GRP_M7_PERF                  0x0114
#define HV_GRP_NIAG_PERF                0x0200
#define HV_GRP_FIRE_PERF                0x0201

91
arch/sparc/include/uapi/asm/oradax.h
Normal file
@ -0,0 +1,91 @@
/*
 * Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program. If not, see <http://www.gnu.org/licenses/>.
 */

/*
 * Oracle DAX driver API definitions
 */

#ifndef _ORADAX_H
#define _ORADAX_H

#include <linux/types.h>

#define CCB_KILL    0
#define CCB_INFO    1
#define CCB_DEQUEUE 2

struct dax_command {
	__u16 command;		/* CCB_KILL/INFO/DEQUEUE */
	__u16 ca_offset;	/* offset into mmapped completion area */
};

struct ccb_kill_result {
	__u16 action;		/* action taken to kill ccb */
};

struct ccb_info_result {
	__u16 state;		/* state of enqueued ccb */
	__u16 inst_num;		/* dax instance number of enqueued ccb */
	__u16 q_num;		/* queue number of enqueued ccb */
	__u16 q_pos;		/* ccb position in queue */
};

struct ccb_exec_result {
	__u64 status_data;	/* additional status data (e.g. bad VA) */
	__u32 status;		/* one of DAX_SUBMIT_* */
};

union ccb_result {
	struct ccb_exec_result exec;
	struct ccb_info_result info;
	struct ccb_kill_result kill;
};

#define DAX_MMAP_LEN            (16 * 1024)
#define DAX_MAX_CCBS            15
#define DAX_CCB_BUF_MAXLEN      (DAX_MAX_CCBS * 64)
#define DAX_NAME                "oradax"

/* CCB_EXEC status */
#define DAX_SUBMIT_OK                   0
#define DAX_SUBMIT_ERR_RETRY            1
#define DAX_SUBMIT_ERR_WOULDBLOCK       2
#define DAX_SUBMIT_ERR_BUSY             3
#define DAX_SUBMIT_ERR_THR_INIT         4
#define DAX_SUBMIT_ERR_ARG_INVAL        5
#define DAX_SUBMIT_ERR_CCB_INVAL        6
#define DAX_SUBMIT_ERR_NO_CA_AVAIL      7
#define DAX_SUBMIT_ERR_CCB_ARR_MMU_MISS 8
#define DAX_SUBMIT_ERR_NOMAP            9
#define DAX_SUBMIT_ERR_NOACCESS         10
#define DAX_SUBMIT_ERR_TOOMANY          11
#define DAX_SUBMIT_ERR_UNAVAIL          12
#define DAX_SUBMIT_ERR_INTERNAL         13

/* CCB_INFO states - must match HV_CCB_STATE_* definitions */
#define DAX_CCB_COMPLETED       0
#define DAX_CCB_ENQUEUED        1
#define DAX_CCB_INPROGRESS      2
#define DAX_CCB_NOTFOUND        3

/* CCB_KILL actions - must match HV_CCB_KILL_* definitions */
#define DAX_KILL_COMPLETED      0
#define DAX_KILL_DEQUEUED       1
#define DAX_KILL_KILLED         2
#define DAX_KILL_NOTFOUND       3

#endif /* _ORADAX_H */
@ -41,6 +41,7 @@ static struct api_info api_table[] = {
	{ .group = HV_GRP_SDIO_ERR, },
	{ .group = HV_GRP_REBOOT_DATA, },
	{ .group = HV_GRP_ATU, .flags = FLAG_PRE_API },
	{ .group = HV_GRP_DAX, },
	{ .group = HV_GRP_NIAG_PERF, .flags = FLAG_PRE_API },
	{ .group = HV_GRP_FIRE_PERF, },
	{ .group = HV_GRP_N2_CPU, },
@ -871,3 +871,60 @@ ENTRY(sun4v_m7_set_perfreg)
	retl
	 nop
ENDPROC(sun4v_m7_set_perfreg)

	/* %o0: address of CCB array
	 * %o1: size (in bytes) of CCB array
	 * %o2: flags
	 * %o3: reserved
	 *
	 * returns:
	 * %o0: status
	 * %o1: size (in bytes) of the CCB array that was accepted
	 * %o2: status data
	 * %o3: reserved
	 */
ENTRY(sun4v_ccb_submit)
	mov	%o5, %g1
	mov	HV_CCB_SUBMIT, %o5
	ta	HV_FAST_TRAP
	stx	%o1, [%o4]
	retl
	 stx	%o2, [%g1]
ENDPROC(sun4v_ccb_submit)
EXPORT_SYMBOL(sun4v_ccb_submit)

	/* %o0: completion area ra for the ccb to get info
	 *
	 * returns:
	 * %o0: status
	 * %o1: CCB state
	 * %o2: position
	 * %o3: dax unit
	 * %o4: queue
	 */
ENTRY(sun4v_ccb_info)
	mov	%o1, %g1
	mov	HV_CCB_INFO, %o5
	ta	HV_FAST_TRAP
	sth	%o1, [%g1 + CCB_INFO_OFFSET_CCB_STATE]
	sth	%o2, [%g1 + CCB_INFO_OFFSET_QUEUE_POS]
	sth	%o3, [%g1 + CCB_INFO_OFFSET_DAX_UNIT]
	retl
	 sth	%o4, [%g1 + CCB_INFO_OFFSET_QUEUE_NUM]
ENDPROC(sun4v_ccb_info)
EXPORT_SYMBOL(sun4v_ccb_info)

	/* %o0: completion area ra for the ccb to kill
	 *
	 * returns:
	 * %o0: status
	 * %o1: result of the kill
	 */
ENTRY(sun4v_ccb_kill)
	mov	%o1, %g1
	mov	HV_CCB_KILL, %o5
	ta	HV_FAST_TRAP
	retl
	 sth	%o1, [%g1]
ENDPROC(sun4v_ccb_kill)
EXPORT_SYMBOL(sun4v_ccb_kill)
@ -9,9 +9,6 @@
  * Copyright (C) 1997,1998 Jakub Jelinek (jj@sunsite.mff.cuni.cz)
  */

-#ifdef CONFIG_COMPAT
-#include <linux/compat.h>	/* for compat_old_sigset_t */
-#endif
 #include <linux/sched.h>
 #include <linux/kernel.h>
 #include <linux/signal.h>
@ -251,7 +251,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	else
 		return map_vdso(&vdso_image_32_builtin, &vdso_mapping32);
 #else
-		return map_vdso(&vdso_image_64_builtin, &vdso_mapping64);
+	return map_vdso(&vdso_image_64_builtin, &vdso_mapping64);
 #endif

 }
@ -70,5 +70,13 @@ config DISPLAY7SEG
	  another UltraSPARC-IIi-cEngine boardset with a 7-segment display,
	  you should say N to this option.

config ORACLE_DAX
	tristate "Oracle Data Analytics Accelerator"
	default m if SPARC64
	help
	  Driver for Oracle Data Analytics Accelerator, which is
	  a coprocessor that performs database operations in hardware.
	  It is available on M7 and M8 based systems only.

endmenu

@ -17,3 +17,4 @@ obj-$(CONFIG_SUN_OPENPROMIO) += openprom.o
obj-$(CONFIG_SUN_OPENPROMIO)		+= openprom.o
obj-$(CONFIG_TADPOLE_TS102_UCTRL)	+= uctrl.o
obj-$(CONFIG_SUN_JSFLASH)		+= jsflash.o
obj-$(CONFIG_BBC_I2C)			+= bbc.o
obj-$(CONFIG_ORACLE_DAX)		+= oradax.o

1005
drivers/sbus/char/oradax.c
Normal file
File diff suppressed because it is too large