// SPDX-License-Identifier: GPL-2.0-only
/*
 *  linux/arch/arm/mm/dma-mapping.c
 *
 *  Copyright (C) 2000-2004 Russell King
 *
 *  DMA uncached mapping support.
 */
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/genalloc.h>
#include <linux/gfp.h>
#include <linux/errno.h>
#include <linux/list.h>
#include <linux/init.h>
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/dma-noncoherent.h>
#include <linux/dma-contiguous.h>
#include <linux/highmem.h>
#include <linux/memblock.h>
#include <linux/slab.h>
#include <linux/iommu.h>
#include <linux/io.h>
#include <linux/vmalloc.h>
#include <linux/sizes.h>
#include <linux/cma.h>

#include <asm/memory.h>
#include <asm/highmem.h>
#include <asm/cacheflush.h>
#include <asm/tlbflush.h>
#include <asm/mach/arch.h>
#include <asm/dma-iommu.h>
#include <asm/mach/map.h>
#include <asm/system_info.h>
#include <asm/dma-contiguous.h>

#include "dma.h"
#include "mm.h"

struct arm_dma_alloc_args {
	struct device *dev;
	size_t size;
	gfp_t gfp;
	pgprot_t prot;
	const void *caller;
	bool want_vaddr;
	int coherent_flag;
};

struct arm_dma_free_args {
	struct device *dev;
	size_t size;
	void *cpu_addr;
	struct page *page;
	bool want_vaddr;
};

#define NORMAL	    0
#define COHERENT    1

struct arm_dma_allocator {
	void *(*alloc)(struct arm_dma_alloc_args *args,
		       struct page **ret_page);
	void (*free)(struct arm_dma_free_args *args);
};

struct arm_dma_buffer {
	struct list_head list;
	void *virt;
	struct arm_dma_allocator *allocator;
};

static LIST_HEAD(arm_dma_bufs);
static DEFINE_SPINLOCK(arm_dma_bufs_lock);

static struct arm_dma_buffer *arm_dma_buffer_find(void *virt)
{
	struct arm_dma_buffer *buf, *found = NULL;
	unsigned long flags;

	spin_lock_irqsave(&arm_dma_bufs_lock, flags);
	list_for_each_entry(buf, &arm_dma_bufs, list) {
		if (buf->virt == virt) {
			list_del(&buf->list);
			found = buf;
			break;
		}
	}
	spin_unlock_irqrestore(&arm_dma_bufs_lock, flags);
	return found;
}

/*
 * The DMA API is built upon the notion of "buffer ownership".  A buffer
 * is either exclusively owned by the CPU (and therefore may be accessed
 * by it) or exclusively owned by the DMA device.  These helper functions
 * represent the transitions between these two ownership states.
 *
 * Note, however, that on later ARMs, this notion does not work due to
 * speculative prefetches.  We model our approach on the assumption that
 * the CPU does do speculative prefetches, which means we clean caches
 * before transfers and delay cache invalidation until transfer completion.
 *
 */
static void __dma_page_cpu_to_dev(struct page *, unsigned long,
		size_t, enum dma_data_direction);
static void __dma_page_dev_to_cpu(struct page *, unsigned long,
		size_t, enum dma_data_direction);

/**
 * arm_dma_map_page - map a portion of a page for streaming DMA
 * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
 * @page: page that buffer resides in
 * @offset: offset into page for start of buffer
 * @size: size of buffer to map
 * @dir: DMA transfer direction
 *
 * Ensure that any data held in the cache is appropriately discarded
 * or written back.
 *
 * The device owns this memory once this call has completed.  The CPU
 * can regain ownership by calling dma_unmap_page().
 */
static dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
	     unsigned long offset, size_t size, enum dma_data_direction dir,
	     unsigned long attrs)
{
	if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
		__dma_page_cpu_to_dev(page, offset, size, dir);
	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
}

static dma_addr_t arm_coherent_dma_map_page(struct device *dev, struct page *page,
	     unsigned long offset, size_t size, enum dma_data_direction dir,
	     unsigned long attrs)
{
	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
}

/**
 * arm_dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
 * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
 * @handle: DMA address of buffer
 * @size: size of buffer (same as passed to dma_map_page)
 * @dir: DMA transfer direction (same as passed to dma_map_page)
 *
 * Unmap a page streaming mode DMA translation.  The handle and size
 * must match what was provided in the previous dma_map_page() call.
 * All other usages are undefined.
 *
 * After this call, reads by the CPU to the buffer are guaranteed to see
 * whatever the device wrote there.
 */
static void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
		size_t size, enum dma_data_direction dir, unsigned long attrs)
{
	if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
		__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
				      handle & ~PAGE_MASK, size, dir);
}

static void arm_dma_sync_single_for_cpu(struct device *dev,
		dma_addr_t handle, size_t size, enum dma_data_direction dir)
{
	unsigned int offset = handle & (PAGE_SIZE - 1);
	struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset));
	__dma_page_dev_to_cpu(page, offset, size, dir);
}

static void arm_dma_sync_single_for_device(struct device *dev,
		dma_addr_t handle, size_t size, enum dma_data_direction dir)
{
	unsigned int offset = handle & (PAGE_SIZE - 1);
	struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset));
	__dma_page_cpu_to_dev(page, offset, size, dir);
}

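/*
 * Illustrative user-space sketch, not part of this file: the two sync
 * helpers above split a DMA handle into a page-aligned base and an
 * in-page offset before looking up the struct page. The names
 * split_handle, struct handle_parts and the SKETCH_* constants are
 * hypothetical, and a 4 KiB page size is assumed.
 */

```c
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel takes PAGE_SIZE from its config. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical container for the two halves of a DMA handle. */
struct handle_parts {
	uint64_t base;		/* page-aligned part of the handle */
	unsigned int offset;	/* byte offset inside that page */
};

/* Mirrors "offset = handle & (PAGE_SIZE - 1)" and "handle - offset". */
static struct handle_parts split_handle(uint64_t handle)
{
	struct handle_parts p;

	p.offset = (unsigned int)(handle & (SKETCH_PAGE_SIZE - 1));
	p.base = handle - p.offset;
	return p;
}
```
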
const struct dma_map_ops arm_dma_ops = {
	.alloc			= arm_dma_alloc,
	.free			= arm_dma_free,
	.mmap			= arm_dma_mmap,
	.get_sgtable		= arm_dma_get_sgtable,
	.map_page		= arm_dma_map_page,
	.unmap_page		= arm_dma_unmap_page,
	.map_sg			= arm_dma_map_sg,
	.unmap_sg		= arm_dma_unmap_sg,
	.map_resource		= dma_direct_map_resource,
	.sync_single_for_cpu	= arm_dma_sync_single_for_cpu,
	.sync_single_for_device	= arm_dma_sync_single_for_device,
	.sync_sg_for_cpu	= arm_dma_sync_sg_for_cpu,
	.sync_sg_for_device	= arm_dma_sync_sg_for_device,
	.dma_supported		= arm_dma_supported,
};
EXPORT_SYMBOL(arm_dma_ops);

static void *arm_coherent_dma_alloc(struct device *dev, size_t size,
	dma_addr_t *handle, gfp_t gfp, unsigned long attrs);
static void arm_coherent_dma_free(struct device *dev, size_t size, void *cpu_addr,
				  dma_addr_t handle, unsigned long attrs);
static int arm_coherent_dma_mmap(struct device *dev, struct vm_area_struct *vma,
		 void *cpu_addr, dma_addr_t dma_addr, size_t size,
		 unsigned long attrs);

const struct dma_map_ops arm_coherent_dma_ops = {
	.alloc			= arm_coherent_dma_alloc,
	.free			= arm_coherent_dma_free,
	.mmap			= arm_coherent_dma_mmap,
	.get_sgtable		= arm_dma_get_sgtable,
	.map_page		= arm_coherent_dma_map_page,
	.map_sg			= arm_dma_map_sg,
	.map_resource		= dma_direct_map_resource,
	.dma_supported		= arm_dma_supported,
};
EXPORT_SYMBOL(arm_coherent_dma_ops);

static int __dma_supported(struct device *dev, u64 mask, bool warn)
{
	unsigned long max_dma_pfn = min(max_pfn, arm_dma_pfn_limit);

	/*
	 * Translate the device's DMA mask to a PFN limit.  This
	 * PFN number includes the page which we can DMA to.
	 */
	if (dma_to_pfn(dev, mask) < max_dma_pfn) {
		if (warn)
			dev_warn(dev, "Coherent DMA mask %#llx (pfn %#lx-%#lx) covers a smaller range of system memory than the DMA zone pfn 0x0-%#lx\n",
				 mask,
				 dma_to_pfn(dev, 0), dma_to_pfn(dev, mask) + 1,
				 max_dma_pfn + 1);
		return 0;
	}

	return 1;
}

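/*
 * Illustrative user-space sketch, not part of this file: the check in
 * __dma_supported() turns the DMA mask into a page frame number and
 * compares it with the highest PFN the DMA zone may contain. The sketch
 * assumes 4 KiB pages and an identity mapping between DMA and physical
 * addresses (no per-device PFN offset), which the real dma_to_pfn()
 * does not assume. sketch_dma_supported is a hypothetical name.
 */

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Returns true when the last page addressable through 'mask' is at or
 * beyond 'max_dma_pfn', i.e. the mask covers the whole DMA zone.
 */
static bool sketch_dma_supported(uint64_t mask, uint64_t max_dma_pfn)
{
	uint64_t mask_pfn = mask >> SKETCH_PAGE_SHIFT;

	return mask_pfn >= max_dma_pfn;
}
```
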
static u64 get_coherent_dma_mask(struct device *dev)
{
	u64 mask = (u64)DMA_BIT_MASK(32);

	if (dev) {
		mask = dev->coherent_dma_mask;

		/*
		 * Sanity check the DMA mask - it must be non-zero, and
		 * must be able to be satisfied by a DMA allocation.
		 */
		if (mask == 0) {
			dev_warn(dev, "coherent DMA mask is unset\n");
			return 0;
		}

		if (!__dma_supported(dev, mask, true))
			return 0;
	}

	return mask;
}

static void __dma_clear_buffer(struct page *page, size_t size, int coherent_flag)
{
	/*
	 * Ensure that the allocated pages are zeroed, and that any data
	 * lurking in the kernel direct-mapped region is invalidated.
	 */
	if (PageHighMem(page)) {
		phys_addr_t base = __pfn_to_phys(page_to_pfn(page));
		phys_addr_t end = base + size;
		while (size > 0) {
			void *ptr = kmap_atomic(page);
			memset(ptr, 0, PAGE_SIZE);
			if (coherent_flag != COHERENT)
				dmac_flush_range(ptr, ptr + PAGE_SIZE);
			kunmap_atomic(ptr);
			page++;
			size -= PAGE_SIZE;
		}
		if (coherent_flag != COHERENT)
			outer_flush_range(base, end);
	} else {
		void *ptr = page_address(page);
		memset(ptr, 0, size);
		if (coherent_flag != COHERENT) {
			dmac_flush_range(ptr, ptr + size);
			outer_flush_range(__pa(ptr), __pa(ptr) + size);
		}
	}
}

/*
 * Allocate a DMA buffer for 'dev' of size 'size' using the
 * specified gfp mask.  Note that 'size' must be page aligned.
 */
static struct page *__dma_alloc_buffer(struct device *dev, size_t size,
				       gfp_t gfp, int coherent_flag)
{
	unsigned long order = get_order(size);
	struct page *page, *p, *e;

	page = alloc_pages(gfp, order);
	if (!page)
		return NULL;

	/*
	 * Now split the huge page and free the excess pages
	 */
	split_page(page, order);
	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
		__free_page(p);

	__dma_clear_buffer(page, size, coherent_flag);

	return page;
}

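/*
 * Illustrative user-space sketch, not part of this file: the buddy
 * allocator hands out power-of-two blocks, so __dma_alloc_buffer()
 * rounds the request up to an order, then frees the tail pages it does
 * not need. The helpers below only reproduce that arithmetic. Their
 * names are hypothetical, 4 KiB pages are assumed, and 'size' is
 * assumed to be a non-zero multiple of the page size.
 */

```c
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Smallest order such that (PAGE_SIZE << order) covers 'size'. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Pages allocated by the order-sized block but not needed by 'size'. */
static size_t sketch_excess_pages(size_t size)
{
	size_t wanted = size >> SKETCH_PAGE_SHIFT;
	size_t allocated = (size_t)1 << sketch_get_order(size);

	return allocated - wanted;
}
```

For a five-page request this arithmetic gives order 3 (eight pages), so three pages go back to the allocator.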
/*
 * Free a DMA buffer.  'size' must be page aligned.
 */
static void __dma_free_buffer(struct page *page, size_t size)
{
	struct page *e = page + (size >> PAGE_SHIFT);

	while (page < e) {
		__free_page(page);
		page++;
	}
}

static void *__alloc_from_contiguous(struct device *dev, size_t size,
				     pgprot_t prot, struct page **ret_page,
				     const void *caller, bool want_vaddr,
				     int coherent_flag, gfp_t gfp);

static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
				 pgprot_t prot, struct page **ret_page,
				 const void *caller, bool want_vaddr);

static void *
__dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot,
	const void *caller)
{
	/*
	 * DMA allocation can be mapped to user space, so lets
	 * set VM_USERMAP flags too.
	 */
	return dma_common_contiguous_remap(page, size,
			VM_ARM_DMA_CONSISTENT | VM_USERMAP,
			prot, caller);
}

static void __dma_free_remap(void *cpu_addr, size_t size)
{
	dma_common_free_remap(cpu_addr, size,
			VM_ARM_DMA_CONSISTENT | VM_USERMAP);
}

#define DEFAULT_DMA_COHERENT_POOL_SIZE	SZ_256K
static struct gen_pool *atomic_pool __ro_after_init;

static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE;

static int __init early_coherent_pool(char *p)
{
	atomic_pool_size = memparse(p, &p);
	return 0;
}
early_param("coherent_pool", early_coherent_pool);

/*
 * Initialise the coherent pool for atomic allocations.
 */
static int __init atomic_pool_init(void)
{
	pgprot_t prot = pgprot_dmacoherent(PAGE_KERNEL);
	gfp_t gfp = GFP_KERNEL | GFP_DMA;
	struct page *page;
	void *ptr;

	atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
	if (!atomic_pool)
		goto out;
	/*
	 * The atomic pool is only used for non-coherent allocations
	 * so we must pass NORMAL for coherent_flag.
	 */
	if (dev_get_cma_area(NULL))
		ptr = __alloc_from_contiguous(NULL, atomic_pool_size, prot,
				      &page, atomic_pool_init, true, NORMAL,
				      GFP_KERNEL);
	else
		ptr = __alloc_remap_buffer(NULL, atomic_pool_size, gfp, prot,
					   &page, atomic_pool_init, true);
	if (ptr) {
		int ret;

		ret = gen_pool_add_virt(atomic_pool, (unsigned long)ptr,
					page_to_phys(page),
					atomic_pool_size, -1);
		if (ret)
			goto destroy_genpool;

		gen_pool_set_algo(atomic_pool,
				gen_pool_first_fit_order_align,
				NULL);
		pr_info("DMA: preallocated %zu KiB pool for atomic coherent allocations\n",
		       atomic_pool_size / 1024);
		return 0;
	}

destroy_genpool:
	gen_pool_destroy(atomic_pool);
	atomic_pool = NULL;
out:
	pr_err("DMA: failed to allocate %zu KiB pool for atomic coherent allocation\n",
	       atomic_pool_size / 1024);
	return -ENOMEM;
}
/*
 * CMA is activated by core_initcall, so we must be called after it.
 */
postcore_initcall(atomic_pool_init);

struct dma_contig_early_reserve {
	phys_addr_t base;
	unsigned long size;
};

static struct dma_contig_early_reserve dma_mmu_remap[MAX_CMA_AREAS] __initdata;

static int dma_mmu_remap_num __initdata;

void __init dma_contiguous_early_fixup(phys_addr_t base, unsigned long size)
{
	dma_mmu_remap[dma_mmu_remap_num].base = base;
	dma_mmu_remap[dma_mmu_remap_num].size = size;
	dma_mmu_remap_num++;
}

void __init dma_contiguous_remap(void)
{
	int i;
	for (i = 0; i < dma_mmu_remap_num; i++) {
		phys_addr_t start = dma_mmu_remap[i].base;
		phys_addr_t end = start + dma_mmu_remap[i].size;
		struct map_desc map;
		unsigned long addr;

		if (end > arm_lowmem_limit)
			end = arm_lowmem_limit;
		if (start >= end)
			continue;

		map.pfn = __phys_to_pfn(start);
		map.virtual = __phys_to_virt(start);
		map.length = end - start;
		map.type = MT_MEMORY_DMA_READY;

		/*
		 * Clear previous low-memory mapping to ensure that the
		 * TLB does not see any conflicting entries, then flush
		 * the TLB of the old entries before creating new mappings.
		 *
		 * This ensures that any speculatively loaded TLB entries
		 * (even though they may be rare) can not cause any problems,
		 * and ensures that this code is architecturally compliant.
		 */
		for (addr = __phys_to_virt(start); addr < __phys_to_virt(end);
		     addr += PMD_SIZE)
			pmd_clear(pmd_off_k(addr));

		flush_tlb_kernel_range(__phys_to_virt(start),
				       __phys_to_virt(end));

		iotable_init(&map, 1);
	}
}

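/*
 * Illustrative user-space sketch, not part of this file: before
 * remapping a reserved region, dma_contiguous_remap() clamps its end to
 * the lowmem limit and skips regions that end up empty. The helper
 * below reproduces only that clamping step. struct sketch_region and
 * sketch_clamp_to_lowmem are hypothetical names.
 */

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the part of a region that lies in lowmem. */
struct sketch_region {
	uint64_t start;
	uint64_t length;
};

/*
 * Clamp [base, base + size) to [0, lowmem_limit). Returns false when
 * nothing of the region lies below the limit, which corresponds to the
 * "continue" in the loop above.
 */
static bool sketch_clamp_to_lowmem(uint64_t base, uint64_t size,
				   uint64_t lowmem_limit,
				   struct sketch_region *out)
{
	uint64_t end = base + size;

	if (end > lowmem_limit)
		end = lowmem_limit;
	if (base >= end)
		return false;

	out->start = base;
	out->length = end - base;
	return true;
}
```
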
static int __dma_update_pte(pte_t *pte, unsigned long addr, void *data)
{
	struct page *page = virt_to_page(addr);
	pgprot_t prot = *(pgprot_t *)data;

	set_pte_ext(pte, mk_pte(page, prot), 0);
	return 0;
}

static void __dma_remap(struct page *page, size_t size, pgprot_t prot)
{
	unsigned long start = (unsigned long) page_address(page);
	unsigned end = start + size;

	apply_to_page_range(&init_mm, start, size, __dma_update_pte, &prot);
	flush_tlb_kernel_range(start, end);
}

static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
				 pgprot_t prot, struct page **ret_page,
				 const void *caller, bool want_vaddr)
{
	struct page *page;
	void *ptr = NULL;
	/*
	 * __alloc_remap_buffer is only called when the device is
	 * non-coherent
	 */
	page = __dma_alloc_buffer(dev, size, gfp, NORMAL);
	if (!page)
		return NULL;
	if (!want_vaddr)
		goto out;

	ptr = __dma_alloc_remap(page, size, gfp, prot, caller);
	if (!ptr) {
		__dma_free_buffer(page, size);
		return NULL;
	}

 out:
	*ret_page = page;
	return ptr;
}

static void *__alloc_from_pool(size_t size, struct page **ret_page)
{
	unsigned long val;
	void *ptr = NULL;

	if (!atomic_pool) {
		WARN(1, "coherent pool not initialised!\n");
		return NULL;
	}

	val = gen_pool_alloc(atomic_pool, size);
	if (val) {
		phys_addr_t phys = gen_pool_virt_to_phys(atomic_pool, val);

		*ret_page = phys_to_page(phys);
		ptr = (void *)val;
	}

	return ptr;
}

static bool __in_atomic_pool(void *start, size_t size)
{
	return addr_in_gen_pool(atomic_pool, (unsigned long)start, size);
}

static int __free_from_pool(void *start, size_t size)
{
	if (!__in_atomic_pool(start, size))
		return 0;

	gen_pool_free(atomic_pool, (unsigned long)start, size);

	return 1;
}

static void *__alloc_from_contiguous(struct device *dev, size_t size,
				     pgprot_t prot, struct page **ret_page,
				     const void *caller, bool want_vaddr,
				     int coherent_flag, gfp_t gfp)
{
	unsigned long order = get_order(size);
	size_t count = size >> PAGE_SHIFT;
	struct page *page;
	void *ptr = NULL;

	page = dma_alloc_from_contiguous(dev, count, order, gfp & __GFP_NOWARN);
	if (!page)
		return NULL;

	__dma_clear_buffer(page, size, coherent_flag);

	if (!want_vaddr)
		goto out;

	if (PageHighMem(page)) {
		ptr = __dma_alloc_remap(page, size, GFP_KERNEL, prot, caller);
		if (!ptr) {
			dma_release_from_contiguous(dev, page, count);
			return NULL;
		}
	} else {
		__dma_remap(page, size, prot);
		ptr = page_address(page);
	}

 out:
	*ret_page = page;
	return ptr;
}

static void __free_from_contiguous(struct device *dev, struct page *page,
				   void *cpu_addr, size_t size, bool want_vaddr)
{
	if (want_vaddr) {
		if (PageHighMem(page))
			__dma_free_remap(cpu_addr, size);
		else
			__dma_remap(page, size, PAGE_KERNEL);
	}
	dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
}

static inline pgprot_t __get_dma_pgprot(unsigned long attrs, pgprot_t prot)
{
	prot = (attrs & DMA_ATTR_WRITE_COMBINE) ?
			pgprot_writecombine(prot) :
			pgprot_dmacoherent(prot);
	return prot;
}

static void *__alloc_simple_buffer(struct device *dev, size_t size, gfp_t gfp,
				   struct page **ret_page)
{
	struct page *page;
	/* __alloc_simple_buffer is only called when the device is coherent */
	page = __dma_alloc_buffer(dev, size, gfp, COHERENT);
	if (!page)
		return NULL;

	*ret_page = page;
	return page_address(page);
}

static void *simple_allocator_alloc(struct arm_dma_alloc_args *args,
				    struct page **ret_page)
{
	return __alloc_simple_buffer(args->dev, args->size, args->gfp,
				     ret_page);
}

static void simple_allocator_free(struct arm_dma_free_args *args)
{
	__dma_free_buffer(args->page, args->size);
}

static struct arm_dma_allocator simple_allocator = {
	.alloc = simple_allocator_alloc,
	.free = simple_allocator_free,
};

static void *cma_allocator_alloc(struct arm_dma_alloc_args *args,
				 struct page **ret_page)
{
	return __alloc_from_contiguous(args->dev, args->size, args->prot,
				       ret_page, args->caller,
				       args->want_vaddr, args->coherent_flag,
				       args->gfp);
}

static void cma_allocator_free(struct arm_dma_free_args *args)
{
	__free_from_contiguous(args->dev, args->page, args->cpu_addr,
			       args->size, args->want_vaddr);
}

static struct arm_dma_allocator cma_allocator = {
	.alloc = cma_allocator_alloc,
	.free = cma_allocator_free,
};

static void *pool_allocator_alloc(struct arm_dma_alloc_args *args,
				  struct page **ret_page)
{
	return __alloc_from_pool(args->size, ret_page);
}

static void pool_allocator_free(struct arm_dma_free_args *args)
{
	__free_from_pool(args->cpu_addr, args->size);
}

static struct arm_dma_allocator pool_allocator = {
	.alloc = pool_allocator_alloc,
	.free = pool_allocator_free,
};

static void *remap_allocator_alloc(struct arm_dma_alloc_args *args,
				   struct page **ret_page)
{
	return __alloc_remap_buffer(args->dev, args->size, args->gfp,
				    args->prot, ret_page, args->caller,
				    args->want_vaddr);
}

static void remap_allocator_free(struct arm_dma_free_args *args)
{
	if (args->want_vaddr)
		__dma_free_remap(args->cpu_addr, args->size);

	__dma_free_buffer(args->page, args->size);
}

static struct arm_dma_allocator remap_allocator = {
	.alloc = remap_allocator_alloc,
	.free = remap_allocator_free,
};

static void * __dma_alloc ( struct device * dev , size_t size , dma_addr_t * handle ,
2015-02-09 10:38:35 +01:00
gfp_t gfp , pgprot_t prot , bool is_coherent ,
2016-08-03 13:46:00 -07:00
unsigned long attrs , const void * caller )
2011-12-29 13:09:51 +01:00
{
u64 mask = get_coherent_dma_mask ( dev ) ;
2012-10-24 14:09:14 +09:00
struct page * page = NULL ;
2009-11-19 21:12:17 +00:00
void * addr ;
2016-03-03 15:58:01 +01:00
bool allowblock , cma ;
2016-03-03 15:58:00 +01:00
struct arm_dma_buffer * buf ;
2016-03-03 15:58:01 +01:00
struct arm_dma_alloc_args args = {
. dev = dev ,
. size = PAGE_ALIGN ( size ) ,
. gfp = gfp ,
. prot = prot ,
. caller = caller ,
2016-08-03 13:46:00 -07:00
. want_vaddr = ( ( attrs & DMA_ATTR_NO_KERNEL_MAPPING ) = = 0 ) ,
2016-04-15 11:15:18 +01:00
. coherent_flag = is_coherent ? COHERENT : NORMAL ,
2016-03-03 15:58:01 +01:00
} ;
2009-07-24 12:35:02 +01:00
2011-12-29 13:09:51 +01:00
# ifdef CONFIG_DMA_API_DEBUG
u64 limit = ( mask + 1 ) & ~ mask ;
if ( limit & & size > = limit ) {
dev_warn ( dev , " coherent allocation too big (requested %#x mask %#llx) \n " ,
size , mask ) ;
return NULL ;
}
# endif
if ( ! mask )
return NULL ;
2016-04-13 05:55:29 +01:00
buf = kzalloc ( sizeof ( * buf ) ,
gfp & ~ ( __GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM ) ) ;
2016-03-03 15:58:00 +01:00
if ( ! buf )
return NULL ;
2011-12-29 13:09:51 +01:00
if ( mask < 0xffffffffULL )
gfp | = GFP_DMA ;
2011-11-24 00:47:12 +01:00
/*
* Following is a work - around ( a . k . a . hack ) to prevent pages
* with __GFP_COMP being passed to split_page ( ) which cannot
* handle them . The real problem is that this flag probably
* should be 0 on ARM as it is not supported on this
* platform ; see CONFIG_HUGETLBFS .
*/
gfp & = ~ ( __GFP_COMP ) ;
2016-03-03 15:58:01 +01:00
args . gfp = gfp ;
2011-11-24 00:47:12 +01:00
2018-11-21 18:57:36 +01:00
* handle = DMA_MAPPING_ERROR ;
2016-03-03 15:58:01 +01:00
allowblock = gfpflags_allow_blocking ( gfp ) ;
cma = allowblock ? dev_get_cma_area ( dev ) : false ;
if ( cma )
buf - > allocator = & cma_allocator ;
2017-05-24 11:24:32 +01:00
else if ( is_coherent )
2016-03-03 15:58:01 +01:00
buf - > allocator = & simple_allocator ;
else if ( allowblock )
buf - > allocator = & remap_allocator ;
2009-11-19 21:12:17 +00:00
else
2016-03-03 15:58:01 +01:00
buf - > allocator = & pool_allocator ;
addr = buf - > allocator - > alloc ( & args , & page ) ;
2009-11-19 16:31:39 +00:00
2016-03-03 15:58:00 +01:00
if ( page ) {
unsigned long flags ;
2011-01-03 00:00:17 +00:00
* handle = pfn_to_dma ( dev , page_to_pfn ( page ) ) ;
2016-03-03 15:58:01 +01:00
buf - > virt = args . want_vaddr ? addr : page ;
2016-03-03 15:58:00 +01:00
spin_lock_irqsave ( & arm_dma_bufs_lock , flags ) ;
list_add ( & buf - > list , & arm_dma_bufs ) ;
spin_unlock_irqrestore ( & arm_dma_bufs_lock , flags ) ;
} else {
kfree ( buf ) ;
}
2009-11-19 16:31:39 +00:00
2016-03-03 15:58:01 +01:00
return args . want_vaddr ? addr : page ;
2009-11-19 21:12:17 +00:00
}
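The allocator choice made in __dma_alloc() above can be summarized as a small decision function. This is a minimal sketch, not kernel code: the enum and the name pick_allocator are hypothetical, and the three booleans stand in for gfpflags_allow_blocking(gfp), a non-NULL dev_get_cma_area(dev), and is_coherent.

```c
#include <stdbool.h>

/* Hypothetical labels for the four allocators selected in __dma_alloc(). */
enum alloc_kind { ALLOC_CMA, ALLOC_SIMPLE, ALLOC_REMAP, ALLOC_POOL };

/* Mirrors the if/else chain: CMA is only considered when blocking is allowed. */
enum alloc_kind pick_allocator(bool allow_block, bool has_cma_area,
			       bool is_coherent)
{
	if (allow_block && has_cma_area)
		return ALLOC_CMA;
	if (is_coherent)
		return ALLOC_SIMPLE;
	if (allow_block)
		return ALLOC_REMAP;
	return ALLOC_POOL;	/* atomic, non-coherent request */
}
```

Under these assumptions, an atomic request on a non-coherent device always lands in the pool allocator, even when the device has a CMA area.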
2005-04-16 15:20:36 -07:00
/*
* Allocate DMA - coherent memory space and return both the kernel remapped
* virtual and bus address for that space .
*/
2012-05-16 18:31:23 +02:00
void * arm_dma_alloc ( struct device * dev , size_t size , dma_addr_t * handle ,
2016-08-03 13:46:00 -07:00
gfp_t gfp , unsigned long attrs )
2005-04-16 15:20:36 -07:00
{
2013-10-23 16:14:59 +01:00
pgprot_t prot = __get_dma_pgprot ( attrs , PAGE_KERNEL ) ;
2008-07-18 13:30:14 +04:00
2012-08-21 12:20:17 +02:00
return __dma_alloc ( dev , size , handle , gfp , prot , false ,
2015-02-09 10:38:35 +01:00
attrs , __builtin_return_address ( 0 ) ) ;
2012-08-21 12:20:17 +02:00
}
static void * arm_coherent_dma_alloc ( struct device * dev , size_t size ,
2016-08-03 13:46:00 -07:00
dma_addr_t * handle , gfp_t gfp , unsigned long attrs )
2012-08-21 12:20:17 +02:00
{
2015-07-02 17:28:03 +01:00
return __dma_alloc ( dev , size , handle , gfp , PAGE_KERNEL , true ,
2015-02-09 10:38:35 +01:00
attrs , __builtin_return_address ( 0 ) ) ;
2005-04-16 15:20:36 -07:00
}
2015-06-03 11:25:31 +01:00
static int __arm_dma_mmap ( struct device * dev , struct vm_area_struct * vma ,
2012-05-16 18:31:23 +02:00
void * cpu_addr , dma_addr_t dma_addr , size_t size ,
2016-08-03 13:46:00 -07:00
unsigned long attrs )
2005-04-16 15:20:36 -07:00
{
2018-12-04 10:05:32 +01:00
int ret = - ENXIO ;
2018-04-23 12:30:27 +01:00
unsigned long nr_vma_pages = vma_pages ( vma ) ;
2012-07-30 09:35:26 +02:00
unsigned long nr_pages = PAGE_ALIGN ( size ) > > PAGE_SHIFT ;
2011-12-29 13:09:51 +01:00
unsigned long pfn = dma_to_pfn ( dev , dma_addr ) ;
2012-07-30 09:35:26 +02:00
unsigned long off = vma - > vm_pgoff ;
2017-07-20 11:19:58 +01:00
if ( dma_mmap_from_dev_coherent ( dev , vma , cpu_addr , size , & ret ) )
2012-05-15 19:04:13 +02:00
return ret ;
2012-07-30 09:35:26 +02:00
if ( off < nr_pages & & nr_vma_pages < = ( nr_pages - off ) ) {
ret = remap_pfn_range ( vma , vma - > vm_start ,
pfn + off ,
vma - > vm_end - vma - > vm_start ,
vma - > vm_page_prot ) ;
}
2005-04-16 15:20:36 -07:00
return ret ;
}
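The bounds check in __arm_dma_mmap() above can be isolated as a predicate. This is an illustrative sketch only; the name mmap_range_ok is hypothetical. Comparing against nr_pages - off, after first establishing off < nr_pages, avoids computing off + nr_vma_pages, which could wrap around.

```c
#include <stdbool.h>

/*
 * True when a VMA of nr_vma_pages pages, starting at page offset off,
 * fits inside a buffer of nr_pages pages.
 */
bool mmap_range_ok(unsigned long off, unsigned long nr_vma_pages,
		   unsigned long nr_pages)
{
	return off < nr_pages && nr_vma_pages <= nr_pages - off;
}
```

When the predicate is false, the function above skips remap_pfn_range() and returns the initial -ENXIO.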
2015-06-03 11:25:31 +01:00
/*
* Create userspace mapping for the DMA - coherent memory .
*/
static int arm_coherent_dma_mmap ( struct device * dev , struct vm_area_struct * vma ,
void * cpu_addr , dma_addr_t dma_addr , size_t size ,
2016-08-03 13:46:00 -07:00
unsigned long attrs )
2015-06-03 11:25:31 +01:00
{
return __arm_dma_mmap ( dev , vma , cpu_addr , dma_addr , size , attrs ) ;
}
int arm_dma_mmap ( struct device * dev , struct vm_area_struct * vma ,
void * cpu_addr , dma_addr_t dma_addr , size_t size ,
2016-08-03 13:46:00 -07:00
unsigned long attrs )
2015-06-03 11:25:31 +01:00
{
vma - > vm_page_prot = __get_dma_pgprot ( attrs , vma - > vm_page_prot ) ;
return __arm_dma_mmap ( dev , vma , cpu_addr , dma_addr , size , attrs ) ;
}
2005-04-16 15:20:36 -07:00
/*
2011-12-29 13:09:51 +01:00
* Free a buffer as defined by the above mapping .
2005-04-16 15:20:36 -07:00
*/
2012-08-21 12:20:17 +02:00
static void __arm_dma_free ( struct device * dev , size_t size , void * cpu_addr ,
2016-08-03 13:46:00 -07:00
dma_addr_t handle , unsigned long attrs ,
2012-08-21 12:20:17 +02:00
bool is_coherent )
2005-04-16 15:20:36 -07:00
{
2011-12-29 13:09:51 +01:00
struct page * page = pfn_to_page ( dma_to_pfn ( dev , handle ) ) ;
2016-03-03 15:58:00 +01:00
struct arm_dma_buffer * buf ;
2016-03-03 15:58:01 +01:00
struct arm_dma_free_args args = {
. dev = dev ,
. size = PAGE_ALIGN ( size ) ,
. cpu_addr = cpu_addr ,
. page = page ,
2016-08-03 13:46:00 -07:00
. want_vaddr = ( ( attrs & DMA_ATTR_NO_KERNEL_MAPPING ) = = 0 ) ,
2016-03-03 15:58:01 +01:00
} ;
2016-03-03 15:58:00 +01:00
buf = arm_dma_buffer_find ( cpu_addr ) ;
if ( WARN ( ! buf , " Freeing invalid buffer %p \n " , cpu_addr ) )
return ;
2005-11-25 15:52:51 +00:00
2016-03-03 15:58:01 +01:00
buf - > allocator - > free ( & args ) ;
2016-03-03 15:58:00 +01:00
kfree ( buf ) ;
2005-04-16 15:20:36 -07:00
}
2008-09-25 16:30:57 +01:00
2012-08-21 12:20:17 +02:00
void arm_dma_free ( struct device * dev , size_t size , void * cpu_addr ,
2016-08-03 13:46:00 -07:00
dma_addr_t handle , unsigned long attrs )
2012-08-21 12:20:17 +02:00
{
__arm_dma_free ( dev , size , cpu_addr , handle , attrs , false ) ;
}
static void arm_coherent_dma_free ( struct device * dev , size_t size , void * cpu_addr ,
2016-08-03 13:46:00 -07:00
dma_addr_t handle , unsigned long attrs )
2012-08-21 12:20:17 +02:00
{
__arm_dma_free ( dev , size , cpu_addr , handle , attrs , true ) ;
}
2017-03-29 17:12:47 +01:00
/*
* The whole dma_get_sgtable ( ) idea is fundamentally unsafe - it seems
* that the intention is to allow exporting memory allocated via the
* coherent DMA APIs through the dma_buf API , which only accepts a
* scattertable . This presents a couple of problems :
* 1. Not all memory allocated via the coherent DMA APIs is backed by
* a struct page
* 2. Passing coherent DMA memory into the streaming APIs is not allowed
* as we will try to flush the memory through a different alias to that
* actually being used ( and the flushes are redundant . )
*/
2012-06-13 10:01:15 +02:00
int arm_dma_get_sgtable ( struct device * dev , struct sg_table * sgt ,
void * cpu_addr , dma_addr_t handle , size_t size ,
2016-08-03 13:46:00 -07:00
unsigned long attrs )
2012-06-13 10:01:15 +02:00
{
2017-03-29 17:12:47 +01:00
unsigned long pfn = dma_to_pfn ( dev , handle ) ;
struct page * page ;
2012-06-13 10:01:15 +02:00
int ret ;
2017-03-29 17:12:47 +01:00
/* If the PFN is not valid, we do not have a struct page */
if ( ! pfn_valid ( pfn ) )
return - ENXIO ;
page = pfn_to_page ( pfn ) ;
2012-06-13 10:01:15 +02:00
ret = sg_alloc_table ( sgt , 1 , GFP_KERNEL ) ;
if ( unlikely ( ret ) )
return ret ;
sg_set_page ( sgt - > sgl , page , PAGE_ALIGN ( size ) , 0 ) ;
return 0 ;
}
2009-11-24 16:27:17 +00:00
static void dma_cache_maint_page ( struct page * page , unsigned long offset ,
2009-11-26 16:19:58 +00:00
size_t size , enum dma_data_direction dir ,
void ( * op ) ( const void * , size_t , int ) )
2009-03-12 22:52:09 -04:00
{
2013-01-19 11:05:57 +00:00
unsigned long pfn ;
size_t left = size ;
pfn = page_to_pfn ( page ) + offset / PAGE_SIZE ;
offset % = PAGE_SIZE ;
2009-03-12 22:52:09 -04:00
/*
* A single sg entry may refer to multiple physically contiguous
* pages . But we still need to process highmem pages individually .
* If highmem is not configured then the bulk of this loop gets
* optimized out .
*/
do {
size_t len = left ;
2009-11-24 14:41:01 +00:00
void * vaddr ;
2013-01-19 11:05:57 +00:00
page = pfn_to_page ( pfn ) ;
2009-11-24 14:41:01 +00:00
if ( PageHighMem ( page ) ) {
2013-01-19 11:05:57 +00:00
if ( len + offset > PAGE_SIZE )
2009-11-24 14:41:01 +00:00
len = PAGE_SIZE - offset ;
2013-04-05 03:16:14 +01:00
if ( cache_is_vipt_nonaliasing ( ) ) {
2010-12-15 15:14:45 -05:00
vaddr = kmap_atomic ( page ) ;
2010-03-29 21:46:02 +01:00
op ( vaddr + offset , len , dir ) ;
2010-12-15 15:14:45 -05:00
kunmap_atomic ( vaddr ) ;
2013-04-05 03:16:14 +01:00
} else {
vaddr = kmap_high_get ( page ) ;
if ( vaddr ) {
op ( vaddr + offset , len , dir ) ;
kunmap_high ( page ) ;
}
2009-03-12 22:52:09 -04:00
}
2009-11-24 14:41:01 +00:00
} else {
vaddr = page_address ( page ) + offset ;
2009-11-26 16:19:58 +00:00
op ( vaddr , len , dir ) ;
2009-03-12 22:52:09 -04:00
}
offset = 0 ;
2013-01-19 11:05:57 +00:00
pfn + + ;
2009-03-12 22:52:09 -04:00
left - = len ;
} while ( left ) ;
}
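For highmem pages, the loop in dma_cache_maint_page() above cuts the range into pieces that never cross a page boundary. The sketch below reproduces only that arithmetic. It is not kernel code: split_per_page and SKETCH_PAGE_SIZE are hypothetical names, a 4 KiB page is assumed, and a while loop is used so that a zero size yields no pieces.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, for illustration */

/*
 * Split [offset, offset + size) into per-page pieces. Stores each piece
 * length in lens[] (at most max entries) and returns the number stored.
 */
size_t split_per_page(unsigned long offset, size_t size,
		      size_t *lens, size_t max)
{
	size_t left = size, n = 0;

	offset %= SKETCH_PAGE_SIZE;
	while (left && n < max) {
		size_t len = left;

		/* Clamp the piece to the end of the current page. */
		if (len + offset > SKETCH_PAGE_SIZE)
			len = SKETCH_PAGE_SIZE - offset;
		lens[n++] = len;
		offset = 0;	/* later pieces start at a page boundary */
		left -= len;
	}
	return n;
}
```

Only the first piece can start at a non-zero offset, which is why the original loop resets offset to 0 after each iteration.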
2009-11-24 16:27:17 +00:00
2012-02-10 19:55:20 +01:00
/*
* Make an area consistent for devices .
* Note : Drivers should NOT use this function directly , as it will break
* platforms with CONFIG_DMABOUNCE .
* Use the driver DMA support - see dma - mapping . h ( dma_sync_ * )
*/
static void __dma_page_cpu_to_dev ( struct page * page , unsigned long off ,
2009-11-24 16:27:17 +00:00
size_t size , enum dma_data_direction dir )
{
2014-04-24 11:30:07 -04:00
phys_addr_t paddr ;
2009-11-24 17:53:33 +00:00
2009-11-26 16:19:58 +00:00
dma_cache_maint_page ( page , off , size , dir , dmac_map_area ) ;
2009-11-24 17:53:33 +00:00
paddr = page_to_phys ( page ) + off ;
2009-10-31 16:52:16 +00:00
if ( dir = = DMA_FROM_DEVICE ) {
outer_inv_range ( paddr , paddr + size ) ;
} else {
outer_clean_range ( paddr , paddr + size ) ;
}
/* FIXME: non-speculating: flush on bidirectional mappings? */
2009-11-24 16:27:17 +00:00
}
2012-02-10 19:55:20 +01:00
static void __dma_page_dev_to_cpu ( struct page * page , unsigned long off ,
2009-11-24 16:27:17 +00:00
size_t size , enum dma_data_direction dir )
{
2014-04-24 11:30:07 -04:00
phys_addr_t paddr = page_to_phys ( page ) + off ;
2009-10-31 16:52:16 +00:00
/* FIXME: non-speculating: not required */
2014-05-03 11:06:55 +01:00
/* in any case, don't bother invalidating if DMA to device */
if ( dir ! = DMA_TO_DEVICE ) {
2009-10-31 16:52:16 +00:00
outer_inv_range ( paddr , paddr + size ) ;
2014-05-03 11:06:55 +01:00
dma_cache_maint_page ( page , off , size , dir , dmac_unmap_area ) ;
}
2010-09-13 15:57:36 +01:00
/*
ARM: 7730/1: DMA-mapping: mark all !DMA_TO_DEVICE pages in unmapping as clean
It is common for one sg to include many pages, so mark all these
pages as clean to avoid unnecessary flushing on them in
set_pte_at() or update_mmu_cache().
The patch might improve loading performance of application code a bit.
With the test code below, which reads a ~1 GByte file from a USB mass
storage disk into a buffer created with mmap(PROT_READ | PROT_EXEC) on a
Pandaboard, an average improvement of about 1% was observed with the
patch over 10 runs.
unsigned int sum = 0;
static unsigned long tv_diff(struct timeval *tv1, struct timeval *tv2)
{
return (tv2->tv_sec - tv1->tv_sec) * 1000000 + (tv2->tv_usec - tv1->tv_usec);
}
int main(int argc, char *argv[])
{
char *mbuffer;
int fd;
int i;
unsigned long page_size, size;
struct stat stat;
struct timeval t1, t2;
page_size = getpagesize();
fd = open(argv[1], O_RDONLY);
assert(fd >= 0);
fstat(fd, &stat);
size = stat.st_size;
printf("%s: file %s, file size %lu, page size %lu\n", argv[0],
argv[1], size, page_size);
gettimeofday(&t1, NULL);
mbuffer = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
for (i = 0 ; i < size ; i += page_size)
sum += mbuffer[i];
munmap(mbuffer, page_size);
gettimeofday(&t2, NULL);
printf("\tread mmaped time: %luus\n", tv_diff(&t1, &t2));
close(fd);
}
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-05-18 11:21:36 +01:00
* Mark the D - cache clean for these pages to avoid extra flushing .
2010-09-13 15:57:36 +01:00
*/
2013-05-18 11:21:36 +01:00
if ( dir ! = DMA_TO_DEVICE & & size > = PAGE_SIZE ) {
unsigned long pfn ;
size_t left = size ;
pfn = page_to_pfn ( page ) + off / PAGE_SIZE ;
off % = PAGE_SIZE ;
if ( off ) {
pfn + + ;
left - = PAGE_SIZE - off ;
}
while ( left > = PAGE_SIZE ) {
page = pfn_to_page ( pfn + + ) ;
set_bit ( PG_dcache_clean , & page - > flags ) ;
left - = PAGE_SIZE ;
}
}
2009-11-24 16:27:17 +00:00
}
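The tail of __dma_page_dev_to_cpu() above marks only those pages that are covered in full by the transfer. The following sketch counts them. It is not kernel code: count_full_pages and SKETCH_PAGE_SIZE are hypothetical names and a 4 KiB page is assumed.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, for illustration */

/* Number of pages lying entirely inside [off, off + size). */
size_t count_full_pages(unsigned long off, size_t size)
{
	size_t left = size, n = 0;

	if (size < SKETCH_PAGE_SIZE)
		return 0;
	off %= SKETCH_PAGE_SIZE;
	if (off)
		left -= SKETCH_PAGE_SIZE - off;	/* skip the partial first page */
	while (left >= SKETCH_PAGE_SIZE) {
		n++;
		left -= SKETCH_PAGE_SIZE;
	}
	return n;
}
```

A partially covered first or last page is left alone, so its PG_dcache_clean bit is not set by this path.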
2009-03-12 22:52:09 -04:00
2008-09-25 16:30:57 +01:00
/**
2012-02-10 19:55:20 +01:00
* arm_dma_map_sg - map a set of SG buffers for streaming mode DMA
2008-09-25 16:30:57 +01:00
* @ dev : valid struct device pointer , or NULL for ISA and EISA - like devices
* @ sg : list of buffers
* @ nents : number of buffers to map
* @ dir : DMA transfer direction
*
* Map a set of buffers described by scatterlist in streaming mode for DMA .
* This is the scatter - gather version of the dma_map_single interface .
* Here the scatter gather list elements are each tagged with the
* appropriate dma address and length . They are obtained via
* sg_dma_ { address , length } .
*
* Device ownership issues as mentioned for dma_map_single are the same
* here .
*/
2012-02-10 19:55:20 +01:00
int arm_dma_map_sg ( struct device * dev , struct scatterlist * sg , int nents ,
2016-08-03 13:46:00 -07:00
enum dma_data_direction dir , unsigned long attrs )
2008-09-25 16:30:57 +01:00
{
2017-01-20 13:04:01 -08:00
const struct dma_map_ops * ops = get_dma_ops ( dev ) ;
2008-09-25 16:30:57 +01:00
struct scatterlist * s ;
2008-09-25 21:05:02 +01:00
int i , j ;
2008-09-25 16:30:57 +01:00
for_each_sg ( sg , s , nents , i ) {
2012-05-16 15:48:21 +02:00
# ifdef CONFIG_NEED_SG_DMA_LENGTH
s - > dma_length = s - > length ;
# endif
2012-02-10 19:55:20 +01:00
s - > dma_address = ops - > map_page ( dev , sg_page ( s ) , s - > offset ,
s - > length , dir , attrs ) ;
2008-09-25 21:05:02 +01:00
if ( dma_mapping_error ( dev , s - > dma_address ) )
goto bad_mapping ;
2008-09-25 16:30:57 +01:00
}
return nents ;
2008-09-25 21:05:02 +01:00
bad_mapping :
for_each_sg ( sg , s , i , j )
2012-02-10 19:55:20 +01:00
ops - > unmap_page ( dev , sg_dma_address ( s ) , sg_dma_len ( s ) , dir , attrs ) ;
2008-09-25 21:05:02 +01:00
return 0 ;
2008-09-25 16:30:57 +01:00
}
/**
2012-02-10 19:55:20 +01:00
* arm_dma_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
2008-09-25 16:30:57 +01:00
* @ dev : valid struct device pointer , or NULL for ISA and EISA - like devices
* @ sg : list of buffers
2011-01-12 18:50:37 +01:00
* @ nents : number of buffers to unmap ( same as was passed to dma_map_sg )
2008-09-25 16:30:57 +01:00
* @ dir : DMA transfer direction ( same as was passed to dma_map_sg )
*
* Unmap a set of streaming mode DMA translations . Again , CPU access
* rules concerning calls here are the same as for dma_unmap_single ( ) .
*/
2012-02-10 19:55:20 +01:00
void arm_dma_unmap_sg ( struct device * dev , struct scatterlist * sg , int nents ,
2016-08-03 13:46:00 -07:00
enum dma_data_direction dir , unsigned long attrs )
2008-09-25 16:30:57 +01:00
{
2017-01-20 13:04:01 -08:00
const struct dma_map_ops * ops = get_dma_ops ( dev ) ;
2008-09-25 21:05:02 +01:00
struct scatterlist * s ;
int i ;
2011-01-03 11:29:28 +00:00
2008-09-25 21:05:02 +01:00
for_each_sg ( sg , s , nents , i )
2012-02-10 19:55:20 +01:00
ops - > unmap_page ( dev , sg_dma_address ( s ) , sg_dma_len ( s ) , dir , attrs ) ;
2008-09-25 16:30:57 +01:00
}
/**
2012-02-10 19:55:20 +01:00
* arm_dma_sync_sg_for_cpu
2008-09-25 16:30:57 +01:00
* @ dev : valid struct device pointer , or NULL for ISA and EISA - like devices
* @ sg : list of buffers
* @ nents : number of buffers to map ( returned from dma_map_sg )
* @ dir : DMA transfer direction ( same as was passed to dma_map_sg )
*/
2012-02-10 19:55:20 +01:00
void arm_dma_sync_sg_for_cpu ( struct device * dev , struct scatterlist * sg ,
2008-09-25 16:30:57 +01:00
int nents , enum dma_data_direction dir )
{
2017-01-20 13:04:01 -08:00
const struct dma_map_ops * ops = get_dma_ops ( dev ) ;
2008-09-25 16:30:57 +01:00
struct scatterlist * s ;
int i ;
2012-02-10 19:55:20 +01:00
for_each_sg ( sg , s , nents , i )
ops - > sync_single_for_cpu ( dev , sg_dma_address ( s ) , s - > length ,
dir ) ;
2008-09-25 16:30:57 +01:00
}
/**
2012-02-10 19:55:20 +01:00
* arm_dma_sync_sg_for_device
2008-09-25 16:30:57 +01:00
* @ dev : valid struct device pointer , or NULL for ISA and EISA - like devices
* @ sg : list of buffers
* @ nents : number of buffers to map ( returned from dma_map_sg )
* @ dir : DMA transfer direction ( same as was passed to dma_map_sg )
*/
2012-02-10 19:55:20 +01:00
void arm_dma_sync_sg_for_device ( struct device * dev , struct scatterlist * sg ,
2008-09-25 16:30:57 +01:00
int nents , enum dma_data_direction dir )
{
2017-01-20 13:04:01 -08:00
const struct dma_map_ops * ops = get_dma_ops ( dev ) ;
2008-09-25 16:30:57 +01:00
struct scatterlist * s ;
int i ;
2012-02-10 19:55:20 +01:00
for_each_sg ( sg , s , nents , i )
ops - > sync_single_for_device ( dev , sg_dma_address ( s ) , s - > length ,
dir ) ;
2008-09-25 16:30:57 +01:00
}
2011-01-03 11:29:28 +00:00
2011-07-08 21:26:59 +01:00
/*
* Return whether the given device DMA address mask can be supported
* properly . For example , if your device can only drive the low 24 - bits
* during bus mastering , then you would pass 0x00ffffff as the mask
* to this function .
*/
2017-05-22 11:20:18 +02:00
int arm_dma_supported ( struct device * dev , u64 mask )
2011-07-08 21:26:59 +01:00
{
2013-12-06 12:30:42 +00:00
return __dma_supported ( dev , mask , false ) ;
2011-07-08 21:26:59 +01:00
}
2018-05-30 16:06:24 +02:00
static const struct dma_map_ops * arm_get_dma_map_ops ( bool coherent )
{
2019-07-23 11:33:12 +02:00
/*
* When CONFIG_ARM_LPAE is set , physical address can extend above
* 32 - bits , which then can ' t be addressed by devices that only support
* 32 - bit DMA .
* Use the generic dma - direct / swiotlb ops code in that case , as that
* handles bounce buffering for us .
*
* Note : this checks CONFIG_ARM_LPAE instead of CONFIG_SWIOTLB as the
* latter is also selected by the Xen code , but that code for now relies
* on non - NULL dev_dma_ops . To be cleaned up later .
*/
if ( IS_ENABLED ( CONFIG_ARM_LPAE ) )
return NULL ;
2018-05-30 16:06:24 +02:00
return coherent ? & arm_coherent_dma_ops : & arm_dma_ops ;
}
2012-05-16 15:48:21 +02:00
# ifdef CONFIG_ARM_DMA_USE_IOMMU
2017-01-06 18:58:13 +05:30
static int __dma_info_to_prot ( enum dma_data_direction dir , unsigned long attrs )
{
int prot = 0 ;
if ( attrs & DMA_ATTR_PRIVILEGED )
prot | = IOMMU_PRIV ;
switch ( dir ) {
case DMA_BIDIRECTIONAL :
return prot | IOMMU_READ | IOMMU_WRITE ;
case DMA_TO_DEVICE :
return prot | IOMMU_READ ;
case DMA_FROM_DEVICE :
return prot | IOMMU_WRITE ;
default :
return prot ;
}
}
2012-05-16 15:48:21 +02:00
/* IOMMU */
2014-02-25 13:09:53 +01:00
static int extend_iommu_mapping ( struct dma_iommu_mapping * mapping ) ;
2012-05-16 15:48:21 +02:00
static inline dma_addr_t __alloc_iova ( struct dma_iommu_mapping * mapping ,
size_t size )
{
unsigned int order = get_order ( size ) ;
unsigned int align = 0 ;
unsigned int count , start ;
2014-05-20 10:02:59 +05:30
size_t mapping_size = mapping - > bits < < PAGE_SHIFT ;
2012-05-16 15:48:21 +02:00
unsigned long flags ;
2014-02-25 13:09:53 +01:00
dma_addr_t iova ;
int i ;
2012-05-16 15:48:21 +02:00
2013-02-06 13:21:14 +09:00
if ( order > CONFIG_ARM_DMA_IOMMU_ALIGNMENT )
order = CONFIG_ARM_DMA_IOMMU_ALIGNMENT ;
2014-02-25 13:01:09 +01:00
count = PAGE_ALIGN ( size ) > > PAGE_SHIFT ;
align = ( 1 < < order ) - 1 ;
2012-05-16 15:48:21 +02:00
spin_lock_irqsave ( & mapping - > lock , flags ) ;
2014-02-25 13:09:53 +01:00
for ( i = 0 ; i < mapping - > nr_bitmaps ; i + + ) {
start = bitmap_find_next_zero_area ( mapping - > bitmaps [ i ] ,
mapping - > bits , 0 , count , align ) ;
if ( start > mapping - > bits )
continue ;
bitmap_set ( mapping - > bitmaps [ i ] , start , count ) ;
break ;
2012-05-16 15:48:21 +02:00
}
2014-02-25 13:09:53 +01:00
/*
* No unused range found . Try to extend the existing mapping
* and perform a second attempt to reserve an IO virtual
* address range of size bytes .
*/
if ( i = = mapping - > nr_bitmaps ) {
if ( extend_iommu_mapping ( mapping ) ) {
spin_unlock_irqrestore ( & mapping - > lock , flags ) ;
2018-11-21 18:57:36 +01:00
return DMA_MAPPING_ERROR ;
2014-02-25 13:09:53 +01:00
}
start = bitmap_find_next_zero_area ( mapping - > bitmaps [ i ] ,
mapping - > bits , 0 , count , align ) ;
if ( start > mapping - > bits ) {
spin_unlock_irqrestore ( & mapping - > lock , flags ) ;
2018-11-21 18:57:36 +01:00
return DMA_MAPPING_ERROR ;
2014-02-25 13:09:53 +01:00
}
bitmap_set ( mapping - > bitmaps [ i ] , start , count ) ;
}
2012-05-16 15:48:21 +02:00
spin_unlock_irqrestore ( & mapping - > lock , flags ) ;
2014-05-20 10:02:59 +05:30
iova = mapping - > base + ( mapping_size * i ) ;
2014-02-25 13:01:09 +01:00
iova + = start < < PAGE_SHIFT ;
2014-02-25 13:09:53 +01:00
return iova ;
2012-05-16 15:48:21 +02:00
}
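__alloc_iova() above relies on bitmap_find_next_zero_area() to find an aligned run of free pages. The sketch below shows the same first-fit idea on a plain byte array. It is not the kernel implementation, which works on packed bit words; find_zero_area is a hypothetical name and one byte per page is assumed. As in the caller above, a return value greater than nbits signals failure.

```c
#include <stddef.h>

/*
 * Find the first run of count zero entries in used[0..nbits) whose start
 * index is a multiple of (align_mask + 1). Returns the start index, or a
 * value greater than nbits if no such run exists.
 */
size_t find_zero_area(const unsigned char *used, size_t nbits,
		      size_t count, size_t align_mask)
{
	size_t start = 0;

	while (start + count <= nbits) {
		size_t i;

		for (i = 0; i < count && !used[start + i]; i++)
			;
		if (i == count)
			return start;
		/* Skip past the busy entry, then round up to the alignment. */
		start = (start + i + 1 + align_mask) & ~align_mask;
	}
	return nbits + 1;
}
```

With align_mask set to (1 << order) - 1, as computed in __alloc_iova(), every returned start index is a multiple of 1 << order.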
static inline void __free_iova ( struct dma_iommu_mapping * mapping ,
dma_addr_t addr , size_t size )
{
2014-02-25 13:09:53 +01:00
unsigned int start , count ;
2014-05-20 10:02:59 +05:30
size_t mapping_size = mapping - > bits < < PAGE_SHIFT ;
2012-05-16 15:48:21 +02:00
unsigned long flags ;
2014-02-25 13:09:53 +01:00
dma_addr_t bitmap_base ;
u32 bitmap_index ;
if ( ! size )
return ;
2014-05-20 10:02:59 +05:30
bitmap_index = ( u32 ) ( addr - mapping - > base ) / ( u32 ) mapping_size ;
2014-02-25 13:09:53 +01:00
BUG_ON ( addr < mapping - > base | | bitmap_index > mapping - > extensions ) ;
2014-05-20 10:02:59 +05:30
bitmap_base = mapping - > base + mapping_size * bitmap_index ;
2014-02-25 13:09:53 +01:00
2014-02-25 13:01:09 +01:00
start = ( addr - bitmap_base ) > > PAGE_SHIFT ;
2014-02-25 13:09:53 +01:00
2014-05-20 10:02:59 +05:30
if ( addr + size > bitmap_base + mapping_size ) {
2014-02-25 13:09:53 +01:00
/*
* The address range to be freed reaches into the iova
* range of the next bitmap . This should not happen as
* we don ' t allow this in __alloc_iova ( at the
* moment ) .
*/
BUG ( ) ;
} else
2014-02-25 13:01:09 +01:00
count = size > > PAGE_SHIFT ;
2012-05-16 15:48:21 +02:00
spin_lock_irqsave ( & mapping - > lock , flags ) ;
2014-02-25 13:09:53 +01:00
bitmap_clear ( mapping - > bitmaps [ bitmap_index ] , start , count ) ;
2012-05-16 15:48:21 +02:00
spin_unlock_irqrestore ( & mapping - > lock , flags ) ;
}
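The address arithmetic shared by __alloc_iova() and __free_iova() above is a round trip between an IO virtual address and a (bitmap index, start page) pair. A minimal sketch follows, assuming 32-bit addresses and a 4 KiB page; struct iova_slot and both function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Hypothetical pair: which bitmap, and which page within it. */
struct iova_slot {
	uint32_t bitmap_index;
	uint32_t start;
};

/* Forward direction, as at the end of __alloc_iova(). */
uint32_t slot_to_iova(uint32_t base, uint32_t bits, struct iova_slot s)
{
	uint32_t mapping_size = bits << SKETCH_PAGE_SHIFT;

	return base + mapping_size * s.bitmap_index +
	       (s.start << SKETCH_PAGE_SHIFT);
}

/* Reverse direction, as at the start of __free_iova(). */
struct iova_slot iova_to_slot(uint32_t base, uint32_t bits, uint32_t addr)
{
	uint32_t mapping_size = bits << SKETCH_PAGE_SHIFT;
	struct iova_slot s;

	s.bitmap_index = (addr - base) / mapping_size;
	s.start = (addr - base - mapping_size * s.bitmap_index) >>
		  SKETCH_PAGE_SHIFT;
	return s;
}
```

Each bitmap covers bits pages, so the bitmap index is simply the address offset divided by the size of one bitmap's window.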
ARM: 8505/1: dma-mapping: Optimize allocation
The __iommu_alloc_buffer() is expected to be called to allocate pretty
sizeable buffers. Upon simple tests of video I saw it trying to
allocate 4,194,304 bytes. The function tries to allocate large chunks
in order to optimize IOMMU TLB usage.
The current function is very, very slow.
One problem is the way it keeps trying and trying to allocate big
chunks. Imagine a very fragmented memory that has 4M free but no
contiguous pages at all. Further imagine allocating 4M (1024 pages).
We'll do the following memory allocations:
- For page 1:
- Try to allocate order 10 (no retry)
- Try to allocate order 9 (no retry)
- ...
- Try to allocate order 0 (with retry, but not needed)
- For page 2:
- Try to allocate order 9 (no retry)
- Try to allocate order 8 (no retry)
- ...
- Try to allocate order 0 (with retry, but not needed)
- ...
- ...
Total number of alloc() calls for this case is:
sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
=> 9228
The above is obviously worst case, but given how slow alloc can be we
really want to try to avoid even somewhat bad cases. I timed the old
code with a device under memory pressure and it wasn't hard to see it
take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
was done on kernel 3.14, so possibly mainline would behave
differently).
A second problem is that allocating big chunks under memory pressure
is just not a great idea unless we really need them. We can make do
pretty well with smaller chunks so it's probably wise to leave bigger
chunks for other users once memory pressure is on.
Let's adjust the allocation like this:
1. If a big chunk fails, stop trying too hard and bump down to lower
order allocations.
2. Don't try useless orders. The whole point of big chunks is to
optimize the TLB and it can really only make use of 2M, 1M, 64K and
4K sizes.
We'll still tend to eat up a bunch of big chunks, but that might be the
right answer for some users. A future patch could possibly add a new
DMA_ATTR that would let the caller decide that TLB optimization isn't
important and that we should use smaller chunks. Presumably this would
be a sane strategy for some callers.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
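The worst-case count of 9228 quoted in the commit message above can be reproduced with integer arithmetic. This is a sketch with hypothetical function names. It takes the per-page attempt count to be floor(log2(i)) + 1, as in the message's formula, and computes that value as the bit length of i.

```c
/* floor(log2(i)) + 1 for i >= 1, computed as the bit length of i. */
unsigned int attempts_for(unsigned int i)
{
	unsigned int n = 0;

	while (i) {
		n++;
		i >>= 1;
	}
	return n;
}

/* Sum of the per-page attempt counts for i = 1 .. pages. */
unsigned int total_attempts(unsigned int pages)
{
	unsigned int i, total = 0;

	for (i = 1; i <= pages; i++)
		total += attempts_for(i);
	return total;
}
```

Calling total_attempts(1024) gives 9228, matching the figure in the message for a 4M allocation made of 1024 pages.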
2016-01-29 23:06:08 +01:00
/* We'll try 2M, 1M, 64K, and finally 4K; array must end with 0! */
static const int iommu_order_array [ ] = { 9 , 8 , 4 , 0 } ;
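The orders in iommu_order_array map to the chunk sizes named in the comment above it. A small sketch of that relation follows, assuming a 4 KiB page; chunk_bytes is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

/* Size in bytes of an allocation of the given order, with 4 KiB pages. */
size_t chunk_bytes(int order)
{
	return (size_t)4096 << order;
}
```

Orders 9, 8, 4 and 0 therefore correspond to 2M, 1M, 64K and 4K.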
2012-10-15 16:03:52 +02:00
static struct page * * __iommu_alloc_buffer ( struct device * dev , size_t size ,
2016-08-03 13:46:00 -07:00
gfp_t gfp , unsigned long attrs ,
2016-04-15 11:15:18 +01:00
int coherent_flag )
2012-05-16 15:48:21 +02:00
{
struct page * * pages ;
int count = size > > PAGE_SHIFT ;
int array_size = count * sizeof ( struct page * ) ;
int i = 0 ;
2016-01-29 23:06:08 +01:00
int order_idx = 0 ;
2012-05-16 15:48:21 +02:00
if ( array_size < = PAGE_SIZE )
2015-02-19 07:29:58 +01:00
pages = kzalloc ( array_size , GFP_KERNEL ) ;
2012-05-16 15:48:21 +02:00
else
pages = vzalloc ( array_size ) ;
if ( ! pages )
return NULL ;
2016-08-03 13:46:00 -07:00
if ( attrs & DMA_ATTR_FORCE_CONTIGUOUS )
2012-10-15 16:03:52 +02:00
{
unsigned long order = get_order ( size ) ;
struct page * page ;
2018-08-17 15:49:00 -07:00
page = dma_alloc_from_contiguous ( dev , count , order ,
gfp & __GFP_NOWARN ) ;
2012-10-15 16:03:52 +02:00
if ( ! page )
goto error ;
2016-04-15 11:15:18 +01:00
__dma_clear_buffer ( page , size , coherent_flag ) ;
2012-10-15 16:03:52 +02:00
for ( i = 0 ; i < count ; i + + )
pages [ i ] = page + i ;
return pages ;
}
ARM: 8507/1: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc
If we know that TLB efficiency will not be an issue when memory is
accessed then it's not terribly important to allocate big chunks of
memory. The whole point of allocating the big chunks was that it would
make TLB usage efficient.
As Marek Szyprowski indicated:
Please note that mapping memory with larger pages significantly
improves performance, especially when IOMMU has a little TLB
cache. This can be easily observed when multimedia devices do
processing of RGB data with 90/270 degree rotation
Image rotation is distinctly an operation that needs to bounce around
through memory, so it makes sense that TLB efficiency is important
there.
Video decoding, on the other hand, is a fairly sequential operation.
During video decoding it's not expected that we'll be jumping all over
memory. Decoding video is also pretty heavy and the TLB misses aren't a
huge deal. Presumably most HW video acceleration users of dma-mapping
will not care about huge pages and will set DMA_ATTR_ALLOC_SINGLE_PAGES.
Allocating big chunks of memory is quite expensive, especially if we're
doing it repeatedly and memory is full. In one (out of tree) usage model
it is common that arm_iommu_alloc_attrs() is called 16 times in a row,
each one trying to allocate 4 MB of memory. This is called whenever the
system encounters a new video, which could easily happen while the
memory system is stressed out. In fact, on certain social media
websites that auto-play video and have infinite scrolling, it's quite
common to see not just one of these 16x4MB allocations but 2 or 3 right
after another. Asking the system even to do a small amount of extra
work to give us big chunks in this case is just not a good use of time.
Allocating big chunks of memory is also expensive indirectly. Even if
we ask the system not to do ANY extra work to allocate _our_ memory,
we're still potentially eating up all big chunks in the system.
Presumably there are other users in the system that aren't quite as
flexible and that actually need these big chunks. By eating all the big
chunks we're causing extra work for the rest of the system. We also may
start making other memory allocations fail. While the system may be
robust to such failures (as is the case with dwc2 USB trying to allocate
buffers for Ethernet data and with WiFi trying to allocate buffers for
WiFi data), it is yet another big performance hit.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-01-29 23:08:46 +01:00
/* Go straight to 4K chunks if caller says it's OK. */
2016-08-03 13:46:00 -07:00
if ( attrs & DMA_ATTR_ALLOC_SINGLE_PAGES )
2016-01-29 23:08:46 +01:00
order_idx = ARRAY_SIZE ( iommu_order_array ) - 1 ;
2013-01-16 15:41:02 +01:00
/*
 * IOMMU can map any pages, so highmem can also be used here
 */
gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
2012-05-16 15:48:21 +02:00
while ( count ) {
2015-04-01 07:26:33 +01:00
int j , order ;
ARM: 8505/1: dma-mapping: Optimize allocation
The __iommu_alloc_buffer() is expected to be called to allocate pretty
sizeable buffers. Upon simple tests of video I saw it trying to
allocate 4,194,304 bytes. The function tries to allocate large chunks
in order to optimize IOMMU TLB usage.
The current function is very, very slow.
One problem is the way it keeps trying and trying to allocate big
chunks. Imagine a very fragmented memory that has 4M free but no
contiguous pages at all. Further imagine allocating 4M (1024 pages).
We'll do the following memory allocations:
- For page 1:
- Try to allocate order 10 (no retry)
- Try to allocate order 9 (no retry)
- ...
- Try to allocate order 0 (with retry, but not needed)
- For page 2:
- Try to allocate order 9 (no retry)
- Try to allocate order 8 (no retry)
- ...
- Try to allocate order 0 (with retry, but not needed)
- ...
- ...
Total number of alloc() calls for this case is:
sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
=> 9228
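The same count can be reproduced in C. This is only a sketch of the
arithmetic above, not kernel code: the helper names floor_log2() and
old_alloc_calls() are made up for illustration, and it assumes the old
behaviour exactly as described (with N pages still to allocate, every order
from floor(log2(N)) down to 1 fails and only order 0 succeeds).

```c
/* Index of the most significant set bit, i.e. floor(log2(x)) for x > 0. */
static unsigned int floor_log2(unsigned long x)
{
	unsigned int bit = 0;

	while (x >>= 1)
		bit++;
	return bit;
}

/*
 * Worst case for the old loop (illustrative model only): memory is fully
 * fragmented, so each pass makes floor(log2(remaining)) + 1 attempts and
 * gains exactly one page.
 */
static unsigned long old_alloc_calls(unsigned long nr_pages)
{
	unsigned long remaining, calls = 0;

	for (remaining = nr_pages; remaining > 0; remaining--)
		calls += floor_log2(remaining) + 1;
	return calls;
}
```

Calling old_alloc_calls(1024) models the 4M (1024 pages) request discussed
here.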
The above is obviously the worst case, but given how slow alloc can be we
really want to try to avoid even somewhat bad cases. I timed the old
code with a device under memory pressure and it wasn't hard to see it
take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
was done on kernel 3.14, so possibly mainline would behave
differently).
A second problem is that allocating big chunks under memory pressure is
just not a great idea unless we really need them. We can make do pretty
well with smaller chunks, so it's probably wise to leave bigger chunks
for other users once memory pressure is on.
Let's adjust the allocation like this:
1. If a big chunk fails, stop trying too hard and bump down to lower
order allocations.
2. Don't try useless orders. The whole point of big chunks is to
optimize the TLB and it can really only make use of 2M, 1M, 64K and
4K sizes.
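The two adjustments can be modelled in plain userspace C to see how many
allocation attempts they cost. This is a rough sketch under stated
assumptions, not the kernel implementation: it assumes 4K pages (so 2M, 1M,
64K and 4K correspond to orders 9, 8, 4 and 0), fake_alloc() is a
hypothetical stand-in for alloc_pages() that succeeds only up to a chosen
order, and order-0 allocations are assumed never to fail.

```c
#include <stddef.h>

/* Orders worth trying with 4K pages: 2M, 1M, 64K and 4K chunks. */
static const unsigned int order_array[] = { 9, 8, 4, 0 };
#define NR_ORDERS (sizeof(order_array) / sizeof(order_array[0]))

/* Index of the most significant set bit (x must be non-zero). */
static unsigned int msb_index(unsigned long x)
{
	unsigned int bit = 0;

	while (x >>= 1)
		bit++;
	return bit;
}

/* Hypothetical allocator: only orders <= max_free_order are available. */
static int fake_alloc(unsigned int order, unsigned int max_free_order)
{
	return order <= max_free_order;
}

/*
 * Count the allocation attempts needed to gather nr_pages pages.
 * single_pages models a caller that asks to skip straight to 4K chunks.
 */
static unsigned long new_alloc_calls(unsigned long nr_pages,
				     unsigned int max_free_order,
				     int single_pages)
{
	size_t idx = single_pages ? NR_ORDERS - 1 : 0;
	unsigned long calls = 0;

	while (nr_pages) {
		unsigned int order = order_array[idx];

		/* Drop down when the remainder is smaller than this chunk. */
		if (msb_index(nr_pages) < order) {
			idx++;
			continue;
		}

		calls++;
		if (!fake_alloc(order, max_free_order)) {
			/* Never retry a failed order; move to the next one. */
			idx++;
			continue;
		}
		nr_pages -= 1UL << order;
	}
	return calls;
}
```

In this model a fully fragmented system (max_free_order of 0) pays three
failed attempts once and then one attempt per page, instead of walking every
order again for each page.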
We'll still tend to eat up a bunch of big chunks, but that might be the
right answer for some users. A future patch could possibly add a new
DMA_ATTR that would let the caller decide that TLB optimization isn't
important and that we should use smaller chunks. Presumably this would
be a sane strategy for some callers.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-01-29 23:06:08 +01:00
order = iommu_order_array[order_idx];
/* Drop down when we get small */
if (__fls(count) < order) {
	order_idx++;
	continue;
2015-04-01 07:26:33 +01:00
}
2012-05-16 15:48:21 +02:00
2016-01-29 23:06:08 +01:00
if (order) {
	/* See if it's easy to allocate a high-order chunk */
	pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
	/* Go down a notch at first sign of pressure */
	if (!pages[i]) {
		order_idx++;
		continue;
	}
} else {
2015-04-01 07:26:33 +01:00
	pages[i] = alloc_pages(gfp, 0);
	if (!pages[i])
		goto error;
}
2012-05-16 15:48:21 +02:00
2012-09-11 07:39:39 +02:00
if (order) {
2012-05-16 15:48:21 +02:00
	split_page(pages[i], order);
2012-09-11 07:39:39 +02:00
	j = 1 << order;
	while (--j)
		pages[i + j] = pages[i] + j;
}
2012-05-16 15:48:21 +02:00
2016-04-15 11:15:18 +01:00
__dma_clear_buffer(pages[i], PAGE_SIZE << order, coherent_flag);
2012-05-16 15:48:21 +02:00
i += 1 << order;
count -= 1 << order;
}
return pages ;
error :
2012-07-27 17:12:50 +02:00
while ( i - - )
2012-05-16 15:48:21 +02:00
if ( pages [ i ] )
__free_pages ( pages [ i ] , 0 ) ;
2016-01-22 15:11:02 -08:00
kvfree ( pages ) ;
2012-05-16 15:48:21 +02:00
return NULL ;
}
2012-10-15 16:03:52 +02:00
static int __iommu_free_buffer ( struct device * dev , struct page * * pages ,
2016-08-03 13:46:00 -07:00
size_t size , unsigned long attrs )
2012-05-16 15:48:21 +02:00
{
int count = size > > PAGE_SHIFT ;
int i ;
2012-10-15 16:03:52 +02:00
2016-08-03 13:46:00 -07:00
if ( attrs & DMA_ATTR_FORCE_CONTIGUOUS ) {
2012-10-15 16:03:52 +02:00
dma_release_from_contiguous ( dev , pages [ 0 ] , count ) ;
} else {
for ( i = 0 ; i < count ; i + + )
if ( pages [ i ] )
__free_pages ( pages [ i ] , 0 ) ;
}
2016-01-22 15:11:02 -08:00
kvfree ( pages ) ;
2012-05-16 15:48:21 +02:00
return 0 ;
}
/*
 * Create a CPU mapping for the specified pages
 */
static void *
2012-07-30 09:11:33 +02:00
__iommu_alloc_remap ( struct page * * pages , size_t size , gfp_t gfp , pgprot_t prot ,
const void * caller )
2012-05-16 15:48:21 +02:00
{
2014-10-09 15:26:40 -07:00
return dma_common_pages_remap ( pages , size ,
VM_ARM_DMA_CONSISTENT | VM_USERMAP , prot , caller ) ;
2012-05-16 15:48:21 +02:00
}
/*
* Create a mapping in device IO address space for specified pages
*/
static dma_addr_t
2017-01-06 18:58:13 +05:30
__iommu_create_mapping ( struct device * dev , struct page * * pages , size_t size ,
unsigned long attrs )
2012-05-16 15:48:21 +02:00
{
2015-01-16 18:02:15 +01:00
struct dma_iommu_mapping * mapping = to_dma_iommu_mapping ( dev ) ;
2012-05-16 15:48:21 +02:00
unsigned int count = PAGE_ALIGN ( size ) > > PAGE_SHIFT ;
dma_addr_t dma_addr , iova ;
2015-09-14 17:49:02 +01:00
int i ;
2012-05-16 15:48:21 +02:00
dma_addr = __alloc_iova ( mapping , size ) ;
2018-11-21 18:57:36 +01:00
if ( dma_addr = = DMA_MAPPING_ERROR )
2012-05-16 15:48:21 +02:00
return dma_addr ;
iova = dma_addr ;
for (i = 0; i < count; ) {
2015-09-14 17:49:02 +01:00
	int ret;
2012-05-16 15:48:21 +02:00
	unsigned int next_pfn = page_to_pfn(pages[i]) + 1;
	phys_addr_t phys = page_to_phys(pages[i]);
	unsigned int len, j;
	for (j = i + 1; j < count; j++, next_pfn++)
		if (page_to_pfn(pages[j]) != next_pfn)
			break;
	len = (j - i) << PAGE_SHIFT;
2013-09-27 00:36:15 +02:00
	ret = iommu_map(mapping->domain, iova, phys, len,
2017-01-06 18:58:13 +05:30
			__dma_info_to_prot(DMA_BIDIRECTIONAL, attrs));
2012-05-16 15:48:21 +02:00
	if (ret < 0)
		goto fail;
	iova += len;
	i = j;
}
return dma_addr ;
fail :
iommu_unmap ( mapping - > domain , dma_addr , iova - dma_addr ) ;
__free_iova ( mapping , dma_addr , size ) ;
2018-11-21 18:57:36 +01:00
return DMA_MAPPING_ERROR ;
2012-05-16 15:48:21 +02:00
}
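The loop in __iommu_create_mapping() groups physically contiguous pages so
that each run needs only one iommu_map() call. The grouping can be sketched
in isolation as below. This is an illustrative userspace model, not the
kernel code: pfns[] is a hypothetical array of page frame numbers standing in
for page_to_pfn(pages[k]), and contiguous_run() and count_map_calls() are
made-up helper names.

```c
#include <stddef.h>

/*
 * Length, in pages, of the physically contiguous run starting at pfns[i].
 * The scan stops at the first page whose frame number is not the
 * successor of the previous one.
 */
static size_t contiguous_run(const unsigned long *pfns, size_t count, size_t i)
{
	unsigned long next_pfn = pfns[i] + 1;
	size_t j;

	for (j = i + 1; j < count; j++, next_pfn++)
		if (pfns[j] != next_pfn)
			break;
	return j - i;
}

/* Number of map calls needed if every contiguous run is mapped at once. */
static size_t count_map_calls(const unsigned long *pfns, size_t count)
{
	size_t i = 0, calls = 0;

	while (i < count) {
		i += contiguous_run(pfns, count, i);
		calls++;
	}
	return calls;
}
```

For frame numbers { 100, 101, 102, 200, 201, 50 } the model finds three runs
of lengths 3, 2 and 1, so six pages would need three map calls.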
static int __iommu_remove_mapping ( struct device * dev , dma_addr_t iova , size_t size )
{
2015-01-16 18:02:15 +01:00
struct dma_iommu_mapping * mapping = to_dma_iommu_mapping ( dev ) ;
2012-05-16 15:48:21 +02:00
/*
 * add optional in-page offset from iova to size and align
 * result to page size
 */
size = PAGE_ALIGN((iova & ~PAGE_MASK) + size);
iova &= PAGE_MASK;
iommu_unmap(mapping->domain, iova, size);
__free_iova(mapping, iova, size);
return 0 ;
}
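The alignment arithmetic in __iommu_remove_mapping() widens an arbitrary
(iova, size) pair to whole pages before unmapping. A small worked model
follows. It is a sketch only: it assumes 4K pages, and the EX_* macros,
struct iova_range and page_span() are hypothetical names introduced for the
example rather than kernel definitions.

```c
#include <stdint.h>

/* Assumed 4K page geometry, for illustration only. */
#define EX_PAGE_SHIFT	12
#define EX_PAGE_SIZE	((uint64_t)1 << EX_PAGE_SHIFT)
#define EX_PAGE_MASK	(~(EX_PAGE_SIZE - 1))
#define EX_PAGE_ALIGN(x) (((x) + EX_PAGE_SIZE - 1) & EX_PAGE_MASK)

struct iova_range {
	uint64_t base;	/* page-aligned start address */
	uint64_t size;	/* length rounded up to whole pages */
};

/*
 * Add the in-page offset of iova to size, round the result up to a page
 * multiple, and round iova itself down to its page boundary.
 */
static struct iova_range page_span(uint64_t iova, uint64_t size)
{
	struct iova_range r;

	r.size = EX_PAGE_ALIGN((iova & ~EX_PAGE_MASK) + size);
	r.base = iova & EX_PAGE_MASK;
	return r;
}
```

For example, a 0x1000-byte buffer starting at 0x10000800 straddles two
pages, so the model yields base 0x10000000 and size 0x2000.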
2012-08-28 08:13:03 +03:00
static struct page * * __atomic_get_pages ( void * addr )
{
2014-10-09 15:26:42 -07:00
struct page * page ;
phys_addr_t phys ;
phys = gen_pool_virt_to_phys ( atomic_pool , ( unsigned long ) addr ) ;
page = phys_to_page ( phys ) ;
2012-08-28 08:13:03 +03:00
2014-10-09 15:26:42 -07:00
return ( struct page * * ) page ;
2012-08-28 08:13:03 +03:00
}
2016-08-03 13:46:00 -07:00
static struct page * * __iommu_get_pages ( void * cpu_addr , unsigned long attrs )
2012-07-30 09:11:33 +02:00
{
struct vm_struct * area ;
2012-08-28 08:13:03 +03:00
if ( __in_atomic_pool ( cpu_addr , PAGE_SIZE ) )
return __atomic_get_pages ( cpu_addr ) ;
2016-08-03 13:46:00 -07:00
if ( attrs & DMA_ATTR_NO_KERNEL_MAPPING )
2012-05-16 19:38:58 +01:00
return cpu_addr ;
2012-07-30 09:11:33 +02:00
area = find_vm_area ( cpu_addr ) ;
if ( area & & ( area - > flags & VM_ARM_DMA_CONSISTENT ) )
return area - > pages ;
return NULL ;
}
2016-04-15 11:20:13 +01:00
static void * __iommu_alloc_simple ( struct device * dev , size_t size , gfp_t gfp ,
2017-01-06 18:58:13 +05:30
dma_addr_t * handle , int coherent_flag ,
unsigned long attrs )
2012-08-28 08:13:04 +03:00
{
struct page * page ;
void * addr ;
2016-04-15 11:20:13 +01:00
if ( coherent_flag = = COHERENT )
addr = __alloc_simple_buffer ( dev , size , gfp , & page ) ;
else
addr = __alloc_from_pool ( size , & page ) ;
2012-08-28 08:13:04 +03:00
if ( ! addr )
return NULL ;
2017-01-06 18:58:13 +05:30
* handle = __iommu_create_mapping ( dev , & page , size , attrs ) ;
2018-11-21 18:57:36 +01:00
if ( * handle = = DMA_MAPPING_ERROR )
2012-08-28 08:13:04 +03:00
goto err_mapping ;
return addr ;
err_mapping :
__free_from_pool ( addr , size ) ;
return NULL ;
}
2013-02-08 10:54:48 +01:00
static void __iommu_free_atomic ( struct device * dev , void * cpu_addr ,
2016-04-15 11:20:13 +01:00
dma_addr_t handle , size_t size , int coherent_flag )
2012-08-28 08:13:04 +03:00
{
__iommu_remove_mapping ( dev , handle , size ) ;
2016-04-15 11:20:13 +01:00
if ( coherent_flag = = COHERENT )
__dma_free_buffer ( virt_to_page ( cpu_addr ) , size ) ;
else
__free_from_pool ( cpu_addr , size ) ;
2012-08-28 08:13:04 +03:00
}
2016-04-15 11:20:13 +01:00
static void * __arm_iommu_alloc_attrs ( struct device * dev , size_t size ,
2016-08-03 13:46:00 -07:00
dma_addr_t * handle , gfp_t gfp , unsigned long attrs ,
2016-04-15 11:20:13 +01:00
int coherent_flag )
2012-05-16 15:48:21 +02:00
{
2013-11-25 12:01:03 +00:00
pgprot_t prot = __get_dma_pgprot ( attrs , PAGE_KERNEL ) ;
2012-05-16 15:48:21 +02:00
struct page * * pages ;
void * addr = NULL ;
2018-11-21 18:57:36 +01:00
* handle = DMA_MAPPING_ERROR ;
2012-05-16 15:48:21 +02:00
size = PAGE_ALIGN ( size ) ;
2016-04-15 11:20:13 +01:00
if ( coherent_flag = = COHERENT | | ! gfpflags_allow_blocking ( gfp ) )
return __iommu_alloc_simple ( dev , size , gfp , handle ,
2017-01-06 18:58:13 +05:30
coherent_flag , attrs ) ;
2012-08-28 08:13:04 +03:00
2013-06-20 20:31:00 +08:00
/*
 * Following is a work-around (a.k.a. hack) to prevent pages
 * with __GFP_COMP being passed to split_page() which cannot
 * handle them. The real problem is that this flag probably
 * should be 0 on ARM as it is not supported on this
 * platform; see CONFIG_HUGETLBFS.
 */
gfp &= ~(__GFP_COMP);
2016-04-15 11:20:13 +01:00
pages = __iommu_alloc_buffer ( dev , size , gfp , attrs , coherent_flag ) ;
2012-05-16 15:48:21 +02:00
if ( ! pages )
return NULL ;
2017-01-06 18:58:13 +05:30
* handle = __iommu_create_mapping ( dev , pages , size , attrs ) ;
2018-11-21 18:57:36 +01:00
if ( * handle = = DMA_MAPPING_ERROR )
2012-05-16 15:48:21 +02:00
goto err_buffer ;
2016-08-03 13:46:00 -07:00
if ( attrs & DMA_ATTR_NO_KERNEL_MAPPING )
2012-05-16 19:38:58 +01:00
return pages ;
2012-07-30 09:11:33 +02:00
addr = __iommu_alloc_remap ( pages , size , gfp , prot ,
__builtin_return_address ( 0 ) ) ;
2012-05-16 15:48:21 +02:00
if ( ! addr )
goto err_mapping ;
return addr ;
err_mapping :
__iommu_remove_mapping ( dev , * handle , size ) ;
err_buffer :
2012-10-15 16:03:52 +02:00
__iommu_free_buffer ( dev , pages , size , attrs ) ;
2012-05-16 15:48:21 +02:00
return NULL ;
}
2016-04-15 11:20:13 +01:00
static void * arm_iommu_alloc_attrs ( struct device * dev , size_t size ,
2016-08-03 13:46:00 -07:00
dma_addr_t * handle , gfp_t gfp , unsigned long attrs )
2016-04-15 11:20:13 +01:00
{
return __arm_iommu_alloc_attrs ( dev , size , handle , gfp , attrs , NORMAL ) ;
}
static void * arm_coherent_iommu_alloc_attrs ( struct device * dev , size_t size ,
2016-08-03 13:46:00 -07:00
dma_addr_t * handle , gfp_t gfp , unsigned long attrs )
2016-04-15 11:20:13 +01:00
{
return __arm_iommu_alloc_attrs ( dev , size , handle , gfp , attrs , COHERENT ) ;
}
static int __arm_iommu_mmap_attrs ( struct device * dev , struct vm_area_struct * vma ,
2012-05-16 15:48:21 +02:00
void * cpu_addr , dma_addr_t dma_addr , size_t size ,
2016-08-03 13:46:00 -07:00
unsigned long attrs )
2012-05-16 15:48:21 +02:00
{
2012-05-16 19:38:58 +01:00
struct page * * pages = __iommu_get_pages ( cpu_addr , attrs ) ;
2015-08-28 09:41:39 +01:00
unsigned long nr_pages = PAGE_ALIGN ( size ) > > PAGE_SHIFT ;
2019-05-13 17:22:00 -07:00
int err ;
2012-05-16 15:48:21 +02:00
2012-07-30 09:11:33 +02:00
if ( ! pages )
return - ENXIO ;
2012-05-16 15:48:21 +02:00
2019-05-13 17:22:00 -07:00
if ( vma - > vm_pgoff > = nr_pages )
2015-08-28 09:41:39 +01:00
return - ENXIO ;
2019-05-13 17:22:00 -07:00
err = vm_map_pages(vma, pages, nr_pages);
if (err)
	pr_err("Remapping memory failed: %d\n", err);
2015-08-28 09:42:09 +01:00
2019-05-13 17:22:00 -07:00
return err ;
2012-05-16 15:48:21 +02:00
}
2016-04-15 11:20:13 +01:00
static int arm_iommu_mmap_attrs ( struct device * dev ,
struct vm_area_struct * vma , void * cpu_addr ,
2016-08-03 13:46:00 -07:00
dma_addr_t dma_addr , size_t size , unsigned long attrs )
2016-04-15 11:20:13 +01:00
{
vma - > vm_page_prot = __get_dma_pgprot ( attrs , vma - > vm_page_prot ) ;
return __arm_iommu_mmap_attrs ( dev , vma , cpu_addr , dma_addr , size , attrs ) ;
}
static int arm_coherent_iommu_mmap_attrs ( struct device * dev ,
struct vm_area_struct * vma , void * cpu_addr ,
2016-08-03 13:46:00 -07:00
dma_addr_t dma_addr , size_t size , unsigned long attrs )
2016-04-15 11:20:13 +01:00
{
return __arm_iommu_mmap_attrs ( dev , vma , cpu_addr , dma_addr , size , attrs ) ;
}
2012-05-16 15:48:21 +02:00
/*
* free a page as defined by the above mapping .
* Must not be called with IRQs disabled .
*/
2016-04-15 11:20:13 +01:00
void __arm_iommu_free_attrs ( struct device * dev , size_t size , void * cpu_addr ,
2016-08-03 13:46:00 -07:00
dma_addr_t handle , unsigned long attrs , int coherent_flag )
2012-05-16 15:48:21 +02:00
{
2013-06-17 13:18:52 +09:00
struct page * * pages ;
2012-05-16 15:48:21 +02:00
size = PAGE_ALIGN ( size ) ;
2016-04-15 11:20:13 +01:00
if ( coherent_flag = = COHERENT | | __in_atomic_pool ( cpu_addr , size ) ) {
__iommu_free_atomic ( dev , cpu_addr , handle , size , coherent_flag ) ;
2012-07-30 09:11:33 +02:00
return ;
2012-05-16 15:48:21 +02:00
}
2012-07-30 09:11:33 +02:00
2013-06-17 13:18:52 +09:00
pages = __iommu_get_pages ( cpu_addr , attrs ) ;
if (!pages) {
	WARN(1, "trying to free invalid coherent area: %p\n", cpu_addr);
2012-08-28 08:13:04 +03:00
return ;
}
2016-08-03 13:46:00 -07:00
if ( ( attrs & DMA_ATTR_NO_KERNEL_MAPPING ) = = 0 ) {
2014-10-09 15:26:40 -07:00
dma_common_free_remap ( cpu_addr , size ,
VM_ARM_DMA_CONSISTENT | VM_USERMAP ) ;
2012-05-16 19:38:58 +01:00
}
2012-07-30 09:11:33 +02:00
__iommu_remove_mapping ( dev , handle , size ) ;
2012-10-15 16:03:52 +02:00
__iommu_free_buffer ( dev , pages , size , attrs ) ;
2012-05-16 15:48:21 +02:00
}
2016-04-15 11:20:13 +01:00
void arm_iommu_free_attrs ( struct device * dev , size_t size ,
2016-08-03 13:46:00 -07:00
void * cpu_addr , dma_addr_t handle , unsigned long attrs )
2016-04-15 11:20:13 +01:00
{
__arm_iommu_free_attrs ( dev , size , cpu_addr , handle , attrs , NORMAL ) ;
}
void arm_coherent_iommu_free_attrs ( struct device * dev , size_t size ,
2016-08-03 13:46:00 -07:00
void * cpu_addr , dma_addr_t handle , unsigned long attrs )
2016-04-15 11:20:13 +01:00
{
__arm_iommu_free_attrs ( dev , size , cpu_addr , handle , attrs , COHERENT ) ;
}
2012-06-13 10:01:15 +02:00
static int arm_iommu_get_sgtable ( struct device * dev , struct sg_table * sgt ,
void * cpu_addr , dma_addr_t dma_addr ,
2016-08-03 13:46:00 -07:00
size_t size , unsigned long attrs )
2012-06-13 10:01:15 +02:00
{
unsigned int count = PAGE_ALIGN ( size ) > > PAGE_SHIFT ;
struct page * * pages = __iommu_get_pages ( cpu_addr , attrs ) ;
if ( ! pages )
return - ENXIO ;
return sg_alloc_table_from_pages ( sgt , pages , count , 0 , size ,
GFP_KERNEL ) ;
2012-05-16 15:48:21 +02:00
}
/*
* Map a part of the scatter - gather list into contiguous io address space
*/
static int __map_sg_chunk ( struct device * dev , struct scatterlist * sg ,
size_t size , dma_addr_t * handle ,
2016-08-03 13:46:00 -07:00
enum dma_data_direction dir , unsigned long attrs ,
2012-08-21 12:23:23 +02:00
bool is_coherent )
2012-05-16 15:48:21 +02:00
{
2015-01-16 18:02:15 +01:00
struct dma_iommu_mapping * mapping = to_dma_iommu_mapping ( dev ) ;
2012-05-16 15:48:21 +02:00
dma_addr_t iova , iova_base ;
int ret = 0 ;
unsigned int count ;
struct scatterlist * s ;
2013-09-27 00:36:15 +02:00
int prot ;
2012-05-16 15:48:21 +02:00
size = PAGE_ALIGN ( size ) ;
2018-11-21 18:57:36 +01:00
* handle = DMA_MAPPING_ERROR ;
2012-05-16 15:48:21 +02:00
iova_base = iova = __alloc_iova ( mapping , size ) ;
2018-11-21 18:57:36 +01:00
if ( iova = = DMA_MAPPING_ERROR )
2012-05-16 15:48:21 +02:00
return - ENOMEM ;
for ( count = 0 , s = sg ; count < ( size > > PAGE_SHIFT ) ; s = sg_next ( s ) ) {
2015-12-15 12:54:06 -08:00
phys_addr_t phys = page_to_phys ( sg_page ( s ) ) ;
2012-05-16 15:48:21 +02:00
unsigned int len = PAGE_ALIGN ( s - > offset + s - > length ) ;
2016-08-03 13:46:00 -07:00
if ( ! is_coherent & & ( attrs & DMA_ATTR_SKIP_CPU_SYNC ) = = 0 )
2012-05-16 15:48:21 +02:00
__dma_page_cpu_to_dev ( sg_page ( s ) , s - > offset , s - > length , dir ) ;
2017-01-06 18:58:13 +05:30
prot = __dma_info_to_prot ( dir , attrs ) ;
2013-09-27 00:36:15 +02:00
ret = iommu_map ( mapping - > domain , iova , phys , len , prot ) ;
2012-05-16 15:48:21 +02:00
if ( ret < 0 )
goto fail ;
count + = len > > PAGE_SHIFT ;
iova + = len ;
}
* handle = iova_base ;
return 0 ;
fail :
iommu_unmap ( mapping - > domain , iova_base , count * PAGE_SIZE ) ;
__free_iova ( mapping , iova_base , size ) ;
return ret ;
}
2012-08-21 12:23:23 +02:00
static int __iommu_map_sg ( struct device * dev , struct scatterlist * sg , int nents ,
2016-08-03 13:46:00 -07:00
enum dma_data_direction dir , unsigned long attrs ,
2012-08-21 12:23:23 +02:00
bool is_coherent )
2012-05-16 15:48:21 +02:00
{
struct scatterlist * s = sg , * dma = sg , * start = sg ;
int i , count = 0 ;
unsigned int offset = s - > offset ;
unsigned int size = s - > offset + s - > length ;
unsigned int max = dma_get_max_seg_size ( dev ) ;
for ( i = 1 ; i < nents ; i + + ) {
s = sg_next ( s ) ;
2018-11-21 18:57:36 +01:00
s - > dma_address = DMA_MAPPING_ERROR ;
2012-05-16 15:48:21 +02:00
s - > dma_length = 0 ;
if (s->offset || (size & ~PAGE_MASK) || size + s->length > max) {
	if (__map_sg_chunk(dev, start, size, &dma->dma_address,
2012-08-21 12:23:23 +02:00
			   dir, attrs, is_coherent) < 0)
2012-05-16 15:48:21 +02:00
		goto bad_mapping;
	dma->dma_address += offset;
	dma->dma_length = size - offset;
	size = offset = s->offset;
	start = s;
	dma = sg_next(dma);
	count += 1;
}
size += s->length;
}
2012-08-21 12:23:23 +02:00
if ( __map_sg_chunk ( dev , start , size , & dma - > dma_address , dir , attrs ,
is_coherent ) < 0 )
2012-05-16 15:48:21 +02:00
goto bad_mapping ;
dma - > dma_address + = offset ;
dma - > dma_length = size - offset ;
return count + 1 ;
bad_mapping :
for_each_sg ( sg , s , count , i )
__iommu_remove_mapping ( dev , sg_dma_address ( s ) , sg_dma_len ( s ) ) ;
return 0 ;
}
/**
2012-08-21 12:23:23 +02:00
* arm_coherent_iommu_map_sg - map a set of SG buffers for streaming mode DMA
2012-05-16 15:48:21 +02:00
* @ dev : valid struct device pointer
* @ sg : list of buffers
2012-08-21 12:23:23 +02:00
* @ nents : number of buffers to map
* @ dir : DMA transfer direction
2012-05-16 15:48:21 +02:00
*
2012-08-21 12:23:23 +02:00
* Map a set of i/o coherent buffers described by scatterlist in streaming
* mode for DMA. The scatter gather list elements are merged together (if
* possible) and tagged with the appropriate dma address and length. They are
* obtained via sg_dma_{address,length}.
2012-05-16 15:48:21 +02:00
*/
2012-08-21 12:23:23 +02:00
int arm_coherent_iommu_map_sg ( struct device * dev , struct scatterlist * sg ,
2016-08-03 13:46:00 -07:00
int nents , enum dma_data_direction dir , unsigned long attrs )
2012-08-21 12:23:23 +02:00
{
return __iommu_map_sg ( dev , sg , nents , dir , attrs , true ) ;
}
/**
* arm_iommu_map_sg - map a set of SG buffers for streaming mode DMA
* @ dev : valid struct device pointer
* @ sg : list of buffers
* @ nents : number of buffers to map
* @ dir : DMA transfer direction
*
* Map a set of buffers described by scatterlist in streaming mode for DMA.
* The scatter gather list elements are merged together (if possible) and
* tagged with the appropriate dma address and length. They are obtained via
* sg_dma_{address,length}.
*/
int arm_iommu_map_sg ( struct device * dev , struct scatterlist * sg ,
2016-08-03 13:46:00 -07:00
int nents , enum dma_data_direction dir , unsigned long attrs )
2012-08-21 12:23:23 +02:00
{
return __iommu_map_sg ( dev , sg , nents , dir , attrs , false ) ;
}
static void __iommu_unmap_sg ( struct device * dev , struct scatterlist * sg ,
2016-08-03 13:46:00 -07:00
int nents , enum dma_data_direction dir ,
unsigned long attrs , bool is_coherent )
2012-05-16 15:48:21 +02:00
{
struct scatterlist * s ;
int i ;
for_each_sg ( sg , s , nents , i ) {
if ( sg_dma_len ( s ) )
__iommu_remove_mapping ( dev , sg_dma_address ( s ) ,
sg_dma_len ( s ) ) ;
2016-08-03 13:46:00 -07:00
if ( ! is_coherent & & ( attrs & DMA_ATTR_SKIP_CPU_SYNC ) = = 0 )
2012-05-16 15:48:21 +02:00
__dma_page_dev_to_cpu ( sg_page ( s ) , s - > offset ,
s - > length , dir ) ;
}
}
2012-08-21 12:23:23 +02:00
/**
* arm_coherent_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
* @ dev : valid struct device pointer
* @ sg : list of buffers
* @ nents : number of buffers to unmap ( same as was passed to dma_map_sg )
* @ dir : DMA transfer direction ( same as was passed to dma_map_sg )
*
* Unmap a set of streaming mode DMA translations . Again , CPU access
* rules concerning calls here are the same as for dma_unmap_single ( ) .
*/
void arm_coherent_iommu_unmap_sg ( struct device * dev , struct scatterlist * sg ,
2016-08-03 13:46:00 -07:00
int nents , enum dma_data_direction dir ,
unsigned long attrs )
2012-08-21 12:23:23 +02:00
{
__iommu_unmap_sg ( dev , sg , nents , dir , attrs , true ) ;
}
/**
* arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
* @ dev : valid struct device pointer
* @ sg : list of buffers
* @ nents : number of buffers to unmap ( same as was passed to dma_map_sg )
* @ dir : DMA transfer direction ( same as was passed to dma_map_sg )
*
* Unmap a set of streaming mode DMA translations . Again , CPU access
* rules concerning calls here are the same as for dma_unmap_single ( ) .
*/
void arm_iommu_unmap_sg ( struct device * dev , struct scatterlist * sg , int nents ,
2016-08-03 13:46:00 -07:00
enum dma_data_direction dir ,
unsigned long attrs )
2012-08-21 12:23:23 +02:00
{
__iommu_unmap_sg ( dev , sg , nents , dir , attrs , false ) ;
}
2012-05-16 15:48:21 +02:00
/**
* arm_iommu_sync_sg_for_cpu
* @ dev : valid struct device pointer
* @ sg : list of buffers
* @ nents : number of buffers to map ( returned from dma_map_sg )
* @ dir : DMA transfer direction ( same as was passed to dma_map_sg )
*/
void arm_iommu_sync_sg_for_cpu ( struct device * dev , struct scatterlist * sg ,
int nents , enum dma_data_direction dir )
{
struct scatterlist * s ;
int i ;
for_each_sg ( sg , s , nents , i )
2012-08-21 12:23:23 +02:00
__dma_page_dev_to_cpu ( sg_page ( s ) , s - > offset , s - > length , dir ) ;
2012-05-16 15:48:21 +02:00
}
/**
* arm_iommu_sync_sg_for_device
* @ dev : valid struct device pointer
* @ sg : list of buffers
* @ nents : number of buffers to map ( returned from dma_map_sg )
* @ dir : DMA transfer direction ( same as was passed to dma_map_sg )
*/
void arm_iommu_sync_sg_for_device ( struct device * dev , struct scatterlist * sg ,
int nents , enum dma_data_direction dir )
{
struct scatterlist * s ;
int i ;
for_each_sg ( sg , s , nents , i )
2012-08-21 12:23:23 +02:00
__dma_page_cpu_to_dev ( sg_page ( s ) , s - > offset , s - > length , dir ) ;
2012-05-16 15:48:21 +02:00
}
/**
2012-08-21 12:23:23 +02:00
* arm_coherent_iommu_map_page
2012-05-16 15:48:21 +02:00
* @ dev : valid struct device pointer
* @ page : page that buffer resides in
* @ offset : offset into page for start of buffer
* @ size : size of buffer to map
* @ dir : DMA transfer direction
*
2012-08-21 12:23:23 +02:00
* Coherent IOMMU aware version of arm_dma_map_page ( )
2012-05-16 15:48:21 +02:00
*/
2012-08-21 12:23:23 +02:00
static dma_addr_t arm_coherent_iommu_map_page ( struct device * dev , struct page * page ,
2012-05-16 15:48:21 +02:00
unsigned long offset , size_t size , enum dma_data_direction dir ,
2016-08-03 13:46:00 -07:00
unsigned long attrs )
2012-05-16 15:48:21 +02:00
{
2015-01-16 18:02:15 +01:00
struct dma_iommu_mapping * mapping = to_dma_iommu_mapping ( dev ) ;
2012-05-16 15:48:21 +02:00
dma_addr_t dma_addr ;
2013-06-10 19:34:39 +01:00
int ret , prot , len = PAGE_ALIGN ( size + offset ) ;
2012-05-16 15:48:21 +02:00
dma_addr = __alloc_iova ( mapping , len ) ;
2018-11-21 18:57:36 +01:00
if ( dma_addr = = DMA_MAPPING_ERROR )
2012-05-16 15:48:21 +02:00
return dma_addr ;
2017-01-06 18:58:13 +05:30
prot = __dma_info_to_prot ( dir , attrs ) ;
2013-06-10 19:34:39 +01:00
ret = iommu_map ( mapping - > domain , dma_addr , page_to_phys ( page ) , len , prot ) ;
2012-05-16 15:48:21 +02:00
if ( ret < 0 )
goto fail ;
return dma_addr + offset ;
fail :
__free_iova ( mapping , dma_addr , len ) ;
2018-11-21 18:57:36 +01:00
return DMA_MAPPING_ERROR ;
2012-05-16 15:48:21 +02:00
}

/**
 * arm_iommu_map_page
 * @dev: valid struct device pointer
 * @page: page that buffer resides in
 * @offset: offset into page for start of buffer
 * @size: size of buffer to map
 * @dir: DMA transfer direction
 *
 * IOMMU aware version of arm_dma_map_page()
 */
static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
		unsigned long offset, size_t size, enum dma_data_direction dir,
		unsigned long attrs)
{
	if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
		__dma_page_cpu_to_dev(page, offset, size, dir);

	return arm_coherent_iommu_map_page(dev, page, offset, size, dir, attrs);
}

/**
 * arm_coherent_iommu_unmap_page
 * @dev: valid struct device pointer
 * @handle: DMA address of buffer
 * @size: size of buffer (same as passed to dma_map_page)
 * @dir: DMA transfer direction (same as passed to dma_map_page)
 *
 * Coherent IOMMU aware version of arm_dma_unmap_page()
 */
static void arm_coherent_iommu_unmap_page(struct device *dev, dma_addr_t handle,
		size_t size, enum dma_data_direction dir, unsigned long attrs)
{
	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
	dma_addr_t iova = handle & PAGE_MASK;
	int offset = handle & ~PAGE_MASK;
	int len = PAGE_ALIGN(size + offset);

	if (!iova)
		return;

	iommu_unmap(mapping->domain, iova, len);
	__free_iova(mapping, iova, len);
}

/**
 * arm_iommu_unmap_page
 * @dev: valid struct device pointer
 * @handle: DMA address of buffer
 * @size: size of buffer (same as passed to dma_map_page)
 * @dir: DMA transfer direction (same as passed to dma_map_page)
 *
 * IOMMU aware version of arm_dma_unmap_page()
 */
static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
		size_t size, enum dma_data_direction dir, unsigned long attrs)
{
	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
	dma_addr_t iova = handle & PAGE_MASK;
	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
	int offset = handle & ~PAGE_MASK;
	int len = PAGE_ALIGN(size + offset);

	if (!iova)
		return;

	if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
		__dma_page_dev_to_cpu(page, offset, size, dir);

	iommu_unmap(mapping->domain, iova, len);
	__free_iova(mapping, iova, len);
}
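
/*
 * Illustrative sketch, not part of the kernel source. The unmap helpers
 * above recover the mapped range from the DMA handle alone: the page-aligned
 * IOVA, the offset into the first page, and the page-rounded length. The
 * stand-alone user-space code below reproduces that arithmetic. It assumes
 * 4 KiB pages, and every sketch_* name is hypothetical.
 */
```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel takes these values from <asm/page.h>. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	((uint64_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_PAGE_ALIGN(x)	(((x) + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK)

/* Hypothetical container for the three values the unmap path derives. */
struct sketch_unmap_range {
	uint64_t iova;		/* page-aligned start of the IOVA range */
	uint64_t offset;	/* offset of the buffer inside its first page */
	uint64_t len;		/* page-rounded length that was mapped */
};

/* Split a DMA handle the same way arm_iommu_unmap_page() does. */
static struct sketch_unmap_range sketch_split_handle(uint64_t handle, size_t size)
{
	struct sketch_unmap_range r;

	r.iova = handle & SKETCH_PAGE_MASK;
	r.offset = handle & ~SKETCH_PAGE_MASK;
	r.len = SKETCH_PAGE_ALIGN((uint64_t)size + r.offset);
	return r;
}
```
/*
 * With these assumptions, a handle of 0x10000234 and a size of 0x2000 give
 * iova 0x10000000, offset 0x234 and len 0x3000: the buffer starts part way
 * into a page, so three whole pages have to be unmapped.
 */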

/**
 * arm_iommu_map_resource - map a device resource for DMA
 * @dev: valid struct device pointer
 * @phys_addr: physical address of resource
 * @size: size of resource to map
 * @dir: DMA transfer direction
 */
static dma_addr_t arm_iommu_map_resource(struct device *dev,
		phys_addr_t phys_addr, size_t size,
		enum dma_data_direction dir, unsigned long attrs)
{
	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
	dma_addr_t dma_addr;
	int ret, prot;
	phys_addr_t addr = phys_addr & PAGE_MASK;
	unsigned int offset = phys_addr & ~PAGE_MASK;
	size_t len = PAGE_ALIGN(size + offset);

	dma_addr = __alloc_iova(mapping, len);
	if (dma_addr == DMA_MAPPING_ERROR)
		return dma_addr;

	prot = __dma_info_to_prot(dir, attrs) | IOMMU_MMIO;

	ret = iommu_map(mapping->domain, dma_addr, addr, len, prot);
	if (ret < 0)
		goto fail;

	return dma_addr + offset;
fail:
	__free_iova(mapping, dma_addr, len);
	return DMA_MAPPING_ERROR;
}

/**
 * arm_iommu_unmap_resource - unmap a device DMA resource
 * @dev: valid struct device pointer
 * @dma_handle: DMA address to resource
 * @size: size of resource to map
 * @dir: DMA transfer direction
 */
static void arm_iommu_unmap_resource(struct device *dev, dma_addr_t dma_handle,
		size_t size, enum dma_data_direction dir,
		unsigned long attrs)
{
	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
	dma_addr_t iova = dma_handle & PAGE_MASK;
	unsigned int offset = dma_handle & ~PAGE_MASK;
	size_t len = PAGE_ALIGN(size + offset);

	if (!iova)
		return;

	iommu_unmap(mapping->domain, iova, len);
	__free_iova(mapping, iova, len);
}

static void arm_iommu_sync_single_for_cpu(struct device *dev,
		dma_addr_t handle, size_t size, enum dma_data_direction dir)
{
	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
	dma_addr_t iova = handle & PAGE_MASK;
	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
	unsigned int offset = handle & ~PAGE_MASK;

	if (!iova)
		return;

	__dma_page_dev_to_cpu(page, offset, size, dir);
}

static void arm_iommu_sync_single_for_device(struct device *dev,
		dma_addr_t handle, size_t size, enum dma_data_direction dir)
{
	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
	dma_addr_t iova = handle & PAGE_MASK;
	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
	unsigned int offset = handle & ~PAGE_MASK;

	if (!iova)
		return;

	__dma_page_cpu_to_dev(page, offset, size, dir);
}

const struct dma_map_ops iommu_ops = {
	.alloc		= arm_iommu_alloc_attrs,
	.free		= arm_iommu_free_attrs,
	.mmap		= arm_iommu_mmap_attrs,
	.get_sgtable	= arm_iommu_get_sgtable,

	.map_page		= arm_iommu_map_page,
	.unmap_page		= arm_iommu_unmap_page,
	.sync_single_for_cpu	= arm_iommu_sync_single_for_cpu,
	.sync_single_for_device	= arm_iommu_sync_single_for_device,

	.map_sg			= arm_iommu_map_sg,
	.unmap_sg		= arm_iommu_unmap_sg,
	.sync_sg_for_cpu	= arm_iommu_sync_sg_for_cpu,
	.sync_sg_for_device	= arm_iommu_sync_sg_for_device,

	.map_resource		= arm_iommu_map_resource,
	.unmap_resource		= arm_iommu_unmap_resource,

	.dma_supported		= arm_dma_supported,
};

const struct dma_map_ops iommu_coherent_ops = {
	.alloc		= arm_coherent_iommu_alloc_attrs,
	.free		= arm_coherent_iommu_free_attrs,
	.mmap		= arm_coherent_iommu_mmap_attrs,
	.get_sgtable	= arm_iommu_get_sgtable,

	.map_page	= arm_coherent_iommu_map_page,
	.unmap_page	= arm_coherent_iommu_unmap_page,

	.map_sg		= arm_coherent_iommu_map_sg,
	.unmap_sg	= arm_coherent_iommu_unmap_sg,

	.map_resource	= arm_iommu_map_resource,
	.unmap_resource	= arm_iommu_unmap_resource,

	.dma_supported	= arm_dma_supported,
};

/**
 * arm_iommu_create_mapping
 * @bus: pointer to the bus holding the client device (for IOMMU calls)
 * @base: start address of the valid IO address space
 * @size: maximum size of the valid IO address space
 *
 * Creates a mapping structure which holds information about used/unused
 * IO address ranges, which is required to perform memory allocation and
 * mapping with IOMMU aware functions.
 *
 * The client device needs to be attached to the mapping with the
 * arm_iommu_attach_device() function.
 */
struct dma_iommu_mapping *
arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size)
{
	unsigned int bits = size >> PAGE_SHIFT;
	unsigned int bitmap_size = BITS_TO_LONGS(bits) * sizeof(long);
	struct dma_iommu_mapping *mapping;
	int extensions = 1;
	int err = -ENOMEM;

	/* currently only 32-bit DMA address space is supported */
	if (size > DMA_BIT_MASK(32) + 1)
		return ERR_PTR(-ERANGE);

	if (!bitmap_size)
		return ERR_PTR(-EINVAL);

	if (bitmap_size > PAGE_SIZE) {
		extensions = bitmap_size / PAGE_SIZE;
		bitmap_size = PAGE_SIZE;
	}

	mapping = kzalloc(sizeof(struct dma_iommu_mapping), GFP_KERNEL);
	if (!mapping)
		goto err;

	mapping->bitmap_size = bitmap_size;
	mapping->bitmaps = kcalloc(extensions, sizeof(unsigned long *),
				   GFP_KERNEL);
	if (!mapping->bitmaps)
		goto err2;

	mapping->bitmaps[0] = kzalloc(bitmap_size, GFP_KERNEL);
	if (!mapping->bitmaps[0])
		goto err3;

	mapping->nr_bitmaps = 1;
	mapping->extensions = extensions;
	mapping->base = base;
	mapping->bits = BITS_PER_BYTE * bitmap_size;

	spin_lock_init(&mapping->lock);

	mapping->domain = iommu_domain_alloc(bus);
	if (!mapping->domain)
		goto err4;

	kref_init(&mapping->kref);
	return mapping;
err4:
	kfree(mapping->bitmaps[0]);
err3:
	kfree(mapping->bitmaps);
err2:
	kfree(mapping);
err:
	return ERR_PTR(err);
}
EXPORT_SYMBOL_GPL(arm_iommu_create_mapping);
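
/*
 * Illustrative sketch, not part of the kernel source. It reproduces, in
 * stand-alone user-space C, how arm_iommu_create_mapping() above sizes its
 * allocation bitmaps: one bit per page of IO address space, capped at one
 * page of bitmap, with the remainder covered by bitmaps that may be added
 * later ("extensions"). It assumes 4 KiB pages; all sketch_* names are
 * hypothetical, and only the sizing arithmetic is modelled.
 */
```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	((size_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_BITS_PER_LONG	(8 * sizeof(long))
#define SKETCH_BITS_TO_LONGS(n)	\
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Hypothetical result type holding the values stored in the mapping. */
struct sketch_layout {
	size_t bitmap_size;	/* bytes in each bitmap */
	int extensions;		/* maximum number of bitmaps */
	size_t bits;		/* pages tracked by each bitmap */
};

/* Returns 0 on success or a negative errno value, like the checks above. */
static int sketch_mapping_layout(uint64_t size, struct sketch_layout *out)
{
	unsigned int bits = (unsigned int)(size >> SKETCH_PAGE_SHIFT);
	size_t bitmap_size = SKETCH_BITS_TO_LONGS(bits) * sizeof(long);

	/* Only a 32-bit DMA address space is supported. */
	if (size > ((uint64_t)1 << 32))
		return -ERANGE;

	/* A size below one page yields no bits to track. */
	if (!bitmap_size)
		return -EINVAL;

	out->extensions = 1;
	if (bitmap_size > SKETCH_PAGE_SIZE) {
		out->extensions = (int)(bitmap_size / SKETCH_PAGE_SIZE);
		bitmap_size = SKETCH_PAGE_SIZE;
	}

	out->bitmap_size = bitmap_size;
	out->bits = 8 * bitmap_size;
	return 0;
}
```
/*
 * Under these assumptions, a 1 GiB address space needs 262144 bits, which is
 * 32768 bytes of bitmap. That exceeds one page, so the layout becomes 8
 * bitmaps of 4096 bytes, each tracking 32768 pages (128 MiB).
 */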

static void release_iommu_mapping(struct kref *kref)
{
	int i;
	struct dma_iommu_mapping *mapping =
		container_of(kref, struct dma_iommu_mapping, kref);

	iommu_domain_free(mapping->domain);
	for (i = 0; i < mapping->nr_bitmaps; i++)
		kfree(mapping->bitmaps[i]);
	kfree(mapping->bitmaps);
	kfree(mapping);
}

static int extend_iommu_mapping(struct dma_iommu_mapping *mapping)
{
	int next_bitmap;

	if (mapping->nr_bitmaps >= mapping->extensions)
		return -EINVAL;

	next_bitmap = mapping->nr_bitmaps;
	mapping->bitmaps[next_bitmap] = kzalloc(mapping->bitmap_size,
						GFP_ATOMIC);
	if (!mapping->bitmaps[next_bitmap])
		return -ENOMEM;

	mapping->nr_bitmaps++;

	return 0;
}

void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping)
{
	if (mapping)
		kref_put(&mapping->kref, release_iommu_mapping);
}
EXPORT_SYMBOL_GPL(arm_iommu_release_mapping);

static int __arm_iommu_attach_device(struct device *dev,
				     struct dma_iommu_mapping *mapping)
{
	int err;

	err = iommu_attach_device(mapping->domain, dev);
	if (err)
		return err;

	kref_get(&mapping->kref);
	to_dma_iommu_mapping(dev) = mapping;

	pr_debug("Attached IOMMU controller to %s device.\n", dev_name(dev));
	return 0;
}

/**
 * arm_iommu_attach_device
 * @dev: valid struct device pointer
 * @mapping: io address space mapping structure (returned from
 *	arm_iommu_create_mapping)
 *
 * Attaches specified io address space mapping to the provided device.
 * This replaces the dma operations (dma_map_ops pointer) with the
 * IOMMU aware version.
 *
 * More than one client might be attached to the same io address space
 * mapping.
 */
int arm_iommu_attach_device(struct device *dev,
			    struct dma_iommu_mapping *mapping)
{
	int err;

	err = __arm_iommu_attach_device(dev, mapping);
	if (err)
		return err;

	set_dma_ops(dev, &iommu_ops);
	return 0;
}
EXPORT_SYMBOL_GPL(arm_iommu_attach_device);

/**
 * arm_iommu_detach_device
 * @dev: valid struct device pointer
 *
 * Detaches the provided device from a previously attached map.
 * This overwrites the dma_ops pointer with appropriate non-IOMMU ops.
 */
void arm_iommu_detach_device(struct device *dev)
{
	struct dma_iommu_mapping *mapping;

	mapping = to_dma_iommu_mapping(dev);
	if (!mapping) {
		dev_warn(dev, "Not attached\n");
		return;
	}

	iommu_detach_device(mapping->domain, dev);
	kref_put(&mapping->kref, release_iommu_mapping);
	to_dma_iommu_mapping(dev) = NULL;
	set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));

	pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev));
}
EXPORT_SYMBOL_GPL(arm_iommu_detach_device);
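
/*
 * Illustrative sketch, not part of the kernel source. The functions above
 * share one mapping between several devices through a reference count:
 * kref_init() in arm_iommu_create_mapping() gives the creator the first
 * reference, each attach takes one with kref_get(), and each detach or
 * arm_iommu_release_mapping() drops one with kref_put(). The user-space
 * model below mimics that lifecycle with a plain int. It is deliberately
 * not atomic, unlike the kernel's kref, and all sketch_* names are
 * hypothetical.
 */
```c
#include <stdlib.h>

/* Hypothetical stand-in for struct dma_iommu_mapping: only the refcount. */
struct sketch_mapping {
	int refcount;
};

/* Number of mappings allocated and not yet freed, for observation only. */
static int sketch_live_mappings;

/* Models arm_iommu_create_mapping(): the creator holds the first reference. */
static struct sketch_mapping *sketch_create(void)
{
	struct sketch_mapping *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->refcount = 1;
	sketch_live_mappings++;
	return m;
}

/* Models the kref_get() taken when a device is attached. */
static void sketch_get(struct sketch_mapping *m)
{
	m->refcount++;
}

/* Models kref_put(): the last put frees the mapping. */
static void sketch_put(struct sketch_mapping *m)
{
	if (--m->refcount == 0) {
		sketch_live_mappings--;
		free(m);
	}
}
```
/*
 * In this model the mapping survives the creator's own put for as long as
 * at least one attached device still holds a reference, and it is freed by
 * whichever put happens last.
 */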

static const struct dma_map_ops *arm_get_iommu_dma_map_ops(bool coherent)
{
	return coherent ? &iommu_coherent_ops : &iommu_ops;
}

static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
				    const struct iommu_ops *iommu)
{
	struct dma_iommu_mapping *mapping;

	if (!iommu)
		return false;

	mapping = arm_iommu_create_mapping(dev->bus, dma_base, size);
	if (IS_ERR(mapping)) {
		pr_warn("Failed to create %llu-byte IOMMU mapping for device %s\n",
				size, dev_name(dev));
		return false;
	}

	if (__arm_iommu_attach_device(dev, mapping)) {
		pr_warn("Failed to attach device %s to IOMMU mapping\n",
				dev_name(dev));
		arm_iommu_release_mapping(mapping);
		return false;
	}

	return true;
}

static void arm_teardown_iommu_dma_ops(struct device *dev)
{
	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);

	if (!mapping)
		return;

	arm_iommu_detach_device(dev);
	arm_iommu_release_mapping(mapping);
}

#else

static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
				    const struct iommu_ops *iommu)
{
	return false;
}

static void arm_teardown_iommu_dma_ops(struct device *dev) { }

#define arm_get_iommu_dma_map_ops arm_get_dma_map_ops

#endif	/* CONFIG_ARM_DMA_USE_IOMMU */

void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
			const struct iommu_ops *iommu, bool coherent)
{
	const struct dma_map_ops *dma_ops;

	dev->archdata.dma_coherent = coherent;
#ifdef CONFIG_SWIOTLB
	dev->dma_coherent = coherent;
#endif

	/*
	 * Don't override the dma_ops if they have already been set. Ideally
	 * this should be the only location where dma_ops are set, remove this
	 * check when all other callers of set_dma_ops will have disappeared.
	 */
	if (dev->dma_ops)
		return;

	if (arm_setup_iommu_dma_ops(dev, dma_base, size, iommu))
		dma_ops = arm_get_iommu_dma_map_ops(coherent);
	else
		dma_ops = arm_get_dma_map_ops(coherent);

	set_dma_ops(dev, dma_ops);

#ifdef CONFIG_XEN
	if (xen_initial_domain()) {
		dev->archdata.dev_dma_ops = dev->dma_ops;
		dev->dma_ops = xen_dma_ops;
	}
#endif
	dev->archdata.dma_ops_setup = true;
}

void arch_teardown_dma_ops(struct device *dev)
{
	if (!dev->archdata.dma_ops_setup)
		return;

	arm_teardown_iommu_dma_ops(dev);
	/* Let arch_setup_dma_ops() start again from scratch upon re-probe */
	set_dma_ops(dev, NULL);
}
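
/*
 * Illustrative sketch, not part of the kernel source. arch_setup_dma_ops()
 * above returns early when a device already has DMA ops, and
 * arch_teardown_dma_ops() only resets ops that arch_setup_dma_ops() itself
 * installed, as recorded in archdata.dma_ops_setup. The user-space model
 * below shows that ownership rule in isolation. All sketch_* names are
 * hypothetical, and the IOMMU and Xen handling is left out.
 */
```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_map_ops. */
struct sketch_ops {
	const char *name;
};

static const struct sketch_ops sketch_default_ops = { "default" };

/* Hypothetical stand-in for the relevant fields of struct device. */
struct sketch_device {
	const struct sketch_ops *dma_ops;
	bool dma_ops_setup;	/* true only if sketch_setup() installed the ops */
};

/* Models arch_setup_dma_ops(): never override ops somebody else set. */
static void sketch_setup(struct sketch_device *dev)
{
	if (dev->dma_ops)
		return;

	dev->dma_ops = &sketch_default_ops;
	dev->dma_ops_setup = true;
}

/* Models arch_teardown_dma_ops(): only undo what sketch_setup() did. */
static void sketch_teardown(struct sketch_device *dev)
{
	if (!dev->dma_ops_setup)
		return;

	/* Let sketch_setup() start again from scratch on re-probe. */
	dev->dma_ops = NULL;
}
```
/*
 * In this model a device whose ops were installed by another party keeps
 * them across setup and teardown, while a device configured by
 * sketch_setup() is reset on teardown and configured afresh on the next
 * setup.
 */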

#ifdef CONFIG_SWIOTLB
void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
		size_t size, enum dma_data_direction dir)
{
	__dma_page_cpu_to_dev(phys_to_page(paddr), paddr & (PAGE_SIZE - 1),
			      size, dir);
}

void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
		size_t size, enum dma_data_direction dir)
{
	__dma_page_dev_to_cpu(phys_to_page(paddr), paddr & (PAGE_SIZE - 1),
			      size, dir);
}

long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
		dma_addr_t dma_addr)
{
	return dma_to_pfn(dev, dma_addr);
}

pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
		unsigned long attrs)
{
	if (!dev_is_dma_coherent(dev))
		return __get_dma_pgprot(attrs, prot);
	return prot;
}

void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
		gfp_t gfp, unsigned long attrs)
{
	return __dma_alloc(dev, size, dma_handle, gfp,
			   __get_dma_pgprot(attrs, PAGE_KERNEL), false,
			   attrs, __builtin_return_address(0));
}

void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
		dma_addr_t dma_handle, unsigned long attrs)
{
	__arm_dma_free(dev, size, cpu_addr, dma_handle, attrs, false);
}
#endif /* CONFIG_SWIOTLB */