linux/Documentation/vm
Andrea Arcangeli a9b85f9415 userfaultfd: change the read API to return a uffd_msg
I had requests to return the full address (not the page aligned one) to
userland.

It's not entirely clear how the page offset could be relevant, because
userfaults aren't like SIGBUS, which can sigjump to a different place and
actually skip resolving the fault depending on a page offset.  There's
currently no real way to skip the fault, especially because after a
UFFDIO_COPY|ZEROPAGE the fault is optimized to be retried within the
kernel without having to return to userland first (not even self-modifying
code replacing the .text that touched the faulting address would prevent
the fault from being repeated).  Userland cannot skip repeating the fault,
even more so if the fault was triggered by a KVM secondary page fault, or
by any get_user_pages or copy-user inside some syscall, which will return
to kernel code.  The second time, FAULT_FLAG_RETRY_NOWAIT won't be set,
leading to a SIGBUS being raised, because the userfault can't wait if it
cannot release the mmap_sem first (and FAULT_FLAG_RETRY_NOWAIT is required
for that).

Still, returning a proper structure to userland during read() on the uffd
allows the current UFFD_API to be used for the future non-cooperative
extensions too, and it looks cleaner as well.  Once we get additional
fields, there's no point in returning the fault address page aligned
anymore just to reuse the bits below PAGE_SHIFT.

The only downside is that the read() syscall will read 32 bytes instead of
8 bytes, but that's not going to be a measurable overhead.

The total number of new events that can be added, or of new future bits
for already shipped events, is limited to 64 by the features field of the
uffdio_api structure.  If more are needed, a bump of UFFD_API will be
required.

[akpm@linux-foundation.org: use __packed]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
.gitignore
00-INDEX Documentation/: update 00-INDEX files 2014-02-10 16:01:40 -08:00
active_mm.txt Fix common misspellings 2011-03-31 11:26:23 -03:00
balance
cleancache.txt cleancache: forbid overriding cleancache_ops 2015-04-14 16:49:03 -07:00
frontswap.txt doc: fix quite a few typos within Documentation 2012-11-19 14:28:24 +01:00
highmem.txt mm: highmem documentation 2010-10-26 16:52:08 -07:00
hugetlbpage.txt mm, doc: cleanup and clarify munmap behavior for hugetlb memory 2015-04-15 16:35:19 -07:00
hwpoison.txt mm/memory-failure.c: support use of a dedicated thread to handle SIGBUS(BUS_MCEERR_AO) 2014-06-04 16:54:13 -07:00
ksm.txt ksm: add some comments 2013-02-23 17:50:23 -08:00
numa doc: fix broken references 2011-09-27 18:08:04 +02:00
numa_memory_policy.txt Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt 2014-04-18 16:40:08 -07:00
overcommit-accounting mm: add overcommit_kbytes sysctl variable 2014-01-21 16:19:44 -08:00
page_migration
page_owner.txt Documentation: add new page_owner document 2014-12-13 12:42:48 -08:00
pagemap.txt Documentation/vm/pagemap.txt: correct location of page-types tool 2015-04-11 15:11:21 +02:00
remap_file_pages.txt mm: replace remap_file_pages() syscall with emulation 2015-02-10 14:30:30 -08:00
slub.txt Documentations: Fix slabinfo.c directory in vm/slub.txt 2012-05-10 11:45:23 +03:00
soft-dirty.txt mm: track vma changes with VM_SOFTDIRTY bit 2013-09-11 15:57:56 -07:00
split_page_table_lock x86, mm: do not leak page->ptl for pmd page tables 2013-11-21 16:42:28 -08:00
transhuge.txt doc: add information about max_ptes_none 2015-03-20 07:41:55 -06:00
unevictable-lru.txt Documentation/vm/unevictable-lru.txt: clarify MAP_LOCKED behavior 2015-06-24 17:49:44 -07:00
userfaultfd.txt userfaultfd: change the read API to return a uffd_msg 2015-09-04 16:54:41 -07:00
zsmalloc.txt zsmalloc: zsmalloc documentation 2015-04-15 16:35:21 -07:00
zswap.txt zswap: runtime enable/disable 2015-06-25 17:00:37 -07:00