Merge branch 'akpm' (second patch-bomb from Andrew)
Merge second patchbomb from Andrew Morton:
 - the rest of MM
 - misc fs fixes
 - add execveat() syscall
 - new ratelimit feature for fault-injection
 - decompressor updates
 - ipc/ updates
 - fallocate feature creep
 - fsnotify cleanups
 - a few other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (99 commits)
  cgroups: Documentation: fix trivial typos and wrong paragraph numberings
  parisc: percpu: update comments referring to __get_cpu_var
  percpu: update local_ops.txt to reflect this_cpu operations
  percpu: remove __get_cpu_var and __raw_get_cpu_var macros
  fsnotify: remove destroy_list from fsnotify_mark
  fsnotify: unify inode and mount marks handling
  fallocate: create FAN_MODIFY and IN_MODIFY events
  mm/cma: make kmemleak ignore CMA regions
  slub: fix cpuset check in get_any_partial
  slab: fix cpuset check in fallback_alloc
  shmdt: use i_size_read() instead of ->i_size
  ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments
  ipc/msg: increase MSGMNI, remove scaling
  ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
  ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()
  lib/decompress.c: consistency of compress formats for kernel image
  decompress_bunzip2: off by one in get_next_block()
  usr/Kconfig: make initrd compression algorithm selection not expert
  fault-inject: add ratelimit option
  ratelimit: add initialization macro
  ...
commit 78a45c6f06
@ -445,7 +445,7 @@ across partially overlapping sets of CPUs would risk unstable dynamics
|
||||
that would be beyond our understanding. So if each of two partially
|
||||
overlapping cpusets enables the flag 'cpuset.sched_load_balance', then we
|
||||
form a single sched domain that is a superset of both. We won't move
|
||||
a task to a CPU outside it cpuset, but the scheduler load balancing
|
||||
a task to a CPU outside its cpuset, but the scheduler load balancing
|
||||
code might waste some compute cycles considering that possibility.
|
||||
|
||||
This mismatch is why there is not a simple one-to-one relation
|
||||
@ -552,8 +552,8 @@ otherwise initial value -1 that indicates the cpuset has no request.
|
||||
1 : search siblings (hyperthreads in a core).
|
||||
2 : search cores in a package.
|
||||
3 : search cpus in a node [= system wide on non-NUMA system]
|
||||
( 4 : search nodes in a chunk of node [on NUMA system] )
|
||||
( 5 : search system wide [on NUMA system] )
|
||||
4 : search nodes in a chunk of node [on NUMA system]
|
||||
5 : search system wide [on NUMA system]
|
||||
|
||||
The system default is architecture dependent. The system default
|
||||
can be changed using the relax_domain_level= boot parameter.
|
||||
|
@ -326,7 +326,7 @@ per cgroup, instead of globally.
|
||||
|
||||
* tcp memory pressure: sockets memory pressure for the tcp protocol.
|
||||
|
||||
2.7.3 Common use cases
|
||||
2.7.2 Common use cases
|
||||
|
||||
Because the "kmem" counter is fed to the main user counter, kernel memory can
|
||||
never be limited completely independently of user memory. Say "U" is the user
|
||||
@ -354,19 +354,19 @@ set:
|
||||
|
||||
3. User Interface
|
||||
|
||||
0. Configuration
|
||||
3.0. Configuration
|
||||
|
||||
a. Enable CONFIG_CGROUPS
|
||||
b. Enable CONFIG_MEMCG
|
||||
c. Enable CONFIG_MEMCG_SWAP (to use swap extension)
|
||||
d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)
|
||||
|
||||
1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
|
||||
3.1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
|
||||
# mount -t tmpfs none /sys/fs/cgroup
|
||||
# mkdir /sys/fs/cgroup/memory
|
||||
# mount -t cgroup none /sys/fs/cgroup/memory -o memory
|
||||
|
||||
2. Make the new group and move bash into it
|
||||
3.2. Make the new group and move bash into it
|
||||
# mkdir /sys/fs/cgroup/memory/0
|
||||
# echo $$ > /sys/fs/cgroup/memory/0/tasks
|
||||
|
||||
|
@ -829,6 +829,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
|
||||
CONFIG_DEBUG_PAGEALLOC, hence this option will not help
|
||||
tracking down these problems.
|
||||
|
||||
debug_pagealloc=
|
||||
[KNL] When CONFIG_DEBUG_PAGEALLOC is set, this
|
||||
parameter enables the feature at boot time. In
|
||||
default, it is disabled. We can avoid allocating huge
|
||||
chunk of memory for debug pagealloc if we don't enable
|
||||
it at boot time and the system will work mostly same
|
||||
with the kernel built without CONFIG_DEBUG_PAGEALLOC.
|
||||
on: enable the feature
|
||||
|
||||
debugpat [X86] Enable PAT debugging
|
||||
|
||||
decnet.addr= [HW,NET]
|
||||
@ -1228,9 +1237,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
|
||||
multiple times interleaved with hugepages= to reserve
|
||||
huge pages of different sizes. Valid pages sizes on
|
||||
x86-64 are 2M (when the CPU supports "pse") and 1G
|
||||
(when the CPU supports the "pdpe1gb" cpuinfo flag)
|
||||
Note that 1GB pages can only be allocated at boot time
|
||||
using hugepages= and not freed afterwards.
|
||||
(when the CPU supports the "pdpe1gb" cpuinfo flag).
|
||||
|
||||
hvc_iucv= [S390] Number of z/VM IUCV hypervisor console (HVC)
|
||||
terminal devices. Valid values: 0..8
|
||||
@ -2506,6 +2513,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
|
||||
OSS [HW,OSS]
|
||||
See Documentation/sound/oss/oss-parameters.txt
|
||||
|
||||
page_owner= [KNL] Boot-time page_owner enabling option.
|
||||
Storage of the information about who allocated
|
||||
each page is disabled in default. With this switch,
|
||||
we can turn it on.
|
||||
on: enable the feature
|
||||
|
||||
panic= [KNL] Kernel behaviour on panic: delay <timeout>
|
||||
timeout > 0: seconds before rebooting
|
||||
timeout = 0: wait forever
|
||||
|
@ -8,6 +8,11 @@ to implement them for any given architecture and shows how they can be used
|
||||
properly. It also stresses on the precautions that must be taken when reading
|
||||
those local variables across CPUs when the order of memory writes matters.
|
||||
|
||||
Note that local_t based operations are not recommended for general kernel use.
|
||||
Please use the this_cpu operations instead unless there is really a special purpose.
|
||||
Most uses of local_t in the kernel have been replaced by this_cpu operations.
|
||||
this_cpu operations combine the relocation with the local_t like semantics in
|
||||
a single instruction and yield more compact and faster executing code.
|
||||
|
||||
|
||||
* Purpose of local atomic operations
|
||||
@ -87,10 +92,10 @@ the per cpu variable. For instance :
|
||||
local_inc(&get_cpu_var(counters));
|
||||
put_cpu_var(counters);
|
||||
|
||||
If you are already in a preemption-safe context, you can directly use
|
||||
__get_cpu_var() instead.
|
||||
If you are already in a preemption-safe context, you can use
|
||||
this_cpu_ptr() instead.
|
||||
|
||||
local_inc(&__get_cpu_var(counters));
|
||||
local_inc(this_cpu_ptr(&counters));
|
||||
|
||||
|
||||
|
||||
@ -134,7 +139,7 @@ static void test_each(void *info)
|
||||
{
|
||||
/* Increment the counter from a non preemptible context */
|
||||
printk("Increment on cpu %d\n", smp_processor_id());
|
||||
local_inc(&__get_cpu_var(counters));
|
||||
local_inc(this_cpu_ptr(&counters));
|
||||
|
||||
/* This is what incrementing the variable would look like within a
|
||||
* preemptible context (it disables preemption) :
|
||||
|
@ -116,10 +116,12 @@ set during run time.
|
||||
|
||||
auto_msgmni:
|
||||
|
||||
Enables/Disables automatic recomputing of msgmni upon memory add/remove
|
||||
or upon ipc namespace creation/removal (see the msgmni description
|
||||
above). Echoing "1" into this file enables msgmni automatic recomputing.
|
||||
Echoing "0" turns it off. auto_msgmni default value is 1.
|
||||
This variable has no effect and may be removed in future kernel
|
||||
releases. Reading it always returns 0.
|
||||
Up to Linux 3.17, it enabled/disabled automatic recomputing of msgmni
|
||||
upon memory add/remove or upon ipc namespace creation/removal.
|
||||
Echoing "1" into this file enabled msgmni automatic recomputing.
|
||||
Echoing "0" turned it off. auto_msgmni default value was 1.
|
||||
|
||||
|
||||
==============================================================
|
||||
|
81
Documentation/vm/page_owner.txt
Normal file
@ -0,0 +1,81 @@
page owner: Tracking about who allocated each page
-----------------------------------------------------------

* Introduction

page owner is for the tracking about who allocated each page.
It can be used to debug memory leak or to find a memory hogger.
When allocation happens, information about allocation such as call stack
and order of pages is stored into certain storage for each page.
When we need to know about status of all pages, we can get and analyze
this information.

Although we already have tracepoint for tracing page allocation/free,
using it for analyzing who allocated each page is rather complex. We need
to enlarge the trace buffer for preventing overlapping until userspace
program launched. And, launched program continually dump out the trace
buffer for later analysis and it would change system behaviour with more
possibility rather than just keeping it in memory, so bad for debugging.

page owner can also be used for various purposes. For example, accurate
fragmentation statistics can be obtained through gfp flag information of
each page. It is already implemented and activated if page owner is
enabled. Other usages are more than welcome.

page owner is disabled in default. So, if you'd like to use it, you need
to add "page_owner=on" into your boot cmdline. If the kernel is built
with page owner and page owner is disabled in runtime due to no enabling
boot option, runtime overhead is marginal. If disabled in runtime, it
doesn't require memory to store owner information, so there is no runtime
memory overhead. And, page owner inserts just two unlikely branches into
the page allocator hotpath and if it returns false then allocation is
done like as the kernel without page owner. These two unlikely branches
would not affect to allocation performance. Following is the kernel's
code size change due to this facility.

- Without page owner
   text    data     bss     dec     hex filename
  40662    1493     644   42799    a72f mm/page_alloc.o

- With page owner
   text    data     bss     dec     hex filename
  40892    1493     644   43029    a815 mm/page_alloc.o
   1427      24       8    1459     5b3 mm/page_ext.o
   2722      50       0    2772     ad4 mm/page_owner.o

Although, roughly, 4 KB code is added in total, page_alloc.o increase by
230 bytes and only half of it is in hotpath. Building the kernel with
page owner and turning it on if needed would be great option to debug
kernel memory problem.

There is one notice that is caused by implementation detail. page owner
stores information into the memory from struct page extension. This memory
is initialized some time later than that page allocator starts in sparse
memory system, so, until initialization, many pages can be allocated and
they would have no owner information. To fix it up, these early allocated
pages are investigated and marked as allocated in initialization phase.
Although it doesn't mean that they have the right owner information,
at least, we can tell whether the page is allocated or not,
more accurately. On 2GB memory x86-64 VM box, 13343 early allocated pages
are caught and marked, although they are mostly allocated from struct
page extension feature. Anyway, after that, no page is left in
un-tracking state.

* Usage

1) Build user-space helper
	cd tools/vm
	make page_owner_sort

2) Enable page owner
	Add "page_owner=on" to boot cmdline.

3) Do the job what you want to debug

4) Analyze information from page owner
	cat /sys/kernel/debug/page_owner > page_owner_full.txt
	grep -v ^PFN page_owner_full.txt > page_owner.txt
	./page_owner_sort page_owner.txt sorted_page_owner.txt

	See the result about who allocated each page
	in the sorted_page_owner.txt.
|
@ -4045,7 +4045,7 @@ F: drivers/tty/serial/ucc_uart.c
|
||||
FREESCALE SOC SOUND DRIVERS
|
||||
M: Timur Tabi <timur@tabi.org>
|
||||
M: Nicolin Chen <nicoleotsuka@gmail.com>
|
||||
M: Xiubo Li <Li.Xiubo@freescale.com>
|
||||
M: Xiubo Li <Xiubo.Lee@gmail.com>
|
||||
L: alsa-devel@alsa-project.org (moderated for non-subscribers)
|
||||
L: linuxppc-dev@lists.ozlabs.org
|
||||
S: Maintained
|
||||
|
@ -5,6 +5,7 @@ config ARM
|
||||
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
|
||||
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
|
||||
select ARCH_HAVE_CUSTOM_GPIO_H
|
||||
select ARCH_HAS_GCOV_PROFILE_ALL
|
||||
select ARCH_MIGHT_HAVE_PC_PARPORT
|
||||
select ARCH_SUPPORTS_ATOMIC_RMW
|
||||
select ARCH_USE_BUILTIN_BSWAP
|
||||
|
@ -2,6 +2,7 @@ config ARM64
|
||||
def_bool y
|
||||
select ARCH_BINFMT_ELF_RANDOMIZE_PIE
|
||||
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
|
||||
select ARCH_HAS_GCOV_PROFILE_ALL
|
||||
select ARCH_HAS_SG_CHAIN
|
||||
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
|
||||
select ARCH_USE_CMPXCHG_LOCKREF
|
||||
|
@ -1,5 +1,6 @@
|
||||
config MICROBLAZE
|
||||
def_bool y
|
||||
select ARCH_HAS_GCOV_PROFILE_ALL
|
||||
select ARCH_MIGHT_HAVE_PC_PARPORT
|
||||
select ARCH_WANT_IPC_PARSE_VERSION
|
||||
select ARCH_WANT_OPTIONAL_GPIOLIB
|
||||
|
@ -38,14 +38,14 @@
|
||||
LDREGX \t2(\t1),\t2
|
||||
addil LT%exception_data,%r27
|
||||
LDREG RT%exception_data(%r1),\t1
|
||||
/* t1 = &__get_cpu_var(exception_data) */
|
||||
/* t1 = this_cpu_ptr(&exception_data) */
|
||||
add,l \t1,\t2,\t1
|
||||
/* t1 = t1->fault_ip */
|
||||
LDREG EXCDATA_IP(\t1), \t1
|
||||
.endm
|
||||
#else
|
||||
.macro get_fault_ip t1 t2
|
||||
/* t1 = &__get_cpu_var(exception_data) */
|
||||
/* t1 = this_cpu_ptr(&exception_data) */
|
||||
addil LT%exception_data,%r27
|
||||
LDREG RT%exception_data(%r1),\t2
|
||||
/* t1 = t2->fault_ip */
|
||||
|
@ -129,6 +129,7 @@ config PPC
|
||||
select HAVE_BPF_JIT if PPC64
|
||||
select HAVE_ARCH_JUMP_LABEL
|
||||
select ARCH_HAVE_NMI_SAFE_CMPXCHG
|
||||
select ARCH_HAS_GCOV_PROFILE_ALL
|
||||
select GENERIC_SMP_IDLE_THREAD
|
||||
select GENERIC_CMOS_UPDATE
|
||||
select GENERIC_TIME_VSYSCALL_OLD
|
||||
|
@ -1514,7 +1514,7 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
|
||||
mmu_kernel_ssize, 0);
|
||||
}
|
||||
|
||||
void kernel_map_pages(struct page *page, int numpages, int enable)
|
||||
void __kernel_map_pages(struct page *page, int numpages, int enable)
|
||||
{
|
||||
unsigned long flags, vaddr, lmi;
|
||||
int i;
|
||||
|
@ -429,7 +429,7 @@ static int change_page_attr(struct page *page, int numpages, pgprot_t prot)
|
||||
}
|
||||
|
||||
|
||||
void kernel_map_pages(struct page *page, int numpages, int enable)
|
||||
void __kernel_map_pages(struct page *page, int numpages, int enable)
|
||||
{
|
||||
if (PageHighMem(page))
|
||||
return;
|
||||
|
@ -65,6 +65,7 @@ config S390
|
||||
def_bool y
|
||||
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
|
||||
select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
|
||||
select ARCH_HAS_GCOV_PROFILE_ALL
|
||||
select ARCH_HAVE_NMI_SAFE_CMPXCHG
|
||||
select ARCH_INLINE_READ_LOCK
|
||||
select ARCH_INLINE_READ_LOCK_BH
|
||||
|
@ -120,7 +120,7 @@ static void ipte_range(pte_t *pte, unsigned long address, int nr)
|
||||
}
|
||||
}
|
||||
|
||||
void kernel_map_pages(struct page *page, int numpages, int enable)
|
||||
void __kernel_map_pages(struct page *page, int numpages, int enable)
|
||||
{
|
||||
unsigned long address;
|
||||
int nr, i, j;
|
||||
|
@ -16,6 +16,7 @@ config SUPERH
|
||||
select HAVE_DEBUG_BUGVERBOSE
|
||||
select ARCH_HAVE_CUSTOM_GPIO_H
|
||||
select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A)
|
||||
select ARCH_HAS_GCOV_PROFILE_ALL
|
||||
select PERF_USE_VMALLOC
|
||||
select HAVE_DEBUG_KMEMLEAK
|
||||
select HAVE_KERNEL_GZIP
|
||||
|
@ -415,8 +415,9 @@
|
||||
#define __NR_getrandom 347
|
||||
#define __NR_memfd_create 348
|
||||
#define __NR_bpf 349
|
||||
#define __NR_execveat 350
|
||||
|
||||
#define NR_syscalls 350
|
||||
#define NR_syscalls 351
|
||||
|
||||
/* Bitmask values returned from kern_features system call. */
|
||||
#define KERN_FEATURE_MIXED_MODE_STACK 0x00000001
|
||||
|
@ -6,6 +6,11 @@ sys64_execve:
|
||||
jmpl %g1, %g0
|
||||
flushw
|
||||
|
||||
sys64_execveat:
|
||||
set sys_execveat, %g1
|
||||
jmpl %g1, %g0
|
||||
flushw
|
||||
|
||||
#ifdef CONFIG_COMPAT
|
||||
sunos_execv:
|
||||
mov %g0, %o2
|
||||
@ -13,6 +18,11 @@ sys32_execve:
|
||||
set compat_sys_execve, %g1
|
||||
jmpl %g1, %g0
|
||||
flushw
|
||||
|
||||
sys32_execveat:
|
||||
set compat_sys_execveat, %g1
|
||||
jmpl %g1, %g0
|
||||
flushw
|
||||
#endif
|
||||
|
||||
.align 32
|
||||
|
@ -87,3 +87,4 @@ sys_call_table:
|
||||
/*335*/ .long sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev
|
||||
/*340*/ .long sys_ni_syscall, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
|
||||
/*345*/ .long sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
|
||||
/*350*/ .long sys_execveat
|
||||
|
@ -88,6 +88,7 @@ sys_call_table32:
|
||||
.word sys_syncfs, compat_sys_sendmmsg, sys_setns, compat_sys_process_vm_readv, compat_sys_process_vm_writev
|
||||
/*340*/ .word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
|
||||
.word sys32_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
|
||||
/*350*/ .word sys32_execveat
|
||||
|
||||
#endif /* CONFIG_COMPAT */
|
||||
|
||||
@ -167,3 +168,4 @@ sys_call_table:
|
||||
.word sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev
|
||||
/*340*/ .word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
|
||||
.word sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
|
||||
/*350*/ .word sys64_execveat
|
||||
|
@ -1621,7 +1621,7 @@ static void __init kernel_physical_mapping_init(void)
|
||||
}
|
||||
|
||||
#ifdef CONFIG_DEBUG_PAGEALLOC
|
||||
void kernel_map_pages(struct page *page, int numpages, int enable)
|
||||
void __kernel_map_pages(struct page *page, int numpages, int enable)
|
||||
{
|
||||
unsigned long phys_start = page_to_pfn(page) << PAGE_SHIFT;
|
||||
unsigned long phys_end = phys_start + (numpages * PAGE_SIZE);
|
||||
|
@ -24,6 +24,7 @@ config X86
|
||||
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
|
||||
select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
|
||||
select ARCH_HAS_FAST_MULTIPLIER
|
||||
select ARCH_HAS_GCOV_PROFILE_ALL
|
||||
select ARCH_MIGHT_HAVE_PC_PARPORT
|
||||
select ARCH_MIGHT_HAVE_PC_SERIO
|
||||
select HAVE_AOUT if X86_32
|
||||
|
@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall)
|
||||
case __NR_socketcall:
|
||||
return 4;
|
||||
case __NR_execve:
|
||||
case __NR_execveat:
|
||||
return 5;
|
||||
default:
|
||||
return 1;
|
||||
|
@ -480,6 +480,7 @@ GLOBAL(\label)
|
||||
PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn
|
||||
PTREGSCALL stub32_sigreturn, sys32_sigreturn
|
||||
PTREGSCALL stub32_execve, compat_sys_execve
|
||||
PTREGSCALL stub32_execveat, compat_sys_execveat
|
||||
PTREGSCALL stub32_fork, sys_fork
|
||||
PTREGSCALL stub32_vfork, sys_vfork
|
||||
|
||||
|
@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall)
|
||||
case __NR_openat:
|
||||
return 3;
|
||||
case __NR_execve:
|
||||
case __NR_execveat:
|
||||
return 5;
|
||||
default:
|
||||
return 0;
|
||||
|
@ -652,6 +652,20 @@ ENTRY(stub_execve)
|
||||
CFI_ENDPROC
|
||||
END(stub_execve)
|
||||
|
||||
ENTRY(stub_execveat)
|
||||
CFI_STARTPROC
|
||||
addq $8, %rsp
|
||||
PARTIAL_FRAME 0
|
||||
SAVE_REST
|
||||
FIXUP_TOP_OF_STACK %r11
|
||||
call sys_execveat
|
||||
RESTORE_TOP_OF_STACK %r11
|
||||
movq %rax,RAX(%rsp)
|
||||
RESTORE_REST
|
||||
jmp int_ret_from_sys_call
|
||||
CFI_ENDPROC
|
||||
END(stub_execveat)
|
||||
|
||||
/*
|
||||
* sigreturn is special because it needs to restore all registers on return.
|
||||
* This cannot be done with SYSRET, so use the IRET return path instead.
|
||||
@ -697,6 +711,20 @@ ENTRY(stub_x32_execve)
|
||||
CFI_ENDPROC
|
||||
END(stub_x32_execve)
|
||||
|
||||
ENTRY(stub_x32_execveat)
|
||||
CFI_STARTPROC
|
||||
addq $8, %rsp
|
||||
PARTIAL_FRAME 0
|
||||
SAVE_REST
|
||||
FIXUP_TOP_OF_STACK %r11
|
||||
call compat_sys_execveat
|
||||
RESTORE_TOP_OF_STACK %r11
|
||||
movq %rax,RAX(%rsp)
|
||||
RESTORE_REST
|
||||
jmp int_ret_from_sys_call
|
||||
CFI_ENDPROC
|
||||
END(stub_x32_execveat)
|
||||
|
||||
#endif
|
||||
|
||||
/*
|
||||
|
@ -1817,7 +1817,7 @@ static int __set_pages_np(struct page *page, int numpages)
|
||||
return __change_page_attr_set_clr(&cpa, 0);
|
||||
}
|
||||
|
||||
void kernel_map_pages(struct page *page, int numpages, int enable)
|
||||
void __kernel_map_pages(struct page *page, int numpages, int enable)
|
||||
{
|
||||
if (PageHighMem(page))
|
||||
return;
|
||||
|
@ -364,3 +364,4 @@
|
||||
355 i386 getrandom sys_getrandom
|
||||
356 i386 memfd_create sys_memfd_create
|
||||
357 i386 bpf sys_bpf
|
||||
358 i386 execveat sys_execveat stub32_execveat
|
||||
|
@ -328,6 +328,7 @@
|
||||
319 common memfd_create sys_memfd_create
|
||||
320 common kexec_file_load sys_kexec_file_load
|
||||
321 common bpf sys_bpf
|
||||
322 64 execveat stub_execveat
|
||||
|
||||
#
|
||||
# x32-specific system call numbers start at 512 to avoid cache impact
|
||||
@ -366,3 +367,4 @@
|
||||
542 x32 getsockopt compat_sys_getsockopt
|
||||
543 x32 io_setup compat_sys_io_setup
|
||||
544 x32 io_submit compat_sys_io_submit
|
||||
545 x32 execveat stub_x32_execveat
|
||||
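The syscall table entries above wire up execveat() as number 322 on x86-64 (545 for x32, 358 for i386). As a rough illustration of how user space might reach it, here is a minimal sketch that is not part of this merge. It assumes a Linux system whose headers define SYS_execveat, and the wrapper name exec_at() is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>        /* AT_FDCWD, AT_EMPTY_PATH for callers */
#include <sys/syscall.h>  /* SYS_execveat */
#include <unistd.h>       /* syscall() */

/*
 * Hypothetical wrapper (not part of this merge): invoke execveat()
 * through syscall(2). On success it does not return; on failure it
 * returns -1 with errno set, as execve() does.
 */
int exec_at(int dirfd, const char *path,
	    char *const argv[], char *const envp[], int flags)
{
	return (int)syscall(SYS_execveat, dirfd, path, argv, envp, flags);
}
```

Called as exec_at(AT_FDCWD, "/bin/true", argv, envp, 0) this should behave like execve(); passing an open file descriptor, an empty path and AT_EMPTY_PATH executes the file the descriptor refers to.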
|
@ -31,6 +31,7 @@
|
||||
#define stub_fork sys_fork
|
||||
#define stub_vfork sys_vfork
|
||||
#define stub_execve sys_execve
|
||||
#define stub_execveat sys_execveat
|
||||
#define stub_rt_sigreturn sys_rt_sigreturn
|
||||
|
||||
#define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
|
||||
|
@ -228,8 +228,8 @@ memory_block_action(unsigned long phys_index, unsigned long action, int online_t
|
||||
struct page *first_page;
|
||||
int ret;
|
||||
|
||||
first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);
|
||||
start_pfn = page_to_pfn(first_page);
|
||||
start_pfn = phys_index << PFN_SECTION_SHIFT;
|
||||
first_page = pfn_to_page(start_pfn);
|
||||
|
||||
switch (action) {
|
||||
case MEM_ONLINE:
|
||||
|
@ -44,15 +44,14 @@ static const char *default_compressor = "lzo";
|
||||
static unsigned int num_devices = 1;
|
||||
|
||||
#define ZRAM_ATTR_RO(name) \
|
||||
static ssize_t zram_attr_##name##_show(struct device *d, \
|
||||
static ssize_t name##_show(struct device *d, \
|
||||
struct device_attribute *attr, char *b) \
|
||||
{ \
|
||||
struct zram *zram = dev_to_zram(d); \
|
||||
return scnprintf(b, PAGE_SIZE, "%llu\n", \
|
||||
(u64)atomic64_read(&zram->stats.name)); \
|
||||
} \
|
||||
static struct device_attribute dev_attr_##name = \
|
||||
__ATTR(name, S_IRUGO, zram_attr_##name##_show, NULL);
|
||||
static DEVICE_ATTR_RO(name);
|
||||
|
||||
static inline int init_done(struct zram *zram)
|
||||
{
|
||||
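The ZRAM_ATTR_RO() change above relies on token pasting: once the show function is called name##_show, the macro can end in DEVICE_ATTR_RO(name), which expects exactly that name. The following user-space model sketches the pasting only. The struct stats_model type and the STAT_ATTR_RO() macro are hypothetical stand-ins, not kernel code.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's statistics structure. */
struct stats_model {
	uint64_t num_reads;
	uint64_t num_writes;
};

/*
 * User-space model of the token pasting: STAT_ATTR_RO(num_reads)
 * defines num_reads_show(), following the <name>_show convention
 * that DEVICE_ATTR_RO(name) relies on in the kernel.
 */
#define STAT_ATTR_RO(name)						\
int name##_show(const struct stats_model *s, char *b, size_t len)	\
{									\
	return snprintf(b, len, "%llu\n",				\
			(unsigned long long)s->name);			\
}

STAT_ATTR_RO(num_reads)
STAT_ATTR_RO(num_writes)
```

Each expansion produces one function per statistic, so adding a counter needs only one more STAT_ATTR_RO() line.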
@ -287,19 +286,18 @@ static inline int is_partial_io(struct bio_vec *bvec)
|
||||
/*
|
||||
* Check if request is within bounds and aligned on zram logical blocks.
|
||||
*/
|
||||
static inline int valid_io_request(struct zram *zram, struct bio *bio)
|
||||
static inline int valid_io_request(struct zram *zram,
|
||||
sector_t start, unsigned int size)
|
||||
{
|
||||
u64 start, end, bound;
|
||||
u64 end, bound;
|
||||
|
||||
/* unaligned request */
|
||||
if (unlikely(bio->bi_iter.bi_sector &
|
||||
(ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)))
|
||||
if (unlikely(start & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)))
|
||||
return 0;
|
||||
if (unlikely(bio->bi_iter.bi_size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))
|
||||
if (unlikely(size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))
|
||||
return 0;
|
||||
|
||||
start = bio->bi_iter.bi_sector;
|
||||
end = start + (bio->bi_iter.bi_size >> SECTOR_SHIFT);
|
||||
end = start + (size >> SECTOR_SHIFT);
|
||||
bound = zram->disksize >> SECTOR_SHIFT;
|
||||
/* out of range range */
|
||||
if (unlikely(start >= bound || end > bound || start > end))
|
||||
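After this hunk valid_io_request() no longer needs a bio, only a start sector and a size in bytes. The check can be modelled in user space as below. This is a sketch under assumptions: the constant values (512-byte sectors, 4 KiB logical blocks) are assumed for illustration, and valid_io_request_model() is a hypothetical name that takes the disk size directly instead of a struct zram.

```c
#include <stdint.h>

/* Assumed values for illustration; the real ones live in zram_drv.h. */
#define SECTOR_SHIFT                  9
#define ZRAM_LOGICAL_BLOCK_SIZE       4096u
#define ZRAM_SECTOR_PER_LOGICAL_BLOCK (ZRAM_LOGICAL_BLOCK_SIZE >> SECTOR_SHIFT)

/*
 * User-space model of the check: returns 1 when the request is aligned
 * on logical blocks and lies within a device of disksize bytes.
 */
int valid_io_request_model(uint64_t disksize, uint64_t start,
			   unsigned int size)
{
	uint64_t end, bound;

	/* unaligned request */
	if (start & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1))
		return 0;
	if (size & (ZRAM_LOGICAL_BLOCK_SIZE - 1))
		return 0;

	end = start + (size >> SECTOR_SHIFT);
	bound = disksize >> SECTOR_SHIFT;

	/* out of range, or start + size wrapped around */
	if (start >= bound || end > bound || start > end)
		return 0;

	return 1;
}
```

With a 1 MiB device (2048 sectors), a 4096-byte request at sector 2040 is accepted because it ends exactly at the bound, while the same request at sector 2048 is rejected.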
@ -453,7 +451,7 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
|
||||
}
|
||||
|
||||
static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
|
||||
u32 index, int offset, struct bio *bio)
|
||||
u32 index, int offset)
|
||||
{
|
||||
int ret;
|
||||
struct page *page;
|
||||
@ -645,14 +643,13 @@ out:
|
||||
}
|
||||
|
||||
static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
|
||||
int offset, struct bio *bio)
|
||||
int offset, int rw)
|
||||
{
|
||||
int ret;
|
||||
int rw = bio_data_dir(bio);
|
||||
|
||||
if (rw == READ) {
|
||||
atomic64_inc(&zram->stats.num_reads);
|
||||
ret = zram_bvec_read(zram, bvec, index, offset, bio);
|
||||
ret = zram_bvec_read(zram, bvec, index, offset);
|
||||
} else {
|
||||
atomic64_inc(&zram->stats.num_writes);
|
||||
ret = zram_bvec_write(zram, bvec, index, offset);
|
||||
@ -853,7 +850,7 @@ out:
|
||||
|
||||
static void __zram_make_request(struct zram *zram, struct bio *bio)
|
||||
{
|
||||
int offset;
|
||||
int offset, rw;
|
||||
u32 index;
|
||||
struct bio_vec bvec;
|
||||
struct bvec_iter iter;
|
||||
@ -868,6 +865,7 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
|
||||
return;
|
||||
}
|
||||
|
||||
rw = bio_data_dir(bio);
|
||||
bio_for_each_segment(bvec, bio, iter) {
|
||||
int max_transfer_size = PAGE_SIZE - offset;
|
||||
|
||||
@ -882,15 +880,15 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
|
||||
bv.bv_len = max_transfer_size;
|
||||
bv.bv_offset = bvec.bv_offset;
|
||||
|
||||
if (zram_bvec_rw(zram, &bv, index, offset, bio) < 0)
|
||||
if (zram_bvec_rw(zram, &bv, index, offset, rw) < 0)
|
||||
goto out;
|
||||
|
||||
bv.bv_len = bvec.bv_len - max_transfer_size;
|
||||
bv.bv_offset += max_transfer_size;
|
||||
if (zram_bvec_rw(zram, &bv, index + 1, 0, bio) < 0)
|
||||
if (zram_bvec_rw(zram, &bv, index + 1, 0, rw) < 0)
|
||||
goto out;
|
||||
} else
|
||||
if (zram_bvec_rw(zram, &bvec, index, offset, bio) < 0)
|
||||
if (zram_bvec_rw(zram, &bvec, index, offset, rw) < 0)
|
||||
goto out;
|
||||
|
||||
update_position(&index, &offset, &bvec);
|
||||
@ -915,7 +913,8 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
|
||||
if (unlikely(!init_done(zram)))
|
||||
goto error;
|
||||
|
||||
if (!valid_io_request(zram, bio)) {
|
||||
if (!valid_io_request(zram, bio->bi_iter.bi_sector,
|
||||
bio->bi_iter.bi_size)) {
|
||||
atomic64_inc(&zram->stats.invalid_io);
|
||||
goto error;
|
||||
}
|
||||
@ -945,25 +944,64 @@ static void zram_slot_free_notify(struct block_device *bdev,
|
||||
atomic64_inc(&zram->stats.notify_free);
|
||||
}
|
||||
|
||||
static int zram_rw_page(struct block_device *bdev, sector_t sector,
|
||||
struct page *page, int rw)
|
||||
{
|
||||
int offset, err;
|
||||
u32 index;
|
||||
struct zram *zram;
|
||||
struct bio_vec bv;
|
||||
|
||||
zram = bdev->bd_disk->private_data;
|
||||
if (!valid_io_request(zram, sector, PAGE_SIZE)) {
|
||||
atomic64_inc(&zram->stats.invalid_io);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
down_read(&zram->init_lock);
|
||||
if (unlikely(!init_done(zram))) {
|
||||
err = -EIO;
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
index = sector >> SECTORS_PER_PAGE_SHIFT;
|
||||
offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT;
|
||||
|
||||
bv.bv_page = page;
|
||||
bv.bv_len = PAGE_SIZE;
|
||||
bv.bv_offset = 0;
|
||||
|
||||
err = zram_bvec_rw(zram, &bv, index, offset, rw);
|
||||
out_unlock:
|
||||
up_read(&zram->init_lock);
|
||||
/*
|
||||
* If I/O fails, just return error(ie, non-zero) without
|
||||
* calling page_endio.
|
||||
* It causes resubmit the I/O with bio request by upper functions
|
||||
* of rw_page(e.g., swap_readpage, __swap_writepage) and
|
||||
* bio->bi_end_io does things to handle the error
|
||||
* (e.g., SetPageError, set_page_dirty and extra works).
|
||||
*/
|
||||
if (err == 0)
|
||||
page_endio(page, rw, 0);
|
||||
return err;
|
||||
}
|
||||
|
||||
static const struct block_device_operations zram_devops = {
|
||||
.swap_slot_free_notify = zram_slot_free_notify,
|
||||
.rw_page = zram_rw_page,
|
||||
.owner = THIS_MODULE
|
||||
};
|
||||
|
||||
static DEVICE_ATTR(disksize, S_IRUGO | S_IWUSR,
|
||||
disksize_show, disksize_store);
|
||||
static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
|
||||
static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
|
||||
static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
|
||||
static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
|
||||
static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
|
||||
mem_limit_store);
|
||||
static DEVICE_ATTR(mem_used_max, S_IRUGO | S_IWUSR, mem_used_max_show,
|
||||
mem_used_max_store);
|
||||
static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
|
||||
max_comp_streams_show, max_comp_streams_store);
|
||||
static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
|
||||
comp_algorithm_show, comp_algorithm_store);
|
||||
static DEVICE_ATTR_RW(disksize);
|
||||
static DEVICE_ATTR_RO(initstate);
|
||||
static DEVICE_ATTR_WO(reset);
|
||||
static DEVICE_ATTR_RO(orig_data_size);
|
||||
static DEVICE_ATTR_RO(mem_used_total);
|
||||
static DEVICE_ATTR_RW(mem_limit);
|
||||
static DEVICE_ATTR_RW(mem_used_max);
|
||||
static DEVICE_ATTR_RW(max_comp_streams);
|
||||
static DEVICE_ATTR_RW(comp_algorithm);
|
||||
|
||||
ZRAM_ATTR_RO(num_reads);
|
||||
ZRAM_ATTR_RO(num_writes);
|
||||
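zram_rw_page() above derives a page index and an offset from the start sector. The sketch below models that arithmetic in user space, assuming 512-byte sectors and 4 KiB pages. The helper names are hypothetical. Because C gives << higher precedence than &, the offset expression in the hunk parses as shown in offset_as_parsed(); the fully parenthesised grouping is included only for comparison.

```c
#include <stdint.h>

/* Assumed for illustration: 512-byte sectors and 4 KiB pages. */
#define SECTOR_SHIFT            9
#define PAGE_SHIFT              12
#define SECTORS_PER_PAGE_SHIFT  (PAGE_SHIFT - SECTOR_SHIFT)
#define SECTORS_PER_PAGE        (1u << SECTORS_PER_PAGE_SHIFT)

/* Page index within the device for a given start sector. */
uint32_t page_index(uint64_t sector)
{
	return (uint32_t)(sector >> SECTORS_PER_PAGE_SHIFT);
}

/*
 * How C parses "sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT":
 * the shift is applied to the mask before the AND.
 */
uint64_t offset_as_parsed(uint64_t sector)
{
	return sector & ((SECTORS_PER_PAGE - 1) << SECTOR_SHIFT);
}

/* Byte offset of the sector inside its page, with explicit grouping. */
uint64_t offset_in_page(uint64_t sector)
{
	return (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
}
```

The two groupings can disagree: for sector 512 the first yields 512 and the second yields 0.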
|
@ -66,8 +66,8 @@ static const size_t max_zpage_size = PAGE_SIZE / 4 * 3;
|
||||
/* Flags for zram pages (table[page_no].value) */
|
||||
enum zram_pageflags {
|
||||
/* Page consists entirely of zeros */
|
||||
ZRAM_ZERO = ZRAM_FLAG_SHIFT + 1,
|
||||
ZRAM_ACCESS, /* page in now accessed */
|
||||
ZRAM_ZERO = ZRAM_FLAG_SHIFT,
|
||||
ZRAM_ACCESS, /* page is now accessed */
|
||||
|
||||
__NR_ZRAM_PAGEFLAGS,
|
||||
};
|
||||
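With the old ZRAM_FLAG_SHIFT + 1 start, bit ZRAM_FLAG_SHIFT itself was left unused; the hunk above moves ZRAM_ZERO onto it. The following user-space sketch shows the resulting layout. The shift value of 24 is an assumption for illustration, and test_flag(), set_flag() and obj_size() are hypothetical helpers rather than the driver's own functions.

```c
#include <stdint.h>

/* Assumption for illustration: flags start at bit 24 of the value. */
#define ZRAM_FLAG_SHIFT 24

enum zram_pageflags {
	ZRAM_ZERO = ZRAM_FLAG_SHIFT,	/* page consists entirely of zeros */
	ZRAM_ACCESS,			/* page is now accessed */
	__NR_ZRAM_PAGEFLAGS,
};

/* Return 1 when the given flag bit is set in value. */
int test_flag(unsigned long value, enum zram_pageflags flag)
{
	return (int)((value >> flag) & 1UL);
}

/* Return value with the given flag bit set. */
unsigned long set_flag(unsigned long value, enum zram_pageflags flag)
{
	return value | (1UL << flag);
}

/* The bits below ZRAM_FLAG_SHIFT, modelled here as an object size. */
unsigned long obj_size(unsigned long value)
{
	return value & ((1UL << ZRAM_FLAG_SHIFT) - 1);
}
```

Setting a flag leaves the low bits untouched, so a size and the flags can share one word.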
|
@@ -509,45 +509,67 @@ static void finish_pri_tag(struct device_state *dev_state,
 	spin_unlock_irqrestore(&pasid_state->lock, flags);
 }

+static void handle_fault_error(struct fault *fault)
+{
+	int status;
+
+	if (!fault->dev_state->inv_ppr_cb) {
+		set_pri_tag_status(fault->state, fault->tag, PPR_INVALID);
+		return;
+	}
+
+	status = fault->dev_state->inv_ppr_cb(fault->dev_state->pdev,
+					      fault->pasid,
+					      fault->address,
+					      fault->flags);
+	switch (status) {
+	case AMD_IOMMU_INV_PRI_RSP_SUCCESS:
+		set_pri_tag_status(fault->state, fault->tag, PPR_SUCCESS);
+		break;
+	case AMD_IOMMU_INV_PRI_RSP_INVALID:
+		set_pri_tag_status(fault->state, fault->tag, PPR_INVALID);
+		break;
+	case AMD_IOMMU_INV_PRI_RSP_FAIL:
+		set_pri_tag_status(fault->state, fault->tag, PPR_FAILURE);
+		break;
+	default:
+		BUG();
+	}
+}
+
 static void do_fault(struct work_struct *work)
 {
 	struct fault *fault = container_of(work, struct fault, work);
-	int npages, write;
-	struct page *page;
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	u64 address;
+	int ret, write;

 	write = !!(fault->flags & PPR_FAULT_WRITE);

-	down_read(&fault->state->mm->mmap_sem);
-	npages = get_user_pages(NULL, fault->state->mm,
-				fault->address, 1, write, 0, &page, NULL);
-	up_read(&fault->state->mm->mmap_sem);
+	mm = fault->state->mm;
+	address = fault->address;

-	if (npages == 1) {
-		put_page(page);
-	} else if (fault->dev_state->inv_ppr_cb) {
-		int status;
-
-		status = fault->dev_state->inv_ppr_cb(fault->dev_state->pdev,
-						      fault->pasid,
-						      fault->address,
-						      fault->flags);
-		switch (status) {
-		case AMD_IOMMU_INV_PRI_RSP_SUCCESS:
-			set_pri_tag_status(fault->state, fault->tag, PPR_SUCCESS);
-			break;
-		case AMD_IOMMU_INV_PRI_RSP_INVALID:
-			set_pri_tag_status(fault->state, fault->tag, PPR_INVALID);
-			break;
-		case AMD_IOMMU_INV_PRI_RSP_FAIL:
-			set_pri_tag_status(fault->state, fault->tag, PPR_FAILURE);
-			break;
-		default:
-			BUG();
-		}
-	} else {
-		set_pri_tag_status(fault->state, fault->tag, PPR_INVALID);
+	down_read(&mm->mmap_sem);
+	vma = find_extend_vma(mm, address);
+	if (!vma || address < vma->vm_start) {
+		/* failed to get a vma in the right range */
+		up_read(&mm->mmap_sem);
+		handle_fault_error(fault);
+		goto out;
+	}
+
+	ret = handle_mm_fault(mm, vma, address, write);
+	if (ret & VM_FAULT_ERROR) {
+		/* failed to service fault */
+		up_read(&mm->mmap_sem);
+		handle_fault_error(fault);
+		goto out;
 	}

+	up_read(&mm->mmap_sem);
+
+out:
 	finish_pri_tag(fault->dev_state, fault->state, fault->tag);

 	put_pasid_state(fault->state);
|
@@ -344,13 +344,20 @@ static int snvs_rtc_resume(struct device *dev)

 	return 0;
 }
-#endif

 static const struct dev_pm_ops snvs_rtc_pm_ops = {
 	.suspend_noirq = snvs_rtc_suspend,
 	.resume_noirq = snvs_rtc_resume,
 };

+#define SNVS_RTC_PM_OPS (&snvs_rtc_pm_ops)
+
+#else
+
+#define SNVS_RTC_PM_OPS NULL
+
+#endif
+
 static const struct of_device_id snvs_dt_ids[] = {
 	{ .compatible = "fsl,sec-v4.0-mon-rtc-lp", },
 	{ /* sentinel */ }
@@ -361,7 +368,7 @@ static struct platform_driver snvs_rtc_driver = {
 	.driver = {
 		.name = "snvs_rtc",
 		.owner = THIS_MODULE,
-		.pm = &snvs_rtc_pm_ops,
+		.pm = SNVS_RTC_PM_OPS,
 		.of_match_table = snvs_dt_ids,
 	},
 	.probe = snvs_rtc_probe,
|
@@ -418,7 +418,7 @@ out:
 }

 /*
- * ashmem_shrink - our cache shrinker, called from mm/vmscan.c :: shrink_slab
+ * ashmem_shrink - our cache shrinker, called from mm/vmscan.c
  *
  * 'nr_to_scan' is the number of objects to scan for freeing.
  *
@@ -785,7 +785,6 @@ static long ashmem_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 				.nr_to_scan = LONG_MAX,
 			};
 			ret = ashmem_shrink_count(&ashmem_shrinker, &sc);
-			nodes_setall(sc.nodes_to_scan);
 			ashmem_shrink_scan(&ashmem_shrinker, &sc);
 		}
 		break;
|
@@ -135,8 +135,10 @@ extern void affs_fix_checksum(struct super_block *sb, struct buffer_head *bh);
 extern void secs_to_datestamp(time_t secs, struct affs_date *ds);
 extern umode_t prot_to_mode(u32 prot);
 extern void mode_to_prot(struct inode *inode);
+__printf(3, 4)
 extern void affs_error(struct super_block *sb, const char *function,
 		       const char *fmt, ...);
+__printf(3, 4)
 extern void affs_warning(struct super_block *sb, const char *function,
 			 const char *fmt, ...);
 extern bool affs_nofilenametruncate(const struct dentry *dentry);
|
@@ -10,8 +10,6 @@

 #include "affs.h"

-static char ErrorBuffer[256];
-
 /*
  * Functions for accessing Amiga-FFS structures.
  */
@@ -444,30 +442,30 @@ mode_to_prot(struct inode *inode)
 void
 affs_error(struct super_block *sb, const char *function, const char *fmt, ...)
 {
-	va_list args;
+	struct va_format vaf;
+	va_list args;

-	va_start(args,fmt);
-	vsnprintf(ErrorBuffer,sizeof(ErrorBuffer),fmt,args);
-	va_end(args);
-
-	pr_crit("error (device %s): %s(): %s\n", sb->s_id,
-		function,ErrorBuffer);
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+	pr_crit("error (device %s): %s(): %pV\n", sb->s_id, function, &vaf);
 	if (!(sb->s_flags & MS_RDONLY))
 		pr_warn("Remounting filesystem read-only\n");
 	sb->s_flags |= MS_RDONLY;
+	va_end(args);
 }

 void
 affs_warning(struct super_block *sb, const char *function, const char *fmt, ...)
 {
-	va_list args;
+	struct va_format vaf;
+	va_list args;

-	va_start(args,fmt);
-	vsnprintf(ErrorBuffer,sizeof(ErrorBuffer),fmt,args);
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+	pr_warn("(device %s): %s(): %pV\n", sb->s_id, function, &vaf);
 	va_end(args);
-
-	pr_warn("(device %s): %s(): %s\n", sb->s_id,
-		function,ErrorBuffer);
 }

 bool
|
@@ -12,35 +12,10 @@
  * affs regular file handling primitives
  */

+#include <linux/aio.h>
 #include "affs.h"

-#if PAGE_SIZE < 4096
-#error PAGE_SIZE must be at least 4096
-#endif
-
-static int affs_grow_extcache(struct inode *inode, u32 lc_idx);
-static struct buffer_head *affs_alloc_extblock(struct inode *inode, struct buffer_head *bh, u32 ext);
-static inline struct buffer_head *affs_get_extblock(struct inode *inode, u32 ext);
 static struct buffer_head *affs_get_extblock_slow(struct inode *inode, u32 ext);
-static int affs_file_open(struct inode *inode, struct file *filp);
-static int affs_file_release(struct inode *inode, struct file *filp);
-
-const struct file_operations affs_file_operations = {
-	.llseek = generic_file_llseek,
-	.read = new_sync_read,
-	.read_iter = generic_file_read_iter,
-	.write = new_sync_write,
-	.write_iter = generic_file_write_iter,
-	.mmap = generic_file_mmap,
-	.open = affs_file_open,
-	.release = affs_file_release,
-	.fsync = affs_file_fsync,
-	.splice_read = generic_file_splice_read,
-};
-
-const struct inode_operations affs_file_inode_operations = {
-	.setattr = affs_notify_change,
-};

 static int
 affs_file_open(struct inode *inode, struct file *filp)
@@ -355,7 +330,8 @@ affs_get_block(struct inode *inode, sector_t block, struct buffer_head *bh_resul

 		/* store new block */
 		if (bh_result->b_blocknr)
-			affs_warning(sb, "get_block", "block already set (%x)", bh_result->b_blocknr);
+			affs_warning(sb, "get_block", "block already set (%lx)",
+				     (unsigned long)bh_result->b_blocknr);
 		AFFS_BLOCK(sb, ext_bh, block) = cpu_to_be32(blocknr);
 		AFFS_HEAD(ext_bh)->block_count = cpu_to_be32(block + 1);
 		affs_adjust_checksum(ext_bh, blocknr - bh_result->b_blocknr + 1);
@@ -377,7 +353,8 @@ affs_get_block(struct inode *inode, sector_t block, struct buffer_head *bh_resul
 	return 0;

 err_big:
-	affs_error(inode->i_sb,"get_block","strange block request %d", block);
+	affs_error(inode->i_sb, "get_block", "strange block request %d",
+		   (int)block);
 	return -EIO;
 err_ext:
 	// unlock cache
@@ -412,6 +389,22 @@ static void affs_write_failed(struct address_space *mapping, loff_t to)
 	}
 }

+static ssize_t
+affs_direct_IO(int rw, struct kiocb *iocb, struct iov_iter *iter,
+	       loff_t offset)
+{
+	struct file *file = iocb->ki_filp;
+	struct address_space *mapping = file->f_mapping;
+	struct inode *inode = mapping->host;
+	size_t count = iov_iter_count(iter);
+	ssize_t ret;
+
+	ret = blockdev_direct_IO(rw, iocb, inode, iter, offset, affs_get_block);
+	if (ret < 0 && (rw & WRITE))
+		affs_write_failed(mapping, offset + count);
+	return ret;
+}
+
 static int affs_write_begin(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned flags,
 			struct page **pagep, void **fsdata)
@@ -438,6 +431,7 @@ const struct address_space_operations affs_aops = {
 	.writepage = affs_writepage,
 	.write_begin = affs_write_begin,
 	.write_end = generic_write_end,
+	.direct_IO = affs_direct_IO,
 	.bmap = _affs_bmap
 };

@@ -867,8 +861,9 @@ affs_truncate(struct inode *inode)
 	// lock cache
 	ext_bh = affs_get_extblock(inode, ext);
 	if (IS_ERR(ext_bh)) {
-		affs_warning(sb, "truncate", "unexpected read error for ext block %u (%d)",
-			     ext, PTR_ERR(ext_bh));
+		affs_warning(sb, "truncate",
+			     "unexpected read error for ext block %u (%ld)",
+			     (unsigned int)ext, PTR_ERR(ext_bh));
 		return;
 	}
 	if (AFFS_I(inode)->i_lc) {
@@ -914,8 +909,9 @@ affs_truncate(struct inode *inode)
 		struct buffer_head *bh = affs_bread_ino(inode, last_blk, 0);
 		u32 tmp;
 		if (IS_ERR(bh)) {
-			affs_warning(sb, "truncate", "unexpected read error for last block %u (%d)",
-				     ext, PTR_ERR(bh));
+			affs_warning(sb, "truncate",
+				     "unexpected read error for last block %u (%ld)",
+				     (unsigned int)ext, PTR_ERR(bh));
 			return;
 		}
 		tmp = be32_to_cpu(AFFS_DATA_HEAD(bh)->next);
@@ -961,3 +957,19 @@ int affs_file_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 	mutex_unlock(&inode->i_mutex);
 	return ret;
 }
+const struct file_operations affs_file_operations = {
+	.llseek = generic_file_llseek,
+	.read = new_sync_read,
+	.read_iter = generic_file_read_iter,
+	.write = new_sync_write,
+	.write_iter = generic_file_write_iter,
+	.mmap = generic_file_mmap,
+	.open = affs_file_open,
+	.release = affs_file_release,
+	.fsync = affs_file_fsync,
+	.splice_read = generic_file_splice_read,
+};
+
+const struct inode_operations affs_file_inode_operations = {
+	.setattr = affs_notify_change,
+};
|
@@ -269,10 +269,6 @@ more:
 	}
 	ctx->pos++;
 	goto more;
-
-	befs_debug(sb, "<--- %s pos %lld", __func__, ctx->pos);
-
-	return 0;
 }

 static struct inode *
|
@@ -42,6 +42,10 @@ static int load_em86(struct linux_binprm *bprm)
 			return -ENOEXEC;
 	}

+	/* Need to be able to load the file after exec */
+	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+		return -ENOENT;
+
 	allow_write_access(bprm->file);
 	fput(bprm->file);
 	bprm->file = NULL;
|
@@ -144,6 +144,10 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (!fmt)
 		goto ret;

+	/* Need to be able to load the file after exec */
+	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+		return -ENOENT;
+
 	if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
 		retval = remove_arg_zero(bprm);
 		if (retval)
|
@@ -24,6 +24,16 @@ static int load_script(struct linux_binprm *bprm)

 	if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
 		return -ENOEXEC;
+
+	/*
+	 * If the script filename will be inaccessible after exec, typically
+	 * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give
+	 * up now (on the assumption that the interpreter will want to load
+	 * this file).
+	 */
+	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+		return -ENOENT;
+
 	/*
 	 * This section does the #! interpretation.
 	 * Sorta complicated, but hopefully it will work. -TYT
|
@@ -40,13 +40,14 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
 static void drop_slab(void)
 {
 	int nr_objects;
-	struct shrink_control shrink = {
-		.gfp_mask = GFP_KERNEL,
-	};

-	nodes_setall(shrink.nodes_to_scan);
 	do {
-		nr_objects = shrink_slab(&shrink, 1000, 1000);
+		int nid;
+
+		nr_objects = 0;
+		for_each_online_node(nid)
+			nr_objects += shrink_node_slabs(GFP_KERNEL, nid,
+							1000, 1000);
 	} while (nr_objects > 10);
 }

|
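As a reading aid for the drop_slab() hunk above: the new loop sums the objects freed on every online node and repeats until a full pass frees 10 or fewer. A minimal userspace sketch of that control flow follows. `drop_all`, `shrink_fn` and `toy_shrink` are hypothetical names invented for illustration; they stand in for drop_slab(), shrink_node_slabs() and a real shrinker, and are not kernel API.

```c
#include <stddef.h>

/* Hypothetical per-node reclaim callback: returns how many objects it freed. */
typedef unsigned long (*shrink_fn)(int nid, void *ctx);

/*
 * Mirrors the loop shape of the patched drop_slab(): one pass visits every
 * node, and passes repeat while a pass freed more than 10 objects.
 * Returns the total number of objects freed (the kernel function returns void).
 */
static unsigned long drop_all(int nr_nodes, shrink_fn shrink, void *ctx)
{
	unsigned long total = 0, nr_objects;
	int nid;

	do {
		nr_objects = 0;
		for (nid = 0; nid < nr_nodes; nid++)
			nr_objects += shrink(nid, ctx);
		total += nr_objects;
	} while (nr_objects > 10);

	return total;
}

/* Toy shrinker (assumption): frees at most 8 objects per call from ctx[nid]. */
static unsigned long toy_shrink(int nid, void *ctx)
{
	unsigned long *left = ctx;
	unsigned long n = left[nid] < 8 ? left[nid] : 8;

	left[nid] -= n;
	return n;
}
```

Note that the threshold means the loop can stop while objects remain: with the toy shrinker and counts {20, 3}, the second pass frees only 8 objects, so the loop exits with 4 objects left on node 0.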
113
fs/exec.c
@@ -748,18 +748,25 @@ EXPORT_SYMBOL(setup_arg_pages);

 #endif /* CONFIG_MMU */

-static struct file *do_open_exec(struct filename *name)
+static struct file *do_open_execat(int fd, struct filename *name, int flags)
 {
 	struct file *file;
 	int err;
-	static const struct open_flags open_exec_flags = {
+	struct open_flags open_exec_flags = {
 		.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
 		.acc_mode = MAY_EXEC | MAY_OPEN,
 		.intent = LOOKUP_OPEN,
 		.lookup_flags = LOOKUP_FOLLOW,
 	};

-	file = do_filp_open(AT_FDCWD, name, &open_exec_flags);
+	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
+		return ERR_PTR(-EINVAL);
+	if (flags & AT_SYMLINK_NOFOLLOW)
+		open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
+	if (flags & AT_EMPTY_PATH)
+		open_exec_flags.lookup_flags |= LOOKUP_EMPTY;
+
+	file = do_filp_open(fd, name, &open_exec_flags);
 	if (IS_ERR(file))
 		goto out;

@@ -770,12 +777,13 @@ static struct file *do_open_exec(struct filename *name)
 	if (file->f_path.mnt->mnt_flags & MNT_NOEXEC)
 		goto exit;

-	fsnotify_open(file);
-
 	err = deny_write_access(file);
 	if (err)
 		goto exit;

+	if (name->name[0] != '\0')
+		fsnotify_open(file);
+
 out:
 	return file;

@@ -787,7 +795,7 @@ exit:
 struct file *open_exec(const char *name)
 {
 	struct filename tmp = { .name = name };
-	return do_open_exec(&tmp);
+	return do_open_execat(AT_FDCWD, &tmp, 0);
 }
 EXPORT_SYMBOL(open_exec);

@@ -1428,10 +1436,12 @@ static int exec_binprm(struct linux_binprm *bprm)
 /*
  * sys_execve() executes a new program.
  */
-static int do_execve_common(struct filename *filename,
-				struct user_arg_ptr argv,
-				struct user_arg_ptr envp)
+static int do_execveat_common(int fd, struct filename *filename,
+			      struct user_arg_ptr argv,
+			      struct user_arg_ptr envp,
+			      int flags)
 {
+	char *pathbuf = NULL;
 	struct linux_binprm *bprm;
 	struct file *file;
 	struct files_struct *displaced;
@@ -1472,7 +1482,7 @@ static int do_execve_common(struct filename *filename,
 	check_unsafe_exec(bprm);
 	current->in_execve = 1;

-	file = do_open_exec(filename);
+	file = do_open_execat(fd, filename, flags);
 	retval = PTR_ERR(file);
 	if (IS_ERR(file))
 		goto out_unmark;
@@ -1480,7 +1490,28 @@ static int do_execve_common(struct filename *filename,
 	sched_exec();

 	bprm->file = file;
-	bprm->filename = bprm->interp = filename->name;
+	if (fd == AT_FDCWD || filename->name[0] == '/') {
+		bprm->filename = filename->name;
+	} else {
+		if (filename->name[0] == '\0')
+			pathbuf = kasprintf(GFP_TEMPORARY, "/dev/fd/%d", fd);
+		else
+			pathbuf = kasprintf(GFP_TEMPORARY, "/dev/fd/%d/%s",
+					    fd, filename->name);
+		if (!pathbuf) {
+			retval = -ENOMEM;
+			goto out_unmark;
+		}
+		/*
+		 * Record that a name derived from an O_CLOEXEC fd will be
+		 * inaccessible after exec. Relies on having exclusive access to
+		 * current->files (due to unshare_files above).
+		 */
+		if (close_on_exec(fd, rcu_dereference_raw(current->files->fdt)))
+			bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
+		bprm->filename = pathbuf;
+	}
+	bprm->interp = bprm->filename;

 	retval = bprm_mm_init(bprm);
 	if (retval)
@@ -1521,6 +1552,7 @@ static int do_execve_common(struct filename *filename,
 	acct_update_integrals(current);
 	task_numa_free(current);
 	free_bprm(bprm);
+	kfree(pathbuf);
 	putname(filename);
 	if (displaced)
 		put_files_struct(displaced);
@@ -1538,6 +1570,7 @@ out_unmark:

 out_free:
 	free_bprm(bprm);
+	kfree(pathbuf);

 out_files:
 	if (displaced)
@@ -1553,7 +1586,18 @@ int do_execve(struct filename *filename,
 {
 	struct user_arg_ptr argv = { .ptr.native = __argv };
 	struct user_arg_ptr envp = { .ptr.native = __envp };
-	return do_execve_common(filename, argv, envp);
+	return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
+}
+
+int do_execveat(int fd, struct filename *filename,
+		const char __user *const __user *__argv,
+		const char __user *const __user *__envp,
+		int flags)
+{
+	struct user_arg_ptr argv = { .ptr.native = __argv };
+	struct user_arg_ptr envp = { .ptr.native = __envp };
+
+	return do_execveat_common(fd, filename, argv, envp, flags);
 }

 #ifdef CONFIG_COMPAT
@@ -1569,7 +1613,23 @@ static int compat_do_execve(struct filename *filename,
 		.is_compat = true,
 		.ptr.compat = __envp,
 	};
-	return do_execve_common(filename, argv, envp);
+	return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
+}
+
+static int compat_do_execveat(int fd, struct filename *filename,
+			      const compat_uptr_t __user *__argv,
+			      const compat_uptr_t __user *__envp,
+			      int flags)
+{
+	struct user_arg_ptr argv = {
+		.is_compat = true,
+		.ptr.compat = __argv,
+	};
+	struct user_arg_ptr envp = {
+		.is_compat = true,
+		.ptr.compat = __envp,
+	};
+	return do_execveat_common(fd, filename, argv, envp, flags);
 }
 #endif

@@ -1609,6 +1669,20 @@ SYSCALL_DEFINE3(execve,
 {
 	return do_execve(getname(filename), argv, envp);
 }
+
+SYSCALL_DEFINE5(execveat,
+		int, fd, const char __user *, filename,
+		const char __user *const __user *, argv,
+		const char __user *const __user *, envp,
+		int, flags)
+{
+	int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+
+	return do_execveat(fd,
+			   getname_flags(filename, lookup_flags, NULL),
+			   argv, envp, flags);
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
 	const compat_uptr_t __user *, argv,
@@ -1616,4 +1690,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
 {
 	return compat_do_execve(getname(filename), argv, envp);
 }
+
+COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
+		       const char __user *, filename,
+		       const compat_uptr_t __user *, argv,
+		       const compat_uptr_t __user *, envp,
+		       int, flags)
+{
+	int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+
+	return compat_do_execveat(fd,
+				  getname_flags(filename, lookup_flags, NULL),
+				  argv, envp, flags);
+}
 #endif
|
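As a reading aid for the fs/exec.c hunks above: do_execveat_common() chooses bprm->filename from the (fd, path) pair, and the three possible outcomes are easy to misread in diff form. The userspace sketch below mirrors only that branch logic. `exec_display_path` is a hypothetical helper written for illustration, it uses snprintf() into a caller buffer where the kernel uses kasprintf(), and it does not model the O_CLOEXEC check that sets BINPRM_FLAGS_PATH_INACCESSIBLE.

```c
#include <fcntl.h>	/* AT_FDCWD on most libcs */
#include <stdio.h>	/* snprintf */
#include <string.h>

#ifndef AT_FDCWD
#define AT_FDCWD -100	/* Linux value, used only if the libc did not expose it */
#endif

/*
 * Hypothetical helper (not kernel code): writes into buf the name that the
 * patched do_execveat_common() would store in bprm->filename.
 * Returns 0 on success, -1 if buf is too small.
 */
static int exec_display_path(int fd, const char *name, char *buf, size_t len)
{
	int n;

	if (fd == AT_FDCWD || name[0] == '/')
		n = snprintf(buf, len, "%s", name);		/* used as given */
	else if (name[0] == '\0')
		n = snprintf(buf, len, "/dev/fd/%d", fd);	/* AT_EMPTY_PATH case */
	else
		n = snprintf(buf, len, "/dev/fd/%d/%s", fd, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, fd 5 with an empty path yields "/dev/fd/5", and fd 5 with "script.sh" yields "/dev/fd/5/script.sh". The second form is the one the binfmt_script comment refers to: if fd 5 is close-on-exec, an interpreter could not reopen that path after exec, which is why the loaders return -ENOENT.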
@@ -370,6 +370,7 @@ extern int fat_file_fsync(struct file *file, loff_t start, loff_t end,
 			  int datasync);

 /* fat/inode.c */
+extern int fat_block_truncate_page(struct inode *inode, loff_t from);
 extern void fat_attach(struct inode *inode, loff_t i_pos);
 extern void fat_detach(struct inode *inode);
 extern struct inode *fat_iget(struct super_block *sb, loff_t i_pos);
|
@@ -443,6 +443,9 @@ int fat_setattr(struct dentry *dentry, struct iattr *attr)
 	}

 	if (attr->ia_valid & ATTR_SIZE) {
+		error = fat_block_truncate_page(inode, attr->ia_size);
+		if (error)
+			goto out;
 		down_write(&MSDOS_I(inode)->truncate_lock);
 		truncate_setsize(inode, attr->ia_size);
 		fat_truncate_blocks(inode, attr->ia_size);
|
@@ -294,6 +294,18 @@ static sector_t _fat_bmap(struct address_space *mapping, sector_t block)
 	return blocknr;
 }

+/*
+ * fat_block_truncate_page() zeroes out a mapping from file offset `from'
+ * up to the end of the block which corresponds to `from'.
+ * This is required during truncate to physically zeroout the tail end
+ * of that block so it doesn't yield old data if the file is later grown.
+ * Also, avoid causing failure from fsx for cases of "data past EOF"
+ */
+int fat_block_truncate_page(struct inode *inode, loff_t from)
+{
+	return block_truncate_page(inode->i_mapping, from, fat_get_block);
+}
+
 static const struct address_space_operations fat_aops = {
 	.readpage = fat_readpage,
 	.readpages = fat_readpages,
|
@@ -412,10 +412,10 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	pgoff = offset >> PAGE_SHIFT;

 	i_size_write(inode, offset);
-	mutex_lock(&mapping->i_mmap_mutex);
+	i_mmap_lock_write(mapping);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap))
 		hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
-	mutex_unlock(&mapping->i_mmap_mutex);
+	i_mmap_unlock_write(mapping);
 	truncate_hugepages(inode, offset);
 	return 0;
 }
@@ -472,12 +472,12 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
 }

 /*
- * Hugetlbfs is not reclaimable; therefore its i_mmap_mutex will never
+ * Hugetlbfs is not reclaimable; therefore its i_mmap_rwsem will never
  * be taken from reclaim -- unlike regular filesystems. This needs an
  * annotation because huge_pmd_share() does an allocation under
- * i_mmap_mutex.
+ * i_mmap_rwsem.
  */
-static struct lock_class_key hugetlbfs_i_mmap_mutex_key;
+static struct lock_class_key hugetlbfs_i_mmap_rwsem_key;

 static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 					struct inode *dir,
@@ -495,8 +495,8 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 		struct hugetlbfs_inode_info *info;
 		inode->i_ino = get_next_ino();
 		inode_init_owner(inode, dir, mode);
-		lockdep_set_class(&inode->i_mapping->i_mmap_mutex,
-				&hugetlbfs_i_mmap_mutex_key);
+		lockdep_set_class(&inode->i_mapping->i_mmap_rwsem,
+				&hugetlbfs_i_mmap_rwsem_key);
 		inode->i_mapping->a_ops = &hugetlbfs_aops;
 		inode->i_mapping->backing_dev_info =&hugetlbfs_backing_dev_info;
 		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
|
@@ -346,7 +346,7 @@ void address_space_init_once(struct address_space *mapping)
 	memset(mapping, 0, sizeof(*mapping));
 	INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC);
 	spin_lock_init(&mapping->tree_lock);
-	mutex_init(&mapping->i_mmap_mutex);
+	init_rwsem(&mapping->i_mmap_rwsem);
 	INIT_LIST_HEAD(&mapping->private_list);
 	spin_lock_init(&mapping->private_lock);
 	mapping->i_mmap = RB_ROOT;
|
@@ -130,7 +130,7 @@ void final_putname(struct filename *name)

 #define EMBEDDED_NAME_MAX (PATH_MAX - sizeof(struct filename))

-static struct filename *
+struct filename *
 getname_flags(const char __user *filename, int flags, int *empty)
 {
 	struct filename *result, *err;
|
@@ -69,8 +69,8 @@ static void dnotify_recalc_inode_mask(struct fsnotify_mark *fsn_mark)
 	if (old_mask == new_mask)
 		return;

-	if (fsn_mark->i.inode)
-		fsnotify_recalc_inode_mask(fsn_mark->i.inode);
+	if (fsn_mark->inode)
+		fsnotify_recalc_inode_mask(fsn_mark->inode);
 }

 /*
|
@@ -80,7 +80,7 @@ static void inotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 		return;

 	inode_mark = container_of(mark, struct inotify_inode_mark, fsn_mark);
-	inode = igrab(mark->i.inode);
+	inode = igrab(mark->inode);
 	if (inode) {
 		seq_printf(m, "inotify wd:%x ino:%lx sdev:%x mask:%x ignored_mask:%x ",
 			   inode_mark->wd, inode->i_ino, inode->i_sb->s_dev,
@@ -112,7 +112,7 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 		mflags |= FAN_MARK_IGNORED_SURV_MODIFY;

 	if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
-		inode = igrab(mark->i.inode);
+		inode = igrab(mark->inode);
 		if (!inode)
 			return;
 		seq_printf(m, "fanotify ino:%lx sdev:%x mflags:%x mask:%x ignored_mask:%x ",
@@ -122,7 +122,7 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 		seq_putc(m, '\n');
 		iput(inode);
 	} else if (mark->flags & FSNOTIFY_MARK_FLAG_VFSMOUNT) {
-		struct mount *mnt = real_mount(mark->m.mnt);
+		struct mount *mnt = real_mount(mark->mnt);

 		seq_printf(m, "fanotify mnt_id:%x mflags:%x mask:%x ignored_mask:%x\n",
 			   mnt->mnt_id, mflags, mark->mask, mark->ignored_mask);
|
@@ -242,13 +242,13 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is,

 		if (inode_node) {
 			inode_mark = hlist_entry(srcu_dereference(inode_node, &fsnotify_mark_srcu),
-						 struct fsnotify_mark, i.i_list);
+						 struct fsnotify_mark, obj_list);
 			inode_group = inode_mark->group;
 		}

 		if (vfsmount_node) {
 			vfsmount_mark = hlist_entry(srcu_dereference(vfsmount_node, &fsnotify_mark_srcu),
-						    struct fsnotify_mark, m.m_list);
+						    struct fsnotify_mark, obj_list);
 			vfsmount_group = vfsmount_mark->group;
 		}

|
@@ -12,12 +12,19 @@ extern void fsnotify_flush_notify(struct fsnotify_group *group);
 /* protects reads of inode and vfsmount marks list */
 extern struct srcu_struct fsnotify_mark_srcu;

+/* Calculate mask of events for a list of marks */
+extern u32 fsnotify_recalc_mask(struct hlist_head *head);
+
 /* compare two groups for sorting of marks lists */
 extern int fsnotify_compare_groups(struct fsnotify_group *a,
 				   struct fsnotify_group *b);

 extern void fsnotify_set_inode_mark_mask_locked(struct fsnotify_mark *fsn_mark,
 						__u32 mask);
+/* Add mark to a proper place in mark list */
+extern int fsnotify_add_mark_list(struct hlist_head *head,
+				  struct fsnotify_mark *mark,
+				  int allow_dups);
 /* add a mark to an inode */
 extern int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
 				   struct fsnotify_group *group, struct inode *inode,
@@ -31,6 +38,11 @@ extern int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
 extern void fsnotify_destroy_vfsmount_mark(struct fsnotify_mark *mark);
 /* inode specific destruction of a mark */
 extern void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark);
+/* Destroy all marks in the given list */
+extern void fsnotify_destroy_marks(struct list_head *to_free);
+/* Find mark belonging to given group in the list of marks */
+extern struct fsnotify_mark *fsnotify_find_mark(struct hlist_head *head,
+						struct fsnotify_group *group);
 /* run the list of all marks associated with inode and flag them to be freed */
 extern void fsnotify_clear_marks_by_inode(struct inode *inode);
 /* run the list of all marks associated with vfsmount and flag them to be freed */
|
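As a reading aid for the header hunks above: fsnotify_recalc_mask() and fsnotify_find_mark() are declared to take a list head, so the inode and vfsmount code paths can share one list walk. A minimal userspace sketch of what those two walks plausibly do is shown below, based on the removed inode-specific versions elsewhere in this diff. `struct mark`, `recalc_mask` and `find_mark` are hypothetical stand-ins; the sketch uses a plain singly linked list instead of an hlist, and it omits the reference that the kernel lookup takes with fsnotify_get_mark().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct fsnotify_mark and its list linkage. */
struct mark {
	uint32_t mask;		/* events this mark is interested in */
	const void *group;	/* owning group, compared by pointer */
	struct mark *next;
};

/* OR together the masks of every mark on the list (empty list gives 0). */
static uint32_t recalc_mask(const struct mark *head)
{
	uint32_t new_mask = 0;

	for (; head; head = head->next)
		new_mask |= head->mask;
	return new_mask;
}

/* Return the first mark owned by group, or NULL if the list has none. */
static const struct mark *find_mark(const struct mark *head, const void *group)
{
	for (; head; head = head->next)
		if (head->group == group)
			return head;
	return NULL;
}
```

Because neither helper knows whether the list hangs off an inode or a mount, the caller is the one that holds the matching lock and stores the result, as in `inode->i_fsnotify_mask = fsnotify_recalc_mask(&inode->i_fsnotify_marks)` under `inode->i_lock`.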
@@ -30,21 +30,6 @@

 #include "../internal.h"

-/*
- * Recalculate the mask of events relevant to a given inode locked.
- */
-static void fsnotify_recalc_inode_mask_locked(struct inode *inode)
-{
-	struct fsnotify_mark *mark;
-	__u32 new_mask = 0;
-
-	assert_spin_locked(&inode->i_lock);
-
-	hlist_for_each_entry(mark, &inode->i_fsnotify_marks, i.i_list)
-		new_mask |= mark->mask;
-	inode->i_fsnotify_mask = new_mask;
-}
-
 /*
  * Recalculate the inode->i_fsnotify_mask, or the mask of all FS_* event types
  * any notifier is interested in hearing for this inode.
@@ -52,7 +37,7 @@ static void fsnotify_recalc_inode_mask_locked(struct inode *inode)
 void fsnotify_recalc_inode_mask(struct inode *inode)
 {
 	spin_lock(&inode->i_lock);
-	fsnotify_recalc_inode_mask_locked(inode);
+	inode->i_fsnotify_mask = fsnotify_recalc_mask(&inode->i_fsnotify_marks);
 	spin_unlock(&inode->i_lock);

 	__fsnotify_update_child_dentry_flags(inode);
@@ -60,23 +45,22 @@ void fsnotify_recalc_inode_mask(struct inode *inode)

 void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark)
 {
-	struct inode *inode = mark->i.inode;
+	struct inode *inode = mark->inode;

 	BUG_ON(!mutex_is_locked(&mark->group->mark_mutex));
 	assert_spin_locked(&mark->lock);

 	spin_lock(&inode->i_lock);

-	hlist_del_init_rcu(&mark->i.i_list);
-	mark->i.inode = NULL;
+	hlist_del_init_rcu(&mark->obj_list);
+	mark->inode = NULL;

 	/*
 	 * this mark is now off the inode->i_fsnotify_marks list and we
 	 * hold the inode->i_lock, so this is the perfect time to update the
 	 * inode->i_fsnotify_mask
 	 */
-	fsnotify_recalc_inode_mask_locked(inode);
-
+	inode->i_fsnotify_mask = fsnotify_recalc_mask(&inode->i_fsnotify_marks);
 	spin_unlock(&inode->i_lock);
 }

@@ -85,30 +69,19 @@ void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark)
  */
 void fsnotify_clear_marks_by_inode(struct inode *inode)
 {
-	struct fsnotify_mark *mark, *lmark;
+	struct fsnotify_mark *mark;
 	struct hlist_node *n;
 	LIST_HEAD(free_list);

 	spin_lock(&inode->i_lock);
-	hlist_for_each_entry_safe(mark, n, &inode->i_fsnotify_marks, i.i_list) {
-		list_add(&mark->i.free_i_list, &free_list);
-		hlist_del_init_rcu(&mark->i.i_list);
+	hlist_for_each_entry_safe(mark, n, &inode->i_fsnotify_marks, obj_list) {
+		list_add(&mark->free_list, &free_list);
+		hlist_del_init_rcu(&mark->obj_list);
 		fsnotify_get_mark(mark);
 	}
 	spin_unlock(&inode->i_lock);

-	list_for_each_entry_safe(mark, lmark, &free_list, i.free_i_list) {
-		struct fsnotify_group *group;
-
-		spin_lock(&mark->lock);
-		fsnotify_get_group(mark->group);
-		group = mark->group;
-		spin_unlock(&mark->lock);
-
-		fsnotify_destroy_mark(mark, group);
-		fsnotify_put_mark(mark);
-		fsnotify_put_group(group);
-	}
+	fsnotify_destroy_marks(&free_list);
 }

 /*
@@ -119,27 +92,6 @@ void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group)
 	fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_MARK_FLAG_INODE);
 }

-/*
- * given a group and inode, find the mark associated with that combination.
- * if found take a reference to that mark and return it, else return NULL
- */
-static struct fsnotify_mark *fsnotify_find_inode_mark_locked(
-		struct fsnotify_group *group,
-		struct inode *inode)
-{
-	struct fsnotify_mark *mark;
-
-	assert_spin_locked(&inode->i_lock);
-
-	hlist_for_each_entry(mark, &inode->i_fsnotify_marks, i.i_list) {
-		if (mark->group == group) {
-			fsnotify_get_mark(mark);
-			return mark;
-		}
-	}
-	return NULL;
-}
-
 /*
  * given a group and inode, find the mark associated with that combination.
  * if found take a reference to that mark and return it, else return NULL
@@ -150,7 +102,7 @@ struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group,
 	struct fsnotify_mark *mark;

 	spin_lock(&inode->i_lock);
-	mark = fsnotify_find_inode_mark_locked(group, inode);
+	mark = fsnotify_find_mark(&inode->i_fsnotify_marks, group);
 	spin_unlock(&inode->i_lock);

 	return mark;
@@ -168,10 +120,10 @@ void fsnotify_set_inode_mark_mask_locked(struct fsnotify_mark *mark,
 	assert_spin_locked(&mark->lock);

 	if (mask &&
-	    mark->i.inode &&
+	    mark->inode &&
 	    !(mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED)) {
 		mark->flags |= FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
-		inode = igrab(mark->i.inode);
+		inode = igrab(mark->inode);
 		/*
 		 * we shouldn't be able to get here if the inode wasn't
 		 * already safely held in memory. But bug in case it
@@ -192,9 +144,7 @@ int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
 			    struct fsnotify_group *group, struct inode *inode,
 			    int allow_dups)
 {
-	struct fsnotify_mark *lmark, *last = NULL;
-	int ret = 0;
-	int cmp;
+	int ret;

 	mark->flags |= FSNOTIFY_MARK_FLAG_INODE;

@@ -202,37 +152,10 @@ int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
 	assert_spin_locked(&mark->lock);

 	spin_lock(&inode->i_lock);
-
-	mark->i.inode = inode;
-
-	/* is mark the first mark? */
-	if (hlist_empty(&inode->i_fsnotify_marks)) {
-		hlist_add_head_rcu(&mark->i.i_list, &inode->i_fsnotify_marks);
-		goto out;
-	}
-
-	/* should mark be in the middle of the current list? */
-	hlist_for_each_entry(lmark, &inode->i_fsnotify_marks, i.i_list) {
-		last = lmark;
-
-		if ((lmark->group == group) && !allow_dups) {
-			ret = -EEXIST;
-			goto out;
-		}
-
-		cmp = fsnotify_compare_groups(lmark->group, mark->group);
-		if (cmp < 0)
-			continue;
-
-		hlist_add_before_rcu(&mark->i.i_list, &lmark->i.i_list);
-		goto out;
-	}
-
-	BUG_ON(last == NULL);
-	/* mark should be the last entry. last is the current last entry */
-	hlist_add_behind_rcu(&mark->i.i_list, &last->i.i_list);
-out:
-	fsnotify_recalc_inode_mask_locked(inode);
+	mark->inode = inode;
+	ret = fsnotify_add_mark_list(&inode->i_fsnotify_marks, mark,
+				     allow_dups);
+	inode->i_fsnotify_mask = fsnotify_recalc_mask(&inode->i_fsnotify_marks);
 	spin_unlock(&inode->i_lock);

 	return ret;
@@ -156,7 +156,7 @@ static int idr_callback(int id, void *p, void *data)
 	 */
 	if (fsn_mark)
 		printk(KERN_WARNING "fsn_mark->group=%p inode=%p wd=%d\n",
-			fsn_mark->group, fsn_mark->i.inode, i_mark->wd);
+			fsn_mark->group, fsn_mark->inode, i_mark->wd);
 	return 0;
 }

@@ -433,7 +433,7 @@ static void inotify_remove_from_idr(struct fsnotify_group *group,
 	if (wd == -1) {
 		WARN_ONCE(1, "%s: i_mark=%p i_mark->wd=%d i_mark->group=%p"
 			" i_mark->inode=%p\n", __func__, i_mark, i_mark->wd,
-			i_mark->fsn_mark.group, i_mark->fsn_mark.i.inode);
+			i_mark->fsn_mark.group, i_mark->fsn_mark.inode);
 		goto out;
 	}

@@ -442,7 +442,7 @@ static void inotify_remove_from_idr(struct fsnotify_group *group,
 	if (unlikely(!found_i_mark)) {
 		WARN_ONCE(1, "%s: i_mark=%p i_mark->wd=%d i_mark->group=%p"
 			" i_mark->inode=%p\n", __func__, i_mark, i_mark->wd,
-			i_mark->fsn_mark.group, i_mark->fsn_mark.i.inode);
+			i_mark->fsn_mark.group, i_mark->fsn_mark.inode);
 		goto out;
 	}

@@ -456,9 +456,9 @@ static void inotify_remove_from_idr(struct fsnotify_group *group,
 			"mark->inode=%p found_i_mark=%p found_i_mark->wd=%d "
 			"found_i_mark->group=%p found_i_mark->inode=%p\n",
 			__func__, i_mark, i_mark->wd, i_mark->fsn_mark.group,
-			i_mark->fsn_mark.i.inode, found_i_mark, found_i_mark->wd,
+			i_mark->fsn_mark.inode, found_i_mark, found_i_mark->wd,
 			found_i_mark->fsn_mark.group,
-			found_i_mark->fsn_mark.i.inode);
+			found_i_mark->fsn_mark.inode);
 		goto out;
 	}

@@ -470,7 +470,7 @@ static void inotify_remove_from_idr(struct fsnotify_group *group,
 	if (unlikely(atomic_read(&i_mark->fsn_mark.refcnt) < 3)) {
 		printk(KERN_ERR "%s: i_mark=%p i_mark->wd=%d i_mark->group=%p"
 			" i_mark->inode=%p\n", __func__, i_mark, i_mark->wd,
-			i_mark->fsn_mark.group, i_mark->fsn_mark.i.inode);
+			i_mark->fsn_mark.group, i_mark->fsn_mark.inode);
 		/* we can't really recover with bad ref cnting.. */
 		BUG();
 	}
@@ -110,6 +110,17 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
 	}
 }

+/* Calculate mask of events for a list of marks */
+u32 fsnotify_recalc_mask(struct hlist_head *head)
+{
+	u32 new_mask = 0;
+	struct fsnotify_mark *mark;
+
+	hlist_for_each_entry(mark, head, obj_list)
+		new_mask |= mark->mask;
+	return new_mask;
+}
+
 /*
  * Any time a mark is getting freed we end up here.
  * The caller had better be holding a reference to this mark so we don't actually
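The mask aggregation done by fsnotify_recalc_mask() can be tried outside the kernel with a small user-space sketch. Everything named demo_* below is hypothetical and not part of the patch: a plain singly linked list stands in for the kernel hlist, and only the mask field of a mark is modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct fsnotify_mark: just the event mask
 * and a "next" pointer replacing the kernel's hlist linkage. */
struct demo_mark {
	uint32_t mask;
	struct demo_mark *next;
};

/* OR together the masks of every mark on the list. An empty list
 * yields 0, meaning nobody is interested in any event. */
static uint32_t demo_recalc_mask(const struct demo_mark *head)
{
	uint32_t new_mask = 0;
	const struct demo_mark *m;

	for (m = head; m != NULL; m = m->next)
		new_mask |= m->mask;
	return new_mask;
}
```

Because the helper only depends on the list head, the same routine can serve both the inode and the vfsmount lists, which appears to be the point of moving it out of the per-object files.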
@@ -133,7 +144,7 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
 	mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;

 	if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
-		inode = mark->i.inode;
+		inode = mark->inode;
 		fsnotify_destroy_inode_mark(mark);
 	} else if (mark->flags & FSNOTIFY_MARK_FLAG_VFSMOUNT)
 		fsnotify_destroy_vfsmount_mark(mark);
@@ -150,7 +161,7 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
 	mutex_unlock(&group->mark_mutex);

 	spin_lock(&destroy_lock);
-	list_add(&mark->destroy_list, &destroy_list);
+	list_add(&mark->g_list, &destroy_list);
 	spin_unlock(&destroy_lock);
 	wake_up(&destroy_waitq);
 	/*
@@ -192,6 +203,27 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 	mutex_unlock(&group->mark_mutex);
 }

+/*
+ * Destroy all marks in the given list. The marks must be already detached from
+ * the original inode / vfsmount.
+ */
+void fsnotify_destroy_marks(struct list_head *to_free)
+{
+	struct fsnotify_mark *mark, *lmark;
+	struct fsnotify_group *group;
+
+	list_for_each_entry_safe(mark, lmark, to_free, free_list) {
+		spin_lock(&mark->lock);
+		fsnotify_get_group(mark->group);
+		group = mark->group;
+		spin_unlock(&mark->lock);
+
+		fsnotify_destroy_mark(mark, group);
+		fsnotify_put_mark(mark);
+		fsnotify_put_group(group);
+	}
+}
+
 void fsnotify_set_mark_mask_locked(struct fsnotify_mark *mark, __u32 mask)
 {
 	assert_spin_locked(&mark->lock);
@@ -245,6 +277,39 @@ int fsnotify_compare_groups(struct fsnotify_group *a, struct fsnotify_group *b)
 	return -1;
 }

+/* Add mark into proper place in given list of marks */
+int fsnotify_add_mark_list(struct hlist_head *head, struct fsnotify_mark *mark,
+			   int allow_dups)
+{
+	struct fsnotify_mark *lmark, *last = NULL;
+	int cmp;
+
+	/* is mark the first mark? */
+	if (hlist_empty(head)) {
+		hlist_add_head_rcu(&mark->obj_list, head);
+		return 0;
+	}
+
+	/* should mark be in the middle of the current list? */
+	hlist_for_each_entry(lmark, head, obj_list) {
+		last = lmark;
+
+		if ((lmark->group == mark->group) && !allow_dups)
+			return -EEXIST;
+
+		cmp = fsnotify_compare_groups(lmark->group, mark->group);
+		if (cmp >= 0) {
+			hlist_add_before_rcu(&mark->obj_list, &lmark->obj_list);
+			return 0;
+		}
+	}
+
+	BUG_ON(last == NULL);
+	/* mark should be the last entry. last is the current last entry */
+	hlist_add_behind_rcu(&mark->obj_list, &last->obj_list);
+	return 0;
+}
+
 /*
  * Attach an initialized mark to a given group and fs object.
  * These marks may be used for the fsnotify backend to determine which
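The ordered insertion performed by fsnotify_add_mark_list() (reject a second mark for the same group unless duplicates are allowed, otherwise insert before the first entry that should sort after the new one) can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: the demo_* names, the integer group id and the integer priority are all hypothetical, and no RCU or locking is modelled.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a mark: "group" replaces the group pointer,
 * "prio" replaces the group's priority (higher sorts earlier). */
struct demo_mark {
	int group;
	int prio;
	struct demo_mark *next;
};

/* < 0: a sorts before b, 0: same group, > 0: a sorts after b. */
static int demo_compare(const struct demo_mark *a, const struct demo_mark *b)
{
	if (a->group == b->group)
		return 0;
	if (a->prio != b->prio)
		return a->prio > b->prio ? -1 : 1;
	return a->group > b->group ? -1 : 1;	/* arbitrary but stable tie-break */
}

/* Insert mark into the sorted list at *head. Returns 0 on success or
 * -EEXIST if a mark for the same group is met and allow_dups is 0. */
static int demo_add_mark_list(struct demo_mark **head, struct demo_mark *mark,
			      int allow_dups)
{
	struct demo_mark **link;

	for (link = head; *link != NULL; link = &(*link)->next) {
		struct demo_mark *lmark = *link;

		if (lmark->group == mark->group && !allow_dups)
			return -EEXIST;
		if (demo_compare(lmark, mark) >= 0)
			break;		/* mark belongs in front of lmark */
	}
	/* Either we stopped in front of lmark or ran off the end (append). */
	mark->next = *link;
	*link = mark;
	return 0;
}
```

Walking a pointer-to-pointer folds the three cases of the kernel function (empty list, insert in the middle, append at the tail) into one code path; the kernel version keeps them separate because the hlist RCU helpers differ per case.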
@@ -305,7 +370,7 @@ err:
 	spin_unlock(&mark->lock);

 	spin_lock(&destroy_lock);
-	list_add(&mark->destroy_list, &destroy_list);
+	list_add(&mark->g_list, &destroy_list);
 	spin_unlock(&destroy_lock);
 	wake_up(&destroy_waitq);

@@ -322,6 +387,24 @@ int fsnotify_add_mark(struct fsnotify_mark *mark, struct fsnotify_group *group,
 	return ret;
 }

+/*
+ * Given a list of marks, find the mark associated with given group. If found
+ * take a reference to that mark and return it, else return NULL.
+ */
+struct fsnotify_mark *fsnotify_find_mark(struct hlist_head *head,
+					 struct fsnotify_group *group)
+{
+	struct fsnotify_mark *mark;
+
+	hlist_for_each_entry(mark, head, obj_list) {
+		if (mark->group == group) {
+			fsnotify_get_mark(mark);
+			return mark;
+		}
+	}
+	return NULL;
+}
+
 /*
  * clear any marks in a group in which mark->flags & flags is true
  */
@@ -352,8 +435,8 @@ void fsnotify_clear_marks_by_group(struct fsnotify_group *group)
 void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old)
 {
 	assert_spin_locked(&old->lock);
-	new->i.inode = old->i.inode;
-	new->m.mnt = old->m.mnt;
+	new->inode = old->inode;
+	new->mnt = old->mnt;
 	if (old->group)
 		fsnotify_get_group(old->group);
 	new->group = old->group;
@@ -386,8 +469,8 @@ static int fsnotify_mark_destroy(void *ignored)

 		synchronize_srcu(&fsnotify_mark_srcu);

-		list_for_each_entry_safe(mark, next, &private_destroy_list, destroy_list) {
-			list_del_init(&mark->destroy_list);
+		list_for_each_entry_safe(mark, next, &private_destroy_list, g_list) {
+			list_del_init(&mark->g_list);
 			fsnotify_put_mark(mark);
 		}

@@ -32,31 +32,20 @@

 void fsnotify_clear_marks_by_mount(struct vfsmount *mnt)
 {
-	struct fsnotify_mark *mark, *lmark;
+	struct fsnotify_mark *mark;
 	struct hlist_node *n;
 	struct mount *m = real_mount(mnt);
 	LIST_HEAD(free_list);

 	spin_lock(&mnt->mnt_root->d_lock);
-	hlist_for_each_entry_safe(mark, n, &m->mnt_fsnotify_marks, m.m_list) {
-		list_add(&mark->m.free_m_list, &free_list);
-		hlist_del_init_rcu(&mark->m.m_list);
+	hlist_for_each_entry_safe(mark, n, &m->mnt_fsnotify_marks, obj_list) {
+		list_add(&mark->free_list, &free_list);
+		hlist_del_init_rcu(&mark->obj_list);
 		fsnotify_get_mark(mark);
 	}
 	spin_unlock(&mnt->mnt_root->d_lock);

-	list_for_each_entry_safe(mark, lmark, &free_list, m.free_m_list) {
-		struct fsnotify_group *group;
-
-		spin_lock(&mark->lock);
-		fsnotify_get_group(mark->group);
-		group = mark->group;
-		spin_unlock(&mark->lock);
-
-		fsnotify_destroy_mark(mark, group);
-		fsnotify_put_mark(mark);
-		fsnotify_put_group(group);
-	}
+	fsnotify_destroy_marks(&free_list);
 }

 void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group)
@ -64,67 +53,36 @@ void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group)
|
||||
fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_MARK_FLAG_VFSMOUNT);
|
||||
}
|
||||
|
||||
/*
|
||||
* Recalculate the mask of events relevant to a given vfsmount locked.
|
||||
*/
|
||||
static void fsnotify_recalc_vfsmount_mask_locked(struct vfsmount *mnt)
|
||||
{
|
||||
struct mount *m = real_mount(mnt);
|
||||
struct fsnotify_mark *mark;
|
||||
__u32 new_mask = 0;
|
||||
|
||||
assert_spin_locked(&mnt->mnt_root->d_lock);
|
||||
|
||||
hlist_for_each_entry(mark, &m->mnt_fsnotify_marks, m.m_list)
|
||||
new_mask |= mark->mask;
|
||||
m->mnt_fsnotify_mask = new_mask;
|
||||
}
|
||||
|
||||
/*
|
||||
* Recalculate the mnt->mnt_fsnotify_mask, or the mask of all FS_* event types
|
||||
* any notifier is interested in hearing for this mount point
|
||||
*/
|
||||
void fsnotify_recalc_vfsmount_mask(struct vfsmount *mnt)
|
||||
{
|
||||
struct mount *m = real_mount(mnt);
|
||||
|
||||
spin_lock(&mnt->mnt_root->d_lock);
|
||||
fsnotify_recalc_vfsmount_mask_locked(mnt);
|
||||
m->mnt_fsnotify_mask = fsnotify_recalc_mask(&m->mnt_fsnotify_marks);
|
||||
spin_unlock(&mnt->mnt_root->d_lock);
|
||||
}
|
||||
|
||||
void fsnotify_destroy_vfsmount_mark(struct fsnotify_mark *mark)
|
||||
{
|
||||
struct vfsmount *mnt = mark->m.mnt;
|
||||
struct vfsmount *mnt = mark->mnt;
|
||||
struct mount *m = real_mount(mnt);
|
||||
|
||||
BUG_ON(!mutex_is_locked(&mark->group->mark_mutex));
|
||||
assert_spin_locked(&mark->lock);
|
||||
|
||||
spin_lock(&mnt->mnt_root->d_lock);
|
||||
|
||||
hlist_del_init_rcu(&mark->m.m_list);
|
||||
mark->m.mnt = NULL;
|
||||
|
||||
fsnotify_recalc_vfsmount_mask_locked(mnt);
|
||||
hlist_del_init_rcu(&mark->obj_list);
|
||||
mark->mnt = NULL;
|
||||
|
||||
m->mnt_fsnotify_mask = fsnotify_recalc_mask(&m->mnt_fsnotify_marks);
|
||||
spin_unlock(&mnt->mnt_root->d_lock);
|
||||
}
|
||||
|
||||
static struct fsnotify_mark *fsnotify_find_vfsmount_mark_locked(struct fsnotify_group *group,
|
||||
struct vfsmount *mnt)
|
||||
{
|
||||
struct mount *m = real_mount(mnt);
|
||||
struct fsnotify_mark *mark;
|
||||
|
||||
assert_spin_locked(&mnt->mnt_root->d_lock);
|
||||
|
||||
hlist_for_each_entry(mark, &m->mnt_fsnotify_marks, m.m_list) {
|
||||
if (mark->group == group) {
|
||||
fsnotify_get_mark(mark);
|
||||
return mark;
|
||||
}
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
/*
|
||||
* given a group and vfsmount, find the mark associated with that combination.
|
||||
* if found take a reference to that mark and return it, else return NULL
|
||||
@@ -132,10 +90,11 @@ static struct fsnotify_mark *fsnotify_find_vfsmount_mark_locked(struct fsnotify_
 struct fsnotify_mark *fsnotify_find_vfsmount_mark(struct fsnotify_group *group,
 						  struct vfsmount *mnt)
 {
+	struct mount *m = real_mount(mnt);
 	struct fsnotify_mark *mark;

 	spin_lock(&mnt->mnt_root->d_lock);
-	mark = fsnotify_find_vfsmount_mark_locked(group, mnt);
+	mark = fsnotify_find_mark(&m->mnt_fsnotify_marks, group);
 	spin_unlock(&mnt->mnt_root->d_lock);

 	return mark;
@@ -151,9 +110,7 @@ int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
 			       int allow_dups)
 {
 	struct mount *m = real_mount(mnt);
-	struct fsnotify_mark *lmark, *last = NULL;
-	int ret = 0;
-	int cmp;
+	int ret;

 	mark->flags |= FSNOTIFY_MARK_FLAG_VFSMOUNT;

@@ -161,37 +118,9 @@ int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
 	assert_spin_locked(&mark->lock);

 	spin_lock(&mnt->mnt_root->d_lock);
-
-	mark->m.mnt = mnt;
-
-	/* is mark the first mark? */
-	if (hlist_empty(&m->mnt_fsnotify_marks)) {
-		hlist_add_head_rcu(&mark->m.m_list, &m->mnt_fsnotify_marks);
-		goto out;
-	}
-
-	/* should mark be in the middle of the current list? */
-	hlist_for_each_entry(lmark, &m->mnt_fsnotify_marks, m.m_list) {
-		last = lmark;
-
-		if ((lmark->group == group) && !allow_dups) {
-			ret = -EEXIST;
-			goto out;
-		}
-
-		cmp = fsnotify_compare_groups(lmark->group, mark->group);
-		if (cmp < 0)
-			continue;
-
-		hlist_add_before_rcu(&mark->m.m_list, &lmark->m.m_list);
-		goto out;
-	}
-
-	BUG_ON(last == NULL);
-	/* mark should be the last entry. last is the current last entry */
-	hlist_add_behind_rcu(&mark->m.m_list, &last->m.m_list);
-out:
-	fsnotify_recalc_vfsmount_mask_locked(mnt);
+	mark->mnt = mnt;
+	ret = fsnotify_add_mark_list(&m->mnt_fsnotify_marks, mark, allow_dups);
+	m->mnt_fsnotify_mask = fsnotify_recalc_mask(&m->mnt_fsnotify_marks);
 	spin_unlock(&mnt->mnt_root->d_lock);

 	return ret;
11
fs/open.c
@@ -295,6 +295,17 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)

 	sb_start_write(inode->i_sb);
 	ret = file->f_op->fallocate(file, mode, offset, len);
+
+	/*
+	 * Create inotify and fanotify events.
+	 *
+	 * To keep the logic simple always create events if fallocate succeeds.
+	 * This implies that events are even created if the file size remains
+	 * unchanged, e.g. when using flag FALLOC_FL_KEEP_SIZE.
+	 */
+	if (ret == 0)
+		fsnotify_modify(file);
+
 	sb_end_write(inode->i_sb);
 	return ret;
 }
@@ -25,7 +25,11 @@ static void *seq_buf_alloc(unsigned long size)
 {
 	void *buf;

-	buf = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
+	/*
+	 * __GFP_NORETRY to avoid oom-killings with high-order allocations -
+	 * it's better to fall back to vmalloc() than to kill things.
+	 */
+	buf = kmalloc(size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
 	if (!buf && size > PAGE_SIZE)
 		buf = vmalloc(size);
 	return buf;
@@ -53,6 +53,10 @@ struct linux_binprm {
 #define BINPRM_FLAGS_EXECFD_BIT 1
 #define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)

+/* filename of the binary will be inaccessible after exec */
+#define BINPRM_FLAGS_PATH_INACCESSIBLE_BIT 2
+#define BINPRM_FLAGS_PATH_INACCESSIBLE (1 << BINPRM_FLAGS_PATH_INACCESSIBLE_BIT)
+
 /* Function parameter for binfmt->coredump */
 struct coredump_params {
 	const siginfo_t *siginfo;
@@ -45,6 +45,7 @@
  * bitmap_set(dst, pos, nbits)			Set specified bit area
  * bitmap_clear(dst, pos, nbits)		Clear specified bit area
  * bitmap_find_next_zero_area(buf, len, pos, n, mask)	Find bit free area
+ * bitmap_find_next_zero_area_off(buf, len, pos, n, mask)	as above
  * bitmap_shift_right(dst, src, n, nbits)	*dst = *src >> n
  * bitmap_shift_left(dst, src, n, nbits)	*dst = *src << n
  * bitmap_remap(dst, src, old, new, nbits)	*dst = map(old, new)(src)
@@ -114,11 +115,36 @@ extern int __bitmap_weight(const unsigned long *bitmap, unsigned int nbits);

 extern void bitmap_set(unsigned long *map, unsigned int start, int len);
 extern void bitmap_clear(unsigned long *map, unsigned int start, int len);
-extern unsigned long bitmap_find_next_zero_area(unsigned long *map,
-					 unsigned long size,
-					 unsigned long start,
-					 unsigned int nr,
-					 unsigned long align_mask);
+
+extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
+						    unsigned long size,
+						    unsigned long start,
+						    unsigned int nr,
+						    unsigned long align_mask,
+						    unsigned long align_offset);
+
+/**
+ * bitmap_find_next_zero_area - find a contiguous aligned zero area
+ * @map: The address to base the search on
+ * @size: The bitmap size in bits
+ * @start: The bitnumber to start searching at
+ * @nr: The number of zeroed bits we're looking for
+ * @align_mask: Alignment mask for zero area
+ *
+ * The @align_mask should be one less than a power of 2; the effect is that
+ * the bit offset of all zero areas this function finds is multiples of that
+ * power of 2. A @align_mask of 0 means no alignment is required.
+ */
+static inline unsigned long
+bitmap_find_next_zero_area(unsigned long *map,
+			   unsigned long size,
+			   unsigned long start,
+			   unsigned int nr,
+			   unsigned long align_mask)
+{
+	return bitmap_find_next_zero_area_off(map, size, start, nr,
+					      align_mask, 0);
+}

 extern int bitmap_scnprintf(char *buf, unsigned int len,
 			const unsigned long *src, int nbits);
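The alignment rule in the kernel-doc comment (an @align_mask of one less than a power of 2, with the _off variant shifting the alignment by an offset) can be illustrated with a user-space sketch. The demo_* names are hypothetical, the bit scanning is deliberately naive, and the failure convention (returning a value greater than size) is an assumption of this sketch rather than something shown in the diff.

```c
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Read one bit from a bitmap stored as an array of unsigned long. */
static int demo_test_bit(const unsigned long *map, unsigned long bit)
{
	return (int)((map[bit / DEMO_BITS_PER_LONG] >>
		      (bit % DEMO_BITS_PER_LONG)) & 1UL);
}

/* Find the first index >= start at which nr zero bits begin and for which
 * (index + align_offset) is a multiple of (align_mask + 1).
 * Returns a value greater than size when no such area fits. */
static unsigned long demo_find_zero_area_off(const unsigned long *map,
					     unsigned long size,
					     unsigned long start,
					     unsigned int nr,
					     unsigned long align_mask,
					     unsigned long align_offset)
{
	unsigned long index = start, i;

	for (;;) {
		/* Skip over bits that are already set. */
		while (index < size && demo_test_bit(map, index))
			index++;
		/* Round (index + offset) up to the alignment, then undo the offset. */
		index = ((index + align_offset + align_mask) & ~align_mask)
			- align_offset;
		if (index + nr > size)
			return index + nr;	/* does not fit */
		for (i = index; i < index + nr; i++)
			if (demo_test_bit(map, i))
				break;
		if (i == index + nr)
			return index;		/* nr zero bits found */
		index = i + 1;			/* retry after the set bit */
	}
}

/* Offset-free wrapper, mirroring the shape of the new inline helper. */
static unsigned long demo_find_zero_area(const unsigned long *map,
					 unsigned long size,
					 unsigned long start,
					 unsigned int nr,
					 unsigned long align_mask)
{
	return demo_find_zero_area_off(map, size, start, nr, align_mask, 0);
}
```

With bits 0, 1, 2 and 5 set, a request for two zero bits with align_mask 3 skips the candidate at bit 4 (bit 5 is taken) and lands on bit 8, the next multiple of 4.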
@@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);

 asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
 		     const compat_uptr_t __user *envp);
+asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
+		     const compat_uptr_t __user *argv,
+		     const compat_uptr_t __user *envp, int flags);

 asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp,
 		compat_ulong_t __user *outp, compat_ulong_t __user *exp,
@@ -5,6 +5,7 @@

 #include <linux/types.h>
 #include <linux/debugfs.h>
+#include <linux/ratelimit.h>
 #include <linux/atomic.h>

 /*
@@ -25,14 +26,18 @@ struct fault_attr {
 	unsigned long reject_end;

 	unsigned long count;
+	struct ratelimit_state ratelimit_state;
+	struct dentry *dname;
 };

-#define FAULT_ATTR_INITIALIZER {				\
-		.interval = 1,					\
-		.times = ATOMIC_INIT(1),			\
-		.require_end = ULONG_MAX,			\
-		.stacktrace_depth = 32,				\
-		.verbose = 2,					\
+#define FAULT_ATTR_INITIALIZER {					\
+		.interval = 1,						\
+		.times = ATOMIC_INIT(1),				\
+		.require_end = ULONG_MAX,				\
+		.stacktrace_depth = 32,					\
+		.ratelimit_state = RATELIMIT_STATE_INIT_DISABLED,	\
+		.verbose = 2,						\
+		.dname = NULL,						\
 	}

 #define DECLARE_FAULT_ATTR(name) struct fault_attr name = FAULT_ATTR_INITIALIZER
@@ -18,6 +18,7 @@
 #include <linux/pid.h>
 #include <linux/bug.h>
 #include <linux/mutex.h>
+#include <linux/rwsem.h>
 #include <linux/capability.h>
 #include <linux/semaphore.h>
 #include <linux/fiemap.h>
@@ -401,7 +402,7 @@ struct address_space {
 	atomic_t		i_mmap_writable;/* count VM_SHARED mappings */
 	struct rb_root		i_mmap;		/* tree of private and shared mappings */
 	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
-	struct mutex		i_mmap_mutex;	/* protect tree, count, list */
+	struct rw_semaphore	i_mmap_rwsem;	/* protect tree, count, list */
 	/* Protected by tree_lock together with the radix tree */
 	unsigned long		nrpages;	/* number of total pages */
 	unsigned long		nrshadows;	/* number of shadow entries */
@@ -467,6 +468,26 @@ struct block_device {

 int mapping_tagged(struct address_space *mapping, int tag);

+static inline void i_mmap_lock_write(struct address_space *mapping)
+{
+	down_write(&mapping->i_mmap_rwsem);
+}
+
+static inline void i_mmap_unlock_write(struct address_space *mapping)
+{
+	up_write(&mapping->i_mmap_rwsem);
+}
+
+static inline void i_mmap_lock_read(struct address_space *mapping)
+{
+	down_read(&mapping->i_mmap_rwsem);
+}
+
+static inline void i_mmap_unlock_read(struct address_space *mapping)
+{
+	up_read(&mapping->i_mmap_rwsem);
+}
+
 /*
  * Might pages of this file be mapped into userspace?
  */
@@ -2075,6 +2096,7 @@ extern int vfs_open(const struct path *, struct file *, const struct cred *);
 extern struct file * dentry_open(const struct path *, int, const struct cred *);
 extern int filp_close(struct file *, fl_owner_t id);

+extern struct filename *getname_flags(const char __user *, int, int *);
 extern struct filename *getname(const char __user *);
 extern struct filename *getname_kernel(const char *);

@@ -196,24 +196,6 @@ struct fsnotify_group {
 #define FSNOTIFY_EVENT_PATH	1
 #define FSNOTIFY_EVENT_INODE	2

-/*
- * Inode specific fields in an fsnotify_mark
- */
-struct fsnotify_inode_mark {
-	struct inode *inode;		/* inode this mark is associated with */
-	struct hlist_node i_list;	/* list of marks by inode->i_fsnotify_marks */
-	struct list_head free_i_list;	/* tmp list used when freeing this mark */
-};
-
-/*
- * Mount point specific fields in an fsnotify_mark
- */
-struct fsnotify_vfsmount_mark {
-	struct vfsmount *mnt;		/* vfsmount this mark is associated with */
-	struct hlist_node m_list;	/* list of marks by inode->i_fsnotify_marks */
-	struct list_head free_m_list;	/* tmp list used when freeing this mark */
-};
-
 /*
  * a mark is simply an object attached to an in core inode which allows an
  * fsnotify listener to indicate they are either no longer interested in events
@@ -230,11 +212,17 @@ struct fsnotify_mark {
 					 * in kernel that found and may be using this mark. */
 	atomic_t refcnt;		/* active things looking at this mark */
 	struct fsnotify_group *group;	/* group this mark is for */
-	struct list_head g_list;	/* list of marks by group->i_fsnotify_marks */
+	struct list_head g_list;	/* list of marks by group->i_fsnotify_marks
+					 * Also reused for queueing mark into
+					 * destroy_list when it's waiting for
+					 * the end of SRCU period before it can
+					 * be freed */
 	spinlock_t lock;		/* protect group and inode */
+	struct hlist_node obj_list;	/* list of marks for inode / vfsmount */
+	struct list_head free_list;	/* tmp list used when freeing this mark */
 	union {
-		struct fsnotify_inode_mark i;
-		struct fsnotify_vfsmount_mark m;
+		struct inode *inode;	/* inode this mark is associated with */
+		struct vfsmount *mnt;	/* vfsmount this mark is associated with */
 	};
 	__u32 ignored_mask;		/* events types to ignore */
 #define FSNOTIFY_MARK_FLAG_INODE		0x01
@@ -243,7 +231,6 @@ struct fsnotify_mark {
 #define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY	0x08
 #define FSNOTIFY_MARK_FLAG_ALIVE		0x10
 	unsigned int flags;		/* vfsmount or inode mark? */
-	struct list_head destroy_list;
 	void (*free_mark)(struct fsnotify_mark *mark); /* called on final put+free */
 };

@@ -110,11 +110,8 @@ struct vm_area_struct;
 #define GFP_TEMPORARY	(__GFP_WAIT | __GFP_IO | __GFP_FS | \
 			 __GFP_RECLAIMABLE)
 #define GFP_USER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
-#define GFP_HIGHUSER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \
-			 __GFP_HIGHMEM)
-#define GFP_HIGHUSER_MOVABLE	(__GFP_WAIT | __GFP_IO | __GFP_FS | \
-				 __GFP_HARDWALL | __GFP_HIGHMEM | \
-				 __GFP_MOVABLE)
+#define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
+#define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)
 #define GFP_IOFS	(__GFP_IO | __GFP_FS)
 #define GFP_TRANSHUGE	(GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
 			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \
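That the shortened GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE definitions expand to the same bits as the removed ones can be checked mechanically. The sketch below uses hypothetical DEMO_* bit values chosen only to be distinct; the real values are defined elsewhere in gfp.h and are not reproduced here.

```c
/* Hypothetical bit values, chosen only so that every flag is distinct.
 * They are NOT the kernel's values. */
#define DEMO_GFP_WAIT		0x01u
#define DEMO_GFP_IO		0x02u
#define DEMO_GFP_FS		0x04u
#define DEMO_GFP_HARDWALL	0x08u
#define DEMO_GFP_HIGHMEM	0x10u
#define DEMO_GFP_MOVABLE	0x20u

#define DEMO_GFP_USER	(DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS | \
			 DEMO_GFP_HARDWALL)

/* Old, fully expanded spellings (shape of the removed lines). */
#define DEMO_OLD_GFP_HIGHUSER	(DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS | \
				 DEMO_GFP_HARDWALL | DEMO_GFP_HIGHMEM)
#define DEMO_OLD_GFP_HIGHUSER_MOVABLE	(DEMO_GFP_WAIT | DEMO_GFP_IO | \
					 DEMO_GFP_FS | DEMO_GFP_HARDWALL | \
					 DEMO_GFP_HIGHMEM | DEMO_GFP_MOVABLE)

/* New, layered spellings (shape of the added lines). */
#define DEMO_NEW_GFP_HIGHUSER		(DEMO_GFP_USER | DEMO_GFP_HIGHMEM)
#define DEMO_NEW_GFP_HIGHUSER_MOVABLE	(DEMO_NEW_GFP_HIGHUSER | DEMO_GFP_MOVABLE)

/* Returns 1 when both layered definitions match the expanded ones. */
static int demo_gfp_refactor_is_equivalent(void)
{
	return DEMO_OLD_GFP_HIGHUSER == DEMO_NEW_GFP_HIGHUSER &&
	       DEMO_OLD_GFP_HIGHUSER_MOVABLE == DEMO_NEW_GFP_HIGHUSER_MOVABLE;
}
```

Since the change is pure macro composition over bitwise OR, the equality holds for any choice of flag values, not just the placeholder ones used here.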
@@ -7,15 +7,6 @@
 #include <linux/notifier.h>
 #include <linux/nsproxy.h>

-/*
- * ipc namespace events
- */
-#define IPCNS_MEMCHANGED   0x00000001   /* Notify lowmem size changed */
-#define IPCNS_CREATED  0x00000002   /* Notify new ipc namespace created */
-#define IPCNS_REMOVED  0x00000003   /* Notify ipc namespace removed */
-
-#define IPCNS_CALLBACK_PRI 0
-
 struct user_namespace;

 struct ipc_ids {
@@ -38,7 +29,6 @@ struct ipc_namespace {
 	unsigned int	msg_ctlmni;
 	atomic_t	msg_bytes;
 	atomic_t	msg_hdrs;
-	int		auto_msgmni;

 	size_t		shm_ctlmax;
 	size_t		shm_ctlall;
@@ -77,18 +67,8 @@ extern atomic_t nr_ipc_ns;
 extern spinlock_t mq_lock;

 #ifdef CONFIG_SYSVIPC
-extern int register_ipcns_notifier(struct ipc_namespace *);
-extern int cond_register_ipcns_notifier(struct ipc_namespace *);
-extern void unregister_ipcns_notifier(struct ipc_namespace *);
-extern int ipcns_notify(unsigned long);
 extern void shm_destroy_orphaned(struct ipc_namespace *ns);
 #else /* CONFIG_SYSVIPC */
-static inline int register_ipcns_notifier(struct ipc_namespace *ns)
-{ return 0; }
-static inline int cond_register_ipcns_notifier(struct ipc_namespace *ns)
-{ return 0; }
-static inline void unregister_ipcns_notifier(struct ipc_namespace *ns) { }
-static inline int ipcns_notify(unsigned long l) { return 0; }
 static inline void shm_destroy_orphaned(struct ipc_namespace *ns) {}
 #endif /* CONFIG_SYSVIPC */

@@ -21,6 +21,8 @@
 #ifndef __KMEMLEAK_H
 #define __KMEMLEAK_H

+#include <linux/slab.h>
+
 #ifdef CONFIG_DEBUG_KMEMLEAK

 extern void kmemleak_init(void) __ref;
@@ -400,8 +400,8 @@ int memcg_cache_id(struct mem_cgroup *memcg);

 void memcg_update_array_size(int num_groups);

-struct kmem_cache *
-__memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
+struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep);
+void __memcg_kmem_put_cache(struct kmem_cache *cachep);

 int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order);
 void __memcg_uncharge_slab(struct kmem_cache *cachep, int order);
@@ -492,7 +492,13 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
 	if (unlikely(fatal_signal_pending(current)))
 		return cachep;

-	return __memcg_kmem_get_cache(cachep, gfp);
+	return __memcg_kmem_get_cache(cachep);
+}
+
+static __always_inline void memcg_kmem_put_cache(struct kmem_cache *cachep)
+{
+	if (memcg_kmem_enabled())
+		__memcg_kmem_put_cache(cachep);
 }
 #else
 #define for_each_memcg_cache_index(_idx)	\
@@ -528,6 +534,10 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
 {
 	return cachep;
 }
+
+static inline void memcg_kmem_put_cache(struct kmem_cache *cachep)
+{
+}
 #endif /* CONFIG_MEMCG_KMEM */
 #endif /* _LINUX_MEMCONTROL_H */

@@ -19,6 +19,7 @@
 #include <linux/bit_spinlock.h>
 #include <linux/shrinker.h>
 #include <linux/resource.h>
+#include <linux/page_ext.h>

 struct mempolicy;
 struct anon_vma;
@@ -2060,7 +2061,22 @@ static inline void vm_stat_account(struct mm_struct *mm,
 #endif /* CONFIG_PROC_FS */

 #ifdef CONFIG_DEBUG_PAGEALLOC
-extern void kernel_map_pages(struct page *page, int numpages, int enable);
+extern bool _debug_pagealloc_enabled;
+extern void __kernel_map_pages(struct page *page, int numpages, int enable);
+
+static inline bool debug_pagealloc_enabled(void)
+{
+	return _debug_pagealloc_enabled;
+}
+
+static inline void
+kernel_map_pages(struct page *page, int numpages, int enable)
+{
+	if (!debug_pagealloc_enabled())
+		return;
+
+	__kernel_map_pages(page, numpages, enable);
+}
 #ifdef CONFIG_HIBERNATION
 extern bool kernel_page_present(struct page *page);
 #endif /* CONFIG_HIBERNATION */
@@ -2094,9 +2110,9 @@ int drop_caches_sysctl_handler(struct ctl_table *, int,
 					void __user *, size_t *, loff_t *);
 #endif

-unsigned long shrink_slab(struct shrink_control *shrink,
-			  unsigned long nr_pages_scanned,
-			  unsigned long lru_pages);
+unsigned long shrink_node_slabs(gfp_t gfp_mask, int nid,
+				unsigned long nr_scanned,
+				unsigned long nr_eligible);

 #ifndef CONFIG_MMU
 #define randomize_va_space 0
@@ -2155,20 +2171,36 @@ extern void copy_user_huge_page(struct page *dst, struct page *src,
 				unsigned int pages_per_huge_page);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */

+extern struct page_ext_operations debug_guardpage_ops;
+extern struct page_ext_operations page_poisoning_ops;
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 extern unsigned int _debug_guardpage_minorder;
+extern bool _debug_guardpage_enabled;

 static inline unsigned int debug_guardpage_minorder(void)
 {
 	return _debug_guardpage_minorder;
 }

+static inline bool debug_guardpage_enabled(void)
+{
+	return _debug_guardpage_enabled;
+}
+
 static inline bool page_is_guard(struct page *page)
 {
-	return test_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
+	struct page_ext *page_ext;
+
+	if (!debug_guardpage_enabled())
+		return false;
+
+	page_ext = lookup_page_ext(page);
+	return test_bit(PAGE_EXT_DEBUG_GUARD, &page_ext->flags);
 }
 #else
 static inline unsigned int debug_guardpage_minorder(void) { return 0; }
+static inline bool debug_guardpage_enabled(void) { return false; }
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */

@ -10,7 +10,6 @@
|
||||
#include <linux/rwsem.h>
|
||||
#include <linux/completion.h>
|
||||
#include <linux/cpumask.h>
|
||||
#include <linux/page-debug-flags.h>
|
||||
#include <linux/uprobes.h>
|
||||
#include <linux/page-flags-layout.h>
|
||||
#include <asm/page.h>
|
||||
@ -186,9 +185,6 @@ struct page {
|
||||
void *virtual; /* Kernel virtual address (NULL if
|
||||
not kmapped, ie. highmem) */
|
||||
#endif /* WANT_PAGE_VIRTUAL */
|
||||
#ifdef CONFIG_WANT_PAGE_DEBUG_FLAGS
|
||||
unsigned long debug_flags; /* Use atomic bitops on this */
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_KMEMCHECK
|
||||
/*
|
||||
@ -534,4 +530,12 @@ enum tlb_flush_reason {
|
||||
NR_TLB_FLUSH_REASONS,
|
||||
};
|
||||
|
||||
/*
|
||||
* A swap entry has to fit into a "unsigned long", as the entry is hidden
|
||||
* in the "index" field of the swapper address space.
|
||||
*/
|
||||
typedef struct {
|
||||
unsigned long val;
|
||||
} swp_entry_t;
|
||||
|
||||
#endif /* _LINUX_MM_TYPES_H */
|
||||
|
@ -154,7 +154,7 @@ struct mmu_notifier_ops {
|
||||
* Therefore notifier chains can only be traversed when either
|
||||
*
|
||||
* 1. mmap_sem is held.
|
||||
* 2. One of the reverse map locks is held (i_mmap_mutex or anon_vma->rwsem).
|
||||
* 2. One of the reverse map locks is held (i_mmap_rwsem or anon_vma->rwsem).
|
||||
* 3. No other concurrent thread can access the list (release)
|
||||
*/
|
||||
struct mmu_notifier {
|
||||
|
@ -722,6 +722,9 @@ typedef struct pglist_data {
|
||||
int nr_zones;
|
||||
#ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
|
||||
struct page *node_mem_map;
|
||||
#ifdef CONFIG_PAGE_EXTENSION
|
||||
struct page_ext *node_page_ext;
|
||||
#endif
|
||||
#endif
|
||||
#ifndef CONFIG_NO_BOOTMEM
|
||||
struct bootmem_data *bdata;
|
||||
@ -1075,6 +1078,7 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
|
||||
#define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK)
|
||||
|
||||
struct page;
|
||||
struct page_ext;
|
||||
struct mem_section {
|
||||
/*
|
||||
* This is, logically, a pointer to an array of struct
|
||||
@ -1092,6 +1096,14 @@ struct mem_section {
|
||||
|
||||
/* See declaration of similar field in struct zone */
|
||||
unsigned long *pageblock_flags;
|
||||
#ifdef CONFIG_PAGE_EXTENSION
|
||||
/*
|
||||
* If SPARSEMEM, pgdat doesn't have page_ext pointer. We use
|
||||
* section. (see page_ext.h about this.)
|
||||
*/
|
||||
struct page_ext *page_ext;
|
||||
unsigned long pad;
|
||||
#endif
|
||||
/*
|
||||
* WARNING: mem_section must be a power-of-2 in size for the
|
||||
* calculation and use of SECTION_ROOT_MASK to make sense.
|
||||
|
@ -92,6 +92,17 @@ static inline bool oom_gfp_allowed(gfp_t gfp_mask)
|
||||
|
||||
extern struct task_struct *find_lock_task_mm(struct task_struct *p);
|
||||
|
||||
static inline bool task_will_free_mem(struct task_struct *task)
|
||||
{
|
||||
/*
|
||||
* A coredumping process may sleep for an extended period in exit_mm(),
|
||||
* so the oom killer cannot assume that the process will promptly exit
|
||||
* and release memory.
|
||||
*/
|
||||
return (task->flags & PF_EXITING) &&
|
||||
!(task->signal->flags & SIGNAL_GROUP_COREDUMP);
|
||||
}
|
||||
|
||||
/* sysctls */
|
||||
extern int sysctl_oom_dump_tasks;
|
||||
extern int sysctl_oom_kill_allocating_task;
|
||||
|
@ -1,32 +0,0 @@
|
||||
#ifndef LINUX_PAGE_DEBUG_FLAGS_H
|
||||
#define LINUX_PAGE_DEBUG_FLAGS_H
|
||||
|
||||
/*
|
||||
* page->debug_flags bits:
|
||||
*
|
||||
* PAGE_DEBUG_FLAG_POISON is set for poisoned pages. This is used to
|
||||
* implement generic debug pagealloc feature. The pages are filled with
|
||||
* poison patterns and set this flag after free_pages(). The poisoned
|
||||
* pages are verified whether the patterns are not corrupted and clear
|
||||
* the flag before alloc_pages().
|
||||
*/
|
||||
|
||||
enum page_debug_flags {
|
||||
PAGE_DEBUG_FLAG_POISON, /* Page is poisoned */
|
||||
PAGE_DEBUG_FLAG_GUARD,
|
||||
};
|
||||
|
||||
/*
|
||||
* Ensure that CONFIG_WANT_PAGE_DEBUG_FLAGS reliably
|
||||
* gets turned off when no debug features are enabling it!
|
||||
*/
|
||||
|
||||
#ifdef CONFIG_WANT_PAGE_DEBUG_FLAGS
|
||||
#if !defined(CONFIG_PAGE_POISONING) && \
|
||||
!defined(CONFIG_PAGE_GUARD) \
|
||||
/* && !defined(CONFIG_PAGE_DEBUG_SOMETHING_ELSE) && ... */
|
||||
#error WANT_PAGE_DEBUG_FLAGS is turned on with no debug features!
|
||||
#endif
|
||||
#endif /* CONFIG_WANT_PAGE_DEBUG_FLAGS */
|
||||
|
||||
#endif /* LINUX_PAGE_DEBUG_FLAGS_H */
|
84
include/linux/page_ext.h
Normal file
@ -0,0 +1,84 @@
|
||||
#ifndef __LINUX_PAGE_EXT_H
|
||||
#define __LINUX_PAGE_EXT_H
|
||||
|
||||
#include <linux/types.h>
|
||||
#include <linux/stacktrace.h>
|
||||
|
||||
struct pglist_data;
|
||||
struct page_ext_operations {
|
||||
bool (*need)(void);
|
||||
void (*init)(void);
|
||||
};
|
||||
|
||||
#ifdef CONFIG_PAGE_EXTENSION
|
||||
|
||||
/*
|
||||
* page_ext->flags bits:
|
||||
*
|
||||
* PAGE_EXT_DEBUG_POISON is set for poisoned pages. This is used to
|
||||
* implement generic debug pagealloc feature. The pages are filled with
|
||||
* poison patterns and set this flag after free_pages(). The poisoned
|
||||
* pages are verified whether the patterns are not corrupted and clear
|
||||
* the flag before alloc_pages().
|
||||
*/
|
||||
|
||||
enum page_ext_flags {
|
||||
PAGE_EXT_DEBUG_POISON, /* Page is poisoned */
|
||||
PAGE_EXT_DEBUG_GUARD,
|
||||
PAGE_EXT_OWNER,
|
||||
};
|
||||
|
||||
/*
|
||||
* Page Extension can be considered as an extended mem_map.
|
||||
* A page_ext page is associated with every page descriptor. The
|
||||
* page_ext helps us add more information about the page.
|
||||
* All page_ext are allocated at boot or memory hotplug event,
|
||||
* then the page_ext for pfn always exists.
|
||||
*/
|
||||
struct page_ext {
|
||||
unsigned long flags;
|
||||
#ifdef CONFIG_PAGE_OWNER
|
||||
unsigned int order;
|
||||
gfp_t gfp_mask;
|
||||
struct stack_trace trace;
|
||||
unsigned long trace_entries[8];
|
||||
#endif
|
||||
};
|
||||
|
||||
extern void pgdat_page_ext_init(struct pglist_data *pgdat);
|
||||
|
||||
#ifdef CONFIG_SPARSEMEM
|
||||
static inline void page_ext_init_flatmem(void)
|
||||
{
|
||||
}
|
||||
extern void page_ext_init(void);
|
||||
#else
|
||||
extern void page_ext_init_flatmem(void);
|
||||
static inline void page_ext_init(void)
|
||||
{
|
||||
}
|
||||
#endif
|
||||
|
||||
struct page_ext *lookup_page_ext(struct page *page);
|
||||
|
||||
#else /* !CONFIG_PAGE_EXTENSION */
|
||||
struct page_ext;
|
||||
|
||||
static inline void pgdat_page_ext_init(struct pglist_data *pgdat)
|
||||
{
|
||||
}
|
||||
|
||||
static inline struct page_ext *lookup_page_ext(struct page *page)
|
||||
{
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static inline void page_ext_init(void)
|
||||
{
|
||||
}
|
||||
|
||||
static inline void page_ext_init_flatmem(void)
|
||||
{
|
||||
}
|
||||
#endif /* CONFIG_PAGE_EXTENSION */
|
||||
#endif /* __LINUX_PAGE_EXT_H */
|
38
include/linux/page_owner.h
Normal file
@ -0,0 +1,38 @@
|
||||
#ifndef __LINUX_PAGE_OWNER_H
|
||||
#define __LINUX_PAGE_OWNER_H
|
||||
|
||||
#ifdef CONFIG_PAGE_OWNER
|
||||
extern bool page_owner_inited;
|
||||
extern struct page_ext_operations page_owner_ops;
|
||||
|
||||
extern void __reset_page_owner(struct page *page, unsigned int order);
|
||||
extern void __set_page_owner(struct page *page,
|
||||
unsigned int order, gfp_t gfp_mask);
|
||||
|
||||
static inline void reset_page_owner(struct page *page, unsigned int order)
|
||||
{
|
||||
if (likely(!page_owner_inited))
|
||||
return;
|
||||
|
||||
__reset_page_owner(page, order);
|
||||
}
|
||||
|
||||
static inline void set_page_owner(struct page *page,
|
||||
unsigned int order, gfp_t gfp_mask)
|
||||
{
|
||||
if (likely(!page_owner_inited))
|
||||
return;
|
||||
|
||||
__set_page_owner(page, order, gfp_mask);
|
||||
}
|
||||
#else
|
||||
static inline void reset_page_owner(struct page *page, unsigned int order)
|
||||
{
|
||||
}
|
||||
static inline void set_page_owner(struct page *page,
|
||||
unsigned int order, gfp_t gfp_mask)
|
||||
{
|
||||
}
|
||||
|
||||
#endif /* CONFIG_PAGE_OWNER */
|
||||
#endif /* __LINUX_PAGE_OWNER_H */
|
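The page_owner.h header above pairs a cheap inline guard with an out-of-line worker, so callers pay only a flag test while the feature is off. A minimal userspace sketch of that pattern follows. Every name in it (`tracker_inited`, `slow_set_owner`, `set_owner`, `demo_guarded_wrapper`) is a hypothetical stand-in rather than a kernel symbol, and the kernel's `likely()` hint is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these names are kernel symbols. */
static bool tracker_inited;      /* plays the role of page_owner_inited */
static unsigned int slow_calls;  /* counts entries into the slow path */

/* Out-of-line slow path, analogous to __set_page_owner(). */
static void slow_set_owner(unsigned int order)
{
	(void)order;
	slow_calls++;
}

/* Cheap inline guard, analogous to set_page_owner(). */
static inline void set_owner(unsigned int order)
{
	if (!tracker_inited)
		return;

	slow_set_owner(order);
}

/* Exercises both states of the flag and returns the slow-path call count. */
static unsigned int demo_guarded_wrapper(void)
{
	tracker_inited = false;
	slow_calls = 0;
	set_owner(0);                /* flag clear: the slow path is skipped */
	assert(slow_calls == 0);

	tracker_inited = true;
	set_owner(3);                /* flag set: the slow path runs once */
	return slow_calls;
}
```

Calling `demo_guarded_wrapper()` from a test program exercises both branches. The same shape appears in `kernel_map_pages()` and `page_is_guard()` earlier in this diff.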
@ -254,8 +254,6 @@ do { \
|
||||
#endif /* CONFIG_SMP */
|
||||
|
||||
#define per_cpu(var, cpu) (*per_cpu_ptr(&(var), cpu))
|
||||
#define __raw_get_cpu_var(var) (*raw_cpu_ptr(&(var)))
|
||||
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
|
||||
|
||||
/*
|
||||
* Must be an lvalue. Since @var must be a simple identifier,
|
||||
|
@ -17,14 +17,20 @@ struct ratelimit_state {
|
||||
unsigned long begin;
|
||||
};
|
||||
|
||||
#define DEFINE_RATELIMIT_STATE(name, interval_init, burst_init) \
|
||||
\
|
||||
struct ratelimit_state name = { \
|
||||
#define RATELIMIT_STATE_INIT(name, interval_init, burst_init) { \
|
||||
.lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \
|
||||
.interval = interval_init, \
|
||||
.burst = burst_init, \
|
||||
}
|
||||
|
||||
#define RATELIMIT_STATE_INIT_DISABLED \
|
||||
RATELIMIT_STATE_INIT(ratelimit_state, 0, DEFAULT_RATELIMIT_BURST)
|
||||
|
||||
#define DEFINE_RATELIMIT_STATE(name, interval_init, burst_init) \
|
||||
\
|
||||
struct ratelimit_state name = \
|
||||
RATELIMIT_STATE_INIT(name, interval_init, burst_init) \
|
||||
|
||||
static inline void ratelimit_state_init(struct ratelimit_state *rs,
|
||||
int interval, int burst)
|
||||
{
|
||||
|
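The ratelimit hunk above splits the brace initializer (`RATELIMIT_STATE_INIT`) out of the definition macro, which presumably lets a rate limit state be initialized in place when it is embedded in another structure. The sketch below shows that split in plain userspace C. The names `rl_state`, `RL_STATE_INIT`, `DEFINE_RL_STATE` and `fault_cfg` are hypothetical, and the spinlock member is omitted.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for struct ratelimit_state (lock omitted). */
struct rl_state {
	int interval;
	int burst;
};

/* Brace initializer only, so it can be nested inside other initializers. */
#define RL_STATE_INIT(interval_init, burst_init) {	\
		.interval = (interval_init),		\
		.burst = (burst_init),			\
	}

/* Definition macro layered on the initializer, as in the hunk above. */
#define DEFINE_RL_STATE(name, interval_init, burst_init)	\
	struct rl_state name = RL_STATE_INIT(interval_init, burst_init)

/* Hypothetical containing structure that embeds a rate limit state. */
struct fault_cfg {
	int probability;
	struct rl_state rl;
};

/* Standalone definition through the definition macro. */
static DEFINE_RL_STATE(standalone_rl, 5, 10);

/* Embedded use: only the initializer macro works in this position. */
static struct fault_cfg cfg = {
	.probability = 50,
	.rl = RL_STATE_INIT(0, 10),	/* interval 0, like the _DISABLED variant */
};

/* Checks both objects and returns the embedded burst value. */
static int demo_rl_init(void)
{
	assert(standalone_rl.interval == 5 && standalone_rl.burst == 10);
	assert(cfg.rl.interval == 0);
	return cfg.rl.burst;
}
```

A definition-only macro could not be used for the `.rl` member, because it expands to a full declaration rather than to an initializer.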
@ -1364,6 +1364,10 @@ struct task_struct {
|
||||
unsigned sched_reset_on_fork:1;
|
||||
unsigned sched_contributes_to_load:1;
|
||||
|
||||
#ifdef CONFIG_MEMCG_KMEM
|
||||
unsigned memcg_kmem_skip_account:1;
|
||||
#endif
|
||||
|
||||
unsigned long atomic_flags; /* Flags needing atomic access. */
|
||||
|
||||
pid_t pid;
|
||||
@ -1679,8 +1683,7 @@ struct task_struct {
|
||||
/* bitmask and counter of trace recursion */
|
||||
unsigned long trace_recursion;
|
||||
#endif /* CONFIG_TRACING */
|
||||
#ifdef CONFIG_MEMCG /* memcg uses this to do batch job */
|
||||
unsigned int memcg_kmem_skip_account;
|
||||
#ifdef CONFIG_MEMCG
|
||||
struct memcg_oom_info {
|
||||
struct mem_cgroup *memcg;
|
||||
gfp_t gfp_mask;
|
||||
@ -2482,6 +2485,10 @@ extern void do_group_exit(int);
|
||||
extern int do_execve(struct filename *,
|
||||
const char __user * const __user *,
|
||||
const char __user * const __user *);
|
||||
extern int do_execveat(int, struct filename *,
|
||||
const char __user * const __user *,
|
||||
const char __user * const __user *,
|
||||
int);
|
||||
extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
|
||||
struct task_struct *fork_idle(int);
|
||||
extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
|
||||
|
@ -18,8 +18,6 @@ struct shrink_control {
|
||||
*/
|
||||
unsigned long nr_to_scan;
|
||||
|
||||
/* shrink from these nodes */
|
||||
nodemask_t nodes_to_scan;
|
||||
/* current node being shrunk (for NUMA aware shrinkers) */
|
||||
int nid;
|
||||
};
|
||||
|
@ -493,7 +493,6 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
|
||||
* @memcg: pointer to the memcg this cache belongs to
|
||||
* @list: list_head for the list of all caches in this memcg
|
||||
* @root_cache: pointer to the global, root cache, this cache was derived from
|
||||
* @nr_pages: number of pages that belongs to this cache.
|
||||
*/
|
||||
struct memcg_cache_params {
|
||||
bool is_root_cache;
|
||||
@ -506,7 +505,6 @@ struct memcg_cache_params {
|
||||
struct mem_cgroup *memcg;
|
||||
struct list_head list;
|
||||
struct kmem_cache *root_cache;
|
||||
atomic_t nr_pages;
|
||||
};
|
||||
};
|
||||
};
|
||||
|
@ -1,6 +1,8 @@
|
||||
#ifndef __LINUX_STACKTRACE_H
|
||||
#define __LINUX_STACKTRACE_H
|
||||
|
||||
#include <linux/types.h>
|
||||
|
||||
struct task_struct;
|
||||
struct pt_regs;
|
||||
|
||||
@ -20,6 +22,8 @@ extern void save_stack_trace_tsk(struct task_struct *tsk,
|
||||
struct stack_trace *trace);
|
||||
|
||||
extern void print_stack_trace(struct stack_trace *trace, int spaces);
|
||||
extern int snprint_stack_trace(char *buf, size_t size,
|
||||
struct stack_trace *trace, int spaces);
|
||||
|
||||
#ifdef CONFIG_USER_STACKTRACE_SUPPORT
|
||||
extern void save_stack_trace_user(struct stack_trace *trace);
|
||||
@ -32,6 +36,7 @@ extern void save_stack_trace_user(struct stack_trace *trace);
|
||||
# define save_stack_trace_tsk(tsk, trace) do { } while (0)
|
||||
# define save_stack_trace_user(trace) do { } while (0)
|
||||
# define print_stack_trace(trace, spaces) do { } while (0)
|
||||
# define snprint_stack_trace(buf, size, trace, spaces) do { } while (0)
|
||||
#endif
|
||||
|
||||
#endif
|
||||
|
@ -102,14 +102,6 @@ union swap_header {
|
||||
} info;
|
||||
};
|
||||
|
||||
/* A swap entry has to fit into a "unsigned long", as
|
||||
* the entry is hidden in the "index" field of the
|
||||
* swapper address space.
|
||||
*/
|
||||
typedef struct {
|
||||
unsigned long val;
|
||||
} swp_entry_t;
|
||||
|
||||
/*
|
||||
* current->reclaim_state points to one of these when a task is running
|
||||
* memory reclaim
|
||||
|
@ -877,4 +877,9 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
|
||||
asmlinkage long sys_getrandom(char __user *buf, size_t count,
|
||||
unsigned int flags);
|
||||
asmlinkage long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size);
|
||||
|
||||
asmlinkage long sys_execveat(int dfd, const char __user *filename,
|
||||
const char __user *const __user *argv,
|
||||
const char __user *const __user *envp, int flags);
|
||||
|
||||
#endif
|
||||
|
@ -90,6 +90,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
|
||||
#ifdef CONFIG_DEBUG_VM_VMACACHE
|
||||
VMACACHE_FIND_CALLS,
|
||||
VMACACHE_FIND_HITS,
|
||||
VMACACHE_FULL_FLUSHES,
|
||||
#endif
|
||||
NR_VM_EVENT_ITEMS
|
||||
};
|
||||
|
@ -707,9 +707,11 @@ __SYSCALL(__NR_getrandom, sys_getrandom)
|
||||
__SYSCALL(__NR_memfd_create, sys_memfd_create)
|
||||
#define __NR_bpf 280
|
||||
__SYSCALL(__NR_bpf, sys_bpf)
|
||||
#define __NR_execveat 281
|
||||
__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
|
||||
|
||||
#undef __NR_syscalls
|
||||
#define __NR_syscalls 281
|
||||
#define __NR_syscalls 282
|
||||
|
||||
/*
|
||||
* All syscalls below here should go away really,
|
||||
|
@ -51,16 +51,28 @@ struct msginfo {
|
||||
};
|
||||
|
||||
/*
|
||||
* Scaling factor to compute msgmni:
|
||||
* the memory dedicated to msg queues (msgmni * msgmnb) should occupy
|
||||
* at most 1/MSG_MEM_SCALE of the lowmem (see the formula in ipc/msg.c):
|
||||
* up to 8MB : msgmni = 16 (MSGMNI)
|
||||
* 4 GB : msgmni = 8K
|
||||
* more than 16 GB : msgmni = 32K (IPCMNI)
|
||||
* MSGMNI, MSGMAX and MSGMNB are default values which can be
|
||||
* modified by sysctl.
|
||||
*
|
||||
* MSGMNI is the upper limit for the number of messages queues per
|
||||
* namespace.
|
||||
* It has been chosen to be as large as possible without facilitating
|
||||
* scenarios where userspace causes overflows when adjusting the limits via
|
||||
* operations of the form "retrieve current limit; add X; update limit".
|
||||
*
|
||||
* MSGMNB is the default size of a new message queue. Non-root tasks can
|
||||
* decrease the size with msgctl(IPC_SET), root tasks
|
||||
* (actually: CAP_SYS_RESOURCE) can both increase and decrease the queue
|
||||
* size. The optimal value is application dependent.
|
||||
* 16384 is used because it was always used (since 0.99.10)
|
||||
*
|
||||
* MSGMAX is the maximum size of an individual message, it's a global
|
||||
* (per-namespace) limit that applies for all message queues.
|
||||
* It's set to 1/2 of MSGMNB, to ensure that at least two messages fit into
|
||||
* the queue. This is also an arbitrary choice (since 2.6.0).
|
||||
*/
|
||||
#define MSG_MEM_SCALE 32
|
||||
|
||||
#define MSGMNI 16 /* <= IPCMNI */ /* max # of msg queue identifiers */
|
||||
#define MSGMNI 32000 /* <= IPCMNI */ /* max # of msg queue identifiers */
|
||||
#define MSGMAX 8192 /* <= INT_MAX */ /* max size of message (bytes) */
|
||||
#define MSGMNB 16384 /* <= INT_MAX */ /* default max size of a message queue */
|
||||
|
||||
|
@ -63,10 +63,22 @@ struct seminfo {
|
||||
int semaem;
|
||||
};
|
||||
|
||||
#define SEMMNI 128 /* <= IPCMNI max # of semaphore identifiers */
|
||||
#define SEMMSL 250 /* <= 8 000 max num of semaphores per id */
|
||||
/*
|
||||
* SEMMNI, SEMMSL and SEMMNS are default values which can be
|
||||
* modified by sysctl.
|
||||
* The values have been chosen to be larger than necessary for any
|
||||
* known configuration.
|
||||
*
|
||||
* SEMOPM should not be increased beyond 1000, otherwise there is the
|
||||
* risk that semop()/semtimedop() fails due to kernel memory fragmentation when
|
||||
* allocating the sop array.
|
||||
*/
|
||||
|
||||
|
||||
#define SEMMNI 32000 /* <= IPCMNI max # of semaphore identifiers */
|
||||
#define SEMMSL 32000 /* <= INT_MAX max num of semaphores per id */
|
||||
#define SEMMNS (SEMMNI*SEMMSL) /* <= INT_MAX max # of semaphores in system */
|
||||
#define SEMOPM 32 /* <= 1 000 max num of ops per semop call */
|
||||
#define SEMOPM 500 /* <= 1 000 max num of ops per semop call */
|
||||
#define SEMVMX 32767 /* <= 32767 semaphore maximum value */
|
||||
#define SEMAEM SEMVMX /* adjust on exit max value */
|
||||
|
||||
|
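The comments in the msg.h and sem.h hunks above make two arithmetic claims: the new SEMMNS product stays within INT_MAX, and MSGMAX is half of MSGMNB so that at least two maximum-size messages fit in a default queue. A small check of both, assuming a platform with a 32-bit `int`, could look like this. The `DEMO_` prefixes and the helper names are hypothetical, chosen to avoid clashing with system headers.

```c
#include <assert.h>
#include <limits.h>

/* Values copied from the hunks above; the DEMO_ prefix is hypothetical. */
#define DEMO_MSGMAX 8192
#define DEMO_MSGMNB 16384
#define DEMO_SEMMNI 32000
#define DEMO_SEMMSL 32000

/* The SEMMNS product, computed in a wider type to rule out overflow. */
static long long semmns_product(void)
{
	return (long long)DEMO_SEMMNI * DEMO_SEMMSL;
}

/* How many maximum-size messages fit into a default-size queue. */
static int max_msgs_per_queue(void)
{
	return DEMO_MSGMNB / DEMO_MSGMAX;
}

/* Asserts the two bounds stated in the header comments. */
static void check_limits(void)
{
	assert(semmns_product() <= INT_MAX);	/* assumes a 32-bit int */
	assert(max_msgs_per_queue() >= 2);
}
```

With these values the product is 1,024,000,000, a little under half of a 32-bit INT_MAX.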
@ -51,6 +51,7 @@
|
||||
#include <linux/mempolicy.h>
|
||||
#include <linux/key.h>
|
||||
#include <linux/buffer_head.h>
|
||||
#include <linux/page_ext.h>
|
||||
#include <linux/debug_locks.h>
|
||||
#include <linux/debugobjects.h>
|
||||
#include <linux/lockdep.h>
|
||||
@ -484,6 +485,11 @@ void __init __weak thread_info_cache_init(void)
|
||||
*/
|
||||
static void __init mm_init(void)
|
||||
{
|
||||
/*
|
||||
* page_ext requires contiguous pages,
|
||||
* bigger than MAX_ORDER unless SPARSEMEM.
|
||||
*/
|
||||
page_ext_init_flatmem();
|
||||
mem_init();
|
||||
kmem_cache_init();
|
||||
percpu_init_late();
|
||||
@ -621,6 +627,7 @@ asmlinkage __visible void __init start_kernel(void)
|
||||
initrd_start = 0;
|
||||
}
|
||||
#endif
|
||||
page_ext_init();
|
||||
debug_objects_mem_init();
|
||||
kmemleak_init();
|
||||
setup_per_cpu_pageset();
|
||||
|
@ -3,7 +3,7 @@
|
||||
#
|
||||
|
||||
obj-$(CONFIG_SYSVIPC_COMPAT) += compat.o
|
||||
obj-$(CONFIG_SYSVIPC) += util.o msgutil.o msg.o sem.o shm.o ipcns_notifier.o syscall.o
|
||||
obj-$(CONFIG_SYSVIPC) += util.o msgutil.o msg.o sem.o shm.o syscall.o
|
||||
obj-$(CONFIG_SYSVIPC_SYSCTL) += ipc_sysctl.o
|
||||
obj_mq-$(CONFIG_COMPAT) += compat_mq.o
|
||||
obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y)
|
||||
|
@ -62,29 +62,6 @@ static int proc_ipc_dointvec_minmax_orphans(struct ctl_table *table, int write,
|
||||
return err;
|
||||
}
|
||||
|
||||
static int proc_ipc_callback_dointvec_minmax(struct ctl_table *table, int write,
|
||||
void __user *buffer, size_t *lenp, loff_t *ppos)
|
||||
{
|
||||
struct ctl_table ipc_table;
|
||||
size_t lenp_bef = *lenp;
|
||||
int rc;
|
||||
|
||||
memcpy(&ipc_table, table, sizeof(ipc_table));
|
||||
ipc_table.data = get_ipc(table);
|
||||
|
||||
rc = proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos);
|
||||
|
||||
if (write && !rc && lenp_bef == *lenp)
|
||||
/*
|
||||
* Tunable has successfully been changed by hand. Disable its
|
||||
* automatic adjustment. This simply requires unregistering
|
||||
* the notifiers that trigger recalculation.
|
||||
*/
|
||||
unregister_ipcns_notifier(current->nsproxy->ipc_ns);
|
||||
|
||||
return rc;
|
||||
}
|
||||
|
||||
static int proc_ipc_doulongvec_minmax(struct ctl_table *table, int write,
|
||||
void __user *buffer, size_t *lenp, loff_t *ppos)
|
||||
{
|
||||
@ -96,54 +73,19 @@ static int proc_ipc_doulongvec_minmax(struct ctl_table *table, int write,
|
||||
lenp, ppos);
|
||||
}
|
||||
|
||||
/*
|
||||
* Routine that is called when the file "auto_msgmni" has successfully been
|
||||
* written.
|
||||
* Two values are allowed:
|
||||
* 0: unregister msgmni's callback routine from the ipc namespace notifier
|
||||
* chain. This means that msgmni won't be recomputed anymore upon memory
|
||||
* add/remove or ipc namespace creation/removal.
|
||||
* 1: register back the callback routine.
|
||||
*/
|
||||
static void ipc_auto_callback(int val)
|
||||
{
|
||||
if (!val)
|
||||
unregister_ipcns_notifier(current->nsproxy->ipc_ns);
|
||||
else {
|
||||
/*
|
||||
* Re-enable automatic recomputing only if not already
|
||||
* enabled.
|
||||
*/
|
||||
recompute_msgmni(current->nsproxy->ipc_ns);
|
||||
cond_register_ipcns_notifier(current->nsproxy->ipc_ns);
|
||||
}
|
||||
}
|
||||
|
||||
static int proc_ipcauto_dointvec_minmax(struct ctl_table *table, int write,
|
||||
static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
|
||||
void __user *buffer, size_t *lenp, loff_t *ppos)
|
||||
{
|
||||
struct ctl_table ipc_table;
|
||||
int oldval;
|
||||
int rc;
|
||||
int dummy = 0;
|
||||
|
||||
memcpy(&ipc_table, table, sizeof(ipc_table));
|
||||
ipc_table.data = get_ipc(table);
|
||||
oldval = *((int *)(ipc_table.data));
|
||||
ipc_table.data = &dummy;
|
||||
|
||||
rc = proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos);
|
||||
if (write)
|
||||
pr_info_once("writing to auto_msgmni has no effect");
|
||||
|
||||
if (write && !rc) {
|
||||
int newval = *((int *)(ipc_table.data));
|
||||
/*
|
||||
* The file "auto_msgmni" has correctly been set.
|
||||
* React by (un)registering the corresponding tunable, if the
|
||||
* value has changed.
|
||||
*/
|
||||
if (newval != oldval)
|
||||
ipc_auto_callback(newval);
|
||||
}
|
||||
|
||||
return rc;
|
||||
return proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos);
|
||||
}
|
||||
|
||||
#else
|
||||
@ -151,8 +93,7 @@ static int proc_ipcauto_dointvec_minmax(struct ctl_table *table, int write,
|
||||
#define proc_ipc_dointvec NULL
|
||||
#define proc_ipc_dointvec_minmax NULL
|
||||
#define proc_ipc_dointvec_minmax_orphans NULL
|
||||
#define proc_ipc_callback_dointvec_minmax NULL
|
||||
#define proc_ipcauto_dointvec_minmax NULL
|
||||
#define proc_ipc_auto_msgmni NULL
|
||||
#endif
|
||||
|
||||
static int zero;
|
||||
@ -204,10 +145,19 @@ static struct ctl_table ipc_kern_table[] = {
|
||||
.data = &init_ipc_ns.msg_ctlmni,
|
||||
.maxlen = sizeof(init_ipc_ns.msg_ctlmni),
|
||||
.mode = 0644,
|
||||
.proc_handler = proc_ipc_callback_dointvec_minmax,
|
||||
.proc_handler = proc_ipc_dointvec_minmax,
|
||||
.extra1 = &zero,
|
||||
.extra2 = &int_max,
|
||||
},
|
||||
{
|
||||
.procname = "auto_msgmni",
|
||||
.data = NULL,
|
||||
.maxlen = sizeof(int),
|
||||
.mode = 0644,
|
||||
.proc_handler = proc_ipc_auto_msgmni,
|
||||
.extra1 = &zero,
|
||||
.extra2 = &one,
|
||||
},
|
||||
{
|
||||
.procname = "msgmnb",
|
||||
.data = &init_ipc_ns.msg_ctlmnb,
|
||||
@ -224,15 +174,6 @@ static struct ctl_table ipc_kern_table[] = {
|
||||
.mode = 0644,
|
||||
.proc_handler = proc_ipc_dointvec,
|
||||
},
|
||||
{
|
||||
.procname = "auto_msgmni",
|
||||
.data = &init_ipc_ns.auto_msgmni,
|
||||
.maxlen = sizeof(int),
|
||||
.mode = 0644,
|
||||
.proc_handler = proc_ipcauto_dointvec_minmax,
|
||||
.extra1 = &zero,
|
||||
.extra2 = &one,
|
||||
},
|
||||
#ifdef CONFIG_CHECKPOINT_RESTORE
|
||||
{
|
||||
.procname = "sem_next_id",
|
||||
|
@ -1,92 +0,0 @@
|
||||
/*
|
||||
* linux/ipc/ipcns_notifier.c
|
||||
* Copyright (C) 2007 BULL SA. Nadia Derbey
|
||||
*
|
||||
* Notification mechanism for ipc namespaces:
|
||||
* The callback routine registered in the memory chain invokes the ipcns
|
||||
* notifier chain with the IPCNS_MEMCHANGED event.
|
||||
* Each callback routine registered in the ipcns namespace recomputes msgmni
|
||||
* for the owning namespace.
|
||||
*/
|
||||
|
||||
#include <linux/msg.h>
|
||||
#include <linux/rcupdate.h>
|
||||
#include <linux/notifier.h>
|
||||
#include <linux/nsproxy.h>
|
||||
#include <linux/ipc_namespace.h>
|
||||
|
||||
#include "util.h"
|
||||
|
||||
|
||||
|
||||
static BLOCKING_NOTIFIER_HEAD(ipcns_chain);
|
||||
|
||||
|
||||
static int ipcns_callback(struct notifier_block *self,
|
||||
unsigned long action, void *arg)
|
||||
{
|
||||
struct ipc_namespace *ns;
|
||||
|
||||
switch (action) {
|
||||
case IPCNS_MEMCHANGED: /* amount of lowmem has changed */
|
||||
case IPCNS_CREATED:
|
||||
case IPCNS_REMOVED:
|
||||
/*
|
||||
* It's time to recompute msgmni
|
||||
*/
|
||||
ns = container_of(self, struct ipc_namespace, ipcns_nb);
|
||||
/*
|
||||
* No need to get a reference on the ns: the 1st job of
|
||||
* free_ipc_ns() is to unregister the callback routine.
|
||||
* blocking_notifier_chain_unregister takes the wr lock to do
|
||||
* it.
|
||||
* When this callback routine is called the rd lock is held by
|
||||
* blocking_notifier_call_chain.
|
||||
* So the ipc ns cannot be freed while we are here.
|
||||
*/
|
||||
recompute_msgmni(ns);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
|
||||
return NOTIFY_OK;
|
||||
}
|
||||
|
||||
int register_ipcns_notifier(struct ipc_namespace *ns)
|
||||
{
|
||||
int rc;
|
||||
|
||||
memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb));
|
||||
ns->ipcns_nb.notifier_call = ipcns_callback;
|
||||
ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI;
|
||||
rc = blocking_notifier_chain_register(&ipcns_chain, &ns->ipcns_nb);
|
||||
if (!rc)
|
||||
ns->auto_msgmni = 1;
|
||||
return rc;
|
||||
}
|
||||
|
||||
int cond_register_ipcns_notifier(struct ipc_namespace *ns)
|
||||
{
|
||||
int rc;
|
||||
|
||||
memset(&ns->ipcns_nb, 0, sizeof(ns->ipcns_nb));
|
||||
ns->ipcns_nb.notifier_call = ipcns_callback;
|
||||
ns->ipcns_nb.priority = IPCNS_CALLBACK_PRI;
|
||||
rc = blocking_notifier_chain_cond_register(&ipcns_chain,
|
||||
&ns->ipcns_nb);
|
||||
if (!rc)
|
||||
ns->auto_msgmni = 1;
|
||||
return rc;
|
||||
}
|
||||
|
||||
void unregister_ipcns_notifier(struct ipc_namespace *ns)
|
||||
{
|
||||
blocking_notifier_chain_unregister(&ipcns_chain, &ns->ipcns_nb);
|
||||
ns->auto_msgmni = 0;
|
||||
}
|
||||
|
||||
int ipcns_notify(unsigned long val)
|
||||
{
|
||||
return blocking_notifier_call_chain(&ipcns_chain, val, NULL);
|
||||
}
|
36
ipc/msg.c
@ -989,43 +989,12 @@ SYSCALL_DEFINE5(msgrcv, int, msqid, struct msgbuf __user *, msgp, size_t, msgsz,
|
||||
return do_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg, do_msg_fill);
|
||||
}
|
||||
|
||||
/*
|
||||
* Scale msgmni with the available lowmem size: the memory dedicated to msg
|
||||
* queues should occupy at most 1/MSG_MEM_SCALE of lowmem.
|
||||
* Also take into account the number of nsproxies created so far.
|
||||
* This should be done staying within the (MSGMNI , IPCMNI/nr_ipc_ns) range.
|
||||
*/
|
||||
void recompute_msgmni(struct ipc_namespace *ns)
|
||||
{
|
||||
struct sysinfo i;
|
||||
unsigned long allowed;
|
||||
int nb_ns;
|
||||
|
||||
si_meminfo(&i);
|
||||
allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
|
||||
/ MSGMNB;
|
||||
nb_ns = atomic_read(&nr_ipc_ns);
|
||||
allowed /= nb_ns;
|
||||
|
||||
if (allowed < MSGMNI) {
|
||||
ns->msg_ctlmni = MSGMNI;
|
||||
return;
|
||||
}
|
||||
|
||||
if (allowed > IPCMNI / nb_ns) {
|
||||
ns->msg_ctlmni = IPCMNI / nb_ns;
|
||||
return;
|
||||
}
|
||||
|
||||
ns->msg_ctlmni = allowed;
|
||||
}
|
||||
|
||||
void msg_init_ns(struct ipc_namespace *ns)
|
||||
{
|
||||
ns->msg_ctlmax = MSGMAX;
|
||||
ns->msg_ctlmnb = MSGMNB;
|
||||
|
||||
recompute_msgmni(ns);
|
||||
ns->msg_ctlmni = MSGMNI;
|
||||
|
||||
atomic_set(&ns->msg_bytes, 0);
|
||||
atomic_set(&ns->msg_hdrs, 0);
|
||||
@ -1069,9 +1038,6 @@ void __init msg_init(void)
|
||||
{
|
||||
msg_init_ns(&init_ipc_ns);
|
||||
|
||||
printk(KERN_INFO "msgmni has been set to %d\n",
|
||||
init_ipc_ns.msg_ctlmni);
|
||||
|
||||
ipc_init_proc_interface("sysvipc/msg",
|
||||
" key msqid perms cbytes qnum lspid lrpid uid gid cuid cgid stime rtime ctime\n",
|
||||
IPC_MSG_IDS, sysvipc_msg_proc_show);
|
||||
|
@ -45,14 +45,6 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
|
||||
msg_init_ns(ns);
|
||||
shm_init_ns(ns);
|
||||
|
||||
/*
|
||||
* msgmni has already been computed for the new ipc ns.
|
||||
* Thus, do the ipcns creation notification before registering that
|
||||
* new ipcns in the chain.
|
||||
*/
|
||||
ipcns_notify(IPCNS_CREATED);
|
||||
register_ipcns_notifier(ns);
|
||||
|
||||
ns->user_ns = get_user_ns(user_ns);
|
||||
|
||||
return ns;
|
||||
@ -99,25 +91,11 @@ void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
|
||||
|
||||
static void free_ipc_ns(struct ipc_namespace *ns)
|
||||
{
|
||||
/*
|
||||
* Unregistering the hotplug notifier at the beginning guarantees
|
||||
* that the ipc namespace won't be freed while we are inside the
|
||||
* callback routine. Since the blocking_notifier_chain_XXX routines
|
||||
* hold a rw lock on the notifier list, unregister_ipcns_notifier()
|
||||
* won't take the rw lock before blocking_notifier_call_chain() has
|
||||
* released the rd lock.
|
||||
*/
|
||||
unregister_ipcns_notifier(ns);
|
||||
sem_exit_ns(ns);
|
||||
msg_exit_ns(ns);
|
||||
shm_exit_ns(ns);
|
||||
atomic_dec(&nr_ipc_ns);
|
||||
|
||||
/*
|
||||
* Do the ipcns removal notification after decrementing nr_ipc_ns in
|
||||
* order to have a correct value when recomputing msgmni.
|
||||
*/
|
||||
ipcns_notify(IPCNS_REMOVED);
|
||||
put_user_ns(ns->user_ns);
|
||||
proc_free_inum(ns->proc_inum);
|
||||
kfree(ns);
|
||||
|
13
ipc/sem.c
@ -326,10 +326,17 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
|
||||
|
||||
/* Then check that the global lock is free */
|
||||
if (!spin_is_locked(&sma->sem_perm.lock)) {
|
||||
/* spin_is_locked() is not a memory barrier */
|
||||
smp_mb();
|
||||
/*
|
||||
* The ipc object lock check must be visible on all
|
||||
* cores before rechecking the complex count. Otherwise
|
||||
* we can race with another thread that does:
|
||||
* complex_count++;
|
||||
* spin_unlock(sem_perm.lock);
|
||||
*/
|
||||
smp_rmb();
|
||||
|
||||
/* Now repeat the test of complex_count:
|
||||
/*
|
||||
* Now repeat the test of complex_count:
|
||||
* It can't change anymore until we drop sem->lock.
|
||||
* Thus: if it is now 0, then it will stay 0.
|
||||
*/
|
||||
|
Some files were not shown because too many files have changed in this diff