IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Pull block bits from Jens Axboe:
"As vacation is coming up, thought I'd better get rid of my pending
changes in my for-linus branch for this iteration. It contains:
- Two patches for mtip32xx. Killing a non-compliant sysfs interface
and moving it to debugfs, where it belongs.
- A few patches from Asias. Two legit bug fixes, and one killing an
interface that is no longer in use.
- A patch from Jan, making the annoying partition ioctl warning a bit
less annoying, by restricting it to !CAP_SYS_RAWIO only.
- Three bug fixes for drbd from Lars Ellenberg.
- A fix for an old regression for umem, it hasn't really worked since
the plugging scheme was changed in 3.0.
- A few fixes from Tejun.
- A splice fix from Eric Dumazet, fixing an issue with pipe
resizing."
* 'for-linus' of git://git.kernel.dk/linux-block:
scsi: Silence unnecessary warnings about ioctl to partition
block: Drop dead function blk_abort_queue()
block: Mitigate lock unbalance caused by lock switching
block: Avoid missed wakeup in request waitqueue
umem: fix up unplugging
splice: fix racy pipe->buffers uses
drbd: fix null pointer dereference with on-congestion policy when diskless
drbd: fix list corruption by failing but already aborted reads
drbd: fix access of unallocated pages and kernel panic
xen/blkfront: Add WARN to deal with misbehaving backends.
blkcg: drop local variable @q from blkg_destroy()
mtip32xx: Create debugfs entries for troubleshooting
mtip32xx: Remove 'registers' and 'flags' from sysfs
blkcg: fix blkg_alloc() failure path
block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED
block: fix return value on cfq_init() failure
mtip32xx: Remove version.h header file inclusion
xen/blkback: Copy id field when doing BLKIF_DISCARD.
During kmem_cache_init_late(), we transition to the LATE state,
and after some more work, to the FULL state, its last state
This is quite different from slub, that will only transition to
its last state (previously SYSFS), in a (late)initcall, after a lot
more of the kernel is ready.
This means that in slab, we have no way to taking actions dependent
on the initialization of other pieces of the kernel that are supposed
to start way after kmem_init_late(), such as cgroups initialization.
To achieve more consistency in this behavior, that patch only
transitions to the UP state in kmem_init_late. In my analysis,
setup_cpu_cache() should be happy to test for >= UP, instead of
== FULL. It also has passed some tests I've made.
We then only mark FULL state after the reap timers are in place,
meaning that no further setup is expected.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Commit 8c138b only sits in Pekka's and linux-next tree now, which tries
to replace obj_size(cachep) with cachep->object_size, but has a typo in
kmem_cache_free() by using "size" instead of "object_size", which casues
some regressions.
Reported-and-tested-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Commit 3b0efdf ("mm, sl[aou]b: Extract common fields from struct
kmem_cache") renamed the kmem_cache structure's "next" field to "list"
but forgot to update one instance in leaks_show().
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
A consistent name with slub saves us an acessor function.
In both caches, this field represents the same thing. We would
like to use it from the mem_cgroup code.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Conflicts:
include/linux/mmzone.h
Synced with Linus' tree so that trivial patch can be applied
on top of up-to-date code properly.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Since there are five lists in LRU cache, the array nr in get_scan_count
should be:
nr[0] = anon inactive pages to scan; nr[1] = anon active pages to scan
nr[2] = file inactive pages to scan; nr[3] = file active pages to scan
Signed-off-by: Wanpeng Li <liwp.linux@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This patch enabled the tlb flush range support in generic mmu layer.
Most of arch has self tlb flush range support, like ARM/IA64 etc.
X86 arch has no this support in hardware yet. But another instruction
'invlpg' can implement this function in some degree. So, enable this
feather in generic layer for x86 now. and maybe useful for other archs
in further.
Generic mmu_gather struct is protected by micro
HAVE_GENERIC_MMU_GATHER. Other archs that has flush range supported
own self mmu_gather struct. So, now this change is safe for them.
In future we may unify this struct and related functions on multiple
archs.
Thanks for Peter Zijlstra time and time reminder for multiple
architecture code safe!
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-7-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
mempool_create_node() currently assumes %GFP_KERNEL. Its only user,
blk_init_free_list(), is about to be updated to use other allocation
flags - add @gfp_mask argument to the function.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If the range passed to mbind() is not allocated on nodes set in the
nodemask, it migrates the pages to respect the constraint.
The final formal of migrate_pages() is a mode of type enum migrate_mode,
not a boolean. do_mbind() is currently passing "true" which is the
equivalent of MIGRATE_SYNC_LIGHT. This should instead be MIGRATE_SYNC
for synchronous page migration.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__alloc_memory_core_early() asks memblock for a range of memory then try
to reserve it. If the reserved region array lacks space for the new
range, memblock_double_array() is called to allocate more space for the
array. If memblock is used to allocate memory for the new array it can
end up using a range that overlaps with the range originally allocated in
__alloc_memory_core_early(), leading to possible data corruption.
With this patch memblock_double_array() now calls memblock_find_in_range()
with a narrowed candidate range (in cases where the reserved.regions array
is being doubled) so any memory allocated will not overlap with the
original range that was being reserved. The range is narrowed by passing
in the starting address and size of the previously allocated range. Then
the range above the ending address is searched and if a candidate is not
found, the range below the starting address is searched.
Signed-off-by: Greg Pearson <greg.pearson@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kernel-doc warnings in mm/memory.c:
Warning(mm/memory.c:1377): No description found for parameter 'start'
Warning(mm/memory.c:1377): Excess function parameter 'address' description in 'zap_page_range'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kernel-doc warnings such as
Warning(../mm/page_cgroup.c:432): No description found for parameter 'id'
Warning(../mm/page_cgroup.c:432): Excess function parameter 'mem' description in 'swap_cgroup_record'
Signed-off-by: Wanpeng Li <liwp@linux.vnet.ibm.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea asked for addr, end, vma->vm_start, and vma->vm_end to be emitted
when !rwsem_is_locked(&tlb->mm->mmap_sem). Otherwise, debugging the
underlying issue is more difficult.
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If use_hierarchy is set, reclaim testing soon oopses in css_is_ancestor()
called from __mem_cgroup_same_or_subtree() called from page_referenced():
when processes are exiting, it's easy for mm_match_cgroup() to pass along
a NULL memcg coming from a NULL mm->owner.
Check for that in __mem_cgroup_same_or_subtree(). Return true or false?
False because we cannot know if it was in the hierarchy, but also false
because it's better not to count a reference from an exiting process.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The divide in p->signal->oom_score_adj * totalpages / 1000 within
oom_badness() was causing an overflow of the signed long data type.
This adds both the root bias and p->signal->oom_score_adj before doing the
normalization which fixes the issue and also cleans up the calculation.
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current implementation of unfreeze_partials() is so complicated,
but benefit from it is insignificant. In addition many code in
do {} while loop have a bad influence to a fail rate of cmpxchg_double_slab.
Under current implementation which test status of cpu partial slab
and acquire list_lock in do {} while loop,
we don't need to acquire a list_lock and gain a little benefit
when front of the cpu partial slab is to be discarded, but this is a rare case.
In case that add_partial is performed and cmpxchg_double_slab is failed,
remove_partial should be called case by case.
I think that these are disadvantages of current implementation,
so I do refactoring unfreeze_partials().
Minimizing code in do {} while loop introduce a reduced fail rate
of cmpxchg_double_slab. Below is output of 'slabinfo -r kmalloc-256'
when './perf stat -r 33 hackbench 50 process 4000 > /dev/null' is done.
** before **
Cmpxchg_double Looping
------------------------
Locked Cmpxchg Double redos 182685
Unlocked Cmpxchg Double redos 0
** after **
Cmpxchg_double Looping
------------------------
Locked Cmpxchg Double redos 177995
Unlocked Cmpxchg Double redos 1
We can see cmpxchg_double_slab fail rate is improved slightly.
Bolow is output of './perf stat -r 30 hackbench 50 process 4000 > /dev/null'.
** before **
Performance counter stats for './hackbench 50 process 4000' (30 runs):
108517.190463 task-clock # 7.926 CPUs utilized ( +- 0.24% )
2,919,550 context-switches # 0.027 M/sec ( +- 3.07% )
100,774 CPU-migrations # 0.929 K/sec ( +- 4.72% )
124,201 page-faults # 0.001 M/sec ( +- 0.15% )
401,500,234,387 cycles # 3.700 GHz ( +- 0.24% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
250,576,913,354 instructions # 0.62 insns per cycle ( +- 0.13% )
45,934,956,860 branches # 423.297 M/sec ( +- 0.14% )
188,219,787 branch-misses # 0.41% of all branches ( +- 0.56% )
13.691837307 seconds time elapsed ( +- 0.24% )
** after **
Performance counter stats for './hackbench 50 process 4000' (30 runs):
107784.479767 task-clock # 7.928 CPUs utilized ( +- 0.22% )
2,834,781 context-switches # 0.026 M/sec ( +- 2.33% )
93,083 CPU-migrations # 0.864 K/sec ( +- 3.45% )
123,967 page-faults # 0.001 M/sec ( +- 0.15% )
398,781,421,836 cycles # 3.700 GHz ( +- 0.22% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
250,189,160,419 instructions # 0.63 insns per cycle ( +- 0.09% )
45,855,370,128 branches # 425.436 M/sec ( +- 0.10% )
169,881,248 branch-misses # 0.37% of all branches ( +- 0.43% )
13.596272341 seconds time elapsed ( +- 0.22% )
No regression is found, but rather we can see slightly better result.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
get_freelist(), unfreeze_partials() are only called with interrupt disabled,
so __cmpxchg_double_slab() is suitable.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
slab_node() could access current->mempolicy from interrupt context.
However there's a race condition during exit where the mempolicy
is first freed and then the pointer zeroed.
Using this from interrupts seems bogus anyways. The interrupt
will interrupt a random process and therefore get a random
mempolicy. Many times, this will be idle's, which noone can change.
Just disable this here and always use local for slab
from interrupts. I also cleaned up the callers of slab_node a bit
which always passed the same argument.
I believe the original mempolicy code did that in fact,
so it's likely a regression.
v2: send version with correct logic
v3: simplify. fix typo.
Reported-by: Arun Sharma <asharma@fb.com>
Cc: penberg@kernel.org
Cc: cl@linux.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
[tdmackey@twitter.com: Rework control flow based on feedback from
cl@linux.com, fix logic, and cleanup current task_struct reference]
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Mackey <tdmackey@twitter.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Minchan Kim reports that when a system has many swap areas, and tmpfs
swaps out to the ninth or more, shmem_getpage_gfp()'s attempts to read
back the page cannot locate it, and the read fails with -ENOMEM.
Whoops. Yes, I blindly followed read_swap_header()'s pte_to_swp_entry(
swp_entry_to_pte()) technique for determining maximum usable swap
offset, without stopping to realize that that actually depends upon the
pte swap encoding shifting swap offset to the higher bits and truncating
it there. Whereas our radix_tree swap encoding leaves offset in the
lower bits: it's swap "type" (that is, index of swap area) that was
truncated.
Fix it by reducing the SWP_TYPE_SHIFT() in swapops.h, and removing the
broken radix_to_swp_entry(swp_to_radix_entry()) from read_swap_header().
This does not reduce the usable size of a swap area any further, it
leaves it as claimed when making the original commit: no change from 3.0
on x86_64, nor on i386 without PAE; but 3.0's 512GB is reduced to 128GB
per swapfile on i386 with PAE. It's not a change I would have risked
five years ago, but with x86_64 supported for ten years, I believe it's
appropriate now.
Hmm, and what if some architecture implements its swap pte with offset
encoded below type? That would equally break the maximum usable swap
offset check. Happily, they all follow the same tradition of encoding
offset above type, but I'll prepare a check on that for next.
Reported-and-Reviewed-and-Tested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org [3.1, 3.2, 3.3, 3.4]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull core updates (RCU and locking) from Ingo Molnar:
"Most of the diffstat comes from the RCU slow boot regression fixes,
but there's also a debuggability improvements/fixes."
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
memblock: Document memblock_is_region_{memory,reserved}()
rcu: Precompute RCU_FAST_NO_HZ timer offsets
rcu: Move RCU_FAST_NO_HZ per-CPU variables to rcu_dynticks structure
rcu: Update RCU_FAST_NO_HZ tracing for lazy callbacks
rcu: RCU_FAST_NO_HZ detection of callback adoption
spinlock: Indicate that a lockup is only suspected
kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()
panic: Make panic_on_oops configurable
The size of the slab object is frequently needed. Since we now
have a size field directly in the kmem_cache structure there is no
need anymore of the obj_size macro/function.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Define a struct that describes common fields used in all slab allocators.
A slab allocator either uses the common definition (like SLOB) or is
required to provide members of kmem_cache with the definition given.
After that it will be possible to share code that
only operates on those fields of kmem_cache.
The patch basically takes the slob definition of kmem cache and
uses the field namees for the other allocators.
It also standardizes the names used for basic object lengths in
allocators:
object_size Struct size specified at kmem_cache_create. Basically
the payload expected to be used by the subsystem.
size The size of memory allocator for each object. This size
is larger than object_size and includes padding, alignment
and extra metadata for each object (f.e. for debugging
and rcu).
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Those are rather trivial now and its better to see inline what is
really going on.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Add fields to the page struct so that it is properly documented that
slab overlays the lru fields.
This cleans up some casts in slab.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Those have become so simple that they are no longer needed.
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Define the fields used by slob in mm_types.h and use struct page instead
of struct slob_page in slob. This cleans up numerous of typecasts in slob.c and
makes readers aware of slob's use of page struct fields.
[Also cleans up some bitrot in slob.c. The page struct field layout
in slob.c is an old layout and does not match the one in mm_types.h]
Reviewed-by: Glauber Costa <glommer@parallels.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
by splice_shrink_spd() called from vmsplice_to_pipe()
commit 35f3d14dbbc5 (pipe: add support for shrinking and growing pipes)
added capability to adjust pipe->buffers.
Problem is some paths don't hold pipe mutex and assume pipe->buffers
doesn't change for their duration.
Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
use it in place of pipe->buffers where appropriate.
splice_shrink_spd() loses its struct pipe_inode_info argument.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tom Herbert <therbert@google.com>
Cc: stable <stable@vger.kernel.org> # 2.6.35
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The check whether frontswap is enabled or not is done in the API functions in
the frontswap header, before they are passed to the internal
double-underscored frontswap functions.
Remove the check from __frontswap_init for consistency.
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Currently it has a complex structure where different things are compared
at each branch. Simplify that and make both branches look similar.
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Split frontswap_shrink to simplify the locking in the original code.
Also, assert that the function that was split still runs under the
swap spinlock.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
An attempt at making frontswap_shrink shorter and more readable. This patch
splits out walking through the swap list to find an entry with enough
pages to unuse.
Also, assert that the internal __frontswap_unuse_pages is called under swap
lock, since that part of code was previously directly happen inside the lock.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Code was duplicated in two functions, clean it up.
Also, assert that the deduplicated code runs under the swap spinlock.
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Convert calculations of proportion of writeback each bdi does to new flexible
proportion code. That allows us to use aging period of fixed wallclock time
which gives better proportion estimates given the hugely varying throughput of
different devices.
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
If the privileges given to root threads (3% of allowable memory) or a
negative value of /proc/pid/oom_score_adj happen to exceed the amount of
rss of a thread, its badness score overflows as a result of commit
a7f638f999ff ("mm, oom: normalize oom scores to oom_score_adj scale only
for userspace").
Fix this by making the type signed and return 1, meaning the thread is
still eligible for kill, if the value is negative.
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At first glance one would think that memblock_is_region_memory()
and memblock_is_region_reserved() would be implemented in the
same way. Unfortunately they aren't and the former returns
whether the region specified is a subset of a memory bank while
the latter returns whether the region specified intersects with
reserved memory.
Document the two functions so that users aren't tempted to
make the implementation the same between them and to clarify the
purpose of the functions.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1337845521-32755-1-git-send-email-sboyd@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit bde05d1ccd51 ("shmem: replace page if mapping excludes its zone")
is not at all likely to break for anyone, but it was an earlier version
from before review feedback was incorporated. Fix that up now.
* shmem_replace_page must flush_dcache_page after copy_highpage [akpm]
* Expand comment on why shmem_unuse_inode needs page_swapcount [akpm]
* Remove excess of VM_BUG_ONs from shmem_replace_page [wangcong]
* Check page_private matches swap before calling shmem_replace_page [hughd]
* shmem_replace_page allow for unexpected race in radix_tree lookup [hughd]
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephane Marchesin <marcheu@chromium.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Rob Clark <rob.clark@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull signal and vfs compile breakage fixes from Al Viro.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
fixups for signal breakage
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nommu: fix compilation of nommu.c
Compiling 3.5-rc1 for nommu targets gives:
CC mm/nommu.o
mm/nommu.c: In function ‘sys_mmap_pgoff’:
mm/nommu.c:1489:2: error: ‘ret’ undeclared (first use in this function)
mm/nommu.c:1489:2: note: each undeclared identifier is reported only once for each function it appears in
It is trivially fixed by replacing 'ret' with the local variable that is
already defined for the return value 'retval'.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In some environments, dramatic performance savings may be obtained because
swapped pages are saved in RAM (or a RAM-like device) instead of a swap disk.
This tag provides the basic infrastructure along with some changes to the
existing backends.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJPsorBAAoJEFjIrFwIi8fJcz8H/RBXCtFo0kiJmRked3nMAIDO
/2zN/q/Qawsg9aeoGlP7G8hQi9PMipbhQj3ixHyCTMv0zMbH988GXbBce+gIcg6e
TOQi7xXAuPEwLizmSpiTv84XzN5bMgu1oJXEqIXw0EIpuZAmp+9m/o3WBwEAtyxi
B+hvjE7eZM8f75K3lxs6sOtmIcERj9zqmT933Y8+i9iiuRyGMey2SyKtvVLbYZ+j
HroFMUi0so5TzxT/cpkRiHu0U75c651o+LV00zh7InMqbwyRsWlKTf53k8Q/q2WP
I7dVmfItwN/TpOrYTfxglYFlbYuUP35ziFvZ2trd6hcs9RK8OuKw+OmBLReHTtc=
=x9Vp
-----END PGP SIGNATURE-----
Merge tag 'stable/frontswap.v16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm
Pull frontswap feature from Konrad Rzeszutek Wilk:
"Frontswap provides a "transcendent memory" interface for swap pages.
In some environments, dramatic performance savings may be obtained
because swapped pages are saved in RAM (or a RAM-like device) instead
of a swap disk. This tag provides the basic infrastructure along with
some changes to the existing backends."
Fix up trivial conflict in mm/Makefile due to removal of swap token code
changing a line next to the new frontswap entry.
This pull request came in before the merge window even opened, it got
delayed to after the merge window by me just wanting to make sure it had
actual users. Apparently IBM is using this on their embedded side, and
Jan Beulich says that it's already made available for SLES and OpenSUSE
users.
Also acked by Rik van Riel, and Konrad points to other people liking it
too. So in it goes.
By Dan Magenheimer (4) and Konrad Rzeszutek Wilk (2)
via Konrad Rzeszutek Wilk
* tag 'stable/frontswap.v16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
frontswap: s/put_page/store/g s/get_page/load
MAINTAINER: Add myself for the frontswap API
mm: frontswap: config and doc files
mm: frontswap: core frontswap functionality
mm: frontswap: core swap subsystem hooks and headers
mm: frontswap: add frontswap header file
* Fix a merge conflict in mm/slub.c::acquire_slab() due to commit 02d7633
("slub: fix a memory leak in get_partial_node()").
Conflicts:
mm/slub.c
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This reverts commit 5ceb9ce6fe9462a298bb2cd5c9f1ca6cb80a0199.
That commit seems to be the cause of the mm compation list corruption
issues that Dave Jones reported. The locking (or rather, absense
there-of) is dubious, as is the use of the 'page' variable once it has
been found to be outside the pageblock range.
So revert it for now, we can re-visit this for 3.6. If we even need to:
as Minchan Kim says, "The patch wasn't a bug fix and even test workload
was very theoretical".
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
New tmpfs use of !PageUptodate pages for fallocate() is triggering the
WARNING: at mm/page-writeback.c:1990 when __set_page_dirty_nobuffers()
is called from migrate_page_copy() for compaction.
It is anomalous that migration should use __set_page_dirty_nobuffers()
on an address_space that does not participate in dirty and writeback
accounting; and this has also been observed to insert surprising dirty
tags into a tmpfs radix_tree, despite tmpfs not using tags at all.
We should probably give migrate_page_copy() a better way to preserve the
tag and migrate accounting info, when mapping_cap_account_dirty(). But
that needs some more work: so in the interim, avoid the warning by using
a simple SetPageDirty on PageSwapBacked pages.
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull slab updates from Pekka Enberg:
"Mainly a bunch of SLUB fixes from Joonsoo Kim"
* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
slub: use __SetPageSlab function to set PG_slab flag
slub: fix a memory leak in get_partial_node()
slub: remove unused argument of init_kmem_cache_node()
slub: fix a possible memory leak
Documentations: Fix slabinfo.c directory in vm/slub.txt
slub: fix incorrect return type of get_any_partial()
Pull vfs changes from Al Viro.
"A lot of misc stuff. The obvious groups:
* Miklos' atomic_open series; kills the damn abuse of
->d_revalidate() by NFS, which was the major stumbling block for
all work in that area.
* ripping security_file_mmap() and dealing with deadlocks in the
area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
general.
* ->encode_fh() switched to saner API; insane fake dentry in
mm/cleancache.c gone.
* assorted annotations in fs (endianness, __user)
* parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
* ->update_time() work from Josef.
* other bits and pieces all over the place.
Normally it would've been in two or three pull requests, but
signal.git stuff had eaten a lot of time during this cycle ;-/"
Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
'truncate_range' inode method was removed by the VM changes, the VFS
update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
to sparse fix added twice, with other changes nearby).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
nfs: don't open in ->d_revalidate
vfs: retry last component if opening stale dentry
vfs: nameidata_to_filp(): don't throw away file on error
vfs: nameidata_to_filp(): inline __dentry_open()
vfs: do_dentry_open(): don't put filp
vfs: split __dentry_open()
vfs: do_last() common post lookup
vfs: do_last(): add audit_inode before open
vfs: do_last(): only return EISDIR for O_CREAT
vfs: do_last(): check LOOKUP_DIRECTORY
vfs: do_last(): make ENOENT exit RCU safe
vfs: make follow_link check RCU safe
vfs: do_last(): use inode variable
vfs: do_last(): inline walk_component()
vfs: do_last(): make exit RCU safe
vfs: split do_lookup()
Btrfs: move over to use ->update_time
fs: introduce inode operation ->update_time
reiserfs: get rid of resierfs_sync_super
reiserfs: mark the superblock as dirty a bit later
...