61307 Commits

Author SHA1 Message Date
David Daney
bff2dc42bc smp.h: move !SMP version of on_each_cpu() out-of-line
All of the other non-trivial !SMP versions of functions in smp.h are
out-of-line in up.c.  Move on_each_cpu() there as well.

This allows us to get rid of the #include <linux/irqflags.h>.  The
drawback is that this makes both the x86_64 and i386 defconfig !SMP
kernels about 200 bytes larger each.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:25 -07:00
David Daney
fa688207c9 smp: quit unconditionally enabling irq in on_each_cpu_mask and on_each_cpu_cond
As in commit f21afc25f9ed ("smp.h: Use local_irq_{save,restore}() in
!SMP version of on_each_cpu()"), we don't want to enable irqs if they
are not already enabled.  There are currently no known problematical
callers of these functions, but since it is a known failure pattern, we
preemptively fix them.

Since they are not trivial functions, make them non-inline by moving
them to up.c.  This also makes it so we don't have to fix #include
dependancies for preempt_{disable,enable}.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:23 -07:00
Wanpeng Li
f9121153fd mm/hwpoison: don't need to hold compound lock for hugetlbfs page
compound lock is introduced by commit e9da73d67("thp: compound_lock."), it
is used to serialize put_page against __split_huge_page_refcount().  In
addition, transparent hugepages will be splitted in hwpoison handler and
just one subpage will be poisoned.  There is unnecessary to hold compound
lock for hugetlbfs page.  This patch replace compound_trans_order by
compond_order in the place where the page is hugetlbfs page.

Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:08 -07:00
Maxim Patlasov
5a53748568 mm/page-writeback.c: add strictlimit feature
The feature prevents mistrusted filesystems (ie: FUSE mounts created by
unprivileged users) to grow a large number of dirty pages before
throttling.  For such filesystems balance_dirty_pages always check bdi
counters against bdi limits.  I.e.  even if global "nr_dirty" is under
"freerun", it's not allowed to skip bdi checks.  The only use case for now
is fuse: it sets bdi max_ratio to 1% by default and system administrators
are supposed to expect that this limit won't be exceeded.

The feature is on if a BDI is marked by BDI_CAP_STRICTLIMIT flag.  A
filesystem may set the flag when it initializes its BDI.

The problematic scenario comes from the fact that nobody pays attention to
the NR_WRITEBACK_TEMP counter (i.e.  number of pages under fuse
writeback).  The implementation of fuse writeback releases original page
(by calling end_page_writeback) almost immediately.  A fuse request queued
for real processing bears a copy of original page.  Hence, if userspace
fuse daemon doesn't finalize write requests in timely manner, an
aggressive mmap writer can pollute virtually all memory by those temporary
fuse page copies.  They are carefully accounted in NR_WRITEBACK_TEMP, but
nobody cares.

To make further explanations shorter, let me use "NR_WRITEBACK_TEMP
problem" as a shortcut for "a possibility of uncontrolled grow of amount
of RAM consumed by temporary pages allocated by kernel fuse to process
writeback".

The problem was very easy to reproduce.  There is a trivial example
filesystem implementation in fuse userspace distribution: fusexmp_fh.c.  I
added "sleep(1);" to the write methods, then recompiled and mounted it.
Then created a huge file on the mount point and run a simple program which
mmap-ed the file to a memory region, then wrote a data to the region.  An
hour later I observed almost all RAM consumed by fuse writeback.  Since
then some unrelated changes in kernel fuse made it more difficult to
reproduce, but it is still possible now.

Putting this theoretical happens-in-the-lab thing aside, there is another
thing that really hurts real world (FUSE) users.  This is write-through
page cache policy FUSE currently uses.  I.e.  handling write(2), kernel
fuse populates page cache and flushes user data to the server
synchronously.  This is excessively suboptimal.  Pavel Emelyanov's patches
("writeback cache policy") solve the problem, but they also make resolving
NR_WRITEBACK_TEMP problem absolutely necessary.  Otherwise, simply copying
a huge file to a fuse mount would result in memory starvation.  Miklos,
the maintainer of FUSE, believes strictlimit feature the way to go.

And eventually putting FUSE topics aside, there is one more use-case for
strictlimit feature.  Using a slow USB stick (mass storage) in a machine
with huge amount of RAM installed is a well-known pain.  Let's make simple
computations.  Assuming 64GB of RAM installed, existing implementation of
balance_dirty_pages will start throttling only after 9.6GB of RAM becomes
dirty (freerun == 15% of total RAM).  So, the command "cp 9GB_file
/media/my-usb-storage/" may return in a few seconds, but subsequent
"umount /media/my-usb-storage/" will take more than two hours if effective
throughput of the storage is, to say, 1MB/sec.

After inclusion of strictlimit feature, it will be trivial to add a knob
(e.g.  /sys/devices/virtual/bdi/x:y/strictlimit) to enable it on demand.
Manually or via udev rule.  May be I'm wrong, but it seems to be quite a
natural desire to limit the amount of dirty memory for some devices we are
not fully trust (in the sense of sustainable throughput).

[akpm@linux-foundation.org: fix warning in page-writeback.c]
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:04 -07:00
Wanpeng Li
7d9f073b8d mm/writeback: make writeback_inodes_wb static
It's not used globally and could be static.

Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:02 -07:00
Lisa Du
6e543d5780 mm: vmscan: fix do_try_to_free_pages() livelock
This patch is based on KOSAKI's work and I add a little more description,
please refer https://lkml.org/lkml/2012/6/14/74.

Currently, I found system can enter a state that there are lots of free
pages in a zone but only order-0 and order-1 pages which means the zone is
heavily fragmented, then high order allocation could make direct reclaim
path's long stall(ex, 60 seconds) especially in no swap and no compaciton
enviroment.  This problem happened on v3.4, but it seems issue still lives
in current tree, the reason is do_try_to_free_pages enter live lock:

kswapd will go to sleep if the zones have been fully scanned and are still
not balanced.  As kswapd thinks there's little point trying all over again
to avoid infinite loop.  Instead it changes order from high-order to
0-order because kswapd think order-0 is the most important.  Look at
73ce02e9 in detail.  If watermarks are ok, kswapd will go back to sleep
and may leave zone->all_unreclaimable =3D 0.  It assume high-order users
can still perform direct reclaim if they wish.

Direct reclaim continue to reclaim for a high order which is not a
COSTLY_ORDER without oom-killer until kswapd turn on
zone->all_unreclaimble= .  This is because to avoid too early oom-kill.
So it means direct_reclaim depends on kswapd to break this loop.

In worst case, direct-reclaim may continue to page reclaim forever when
kswapd sleeps forever until someone like watchdog detect and finally kill
the process.  As described in:
http://thread.gmane.org/gmane.linux.kernel.mm/103737

We can't turn on zone->all_unreclaimable from direct reclaim path because
direct reclaim path don't take any lock and this way is racy.  Thus this
patch removes zone->all_unreclaimable field completely and recalculates
zone reclaimable state every time.

Note: we can't take the idea that direct-reclaim see zone->pages_scanned
directly and kswapd continue to use zone->all_unreclaimable.  Because, it
is racy.  commit 929bea7c71 (vmscan: all_unreclaimable() use
zone->all_unreclaimable as a name) describes the detail.

[akpm@linux-foundation.org: uninline zone_reclaimable_pages() and zone_reclaimable()]
Cc: Aaditya Kumar <aaditya.kumar.30@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Nick Piggin <npiggin@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Bob Liu <lliubbo@gmail.com>
Cc: Neil Zhang <zhangwm@marvell.com>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Lisa Du <cldu@marvell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:01 -07:00
Vlastimil Babka
7a8010cd36 mm: munlock: manual pte walk in fast path instead of follow_page_mask()
Currently munlock_vma_pages_range() calls follow_page_mask() to obtain
each individual struct page.  This entails repeated full page table
translations and page table lock taken for each page separately.

This patch avoids the costly follow_page_mask() where possible, by
iterating over ptes within single pmd under single page table lock.  The
first pte is obtained by get_locked_pte() for non-THP page acquired by the
initial follow_page_mask().  The rest of the on-stack pagevec for munlock
is filled up using pte_walk as long as pte_present() and vm_normal_page()
are sufficient to obtain the struct page.

After this patch, a 14% speedup was measured for munlocking a 56GB large
memory area with THP disabled.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jörn Engel <joern@logfs.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:01 -07:00
Cyrill Gorcunov
d9104d1ca9 mm: track vma changes with VM_SOFTDIRTY bit
Pavel reported that in case if vma area get unmapped and then mapped (or
expanded) in-place, the soft dirty tracker won't be able to recognize this
situation since it works on pte level and ptes are get zapped on unmap,
loosing soft dirty bit of course.

So to resolve this situation we need to track actions on vma level, there
VM_SOFTDIRTY flag comes in.  When new vma area created (or old expanded)
we set this bit, and keep it here until application calls for clearing
soft dirty bit.

Thus when user space application track memory changes now it can detect if
vma area is renewed.

Reported-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:56 -07:00
Yinghai Lu
e76b63f80d memblock, numa: binary search node id
Current early_pfn_to_nid() on arch that support memblock go over
memblock.memory one by one, so will take too many try near the end.

We can use existing memblock_search to find the node id for given pfn,
that could save some time on bigger system that have many entries
memblock.memory array.

Here are the timing differences for several machines.  In each case with
the patch less time was spent in __early_pfn_to_nid().

                        3.11-rc5        with patch      difference (%)
                        --------        ----------      --------------
UV1: 256 nodes  9TB:     411.66          402.47         -9.19 (2.23%)
UV2: 255 nodes 16TB:    1141.02         1138.12         -2.90 (0.25%)
UV2:  64 nodes  2TB:     128.15          126.53         -1.62 (1.26%)
UV2:  32 nodes  2TB:     121.87          121.07         -0.80 (0.66%)
                        Time in seconds.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:51 -07:00
Naoya Horiguchi
83467efbdb mm: migrate: check movability of hugepage in unmap_and_move_huge_page()
Currently hugepage migration works well only for pmd-based hugepages
(mainly due to lack of testing,) so we had better not enable migration of
other levels of hugepages until we are ready for it.

Some users of hugepage migration (mbind, move_pages, and migrate_pages) do
page table walk and check pud/pmd_huge() there, so they are safe.  But the
other users (softoffline and memory hotremove) don't do this, so without
this patch they can try to migrate unexpected types of hugepages.

To prevent this, we introduce hugepage_migration_support() as an
architecture dependent check of whether hugepage are implemented on a pmd
basis or not.  And on some architecture multiple sizes of hugepages are
available, so hugepage_migration_support() also checks hugepage size.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:49 -07:00
Naoya Horiguchi
c8721bbbdd mm: memory-hotplug: enable memory hotplug to handle hugepage
Until now we can't offline memory blocks which contain hugepages because a
hugepage is considered as an unmovable page.  But now with this patch
series, a hugepage has become movable, so by using hugepage migration we
can offline such memory blocks.

What's different from other users of hugepage migration is that we need to
decompose all the hugepages inside the target memory block into free buddy
pages after hugepage migration, because otherwise free hugepages remaining
in the memory block intervene the memory offlining.  For this reason we
introduce new functions dissolve_free_huge_page() and
dissolve_free_huge_pages().

Other than that, what this patch does is straightforwardly to add hugepage
migration code, that is, adding hugepage code to the functions which scan
over pfn and collect hugepages to be migrated, and adding a hugepage
allocation function to alloc_migrate_target().

As for larger hugepages (1GB for x86_64), it's not easy to do hotremove
over them because it's larger than memory block.  So we now simply leave
it to fail as it is.

[yongjun_wei@trendmicro.com.cn: remove duplicated include]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:48 -07:00
Naoya Horiguchi
71ea2efb1e mm: migrate: remove VM_HUGETLB from vma flag check in vma_migratable()
Enable hugepage migration from migrate_pages(2), move_pages(2), and
mbind(2).

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:48 -07:00
Naoya Horiguchi
74060e4d78 mm: mbind: add hugepage migration code to mbind()
Extend do_mbind() to handle vma with VM_HUGETLB set.  We will be able to
migrate hugepage with mbind(2) after applying the enablement patch which
comes later in this series.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:48 -07:00
Naoya Horiguchi
b8ec1cee5a mm: soft-offline: use migrate_pages() instead of migrate_huge_page()
Currently migrate_huge_page() takes a pointer to a hugepage to be migrated
as an argument, instead of taking a pointer to the list of hugepages to be
migrated.  This behavior was introduced in commit 189ebff28 ("hugetlb:
simplify migrate_huge_page()"), and was OK because until now hugepage
migration is enabled only for soft-offlining which migrates only one
hugepage in a single call.

But the situation will change in the later patches in this series which
enable other users of page migration to support hugepage migration.  They
can kick migration for both of normal pages and hugepages in a single
call, so we need to go back to original implementation which uses linked
lists to collect the hugepages to be migrated.

With this patch, soft_offline_huge_page() switches to use migrate_pages(),
and migrate_huge_page() is not used any more.  So let's remove it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:47 -07:00
Naoya Horiguchi
31caf665e6 mm: migrate: make core migration code aware of hugepage
Currently hugepage migration is available only for soft offlining, but
it's also useful for some other users of page migration (clearly because
users of hugepage can enjoy the benefit of mempolicy and memory hotplug.)
So this patchset tries to extend such users to support hugepage migration.

The target of this patchset is to enable hugepage migration for NUMA
related system calls (migrate_pages(2), move_pages(2), and mbind(2)), and
memory hotplug.

This patchset does not add hugepage migration for memory compaction,
because users of memory compaction mainly expect to construct thp by
arranging raw pages, and there's little or no need to compact hugepages.
CMA, another user of page migration, can have benefit from hugepage
migration, but is not enabled to support it for now (just because of lack
of testing and expertise in CMA.)

Hugepage migration of non pmd-based hugepage (for example 1GB hugepage in
x86_64, or hugepages in architectures like ia64) is not enabled for now
(again, because of lack of testing.)

As for how these are achived, I extended the API (migrate_pages()) to
handle hugepage (with patch 1 and 2) and adjusted code of each caller to
check and collect movable hugepages (with patch 3-7).  Remaining 2 patches
are kind of miscellaneous ones to avoid unexpected behavior.  Patch 8 is
about making sure that we only migrate pmd-based hugepages.  And patch 9
is about choosing appropriate zone for hugepage allocation.

My test is mainly functional one, simply kicking hugepage migration via
each entry point and confirm that migration is done correctly.  Test code
is available here:

  git://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git

And I always run libhugetlbfs test when changing hugetlbfs's code.  With
this patchset, no regression was found in the test.

This patch (of 9):

Before enabling each user of page migration to support hugepage,
this patch enables the list of pages for migration to link not only
LRU pages, but also hugepages. As a result, putback_movable_pages()
and migrate_pages() can handle both of LRU pages and hugepages.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:46 -07:00
Joonyoung Shim
674470d979 lib/genalloc.c: fix overflow of ending address of memory chunk
In struct gen_pool_chunk, end_addr means the end address of memory chunk
(inclusive), but in the implementation it is treated as address + size of
memory chunk (exclusive), so it points to the address plus one instead of
correct ending address.

The ending address of memory chunk plus one will cause overflow on the
memory chunk including the last address of memory map, e.g.  when starting
address is 0xFFF00000 and size is 0x100000 on 32bit machine, ending
address will be 0x100000000.

Use correct ending address like starting address + size - 1.

[akpm@linux-foundation.org: add comment to struct gen_pool_chunk:end_addr]
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:35 -07:00
Christoph Lameter
2bb921e526 vmstat: create separate function to fold per cpu diffs into local counters
The main idea behind this patchset is to reduce the vmstat update overhead
by avoiding interrupt enable/disable and the use of per cpu atomics.

This patch (of 3):

It is better to have a separate folding function because
refresh_cpu_vm_stats() also does other things like expire pages in the
page allocator caches.

If we have a separate function then refresh_cpu_vm_stats() is only called
from the local cpu which allows additional optimizations.

The folding function is only called when a cpu is being downed and
therefore no other processor will be accessing the counters.  Also
simplifies synchronization.

[akpm@linux-foundation.org: fix UP build]
Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Tejun Heo <tj@kernel.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:31 -07:00
Joonsoo Kim
d2cf5ad631 swap: clean-up #ifdef in page_mapping()
PageSwapCache() is always false when !CONFIG_SWAP, so compiler
properly discard related code. Therefore, we don't need #ifdef explicitly.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:31 -07:00
Johannes Weiner
81c0a2bb51 mm: page_alloc: fair zone allocator policy
Each zone that holds userspace pages of one workload must be aged at a
speed proportional to the zone size.  Otherwise, the time an individual
page gets to stay in memory depends on the zone it happened to be
allocated in.  Asymmetry in the zone aging creates rather unpredictable
aging behavior and results in the wrong pages being reclaimed, activated
etc.

But exactly this happens right now because of the way the page allocator
and kswapd interact.  The page allocator uses per-node lists of all zones
in the system, ordered by preference, when allocating a new page.  When
the first iteration does not yield any results, kswapd is woken up and the
allocator retries.  Due to the way kswapd reclaims zones below the high
watermark while a zone can be allocated from when it is above the low
watermark, the allocator may keep kswapd running while kswapd reclaim
ensures that the page allocator can keep allocating from the first zone in
the zonelist for extended periods of time.  Meanwhile the other zones
rarely see new allocations and thus get aged much slower in comparison.

The result is that the occasional page placed in lower zones gets
relatively more time in memory, even gets promoted to the active list
after its peers have long been evicted.  Meanwhile, the bulk of the
working set may be thrashing on the preferred zone even though there may
be significant amounts of memory available in the lower zones.

Even the most basic test -- repeatedly reading a file slightly bigger than
memory -- shows how broken the zone aging is.  In this scenario, no single
page should be able stay in memory long enough to get referenced twice and
activated, but activation happens in spades:

  $ grep active_file /proc/zoneinfo
      nr_inactive_file 0
      nr_active_file 0
      nr_inactive_file 0
      nr_active_file 8
      nr_inactive_file 1582
      nr_active_file 11994
  $ cat data data data data >/dev/null
  $ grep active_file /proc/zoneinfo
      nr_inactive_file 0
      nr_active_file 70
      nr_inactive_file 258753
      nr_active_file 443214
      nr_inactive_file 149793
      nr_active_file 12021

Fix this with a very simple round robin allocator.  Each zone is allowed a
batch of allocations that is proportional to the zone's size, after which
it is treated as full.  The batch counters are reset when all zones have
been tried and the allocator enters the slowpath and kicks off kswapd
reclaim.  Allocation and reclaim is now fairly spread out to all
available/allowable zones:

  $ grep active_file /proc/zoneinfo
      nr_inactive_file 0
      nr_active_file 0
      nr_inactive_file 174
      nr_active_file 4865
      nr_inactive_file 53
      nr_active_file 860
  $ cat data data data data >/dev/null
  $ grep active_file /proc/zoneinfo
      nr_inactive_file 0
      nr_active_file 0
      nr_inactive_file 666622
      nr_active_file 4988
      nr_inactive_file 190969
      nr_active_file 937

When zone_reclaim_mode is enabled, allocations will now spread out to all
zones on the local node, not just the first preferred zone (which on a 4G
node might be a tiny Normal zone).

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Paul Bolle <paul.bollee@gmail.com>
Cc: Zlatko Calusic <zcalusic@bitsync.net>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:23 -07:00
Srivatsa S. Bhat
f92310c187 mm/page_alloc.c: fix the value of fallback_migratetype in alloc_extfrag tracepoint()
In the current code, the value of fallback_migratetype that is printed
using the mm_page_alloc_extfrag tracepoint, is the value of the
migratetype *after* it has been set to the preferred migratetype (if the
ownership was changed).  Obviously that wouldn't have been the original
intent.  (We already have a separate 'change_ownership' field to tell
whether the ownership of the pageblock was changed from the
fallback_migratetype to the preferred type.)

The intent of the fallback_migratetype field is to show the migratetype
from which we borrowed pages in order to satisfy the allocation request.
So fix the code to print that value correctly.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:19 -07:00
Shaohua Li
ebc2a1a691 swap: make cluster allocation per-cpu
swap cluster allocation is to get better request merge to improve
performance.  But the cluster is shared globally, if multiple tasks are
doing swap, this will cause interleave disk access.  While multiple tasks
swap is quite common, for example, each numa node has a kswapd thread
doing swap and multiple threads/processes doing direct page reclaim.

ioscheduler can't help too much here, because tasks don't send swapout IO
down to block layer in the meantime.  Block layer does merge some IOs, but
a lot not, depending on how many tasks are doing swapout concurrently.  In
practice, I've seen a lot of small size IO in swapout workloads.

We makes the cluster allocation per-cpu here.  The interleave disk access
issue goes away.  All tasks swapout to their own cluster, so swapout will
become sequential, which can be easily merged to big size IO.  If one CPU
can't get its per-cpu cluster (for example, there is no free cluster
anymore in the swap), it will fallback to scan swap_map.  The CPU can
still continue swap.  We don't need recycle free swap entries of other
CPUs.

In my test (swap to a 2-disk raid0 partition), this improves around 10%
swapout throughput, and request size is increased significantly.

How does this impact swap readahead is uncertain though.  On one side,
page reclaim always isolates and swaps several adjancent pages, this will
make page reclaim write the pages sequentially and benefit readahead.  On
the other side, several CPU write pages interleave means the pages don't
live _sequentially_ but relatively _near_.  In the per-cpu allocation
case, if adjancent pages are written by different cpus, they will live
relatively _far_.  So how this impacts swap readahead depends on how many
pages page reclaim isolates and swaps one time.  If the number is big,
this patch will benefit swap readahead.  Of course, this is about
sequential access pattern.  The patch has no impact for random access
pattern, because the new cluster allocation algorithm is just for SSD.

Alternative solution is organizing swap layout to be per-mm instead of
this per-cpu approach.  In the per-mm layout, we allocate a disk range for
each mm, so pages of one mm live in swap disk adjacently.  per-mm layout
has potential issues of lock contention if multiple reclaimers are swap
pages from one mm.  For a sequential workload, per-mm layout is better to
implement swap readahead, because pages from the mm are adjacent in disk.
But per-cpu layout isn't very bad in this workload, as page reclaim always
isolates and swaps several pages one time, such pages will still live in
disk sequentially and readahead can utilize this.  For a random workload,
per-mm layout isn't beneficial of request merge, because it's quite
possible pages from different mm are swapout in the meantime and IO can't
be merged in per-mm layout.  while with per-cpu layout we can merge
requests from any mm.  Considering random workload is more popular in
workloads with swap (and per-cpu approach isn't too bad for sequential
workload too), I'm choosing per-cpu layout.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:17 -07:00
Shaohua Li
815c2c543d swap: make swap discard async
swap can do cluster discard for SSD, which is good, but there are some
problems here:

1. swap do the discard just before page reclaim gets a swap entry and
   writes the disk sectors.  This is useless for high end SSD, because an
   overwrite to a sector implies a discard to original sector too.  A
   discard + overwrite == overwrite.

2. the purpose of doing discard is to improve SSD firmware garbage
   collection.  Idealy we should send discard as early as possible, so
   firmware can do something smart.  Sending discard just after swap entry
   is freed is considered early compared to sending discard before write.
   Of course, if workload is already bound to gc speed, sending discard
   earlier or later doesn't make

3. block discard is a sync API, which will delay scan_swap_map()
   significantly.

4. Write and discard command can be executed parallel in PCIe SSD.
   Making swap discard async can make execution more efficiently.

This patch makes swap discard async and moves discard to where swap entry
is freed.  Discard and write have no dependence now, so above issues can
be avoided.  Idealy we should do discard for any freed sectors, but some
SSD discard is very slow.  This patch still does discard for a whole
cluster.

My test does a several round of 'mmap, write, unmap', which will trigger a
lot of swap discard.  In a fusionio card, with this patch, the test
runtime is reduced to 18% of the time without it, so around 5.5x faster.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:15 -07:00
Shaohua Li
2a8f944934 swap: change block allocation algorithm for SSD
I'm using a fast SSD to do swap.  scan_swap_map() sometimes uses up to
20~30% CPU time (when cluster is hard to find, the CPU time can be up to
80%), which becomes a bottleneck.  scan_swap_map() scans a byte array to
search a 256 page cluster, which is very slow.

Here I introduced a simple algorithm to search cluster.  Since we only
care about 256 pages cluster, we can just use a counter to track if a
cluster is free.  Every 256 pages use one int to store the counter.  If
the counter of a cluster is 0, the cluster is free.  All free clusters
will be added to a list, so searching cluster is very efficient.  With
this, scap_swap_map() overhead disappears.

This might help low end SD card swap too.  Because if the cluster is
aligned, SD firmware can do flash erase more efficiently.

We only enable the algorithm for SSD.  Hard disk swap isn't fast enough
and has downside with the algorithm which might introduce regression (see
below).

The patch slightly changes which cluster is choosen.  It always adds free
cluster to list tail.  This can help wear leveling for low end SSD too.
And if no cluster found, the scan_swap_map() will do search from the end
of last cluster.  So if no cluster found, the scan_swap_map() will do
search from the end of last free cluster, which is random.  For SSD, this
isn't a problem at all.

Another downside is the cluster must be aligned to 256 pages, which will
reduce the chance to find a cluster.  I would expect this isn't a big
problem for SSD because of the non-seek penality.  (And this is the reason
I only enable the algorithm for SSD).

Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:15 -07:00
Dave Hansen
6df46865ff mm: vmstats: track TLB flush stats on UP too
The previous patch doing vmstats for TLB flushes ("mm: vmstats: tlb flush
counters") effectively missed UP since arch/x86/mm/tlb.c is only compiled
for SMP.

UP systems do not do remote TLB flushes, so compile those counters out on
UP.

arch/x86/kernel/cpu/mtrr/generic.c calls __flush_tlb() directly.  This is
probably an optimization since both the mtrr code and __flush_tlb() write
cr4.  It would probably be safe to make that a flush_tlb_all() (and then
get these statistics), but the mtrr code is ancient and I'm hesitant to
touch it other than to just stick in the counters.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:09 -07:00
Dave Hansen
9824cf9753 mm: vmstats: tlb flush counters
I was investigating some TLB flush scaling issues and realized that we do
not have any good methods for figuring out how many TLB flushes we are
doing.

It would be nice to be able to do these in generic code, but the
arch-independent calls don't explicitly specify whether we actually need
to do remote flushes or not.  In the end, we really need to know if we
actually _did_ global vs.  local invalidations, so that leaves us with few
options other than to muck with the counters from arch-specific code.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:08 -07:00
Oleg Nesterov
ef0855d334 mm: mempolicy: turn vma_set_policy() into vma_dup_policy()
Simple cleanup.  Every user of vma_set_policy() does the same work, this
looks a bit annoying imho.  And the new trivial helper which does
mpol_dup() + vma_set_policy() to simplify the callers.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:00 -07:00
Cai Zhiyong
bab55417b1 block: support embedded device command line partition
Read block device partition table from command line.  The partition used
for fixed block device (eMMC) embedded device.  It is no MBR, save
storage space.  Bootloader can be easily accessed by absolute address of
data on the block device.  Users can easily change the partition.

This code reference MTD partition, source "drivers/mtd/cmdlinepart.c"
About the partition verbose reference
"Documentation/block/cmdline-partition.txt"

[akpm@linux-foundation.org: fix printk text]
[yongjun_wei@trendmicro.com.cn: fix error return code in parse_parts()]
Signed-off-by: Cai Zhiyong <caizhiyong@huawei.com>
Cc: Karel Zak <kzak@redhat.com>
Cc: "Wanglin (Albert)" <albert.wanglin@huawei.com>
Cc: Marius Groeger <mag@sysgo.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:56:57 -07:00
Oleg Nesterov
e1403b8edf include/linux/sched.h: don't use task->pid/tgid in same_thread_group/has_group_leader_pid
task_struct->pid/tgid should go away.

1. Change same_thread_group() to use task->signal for comparison.

2. Change has_group_leader_pid(task) to compare task_pid(task) with
   signal->leader_pid.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Sergey Dyasly <dserrg@gmail.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:56:56 -07:00
Andrew Morton
3b8967d713 include/linux/smp.h:on_each_cpu(): switch back to a C function
Revert commit c846ef7deba2 ("include/linux/smp.h:on_each_cpu(): switch
back to a macro").  It turns out that the problematic linux/irqflags.h
include was fixed within ia64 and mn10300.

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Daney <david.daney@cavium.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:56:19 -07:00
Linus Torvalds
bbda1baeeb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Brown paper bag fix in HTB scheduler, class options set incorrectly
    due to a typoe.  Fix from Vimalkumar.

 2) It's possible for the ipv6 FIB garbage collector to run before all
    the necessary datastructure are setup during init, defer the
    notifier registry to avoid this problem.  Fix from Michal Kubecek.

 3) New i40e ethernet driver from the Intel folks.

 4) Add new qmi wwan device IDs, from Bjørn Mork.

 5) Doorbell lock in bnx2x driver is not initialized properly in some
    configurations, fix from Ariel Elior.

 6) Revert an ipv6 packet option padding change that broke standardized
    ipv6 implementation test suites.  From Jiri Pirko.

 7) Fix synchronization of ARP information in bonding layer, from
    Nikolay Aleksandrov.

 8) Fix missing error return resulting in illegal memory accesses in
    openvswitch, from Daniel Borkmann.

 9) SCTP doesn't signal poll events properly due to mistaken operator
    precedence, fix also from Daniel Borkmann.

10) __netdev_pick_tx() passes wrong index to sk_tx_queue_set() which
    essentially disables caching of TX queue in sockets :-/ Fix from
    Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
  net_sched: htb: fix a typo in htb_change_class()
  net: qmi_wwan: add new Qualcomm devices
  ipv6: don't call fib6_run_gc() until routing is ready
  net: tilegx driver: avoid compiler warning
  fib6_rules: fix indentation
  irda: vlsi_ir: Remove casting the return value which is a void pointer
  irda: donauboe: Remove casting the return value which is a void pointer
  net: fix multiqueue selection
  net: sctp: fix smatch warning in sctp_send_asconf_del_ip
  net: sctp: fix bug in sctp_poll for SOCK_SELECT_ERR_QUEUE
  net: fib: fib6_add: fix potential NULL pointer dereference
  net: ovs: flow: fix potential illegal memory access in __parse_flow_nlattrs
  bcm63xx_enet: remove deprecated IRQF_DISABLED
  net: korina: remove deprecated IRQF_DISABLED
  macvlan: Move skb_clone check closer to call
  qlcnic: Fix warning reported by kbuild test robot.
  bonding: fix bond_arp_rcv setting and arp validate desync state
  bonding: fix store_arp_validate race with mode change
  ipv6/exthdrs: accept tlv which includes only padding
  bnx2x: avoid atomic allocations during initialization
  ...
2013-09-11 14:33:16 -07:00
Michal Kubeček
2c861cc65e ipv6: don't call fib6_run_gc() until routing is ready
When loading the ipv6 module, ndisc_init() is called before
ip6_route_init(). As the former registers a handler calling
fib6_run_gc(), this opens a window to run the garbage collector
before necessary data structures are initialized. If a network
device is initialized in this window, adding MAC address to it
triggers a NETDEV_CHANGEADDR event, leading to a crash in
fib6_clean_all().

Take the event handler registration out of ndisc_init() into a
separate function ndisc_late_init() and move it after
ip6_route_init().

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11 17:04:09 -04:00
Linus Torvalds
2b76db6a0f for-linus-3.12-merge minor 9p fixes and tweaks for 3.12 merge window
The first fixes namespace issues which causes a kernel
 NULL pointer dereference, the second fixes uevent
 handling to work better with udev, and the third
 switches some code to use srlcpy instead of strncpy
 in order to be safer.
 
 All changes have been baking in for-next for at least
 2 weeks.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJSMJjZAAoJEDZk62b0Tg6x81sQAKa60QStBKhnL65bvG+ooIsS
 mhwfmFyaWOKw1ezwY2Vk0+JnmKDBpKmqjjwyL3nLP18TcRZStPiFdcJBKWl+czge
 FTv14t54CcjysYPbYN7+gUap4F5mfg0mcHaR0UGow505dNyjwd7mqkZhy1IqhdvP
 Ue/h0RE46GeNtdirxrKBdEfW/7TAL0tcoRgjKu0ev1V2sXCJZywuXgkzWjByRXwT
 JOg04gGnYThuek0/KUPRhf0KxB0CyKrZiics7LGb40HkYYxs7ahADACttLyiDr8l
 GntfHXLgvVlU5QcSbKRfLp0zNbi7AxWmJrwYsEwpas4tUw1Q+pVJ2EE2Ameuq5G+
 LrMGmRVQCVYw8UN+OYUO7glhXEJcCPJj6vxgm+NVXx24yaQyGI1aTsIEjHwZ/hkm
 wlQHC47z6/fIypkXpsU6pYWF/r3GwXHokYReejATQWEPIzIxvHeThe0jjqMLth7F
 zmsHZTpmECqtti1fizy5wBZD25wAIxdf+rf8nKy1VvcSN4s08ESSlC/kV/siNeko
 efFnL8xbjP5SPEVoBtXM6eTDHrQ0S+ACSGWtp0FGXKOW4PKzS60ve2Stp+FYZgQc
 WgXI7+NBU6Z9z+cZ9bsY0hrGwK1YZiR4F3KJ5ofTuxAO6n7zd+N3fGBuQJ2tiW9P
 pKtIXNozWqnAU9Wx4rGa
 =YbFT
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-3.12-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9p updates from Eric Van Hensbergen:
 "Minor 9p fixes and tweaks for 3.12 merge window

  The first fixes namespace issues which causes a kernel NULL pointer
  dereference, the second fixes uevent handling to work better with
  udev, and the third switches some code to use srlcpy instead of
  strncpy in order to be safer.

  All changes have been baking in for-next for at least 2 weeks"

* tag 'for-linus-3.12-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: avoid accessing utsname after namespace has been torn down
  9p: send uevent after adding/removing mount_tag attribute
  fs: 9p: use strlcpy instead of strncpy
2013-09-11 12:34:13 -07:00
Alex Deucher
9a71677874 drm/radeon: add some additional berlin pci ids
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2013-09-11 11:44:30 -04:00
Rafael J. Wysocki
0df03a30c3 Merge branch 'pm-cpufreq'
* pm-cpufreq:
  intel_pstate: Add Haswell CPU models
  Revert "cpufreq: make sure frequency transitions are serialized"
  cpufreq: Use signed type for 'ret' variable, to store negative error values
  cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
  cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug
  cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock
  cpufreq: Split __cpufreq_remove_dev() into two parts
  cpufreq: Fix wrong time unit conversion
  cpufreq: serialize calls to __cpufreq_governor()
  cpufreq: don't allow governor limits to be changed when it is disabled
2013-09-11 15:23:15 +02:00
Mark Brown
2ae2caff83 Merge remote-tracking branch 'asoc/fix/rsnd' into asoc-linus 2013-09-11 11:17:17 +01:00
Mark Brown
c34c0d7684 Merge remote-tracking branch 'asoc/fix/fsl' into asoc-linus 2013-09-11 11:17:15 +01:00
Mark Brown
29dc5dd229 Merge remote-tracking branch 'asoc/fix/atmel' into asoc-linus 2013-09-11 11:17:14 +01:00
Linus Torvalds
a22a0fdba4 New drivers:
- APM X-Gene system reboot driver by Feng Kan and Loc Ho (APM).
 
 - Qualcomm MSM reboot/poweroff driver by Abhimanyu Kapur (Codeaurora).
 
 - Texas Instruments BQ24190 charger driver by Mark A. Greer (Animal Creek
   Technologies).
 
 - Texas Instruments TWL4030 MADC battery driver by Lukas Märdian and Marek
   Belisko (Golden Delicious Computers). The driver is used on Freerunner
   GTA04 phones.
 
 Highlighted fixes and improvements:
 
 - Suspend/wakeup logic improvements: power supply objects will block
   system suspend until all power supply events are processed. Thanks to
   Zoran Markovic (Linaro), Arve Hjonnevag and Todd Poynor (Google).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJSL/sOAAoJEGgI9fZJve1bDMAP/ih4C5NsuRsQxZMzTWIft80+
 S9r/co8Q/1LXgxqdMe1xq0pPgceuXCTHXjSK+ZM+/A9Qi/jFVg1S8skrAlUzMq2m
 PaRbyAVxtahU1uUIXqQLNbYO/IuRKNdXnREVt6w4K9BMjbjhsNdIRG6zOdsLxhKL
 r64sm3fqEJNRxIGuGzXPnMLNTL99uDJmMs1oPHAJwGXqWQA74K8iBHRgzi+kTAxG
 juDerMGoxU7modEQMlK4jKYj2bju4liYsrQ9pQBU1QUmHgPl7QLigXHtnkbWivrN
 nPPSJCNdtCXXtazU0ptkI717aPo0SBtEVQETF960ZTASdoqt2Ver8h4RW7dBAOIw
 tc4hdbJjZvuaDz4Os2Wy0NW+1uXVmj/Jyl9WtNtI8XYBMT0GWNN16xZZYKX303UG
 h0Myw2XhNko95g6XxQBfiat5VlH8aSIDz3LVTM1RC9zrDvjmpsI7bx+17Tw2+McK
 WdPNNHjCcgRNOIVt/vLYxQYxbJWfPf6IW2nn1v9Kc96K5WD2+3n6bn833szxFu75
 FnNghYLKkp9iXKXgGysHKoczAQa0eUxjAE/UAo10i4Eg7JmXF2Q4A+wwa2QrMutK
 FlfQo8pJGAiw0xGjkbPw8DkgBEkY8v9cJEIr0BUI6LwUoraeAjlkLjC5Im3V9bYM
 YRSzcSEESc3J/1YTGmam
 =k4RI
 -----END PGP SIGNATURE-----

Merge tag 'for-v3.12' of git://git.infradead.org/battery-2.6

Pull battery/power supply driver updates from Anton Vorontsov:
 "New drivers:

   - APM X-Gene system reboot driver by Feng Kan and Loc Ho (APM).

   - Qualcomm MSM reboot/poweroff driver by Abhimanyu Kapur (Codeaurora).

   - Texas Instruments BQ24190 charger driver by Mark A.  Greer (Animal
     Creek Technologies).

   - Texas Instruments TWL4030 MADC battery driver by Lukas Märdian and
     Marek Belisko (Golden Delicious Computers).  The driver is used on
     Freerunner GTA04 phones.

  Highlighted fixes and improvements:

   - Suspend/wakeup logic improvements: power supply objects will block
     system suspend until all power supply events are processed.  Thanks
     to Zoran Markovic (Linaro), Arve Hjonnevag and Todd Poynor (Google)"

* tag 'for-v3.12' of git://git.infradead.org/battery-2.6:
  rx51_battery: Fix channel number when reading adc value
  power: Add twl4030_madc battery driver.
  bq24190_charger: Workaround SS definition problem on i386 builds
  power_supply: Prevent suspend until power supply events are processed
  vexpress-poweroff: Should depend on the required infrastructure
  twl4030-charger: Fix compiler warning with regulator_enable()
  rx51_battery: Replace hardcoded channels values.
  bq24190_charger: Add support for TI BQ24190 Battery Charger
  ab8500-charger: We print an unintended error message
  max8925_power: Fix missing of_node_put
  power_supply: Replace strict_strtol() with kstrtol()
  power: Add APM X-Gene system reboot driver
  power_supply: tosa_battery: Get rid of irq_to_gpio usage
  power supply: collie_battery: Convert to use dev_pm_ops
  power_supply: Make goldfish_battery depend on GOLDFISH || COMPILE_TEST
  power: reset: Add msm restart support
  MAINTAINERS: drivers/power: add entry for SmartReflex AVS drivers
2013-09-10 22:58:14 -07:00
Nicholas Bellinger
2999ee7fda target/iscsi: Bump versions to v4.1.0
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 20:23:37 -07:00
Linus Torvalds
fa1586a7e4 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Daniel had some fixes queued up, that were delayed, the stolen memory
  ones and vga arbiter ones are quite useful, along with his usual bunch
  of stuff, nothing for HSW outputs yet.

  The one nouveau fix is for a regression I caused with the poweroff stuff"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (30 commits)
  drm/nouveau: fix oops on runtime suspend/resume
  drm/i915: Delay disabling of VGA memory until vgacon->fbcon handoff is done
  drm/i915: try not to lose backlight CBLV precision
  drm/i915: Confine page flips to BCS on Valleyview
  drm/i915: Skip stolen region initialisation if none is reserved
  drm/i915: fix gpu hang vs. flip stall deadlocks
  drm/i915: Hold an object reference whilst we shrink it
  drm/i915: fix i9xx_crtc_clock_get for multiplied pixels
  drm/i915: handle sdvo input pixel multiplier correctly again
  drm/i915: fix hpd work vs. flush_work in the pageflip code deadlock
  drm/i915: fix up the relocate_entry refactoring
  drm/i915: Fix pipe config warnings when dealing with LVDS fixed mode
  drm/i915: Don't call sg_free_table() if sg_alloc_table() fails
  i915: Update VGA arbiter support for newer devices
  vgaarb: Fix VGA decodes changes
  vgaarb: Don't disable resources that are not owned
  drm/i915: Pin pages whilst mapping the dma-buf
  drm/i915: enable trickle feed on Haswell
  x86: add early quirk for reserving Intel graphics stolen memory v5
  drm/i915: split PCI IDs out into i915_drm.h v4
  ...
2013-09-10 20:05:57 -07:00
Linus Torvalds
cf596766fc Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "This was a very quiet cycle! Just a few bugfixes and some cleanup"

* 'nfsd-next' of git://linux-nfs.org/~bfields/linux:
  rpc: let xdr layer allocate gssproxy receieve pages
  rpc: fix huge kmalloc's in gss-proxy
  rpc: comment on linux_cred encoding, treat all as unsigned
  rpc: clean up decoding of gssproxy linux creds
  svcrpc: remove unused rq_resused
  nfsd4: nfsd4_create_clid_dir prints uninitialized data
  nfsd4: fix leak of inode reference on delegation failure
  Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR"
  sunrpc: prepare NFS for 2038
  nfsd4: fix setlease error return
  nfsd: nfs4_file_get_access: need to be more careful with O_RDWR
2013-09-10 20:04:59 -07:00
Nicholas Bellinger
d397a445f4 target: Add Third Party Copy (3PC) bit in INQUIRY response
This patch adds the Third Party Copy (3PC) bit to signal support
for EXTENDED_COPY within standard inquiry response data.

Also add emulate_3pc device attribute in configfs (enabled by default)
to allow the exposure of this bit to be disabled, if necessary.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Zach Brown <zab@redhat.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-10 16:48:46 -07:00
Nicholas Bellinger
cbf031f425 target: Add support for EXTENDED_COPY copy offload emulation
This patch adds support for EXTENDED_COPY emulation from SPC-3, that
enables full copy offload target support within both a single virtual
backend device, and across multiple virtual backend devices.  It also
functions independent of target fabric, and supports copy offload
across multiple target fabric ports.

This implemenation supports both EXTENDED_COPY PUSH and PULL models
of operation, so the actual CDB may be received on either source or
desination logical unit.

For Target Descriptors, it currently supports the NAA IEEE Registered
Extended designator (type 0xe4), which allows the reference of target
ports to occur independent of fabric type using EVPD 0x83 WWNs.

For Segment Descriptors, it currently supports copy from block to
block (0x02) mode.

It also honors any present SCSI reservations of the destination target
port.  Note that only Supports No List Identifier (SNLID=1) mode is
supported.

Also included is basic RECEIVE_COPY_RESULTS with service action type
OPERATING PARAMETERS (0x03) required for SNLID=1 operation.

v3 changes:
  - Fix incorrect return type in target_do_receive_copy_results()
    (Fengguang)

v2 changes:
  - Use target_alloc_sgl() instead of transport_generic_get_mem()
  - Convert debug output to use pr_debug()
  - Convert target_xcopy_parse_target_descriptors() NAA IEEN WWN
    dump to use 0x%16phN format specification
  - Drop unnecessary xcopy_pt_cmd->xpt_passthrough_wsem, and
    associated usage in xcopy_pt_write_pending() and
    target_xcopy_issue_pt_cmd()
  - Add check for unsupported EXTENDED_COPY(LID4) service action
    bits in target_do_xcopy()

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Zach Brown <zab@redhat.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-10 16:48:43 -07:00
Nicholas Bellinger
d9ea32bff2 target: Add global device list for EXTENDED_COPY
EXTENDED_COPY needs to be able to search a global list of devices
based on NAA WWN device identifiers, so add a simple g_device_list
protected by g_device_mutex.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Zach Brown <zab@redhat.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-10 16:48:40 -07:00
Nicholas Bellinger
c5ff8d6bc3 target: Make helpers non static for EXTENDED_COPY command setup
Both target_alloc_sgl() and transport_generic_map_mem_to_cmd() are
required by EXTENDED_COPY logic when setting up internally dispatched
command descriptors, so go ahead and make both of these non static.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Zach Brown <zab@redhat.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-10 16:48:39 -07:00
Nicholas Bellinger
b3faa2e87c target/tcm_qla2xxx: Add/use target_reverse_dma_direction() in target_core_fabric.h
Reversing the dma_data_direction for pci_map_sg() friends is useful
for other drivers, so move it from tcm_qla2xxx into inline code
within target_core_fabric.h.

Also drop internal usage of equivlient in tcm_qla2xxx fabric code.

Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Cc: Chad Dupuis <chad.dupuis@qlogic.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-10 16:48:35 -07:00
Nicholas Bellinger
68ff9b9b27 target: Add support for COMPARE_AND_WRITE emulation
This patch adds support for COMPARE_AND_WRITE emulation on a per block
basis.  This logic is used as an atomic test and set primative currently
used by VMWare ESX VAAI for performing array side locking of individual
VMFS extent ownership.

This includes the COMPARE_AND_WRITE CDB parsing within sbc_parse_cdb(),
and does the majority of the work within the compare_and_write_callback()
to perform the verify instance user data comparision, and subsequent
write instance user data I/O submission upon a successfull comparision.

The synchronization is enforced by se_device->caw_sem, that is obtained
before the initial READ I/O submission in sbc_compare_and_write().  The
mutex is then released upon MISCOMPARE in compare_and_write_callback(),
or upon WRITE instance user-data completion in compare_and_write_post().

The implementation currently assumes a single logical block (NoLB=1).

v4 changes:
 - Explicitly clear cmd->transport_complete_callback for two failure
   cases in sbc_compare_and_write() in order to avoid double unlock
   of ->caw_sem in compare_and_write_callback() (Dan Carpenter)

v3 changes:
 - Convert se_device->caw_mutex to ->caw_sem

v2 changes:
 - Set SCF_COMPARE_AND_WRITE and cmd->execute_cmd() to
   sbc_compare_and_write() during setup in sbc_parse_cdb()
 - Use sbc_compare_and_write() for initial READ submission with
   DMA_FROM_DEVICE
 - Reset cmd->execute_cmd() to sbc_execute_rw() for write instance
   user-data in compare_and_write_callback()
 - Drop SCF_BIDI command flag usage
 - Set TRANSPORT_PROCESSING + transport_state flags before write
   instance submission, and convert to __target_execute_cmd()
 - Prevent sbc_get_size() from being being called twice to
   generate incorrect size in sbc_parse_cdb()
 - Enforce se_device->caw_mutex synchronization between initial
   READ I/O submission, and final WRITE I/O completion.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-10 16:41:59 -07:00
Glauber Costa
5ca302c8e5 list_lru: dynamically adjust node arrays
We currently use a compile-time constant to size the node array for the
list_lru structure.  Due to this, we don't need to allocate any memory at
initialization time.  But as a consequence, the structures that contain
embedded list_lru lists can become way too big (the superblock for
instance contains two of them).

This patch aims at ameliorating this situation by dynamically allocating
the node arrays with the firmware provided nr_node_ids.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:32 -04:00
Dave Chinner
a0b02131c5 shrinker: Kill old ->shrink API.
There are no more users of this API, so kill it dead, dead, dead and
quietly bury the corpse in a shallow, unmarked grave in a dark forest deep
in the hills...

[glommer@openvz.org: added flowers to the grave]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Greg Thelen <gthelen@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:32 -04:00
Dave Chinner
9b17c62382 fs: convert inode and dentry shrinking to be node aware
Now that the shrinker is passing a node in the scan control structure, we
can pass this to the the generic LRU list code to isolate reclaim to the
lists on matching nodes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:31 -04:00