linux/include
KAMEZAWA Hiroyuki 08e552c69c memcg: synchronized LRU
A large patch that changes memcg's LRU semantics.

Currently:
  - page_cgroup is linked to its mem_cgroup's own LRU (per zone).

  - The LRU of page_cgroup is not synchronized with the global LRU.

  - page and page_cgroup are one-to-one and statically allocated.

  - To find which LRU a page_cgroup is on, you have to check pc->mem_cgroup:
    - lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);

  - SwapCache is handled.
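A minimal userspace sketch of the layout described above. It assumes simplified, hypothetical structures: the real kernel definitions of struct page, struct page_cgroup and struct mem_cgroup differ, a plain counter stands in for the per-zone LRU lists, and nid/zid are read through a back pointer from pc to its page. It shows the one-to-one static mapping (page i <-> page_cgroup i) and how the per-zone LRU is selected from pc->mem_cgroup plus the page's node and zone.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 2   /* hypothetical: tiny NUMA topology for the sketch */
#define NR_ZONES 3   /* hypothetical: zones per node */
#define NR_PAGES 8   /* hypothetical: size of the "memory map" */

struct mem_cgroup;

struct page {
	int nid;   /* node the page belongs to */
	int zid;   /* zone index within that node */
};

struct page_cgroup {
	struct page *page;             /* back pointer, set once at init */
	struct mem_cgroup *mem_cgroup; /* owner; meaningful only when charged */
};

struct mem_cgroup_per_zone {
	unsigned long nr_pages;        /* stand-in for the per-zone LRU lists */
};

struct mem_cgroup {
	struct mem_cgroup_per_zone info[NR_NODES][NR_ZONES];
};

/* Statically allocated, one-to-one: mem_map[i] <-> pc_map[i]. */
static struct page mem_map[NR_PAGES];
static struct page_cgroup pc_map[NR_PAGES];

static struct page_cgroup *lookup_page_cgroup(struct page *page)
{
	return &pc_map[page - mem_map];
}

/* The LRU a page_cgroup lives on depends on its owner and its page's zone. */
static struct mem_cgroup_per_zone *page_cgroup_zoneinfo(struct page_cgroup *pc)
{
	return &pc->mem_cgroup->info[pc->page->nid][pc->page->zid];
}

/* Fill in a fake topology and wire each page_cgroup to its page. */
static void init_maps(void)
{
	for (int i = 0; i < NR_PAGES; i++) {
		mem_map[i].nid = i % NR_NODES;
		mem_map[i].zid = i % NR_ZONES;
		pc_map[i].page = &mem_map[i];
		pc_map[i].mem_cgroup = NULL;
	}
}
```

Because the mapping is static, lookup_page_cgroup() is pure index arithmetic; only the owner pointer decides which memcg LRU is used.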

When we handle the LRU list of a page_cgroup, we do the following:

	pc = lookup_page_cgroup(page);
	lock_page_cgroup(pc); .....................(1)
	mz = page_cgroup_zoneinfo(pc);
	spin_lock(&mz->lru_lock);
	.....add to LRU
	spin_unlock(&mz->lru_lock);
	unlock_page_cgroup(pc);

But (1) is a spin_lock, and we have to worry about deadlock with zone->lru_lock.
So trylock() is currently used at (1). Without (1), we cannot trust that "mz" is correct.
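The trylock workaround can be sketched in userspace with POSIX mutexes standing in for the kernel spinlocks (an assumption; zone_lru_lock, pc_lock and the helper names are hypothetical). A caller already holding the zone lock only tries the page_cgroup lock; if it is busy, the memcg LRU update is skipped rather than risking deadlock, which illustrates how the memcg LRU can drift from the global one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for zone->lru_lock and lock_page_cgroup(pc). */
static pthread_mutex_t zone_lru_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pc_lock = PTHREAD_MUTEX_INITIALIZER;
static int memcg_lru_len;   /* stand-in for the memcg LRU list */

/*
 * Called with zone_lru_lock already held (as from vmscan.c or swap.c).
 * Blocking on pc_lock here could deadlock against a path that holds
 * pc_lock and then wants zone_lru_lock, so only try the lock.
 */
static bool memcg_add_lru_trylock(void)
{
	if (pthread_mutex_trylock(&pc_lock) != 0)
		return false;   /* busy: memcg LRU update is skipped */
	memcg_lru_len++;        /* pc->mem_cgroup is stable while pc_lock is held */
	pthread_mutex_unlock(&pc_lock);
	return true;
}

/* Global LRU path: takes the zone lock, then nests the memcg update. */
static bool global_lru_add(void)
{
	pthread_mutex_lock(&zone_lru_lock);
	bool ok = memcg_add_lru_trylock();
	pthread_mutex_unlock(&zone_lru_lock);
	return ok;
}
```

When pc_lock is held elsewhere, global_lru_add() returns false instead of blocking, and the memcg list is simply not updated.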

This is an attempt to remove this dirty nesting of locks.
This patch replaces mz->lru_lock with zone->lru_lock.
Then the above sequence becomes:

	spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
	mem_cgroup_add/remove/etc_lru() {
		pc = lookup_page_cgroup(page);
		mz = page_cgroup_zoneinfo(pc);
		if (PageCgroupUsed(pc)) {
			....add to LRU
		}
	}
	spin_unlock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU

This is much simpler.
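A sketch of the new scheme under similar assumptions (a userspace mutex instead of a spinlock, counters instead of list_heads, a bool instead of the PCG_USED bit, hypothetical helper names): the global and the memcg LRU are updated inside a single zone->lru_lock critical section, and the only memcg-side check is whether the page_cgroup is in use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct mem_cgroup { int lru_len; };   /* hypothetical: one LRU counter per group */

struct page_cgroup {
	bool used;                      /* stand-in for PCG_USED */
	struct mem_cgroup *mem_cgroup;
};

struct zone {
	pthread_mutex_t lru_lock;       /* the single lock guarding both LRUs */
	int lru_len;                    /* global LRU length */
};

/* Caller holds zone->lru_lock; no per-page_cgroup lock is taken. */
static void mem_cgroup_add_lru(struct page_cgroup *pc)
{
	if (!pc->used)
		return;   /* not charged: the page is only on the global LRU */
	pc->mem_cgroup->lru_len++;
}

/* Global LRU path: both updates share one critical section. */
static void lru_cache_add(struct zone *zone, struct page_cgroup *pc)
{
	pthread_mutex_lock(&zone->lru_lock);
	zone->lru_len++;               /* global LRU */
	mem_cgroup_add_lru(pc);        /* memcg LRU, same lock */
	pthread_mutex_unlock(&zone->lru_lock);
}
```

There is no lock ordering left to get wrong: the memcg helper never acquires anything itself.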
(*) We're safe even if we don't take lock_page_cgroup(pc), because:
    1. pc->mem_cgroup can be modified only
       - at charge.
       - at account_move().
    2. At charge:
       the PCG_USED bit is not set until pc->mem_cgroup is fixed.
    3. At account_move():
       the page is isolated and not on the LRU.
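The ordering argument in point 2 is a publish/subscribe pattern. As an illustration only, it can be expressed with C11 release/acquire atomics; the kernel uses its own bit operations and barriers, and the names below are hypothetical. The charge path writes pc->mem_cgroup before setting "used" with release semantics, so an LRU path that observes "used" with acquire semantics also sees the final pc->mem_cgroup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mem_cgroup { int id; };

struct page_cgroup {
	struct mem_cgroup *mem_cgroup;
	atomic_bool used;               /* stand-in for the PCG_USED bit */
};

/* Charge: fix the owner first, then publish it by setting "used". */
static void commit_charge(struct page_cgroup *pc, struct mem_cgroup *mem)
{
	pc->mem_cgroup = mem;
	atomic_store_explicit(&pc->used, true, memory_order_release);
}

/* LRU side: trust pc->mem_cgroup only after seeing "used" set. */
static struct mem_cgroup *lru_owner(struct page_cgroup *pc)
{
	if (!atomic_load_explicit(&pc->used, memory_order_acquire))
		return NULL;   /* not charged (yet): skip the memcg LRU */
	return pc->mem_cgroup;
}
```

Point 3 needs no ordering in this model: an isolated page is never reached from the LRU, so nobody reads pc->mem_cgroup while account_move() rewrites it.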

Pros:
  - Easier maintenance.
  - memcg can make use of the laziness of pagevec.
  - We don't have to duplicate the LRU/Active/Unevictable bits in page_cgroup.
  - memcg's LRU status will be synchronized with the global LRU's.
  - The number of locks is reduced.
  - account_move() is greatly simplified.
Cons:
  - May increase the cost of LRU rotation.
    (No impact if memcg is not configured.)
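To illustrate the pagevec point, here is a minimal sketch of lazy batching, assuming a hypothetical pvec type and a batch size chosen for the sketch (the kernel's struct pagevec is similar in spirit but not identical). Pages are queued and added to the LRU in batches, so lru_lock is taken once per batch; since memcg LRU updates now run under that same lock, they ride along with each batch instead of needing a lock per page.

```c
#include <assert.h>
#include <pthread.h>

#define PVEC_SIZE 14   /* batch size chosen for the sketch */

struct page { int on_lru; };

struct pvec {
	int nr;
	struct page *pages[PVEC_SIZE];
};

static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_acquisitions;   /* counts how often lru_lock is taken */

/* Move the whole batch onto the LRU under one lru_lock acquisition. */
static void pvec_drain(struct pvec *pv)
{
	if (pv->nr == 0)
		return;
	pthread_mutex_lock(&lru_lock);
	lock_acquisitions++;
	for (int i = 0; i < pv->nr; i++) {
		pv->pages[i]->on_lru = 1;  /* global LRU add... */
		/* ...and the memcg LRU add would happen here, under the same lock */
	}
	pthread_mutex_unlock(&lru_lock);
	pv->nr = 0;
}

/* Defer the LRU insertion; only touch the lock when the batch is full. */
static void pvec_add(struct pvec *pv, struct page *page)
{
	pv->pages[pv->nr++] = page;
	if (pv->nr == PVEC_SIZE)
		pvec_drain(pv);
}
```

Adding 30 pages with a batch size of 14 takes the lock twice while filling and once more for the final drain.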

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:05 -08:00
acpi trivial: fix an -> a typos in documentation and comments 2009-01-06 11:28:07 +01:00
asm-arm
asm-frv frv: introduce asm/swab.h 2009-01-06 18:10:28 -08:00
asm-generic remove linux/hardirq.h from asm-generic/local.h 2009-01-06 15:59:13 -08:00
asm-h8300
asm-m32r m32r: introduce asm/swab.h 2009-01-06 18:10:28 -08:00
asm-m68k m68k: introduce asm/swab.h 2009-01-06 18:10:27 -08:00
asm-mn10300 mn10300: introduce asm/swab.h 2009-01-06 18:10:29 -08:00
crypto crypto: aes - Precompute tables 2008-12-25 11:05:13 +11:00
drm drm: Add a debug node for vblank state. 2008-12-29 17:47:27 +10:00
keys
linux memcg: synchronized LRU 2009-01-08 08:31:05 -08:00
math-emu
media V4L/DVB (10141): v4l2: debugging API changed to match against driver name instead of ID. 2009-01-02 17:11:52 -02:00
mtd trivial: fix then -> than typos in comments and documentation 2009-01-06 11:28:06 +01:00
net wimax: headers for kernel API and user space interaction 2009-01-07 10:00:16 -08:00
pcmcia
rdma
rxrpc
scsi [SCSI] fcoe: Fibre Channel over Ethernet 2008-12-29 11:24:33 -06:00
sound Merge branch 'topic/asoc' into for-linus 2009-01-06 09:48:51 +01:00
trace sched, trace: update trace_sched_wakeup() 2008-12-25 13:10:21 +01:00
video video: sh_mobile_lcdcfb deferred io support 2008-12-22 18:44:48 +09:00
xen xen: add xenfs to allow usermode <-> Xen interaction 2009-01-08 08:30:59 -08:00
Kbuild