iv/linux

Michal Hocko 4165b9b461 hugetlb: do not account hugetlb pages as NR_FILE_PAGES

hugetlb pages uses add_to_page_cache to track shared mappings. This is OK from the data structure point of view but it is less so from the NR_FILE_PAGES accounting:

	- huge pages are accounted as 4k which is clearly wrong
	- this counter is used as the amount of the reclaimable page cache which is incorrect as well because hugetlb pages are special and not reclaimable
	- the counter is then exported to userspace via /proc/meminfo (in Cached:), /proc/vmstat and /proc/zoneinfo as nr_file_pages which is confusing at least:
	  Cached: 8883504 kB
	  HugePages_Free: 8348
	  ...
	  Cached: 8916048 kB
	  HugePages_Free: 156
	  ...
	  thats 8192 huge pages allocated which is ~16G accounted as 32M

There are usually not that many huge pages in the system for this to make any visible difference e.g. by fooling __vm_enough_memory or zone_pagecache_reclaimable.

Fix this by special casing huge pages in both __delete_from_page_cache and __add_to_page_cache_locked. replace_page_cache_page is currently only used by fuse and that shouldn't touch hugetlb pages AFAICS but it is more robust to check for special casing there as well.

Hugetlb pages shouldn't get to any other paths where we do accounting:

	- migration - we have a special handling via hugetlbfs_migrate_page
	- shmem - doesn't handle hugetlb pages directly even for SHM_HUGETLB resp. MAP_HUGETLB
	- swapcache - hugetlb is not swapable

This has a user visible effect but I believe it is reasonable because the previously exported number is simply bogus.

An alternative would be to account hugetlb pages with their real size and treat them similar to shmem. But this has some drawbacks.

First we would have to special case in kernel users of NR_FILE_PAGES and considering how hugetlb is special we would have to do it everywhere. We do not want Cached exported by /proc/meminfo to include it because the value would be even more misleading.

__vm_enough_memory and zone_pagecache_reclaimable would have to do the same thing because those pages are simply not reclaimable. The correction is even not trivial because we would have to consider all active hugetlb page sizes properly. Users of the counter outside of the kernel would have to do the same.

So the question is why to account something that needs to be basically excluded for each reasonable usage. This doesn't make much sense to me.

It seems that this has been broken since hugetlb was introduced but I haven't checked the whole history.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2015-06-24 17:49:43 -07:00
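
The commit message above describes two things that a small user-space model can make concrete: the special casing on the add and delete paths, and the arithmetic behind "~16G accounted as 32M". The following is only a sketch under stated assumptions, not the kernel patch itself. struct toy_page, toy_add_to_page_cache, toy_delete_from_page_cache, toy_nr_file_pages and pages_to_kb are hypothetical names invented for illustration, and the 2048 kB huge page size used in the usage note is an assumption chosen to match the ~16G figure.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page; not the kernel's struct page. */
struct toy_page {
    bool is_huge;   /* stands in for a "is this a hugetlb page?" check */
};

/* Toy stand-in for the NR_FILE_PAGES counter. */
static long nr_file_pages;

/* Model of the add path after the fix: hugetlb pages are not counted. */
void toy_add_to_page_cache(const struct toy_page *page)
{
    if (!page->is_huge)
        nr_file_pages++;
}

/* Model of the delete path: the same special case, mirrored,
 * so the counter never goes out of balance. */
void toy_delete_from_page_cache(const struct toy_page *page)
{
    if (!page->is_huge)
        nr_file_pages--;
}

/* Read the toy counter. */
long toy_nr_file_pages(void)
{
    return nr_file_pages;
}

/* Size in kB occupied by `count` pages of `page_kb` kB each. */
long pages_to_kb(long count, long page_kb)
{
    assert(count >= 0 && page_kb > 0);
    return count * page_kb;
}
```

With the numbers quoted in the message, HugePages_Free drops by 8348 - 156 = 8192 pages. pages_to_kb(8192, 4) gives 32768 kB (32M), which is close to the observed growth in Cached of 8916048 - 8883504 = 32544 kB, while pages_to_kb(8192, 2048) gives 16777216 kB (16G), the memory the huge pages actually occupy under the assumed 2048 kB page size.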
..
kasan	mm/mempool.c: kasan: poison mempool elements	2015-04-15 16:35:20 -07:00
backing-dev.c	block: discard bdi_unregister() in favour of bdi_destroy()	2015-05-28 10:12:42 -06:00
balloon_compaction.c	mm/balloon_compaction: fix deflation when compaction is disabled	2014-10-29 16:33:15 -07:00
bootmem.c	mem-hotplug: reset node managed pages when hot-adding a new pgdat	2014-11-13 16:17:06 -08:00
cleancache.c	cleancache: remove limit on the number of cleancache enabled filesystems	2015-04-14 16:49:03 -07:00
cma_debug.c	mm/cma_debug.c: remove blank lines before DEFINE_SIMPLE_ATTRIBUTE()	2015-04-15 16:35:20 -07:00
cma.c	mm: cma: add trace events for CMA allocations and freeings	2015-04-15 16:35:19 -07:00
cma.h	mm: cma: allocation trigger	2015-04-14 16:49:00 -07:00
compaction.c	mm/compaction.c: fix "suitable_migration_target() unused" warning	2015-04-15 16:35:20 -07:00
debug-pagealloc.c	mm/debug-pagealloc: make debug-pagealloc boottime configurable	2014-12-13 12:42:48 -08:00
debug.c	mm: account pmd page tables to the process	2015-02-11 17:06:04 -08:00
dmapool.c	mm/dmapool.c: fixed a brace coding style issue	2014-10-09 22:26:00 -04:00
early_ioremap.c
fadvise.c	vfs: remove get_xip_mem	2015-02-16 17:56:03 -08:00
failslab.c
filemap.c	hugetlb: do not account hugetlb pages as NR_FILE_PAGES	2015-06-24 17:49:43 -07:00
frontswap.c	mm/frontswap.c: fix the condition in BUG_ON	2014-12-10 17:41:08 -08:00
gup.c	mm: use READ_ONCE() for non-scalar types	2015-04-15 16:35:18 -07:00
highmem.c	mm/highmem: make kmap cache coloring aware	2014-08-06 18:01:22 -07:00
huge_memory.c	thp: cleanup how khugepaged enters freezer	2015-06-24 17:49:41 -07:00
hugetlb_cgroup.c	mm: page_counter: pull "-1" handling out of page_counter_memparse()	2015-02-11 17:06:02 -08:00
hugetlb.c	mm/hugetlb: introduce minimum hugepage order	2015-06-24 17:49:42 -07:00
hwpoison-inject.c	mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling	2015-06-24 17:49:42 -07:00
init-mm.c
internal.h	mm: remove rest of ACCESS_ONCE() usages	2015-04-15 16:35:18 -07:00
interval_tree.c	mm: replace vma->sharead.linear with vma->shared	2015-02-10 14:30:31 -08:00
Kconfig	mm: cma: debugfs interface	2015-04-14 16:49:00 -07:00
Kconfig.debug	mm/debug_pagealloc: remove obsolete Kconfig options	2015-01-08 15:10:52 -08:00
kmemcheck.c	mm/slab_common: move kmem_cache definition to internal header	2014-10-09 22:25:50 -04:00
kmemleak-test.c	mm/kmemleak-test.c: use pr_fmt for logging	2014-06-06 16:08:18 -07:00
kmemleak.c	gfp: add __GFP_NOACCOUNT	2015-05-14 17:55:51 -07:00
ksm.c	mm: remove rest of ACCESS_ONCE() usages	2015-04-15 16:35:18 -07:00
list_lru.c	memcg: reparent list_lrus and free kmemcg_id on css offline	2015-02-12 18:54:10 -08:00
maccess.c
madvise.c	vfs: remove get_xip_mem	2015-02-16 17:56:03 -08:00
Makefile	mm: move memtest under mm	2015-04-14 16:49:06 -07:00
memblock.c	mm/memblock.c: add debug output for memblock_add()	2015-04-15 16:35:19 -07:00
memcontrol.c	mm: oom_kill: simplify OOM killer locking	2015-06-24 17:49:43 -07:00
memory_hotplug.c	mm/memory_hotplug.c: set zone->wait_table to null after freeing it	2015-06-10 16:43:43 -07:00
memory-failure.c	mm/memory-failure: me_huge_page() does nothing for thp	2015-06-24 17:49:42 -07:00
memory.c	sched/preempt, mm/fault: Trigger might_sleep() in might_fault() with disabled pagefaults	2015-05-19 08:39:14 +02:00
mempolicy.c	mm, numa: really disable NUMA balancing by default on single node machines	2015-05-14 17:55:51 -07:00
mempool.c	mm/mempool.c: kasan: poison mempool elements	2015-04-15 16:35:20 -07:00
memtest.c	memtest: use phys_addr_t for physical addresses	2015-04-14 16:49:06 -07:00
migrate.c	mm: soft-offline: don't free target page in successful page migration	2015-06-24 17:49:42 -07:00
mincore.c	mincore: apply page table walker on do_mincore()	2015-02-11 17:06:06 -08:00
mlock.c	mm: move mm_populate()-related code to mm/gup.c	2015-04-14 16:49:00 -07:00
mm_init.c	mm/mm_init.c: mark mminit_loglevel __meminitdata	2015-02-12 18:54:11 -08:00
mmap.c	mm/mmap.c: use while instead of if+goto	2015-04-15 16:35:19 -07:00
mmu_context.c
mmu_notifier.c	mmu_notifier: add the callback for mmu_notifier_invalidate_range()	2014-11-13 13:46:09 +11:00
mmzone.c	mm: microoptimize zonelist operations	2015-02-11 17:06:02 -08:00
mprotect.c	mm: fix mprotect() behaviour on VM_LOCKED VMAs	2015-06-24 17:49:41 -07:00
mremap.c	mm: new arch_remap() hook	2015-06-24 17:49:41 -07:00
msync.c	mm: remove rest usage of VM_NONLINEAR and pte_file()	2015-02-10 14:30:31 -08:00
nobootmem.c	mem-hotplug: reset node managed pages when hot-adding a new pgdat	2014-11-13 16:17:06 -08:00
nommu.c	nommu: use __vfs_read()	2015-04-11 22:27:56 -04:00
oom_kill.c	mm: oom_kill: simplify OOM killer locking	2015-06-24 17:49:43 -07:00
page_alloc.c	mm: page_alloc: inline should_alloc_retry()	2015-06-24 17:49:43 -07:00
page_counter.c	mm: page_counter: pull "-1" handling out of page_counter_memparse()	2015-02-11 17:06:02 -08:00
page_ext.c	mm/page_owner: keep track of page owners	2014-12-13 12:42:48 -08:00
page_io.c	direct_IO: remove rw from a_ops->direct_IO()	2015-04-11 22:29:45 -04:00
page_isolation.c	CMA: page_isolation: check buddy before accessing it	2015-05-14 17:55:51 -07:00
page_owner.c	mm/page_owner.c: remove unnecessary stack_trace field	2015-02-11 17:06:07 -08:00
page-writeback.c	writeback: use |1 instead of +1 to protect against div by zero	2015-04-23 10:36:33 -06:00
pagewalk.c	mm/pagewalk.c: prevent positive return value of walk_page_test() from being passed to callers	2015-03-25 16:20:30 -07:00
percpu-km.c	percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated	2014-09-02 14:46:05 -04:00
percpu-vm.c	percpu: move region iterations out of pcpu_[de]populate_chunk()	2014-09-02 14:46:02 -04:00
percpu.c	percpu: Fix trivial typos in comments	2015-03-24 13:41:54 -04:00
pgtable-generic.c	mm: convert p[te|md]_mknonnuma and remaining page table manipulations	2015-02-12 18:54:08 -08:00
process_vm_access.c	process_vm_access: switch to {compat_,}import_iovec()	2015-04-11 22:27:12 -04:00
quicklist.c
readahead.c	fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info	2015-01-20 14:03:04 -07:00
rmap.c	rmap: fix theoretical race between do_wp_page and shrink_active_list	2015-06-24 17:49:42 -07:00
shmem.c	Merge branch 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2015-06-22 12:51:21 -07:00
slab_common.c	slab: correct size_index table before replacing the bootstrap kmem_cache_node	2015-06-24 17:49:41 -07:00
slab.c	slab: correct size_index table before replacing the bootstrap kmem_cache_node	2015-06-24 17:49:41 -07:00
slab.h	slab: correct size_index table before replacing the bootstrap kmem_cache_node	2015-06-24 17:49:41 -07:00
slob.c	slob: make slob_alloc_node() static and remove EXPORT_SYMBOL()	2015-04-14 16:48:59 -07:00
slub.c	slab: correct size_index table before replacing the bootstrap kmem_cache_node	2015-06-24 17:49:41 -07:00
sparse-vmemmap.c
sparse.c
swap_cgroup.c	mm: page_cgroup: rename file to mm/swap_cgroup.c	2014-12-10 17:41:09 -08:00
swap_state.c	mm: remove rest of ACCESS_ONCE() usages	2015-04-15 16:35:18 -07:00
swap.c	mm: drop bogus VM_BUG_ON_PAGE assert in put_page() codepath	2015-06-24 17:49:42 -07:00
swapfile.c	mm: remove rest of ACCESS_ONCE() usages	2015-04-15 16:35:18 -07:00
truncate.c	mm: rename deactivate_page to deactivate_file_page	2015-04-15 16:35:17 -07:00
util.c	mm: uninline and cleanup page-mapping related helpers	2015-04-15 16:35:19 -07:00
vmacache.c	mm,vmacache: count number of system-wide flushes	2014-12-13 12:42:48 -08:00
vmalloc.c	mm/vmalloc: get rid of dirty bitmap inside vmap_block structure	2015-04-15 16:35:18 -07:00
vmpressure.c	mm/vmpressure.c: fix race in vmpressure_work_fn()	2014-12-02 17:32:07 -08:00
vmscan.c	mm: rename RECLAIM_SWAP to RECLAIM_UNMAP	2015-06-24 17:49:42 -07:00
vmstat.c	vmstat: Reduce time interval to stat update on idle cpu	2015-02-11 17:06:07 -08:00
workingset.c	list_lru: add helpers to isolate items	2015-02-12 18:54:10 -08:00
zbud.c	mm/zpool: add name argument to create zpool	2015-02-12 18:54:12 -08:00
zpool.c	mm/zpool: add name argument to create zpool	2015-02-12 18:54:12 -08:00
zsmalloc.c	zsmalloc: fix a null pointer dereference in destroy_handle_cache()	2015-06-10 16:43:43 -07:00
zswap.c	mm/zpool: add name argument to create zpool	2015-02-12 18:54:12 -08:00