IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Recently 0day reported many strange performance changes (regression or
improvement), in which there was no obvious relation between the culprit
commit and the benchmark at the first look, and it causes people to doubt
the test itself is wrong.
Upon further check, many of these cases are caused by the change to the
alignment of kernel text or data, as whole text/data of kernel are linked
together, change in one domain may affect alignments of other domains.
gcc has an option '-falign-functions=n' to force text aligned, and with
that option enabled, some of those performance changes will be gone, like
[1][2][3].
Add this option so that developers and 0day can easily find performance
bump caused by text alignment change, as tracking these strange bump is
quite time consuming. Though it can't help in other cases like data
alignment changes like [4].
Following is some size data for v5.7 kernel built with a RHEL config used
in 0day:
text data bss dec filename
19738771 13292906 5554236 38585913 vmlinux.noalign
19758591 13297002 5529660 38585253 vmlinux.align32
Raw vmlinux size in bytes:
v5.7 v5.7+align32
253950832 254018000 +0.02%
Some benchmark data, most of them have no big change:
* hackbench: [ -1.8%, +0.5%]
* fsmark: [ -3.2%, +3.4%] # ext4/xfs/btrfs
* kbuild: [ -2.0%, +0.9%]
* will-it-scale: [ -0.5%, +1.8%] # mmap1/pagefault3
* netperf:
- TCP_CRR [+16.6%, +97.4%]
- TCP_RR [-18.5%, -1.8%]
- TCP_STREAM [ -1.1%, +1.9%]
[1] https://lore.kernel.org/lkml/20200114085637.GA29297@shao2-debian/
[2] https://lore.kernel.org/lkml/20200330011254.GA14393@feng-iot/
[3] https://lore.kernel.org/lkml/1d98d1f0-fe84-6df7-f5bd-f4cb2cdb7f45@intel.com/
[4] https://lore.kernel.org/lkml/20200205123216.GO12867@shao2-debian/
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
Link: http://lkml.kernel.org/r/1595475001-90945-1-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When onlining a first memory block in a zone, pcp lists are not updated
thus pcp struct will have the default setting of ->high = 0,->batch = 1.
This means till the second memory block in a zone(if it have) is onlined
the pcp lists of this zone will not contain any pages because pcp's
->count is always greater than ->high thus free_pcppages_bulk() is called
to free batch size(=1) pages every time system wants to add a page to the
pcp list through free_unref_page().
To put this in a word, system is not using benefits offered by the pcp
lists when there is a single onlineable memory block in a zone. Correct
this by always updating the pcp lists when memory block is onlined.
Fixes: 1f522509c7 ("mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset")
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Link: http://lkml.kernel.org/r/1596372896-15336-1-git-send-email-charante@codeaurora.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some of our servers spend significant time at kernel boot initializing
memory block sysfs directories and then creating symlinks between them and
the corresponding nodes. The slowness happens because the machines get
stuck with the smallest supported memory block size on x86 (128M), which
results in 16,288 directories to cover the 2T of installed RAM. The
search for each memory block is noticeable even with commit 4fb6eabf10
("drivers/base/memory.c: cache memory blocks in xarray to accelerate
lookup").
Commit 078eb6aa50 ("x86/mm/memory_hotplug: determine block size based on
the end of boot memory") chooses the block size based on alignment with
memory end. That addresses hotplug failures in qemu guests, but for bare
metal systems whose memory end isn't aligned to even the smallest size, it
leaves them at 128M.
Make kernels that aren't running on a hypervisor use the largest supported
size (2G) to minimize overhead on big machines. Kernel boot goes 7%
faster on the aforementioned servers, shaving off half a second.
[daniel.m.jordan@oracle.com: v3]
Link: http://lkml.kernel.org/r/20200714205450.945834-1-daniel.m.jordan@oracle.com
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200609225451.3542648-1-daniel.m.jordan@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix W=1 compile warnings (invalid kerneldoc):
mm/mmu_notifier.c:187: warning: Function parameter or member 'interval_sub' not described in 'mmu_interval_read_bgin'
mm/mmu_notifier.c:708: warning: Function parameter or member 'subscription' not described in 'mmu_notifier_registr'
mm/mmu_notifier.c:708: warning: Excess function parameter 'mn' description in 'mmu_notifier_register'
mm/mmu_notifier.c:880: warning: Function parameter or member 'subscription' not described in 'mmu_notifier_put'
mm/mmu_notifier.c:880: warning: Excess function parameter 'mn' description in 'mmu_notifier_put'
mm/mmu_notifier.c:982: warning: Function parameter or member 'ops' not described in 'mmu_interval_notifier_insert'
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: http://lkml.kernel.org/r/20200728171109.28687-4-krzk@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current_gfp_context() converts a number of PF_MEMALLOC_* per-process
flags into the corresponding GFP_* flags for memory allocation. In that
function, current->flags is accessed 3 times. That may lead to duplicated
access of the same memory location.
This is not usually a problem with minimal debug config options on as the
compiler can optimize away the duplicated memory accesses. With most of
the debug config options on, however, that may not be the case. For
example, the x86-64 object size of the __need_fs_reclaim() in a debug
kernel that calls current_gfp_context() was 309 bytes. With this patch
applied, the object size is reduced to 202 bytes. This is a saving of 107
bytes and will probably be slightly faster too.
Use READ_ONCE() to access current->flags to prevent the compiler from
possibly accessing current->flags multiple times.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michel Lespinasse <walken@google.com>
Link: http://lkml.kernel.org/r/20200618212936.9776-1-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The routine cma_init_reserved_areas is designed to activate all
reserved cma areas. It quits when it first encounters an error.
This can leave some areas in a state where they are reserved but
not activated. There is no feedback to code which performed the
reservation. Attempting to allocate memory from areas in such a
state will result in a BUG.
Modify cma_init_reserved_areas to always attempt to activate all
areas. The called routine, cma_activate_area is responsible for
leaving the area in a valid state. No one is making active use
of returned error codes, so change the routine to void.
How to reproduce: This example uses kernelcore, hugetlb and cma
as an easy way to reproduce. However, this is a more general cma
issue.
Two node x86 VM 16GB total, 8GB per node
Kernel command line parameters, kernelcore=4G hugetlb_cma=8G
Related boot time messages,
hugetlb_cma: reserve 8192 MiB, up to 4096 MiB per node
cma: Reserved 4096 MiB at 0x0000000100000000
hugetlb_cma: reserved 4096 MiB on node 0
cma: Reserved 4096 MiB at 0x0000000300000000
hugetlb_cma: reserved 4096 MiB on node 1
cma: CMA area hugetlb could not be activated
# echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
...
Call Trace:
bitmap_find_next_zero_area_off+0x51/0x90
cma_alloc+0x1a5/0x310
alloc_fresh_huge_page+0x78/0x1a0
alloc_pool_huge_page+0x6f/0xf0
set_max_huge_pages+0x10c/0x250
nr_hugepages_store_common+0x92/0x120
? __kmalloc+0x171/0x270
kernfs_fop_write+0xc1/0x1a0
vfs_write+0xc7/0x1f0
ksys_write+0x5f/0xe0
do_syscall_64+0x4d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: c64be2bb1c ("drivers: add Contiguous Memory Allocator")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200730163123.6451-1-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>