Round the initrd memblock reservation to page boundaries to prevent
other data sharing the initrd pages. This prevents an allocation
possibly overlapping with the initrd, which would later get trampled
on in free_initrd_mem().
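A minimal sketch of the rounding, assuming the usual round_down()/round_up()
helpers and the phys_initrd_start/phys_initrd_size variables used by the ARM
initrd code (illustrative only, not the exact hunk):

  /* Reserve whole pages covering the initrd so no other memblock
   * allocation can share a page with it. */
  phys_addr_t start = round_down(phys_initrd_start, PAGE_SIZE);
  phys_addr_t size  = phys_initrd_size + (phys_initrd_start - start);

  memblock_reserve(start, round_up(size, PAGE_SIZE));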
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Rather than repeatedly testing phys_initrd_size to see if the initrd
is still enabled, return from the new function to avoid executing the
remaining initialisation.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Move the ARM initrd initialisation code out of arm_memblock_init() into
its own function, so it can be cleaned up.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
The tegra DRM driver produces a harmless warning when built for NOMMU:
drivers/gpu/drm/tegra/gem.c: In function 'tegra_drm_mmap':
drivers/gpu/drm/tegra/gem.c:508:12: unused variable 'prot'
This is because pgprot_writecombine() on ARM returns a constant and
ignores its argument. The version in asm-generic doesn't have that
problem, so let's use that one instead. We don't actually care
about the value on NOMMU, and this is consistent with what some
other architectures do.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
The decompress.c file contains a declaration for strstr() so we can
include some compression library code.
With the updated LZ4 implementation, we run into the same problem again
for strlen():
In file included from ../include/linux/rcupdate.h:40:0,
from ../include/linux/srcu.h:33,
from ../include/linux/notifier.h:15,
from ../include/linux/memory_hotplug.h:6,
from ../include/linux/mmzone.h:749,
from ../include/linux/gfp.h:5,
from ../include/linux/kmod.h:22,
from ../include/linux/module.h:13,
from ../arch/arm/boot/compressed/../../../../lib/lz4/lz4_decompress.c:39,
from ../arch/arm/boot/compressed/../../../../lib/decompress_unlz4.c:13,
from ../arch/arm/boot/compressed/decompress.c:55:
include/linux/cpumask.h: In function 'cpumask_parse':
include/linux/cpumask.h:592:53: error: implicit declaration of function 'strlen'; did you mean 'strstr'? [-Werror=implicit-function-declaration]
This adds another declaration to work around the new problem.
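As a hedged sketch, the added declaration is of this form (the exact
placement in decompress.c is an assumption):

  /* Forward declarations so the included library code builds without
   * pulling in <linux/string.h>. */
  extern char *strstr(const char *s1, const char *s2);
  extern size_t strlen(const char *s);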
Fixes: ce83d9ab80d6 ("lib: update LZ4 compressor module")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Now the active-way setup function is always called with a fixed value of
zero for its second argument, so the code can be simpler.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Nothing in this header file depends on <linux/types.h>.
Rather, <linux/errno.h> should be included for -ENODEV.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
According to the spec 'ELF for the ARM Architecture' (IHI 0044E),
addends for R_ARM_PREL31 relocations are 31-bit signed quantities,
so we need to sign extend the value to 32 bits before it can be used
as an offset in the calculation of the relocated value.
We have not been bitten by this because these relocations are usually
emitted against the start of a section, which means the addends never
assume negative values in practice. But it is a bug nonetheless, so fix
it.
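One way to express the required sign extension is with the sign_extend32()
helper from <linux/bitops.h> (a sketch; the variable names are illustrative
and the actual fix may spell it differently):

  /* A PREL31 addend occupies bits [30:0]; propagate bit 30 into bit 31
   * before using the value as a 32-bit offset. */
  s32 offset = sign_extend32(*(u32 *)loc, 30);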
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Now that the exception base address is handled dynamically for
processors with CP15, remove the Hivecs configuration in assembly.
Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
VECTORS_BASE displays the exception base address. Now that the exception
base address is dynamically determined on no-MMU, define VECTORS_BASE to
the variable holding it.
Accordingly, limit the VECTORS_BASE constant definition to MMU.
Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Configure the exception base address dynamically on no-MMU for CP15
processors. In the case of low vectors, the decision is based on whether
the security extensions are enabled and whether the remap-vectors-to-RAM
config option is selected.
For no-MMU without CP15, the current default value of 0x0 is retained.
Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
For MMU configurations, VECTORS_BASE is always 0xffff0000, a macro
definition will suffice.
For no-MMU, the exception base address is dynamically determined in
subsequent patches. To preserve bisectability, make the macro applicable
to the no-MMU scenario too for now.
Thanks to the 0-DAY kernel test infrastructure for finding the
bisectability issue. This macro will be restricted to the MMU case once
the exception base address is determined dynamically for no-MMU.
Once the exception base address is handled dynamically for no-MMU,
VECTORS_BASE can be removed from Kconfig.
Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Similar to c68b0274fb ("ARM: reduce "Booted secondary processor"
message to debug level"), demote the "CPU: shutdown" pr_notice() into a
pr_debug().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
All low-level PM/SMP code using virt_to_phys() should actually use
__pa_symbol() against kernel symbols. Update code where relevant to move
away from virt_to_phys().
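In practice the change is a one-line substitution per call site, roughly
as follows (secondary_startup is just an arbitrary example symbol):

  /* before */  phys_addr_t addr = virt_to_phys(secondary_startup);
  /* after  */  phys_addr_t addr = __pa_symbol(secondary_startup);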
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
x86 has an option: CONFIG_DEBUG_VIRTUAL to do additional checks on
virt_to_phys calls. The goal is to catch users who are calling
virt_to_phys on non-linear addresses immediately. This includes callers
using __virt_to_phys() on image addresses instead of __pa_symbol(). This
is a generally useful debug feature to spot bad code (particularly in
drivers).
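Conceptually, the debug build turns virt_to_phys() into something like
the following (a simplified illustration of the idea, not the actual x86
or ARM implementation):

  /* Complain immediately if the address is not in the linear map. */
  static inline phys_addr_t debug_virt_to_phys(const void *x)
  {
          BUG_ON((unsigned long)x < PAGE_OFFSET);
          return __pa(x);
  }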
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
In preparation for adding CONFIG_DEBUG_VIRTUAL support, define a set of
common constants: KERNEL_START and KERNEL_END which abstract
CONFIG_XIP_KERNEL vs. !CONFIG_XIP_KERNEL. Update the code where
relevant.
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
In preparation for defining KERNEL_START on ARM, rename KERNEL_START to
PART_KERNEL_START, and to be consistent, do this for all
partition-related constants.
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
adjust_lowmem_bounds is responsible for setting up the boundary for
lowmem/highmem. This needs to be set up before memblock reservations can
occur. At the time memblock reservations can occur, memory can also be
removed from the system. The lowmem/highmem boundary and end of memory
may be affected by this but it is currently not recalculated. On some
systems this may be harmless, on others this may result in incorrect
ranges being passed to the main memory allocator. Correct this by
recalculating the lowmem/highmem boundary after all reservations have
been made.
Tested-by: Magnus Lilja <lilja.magnus@gmail.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The logic in sanity_check_meminfo has become difficult to follow.
Clean up the code so it's more obvious what it is actually trying to
do. Additionally, meminfo has since been removed, so rename the
function to better describe its purpose.
Tested-by: Magnus Lilja <lilja.magnus@gmail.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The 64-bit get_user() wasn't clearing the high word due to a typo in the
error handler. The exception handler entry was already correct, though.
Noticed during recent usercopy test additions in lib/test_user_copy.c.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
In commit 76624175dc ("arm64: uaccess: consistently check object sizes"),
the object size checks are moved outside the access_ok() so that bad
destinations are detected before hitting the "memset(dest, 0, size)" in the
copy_from_user() failure path.
This makes the same change for arm, with attention given to possibly
extracting the uaccess routines into a common header file for all
architectures in the future.
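The reordering looks roughly like this (a simplified sketch of the arm
copy_from_user() path, not the literal diff):

  static inline unsigned long copy_from_user(void *to,
                                             const void __user *from,
                                             unsigned long n)
  {
          /* Check the object size first, so a bad destination is caught
           * even when access_ok() fails and the zeroing path runs. */
          check_object_size(to, n, false);
          if (access_ok(VERIFY_READ, from, n))
                  n = __copy_from_user(to, from, n);
          else
                  memset(to, 0, n);
          return n;
  }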
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Clean up arch/arm/mm/Kconfig a little to provide a symbol which
indicates whether the CPU may support the Thumb instruction set. This
gets rid of the growing dependencies on ARM_THUMB, and also gives us a
useful Kconfig symbol for choosing the kuser code.
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Ensure that if userspace supplies insufficient data to
PTRACE_SETREGSET to fill all the registers, the thread's old
registers are preserved.
Cc: <stable@vger.kernel.org> # 3.0.x-
Fixes: 5be6f62b00 ("ARM: 6883/1: ptrace: Migrate to regsets framework")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Asynchronous external abort is coded differently in DFSR with LPAE enabled.
Fixes: 9254970c ("ARM: 8447/1: catch pending imprecise abort on unmask")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The following patch was sketched by Russell in response to my
crashes on the PB11MPCore after the patch for software-based
privileged-no-access support for ARMv8.1. See this thread:
http://marc.info/?l=linux-arm-kernel&m=144051749807214&w=2
I am unsure what is going on; I suspect everyone involved in
the discussion is, too. I just want to repost this to get the
discussion restarted, as I still have to apply this patch
with every kernel iteration to get my PB11MPCore Realview
running.
Testing by Neil Armstrong on the Oxnas NAS has revealed that
this bug also exists on that widely deployed hardware, so
we are probably currently regressing all ARM11MPCore systems.
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: a5e090acbf ("ARM: software-based priviledged-no-access support")
Tested-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Update my entries in the MAINTAINERS file with the same email address
for kernel work, and, now that the git tree is hosted on more suitable
hardware, add git tree references where appropriate.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Due to the way kbuild works, this header was unintentionally exported
back in 2013 when it was created, despite it not being in a uapi/
directory. This is very non-intuitive behaviour by Kbuild.
However, we've had this include exported to userland for almost four
years, and searching google for "ARM types.h __UINTPTR_TYPE__" gives
no hint that anyone has complained about it. So, let's make it
officially exported in this state.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
__pa_symbol is technically the macro that should be used for kernel
symbols. Switch to this as a prerequisite for DEBUG_VIRTUAL, which
will do bounds checking.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The usercopy checking code currently calls __va(__pa(...)) to check for
aliases on symbols. Switch to using lm_alias instead.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
__pa_symbol is the correct API to find the physical address of symbols.
Switch to it to allow for debugging APIs to work correctly. Other
functions such as p*d_populate may call __pa internally. Ensure that the
address passed is in the linear region by calling lm_alias.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
__pa_symbol is the correct API to get the physical address of kernel
symbols. Switch to it to allow for better debug checking.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Certain architectures may have the kernel image mapped separately to
alias the linear map. Introduce a macro lm_alias to translate a kernel
image symbol into its linear alias. This is used in part with work to
add CONFIG_DEBUG_VIRTUAL support for arm64.
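In its generic form the macro is essentially a one-liner (shown here as
a sketch):

  /* Translate a kernel-image symbol address into its linear-map alias. */
  #ifndef lm_alias
  #define lm_alias(x)	__va(__pa_symbol(x))
  #endif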
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
6b101e2a3c ("mm/CMA: fix boot regression due to physical address of
high_memory") added checks to use __pa_nodebug on x86 since
CONFIG_DEBUG_VIRTUAL complains about high_memory not being linearly
mapped. arm64 is now getting support for CONFIG_DEBUG_VIRTUAL as well.
Rather than add an explosion of arches to the #ifdef, switch to an
alternate method to calculate the physical start of highmem using
the page before highmem starts. This avoids the need for the #ifdef and
extra __pa_nodebug calls.
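The replacement calculation is roughly the following (a sketch of the
idea, not necessarily the exact expression used):

  /* Derive the physical start of highmem from the last lowmem page,
   * so __pa() is never applied to the non-linear high_memory address. */
  highmem_start = __pa(high_memory - 1) + 1;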
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
DEBUG_VIRTUAL currently depends on DEBUG_KERNEL && X86. arm64 is getting
the same support. Rather than add a list of architectures, switch this
to ARCH_HAS_DEBUG_VIRTUAL and let architectures select it as
appropriate.
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
REMAP_VECTORS_TO_RAM depends on DRAM_BASE, but since DRAM_BASE is a
hex symbol, REMAP_VECTORS_TO_RAM could never get enabled. Depending on
DRAM_BASE is also redundant: whenever REMAP_VECTORS_TO_RAM becomes
available to Kconfig, DRAM_BASE is available as well, since that Kconfig
is only sourced on !MMU.
Signed-off-by: Afzal Mohammed <afzal.mohd.ma@gmail.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
commit ab6494f0c9 ("nommu: Add noMMU support to the DMA API") added a
CONFIG_MMU compilation guard, but that prohibits using dma_mmap_wc()
when the platform doesn't have an MMU.
This patch calls vm_iomap_memory() in the noMMU case to check that the
addresses are correct and to set vma->vm_flags, rather than always
returning an error.
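Roughly, the noMMU branch becomes something like this (a sketch; 'phys'
stands in for the physical base address of the buffer being mapped and
is an assumption of this illustration):

  /* Let vm_iomap_memory() validate the range and set vma->vm_flags,
   * rather than always returning an error. */
  int ret = vm_iomap_memory(vma, phys, vma->vm_end - vma->vm_start);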
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On APQ8060, the kernel crashes in arch_hw_breakpoint_init, taking an
undefined instruction trap within write_wb_reg. This is because Scorpion
CPUs erroneously appear to set DBGPRSR.SPD when WFI is issued, even if
the core is not powered down. When DBGPRSR.SPD is set, breakpoint and
watchpoint registers are treated as undefined.
It's possible to trigger similar crashes later on from userspace, by
requesting the kernel to install a breakpoint or watchpoint, as we can
go idle at any point between the reset of the debug registers and their
later use. This has always been the case.
Given that this has always been broken, no-one has complained until now,
and there is no clear workaround, disable hardware breakpoints and
watchpoints on Scorpion to avoid these issues.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
ARM has a few system calls (most notably mmap) for which the names of
the functions which are referenced in the syscall table do not match the
names of the syscall tracepoints. As a consequence of this, these
tracepoints are not made available. Implement
arch_syscall_match_sym_name to fix this and allow tracing even these
system calls.
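A hedged sketch of what such a hook can look like (the alias shown is an
illustrative assumption, not a copy of the kernel code):

  static inline bool arch_syscall_match_sym_name(const char *sym,
                                                 const char *name)
  {
          /* Example: the table entry is sys_mmap2, but the tracepoint
           * is named after the generic sys_mmap_pgoff implementation. */
          if (!strcmp(sym, "sys_mmap2"))
                  sym = "sys_mmap_pgoff";
          return !strcmp(sym, name);
  }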
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When the data cache is PIPT or VIPT non-aliasing, and cache operations
are broadcast by the hardware, we can always postpone the flush in
flush_dcache_page(). A similar change was done for ARM64 in commit
b5b6c9e914 ("arm64: Avoid cache flushing in flush_dcache_page()").
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull DAX updates from Dan Williams:
"The completion of Jan's DAX work for 4.10.
As I mentioned in the libnvdimm-for-4.10 pull request, these are some
final fixes for the DAX dirty-cacheline-tracking invalidation work
that was merged through the -mm, ext4, and xfs trees in -rc1. These
patches were prepared prior to the merge window, but we waited for
4.10-rc1 to have a stable merge base after all the prerequisites were
merged.
Quoting Jan on the overall changes in these patches:
"So I'd like all these 6 patches to go for rc2. The first three
patches fix invalidation of exceptional DAX entries (a bug which
is there for a long time) - without these patches data loss can
occur on power failure even though user called fsync(2). The other
three patches change locking of DAX faults so that ->iomap_begin()
is called in a more relaxed locking context and we are safe to
start a transaction there for ext4"
These have received a build success notification from the kbuild
robot, and pass the latest libnvdimm unit tests. There have not been
any -next releases since -rc1, so they have not appeared there"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
ext4: Simplify DAX fault path
dax: Call ->iomap_begin without entry lock during dax fault
dax: Finish fault completely when loading holes
dax: Avoid page invalidation races and unnecessary radix tree traversals
mm: Invalidate DAX radix tree entries only if appropriate
ext2: Return BH_New buffers for zeroed blocks
Merge tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"Two small fixes:
- A merge error on my part broke the DocBook build. I've
requisitioned one of tglx's frozen sharks for appropriate
disciplinary action and resolved to be more careful about testing
the DocBook stuff as long as it's still around.
- Fix an error in unaligned-memory-access.txt"
* tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux:
Documentation/unaligned-memory-access.txt: fix incorrect comparison operator
docs: Fix build failure
Pull crypto fix from Herbert Xu:
"This fixes a boot failure on some platforms when crypto self test is
enabled along with the new acomp interface"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: testmgr - Use heap buffer for acomp test input
mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte':
mm/filemap.c:933:9: error: too few arguments to function 'test_bit'
return test_bit(PG_waiters);
^~~~~~~~
Fixes: b91e1302ad ('mm: optimize PageWaiters bit use for unlock_page()')
Signed-off-by: Olof Johansson <olof@lixom.net>
Brown-paper-bag-by: Linus Torvalds <dummy@duh.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit 6290602709 ("mm: add PageWaiters indicating tasks are
waiting for a page bit") Nick Piggin made our page locking no longer
unconditionally touch the hashed page waitqueue, which not only helps
performance in general, but is particularly helpful on NUMA machines
where the hashed wait queues can bounce around a lot.
However, the "clear lock bit atomically and then test the waiters bit"
sequence turns out to be much more expensive than it needs to be,
because you get a nasty stall when trying to access the same word that
just got updated atomically.
On architectures where locking is done with LL/SC, this would be trivial
to fix with a new primitive that clears one bit and tests another
atomically, but that ends up not working on x86, where the only atomic
operations that return the result end up being cmpxchg and xadd. The
atomic bit operations return the old value of the same bit we changed,
not the value of an unrelated bit.
On x86, we could put the lock bit in the high bit of the byte, and use
"xadd" with that bit (where the overflow ends up not touching other
bits), and look at the other bits of the result. However, an even
simpler model is to just use a regular atomic "and" to clear the lock
bit, and then the sign bit in eflags will indicate the resulting state
of the unrelated bit #7.
So by moving the PageWaiters bit up to bit #7, we can atomically clear
the lock bit and test the waiters bit on x86 too. And on architectures
with LL/SC (all the usual RISC suspects), the particular bit doesn't
matter, so they are fine with this approach too.
This avoids the extra access to the same atomic word, and thus avoids
the costly stall at page unlock time.
The only downside is that the interface ends up being a bit odd and
specialized: clear a bit in a byte, and test the sign bit. Nick doesn't
love the resulting name of the new primitive, but I'd rather make the
name be descriptive and very clear about the limitation imposed by
trying to work across all relevant architectures than make it be some
generic thing that doesn't make the odd semantics explicit.
So this introduces the new architecture primitive
clear_bit_unlock_is_negative_byte();
and adds the trivial implementation for x86. We have a generic
non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
combination) which can be overridden by any architecture that can do
better. According to Nick, Power has the same hiccup x86 has, for
example, but some other architectures may not even care.
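The generic fallback mentioned above is essentially the following (a
sketch; the real code may differ in detail):

  #ifndef clear_bit_unlock_is_negative_byte
  static inline bool clear_bit_unlock_is_negative_byte(long nr,
                                                       volatile unsigned long *p)
  {
          /* Non-optimized fallback: release the lock bit, then test
           * bit 7 (where PG_waiters now lives) separately. */
          clear_bit_unlock(nr, p);
          return test_bit(7, p);
  }
  #endif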
All these optimizations mean that my page locking stress-test (which is
just executing a lot of small short-lived shell scripts: "make test" in
the git source tree) no longer makes our page locking look horribly bad.
Before all these optimizations, the unlock_page() costs alone were just
over 3% of all CPU overhead on "make test". After this, it's down to
0.66%, so just a quarter of the cost it used to be.
(The difference on NUMA is bigger, but there this micro-optimization is
likely less noticeable, since the big issue on NUMA was not the accesses
to 'struct page', but the waitqueue accesses that were already removed
by Nick's earlier commit).
Acked-by: Nick Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull crypto fix from Herbert Xu:
"This fixes a hash corruption bug in the marvell driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: marvell - Copy IVDIG before launching partial DMA ahash requests
Pull networking fixes from David Miller:
1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar.
The most important is to not assume the packet is RX just because
the destination address matches that of the device. Such an
assumption causes problems when an interface is put into loopback
mode.
2) If we retry when creating a new tc entry (because we dropped the
RTNL mutex in order to load a module, for example) we end up with
-EAGAIN and then loop trying to replay the request. But we didn't
reset some state when looping back to the top like this, and if
another thread meanwhile inserted the same tc entry we were trying
to, we re-link it, creating an endless loop in the tc chain. Fix from
Daniel Borkmann.
3) There are two different WRITE bits in the MDIO address register for
the stmmac chip, depending upon the chip variant. Due to a bug we
could set them both, fix from Hock Leong Kweh.
4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: stmmac: fix incorrect bit set in gmac4 mdio addr register
r8169: add support for RTL8168 series add-on card.
net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
openvswitch: upcall: Fix vlan handling.
ipv4: Namespaceify tcp_tw_reuse knob
net: korina: Fix NAPI versus resources freeing
net, sched: fix soft lockup in tc_classify
net/mlx4_en: Fix user prio field in XDP forward
tipc: don't send FIN message from connectionless socket
ipvlan: fix multicast processing
ipvlan: fix various issues in ipvlan_process_multicast()
In the actual implementation, the ether_addr_equal function tests for
equality to 0 when returning. It seems that in commit 0d74c4 changing
this operator to reflect the actual function was somehow overlooked.
Signed-off-by: Cihangir Akturk <cakturk@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>