IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Pull perf fixes from Ingo Molnar:
"I'd like to apologize for this very late pull request: I was dithering
through the week whether to send the fixes, and then yesterday Jiri's
crash fix for a regression introduced in this cycle clearly marked
perf/urgent as 'must merge now'.
Most of the commits are tooling fixes, plus there's three kernel fixes
via four commits:
- race fix in the Intel PEBS code
- fix an AUX bug and roll back a previous attempt
- fix AMD family 17h generic HW cache-event perf counters
The largest diffstat contribution comes from the AMD fix - a new event
table is introduced, which is a fairly low risk change but has a large
linecount"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix race in intel_pmu_disable_event()
perf/x86/intel/pt: Remove software double buffering PMU capability
perf/ring_buffer: Fix AUX software double buffering
perf tools: Remove needless asm/unistd.h include fixing build in some places
tools arch uapi: Copy missing unistd.h headers for arc, hexagon and riscv
tools build: Add -ldl to the disassembler-four-args feature test
perf cs-etm: Always allocate memory for cs_etm_queue::prev_packet
perf cs-etm: Don't check cs_etm_queue::prev_packet validity
perf report: Report OOM in status line in the GTK UI
perf bench numa: Add define for RUSAGE_THREAD if not present
tools lib traceevent: Change tag string for error
perf annotate: Fix build on 32 bit for BPF annotation
tools uapi x86: Sync vmx.h with the kernel
perf bpf: Return value with unlocking in perf_env__find_btf()
MAINTAINERS: Include vendor specific files under arch/*/events/*
perf/x86/amd: Update generic hardware cache events for Family 17h
Pull scheduler fix from Ingo Molnar:
"Fix a kobject memory leak in the cpufreq code"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cpufreq: Fix kobject memleak
Pull x86 fix from Ingo Molnar:
"Disable function tracing during early SME setup to fix a boot crash on
SME-enabled kernels running distro kernels (some of which have
function tracing enabled)"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/mem_encrypt: Disable all instrumentation for early SME setup
Poking-mm initialization might require to duplicate the PGD in early
stage. Initialize the PGD cache earlier to prevent boot failures.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 4fc19708b165 ("x86/alternatives: Initialize temporary mm for patching")
Link: http://lkml.kernel.org/r/20190505011124.39692-1-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull vfs fixes from Al Viro:
- a couple of ->i_link use-after-free fixes
- regression fix for wrong errno on absent device name in mount(2)
(this cycle stuff)
- ancient UFS braino in large GID handling on Solaris UFS images (bogus
cut'n'paste from large UID handling; wrong field checked to decide
whether we should look at old (16bit) or new (32bit) field)
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour
Abort file_remove_privs() for non-reg. files
[fix] get rid of checking for absent device name in vfs_get_tree()
apparmorfs: fix use-after-free on symlink traversal
securityfs: fix use-after-free on symlink traversal
New race in x86_pmu_stop() was introduced by replacing the
atomic __test_and_clear_bit() of cpuc->active_mask by separate
test_bit() and __clear_bit() calls in the following commit:
3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
The race causes panic for PEBS events with enabled callchains:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
RIP: 0010:perf_prepare_sample+0x8c/0x530
Call Trace:
<NMI>
perf_event_output_forward+0x2a/0x80
__perf_event_overflow+0x51/0xe0
handle_pmi_common+0x19e/0x240
intel_pmu_handle_irq+0xad/0x170
perf_event_nmi_handler+0x2e/0x50
nmi_handle+0x69/0x110
default_do_nmi+0x3e/0x100
do_nmi+0x11a/0x180
end_repeat_nmi+0x16/0x1a
RIP: 0010:native_write_msr+0x6/0x20
...
</NMI>
intel_pmu_disable_event+0x98/0xf0
x86_pmu_stop+0x6e/0xb0
x86_pmu_del+0x46/0x140
event_sched_out.isra.97+0x7e/0x160
...
The event is configured to make samples from PEBS drain code,
but when it's disabled, we'll go through NMI path instead,
where data->callchain will not get allocated and we'll crash:
x86_pmu_stop
test_bit(hwc->idx, cpuc->active_mask)
intel_pmu_disable_event(event)
{
...
intel_pmu_pebs_disable(event);
...
EVENT OVERFLOW -> <NMI>
intel_pmu_handle_irq
handle_pmi_common
TEST PASSES -> test_bit(bit, cpuc->active_mask))
perf_event_overflow
perf_prepare_sample
{
...
if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
data->callchain = perf_callchain(event, regs);
CRASH -> size += data->callchain->nr;
}
</NMI>
...
x86_pmu_disable_event(event)
}
__clear_bit(hwc->idx, cpuc->active_mask);
Fixing this by disabling the event itself before setting
off the PEBS bit.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Arcari <darcari@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Lendacky Thomas <Thomas.Lendacky@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
One regression fix.
Changes we merged to STRICT_KERNEL_RWX on 32-bit were causing crashes under
load on some machines depending on memory layout.
Thanks to:
Christophe Leroy.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJczY3sAAoJEFHr6jzI4aWA9UkP/3iM1wt/L3iYpmfPA0MJ4h4f
oEB2WR4KFhBHcMbLKUNS0m9MfecAxHyrFXW46lnFbqVLbaJ3rBSRBitTySo1VnT3
wJKJi/rEf1uNmjxFTKRuE4eL1+4H2OVbwF+CLRB4jaILBmgDyvgYQDkeYQE689ut
VjLA1C/8PRlKFO/82nGPaASdugqk4o+swxNxODhG+Bjv+9KGSinK4LN6zqyLoU/f
aS4APLO59xd6vXosDZRiK6C5efoMu+8kAm8/pyA9E8+0TFTdrtZcPrbCme9Vj9xh
i1Ruzxf9njZaSG37TQpgaFA/IKlSjBqAdaRmA6dy7fQ+GAOpt7YhtNIhk9prvOT6
uGarcHuusQ/Nhy0pb+Kt5H90RKzcZr/pgDEjhujJdCnVozQnX4fvkOjm8SvV6vc4
ucy2zzraMRo3OTvs38YAyGj/MPxOvMYZ1G+v/rAxpfvmdDl6KQ3uCvSczujP7qo6
49VfvfENGRTQhb+KSQtD4VtR9rFvMco5F8m1TFlYI1+RcoxHqqcR8aTK22wQvs45
2J97mkOvVeIp2pO7+yB5c2t3rPG9Qg9ARifnxXyF+9Cb1h1USy3vF2zUCj+wZSeP
l+cYYcjrk/C6btDTVARMOXLKyeB4CWHaSCz8s3HIrXkuZsJbA3/qGi25mCWAEOVj
1qN6UpFNICtXOHBCAPYx
=/m4Q
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.1-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
"One regression fix.
Changes we merged to STRICT_KERNEL_RWX on 32-bit were causing crashes
under load on some machines depending on memory layout.
Thanks to Christophe Leroy"
* tag 'powerpc-5.1-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/32s: Fix BATs setting with CONFIG_STRICT_KERNEL_RWX
Now freeing hw queue resource is moved to hctx's release handler,
we don't need to worry about the race between blk_cleanup_queue and
run queue any more.
So don't drain in-progress dispatch in blk_cleanup_queue().
This is basically revert of c2856ae2f315 ("blk-mq: quiesce queue before
freeing queue").
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
hctx is always released after requeue is freed.
With holding queue's kobject refcount, it is safe for driver to run queue,
so one run queue might be scheduled after blk_sync_queue() is done.
So moving the cancel of hctx->run_work into blk_mq_hw_sysfs_release()
for avoiding run released queue.
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In normal queue cleanup path, hctx is released after request queue
is freed, see blk_mq_release().
However, in __blk_mq_update_nr_hw_queues(), hctx may be freed because
of hw queues shrinking. This way is easy to cause use-after-free,
because: one implicit rule is that it is safe to call almost all block
layer APIs if the request queue is alive; and one hctx may be retrieved
by one API, then the hctx can be freed by blk_mq_update_nr_hw_queues();
finally use-after-free is triggered.
Fixes this issue by always freeing hctx after releasing request queue.
If some hctxs are removed in blk_mq_update_nr_hw_queues(), introduce
a per-queue list to hold them, then try to resuse these hctxs if numa
node is matched.
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Split blk_mq_alloc_and_init_hctx into two parts, and one is
blk_mq_alloc_hctx() for allocating all hctx resources, another
is blk_mq_init_hctx() for initializing hctx, which serves as
counter-part of blk_mq_exit_hctx().
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org
Cc: Martin K . Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Once blk_cleanup_queue() returns, tags shouldn't be used any more,
because blk_mq_free_tag_set() may be called. Commit 45a9c9d909b2
("blk-mq: Fix a use-after-free") fixes this issue exactly.
However, that commit introduces another issue. Before 45a9c9d909b2,
we are allowed to run queue during cleaning up queue if the queue's
kobj refcount is held. After that commit, queue can't be run during
queue cleaning up, otherwise oops can be triggered easily because
some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue().
We have invented ways for addressing this kind of issue before, such as:
8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done")
c2856ae2f315 ("blk-mq: quiesce queue before freeing queue")
But still can't cover all cases, recently James reports another such
kind of issue:
https://marc.info/?l=linux-scsi&m=155389088124782&w=2
This issue can be quite hard to address by previous way, given
scsi_run_queue() may run requeues for other LUNs.
Fixes the above issue by freeing hctx's resources in its release handler, and this
way is safe becasue tags isn't needed for freeing such hctx resource.
This approach follows typical design pattern wrt. kobject's release handler.
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reported-by: James Smart <james.smart@broadcom.com>
Fixes: 45a9c9d909b2 ("blk-mq: Fix a use-after-free")
Cc: stable@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
With holding queue's kobject refcount, it is safe for driver
to schedule requeue. However, blk_mq_kick_requeue_list() may
be called after blk_sync_queue() is done because of concurrent
requeue activities, then requeue work may not be completed when
freeing queue, and kernel oops is triggered.
So moving the cancel of requeue_work into blk_mq_release() for
avoiding race between requeue and freeing queue.
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Just like aio/io_uring, we need to grab 2 refcount for queuing one
request, one is for submission, another is for completion.
If the request isn't queued from plug code path, the refcount grabbed
in generic_make_request() serves for submission. In theroy, this
refcount should have been released after the sumission(async run queue)
is done. blk_freeze_queue() works with blk_sync_queue() together
for avoiding race between cleanup queue and IO submission, given async
run queue activities are canceled because hctx->run_work is scheduled with
the refcount held, so it is fine to not hold the refcount when
running the run queue work function for dispatch IO.
However, if request is staggered into plug list, and finally queued
from plug code path, the refcount in submission side is actually missed.
And we may start to run queue after queue is removed because the queue's
kobject refcount isn't guaranteed to be grabbed in flushing plug list
context, then kernel oops is triggered, see the following race:
blk_mq_flush_plug_list():
blk_mq_sched_insert_requests()
insert requests to sw queue or scheduler queue
blk_mq_run_hw_queue
Because of concurrent run queue, all requests inserted above may be
completed before calling the above blk_mq_run_hw_queue. Then queue can
be freed during the above blk_mq_run_hw_queue().
Fixes the issue by grab .q_usage_counter before calling
blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This way is
safe because the queue is absolutely alive before inserting request.
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Fix old Windows versions on AMD (recent regression)
* Fix old Linux versions on processors without EPT
* Fixes for LAPIC timer optimizations
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAlzMc18UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNE0ggAj4c9FVC5aFeiBAj1YIcDijT3UtmG
AjhoESE61rZI3PkZ5vcj2GC8eS7sKxExpCrQLsB5rLCF+7X90+tW155BHTHGU0ey
ZgfGj23vlbZpvwZ4B5ujQ/Lmpry76pmy8EYekQogPP/eJxOB3oMk06tjh1mfSdIn
D4Gj8jvYBB2ygAfmW91+YLLZos56id0N+Hyn/s95w4I1o6hKlkdpTOURAJKSGTb1
2t0+XADUt4ZwPM6+2X/eOBMGpeZP0/eR7H3kdyPy3ydm0sFjMiAAs0NbNp3eblB6
oqnytnGUPt8EEoq+wdZahLTbgJst2Ds++XAvVdBZED7zwGaBSETfg03eCg==
=YP4M
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
- PPC and ARM bugfixes from submaintainers
- Fix old Windows versions on AMD (recent regression)
- Fix old Linux versions on processors without EPT
- Fixes for LAPIC timer optimizations
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: nVMX: Fix size checks in vmx_set_nested_state
KVM: selftests: make hyperv_cpuid test pass on AMD
KVM: lapic: Check for in-kernel LAPIC before deferencing apic pointer
KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size
x86/kvm/mmu: reset MMU context when 32-bit guest switches PAE
KVM: x86: Whitelist port 0x7e for pre-incrementing %rip
Documentation: kvm: fix dirty log ioctl arch lists
KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit
KVM: arm/arm64: Don't emulate virtual timers on userspace ioctls
kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP
KVM: arm/arm64: Ensure vcpu target is unset on reset failure
KVM: lapic: Convert guest TSC to host time domain if necessary
KVM: lapic: Allow user to disable adaptive tuning of timer advancement
KVM: lapic: Track lapic timer advance per vCPU
KVM: lapic: Disable timer advancement if adaptive tuning goes haywire
x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012
KVM: x86: Consider LAPIC TSC-Deadline timer expired if deadline too short
KVM: PPC: Book3S: Protect memslots while validating user address
KVM: PPC: Book3S HV: Perserve PSSCR FAKE_SUSPEND bit on guest exit
KVM: arm/arm64: vgic-v3: Retire pending interrupts on disabling LPIs
...
This patch updates the parisc huge TLB page support to use per-pagetable spinlocks.
This patch requires Mikulas' per-pagetable spinlock patch and the revised TLB
serialization patch from Helge and myself. With Mikulas' patch, we need to use
the per-pagetable spinlock for page table updates. The TLB lock is only used
to serialize TLB flushes on machines with the Merced bus.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
PA-RISC uses a global spinlock to protect pagetable updates in the TLB
fault handlers. When multiple cores are taking TLB faults simultaneously,
the cache line containing the spinlock becomes a bottleneck.
This patch embeds the spinlock in the top level page directory, so that
every process has its own lock. It improves performance by 30% when
doing parallel compilations.
At least on the N class systems, only one PxTLB inter processor
broadcast can be active at any one time on the Merced bus. If a Merced
bus is found, this patch serializes the TLB flushes with the
pa_tlb_flush_lock spinlock.
v1: Initial patch by Mikulas
v2: Added Merced detection by Helge
v3: Revised TLB serialization by Dave & Helge
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
When making the text sections writeable with set_kernel_text_rw(1),
include all text sections including those in the __init section.
Otherwise functions marked with __meminit will stay read-only.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # 4.20+
The pdtlb and pitlb instructions are strongly ordered. The asms invoking
these instructions should be compiler memory barriers to ensure the
compiler doesn't reorder memory operations around these instructions.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
CC: stable@vger.kernel.org # v4.20+
Fixes: 3847dab77421 ("parisc: Add alternative coding infrastructure")
Signed-off-by: Helge Deller <deller@gmx.de>
There are only a couple of instructions that can function as a memory
barrier on parisc. Currently, we use the sync instruction as a memory
barrier when releasing a spinlock. However, the ldcw instruction is a
better barrier when we have a handy memory location since it operates in
the cache on coherent machines.
This patch updates the spinlock release code to use ldcw. I also
changed the "stw,ma" instructions to "stw" instructions as it is not an
adequate barrier.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
TLB operations only need to be serialized on machines with the Merced
(Stretch) bus. The only machines in this category are L and N class, and
they require a 64-bit PA 2.0 kernel. On these machines, we use local TLB
purges in the tmpalias routines.
We don't need to serialize TLB purges on all other machines. Thus, the
lock/unlock code can be removed when CONFIG_PA20 is not defined.
Further, when CONFIG_PA20 is not defined, alternative patching converts
the TLB purges to local purges when PA 2.0 hardware has been detected.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Tested-By: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
The commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an
external fragmentation event occurs") breaks memory management on a
parisc c8000 workstation with this memory layout:
0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB
1) Start 0x0000000100000000 End 0x00000001bfdfffff Size 3070 MB
2) Start 0x0000004040000000 End 0x00000040ffffffff Size 3072 MB
With the patch 1c30844d2dfe, the kernel will incorrectly reclaim the
first zone when it fills up, ignoring the fact that there are two
completely free zones. Basiscally, it limits cache size to 1GiB.
The parisc kernel is currently using the DISCONTIGMEM implementation,
but isn't NUMA. Avoid this issue or strange work-arounds by switching to
the more commonly used SPARSEMEM implementation.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Helge Deller <deller@gmx.de>
The idle task might have been allocated above 4GB. With the current code
we cannot access that memory because the CPU is still running in narrow
mode.
This was found on a J5000 machine and the patch is required to enable
SPARSEMEM on that machine.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Update lists to reflect that we have now KGDB and kretprobes
support.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
It's not used by patch_map()/patch_unmap(), so lets remove
it.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Now that kprobes and kretprobes are implemented, update the list in
Documentation to reflect that.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Implement kretprobes on parisc, parts stolen from powerpc.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Even a 32-bit kernel requires at least 27 MB to decompress itself, so
halt the system with a message if the system has less memory than 32 MB.
Signed-off-by: Helge Deller <deller@gmx.de>
This patch add KGDB support to PA-RISC. It also implements
single-stepping utilizing the recovery counter.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Instead of re-mapping the whole kernel text with RWX rights
add a patch_text() which can be used to replace instructions
in the kernel .text section. Based on the ARM implementation.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
These functions will be used for adding code patching
functions later.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Do not offset mmap base address because of stack randomization if
current task does not want randomization.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Helge Deller <deller@gmx.de>
The TMP75B has a different control register, supports 12-bit
resolution and the default conversion rate is 37 Hz.
Signed-off-by: Iker Perez del Palomar Sustatxa <iker.perez@codethink.co.uk>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
kobject_init() call added one more operation that has to be
done when doing the early initialization of both static and
dynamic livepatch structures.
It would have been easier when the early initialization code
was not duplicated. Let's deduplicate it for future generations
of livepatching hackers.
The patch does not change the existing behavior.
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
kobject_init() always succeeds and sets the reference count to 1.
It allows to always free the structures via kobject_put() and
the related release callback.
Note that the custom kobject state handling was used only
because we did not know that kobject_put() can and actually
should get called even when kobject_init_and_add() fails.
The patch should not change the existing behavior.
Suggested-by: "Tobin C. Harding" <tobin@kernel.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull i2c fixes from Wolfram Sang:
"I2C driver bugfixes and a MAINTAINERS update for you"
* 'i2c/for-current-fixed' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: Prevent runtime suspend of adapter when Host Notify is required
i2c: synquacer: fix enumeration of slave devices
MAINTAINERS: friendly takeover of i2c-gpio driver
i2c: designware: ratelimit 'transfer when suspended' errors
i2c: imx: correct the method of getting private data in notifier_call
Allow the boot CPU/CPU0 to be nohz_full. Have the boot CPU take the
do_timer duty during boot until a housekeeping CPU can take over.
This is supported when CONFIG_PM_SLEEP_SMP is not configured, or when
it is configured and the arch allows suspend on non-zero CPUs.
nohz_full has been trialed at a large supercomputer site and found to
significantly reduce jitter. In order to deploy it in production, they
need CPU0 to be nohz_full because their job control system requires
the application CPUs to start from 0, and the housekeeping CPUs are
placed higher. An equivalent job scheduling that uses CPU0 for
housekeeping could be achieved by modifying their system, but it is
preferable if nohz_full can support their environment without
modification.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lkml.kernel.org/r/20190411033448.20842-6-npiggin@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
During housekeeping mask setup, currently a possible CPU is required.
That does not guarantee the CPU would be available at boot time, so
check to ensure that at least one present CPU is in the mask.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lkml.kernel.org/r/20190411033448.20842-5-npiggin@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch provides an arch option, ARCH_SUSPEND_NONZERO_CPU, to
opt-in to allowing suspend to occur on one of the housekeeping CPUs
rather than hardcoded CPU0.
This will allow CPU0 to be a nohz_full CPU with a later change.
It may be possible for platforms with hardware/firmware restrictions
on suspend/wake effectively support this by handing off the final
stage to CPU0 when kernel housekeeping is no longer required. Another
option is to make housekeeping / nohz_full mask dynamic at runtime,
but the complexity could not be justified at this time.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lkml.kernel.org/r/20190411033448.20842-4-npiggin@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This adds a function to disable secondary CPUs for suspend that are
not necessarily non-zero / non-boot CPUs. Platforms will be able to
use this to suspend using non-zero CPUs.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lkml.kernel.org/r/20190411033448.20842-3-npiggin@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that we have a way to switch between MSC buffer windows, add code to
track the current window. The hardware register NWSA that contains the
address of the next window is unfortunately not always usable, and since
the driver has full control of the window switching, there is no reason
not to keep this on the software side.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>