28925 Commits

Author SHA1 Message Date
Masami Hiramatsu
a6682814f3 tracing/kprobes: Allow kprobe-events to record module symbol
Allow kprobe-events to record module symbols.

Since data symbols in a non-loaded module doesn't exist, it fails to
define such symbol as an argument of kprobe-event. But if the kprobe
event is defined on that module, we can defer to resolve the symbol
address.

Note that if given symbol is not found, the event is kept unavailable.
User can enable it but the event is not recorded.

Link: http://lkml.kernel.org/r/153547312336.26502.11432902826345374463.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:12 -04:00
Masami Hiramatsu
59158ec4ae tracing/kprobes: Check the probe on unloaded module correctly
Current kprobe event doesn't checks correctly whether the
given event is on unloaded module or not. It just checks
the event has ":" in the name.

That is not enough because if we define a probe on non-exist
symbol on loaded module, it allows to define that (with
warning message)

To ensure it correctly, this searches the module name on
loaded module list and only if there is not, it allows to
define it. (this event will be available when the target
module is loaded)

Link: http://lkml.kernel.org/r/153547309528.26502.8300278470528281328.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:12 -04:00
Masami Hiramatsu
f3f58935ed tracing/uprobes: Fix to return -EFAULT if copy_from_user failed
Fix probe_mem_read() to return -EFAULT if copy_from_user()
failed. The copy_from_user() returns remaining bytes
when it failed, but probe_mem_read() caller expects it
returns error code like as probe_kernel_read().

Link: http://lkml.kernel.org/r/153547306719.26502.8353484532699160223.stgit@devbox

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:11 -04:00
Masami Hiramatsu
a1303af5d7 tracing: probeevent: Add $argN for accessing function args
Add $argN special fetch variable for accessing function
arguments. This allows user to trace the Nth argument easily
at the function entry.

Note that this returns most probably assignment of registers
and stacks. In some case, it may not work well. If you need
to access correct registers or stacks you should use perf-probe.

Link: http://lkml.kernel.org/r/152465888632.26224.3412465701570253696.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:11 -04:00
Masami Hiramatsu
40b53b7718 tracing: probeevent: Add array type support
Add array type support for probe events.
This allows user to get arraied types from memory address.
The array type syntax is

	TYPE[N]

Where TYPE is one of types (u8/16/32/64,s8/16/32/64,
x8/16/32/64, symbol, string) and N is a fixed value less
than 64.

The string array type is a bit different from other types. For
other base types, <base-type>[1] is equal to <base-type>
(e.g. +0(%di):x32[1] is same as +0(%di):x32.) But string[1] is not
equal to string. The string type itself represents "char array",
but string array type represents "char * array". So, for example,
+0(%di):string[1] is equal to +0(+0(%di)):string.

Link: http://lkml.kernel.org/r/152465891533.26224.6150658225601339931.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:10 -04:00
Masami Hiramatsu
60c2e0cebf tracing: probeevent: Add symbol type
Add "symbol" type to probeevent, which is an alias of u32 or u64
(depends on BITS_PER_LONG). This shows the result value in
symbol+offset style. This type is only available with kprobe
events.

Link: http://lkml.kernel.org/r/152465882860.26224.14779072294412467338.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:10 -04:00
Masami Hiramatsu
9b960a3883 tracing: probeevent: Unify fetch_insn processing common part
Unify the fetch_insn bottom process (from stage 2: dereference
indirect data) from kprobe and uprobe events, since those are
mostly same.

Link: http://lkml.kernel.org/r/152465879965.26224.8547240824606804815.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:09 -04:00
Masami Hiramatsu
0a46c8549f tracing: probeevent: Append traceprobe_ for exported function
Append traceprobe_ for exported function set_print_fmt() as
same as other functions.

Link: http://lkml.kernel.org/r/152465877071.26224.11143125027282999726.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:09 -04:00
Masami Hiramatsu
9178412ddf tracing: probeevent: Return consumed bytes of dynamic area
Cleanup string fetching routine so that returns the consumed
bytes of dynamic area and store the string information as
data_loc format instead of data_rloc.
This simplifies the fetcharg loop.

Link: http://lkml.kernel.org/r/152465874163.26224.12125143907501289031.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:08 -04:00
Masami Hiramatsu
f451bc89d8 tracing: probeevent: Unify fetch type tables
Unify {k,u}probe_fetch_type_table to probe_fetch_type_table
because the main difference of those type tables (fetcharg
methods) are gone. Now we can consolidate it.

Link: http://lkml.kernel.org/r/152465871274.26224.13999436317830479698.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:08 -04:00
Masami Hiramatsu
533059281e tracing: probeevent: Introduce new argument fetching code
Replace {k,u}probe event argument fetching framework with switch-case based.
Currently that is implemented with structures, macros and chain of
function-pointers, which is more complicated than necessary and may get a
performance penalty by retpoline.

This simplify that with an array of "fetch_insn" (opcode and oprands), and
make process_fetch_insn() just interprets it. No function pointers are used.

Link: http://lkml.kernel.org/r/152465868340.26224.2551120475197839464.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:07 -04:00
Masami Hiramatsu
7bfbc63eda tracing: probeevent: Remove NOKPROBE_SYMBOL from print functions
Remove unneeded NOKPROBE_SYMBOL from print functions since
the print functions are only used when printing out the
trace data, and not from kprobe handler.

Link: http://lkml.kernel.org/r/152465865422.26224.10111548170594014954.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:07 -04:00
Masami Hiramatsu
eeb07b0615 tracing: probeevent: Cleanup argument field definition
Cleanup event argument definition code in one place for
maintenancability.

Link: http://lkml.kernel.org/r/152465862529.26224.9068605421476018902.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:06 -04:00
Masami Hiramatsu
56de763052 tracing: probeevent: Cleanup print argument functions
Cleanup the print-argument function to decouple it into
print-name and print-value, so that it can support more
flexible expression, like array type.

Link: http://lkml.kernel.org/r/152465859635.26224.13452846788717102315.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:19:06 -04:00
Song Liu
a6ca88b241 trace_uprobe: support reference counter in fd-based uprobe
This patch enables uprobes with reference counter in fd-based uprobe.
Highest 32 bits of perf_event_attr.config is used to stored offset
of the reference count (semaphore).

Format information in /sys/bus/event_source/devices/uprobe/format/ is
updated to reflect this new feature.

Link: http://lkml.kernel.org/r/20181002053636.1896903-1-songliubraving@fb.com

Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:14:17 -04:00
Eric W. Biederman
a36700589b signal: Guard against negative signal numbers in copy_siginfo_from_user32
While fixing an out of bounds array access in known_siginfo_layout
reported by the kernel test robot it became apparent that the same bug
exists in siginfo_layout and affects copy_siginfo_from_user32.

The straight forward fix that makes guards against making this mistake
in the future and should keep the code size small is to just take an
unsigned signal number instead of a signed signal number, as I did to
fix known_siginfo_layout.

Cc: stable@vger.kernel.org
Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-10-10 20:34:14 -05:00
Eric W. Biederman
b2a2ab527d signal: Guard against negative signal numbers in copy_siginfo_from_user
The bounds checks in known_siginfo_layout only guards against positive
numbers that are too large, large negative can slip through and
can cause out of bounds accesses.

Ordinarily this is not a concern because early in signal processing
the signal number is filtered with valid_signal which ensures it
is a small positive signal number, but copy_siginfo_from_user
is called before this check is performed.

[   73.031126] BUG: unable to handle kernel paging request at ffffffff6281bcb6
[   73.032038] PGD 3014067 P4D 3014067 PUD 0
[   73.032565] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[   73.033287] CPU: 0 PID: 732 Comm: trinity-c3 Tainted: G        W       T 4.19.0-rc1-00077-g4ce5f9c #1
[   73.034423] RIP: 0010:copy_siginfo_from_user+0x4d/0xd0
[   73.034908] Code: 00 8b 53 08 81 fa 80 00 00 00 0f 84 90 00 00 00 85 d2 7e 2d 48 63 0b 83 f9 1f 7f 1c 8d 71 ff bf d8 04 01 50 48 0f a3 f7 73 0e <0f> b6 8c 09 20 bb 81 82 39 ca 7f 15 eb 68 31 c0 83 fa 06 7f 0c eb
[   73.036665] RSP: 0018:ffff88001b8f7e20 EFLAGS: 00010297
[   73.037160] RAX: 0000000000000000 RBX: ffff88001b8f7e90 RCX: fffffffff00000cb
[   73.037865] RDX: 0000000000000001 RSI: 00000000f00000ca RDI: 00000000500104d8
[   73.038546] RBP: ffff88001b8f7e80 R08: 0000000000000000 R09: 0000000000000000
[   73.039201] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000008
[   73.039874] R13: 00000000000002dc R14: 0000000000000000 R15: 0000000000000000
[   73.040613] FS:  000000000104a880(0000) GS:ffff88001f000000(0000) knlGS:0000000000000000
[   73.041649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   73.042405] CR2: ffffffff6281bcb6 CR3: 000000001cb52003 CR4: 00000000001606b0
[   73.043351] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   73.044286] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[   73.045221] Call Trace:
[   73.045556]  __x64_sys_rt_tgsigqueueinfo+0x34/0xa0
[   73.046199]  do_syscall_64+0x1a4/0x390
[   73.046708]  ? vtime_user_enter+0x61/0x80
[   73.047242]  ? __context_tracking_enter+0x4e/0x60
[   73.047714]  ? __context_tracking_enter+0x4e/0x60
[   73.048278]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Therefore fix known_siginfo_layout to take an unsigned signal number
instead of a signed signal number.  All valid signal numbers are small
positive numbers so they will not be affected, but invalid negative
signal numbers will now become large positive signal numbers and will
not be used as indices into the sig_sicodes array.

Making the signal number unsigned makes it difficult for similar mistakes to
happen in the future.

Fixes: 4ce5f9c9e754 ("signal: Use a smaller struct siginfo in the kernel")
Inspired-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-10-10 20:28:33 -05:00
Peng Hao
d59e0ba194 tick/sched : Remove redundant cpu_online() check
can_stop_idle_tick() checks cpu_online() twice. The first check leaves the
function when the CPU is not online, so the second check it
redundant. Remove it.

Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fweisbec@gmail.com
Link: https://lkml.kernel.org/r/1539099815-2943-1-git-send-email-penghao122@sina.com.cn
2018-10-10 11:47:20 +02:00
Prashant Bhole
3b4a63f674 bpf: return EOPNOTSUPP when map lookup isn't supported
Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below
map types:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-09 21:52:20 -07:00
Prashant Bhole
509db2833e bpf: error handling when map_lookup_elem isn't supported
The error value returned by map_lookup_elem doesn't differentiate
whether lookup was failed because of invalid key or lookup is not
supported.

Lets add handling for -EOPNOTSUPP return value of map_lookup_elem()
method of map, with expectation from map's implementation that it
should return -EOPNOTSUPP if lookup is not supported.

The errno for bpf syscall for BPF_MAP_LOOKUP_ELEM command will be set
to EOPNOTSUPP if map lookup is not supported.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-09 21:52:20 -07:00
Wenwen Wang
8af03d1ae2 bpf: btf: Fix a missing check bug
In btf_parse_hdr(), the length of the btf data header is firstly copied
from the user space to 'hdr_len' and checked to see whether it is larger
than 'btf_data_size'. If yes, an error code EINVAL is returned. Otherwise,
the whole header is copied again from the user space to 'btf->hdr'.
However, after the second copy, there is no check between
'btf->hdr->hdr_len' and 'hdr_len' to confirm that the two copies get the
same value. Given that the btf data is in the user space, a malicious user
can race to change the data between the two copies. By doing so, the user
can provide malicious data to the kernel and cause undefined behavior.

This patch adds a necessary check after the second copy, to make sure
'btf->hdr->hdr_len' has the same value as 'hdr_len'. Otherwise, an error
code EINVAL will be returned.

Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-09 21:42:51 -07:00
Borislav Petkov
b69c2e20f6 resource: Clean it up a bit
- Drop BUG_ON()s and do normal error handling instead, in
  find_next_iomem_res().

- Align function arguments on opening braces.

- Get rid of local var sibling_only in find_next_iomem_res().

- Shorten unnecessarily long first_level_children_only arg name.

Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Brijesh Singh <brijesh.singh@amd.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Lianbo Jiang <lijiang@redhat.com>
CC: Takashi Iwai <tiwai@suse.de>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Tom Lendacky <thomas.lendacky@amd.com>
CC: Vivek Goyal <vgoyal@redhat.com>
CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
CC: bhe@redhat.com
CC: dan.j.williams@intel.com
CC: dyoung@redhat.com
CC: kexec@lists.infradead.org
CC: mingo@redhat.com
Link: <new submission>
2018-10-09 17:25:58 +02:00
Bjorn Helgaas
010a93bf97 resource: Fix find_next_iomem_res() iteration issue
Previously find_next_iomem_res() used "*res" as both an input parameter for
the range to search and the type of resource to search for, and an output
parameter for the resource we found, which makes the interface confusing.

The current callers use find_next_iomem_res() incorrectly because they
allocate a single struct resource and use it for repeated calls to
find_next_iomem_res().  When find_next_iomem_res() returns a resource, it
overwrites the start, end, flags, and desc members of the struct.  If we
call find_next_iomem_res() again, we must update or restore these fields.
The previous code restored res.start and res.end, but not res.flags or
res.desc.

Since the callers did not restore res.flags, if they searched for flags
IORESOURCE_MEM | IORESOURCE_BUSY and found a resource with flags
IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search would
incorrectly skip resources unless they were also marked as
IORESOURCE_SYSRAM.

Fix this by restructuring the interface so it takes explicit "start, end,
flags" parameters and uses "*res" only as an output parameter.

Based on a patch by Lianbo Jiang <lijiang@redhat.com>.

 [ bp: While at it:
   - make comments kernel-doc style.
   -

Originally-by: http://lore.kernel.org/lkml/20180921073211.20097-2-lijiang@redhat.com
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Brijesh Singh <brijesh.singh@amd.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Lianbo Jiang <lijiang@redhat.com>
CC: Takashi Iwai <tiwai@suse.de>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Tom Lendacky <thomas.lendacky@amd.com>
CC: Vivek Goyal <vgoyal@redhat.com>
CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
CC: bhe@redhat.com
CC: dan.j.williams@intel.com
CC: dyoung@redhat.com
CC: kexec@lists.infradead.org
CC: mingo@redhat.com
CC: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/153805812916.1157.177580438135143788.stgit@bhelgaas-glaptop.roam.corp.google.com
2018-10-09 17:18:36 +02:00
Bjorn Helgaas
a98959fdbd resource: Include resource end in walk_*() interfaces
find_next_iomem_res() finds an iomem resource that covers part of a range
described by "start, end".  All callers expect that range to be inclusive,
i.e., both start and end are included, but find_next_iomem_res() doesn't
handle the end address correctly.

If it finds an iomem resource that contains exactly the end address, it
skips it, e.g., if "start, end" is [0x0-0x10000] and there happens to be an
iomem resource [mem 0x10000-0x10000] (the single byte at 0x10000), we skip
it:

  find_next_iomem_res(...)
  {
    start = 0x0;
    end = 0x10000;
    for (p = next_resource(...)) {
      # p->start = 0x10000;
      # p->end = 0x10000;
      # we *should* return this resource, but this condition is false:
      if ((p->end >= start) && (p->start < end))
        break;

Adjust find_next_iomem_res() so it allows a resource that includes the
single byte at the end of the range.  This is a corner case that we
probably don't see in practice.

Fixes: 58c1b5b07907 ("[PATCH] memory hotadd fixes: find_next_system_ram catch range fix")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Brijesh Singh <brijesh.singh@amd.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Lianbo Jiang <lijiang@redhat.com>
CC: Takashi Iwai <tiwai@suse.de>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Tom Lendacky <thomas.lendacky@amd.com>
CC: Vivek Goyal <vgoyal@redhat.com>
CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
CC: bhe@redhat.com
CC: dan.j.williams@intel.com
CC: dyoung@redhat.com
CC: kexec@lists.infradead.org
CC: mingo@redhat.com
CC: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/153805812254.1157.16736368485811773752.stgit@bhelgaas-glaptop.roam.corp.google.com
2018-10-09 17:18:34 +02:00
Rik van Riel
7d49b28a80 smp,cpumask: introduce on_each_cpu_cond_mask
Introduce a variant of on_each_cpu_cond that iterates only over the
CPUs in a cpumask, in order to avoid making callbacks for every single
CPU in the system when we only need to test a subset.

Cc: npiggin@gmail.com
Cc: mingo@kernel.org
Cc: will.deacon@arm.com
Cc: songliubraving@fb.com
Cc: kernel-team@fb.com
Cc: hpa@zytor.com
Cc: luto@kernel.org
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180926035844.1420-5-riel@surriel.com
2018-10-09 16:51:11 +02:00
Rik van Riel
c3f7f2c7eb smp: use __cpumask_set_cpu in on_each_cpu_cond
The code in on_each_cpu_cond sets CPUs in a locally allocated bitmask,
which should never be used by other CPUs simultaneously. There is no
need to use locked memory accesses to set the bits in this bitmap.

Switch to __cpumask_set_cpu.

Cc: npiggin@gmail.com
Cc: mingo@kernel.org
Cc: will.deacon@arm.com
Cc: songliubraving@fb.com
Cc: kernel-team@fb.com
Cc: hpa@zytor.com
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180926035844.1420-4-riel@surriel.com
2018-10-09 16:51:11 +02:00
Christoph Hellwig
b9fd04262a dma-direct: respect DMA_ATTR_NO_WARN
Respect the DMA_ATTR_NO_WARN flags for allocations in dma-direct.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Robin Murphy <robin.murphy@arm.com>
2018-10-09 15:08:46 +02:00
He Zhe
e6fe3e5b7d printk: Give error on attempt to set log buffer length to over 2G
The current printk() is ready to handle log buffer size up to 2G.
Give an explicit error for users who want to use larger log buffer.

Also fix printk formatting to show the 2G as a positive number.

Link: http://lkml.kernel.org/r/20181008135916.gg4kkmoki5bgtco5@pathway.suse.cz
Cc: rostedt@goodmis.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: He Zhe <zhe.he@windriver.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
[pmladek: Fixed to the really safe limit 2GB.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-10-09 14:02:05 +02:00
Lance Roy
4de1a293a0 futex: Replace spin_is_locked() with lockdep
lockdep_assert_held() is better suited for checking locking requirements,
since it won't get confused when the lock is held by some other task. This
is also a step towards possibly removing spin_is_locked().

Signed-off-by: Lance Roy <ldr709@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@infradead.org>
Link: https://lkml.kernel.org/r/20181003053902.6910-12-ldr709@gmail.com
2018-10-09 13:19:28 +02:00
Waiman Long
8ca2b56cd7 locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y
A sizable portion of the CPU cycles spent on the __lock_acquire() is used
up by the atomic increment of the class->ops stat counter. By taking it out
from the lock_class structure and changing it to a per-cpu per-lock-class
counter, we can reduce the amount of cacheline contention on the class
structure when multiple CPUs are trying to acquire locks of the same
class simultaneously.

To limit the increase in memory consumption because of the percpu nature
of that counter, it is now put back under the CONFIG_DEBUG_LOCKDEP
config option. So the memory consumption increase will only occur if
CONFIG_DEBUG_LOCKDEP is defined. The lock_class structure, however,
is reduced in size by 16 bytes on 64-bit archs after ops removal and
a minor restructuring of the fields.

This patch also fixes a bug in the increment code as the counter is of
the 'unsigned long' type, but atomic_inc() was used to increment it.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/d66681f3-8781-9793-1dcf-2436a284550b@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-09 09:56:33 +02:00
David S. Miller
071a234ad7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-10-08

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow
BPF programs to perform lookups for sockets in a network namespace. This would
allow programs to determine early on in processing whether the stack is
expecting to receive the packet, and perform some action (eg drop,
forward somewhere) based on this information.

2) per-cpu cgroup local storage from Roman Gushchin.
Per-cpu cgroup local storage is very similar to simple cgroup storage
except all the data is per-cpu. The main goal of per-cpu variant is to
implement super fast counters (e.g. packet counters), which don't require
neither lookups, neither atomic operations in a fast path.
The example of these hybrid counters is in selftests/bpf/netcnt_prog.c

3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet

4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski

5) rename of libbpf interfaces from Andrey Ignatov.
libbpf is maturing as a library and should follow good practices in
library design and implementation to play well with other libraries.
This patch set brings consistent naming convention to global symbols.

6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
to let Apache2 projects use libbpf

7) various AF_XDP fixes from Björn and Magnus
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08 23:42:44 -07:00
Geert Uytterhoeven
b8d62f33b7 genirq: Fix grammar s/an /a /
Fix a grammar mistake in <linux/interrupt.h>.

[ mingo: While at it also fix another similar error in another comment as well. ]

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181008111726.26286-1-geert%2Brenesas@glider.be
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-09 07:50:41 +02:00
Christoph Hellwig
79ac32a427 dma-direct: document the zone selection logic
What we are doing here isn't quite obvious, so add a comment explaining
it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-09 07:43:25 +02:00
Quentin Monnet
e4052d06a5 bpf: allow offload of programs with BPF-to-BPF function calls
Now that there is at least one driver supporting BPF-to-BPF function
calls, lift the restriction, in the verifier, on hardware offload of
eBPF programs containing such calls. But prevent jit_subprogs(), still
in the verifier, from being run for offloaded programs.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:13 +02:00
Quentin Monnet
c941ce9c28 bpf: add verifier callback to get stack usage info for offloaded progs
In preparation for BPF-to-BPF calls in offloaded programs, add a new
function attribute to the struct bpf_prog_offload_ops so that drivers
supporting eBPF offload can hook at the end of program verification, and
potentially extract information collected by the verifier.

Implement a minimal callback (returning 0) in the drivers providing the
structs, namely netdevsim and nfp.

This will be useful in the nfp driver, in later commits, to extract the
number of subprograms as well as the stack depth for those subprograms.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:12 +02:00
Stephen Boyd
99c65fa7c5 dma-debug: Check for drivers mapping invalid addresses in dma_map_single()
I recently debugged a DMA mapping oops where a driver was trying to map
a buffer returned from request_firmware() with dma_map_single(). Memory
returned from request_firmware() is mapped into the vmalloc region and
this isn't a valid region to map with dma_map_single() per the DMA
documentation's "What memory is DMA'able?" section.

Unfortunately, we don't really check that in the DMA debugging code, so
enabling DMA debugging doesn't help catch this problem. Let's add a new
DMA debug function to check for a vmalloc address or an invalid virtual
address and print a warning if this happens. This makes it a little
easier to debug these sorts of problems, instead of seeing odd behavior
or crashes when drivers attempt to map the vmalloc space for DMA.

Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-08 09:44:17 +02:00
Eric W. Biederman
601d5abfea signal: In sigqueueinfo prefer sig not si_signo
Andrei Vagin <avagin@gmail.com> reported:

> Accoding to the man page, the user should not set si_signo, it has to be set
> by kernel.
>
> $ man 2 rt_sigqueueinfo
>
>     The uinfo argument specifies the data to accompany  the  signal.   This
>        argument  is  a  pointer to a structure of type siginfo_t, described in
>        sigaction(2) (and defined  by  including  <sigaction.h>).   The  caller
>        should set the following fields in this structure:
>
>        si_code
>               This  must  be  one of the SI_* codes in the Linux kernel source
>               file include/asm-generic/siginfo.h, with  the  restriction  that
>               the  code  must  be  negative (i.e., cannot be SI_USER, which is
>               used by the kernel to indicate a signal  sent  by  kill(2))  and
>               cannot  (since  Linux  2.6.39) be SI_TKILL (which is used by the
>               kernel to indicate a signal sent using tgkill(2)).
>
>        si_pid This should be set to a process ID, typically the process ID  of
>               the sender.
>
>        si_uid This  should  be set to a user ID, typically the real user ID of
>               the sender.
>
>        si_value
>               This field contains the user data to accompany the signal.   For
>               more information, see the description of the last (union sigval)
>               argument of sigqueue(3).
>
>        Internally, the kernel sets the si_signo field to the  value  specified
>        in  sig,  so that the receiver of the signal can also obtain the signal
>        number via that field.
>
> On Tue, Sep 25, 2018 at 07:19:02PM +0200, Eric W. Biederman wrote:
>>
>> If there is some application that calls sigqueueinfo directly that has
>> a problem with this added sanity check we can revisit this when we see
>> what kind of crazy that application is doing.
>
>
> I already know two "applications" ;)
>
> https://github.com/torvalds/linux/blob/master/tools/testing/selftests/ptrace/peeksiginfo.c
> https://github.com/checkpoint-restore/criu/blob/master/test/zdtm/static/sigpending.c
>
> Disclaimer: I'm the author of both of them.

Looking at the kernel code the historical behavior has alwasy been to prefer
the signal number passed in by the kernel.

So sigh.  Implmenet __copy_siginfo_from_user and __copy_siginfo_from_user32 to
take that signal number and prefer it.  The user of ptrace will still
use copy_siginfo_from_user and copy_siginfo_from_user32 as they do not and
never have had a signal number there.

Luckily this change has never made it farther than linux-next.

Fixes: e75dc036c445 ("signal: Fail sigqueueinfo if si_signo != sig")
Reported-by: Andrei Vagin <avagin@gmail.com>
Tested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-10-08 09:35:26 +02:00
David S. Miller
72438f8cef Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-10-06 14:43:42 -07:00
Ingo Molnar
02678a5823 Merge branch 'core/core' into x86/build, to prevent conflicts
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-06 15:51:56 +02:00
Lianbo Jiang
9cf38d5559 kexec: Allocate decrypted control pages for kdump if SME is enabled
When SME is enabled in the first kernel, it needs to allocate decrypted
pages for kdump because when the kdump kernel boots, these pages need to
be accessed decrypted in the initial boot stage, before SME is enabled.

 [ bp: clean up text. ]

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kexec@lists.infradead.org
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: akpm@linux-foundation.org
Cc: dan.j.williams@intel.com
Cc: bhelgaas@google.com
Cc: baiyaowei@cmss.chinamobile.com
Cc: tiwai@suse.de
Cc: brijesh.singh@amd.com
Cc: dyoung@redhat.com
Cc: bhe@redhat.com
Cc: jroedel@suse.de
Link: https://lkml.kernel.org/r/20180930031033.22110-3-lijiang@redhat.com
2018-10-06 12:01:51 +02:00
Greg Kroah-Hartman
c1d84a1b42 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Dave writes:
  "Networking fixes:

  1) Fix truncation of 32-bit right shift in bpf, from Jann Horn.

  2) Fix memory leak in wireless wext compat, from Stefan Seyfried.

  3) Use after free in cfg80211's reg_process_hint(), from Yu Zhao.

  4) Need to cancel pending work when unbinding in smsc75xx otherwise
     we oops, also from Yu Zhao.

  5) Don't allow enslaving a team device to itself, from Ido Schimmel.

  6) Fix backwards compat with older userspace for rtnetlink FDB dumps.
     From Mauricio Faria.

  7) Add validation of tc policy netlink attributes, from David Ahern.

  8) Fix RCU locking in rawv6_send_hdrinc(), from Wei Wang."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  net: mvpp2: Extract the correct ethtype from the skb for tx csum offload
  ipv6: take rcu lock in rawv6_send_hdrinc()
  net: sched: Add policy validation for tc attributes
  rtnetlink: fix rtnl_fdb_dump() for ndmsg header
  yam: fix a missing-check bug
  net: bpfilter: Fix type cast and pointer warnings
  net: cxgb3_main: fix a missing-check bug
  bpf: 32-bit RSH verification must truncate input before the ALU op
  net: phy: phylink: fix SFP interface autodetection
  be2net: don't flip hw_features when VXLANs are added/deleted
  net/packet: fix packet drop as of virtio gso
  net: dsa: b53: Keep CPU port as tagged in all VLANs
  openvswitch: load NAT helper
  bnxt_en: get the reduced max_irqs by the ones used by RDMA
  bnxt_en: free hwrm resources, if driver probe fails.
  bnxt_en: Fix enables field in HWRM_QUEUE_COS2BW_CFG request
  bnxt_en: Fix VNIC reservations on the PF.
  team: Forbid enslaving team device to itself
  net/usb: cancel pending work when unbinding smsc75xx
  mlxsw: spectrum: Delete RIF when VLAN device is removed
  ...
2018-10-06 02:11:30 -07:00
Greg Kroah-Hartman
31d099085d Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Ingo writes:
  "perf fixes:
    - fix a CPU#0 hot unplug bug and a PCI enumeration bug in the x86 Intel uncore PMU driver
    - fix a CPU event enumeration bug in the x86 AMD PMU driver
    - fix a perf ring-buffer corruption bug when using tracepoints
    - fix a PMU unregister locking bug"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events
  perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX
  perf/ring_buffer: Prevent concurent ring buffer access
  perf/x86/intel/uncore: Use boot_cpu_data.phys_proc_id instead of hardcorded physical package ID 0
  perf/core: Fix perf_pmu_unregister() locking
2018-10-05 16:07:13 -07:00
Greg Kroah-Hartman
8be673735e Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Ingo writes:
  "scheduler fixes:

   These fixes address a rather involved performance regression between
   v4.17->v4.19 in the sched/numa auto-balancing code. Since distros
   really need this fix we accelerated it to sched/urgent for a faster
   upstream merge.

   NUMA scheduling and balancing performance is now largely back to
   v4.17 levels, without reintroducing the NUMA placement bugs that
   v4.18 and v4.19 fixed.

   Many thanks to Srikar Dronamraju, Mel Gorman and Jirka Hladky, for
   reporting, testing, re-testing and solving this rather complex set of
   bugs."

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/numa: Migrate pages to local nodes quicker early in the lifetime of a task
  mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration
  sched/numa: Avoid task migration for small NUMA improvement
  mm/migrate: Use spin_trylock() while resetting rate limit
  sched/numa: Limit the conditions where scan period is reset
  sched/numa: Reset scan rate whenever task moves across nodes
  sched/numa: Pass destination CPU as a parameter to migrate_task_rq
  sched/numa: Stop multiple tasks from moving to the CPU at the same time
2018-10-05 15:39:38 -07:00
David S. Miller
b8d5b7cec4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-10-05

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix to truncate input on ALU operations in 32 bit mode, from Jann.

2) Fixes for cgroup local storage to reject reserved flags on element
   update and rejection of map allocation with zero-sized value, from Roman.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05 10:53:13 -07:00
Jann Horn
b799207e1e bpf: 32-bit RSH verification must truncate input before the ALU op
When I wrote commit 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification"), I
assumed that, in order to emulate 64-bit arithmetic with 32-bit logic, it
is sufficient to just truncate the output to 32 bits; and so I just moved
the register size coercion that used to be at the start of the function to
the end of the function.

That assumption is true for almost every op, but not for 32-bit right
shifts, because those can propagate information towards the least
significant bit. Fix it by always truncating inputs for 32-bit ops to 32
bits.

Also get rid of the coerce_reg_to_size() after the ALU op, since that has
no effect.

Fixes: 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-05 18:41:45 +02:00
He Zhe
dd5adbfbfc printk: Add KBUILD_MODNAME and remove a redundant print prefix
Add KBUILD_MODNAME to make prints more clear.

Link: http://lkml.kernel.org/r/1538239553-81805-3-git-send-email-zhe.he@windriver.com
Cc: rostedt@goodmis.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: He Zhe <zhe.he@windriver.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-10-05 15:29:25 +02:00
He Zhe
51a72ab737 printk: Correct wrong casting
log_first_seq and console_seq are 64-bit unsigned integers.
Correct a wrong casting that might cut off the output.

Link: http://lkml.kernel.org/r/1538239553-81805-2-git-send-email-zhe.he@windriver.com
Cc: rostedt@goodmis.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: He Zhe <zhe.he@windriver.com>
[sergey.senozhatsky@gmail.com: More descriptive commit message]
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-10-05 15:02:35 +02:00
He Zhe
277fcdb2cf printk: Fix panic caused by passing log_buf_len to command line
log_buf_len_setup does not check input argument before passing it to
simple_strtoull. The argument would be a NULL pointer if "log_buf_len",
without its value, is set in command line and thus causes the following
panic.

PANIC: early exception 0xe3 IP 10:ffffffffaaeacd0d error 0 cr2 0x0
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc4-yocto-standard+ #1
[    0.000000] RIP: 0010:_parse_integer_fixup_radix+0xd/0x70
...
[    0.000000] Call Trace:
[    0.000000]  simple_strtoull+0x29/0x70
[    0.000000]  memparse+0x26/0x90
[    0.000000]  log_buf_len_setup+0x17/0x22
[    0.000000]  do_early_param+0x57/0x8e
[    0.000000]  parse_args+0x208/0x320
[    0.000000]  ? rdinit_setup+0x30/0x30
[    0.000000]  parse_early_options+0x29/0x2d
[    0.000000]  ? rdinit_setup+0x30/0x30
[    0.000000]  parse_early_param+0x36/0x4d
[    0.000000]  setup_arch+0x336/0x99e
[    0.000000]  start_kernel+0x6f/0x4ee
[    0.000000]  x86_64_start_reservations+0x24/0x26
[    0.000000]  x86_64_start_kernel+0x6f/0x72
[    0.000000]  secondary_startup_64+0xa4/0xb0

This patch adds a check to prevent the panic.

Link: http://lkml.kernel.org/r/1538239553-81805-1-git-send-email-zhe.he@windriver.com
Cc: stable@vger.kernel.org
Cc: rostedt@goodmis.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: He Zhe <zhe.he@windriver.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-10-05 14:11:12 +02:00
Borislav Petkov
d0e7d14455 cpu/SMT: State SMT is disabled even with nosmt and without "=force"
When booting with "nosmt=force" a message is issued into dmesg to
confirm that SMT has been force-disabled but such a message is not
issued when only "nosmt" is on the kernel command line.

Fix that.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181004172227.10094-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-05 10:20:31 +02:00
Alexander Duyck
1fc8e6423e dma-direct: fix return value of dma_direct_supported
It appears that in commit 9d7a224b463e ("dma-direct: always allow dma mask
<= physiscal memory size") the logic of the test was changed from a "<" to
a ">=" however I don't see any reason for that change. I am assuming that
there was some additional change planned, specifically I suspect the logic
was intended to be reversed and possibly used for a return. Since that is
the case I have gone ahead and done that.

This addresses issues I had on my system that prevented me from booting
with the above mentioned commit applied on an x86_64 system w/ Intel IOMMU.

Fixes: 9d7a224b463e ("dma-direct: always allow dma mask <= physiscal memory size")
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-05 09:15:15 +02:00