linux

iv/linux

Author	SHA1	Message	Date
Alexei Starovoitov	70ed0706a4	bpf: disable preemption for bpf progs attached to uprobe trace_call_bpf() no longer disables preemption on its own. All callers of this function has to do it explicitly. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de>	2020-02-24 16:17:14 -08:00
Thomas Gleixner	1b7a51a63b	bpf/trace: Remove EXPORT from trace_call_bpf() All callers are built in. No point to export this. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-02-24 16:16:38 -08:00
Thomas Gleixner	f03efe49bd	bpf/tracing: Remove redundant preempt_disable() in __bpf_trace_run() __bpf_trace_run() disables preemption around the BPF_PROG_RUN() invocation. This is redundant because __bpf_trace_run() is invoked from a trace point via __DO_TRACE() which already disables preemption _before_ invoking any of the functions which are attached to a trace point. Remove it and add a cant_sleep() check. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200224145642.847220186@linutronix.de	2020-02-24 16:12:20 -08:00
Thomas Gleixner	dbca151cad	bpf: Update locking comment in hashtab code The comment where the bucket lock is acquired says: /* bpf_map_update_elem() can be called in_irq() */ which is not really helpful and aside of that it does not explain the subtle details of the hash bucket locks expecially in the context of BPF and perf, kprobes and tracing. Add a comment at the top of the file which explains the protection scopes and the details how potential deadlocks are prevented. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200224145642.755793061@linutronix.de	2020-02-24 16:12:20 -08:00
Thomas Gleixner	2ed905c521	bpf: Enforce preallocation for instrumentation programs on RT Aside of the general unsafety of run-time map allocation for instrumentation type programs RT enabled kernels have another constraint: The instrumentation programs are invoked with preemption disabled, but the memory allocator spinlocks cannot be acquired in atomic context because they are converted to 'sleeping' spinlocks on RT. Therefore enforce map preallocation for these programs types when RT is enabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200224145642.648784007@linutronix.de	2020-02-24 16:12:19 -08:00
Thomas Gleixner	94dacdbd5d	bpf: Tighten the requirements for preallocated hash maps The assumption that only programs attached to perf NMI events can deadlock on memory allocators is wrong. Assume the following simplified callchain: kmalloc() from regular non BPF context cache empty freelist empty lock(zone->lock); tracepoint or kprobe BPF() update_elem() lock(bucket) kmalloc() cache empty freelist empty lock(zone->lock); <- DEADLOCK There are other ways which do not involve locking to create wreckage: kmalloc() from regular non BPF context local_irq_save(); ... obj = slab_first(); kprobe() BPF() update_elem() lock(bucket) kmalloc() local_irq_save(); ... obj = slab_first(); <- Same object as above ... So preallocation _must_ be enforced for all variants of intrusive instrumentation. Unfortunately immediate enforcement would break backwards compatibility, so for now such programs still are allowed to run, but a one time warning is emitted in dmesg and the verifier emits a warning in the verifier log as well so developers are made aware about this and can fix their programs before the enforcement becomes mandatory. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200224145642.540542802@linutronix.de	2020-02-24 16:12:19 -08:00
Alexei Starovoitov	8eece07c01	Two migrate disable related stubs for BPF to base the RT patches on -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5O6asTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoRRwEACaqcIHhYRVZJZkVqtjYoYPnP74XCaS ZTbJiFMJDxcM0lt5YfjjNA7qFA8AVfOOTBdPUrjqw05LuykibpNXEBpk+n2lDEgc x9tZHcbylFlmhJxsnAI/S++l++I7ieqYNbNXUhUkkiN86OMxfn1rSoCo639ManxB WiwR+R7Q/aicUN95kLGPzvAt41K/DQ30pNpjAE5Y2z+Hl26JPti2jcMHhfFWDD23 91mRd0ryuMSQm+ZkyiZdobfd6OzOGtYLnnxRjNJVSFz7q0s+hLUarWI+wnOxh4fD Jb+eahKPXSvA787/TbOvM8iNkIPX70HeBpmQIpIjbI+JEjo05amaEiYZ3cwZgs5P rDpFadyfi4peftbGs7G+/8RnlDJ4Towbw0X8Sl6t1RaKAfz04zsAlmW4zyNucRUD 1pN0mxfA/OCssGahXceP0VKpqqm6hDZipFfwFheXk6zX+5YWyNGX2iP3Fh4u2vbx vTwxQPmsusQuehQfmR23HzNG9I67knumsJI0PiFNhnroiNklevY+cPNMRN+tj3fB osOSgcKVmbsKUc9jcn7yfIxSyZ1exnT2Wsri4lMHo7+kZGEguX2qnLRd1tOtVrgO hjawoDrhXaA9AjU80e3p9eXmRYke9oZmQdeA8X0vl9YVTsLdxlishYCNBTysHvUl ULgHkTY/WoxZkA== =4Vci -----END PGP SIGNATURE----- Merge tag 'sched-for-bpf-2020-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into bpf-next Two migrate disable related stubs for BPF to base the RT patches on	2020-02-21 18:27:47 -08:00
David S. Miller	732a0dee50	Merge branch 'mlxfw-Improve-error-reporting-and-FW-reactivate-support' Saeed Mahameed says: ==================== mlxfw: Improve error reporting and FW reactivate support This patchset improves mlxfw error reporting to netlink and to kernel log. V2: - Use proper err codes, EBUSY/EIO instead of EALREADY/EREMOTEIO - Fix typo. From Eran and me. 1) patch #1, Make mlxfw/mlxsw fw flash devlink status notify generic, and enable it for mlx5. 2) patches #2..#5 are improving mlxfw flash error messages by reporting detailed mlxfw FSM error messages to netlink and kernel log. 3) patches #6,7 From Eran: Add FW reactivate flow to mlxfw and mlx5 ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 15:41:10 -08:00
Eran Ben Elisha	b7331aa204	net/mlx5: Add fsm_reactivate callback support Add support for fsm reactivate via MIRC (Management Image Re-activation Control) set and query commands. For re-activation flow, driver shall first run MIRC set, and then wait until FW is done (via querying MIRC status). Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 15:41:10 -08:00
Eran Ben Elisha	958dfd0dc6	net/mlxfw: Add reactivate flow support to FSM burn flow Expose fsm_reactivate callback to the mlxfw_dev_ops struct. FSM reactivate is needed before flashing the new image in order to flush the old flashed but not running firmware image. In case mlxfw_dev do not support the reactivation, this step will be skipped. But if later image flash will fail, a hint will be provided by the extack to advise the user that the failure might be related to it. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 15:41:10 -08:00
Saeed Mahameed	5042e8b97d	net/mlxfw: Use MLXFW_ERR_MSG macro for error reporting Instead of always calling both mlxfw_err and NL_SET_ERR_MSG_MOD with the same message, use the dedicated macro instead. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 15:41:10 -08:00
Saeed Mahameed	6a3f707c00	net/mlxfw: Convert pr_* to dev_* in mlxfw_fsm.c Introduce mlxfw_{info, err, dbg} macros and make them call corresponding dev_* macros, then convert all instances of pr_* to mlxfw_*. This will allow printing the device name mlxfw is operating on. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 15:41:10 -08:00
Saeed Mahameed	f7fe7aa88f	net/mlxfw: More error messages coverage Make sure mlxfw_firmware_flash reports a detailed user readable error message in every possible error path, basically every time mlxfw_dev->ops->*() is called and an error is returned, or when image initialization is failed. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 15:41:10 -08:00
Saeed Mahameed	86a1270fd7	net/mlxfw: Improve FSM err message reporting and return codes Report unique and standard error codes corresponding to the specific FW flash error. In addition, add a more detailed error messages to netlink. Before: $ devlink dev flash pci/0000:05:00.0 file ... Error: mlxfw: Firmware flash failed. devlink answers: Invalid argument After: $ devlink dev flash pci/0000:05:00.0 file ... Error: mlxfw: Firmware flash failed: pending reset. devlink answers: Device busy Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 15:41:10 -08:00
Saeed Mahameed	4ae575661f	net/mlxfw: Generic mlx FW flash status notify FW flash status notify is currently implemented via a callback to the caller mlx module, and all it is doing is to call devlink_flash_update_status_notify with the specific module devlink instance. Instead of repeating the whole process for all mlx modules and re-implement the status_notify callback again and again. Just provide the devlink instance as part of mlxfw_dev when calling mlxfw_firmware_flash and let mlxfw do the devlink status updates directly. This will be very useful for adding status notify support to mlx5, as already done in this patch, with a simple one line of just providing the devlink instance to mlxfw_firmware_flash. mlxfw now depends on NET_DEVLINK as all other mlx modules. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 15:41:10 -08:00
David S. Miller	b105e8e281	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2020-02-21 The following pull-request contains BPF updates for your net-next tree. We've added 25 non-merge commits during the last 4 day(s) which contain a total of 33 files changed, 2433 insertions(+), 161 deletions(-). The main changes are: 1) Allow for adding TCP listen sockets into sock_map/hash so they can be used with reuseport BPF programs, from Jakub Sitnicki. 2) Add a new bpf_program__set_attach_target() helper for adding libbpf support to specify the tracepoint/function dynamically, from Eelco Chaudron. 3) Add bpf_read_branch_records() BPF helper which helps use cases like profile guided optimizations, from Daniel Xu. 4) Enable bpf_perf_event_read_value() in all tracing programs, from Song Liu. 5) Relax BTF mandatory check if only used for libbpf itself e.g. to process BTF defined maps, from Andrii Nakryiko. 6) Move BPF selftests -mcpu compilation attribute from 'probe' to 'v3' as it has been observed that former fails in envs with low memlock, from Yonghong Song. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 15:22:45 -08:00
David S. Miller	e65ee2fb54	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflict resolution of ice_virtchnl_pf.c based upon work by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 13:39:34 -08:00
Daniel Borkmann	eb1e1478b6	Merge branch 'bpf-sockmap-listen' Jakub Sitnicki says: ==================== This patch set turns SOCK{MAP,HASH} into generic collections for TCP sockets, both listening and established. Adding support for listening sockets enables us to use these BPF map types with reuseport BPF programs. Why? SOCKMAP and SOCKHASH, in comparison to REUSEPORT_SOCKARRAY, allow the socket to be in more than one map at the same time. Having a BPF map type that can hold listening sockets, and gracefully co-exist with reuseport BPF is important if, in the future, we want BPF programs that run at socket lookup time [0]. Cover letter for v1 of this series tells the full story of how we got here [1]. Although SOCK{MAP,HASH} are not a drop-in replacement for SOCKARRAY just yet, because UDP support is lacking, it's a step in this direction. We're working with Lorenz on extending SOCK{MAP,HASH} to hold UDP sockets, and expect to post RFC series for sockmap + UDP in the near future. I've dropped Acks from all patches that have been touched since v6. The audit for missing READ_ONCE annotations for access to sk_prot is ongoing. Thus far I've found one location specific to TCP listening sockets that needed annotating. This got fixed it in this iteration. I wonder if sparse checker could be put to work to identify places where we have sk_prot access while not holding sk_lock... The patch series depends on another one, posted earlier [2], that has been split out of it. v6 -> v7: - Extended the series to cover SOCKHASH. (patches 4-8, 10-11) (John) - Rebased onto recent bpf-next. Resolved conflicts in recent fixes to sk_state checks on sockmap/sockhash update path. (patch 4) - Added missing READ_ONCE annotation in sock_copy. (patch 1) - Split out patches that simplify sk_psock_restore_proto [2]. v5 -> v6: - Added a fix-up for patch 1 which I forgot to commit in v5. Sigh. v4 -> v5: - Rebase onto recent bpf-next to resolve conflicts. (Daniel) v3 -> v4: - Make tcp_bpf_clone parameter names consistent across function declaration and definition. (Martin) - Use sock_map_redirect_okay helper everywhere we need to take a different action for listening sockets. (Lorenz) - Expand comment explaining the need for a callback from reuseport to sockarray code in reuseport_detach_sock. (Martin) - Mention the possibility of using a u64 counter for reuseport IDs in the future in the description for patch 10. (Martin) v2 -> v3: - Generate reuseport ID when group is created. Please see patch 10 description for details. (Martin) - Fix the build when CONFIG_NET_SOCK_MSG is not selected by either CONFIG_BPF_STREAM_PARSER or CONFIG_TLS. (kbuild bot & John) - Allow updating sockmap from BPF on BPF_SOCK_OPS_TCP_LISTEN_CB callback. An oversight in previous iterations. Users may want to populate the sockmap with listening sockets from BPF as well. - Removed RCU read lock assertion in sock_map_lookup_sys. (Martin) - Get rid of a warning when child socket was cloned with parent's psock state. (John) - Check for tcp_bpf_unhash rather than tcp_bpf_recvmsg when deciding if sk_proto needs restoring on clone. Check for recvmsg in the context of listening socket cloning was confusing. (Martin) - Consolidate sock_map_sk_is_suitable with sock_map_update_okay. This led to adding dedicated predicates for sockhash. Update self-tests accordingly. (John) - Annotate unlikely branch in bpf_{sk,msg}_redirect_map when socket isn't in a map, or isn't a valid redirect target. (John) - Document paired READ/WRITE_ONCE annotations and cover shared access in more detail in patch 2 description. (John) - Correct a couple of log messages in sockmap_listen self-tests so the message reflects the actual failure. - Rework reuseport tests from sockmap_listen suite so that ENOENT error from bpf_sk_select_reuseport handler does not happen on happy path. v1 -> v2: - af_ops->syn_recv_sock callback is no longer overridden and burdened with restoring sk_prot and clearing sk_user_data in the child socket. As child socket is already hashed when syn_recv_sock returns, it is too late to put it in the right state. Instead patches 3 & 4 address restoring sk_prot and clearing sk_user_data before we hash the child socket. (Pointed out by Martin Lau) - Annotate shared access to sk->sk_prot with READ_ONCE/WRITE_ONCE macros as we write to it from sk_msg while socket might be getting cloned on another CPU. (Suggested by John Fastabend) - Convert tests for SOCKMAP holding listening sockets to return-on-error style, and hook them up to test_progs. Also use BPF skeleton for setup. Add new tests to cover the race scenario discovered during v1 review. RFC -> v1: - Switch from overriding proto->accept to af_ops->syn_recv_sock, which happens earlier. Clearing the psock state after accept() does not work for child sockets that become orphaned (never got accepted). v4-mapped sockets need special care. - Return the socket cookie on SOCKMAP lookup from syscall to be on par with REUSEPORT_SOCKARRAY. Requires SOCKMAP to take u64 on lookup/update from syscall. - Make bpf_sk_redirect_map (ingress) and bpf_msg_redirect_map (egress) SOCKMAP helpers fail when target socket is a listening one. - Make bpf_sk_select_reuseport helper fail when target is a TCP established socket. - Teach libbpf to recognize SK_REUSEPORT program type from section name. - Add a dedicated set of tests for SOCKMAP holding listening sockets, covering map operations, overridden socket callbacks, and BPF helpers. [0] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/ [1] https://lore.kernel.org/bpf/20191123110751.6729-1-jakub@cloudflare.com/ [2] https://lore.kernel.org/bpf/20200217121530.754315-1-jakub@cloudflare.com/ ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2020-02-21 22:31:41 +01:00
Jakub Sitnicki	44d28be2b8	selftests/bpf: Tests for sockmap/sockhash holding listening sockets Now that SOCKMAP and SOCKHASH map types can store listening sockets, user-space and BPF API is open to a new set of potential pitfalls. Exercise the map operations, with extra attention to code paths susceptible to races between map ops and socket cloning, and BPF helpers that work with SOCKMAP/SOCKHASH to gain confidence that all works as expected. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200218171023.844439-12-jakub@cloudflare.com	2020-02-21 22:29:46 +01:00
Jakub Sitnicki	11318ba8ca	selftests/bpf: Extend SK_REUSEPORT tests to cover SOCKMAP/SOCKHASH Parametrize the SK_REUSEPORT tests so that the map type for storing sockets is not hard-coded in the test setup routine. This, together with careful state cleaning after the tests, lets us run the test cases for REUSEPORT_ARRAY, SOCKMAP, and SOCKHASH to have test coverage for all supported map types. The last two support only TCP sockets at the moment. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200218171023.844439-11-jakub@cloudflare.com	2020-02-21 22:29:45 +01:00
Jakub Sitnicki	035ff358f2	net: Generate reuseport group ID on group creation Commit 736b46027eb4 ("net: Add ID (if needed) to sock_reuseport and expose reuseport_lock") has introduced lazy generation of reuseport group IDs that survive group resize. By comparing the identifier we check if BPF reuseport program is not trying to select a socket from a BPF map that belongs to a different reuseport group than the one the packet is for. Because SOCKARRAY used to be the only BPF map type that can be used with reuseport BPF, it was possible to delay the generation of reuseport group ID until a socket from the group was inserted into BPF map for the first time. Now that SOCK{MAP,HASH} can be used with reuseport BPF we have two options, either generate the reuseport ID on map update, like SOCKARRAY does, or allocate an ID from the start when reuseport group gets created. This patch takes the latter approach to keep sockmap free of calls into reuseport code. This streamlines the reuseport_id access as its lifetime now matches the longevity of reuseport object. The cost of this simplification, however, is that we allocate reuseport IDs for all SO_REUSEPORT users. Even those that don't use SOCKARRAY in their setups. With the way identifiers are currently generated, we can have at most S32_MAX reuseport groups, which hopefully is sufficient. If we ever get close to the limit, we can switch an u64 counter like sk_cookie. Another change is that we now always call into SOCKARRAY logic to unlink the socket from the map when unhashing or closing the socket. Previously we did it only when at least one socket from the group was in a BPF map. It is worth noting that this doesn't conflict with sockmap tear-down in case a socket is in a SOCK{MAP,HASH} and belongs to a reuseport group. sockmap tear-down happens first: prot->unhash `- tcp_bpf_unhash \|- tcp_bpf_remove \| `- while (sk_psock_link_pop(psock)) \| `- sk_psock_unlink \| `- sock_map_delete_from_link \| `- __sock_map_delete \| `- sock_map_unref \| `- sk_psock_put \| `- sk_psock_drop \| `- rcu_assign_sk_user_data(sk, NULL) `- inet_unhash `- reuseport_detach_sock `- bpf_sk_reuseport_detach `- WRITE_ONCE(sk->sk_user_data, NULL) Suggested-by: Martin Lau <kafai@fb.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200218171023.844439-10-jakub@cloudflare.com	2020-02-21 22:29:45 +01:00
Jakub Sitnicki	9fed9000c5	bpf: Allow selecting reuseport socket from a SOCKMAP/SOCKHASH SOCKMAP & SOCKHASH now support storing references to listening sockets. Nothing keeps us from using these map types a collection of sockets to select from in BPF reuseport programs. Whitelist the map types with the bpf_sk_select_reuseport helper. The restriction that the socket has to be a member of a reuseport group still applies. Sockets in SOCKMAP/SOCKHASH that don't have sk_reuseport_cb set are not a valid target and we signal it with -EINVAL. The main benefit from this change is that, in contrast to REUSEPORT_SOCKARRAY, SOCK{MAP,HASH} don't impose a restriction that a listening socket can be just one BPF map at the same time. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200218171023.844439-9-jakub@cloudflare.com	2020-02-21 22:29:45 +01:00
Jakub Sitnicki	1d59f3bcee	bpf, sockmap: Let all kernel-land lookup values in SOCKMAP/SOCKHASH Don't require the kernel code, like BPF helpers, that needs access to SOCK{MAP,HASH} map contents to live in net/core/sock_map.c. Expose the lookup operation to all kernel-land. Lookup from BPF context is not whitelisted yet. While syscalls have a dedicated lookup handler. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200218171023.844439-8-jakub@cloudflare.com	2020-02-21 22:29:45 +01:00
Jakub Sitnicki	c1cdf65da0	bpf, sockmap: Return socket cookie on lookup from syscall Tooling that populates the SOCK{MAP,HASH} with sockets from user-space needs a way to inspect its contents. Returning the struct sock * that the map holds to user-space is neither safe nor useful. An approach established by REUSEPORT_SOCKARRAY is to return a socket cookie (a unique identifier) instead. Since socket cookies are u64 values, SOCK{MAP,HASH} need to support such a value size for lookup to be possible. This requires special handling on update, though. Attempts to do a lookup on a map holding u32 values will be met with ENOSPC error. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200218171023.844439-7-jakub@cloudflare.com	2020-02-21 22:29:45 +01:00
Jakub Sitnicki	6e830c2f6c	bpf, sockmap: Don't set up upcalls and progs for listening sockets Now that sockmap/sockhash can hold listening sockets, when setting up the psock we will (i) grab references to verdict/parser progs, and (2) override socket upcalls sk_data_ready and sk_write_space. However, since we cannot redirect to listening sockets so we don't need to link the socket to the BPF progs. And more importantly we don't want the listening socket to have overridden upcalls because they would get inherited by child sockets cloned from it. Introduce a separate initialization path for listening sockets that does not change the upcalls and ignores the BPF progs. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200218171023.844439-6-jakub@cloudflare.com	2020-02-21 22:29:45 +01:00
Jakub Sitnicki	8ca30379a4	bpf, sockmap: Allow inserting listening TCP sockets into sockmap In order for sockmap/sockhash types to become generic collections for storing TCP sockets we need to loosen the checks during map update, while tightening the checks in redirect helpers. Currently sock{map,hash} require the TCP socket to be in established state, which prevents inserting listening sockets. Change the update pre-checks so the socket can also be in listening state. Since it doesn't make sense to redirect with sock{map,hash} to listening sockets, add appropriate socket state checks to BPF redirect helpers too. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200218171023.844439-5-jakub@cloudflare.com	2020-02-21 22:29:45 +01:00
Jakub Sitnicki	e80251555f	tcp_bpf: Don't let child socket inherit parent protocol ops on copy Prepare for cloning listening sockets that have their protocol callbacks overridden by sk_msg. Child sockets must not inherit parent callbacks that access state stored in sk_user_data owned by the parent. Restore the child socket protocol callbacks before it gets hashed and any of the callbacks can get invoked. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200218171023.844439-4-jakub@cloudflare.com	2020-02-21 22:29:45 +01:00
Jakub Sitnicki	f1ff5ce2cd	net, sk_msg: Clear sk_user_data pointer on clone if tagged sk_user_data can hold a pointer to an object that is not intended to be shared between the parent socket and the child that gets a pointer copy on clone. This is the case when sk_user_data points at reference-counted object, like struct sk_psock. One way to resolve it is to tag the pointer with a no-copy flag by repurposing its lowest bit. Based on the bit-flag value we clear the child sk_user_data pointer after cloning the parent socket. The no-copy flag is stored in the pointer itself as opposed to externally, say in socket flags, to guarantee that the pointer and the flag are copied from parent to child socket in an atomic fashion. Parent socket state is subject to change while copying, we don't hold any locks at that time. This approach relies on an assumption that sk_user_data holds a pointer to an object aligned at least 2 bytes. A manual audit of existing users of rcu_dereference_sk_user_data helper confirms our assumption. Also, an RCU-protected sk_user_data is not likely to hold a pointer to a char value or a pathological case of "struct { char c; }". To be safe, warn when the flag-bit is set when setting sk_user_data to catch any future misuses. It is worth considering why clearing sk_user_data unconditionally is not an option. There exist users, DRBD, NVMe, and Xen drivers being among them, that rely on the pointer being copied when cloning the listening socket. Potentially we could distinguish these users by checking if the listening socket has been created in kernel-space via sock_create_kern, and hence has sk_kern_sock flag set. However, this is not the case for NVMe and Xen drivers, which create sockets without marking them as belonging to the kernel. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200218171023.844439-3-jakub@cloudflare.com	2020-02-21 22:29:45 +01:00
Jakub Sitnicki	b8e202d1d1	net, sk_msg: Annotate lockless access to sk_prot on clone sk_msg and ULP frameworks override protocol callbacks pointer in sk->sk_prot, while tcp accesses it locklessly when cloning the listening socket, that is with neither sk_lock nor sk_callback_lock held. Once we enable use of listening sockets with sockmap (and hence sk_msg), there will be shared access to sk->sk_prot if socket is getting cloned while being inserted/deleted to/from the sockmap from another CPU: Read side: tcp_v4_rcv sk = __inet_lookup_skb(...) tcp_check_req(sk) inet_csk(sk)->icsk_af_ops->syn_recv_sock tcp_v4_syn_recv_sock tcp_create_openreq_child inet_csk_clone_lock sk_clone_lock READ_ONCE(sk->sk_prot) Write side: sock_map_ops->map_update_elem sock_map_update_elem sock_map_update_common sock_map_link_no_progs tcp_bpf_init tcp_bpf_update_sk_prot sk_psock_update_proto WRITE_ONCE(sk->sk_prot, ops) sock_map_ops->map_delete_elem sock_map_delete_elem __sock_map_delete sock_map_unref sk_psock_put sk_psock_drop sk_psock_restore_proto tcp_update_ulp WRITE_ONCE(sk->sk_prot, proto) Mark the shared access with READ_ONCE/WRITE_ONCE annotations. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200218171023.844439-2-jakub@cloudflare.com	2020-02-21 22:29:45 +01:00
Linus Torvalds	0c0ddd6ae4	linux-watchdog 5.6-rc3 tag -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEABECAAYFAl5PyHAACgkQ+iyteGJfRsrYtACgt7aQulV9htAryig1fxw3Hw96 YuYAoKjddxt5WDkcpBHcrLG4fQpzV0ap =4YJU -----END PGP SIGNATURE----- Merge tag 'linux-watchdog-5.6-rc3' of git://www.linux-watchdog.org/linux-watchdog Pull watchdog fixes from Wim Van Sebroeck: - mtk_wdt needs RESET_CONTROLLER to build - da9062 driver fixes: - fix power management ops - do not ping the hw during stop() - add dependency on I2C * tag 'linux-watchdog-5.6-rc3' of git://www.linux-watchdog.org/linux-watchdog: watchdog: da9062: Add dependency on I2C watchdog: da9062: fix power management ops watchdog: da9062: do not ping the hw during stop() watchdog: fix mtk_wdt.c RESET_CONTROLLER build error	2020-02-21 13:02:49 -08:00
Linus Torvalds	bb65619e97	Char/Misc fixes for 5.6-rc3 Here are some small char/misc driver fixes for 5.6-rc3. Also included in here are some updates for some documentation files that I seem to be maintaining these days. The driver fixes are: - small fixes for the habanalabs driver - fsi driver bugfix All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXk+9bw8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ymcoQCffSD3qJ6cVclyTHAUOtxJHuFLz3sAn0X6f77F u3UE5eTXLGNrt0u8Ab0i =9dXU -----END PGP SIGNATURE----- Merge tag 'char-misc-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char/misc driver fixes for 5.6-rc3. Also included in here are some updates for some documentation files that I seem to be maintaining these days. The driver fixes are: - small fixes for the habanalabs driver - fsi driver bugfix All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: Documentation/process: Swap out the ambassador for Canonical habanalabs: patched cb equals user cb in device memset habanalabs: do not halt CoreSight during hard reset habanalabs: halt the engines before hard-reset MAINTAINERS: remove unnecessary ':' characters fsi: aspeed: add unspecified HAS_IOMEM dependency COPYING: state that all contributions really are covered by this file Documentation/process: Change Microsoft contact for embargoed hardware issues embargoed-hardware-issues: drop Amazon contact as the email address now bounces Documentation/process: Add Arm contact for embargoed HW issues	2020-02-21 12:57:05 -08:00
Linus Torvalds	e5553ac71e	Staging driver fixes for 5.6-rc3 Here are some small staging driver fixes for 5.6-rc3, along with the removal of an unused/unneeded driver as well. The android vsoc driver is not needed anymore by anyone, so it was removed. The other driver fixes are: - ashmem bugfixes - greybus audio driver bugfix - wireless driver bugfixes and tiny cleanups to error paths All of these have been in linux-next for a while now with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXk++hg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ymvngCfcgS9D+FY+V0VIQlt+39h9U7cQeMAn0P0EepL FDc7GtK937axJBesRE+J =rVJM -----END PGP SIGNATURE----- Merge tag 'staging-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are some small staging driver fixes for 5.6-rc3, along with the removal of an unused/unneeded driver as well. The android vsoc driver is not needed anymore by anyone, so it was removed. The other driver fixes are: - ashmem bugfixes - greybus audio driver bugfix - wireless driver bugfixes and tiny cleanups to error paths All of these have been in linux-next for a while now with no reported issues" * tag 'staging-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: rtl8723bs: Remove unneeded goto statements staging: rtl8188eu: Remove some unneeded goto statements staging: rtl8723bs: Fix potential overuse of kernel memory staging: rtl8188eu: Fix potential overuse of kernel memory staging: rtl8723bs: Fix potential security hole staging: rtl8188eu: Fix potential security hole staging: greybus: use after free in gb_audio_manager_remove_all() staging: android: Delete the 'vsoc' driver staging: rtl8723bs: fix copy of overlapping memory staging: android: ashmem: Disallow ashmem memory from being remapped staging: vt6656: fix sign of rx_dbm to bb_pre_ed_rssi.	2020-02-21 12:53:53 -08:00
Linus Torvalds	ef11f1b76a	TTY/Serial driver fixes for 5.6-rc3 Here are a number of small tty and serial driver fixes for 5.6-rc3 that resolve a bunch of reported issues. They are: - vt selection and ioctl fixes - serdev bugfix - atmel serial driver fixes - qcom serial driver fixes - other minor serial driver fixes All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXk+/Dw8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ymGXgCfcYfb71LB7KbExN4GFETm93pca+QAoI+ihzFw zQJFJAe/dmiP2TifAdY7 =oeU9 -----END PGP SIGNATURE----- Merge tag 'tty-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are a number of small tty and serial driver fixes for 5.6-rc3 that resolve a bunch of reported issues. They are: - vt selection and ioctl fixes - serdev bugfix - atmel serial driver fixes - qcom serial driver fixes - other minor serial driver fixes All of these have been in linux-next for a while with no reported issues" * tag 'tty-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: vt: selection, close sel_buffer race vt: selection, handle pending signals in paste_selection serial: cpm_uart: call cpm_muram_init before registering console tty: serial: qcom_geni_serial: Fix RX cancel command failure serial: 8250: Check UPF_IRQ_SHARED in advance tty: serial: imx: setup the correct sg entry for tx dma vt: vt_ioctl: fix race in VT_RESIZEX vt: fix scrollback flushing on background consoles tty: serial: tegra: Handle RX transfer in PIO mode if DMA wasn't started tty/serial: atmel: manage shutdown in case of RS485 or ISO7816 mode serdev: ttyport: restore client ops on deregistration serial: ar933x_uart: set UART_CS_{RX,TX}_READY_ORIDE	2020-02-21 12:48:29 -08:00
Linus Torvalds	cee853e825	USB fixes for 5.6-rc3 Here are a number of small USB driver fixes for 5.6-rc3. Included in here are: - MAINTAINER file updates - USB gadget driver fixes - usb core quirk additions and fixes for regressions - xhci driver fixes - usb serial driver id additions and fixes - thunderbolt bugfix Thunderbolt patches come in through here now that USB4 is really thunderbolt. All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXk+/zw8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ynQywCeOlklNLcTKw/JkbsbtMlY/yTl4W8AoLLoEgSy GyUV8E67wZNdXLKH+n14 =o97t -----END PGP SIGNATURE----- Merge tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/Thunderbolt fixes from Greg KH: "Here are a number of small USB driver fixes for 5.6-rc3. Included in here are: - MAINTAINER file updates - USB gadget driver fixes - usb core quirk additions and fixes for regressions - xhci driver fixes - usb serial driver id additions and fixes - thunderbolt bugfix Thunderbolt patches come in through here now that USB4 is really thunderbolt. All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (34 commits) USB: misc: iowarrior: add support for the 100 device thunderbolt: Prevent crash if non-active NVMem file is read usb: gadget: udc-xilinx: Fix xudc_stop() kernel-doc format USB: misc: iowarrior: add support for the 28 and 28L devices USB: misc: iowarrior: add support for 2 OEMed devices USB: Fix novation SourceControl XL after suspend xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2 Revert "xhci: Fix memory leak when caching protocol extended capability PSI tables" MAINTAINERS: Sort entries in database for THUNDERBOLT usb: dwc3: debug: fix string position formatting mixup with ret and len usb: gadget: serial: fix Tx stall after buffer overflow usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags usb: dwc2: Fix SET/CLEAR_FEATURE and GET_STATUS flows usb: dwc2: Fix in ISOC request length checking usb: gadget: composite: Support more than 500mA MaxPower usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus usb: gadget: u_audio: Fix high-speed max packet size usb: dwc3: gadget: Check for IOC/LST bit in TRB->ctrl fields USB: core: clean up endpoint-descriptor parsing USB: quirks: blacklist duplicate ep on Sound Devices USBPre2 ...	2020-02-21 12:44:53 -08:00
Linus Torvalds	88f8bbfa94	drm fixes for 5.6-rc3 core: - Allow only 1 rotation argument, and allow 0 rotation in video cmdline. i915: - Workaround missing Display Stream Compression (DSC) state readout by forcing modeset when its enabled at probe - Fix EHL port clock voltage level requirements - Fix queuing retire workers on the virtual engine - Fix use of partially initialized waiters - Stop using drm_pci_alloc/drm_pci/free - Fix rewind of RING_TAIL by forcing a context reload - Fix locking on resetting ring->head - Propagate our bug filing URL change to stable kernels panfrost: - Small compiler warning fix for panfrost. - Fix when using performance counters in panfrost when using per fd address space. sun4xi: - Fix dt binding nouveau: - tu11x modesetting fix - ACR/GR firmware support for tu11x (fw is public now) msm: - fix UBWC on GPU and display side for sc7180 - fix DSI suspend/resume issue encountered on sc7180 - fix some breakage on so called "linux-android" devices (fallout from sc7180/a618 support, not seen earlier due to bootloader/firmware differences) - couple other misc fixes amdgpu: - HDCP fixes - xclk fix for raven - GFXOFF fixes -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJeT28iAAoJEAx081l5xIa+fugP/37HbQre6t9NLx2xgNp9G608 pNTzEhT7+kHZ9xpAul5QXCAtM9h7WHgM6fJyIqZdE8lDcQxKrtWe4ol/Yo/hux7K 8DaZoZV5MjAryrTXPA2oe8o+u4VAUd3+kgWwGXfKeBGxaKlE8a6h0V895z9TPhqx UaNcpk2X7SGTO+8avd0qZXzotreZfBMGw6awGeB+5Y9GJOgnLIqxmP9LZpuvvw6D XbzJxjMsGO2m/V1iBJLCB0che2Lu5pPfc/JBVw7UTFK/Pzq5UI5Q+YsVmNOLZ+vr rslGuxZ8V58Zh58K5svbvtIwYij4pWFEuQH88XgPDP2iFdTPY5wXSqAItAiFJ9ql 4MlC9G1sT+raCkcPQRJmhKXsdwrH8oh9OYE8J2vEasapiql3d5dar7dfQ91NtaSH sIJGx67zgw+ZVgdpVq4QXD900lDbcrIpxqHyTYjedgZynCBiO9L8vrhJuzg+0Bnp LioK+u7UZiS2WQwIPBN4V0x3sJKvLWhOceU40ME1JhYxAqC52428xXRPtLbBbW1A jqFlPJMlo4K5hS/GTD/kIHdmdY5k/KX1tuo3WqxxjbnWAu7vPPX12+78etIojW5R 5JiokYdRT2o4Ke7nK5DCf51iA9OU64EAkAMIeuqMoZW2JTDI8Cdpj6p0MhqRWu4k C+eX/2RfrfR9juPtR3+n =q5x/ -----END PGP SIGNATURE----- Merge tag 'drm-fixes-2020-02-21' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "Varied fixes for rc3. i915 is the largest, they are seeing some ACPI problems with their CI which hopefully get solved soon [1]. msm has a bunch of fixes for new hw added in the merge, a bunch of amdgpu fixes, and nouveau adds support for some new firmwares for turing tu11x GPUs that were just released into linux-firmware by nvidia, they operate the same as the ones we already have for tu10x so should be fine to hook up. Otherwise it's just misc fixes for panfrost and sun4i. core: - Allow only one rotation argument, and allow zero rotation in video cmdline. i915: - Workaround missing Display Stream Compression (DSC) state readout by forcing modeset when its enabled at probe - Fix EHL port clock voltage level requirements - Fix queuing retire workers on the virtual engine - Fix use of partially initialized waiters - Stop using drm_pci_alloc/drm_pci/free - Fix rewind of RING_TAIL by forcing a context reload - Fix locking on resetting ring->head - Propagate our bug filing URL change to stable kernels panfrost: - Small compiler warning fix for panfrost. - Fix when using performance counters in panfrost when using per fd address space. sun4xi: - Fix dt binding nouveau: - tu11x modesetting fix - ACR/GR firmware support for tu11x (fw is public now) msm: - fix UBWC on GPU and display side for sc7180 - fix DSI suspend/resume issue encountered on sc7180 - fix some breakage on so called "linux-android" devices (fallout from sc7180/a618 support, not seen earlier due to bootloader/firmware differences) - couple other misc fixes amdgpu: - HDCP fixes - xclk fix for raven - GFXOFF fixes" [1] The Intel suspend testing should now be fixed by commit 63fb9623427f ("ACPI: PM: s2idle: Check fixed wakeup events in acpi_s2idle_wake()") * tag 'drm-fixes-2020-02-21' of git://anongit.freedesktop.org/drm/drm: (39 commits) drm/amdgpu/display: clean up hdcp workqueue handling drm/amdgpu: add is_raven_kicker judgement for raven1 drm/i915/gt: Avoid resetting ring->head outside of its timeline mutex drm/i915/execlists: Always force a context reload when rewinding RING_TAIL drm/i915: Wean off drm_pci_alloc/drm_pci_free drm/i915/gt: Protect defer_request() from new waiters drm/i915/gt: Prevent queuing retire workers on the virtual engine drm/i915/dsc: force full modeset whenever DSC is enabled at probe drm/i915/ehl: Update port clock voltage level requirements drm/i915: Update drm/i915 bug filing URL MAINTAINERS: Update drm/i915 bug filing URL drm/i915: Initialise basic fence before acquiring seqno drm/i915/gem: Require per-engine reset support for non-persistent contexts drm/nouveau/kms/gv100-: Re-set LUT after clearing for modesets drm/nouveau/gr/tu11x: initial support drm/nouveau/acr/tu11x: initial support drm/amdgpu/gfx10: disable gfxoff when reading rlc clock drm/amdgpu/gfx9: disable gfxoff when reading rlc clock drm/amdgpu/soc15: fix xclk for raven drm/amd/powerplay: always refetch the enabled features status on dpm enablement ...	2020-02-21 12:18:02 -08:00
Linus Torvalds	3dc55dba67	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Limit xt_hashlimit hash table size to avoid OOM or hung tasks, from Cong Wang. 2) Fix deadlock in xsk by publishing global consumer pointers when NAPI is finished, from Magnus Karlsson. 3) Set table field properly to RT_TABLE_COMPAT when necessary, from Jethro Beekman. 4) NLA_STRING attributes are not necessary NULL terminated, deal wiht that in IFLA_ALT_IFNAME. From Eric Dumazet. 5) Fix checksum handling in atlantic driver, from Dmitry Bezrukov. 6) Handle mtu==0 devices properly in wireguard, from Jason A. Donenfeld. 7) Fix several lockdep warnings in bonding, from Taehee Yoo. 8) Fix cls_flower port blocking, from Jason Baron. 9) Sanitize internal map names in libbpf, from Toke Høiland-Jørgensen. 10) Fix RDMA race in qede driver, from Michal Kalderon. 11) Fix several false lockdep warnings by adding conditions to list_for_each_entry_rcu(), from Madhuparna Bhowmik. 12) Fix sleep in atomic in mlx5 driver, from Huy Nguyen. 13) Fix potential deadlock in bpf_map_do_batch(), from Yonghong Song. 14) Hey, variables declared in switch statement before any case statements are not initialized. I learn something every day. Get rids of this stuff in several parts of the networking, from Kees Cook. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits) bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs. bnxt_en: Improve device shutdown method. net: netlink: cap max groups which will be considered in netlink_bind() net: thunderx: workaround BGX TX Underflow issue ionic: fix fw_status read net: disable BRIDGE_NETFILTER by default net: macb: Properly handle phylink on at91rm9200 s390/qeth: fix off-by-one in RX copybreak check s390/qeth: don't warn for napi with 0 budget s390/qeth: vnicc Fix EOPNOTSUPP precedence openvswitch: Distribute switch variables for initialization net: ip6_gre: Distribute switch variables for initialization net: core: Distribute switch variables for initialization udp: rehash on disconnect net/tls: Fix to avoid gettig invalid tls record bpf: Fix a potential deadlock with bpf_map_do_batch bpf: Do not grab the bucket spinlock by default on htab batch ops ice: Wait for VF to be reset/ready before configuration ice: Don't tell the OS that link is going down ice: Don't reject odd values of usecs set by user ...	2020-02-21 11:59:51 -08:00
David S. Miller	b4d9785ce5	Merge branch 'Migrate-QRTR-Nameservice-to-Kernel' Manivannan Sadhasivam says: ==================== Migrate QRTR Nameservice to Kernel This patchset migrates the Qualcomm IPC Router (QRTR) Nameservice from userspace to kernel under net/qrtr. The userspace implementation of it can be found here: https://github.com/andersson/qrtr/blob/master/src/ns.c This change is required for enabling the WiFi functionality of some Qualcomm WLAN devices using ATH11K without any dependency on a userspace daemon. Since the QRTR NS is not usually packed in most of the distros, users need to clone, build and install it to get the WiFi working. It will become a hassle when the user doesn't have any other source of network connectivity. The original userspace code is published under BSD3 license. For migrating it to Linux kernel, I have adapted Dual BSD/GPL license. This patchset has been verified on Dragonboard410c and Intel NUC with QCA6390 WLAN device. Changes in v2: * Sorted the local variables in reverse XMAS tree order ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 11:59:13 -08:00
Manivannan Sadhasivam	31d6cbeeb8	net: qrtr: Fix the local node ID as 1 In order to start the QRTR nameservice, the local node ID needs to be valid. Hence, fix it to 1. Previously, the node ID was configured through a userspace tool before starting the nameservice daemon. Since we have now integrated the nameservice handling to kernel, this change is necessary for making it functional. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 11:59:12 -08:00
Manivannan Sadhasivam	0c2204a4ad	net: qrtr: Migrate nameservice to kernel from userspace The QRTR nameservice has been maintained in userspace for some time. This commit migrates it to Linux kernel. This change is required in order to eliminate the need of starting a userspace daemon for making the WiFi functional for ath11k based devices. Since the QRTR NS is not usually packed in most of the distros, users need to clone, build and install it to get the WiFi working. It will become a hassle when the user doesn't have any other source of network connectivity. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 11:59:12 -08:00
Dan Murphy	cd26d72d4d	net: phy: dp83867: Add speed optimization feature Set the speed optimization bit on the DP83867 PHY. This feature can also be strapped on the 64 pin PHY devices but the 48 pin devices do not have the strap pin available to enable this feature in the hardware. PHY team suggests to have this bit set. With this bit set the PHY will auto negotiate and report the link parameters in the PHYSTS register. This register provides a single location within the register set for quick access to commonly accessed information. In this case when auto negotiation is on the PHY core reads the bits that have been configured or if auto negotiation is off the PHY core reads the BMCR register and sets the phydev parameters accordingly. This Giga bit PHY can throttle the speed to 100Mbps or 10Mbps to accomodate a 4-wire cable. If this should occur the PHYSTS register contains the current negotiated speed and duplex mode. In overriding the genphy_read_status the dp83867_read_status will do a genphy_read_status to setup the LP and pause bits. And then the PHYSTS register is read and the phydev speed and duplex mode settings are updated. Signed-off-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-21 11:43:56 -08:00
Linus Torvalds	b0dd1eb220	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: - A few y2038 fixes which missed the merge window while dependencies in NFS were being sorted out. - A bunch of fixes. Some minor, some not. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MAINTAINERS: use tabs for SAFESETID lib/stackdepot.c: fix global out-of-bounds in stack_slabs mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM mm/vmscan.c: don't round up scan size for online memory cgroup lib/string.c: update match_string() doc-strings with correct behavior mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps() mm/swapfile.c: fix a comment in sys_swapon() scripts/get_maintainer.pl: deprioritize old Fixes: addresses get_maintainer: remove uses of P: for maintainer name selftests/vm: add missed tests in run_vmtests include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swap Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()" y2038: hide timeval/timespec/itimerval/itimerspec types y2038: remove unused time32 interfaces y2038: remove ktime to/from timespec/timeval conversion	2020-02-21 11:40:10 -08:00
Randy Dunlap	bb8d00ff51	MAINTAINERS: use tabs for SAFESETID Use tabs for indentation instead of spaces for SAFESETID. All (!) other entries in MAINTAINERS use tabs (according to my simple grepping). Link: http://lkml.kernel.org/r/2bb2e52a-2694-816d-57b4-6cabfadd6c1a@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Micah Morton <mortonm@chromium.org> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-02-21 11:22:15 -08:00
Alexander Potapenko	305e519ce4	lib/stackdepot.c: fix global out-of-bounds in stack_slabs Walter Wu has reported a potential case in which init_stack_slab() is called after stack_slabs[STACK_ALLOC_MAX_SLABS - 1] has already been initialized. In that case init_stack_slab() will overwrite stack_slabs[STACK_ALLOC_MAX_SLABS], which may result in a memory corruption. Link: http://lkml.kernel.org/r/20200218102950.260263-1-glider@google.com Fixes: cd11016e5f521 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Walter Wu <walter-zh.wu@mediatek.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-02-21 11:22:15 -08:00
Wei Yang	18e19f195c	mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM When we use SPARSEMEM instead of SPARSEMEM_VMEMMAP, pfn_to_page() doesn't work before sparse_init_one_section() is called. This leads to a crash when hotplug memory: BUG: unable to handle page fault for address: 0000000006400000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 3 PID: 221 Comm: kworker/u16:1 Tainted: G W 5.5.0-next-20200205+ #343 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Workqueue: kacpi_hotplug acpi_hotplug_work_fn RIP: 0010:__memset+0x24/0x30 Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 RSP: 0018:ffffb43ac0373c80 EFLAGS: 00010a87 RAX: ffffffffffffffff RBX: ffff8a1518800000 RCX: 0000000000050000 RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000006400000 RBP: 0000000000140000 R08: 0000000000100000 R09: 0000000006400000 R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000028 R14: 0000000000000000 R15: ffff8a153ffd9280 FS: 0000000000000000(0000) GS:ffff8a153ab00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000006400000 CR3: 0000000136fca000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: sparse_add_section+0x1c9/0x26a __add_pages+0xbf/0x150 add_pages+0x12/0x60 add_memory_resource+0xc8/0x210 __add_memory+0x62/0xb0 acpi_memory_device_add+0x13f/0x300 acpi_bus_attach+0xf6/0x200 acpi_bus_scan+0x43/0x90 acpi_device_hotplug+0x275/0x3d0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x1a7/0x370 worker_thread+0x30/0x380 kthread+0x112/0x130 ret_from_fork+0x35/0x40 We should use memmap as it did. On x86 the impact is limited to x86_32 builds, or x86_64 configurations that override the default setting for SPARSEMEM_VMEMMAP. Other memory hotplug archs (arm64, ia64, and ppc) also default to SPARSEMEM_VMEMMAP=y. [dan.j.williams@intel.com: changelog update] {rppt@linux.ibm.com: changelog update] Link: http://lkml.kernel.org/r/20200219030454.4844-1-bhe@redhat.com Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-02-21 11:22:15 -08:00
Gavin Shan	76073c646f	mm/vmscan.c: don't round up scan size for online memory cgroup Commit 68600f623d69 ("mm: don't miss the last page because of round-off error") makes the scan size round up to @denominator regardless of the memory cgroup's state, online or offline. This affects the overall reclaiming behavior: the corresponding LRU list is eligible for reclaiming only when its size logically right shifted by @sc->priority is bigger than zero in the former formula. For example, the inactive anonymous LRU list should have at least 0x4000 pages to be eligible for reclaiming when we have 60/12 for swappiness/priority and without taking scan/rotation ratio into account. After the roundup is applied, the inactive anonymous LRU list becomes eligible for reclaiming when its size is bigger than or equal to 0x1000 in the same condition. (0x4000 >> 12) * 60 / (60 + 140 + 1) = 1 ((0x1000 >> 12) * 60) + 200) / (60 + 140 + 1) = 1 aarch64 has 512MB huge page size when the base page size is 64KB. The memory cgroup that has a huge page is always eligible for reclaiming in that case. The reclaiming is likely to stop after the huge page is reclaimed, meaing the further iteration on @sc->priority and the silbing and child memory cgroups will be skipped. The overall behaviour has been changed. This fixes the issue by applying the roundup to offlined memory cgroups only, to give more preference to reclaim memory from offlined memory cgroup. It sounds reasonable as those memory is unlikedly to be used by anyone. The issue was found by starting up 8 VMs on a Ampere Mustang machine, which has 8 CPUs and 16 GB memory. Each VM is given with 2 vCPUs and 2GB memory. It took 264 seconds for all VMs to be completely up and 784MB swap is consumed after that. With this patch applied, it took 236 seconds and 60MB swap to do same thing. So there is 10% performance improvement for my case. Note that KSM is disable while THP is enabled in the testing. total used free shared buff/cache available Mem: 16196 10065 2049 16 4081 3749 Swap: 8175 784 7391 total used free shared buff/cache available Mem: 16196 11324 3656 24 1215 2936 Swap: 8175 60 8115 Link: http://lkml.kernel.org/r/20200211024514.8730-1-gshan@redhat.com Fixes: 68600f623d69 ("mm: don't miss the last page because of round-off error") Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> [4.20+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-02-21 11:22:15 -08:00
Alexandru Ardelean	c11d3fa011	lib/string.c: update match_string() doc-strings with correct behavior There were a few attempts at changing behavior of the match_string() helpers (i.e. 'match_string()' & 'sysfs_match_string()'), to change & extend the behavior according to the doc-string. But the simplest approach is to just fix the doc-strings. The current behavior is fine as-is, and some bugs were introduced trying to fix it. As for extending the behavior, new helpers can always be introduced if needed. The match_string() helpers behave more like 'strncmp()' in the sense that they go up to n elements or until the first NULL element in the array of strings. This change updates the doc-strings with this info. Link: http://lkml.kernel.org/r/20200213072722.8249-1-alexandru.ardelean@analog.com Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: "Tobin C . Harding" <tobin@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-02-21 11:22:15 -08:00
Vasily Averin	75866af62b	mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps() for_each_mem_cgroup() increases css reference counter for memory cgroup and requires to use mem_cgroup_iter_break() if the walk is cancelled. Link: http://lkml.kernel.org/r/c98414fb-7e1f-da0f-867a-9340ec4bd30b@virtuozzo.com Fixes: 0a4465d34028 ("mm, memcg: assign memcg-aware shrinkers bitmap to memcg") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-02-21 11:22:15 -08:00
Christoph Hellwig	fed98ef4d8	mm/swapfile.c: fix a comment in sys_swapon() claim_swapfile now always takes i_rwsem. Link: http://lkml.kernel.org/r/20200114161225.309792-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-02-21 11:22:15 -08:00
Douglas Anderson	0ef82fcefb	scripts/get_maintainer.pl: deprioritize old Fixes: addresses Recently, I found that get_maintainer was causing me to send emails to the old addresses for maintainers. Since I usually just trust the output of get_maintainer to know the right email address, I didn't even look carefully and fired off two patch series that went to the wrong place. Oops. The problem was introduced recently when trying to add signatures from Fixes. The problem was that these email addresses were added too early in the process of compiling our list of places to send. Things added to the list earlier are considered more canonical and when we later added maintainer entries we ended up deduplicating to the old address. Here are two examples using mainline commits (to make it easier to replicate) for the two maintainers that I messed up recently: $ git format-patch d8549bcd0529~..d8549bcd0529 $ ./scripts/get_maintainer.pl 0001-clk-Add-clk_hw.patch \| grep Boyd Stephen Boyd <sboyd@codeaurora.org>... $ git format-patch 6d1238aa3395~..6d1238aa3395 $ ./scripts/get_maintainer.pl 0001-arm64-dts-qcom-qcs404.patch \| grep Andy Andy Gross <andy.gross@linaro.org> Let's move the adding of addresses from Fixes: to the end since the email addresses from these are much more likely to be older. After this patch the above examples get the right addresses for the two examples. Link: http://lkml.kernel.org/r/20200127095001.1.I41fba9f33590bfd92cd01960161d8384268c6569@changeid Fixes: 2f5bd343694e ("scripts/get_maintainer.pl: add signatures from Fixes: <badcommit> lines in commit message") Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Joe Perches <joe@perches.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Andy Gross <agross@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-02-21 11:22:15 -08:00
Joe Perches	ef0c08192a	get_maintainer: remove uses of P: for maintainer name Commit 1ca84ed6425f ("MAINTAINERS: Reclaim the P: tag for Maintainer Entry Profile") changed the use of the "P:" tag from "Person" to "Profile (ie: special subsystem coding styles and characteristics)" Change how get_maintainer.pl parses the "P:" tag to match. Link: http://lkml.kernel.org/r/ca53823fc5d25c0be32ad937d0207a0589c08643.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Dan Williams <dan.j.william@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-02-21 11:22:15 -08:00

1 2 3 4 5 ...

901364 Commits