IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
nftables payload statements are used to mangle SCTP headers, but they can
only replace the Internet Checksum. As a consequence, nftables rules that
mangle sport/dport/vtag in SCTP headers potentially generate packets that
are discarded by the receiver, unless the CRC-32C is "offloaded" (e.g the
rule mangles a skb having 'ip_summed' equal to 'CHECKSUM_PARTIAL'.
Fix this extending uAPI definitions and L4 checksum update function, in a
way that userspace programs (e.g. nft) can instruct the kernel to compute
CRC-32C in SCTP headers. Also ensure that LIBCRC32C is built if NF_TABLES
is 'y' or 'm' in the kernel build configuration.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index")
tried to fix the issue where seq_file pos is not increased
if a NULL element is returned with seq_ops->next(). See bug
https://bugzilla.kernel.org/show_bug.cgi?id=206283
The commit effectively does:
- increase pos for all seq_ops->start()
- increase pos for all seq_ops->next()
For ipv6_route, increasing pos for all seq_ops->next() is correct.
But increasing pos for seq_ops->start() is not correct
since pos is used to determine how many items to skip during
seq_ops->start():
iter->skip = *pos;
seq_ops->start() just fetches the *current* pos item.
The item can be skipped only after seq_ops->show() which essentially
is the beginning of seq_ops->next().
For example, I have 7 ipv6 route entries,
root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo
fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
0+1 records in
0+1 records out
1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
root@arch-fb-vm1:~/net-next
In the above, I specify buffer size 4096, so all records can be returned
to user space with a single trip to the kernel.
If I use buffer size 128, since each record size is 149, internally
kernel seq_read() will read 149 into its internal buffer and return the data
to user space in two read() syscalls. Then user read() syscall will trigger
next seq_ops->start(). Since the current implementation increased pos even
for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first
record is #1.
root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128
00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
4+1 records in
4+1 records out
600 bytes copied, 0.00127758 s, 470 kB/s
To fix the problem, create a fake pos pointer so seq_ops->start()
won't actually increase seq_file pos. With this fix, the
above `dd` command with `bs=128` will show correct result.
Fixes: 4fc427e05158 ("ipv6_route_seq_next should increase position index")
Cc: Alexei Starovoitov <ast@kernel.org>
Suggested-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
smc_ism_register_dmb() returns error codes set by the ISM driver which
are not guaranteed to be negative or in the errno range. Such values
would not be handled by ERR_PTR() and finally the return code will be
used as a memory address.
Fix that by using a valid negative errno value with ERR_PTR().
Fixes: 72b7f6c48708 ("net/smc: unique reason code for exceeded max dmb count")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The SMCD_DMBE_SIZES should include all valid DMBE buffer sizes, so the
correct value is 6 which means 1MB. With 7 the registration of an ISM
buffer would always fail because of the invalid size requested.
Fix that and set the value to 6.
Fixes: c6ba7c9ba43d ("net/smc: add base infrastructure for SMC-D and ISM")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a delayed event is enqueued then the event worker will send this
event the next time it is running and no other flow is currently
active. The event handler is called for the delayed event, and the
pointer to the event keeps set in lgr->delayed_event. This pointer is
cleared later in the processing by smc_llc_flow_start().
This can lead to a use-after-free condition when the processing does not
reach smc_llc_flow_start(), but frees the event because of an error
situation. Then the delayed_event pointer is still set but the event is
freed.
Fix this by always clearing the delayed event pointer when the event is
provided to the event handler for processing, and remove the code to
clear it in smc_llc_flow_start().
Fixes: 555da9af827d ("net/smc: add event-based llc_flow framework")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IF CONFIG_BPFILTER_UMH is set, building fails:
In file included from /usr/include/sys/socket.h:33:0,
from net/bpfilter/main.c:6:
/usr/include/bits/socket.h:390:10: fatal error: asm/socket.h: No such file or directory
#include <asm/socket.h>
^~~~~~~~~~~~~~
compilation terminated.
scripts/Makefile.userprogs:43: recipe for target 'net/bpfilter/main.o' failed
make[2]: *** [net/bpfilter/main.o] Error 1
Add missing include path to fix this.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix the loss of transmission of a call's final ack when a socket gets shut
down. This means that the server will retransmit the last data packet or
send a ping ack and then get an ICMP indicating the port got closed. The
server will then view this as a failure.
Fixes: 3136ef49a14c ("rxrpc: Delay terminal ACK transmission on a client call")
Signed-off-by: David Howells <dhowells@redhat.com>
Fix rxrpc_unbundle_conn() to not drop the bundle usage count when cleaning
up an exclusive connection.
Based on the suggested fix from Hillf Danton.
Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager")
Reported-by: syzbot+d57aaf84dd8a550e6d91@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
This definition is used by the iptables legacy UAPI, restore it.
Fixes: d3519cb89f6d ("netfilter: nf_tables: add inet ingress support")
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As per RFC4443, the destination address field for ICMPv6 error messages
is copied from the source address field of the invoking packet.
In configurations with Virtual Routing and Forwarding tables, looking up
which routing table to use for sending ICMPv6 error messages is
currently done by using the destination net_device.
If the source and destination interfaces are within separate VRFs, or
one in the global routing table and the other in a VRF, looking up the
source address of the invoking packet in the destination interface's
routing table will fail if the destination interface's routing table
contains no route to the invoking packet's source address.
One observable effect of this issue is that traceroute6 does not work in
the following cases:
- Route leaking between global routing table and VRF
- Route leaking between VRFs
Use the source device routing table when sending ICMPv6 error
messages.
[ In the context of ipv4, it has been pointed out that a similar issue
may exist with ICMP errors triggered when forwarding between network
namespaces. It would be worthwhile to investigate whether ipv6 has
similar issues, but is outside of the scope of this investigation. ]
[ Testing shows that similar issues exist with ipv6 unreachable /
fragmentation needed messages. However, investigation of this
additional failure mode is beyond this investigation's scope. ]
Link: https://tools.ietf.org/html/rfc4443
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As per RFC792, ICMP errors should be sent to the source host.
However, in configurations with Virtual Routing and Forwarding tables,
looking up which routing table to use is currently done by using the
destination net_device.
commit 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to
determine L3 domain") changes the interface passed to
l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev
to skb_dst(skb_in)->dev. This effectively uses the destination device
rather than the source device for choosing which routing table should be
used to lookup where to send the ICMP error.
Therefore, if the source and destination interfaces are within separate
VRFs, or one in the global routing table and the other in a VRF, looking
up the source host in the destination interface's routing table will
fail if the destination interface's routing table contains no route to
the source host.
One observable effect of this issue is that traceroute does not work in
the following cases:
- Route leaking between global routing table and VRF
- Route leaking between VRFs
Preferably use the source device routing table when sending ICMP error
messages. If no source device is set, fall-back on the destination
device routing table. Else, use the main routing table (index 0).
[ It has been pointed out that a similar issue may exist with ICMP
errors triggered when forwarding between network namespaces. It would
be worthwhile to investigate, but is outside of the scope of this
investigation. ]
[ It has also been pointed out that a similar issue exists with
unreachable / fragmentation needed messages, which can be triggered by
changing the MTU of eth1 in r1 to 1400 and running:
ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2
Some investigation points to raw_icmp_error() and raw_err() as being
involved in this last scenario. The focus of this patch is TTL expired
ICMP messages, which go through icmp_route_lookup.
Investigation of failure modes related to raw_icmp_error() is beyond
this investigation's scope. ]
Fixes: 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to determine L3 domain")
Link: https://tools.ietf.org/html/rfc792
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Extend nf_queue selftest to cover re-queueing, non-gso mode and
delayed queueing, from Florian Westphal.
2) Clear skb->tstamp in IPVS forwarding path, from Julian Anastasov.
3) Provide netlink extended error reporting for EEXIST case.
4) Missing VLAN offload tag and proto in log target.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
GRE tunnel has its own header_ops, ipgre_header_ops, and sets it
conditionally. When it is set, it assumes the outer IP header is
already created before ipgre_xmit().
This is not true when we send packets through a raw packet socket,
where L2 headers are supposed to be constructed by user. Packet
socket calls dev_validate_header() to validate the header. But
GRE tunnel does not set dev->hard_header_len, so that check can
be simply bypassed, therefore uninit memory could be passed down
to ipgre_xmit(). Similar for dev->needed_headroom.
dev->hard_header_len is supposed to be the length of the header
created by dev->header_ops->create(), so it should be used whenever
header_ops is set, and dev->needed_headroom should be used when it
is not set.
Reported-and-tested-by: syzbot+4a2c52677a8a1aa283cb@syzkaller.appspotmail.com
Cc: William Tu <u9012063@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simplify the code by using new function dev_fetch_sw_netstats().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/b6047017-8226-6b7e-a3cd-064e69fdfa27@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In several places the same code is used to populate rtnl_link_stats64
fields with data from pcpu_sw_netstats. Therefore factor out this code
to a new function dev_fetch_sw_netstats().
v2:
- constify argument netstats
- don't ignore netstats being NULL or an ERRPTR
- switch to EXPORT_SYMBOL_GPL
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/6d16a338-52f5-df69-0020-6bc771a7d498@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 109f6e39fa07c48f5801 ("af_unix: Allow SO_PEERCRED
to work across namespaces.") introduced the old_pid variable
in unix_listen, but it's never used.
Remove the declaration and the call to put_pid.
Signed-off-by: Or Cohen <orcohen@paloaltonetworks.com>
Link: https://lore.kernel.org/r/20201011153527.18628-1-orcohen@paloaltonetworks.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
SOCK_TSTAMP_NEW (timespec64 instead of timespec) is also used for
hardware time stamps (configured via SO_TIMESTAMPING_NEW).
User space (ptp4l) first configures hardware time stamping via
SO_TIMESTAMPING_NEW which sets SOCK_TSTAMP_NEW. In the next step, ptp4l
disables SO_TIMESTAMPNS(_NEW) (software time stamps), but this must not
switch hardware time stamps back to "32 bit mode".
This problem happens on 32 bit platforms were the libc has already
switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW
socket options). ptp4l complains with "missing timestamp on transmitted
peer delay request" because the wrong format is received (and
discarded).
Fixes: 887feae36aee ("socket: Add SO_TIMESTAMP[NS]_NEW")
Fixes: 783da70e8396 ("net: add sock_enable_timestamps")
Signed-off-by: Christian Eggers <ceggers@arri.de>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The comparison of optname with SO_TIMESTAMPING_NEW is wrong way around,
so SOCK_TSTAMP_NEW will first be set and than reset again. Additionally
move it out of the test for SOF_TIMESTAMPING_RX_SOFTWARE as this seems
unrelated.
This problem happens on 32 bit platforms were the libc has already
switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW
socket options). ptp4l complains with "missing timestamp on transmitted
peer delay request" because the wrong format is received (and
discarded).
Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW")
Signed-off-by: Christian Eggers <ceggers@arri.de>
Reviewed-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace commas with semicolons. Commas introduce unnecessary
variability in the code structure and are hard to see. What is done
is essentially described by the following Coccinelle semantic patch
(http://coccinelle.lip6.fr/):
// <smpl>
@@ expression e1,e2; @@
e1
-,
+;
e2
... when any
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/1602412498-32025-6-git-send-email-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace commas with semicolons. Commas introduce unnecessary
variability in the code structure and are hard to see. What is done
is essentially described by the following Coccinelle semantic patch
(http://coccinelle.lip6.fr/):
// <smpl>
@@ expression e1,e2; @@
e1
-,
+;
e2
... when any
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/1602412498-32025-5-git-send-email-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace commas with semicolons. Commas introduce unnecessary
variability in the code structure and are hard to see. What is done
is essentially described by the following Coccinelle semantic patch
(http://coccinelle.lip6.fr/):
// <smpl>
@@ expression e1,e2; @@
e1
-,
+;
e2
... when any
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/1602412498-32025-4-git-send-email-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dump vlan tag and proto for the usual vlan offload case if the
NF_LOG_MACDECODE flag is set on. Without this information the logging is
misleading as there is no reference to the VLAN header.
[12716.993704] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0800 SRC=192.168.10.2 DST=172.217.168.163 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=2548 DF PROTO=TCP SPT=55848 DPT=80 WINDOW=501 RES=0x00 ACK FIN URGP=0
[12721.157643] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0806 ARP HTYPE=1 PTYPE=0x0800 OPCODE=2 MACSRC=86:6c:92:ea:d6:73 IPSRC=192.168.10.2 MACDST=0e:3b:eb:86:73:76 IPDST=192.168.10.1
Fixes: 83e96d443b37 ("netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+EXPEQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpiR4EAC3trm1ojXVF7y9/XRhcPpb4Pror+ZA1coO
gyoy+zUuCEl9WCzzHWqXULMYMP0YzNJnJs0oLQPA1s0sx1H4uDMl/UXg0OXZisYG
Y59Kca3c1DHFwj9KPQXfGmCEjc/rbDWK5TqRc2iZMz+6E5Mt71UFZHtenwgV1zD8
hTmZZkzLCu2ePfOvrPONgL5tDqPWGVyn61phoC7cSzMF66juXGYuvQGktzi/m6q+
jAxUnhKvKTlLB9wsq3s5X/20/QD56Yuba9U+YxeeNDBE8MDWQOsjz0mZCV1fn4p3
h/6762aRaWaXH7EwMtsHFUWy7arJZg/YoFYNYLv4Ksyy3y4sMABZCy3A+JyzrgQ0
hMu7vjsP+k22X1WH8nyejBfWNEmxu6dpgckKrgF0dhJcXk/acWA3XaDWZ80UwfQy
isKRAP1rA0MJKHDMIwCzSQJDPvtUAkPptbNZJcUSU78o+pPoCaQ93V++LbdgGtKn
iGJJqX05dVbcsDx5X7fluphjkUTC4yFr7ZgLgbOIedXajWRD8iOkO2xxCHk6SKFl
iv9entvRcX9k3SHK9uffIUkRBUujMU0+HCIQFCO1qGmkCaS5nSrovZl4HoL7L/Dj
+T8+v7kyJeklLXgJBaE7jk01O4HwZKjwPWMbCjvL9NKk8j7c1soYnRu5uNvi85Mu
/9wn671s+w==
=udgj
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.10-2020-10-12' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
- Add blkcg accounting for io-wq offload (Dennis)
- A use-after-free fix for io-wq (Hillf)
- Cancelation fixes and improvements
- Use proper files_struct references for offload
- Cleanup of io_uring_get_socket() since that can now go into our own
header
- SQPOLL fixes and cleanups, and support for sharing the thread
- Improvement to how page accounting is done for registered buffers and
huge pages, accounting the real pinned state
- Series cleaning up the xarray code (Willy)
- Various cleanups, refactoring, and improvements (Pavel)
- Use raw spinlock for io-wq (Sebastian)
- Add support for ring restrictions (Stefano)
* tag 'io_uring-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (62 commits)
io_uring: keep a pointer ref_node in file_data
io_uring: refactor *files_register()'s error paths
io_uring: clean file_data access in files_register
io_uring: don't delay io_init_req() error check
io_uring: clean leftovers after splitting issue
io_uring: remove timeout.list after hrtimer cancel
io_uring: use a separate struct for timeout_remove
io_uring: improve submit_state.ios_left accounting
io_uring: simplify io_file_get()
io_uring: kill extra check in fixed io_file_get()
io_uring: clean up ->files grabbing
io_uring: don't io_prep_async_work() linked reqs
io_uring: Convert advanced XArray uses to the normal API
io_uring: Fix XArray usage in io_uring_add_task_file
io_uring: Fix use of XArray in __io_uring_files_cancel
io_uring: fix break condition for __io_uring_register() waiting
io_uring: no need to call xa_destroy() on empty xarray
io_uring: batch account ->req_issue and task struct references
io_uring: kill callback_head argument for io_req_task_work_add()
io_uring: move req preps out of io_issue_sqe()
...
Pull crypto updates from Herbert Xu:
"API:
- Allow DRBG testing through user-space af_alg
- Add tcrypt speed testing support for keyed hashes
- Add type-safe init/exit hooks for ahash
Algorithms:
- Mark arc4 as obsolete and pending for future removal
- Mark anubis, khazad, sead and tea as obsolete
- Improve boot-time xor benchmark
- Add OSCCA SM2 asymmetric cipher algorithm and use it for integrity
Drivers:
- Fixes and enhancement for XTS in caam
- Add support for XIP8001B hwrng in xiphera-trng
- Add RNG and hash support in sun8i-ce/sun8i-ss
- Allow imx-rngc to be used by kernel entropy pool
- Use crypto engine in omap-sham
- Add support for Ingenic X1830 with ingenic"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (205 commits)
X.509: Fix modular build of public_key_sm2
crypto: xor - Remove unused variable count in do_xor_speed
X.509: fix error return value on the failed path
crypto: bcm - Verify GCM/CCM key length in setkey
crypto: qat - drop input parameter from adf_enable_aer()
crypto: qat - fix function parameters descriptions
crypto: atmel-tdes - use semicolons rather than commas to separate statements
crypto: drivers - use semicolons rather than commas to separate statements
hwrng: mxc-rnga - use semicolons rather than commas to separate statements
hwrng: iproc-rng200 - use semicolons rather than commas to separate statements
hwrng: stm32 - use semicolons rather than commas to separate statements
crypto: xor - use ktime for template benchmarking
crypto: xor - defer load time benchmark to a later time
crypto: hisilicon/zip - fix the uninitalized 'curr_qm_qp_num'
crypto: hisilicon/zip - fix the return value when device is busy
crypto: hisilicon/zip - fix zero length input in GZIP decompress
crypto: hisilicon/zip - fix the uncleared debug registers
lib/mpi: Fix unused variable warnings
crypto: x86/poly1305 - Remove assignments with no effect
hwrng: npcm - modify readl to readb
...
Pull compat iovec cleanups from Al Viro:
"Christoph's series around import_iovec() and compat variant thereof"
* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
security/keys: remove compat_keyctl_instantiate_key_iov
mm: remove compat_process_vm_{readv,writev}
fs: remove compat_sys_vmsplice
fs: remove the compat readv/writev syscalls
fs: remove various compat readv/writev helpers
iov_iter: transparently handle compat iovecs in import_iovec
iov_iter: refactor rw_copy_check_uvector and import_iovec
iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
compat.h: fix a spelling error in <linux/compat.h>
Pull copy_and_csum cleanups from Al Viro:
"Saner calling conventions for csum_and_copy_..._user() and friends"
[ Removing 800+ lines of code and cleaning stuff up is good - Linus ]
* 'work.csum_and_copy' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ppc: propagate the calling conventions change down to csum_partial_copy_generic()
amd64: switch csum_partial_copy_generic() to new calling conventions
sparc64: propagate the calling convention changes down to __csum_partial_copy_...()
xtensa: propagate the calling conventions change down into csum_partial_copy_generic()
mips: propagate the calling convention change down into __csum_partial_copy_..._user()
mips: __csum_partial_copy_kernel() has no users left
mips: csum_and_copy_{to,from}_user() are never called under KERNEL_DS
sparc32: propagate the calling conventions change down to __csum_partial_copy_sparc_generic()
i386: propagate the calling conventions change down to csum_partial_copy_generic()
sh: propage the calling conventions change down to csum_partial_copy_generic()
m68k: get rid of zeroing destination on error in csum_and_copy_from_user()
arm: propagate the calling convention changes down to csum_partial_copy_from_user()
alpha: propagate the calling convention changes down to csum_partial_copy.c helpers
saner calling conventions for csum_and_copy_..._user()
csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum
csum_partial_copy_nocheck(): drop the last argument
unify generic instances of csum_partial_copy_nocheck()
icmp_push_reply(): reorder adding the checksum up
skb_copy_and_csum_bits(): don't bother with the last argument
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-10-12
The main changes are:
1) The BPF verifier improvements to track register allocation pattern, from Alexei and Yonghong.
2) libbpf relocation support for different size load/store, from Andrii.
3) bpf_redirect_peer() helper and support for inner map array with different max_entries, from Daniel.
4) BPF support for per-cpu variables, form Hao.
5) sockmap improvements, from John.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
1) Inspect the reply packets coming from DR/TUN and refresh connection
state and timeout, from longguang yue and Julian Anastasov.
2) Series to add support for the inet ingress chain type in nf_tables.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The initial support for netlink extended ACK is missing the chain update
path, which results in misleading error reporting in case of EEXIST.
Fixes 36dd1bcc07e5 ("netfilter: nf_tables: initial support for extended ACK reporting")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
con->out_msg must be cleared on Policy::stateful_server
(!CEPH_MSG_CONNECT_LOSSY) faults. Not doing so botches the
reconnection attempt, because after writing the banner the
messenger moves on to writing the data section of that message
(either from where it got interrupted by the connection reset or
from the beginning) instead of writing struct ceph_msg_connect.
This results in a bizarre error message because the server
sends CEPH_MSGR_TAG_BADPROTOVER but we think we wrote struct
ceph_msg_connect:
libceph: mds0 (1)172.21.15.45:6828 socket error on write
ceph: mds0 reconnect start
libceph: mds0 (1)172.21.15.45:6829 socket closed (con state OPEN)
libceph: mds0 (1)172.21.15.45:6829 protocol version mismatch, my 32 != server's 32
libceph: mds0 (1)172.21.15.45:6829 protocol version mismatch
AFAICT this bug goes back to the dawn of the kernel client.
The reason it survived for so long is that only MDS sessions
are stateful and only two MDS messages have a data section:
CEPH_MSG_CLIENT_RECONNECT (always, but reconnecting is rare)
and CEPH_MSG_CLIENT_REQUEST (only when xattrs are involved).
The connection has to get reset precisely when such message
is being sent -- in this case it was the former.
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/47723
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
The queued con->work can start executing (and therefore logging)
before we get to this "con->work has been queued" message, making
the logs confusing. Move it up, with the meaning of "con->work
is about to be queued".
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Replace a global map->crush_workspace (protected by a global mutex)
with a list of workspaces, up to the number of CPUs + 1.
This is based on a patch from Robin Geuze <robing@nl.team.blue>.
Robin and his team have observed a 10-20% increase in IOPS on all
queue depths and lower CPU usage as well on a high-end all-NVMe
100GbE cluster.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In p9_fd_create_unix, checking is performed to see if the addr (passed
as an argument) is NULL or not.
However, no check is performed to see if addr is a valid address, i.e.,
it doesn't entirely consist of only 0's.
The initialization of sun_server.sun_path to be equal to this faulty
addr value leads to an uninitialized variable, as detected by KMSAN.
Checking for this (faulty addr) and returning a negative error number
appropriately, resolves this issue.
Link: http://lkml.kernel.org/r/20201012042404.2508-1-anant.thazhemadam@gmail.com
Reported-by: syzbot+75d51fe5bf4ebe988518@syzkaller.appspotmail.com
Tested-by: syzbot+75d51fe5bf4ebe988518@syzkaller.appspotmail.com
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Currently, we often run with a nop parser namely one that just does
this, 'return skb->len'. This happens when either our verdict program
can handle streaming data or it is only looking at socket data such
as IP addresses and other metadata associated with the flow. The second
case is common for a L3/L4 proxy for instance.
So lets allow loading programs without the parser then we can skip
the stream parser logic and avoid having to add a BPF program that
is effectively a nop.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160239297866.8495.13345662302749219672.stgit@john-Precision-5820-Tower
We are about to allow skb_verdict to run without skb_parser programs
as a first step change code to check each program type specifically.
This should be a mechanical change without any impact to actual result.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160239294756.8495.5796595770890272219.stgit@john-Precision-5820-Tower
Move skb->sk assignment out of sk_psock_bpf_run() and into individual
callers. Then we can use proper skb_set_owner_r() call to assign a
sk to a skb. This improves things by also charging the truesize against
the sockets sk_rmem_alloc counter. With this done we get some accounting
in place to ensure the memory associated with skbs on the workqueue are
still being accounted for somewhere. Finally, by using skb_set_owner_r
the destructor is setup so we can just let the normal skb_kfree logic
recover the memory. Combined with previous patch dropping skb_orphan()
we now can recover from memory pressure and maintain accounting.
Note, we will charge the skbs against their originating socket even
if being redirected into another socket. Once the skb completes the
redirect op the kfree_skb will give the memory back. This is important
because if we charged the socket we are redirecting to (like it was
done before this series) the sock_writeable() test could fail because
of the skb trying to be sent is already charged against the socket.
Also TLS case is special. Here we wait until we have decided not to
simply PASS the packet up the stack. In the case where we PASS the
packet up the stack we already have an skb which is accounted for on
the TLS socket context.
For the parser case we continue to just set/clear skb->sk this is
because the skb being used here may be combined with other skbs or
turned into multiple skbs depending on the parser logic. For example
the parser could request a payload length greater than skb->len so
that the strparser needs to collect multiple skbs. At any rate
the final result will be handled in the strparser recv callback.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160226867513.5692.10579573214635925960.stgit@john-Precision-5820-Tower
Calling skb_orphan() is unnecessary in the strp rcv handler because the skb
is from a skb_clone() in __strp_recv. So it never has a destructor or a
sk assigned. Plus its confusing to read because it might hint to the reader
that the skb could have an sk assigned which is not true. Even if we did
have an sk assigned it would be cleaner to simply wait for the upcoming
kfree_skb().
Additionally, move the comment about strparser clone up so its closer to
the logic it is describing and add to it so that it is more complete.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160226865548.5692.9098315689984599579.stgit@john-Precision-5820-Tower
In the sk_skb redirect case we didn't handle the case where we overrun
the sk_rmem_alloc entry on ingress redirect or sk_wmem_alloc on egress.
Because we didn't have anything implemented we simply dropped the skb.
This meant data could be dropped if socket memory accounting was in
place.
This fixes the above dropped data case by moving the memory checks
later in the code where we actually do the send or recv. This pushes
those checks into the workqueue and allows us to return an EAGAIN error
which in turn allows us to try again later from the workqueue.
Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160226863689.5692.13861422742592309285.stgit@john-Precision-5820-Tower
The skb_set_owner_w is unnecessary here. The sendpage call will create a
fresh skb and set the owner correctly from workqueue. Its also not entirely
harmless because it consumes cycles, but also impacts resource accounting
by increasing sk_wmem_alloc. This is charging the socket we are going to
send to for the skb, but we will put it on the workqueue for some time
before this happens so we are artifically inflating sk_wmem_alloc for
this period. Further, we don't know how many skbs will be used to send the
packet or how it will be broken up when sent over the new socket so
charging it with one big sum is also not correct when the workqueue may
break it up if facing memory pressure. Seeing we don't know how/when
this is going to be sent drop the early accounting.
A later patch will do proper accounting charged on receive socket for
the case where skbs get enqueued on the workqueue.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160226861708.5692.17964237936462425136.stgit@john-Precision-5820-Tower
When we receive an skb and the ingress skb verdict program returns
SK_PASS we currently set the ingress flag and put it on the workqueue
so it can be turned into a sk_msg and put on the sk_msg ingress queue.
Then finally telling userspace with data_ready hook.
Here we observe that if the workqueue is empty then we can try to
convert into a sk_msg type and call data_ready directly without
bouncing through a workqueue. Its a common pattern to have a recv
verdict program for visibility that always returns SK_PASS. In this
case unless there is an ENOMEM error or we overrun the socket we
can avoid the workqueue completely only using it when we fall back
to error cases caused by memory pressure.
By doing this we eliminate another case where data may be dropped
if errors occur on memory limits in workqueue.
Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160226859704.5692.12929678876744977669.stgit@john-Precision-5820-Tower