IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
We do not need locking in xfrm_trans_queue because it is designed
to use per-CPU buffers. However, the original code incorrectly
used skb_queue_tail which takes the lock. This patch switches
it to __skb_queue_tail instead.
Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>
Fixes: acf568ee859f ("xfrm: Reinject transport-mode packets...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
One thing is notable: I applied two patches and later
reverted them - we'll get back to that once all the driver
situation is sorted out.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlpOQXUACgkQB8qZga/f
l8Qpuw/+OclRyelfxh1v1xwFYDUAJZZmU9wr/Yx/ezZ8NoebA5bfJSXV/s+Tgw5E
oORx7LUkbxwreQtoEHtc9/IE7SCfXrB8kWoy5A/Q094SDglWOiQbRuYQ0gn4pMkV
zukm4O3+cHHGj1slnSOzQWNeF/5mbNwEMo5Id5ZnSjMfoPl+CWH8qvfu4oRFhmiG
tZ0gIGARX9FL3v+RyqEhugTxfCzAYRTinGQhG4r6LlkgCqTnza7VhG+3N+fPMkjS
4Rs9ucnMnunrbd9lbbpTb+vWAJ+McJfVw/Gtmjp/W8vyxZFEr0EHiY31btmMAhTO
ibZVZYCslL3WM2vIxxy0nGR6O28eCRzU4ETSOrInv4ZooplvmFHVnjms1hqiSaZO
4qy8Yb8cPrIPTcI3OYWvicBAAcHLqlEw8GC4rltf2bw6a0FdJ3igitFy9MPFhxBW
OZ0YS+exHAb9lBbk49qOM0Bqu7ug5MUTygX9RGTeWB0sRDmc5OQVsAqvfaapGts9
u+huQzO2Y1b8IDVAL/tTOoDz6A1Qc/S2BFDNilFKVeGOhB35jFA3BN4vJzmpp9Oy
cz8150ls6BbfHjrFiuHlQWwaoG6GTebSln9XnEqNXfh5GFj1H/FYTQOv4rIaIjrY
wdirSv6UopaRjnBSb062glmb9ZFHQEKBWDvRC7jTRaXMRTBQ2zo=
=M0gz
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2018-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
We have things all over the place, no point listing them.
One thing is notable: I applied two patches and later
reverted them - we'll get back to that once all the driver
situation is sorted out.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
set rm->atomic.op_active to 0 when rds_pin_pages() fails
or the user supplied address is invalid,
this prevents a NULL pointer usage in rds_atomic_free_op()
Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function is used from two places: rtnl_dump_ifinfo and
rtnl_getlink. In rtnl_getlink(), we give a request skb into
get_target_net(), but in rtnl_dump_ifinfo, we give a response skb
into get_target_net().
The problem here is that NETLINK_CB() isn't initialized for the response
skb. In both cases we can get a user socket and give it instead of skb
into get_target_net().
This bug was found by syzkaller with this call-trace:
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 3149 Comm: syzkaller140561 Not tainted 4.15.0-rc4-mm1+ #47
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:__netlink_ns_capable+0x8b/0x120 net/netlink/af_netlink.c:868
RSP: 0018:ffff8801c880f348 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8443f900
RDX: 000000000000007b RSI: ffffffff86510f40 RDI: 00000000000003d8
RBP: ffff8801c880f360 R08: 0000000000000000 R09: 1ffff10039101e4f
R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff86510f40
R13: 000000000000000c R14: 0000000000000004 R15: 0000000000000011
FS: 0000000001a1a880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020151000 CR3: 00000001c9511005 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
netlink_ns_capable+0x26/0x30 net/netlink/af_netlink.c:886
get_target_net+0x9d/0x120 net/core/rtnetlink.c:1765
rtnl_dump_ifinfo+0x2e5/0xee0 net/core/rtnetlink.c:1806
netlink_dump+0x48c/0xce0 net/netlink/af_netlink.c:2222
__netlink_dump_start+0x4f0/0x6d0 net/netlink/af_netlink.c:2319
netlink_dump_start include/linux/netlink.h:214 [inline]
rtnetlink_rcv_msg+0x7f0/0xb10 net/core/rtnetlink.c:4485
netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2441
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4540
netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
netlink_unicast+0x4be/0x6a0 net/netlink/af_netlink.c:1334
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
Cc: Jiri Benc <jbenc@redhat.com>
Fixes: 79e1ad148c84 ("rtnetlink: use netnsid to query interface")
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the function l2cap_parse_conf_rsp and in the function
l2cap_parse_conf_req the following variable is declared without
initialization:
struct l2cap_conf_efs efs;
In addition, when parsing input configuration parameters in both of
these functions, the switch case for handling EFS elements may skip the
memcpy call that will write to the efs variable:
...
case L2CAP_CONF_EFS:
if (olen == sizeof(efs))
memcpy(&efs, (void *)val, olen);
...
The olen in the above if is attacker controlled, and regardless of that
if, in both of these functions the efs variable would eventually be
added to the outgoing configuration request that is being built:
l2cap_add_conf_opt(&ptr, L2CAP_CONF_EFS, sizeof(efs), (unsigned long) &efs);
So by sending a configuration request, or response, that contains an
L2CAP_CONF_EFS element, but with an element length that is not
sizeof(efs) - the memcpy to the uninitialized efs variable can be
avoided, and the uninitialized variable would be returned to the
attacker (16 bytes).
This issue has been assigned CVE-2017-1000410
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Ben Seri <ben@armis.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If there are multiple mesh stations with the same MAC address,
they will both get confused and start throwing warnings.
Obviously in this case nothing can actually work anyway, so just
drop frames that look like they're from ourselves early on.
Reported-by: Gui Iribarren <gui@altermundi.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
nl80211_nan_add_func() does not check if the required attribute
NL80211_NAN_FUNC_FOLLOW_UP_DEST is present when processing
NL80211_CMD_ADD_NAN_FUNCTION request. This request can be issued
by users with CAP_NET_ADMIN privilege and may result in NULL dereference
and a system crash. Add a check for the required attribute presence.
Signed-off-by: Hao Chen <flank3rsky@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
syzbot noticed a NULL pointer dereference panic in sctp_stream_free()
which was caused by an incomplete error handling in sctp_stream_init().
By not clearing stream->outcnt, it made a for() in sctp_stream_free()
think that it had elements to free, but not, leading to the panic.
As suggested by Xin Long, this patch also simplifies the error path by
moving it to the only if() that uses it.
See-also: https://www.spinics.net/lists/netdev/msg473756.html
See-also: https://www.spinics.net/lists/netdev/msg465024.html
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: f952be79cebd ("sctp: introduce struct sctp_stream_out_ext")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When args->nr_local is 0, nr_pages gets also 0 due some size
calculation via rds_rm_size(), which is later used to allocate
pages for DMA, this bug produces a heap Out-Of-Bound write access
to a specific memory region.
Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to make the dsa_legacy_register() stub return 0 in order for
dsa_init_module() to successfully register and continue registering the
ETH_P_XDSA packet handler.
Fixes: 2a93c1a3651f ("net: dsa: Allow compiling out legacy support")
Reported-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 04d7b574b245 ("tipc: add multipoint-to-point flow control") we
introduced a protocol for preventing buffer overflow when many group
members try to simultaneously send messages to the same receiving member.
Stress test of this mechanism has revealed a couple of related bugs:
- When the receiving member receives an advertisement REMIT message from
one of the senders, it will sometimes prematurely activate a pending
member and send it the remitted advertisement, although the upper
limit for active senders has been reached. This leads to accumulation
of illegal advertisements, and eventually to messages being dropped
because of receive buffer overflow.
- When the receiving member leaves REMITTED state while a received
message is being read, we miss to look at the pending queue, to
activate the oldest pending peer. This leads to some pending senders
being starved out, and never getting the opportunity to profit from
the remitted advertisement.
We fix the former in the function tipc_group_proto_rcv() by returning
directly from the function once it becomes clear that the remitting
peer cannot leave REMITTED state at that point.
We fix the latter in the function tipc_group_update_rcv_win() by looking
up and activate the longest pending peer when it becomes clear that the
remitting peer now can leave REMITTED state.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In kernel log ths message appears on every boot:
"warning: `NetworkChangeNo' uses legacy ethtool link settings API,
link modes are only partially reported"
When ethtool link settings API changed, it started complaining about
usages of old API. Ironically, the original patch was from google but
the application using the legacy API is chrome.
Linux ABI is fixed as much as possible. The kernel must not break it
and should not complain about applications using legacy API's.
This patch just removes the warning since using legacy API's
in Linux is perfectly acceptable.
Fixes: 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove DCCP probe module since jprobe has been deprecated.
That function is now replaced by dccp/dccp_probe trace-event.
You can use it via ftrace or perftools.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add DCCP sendmsg trace event (dccp/dccp_probe) for
replacing dccpprobe. User can trace this event via
ftrace or perftools.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove SCTP probe module since jprobe has been deprecated.
That function is now replaced by sctp/sctp_probe and
sctp/sctp_probe_path trace-events.
You can use it via ftrace or perftools.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add SCTP ACK tracking trace event to trace the changes of SCTP
association state in response to incoming packets.
It is used for debugging SCTP congestion control algorithms,
and will replace sctp_probe module.
Note that this event a bit tricky. Since this consists of 2
events (sctp_probe and sctp_probe_path) so you have to enable
both events as below.
# cd /sys/kernel/debug/tracing
# echo 1 > events/sctp/sctp_probe/enable
# echo 1 > events/sctp/sctp_probe_path/enable
Or, you can enable all the events under sctp.
# echo 1 > events/sctp/enable
Since sctp_probe_path event is always invoked from sctp_probe
event, you can not see any output if you only enable
sctp_probe_path.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove TCP probe module since jprobe has been deprecated.
That function is now replaced by tcp/tcp_probe trace-event.
You can use it via ftrace or perftools.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds an event to trace TCP stat variables with
slightly intrusive trace-event. This uses ftrace/perf
event log buffer to trace those state, no needs to
prepare own ring-buffer, nor custom user apps.
User can use ftrace to trace this event as below;
# cd /sys/kernel/debug/tracing
# echo 1 > events/tcp/tcp_probe/enable
(run workloads)
# cat trace
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet_diag currently provides less/greater than or equal operators for
comparing ports when filtering sockets. An equal comparison can be
performed by combining the two existing operators, or a user can for
example request a port range and then do the final filtering in
userspace. However, these approaches both have drawbacks. Implementing
equal using LE/GE causes the size and complexity of a filter to grow
quickly as the number of ports increase, while it on busy machines would
be great if the kernel only returns information about relevant sockets.
This patch introduces source and destination port equal operators.
INET_DIAG_BC_S_EQ is used to match a source port, INET_DIAG_BC_D_EQ a
destination port, and usage is the same as for the existing port
operators. I.e., the port to match is stored in the no-member of the
next inet_diag_bc_op-struct in the filter.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
OVS_NLERR prints a newline at the end of the message string, so the
message string does not need to include a newline explicitly. Done
using Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
DCCP_CRIT prints some other text and then a newline after the message
string, so the message string does not need to include a newline
explicitly. Done using Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
When dev_requeue_skb() is called with bulked skb list, only the
first skb of the list will be requeued to qdisc layer, and leak
the others without free them.
TCP is broken due to skb leak since no free skb will be considered
as still in the host queue and never be retransmitted. This happend
when dev_requeue_skb() called from qdisc_restart().
qdisc_restart
|-- dequeue_skb
|-- sch_direct_xmit()
|-- dev_requeue_skb() <-- skb may bluked
Fix dev_requeue_skb() to requeue the full bluked list. Also change
to use __skb_queue_tail() in __dev_requeue_skb() to avoid skb out
of order.
Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to update lastuse to to the most updated value between what
is already set and the new value.
If HW matching fails, i.e. because of an issue, the stats are not updated
but it could be that software did match and updated lastuse.
Fixes: 5712bf9c5c30 ("net/sched: act_mirred: Use passed lastuse argument")
Fixes: 9fea47d93bcc ("net/sched: act_gact: Update statistics when offloaded to hardware")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the check of the offload state to after the qdisc dump action was
called, so the qdisc could update it if it was changed.
Fixes: 7a4fa29106d9 ("net: sched: Add TCA_HW_OFFLOAD")
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the offload flag, TCQ_F_OFFLOADED, in each dump call (and ignore
the offloading function return value in relation to this flag).
This is done because a qdisc is being initialized, and therefore offloaded
before being grafted. Since the ability of the driver to offload the qdisc
depends on its location, a qdisc can be offloaded and un-offloaded by graft
calls, that doesn't effect the qdisc itself.
Fixes: 428a68af3a7c ("net: sched: Move to new offload indication in RED"
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 582442d6d5bc ("ipv6: Allow the MTU of ipip6 tunnel to be set
below 1280") fixed a mtu setting issue. It works for ipip6 tunnel.
But ip6gre dev updates the mtu also with ip6_tnl_change_mtu. Since
the inner packet over ip6gre can be ipv4 and it's mtu should also
be allowed to set below 1280, the same issue also exists on ip6gre.
This patch is to fix it by simply changing to check if parms.proto
is IPPROTO_IPV6 in ip6_tnl_change_mtu instead, to make ip6gre to
go to 'else' branch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an ip6_tunnel is in mode 'any', where the transport layer
protocol can be either 4 or 41, dst_cache must be disabled.
This is because xfrm policies might apply to only one of the two
protocols. Caching dst would cause xfrm policies for one protocol
incorrectly used for the other.
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 87c320e51519a83c496ab7bfb4e96c8f9c001e89.
Changing the error return code in some situations turns out to
be harmful in practice. In particular Michael Ellerman reports
that DHCP fails on his powerpc machines, and this revert gets
things working again.
Johannes Berg agrees that this revert is the best course of
action for now.
Fixes: 029b6d140550 ("Revert "net: core: maybe return -EEXIST in __dev_alloc_name"")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
request_module can sleep, thus we cannot hold rcu_read_lock() while
calling it. The function also jumps back and takes rcu_read_lock()
again (in xfrm_state_get_afinfo()), resulting in an imbalance.
This codepath is triggered whenever a new offloaded state is created.
Fixes: ffdb5211da1c ("xfrm: Auto-load xfrm offload modules")
Reported-by: syzbot+ca425f44816d749e8eb49755567a75ee48cf4a30@syzkaller.appspotmail.com
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
If a message sent to a PF_KEY socket ended with an incomplete extension
header (fewer than 4 bytes remaining), then parse_exthdrs() read past
the end of the message, into uninitialized memory. Fix it by returning
-EINVAL in this case.
Reproducer:
#include <linux/pfkeyv2.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
char buf[17] = { 0 };
struct sadb_msg *msg = (void *)buf;
msg->sadb_msg_version = PF_KEY_V2;
msg->sadb_msg_type = SADB_DELETE;
msg->sadb_msg_len = 2;
write(sock, buf, 17);
}
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
If a message sent to a PF_KEY socket ended with one of the extensions
that takes a 'struct sadb_address' but there were not enough bytes
remaining in the message for the ->sa_family member of the 'struct
sockaddr' which is supposed to follow, then verify_address_len() read
past the end of the message, into uninitialized memory. Fix it by
returning -EINVAL in this case.
This bug was found using syzkaller with KMSAN.
Reproducer:
#include <linux/pfkeyv2.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
char buf[24] = { 0 };
struct sadb_msg *msg = (void *)buf;
struct sadb_address *addr = (void *)(msg + 1);
msg->sadb_msg_version = PF_KEY_V2;
msg->sadb_msg_type = SADB_DELETE;
msg->sadb_msg_len = 3;
addr->sadb_address_len = 1;
addr->sadb_address_exttype = SADB_EXT_ADDRESS_SRC;
write(sock, buf, 24);
}
Reported-by: Alexander Potapenko <glider@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
syzkaller triggered following KASAN splat:
BUG: KASAN: slab-out-of-bounds in xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618
read of size 2 at addr ffff8801c8e92fe4 by task kworker/1:1/23 [..]
Workqueue: events xfrm_hash_rebuild [..]
__asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:428
xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618
process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
worker_thread+0x223/0x1990 kernel/workqueue.c:2246 [..]
The reproducer triggers:
1016 if (error) {
1017 list_move_tail(&walk->walk.all, &x->all);
1018 goto out;
1019 }
in xfrm_policy_walk() via pfkey (it sets tiny rcv space, dump
callback returns -ENOBUFS).
In this case, *walk is located the pfkey socket struct, so this socket
becomes visible in the global policy list.
It looks like this is intentional -- phony walker has walk.dead set to 1
and all other places skip such "policies".
Ccing original authors of the two commits that seem to expose this
issue (first patch missed ->dead check, second patch adds pfkey
sockets to policies dumper list).
Fixes: 880a6fab8f6ba5b ("xfrm: configure policy hash table thresholds by netlink")
Fixes: 12a169e7d8f4b1c ("ipsec: Put dumpers on the dump list")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Timo Teras <timo.teras@iki.fi>
Cc: Christophe Gouault <christophe.gouault@6wind.com>
Reported-by: syzbot <bot+c028095236fcb6f4348811565b75084c754dc729@syzkaller.appspotmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Currently we allow state updates to competely replace the contents
of x->encap. This is bad because on the user side ESP only sets up
header lengths depending on encap_type once when the state is first
created. This could result in the header lengths getting out of
sync with the actual state configuration.
In practice key managers will never do a state update to change the
encapsulation type. Only the port numbers need to be changed as the
peer NAT entry is updated.
Therefore this patch adds a check in xfrm_state_update to forbid
any changes to the encap_type.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
net/ipv6/ip6_gre.c is a case of parallel adds.
include/trace/events/tcp.h is a little bit more tricky. The removal
of in-trace-macro ifdefs in 'net' paralleled with moving
show_tcp_state_name and friends over to include/trace/events/sock.h
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
strparser wants to check socket ownership without producing any
warnings. As indicated by the comment in the code, it is permissible
for owned_by_user to return true.
Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-and-tested-by: <syzbot+c91c53af67f9ebe599a337d2e70950366153b295@syzkaller.appspotmail.com>
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_copy_ubufs must unclone before it is safe to modify its
skb_shared_info with skb_zcopy_clear.
Commit b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even
without user frags") ensures that all skbs release their zerocopy
state, even those without frags.
But I forgot an edge case where such an skb arrives that is cloned.
The stack does not build such packets. Vhost/tun skbs have their
frags orphaned before cloning. TCP skbs only attach zerocopy state
when a frag is added.
But if TCP packets can be trimmed or linearized, this might occur.
Tracing the code I found no instance so far (e.g., skb_linearize
ends up calling skb_zcopy_clear if !skb->data_len).
Still, it is non-obvious that no path exists. And it is fragile to
rely on this.
Fixes: b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even without user frags")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the qdisc is not found here, it is going to be created. Therefore,
this is not an error path. Remove the extack message set and don't
confuse user with error message in case the qdisc was created
successfully.
Fixes: 09215598119e ("net: sched: sch_api: handle generic qdisc errors")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 42b531de17d2f6 ("tipc: Fix missing connection request
handling"), we replaced unconditional wakeup() with condtional
wakeup for clients with flags POLLIN | POLLRDNORM | POLLRDBAND.
This breaks the applications which do a connect followed by poll
with POLLOUT flag. These applications are not woken when the
connection is ESTABLISHED and hence sleep forever.
In this commit, we fix it by including the POLLOUT event for
sockets in TIPC_CONNECTING state.
Fixes: 42b531de17d2f6 ("tipc: Fix missing connection request handling")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2017-12-28
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Fix incorrect state pruning related to recognition of zero initialized
stack slots, where stacksafe exploration would mistakenly return a
positive pruning verdict too early ignoring other slots, from Gianluca.
2) Various BPF to BPF calls related follow-up fixes. Fix an off-by-one
in maximum call depth check, and rework maximum stack depth tracking
logic to fix a bypass of the total stack size check reported by Jann.
Also fix a bug in arm64 JIT where prog->jited_len was uninitialized.
Addition of various test cases to BPF selftests, from Alexei.
3) Addition of a BPF selftest to test_verifier that is related to BPF to
BPF calls which demonstrates a late caller stack size increase and
thus out of bounds access. Fixed above in 2). Test case from Jann.
4) Addition of correlating BPF helper calls, BPF to BPF calls as well
as BPF maps to bpftool xlated dump in order to allow for better
BPF program introspection and debugging, from Daniel.
5) Fixing several bugs in BPF to BPF calls kallsyms handling in order
to get it actually to work for subprogs, from Daniel.
6) Extending sparc64 JIT support for BPF to BPF calls and fix a couple
of build errors for libbpf on sparc64, from David.
7) Allow narrower context access for BPF dev cgroup typed programs in
order to adapt to LLVM code generation. Also adjust memlock rlimit
in the test_dev_cgroup BPF selftest, from Yonghong.
8) Add netdevsim Kconfig entry to BPF selftests since test_offload.py
relies on netdevsim device being available, from Jakub.
9) Reduce scope of xdp_do_generic_redirect_map() to being static,
from Xiongwei.
10) Minor cleanups and spelling fixes in BPF verifier, from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Zerocopy payload is now always stored in frags, and space for headers
is reversed, so this memory is unused.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This avoids an unnecessary copy of 1-2KB and improves tso_fragment,
which has to fall back to tcp_fragment if skb->len != skb_data_len.
It also avoids a surprising inconsistency in notifications:
Zerocopy packets sent over loopback have their frags copied, so set
SO_EE_CODE_ZEROCOPY_COPIED in the notification. But this currently
does not happen for small packets, because when all data fits in the
linear fragment, data is not copied in skb_orphan_frags_rx.
Reported-by: Tom Deseyn <tom.deseyn@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Skbs that reach MAX_SKB_FRAGS cannot be extended further. Do the
same for zerocopy frags as non-zerocopy frags and set the PSH bit.
This improves GRO assembly.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a net-next follow-up to commit 268b79067942 ("skbuff: orphan
frags before zerocopy clone"), which fixed a bug in net, but added a
call to skb_zerocopy_clone at each frag to do so.
When segmenting skbs with user frags, either the user frags must be
replaced with private copies and uarg released, or the uarg must have
its refcount increased for each new skb.
skb_orphan_frags does the first, except for cases that can handle
reference counting. skb_zerocopy_clone then does the second.
Call these once per nskb, instead of once per frag.
That is, in the common case. With a frag list, also refresh when the
origin skb (frag_skb) changes.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch(180d8cd942ce) replaces all uses of struct sock fields'
memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem
to accessor macros. But the sockets_allocated field of sctp sock is
not replaced at all. Then replace it now for unifying the code.
Fixes: 180d8cd942ce ("foundations of per-cgroup memory pressure controlling.")
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If kmem_cache_alloc() fails in the middle of the for() loop,
cleanup anything that might have been allocated so far.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f10b4cff98c6 ("rds: tcp: atomically purge entries from
rds_tcp_conn_list during netns delete") adds the field t_tcp_detached,
but this needs to be initialized explicitly to false.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the rds_sock is not added to the bind_hash_table, we must
reset rs_bound_addr so that rds_remove_bound will not trip on
this rds_sock.
rds_add_bound() does a rds_sock_put() in this failure path, so
failing to reset rs_bound_addr will result in a socket refcount
bug, and will trigger a WARN_ON with the stack shown below when
the application subsequently tries to close the PF_RDS socket.
WARNING: CPU: 20 PID: 19499 at net/rds/af_rds.c:496 \
rds_sock_destruct+0x15/0x30 [rds]
:
__sk_destruct+0x21/0x190
rds_remove_bound.part.13+0xb6/0x140 [rds]
rds_release+0x71/0x120 [rds]
sock_release+0x1a/0x70
sock_close+0xe/0x20
__fput+0xd5/0x210
task_work_run+0x82/0xa0
do_exit+0x2ce/0xb30
? syscall_trace_enter+0x1cc/0x2b0
do_group_exit+0x39/0xa0
SyS_exit_group+0x10/0x10
do_syscall_64+0x61/0x1a0
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce peer_offset parameter in order to add the capability
to specify two different values for payload offset on tx/rx side.
If just offset is provided by userspace use it for rx side as well
in order to maintain compatibility with older l2tp versions
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>