45345 Commits

Author SHA1 Message Date
Nogah Frankel
147c1e9b90 switchdev: bridge: Offload multicast disabled
Offload multicast disabled flag, for more accurate mc flood behavior:
When it is on, the mdb should be ignored.
When it is off, unregistered mc packets should be flooded to mc router
ports.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:46:38 -05:00
Jiri Pirko
40c81b25b1 sched: check negative err value to safe one level of indent
As it is more common, check err for !0. That allows to safe one level of
indentation and makes the code easier to read. Also, make 'next' variable
global in function as it is used twice.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:38:09 -05:00
Jiri Pirko
7215032ced sched: add missing curly braces in else branch in tc_ctl_tfilter
Curly braces need to be there, for stylistic reasons.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:38:09 -05:00
Jiri Pirko
6bb16e7ae2 sched: move err set right before goto errout in tc_ctl_tfilter
This makes the reader to know right away what is the error value.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:38:09 -05:00
Jiri Pirko
33a48927c1 sched: push TC filter protocol creation into a separate function
Make the long function tc_ctl_tfilter a little bit shorter and easier to
read. Also make the creation of filter proto symmetric to destruction.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:38:08 -05:00
Jiri Pirko
cf1facda2f sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_api
Creation is done in this file, move destruction to be at the same place.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:38:08 -05:00
Jiri Pirko
79112c26f1 sched: rename tcf_destroy to tcf_destroy_proto
This function destroys TC filter protocol, not TC filter. So name it
accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:38:08 -05:00
Ido Schimmel
2f3a5272e5 ipv4: fib: Add events for FIB replace and append
The FIB notification chain currently uses the NLM_F_{REPLACE,APPEND}
flags to signal routes being replaced or appended.

Instead of using netlink flags for in-kernel notifications we can simply
introduce two new events in the FIB notification chain. This has the
added advantage of making the API cleaner, thereby making it clear that
these events should be supported by listeners of the notification chain.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:32:13 -05:00
Ido Schimmel
5b7d616dbc ipv4: fib: Send notification before deleting FIB alias
When a FIB alias is replaced following NLM_F_REPLACE, the ENTRY_ADD
notification is sent after the reference on the previous FIB info was
dropped. This is problematic as potential listeners might need to access
it in their notification blocks.

Solve this by sending the notification prior to the deletion of the
replaced FIB alias. This is consistent with ENTRY_DEL notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:32:12 -05:00
Ido Schimmel
42d5aa76ec ipv4: fib: Send deletion notification with actual FIB alias type
When a FIB alias is removed, a notification is sent using the type
passed from user space - can be RTN_UNSPEC - instead of the actual type
of the removed alias. This is problematic for listeners of the FIB
notification chain, as several FIB aliases can exist with matching
parameters, but the type.

Solve this by passing the actual type of the removed FIB alias.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:32:12 -05:00
Ido Schimmel
58e3bdd597 ipv4: fib: Only flush FIB aliases belonging to currently flushed table
In case the MAIN table is flushed and its trie is shared with the LOCAL
table, then we might be flushing FIB aliases belonging to the latter.
This can lead to FIB_ENTRY_DEL notifications sent with the wrong table
ID.

The above doesn't affect current listeners, as the table ID is ignored
during entry deletion, but this will change later in the patchset.

When flushing a particular table, skip any aliases belonging to a
different one.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Patrick McHardy <kaber@trash.net>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10 11:32:12 -05:00
Jarno Rajahalme
316d4d78cf openvswitch: Pack struct sw_flow_key.
struct sw_flow_key has two 16-bit holes. Move the most matched
conntrack match fields there.  In some typical cases this reduces the
size of the key that needs to be hashed into half and into one cache
line.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 22:59:34 -05:00
Jarno Rajahalme
dd41d33f0b openvswitch: Add force commit.
Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exists, where the original
direction of the existing connection is opposed to the new
connection's initial packet.

Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched.  If this "directionality" is
wrong w.r.t. to the stateful network admission policy it may happen
that packets in neither direction are correctly admitted.

This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry.  If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 22:59:34 -05:00
Jarno Rajahalme
9dd7f8907c openvswitch: Add original direction conntrack tuple to sw_flow_key.
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key.  The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry.  This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.

The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.

The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple.  This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.

When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state.  While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards.  If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change.  When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.

It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information.  If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.

The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields.  Hence, the IP addresses are overlaid in union with ARP
and ND fields.  This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets.  ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 22:59:34 -05:00
Jarno Rajahalme
09aa98ad49 openvswitch: Inherit master's labels.
We avoid calling into nf_conntrack_in() for expected connections, as
that would remove the expectation that we want to stick around until
we are ready to commit the connection.  Instead, we do a lookup in the
expectation table directly.  However, after a successful expectation
lookup we have set the flow key label field from the master
connection, whereas nf_conntrack_in() does not do this.  This leads to
master's labels being inherited after an expectation lookup, but those
labels not being inherited after the corresponding conntrack action
with a commit flag.

This patch resolves the problem by changing the commit code path to
also inherit the master's labels to the expected connection.
Resolving this conflict in favor of inheriting the labels allows more
information be passed from the master connection to related
connections, which would otherwise be much harder if the 32 bits in
the connmark are not enough.  Labels can still be set explicitly, so
this change only affects the default values of the labels in presense
of a master connection.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 22:59:34 -05:00
Jarno Rajahalme
6ffcea7995 openvswitch: Refactor labels initialization.
Refactoring conntrack labels initialization makes changes in later
patches easier to review.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 22:59:34 -05:00
Jarno Rajahalme
b87cec3814 openvswitch: Simplify labels length logic.
Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128
distinct labels"), the size of conntrack labels extension has fixed to
128 bits, so we do not need to check for labels sizes shorter than 128
at run-time.  This patch simplifies labels length logic accordingly,
but allows the conntrack labels size to be increased in the future
without breaking the build.  In the event of conntrack labels
increasing in size OVS would still be able to deal with the 128 first
label bits.

Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 22:59:34 -05:00
Jarno Rajahalme
cb80d58fae openvswitch: Unionize ovs_key_ct_label with a u32 array.
Make the array of labels in struct ovs_key_ct_label an union, adding a
u32 array of the same byte size as the existing u8 array.  It is
faster to loop through the labels 32 bits at the time, which is also
the alignment of netlink attributes.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 22:59:34 -05:00
Jarno Rajahalme
193e309678 openvswitch: Do not trigger events for unconfirmed connections.
Receiving change events before the 'new' event for the connection has
been received can be confusing.  Avoid triggering change events for
setting conntrack mark or labels before the conntrack entry has been
confirmed.

Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark")
Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 22:59:34 -05:00
Jarno Rajahalme
9ff464db50 openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.
The conntrack lookup for existing connections fails to invert the
packet 5-tuple for NATted packets, and therefore fails to find the
existing conntrack entry.  Conntrack only stores 5-tuples for incoming
packets, and there are various situations where a lookup on a packet
that has already been transformed by NAT needs to be made.  Looking up
an existing conntrack entry upon executing packet received from the
userspace is one of them.

This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
for the conntrack lookup whenever the packet has already been
transformed by conntrack from its input form as evidenced by one of
the NAT flags being set in the conntrack state metadata.

Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 22:59:34 -05:00
Jarno Rajahalme
5e17da634a openvswitch: Fix comments for skb->_nfct
Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
they are combined into '_nfct'.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 22:59:34 -05:00
Florian Fainelli
50f008e583 net: dsa: Fix duplicate object rule
While adding switch.o to the list of DSA object files, we essentially
duplicated the previous obj-y line and just added switch.o, remove the
duplicate.

Fixes: f515f192ab4f ("net: dsa: add switch notifier")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 17:11:09 -05:00
Xin Long
242bd2d519 sctp: implement sender-side procedures for Add Incoming/Outgoing Streams Request Parameter
This patch is to implement Sender-Side Procedures for the Add
Outgoing and Incoming Streams Request Parameter described in
rfc6525 section 5.1.5-5.1.6.

It is also to add sockopt SCTP_ADD_STREAMS in rfc6525 section
6.3.4 for users.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 16:57:38 -05:00
Xin Long
78098117f8 sctp: add support for generating stream reconf add incoming/outgoing streams request chunk
This patch is to define Add Incoming/Outgoing Streams Request
Parameter described in rfc6525 section 4.5 and 4.6. They can
be in one same chunk trunk as rfc6525 section 3.1-7 describes,
so make them in one function.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 16:57:38 -05:00
Xin Long
a92ce1a42d sctp: implement sender-side procedures for SSN/TSN Reset Request Parameter
This patch is to implement Sender-Side Procedures for the SSN/TSN
Reset Request Parameter descibed in rfc6525 section 5.1.4.

It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3
for users.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 16:57:38 -05:00
Xin Long
c56480a1e9 sctp: add support for generating stream reconf ssn/tsn reset request chunk
This patch is to define SSN/TSN Reset Request Parameter described
in rfc6525 section 4.3.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 16:57:38 -05:00
Xin Long
119aecbae5 sctp: streams should be recovered when it fails to send request.
Now when sending stream reset request, it closes the streams to
block further xmit of data until this request is completed, then
calls sctp_send_reconf to send the chunk.

But if sctp_send_reconf returns err, and it doesn't recover the
streams' states back,  which means the request chunk would not be
queued and sent, so the asoc will get stuck, streams are closed
and no packet is even queued.

This patch is to fix it by recovering the streams' states when
it fails to send the request, it is also to fix a return value.

Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 16:57:38 -05:00
Hangbin Liu
9c8bb163ae igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()
In function igmpv3/mld_add_delrec() we allocate pmc and put it in
idev->mc_tomb, so we should free it when we don't need it in del_delrec().
But I removed kfree(pmc) incorrectly in latest two patches. Now fix it.

Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info when ...")
Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when ...")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 16:43:45 -05:00
WANG Cong
98e3862ca2 kcm: fix 0-length case for kcm_sendmsg()
Dmitry reported a kernel warning:

 WARNING: CPU: 3 PID: 2936 at net/kcm/kcmsock.c:627
 kcm_write_msgs+0x12e3/0x1b90 net/kcm/kcmsock.c:627
 CPU: 3 PID: 2936 Comm: a.out Not tainted 4.10.0-rc6+ #209
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:15 [inline]
  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
  panic+0x1fb/0x412 kernel/panic.c:179
  __warn+0x1c4/0x1e0 kernel/panic.c:539
  warn_slowpath_null+0x2c/0x40 kernel/panic.c:582
  kcm_write_msgs+0x12e3/0x1b90 net/kcm/kcmsock.c:627
  kcm_sendmsg+0x163a/0x2200 net/kcm/kcmsock.c:1029
  sock_sendmsg_nosec net/socket.c:635 [inline]
  sock_sendmsg+0xca/0x110 net/socket.c:645
  sock_write_iter+0x326/0x600 net/socket.c:848
  new_sync_write fs/read_write.c:499 [inline]
  __vfs_write+0x483/0x740 fs/read_write.c:512
  vfs_write+0x187/0x530 fs/read_write.c:560
  SYSC_write fs/read_write.c:607 [inline]
  SyS_write+0xfb/0x230 fs/read_write.c:599
  entry_SYSCALL_64_fastpath+0x1f/0xc2

when calling syscall(__NR_write, sock2, 0x208aaf27ul, 0x0ul) on a KCM
seqpacket socket. It appears that kcm_sendmsg() does not handle len==0
case correctly, which causes an empty skb is allocated and queued.
Fix this by skipping the skb allocation for len==0 case.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09 16:38:48 -05:00
Koen Vandeputte
f181d6a3bc mac80211: fix CSA in IBSS mode
Add the missing IBSS capability flag during capability init as it needs
to be inserted into the generated beacon in order for CSA to work.

Fixes: cd7760e62c2ac ("mac80211: add support for CSA in IBSS mode")
Signed-off-by: Piotr Gawlowicz <gawlowicz@tkn.tu-berlin.de>
Signed-off-by: Mikołaj Chwalisz <chwalisz@tkn.tu-berlin.de>
Tested-by: Koen Vandeputte <koen.vandeputte@ncentric.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-02-09 15:18:24 +01:00
Luca Coelho
8585989d14 cfg80211: fix NAN bands definition
The nl80211_nan_dual_band_conf enumeration doesn't make much sense.
The default value is assigned to a bit, which makes it weird if the
default bit and other bits are set at the same time.

To improve this, get rid of NL80211_NAN_BAND_DEFAULT and add a wiphy
configuration to let the drivers define which bands are supported.
This is exposed to the userspace, which then can make a decision on
which band(s) to use.  Additionally, rename all "dual_band" elements
to "bands", to make things clearer.

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-02-09 15:17:30 +01:00
Florian Westphal
37b103830e xfrm: policy: make policy backend const
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:19 +01:00
Florian Westphal
bdba9fe01e xfrm: policy: remove xfrm_policy_put_afinfo
Alternative is to keep it an make the (unused) afinfo arg const to avoid
the compiler warnings once the afinfo structs get constified.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:18 +01:00
Florian Westphal
a2817d8b27 xfrm: policy: remove family field
Only needed it to register the policy backend at init time.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:18 +01:00
Florian Westphal
3d7d25a68e xfrm: policy: remove garbage_collect callback
Just call xfrm_garbage_collect_deferred() directly.
This gets rid of a write to afinfo in register/unregister and allows to
constify afinfo later on.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:18 +01:00
Florian Westphal
2b61997aa0 xfrm: policy: xfrm_policy_unregister_afinfo can return void
Nothing checks the return value. Also, the errors returned on unregister
are impossible (we only support INET and INET6, so no way
xfrm_policy_afinfo[afinfo->family] can be anything other than 'afinfo'
itself).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:17 +01:00
Florian Westphal
f5e2bb4f5b xfrm: policy: xfrm_get_tos cannot fail
The comment makes it look like get_tos() is used to validate something,
but it turns out the comment was about xfrm_find_bundle() which got removed
years ago.

xfrm_get_tos will return either the tos (ipv4) or 0 (ipv6).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:17 +01:00
Florian Westphal
960fdfdeb9 xfrm: input: constify xfrm_input_afinfo
Nothing writes to these structures (the module owner was not used).

While at it, size xfrm_input_afinfo[] by the highest existing xfrm family
(INET6), not AF_MAX.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:17 +01:00
Ido Schimmel
982acb9756 ipv4: fib: Notify about nexthop status changes
When a multipath route is hit the kernel doesn't consider nexthops that
are DEAD or LINKDOWN when IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN is set.
Devices that offload multipath routes need to be made aware of nexthop
status changes. Otherwise, the device will keep forwarding packets to
non-functional nexthops.

Add the FIB_EVENT_NH_{ADD,DEL} events to the fib notification chain,
which notify capable devices when they should add or delete a nexthop
from their tables.

Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 15:25:18 -05:00
Florian Fainelli
382e1eea2d net: dsa: Do not destroy invalid network devices
dsa_slave_create() can fail, and dsa_user_port_unapply() will properly check
for the network device not being NULL before attempting to destroy it. We were
not setting the slave network device as NULL if dsa_slave_create() failed, so
we would later on be calling dsa_slave_destroy() on a now free'd and
unitialized network device, causing crashes in dsa_slave_destroy().

Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 14:50:51 -05:00
Roopa Prabhu
8ef9594764 bridge: vlan tunnel id info range fill size calc cleanups
This fixes a bug and cleans up tunnelid range size
calculation code by using consistent variable names
and checks in size calculation and fill functions.

tested for a few cases of vlan-vni range mappings:
(output from patched iproute2):
$bridge vlan showtunnel
port     vid        tunid
vxlan0   100-105    1000-1005
         200        2000
         210        2100
         211-213    2100-2102
         214        2104
         216-217    2108-2109
         219        2119

Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 14:39:19 -05:00
Eric Dumazet
97e219b7c1 gro_cells: move to net/core/gro_cells.c
We have many gro cells users, so lets move the code to avoid
duplication.

This creates a CONFIG_GRO_CELLS option.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 14:38:18 -05:00
WANG Cong
73d2c6678e ping: fix a null pointer dereference
Andrey reported a kernel crash:

  general protection fault: 0000 [#1] SMP KASAN
  Dumping ftrace buffer:
     (ftrace buffer empty)
  Modules linked in:
  CPU: 2 PID: 3880 Comm: syz-executor1 Not tainted 4.10.0-rc6+ #124
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  task: ffff880060048040 task.stack: ffff880069be8000
  RIP: 0010:ping_v4_push_pending_frames net/ipv4/ping.c:647 [inline]
  RIP: 0010:ping_v4_sendmsg+0x1acd/0x23f0 net/ipv4/ping.c:837
  RSP: 0018:ffff880069bef8b8 EFLAGS: 00010206
  RAX: dffffc0000000000 RBX: ffff880069befb90 RCX: 0000000000000000
  RDX: 0000000000000018 RSI: ffff880069befa30 RDI: 00000000000000c2
  RBP: ffff880069befbb8 R08: 0000000000000008 R09: 0000000000000000
  R10: 0000000000000002 R11: 0000000000000000 R12: ffff880069befab0
  R13: ffff88006c624a80 R14: ffff880069befa70 R15: 0000000000000000
  FS:  00007f6f7c716700(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000004a6f28 CR3: 000000003a134000 CR4: 00000000000006e0
  Call Trace:
   inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
   sock_sendmsg_nosec net/socket.c:635 [inline]
   sock_sendmsg+0xca/0x110 net/socket.c:645
   SYSC_sendto+0x660/0x810 net/socket.c:1687
   SyS_sendto+0x40/0x50 net/socket.c:1655
   entry_SYSCALL_64_fastpath+0x1f/0xc2

This is because we miss a check for NULL pointer for skb_peek() when
the queue is empty. Other places already have the same check.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 13:58:21 -05:00
Willem de Bruijn
57031eb794 packet: round up linear to header len
Link layer protocols may unconditionally pull headers, as Ethernet
does in eth_type_trans. Ensure that the entire link layer header
always lies in the skb linear segment. tpacket_snd has such a check.
Extend this to packet_snd.

Variable length link layer headers complicate the computation
somewhat. Here skb->len may be smaller than dev->hard_header_len.

Round up the linear length to be at least as long as the smallest of
the two.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 13:56:37 -05:00
Willem de Bruijn
217e6fa24c net: introduce device min_header_len
The stack must not pass packets to device drivers that are shorter
than the minimum link layer header length.

Previously, packet sockets would drop packets smaller than or equal
to dev->hard_header_len, but this has false positives. Zero length
payload is used over Ethernet. Other link layer protocols support
variable length headers. Support for validation of these protocols
removed the min length check for all protocols.

Introduce an explicit dev->min_header_len parameter and drop all
packets below this value. Initially, set it to non-zero only for
Ethernet and loopback. Other protocols can follow in a patch to
net-next.

Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 13:56:37 -05:00
WANG Cong
d7426c69a1 sit: fix a double free on error path
Dmitry reported a double free in sit_init_net():

  kernel BUG at mm/percpu.c:689!
  invalid opcode: 0000 [#1] SMP KASAN
  Dumping ftrace buffer:
     (ftrace buffer empty)
  Modules linked in:
  CPU: 0 PID: 15692 Comm: syz-executor1 Not tainted 4.10.0-rc6-next-20170206 #1
  Hardware name: Google Google Compute Engine/Google Compute Engine,
  BIOS Google 01/01/2011
  task: ffff8801c9cc27c0 task.stack: ffff88017d1d8000
  RIP: 0010:pcpu_free_area+0x68b/0x810 mm/percpu.c:689
  RSP: 0018:ffff88017d1df488 EFLAGS: 00010046
  RAX: 0000000000010000 RBX: 00000000000007c0 RCX: ffffc90002829000
  RDX: 0000000000010000 RSI: ffffffff81940efb RDI: ffff8801db841d94
  RBP: ffff88017d1df590 R08: dffffc0000000000 R09: 1ffffffff0bb3bdd
  R10: dffffc0000000000 R11: 00000000000135dd R12: ffff8801db841d80
  R13: 0000000000038e40 R14: 00000000000007c0 R15: 00000000000007c0
  FS:  00007f6ea608f700(0000) GS:ffff8801dbe00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000002000aff8 CR3: 00000001c8d44000 CR4: 00000000001426f0
  DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
  Call Trace:
   free_percpu+0x212/0x520 mm/percpu.c:1264
   ipip6_dev_free+0x43/0x60 net/ipv6/sit.c:1335
   sit_init_net+0x3cb/0xa10 net/ipv6/sit.c:1831
   ops_init+0x10a/0x530 net/core/net_namespace.c:115
   setup_net+0x2ed/0x690 net/core/net_namespace.c:291
   copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
   create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
   unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
   SYSC_unshare kernel/fork.c:2281 [inline]
   SyS_unshare+0x64e/0xfc0 kernel/fork.c:2231
   entry_SYSCALL_64_fastpath+0x1f/0xc2

This is because when tunnel->dst_cache init fails, we free dev->tstats
once in ipip6_tunnel_init() and twice in sit_init_net(). This looks
redundant but its ndo_uinit() does not seem enough to clean up everything
here. So avoid this by setting dev->tstats to NULL after the first free,
at least for -net.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 13:12:22 -05:00
Marcus Huewe
a11a7f71ca ipv6: addrconf: fix generation of new temporary addresses
Under some circumstances it is possible that no new temporary addresses
will be generated.

For instance, addrconf_prefix_rcv_add_addr() indirectly calls
ipv6_create_tempaddr(), which creates a tentative temporary address and
starts dad. Next, addrconf_prefix_rcv_add_addr() indirectly calls
addrconf_verify_rtnl(). Now, assume that the previously created temporary
address has the least preferred lifetime among all existing addresses and
is still tentative (that is, dad is still running). Hence, the next run of
addrconf_verify_rtnl() is performed when the preferred lifetime of the
temporary address ends. If dad succeeds before the next run, the temporary
address becomes deprecated during the next run, but no new temporary
address is generated.

In order to fix this, schedule the next addrconf_verify_rtnl() run slightly
before the temporary address becomes deprecated, if dad succeeded.

Signed-off-by: Marcus Huewe <suse-tux@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 11:54:40 -05:00
Manuel Messner
935b7f6430 netfilter: nft_exthdr: add TCP option matching
This patch implements the kernel side of the TCP option patch.

Signed-off-by: Manuel Messner <mm@skelett.io>
Reviewed-by: Florian Westphal <fw@strlen.de>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-08 14:17:09 +01:00
Florian Westphal
edee4f1e92 netfilter: nft_ct: add zone id set support
zones allow tracking multiple connections sharing identical tuples,
this is needed e.g. when tracking distinct vlans with overlapping ip
addresses (conntrack is l2 agnostic).

Thus the zone has to be set before the packet is picked up by the
connection tracker.  This is done by means of 'conntrack templates' which
are conntrack structures used solely to pass this info from one netfilter
hook to the next.

The iptables CT target instantiates these connection tracking templates
once per rule, i.e. the template is fixed/tied to particular zone, can
be read-only and therefore be re-used by as many skbs simultaneously as
needed.

We can't follow this model because we want to take the zone id from
an sreg at rule eval time so we could e.g. fill in the zone id from
the packets vlan id or a e.g. nftables key : value maps.

To avoid cost of per packet alloc/free of the template, use a percpu
template 'scratch' object and use the refcount to detect the (unlikely)
case where the template is still attached to another skb (i.e., previous
skb was nfqueued ...).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-08 14:16:23 +01:00
Florian Westphal
5c178d81b6 netfilter: nft_ct: prepare for key-dependent error unwind
Next patch will add ZONE_ID set support which will need similar
error unwind (put operation) as conntrack labels.

Prepare for this: remove the 'label_got' boolean in favor
of a switch statement that can be extended in next patch.

As we already have that in the set_destroy function place that in
a separate function and call it from the set init function.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-08 14:16:23 +01:00