58400 Commits

Author SHA1 Message Date
Ursula Braun
6dabd40545 net/smc: introduce bookkeeping of SMCR link groups
If the smc module is unloaded return control from exit routine only,
if all link groups are freed.
If an IB device is thrown away return control from device removal only,
if all link groups belonging to this device are freed.
Counters for the total number of SMCR link groups and for the total
number of SMCR links per IB device are introduced. smc module unloading
continues only if the total number of SMCR link groups is zero. IB device
removal continues only it the total number of SMCR links per IB device
has decreased to zero.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16 12:26:49 -08:00
Vladimir Oltean
c80ed84e76 net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid
This sequence of operations:
ip link set dev br0 type bridge vlan_filtering 1
bridge vlan del dev swp2 vid 1
ip link set dev br0 type bridge vlan_filtering 1
ip link set dev br0 type bridge vlan_filtering 0

apparently fails with the message:

[   31.305716] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
[   31.322161] sja1105 spi0.1: Couldn't determine PVID attributes (pvid 0)
[   31.328939] sja1105 spi0.1: Failed to setup VLAN tagging for port 1: -2
[   31.335599] ------------[ cut here ]------------
[   31.340215] WARNING: CPU: 1 PID: 194 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4
[   31.349981] br0: Commit of attribute (id=6) failed.
[   31.354890] Modules linked in:
[   31.357942] CPU: 1 PID: 194 Comm: ip Not tainted 5.4.0-rc6-01792-gf4f632e07665-dirty #2062
[   31.366167] Hardware name: Freescale LS1021A
[   31.370437] [<c03144dc>] (unwind_backtrace) from [<c030e184>] (show_stack+0x10/0x14)
[   31.378153] [<c030e184>] (show_stack) from [<c11d1c1c>] (dump_stack+0xe0/0x10c)
[   31.385437] [<c11d1c1c>] (dump_stack) from [<c034c730>] (__warn+0xf4/0x10c)
[   31.392373] [<c034c730>] (__warn) from [<c034c7bc>] (warn_slowpath_fmt+0x74/0xb8)
[   31.399827] [<c034c7bc>] (warn_slowpath_fmt) from [<c11ca204>] (switchdev_port_attr_set_now+0x9c/0xa4)
[   31.409097] [<c11ca204>] (switchdev_port_attr_set_now) from [<c117036c>] (__br_vlan_filter_toggle+0x6c/0x118)
[   31.418971] [<c117036c>] (__br_vlan_filter_toggle) from [<c115d010>] (br_changelink+0xf8/0x518)
[   31.427637] [<c115d010>] (br_changelink) from [<c0f8e9ec>] (__rtnl_newlink+0x3f4/0x76c)
[   31.435613] [<c0f8e9ec>] (__rtnl_newlink) from [<c0f8eda8>] (rtnl_newlink+0x44/0x60)
[   31.443329] [<c0f8eda8>] (rtnl_newlink) from [<c0f89f20>] (rtnetlink_rcv_msg+0x2cc/0x51c)
[   31.451477] [<c0f89f20>] (rtnetlink_rcv_msg) from [<c1008df8>] (netlink_rcv_skb+0xb8/0x110)
[   31.459796] [<c1008df8>] (netlink_rcv_skb) from [<c1008648>] (netlink_unicast+0x17c/0x1f8)
[   31.468026] [<c1008648>] (netlink_unicast) from [<c1008980>] (netlink_sendmsg+0x2bc/0x3b4)
[   31.476261] [<c1008980>] (netlink_sendmsg) from [<c0f43858>] (___sys_sendmsg+0x230/0x250)
[   31.484408] [<c0f43858>] (___sys_sendmsg) from [<c0f44c84>] (__sys_sendmsg+0x50/0x8c)
[   31.492209] [<c0f44c84>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x28)
[   31.500090] Exception stack(0xedf47fa8 to 0xedf47ff0)
[   31.505122] 7fa0:                   00000002 b6f2e060 00000003 beabd6a4 00000000 00000000
[   31.513265] 7fc0: 00000002 b6f2e060 5d6e3213 00000128 00000000 00000001 00000006 000619c4
[   31.521405] 7fe0: 00086078 beabd658 0005edbc b6e7ce68

The reason is the implementation of br_get_pvid:

static inline u16 br_get_pvid(const struct net_bridge_vlan_group *vg)
{
	if (!vg)
		return 0;

	smp_rmb();
	return vg->pvid;
}

Since VID 0 is an invalid pvid from the bridge's point of view, let's
add this check in dsa_8021q_restore_pvid to avoid restoring a pvid that
doesn't really exist.

Fixes: 5f33183b7fdf ("net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filtering")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16 12:23:53 -08:00
Andrea Mayer
c71644d00f seg6: fix skb transport_header after decap_and_validate()
in the receive path (more precisely in ip6_rcv_core()) the
skb->transport_header is set to skb->network_header + sizeof(*hdr). As a
consequence, after routing operations, destination input expects to find
skb->transport_header correctly set to the next protocol (or extension
header) that follows the network protocol. However, decap behaviors (DX*,
DT*) remove the outer IPv6 and SRH extension and do not set again the
skb->transport_header pointer correctly. For this reason, the patch sets
the skb->transport_header to the skb->network_header + sizeof(hdr) in each
DX* and DT* behavior.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16 12:18:32 -08:00
Andrea Mayer
7f91ed8c4f seg6: fix srh pointer in get_srh()
pskb_may_pull may change pointers in header. For this reason, it is
mandatory to reload any pointer that points into skb header.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16 12:18:32 -08:00
Pablo Neira Ayuso
ff4bf2f42a netfilter: nf_tables: add nft_unregister_flowtable_hook()
Unbind flowtable callback if hook is unregistered.

This patch is implicitly fixing the error path of
nf_tables_newflowtable() and nft_flowtable_event().

Fixes: 8bb69f3b2918 ("netfilter: nf_tables: add flowtable offload control plane")
Reported-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15 23:44:54 +01:00
wenxu
d7c03a9f5c netfilter: nf_tables: check if bind callback fails and unbind if hook registration fails
Undo the callback binding before unregistering the existing hooks. This
should also check for error of the bind setup call.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15 23:44:53 +01:00
Pablo Neira Ayuso
63b48c73ff netfilter: nf_tables_offload: undo updates if transaction fails
The nft_flow_rule_offload_commit() function might fail after several
successful commands, thus, leaving the hardware filtering policy in
inconsistent state.

This patch adds nft_flow_rule_offload_abort() function which undoes the
updates that have been already processed if one command in this
transaction fails. Hence, the hardware ruleset is left as it was before
this aborted transaction.

The deletion path needs to create the flow_rule object too, in case that
an existing rule needs to be re-added from the abort path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15 23:44:52 +01:00
Pablo Neira Ayuso
23403cd889 netfilter: nf_tables_offload: release flow_rule on error from commit path
If hardware offload commit path fails, release all flow_rule objects.

Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15 23:44:51 +01:00
Pablo Neira Ayuso
6ca61c7a8b netfilter: nf_tables_offload: remove reference to flow rule from deletion path
The cookie is sufficient to delete the rule from the hardware.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15 23:44:50 +01:00
wenxu
458a1828e9 netfilter: nf_flow_table: remove unnecessary parameter in flow_offload_fill_dir
The ct object is already in the flow_offload structure, remove it.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15 23:44:50 +01:00
wenxu
ea13ca3051 netfilter: nf_flow_table_offload: Fix check ndo_setup_tc when setup_block
It should check the ndo_setup_tc in the nf_flow_table_offload_setup.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15 23:44:49 +01:00
Alexei Starovoitov
91cc1a9974 bpf: Annotate context types
Annotate BPF program context types with program-side type and kernel-side type.
This type information is used by the verifier. btf_get_prog_ctx_type() is
used in the later patches to verify that BTF type of ctx in BPF program matches to
kernel expected ctx type. For example, the XDP program type is:
BPF_PROG_TYPE(BPF_PROG_TYPE_XDP, xdp, struct xdp_md, struct xdp_buff)
That means that XDP program should be written as:
int xdp_prog(struct xdp_md *ctx) { ... }

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191114185720.1641606-16-ast@kernel.org
2019-11-15 23:44:48 +01:00
Phil Sutter
28f8bfd1ac netfilter: Support iif matches in POSTROUTING
Instead of generally passing NULL to NF_HOOK_COND() for input device,
pass skb->dev which contains input device for routed skbs.

Note that iptables (both legacy and nft) reject rules with input
interface match from being added to POSTROUTING chains, but nftables
allows this.

Cc: Eric Garver <eric@garver.life>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15 23:44:48 +01:00
Pablo Neira Ayuso
5c27d8d76c netfilter: nf_flow_table_offload: add IPv6 support
Add nf_flow_rule_route_ipv6() and use it from the IPv6 and the inet
flowtable type definitions. Rename the nf_flow_rule_route() function to
nf_flow_rule_route_ipv4().

Adjust maximum number of actions, which now becomes 16 to leave
sufficient room for the IPv6 address mangling for NAT.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15 23:44:47 +01:00
Pablo Neira Ayuso
4a766d490d netfilter: nf_flow_table_offload: add flow_action_entry_next() and use it
This function retrieves a spare action entry from the array of actions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15 23:44:46 +01:00
Arnd Bergmann
6408c40c39 netfilter: nft_meta: use 64-bit time arithmetic
On 32-bit architectures, get_seconds() returns an unsigned 32-bit
time value, which also matches the type used in the nft_meta
code. This will not overflow in year 2038 as a time_t would, but
it still suffers from the overflow problem later on in year 2106.

Change this instance to use the time64_t type consistently
and avoid the deprecated get_seconds().

The nft_meta_weekday() calculation potentially gets a little slower
on 32-bit architectures, but now it has the same behavior as on
64-bit architectures and does not overflow.

Fixes: 63d10e12b00d ("netfilter: nft_meta: support for time matching")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15 23:44:45 +01:00
Arnd Bergmann
fcbad8293d netfilter: xt_time: use time64_t
The current xt_time driver suffers from the y2038 overflow on 32-bit
architectures, when the time of day calculations break.

Also, on both 32-bit and 64-bit architectures, there is a problem with
info->date_start/stop, which is part of the user ABI and overflows in
in 2106.

Fix the first issue by using time64_t and explicit calls to div_u64()
and div_u64_rem(), and document the seconds issue.

The explicit 64-bit division is unfortunately slower on 32-bit
architectures, but doing it as unsigned lets us use the optimized
division-through-multiplication path in most configurations.  This should
be fine, as the code already does not allow any negative time of day
values.

Using u32 seconds values consistently would probably also work and
be a little more efficient, but that doesn't feel right as it would
propagate the y2106 overflow to more place rather than fewer.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15 23:44:45 +01:00
Alexei Starovoitov
9cc31b3a09 bpf: Fix race in btf_resolve_helper_id()
btf_resolve_helper_id() caching logic is a bit racy, since under root the
verifier can verify several programs in parallel. Fix it with READ/WRITE_ONCE.
Fix the type as well, since error is also recorded.

Fixes: a7658e1a4164 ("bpf: Check types of arguments passed into helpers")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191114185720.1641606-15-ast@kernel.org
2019-11-15 23:44:20 +01:00
Alexei Starovoitov
faeb2dce08 bpf: Add kernel test functions for fentry testing
Add few kernel functions with various number of arguments,
their types and sizes for BPF trampoline testing to cover
different calling conventions.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191114185720.1641606-9-ast@kernel.org
2019-11-15 23:43:01 +01:00
Tonghao Zhang
61ca533c0e net: openvswitch: don't call pad_packet if not necessary
The nla_put_u16/nla_put_u32 makes sure that
*attrlen is align. The call tree is that:

nla_put_u16/nla_put_u32
  -> nla_put		attrlen = sizeof(u16) or sizeof(u32)
  -> __nla_put		attrlen
  -> __nla_reserve	attrlen
  -> skb_put(skb, nla_total_size(attrlen))

nla_total_size returns the total length of attribute
including padding.

Cc: Joe Stringer <joe@ovn.org>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15 12:43:27 -08:00
Vladimir Oltean
8dce89aa5f net: dsa: ocelot: add tagger for Ocelot/Felix switches
While it is entirely possible that this tagger format is in fact more
generic than just these 2 switch families, I don't have that knowledge.
The Seville switch in NXP T1040 has a similar frame format, but there
are enough differences (e.g. DEST field starts at bit 57 instead of 56)
that calling this file tag_vitesse.c is a bit of a stretch at the
moment. The frame format has been listed in a comment so that people who
add support for further Vitesse switches can rework this tagger while
keeping compatibility with Felix.

The "ocelot" name was chosen instead of "felix" because even the Ocelot
switch can act as a DSA device when it is used in NPI mode, and the Felix
tagger format is almost identical. Currently it is only used for the
Felix switch embedded in the NXP LS1028A chip.

The ABI for this tagger should be considered "not stable" at the moment.
The DSA tag is always placed before the Ethernet header and therefore,
we are using the long prefix for RX tags to avoid putting the DSA master
port in promiscuous mode. Once there will be an API in DSA for drivers
to request DSA masters to be in promiscuous mode unconditionally, we
will switch to the "no prefix" extraction frame header, which will save
16 padding bytes for each RX frame.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15 12:32:16 -08:00
Ursula Braun
0b29ec6436 net/smc: immediate termination for SMCR link groups
If the SMC module is unloaded or an IB device is thrown away, the
immediate link group freeing introduced for SMCD is exploited for SMCR
as well. That means SMCR-specifics are added to smc_conn_kill().

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15 12:28:28 -08:00
Ursula Braun
6a37ad3da5 net/smc: wait for tx completions before link freeing
Make sure all pending work requests are completed before freeing
a link.
Dismiss tx pending slots already when terminating a link group to
exploit termination shortcut in tx completion queue handler.

And kill the completion queue tasklets after destroy of the
completion queues, otherwise there is a time window for another
tasklet schedule of an already killed tasklet.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15 12:28:28 -08:00
Ursula Braun
2c1d3e5030 net/smc: abnormal termination without orderly flag
For abnormal termination issue an LLC DELETE_LINK without the
orderly flag.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15 12:28:28 -08:00
Ursula Braun
15e1b99aad net/smc: no WR buffer wait for terminating link group
Avoid waiting for a free work request buffer, if the link group
is already terminating.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15 12:28:28 -08:00
Ursula Braun
5edd6b9cb8 net/smc: introduce bookkeeping of SMCD link groups
If the ism module is unloaded return control from exit routine only,
if all link groups are freed.
If an IB device is thrown away return control from device removal only,
if all link groups belonging to this device are freed.
A counters for the total number of SMCD link groups per ISM device is
introduced. ism module unloading continues only if the total number of
SMCD link groups for all ISM devices is zero. ISM device
removal continues only it the total number of SMCD link groups per ISM
device has decreased to zero.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15 12:28:28 -08:00
Ursula Braun
5421ec281d net/smc: abnormal termination of SMCD link groups
A final cleanup due to SMCD device removal means immediate freeing
of all link groups belonging to this device in interrupt context.

This patch introduces a separate SMCD link group termination routine,
which terminates all link groups of an SMCD device.

This new routine smcd_terminate_all ()is reused if the smc module is
unloaded.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15 12:28:28 -08:00
Ursula Braun
42bfba9eaa net/smc: immediate termination for SMCD link groups
SMCD link group termination is called when peer signals its shutdown
of its corresponding link group. For regular shutdowns no connections
exist anymore. For abnormal shutdowns connections must be killed and
their DMBs must be unregistered immediately. That means the SMCR method
to delay the link group freeing several seconds does not fit.

This patch adds immediate termination of a link group and its SMCD
connections and makes sure all SMCD link group related cleanup steps
are finished.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15 12:28:28 -08:00
Ursula Braun
50c6b20eff net/smc: fix final cleanup sequence for SMCD devices
If peer announces shutdown, use the link group terminate worker for
local cleanup of link groups and connections to terminate link group
in proper context.

Make sure link groups are cleaned up first before destroying the
event queue of the SMCD device, because link group cleanup may
raise events.

Send signal shutdown only if peer has not done it already.

Send socket abort or close only, if peer has not already announced
shutdown.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15 12:28:28 -08:00
YueHaibing
d6649d788e net/tls: Fix unused function warning
If PROC_FS is not set, gcc warning this:

net/tls/tls_proc.c:23:12: warning:
 'tls_statistics_seq_show' defined but not used [-Wunused-function]

Use #ifdef to guard this.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15 12:12:28 -08:00
Stefano Garzarella
36c5b48b91 vsock: fix bind() behaviour taking care of CID
When we are looking for a socket bound to a specific address,
we also have to take into account the CID.

This patch is useful with multi-transports support because it
allows the binding of the same port with different CID, and
it prevents a connection to a wrong socket bound to the same
port, but with different CID.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:18 -08:00
Stefano Garzarella
6a2c096210 vsock: prevent transport modules unloading
This patch adds 'module' member in the 'struct vsock_transport'
in order to get/put the transport module. This prevents the
module unloading while sockets are assigned to it.

We increase the module refcnt when a socket is assigned to a
transport, and we decrease the module refcnt when the socket
is destructed.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:18 -08:00
Stefano Garzarella
b1bba80a43 vsock/vmci: register vmci_transport only when VMCI guest/host are active
To allow other transports to be loaded with vmci_transport,
we register the vmci_transport as G2H or H2G only when a VMCI guest
or host is active.

To do that, this patch adds a callback registered in the vmci driver
that will be called when the host or guest becomes active.
This callback will register the vmci_transport in the VSOCK core.

Cc: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:18 -08:00
Stefano Garzarella
c0cfa2d8a7 vsock: add multi-transports support
This patch adds the support of multiple transports in the
VSOCK core.

With the multi-transports support, we can use vsock with nested VMs
(using also different hypervisors) loading both guest->host and
host->guest transports at the same time.

Major changes:
- vsock core module can be loaded regardless of the transports
- vsock_core_init() and vsock_core_exit() are renamed to
  vsock_core_register() and vsock_core_unregister()
- vsock_core_register() has a feature parameter (H2G, G2H, DGRAM)
  to identify which directions the transport can handle and if it's
  support DGRAM (only vmci)
- each stream socket is assigned to a transport when the remote CID
  is set (during the connect() or when we receive a connection request
  on a listener socket).
  The remote CID is used to decide which transport to use:
  - remote CID <= VMADDR_CID_HOST will use guest->host transport;
  - remote CID == local_cid (guest->host transport) will use guest->host
    transport for loopback (host->guest transports don't support loopback);
  - remote CID > VMADDR_CID_HOST will use host->guest transport;
- listener sockets are not bound to any transports since no transport
  operations are done on it. In this way we can create a listener
  socket, also if the transports are not loaded or with VMADDR_CID_ANY
  to listen on all transports.
- DGRAM sockets are handled as before, since only the vmci_transport
  provides this feature.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:18 -08:00
Stefano Garzarella
039642574c hv_sock: set VMADDR_CID_HOST in the hvs_remote_addr_init()
Remote peer is always the host, so we set VMADDR_CID_HOST as
remote CID instead of VMADDR_CID_ANY.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:18 -08:00
Stefano Garzarella
55f3e149b6 vsock: move vsock_insert_unbound() in the vsock_create()
vsock_insert_unbound() was called only when 'sock' parameter of
__vsock_create() was not null. This only happened when
__vsock_create() was called by vsock_create().

In order to simplify the multi-transports support, this patch
moves vsock_insert_unbound() at the end of vsock_create().

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:18 -08:00
Stefano Garzarella
b9ca2f5ff7 vsock: add vsock_create_connected() called by transports
All transports call __vsock_create() with the same parameters,
most of them depending on the parent socket. In order to simplify
the VSOCK core APIs exposed to the transports, this patch adds
the vsock_create_connected() callable from transports to create
a new socket when a connection request is received.
We also unexported the __vsock_create().

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:18 -08:00
Stefano Garzarella
b9f2b0ffde vsock: handle buffer_size sockopts in the core
virtio_transport and vmci_transport handle the buffer_size
sockopts in a very similar way.

In order to support multiple transports, this patch moves this
handling in the core to allow the user to change the options
also if the socket is not yet assigned to any transport.

This patch also adds the '.notify_buffer_size' callback in the
'struct virtio_transport' in order to inform the transport,
when the buffer_size is changed by the user. It is also useful
to limit the 'buffer_size' requested (e.g. virtio transports).

Acked-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:18 -08:00
Stefano Garzarella
daabfbca34 vsock: add 'struct vsock_sock *' param to vsock_core_get_transport()
Since now the 'struct vsock_sock' object contains a pointer to
the transport, this patch adds a parameter to the
vsock_core_get_transport() to return the right transport
assigned to the socket.

This patch modifies also the virtio_transport_get_ops(), that
uses the vsock_core_get_transport(), adding the
'struct vsock_sock *' parameter.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:18 -08:00
Stefano Garzarella
4c7246dc45 vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock()
We are going to add 'struct vsock_sock *' parameter to
virtio_transport_get_ops().

In some cases, like in the virtio_transport_reset_no_sock(),
we don't have any socket assigned to the packet received,
so we can't use the virtio_transport_get_ops().

In order to allow virtio_transport_reset_no_sock() to use the
'.send_pkt' callback from the 'vhost_transport' or 'virtio_transport',
we add the 'struct virtio_transport *' to it and to its caller:
virtio_transport_recv_pkt().

We moved the 'vhost_transport' and 'virtio_transport' definition,
to pass their address to the virtio_transport_recv_pkt().

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:18 -08:00
Stefano Garzarella
fe502c4a38 vsock: add 'transport' member in the struct vsock_sock
As a preparation to support multiple transports, this patch adds
the 'transport' member at the 'struct vsock_sock'.
This new field is initialized during the creation in the
__vsock_create() function.

This patch also renames the global 'transport' pointer to
'transport_single', since for now we're only supporting a single
transport registered at run-time.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:18 -08:00
Stefano Garzarella
3603a2e991 vsock: remove include/linux/vm_sockets.h file
This header file now only includes the "uapi/linux/vm_sockets.h".
We can include directly it when needed.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:17 -08:00
Stefano Garzarella
db205c7668 vsock: remove vm_sockets_get_local_cid()
vm_sockets_get_local_cid() is only used in virtio_transport_common.c.
We can replace it calling the virtio_transport_get_ops() and
using the get_local_cid() callback registered by the transport.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:17 -08:00
Stefano Garzarella
7ed78bc495 vsock/vmci: remove unused VSOCK_DEFAULT_CONNECT_TIMEOUT
The VSOCK_DEFAULT_CONNECT_TIMEOUT definition was introduced with
commit d021c344051af ("VSOCK: Introduce VM Sockets"), but it is
never used in the net/vmw_vsock/vmci_transport.c.

VSOCK_DEFAULT_CONNECT_TIMEOUT is used and defined in
net/vmw_vsock/af_vsock.c

Cc: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:12:17 -08:00
Matt Bennett
d7f9f47d4d tipc: add back tipc prefix to log messages
The tipc prefix for log messages generated by tipc was
removed in commit 07f6c4bc048a ("tipc: convert tipc reference
table to use generic rhashtable").

This is still a useful prefix so add it back.

Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:03:03 -08:00
David S. Miller
f97d139aaa mlx5-updates-2019-11-12
1) Merge mlx5-next for devlink reload and flowtable offloads dependencies
 2) Devlink reload support
 3) TC Flowtable offloads
 4) Misc cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl3MgsAACgkQSD+KveBX
 +j6q8Qf+LFSUhQIpDnuuJpEnu1SAIxnzcm22vUZ3b0zAfWj2mTLqrntYghRnUpbF
 fk62w+yuHtaB+fsyzq17psR6rEcVbL8dfNGOtsz9TSsmDC5EvClFc/icUQTQPYGg
 9EZQMTtAQ7dicjAX4dpT9ZnSjaiWWKKMFCc8ZOgZuAjsuhXG5iXQC0JgCJzyWJQ4
 xmR3MJT7bAPDfRl2bWiwwLUlfG0qZljaBki1ThP2r5NttgNYwNCaHNuxnfji/YSB
 7ZzJkq2ejID+HjBnEiowtXuOsLX0ETc2oZ3BOimAaG9qMK7wtF/5an1T1LLQT93L
 LGxDIhn65r0xVwN/chn8zlvDaBsVbA==
 =ZUuo
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2019-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-11-12

1) Merge mlx5-next for devlink reload and flowtable offloads dependencies
2) Devlink reload support
3) TC Flowtable offloads
4) Misc cleanup
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 18:01:45 -08:00
Tonghao Zhang
bd1903b7c4 net: openvswitch: add hash info to upcall
When using the kernel datapath, the upcall don't
include skb hash info relatived. That will introduce
some problem, because the hash of skb is important
in kernel stack. For example, VXLAN module uses
it to select UDP src port. The tx queue selection
may also use the hash in stack.

Hash is computed in different ways. Hash is random
for a TCP socket, and hash may be computed in hardware,
or software stack. Recalculation hash is not easy.

Hash of TCP socket is computed:
tcp_v4_connect
    -> sk_set_txhash (is random)

__tcp_transmit_skb
    -> skb_set_hash_from_sk

There will be one upcall, without information of skb
hash, to ovs-vswitchd, for the first packet of a TCP
session. The rest packets will be processed in Open vSwitch
modules, hash kept. If this tcp session is forward to
VXLAN module, then the UDP src port of first tcp packet
is different from rest packets.

TCP packets may come from the host or dockers, to Open vSwitch.
To fix it, we store the hash info to upcall, and restore hash
when packets sent back.

+---------------+          +-------------------------+
|   Docker/VMs  |          |     ovs-vswitchd        |
+----+----------+          +-+--------------------+--+
     |                       ^                    |
     |                       |                    |
     |                       |  upcall            v restore packet hash (not recalculate)
     |                     +-+--------------------+--+
     |  tap netdev         |                         |   vxlan module
     +--------------->     +-->  Open vSwitch ko     +-->
       or internal type    |                         |
                           +-------------------------+

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 17:29:46 -08:00
Saeed Mahameed
c94ef13b04 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
1) New generic devlink param "enable_roce", for downstream devlink
   reload support

2) Do vport ACL configuration on per vport basis when
   enabling/disabling a vport. This enables to have vports enabled/disabled
   outside of eswitch config for future

3) Split the code for legacy vs offloads mode and make it clear

4) Tide up vport locking and workqueue usage

5) Fix metadata enablement for ECPF

6) Make explicit use of VF property to publish IB_DEVICE_VIRTUAL_FUNCTION

7) E-Switch and flow steering core low level support and refactoring for
   netfilter flowtables offload

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-13 14:24:58 -08:00
David S. Miller
b3dff0eb4b linux-can-fixes-for-5.4-20191113
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEmvEkXzgOfc881GuFWsYho5HknSAFAl3L0SETHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRBaxiGjkeSdIBReCACawwJlue5MIbawq+JD8wQ+RE117kJc
 x6z2QFEynNchywthMRfaCSj+QP0n5C4JY6BHtn7LMT2JOrPVzQAwz2cZ4eUPinUA
 eSd0/0ZqoBH+IZW4c3sYNkrGSi47qIytGi3jgbfNbZj4mIYBCikLDhLhWVk9+rTp
 HZiqS2vWUaZ42bKvbHE2CHTV/oPdz/3wJIQmctkyoFvxKcJM1WyI6y7QSJ/VJWv1
 bpX4dymK4qgnGHGyiOcoeRmaDFOg53gDTRfIQZoc++5/dlczov7eA48XedrzQUV7
 qGR4j6BjZrxloQsS3Y4ynTRC+x5aE+jajEaf3QEKrO7VCRS6jdV1bX2E
 =13Km
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-5.4-20191113' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2019-11-13

this is a pull request of 9 patches for net/master, hopefully for the v5.4
release cycle.

All nine patches are by Oleksij Rempel and fix locking and use-after-free bugs
in the j1939 stack found by the syzkaller syzbot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-13 11:47:02 -08:00
David S. Miller
895badff2c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2019-11-13

1) Remove a unnecessary net_exit function from the xfrm interface.
   From Xin Long.

2) Assign xfrm4_udp_encap_rcv to a UDP socket only if xfrm
   is configured. From Alexey Dobriyan.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-13 11:28:54 -08:00