As done treewide earlier, this catches several more open-coded
allocation size calculations that were added to the kernel during the
merge window. This performs the following mechanical transformations
using Coccinelle:
kvmalloc(a * b, ...) -> kvmalloc_array(a, b, ...)
kvzalloc(a * b, ...) -> kvcalloc(a, b, ...)
devm_kzalloc(..., a * b, ...) -> devm_kcalloc(..., a, b, ...)
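The array-style helpers matter because they check the multiplication for overflow before allocating, whereas an open-coded `a * b` silently wraps. A minimal sketch of that check, assuming a 64-bit size_t; `open_coded_size` and `checked_array_size` are hypothetical names for illustration, not kernel functions:

```python
SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t


def open_coded_size(n, size):
    """Mimic `n * size` in C size_t arithmetic, which silently wraps."""
    return (n * size) & SIZE_MAX


def checked_array_size(n, size):
    """Return n * size, or None if the product does not fit in size_t.

    Hypothetical helper illustrating the check that array-style
    allocators perform before allocating.
    """
    total = n * size  # Python integers do not overflow
    if total > SIZE_MAX:
        return None  # the allocator would fail instead of under-allocating
    return total
```

With n = 2**63 and size = 4, the open-coded product wraps to 0 while the checked variant reports failure.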
Signed-off-by: Kees Cook <keescook@chromium.org>
This patch adds OEM commands and response handling. It also defines OEM
command and response structure as per NCSI specification along with its
handlers.
ncsi_cmd_handler_oem: This is a generic command request handler for OEM
commands
ncsi_rsp_handler_oem: This is a generic response handler for OEM commands
Signed-off-by: Vijay Khemka <vijaykhemka@fb.com>
Reviewed-by: Justin Lee <justin.lee1@dell.com>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
move_addr_to_kernel() returns only negative values on error, or zero on
success. Rewrite the error check to an idiomatic form to avoid confusing
the reader.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A number of TC attributes are processed without proper validation
(e.g., length checks). Add a tca policy for all input attributes and use
when invoking nlmsg_parse.
The 2 Fixes tags below cover the latest additions. The other attributes
are a string (KIND), nested attribute (OPTIONS which does seem to have
validation in most cases), for dumps only or a flag.
Fixes: 5bc1701881e39 ("net: sched: introduce multichain support for filters")
Fixes: d47a6b0e7c492 ("net: sched: introduce ingress/egress block index attributes for qdisc")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, rtnl_fdb_dump() assumes the family header is 'struct ifinfomsg',
which is not always true -- 'struct ndmsg' is used by iproute2 ('ip neigh').
The problem is, the function bails out early if nlmsg_parse() fails, which
does occur for iproute2 usage of 'struct ndmsg' because the payload length
is shorter than the family header alone (as 'struct ifinfomsg' is assumed).
This breaks backward compatibility with userspace -- nothing is sent back.
Some examples with iproute2 and netlink library for go [1]:
1) $ bridge fdb show
33:33:00:00:00:01 dev ens3 self permanent
01:00:5e:00:00:01 dev ens3 self permanent
33:33:ff:15:98:30 dev ens3 self permanent
This one works, as it uses 'struct ifinfomsg'.
fdb_show() @ iproute2/bridge/fdb.c
"""
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
...
if (rtnl_dump_request(&rth, RTM_GETNEIGH, [...]
"""
2) $ ip --family bridge neigh
RTNETLINK answers: Invalid argument
Dump terminated
This one fails, as it uses 'struct ndmsg'.
do_show_or_flush() @ iproute2/ip/ipneigh.c
"""
.n.nlmsg_type = RTM_GETNEIGH,
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
"""
3) $ ./neighlist
< no output >
This one fails, as its request is based on 'struct ndmsg'.
neighList() @ netlink/neigh_linux.go
"""
req := h.newNetlinkRequest(unix.RTM_GETNEIGH, [...]
msg := Ndmsg{
"""
The actual breakage was introduced by commit 0ff50e83b512 ("net: rtnetlink:
bail out from rtnl_fdb_dump() on parse error"), because nlmsg_parse() fails
if the payload length (with the _actual_ family header) is less than the
family header length alone (which is assumed, in parameter 'hdrlen').
This is true in the examples above with struct ndmsg, with size and payload
length shorter than struct ifinfomsg.
However, that commit just intends to fix something under the assumption the
family header is indeed a 'struct ifinfomsg' - by preventing access to the
payload as such (via 'ifm' pointer) if the payload length is not sufficient
to actually contain it.
The assumption was introduced by commit 5e6d24358799 ("bridge: netlink dump
interface at par with brctl"), to support iproute2's 'bridge fdb' command
(not 'ip neigh') which indeed uses 'struct ifinfomsg', thus is not broken.
So, in order to unbreak the 'struct ndmsg' family headers and still allow
'struct ifinfomsg' to continue to work, check for the known message sizes
used with 'struct ndmsg' in iproute2 (with zero or one attribute which is
not used in this function anyway) then do not parse the data as ifinfomsg.
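The size arithmetic behind the failure and the workaround can be sketched as follows. This is an illustration only: the struct layouts follow the uapi headers, the helper names are hypothetical, and the "one attribute" case is assumed to carry a 4-byte value.

```python
import struct

NLMSG_HDRLEN = 16  # sizeof(struct nlmsghdr), already 4-byte aligned
NLA_HDRLEN = 4     # sizeof(struct nlattr)

# Field layouts as in the uapi headers; "=" disables implicit padding.
SIZEOF_NDMSG = struct.calcsize("=BBHiHBB")     # 12 bytes
SIZEOF_IFINFOMSG = struct.calcsize("=BBHiII")  # 16 bytes


def nlmsg_length(payload_len):
    """Total message length for a given payload length."""
    return NLMSG_HDRLEN + payload_len


def parse_would_fail(nlmsg_len, hdrlen):
    """A message shorter than netlink header + family header is rejected."""
    return nlmsg_len < nlmsg_length(hdrlen)


def looks_like_ndmsg(nlmsg_len):
    """Size check sketch: a bare ndmsg, or ndmsg plus one u32 attribute."""
    payload = nlmsg_len - NLMSG_HDRLEN
    return payload in (SIZEOF_NDMSG, SIZEOF_NDMSG + NLA_HDRLEN + 4)
```

A request of NLMSG_LENGTH(sizeof(struct ndmsg)) is 28 bytes, which is less than the 32 bytes required when 'struct ifinfomsg' is assumed, hence the parse error.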
Same examples with this patch applied (or with the original fix reverted):
$ bridge fdb show
33:33:00:00:00:01 dev ens3 self permanent
01:00:5e:00:00:01 dev ens3 self permanent
33:33:ff:15:98:30 dev ens3 self permanent
$ ip --family bridge neigh
dev ens3 lladdr 33:33:00:00:00:01 PERMANENT
dev ens3 lladdr 01:00:5e:00:00:01 PERMANENT
dev ens3 lladdr 33:33:ff:15:98:30 PERMANENT
$ ./neighlist
netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0x0, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x1, 0x0, 0x5e, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0xff, 0x15, 0x98, 0x30}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
Tested on mainline (v4.19-rc6) and net-next (3bd09b05b068).
References:
[1] netlink library for go (test-case)
https://github.com/vishvananda/netlink
$ cat ~/go/src/neighlist/main.go
package main
import ("fmt"; "syscall"; "github.com/vishvananda/netlink")
func main() {
neighs, _ := netlink.NeighList(0, syscall.AF_BRIDGE)
for _, neigh := range neighs { fmt.Printf("%#v\n", neigh) }
}
$ export GOPATH=~/go
$ go get github.com/vishvananda/netlink
$ go build neighlist
$ ~/go/src/neighlist/neighlist
Thanks to David Ahern for suggestions to improve this patch.
Fixes: 0ff50e83b512 ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error")
Fixes: 5e6d24358799 ("bridge: netlink dump interface at par with brctl")
Reported-by: Aidan Obley <aobley@pivotal.io>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case ip_fib_metrics_init() returns an error, we had better
reset rt->fib6_metrics to &dst_default_metrics so that
we do not crash later in ip_fib_metrics_put().
Fixes: 767a2217533f ("net: common metrics init helper for FIB entries")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid the socket lookup cost in udp_gro_receive if no socket has a
udp tunnel callback configured.
udp_sk(sk)->gro_receive requires a registration with
setup_udp_tunnel_sock, which enables the static key.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following Sparse warnings:
net/bpfilter/bpfilter_kern.c:62:21: warning: cast removes address space
of expression
net/bpfilter/bpfilter_kern.c:101:49: warning: Using plain integer as
NULL pointer
Signed-off-by: Shanthosh RK <shanthosh.rk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The newly introduced gss_seq_send64_fetch_and_inc() fails to build on
32-bit architectures:
net/sunrpc/auth_gss/gss_krb5_seal.c:144:14: note: in expansion of macro 'cmpxchg'
seq_send = cmpxchg(&ctx->seq_send64, old, old + 1);
^~~~~~~
arch/x86/include/asm/cmpxchg.h:128:3: error: call to '__cmpxchg_wrong_size' declared with attribute error: Bad argument size for cmpxchg
__cmpxchg_wrong_size(); \
As the message tells us, cmpxchg() cannot be used on 64-bit arguments
on these architectures; that is what cmpxchg64() is for.
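For readers unfamiliar with the pattern, a lockless fetch-and-increment retries a compare-and-exchange until no other updater has raced with it. The sketch below is a toy model only: Python has no hardware compare-and-exchange, so a lock emulates it, and `Atomic64` and `fetch_and_inc` are hypothetical names rather than the kernel code.

```python
import threading

MASK64 = 2**64 - 1


class Atomic64:
    """Toy stand-in for a 64-bit atomic; a lock emulates the hardware CAS."""

    def __init__(self, value=0):
        self._value = value & MASK64
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def cmpxchg64(self, old, new):
        """Store `new` if the current value equals `old`; return the value seen."""
        with self._lock:
            seen = self._value
            if seen == old:
                self._value = new & MASK64
            return seen


def fetch_and_inc(counter):
    """Retry until the compare-and-exchange observes an unchanged value."""
    old = counter.read()
    while True:
        seen = counter.cmpxchg64(old, old + 1)
        if seen == old:
            return old  # our increment was stored
        old = seen      # somebody else won the race; retry with their value
```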
Fixes: 571ed1fd2390 ("SUNRPC: Replace krb5_seq_lock with a lockless scheme")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Fix the rxrpc_data_ready() function to pick up all packets and to not miss
any. There are two problems:
(1) The sk_data_ready pointer on the UDP socket is set *after* it is
bound. This means that it's open for business before we're ready to
dequeue packets, and a tiny window exists in which a packet can
sneak onto the receive queue without us ever knowing about it.
Fix this by setting the pointers on the socket prior to binding it.
(2) skb_recv_udp() will return an error (such as ENETUNREACH) if there was
an error on the transmission side, even though we set the
sk_error_report hook. Because rxrpc_data_ready() returns immediately
in such a case, it never actually removes its packet from the receive
queue.
Fix this by abstracting out the UDP dequeuing and checksumming into a
separate function that keeps hammering on skb_recv_udp() until it
returns -EAGAIN, passing the packets extracted to the remainder of the
function.
and two potential problems:
(3) It might be possible in some circumstances or in the future for
packets to be added to the UDP receive queue whilst rxrpc is
consuming them, so the data_ready() handler might get called
less often than once per packet.
Allow for this by fully draining the queue on each call as (2).
(4) If a packet fails the checksum check, the code currently returns after
discarding the packet without checking for more.
Allow for this by fully draining the queue on each call as (2).
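The drain-until-empty approach described in (2) to (4) can be sketched as a loop that stops only on -EAGAIN. This is a toy model, not the rxrpc code: `FakeUdpQueue` and `drain` are hypothetical, and errors are modelled as negative errno values sitting in the queue.

```python
import errno
from collections import deque


class FakeUdpQueue:
    """Hypothetical stand-in for the UDP receive queue read by skb_recv_udp()."""

    def __init__(self, items):
        self._items = deque(items)

    def recv(self):
        """Return a packet dict, or a negative errno as the kernel call would."""
        if not self._items:
            return -errno.EAGAIN
        return self._items.popleft()


def drain(queue, handle_packet):
    """Keep dequeuing until -EAGAIN; errors and bad checksums do not stop the loop."""
    handled = 0
    while True:
        item = queue.recv()
        if item == -errno.EAGAIN:
            return handled  # queue is empty
        if isinstance(item, int):
            continue        # transmission-side error such as -ENETUNREACH
        if not item["csum_ok"]:
            continue        # discard, but keep looking for more packets
        handle_packet(item)
        handled += 1
```

Because the loop only returns on -EAGAIN, a single data_ready() notification is enough to consume every queued packet.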
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Fix some refs to init_net that should've been changed to the appropriate
network namespace.
Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing")
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
In commit ec3ed293e766 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
we move fl_hw_destroy_tmplt() to a workqueue to avoid blocking
with the spinlock held. Unfortunately, this causes a lot of
trouble here:
1. tcf_chain_destroy() could be called right after we queue the work
but before the work runs. This is a use-after-free.
2. The chain refcnt is already 0, we can't even just hold it again.
We can check refcnt==1 but it is ugly.
3. The chain with refcnt 0 is still visible in its block, which means
it could be still found and used!
4. The block has a refcnt too, we can't hold it without introducing a
proper API either.
We can make it work but the end result is ugly. Instead of wasting
time on reviewing it, let's just convert the troubling spinlock to
a mutex, which allows us to use non-atomic allocations too.
Fixes: ec3ed293e766 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
Reported-by: Ido Schimmel <idosch@idosch.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we now do not allow ethtool to deactivate the queue id we are
running an AF_XDP socket on, we can simplify the implementation of
xdp_clear_umem_at_qid().
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
We already check the RSS indirection table does not use queues which
would be disabled by channel reconfiguration. Make sure user does not
try to disable queues which have a UMEM and zero-copy AF_XDP socket
installed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
ethtool_set_channels() validates the config against driver's max
settings. It retrieves the current config and stores it in a
variable called max. This was okay when only max settings were
accessed but we will soon want to access current settings as
well, so calling the entire structure max makes the code less
readable.
While at it, drop unnecessary parentheses.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Previously, the xsk code did not record which umem was bound to a
specific queue id. This was not required if all drivers were zero-copy
enabled as this had to be recorded in the driver anyway. So if a user
tried to bind two umems to the same queue, the driver would say
no. But if copy-mode was first enabled and then zero-copy mode (or the
reverse order), we mistakenly enabled both of them on the same umem
leading to buggy behavior. The main culprit for this is that we did
not store the association of umem to queue id in the copy case and
only relied on the driver reporting this. As this relation was not
stored in the driver for copy mode (it does not rely on the AF_XDP
NDOs), this obviously could not work.
This patch fixes the problem by always recording the umem to queue id
relationship in the netdev_queue and netdev_rx_queue structs. This way
we always know what kind of umem has been bound to a queue id and can
act appropriately at bind time.
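The idea of always recording the binding can be illustrated with a small table keyed by queue id. This is a sketch under assumptions: `QueueUmemRegistry` is a hypothetical stand-in for the state kept in the netdev queue structs, and the exception stands in for an error return at bind time.

```python
class QueueUmemRegistry:
    """Hypothetical per-device table mapping queue id -> bound umem."""

    def __init__(self, num_queues):
        self._bound = [None] * num_queues

    def bind(self, queue_id, umem):
        """Record the binding; refuse a queue that already has a umem."""
        if not 0 <= queue_id < len(self._bound):
            raise ValueError("queue id out of range")
        if self._bound[queue_id] is not None:
            # Applies to copy mode and zero-copy mode alike.
            raise RuntimeError("queue already has a umem bound")
        self._bound[queue_id] = umem

    def unbind(self, queue_id):
        """Forget the binding so the queue can be reused."""
        self._bound[queue_id] = None

    def umem_at(self, queue_id):
        """Return the umem bound to the queue, or None."""
        return self._bound[queue_id]
```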
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Move the attribute parsing from neigh_dump_table to neigh_dump_info, and
pass the filter arguments down to neigh_dump_table in a new struct. Add
the filter option to proxy neigh dumps as well to make them consistent.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we use a raw socket as the vhost backend, a packet from virtio with
gso offloading information cannot be sent out, because it fails the later
validation in the xmit path: we did not set the correct skb->protocol,
which is further used for looking up the gso function.
To fix this, we set this field according to the virtio hdr information.
Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
Signed-off-by: Jianfeng Tan <jianfeng.tan@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the refcounting and potential free of dst metrics
for ipv4 and ipv6 to a common helper.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4 and ipv6 both use refcounted metrics if FIB entries have metrics set.
Move the common initialization code to a helper and use for both protocols.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the refcounting and potential free of dst metrics associated
with a fib entry to a helper and use it in both ipv4 and ipv6.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consolidate initialization of ipv4 and ipv6 metrics when fib entries
are created into a single helper, ip_fib_metrics_init, that handles
the call to ip_metrics_convert.
If no metrics are defined for the fib entry, then the metrics are set
to dst_default_metrics.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Load the respective NAT helper module if the flow uses it.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This traffic scheduler allows traffic class states (transmission
allowed/not allowed, in the simplest case) to be scheduled, according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.
Example configuration:
tc qdisc replace dev enp3s0 parent root handle 100 taprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 \
base-time 1528743495910289987 \
sched-entry S 01 300000 \
sched-entry S 02 300000 \
sched-entry S 04 300000 \
clockid CLOCK_TAI
The configuration format is similar to mqprio. The main difference is
the presence of a schedule, built by multiple "sched-entry"
definitions, each entry has the following format:
sched-entry <CMD> <GATE MASK> <INTERVAL>
The only supported <CMD> is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK>
is a bitmask where each bit is associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry. <INTERVAL> is a time duration
in nanoseconds that specifies for how long that state defined by <CMD>
and <GATE MASK> should be held before moving to the next entry.
This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.
The other parameters can be defined as follows:
- base-time: specifies the instant when the schedule starts, if
'base-time' is a time in the past, the schedule will start at
base-time + (N * cycle-time)
where N is the smallest integer so the resulting time is greater
than "now", and "cycle-time" is the sum of all the intervals of the
entries in the schedule;
- clockid: specifies the reference clock to be used;
The parameters should be similar to what the IEEE 802.1Q family of
specifications defines.
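The cycle-time and base-time rules above can be worked through with a short sketch. The function names are hypothetical, times are in nanoseconds, and the gate mask is assumed to be written in hexadecimal as in the example configuration.

```python
def cycle_time(intervals_ns):
    """Cycle time is the sum of all sched-entry intervals."""
    return sum(intervals_ns)


def schedule_start(base_time, now, intervals_ns):
    """Return base_time + N * cycle_time for the smallest N >= 0 landing after `now`."""
    if base_time > now:
        return base_time  # base-time is still in the future
    cycle = cycle_time(intervals_ns)
    n = (now - base_time) // cycle + 1
    return base_time + n * cycle


def open_classes(gate_mask_hex):
    """Decode a gate mask into traffic class indices (bit 0 is class 0)."""
    mask = int(gate_mask_hex, 16)  # assumption: the mask is written in hex
    return [tc for tc in range(mask.bit_length()) if mask & (1 << tc)]
```

For the example configuration, the three 300000 ns entries give a cycle time of 900000 ns, and masks 01, 02 and 04 open traffic classes 0, 1 and 2 in turn.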
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
msix_vec_per_pf_min - This param sets the minimum number of MSIX
vectors required for the device initialization. This value is set
in the device which limits MSIX vectors per PF.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
msix_vec_per_pf_max - This param sets the number of MSIX vectors
that the device requests from the host on driver initialization.
This value is set in the device which is applicable per PF.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ignore_ari - Device ignores the ARI (Alternative Routing-ID Interpretation)
capability, even when the platform supports it, and creates the same number
of partitions as when the platform does not support the ARI capability.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mac80211-for-davem-2018-10-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just three small fixes:
* fix use-after-free in regulatory code
* fix rx-mgmt key flag in AP mode (mac80211)
* fix wireless extensions compat code memory leak
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-updates-2018-10-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2018-10-03
mlx5 core driver and ethernet netdev updates. Please note there is a small
devlink-related update to allow an extack argument to eswitch operations.
From Eli Britstein,
1) devlink: Add extack argument to the eswitch related operations
2) net/mlx5e: E-Switch, return extack messages for failures in the e-switch devlink callbacks
3) net/mlx5e: Add extack messages for TC offload failures
From Eran Ben Elisha,
4) mlx5e: Add counter for aRFS rule insertion failures
From Feras Daoud
5) Fast teardown support for mlx5 device
This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the FW pages.
Fast teardown provides the following advantages:
1- Fix a FW race condition that could cause command timeout
2- Avoid moving to polling mode
3- Close the vport to prevent PCI ACKs from being sent without the data
being scattered to memory
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'rxrpc-next-20181004' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Development
Here are some development patches for AF_RXRPC. The most significant points
are:
(1) Change the tracepoint that indicates a packet has been transmitted
into one that indicates a packet is about to be transmitted. Without
this, the response tracepoint may occur first if the round trip is
fast enough.
(2) Sort out AFS address list handling to better enforce maximum capacity,
to use helper functions to fill them and to do an insertion sort to
order them. This is here to make (3) easier.
(3) Keep AF_INET addresses as AF_INET addresses rather than converting
them to AF_INET6 in both AF_RXRPC and kAFS. I hadn't realised that a
UDP6 socket would just call down into UDP4 if given an AF_INET
address.
(4) Allow the timestamp on the first DATA packet of a reply to be
retrieved by a kernel service. This will give kAFS a more
accurate base from which to calculate the callback promise expiration.
(5) Allow the rxrpc protocol epoch value to be retrieved from an incoming
call. This will allow kAFS to determine if the fileserver restarted
and if two addresses apparently assigned to the same fileserver
actually are different boxes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the DNS resolver to retrieve a set of servers and their associated
addresses, ports, preference and weight ratings.
In terms of communication with userspace, "srv=1" is added to the callout
string (the '1' indicating the maximum data version supported by the
kernel) to ask the userspace side for this.
If the userspace side doesn't recognise it, it will ignore the option and
return the usual text address list.
If the userspace side does recognise it, it will return some binary data
that begins with a zero byte that would cause the string parsers to give an
error. The second byte gives the content type and the third byte the
version of the data in the blob (this may be between 1 and the version
specified in the callout data). The remainder of the payload is
version-specific.
In version 1, the payload looks like (note that this is packed):
u8 Non-string marker (ie. 0)
u8 Content (0 => Server list)
u8 Version (ie. 1)
u8 Source (eg. DNS_RECORD_FROM_DNS_SRV)
u8 Status (eg. DNS_LOOKUP_GOOD)
u8 Number of servers
foreach-server {
u16 Name length (LE)
u16 Priority (as per SRV record) (LE)
u16 Weight (as per SRV record) (LE)
u16 Port (LE)
u8 Source (eg. DNS_RECORD_FROM_NSS)
u8 Status (eg. DNS_LOOKUP_GOT_NOT_FOUND)
u8 Protocol (eg. DNS_SERVER_PROTOCOL_UDP)
u8 Number of addresses
char[] Name (not NUL-terminated)
foreach-address {
u8 Family (AF_INET{,6})
union {
u8[4] ipv4_addr
u8[16] ipv6_addr
}
}
}
This can then be used to fetch a whole cell's VL-server configuration for
AFS, for example.
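To make the packed version 1 layout concrete, here is a minimal parsing sketch. It is an illustration, not the kernel or userspace implementation: `parse_server_list` is a hypothetical name, the address-family numbers are assumed to be the Linux values, and bounds checking is left to struct and slicing.

```python
import struct

AF_INET, AF_INET6 = 2, 10  # assumption: Linux address-family values
ADDR_LEN = {AF_INET: 4, AF_INET6: 16}


def parse_server_list(blob):
    """Parse a version 1 server-list payload into plain dictionaries."""
    marker, content, version, source, status, nr_servers = struct.unpack_from("<6B", blob, 0)
    if marker != 0 or content != 0 or version != 1:
        raise ValueError("not a version 1 server list")
    offset = 6
    servers = []
    for _ in range(nr_servers):
        # Four little-endian u16 fields followed by four u8 fields.
        (name_len, priority, weight, port,
         src, stat, proto, nr_addrs) = struct.unpack_from("<4H4B", blob, offset)
        offset += 12
        name = blob[offset:offset + name_len].decode("ascii")  # not NUL-terminated
        offset += name_len
        addrs = []
        for _ in range(nr_addrs):
            family = blob[offset]
            size = ADDR_LEN[family]  # 4 bytes for IPv4, 16 for IPv6
            addrs.append((family, bytes(blob[offset + 1:offset + 1 + size])))
            offset += 1 + size
        servers.append({"name": name, "priority": priority, "weight": weight,
                        "port": port, "source": src, "status": stat,
                        "protocol": proto, "addrs": addrs})
    return {"source": source, "status": status, "servers": servers}
```

The header is six bytes and each server record starts with a fixed twelve-byte prefix, so the only variable-length parts are the name and the address list.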
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the epoch value to be queried on a server connection. This is in the
rxrpc header of every packet for use in routing and is derived from the
client's state. It's also not supposed to change unless the client gets
restarted.
AFS can make use of this information to deduce whether a fileserver has
been restarted because the fileserver makes client calls to the filesystem
driver's cache manager to send notifications (ie. callback breaks) about
conflicting changes from other clients. These convey the fileserver's own
epoch value back to the filesystem.
Signed-off-by: David Howells <dhowells@redhat.com>
Allow the timestamp on the sk_buff holding the first DATA packet of a reply
to be queried. This can then be used as a base for the expiry time
calculation on the callback promise duration indicated by an operation
result.
Signed-off-by: David Howells <dhowells@redhat.com>
Stephen Rothwell reports the following link failure with IPv6 as module:
x86_64-linux-gnu-ld: net/core/filter.o: in function `sk_lookup':
(.text+0x19219): undefined reference to `__udp6_lib_lookup'
Fix the build by only enabling the IPv6 socket lookup if IPv6 support is
compiled into the kernel.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
rxrpc_extract_addr_from_skb() doesn't use the argument that points to the
local endpoint, so remove the argument.
Signed-off-by: David Howells <dhowells@redhat.com>
AF_RXRPC opens an IPv6 socket through which to send and receive network
packets, both IPv6 and IPv4. It currently turns AF_INET addresses into
AF_INET-as-AF_INET6 addresses based on an assumption that this was
necessary; on further inspection of the code, however, it turns out that
the IPv6 code just farms packets aimed at AF_INET addresses out to the IPv4
code.
Fix AF_RXRPC to use AF_INET addresses directly when given them.
Fixes: 7b674e390e51 ("rxrpc: Fix IPv6 support")
Signed-off-by: David Howells <dhowells@redhat.com>
Print the data Tx trace line before transmitting so that it appears before
the trace lines indicating success or failure of the transmission. This
makes the trace log less confusing.
Signed-off-by: David Howells <dhowells@redhat.com>
rxrpc_lose_skb() is now exactly the same as rxrpc_free_skb(), so remove it
and use the latter instead.
Signed-off-by: David Howells <dhowells@redhat.com>
Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net'
overlapped the renaming of a netlink attribute in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add extack argument to the eswitch related operations.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Clean up: Use the appropriate C macro instead of open-coding
container_of().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Clean up: fill in or update documenting comments for transport
switch entry points.
For xprt_rdma_allocate:
The first paragraph is no longer true since commit 5a6d1db45569
("SUNRPC: Add a transport-specific private field in rpc_rqst").
The second paragraph is no longer true since commit 54cbd6b0c6b9
("xprtrdma: Delay DMA mapping Send and Receive buffers").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
To show that a caller did attempt to allocate and post more Receive
buffers, the trace point in rpcrdma_post_recvs() should report when
rpcrdma_post_recvs() was invoked but no new Receive buffers were
posted.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Clean up: rb_flags might be used for other things besides
RPCRDMA_BUF_F_EMPTY_SCQ, so initialize it in a generic spot
instead of in a send-completion-queue-related helper.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Replace "fallthru" with a proper "fall through" annotation.
This fix is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a trivial split into lookup and insert functions, no change in
behavior.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Avoid taking the global auth_domain_lock in most lookups of the auth domain
by adding an RCU protected lookup.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Module removal is RCU safe by design, so we really have no need to
lock the 'authtab[]' array.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Clean up: This code was copied from xprtsock.c and
backchannel_rqst.c. For rpcrdma, the backchannel server runs
exclusively in process context, thus disabling bottom-halves is
unnecessary.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>