Commit 14972cbd34ff ("net: lwtunnel: Handle fragmentation") moved
fragmentation logic away from lwtunnel by carrying the encap headroom
and using it in the output MTU calculation. But the forwarding part was
not covered, which created a difference in MTU between output and
forwarding and led to silent drops on the IPv4 forwarding path. Fix it
by taking the lwtunnel encap headroom into account.
The same commit also introduced a difference in how RTAX_MTU is treated
in IPv4 and IPv6, where the latter explicitly removes the lwtunnel encap
headroom from the route MTU. Make the IPv4 version do the same.
Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation")
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
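The MTU mismatch described in the lwtunnel commit above can be illustrated with a small userspace sketch. All names here (`route_sketch`, `output_mtu`, `forward_mtu_before`, `forward_mtu_after`) are hypothetical and do not correspond to kernel functions; the sketch only assumes that the usable MTU is the link MTU minus the encap headroom.

```c
#include <stdint.h>

/* Hypothetical stand-in for a route with an optional lwtunnel encap. */
struct route_sketch {
    uint32_t link_mtu;  /* MTU taken from the device or the route metric */
    uint32_t headroom;  /* bytes the encap will prepend; assumed < link_mtu */
};

/* Output path: room left for the packet once the encap header is added. */
uint32_t output_mtu(const struct route_sketch *rt)
{
    return rt->link_mtu - rt->headroom;
}

/* Forwarding path as described before the fix: headroom is ignored. */
uint32_t forward_mtu_before(const struct route_sketch *rt)
{
    return rt->link_mtu;
}

/* Forwarding path after the fix: same arithmetic as the output path. */
uint32_t forward_mtu_after(const struct route_sketch *rt)
{
    return rt->link_mtu - rt->headroom;
}
```

With a 1500-byte link MTU and 8 bytes of headroom, the pre-fix forwarding value accepts packets between 1493 and 1500 bytes that the output path cannot carry, which is the window where silent drops would occur.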
The goal is to keep the mark during a bpf_redirect(), like it is done for
legacy encapsulation / decapsulation, when there is no x-netns.
This was initially done in commit 213dd74aee76 ("skbuff: Do not scrub skb
mark within the same name space").
When the call to skb_scrub_packet() was added in dev_forward_skb() (commit
8b27f27797ca ("skb: allow skb_scrub_packet() to be used by tunnels")), the
second argument (xnet) was set to true to force a call to skb_orphan(). At
that time, the mark was always cleaned up by skb_scrub_packet(), whatever
the xnet value was.
This call to skb_orphan() was removed later in commit
9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.").
But this 'true' stayed here without any real reason.
Let's correctly set xnet in ____dev_forward_skb(), this function has access
to the previous interface and to the new interface.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
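A minimal userspace sketch of the xnet decision described in the commit above. The struct and function names are hypothetical, and the scrub helper models only the mark handling, not the other state the real skb_scrub_packet() resets.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a network namespace, a device and an skb. */
struct net_sketch { int id; };
struct dev_sketch { const struct net_sketch *net; };
struct skb_sketch { const struct dev_sketch *dev; unsigned int mark; };

/* Models only the mark: it is cleared when crossing namespaces. */
void scrub_packet_sketch(struct skb_sketch *skb, bool xnet)
{
    if (!xnet)
        return;
    skb->mark = 0;
}

/* Derive xnet from the previous and the new interface instead of
 * hard-coding true. */
void forward_skb_sketch(struct skb_sketch *skb, const struct dev_sketch *to)
{
    bool xnet = skb->dev->net != to->net;

    scrub_packet_sketch(skb, xnet);
    skb->dev = to;
}
```

Forwarding between two devices in the same namespace keeps the mark; forwarding into a different namespace clears it.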
priv->cbs is an array of priv->info->num_cbs_shapers elements of type
struct sja1105_cbs_entry which only get allocated if CONFIG_NET_SCH_CBS
is enabled.
However, sja1105_reload_cbs() is called from sja1105_static_config_reload()
which in turn is called for any of the items in sja1105_reset_reasons,
therefore during the normal runtime of the driver and not just from a
code path which can be triggered by the tc-cbs offload.
The sja1105_reload_cbs() function does not contain a check whether the
priv->cbs array is NULL or not, it just assumes it isn't and proceeds to
iterate through the credit-based shaper elements. This leads to a NULL
pointer dereference.
The solution is to return success if the priv->cbs array has not been
allocated, since sja1105_reload_cbs() has nothing to do.
Fixes: 4d7525085a9b ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
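The early return described in the sja1105 commit above could look roughly like the following userspace sketch. The types and the `programmed` field are hypothetical placeholders; only the shape of the NULL check reflects the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's private data. */
struct cbs_entry_sketch { int port; int prio; int programmed; };
struct priv_sketch {
    struct cbs_entry_sketch *cbs;  /* NULL when the shaper offload is not built */
    int num_cbs_shapers;
};

/* Returns 0 on success. With no array allocated there is nothing to do. */
int reload_cbs_sketch(struct priv_sketch *priv)
{
    int i;

    if (!priv->cbs)
        return 0;

    for (i = 0; i < priv->num_cbs_shapers; i++)
        priv->cbs[i].programmed = 1;  /* placeholder for the hardware write */

    return 0;
}
```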
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2021-06-24
An update from ieee802154 for your *net* tree.
This time we only have fixes for ieee802154 hwsim driver.
Sparked by some syzkaller reports, we got a potential
crash fix from Eric Dumazet and two memory leak fixes from
Dongliang Mu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This completes commit def4ec6dce393e ("e1000e: PCIm function state support").
Check the PCIm state only on CSME systems. There is no point in doing this
check on non-CSME systems.
This patch fixes the generation of a false-positive warning:
"Error in exiting dmoff"
Fixes: def4ec6dce39 ("e1000e: PCIm function state support")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-06-24
This series contains updates to i40e driver only.
Dinghao Liu corrects error handling for failed i40e_vsi_request_irq()
call.
Mateusz allows for disabling of autonegotiation for all BaseT media.
Jesse corrects the multiplier being used on 5Gb speeds for PTP.
Jan adds locking in paths calling i40e_setup_pf_switch() that were
missing it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
First problem is that optlen is fetched without checking
there is more than one byte to parse.
Fix this by taking care of IPV6_TLV_PAD1 before
fetching optlen (under appropriate sanity checks against len)
Second problem is that IPV6_TLV_PADN checks of zero
padding are performed before the check of remaining length.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Fixes: c1412fce7ecc ("net/ipv6/exthdrs.c: Strict PadN option checking")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
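The ordering of checks described in the IPv6 TLV commit above can be sketched as a standalone validator. This is an illustration under assumptions, not the kernel parser: the function name and the `TLV_*_SKETCH` constants are hypothetical, and it omits the other strict PadN rules the kernel applies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLV_PAD1_SKETCH 0  /* single-byte option, has no length field */
#define TLV_PADN_SKETCH 1  /* type, length, then `length` zero bytes */

/* Walk a TLV option area and reject anything that would read past len. */
bool tlv_options_valid(const uint8_t *buf, size_t len)
{
    size_t off = 0;

    while (off < len) {
        size_t optlen, i;

        /* Handle Pad1 first: it is one byte, so no optlen may be fetched. */
        if (buf[off] == TLV_PAD1_SKETCH) {
            off++;
            continue;
        }

        /* Every other option needs at least a type and a length byte. */
        if (len - off < 2)
            return false;

        optlen = (size_t)buf[off + 1] + 2;

        /* Check the remaining length before looking at the payload. */
        if (optlen > len - off)
            return false;

        if (buf[off] == TLV_PADN_SKETCH) {
            for (i = 2; i < optlen; i++)
                if (buf[off + i] != 0)
                    return false;
        }

        off += optlen;
    }

    return true;
}
```

The key point is that both the length byte and the padding bytes are only read after the remaining length has been verified.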
Antoine Tenart says:
====================
net: macsec: fix key length when offloading
The key length used to copy the key to offloading drivers and to store
it is wrong and was working by chance as it matched the default key
length. But using a different key length fails. Fix it by using instead
the max length accepted in uAPI to store the key and the actual key
length when copying it.
This was tested on the MSCC PHY driver but not on the Atlantic MAC
(looking at the code it looks ok, but testing would be appreciated).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The key length used to store the macsec key was set to MACSEC_KEYID_LEN
(16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.
Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in
uAPI).
Fixes: 27736563ce32 ("net: atlantic: MACSec egress offload implementation")
Fixes: 9ff40a751a6f ("net: atlantic: MACSec ingress offload implementation")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The key length used to store the macsec key was set to MACSEC_KEYID_LEN
(16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.
Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in
uAPI).
Fixes: 28c5107aa904 ("net: phy: mscc: macsec support")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The key length used when offloading macsec to Ethernet or PHY drivers
was set to MACSEC_KEYID_LEN (16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.
Fix this by using MACSEC_MAX_KEY_LEN to store the key (the max length
accepted in uAPI) and secy->key_len to copy it.
Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
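The three macsec key length commits above share one idea: size the storage for the largest key the uAPI accepts, and copy only the actual key length. A minimal sketch, assuming hypothetical names and an illustrative maximum of 128 bytes (the real limit is whatever MACSEC_MAX_KEY_LEN is defined as):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEYID_LEN_SKETCH   16   /* size of a key identifier, not of a key */
#define MAX_KEY_LEN_SKETCH 128  /* illustrative upper bound, an assumption */

/* Hypothetical offload context: the buffer is sized for the largest key. */
struct offload_sa_sketch {
    uint8_t key[MAX_KEY_LEN_SKETCH];
    size_t key_len;
};

/* Copy exactly key_len bytes; returns -1 if the key cannot fit. */
int offload_store_key(struct offload_sa_sketch *sa,
                      const uint8_t *key, size_t key_len)
{
    if (key_len > MAX_KEY_LEN_SKETCH)
        return -1;

    memset(sa->key, 0, sizeof(sa->key));
    memcpy(sa->key, key, key_len);
    sa->key_len = key_len;
    return 0;
}
```

A 32-byte key is stored in full here, whereas a buffer or copy sized by the 16-byte key identifier length would truncate it.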
Merge tag 'linux-can-fixes-for-5.13-20210624' of git://git.kernel.org/
pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2021-06-24
This is a pull request of 2 patches for net/master.
The first patch is by Norbert Slusarek and prevents allocation of a
filter for optlen == 0 in the j1939 CAN protocol.
The last patch is by Stephane Grosjean and fixes a potential
starvation in the TX path of the peak_pciefd driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sukadev Bhattiprolu says:
====================
ibmvnic: Assorted bug fixes
Assorted bug fixes that we tested over the last several weeks.
Thanks to Brian King, Cris Forno, Dany Madden and Rick Lindsley for
reviews and help with testing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Parenthesize a check to be more explicit and to fix a sparse warning
seen on some distros.
Fixes: 91dc5d2553fbf ("ibmvnic: fix miscellaneous checks")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Free tx_pool and clear it, if allocation of tso_pool fails.
release_tx_pools() assumes we have both tx and tso_pools if ->tx_pool is
non-NULL. If allocation of tso_pool fails in init_tx_pools(), the assumption
will not be true and we would end up dereferencing ->tx_buff, ->free_map
fields from a NULL pointer.
Fixes: 3205306c6b8d ("ibmvnic: Update TX pool initialization routine")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
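The invariant described in the ibmvnic tx_pool commit above (a non-NULL tx_pool implies a valid tso_pool) can be sketched in userspace as follows. The names and the `fail_tso` parameter, which simulates an allocation failure, are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the adapter and its pools. */
struct pool_sketch { int num_buffers; };
struct adapter_sketch {
    struct pool_sketch *tx_pool;
    struct pool_sketch *tso_pool;
};

/* fail_tso simulates a failed tso_pool allocation for illustration. */
int init_tx_pools_sketch(struct adapter_sketch *a, size_t n, int fail_tso)
{
    a->tx_pool = calloc(n, sizeof(*a->tx_pool));
    if (!a->tx_pool)
        return -1;

    a->tso_pool = fail_tso ? NULL : calloc(n, sizeof(*a->tso_pool));
    if (!a->tso_pool) {
        /* Free and clear tx_pool so the release path sees no pools. */
        free(a->tx_pool);
        a->tx_pool = NULL;
        return -1;
    }

    return 0;
}

/* Relies on the invariant: tx_pool non-NULL means tso_pool is valid too. */
void release_tx_pools_sketch(struct adapter_sketch *a)
{
    if (!a->tx_pool)
        return;

    free(a->tso_pool);
    free(a->tx_pool);
    a->tso_pool = NULL;
    a->tx_pool = NULL;
}
```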
free_long_term_buff() checks ltb->buff to decide whether we have a long
term buffer to free. So set ltb->buff to NULL after freeing. While here,
also clear ->map_id, fix up some coding style and log an error.
Fixes: 9c4eaabd1bb39 ("Check CRQ command return codes")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a crash in replenish_rx_pool() when called from ibmvnic_poll()
after a previous call to replenish_rx_pool() encountered an error when
allocating a socket buffer.
Thanks to Rick Lindsley and Dany Madden for helping debug the crash.
Fixes: 4f0b6812e9b9 ("ibmvnic: Introduce batched RX buffer descriptor transmission")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We batch subordinate command response queue (scrq) descriptors that we
need to send to the VIOS using an "indirect" buffer. If, after queuing
one or more scrqs in the indirect buffer, we encounter an error (say, we
fail to allocate an skb), we leave the queued scrq descriptors in the
indirect buffer until the next call to ibmvnic_xmit().
On the next call to ibmvnic_xmit(), it is possible that the adapter is
going through a reset and it is possible that the long term buffers
have been unmapped on the VIOS side. If we proceed to flush (send) the
packets that are in the indirect buffer, we will end up using the old
map ids and this can cause the VIOS to trigger an unnecessary FATAL
error reset.
Instead of flushing packets remaining on the indirect_buff, discard
(clean) them instead.
Fixes: 0d973388185d4 ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 7c451f3ef676c805a4b77a743a01a5c21a250a73.
When a vnic interface is taken down and then up, connectivity is not
restored. We bisected it to this commit. Reverting this commit until
we can fully investigate the issue/benefit of the change.
Fixes: 7c451f3ef676 ("ibmvnic: remove duplicate napi_schedule call in open function")
Reported-by: Cristobal Forno <cforno12@linux.ibm.com>
Reported-by: Abdul Haleem <abdhalee@in.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 1c7d45e7b2c29080bf6c8cd0e213cc3cbb62a054.
We tried to optimize the number of hcalls we send and skipped sending
the REQUEST_MAP calls for some maps. However during resets, we need to
resend all the maps to the VIOS since the VIOS does not remember the
old values. In fact we may have failed over to a new VIOS which will
not have any of the mappings.
When we send packets with map ids the VIOS does not know about, it
triggers a FATAL reset. While the client does recover from the FATAL
error reset, we are seeing a large number of such resets. Handling
FATAL resets is a lot more unnecessary work than issuing a few more
hcalls so revert the commit and resend the maps to the VIOS.
Fixes: 1c7d45e7b2c ("ibmvnic: simplify reset_long_term_buff function")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent change that made i40e use new udp_tunnel infrastructure
uses a method that expects to be called under rtnl lock.
However, not all codepaths do the lock prior to calling
i40e_setup_pf_switch.
Fix that by adding additional rtnl locking and unlocking.
Fixes: 40a98cb6f01f ("i40e: convert to new udp_tunnel infrastructure")
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
As reported by Alex Sergeev, the i40e driver is incrementing the PTP
clock at 40Gb speeds when linked at 5Gb. Fix this bug by making
sure that the right multiplier is selected when linked at 5Gb.
Fixes: 3dbdd6c2f70a ("i40e: Add support for 5Gbps cards")
Cc: stable@vger.kernel.org
Reported-by: Alex Sergeev <asergeev@carbonrobotics.com>
Suggested-by: Alex Sergeev <asergeev@carbonrobotics.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Disabling autonegotiation was allowed only for 10GBaseT PHY.
The condition was changed to check if link media type is BaseT.
Fixes: 3ce12ee9d8f9 ("i40e: Fix order of checks when enabling/disabling autoneg in ethtool")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Karen Sornek <karen.sornek@intel.com>
Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When vsi->type == I40E_VSI_FDIR, we have caught the return value of
i40e_vsi_request_irq() but without further handling. Check and execute
memory clean on failure just like the other i40e_vsi_request_irq().
Fixes: 8a9eb7d3cbcab ("i40e: rework fdir setup and teardown")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Rather than just indicating that transmission can start, this patch
requires the explicit flushing of the network TX queue when the driver
is informed by the device that it can transmit, next to its
configuration.
In this way, if frames have already been written by the application,
they will actually be transmitted.
Fixes: ffd137f7043c ("can: peak/pcie_fd: remove useless code when interface starts")
Link: https://lore.kernel.org/r/20210623142600.149904-1-s.grosjean@peak-system.com
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If optval != NULL and optlen == 0 are specified for SO_J1939_FILTER in
j1939_sk_setsockopt(), memdup_sockptr() will return ZERO_PTR for 0
size allocation. The new filter will be mistakenly assigned ZERO_PTR.
This patch checks for optlen != 0 and filter will be assigned NULL in
case of optlen == 0.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Link: https://lore.kernel.org/r/20210620123842.117975-1-nslusarek@gmx.net
Signed-off-by: Norbert Slusarek <nslusarek@gmx.net>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
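A userspace sketch of the optlen check described in the j1939 commit above. The names are hypothetical and `malloc()`/`memcpy()` stand in for the kernel's duplication helper; the point is only that a zero-length request yields a NULL filter pointer rather than the result of a zero-size allocation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical filter record. */
struct filter_sketch { unsigned int id; unsigned int mask; };

/* Returns 0 on success. *out is NULL and *count is 0 when there are no
 * filters. The caller owns *out and must free() it. */
int set_filters_sketch(const void *optval, size_t optlen,
                       struct filter_sketch **out, size_t *count)
{
    struct filter_sketch *f = NULL;

    /* Only allocate when there is actually something to copy. */
    if (optval && optlen != 0) {
        if (optlen % sizeof(*f) != 0)
            return -1;
        f = malloc(optlen);
        if (!f)
            return -1;
        memcpy(f, optval, optlen);
    }

    *out = f;
    *count = f ? optlen / sizeof(*f) : 0;
    return 0;
}
```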
Daniel Borkmann says:
====================
pull-request: bpf 2021-06-23
The following pull-request contains BPF updates for your *net* tree.
We've added 14 non-merge commits during the last 6 day(s) which contain
a total of 13 files changed, 137 insertions(+), 64 deletions(-).
Note that when you merge net into net-next, there is a small merge conflict
between 9f2470fbc4cb ("skmsg: Improve udp_bpf_recvmsg() accuracy") from bpf
with c49661aa6f70 ("skmsg: Remove unused parameters of sk_msg_wait_data()")
from net-next. Resolution is to: i) net/ipv4/udp_bpf.c: take udp_msg_wait_data()
and remove err parameter from the function, ii) net/ipv4/tcp_bpf.c: take
tcp_msg_wait_data() and remove err parameter from the function, iii) for
net/core/skmsg.c and include/linux/skmsg.h: remove the sk_msg_wait_data()
implementation and its prototype in header.
The main changes are:
1) Fix BPF poke descriptor adjustments after insn rewrite, from John Fastabend.
2) Fix regression when using BPF_OBJ_GET with non-O_RDWR flags, from Maciej Żenczykowski.
3) Various bug and error handling fixes for UDP-related sock_map, from Cong Wang.
4) Fix patching of vmlinux BTF IDs with correct endianness, from Tony Ambardar.
5) Two fixes for TX descriptor validation in AF_XDP, from Magnus Karlsson.
6) Fix overflow in size calculation for bpf_map_area_alloc(), from Bui Quang Minh.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
I see no reason why max_dst_opts_cnt and max_hbh_opts_cnt
are fetched from the initial net namespace.
The other sysctls (max_dst_opts_len & max_hbh_opts_len)
are in fact already using the current ns.
Note: it is not clear why ipv6_destopt_rcv() uses two ways to
get to the netns:
1) dev_net(dst->dev)
Originally used to increment IPSTATS_MIB_INHDRERRORS
2) dev_net(skb->dev)
Tom used this variant in his patch.
Maybe this calls for using ipv6_skb_net() instead?
Fixes: 47d3d7ac656a ("ipv6: Implement limits on Hop-by-Hop and Destination options")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@quantonium.net>
Cc: Coco Li <lixiaoyan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Broadcom UniMAC MDIO bus from the mdio-bcm-unimac module comes up
too late, so GENET cannot find the Ethernet PHY on the UniMAC MDIO bus.
As a result, GENET fails to attach the PHY, as the following log shows:
bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000
...
could not attach to PHY
bcmgenet fd580000.ethernet eth0: failed to connect to PHY
uart-pl011 fe201000.serial: no DMA platform data
libphy: bcmgenet MII bus: probed
...
unimac-mdio unimac-mdio.-19: Broadcom UniMAC MDIO bus
This patch adds the soft dependency to load mdio-bcm-unimac module
before genet module to avoid the issue.
Fixes: 9a4e79697009 ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=213485
Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3c9ef511b9fa ("bonding: avoid adding slave device with
IFF_MASTER flag") fixed a crash when adding a slave device with
IFF_MASTER, but it also rejects nested bonding devices.
As Eric Dumazet described: since there is indeed a usage scenario for
nested bonding, we should not break it.
So add a new condition to allow nesting of bonding devices.
Fixes: 3c9ef511b9fa ("bonding: avoid adding slave device with IFF_MASTER flag")
Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Di Zhu <zhudi21@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2021-06-23
1) Don't return a mtu smaller than 1280 on IPv6 pmtu discovery.
From Sabrina Dubroca
2) Fix seqcount rcu-read side in xfrm_policy_lookup_bytype
for the PREEMPT_RT case. From Varad Gautam.
3) Remove a repeated declaration of xfrm_parse_spi.
From Shaokun Zhang.
4) IPv4 beet mode can't handle fragments, but IPv6 does.
commit 68dc022d04eb ("xfrm: BEET mode doesn't support
fragments for inner packets") handled IPv4 and IPv6
the same way. Relax the check for IPv6 because fragments
are possible here. From Xin Long.
5) Memory allocation failures are not reported for
XFRMA_ENCAP and XFRMA_COADDR in xfrm_state_construct.
Fix this by moving both cases in front of the function.
6) Fix a missing initialization in the xfrm offload fallback
fail case for bonding devices. From Ayush Sawal.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Nicolas Dichtel updates MAINTAINERS file to add Netfilter IRC channel.
2) Skip non-IPv6 packets in nft_exthdr.
3) Skip non-TCP packets in nft_osf.
4) Skip non-TCP/UDP packets in nft_tproxy.
5) Memleak in hardware offload infrastructure when counters are used
for first time in a rule.
6) The VLAN transfer routine must use FLOW_DISSECTOR_KEY_BASIC instead
of FLOW_DISSECTOR_KEY_CONTROL. Moreover, make a more robust check
for 802.1q and 802.1ad to restore simple matching on transport
protocols.
7) Fix bogus EPERM when listing a ruleset when table ownership flag
is set on.
8) Honor table ownership flag when table is referenced by handle.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The sub-programs prog->aux->poke_tab[] is populated in jit_subprogs() and
then used when emitting 'BPF_JMP|BPF_TAIL_CALL' insn->code from the
individual JITs. The poke_tab[] to use is stored in the insn->imm by
the code adding it to that array slot. The JIT then uses imm to find the
right entry for an individual instruction. In the x86 bpf_jit_comp.c
this is done by calling emit_bpf_tail_call_direct with the poke_tab[]
of the imm value.
However, we observed the below null-ptr-deref when mixing tail call
programs with subprog programs. For this to happen we just need to
mix bpf-2-bpf calls and tailcalls with some extra calls or instructions
that would be patched later by one of the fixup routines. So what's
happening?
Before the fixup_call_args() -- where the jit op is done -- various
code patching is done by do_misc_fixups(). This may increase the
insn count, for example when we patch map_lookup_up using map_gen_lookup
hook. This does two things. First, it means the instruction index,
insn_idx field, of a tail call instruction will move by a 'delta'.
In verifier code,
struct bpf_jit_poke_descriptor desc = {
.reason = BPF_POKE_REASON_TAIL_CALL,
.tail_call.map = BPF_MAP_PTR(aux->map_ptr_state),
.tail_call.key = bpf_map_key_immediate(aux),
.insn_idx = i + delta,
};
Then subprog start values subprog_info[i].start will be updated
with the delta and any poke descriptor index will also be updated
with the delta in adjust_poke_descs(). If we look at
adjust_subprog_starts() though, we see it's only adjusted when the
delta occurs before the new instructions,
/* NOTE: fake 'exit' subprog should be updated as well. */
for (i = 0; i <= env->subprog_cnt; i++) {
if (env->subprog_info[i].start <= off)
continue;
Earlier subprograms are not changed because their start values
are not moved. But adjust_poke_descs() does the offset + delta
indiscriminately. The result is poke descriptors are potentially
corrupted.
Then in jit_subprogs() we only populate the poke_tab[]
when the above insn_idx is less than the next subprogram start. From
above we corrupted our insn_idx so we might incorrectly assume a
poke descriptor is not used in a subprogram omitting it from the
subprogram. And finally, when the JIT runs, it dereferences poke_tab
when emitting the instruction and crashes as shown below, because the
earlier step omitted the poke descriptor.
The fix is straightforward with the above context. Simply move the same
logic from adjust_subprog_starts() into adjust_poke_descs() and only
adjust insn_idx when needed.
[ 82.396354] bpf_testmod: version magic '5.12.0-rc2alu+ SMP preempt mod_unload ' should be '5.12.0+ SMP preempt mod_unload '
[ 82.623001] loop10: detected capacity change from 0 to 8
[ 88.487424] ==================================================================
[ 88.487438] BUG: KASAN: null-ptr-deref in do_jit+0x184a/0x3290
[ 88.487455] Write of size 8 at addr 0000000000000008 by task test_progs/5295
[ 88.487471] CPU: 7 PID: 5295 Comm: test_progs Tainted: G I 5.12.0+ #386
[ 88.487483] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
[ 88.487490] Call Trace:
[ 88.487498] dump_stack+0x93/0xc2
[ 88.487515] kasan_report.cold+0x5f/0xd8
[ 88.487530] ? do_jit+0x184a/0x3290
[ 88.487542] do_jit+0x184a/0x3290
...
[ 88.487709] bpf_int_jit_compile+0x248/0x810
...
[ 88.487765] bpf_check+0x3718/0x5140
...
[ 88.487920] bpf_prog_load+0xa22/0xf10
Fixes: a748c6975dea3 ("bpf: propagate poke descriptors to subprograms")
Reported-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
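The conditional adjustment described in the BPF poke descriptor commit above can be sketched on a plain array. The struct and function names are hypothetical, and the parameters are assumed to mean "one instruction at index off was replaced by len instructions"; only descriptors located after the patched instruction are shifted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of a poke descriptor to its instruction index. */
struct poke_desc_sketch { uint32_t insn_idx; };

/* One instruction at index off was replaced by len instructions. */
void adjust_poke_descs_sketch(struct poke_desc_sketch *tab, size_t n,
                              uint32_t off, uint32_t len)
{
    size_t i;

    if (len == 1)
        return;  /* same instruction count, nothing moved */

    for (i = 0; i < n; i++) {
        /* Descriptors at or before the patch point did not move. */
        if (tab[i].insn_idx <= off)
            continue;
        tab[i].insn_idx += len - 1;
    }
}
```

An unconditional `insn_idx += len - 1` would also shift descriptors that belong to earlier subprograms, which is the corruption the commit message describes.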
Both MAC802154_HWSIM_ATTR_RADIO_ID and MAC802154_HWSIM_ATTR_RADIO_EDGE
must be present to avoid a crash.
Fixes: f25da51fdc38 ("ieee802154: hwsim: add replacement for fakelb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Stefan Schmidt <stefan@datenfreihafen.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20210621180244.882076-1-eric.dumazet@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Whether called from hwsim_remove or hwsim_del_radio_nl, hwsim_del fails
to remove the entries in the edges list. In the example below, phy0,
phy1 and e0 will be deleted, leaving e1 not freed and still accessed in
the future.
              hwsim_phys
                  |
    ------------------------------
    |                            |
phy0 (edges)                 phy1 (edges)
   ----> e1 (idx = 1)           ----> e0 (idx = 0)
Fix this by deleting and freeing all the entries in the edges list
between hwsim_edge_unsubscribe_me and list_del(&phy->list).
Reported-by: syzbot+b80c9959009a9325cdff@syzkaller.appspotmail.com
Fixes: 1c9f4a3fce77 ("ieee802154: hwsim: fix rcu handling")
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20210616020901.2759466-1-mudongliangabcd@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
When changing number of TX queues using ethtool:
# ethtool -L eth0 tx 1
[ 135.301047] Unable to handle kernel paging request at virtual address 00000000af5d0000
[...]
[ 135.525128] Call trace:
[ 135.525142] dma_release_from_dev_coherent+0x2c/0xb0
[ 135.525148] dma_free_attrs+0x54/0xe0
[ 135.525156] k3_cppi_desc_pool_destroy+0x50/0xa0
[ 135.525164] am65_cpsw_nuss_remove_tx_chns+0x88/0xdc
[ 135.525171] am65_cpsw_set_channels+0x3c/0x70
[...]
This is because k3_cppi_desc_pool_destroy() which is called after
k3_udma_glue_release_tx_chn() in am65_cpsw_nuss_remove_tx_chns()
references struct device that is unregistered at the end of
k3_udma_glue_release_tx_chn()
Therefore the right order is to call k3_cppi_desc_pool_destroy() and
destroy desc pool before calling k3_udma_glue_release_tx_chn().
Fix this throughout the driver.
Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resetting software indexes in bcm4908_dma_alloc_buf_descs() is not
enough as it's called during device probe only. Driver resets DMA on
every .ndo_open callback and it's required to reset indexes then.
This fixes inconsistent rings state and stalled traffic after interface
down & up sequence.
Fixes: 4feffeadbcb2 ("net: broadcom: bcm4908enet: add BCM4908 controller driver")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
When doing source address validation, the flowi4 struct used for
fib_lookup should be in the reverse direction to the given skb.
fl4_dport and fl4_sport returned by fib4_rules_early_flow_dissect
should thus be swapped.
Fixes: 5a847a6e1477 ("net/ipv4: Initialize proto and ports in flow struct")
Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
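The reverse-direction lookup key described in the source validation commit above can be sketched as follows. The struct is a hypothetical reduction of a flow key to addresses and ports; the point is that ports are swapped together with the addresses.

```c
#include <stdint.h>

/* Hypothetical reduction of a flow key. */
struct flow_sketch {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Build the key for the reverse direction of a received packet. */
struct flow_sketch reverse_flow(struct flow_sketch pkt)
{
    struct flow_sketch rev;

    rev.saddr = pkt.daddr;
    rev.daddr = pkt.saddr;
    rev.sport = pkt.dport;  /* ports follow the addresses */
    rev.dport = pkt.sport;
    return rev;
}
```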
The following steps will definitely cause the kernel to crash:
ip link add vrf1 type vrf table 1
modprobe bonding.ko max_bonds=1
echo "+vrf1" >/sys/class/net/bond0/bonding/slaves
rmmod bonding
The root cause is as follows. Adding the VRF device as a slave fails,
and some cleanup work is done. Because the VRF device has the IFF_MASTER
flag, the cleanup process does not clear the IFF_BONDING flag.
Then, when we unload the bonding module, unregister_netdevice_notifier()
treats the VRF device as a bond master device and treats netdev_priv()
as struct bonding{} when it is actually struct net_vrf{}.
Analyzing the processing logic of bond_enslave(), it seems that adding
a slave device with the IFF_MASTER flag is not allowed, so we need to
add a check for this situation.
Signed-off-by: Di Zhu <zhudi21@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6c11fbf97e69 ("ip6_tunnel: add MPLS transmit support")
moved assigning inner_ipproto down from ipxip6_tnl_xmit() to
its callee ip6_tnl_xmit(). The latter is also used by GRE.
Since commit 38720352412a ("gre: Use inner_proto to obtain inner
header protocol") GRE had been depending on skb->inner_protocol
during segmentation. It sets it in gre_build_header() and reads
it in gre_gso_segment(). Changes to ip6_tnl_xmit() overwrite
the protocol, resulting in GSO skbs getting dropped.
Note that inner_protocol is a union with inner_ipproto,
GRE uses the former while the change switched it to the latter
(always setting it to just IPPROTO_GRE).
Restore the original location of skb_set_inner_ipproto(),
it is unclear why it was moved in the first place.
Fixes: 6c11fbf97e69 ("ip6_tunnel: add MPLS transmit support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
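The overwrite described in the ip6_tunnel commit above follows from the two fields sharing storage. A small sketch with a hypothetical struct shows that writing the 8-bit member changes the value read back through the 16-bit member; the exact resulting value depends on byte order.

```c
#include <stdint.h>

/* Hypothetical reduction of the two skb fields that share storage. */
struct skb_inner_sketch {
    union {
        uint16_t inner_protocol;  /* 16-bit protocol, read by GRE at GSO time */
        uint8_t  inner_ipproto;   /* 8-bit IP protocol number */
    } u;
};

/* Set the 16-bit field, then the 8-bit one, and return what the first
 * reader would now see. */
uint16_t clobber_demo(uint16_t proto, uint8_t ipproto)
{
    struct skb_inner_sketch skb;

    skb.u.inner_protocol = proto;
    skb.u.inner_ipproto = ipproto;  /* overwrites one byte of the field above */
    return skb.u.inner_protocol;
}
```

For example, storing 0x0800 and then writing 47 through the 8-bit member yields a 16-bit value that is no longer 0x0800 on either byte order.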
Mat Martineau says:
====================
mptcp: Fixes for v5.13
Here are two MPTCP fixes from Paolo.
Patch 1 fixes some possible connect-time race conditions with
MPTCP-level connection state changes.
Patch 2 deletes a duplicate function declaration.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 7896248983ef ("mptcp: add skeleton to sync msk socket
options to subflows") introduced a duplicate declaration of
mptcp_setsockopt(), just drop it.
Reported-by: Florian Westphal <fw@strlen.de>
Fixes: 7896248983ef ("mptcp: add skeleton to sync msk socket options to subflows")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The msk socket state is currently updated in a few spots without
owning the msk socket lock itself.
Some of these operations are safe, as they happen before exposing
the msk socket to user-space and can't race with other changes.
A couple of them, at connect time, can actually race with close()
or shutdown(), breaking the socket state machine.
This change addresses the issue by moving such updates under the msk
socket lock with the usual:
<acquire spinlock>
<check sk lock owners>
<ev defer to release_cb>
scheme.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/56
Fixes: 8fd738049ac3 ("mptcp: fallback in case of simultaneous connect")
Fixes: c3c123d16c0e ("net: mptcp: don't hang in mptcp_sendmsg() after TCP fallback")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On 32-bit architectures, the result of sizeof() is a 32-bit integer, so
the expression becomes a multiplication of two 32-bit integers, which
can potentially lead to integer overflow. As a result,
bpf_map_area_alloc() allocates less memory than needed.
Fix this by casting one operand to u64.
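The overflow described above can be reproduced in userspace by forcing both
operands to 32 bits. This is a sketch, not the kernel code: struct
model_bucket and the two helper functions are hypothetical, and the 16-byte
element size is an assumption chosen to make the arithmetic easy to follow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte element standing in for a hash bucket. */
struct model_bucket {
	uint64_t head;
	uint64_t lock;
};

/* Both operands are 32 bits wide, so the product wraps modulo 2^32. */
static uint64_t model_alloc_size_buggy(uint32_t n_buckets)
{
	uint32_t elem_size = (uint32_t)sizeof(struct model_bucket);

	assert(elem_size == 16);
	return n_buckets * elem_size; /* 32-bit multiply, widened too late */
}

/* Casting one operand first makes the whole multiplication 64-bit. */
static uint64_t model_alloc_size_fixed(uint32_t n_buckets)
{
	uint32_t elem_size = (uint32_t)sizeof(struct model_bucket);

	assert(elem_size == 16);
	return (uint64_t)n_buckets * elem_size;
}
```

With n_buckets = 1 << 28, the buggy variant computes 2^28 * 16 = 2^32, which
wraps to 0, while the fixed variant returns 4294967296.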
Fixes: 0d2c4f964050 ("bpf: Eliminate rlimit-based memory accounting for sockmap and sockhash maps")
Fixes: 99c51064fb06 ("devmap: Use bpf_map_area_alloc() for allocating hash buckets")
Fixes: 546ac1ffb70d ("bpf: add devmap, a map for storing net device references")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210613143440.71975-1-minhquangbui99@gmail.com
The fail path of efx_ef10_sriov_alloc_vf_vswitching is identical to the
full content of efx_ef10_sriov_free_vf_vswitching, so replace it with a
single call to efx_ef10_sriov_free_vf_vswitching.
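The general shape of that cleanup, where an allocation function's fail path
simply calls the matching free function, might look like the following
sketch. It does not reproduce the sfc driver: struct model_vf, the model_*
functions and the fail_at test hook are hypothetical, and the array is
assumed to be zero-initialised by the caller so that freeing unallocated
entries is harmless.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-VF state; not the driver's real structure. */
struct model_vf {
	void *vswitch_state;
};

/* Release everything; safe on entries that were never allocated (NULL). */
static void model_free_vf_vswitching(struct model_vf *vfs, unsigned int count)
{
	for (unsigned int i = 0; i < count; i++) {
		free(vfs[i].vswitch_state);
		vfs[i].vswitch_state = NULL;
	}
}

/*
 * Allocate state for every VF. fail_at is a test hook that simulates an
 * allocation failure at that index; pass a value >= count to disable it.
 */
static int model_alloc_vf_vswitching(struct model_vf *vfs, unsigned int count,
				     unsigned int fail_at)
{
	assert(vfs != NULL);
	for (unsigned int i = 0; i < count; i++) {
		vfs[i].vswitch_state = (i == fail_at) ? NULL : malloc(16);
		if (!vfs[i].vswitch_state)
			goto fail;
	}
	return 0;

fail:
	/* One call replaces a hand-written copy of the free logic. */
	model_free_vf_vswitching(vfs, count);
	return -ENOMEM;
}
```

The design choice is that the free function tolerates partially initialised
state, so the fail path does not need to track how far allocation got.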
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During SRIOV disabling, it is checked whether any VF is currently attached
to a guest, using the pci_vfs_assigned function. However, this check only
works with VFs attached with Xen, not with vfio/KVM. Added comments
clarifying this point.
Also, replaced the manual check of the PCI_DEV_FLAGS_ASSIGNED flag with the
helper function pci_is_dev_assigned.
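The difference between an open-coded flag test and a small helper can be
sketched in userspace as follows. Everything here is a model: struct
model_pci_dev, MODEL_DEV_FLAGS_ASSIGNED (including its bit position) and
the model_* functions are hypothetical stand-ins, not the kernel's
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the bit position is chosen for illustration only. */
#define MODEL_DEV_FLAGS_ASSIGNED (1u << 2)

/* Hypothetical device with a flags word. */
struct model_pci_dev {
	unsigned int dev_flags;
};

/* Helper that hides the mask-and-compare from callers. */
static bool model_is_dev_assigned(const struct model_pci_dev *pdev)
{
	assert(pdev != NULL);
	return (pdev->dev_flags & MODEL_DEV_FLAGS_ASSIGNED) ==
	       MODEL_DEV_FLAGS_ASSIGNED;
}

/* Count how many VFs in the array carry the "assigned" flag. */
static unsigned int model_vfs_assigned(const struct model_pci_dev *vfs,
				       unsigned int count)
{
	unsigned int assigned = 0;

	for (unsigned int i = 0; i < count; i++)
		if (model_is_dev_assigned(&vfs[i]))
			assigned++;
	return assigned;
}
```

A check built this way only sees devices whose flag was actually set by
whoever attached them, which is why a flag-based test can miss attachments
made through a path that never sets the flag.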
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If SRIOV cannot be disabled during device removal or module unloading,
return an error code so it can be logged properly in the calling function.
Note that this can only happen if any VF is currently attached to a
guest using Xen, but not with vfio/KVM. Although in that case the
VFs won't work properly with the PF removed and/or the module unloaded, I
have left it as is because I don't know what side effects changing it
may have, and it also seems to be what other drivers do in this
situation.
In the case of being called during SRIOV reconfiguration, the behavior
hasn't changed because the function is called with force=false.
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If pci_remove was called for a PF with VFs, the removal of the VFs was
called twice from efx_ef10_sriov_fini: once directly with pci_driver->remove
and once implicitly by calling pci_disable_sriov, which also performs
the VF removal. This crashed the kernel on the second attempt.
Given that pci_disable_sriov already calls the pci remove function, get
rid of the direct call to pci_driver->remove from the driver.
Two different ways to trigger the bug:
- Create one or more VFs, then attach the PF to a virtual machine (at
least with qemu/KVM)
- Create one or more VFs, then remove the PF with:
echo 1 > /sys/bus/pci/devices/PF_PCI_ID/remove
Removing the sfc module does not trigger the error, at least for me, because
it removes the VF first, and then the PF.
Example of a log with the error:
list_del corruption, ffff967fd20a8ad0->next is LIST_POISON1 (dead000000000100)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:47!
[...trimmed...]
RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x4c
[...trimmed...]
Call Trace:
efx_dissociate+0x1f/0x140 [sfc]
efx_pci_remove+0x27/0x150 [sfc]
pci_device_remove+0x3b/0xc0
device_release_driver_internal+0x103/0x1f0
pci_stop_bus_device+0x69/0x90
pci_stop_and_remove_bus_device+0xe/0x20
pci_iov_remove_virtfn+0xba/0x120
sriov_disable+0x2f/0xe0
efx_ef10_pci_sriov_disable+0x52/0x80 [sfc]
? pcie_aer_is_native+0x12/0x40
efx_ef10_sriov_fini+0x72/0x110 [sfc]
efx_pci_remove+0x62/0x150 [sfc]
pci_device_remove+0x3b/0xc0
device_release_driver_internal+0x103/0x1f0
unbind_store+0xf6/0x130
kernfs_fop_write+0x116/0x190
vfs_write+0xa5/0x1a0
ksys_write+0x4f/0xb0
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>