IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
[ Upstream commit 8fc0e3b6a8 ]
Frag needed should only be sent if the header enables DF.
This fix allows packets larger than MTU to pass the xfrm interface
and be fragmented after encapsulation, aligning behavior with
non-interface xfrm.
Fixes: f203b76d78 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 4b5923249b upstream.
There are a few more bits in the GSWIP_MII_CFG register for which we
did rely on the boot-loader (or the hardware defaults) to set them up
properly.
For some external RMII PHYs we need to select the GSWIP_MII_CFG_RMII_CLK
bit and also we should un-set it for non-RMII PHYs. The
GSWIP_MII_CFG_RMII_CLK bit is ignored for other PHY connection modes.
The GSWIP IP also supports in-band auto-negotiation for RGMII PHYs when
the GSWIP_MII_CFG_RGMII_IBS bit is set. Clear this bit always as there's
no known hardware which uses this (so it is not tested yet).
Clear the xMII isolation bit when set at initialization time if it was
previously set by the bootloader. Not doing so could lead to no traffic
(neither RX nor TX) on a port with this bit set.
While here, also add the GSWIP_MII_CFG_RESET bit. We don't need to
manage it because this bit is self-clearning when set. We still add it
here to get a better overview of the GSWIP_MII_CFG register.
Fixes: 14fceff477 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Cc: stable@vger.kernel.org
Suggested-by: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Updated after the upstream commit 3e9005be87 required some changes
for Linux 5.4 ]
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 3e9005be87 upstream.
PHY auto polling on the GSWIP hardware can be used so link changes
(speed, link up/down, etc.) can be detected automatically. Internally
GSWIP reads the PHY's registers for this functionality. Based on this
automatic detection GSWIP can also automatically re-configure it's port
settings. Unfortunately this auto polling (and configuration) mechanism
seems to cause various issues observed by different people on different
devices:
- FritzBox 7360v2: the two Gbit/s ports (connected to the two internal
PHY11G instances) are working fine but the two Fast Ethernet ports
(using an AR8030 RMII PHY) are completely dead (neither RX nor TX are
received). It turns out that the AR8030 PHY sets the BMSR_ESTATEN bit
as well as the ESTATUS_1000_TFULL and ESTATUS_1000_XFULL bits. This
makes the PHY auto polling state machine (rightfully?) think that the
established link speed (when the other side is Gbit/s capable) is
1Gbit/s.
- None of the Ethernet ports on the Zyxel P-2812HNU-F1 (two are
connected to the internal PHY11G GPHYs while the other three are
external RGMII PHYs) are working. Neither RX nor TX traffic was
observed. It is not clear which part of the PHY auto polling state-
machine caused this.
- FritzBox 7412 (only one LAN port which is connected to one of the
internal GPHYs running in PHY22F / Fast Ethernet mode) was seeing
random disconnects (link down events could be seen). Sometimes all
traffic would stop after such disconnect. It is not clear which part
of the PHY auto polling state-machine cauased this.
- TP-Link TD-W9980 (two ports are connected to the internal GPHYs
running in PHY11G / Gbit/s mode, the other two are external RGMII
PHYs) was affected by similar issues as the FritzBox 7412 just without
the "link down" events
Switch to software based configuration instead of PHY auto polling (and
letting the GSWIP hardware configure the ports automatically) for the
following link parameters:
- link up/down
- link speed
- full/half duplex
- flow control (RX / TX pause)
After a big round of manual testing by various people (who helped test
this on OpenWrt) it turns out that this fixes all reported issues.
Additionally it can be considered more future proof because any
"quirk" which is implemented for a PHY on the driver side can now be
used with the GSWIP hardware as well because Linux is in control of the
link parameters.
As a nice side-effect this also solves a problem where fixed-links were
not supported previously because we were relying on the PHY auto polling
mechanism, which cannot work for fixed-links as there's no PHY from
where it can read the registers. Configuring the link settings on the
GSWIP ports means that we now use the settings from device-tree also for
ports with fixed-links.
Fixes: 14fceff477 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Fixes: 3e6fdeb28f ("net: dsa: lantiq_gswip: Let GSWIP automatically set the xMII clock")
Cc: stable@vger.kernel.org
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Move gswip_port_set_{speed, duplex, pause} calls from
gswip_phylink_mac_link_up to gswip_phylink_mac_config because the
data required for these functions is not available inside
gswip_phylink_mac_link_up yet in Linux 5.4 (it was only added with
Linux 5.7). ]
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 9adc89af72 upstream.
Currently the mentioned helper can end-up freeing the socket wmem
without waking-up any processes waiting for more write memory.
If the partially orphaned skb is attached to an UDP (or raw) socket,
the lack of wake-up can hang the user-space.
Even for TCP sockets not calling the sk destructor could have bad
effects on TSQ.
Address the issue using skb_orphan to release the sk wmem before
setting the new sock_efree destructor. Additionally bundle the
whole ownership update in a new helper, so that later other
potential users could avoid duplicate code.
v1 -> v2:
- use skb_orphan() instead of sort of open coding it (Eric)
- provide an helper for the ownership change (Eric)
Fixes: f6ba8d33cf ("netem: fix skb_orphan_partial()")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 630e4576f8 upstream.
Found by virtue of ipv6 raw sockets not honouring the per-socket
IP{,V6}_FREEBIND setting.
Based on hits found via:
git grep '[.]ip_nonlocal_bind'
We fix both raw ipv6 sockets to honour IP{,V6}_FREEBIND and IP{,V6}_TRANSPARENT,
and we fix sctp sockets to honour IP{,V6}_TRANSPARENT (they already honoured
FREEBIND), and not just the ipv6 'ip_nonlocal_bind' sysctl.
The helper is defined as:
static inline bool ipv6_can_nonlocal_bind(struct net *net, struct inet_sock *inet) {
return net->ipv6.sysctl.ip_nonlocal_bind || inet->freebind || inet->transparent;
}
so this change only widens the accepted opt-outs and is thus a clean bugfix.
I'm not entirely sure what 'fixes' tag to add, since this is AFAICT an ancient bug,
but IMHO this should be applied to stable kernels as far back as possible.
As such I'm adding a 'fixes' tag with the commit that originally added the helper,
which happened in 4.19. Backporting to older LTS kernels (at least 4.9 and 4.14)
would presumably require open-coding it or backporting the helper as well.
Other possibly relevant commits:
v4.18-rc6-1502-g83ba4645152d net: add helpers checking if socket can be bound to nonlocal address
v4.18-rc6-1431-gd0c1f01138c4 net/ipv6: allow any source address for sendmsg pktinfo with ip_nonlocal_bind
v4.14-rc5-271-gb71d21c274ef sctp: full support for ipv6 ip_nonlocal_bind & IP_FREEBIND
v4.7-rc7-1883-g9b9742022888 sctp: support ipv6 nonlocal bind
v4.1-12247-g35a256fee52c ipv6: Nonlocal bind
Cc: Lorenzo Colitti <lorenzo@google.com>
Fixes: 83ba464515 ("net: add helpers checking if socket can be bound to nonlocal address")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-By: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d6803921a upstream.
Reset MAC header in HSR Tx path. This is needed, because direct packet
transmission, e.g. by specifying PACKET_QDISC_BYPASS does not reset the MAC
header.
This has been observed using the following setup:
|$ ip link add name hsr0 type hsr slave1 lan0 slave2 lan1 supervision 45 version 1
|$ ifconfig hsr0 up
|$ ./test hsr0
The test binary is using mmap'ed sockets and is specifying the
PACKET_QDISC_BYPASS socket option.
This patch resolves the following warning on a non-patched kernel:
|[ 112.725394] ------------[ cut here ]------------
|[ 112.731418] WARNING: CPU: 1 PID: 257 at net/hsr/hsr_forward.c:560 hsr_forward_skb+0x484/0x568
|[ 112.739962] net/hsr/hsr_forward.c:560: Malformed frame (port_src hsr0)
The warning can be safely removed, because the other call sites of
hsr_forward_skb() make sure that the skb is prepared correctly.
Fixes: d346a3fae3 ("packet: introduce PACKET_QDISC_BYPASS socket option")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1153a74768 upstream.
Normally, TXQs have
txq->tid = tid;
txq->ac = ieee80211_ac_from_tid(tid);
However, the special management TXQ actually has
txq->tid = IEEE80211_NUM_TIDS; // 16
txq->ac = IEEE80211_AC_VO;
This makes sense, but ieee80211_ac_from_tid(16) is the same
as ieee80211_ac_from_tid(0) which is just IEEE80211_AC_BE.
Now, normally this is fine. However, if the netdev queues
were stopped, then the code in ieee80211_tx_dequeue() will
propagate the stop from the interface (vif->txqs_stopped[])
if the AC 2 (ieee80211_ac_from_tid(txq->tid)) is marked as
stopped. On wake, however, __ieee80211_wake_txqs() will wake
the TXQ if AC 0 (txq->ac) is woken up.
If a driver stops all queues with ieee80211_stop_tx_queues()
and then wakes them again with ieee80211_wake_tx_queues(),
the ieee80211_wake_txqs() tasklet will run to resync queue
and TXQ state. If all queues were woken, then what'll happen
is that _ieee80211_wake_txqs() will run in order of HW queues
0-3, typically (and certainly for iwlwifi) corresponding to
ACs 0-3, so it'll call __ieee80211_wake_txqs() for each AC in
order 0-3.
When __ieee80211_wake_txqs() is called for AC 0 (VO) that'll
wake up the management TXQ (remember its tid is 16), and the
driver's wake_tx_queue() will be called. That tries to get a
frame, which will immediately *stop* the TXQ again, because
now we check against AC 2, and AC 2 hasn't yet been marked as
woken up again in sdata->vif.txqs_stopped[] since we're only
in the __ieee80211_wake_txqs() call for AC 0.
Thus, the management TXQ will never be started again.
Fix this by checking txq->ac directly instead of calculating
the AC as ieee80211_ac_from_tid(txq->tid).
Fixes: adf8ed01e4 ("mac80211: add an optional TXQ for other PS-buffered frames")
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20210323210500.bf4d50afea4a.I136ffde910486301f8818f5442e3c9bf8670a9c4@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b5674fe6b upstream.
Remove vsi->netdev->name from the trace.
This is redundant information. With the devinfo trace, the adapter
is already identifiable.
Previously following error was produced when compiling against sparse.
i40e_main.c:2571 i40e_sync_vsi_filters() error:
we previously assumed 'vsi->netdev' could be null (see line 2323)
Fixes: b603f9dc20 ("i40e: Log info when PF is entering and leaving Allmulti mode.")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61431a5907 upstream.
Commit 924a9bc362 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct")
added a call to dev_parse_header_protocol() but mac_header is not yet set.
This means that eth_hdr() reads complete garbage, and syzbot complained about it [1]
This patch resets mac_header earlier, to get more coverage about this change.
Audit of virtio_net_hdr_to_skb() callers shows that this change should be safe.
[1]
BUG: KASAN: use-after-free in eth_header_parse_protocol+0xdc/0xe0 net/ethernet/eth.c:282
Read of size 2 at addr ffff888017a6200b by task syz-executor313/8409
CPU: 1 PID: 8409 Comm: syz-executor313 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
eth_header_parse_protocol+0xdc/0xe0 net/ethernet/eth.c:282
dev_parse_header_protocol include/linux/netdevice.h:3177 [inline]
virtio_net_hdr_to_skb.constprop.0+0x99d/0xcd0 include/linux/virtio_net.h:83
packet_snd net/packet/af_packet.c:2994 [inline]
packet_sendmsg+0x2325/0x52b0 net/packet/af_packet.c:3031
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:674
sock_no_sendpage+0xf3/0x130 net/core/sock.c:2860
kernel_sendpage.part.0+0x1ab/0x350 net/socket.c:3631
kernel_sendpage net/socket.c:3628 [inline]
sock_sendpage+0xe5/0x140 net/socket.c:947
pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364
splice_from_pipe_feed fs/splice.c:418 [inline]
__splice_from_pipe+0x43e/0x8a0 fs/splice.c:562
splice_from_pipe fs/splice.c:597 [inline]
generic_splice_sendpage+0xd4/0x140 fs/splice.c:746
do_splice_from fs/splice.c:767 [inline]
do_splice+0xb7e/0x1940 fs/splice.c:1079
__do_splice+0x134/0x250 fs/splice.c:1144
__do_sys_splice fs/splice.c:1350 [inline]
__se_sys_splice fs/splice.c:1332 [inline]
__x64_sys_splice+0x198/0x250 fs/splice.c:1332
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
Fixes: 924a9bc362 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Balazs Nemeth <bnemeth@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c84b33101 upstream.
In '4da6a196f93b1' we fixed a potential unhash loop caused when
a TLS socket in a sockmap was removed from the sockmap. This
happened because the unhash operation on the TLS ctx continued
to point at the sockmap implementation of unhash even though the
psock has already been removed. The sockmap unhash handler when a
psock is removed does the following,
void sock_map_unhash(struct sock *sk)
{
void (*saved_unhash)(struct sock *sk);
struct sk_psock *psock;
rcu_read_lock();
psock = sk_psock(sk);
if (unlikely(!psock)) {
rcu_read_unlock();
if (sk->sk_prot->unhash)
sk->sk_prot->unhash(sk);
return;
}
[...]
}
The unlikely() case is there to handle the case where psock is detached
but the proto ops have not been updated yet. But, in the above case
with TLS and removed psock we never fixed sk_prot->unhash() and unhash()
points back to sock_map_unhash resulting in a loop. To fix this we added
this bit of code,
static inline void sk_psock_restore_proto(struct sock *sk,
struct sk_psock *psock)
{
sk->sk_prot->unhash = psock->saved_unhash;
This will set the sk_prot->unhash back to its saved value. This is the
correct callback for a TLS socket that has been removed from the sock_map.
Unfortunately, this also overwrites the unhash pointer for all psocks.
We effectively break sockmap unhash handling for any future socks.
Omitting the unhash operation will leave stale entries in the map if
a socket transition through unhash, but does not do close() op.
To fix set unhash correctly before calling into tls_update. This way the
TLS enabled socket will point to the saved unhash() handler.
Fixes: 4da6a196f9 ("bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop")
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/161731441904.68884.15593917809745631972.stgit@john-XPS-13-9370
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e5a03bcba upstream.
In nfp_bpf_ctrl_msg_rx, if
nfp_ccm_get_type(skb) == NFP_CCM_TYPE_BPF_BPF_EVENT is true, the skb
will be freed. But the skb is still used by nfp_ccm_rx(&bpf->ccm, skb).
My patch adds a return when the skb was freed.
Fixes: bcf0cafab4 ("nfp: split out common control message handling code")
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8a12f88361 upstream.
Multiple ttys try to claim the same the minor number causing a double
unregistration of the same device. The first unregistration succeeds
but the next one results in a null-ptr-deref.
The get_free_serial_index() function returns an available minor number
but doesn't assign it immediately. The assignment is done by the caller
later. But before this assignment, calls to get_free_serial_index()
would return the same minor number.
Fix this by modifying get_free_serial_index to assign the minor number
immediately after one is found to be and rename it to obtain_minor()
to better reflect what it does. Similary, rename set_serial_by_index()
to release_minor() and modify it to free up the minor number of the
given hso_serial. Every obtain_minor() should have corresponding
release_minor() call.
Fixes: 72dc1c096c ("HSO: add option hso driver")
Reported-by: syzbot+c49fe6089f295a05e6f8@syzkaller.appspotmail.com
Tested-by: syzbot+c49fe6089f295a05e6f8@syzkaller.appspotmail.com
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b7eeb52721 upstream.
When ice_remove_vsi_lkup_fltr is called, by calling
ice_add_to_vsi_fltr_list local copy of vsi filter list
is created. If any issues during creation of vsi filter
list occurs it up for the caller to free already
allocated memory. This patch ensures proper memory
deallocation in these cases.
Fixes: 80d144c9ac ("ice: Refactor switch rule management structures and functions")
Signed-off-by: Robert Malz <robertx.malz@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a26c56ae67 upstream.
Use the `marvell,reg-init` DT property to configure the LED[2]/INTn pin
of the Marvell 88E1514 ethernet PHY on Turris Omnia into interrupt mode.
Without this the pin is by default in LED[2] mode, and the Marvell PHY
driver configures LED[2] into "On - Link, Blink - Activity" mode.
This fixes the issue where the pca9538 GPIO/interrupt controller (which
can't mask interrupts in HW) received too many interrupts and after a
time started ignoring the interrupt with error message:
IRQ 71: nobody cared
There is a work in progress to have the Marvell PHY driver support
parsing PHY LED nodes from OF and registering the LEDs as Linux LED
class devices. Once this is done the PHY driver can also automatically
set the pin into INTn mode if it does not find LED[2] in OF.
Until then, though, we fix this via `marvell,reg-init` DT property.
Signed-off-by: Marek Behún <kabel@kernel.org>
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Fixes: 26ca8b52d6 ("ARM: dts: add support for Turris Omnia")
Cc: Uwe Kleine-König <uwe@kleine-koenig.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Gregory CLEMENT <gregory.clement@bootlin.com>
Cc: <stable@vger.kernel.org>
Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df41872b68 upstream.
I encountered a hung task issue, but not a performance one. I run DIO
on a device (need lba continuous, for example open channel ssd), maybe
hungtask in below case:
DIO: Checkpoint:
get addr A(at boundary), merge into BIO,
no submit because boundary missing
flush dirty data(get addr A+1), wait IO(A+1)
writeback timeout, because DIO(A) didn't submit
get addr A+2 fail, because checkpoint is doing
dio_send_cur_page() may clear sdio->boundary, so prevent it from missing
a boundary.
Link: https://lkml.kernel.org/r/20210322042253.38312-1-jack.qiu@huawei.com
Fixes: b1058b9812 ("direct-io: submit bio after boundary buffer is added to it")
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90bd070aae upstream.
The following deadlock is detected:
truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write).
PID: 14827 TASK: ffff881686a9af80 CPU: 20 COMMAND: "ora_p005_hrltd9"
#0 __schedule at ffffffff818667cc
#1 schedule at ffffffff81866de6
#2 inode_dio_wait at ffffffff812a2d04
#3 ocfs2_setattr at ffffffffc05f322e [ocfs2]
#4 notify_change at ffffffff812a5a09
#5 do_truncate at ffffffff812808f5
#6 do_sys_ftruncate.constprop.18 at ffffffff81280cf2
#7 sys_ftruncate at ffffffff81280d8e
#8 do_syscall_64 at ffffffff81003949
#9 entry_SYSCALL_64_after_hwframe at ffffffff81a001ad
dio completion path is going to complete one direct IO (decrement
inode->i_dio_count), but before that it hung at locking inode->i_rwsem:
#0 __schedule+700 at ffffffff818667cc
#1 schedule+54 at ffffffff81866de6
#2 rwsem_down_write_failed+536 at ffffffff8186aa28
#3 call_rwsem_down_write_failed+23 at ffffffff8185a1b7
#4 down_write+45 at ffffffff81869c9d
#5 ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
#6 ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
#7 dio_complete+140 at ffffffff812c873c
#8 dio_aio_complete_work+25 at ffffffff812c89f9
#9 process_one_work+361 at ffffffff810b1889
#10 worker_thread+77 at ffffffff810b233d
#11 kthread+261 at ffffffff810b7fd5
#12 ret_from_fork+62 at ffffffff81a0035e
Thus above forms ABBA deadlock. The same deadlock was mentioned in
upstream commit 28f5a8a7c0 ("ocfs2: should wait dio before inode lock
in ocfs2_setattr()"). It seems that that commit only removed the
cluster lock (the victim of above dead lock) from the ABBA deadlock
party.
End-user visible effects: Process hang in truncate -> ocfs2_setattr path
and other processes hang at ocfs2_dio_end_io_write path.
This is to fix the deadlock itself. It removes inode_lock() call from
dio completion path to remove the deadlock and add ip_alloc_sem lock in
setattr path to synchronize the inode modifications.
[wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested]
Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com
Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ad1e36616 upstream.
ia64 has two stacks:
- memory stack (or stack), pointed at by by r12
- register backing store (register stack), pointed at by
ar.bsp/ar.bspstore with complications around dirty
register frame on CPU.
In [1] Dmitry noticed that PTRACE_GET_SYSCALL_INFO returns the register
stack instead memory stack.
The bug comes from the fact that user_stack_pointer() and
current_user_stack_pointer() don't return the same register:
ulong user_stack_pointer(struct pt_regs *regs) { return regs->ar_bspstore; }
#define current_user_stack_pointer() (current_pt_regs()->r12)
The change gets both back in sync.
I think ptrace(PTRACE_GET_SYSCALL_INFO) is the only affected user by
this bug on ia64.
The change fixes 'rt_sigreturn.gen.test' strace test where it was
observed initially.
Link: https://bugs.gentoo.org/769614 [1]
Link: https://lkml.kernel.org/r/20210331084447.2561532-1-slyfox@gentoo.org
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e6fdeb28f upstream.
The xMII interface clock depends on the PHY interface (MII, RMII, RGMII)
as well as the current link speed. Explicitly configure the GSWIP to
automatically select the appropriate xMII interface clock.
This fixes an issue seen by some users where ports using an external
RMII or RGMII PHY were deaf (no RX or TX traffic could be seen). Most
likely this is due to an "invalid" xMII clock being selected either by
the bootloader or hardware-defaults.
Fixes: 14fceff477 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 864db232dc upstream.
nlh is being checked for validtity two times when it is dereferenced in
this function. Check for validity again when updating the flags through
nlh pointer to make the dereferencing safe.
CC: <stable@vger.kernel.org>
Addresses-Coverity: ("NULL pointer dereference")
Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b5db93e7f upstream.
When sock_wait_state() returns -EINPROGRESS, "sk->sk_state" is
LLCP_CONNECTING. In this case, llcp_sock_connect() is repeatedly invoked,
nfc_llcp_sock_link() will add sk to local->connecting_sockets twice.
sk->sk_node->next will point to itself, that will make an endless loop
and hang-up the system.
To fix it, check whether sk->sk_state is LLCP_CONNECTING in
llcp_sock_connect() to avoid repeated invoking.
Fixes: b4011239a0 ("NFC: llcp: Fix non blocking sockets connections")
Reported-by: "kiyin(尹亮)" <kiyin@tencent.com>
Link: https://www.openwall.com/lists/oss-security/2020/11/01/1
Cc: <stable@vger.kernel.org> #v3.11
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b14d72ac73 upstream
Ceiling value may be miss-aligned with what's actually configured into the
ARR register. This is seen after probe as currently the ARR value is zero,
whereas ceiling value is set to the maximum. So:
- reading ceiling reports zero
- in case the counter gets enabled without any prior configuration,
it won't count.
- in case the function gets set by the user 1st, (priv->ceiling) is used.
Fix it by getting rid of the cached "priv->ceiling" variable. Rather use
the ARR register value directly by using regmap read or write when needed.
There should be no drawback on performance as priv->ceiling isn't used in
performance critical path.
There's also no point in writing ARR while setting function (sms), so
it can be safely removed.
Fixes: ad29937e20 ("counter: Add STM32 Timer quadrature encoder")
Suggested-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com>
Acked-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Cc: <Stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1614793789-10346-1-git-send-email-fabrice.gasnier@foss.st.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[sudip: adjuct context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 334ef6ed06 upstream.
While allmodconfig and allyesconfig build for s390 there are also
various bots running compile tests with randconfig, where PCI is
disabled. This reveals that a lot of drivers should actually depend on
HAS_IOMEM.
Adding this to each device driver would be a never ending story,
therefore just disable COMPILE_TEST for s390.
The reasoning is more or less the same as described in
commit bc083a64b6 ("init/Kconfig: make COMPILE_TEST depend on !UML").
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The below patches caused a regression in a multipath setup:
Fixes: 9f98772ba3 ("nvme-rdma: fix controller reset hang during traffic")
Fixes: 2875b0aeca ("nvme-tcp: fix controller reset hang during traffic")
These patches on their own are correct because they fixed a controller reset
regression.
When we reset/teardown a controller, we must freeze and quiesce the namespaces
request queues to make sure that we safely stop inflight I/O submissions.
Freeze is mandatory because if our hctx map changed between reconnects,
blk_mq_update_nr_hw_queues will immediately attempt to freeze the queue, and
if it still has pending submissions (that are still quiesced) it will hang.
This is what the above patches fixed.
However, by freezing the namespaces request queues, and only unfreezing them
when we successfully reconnect, inflight submissions that are running
concurrently can now block grabbing the nshead srcu until either we successfully
reconnect or ctrl_loss_tmo expired (or the user explicitly disconnected).
This caused a deadlock [1] when a different controller (different path on the
same subsystem) became live (i.e. optimized/non-optimized). This is because
nvme_mpath_set_live needs to synchronize the nshead srcu before requeueing I/O
in order to make sure that current_path is visible to future (re)submisions.
However the srcu lock is taken by a blocked submission on a frozen request
queue, and we have a deadlock.
In recent kernels (v5.9+) direct_make_request was replaced by submit_bio_noacct
which does not have this issue because it bio_list will be active when
nvme-mpath calls submit_bio_noacct on the bottom device (because it was
populated when submit_bio was triggered on it.
Hence, we need to fix all the kernels that were before submit_bio_noacct was
introduced.
[1]:
Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
Call Trace:
__schedule+0x293/0x730
schedule+0x33/0xa0
schedule_timeout+0x1d3/0x2f0
wait_for_completion+0xba/0x140
__synchronize_srcu.part.21+0x91/0xc0
synchronize_srcu_expedited+0x27/0x30
synchronize_srcu+0xce/0xe0
nvme_mpath_set_live+0x64/0x130 [nvme_core]
nvme_update_ns_ana_state+0x2c/0x30 [nvme_core]
nvme_update_ana_state+0xcd/0xe0 [nvme_core]
nvme_parse_ana_log+0xa1/0x180 [nvme_core]
nvme_read_ana_log+0x76/0x100 [nvme_core]
nvme_mpath_init+0x122/0x180 [nvme_core]
nvme_init_identify+0x80e/0xe20 [nvme_core]
nvme_tcp_setup_ctrl+0x359/0x660 [nvme_tcp]
nvme_tcp_reconnect_ctrl_work+0x24/0x70 [nvme_tcp]
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26f55a59dc upstream.
The branch displacement logic in the BPF JIT compilers for x86 assumes
that, for any generated branch instruction, the distance cannot
increase between optimization passes.
But this assumption can be violated due to how the distances are
computed. Specifically, whenever a backward branch is processed in
do_jit(), the distance is computed by subtracting the positions in the
machine code from different optimization passes. This is because part
of addrs[] is already updated for the current optimization pass, before
the branch instruction is visited.
And so the optimizer can expand blocks of machine code in some cases.
This can confuse the optimizer logic, where it assumes that a fixed
point has been reached for all machine code blocks once the total
program size stops changing. And then the JIT compiler can output
abnormal machine code containing incorrect branch displacements.
To mitigate this issue, we assert that a fixed point is reached while
populating the output image. This rejects any problematic programs.
The issue affects both x86-32 and x86-64. We mitigate separately to
ease backporting.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4d4d45643 upstream.
The branch displacement logic in the BPF JIT compilers for x86 assumes
that, for any generated branch instruction, the distance cannot
increase between optimization passes.
But this assumption can be violated due to how the distances are
computed. Specifically, whenever a backward branch is processed in
do_jit(), the distance is computed by subtracting the positions in the
machine code from different optimization passes. This is because part
of addrs[] is already updated for the current optimization pass, before
the branch instruction is visited.
And so the optimizer can expand blocks of machine code in some cases.
This can confuse the optimizer logic, where it assumes that a fixed
point has been reached for all machine code blocks once the total
program size stops changing. And then the JIT compiler can output
abnormal machine code containing incorrect branch displacements.
To mitigate this issue, we assert that a fixed point is reached while
populating the output image. This rejects any problematic programs.
The issue affects both x86-32 and x86-64. We mitigate separately to
ease backporting.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>