IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Some released FW versions mistakenly don't set the capability that 50G per
lane link-modes are supported for VFs (ptys_extended_ethernet capability
bit).
Use PTYS.ext_eth_proto_capability instead, as this indication is always
accurate. If PTYS.ext_eth_proto_capability is valid
(has a non-zero value) conclude that the HCA supports 50G per lane.
Otherwise, conclude that the HCA doesn't support 50G per lane.
Fixes: 08e8676f1607 ("IB/mlx5: Add support for 50Gbps per lane link modes")
Link: https://lore.kernel.org/r/20200707110612.882962-3-leon@kernel.org
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The bond_ipsec_* helpers don't need RTNL, and can potentially get called
without it being held, so switch from rtnl_dereference() to
rcu_dereference() to access bond struct data.
Lightly tested with xfrm bonding, no problems found, should address the
syzkaller bug referenced below.
Reported-by: syzbot+582c98032903dcc04816@syzkaller.appspotmail.com
CC: Huy Nguyen <huyn@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure we don't regress the CAP_SYSLOG behavior of the module address
visibility via /proc/modules nor /sys/module/*/sections/*.
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
When evaluating access control over kallsyms visibility, credentials at
open() time need to be used, not the "current" creds (though in BPF's
case, this has likely always been the same). Plumb access to associated
file->f_cred down through bpf_dump_raw_ok() and its callers now that
kallsysm_show_value() has been refactored to take struct cred.
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump")
Signed-off-by: Kees Cook <keescook@chromium.org>
The kprobe show() functions were using "current"'s creds instead
of the file opener's creds for kallsyms visibility. Fix to use
seq_file->file->f_cred.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 81365a947de4 ("kprobes: Show address of kprobes if kallsyms does")
Fixes: ffb9bd68ebdb ("kprobes: Show blacklist addresses as same as kallsyms does")
Signed-off-by: Kees Cook <keescook@chromium.org>
In order to gain access to the open file's f_cred for kallsym visibility
permission checks, refactor the module section attributes to use the
bin_attribute instead of attribute interface. Additionally removes the
redundant "name" struct member.
Cc: stable@vger.kernel.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
In order to perform future tests against the cred saved during open(),
switch kallsyms_show_value() to operate on a cred, and have all current
callers pass current_cred(). This makes it very obvious where callers
are checking the wrong credential in their "read" contexts. These will
be fixed in the coming patches.
Additionally switch return value to bool, since it is always used as a
direct permission check, not a 0-on-success, negative-on-error style
function return.
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Convert all-mask IP address to Big Endian, instead, for comparison.
Fixes: f286dd8eaad5 ("cxgb4: use correct type for all-mask IP address comparison")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DP83869 is an Ethernet PHY, not a charger, so fix the documentation
accordingly.
Fixes: 4d66c56f7efe ("dt-bindings: net: dp83869: Add TI dp83869 phy")
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Acked-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DP83867 is an Ethernet PHY, not a charger, so fix the documentation
accordingly.
Fixes: 74ac28f16486 ("dt-bindings: dp83867: Convert DP83867 to yaml")
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Acked-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A scenario has been observed where a 'bc_init' message for a link is not
retransmitted if it fails to be received by the peer. This leads to the
peer never establishing the link fully and it discarding all other data
received on the link. In this scenario the message is lost in transit to
the peer.
The issue is traced to the 'nxt_retr' field of the skb not being
initialised for links that aren't a bc_sndlink. This leads to the
comparison in tipc_link_advance_transmq() that gates whether to attempt
retransmission of a message performing in an undesirable way.
Depending on the relative value of 'jiffies', this comparison:
time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)
may return true or false given that 'nxt_retr' remains at the
uninitialised value of 0 for non bc_sndlinks.
This is most noticeable shortly after boot when jiffies is initialised
to a high value (to flush out rollover bugs) and we compare a jiffies of,
say, 4294940189 to zero. In that case time_before returns 'true' leading
to the skb not being retransmitted.
The fix is to ensure that all skbs have a valid 'nxt_retr' time set for
them and this is achieved by refactoring the setting of this value into
a central function.
With this fix, transmission losses of 'bc_init' messages do not stall
the link establishment forever because the 'bc_init' message is
retransmitted and the link eventually establishes correctly.
Fixes: 382f598fb66b ("tipc: reduce duplicate packets for unicast traffic")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's possible that device removal happens when the bond is in non-AB mode,
and addition happens in AB mode, so bond_ipsec_del_sa() never gets called,
which leaves security associations in an odd state if bond_ipsec_add_sa()
then gets called after switching the bond into AB. Just call add and
delete universally for all modes to keep things consistent.
However, it's also possible that this code gets called when the system is
shutting down, and the xfrm subsystem has already been disconnected from
the bond device, so we need to do some error-checking and bail, lest we
hit a null ptr deref.
Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
CC: Huy Nguyen <huyn@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij says:
====================
RTL8366RB tagging support
This patch set adds DSA tagging support to the RTL8366RB
DSA driver.
There is a minor performance improvement in the tag parser
compared to the previous patch set and the review tags
have been collected.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This activates the support to use the CPU tag to properly
direct ingress traffic to the right port.
Bit 15 in register RTL8368RB_CPU_CTRL_REG can be set to
1 to disable the insertion of the CPU tag which is what
the code currently does. The bit 15 define calls this
setting RTL8368RB_CPU_INSTAG which is confusing since the
inverse meaning is implied: programmers may think that
setting this bit to 1 will *enable* inserting the tag
rather than disabling it, so rename this setting in
bit 15 to RTL8368RB_CPU_NO_TAG which is more to the
point.
After this e.g. ping works out-of-the-box with the
RTL8366RB.
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements the known parts of the Realtek 4 byte
tag protocol version 0xA, as found in the RTL8366RB
DSA switch.
It is designated as protocol version 0xA as a
different Realtek 4 byte tag format with protocol
version 0x9 is known to exist in the Realtek RTL8306
chips.
The tag and switch chip lacks public documentation, so
the tag format has been reverse-engineered from
packet dumps. As only ingress traffic has been available
for analysis an egress tag has not been possible to
develop (even using educated guesses about bit fields)
so this is as far as it gets. It is not known if the
switch even supports egress tagging.
Excessive attempts to figure out the egress tag format
was made. When nothing else worked, I just tried all bit
combinations with 0xannp where a is protocol and p is
port. I looped through all values several times trying
to get a response from ping, without any positive
result.
Using just these ingress tags however, the switch
functionality is vastly improved and the packets find
their way into the destination port without any
tricky VLAN configuration. On the D-Link DIR-685 the
LAN ports now come up and respond to ping without
any command line configuration so this is a real
improvement for users.
Egress packets need to be restricted to the proper
target ports using VLAN, which the RTL8366RB DSA
switch driver already sets up.
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch exposes new link modes using 100Gbps per lane, including 100G,
200G and 400G modes.
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define 100G, 200G and 400G link modes using 100Gbps per lane
LR, ER and FR are defined as a single link mode because they are
using same technology and by design are fully interoperable.
EEPROM content indicates if the module is LR, ER, or FR, and the
user space ethtool decoder is planned to support decoding these
modes in the EEPROM.
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
CC: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the tx path of l2tp, l2tp_xmit_skb() calls skb_dst_set() to set
skb's dst. However, it will eventually call inet6_csk_xmit() or
ip_queue_xmit() where skb's dst will be overwritten by:
skb_dst_set_noref(skb, dst);
without releasing the old dst in skb. Then it causes dst/dev refcnt leak:
unregister_netdevice: waiting for eth0 to become free. Usage count = 1
This can be reproduced by simply running:
# modprobe l2tp_eth && modprobe l2tp_ip
# sh ./tools/testing/selftests/net/l2tp.sh
So before going to inet6_csk_xmit() or ip_queue_xmit(), skb's dst
should be dropped. This patch is to fix it by removing skb_dst_set()
from l2tp_xmit_skb() and moving skb_dst_drop() into l2tp_xmit_core().
Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: James Chapman <jchapman@katalix.com>
Tested-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: Driver update for net-next.
This patchset implements ethtool -X to setup user-defined RSS indirection
table. The new infrastructure also allows the proper logical ring index
to be used to populate the RSS indirection when queried by ethtool -x.
Prior to these patches, we were incorrectly populating the output of
ethtool -x with internal ring IDs which would make no sense to the user.
The last 2 patches add some cleanups to the VLAN acceleration logic
and check the firmware capabilities before allowing VLAN acceleration
offloads.
v4: Move bnxt_get_rxfh_indir_size() fix to a new patch #2.
Modify patch #7 to revert RSS map to default only when necessary.
v3: Use ALIGN() in patch 5.
Add warning messages in patch 6.
v2: Some RSS indirection table changes requested by Jakub Kicinski.
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bare-metal use cases require giving firmware and the embedded
application processor control over VLAN offloads. The driver should
not attempt to override or utilize this feature in such scenarios
since it will not work as expected.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware VLAN offload feature on our NIC does not have separate
knobs for handling customer and service tags on RX. Either offloading
of both must be enabled or both must be disabled. Introduce definitions
for the combined feature set in order to clean up the code and make
this constraint more clear. Technically these features can be separately
enabled on TX, however, since the default is to turn both on, the
combined TX feature set is also introduced for code consistency.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the new infrastructure in place, we can now support the setting of
the indirection table from ethtool.
When changing channels, in a rare case that firmware cannot reserve the
rings that were promised, we will still try to keep the RSS map and only
revert to default when absolutely necessary.
v4: Revert RSS map to default during ring change only when absolutely
necessary.
v3: Add warning messages when firmware cannot reserve the requested RX
rings, and when the RSS table entries have to change to default.
v2: When changing channels, if the RSS table size changes and RSS map
is non-default, return error.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have the logical indirection table, we can return these
proper logical indices directly to ethtool -x instead of the physical
IDs.
Reported-by: Jakub Kicinski <kicinski@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have the logical table, we can fill the HW RSS table
using the logical table's entries and converting them to the HW
specific format. Re-initialize the logical table to standard
distribution if the number of RX rings changes during ring reservation.
v4: Use bnxt_get_rxfh_indir_size() to get the RSS table size.
v2: Use ALIGN() to roundup the RSS table size.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some chips, this varies based on the number of RX rings. Add this
helper function and refactor the existing code to use it.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver currently does not keep track of the logical RSS indirection
table. The hardware RSS table is set up with standard default ring
distribution when initializing the chip. This makes it difficult to
support user sepcified indirection table entries. As a first step, add
the logical table in the main bnxt structure and allocate it according
to chip specific table size. Add a function that sets up default
RSS distribution based on the number of RX rings.
v4: Use bnxt_get_rxfh_indir_size() for the current RSS table size.
v2: Use kmalloc_array() since we init. all entries afterwards.
Use ALIGN() to roundup the RSS table size.
Use ethtool_rxfh_indir_default() to init. the default entries.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix up bnxt_get_rxfh_indir_size() to return the proper current RSS
table size for P5 chips. Change it to non-static so that bnxt.c
can use it to get the table size.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we allocate one page for the hardware DMA RSS indirection
table. While the size is currently big enough for all chips, future
chip variations may support bigger sizes, so it is better to calculate
and store the chip specific size and allocate accordingly.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When adding first socket to nbd, if nsock's allocation failed, the data
structure member "config->socks" was reallocated, but the data structure
member "config->num_connections" was not updated. A memory leak will occur
then because the function "nbd_config_put" will free "config->socks" only
when "config->num_connections" is not zero.
Fixes: 03bf73c315ed ("nbd: prevent memory leak")
Reported-by: syzbot+934037347002901b8d2a@syzkaller.appspotmail.com
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
cpu-feature-registers.rst is missing a new line before a couple
of tables listing the visible fields, causing broken tables in
the HTML documentation generated by "make htmldocs". Fix this
by adding the missing new line.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200707143152.154541-1-suzuki.poulose@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
After entering kdb due to breakpoint, when we execute 'ss' or 'go' (will
delay installing breakpoints, do single-step first), it won't work
correctly, and it will enter kdb due to oops.
It's because the reason gotten in kdb_stub() is not as expected, and it
seems that the ex_vector for single-step should be 0, like what arch
powerpc/sh/parisc has implemented.
Before the patch:
Entering kdb (current=0xffff8000119e2dc0, pid 0) on processor 0 due to Keyboard Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0xffff8000101486cc (printk)
is enabled addr at ffff8000101486cc, hardtype=0 installed=0
[0]kdb> g
/ # echo h > /proc/sysrq-trigger
Entering kdb (current=0xffff0000fa878040, pid 266) on processor 3 due to Breakpoint @ 0xffff8000101486cc
[3]kdb> ss
Entering kdb (current=0xffff0000fa878040, pid 266) on processor 3 Oops: (null)
due to oops @ 0xffff800010082ab8
CPU: 3 PID: 266 Comm: sh Not tainted 5.7.0-rc4-13839-gf0e5ad491718 #6
Hardware name: linux,dummy-virt (DT)
pstate: 00000085 (nzcv daIf -PAN -UAO)
pc : el1_irq+0x78/0x180
lr : __handle_sysrq+0x80/0x190
sp : ffff800015003bf0
x29: ffff800015003d20 x28: ffff0000fa878040
x27: 0000000000000000 x26: ffff80001126b1f0
x25: ffff800011b6a0d8 x24: 0000000000000000
x23: 0000000080200005 x22: ffff8000101486cc
x21: ffff800015003d30 x20: 0000ffffffffffff
x19: ffff8000119f2000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000
x9 : 0000000000000000 x8 : ffff800015003e50
x7 : 0000000000000002 x6 : 00000000380b9990
x5 : ffff8000106e99e8 x4 : ffff0000fadd83c0
x3 : 0000ffffffffffff x2 : ffff800011b6a0d8
x1 : ffff800011b6a000 x0 : ffff80001130c9d8
Call trace:
el1_irq+0x78/0x180
printk+0x0/0x84
write_sysrq_trigger+0xb0/0x118
proc_reg_write+0xb4/0xe0
__vfs_write+0x18/0x40
vfs_write+0xb0/0x1b8
ksys_write+0x64/0xf0
__arm64_sys_write+0x14/0x20
el0_svc_common.constprop.2+0xb0/0x168
do_el0_svc+0x20/0x98
el0_sync_handler+0xec/0x1a8
el0_sync+0x140/0x180
[3]kdb>
After the patch:
Entering kdb (current=0xffff8000119e2dc0, pid 0) on processor 0 due to Keyboard Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0xffff8000101486cc (printk)
is enabled addr at ffff8000101486cc, hardtype=0 installed=0
[0]kdb> g
/ # echo h > /proc/sysrq-trigger
Entering kdb (current=0xffff0000fa852bc0, pid 268) on processor 0 due to Breakpoint @ 0xffff8000101486cc
[0]kdb> g
Entering kdb (current=0xffff0000fa852bc0, pid 268) on processor 0 due to Breakpoint @ 0xffff8000101486cc
[0]kdb> ss
Entering kdb (current=0xffff0000fa852bc0, pid 268) on processor 0 due to SS trap @ 0xffff800010082ab8
[0]kdb>
Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support")
Signed-off-by: Wei Li <liwei391@huawei.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200509214159.19680-2-liwei391@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Continually butchering our entry code with CPU errata workarounds has
led to it looking a little scruffy. Consistently used /* */ comment
style for multi-line block comments and ensure that small numeric labels
use consecutive integers.
No functional change, but the state of things was irritating.
Signed-off-by: Will Deacon <will@kernel.org>
The current handling of erratum 1414080 has the side effect that
cntkctl_el1 can get changed for both 32 and 64bit tasks.
This isn't a problem so far, but if we ever need to mitigate another
of these errata on the 64bit side, we'd better keep the messing with
cntkctl_el1 local to 32bit tasks.
For that, make sure that on entering the kernel from a 32bit tasks,
userspace access to cntvct gets enabled, and disabled returning to
userspace, while it never gets changed for 64bit tasks.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200706163802.1836732-5-maz@kernel.org
[will: removed branch instructions per Mark's review comments]
Signed-off-by: Will Deacon <will@kernel.org>
ARM64_WORKAROUND_1418040 requires that AArch32 EL0 accesses to
the virtual counter register are trapped and emulated by the kernel.
This makes the vdso pretty pointless, and in some cases livelock
prone.
Provide a workaround entry that limits the vdso to 64bit tasks.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200706163802.1836732-4-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
As we are about to disable the vdso for compat tasks in some circumstances,
let's allow a workaround descriptor to express exactly that.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200706163802.1836732-3-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
We have a class of errata (grouped under the ARM64_WORKAROUND_1418040
banner) that force the trapping of counter access from 32bit EL0.
We would normally disable the whole vdso for such defect, except that
it would disable it for 64bit userspace as well, which is a shame.
Instead, add a new vdso_clock_mode, which signals that the vdso
isn't usable for compat tasks. This gets checked in the new
vdso_clocksource_ok() helper, now provided for the 32bit vdso.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200706163802.1836732-2-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The return value of down_and_up_cpus() can be assigned to @err directly.
With that, the useless assignment to @err with zero can be dropped.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20200630075943.203954-1-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
The CPU mask (@tmp) should be released on failing to allocate
@cpu_groups or any of its elements. Otherwise, it leads to memory
leakage because the CPU mask variable is dynamically allocated
when CONFIG_CPUMASK_OFFSTACK is enabled.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20200630075227.199624-1-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
PAGE_HYP_DEVICE is intended to encode attribute bits for an EL2 stage-1
pte mapping a device. Unfortunately, it includes PROT_DEVICE_nGnRE which
encodes attributes for EL1 stage-1 mappings such as UXN and nG, which are
RES0 for EL2, and DBM which is meaningless as TCR_EL2.HD is not set.
Fix the definition of PAGE_HYP_DEVICE so that it doesn't set RES0 bits
at EL2.
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200708162546.26176-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Support for rejecting packets from the prerouting chain, from
Laura Garcia Liebana.
2) Remove useless assignment in pipapo, from Stefano Brivio.
3) On demand hook registration in IPVS, from Julian Anastasov.
4) Expire IPVS connection from process context to not overload
timers, also from Julian.
5) Fallback to conntrack TCP tracker to handle connection reuse
in IPVS, from Julian Anastasov.
6) Several patches to support for chain bindings.
7) Expose enum nft_chain_flags through UAPI.
8) Reject unsupported chain flags from the netlink control plane.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli says:
====================
net: phy: Uninline PHY ethtool statistics
Now that we have introduced ethtool_phy_ops we can uninline those
operations and move them back into phy.c where they belong. Since those
functions are used by DSA, we need to continue exporting those symbols.
It might be possible to remove ndo_get_ethtool_phy_stats in a subsequent
patch since we could have DSA register its own ethtool_phy_ops instance
instead of overloading the ethtool_ops.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have moved the PHY ethtool statistics to be dynamically
registered, we no longer need to inline those for ethtool. This used to
be done to avoid cross symbol referencing and allow ethtool to be
decoupled from PHYLIB entirely.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend ethtool_phy_ops to include the 3 function pointers necessary for
implementing PHY statistics. In a subsequent change we will uninline
those functions.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul says:
====================
net/smc: fixes 2020-07-08
Please apply the following patch series for smc to netdev's net tree.
The patches fix problems found during more testing of SMC
functionality, resulting in hang conditions and unneeded link
deactivations. The clc module was hardened to be prepared for
possible future SMCD versions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
CLC proposal messages of future SMCD versions could be larger than SMCD
V1 CLC proposal messages.
To enable toleration in SMC V1 the receival of CLC proposal messages
is adapted:
* accept larger length values in CLC proposal
* check trailing eye catcher for incoming CLC proposal with V1 length only
* receive the whole CLC proposal even in cases it does not fit into the
V1 buffer
Fixes: e7b7a64a8493d ("smc: support variable CLC proposal messages")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The similar smc_ib_devices spinlock has been converted to a mutex.
Protecting the smcd_dev_list by a mutex is possible as well. This
patch converts the smcd_dev_list spinlock to a mutex.
Fixes: c6ba7c9ba43d ("net/smc: add base infrastructure for SMC-D and ISM")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tests showed this BUG:
[572555.252867] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
[572555.252876] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 131031, name: smcapp
[572555.252879] INFO: lockdep is turned off.
[572555.252883] CPU: 1 PID: 131031 Comm: smcapp Tainted: G O 5.7.0-rc3uschi+ #356
[572555.252885] Hardware name: IBM 3906 M03 703 (LPAR)
[572555.252887] Call Trace:
[572555.252896] [<00000000ac364554>] show_stack+0x94/0xe8
[572555.252901] [<00000000aca1f400>] dump_stack+0xa0/0xe0
[572555.252906] [<00000000ac3c8c10>] ___might_sleep+0x260/0x280
[572555.252910] [<00000000acdc0c98>] __mutex_lock+0x48/0x940
[572555.252912] [<00000000acdc15c2>] mutex_lock_nested+0x32/0x40
[572555.252975] [<000003ff801762d0>] mlx5_lag_get_roce_netdev+0x30/0xc0 [mlx5_core]
[572555.252996] [<000003ff801fb3aa>] mlx5_ib_get_netdev+0x3a/0xe0 [mlx5_ib]
[572555.253007] [<000003ff80063848>] smc_pnet_find_roce_resource+0x1d8/0x310 [smc]
[572555.253011] [<000003ff800602f0>] __smc_connect+0x1f0/0x3e0 [smc]
[572555.253015] [<000003ff80060634>] smc_connect+0x154/0x190 [smc]
[572555.253022] [<00000000acbed8d4>] __sys_connect+0x94/0xd0
[572555.253025] [<00000000acbef620>] __s390x_sys_socketcall+0x170/0x360
[572555.253028] [<00000000acdc6800>] system_call+0x298/0x2b8
[572555.253030] INFO: lockdep is turned off.
Function smc_pnet_find_rdma_dev() might be called from
smc_pnet_find_roce_resource(). It holds the smc_ib_devices list
spinlock while calling infiniband op get_netdev(). At least for mlx5
the get_netdev operation wants mutex serialization, which conflicts
with the smc_ib_devices spinlock.
This patch switches the smc_ib_devices spinlock into a mutex to
allow sleeping when calling get_netdev().
Fixes: a4cf0443c414 ("smc: introduce SMC as an IB-client")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>