IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
When __blkcg_rstat_flush() is called from cgroup_rstat_flush*() code
path, interrupt is always disabled.
When we start to flush blkcg per-cpu stats list in __blkg_release()
for avoiding to leak blkcg_gq's reference in commit 20cb1c2fb756
("blk-cgroup: Flush stats before releasing blkcg_gq"), local irq
isn't disabled yet, then lockdep warning may be triggered because
the dependent cgroup locks may be acquired from irq(soft irq) handler.
Fix the issue by disabling local irq always.
Fixes: 20cb1c2fb756 ("blk-cgroup: Flush stats before releasing blkcg_gq")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/linux-block/pz2wzwnmn5tk3pwpskmjhli6g3qly7eoknilb26of376c7kwxy@qydzpvt6zpis/T/#u
Cc: stable@vger.kernel.org
Cc: Jay Shin <jaeshin@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/20230622084249.1208005-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmSSyuYACgkQ1V2XiooU
IORbWhAAgkCFKOl2OQ/IccASBZx9ER34ZiJw9JylnndtIyLhVlBnXuJY3Nhc5r38
HnVxE+kgWbOYLLcq1uTqSdy+6j1VOKLk/HURtpDlr8685EmBVAE0lzNXwcgMQ9DJ
RKKbaVmWHFR2eOHrasvVSzXD+tK3GSiVvHVbIhu1FJuLwHTmo6vOEGBfHN4hSMnh
bxGP43ZSk4wbsgF1XT7wl4TN9MHllzSQm3WRtXXTCEDER4L/SOE39Z5PzUTCAikR
GwVXmolPhgCPGXfwe/abojFtYk1Cs/wVuogfirRd6BUV300GageuwuF7UymBf4GF
9t/sH2u95i9qAVzItoG/r4uMSY1IfDO/iBMiw1Q5lI1oPE0N7EBbtL7ZKxxpZ5RS
2g0TvLNNHLqmBpwxjm5Dq8XsTdt4PBQUOddGprcAoxmRrdZecgSqDn7IJTQeCOGc
GUl2dKdZvr33Ld1dTyqin/vijELpfIZDbLI2lFb/uFx7U/cBq8Ws2mvN2QZ9zdoe
30l+UucPOj7f787Bol8oMt2GSJv3iPXRpwwLkwuE75LGC8ckdGGo3xBPDg6OToA2
XyrdeLrarrHs0vJ+g8rZnG4yaEcG1GmeTWORt0pjsDZ8Y+ddecdts8fYJ+a+VUbp
lXkge1yIMJv5lI1axMFIMcvmLobqUKxLSuCZz51DLYWRfdIH490=
=9NBP
-----END PGP SIGNATURE-----
Merge tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
This is v3, including a crash fix for patch 01/14.
The following patchset contains Netfilter/IPVS fixes for net:
1) Fix UDP segmentation with IPVS tunneled traffic, from Terin Stock.
2) Fix chain binding transaction logic, add a bound flag to rule
transactions. Remove incorrect logic in nft_data_hold() and
nft_data_release().
3) Add a NFT_TRANS_PREPARE_ERROR deactivate state to deal with releasing
the set/chain as a follow up to 1240eb93f061 ("netfilter: nf_tables:
incorrect error path handling with NFT_MSG_NEWRULE")
4) Drop map element references from preparation phase instead of
set destroy path, otherwise bogus EBUSY with transactions such as:
flush chain ip x y
delete chain ip x w
where chain ip x y contains jump/goto from set elements.
5) Pipapo set type does not regard generation mask from the walk
iteration.
6) Fix reference count underflow in set element reference to
stateful object.
7) Several patches to tighten the nf_tables API:
- disallow set element updates of bound anonymous set
- disallow unbound anonymous set/chain at the end of transaction.
- disallow updates of anonymous set.
- disallow timeout configuration for anonymous sets.
8) Fix module reference leak in chain updates.
9) Fix nfnetlink_osf module autoload.
10) Fix deletion of basechain when NFTA_CHAIN_HOOK is specified as
in iptables-nft.
This Netfilter batch is larger than usual at this stage, I am aware we
are fairly late in the -rc cycle, if you prefer to route them through
net-next, please let me know.
netfilter pull request 23-06-21
* tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Fix for deleting base chains with payload
netfilter: nfnetlink_osf: fix module autoload
netfilter: nf_tables: drop module reference after updating chain
netfilter: nf_tables: disallow timeout for anonymous sets
netfilter: nf_tables: disallow updates of anonymous sets
netfilter: nf_tables: reject unbound chain set before commit phase
netfilter: nf_tables: reject unbound anonymous set before commit phase
netfilter: nf_tables: disallow element updates of bound anonymous sets
netfilter: nf_tables: fix underflow in object reference counter
netfilter: nft_set_pipapo: .walk does not deal with generations
netfilter: nf_tables: drop map element references from preparation phase
netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
netfilter: nf_tables: fix chain binding transaction logic
ipvs: align inner_mac_header for encapsulation
====================
Link: https://lore.kernel.org/r/20230621100731.68068-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This reverts commit 1f86123b9749 ("net: align SO_RCVMARK required
privileges with SO_MARK") because the reasoning in the commit message
is not really correct:
SO_RCVMARK is used for 'reading' incoming skb mark (via cmsg), as such
it is more equivalent to 'getsockopt(SO_MARK)' which has no priv check
and retrieves the socket mark, rather than 'setsockopt(SO_MARK) which
sets the socket mark and does require privs.
Additionally incoming skb->mark may already be visible if
sysctl_fwmark_reflect and/or sysctl_tcp_fwmark_accept are enabled.
Furthermore, it is easier to block the getsockopt via bpf
(either cgroup setsockopt hook, or via syscall filters)
then to unblock it if it requires CAP_NET_RAW/ADMIN.
On Android the socket mark is (among other things) used to store
the network identifier a socket is bound to. Setting it is privileged,
but retrieving it is not. We'd like unprivileged userspace to be able
to read the network id of incoming packets (where mark is set via
iptables [to be moved to bpf])...
An alternative would be to add another sysctl to control whether
setting SO_RCVMARK is privilged or not.
(or even a MASK of which bits in the mark can be exposed)
But this seems like over-engineering...
Note: This is a non-trivial revert, due to later merged commit e42c7beee71d
("bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()")
which changed both 'ns_capable' into 'sockopt_ns_capable' calls.
Fixes: 1f86123b9749 ("net: align SO_RCVMARK required privileges with SO_MARK")
Cc: Larysa Zaremba <larysa.zaremba@intel.com>
Cc: Simon Horman <simon.horman@corigine.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Patrick Rohr <prohr@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230618103130.51628-1-maze@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
struct mux_adth actually ends with multiple struct mux_adth_dg members.
This is seen both in the comments about the member:
/**
* struct mux_adth - Structure of the Aggregated Datagram Table Header.
...
* @dg: datagramm table with variable length
*/
and in the preparation for populating it:
adth_dg_size = offsetof(struct mux_adth, dg) +
ul_adb->dg_count[i] * sizeof(*dg);
...
adth_dg_size -= offsetof(struct mux_adth, dg);
memcpy(&adth->dg, ul_adb->dg[i], adth_dg_size);
This was reported as a run-time false positive warning:
memcpy: detected field-spanning write (size 16) of single field "&adth->dg" at drivers/net/wwan/iosm/iosm_ipc_mux_codec.c:852 (size 8)
Adjust the struct mux_adth definition and associated sizeof() math; no binary
output differences are observed in the resulting object file.
Reported-by: Florian Klink <flokli@flokli.de>
Closes: https://lore.kernel.org/lkml/dbfa25f5-64c8-5574-4f5d-0151ba95d232@gmail.com/
Fixes: 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support")
Cc: M Chetan Kumar <m.chetan.kumar@intel.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Intel Corporation <linuxwwan@intel.com>
Cc: Loic Poulain <loic.poulain@linaro.org>
Cc: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230620194234.never.023-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When mirroring to a gretap in hardware the device expects to be
programmed with the egress port and all the encapsulating headers. This
requires the driver to resolve the path the packet will take in the
software data path and program the device accordingly.
If the path cannot be resolved (in this case because of an unresolved
neighbor), then mirror installation fails until the path is resolved.
This results in a race that causes the test to sometimes fail.
Fix this by setting the neighbor's state to permanent in a couple of
tests, so that it is always valid.
Fixes: 35c31d5c323f ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1d")
Fixes: 239e754af854 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/268816ac729cb6028c7a34d4dda6f4ec7af55333.1687264607.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Properly check for RX_DROP_UNUSABLE now that the new drop reason
infrastructure is used. Without this change, the comparison will always
be false as a more specific reason is given in the lower bits of result.
Fixes: baa951a1c177 ("mac80211: use the new drop reasons infrastructure")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20230621120543.412920-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
mptcp: fixes for 6.4
Patch 1 correctly handles disconnect() failures that can happen in some
specific cases: now the socket state is set as unconnected as expected.
That fixes an issue introduced in v6.2.
Patch 2 fixes a divide by zero bug in mptcp_recvmsg() with a fix similar
to a recent one from Eric Dumazet for TCP introducing sk_wait_pending
flag. It should address an issue present in MPTCP from almost the
beginning, from v5.9.
Patch 3 fixes a possible list corruption on passive MPJ even if the race
seems very unlikely, better be safe than sorry. The possible issue is
present from v5.17.
Patch 4 consolidates fallback and non fallback state machines to avoid
leaking some MPTCP sockets. The fix is likely needed for versions from
v5.11.
Patch 5 drops code that is no longer used after the introduction of
patch 4/6. This is not really a fix but this patch can probably land in
the -net tree as well not to leave unused code.
Patch 6 ensures listeners are unhashed before updating their sk status
to avoid possible deadlocks when diag info are going to be retrieved
with a lock. Even if it should not be visible with the way we are
currently getting diag info, the issue is present from v5.17.
====================
Link: https://lore.kernel.org/r/20230620-upstream-net-20230620-misc-fixes-for-v6-4-v1-0-f36aa5eae8b9@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MPTCP protocol access the listener subflow in a lockless
manner in a couple of places (poll, diag). That works only if
the msk itself leaves the listener status only after that the
subflow itself has been closed/disconnected. Otherwise we risk
deadlock in diag, as reported by Christoph.
Address the issue ensuring that the first subflow (the listener
one) is always disconnected before updating the msk socket status.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/407
Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thanks to the previous patch -- "mptcp: consolidate fallback and non
fallback state machine" -- we can finally drop the "temporary hack"
used to detect rx eof.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
An orphaned msk releases the used resources via the worker,
when the latter first see the msk in CLOSED status.
If the msk status transitions to TCP_CLOSE in the release callback
invoked by the worker's final release_sock(), such instance of the
workqueue will not take any action.
Additionally the MPTCP code prevents scheduling the worker once the
socket reaches the CLOSE status: such msk resources will be leaked.
The only code path that can trigger the above scenario is the
__mptcp_check_send_data_fin() in fallback mode.
Address the issue removing the special handling of fallback socket
in __mptcp_check_send_data_fin(), consolidating the state machine
for fallback and non fallback socket.
Since non-fallback sockets do not send and do not receive data_fin,
the mptcp code can update the msk internal status to match the next
step in the SM every time data fin (ack) should be generated or
received.
As a consequence we can remove a bunch of checks for fallback from
the fastpath.
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At passive MPJ time, if the msk socket lock is held by the user,
the new subflow is appended to the msk->join_list under the msk
data lock.
In mptcp_release_cb()/__mptcp_flush_join_list(), the subflows in
that list are moved from the join_list into the conn_list under the
msk socket lock.
Append and removal could race, possibly corrupting such list.
Address the issue splicing the join list into a temporary one while
still under the msk data lock.
Found by code inspection, the race itself should be almost impossible
to trigger in practice.
Fixes: 3e5014909b56 ("mptcp: cleanup MPJ subflow list handling")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the mptcp code has assumes that disconnect() can fail only
at mptcp_sendmsg_fastopen() time - to avoid a deadlock scenario - and
don't even bother returning an error code.
Soon mptcp_disconnect() will handle more error conditions: let's track
them explicitly.
As a bonus, explicitly annotate TCP-level disconnect as not failing:
the mptcp code never blocks for event on the subflows.
Fixes: 7d803344fdc3 ("mptcp: fix deadlock in fastopen error path")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZJK5DwAKCRDbK58LschI
gyUtAQD4gT4BEVHRqvniw9yyqYo0BvElAznutDq7o9kFHFep2gEAoksEWS84OdZj
0L5mSKjXrpHKzmY/jlMrVIcTb3VzOw0=
=gAYE
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-06-21
We've added 7 non-merge commits during the last 14 day(s) which contain
a total of 7 files changed, 181 insertions(+), 15 deletions(-).
The main changes are:
1) Fix a verifier id tracking issue with scalars upon spill,
from Maxim Mikityanskiy.
2) Fix NULL dereference if an exception is generated while a BPF
subprogram is running, from Krister Johansen.
3) Fix a BTF verification failure when compiling kernel with LLVM_IAS=0,
from Florent Revest.
4) Fix expected_attach_type enforcement for kprobe_multi link,
from Jiri Olsa.
5) Fix a bpf_jit_dump issue for x86_64 to pick the correct JITed image,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Force kprobe multi expected_attach_type for kprobe_multi link
bpf/btf: Accept function names that contain dots
selftests/bpf: add a test for subprogram extables
bpf: ensure main program has an extable
bpf: Fix a bpf_jit_dump issue for x86_64 with sysctl bpf_jit_enable.
selftests/bpf: Add test cases to assert proper ID tracking on spill
bpf: Fix verifier id tracking of scalars on spill
====================
Link: https://lore.kernel.org/r/20230621101116.16122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For a long time the tick was aligned to clock MONOTONIC so that the tick
event happened at a multiple of nanoseconds per tick starting from clock
MONOTONIC = 0.
At some point this changed as the refined jiffies clocksource which is
used during boot before the TSC or other clocksources becomes usable, was
adjusted with a boot offset, so that time 0 is closer to the point where
the kernel starts.
This broke the assumption in the tick code that when the tick setup
happens early on ktime_get() will return a multiple of nanoseconds per
tick. As a consequence applications which aligned their periodic
execution so that it does not collide with the tick were not longer
guaranteed that the tick period starts from time 0.
The fix for this regression was to realign the tick when it is initially
set up to a multiple of tick periods. That works as long as the
underlying tick device supports periodic mode, but breaks under certain
conditions when the tick device supports only one shot mode.
Depending on the offset, the alignment delta to clock MONOTONIC can get
in a range where the minimal programming delta of the underlying clock
event device is larger than the calculated delta to the next tick. This
results in a boot hang as the tick code tries to play catch up, but as
the tick never fires jiffies are not advanced so it keeps trying for
ever.
Solve this by moving the tick alignement into the NOHZ / HIGHRES
enablement code because at that point it is guaranteed that the
underlying clocksource is high resolution capable and not longer
depending on the tick.
This is far before user space starts, so at the point where applications
try to align their timers, the old behaviour of the tick happening at a
multiple of nanoseconds per tick starting from clock MONOTONIC = 0 is
restored.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmSTUNQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoQQjD/4+zHK7IxRK+1k1lIStxksEudVeNuqK
7RZKNTawZFSChrxGmv5uhmqYDK9/tudnZot+uyAXI81JHnnGSXbco+VrAA2DQ2v2
w/oxhAHIAobJWDzUuCqT2YsseDVNoTGMrTTPrL4klXqhVtShVOLTI/603+macEiw
Olqmd7XxlyhCloNqoMIh18EwSNIs78kP3YAQHUmq3NoRkJT5c3wWKvZy38WOxSJ9
Jjpw75tePfcwWte9fGVq+oJ8hOWkPAKN5hFL8420RrEgdIkxKlhVmuyG48ieZKH1
LE1brB5iaU8+M91PevZMRlD7oio8u/d6xX1vT3Cqz9UBe1NLj1I0ToX6COhwSNxm
50U0HJ5wGlpodu4IyBHXTG45UAVwpOg/EgDlFigeP3Nxl6h45ugrA010r6FqE1zV
KltjNGsk/OD5d8ywW+yManU2mySm9znqmWN2zbjRGm138Tuz7vglcBhFB6jEJFUF
V8vYPLwI4J2gqifx8NoFBIfLuZp/+D4LOQiJ7XRc9CVm3JLK8Xi8fxnHGTBwJHIn
45WU4hzmY7fjR/gGhAyG7lH6tlo7VTC9/lrO+YtAy2hL2sK14+m/ZH7836I5bAhw
EgSyVlcfR6tgO55I3JgkbNIyO5TUAZPZIAY1TlyR53xXvMkb9aZTh9UMm1xMJ3uX
+sQaLAuDFNUQSg==
=/uvx
-----END PGP SIGNATURE-----
Merge tag 'timers-urgent-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single regression fix for a regression fix:
For a long time the tick was aligned to clock MONOTONIC so that the
tick event happened at a multiple of nanoseconds per tick starting
from clock MONOTONIC = 0.
At some point this changed as the refined jiffies clocksource which is
used during boot before the TSC or other clocksources becomes usable,
was adjusted with a boot offset, so that time 0 is closer to the point
where the kernel starts.
This broke the assumption in the tick code that when the tick setup
happens early on ktime_get() will return a multiple of nanoseconds per
tick. As a consequence applications which aligned their periodic
execution so that it does not collide with the tick were not longer
guaranteed that the tick period starts from time 0.
The fix for this regression was to realign the tick when it is
initially set up to a multiple of tick periods. That works as long as
the underlying tick device supports periodic mode, but breaks under
certain conditions when the tick device supports only one shot mode.
Depending on the offset, the alignment delta to clock MONOTONIC can
get in a range where the minimal programming delta of the underlying
clock event device is larger than the calculated delta to the next
tick. This results in a boot hang as the tick code tries to play catch
up, but as the tick never fires jiffies are not advanced so it keeps
trying for ever.
Solve this by moving the tick alignement into the NOHZ / HIGHRES
enablement code because at that point it is guaranteed that the
underlying clocksource is high resolution capable and not longer
depending on the tick.
This is far before user space starts, so at the point where
applications try to align their timers, the old behaviour of the tick
happening at a multiple of nanoseconds per tick starting from clock
MONOTONIC = 0 is restored"
* tag 'timers-urgent-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/common: Align tick period during sched_timer setup
A last minute revert to fix a regression.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmSSsb0PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpteMH/RRAiEYfWD+/DqFvxeBdFQsistVphMj7/3hh
9tKbVHXIJGl9SVodfjL/vvgk4QwZg3CgmKGDAWGpPO1SjFHCg7JgII7AE6Wu5Ff+
an4FguPA5Wc+7dmzsVH3yPg9N5Im9XtR5Y9avlhDmDEmLcKSzBEEtWrfQtKLLuyi
/dZZTEkzDgAP/wkrOEp90U9ION/bZ/hxFDbGWMr2XAg9ZR90GivCOleG63qPyLcq
k9asAF55tjNhNmqH6OpR5BcfFZhJli3IJa56Bypr//dJkzjQSrRhkz2UcwXNwu/b
GxAmBo/ObgnuSOUsDve9jZowGLa/MgocuCMcf05P9D3CtYsIqIs=
=gmrq
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fix from Michael Tsirkin:
"A last minute revert to fix a regression"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
Revert "virtio-blk: support completion batching for the IRQ path"
One last fix for SPI, just a simple fix for incorrect handling of probe
deferral for DMA in the Qualcomm GENI driver.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmSS8esACgkQJNaLcl1U
h9Cm1Qf+PD5lKhoc4Grecs5EbrI/Fzg0y3y6aLP2dghc+XbGnNcCtBYExavRWpJa
AsmqgH+Q7lARZ6T7vrsVyyV3Pn6mLem4aNWo1omKTbmfOD4zf9IT3pnoMQSBo8VC
/7f5nlP+0aVtHXwhZeGP3biZKZ25c2tkDsFb5yKBKFSohSi23RJCErHLhnVZv8Ne
2OymxblxbcGV3Mu63EvZbnP0q8r0toK4VgAKn86onV2D/jrH/EjNZDTSXWLHhCkk
IptQ8vCv9suIF1aZApvBlOWqE8xs1dp/OVNuD2MtVFKWCm/9DS65QPXVxz5VUQVx
J0GF1m8uvLWdWfZwqS75orCpUYpuGQ==
=/Lry
-----END PGP SIGNATURE-----
Merge tag 'spi-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One last fix for SPI, just a simple fix for incorrect handling of
probe deferral for DMA in the Qualcomm GENI driver"
* tag 'spi-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-geni-qcom: correctly handle -EPROBE_DEFER from dma_request_chan()
One simple fix for v6.4, some incorrectly specified bitfield masks in
the PCA9450 driver.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmSS8qMACgkQJNaLcl1U
h9AuQwf7BY16m/8M/3aHCzkj6d/NzzDOmsvgLuelAp7WAhoCd/+7jrp1rgt+M/CS
hiZZlhIaQVhEOjBzfDIEAY5aTW7rpZB/PsA0cVOSffVASAFNMkfAKo2zLPQ+IN/V
kcFHYRDtctQnENZZ3a4L9gxU0NuKwjvdX1tBhDk7fGzZLc2S+J3G/ea9z5NUNj97
ycUwbmOyLMmqh+Yb+nUrMBwvMpm8HsjzS3Ln3Xlz4rZUCKIfzOfewjmLO3ZbqwQZ
bjeiFB4XoyfwacQBPgeR1vCNZRXPw21Th2CYE7p4nqTUdpqHFU5k43ILl1nrpcBE
jS9INrLBU88EWvkE6Mh67qIyFey/FQ==
=naq9
-----END PGP SIGNATURE-----
Merge tag 'regulator-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One simple fix for v6.4, some incorrectly specified bitfield masks in
the PCA9450 driver"
* tag 'regulator-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: pca9450: Fix LDO3OUT and LDO4OUT MASK
The earlier fix to take account of the register data size when limiting
raw register writes exposed the fact that the Intel AVMM bus was
incorrectly specifying too low a limit on the maximum data transfer, it
is only capable of transmitting one register so had set a transfer size
limit that couldn't fit both the value and the the register address into
a single message.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmSS8bEACgkQJNaLcl1U
h9B1zwf/emwrogeLq+eQ03Q9O4oE7OmbYmbTKKhZHtdwtUxZ4qsZ9aNekcgPw1Kv
riZHiL2EC7EKFAqTl1KiGsoJTWQONt32DFcc+27fW6IyXAFc4AMDf2JPnKMtZC83
0R+H0vC/FDVO6lzxNerPuC62ydaBr38XdimyCwYgLtzNWwsZUNh6leXgjFbF3sh7
MEi/SaLH9hcBr0suERnXh3DZUAT7d2kvwL1krSDHF7pvhQ+Z2pn96B1NVnaVVhM1
OlyfQLieRZz1Q2yF2LASI3I9n2IWeToR9i2iaWn2RlGiBzXpo5E5YpEFtH7umCH0
AjoI5Cf44xFXh5asJskRydgI7iKwPQ==
=yQ+K
-----END PGP SIGNATURE-----
Merge tag 'regmap-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"One more fix for v6.4
The earlier fix to take account of the register data size when
limiting raw register writes exposed the fact that the Intel AVMM bus
was incorrectly specifying too low a limit on the maximum data
transfer, it is only capable of transmitting one register so had set a
transfer size limit that couldn't fit both the value and the the
register address into a single message"
* tag 'regmap-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: spi-avmm: Fix regmap_bus max_raw_write
Rather than assign the user pointer to msghdr->msg_control, assign it
to msghdr->msg_control_user to make sparse happy. They are in a union
so the end result is the same, but let's avoid new sparse warnings and
squash this one.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306210654.mDMcyMuB-lkp@intel.com/
Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We cannot sanely handle partial retries for recvmsg if we have cmsg
attached. If we don't, then we'd just be overwriting the initial cmsg
header on retries. Alternatively we could increment and handle this
appropriately, but it doesn't seem worth the complication.
Move the MSG_WAITALL check into the non-multishot case while at it,
since MSG_WAITALL is explicitly disabled for multishot anyway.
Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk/
Cc: stable@vger.kernel.org # 5.10+
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we have cmsg attached AND we transferred partial data at least, clear
msg_controllen on retry so we don't attempt to send that again.
Cc: stable@vger.kernel.org # 5.10+
Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries")
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A fix for a typoed iterator in the Intel Soundwire driver, fairly simple
on inspection though not reviewed by Intel.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmSS9eUACgkQJNaLcl1U
h9AUrgf8DzxOtaTaxzWGlVhoky6KxmYXXLANFOM1iDvGghoxEiYRCjn79El5+MuL
YfmgOzUX1ls1T9eZFzUqhAmavjfYbQw7bK90nuK2iNklxswhFbvwJdb0X854eFza
wLZEXhDnw9uyuCeenhC/JICxz5JvRnscmQ+rMy0SZQhKWeQ5QUdy2oa1o6dEpLd7
uaqZirgaaa2jWikj/6KwO7/sEaE7UznZfwdMbijwwwURXMty1+9Zo5V8+sctYbjI
oMHhiTbgAPbgfKN6k7MntGwWXBM7CXpAuzz6SOwdrOQmKaE5ODAAG1PnAWMkR2DQ
T0rlCos56Xrom2lseAQoX/97OKCJWA==
=dj3R
-----END PGP SIGNATURE-----
Merge tag 'asoc-fix-v6.4-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fix for v6.4
A fix for a typoed iterator in the Intel Soundwire driver, fairly simple
on inspection though not reviewed by Intel.
Adds the required quirk to enable the Cirrus amp and correct pins
on the ASUS ROG GV601V series.
While this works if the related _DSD properties are made available, these
aren't included in the ACPI of these laptops (yet).
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230621085715.5382-1-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
We currently allow to create perf link for program with
expected_attach_type == BPF_TRACE_KPROBE_MULTI.
This will cause crash when we call helpers like get_attach_cookie or
get_func_ip in such program, because it will call the kprobe_multi's
version (current->bpf_ctx context setup) of those helpers while it
expects perf_link's current->bpf_ctx context setup.
Making sure that we use BPF_TRACE_KPROBE_MULTI expected_attach_type
only for programs attaching through kprobe_multi link.
Fixes: ca74823c6e16 ("bpf: Add cookie support to programs attached with kprobe multi link")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230618131414.75649-1-jolsa@kernel.org
When building a kernel with LLVM=1, LLVM_IAS=0 and CONFIG_KASAN=y, LLVM
leaves DWARF tags for the "asan.module_ctor" & co symbols. In turn,
pahole creates BTF_KIND_FUNC entries for these and this makes the BTF
metadata validation fail because they contain a dot.
In a dramatic turn of event, this BTF verification failure can cause
the netfilter_bpf initialization to fail, causing netfilter_core to
free the netfilter_helper hashmap and netfilter_ftp to trigger a
use-after-free. The risk of u-a-f in netfilter will be addressed
separately but the existence of "asan.module_ctor" debug info under some
build conditions sounds like a good enough reason to accept functions
that contain dots in BTF.
Although using only LLVM=1 is the recommended way to compile clang-based
kernels, users can certainly do LLVM=1, LLVM_IAS=0 as well and we still
try to support that combination according to Nick. To clarify:
- > v5.10 kernel, LLVM=1 (LLVM_IAS=0 is not the default) is recommended,
but user can still have LLVM=1, LLVM_IAS=0 to trigger the issue
- <= 5.10 kernel, LLVM=1 (LLVM_IAS=0 is the default) is recommended in
which case GNU as will be used
Fixes: 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Yonghong Song <yhs@meta.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/bpf/20230615145607.3469985-1-revest@chromium.org
This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13.
This change appears to have broken things...
We now see applications hanging during disk accesses.
e.g.
multi-port virtio-blk device running in h/w (FPGA)
Host running a simple 'fio' test.
[global]
thread=1
direct=1
ioengine=libaio
norandommap=1
group_reporting=1
bs=4K
rw=read
iodepth=128
runtime=1
numjobs=4
time_based
[job0]
filename=/dev/vda
[job1]
filename=/dev/vdb
[job2]
filename=/dev/vdc
...
[job15]
filename=/dev/vdp
i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads
This is repeatedly run in a loop.
After a few, normally <10 seconds, fio hangs.
With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging.
Last message:
fio-3.19
Starting 8 threads
Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s]
I think this means at the end of the run 1 queue was left incomplete.
'diskstats' (run while fio is hung) shows no outstanding transactions.
e.g.
$ cat /proc/diskstats
...
252 0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0
252 16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0
...
Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called.
e.g.
PF= 0 vq=0 1 2 3
[a]request_count - 839416590 813148916 105586179 84988123
[b]completion1_count - 839416590 813148916 105586179 84988123
[c]completion2_count - 0 0 0 0
PF= 1 vq=0 1 2 3
[a]request_count - 823335887 812516140 104582672 75856549
[b]completion1_count - 823335887 812516140 104582672 75856549
[c]completion2_count - 0 0 0 0
i.e. the issue is after the virtio-blk driver.
This change was introduced in kernel 6.3.0.
I am seeing this using 6.3.3.
If I run with an earlier kernel (5.15), it does not occur.
If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch()call, it does not fail.
e.g.
kernel 5.15 - this is OK
virtio_blk.c,virtblk_done() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
blk_mq_complete_request(req);
}
kernel 6.3.3 - this fails
virtio_blk.c,virtblk_handle_req() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
if (!blk_mq_complete_request_remote(req)) {
if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
}
}
}
If I do, kernel 6.3.3 - this is OK
virtio_blk.c,virtblk_handle_req() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
if (!blk_mq_complete_request_remote(req)) {
virtblk_request_done(req); //force this here...
if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
}
}
}
Perhaps you might like to fix/test/revert this change...
Martin
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/
Cc: Suwan Kim <suwan.kim027@gmail.com>
Tested-by: edliaw@google.com
Reported-by: "Roberts, Martin" <martin.roberts@intel.com>
Message-Id: <336455b4f630f329380a8f53ee8cad3868764d5c.1686295549.git.mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit da9ef50f545f86ffe6ff786174d26500c4db737a.
This fixes a regression in which the link would come up, but no
communication was possible.
The reverted commit was also removing a comment about
DP83867_PHYCR_FORCE_LINK_GOOD, this is not added back in this commits
since it seems that this is unrelated to the original code change.
Closes: https://lore.kernel.org/all/ZGuDJos8D7N0J6Z2@francesco-nb.int.toradex.com/
Fixes: da9ef50f545f ("net: phy: dp83867: perform soft reset and retain established link")
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Praneeth Bajjuri <praneeth@ti.com>
Link: https://lore.kernel.org/r/20230619154435.355485-1-francesco@dolcini.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
PHY address and device address are passed in the wrong order.
Cc: stable@vger.kernel.org
Fixes: 4e4aafcddbbf ("net: mdio: Add dedicated C45 API to MDIO bus drivers")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230619094948.84452-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This includes a wholesale reversion of the post-6.4 series "make slab shrink
lockless". After input from Dave Chinner it has been decided that we
should go a different way. Thread starts at
https://lkml.kernel.org/r/ZH6K0McWBeCjaf16@dread.disaster.area.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJH/qAAKCRDdBJ7gKXxA
jq7uAP9AtDGHfvOuW5jlHdYfpUBnbfuQDKjiik71UuIxyhtwQQEAqpOBv7UDuhHj
NbNIGTIi/xM5vkpjV6CBo9ymR7qTKwo=
=uGuc
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2023-06-20-12-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"19 hotfixes. 8 of these are cc:stable.
This includes a wholesale reversion of the post-6.4 series 'make slab
shrink lockless'. After input from Dave Chinner it has been decided
that we should go a different way [1]"
Link: https://lkml.kernel.org/r/ZH6K0McWBeCjaf16@dread.disaster.area [1]
* tag 'mm-hotfixes-stable-2023-06-20-12-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
selftests/mm: fix cross compilation with LLVM
mailmap: add entries for Ben Dooks
nilfs2: prevent general protection fault in nilfs_clear_dirty_page()
Revert "mm: vmscan: make global slab shrink lockless"
Revert "mm: vmscan: make memcg slab shrink lockless"
Revert "mm: vmscan: add shrinker_srcu_generation"
Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless"
Revert "mm: vmscan: hold write lock to reparent shrinker nr_deferred"
Revert "mm: vmscan: remove shrinker_rwsem from synchronize_shrinkers()"
Revert "mm: shrinkers: convert shrinker_rwsem to mutex"
nilfs2: fix buffer corruption due to concurrent device reads
scripts/gdb: fix SB_* constants parsing
scripts: fix the gfp flags header path in gfp-translate
udmabuf: revert 'Add support for mapping hugepages (v4)'
mm/khugepaged: fix iteration in collapse_file
memfd: check for non-NULL file_seals in memfd_create() syscall
mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
mm/mprotect: fix do_mprotect_pkey() limit check
writeback: fix dereferencing NULL mapping->host on writeback_page_template
Fix a kernel crash during early resume from ACPI S3 that has been
present since the 5.15 cycle when might_sleep() was added to
down_timeout(), which in some configurations of the kernel caused
an implicit preemption point to trigger at a wrong time.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmSRzgISHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx7xAP/ifRMO55I2+K4oGO36Iol07TJASzSU6Q
mVk5/rw4LbcS/WnONO3T77hkBTCQ/uILkMImV/OwPQgOA+n6Fg5fKQMdGG8tamQb
FAJUE0TBDFLSGLl3YjoqYz3euA1ldZvivAukPSfom8G1Xoqcyrx6PfHqMKzL5JQt
fLb+8WOPkxtMt6fJ/iWFewkXHMU2/ZqiOpvaCOG3+SrBtf6S+EKnZmiQSxLaZ3o1
LK2Dx5lIWVT+aR6EgrF+3Vvrex0vry7MOI/7da8g3ou9newklAgtGAgUH1sqMDfp
dlQv0gksdCnNeBn5mtbcb3ZGsedaoy77WNrdzWod2psySCo0mgE+F8c7iEFBccZW
ss9a4Ev7fCtaghrxv/SSvn3CHFdM7wBvRnfrBJhCHTcr93hu8HiSS5GZKEDuRQye
wYSCgxHA7YTIetHcb0FZxnJgYHKihqD3qoaKlXcbd2cQXvGJGIGRcWw7COggNpw8
D3TR6/kJiePzEwuTV7EVlsGS26UjSBA73eXsMTxn4CyzJbO5oveLsMmkzMjVathg
s/pJIBXnU4i3nTdLjTezioAODhfIAkA7i/c61ZO3Y+bNgve5EoyT5MuFOsejBX3L
SJus945AFKNFLzuGVxsvlG+dc1P2fApo10gf5jE5gG9Xo1HLyiHfYK+1Lu5+dKW0
5LLGa6rd/NOs
=FyLX
-----END PGP SIGNATURE-----
Merge tag 'acpi-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a kernel crash during early resume from ACPI S3 that has been
present since the 5.15 cycle when might_sleep() was added to
down_timeout(), which in some configurations of the kernel caused an
implicit preemption point to trigger at a wrong time"
* tag 'acpi-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: sleep: Avoid breaking S3 wakeup due to might_sleep()
Fix a regression introduced during the 6.3 cycle and causing
intel_soc_dts_iosf to report incorrect temperature values due
to a coding mistake (Hans de Goede).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmSRzXQSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx64cP/0t73vzzBYmtyarWT/akpz/XK+ND82te
E8uOutsmINLn6ALAkgXqGFTQlcZVX3eZMPDL0BukDd7IktvqTxzGFD16SREBNv6A
XSiSDZnjrXfbuBhiBTjynVq7m1YW1+VEuYQL3yhGOTGWlszpiSo8K1AYkuW6gOiS
0Lmkq89q+QNpHyEZb8dRfcY95bXLQkwb+22mrizoN8WqC2ybiEt7y2GyzPeqSXge
Zcu/kC28iQsDzNTPv5s+qL0SJPywyACLIQgPEech2TL+TFY5q62PoPpkok/tQ9SS
o1YDUdDjanW4ubG0lt5qWs0srn1ScywwRM9sp/Kn0QyrB3z1DPwYKJtEoP01/z2R
z6mxGrGh0Bhi/iKo8YYkDi2ALnunOOINH7myZayd7XtrMmRqF0wtfobo2EF4Duq9
bm6vgoTRcK+A2mlKXRKZKNf8JbeQAsXTC6mMQ7YjAAXbm0NL/1Y2HQVxY7x4lRvr
jaqlT0oPipPzdJUIvWGQSVu1a5LzznvC4Bn0o+bB6VldACpynutyiWXqjvi+GGab
7CXY/DJt2K+Uj8hPqpucwYJ2qrL3GMtg9a7bjK4wyHxMAVA5GNDibWMKGepRA+qM
Sov8Bgc1VzSKqy1YpLNqeCA09lSRSC9OvDMZt+0LteEK62WbX4dJFDKXv9ruDhZh
qSYGJ66QSrvQ
=dLDl
-----END PGP SIGNATURE-----
Merge tag 'thermal-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Fix a regression introduced during the 6.3 cycle causing
intel_soc_dts_iosf to report incorrect temperature values
due to a coding mistake (Hans de Goede)"
* tag 'thermal-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/intel/intel_soc_dts_iosf: Fix reporting wrong temperatures
- Fix MAINTAINERS file to point to proper mailing list for rtla and rv
The mailing list pointed to linux-trace-devel instead of
linux-trace-kernel. The former is for the tracing libraries
and the latter is for anything in the Linux kernel tree.
The wrong mailing list was used because linux-trace-kernel did not
exist when rtla and rv were created.
- User events:
. Fix matching of dynamic events to their user events
When user writes to dynamic_events file, a lookup of the
registered dynamic events are made, but there were some cases
that a match could be incorrectly made.
. Add auto cleanup of user events
Have the user events automatically get removed when the last
reference (file descriptor) is closed. This was asked for to
prevent leaks of user events hanging around needing admins
to clean them up.
. Add persistent logic (but not let user space use it yet)
In some cases, having a persistent user event (one that does not
get cleaned up automatically) is useful. But there's still
debates about how to expose this to user space. The infrastructure
is added, but the API is not.
. Update the selftests
Update the user event selftests to reflect the above changes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZJGrABQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qgRDAQDvF8ktlcn+gqyUxt2OcTlbBh0jqS0b
FKXYdq6FTgfWYQD/ctunFbPdzn4D6Kc/lG8p4QxpMmtA19BUOPwEt3CkAwM=
=CDzr
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix MAINTAINERS file to point to proper mailing list for rtla and rv
The mailing list pointed to linux-trace-devel instead of
linux-trace-kernel. The former is for the tracing libraries and the
latter is for anything in the Linux kernel tree. The wrong mailing
list was used because linux-trace-kernel did not exist when rtla and
rv were created.
- User events:
- Fix matching of dynamic events to their user events
When user writes to dynamic_events file, a lookup of the
registered dynamic events is made, but there were some cases that
a match could be incorrectly made.
- Add auto cleanup of user events
Have the user events automatically get removed when the last
reference (file descriptor) is closed. This was asked for to
prevent leaks of user events hanging around needing admins to
clean them up.
- Add persistent logic (but not let user space use it yet)
In some cases, having a persistent user event (one that does not
get cleaned up automatically) is useful. But there's still debates
about how to expose this to user space. The infrastructure is
added, but the API is not.
- Update the selftests
Update the user event selftests to reflect the above changes"
* tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/user_events: Document auto-cleanup and remove dyn_event refs
selftests/user_events: Adapt dyn_test to non-persist events
selftests/user_events: Ensure auto cleanup works as expected
tracing/user_events: Add auto cleanup and future persist flag
tracing/user_events: Track refcount consistently via put/get
tracing/user_events: Store register flags on events
tracing/user_events: Remove user_ns walk for groups
selftests/user_events: Add perf self-test for empty arguments events
selftests/user_events: Clear the events after perf self-test
selftests/user_events: Add ftrace self-test for empty arguments events
tracing/user_events: Fix the incorrect trace record for empty arguments events
tracing: Modify print_fields() for fields output order
tracing/user_events: Handle matching arguments that is null from dyn_events
tracing/user_events: Prevent same name but different args event
tracing/rv/rtla: Update MAINTAINERS file to point to proper mailing list
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmSR56sACgkQxWXV+ddt
WDvdORAAk3IVO8V6md56wSQBC0S2itaMomSYlzBJL8fqriKZwhaG6C/pJPyt0wyQ
zj1ZsJqX3WwhDssVWIYNrj6CLr7gOQiMpbX363PYTFc6Hvrw8FqDVk/VjworsSmC
IyVo3K11HCLIapbYrB1QLxzufDqYieYw6c2jS1elnwUUssvPHxonlfc2IZ1nef9z
jLS9ZBbEjO7KaUwKrDJ1g4OFljwSnuYCW41fZWFaoXnvlBk84BbBvcRisFY69a++
Kv0ux1Z2NJXmBOzKG4ZXCCNamwHj/oHqOT2cZm7ZzWrruIo5IP/Q4lzXHxKkbxA0
7cUfS+O7UqwBDzZZsBEos6Pn82q1rj9RFQkw8B6gCazwRN16TiA2U3AqOkjPZolL
RXMhLB1oEpQbyxB/PMhk63yV3uHs7VxVvyycuG9qlynFCRVFNxj4hE4TXFJu97DD
TBOXmUaCBbyRMX72BycziW4hwsbWafkmZqRWV8wHqRXOD4nd7QCj51ZOFTIsNmbq
Gpo2cJxmzEeYJC/o4Kq+ABCcmrphMhgALugR8T0TBP9gqp0WVs6NmixTzFEc9Zo6
VbcN721J7R7yYBknJIUQ5GuJJddhu4Ya86vk/N2eBsGVSq0KxCtJoHbM4RBYTaTl
NiXcdXu1Cr4TojoZN5W6cH6BBSCvIcEBruQtNkoIj5G0Wfrvt0U=
=IFO0
-----END PGP SIGNATURE-----
Merge tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One more regression fix for an assertion failure that uncovered a
nasty problem with stripe calculations. This is caused by a u32
overflow when there are enough devices. The fstests require 6 so this
hasn't been caught, I was able to hit it with 8.
The fix is minimal and only adds u64 casts, we'll clean that up later.
I did various additional tests to be sure"
* tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix u32 overflows when left shifting stripe_nr
When deleting a base chain, iptables-nft simply submits the whole chain
to the kernel, including the NFTA_CHAIN_HOOK attribute. The new code
added by fixed commit then turned this into a chain update, destroying
the hook but not the chain itself. Detect the situation by checking if
the chain type is either netdev or inet/ingress.
Fixes: 7d937b107108f ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Move the alias from xt_osf to nfnetlink_osf.
Fixes: f9324952088f ("netfilter: nfnetlink_osf: extract nfnetlink_subsystem code from xt_osf.c")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Otherwise the module reference counter is leaked.
Fixes b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Disallow updates of set timeout and garbage collection parameters for
anonymous sets.
Fixes: 123b99619cca ("netfilter: nf_tables: honor set timeout and garbage collection updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use binding list to track set transaction and to check for unbound
chains before entering the commit phase.
Bail out if chain binding remain unused before entering the commit
step.
Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a new list to track set transaction and to check for unbound
anonymous sets before entering the commit phase.
Bail out at the end of the transaction handling if an anonymous set
remains unbound.
Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Anonymous sets come with NFT_SET_CONSTANT from userspace. Although API
allows to create anonymous sets without NFT_SET_CONSTANT, it makes no
sense to allow to add and to delete elements for bound anonymous sets.
Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since ("netfilter: nf_tables: drop map element references from
preparation phase"), integration with commit protocol is better,
therefore drop the workaround that b91d90368837 ("netfilter: nf_tables:
fix leaking object reference count") provides.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The .walk callback iterates over the current active set, but it might be
useful to iterate over the next generation set. Use the generation mask
to determine what set view (either current or next generation) is use
for the walk iteration.
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
set .destroy callback releases the references to other objects in maps.
This is very late and it results in spurious EBUSY errors. Drop refcount
from the preparation phase instead, update set backend not to drop
reference counter from set .destroy path.
Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the
reference counter because the transaction abort path releases the map
references for each element since the set is unbound. The abort path
also deals with releasing reference counter for new elements added to
unbound sets.
Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a new state to deal with rule expressions deactivation from the
newrule error path, otherwise the anonymous set remains in the list in
inactive state for the next generation. Mark the set/chain transaction
as unbound so the abort path releases this object, set it as inactive in
the next generation so it is not reachable anymore from this transaction
and reference counter is dropped.
Fixes: 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add bound flag to rule and chain transactions as in 6a0a8d10a366
("netfilter: nf_tables: use-after-free in failing rule with bound set")
to skip them in case that the chain is already bound from the abort
path.
This patch fixes an imbalance in the chain use refcnt that triggers a
WARN_ON on the table and chain destroy path.
This patch also disallows nested chain bindings, which is not
supported from userspace.
The logic to deal with chain binding in nft_data_hold() and
nft_data_release() is not correct. The NFT_TRANS_PREPARE state needs a
special handling in case a chain is bound but next expressions in the
same rule fail to initialize as described by 1240eb93f061 ("netfilter:
nf_tables: incorrect error path handling with NFT_MSG_NEWRULE").
The chain is left bound if rule construction fails, so the objects
stored in this chain (and the chain itself) are released by the
transaction records from the abort path, follow up patch ("netfilter:
nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
completes this error handling.
When deleting an existing rule, chain bound flag is set off so the
rule expression .destroy path releases the objects.
Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>