1153428 Commits

Author SHA1 Message Date
Linus Torvalds
222882c2ab Random number generator fixes for Linux 6.2-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmOhGqkACgkQSfxwEqXe
 A663VxAA0TOqGpqhI5FFV4r9rQR876SwD5doernG2k/lDugweTb0o4JK3jo03aFE
 V+McSbPAkiICSVsKompc48Blt7stHYfAIGu7KXBl88ZlrbNYay3ooV26WcAMbo7Y
 T3iLiK8wiqJBYOD5TUA3GA1ijVRmKBMiURrC1trHK7qZRc5N9lIedp6hRJD1INC+
 LUpSj6LoIhjpIqjbBLD3QlPRklkVhsceAvFwu/q2E5VUZgnMvBDzdJdll0Nt8uU4
 JBjw4GPsP5EdTOG0c+xt9aHApAbaT7Q7Un+dGKRf7xDUkgNvdZHwYitMIsIItyL0
 COiwQJS67ny61fBddz90yh6l7QK6D52fCC4xYLheBsS8cvZy2GcXZHYwDKUxqFWK
 UhqkEVTbl7gXC/NxGomEvkA84tPkLOtLlpd8BVaIQh5wEanVZOO9VxiMmFUYfmB+
 ygfcy011cxr4lLFzykh1JaRLGkWNTDN4RZXEsu501yZwv/xIcXZks0J5sRllZ3Pl
 JMjEwjWpy7CzCUAzR9v5wlRdYNTQjghT5zQEVeQMmsO+1LEKbh6HKPMHJd6DZWUI
 McuMylGChIYO4h6NTlY9wlkPI8MWTSSpN8UPXqRnpVwFSqhegnWul/C254tdX0aC
 0nrqLnlVQ95a2ZwfWkvZbbjN2kOtti+osQIv3PlCPeYiil599nU=
 =IBNq
 -----END PGP SIGNATURE-----

Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull more random number generator updates from Jason Donenfeld:
 "Two remaining changes that are now possible after you merged a few
  other trees:

   - #include <asm/archrandom.h> can be removed from random.h now,
     making the direct use of the arch_random_* API more of a private
     implementation detail between the archs and random.c, rather than
     something for general consumers.

   - Two additional uses of prandom_u32_max() snuck in during the
     initial phase of pulls, so these have been converted to
     get_random_u32_below(), and now the deprecated prandom_u32_max()
     alias -- which was just a wrapper around get_random_u32_below() --
     can be removed.

  In addition, there is one fix:

   - Check efi_rt_services_supported() before attempting to use an EFI
     runtime function.

     This affected EFI systems that disable runtime services yet still
     boot via EFI (e.g. the reporter's Lenovo Thinkpad X13s laptop), as
     well systems where EFI runtime services have been forcibly
     disabled, such as on PREEMPT_RT.

     On those machines, a very early and hard to diagnose crash would
     happen, preventing boot"

* tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  prandom: remove prandom_u32_max()
  efi: random: fix NULL-deref when refreshing seed
  random: do not include <asm/archrandom.h> from random.h
2022-12-21 08:02:30 -08:00
Linus Torvalds
19822e3ee4 Urgent RCU pull request for v6.2
This commit fixes a lockdep false positive in synchronize_rcu() that
 can otherwise occur during early boot.  Theis fix simply avoids invoking
 lockdep if the scheduler has not yet been initialized, that is, during
 that portion of boot when interrupts are disabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmOeXj8THHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jPmZEACaI5JqO6Dr2U4HojJJBYEfLVaSYxDp
 JrUi5D5WzzZidyjM2fyyZZkdRVQ24i1aV2H/fbLoIIH/smYjE/KLEFHQmclpphw5
 BSOyapotjdt5YhIavvAeOjdUd7jPyMqhbDVnwzjnblhUD1ObLVlhIs8Pjn7/03sF
 gzlIhYgp3EL7GenT9j9kud2FwWP+wrVQ7SdJ+Ni/WAHYO8860xQAmFXH/07bYzx7
 fbp5iPkCOSSUoRMw/qQ8s7CE3XhBNKufv1BtcvV/uxEtutfV1qvEQBv/l2RBd0Vg
 wOVBZnWXze+7IUx13M90R/d04Nn7RaGwon6xBMlvIwL3qzEj8x/r1FYz7zZhQPkv
 wwChAxFHQACnLCZSu48WBtVrawNdZHM57KHUK4rloAbrK92FpVznhQU+5pBDy4c6
 rfY2my+SNO4kWvePEg/2fd8aQycrZr99fK/ojCIerEn8MNboxuVOYTjzy0qtUcVT
 yJ/80O8ADI3QL/NRhjMFWgEnBDbHN1PcGhiRoutApdLQkg/UPTJjCRZ7ibmIFYY2
 ViW3cSndr/f0I7sOex2EILHwiZ2bUKiwyeTW6vWuFl/7MEWsvpJaWoUxXgQj99Bt
 ncAOaxtmmuhbwrOCt2kab90A0c/thNx9kNYYIkG3vUNcSRzyHQtg3ydEljBpaTFR
 OzhrqdUA7W9Sfg==
 =UKUo
 -----END PGP SIGNATURE-----

Merge tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU fix from Paul McKenney:
 "This fixes a lockdep false positive in synchronize_rcu() that can
  otherwise occur during early boot.

  The fix simply avoids invoking lockdep if the scheduler has not yet
  been initialized, that is, during that portion of boot when interrupts
  are disabled"

* tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu: Don't assert interrupts enabled too early in boot
2022-12-21 07:59:57 -08:00
Christian Brauner
11933cf1d9
pnode: terminate at peers of source
The propagate_mnt() function handles mount propagation when creating
mounts and propagates the source mount tree @source_mnt to all
applicable nodes of the destination propagation mount tree headed by
@dest_mnt.

Unfortunately it contains a bug where it fails to terminate at peers of
@source_mnt when looking up copies of the source mount that become
masters for copies of the source mount tree mounted on top of slaves in
the destination propagation tree causing a NULL dereference.

Once the mechanics of the bug are understood it's easy to trigger.
Because of unprivileged user namespaces it is available to unprivileged
users.

While fixing this bug we've gotten confused multiple times due to
unclear terminology or missing concepts. So let's start this with some
clarifications:

* The terms "master" or "peer" denote a shared mount. A shared mount
  belongs to a peer group.

* A peer group is a set of shared mounts that propagate to each other.
  They are identified by a peer group id. The peer group id is available
  in @shared_mnt->mnt_group_id.
  Shared mounts within the same peer group have the same peer group id.
  The peers in a peer group can be reached via @shared_mnt->mnt_share.

* The terms "slave mount" or "dependent mount" denote a mount that
  receives propagation from a peer in a peer group. IOW, shared mounts
  may have slave mounts and slave mounts have shared mounts as their
  master. Slave mounts of a given peer in a peer group are listed on
  that peers slave list available at @shared_mnt->mnt_slave_list.

* The term "master mount" denotes a mount in a peer group. IOW, it
  denotes a shared mount or a peer mount in a peer group. The term
  "master mount" - or "master" for short - is mostly used when talking
  in the context of slave mounts that receive propagation from a master
  mount. A master mount of a slave identifies the closest peer group a
  slave mount receives propagation from. The master mount of a slave can
  be identified via @slave_mount->mnt_master. Different slaves may point
  to different masters in the same peer group.

* Multiple peers in a peer group can have non-empty ->mnt_slave_lists.
  Non-empty ->mnt_slave_lists of peers don't intersect. Consequently, to
  ensure all slave mounts of a peer group are visited the
  ->mnt_slave_lists of all peers in a peer group have to be walked.

* Slave mounts point to a peer in the closest peer group they receive
  propagation from via @slave_mnt->mnt_master (see above). Together with
  these peers they form a propagation group (see below). The closest
  peer group can thus be identified through the peer group id
  @slave_mnt->mnt_master->mnt_group_id of the peer/master that a slave
  mount receives propagation from.

* A shared-slave mount is a slave mount to a peer group pg1 while also
  a peer in another peer group pg2. IOW, a peer group may receive
  propagation from another peer group.

  If a peer group pg1 is a slave to another peer group pg2 then all
  peers in peer group pg1 point to the same peer in peer group pg2 via
  ->mnt_master. IOW, all peers in peer group pg1 appear on the same
  ->mnt_slave_list. IOW, they cannot be slaves to different peer groups.

* A pure slave mount is a slave mount that is a slave to a peer group
  but is not a peer in another peer group.

* A propagation group denotes the set of mounts consisting of a single
  peer group pg1 and all slave mounts and shared-slave mounts that point
  to a peer in that peer group via ->mnt_master. IOW, all slave mounts
  such that @slave_mnt->mnt_master->mnt_group_id is equal to
  @shared_mnt->mnt_group_id.

  The concept of a propagation group makes it easier to talk about a
  single propagation level in a propagation tree.

  For example, in propagate_mnt() the immediate peers of @dest_mnt and
  all slaves of @dest_mnt's peer group form a propagation group propg1.
  So a shared-slave mount that is a slave in propg1 and that is a peer
  in another peer group pg2 forms another propagation group propg2
  together with all slaves that point to that shared-slave mount in
  their ->mnt_master.

* A propagation tree refers to all mounts that receive propagation
  starting from a specific shared mount.

  For example, for propagate_mnt() @dest_mnt is the start of a
  propagation tree. The propagation tree ecompasses all mounts that
  receive propagation from @dest_mnt's peer group down to the leafs.

With that out of the way let's get to the actual algorithm.

We know that @dest_mnt is guaranteed to be a pure shared mount or a
shared-slave mount. This is guaranteed by a check in
attach_recursive_mnt(). So propagate_mnt() will first propagate the
source mount tree to all peers in @dest_mnt's peer group:

for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) {
        ret = propagate_one(n);
        if (ret)
               goto out;
}

Notice, that the peer propagation loop of propagate_mnt() doesn't
propagate @dest_mnt itself. @dest_mnt is mounted directly in
attach_recursive_mnt() after we propagated to the destination
propagation tree.

The mount that will be mounted on top of @dest_mnt is @source_mnt. This
copy was created earlier even before we entered attach_recursive_mnt()
and doesn't concern us a lot here.

It's just important to notice that when propagate_mnt() is called
@source_mnt will not yet have been mounted on top of @dest_mnt. Thus,
@source_mnt->mnt_parent will either still point to @source_mnt or - in
the case @source_mnt is moved and thus already attached - still to its
former parent.

For each peer @m in @dest_mnt's peer group propagate_one() will create a
new copy of the source mount tree and mount that copy @child on @m such
that @child->mnt_parent points to @m after propagate_one() returns.

propagate_one() will stash the last destination propagation node @m in
@last_dest and the last copy it created for the source mount tree in
@last_source.

Hence, if we call into propagate_one() again for the next destination
propagation node @m, @last_dest will point to the previous destination
propagation node and @last_source will point to the previous copy of the
source mount tree and mounted on @last_dest.

Each new copy of the source mount tree is created from the previous copy
of the source mount tree. This will become important later.

The peer loop in propagate_mnt() is straightforward. We iterate through
the peers copying and updating @last_source and @last_dest as we go
through them and mount each copy of the source mount tree @child on a
peer @m in @dest_mnt's peer group.

After propagate_mnt() handled the peers in @dest_mnt's peer group
propagate_mnt() will propagate the source mount tree down the
propagation tree that @dest_mnt's peer group propagates to:

for (m = next_group(dest_mnt, dest_mnt); m;
                m = next_group(m, dest_mnt)) {
        /* everything in that slave group */
        n = m;
        do {
                ret = propagate_one(n);
                if (ret)
                        goto out;
                n = next_peer(n);
        } while (n != m);
}

The next_group() helper will recursively walk the destination
propagation tree, descending into each propagation group of the
propagation tree.

The important part is that it takes care to propagate the source mount
tree to all peers in the peer group of a propagation group before it
propagates to the slaves to those peers in the propagation group. IOW,
it creates and mounts copies of the source mount tree that become
masters before it creates and mounts copies of the source mount tree
that become slaves to these masters.

It is important to remember that propagating the source mount tree to
each mount @m in the destination propagation tree simply means that we
create and mount new copies @child of the source mount tree on @m such
that @child->mnt_parent points to @m.

Since we know that each node @m in the destination propagation tree
headed by @dest_mnt's peer group will be overmounted with a copy of the
source mount tree and since we know that the propagation properties of
each copy of the source mount tree we create and mount at @m will mostly
mirror the propagation properties of @m. We can use that information to
create and mount the copies of the source mount tree that become masters
before their slaves.

The easy case is always when @m and @last_dest are peers in a peer group
of a given propagation group. In that case we know that we can simply
copy @last_source without having to figure out what the master for the
new copy @child of the source mount tree needs to be as we've done that
in a previous call to propagate_one().

The hard case is when we're dealing with a slave mount or a shared-slave
mount @m in a destination propagation group that we need to create and
mount a copy of the source mount tree on.

For each propagation group in the destination propagation tree we
propagate the source mount tree to we want to make sure that the copies
@child of the source mount tree we create and mount on slaves @m pick an
ealier copy of the source mount tree that we mounted on a master @m of
the destination propagation group as their master. This is a mouthful
but as far as we can tell that's the core of it all.

But, if we keep track of the masters in the destination propagation tree
@m we can use the information to find the correct master for each copy
of the source mount tree we create and mount at the slaves in the
destination propagation tree @m.

Let's walk through the base case as that's still fairly easy to grasp.

If we're dealing with the first slave in the propagation group that
@dest_mnt is in then we don't yet have marked any masters in the
destination propagation tree.

We know the master for the first slave to @dest_mnt's peer group is
simple @dest_mnt. So we expect this algorithm to yield a copy of the
source mount tree that was mounted on a peer in @dest_mnt's peer group
as the master for the copy of the source mount tree we want to mount at
the first slave @m:

for (n = m; ; n = p) {
        p = n->mnt_master;
        if (p == dest_master || IS_MNT_MARKED(p))
                break;
}

For the first slave we walk the destination propagation tree all the way
up to a peer in @dest_mnt's peer group. IOW, the propagation hierarchy
can be walked by walking up the @mnt->mnt_master hierarchy of the
destination propagation tree @m. We will ultimately find a peer in
@dest_mnt's peer group and thus ultimately @dest_mnt->mnt_master.

Btw, here the assumption we listed at the beginning becomes important.
Namely, that peers in a peer group pg1 that are slaves in another peer
group pg2 appear on the same ->mnt_slave_list. IOW, all slaves who are
peers in peer group pg1 point to the same peer in peer group pg2 via
their ->mnt_master. Otherwise the termination condition in the code
above would be wrong and next_group() would be broken too.

So the first iteration sets:

n = m;
p = n->mnt_master;

such that @p now points to a peer or @dest_mnt itself. We walk up one
more level since we don't have any marked mounts. So we end up with:

n = dest_mnt;
p = dest_mnt->mnt_master;

If @dest_mnt's peer group is not slave to another peer group then @p is
now NULL. If @dest_mnt's peer group is a slave to another peer group
then @p now points to @dest_mnt->mnt_master points which is a master
outside the propagation tree we're dealing with.

Now we need to figure out the master for the copy of the source mount
tree we're about to create and mount on the first slave of @dest_mnt's
peer group:

do {
        struct mount *parent = last_source->mnt_parent;
        if (last_source == first_source)
                break;
        done = parent->mnt_master == p;
        if (done && peers(n, parent))
                break;
        last_source = last_source->mnt_master;
} while (!done);

We know that @last_source->mnt_parent points to @last_dest and
@last_dest is the last peer in @dest_mnt's peer group we propagated to
in the peer loop in propagate_mnt().

Consequently, @last_source is the last copy we created and mount on that
last peer in @dest_mnt's peer group. So @last_source is the master we
want to pick.

We know that @last_source->mnt_parent->mnt_master points to
@last_dest->mnt_master. We also know that @last_dest->mnt_master is
either NULL or points to a master outside of the destination propagation
tree and so does @p. Hence:

done = parent->mnt_master == p;

is trivially true in the base condition.

We also know that for the first slave mount of @dest_mnt's peer group
that @last_dest either points @dest_mnt itself because it was
initialized to:

last_dest = dest_mnt;

at the beginning of propagate_mnt() or it will point to a peer of
@dest_mnt in its peer group. In both cases it is guaranteed that on the
first iteration @n and @parent are peers (Please note the check for
peers here as that's important.):

if (done && peers(n, parent))
        break;

So, as we expected, we select @last_source, which referes to the last
copy of the source mount tree we mounted on the last peer in @dest_mnt's
peer group, as the master of the first slave in @dest_mnt's peer group.
The rest is taken care of by clone_mnt(last_source, ...). We'll skip
over that part otherwise this becomes a blogpost.

At the end of propagate_mnt() we now mark @m->mnt_master as the first
master in the destination propagation tree that is distinct from
@dest_mnt->mnt_master. IOW, we mark @dest_mnt itself as a master.

By marking @dest_mnt or one of it's peers we are able to easily find it
again when we later lookup masters for other copies of the source mount
tree we mount copies of the source mount tree on slaves @m to
@dest_mnt's peer group. This, in turn allows us to find the master we
selected for the copies of the source mount tree we mounted on master in
the destination propagation tree again.

The important part is to realize that the code makes use of the fact
that the last copy of the source mount tree stashed in @last_source was
mounted on top of the previous destination propagation node @last_dest.
What this means is that @last_source allows us to walk the destination
propagation hierarchy the same way each destination propagation node @m
does.

If we take @last_source, which is the copy of @source_mnt we have
mounted on @last_dest in the previous iteration of propagate_one(), then
we know @last_source->mnt_parent points to @last_dest but we also know
that as we walk through the destination propagation tree that
@last_source->mnt_master will point to an earlier copy of the source
mount tree we mounted one an earlier destination propagation node @m.

IOW, @last_source->mnt_parent will be our hook into the destination
propagation tree and each consecutive @last_source->mnt_master will lead
us to an earlier propagation node @m via
@last_source->mnt_master->mnt_parent.

Hence, by walking up @last_source->mnt_master, each of which is mounted
on a node that is a master @m in the destination propagation tree we can
also walk up the destination propagation hierarchy.

So, for each new destination propagation node @m we use the previous
copy of @last_source and the fact it's mounted on the previous
propagation node @last_dest via @last_source->mnt_master->mnt_parent to
determine what the master of the new copy of @last_source needs to be.

The goal is to find the _closest_ master that the new copy of the source
mount tree we are about to create and mount on a slave @m in the
destination propagation tree needs to pick. IOW, we want to find a
suitable master in the propagation group.

As the propagation structure of the source mount propagation tree we
create mirrors the propagation structure of the destination propagation
tree we can find @m's closest master - i.e., a marked master - which is
a peer in the closest peer group that @m receives propagation from. We
store that closest master of @m in @p as before and record the slave to
that master in @n

We then search for this master @p via @last_source by walking up the
master hierarchy starting from the last copy of the source mount tree
stored in @last_source that we created and mounted on the previous
destination propagation node @m.

We will try to find the master by walking @last_source->mnt_master and
by comparing @last_source->mnt_master->mnt_parent->mnt_master to @p. If
we find @p then we can figure out what earlier copy of the source mount
tree needs to be the master for the new copy of the source mount tree
we're about to create and mount at the current destination propagation
node @m.

If @last_source->mnt_master->mnt_parent and @n are peers then we know
that the closest master they receive propagation from is
@last_source->mnt_master->mnt_parent->mnt_master. If not then the
closest immediate peer group that they receive propagation from must be
one level higher up.

This builds on the earlier clarification at the beginning that all peers
in a peer group which are slaves of other peer groups all point to the
same ->mnt_master, i.e., appear on the same ->mnt_slave_list, of the
closest peer group that they receive propagation from.

However, terminating the walk has corner cases.

If the closest marked master for a given destination node @m cannot be
found by walking up the master hierarchy via @last_source->mnt_master
then we need to terminate the walk when we encounter @source_mnt again.

This isn't an arbitrary termination. It simply means that the new copy
of the source mount tree we're about to create has a copy of the source
mount tree we created and mounted on a peer in @dest_mnt's peer group as
its master. IOW, @source_mnt is the peer in the closest peer group that
the new copy of the source mount tree receives propagation from.

We absolutely have to stop @source_mnt because @last_source->mnt_master
either points outside the propagation hierarchy we're dealing with or it
is NULL because @source_mnt isn't a shared-slave.

So continuing the walk past @source_mnt would cause a NULL dereference
via @last_source->mnt_master->mnt_parent. And so we have to stop the
walk when we encounter @source_mnt again.

One scenario where this can happen is when we first handled a series of
slaves of @dest_mnt's peer group and then encounter peers in a new peer
group that is a slave to @dest_mnt's peer group. We handle them and then
we encounter another slave mount to @dest_mnt that is a pure slave to
@dest_mnt's peer group. That pure slave will have a peer in @dest_mnt's
peer group as its master. Consequently, the new copy of the source mount
tree will need to have @source_mnt as it's master. So we walk the
propagation hierarchy all the way up to @source_mnt based on
@last_source->mnt_master.

So terminate on @source_mnt, easy peasy. Except, that the check misses
something that the rest of the algorithm already handles.

If @dest_mnt has peers in it's peer group the peer loop in
propagate_mnt():

for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) {
        ret = propagate_one(n);
        if (ret)
                goto out;
}

will consecutively update @last_source with each previous copy of the
source mount tree we created and mounted at the previous peer in
@dest_mnt's peer group. So after that loop terminates @last_source will
point to whatever copy of the source mount tree was created and mounted
on the last peer in @dest_mnt's peer group.

Furthermore, if there is even a single additional peer in @dest_mnt's
peer group then @last_source will __not__ point to @source_mnt anymore.
Because, as we mentioned above, @dest_mnt isn't even handled in this
loop but directly in attach_recursive_mnt(). So it can't even accidently
come last in that peer loop.

So the first time we handle a slave mount @m of @dest_mnt's peer group
the copy of the source mount tree we create will make the __last copy of
the source mount tree we created and mounted on the last peer in
@dest_mnt's peer group the master of the new copy of the source mount
tree we create and mount on the first slave of @dest_mnt's peer group__.

But this means that the termination condition that checks for
@source_mnt is wrong. The @source_mnt cannot be found anymore by
propagate_one(). Instead it will find the last copy of the source mount
tree we created and mounted for the last peer of @dest_mnt's peer group
again. And that is a peer of @source_mnt not @source_mnt itself.

IOW, we fail to terminate the loop correctly and ultimately dereference
@last_source->mnt_master->mnt_parent. When @source_mnt's peer group
isn't slave to another peer group then @last_source->mnt_master is NULL
causing the splat below.

For example, assume @dest_mnt is a pure shared mount and has three peers
in its peer group:

===================================================================================
                                         mount-id   mount-parent-id   peer-group-id
===================================================================================
(@dest_mnt) mnt_master[216]              309        297               shared:216
    \
     (@source_mnt) mnt_master[218]:      609        609               shared:218

(1) mnt_master[216]:                     607        605               shared:216
    \
     (P1) mnt_master[218]:               624        607               shared:218

(2) mnt_master[216]:                     576        574               shared:216
    \
     (P2) mnt_master[218]:               625        576               shared:218

(3) mnt_master[216]:                     545        543               shared:216
    \
     (P3) mnt_master[218]:               626        545               shared:218

After this sequence has been processed @last_source will point to (P3),
the copy generated for the third peer in @dest_mnt's peer group we
handled. So the copy of the source mount tree (P4) we create and mount
on the first slave of @dest_mnt's peer group:

===================================================================================
                                         mount-id   mount-parent-id   peer-group-id
===================================================================================
    mnt_master[216]                      309        297               shared:216
   /
  /
(S0) mnt_slave                           483        481               master:216
  \
   \    (P3) mnt_master[218]             626        545               shared:218
    \  /
     \/
    (P4) mnt_slave                       627        483               master:218

will pick the last copy of the source mount tree (P3) as master, not (S0).

When walking the propagation hierarchy via @last_source's master
hierarchy we encounter (P3) but not (S0), i.e., @source_mnt.

We can fix this in multiple ways:

(1) By setting @last_source to @source_mnt after we processed the peers
    in @dest_mnt's peer group right after the peer loop in
    propagate_mnt().

(2) By changing the termination condition that relies on finding exactly
    @source_mnt to finding a peer of @source_mnt.

(3) By only moving @last_source when we actually venture into a new peer
    group or some clever variant thereof.

The first two options are minimally invasive and what we want as a fix.
The third option is more intrusive but something we'd like to explore in
the near future.

This passes all LTP tests and specifically the mount propagation
testsuite part of it. It also holds up against all known reproducers of
this issues.

Final words.
First, this is a clever but __worringly__ underdocumented algorithm.
There isn't a single detailed comment to be found in next_group(),
propagate_one() or anywhere else in that file for that matter. This has
been a giant pain to understand and work through and a bug like this is
insanely difficult to fix without a detailed understanding of what's
happening. Let's not talk about the amount of time that was sunk into
fixing this.

Second, all the cool kids with access to
unshare --mount --user --map-root --propagation=unchanged
are going to have a lot of fun. IOW, triggerable by unprivileged users
while namespace_lock() lock is held.

[  115.848393] BUG: kernel NULL pointer dereference, address: 0000000000000010
[  115.848967] #PF: supervisor read access in kernel mode
[  115.849386] #PF: error_code(0x0000) - not-present page
[  115.849803] PGD 0 P4D 0
[  115.850012] Oops: 0000 [#1] PREEMPT SMP PTI
[  115.850354] CPU: 0 PID: 15591 Comm: mount Not tainted 6.1.0-rc7 #3
[  115.850851] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[  115.851510] RIP: 0010:propagate_one.part.0+0x7f/0x1a0
[  115.851924] Code: 75 eb 4c 8b 05 c2 25 37 02 4c 89 ca 48 8b 4a 10
49 39 d0 74 1e 48 3b 81 e0 00 00 00 74 26 48 8b 92 e0 00 00 00 be 01
00 00 00 <48> 8b 4a 10 49 39 d0 75 e2 40 84 f6 74 38 4c 89 05 84 25 37
02 4d
[  115.853441] RSP: 0018:ffffb8d5443d7d50 EFLAGS: 00010282
[  115.853865] RAX: ffff8e4d87c41c80 RBX: ffff8e4d88ded780 RCX: ffff8e4da4333a00
[  115.854458] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e4d88ded780
[  115.855044] RBP: ffff8e4d88ded780 R08: ffff8e4da4338000 R09: ffff8e4da43388c0
[  115.855693] R10: 0000000000000002 R11: ffffb8d540158000 R12: ffffb8d5443d7da8
[  115.856304] R13: ffff8e4d88ded780 R14: 0000000000000000 R15: 0000000000000000
[  115.856859] FS:  00007f92c90c9800(0000) GS:ffff8e4dfdc00000(0000)
knlGS:0000000000000000
[  115.857531] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  115.858006] CR2: 0000000000000010 CR3: 0000000022f4c002 CR4: 00000000000706f0
[  115.858598] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  115.859393] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  115.860099] Call Trace:
[  115.860358]  <TASK>
[  115.860535]  propagate_mnt+0x14d/0x190
[  115.860848]  attach_recursive_mnt+0x274/0x3e0
[  115.861212]  path_mount+0x8c8/0xa60
[  115.861503]  __x64_sys_mount+0xf6/0x140
[  115.861819]  do_syscall_64+0x5b/0x80
[  115.862117]  ? do_faccessat+0x123/0x250
[  115.862435]  ? syscall_exit_to_user_mode+0x17/0x40
[  115.862826]  ? do_syscall_64+0x67/0x80
[  115.863133]  ? syscall_exit_to_user_mode+0x17/0x40
[  115.863527]  ? do_syscall_64+0x67/0x80
[  115.863835]  ? do_syscall_64+0x67/0x80
[  115.864144]  ? do_syscall_64+0x67/0x80
[  115.864452]  ? exc_page_fault+0x70/0x170
[  115.864775]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  115.865187] RIP: 0033:0x7f92c92b0ebe
[  115.865480] Code: 48 8b 0d 75 4f 0c 00 f7 d8 64 89 01 48 83 c8 ff
c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 42 4f 0c 00 f7 d8 64 89
01 48
[  115.866984] RSP: 002b:00007fff000aa728 EFLAGS: 00000246 ORIG_RAX:
00000000000000a5
[  115.867607] RAX: ffffffffffffffda RBX: 000055a77888d6b0 RCX: 00007f92c92b0ebe
[  115.868240] RDX: 000055a77888d8e0 RSI: 000055a77888e6e0 RDI: 000055a77888e620
[  115.868823] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[  115.869403] R10: 0000000000001000 R11: 0000000000000246 R12: 000055a77888e620
[  115.869994] R13: 000055a77888d8e0 R14: 00000000ffffffff R15: 00007f92c93e4076
[  115.870581]  </TASK>
[  115.870763] Modules linked in: nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink qrtr snd_intel8x0
sunrpc snd_ac97_codec ac97_bus snd_pcm snd_timer intel_rapl_msr
intel_rapl_common snd vboxguest intel_powerclamp video rapl joydev
soundcore i2c_piix4 wmi fuse zram xfs vmwgfx crct10dif_pclmul
crc32_pclmul crc32c_intel polyval_clmulni polyval_generic
drm_ttm_helper ttm e1000 ghash_clmulni_intel serio_raw ata_generic
pata_acpi scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath
[  115.875288] CR2: 0000000000000010
[  115.875641] ---[ end trace 0000000000000000 ]---
[  115.876135] RIP: 0010:propagate_one.part.0+0x7f/0x1a0
[  115.876551] Code: 75 eb 4c 8b 05 c2 25 37 02 4c 89 ca 48 8b 4a 10
49 39 d0 74 1e 48 3b 81 e0 00 00 00 74 26 48 8b 92 e0 00 00 00 be 01
00 00 00 <48> 8b 4a 10 49 39 d0 75 e2 40 84 f6 74 38 4c 89 05 84 25 37
02 4d
[  115.878086] RSP: 0018:ffffb8d5443d7d50 EFLAGS: 00010282
[  115.878511] RAX: ffff8e4d87c41c80 RBX: ffff8e4d88ded780 RCX: ffff8e4da4333a00
[  115.879128] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e4d88ded780
[  115.879715] RBP: ffff8e4d88ded780 R08: ffff8e4da4338000 R09: ffff8e4da43388c0
[  115.880359] R10: 0000000000000002 R11: ffffb8d540158000 R12: ffffb8d5443d7da8
[  115.880962] R13: ffff8e4d88ded780 R14: 0000000000000000 R15: 0000000000000000
[  115.881548] FS:  00007f92c90c9800(0000) GS:ffff8e4dfdc00000(0000)
knlGS:0000000000000000
[  115.882234] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  115.882713] CR2: 0000000000000010 CR3: 0000000022f4c002 CR4: 00000000000706f0
[  115.883314] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  115.883966] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: f2ebb3a921c1 ("smarter propagate_mnt()")
Fixes: 5ec0811d3037 ("propogate_mnt: Handle the first propogated copy being a slave")
Cc: <stable@vger.kernel.org>
Reported-by: Ditang Chen <ditang.c@gmail.com>
Signed-off-by: Seth Forshee (Digital Ocean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
If there are no big objections I'll get this to Linus rather sooner than later.
2022-12-21 14:45:25 +01:00
Florian Fainelli
24b333a866 MIPS: dts: bcm63268: Add missing properties to the TWD node
We currently have a DTC warning with the current DTS due to the lack of
a suitable #address-cells and #size-cells property:

  DTC     arch/mips/boot/dts/brcm/bcm63268-comtrend-vr-3032u.dtb
arch/mips/boot/dts/brcm/bcm63268.dtsi:115.5-22: Warning (reg_format): /ubus/timer-mfd@10000080/timer@0:reg: property has invalid length (8 bytes) (#address-cells == 2, #size-cells == 1)
arch/mips/boot/dts/brcm/bcm63268.dtsi:120.5-22: Warning (reg_format): /ubus/timer-mfd@10000080/watchdog@1c:reg: property has invalid length (8 bytes) (#address-cells == 2, #size-cells == 1)
arch/mips/boot/dts/brcm/bcm63268.dtsi:111.4-35: Warning (ranges_format): /ubus/timer-mfd@10000080:ranges: "ranges" property has invalid length (12 bytes) (parent #address-cells == 1, child #address-cells == 2, #size-cells == 1)

Fixes: d3db4b96ab7f ("mips: dts: bcm63268: add TWD block timer")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2022-12-21 10:46:10 +01:00
Sergio Paracuellos
76ce51798c MIPS: ralink: mt7621: avoid to init common ralink reset controller
Commit 38a8553b0a22 ("clk: ralink: make system controller node a reset provider")
make system controller a reset provider for mt7621 ralink SoCs. Ralink init code
also tries to start previous common reset controller which at the end tries to
find device tree node 'ralink,rt2880-reset'. mt7621 device tree file is not
using at all this node anymore. Hence avoid to init this common reset controller
for mt7621 ralink SoCs to avoid 'Failed to find reset controller node' boot
error trace error.

Fixes: 64b2d6ffff86 ("staging: mt7621-dts: align resets with binding documentation")
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2022-12-21 10:45:56 +01:00
Christoph Hellwig
3622b86f49 dma-mapping: reject GFP_COMP for noncoherent allocations
While not quite as bogus as for the dma-coherent allocations that were
fixed earlier, GFP_COMP for these allocations has no benefits for
the dma-direct case, and can't be supported at all by dma dma-iommu
backend which splits up allocations into smaller orders.  Due to an
oversight in ffcb75458460 that flag stopped being cleared for all
dma allocations, but only got rejected for coherent ones, so fix up
these callers to not allow __GFP_COMP as well after the sound code
has been fixed to not ask for it.

Fixes: ffcb75458460 ("dma-mapping: reject __GFP_COMP in dma_alloc_attrs")
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reported-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
2022-12-21 08:45:38 +01:00
Christoph Hellwig
db91832127 ALSA: memalloc: don't use GFP_COMP for non-coherent dma allocations
While not quite as bogus as for the dma-coherent allocations that were
fixed earlier, GFP_COMP for these allocations has no benefits for
the dma-direct case, and can't be supported at all by dma dma-iommu
backend which splits up allocations into smaller orders.  Due to an
oversight in ffcb75458460 that flag stopped being cleared for all
dma allocations, but only got rejected for coherent ones.

Start fixing this by not requesting __GFP_COMP in the sound code, which
is the only place that did this.

Fixes: ffcb75458460 ("dma-mapping: reject __GFP_COMP in dma_alloc_attrs")
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reported-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
2022-12-21 08:45:17 +01:00
Andrew Morton
1644d755d0 Merge branch 'linus' 2022-12-20 15:02:03 -08:00
Wei Fang
19e72b064f net: fec: check the return value of build_skb()
The build_skb might return a null pointer but there is no check on the
return value in the fec_enet_rx_queue(). So a null pointer dereference
might occur. To avoid this, we check the return value of build_skb. If
the return value is a null pointer, the driver will recycle the page and
update the statistic of ndev. Then jump to rx_processing_done to clear
the status flags of the BD so that the hardware can recycle the BD.

Fixes: 95698ff6177b ("net: fec: using page pool to manage RX buffers")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Shenwei Wang <Shenwei.wang@nxp.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20221219022755.1047573-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-20 11:33:24 -08:00
Tim Huang
8660495a9c drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0
MES is part of gfxoff and MES suspend and resume are skipped for S0i3.
But the mes_self_test call path is still in the amdgpu_device_ip_late_init.
it's should also be skipped for s0ix as no hardware re-initialization
happened.

Besides, mes_self_test will free the BO that triggers a lot of warning
messages while in the suspend state.

[   81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[   81.679435] Call Trace:
[   81.679726]  <TASK>
[   81.679981]  amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu]
[   81.680857]  amdgpu_mes_self_test+0x390/0x430 [amdgpu]
[   81.681665]  mes_v11_0_late_init+0x37/0x50 [amdgpu]
[   81.682423]  amdgpu_device_ip_late_init+0x53/0x280 [amdgpu]
[   81.683257]  amdgpu_device_resume+0xae/0x2a0 [amdgpu]
[   81.684043]  amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[   81.684818]  pci_pm_resume+0x5c/0xa0
[   81.685247]  ? pci_pm_thaw+0x90/0x90
[   81.685658]  dpm_run_callback+0x4e/0x160
[   81.686110]  device_resume+0xad/0x210
[   81.686529]  async_resume+0x1e/0x40
[   81.686931]  async_run_entry_fn+0x33/0x120
[   81.687405]  process_one_work+0x21d/0x3f0
[   81.687869]  worker_thread+0x4a/0x3c0
[   81.688293]  ? process_one_work+0x3f0/0x3f0
[   81.688777]  kthread+0xff/0x130
[   81.689157]  ? kthread_complete_and_exit+0x20/0x20
[   81.689707]  ret_from_fork+0x22/0x30
[   81.690118]  </TASK>
[   81.690380] ---[ end trace 0000000000000000 ]---

v2: make the comment clean and use adev->in_s0ix instead of
adev->suspend

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0, 6.1
2022-12-20 13:23:05 -05:00
Evan Quan
e73fc71e8f drm/amd/pm: correct the fan speed retrieving in PWM for some SMU13 asics
For SMU 13.0.0 and 13.0.7, the output from PMFW is in percent. Driver
need to convert that into correct PMW(255) based.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0, 6.1
2022-12-20 13:22:56 -05:00
Evan Quan
272b981416 drm/amd/pm: bump SMU13.0.0 driver_if header to version 0x34
To fit the latest PMFW and suppress the warning emerged on driver loading.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0, 6.1
2022-12-20 13:22:45 -05:00
Namhyung Kim
59119c09ae perf lock contention: Factor out lock_type_table
Move it out of get_type_str() so that we can reuse the table for others
later.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221219201732.460111-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-20 15:18:29 -03:00
Yang Jihong
8b269b7555 perf probe: Check -v and -q options in the right place
Check the -q and -v options first to return earlier on error.

Before:

  # perf probe -q -v test
  probe-definition(0): test
  symbol:test file:(null) line:0 offset:0 return:0 lazy:(null)
  0 arguments
    Error: -v and -q are exclusive.

After:

  # perf probe -q -v test
    Error: -v and -q are exclusive.

Fixes: 5e17b28f1e246b98 ("perf probe: Add --quiet option to suppress output result message")
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20221220035702.188413-4-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-20 15:16:33 -03:00
Yang Jihong
7c0a6144f9 perf tools: Fix usage of the verbose variable
The data type of the verbose variable is integer and can be negative,
replace improperly used cases in a unified manner:
 1. if (verbose)        => if (verbose > 0)
 2. if (!verbose)       => if (verbose <= 0)
 3. if (XX && verbose)  => if (XX && verbose > 0)
 4. if (XX && !verbose) => if (XX && verbose <= 0)

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20221220035702.188413-3-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-20 15:16:33 -03:00
Yang Jihong
188ac720d3 perf debug: Set debug_peo_args and redirect_to_stderr variable to correct values in perf_quiet_option()
When perf uses quiet mode, perf_quiet_option() sets the 'debug_peo_args'
variable to -1, and display_attr() incorrectly determines the value of
'debug_peo_args'.  As a result, unexpected information is displayed.

Before:

  # perf record --quiet -- ls > /dev/null
  ------------------------------------------------------------
  perf_event_attr:
    size                             128
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|PERIOD
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    mmap                             1
    comm                             1
    freq                             1
    enable_on_exec                   1
    task                             1
    precise_ip                       3
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
  ------------------------------------------------------------
  ...

After:
  # perf record --quiet -- ls > /dev/null
  #

redirect_to_stderr is a similar problem.

Fixes: f78eaef0e0493f60 ("perf tools: Allow to force redirect pr_debug to stderr.")
Fixes: ccd26741f5e6bdf2 ("perf tool: Provide an option to print perf_event_open args and return value")
Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: martin.lau@kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: https://lore.kernel.org/r/20221220035702.188413-2-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-20 15:16:33 -03:00
Arnaldo Carvalho de Melo
b235e5b51f tools headers UAPI: Sync linux/kvm.h with the kernel sources
To pick the changes in:

  86bdf3ebcfe1ded0 ("KVM: Support dirty ring in conjunction with bitmap")

That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the the 'perf trace' ioctl syscall argument
beautifiers.

This is also by now used by tools/testing/selftests/kvm/, a simple test
build didn't succeed, but for another reason:

  lib/kvm_util.c: In function ‘vm_enable_dirty_ring’:
  lib/kvm_util.c:125:30: error: ‘KVM_CAP_DIRTY_LOG_RING_ACQ_REL’ undeclared (first use in this function); did you mean ‘KVM_CAP_DIRTY_LOG_RING’?
    125 |         if (vm_check_cap(vm, KVM_CAP_DIRTY_LOG_RING_ACQ_REL))
        |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        |                              KVM_CAP_DIRTY_LOG_RING

I'll send a separate patch for that.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lore.kernel.org/lkml/Y6H3b1Q4Msjy5Yz3@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-20 15:15:57 -03:00
Alex Deucher
afa6646b1c drm/amdgpu: skip MES for S0ix as well since it's part of GFX
It's also part of gfxoff.

Cc: stable@vger.kernel.org # 6.0, 6.1
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20 13:08:12 -05:00
Arnd Bergmann
d118b18fb1 drm/amd/pm: avoid large variable on kernel stack
The activity_monitor_external[] array is too big to fit on the
kernel stack, resulting in this warning with clang:

drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0_7_ppt.c:1438:12: error: stack frame size (1040) exceeds limit (1024) in 'smu_v13_0_7_get_power_profile_mode' [-Werror,-Wframe-larger-than]

Use dynamic allocation instead. It should also be possible to
have single element here instead of the array, but this seems
easier.

v2: fix up argument to sizeof() (Alex)

Fixes: 334682ae8151 ("drm/amd/pm: enable workload type change on smu_v13_0_7")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20 12:59:34 -05:00
Philip Yang
1a799c4c19 drm/amdkfd: Fix double release compute pasid
If kfd_process_device_init_vm returns failure after vm is converted to
compute vm and vm->pasid set to compute pasid, KFD will not take
pdd->drm_file reference. As a result, drm close file handler maybe
called to release the compute pasid before KFD process destroy worker to
release the same pasid and set vm->pasid to zero, this generates below
WARNING backtrace and NULL pointer access.

Add helper amdgpu_amdkfd_gpuvm_set_vm_pasid and call it at the last step
of kfd_process_device_init_vm, to ensure vm pasid is the original pasid
if acquiring vm failed or is the compute pasid with pdd->drm_file
reference taken to avoid double release same pasid.

 amdgpu: Failed to create process VM object
 ida_free called for id=32770 which is not allocated.
 WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140
 RIP: 0010:ida_free+0x96/0x140
 Call Trace:
  amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
  amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
  drm_file_free.part.13+0x216/0x270 [drm]
  drm_close_helper.isra.14+0x60/0x70 [drm]
  drm_release+0x6e/0xf0 [drm]
  __fput+0xcc/0x280
  ____fput+0xe/0x20
  task_work_run+0x96/0xc0
  do_exit+0x3d0/0xc10

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 RIP: 0010:ida_free+0x76/0x140
 Call Trace:
  amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
  amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
  drm_file_free.part.13+0x216/0x270 [drm]
  drm_close_helper.isra.14+0x60/0x70 [drm]
  drm_release+0x6e/0xf0 [drm]
  __fput+0xcc/0x280
  ____fput+0xe/0x20
  task_work_run+0x96/0xc0
  do_exit+0x3d0/0xc10

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20 12:58:06 -05:00
Philip Yang
29d48b87db drm/amdkfd: Fix kfd_process_device_init_vm error handling
Should only destroy the ib_mem and let process cleanup worker to free
the outstanding BOs. Reset the pointer in pdd->qpd structure, to avoid
NULL pointer access in process destroy worker.

 BUG: kernel NULL pointer dereference, address: 0000000000000010
 Call Trace:
  amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel+0x46/0xb0 [amdgpu]
  kfd_process_device_destroy_cwsr_dgpu+0x40/0x70 [amdgpu]
  kfd_process_destroy_pdds+0x71/0x190 [amdgpu]
  kfd_process_wq_release+0x2a2/0x3b0 [amdgpu]
  process_one_work+0x2a1/0x600
  worker_thread+0x39/0x3d0

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20 12:57:58 -05:00
Arnaldo Carvalho de Melo
6d5edd15c9 tools headers UAPI: Sync powerpc syscall table with the kernel sources
To pick the changes in these csets:

  ce883a2ba310cd7c ("powerpc/32: fix syscall wrappers with 64-bit arguments")

That doesn't cause any changes in the perf tools.

This table is used in tools perf to allow features as described in the
last update to this file.

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Y6H0C5plZ4V4aiPm@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-20 14:42:49 -03:00
Arnaldo Carvalho de Melo
a66558dcb1 tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes in:

  97fa21f65c3eb5bb ("x86/resctrl: Move MSR defines into msr-index.h")
  7420ae3bb977b46e ("x86/intel_epb: Set Alder Lake N and Raptor Lake P normal EPB")

Addressing these tools/perf build warnings:

    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
    Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'

That makes the beautification scripts to pick some new entries:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  --- before	2022-12-20 14:28:40.893794072 -0300
  +++ after	2022-12-20 14:28:54.831993914 -0300
  @@ -266,6 +266,7 @@
   	[0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO",
   	[0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT",
   	[0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG",
  +	[0xc0000200 - x86_64_specific_MSRs_offset] = "IA32_MBA_BW_BASE",
   	[0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS",
   	[0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL",
   	[0xc0000302 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS_CLR",
  $

Now one can trace systemwide asking to see backtraces to where that MSR
is being read/written, see this example with a previous update:

  # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
  ^C#

If we use -v (verbose mode) we can see what it does behind the scenes:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
  Using CPUID AuthenticAMD-25-21-0
  0x6a0
  0x6a8
  New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
  0x6a0
  0x6a8
  New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
  mmap size 528384B
  ^C#

Example with a frequent msr:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
  Using CPUID AuthenticAMD-25-21-0
  0x48
  New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  0x48
  New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  mmap size 528384B
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: build id mismatch for vmlinux.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
     0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       __switch_to_xtra ([kernel.kallsyms])
                                       __switch_to ([kernel.kallsyms])
                                       __schedule ([kernel.kallsyms])
                                       schedule ([kernel.kallsyms])
                                       futex_wait_queue_me ([kernel.kallsyms])
                                       futex_wait ([kernel.kallsyms])
                                       do_futex ([kernel.kallsyms])
                                       __x64_sys_futex ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                       __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
     0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       __switch_to_xtra ([kernel.kallsyms])
                                       __switch_to ([kernel.kallsyms])
                                       __schedule ([kernel.kallsyms])
                                       schedule_idle ([kernel.kallsyms])
                                       do_idle ([kernel.kallsyms])
                                       cpu_startup_entry ([kernel.kallsyms])
                                       secondary_startup_64_no_verify ([kernel.kallsyms])
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/lkml/Y6HyTOGRNvKfCVe4@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-20 14:36:47 -03:00
Arnaldo Carvalho de Melo
eeac18e2bf tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
To pick up the changes in:

  bc7ed4d30815bc43 ("drm/i915/perf: Apply Wa_18013179988")
  81d5f7d91492aa3a ("drm/i915/perf: Add 32-bit OAG and OAR formats for DG2")
  8133a6daad4e7274 ("drm/i915: enable PS64 support for DG2")
  b76c14c8fb2af1e4 ("drm/i915/huc: better define HuC status getparam possible return values.")
  94dfc73e7cf4a31d ("treewide: uapi: Replace zero-length arrays with flexible-array members")

That doesn't add any ioctl, so no changes in tooling.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
  diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/lkml/Y6HukoRaZh2R4j5U@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-20 14:28:13 -03:00
Alessandro Carminati
bfa87ac86c rv/monitors: Move monitor structure in rodata
It makes sense to move the important monitor structure into rodata to
prevent accidental structure modification.

Link: https://lkml.kernel.org/r/20221122173648.4732-1-acarmina@redhat.com

Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-20 11:46:40 -05:00
Linus Torvalds
b6bb9676f2 m68knommu: updates and fixes for v6.2
Fixes include:
 . use strscpy() instead of strncpy() for cmdline setup
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEmsfM6tQwfNjBOxr3TiQVqaG9L4AFAmOg5EsACgkQTiQVqaG9
 L4BYbg//VCfF1+mGM0TJkTRGNKr4JuSN7QvD3iwrUCgdFdmPTT87l4mgzwTepQtS
 mHF8LdRI7re51grG5cNo7Z6RFZC1/SksjSm1no7o6S3c705N5aMYMTTz1rUW/9w3
 s9shSjcHA8cKmP0/W5jASSe0fKu3gY28txmTFobZg8PzT5mo6fDlcQTucgf3HOpK
 6+zqocNtOXv3iG7Ay2mcP40EmrZyB3EswB6S26BvR3Vzf2yfXCWLhSQAXt9OjfJv
 IG3Zz2ba94uGRcYd1PWzeYyHUCdYu/YWkvajSg36vUVL56y6HnyTxjAIKkkMb0GM
 SxRE+Qq13lSSQy2aNTcOvSyTXUX3zzRpkmCA5pKAEB/cYSxuQo9t5PNj4q7tK+cw
 YURD1ter5h6h60TrN2kFUknOm1XNHrrKHmPLxTX/PZWg/DxDDsbftZkernBnXuct
 u4mSOpfeG2EEsKQu2V7tVN/MZCIK1uF52v69Zzslf6Xw61jNG98cyHnOk/x+Ci6J
 v1+y9o7W4r1+3x0XZ7NP1WyNkAhcuTTleHbI995z3ZlvSt3mCOKeIBXA1a15dukq
 3atzdvcdEHm4LiCyDlztweKMT2l0YFFp7M8fJRWFanxv5Oyt5aLW0AjQ6eO6ko7z
 dbUA8uHWa8XrbyKWEMjosYw7RqQrZRdY9l0qv8vvg2SC3xh92nE=
 =7enN
 -----END PGP SIGNATURE-----

Merge tag 'm68knommu-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu

Pull m68knommu update from Greg Ungerer:
 "Only a single change to use the safer strscpy() instead of strncpy()
  when setting up the cmdline"

* tag 'm68knommu-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: use strscpy() to instead of strncpy()
2022-12-20 08:56:35 -06:00
Linus Torvalds
32d528c4b8 SPDX/License additions for 6.2-rc1
Here are 2 small updates for LICENSES and some kernel files that add the
 Copyleft-next license and use it in a SPDX tag as a dual-license for
 some kernel files.
 
 These have been discussed thoroughly in public on the linux-spdx mailing
 list, and have the needed acks on them, as well as having been in
 linux-next with no reported issues for quite some time.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY6F1Qg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynGWwCfVJ+Z1CVWSFC8KaaGNiFu/gXmgNUAoKy11gWJ
 8igpSNEkOiGiaGA+AvN+
 =j8iu
 -----END PGP SIGNATURE-----

Merge tag 'spdx-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx

Pull SPDX/License additions from Greg KH:
 "Here are two small updates for LICENSES and some kernel files that add
  the Copyleft-next license and use it in a SPDX tag as a dual-license
  for some kernel files.

  These have been discussed thoroughly in public on the linux-spdx
  mailing list, and have the needed acks on them, as well as having been
  in linux-next with no reported issues for quite some time"

* tag 'spdx-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
  testing: use the copyleft-next-0.3.1 SPDX tag
  LICENSES: Add the copyleft-next-0.3.1 license
2022-12-20 08:53:16 -06:00
Linus Torvalds
3e0caea754 Devicetree updates for v6.2, part 2:
- Treewide dropping of redundant 'binding' or 'schema' from schema
   titles. This will be followed up with a automated check to catch
   these.
 
 - Re-sort vendor-prefies
 
 - Convert GPIO based watchdog to schema
 
 - Handle all the variations for clocks, resets, power domains in i.MX
   PCIe binding
 
 - Document missing 'power-domains' property in mxsfb
 
 - Fix error with path references in Tegra XUSB example
 
 - Honor CONFIG_CMDLINE* even without /chosen node
 -----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmOg5DEACgkQ+vtdtY28
 YcMM6Q/3c8FpkvnSltcBT/a9nszD52aE1/STUDdb4t69PX4JVn0PO6oSMyMa7RPw
 wlPqGi7J2VRqarALiqokMtEHa0Thn84Rf6BQCO2ktHDBux1wG2xWPOD8G+GjDGbJ
 YwxBzPN7rbmgm2EqrxMI+nABX/3Wj78B3ocFFjulCEZz9ZY9jPhJF8FVfUNa0529
 kUhLPmOPPl4plg4LCOTmZesVXpSeU3FuSypCepEf906rJxLO3Cb2KP5AU5uCEcuT
 giTnsghL5t2iyCefNU0duR15J3XffrlcwKUMaoEsbS/u+autpZRx69KGpnZfp48F
 zZeij8cgcUJ14we+A8aRPN9H5NSQK+iOFBcBMPrKhboeOtFXN3Ftarum5Pq/J41a
 qmeCgREiMMzy8GOMsKJ+25uwoL61iGBQlxHHqylAQzJ3KfRRgSIAgWlS01btYXih
 jPp9JYvRubHsdjUQPNNBb9Us7VAO3KgJEGjBZV5DpXeVLg8g2w27gG4QgbqSf66a
 JeZz07yeiGgpGknW1NAp7EO1C030LaOnBVuRhN71QNjTTd4/+J46fdjXm0JdZj/A
 ZVQCbTM3LKCYGbt3Nio3QstzcM1bK19IH4J0zN8CJe/nxdAyopbe0aK5MgC7vxmO
 rB/g/e2MOf32aXLZSCzjKMKefEmA3g0/KmZdoopTT4uSz9TCjA==
 =ZGp/
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull more devicetree updates from Rob Herring:
 "This is mostly a treewide clean-up from Krzysztof. There's also a
  couple of fixes and things that fell thru the cracks.

  I must say this has been a nice merge window without bindings dumped
  in at the last minute introducing warnings.

  Summary:

   - Treewide dropping of redundant 'binding' or 'schema' from schema
     titles. This will be followed up with a automated check to catch
     these.

   - Re-sort vendor-prefies

   - Convert GPIO based watchdog to schema

   - Handle all the variations for clocks, resets, power domains in i.MX
     PCIe binding

   - Document missing 'power-domains' property in mxsfb

   - Fix error with path references in Tegra XUSB example

   - Honor CONFIG_CMDLINE* even without /chosen node"

* tag 'devicetree-for-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: drop redundant part of title (manual)
  dt-bindings: clock: drop redundant part of title
  dt-bindings: drop redundant part of title (beginning)
  dt-bindings: drop redundant part of title (end, part three)
  dt-bindings: drop redundant part of title (end, part two)
  dt-bindings: drop redundant part of title (end)
  dt-bindings: clock: st,stm32mp1-rcc: add proper title
  dt-bindings: memory-controllers: ti,gpmc-child: drop redundant part of title
  dt-bindings: drop redundant part of title of shared bindings
  dt-bindings: watchdog: gpio: Convert bindings to YAML
  dt-bindings: imx6q-pcie: Handle more resets on legacy platforms
  dt-bindings: imx6q-pcie: Handle various PD configurations
  dt-bindings: imx6q-pcie: Handle various clock configurations
  dt-bindings: hwmon: ntc-thermistor: drop Naveen Krishna Chatradhi from maintainers
  dt-bindings: mxsfb: Document i.MX8M/i.MX6SX/i.MX6SL power-domains property
  dt-bindings: vendor-prefixes: sort entries alphabetically
  dt-bindings: usb: tegra-xusb: Remove path references
  of: fdt: Honor CONFIG_CMDLINE* even without /chosen node
2022-12-20 08:48:24 -06:00
Linus Torvalds
35f79d0e2c parisc architecture fixes for kernel v6.2-rc1:
Fixes:
 - Fix potential null-ptr-deref in start_task()
 - Fix kgdb console on serial port
 - Add missing FORCE prerequisites in Makefile
 - Drop PMD_SHIFT from calculation in pgtable.h
 
 Enhancements:
 - Implement a wrapper to align madvise() MADV_* constants with other
   architectures
 - If machine supports running MPE/XL, show the MPE model string
 
 Cleanups:
 - Drop duplicate kgdb console code
 - Indenting fixes in setup_cmdline()
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCY6B/cgAKCRD3ErUQojoP
 X85pAQCC6YpSYON3KZRfABeiDTRCKcGm72p7JQRnyj88XCq6ZAEA40T2qpRpjoYi
 NaXr28mxHFYh4Z0c5Y7K5EuFTT7gAA4=
 =e2Jd
 -----END PGP SIGNATURE-----

Merge tag 'parisc-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc updates from Helge Deller:
 "There is one noteable patch, which allows the parisc kernel to use the
  same MADV_xxx constants as the other architectures going forward. With
  that change only alpha has one entry left (MADV_DONTNEED is 6 vs 4 on
  others) which is different. To prevent an ABI breakage, a wrapper is
  included which translates old MADV values to the new ones, so existing
  userspace isn't affected. Reason for that patch is, that some
  applications wrongly used the standard MADV_xxx values even on some
  non-x86 platforms and as such those programs failed to run correctly
  on parisc (examples are qemu-user, tor browser and boringssl).

  Then the kgdb console and the LED code received some fixes, and some
  0-day warnings are now gone. Finally, the very last compile warning
  which was visible during a kernel build is now fixed too (in the vDSO
  code).

  The majority of the patches are tagged for stable series and in
  summary this patchset is quite small and drops more code than it adds:

Fixes:
   - Fix potential null-ptr-deref in start_task()
   - Fix kgdb console on serial port
   - Add missing FORCE prerequisites in Makefile
   - Drop PMD_SHIFT from calculation in pgtable.h

  Enhancements:
   - Implement a wrapper to align madvise() MADV_* constants with other
     architectures
   - If machine supports running MPE/XL, show the MPE model string

  Cleanups:
   - Drop duplicate kgdb console code
   - Indenting fixes in setup_cmdline()"

* tag 'parisc-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Show MPE/iX model string at bootup
  parisc: Add missing FORCE prerequisites in Makefile
  parisc: Move pdc_result struct to firmware.c
  parisc: Drop locking in pdc console code
  parisc: Drop duplicate kgdb_pdc console
  parisc: Fix locking in pdc_iodc_print() firmware call
  parisc: Drop PMD_SHIFT from calculation in pgtable.h
  parisc: Align parisc MADV_XXX constants with all other architectures
  parisc: led: Fix potential null-ptr-deref in start_task()
  parisc: Fix inconsistent indenting in setup_cmdline()
2022-12-20 08:43:53 -06:00
José Expósito
54f27dc53f HID: sony: Fix unused function warning
Compiling this driver without setting "CONFIG_SONY_FF" generates the
following warning:

	drivers/hid/hid-sony.c:2358:20: warning: unused function
	'sony_send_output_report' [-Wunused-function]
	static inline void sony_send_output_report(struct sony_sc *sc)
	                   ^
	1 warning generated.

Add the missing preprocessor check to fix it.

Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-12-20 15:37:52 +01:00
Terry Junge
3d57f36c89 HID: plantronics: Additional PIDs for double volume key presses quirk
I no longer work for Plantronics (aka Poly, aka HP) and do not have
access to the headsets in order to test. However, as noted by Maxim,
the other 32xx models that share the same base code set as the 3220
would need the same quirk. This patch adds the PIDs for the rest of
the Blackwire 32XX product family that require the quirk.

Plantronics Blackwire 3210 Series (047f:c055)
Plantronics Blackwire 3215 Series (047f:c057)
Plantronics Blackwire 3225 Series (047f:c058)

Quote from previous patch by Maxim Mikityanskiy
Plantronics Blackwire 3220 Series (047f:c056) sends HID reports twice
for each volume key press. This patch adds a quirk to hid-plantronics
for this product ID, which will ignore the second volume key press if
it happens within 5 ms from the last one that was handled.

The patch was tested on the mentioned model only, it shouldn't affect
other models, however, this quirk might be needed for them too.
Auto-repeat (when a key is held pressed) is not affected, because the
rate is about 3 times per second, which is far less frequent than once
in 5 ms.
End quote

Signed-off-by: Terry Junge <linuxhid@cosmicgizmosystems.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-12-20 15:35:21 +01:00
José Expósito
4eab1c2fe0 HID: multitouch: fix Asus ExpertBook P2 P2451FA trackpoint
The HID descriptor of this device contains two mouse collections, one
for mouse emulation and the other for the trackpoint.

Both collections get merged and, because the first one defines X and Y,
the movemenent events reported by the trackpoint collection are
ignored.

Set the MT_CLS_WIN_8_FORCE_MULTI_INPUT class for this device to be able
to receive its reports.

This fix is similar to/based on commit 40d5bb87377a ("HID: multitouch:
enable multi-input as a quirk for some devices").

Link: https://gitlab.freedesktop.org/libinput/libinput/-/issues/825
Reported-by: Akito <the@akito.ooo>
Tested-by: Akito <the@akito.ooo>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-12-20 15:34:16 +01:00
José Expósito
cec827d658 HID: Ignore HP Envy x360 eu0009nv stylus battery
Battery status is reported for the HP Envy x360 eu0009nv stylus even
though it does not have battery.

Prevent it from always reporting the battery as low (1%).

Link: https://gitlab.freedesktop.org/libinput/libinput/-/issues/823
Reported-by: Ioannis Iliopoulos <jxftw2424@gmail.com>
Tested-by: Ioannis Iliopoulos <jxftw2424@gmail.com>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-12-20 15:33:37 +01:00
Linus Torvalds
70b07bec95 asm-generic bits for 6.2
There are only three fairly simple patches. The #include
 change to linux/swab.h addresses a userspace build issue,
 and the change to the mmio tracing logic helps provide
 more useful traces.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmOgtJUACgkQmmx57+YA
 GNln8Q//dvQ2FRIWBXKh4r6CxtiCx2aktGmnP1YAuaIVuzjGSn/8EQZAoTYN5jKY
 Io8rFt1/FfOMtu3E32JtGpgfDAP/8sfz3Lao9bzJR/Fjv059qL5QCoI3qbEFTNz9
 vzUqiddFZGppn76qsXSA6aItHVDS4Y97XiYRSwSMlpIz+9a84rYxCo04bNR4ut4t
 PR5+lvlTDfGfmj+SebrCt/IEi/FF9ckEYCLJHfaSPcQcujLDZDKPcT2RbubgwHgB
 OfE5Rx25xJxR4BU5MFe74sKn5Qi5HOfr1GrsjL3RbMNiYuHgbwLcZkMXvbZukdHz
 50Gt8UXMAxvZYKz92kyQLYuiKEtFSrQ8JccgqVUWL/lRLDoUkTg4hz4tmGUZE6KP
 ElxdgIBem9yrFX0oCaPNkY5d3MRU2i19FvBfKWKC54NbcmBjpHxxSg+WW/P7Jw+N
 uegj7qcEh7RcQU4w97OW4nS+eZmnXb4O4qXZeFwhXHS/snH7p3iBApyoPlyb+KOs
 np5MWRNaGFfi8BWWeVTX78U2VW8Ql8nnlRIlk/Wwm8AkVaNFQDnffKPi87paZd9o
 Kl+a9broMf4v0Oq5JTxqPMzmn9zUV0rHa1VanRBnNKqTOWalmNcsfsg1Ih9PhAAT
 p3u2CN0cBI7QmrcymJHrCuv0eNJRjsYa5FB4xmhJcJkD2qjsqXI=
 =05F5
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "There are only three fairly simple patches.

  The #include change to linux/swab.h addresses a userspace build issue,
  and the change to the mmio tracing logic helps provide more useful
  traces"

* tag 'asm-generic-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  uapi: Add missing _UAPI prefix to <asm-generic/types.h> include guard
  asm-generic/io: Add _RET_IP_ to MMIO trace for more accurate debug info
  include/uapi/linux/swab: Fix potentially missing __always_inline
2022-12-20 08:32:11 -06:00
Jason Gerecke
1db1f39259 HID: wacom: Ensure bootloader PID is usable in hidraw mode
Some Wacom devices have a special "bootloader" mode that is used for
firmware flashing. When operating in this mode, the device cannot be
used for input, and the HID descriptor is not able to be processed by
the driver. The driver generates an "Unknown device_type" warning and
then returns an error code from wacom_probe(). This is a problem because
userspace still needs to be able to interact with the device via hidraw
to perform the firmware flash.

This commit adds a non-generic device definition for 056a:0094 which
is used when devices are in "bootloader" mode. It marks the devices
with a special BOOTLOADER type that is recognized by wacom_probe() and
wacom_raw_event(). When we see this type we ensure a hidraw device is
created and otherwise keep our hands off so that userspace is in full
control.

Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Tested-by: Tatsunosuke Tobita <tatsunosuke.tobita@wacom.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-12-20 15:29:10 +01:00
Jiasheng Jiang
53ffa6a9f8 HID: amd_sfh: Add missing check for dma_alloc_coherent
Add check for the return value of the dma_alloc_coherent since
it may return NULL pointer if allocation fails.

Fixes: 4b2c53d93a4b ("SFH:Transport Driver to add support of AMD Sensor Fusion Hub (SFH)")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Acked-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20221220024921.21992-1-jiasheng@iscas.ac.cn
2022-12-20 09:45:53 +01:00
Dave Airlie
38624d2c97 - Documentation fixe (Matt, Miaoqian)
- OA-perf related fix (Umesh)
 - VLV/CHV HDMI/DP audio fix (Ville)
 - Display DDI/Transcoder fix (Khaled)
 - Migrate fixes (Chris, Matt)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmObhQoACgkQ+mJfZA7r
 E8r+OwgAoghECtkKqFfAK/laPSk1UTh1kebNR/B9HDdINQwmDCs3N9OcvgpqT68t
 Vve/boxBsN+9ybV7zMsW5HFdeFrUCCWvywhF3kEwuLlvKD95r4k6Q1pYCA9dS7+F
 plTK64GR1HkVozUp7J4qqhFCSP2gaT4z3+m9/iX9m6TKectebrtSEMl3uUcvB88Z
 u+YmauyrnCWNG8UnHZgRKFzsh81WURSmHdFZlGyLv/4ylnkF5fq59UIMcIL9HCIw
 NbeQ+4BwFhMhdTkf/dl8LblnjkNdW7mxOzDNl+gzngO5QgFZ/AtP+8L/1wN3Q8GO
 NfdAiM8HWIP4xawEcZnBl0meIY91kw==
 =eIt5
 -----END PGP SIGNATURE-----

Merge tag 'drm-intel-next-fixes-2022-12-15' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

- Documentation fixe (Matt, Miaoqian)
- OA-perf related fix (Umesh)
- VLV/CHV HDMI/DP audio fix (Ville)
- Display DDI/Transcoder fix (Khaled)
- Migrate fixes (Chris, Matt)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y5uFYOJ/1jgf2eSE@intel.com
2022-12-20 15:43:14 +10:00
Dave Airlie
5504eb164e Merge tag 'amd-drm-fixes-6.2-2022-12-15' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-fixes-6.2-2022-12-15:

amdgpu:
- Spelling fix
- BO pin fix
- Properly handle polaris 10/11 overlap asics
- GMC9 fix
- SR-IOV suspend fix
- DCN 3.1.4 fix
- KFD userptr locking fix
- SMU13.x fixes
- GDS/GWS/OA handling fix
- Reserved VMID handling fixes
- FRU EEPROM fix
- BO validation fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221215224936.6438-1-alexander.deucher@amd.com
2022-12-20 15:21:18 +10:00
Jason A. Donenfeld
3c202d14a9 prandom: remove prandom_u32_max()
Convert the final two users of prandom_u32_max() that slipped in during
6.2-rc1 to use get_random_u32_below().

Then, with no more users left, we can finally remove the deprecated
function.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-12-20 03:13:45 +01:00
Johan Hovold
41a15855c1 efi: random: fix NULL-deref when refreshing seed
Do not try to refresh the RNG seed in case the firmware does not support
setting variables.

This is specifically needed to prevent a NULL-pointer dereference on the
Lenovo X13s with some firmware revisions, or more generally, whenever
the runtime services have been disabled (e.g. efi=noruntime or with
PREEMPT_RT).

Fixes: e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
Reported-by: Steev Klimaszewski <steev@kali.org>
Reported-by: Bjorn Andersson <andersson@kernel.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sc8280xp-lenovo-thinkpad-x13s
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-12-20 03:13:45 +01:00
Jason A. Donenfeld
6bb20c152b random: do not include <asm/archrandom.h> from random.h
The <asm/archrandom.h> header is a random.c private detail, not
something to be called by other code. As such, don't make it
automatically available by way of random.h.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-12-20 03:13:45 +01:00
Jakub Kicinski
4be84df38a linux-can-fixes-for-6.2-20221219
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEBsvAIBsPu6mG7thcrX5LkNig010FAmOgfpUTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCtfkuQ2KDTXZWcB/96Hn9tedLIt0B04oxycKxXD3DeISHy
 HlekzWLi9p3/EzrYb3KE7+9mPC35GWtzEavCcxkqwLQAft8ZosHUBhdF5+84Tbr/
 Rk6kNuP4QKxCq4fkm1xIShT0jo0978XxIzr2bFggsz2UZOTa+DwnAQu7WfgkpI30
 uBzWmlFYmQ7NswooXDdJ0bXlPr+RejdeezQsLgbq0JH2cw0DUJjEXBAsnvqhsviG
 mWLT4KE57hXseEIw3CS44ARgFLEVcIpFUuzHnHkIYI/4e5KY3F04KeCMSh5LgA45
 1VRa4X60ONDtShCCuqA+/+xK1A/cqHToL8wAraVV9htO0moen3WYzuov
 =ae/5
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-6.2-20221219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2022-12-19

The first patch is by Vincent Mailhol and adds the etas_es58x
devlink documentation to the index.

Haibo Chen's patch for the flexcan driver fixes a unbalanced
pm_runtime_enable warning.

The last patch is by me, targets the kvaser_usb driver and fixes
an error occurring with gcc-13.

* tag 'linux-can-fixes-for-6.2-20221219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len
  can: flexcan: avoid unbalanced pm_runtime_enable warning
  Documentation: devlink: add missing toc entry for etas_es58x devlink doc
====================

Link: https://lore.kernel.org/r/20221219155210.1143439-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-19 17:47:59 -08:00
Jakub Kicinski
918fb1aaa2 Merge branch 'stop-corrupting-socket-s-task_frag'
Benjamin Coddington says:

====================
Stop corrupting socket's task_frag

The networking code uses flags in sk_allocation to determine if it can use
current->task_frag, however in-kernel users of sockets may stop setting
sk_allocation when they convert to the preferred memalloc_nofs_save/restore,
as SUNRPC has done in commit a1231fda7e94 ("SUNRPC: Set memalloc_nofs_save()
on all rpciod/xprtiod jobs").

This will cause corruption in current->task_frag when recursing into the
network layer for those subsystems during page fault or reclaim.  The
corruption is difficult to diagnose because stack traces may not contain the
offending subsystem at all.  The corruption is unlikely to show up in
testing because it requires memory pressure, and so subsystems that
convert to memalloc_nofs_save/restore are likely to continue to run into
this issue.

Previous reports and proposed fixes:
https://lore.kernel.org/netdev/96a18bd00cbc6cb554603cc0d6ef1c551965b078.1663762494.git.gnault@redhat.com/
https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
https://lore.kernel.org/linux-nfs/de6d99321d1dcaa2ad456b92b3680aa77c07a747.1665401788.git.gnault@redhat.com/

Guilluame Nault has done all of the hard work tracking this problem down and
finding the best fix for this issue.  I'm just taking a turn posting another
fix.
====================

Link: https://lore.kernel.org/r/cover.1671194454.git.bcodding@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-19 17:28:52 -08:00
Benjamin Coddington
08f65892c5 net: simplify sk_page_frag
Now that in-kernel socket users that may recurse during reclaim have benn
converted to sk_use_task_frag = false, we can have sk_page_frag() simply
check that value.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-19 17:28:50 -08:00
Benjamin Coddington
98123866fc Treewide: Stop corrupting socket's task_frag
Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation which the networking system uses to decide
when it is safe to use current->task_frag.  The results of this are
unexpected corruption in task_frag when SUNRPC is involved in memory
reclaim.

The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code.  I believe this problem to
be much more pervasive than reports to the community may indicate.

Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false.  Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.

CC: Philipp Reisner <philipp.reisner@linbit.com>
CC: Lars Ellenberg <lars.ellenberg@linbit.com>
CC: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Josef Bacik <josef@toxicpanda.com>
CC: Keith Busch <kbusch@kernel.org>
CC: Christoph Hellwig <hch@lst.de>
CC: Sagi Grimberg <sagi@grimberg.me>
CC: Lee Duncan <lduncan@suse.com>
CC: Chris Leech <cleech@redhat.com>
CC: Mike Christie <michael.christie@oracle.com>
CC: "James E.J. Bottomley" <jejb@linux.ibm.com>
CC: "Martin K. Petersen" <martin.petersen@oracle.com>
CC: Valentina Manea <valentina.manea.m@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: David Howells <dhowells@redhat.com>
CC: Marc Dionne <marc.dionne@auristor.com>
CC: Steve French <sfrench@samba.org>
CC: Christine Caulfield <ccaulfie@redhat.com>
CC: David Teigland <teigland@redhat.com>
CC: Mark Fasheh <mark@fasheh.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: Joseph Qi <joseph.qi@linux.alibaba.com>
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: Dominique Martinet <asmadeus@codewreck.org>
CC: Ilya Dryomov <idryomov@gmail.com>
CC: Xiubo Li <xiubli@redhat.com>
CC: Chuck Lever <chuck.lever@oracle.com>
CC: Jeff Layton <jlayton@kernel.org>
CC: Trond Myklebust <trond.myklebust@hammerspace.com>
CC: Anna Schumaker <anna@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-19 17:28:49 -08:00
Guillaume Nault
fb87bd4751 net: Introduce sk_use_task_frag in struct sock.
Sockets that can be used while recursing into memory reclaim, like
those used by network block devices and file systems, mustn't use
current->task_frag: if the current process is already using it, then
the inner memory reclaim call would corrupt the task_frag structure.

To avoid this, sk_page_frag() uses ->sk_allocation to detect sockets
that mustn't use current->task_frag, assuming that those used during
memory reclaim had their allocation constraints reflected in
->sk_allocation.

This unfortunately doesn't cover all cases: in an attempt to remove all
usage of GFP_NOFS and GFP_NOIO, sunrpc stopped setting these flags in
->sk_allocation, and used memalloc_nofs critical sections instead.
This breaks the sk_page_frag() heuristic since the allocation
constraints are now stored in current->flags, which sk_page_frag()
can't read without risking triggering a cache miss and slowing down
TCP's fast path.

This patch creates a new field in struct sock, named sk_use_task_frag,
which sockets with memory reclaim constraints can set to false if they
can't safely use current->task_frag. In such cases, sk_page_frag() now
always returns the socket's page_frag (->sk_frag). The first user is
sunrpc, which needs to avoid using current->task_frag but can keep
->sk_allocation set to GFP_KERNEL otherwise.

Eventually, it might be possible to simplify sk_page_frag() by only
testing ->sk_use_task_frag and avoid relying on the ->sk_allocation
heuristic entirely (assuming other sockets will set ->sk_use_task_frag
according to their constraints in the future).

The new ->sk_use_task_frag field is placed in a hole in struct sock and
belongs to a cache line shared with ->sk_shutdown. Therefore it should
be hot and shouldn't have negative performance impacts on TCP's fast
path (sk_shutdown is tested just before the while() loop in
tcp_sendmsg_locked()).

Link: https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-19 17:28:49 -08:00
Matt Johnston
b389a902dd mctp: Remove device type check at unregister
The unregister check could be incorrectly triggered if a netdev
changes its type after register. That is possible for a tun device
using TUNSETLINK ioctl, resulting in mctp unregister failing
and the netdev unregister waiting forever.

This was encountered by https://github.com/openthread/openthread/issues/8523

Neither check at register or unregister is required. They were added in
an attempt to track down mctp_ptr being set unexpectedly, which should
not happen in normal operation.

Fixes: 7b1871af75f3 ("mctp: Warn if pointer is set for a wrong dev type")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20221215054933.2403401-1-matt@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-19 17:20:22 -08:00
Arun Ramadoss
62e027fb0e net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq
KSZ swithes used interrupts for detecting the phy link up and down.
During registering the interrupt handler, it used IRQF_TRIGGER_FALLING
flag. But this flag has to be retrieved from device tree instead of hard
coding in the driver, so removing the flag.

Fixes: ff319a644829 ("net: dsa: microchip: move interrupt handling logic from lan937x to ksz_common")
Reported-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://lore.kernel.org/r/20221213101440.24667-1-arun.ramadoss@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-19 17:18:48 -08:00
Linus Torvalds
850f7a5cab ARM: SoC fixes for 6.2
These are a couple of build fixes from randconfig testing,
 plus a set of Mediatek SoC specific fixes, all trivial.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmOgvC4ACgkQmmx57+YA
 GNljSRAArj/5Kdl0oISLPRr24zFMzpjN3gAdr0ZmAWw0ZUH5aLMp6aiXEtd2+NU1
 ZY33Gsj1Dxz05FYsoMIVNnIpr/6UzrCooSErJfEHaF+rojKvCguJD7tF18VmRRkn
 4m7+U9QoOhn7ho0P83bjZYqsgyfwOEZyKVVy2Hk29JQpiZzN6QQLCR7ecXSAmVhb
 JiQIt3Rcq+AriLHp1dx49dYI6b35zhdygCGIo5I7+V+vGDfzaSPCsTcTvv9NK1hr
 t6dztG5l9nENybIspLjfC9XlaRtoyRFyTGKTcLe2K0dnLlTs8J/kW8/WGPvYAtNJ
 BXc0Qw1117/mKkP24Y3i1+GGvMgp2qarW8Pcl6OBTPcg7h0Ac1ukg/mK0mF1eIDf
 4GKjPFyNctNb1vJXdcBI2x3On97vosxokSzrzs53axidRmEdj7JOSaJOx3dj4ExX
 Ue51+wOqKSAmzWfJmRWUGy7ifKtd1sCsC5z2w/9OAr5K9LdWbcfKXMhHjOsduiLL
 EUL7Z37FNGYPKIr2ZM3wjhmnl3IwzPzirmhWRq+ekzaSvmZCeWimXr5r/U8bXE3P
 vXPoiTF2sUfwh66WvEGXgxSCxRNFfsEI1mH9S8X0PFNV+AfN+eNFY/Mr0kNMBv2W
 gg12BolLjvXtf8yPVRG9TndJXOUpqmZsaUuQt5c6QKsU24NcpCw=
 =qUCm
 -----END PGP SIGNATURE-----

Merge tag 'soc-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "These are a couple of build fixes from randconfig testing, plus a set
  of Mediatek SoC specific fixes, all trivial"

* tag 'soc-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  soc: tegra: fix CPU_BIG_ENDIAN dependencies
  ARM: disallow pre-ARMv5 builds with ld.lld
  ARM: pxa: fix building with clang
  MAINTAINERS: add related dts to IXP4xx
  ARM: dts: spear: drop 0x from unit address
  arm64: dts: mt8183: Fix Mali GPU clock
  arm64: dts: mediatek: mt8195-demo: fix the memory size of node secmon
  soc: mediatek: pm-domains: Fix the power glitch issue
2022-12-19 16:07:59 -06:00
Linus Torvalds
6feb57c2fd Kbuild updates for v6.2
- Support zstd-compressed debug info
 
  - Allow W=1 builds to detect objects shared among multiple modules
 
  - Add srcrpm-pkg target to generate a source RPM package
 
  - Make the -s option detection work for future GNU Make versions
 
  - Add -Werror to KBUILD_CPPFLAGS when CONFIG_WERROR=y
 
  - Allow W=1 builds to detect -Wundef warnings in any preprocessed files
 
  - Raise the minimum supported version of binutils to 2.25
 
  - Use $(intcmp ...) to compare integers if GNU Make >= 4.4 is used
 
  - Use $(file ...) to read a file if GNU Make >= 4.2 is used
 
  - Print error if GNU Make older than 3.82 is used
 
  - Allow modpost to detect section mismatches with Clang LTO
 
  - Include vmlinuz.efi into kernel tarballs for arm64 CONFIG_EFI_ZBOOT=y
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmOeImsVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsG06IP/iVjuWFvnjDZT4X8X6zN8aKp1vtR
 EMkmoRtt5cD4CLb1MG4N7irYHgedQSx4rYceP45MyW1I3egl6Ct14RDyeQ1xSIZb
 XFTLDCZvfl/up3MdiqNAqKRS7x5lk9++7F0t+2SoQxKQyJvm735XreX+VhZ1FeLB
 qcHrmzJ5veky5Ry/3OkNUgKFBjKEAL+qKMc55uvkXqfTb3KoBa2r4VC1OaoYGRru
 R8oF9qQRnGVQAl/LbBVchmgSjxryxPrCvBGiKlK03VkXdzEMHMimEJh3BQ6e0PGo
 gajdk+4liy7z+jQnI7jFhvJjGKzkEP/Bc99M/uS92QX5MgpH6mqpHMoqqPiqW87K
 RmZH37FqRu1Vo8dpibmH6r2K6YD/HHRjaDHk1VuuCQYEn0dsNmokPXOqd/1v0I1i
 TXPjWOw1AID5vMJWllqxFhpeVvf0vx5BT/UNrh68MLqlJZzv2eMVJb4fNy6640ml
 U0NclMnOa3eOmf5z1T7/LqDRTa63Q0kpanRrBpcmVOaqW+ZpQ3SQjh4uBN1PyJHL
 cX3Skc341DyRlFiT54QhGKlm57MEb2gjhBZ3Z4J+b7sEFgvjXH/W8vcOGIKlppmA
 CfYMyres4OV+fJc89ONkWsvLiOP1OeUGPvytm33J5QMKXc8SzOLP0D/F8kjrDflm
 EROKuZ4EA5ej/rOy
 =Ig/Y
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Support zstd-compressed debug info

 - Allow W=1 builds to detect objects shared among multiple modules

 - Add srcrpm-pkg target to generate a source RPM package

 - Make the -s option detection work for future GNU Make versions

 - Add -Werror to KBUILD_CPPFLAGS when CONFIG_WERROR=y

 - Allow W=1 builds to detect -Wundef warnings in any preprocessed files

 - Raise the minimum supported version of binutils to 2.25

 - Use $(intcmp ...) to compare integers if GNU Make >= 4.4 is used

 - Use $(file ...) to read a file if GNU Make >= 4.2 is used

 - Print error if GNU Make older than 3.82 is used

 - Allow modpost to detect section mismatches with Clang LTO

 - Include vmlinuz.efi into kernel tarballs for arm64 CONFIG_EFI_ZBOOT=y

* tag 'kbuild-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits)
  buildtar: fix tarballs with EFI_ZBOOT enabled
  modpost: Include '.text.*' in TEXT_SECTIONS
  padata: Mark padata_work_init() as __ref
  kbuild: ensure Make >= 3.82 is used
  kbuild: refactor the prerequisites of the modpost rule
  kbuild: change module.order to list *.o instead of *.ko
  kbuild: use .NOTINTERMEDIATE for future GNU Make versions
  kconfig: refactor Makefile to reduce process forks
  kbuild: add read-file macro
  kbuild: do not sort after reading modules.order
  kbuild: add test-{ge,gt,le,lt} macros
  Documentation: raise minimum supported version of binutils to 2.25
  kbuild: add -Wundef to KBUILD_CPPFLAGS for W=1 builds
  kbuild: move -Werror from KBUILD_CFLAGS to KBUILD_CPPFLAGS
  kbuild: Port silent mode detection to future gnu make.
  init/version.c: remove #include <generated/utsrelease.h>
  firmware_loader: remove #include <generated/utsrelease.h>
  modpost: Mark uuid_le type to be suitable only for MEI
  kbuild: add ability to make source rpm buildable using koji
  kbuild: warn objects shared among multiple modules
  ...
2022-12-19 12:33:32 -06:00