919265 Commits

Author SHA1 Message Date
Roopa Prabhu
8590ceedb7 nexthop: add support for notifiers
This patch adds nexthop add/del notifiers. To be used by
vxlan driver in a later patch. Could possibly be used by
switchdev drivers in the future.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:00:38 -07:00
Roopa Prabhu
1274e1cc42 vxlan: ecmp support for mac fdb entries
Todays vxlan mac fdb entries can point to multiple remote
ips (rdsts) with the sole purpose of replicating
broadcast-multicast and unknown unicast packets to those remote ips.

E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
load balanced to remote switches (vteps) belonging to the
same multi-homed ethernet segment (E-VPN multihoming is analogous
to multi-homed LAG implementations, but with the inter-switch
peerlink replaced with a vxlan tunnel). In other words it needs
support for mac ecmp. Furthermore, for faster convergence, E-VPN
multihoming needs the ability to update fdb ecmp nexthops independent
of the fdb entries.

New route nexthop API is perfect for this usecase.
This patch extends the vxlan fdb code to take a nexthop id
pointing to an ecmp nexthop group.

Changes include:
- New NDA_NH_ID attribute for fdbs
- Use the newly added fdb nexthop groups
- makes vxlan rdsts and nexthop handling code mutually
  exclusive
- since this is a new use-case and the requirement is for ecmp
nexthop groups, the fdb add and update path checks that the
nexthop is really an ecmp nexthop group. This check can be relaxed
in the future, if we want to introduce replication fdb nexthop groups
and allow its use in lieu of current rdst lists.
- fdb update requests with nexthop id's only allowed for existing
fdb's that have nexthop id's
- learning will not override an existing fdb entry with nexthop
group
- I have wrapped the switchdev offload code around the presence of
rdst

[1] E-VPN RFC https://tools.ietf.org/html/rfc7432
[2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
[3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

Includes a null check fix in vxlan_xmit from Nikolay

v2 - Fixed build issue:
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:00:38 -07:00
Roopa Prabhu
38428d6871 nexthop: support for fdb ecmp nexthops
This patch introduces ecmp nexthops and nexthop groups
for mac fdb entries. In subsequent patches this is used
by the vxlan driver fdb entries. The use case is
E-VPN multihoming [1,2,3] which requires bridged vxlan traffic
to be load balanced to remote switches (vteps) belonging to
the same multi-homed ethernet segment (This is analogous to
a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops
referenced by fdb entries. These nexthops only have ip.
This patch includes appropriate checks to avoid routes
referencing such nexthops.

example:
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp
http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

v4 - fixed uninitialized variable reported by kernel test robot
Reported-by: kernel test robot <rong.a.chen@intel.com>

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:00:38 -07:00
David S. Miller
7b1b843a1e Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2020-05-21

This series contains updates to igc and e1000.

Andre cleans up code that was left over from the igb driver that handled
MAC address filters based on the source address, which is not currently
supported.  Simplifies the MAC address filtering code and prepare the
igc driver for future source address support.  Updated the MAC address
filter internal APIs to support filters based on source address.  Added
support for Network Flow Classification (NFC) rules based on source MAC
address.  Cleaned up the 'cookie' field which is not used anywhere in
the code and cleaned up a wrapper function that was not needed.
Simplified the filtering code for readability and aligned the ethtool
functions, so that function names were consistent.

Alex provides a fix for e1000 to resolve a deadlock issue when NAPI is
being disabled.

Sasha does additional cleanup of the igc driver of dead code that is not
used or needed.

v2: Fix the function header comment in patch 3 of the series, based on
    the feedback from Jakub Kicinski.
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 13:48:50 -07:00
Linus Torvalds
444565650a io_uring-5.7-2020-05-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl7IByoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpigmEACWJoK7zk5OK2RhavpzOsb2SDu8nz/YAbUe
 +R6tXAjwe4Z7lVVa+FW/fmN9/mQcjyYRIbbG564IFs5fe6hPUoOjUzHqvGTOFLHd
 Fw8mjVKgWjWAE5GdoX6ATauLhVwwjnImej1PNfO/J5y29o0SQksP8MbM0eGuuNx1
 piqxBj0/3h3YyPn1GeJmqxwwcsFhzHDqk7fbkfbQokZk+7SPiKpqWgJBa7AKSlNC
 N0WTluT4UOummQZw1RFynPfA4cCuX6XHVgWAa9h7vrJHXigvuMWqLaHG+MBFqeKu
 xD6PPnaCnMwcLRe4T2sJvtjxmNSdyr15Q2kGkIi/RhohSIn4u/y8jEA6wTprCP48
 rDi30dn1o2LwUj2S1NO3YCOV8jIKWUguztEvKiAXmjf4KDZIDd4/OwrFsJdb4vg9
 EuK86SEwXbvFHf9nu1M7pHlGThKfQi0CiK6C6M7Qb/kOthio72wwZ46gGkwLDk5z
 DZWHymHBhQw/z1c20loX7pBvFIzLzbuYUThf23UegPzXVqqQfBkqs4BGFcOGuqy6
 yfEYF/MAX/O/TQgm2dDQHrhl05AevLu/UQXMXZ8Ha6OrmlC4C2qu3Te/iZO8FUew
 YIx5H5XmBh93McjpmJ8VCn7CjE+y/ufNTMdvm8WzCyAIfH40gfcyLangpre26QoJ
 CCAARffXrQ==
 =ZYUy
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.7-2020-05-22' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A small collection of small fixes that should go into this release:

   - Two fixes for async request preparation (Pavel)

   - Busy clear fix for SQPOLL (Xiaoguang)

   - Don't use kiocb->private for O_DIRECT buf index, some file systems
     use it (Bijan)

   - Kill dead check in io_splice()

   - Ensure sqo_wait is initialized early

   - Cancel task_work if we fail adding to original process

   - Only add (IO)pollable requests to iopoll list, fixing a regression
     in this merge window"

* tag 'io_uring-5.7-2020-05-22' of git://git.kernel.dk/linux-block:
  io_uring: reset -EBUSY error when io sq thread is waken up
  io_uring: don't add non-IO requests to iopoll pending list
  io_uring: don't use kiocb.private to store buf_index
  io_uring: cancel work if task_work_add() fails
  io_uring: remove dead check in io_splice()
  io_uring: fix FORCE_ASYNC req preparation
  io_uring: don't prepare DRAIN reqs twice
  io_uring: initialize ctx->sqo_wait earlier
2020-05-22 11:12:30 -07:00
Linus Torvalds
db9f384785 block-5.7-2020-05-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl7IBtAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpl6yEAC2PiQwyQS5ZGw/gSEuHXYmoRTdqPz1Q3MP
 U3fzwqrat+WWtdeRsyReDGKMVkgJYtPA2aVKtv4BrMVpW+5ShK0az3IqzHc7kUjq
 ktEqWcmtXft/CFcYpLz8hyBNuflkU/qIQ6pJMPzzCOjoX5pADiigMGCe77FLCDK2
 LnaJyryNKGSQyYO1YD8EOdV/yAUw8FiUcXXBb4v3EbYpaoFqK7vmK9wJfBfHDpcI
 15P0clirNDbdZkjbrcaF5GExwh86RJswSUn+fzXF2nuIJDV0Iqecp9mFNeIRvbHd
 bHpCrSsTKZarq/D0ObCrM633F7AKz5R2OO8RZe8C5kRpmiTiqSFixKcG77wI9+oO
 ovZmX+AZWXsmrlYAM1HJqHMn7NHGBsUXFvPmxz6mE7NYrw4SVUkbzBA3egQN20iE
 NVEkNSjhRJhmogMeIs26G5qtr4c/g/dmSROxRYHjQNIfbFJBW/AsyvOBOVi37Yxv
 RUsek8J71IdhzYJr+b4aNkqZvgxUlSLGUDOMFMkk0hlNhdQ/K99y5rJxE06+gmdl
 6MaOdmd7U/hOpybsxobkWCJ718KohVId1gEnQFap2tvAe2WPAgXPlrCnYd1vQpFm
 /c5qdiMNOf7A8QQ8mq/cYnQfw1ncY1dUThrOsoJ96TBBx7i/M5oM/ZjBqVaWDU4r
 TkuEEUjaNw==
 =yyR2
 -----END PGP SIGNATURE-----

Merge tag 'block-5.7-2020-05-22' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Two fixes for null_blk zone mode"

* tag 'block-5.7-2020-05-22' of git://git.kernel.dk/linux-block:
  null_blk: don't allow discard for zoned mode
  null_blk: return error for invalid zone size
2020-05-22 11:10:42 -07:00
Linus Torvalds
b09ca17a2a RISC-V Fixes for 5.7-rc7
This tag contains two fixes:
 
 * Another !MMU build fix that was a straggler from last week.
 * A fix to use the "register" keyword for the GP global register variable.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl7H9tYTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiewuEACejtJD1KbV4MoHSKw98WwtCPp7xceF
 pm73DR5drm03T+dBIt0IQe9MnlAN6oMuPw1NvwHhJWX1ezvs1o2ZegZw7ZA5uczl
 QFqHFCajzzRyF7C0KCukONYhGV67xrrapVfFMfytvdCTuhTjYUHQAjXHuciI2BiP
 1VeGZzECj4nRAYFis9GVO652D3LRA+LZ0Op10IqsBHcV3Iia75GREj9IK34+Uwzq
 Xe+pVh9GVBINz6WVX8rBcK67z76pMNnTX2hMoHWbunHobXeYnh/uYZ79bcOrGFGc
 g2KTTDpYqU7FWRBqmj/P6rqNrUFFStMsH79YGabr3kOT/yyYLrGf17oZtrp5oy92
 68EMuM5DSO/2jRD8sHtW6gg1k7Kv1oo49TQlBb26ifTiOEGhCE8bD8rI4ADsx4N+
 6Oj5P7NTfqgc++e8NMRES59MiQNUg40nsA0AP1omwD163LN8bZNmI5kZZnvazH4k
 ViIGgwrPUKL8S1MxQ2S4maJhGdQ2UOsbPi99zQ+o6NxR2Dxf7OoqsrJYzDnVS3f6
 vmczgYshlXti6qhNbLmK8HIC4g0FzDg3CUpFvrezyWJJhmebyKRWGhUsaMLm9YDO
 vwX7UnOv9VMKzvOVoS9iDoIXSFtt1dt4SxZ1rQCPJwCb0fqBc/QqVL1pBtSh4iM8
 pL3DDCnOAgvr0Q==
 =L8xn
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "Two fixes:

   - Another !MMU build fix that was a straggler from last week

   - A fix to use the "register" keyword for the GP global register
     variable"

* tag 'riscv-for-linus-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: gp_in_global needs register keyword
  riscv: Fix print_vm_layout build error if NOMMU
2020-05-22 11:08:26 -07:00
Borislav Petkov
9bb4cbf486 EFI fixes for v5.7-rc6:
- fix EFI framebuffer earlycon for wide fonts
 - avoid filling screen_info with garbage if the EFI framebuffer is not
   available
 - fix a potential host tool build error due to a symbol clash on x86
 - work around a EFI firmware bug regarding the binary format of the TPM
   final events table
 - fix a missing memory free by reworking the E820 table sizing routine to
   not do the allocation in the first place
 - add CPER parsing for firmware errors
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAl7H3HIACgkQwjcgfpV0
 +n1pEAgAjJfwDJmBcYhJzjX8WLnXPJiUmUH9d9tF1t3TlhF6c1G8auXU+Fyia4uI
 ejRNw/N4+SXzM9yL+Z19PKBpQsPzQXgm2r9WTPVN5jTelUUI+jFZCH+pKC+TKRp1
 /Tx/XIMifCw18gNXsjj6WJEeAyLoh4tb+6bwn7DlPO5cPrxX49LvPuQNMXybk2yi
 KimdNKUry1wYpo/WpHqEdFq5//CLAWNkrL9UXlkANvQ6BJNIMI0kRIUC0MVsTMnE
 BoCkBO93PdvqxOcnV3WTRvSFetb7qA59Jay62jLc26Myqc4t4pgVWojVm6RHLfZg
 17btYACxICgF2mNTZYlKemEEqKPpzQ==
 =mY5f
 -----END PGP SIGNATURE-----

Merge tag 'efi-fixes-for-v5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/urgent

Pull EFI fixes from Ard Biesheuvel:

"- fix EFI framebuffer earlycon for wide fonts
 - avoid filling screen_info with garbage if the EFI framebuffer is not
   available
 - fix a potential host tool build error due to a symbol clash on x86
 - work around a EFI firmware bug regarding the binary format of the TPM
   final events table
 - fix a missing memory free by reworking the E820 table sizing routine to
   not do the allocation in the first place
 - add CPER parsing for firmware errors"
2020-05-22 20:06:25 +02:00
Josh Poimboeuf
187b96db5c x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks
Normally, show_trace_log_lvl() scans the stack, looking for text
addresses to print.  In parallel, it unwinds the stack with
unwind_next_frame().  If the stack address matches the pointer returned
by unwind_get_return_address_ptr() for the current frame, the text
address is printed normally without a question mark.  Otherwise it's
considered a breadcrumb (potentially from a previous call path) and it's
printed with a question mark to indicate that the address is unreliable
and typically can be ignored.

Since the following commit:

  f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks")

... for inactive tasks, show_trace_log_lvl() prints *only* unreliable
addresses (prepended with '?').

That happens because, for the first frame of an inactive task,
unwind_get_return_address_ptr() returns the wrong return address
pointer: one word *below* the task stack pointer.  show_trace_log_lvl()
starts scanning at the stack pointer itself, so it never finds the first
'reliable' address, causing only guesses to being printed.

The first frame of an inactive task isn't a normal stack frame.  It's
actually just an instance of 'struct inactive_task_frame' which is left
behind by __switch_to_asm().  Now that this inactive frame is actually
exposed to callers, fix unwind_get_return_address_ptr() to interpret it
properly.

Fixes: f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200522135435.vbxs7umku5pyrdbk@treble
2020-05-22 19:55:17 +02:00
Linus Torvalds
4286d192c8 - Bring the PTRACE_SYSEMU semantics in line with the man page.
- Annotate variable assignment in get_user() with the type to avoid
   sparse warnings.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl7H+w0ACgkQa9axLQDI
 XvEiVQ//azhBez3lTHeEgh2p45d75Na8HrrHjGGvQ9lhvtMBisrh/N7fp41Fl1vY
 2nG3YJqUY7SZJG2EZkF7OQOMHUjpGE3kHJX3YAn6dCk2RzNDe1TCXKrP0r7xkbLh
 OW3MpBsUbGksR8THYJsZ+GiTLZ/BrnZJjpJxcMHNwxnNZeVKT1OY4nJSebUo5Rdf
 4lm1jMuSiKuHHfkucp1UsGe3QQxsWYpFahjMgU/iCm3BhlzWvocV8NNG6na10SIb
 unT4+sXmFM/0pzVFBSdObmeEik4wSibkKh76A8z0SROMkPCtigu2n9RdDtii22ia
 uiixQ6CkEYaXaGBXFqjE50dKkMCH0EYpZiEPp22QGHkLRSKI6ZZi0wvQmTvYNLMI
 3a6Ptw0u9Ym/MH17HmjCiBuEfWOQeNhOM8YRcMqhHoguTz6Xj/mglbX8DDC37YAw
 MnGNH0a/uhBy35eujgW92b2wvNMWcYnuRSiSX8/PSsG3Hxe36nWTI8y20dQ88wAR
 wTqzNExmWDQqbTmaSAM0gGfDPkaSiiY+WDAO2FdB/jtovmefB/RBXVmwqbjBsigX
 OtXuyF/rVitZ6S3SXal5YBT4xey/MI4NRbnwCm4edXyz+Hz7cDM2iQwDBaiaVZj8
 4QHn+TqJTUKWeHRfLuMzg2a5UbCnEZGVBNEPtVxx9bSGqjQ2o+Q=
 =HdPI
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Bring the PTRACE_SYSEMU semantics in line with the man page.

 - Annotate variable assignment in get_user() with the type to avoid
   sparse warnings.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Add get_user() type annotation on the !access_ok() path
  arm64: Fix PTRACE_SYSEMU semantics
2020-05-22 09:34:19 -07:00
Linus Torvalds
f5ca7a7161 sound fixes for 5.7-rc7
Just a few small fixes: the only significant one is a slight
 improvement for PCM running position update with no-period-elapsed
 case while the rest are HD-audio fixups and ice1712 model quirk.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAl7H5JwOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE/yUQ//VnGdWfwTLteuJ1Anwym1HAgjihycdRBymGJV
 YA0hx+wBRn9QAjBlBh/qlQSyWMLFqrh0luQS5WuctDvgXDsF0LsMy9IcZqOr1dS5
 wJA4deEoUgU99obgGGyjc+6mDwINKdt2WaTT9q82C+2AHjhOZl+XBvwUuODTTr+l
 gGgRQA1J8YmIIWrsuZy7LaarvqwdPqk4RLH/fVGDW9b5Q3lVaUyEQ7Aw42iZTLIq
 A+wGqZJ7EkylbGJM/t/C98fku4kW3OYob1tzm9n65N5JGdyq83Gvz1/VcwWmtUTz
 LMfPutFGHHHxW5Kkn9M7KEyMzoKr4XPUfijrxbFzZq6WvWRO/AHcUc6FWyBS3Zgf
 x+Xo2C02LDO7p8sv9mFjDQhYNp6M6IAFN+ATg7hZBO/MJL9Ed6tCnGqvAG+8X2PB
 2FCoIk/btCxCFv/1BKGWJy6WjRqQ865WONSSsFpym+JHlkfmi36tw/8RPvSwninM
 32O6KG7DXi2khbH1b7NEOmmLjw+PMWvNy2CJIKu1l7DaGaUNkJZO9uXO5Q7y45nb
 otKTX9TOmlus4mESZqYeBQcIlHN4yKChGazTaIu6bxLIBWdBSuoNs/PXmIQ3on6S
 qr9OQ/DM/i/Ce/hON3cNII6bVmcbjk+CqHmyYKVtMxOiUXxoi7Nd5pFvefqL44UF
 0ZeM7EM=
 =8gyV
 -----END PGP SIGNATURE-----

Merge tag 'sound-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Just a few small fixes: the only significant one is a slight
  improvement for PCM running position update with no-period-elapsed
  case while the rest are HD-audio fixups and ice1712 model quirk"

* tag 'sound-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek - Add more fixup entries for Clevo machines
  ALSA: iec1712: Initialize STDSP24 properly when using the model=staudio option
  ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Xtreme
  ALSA: pcm: fix incorrect hw_base increase
2020-05-22 09:22:22 -07:00
Al Viro
8cfb347ad0 arm64: Add get_user() type annotation on the !access_ok() path
Sparse reports "Using plain integer as NULL pointer" when the arm64
__get_user_error() assigns 0 to a pointer type. Use proper type
annotation.

Signed-of-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: kbuild test robot <lkp@intel.com>
Link: http://lkml.kernel.org/r/20200522142321.GP23230@ZenIV.linux.org.uk
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-05-22 16:59:49 +01:00
Linus Torvalds
c8347bbf19 powerpc fixes for 5.7 #5
A revert of a recent change to the PTE bits for 32-bit BookS, which broke swap.
 
 And a "fix" to disable STRICT_KERNEL_RWX for 64-bit in Kconfig, as it's causing
 crashes for some people.
 
 Thanks to:
   Christophe Leroy, Rui Salvaterra.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl7H1aMTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgG3sD/9L72cw3cI2TcDTw+OEv2S2TyXrZZc3
 09WyUeerEv8mK8pk1eH8jVk6BqQK9bVZaq/3zYxLr5vnL1m+CZ4Qr8eGy5AcV3AC
 HXiGKhLbEh4btuoN3NwJ6fvEzA85dMTWsowGpgW8JgX1o7rtJmro0XW9EndhZGd2
 WWMBDsWo+RaKODej0c0Bz3TAOVgvxalE1SSLq63Q1sRoPhAZAJ0l8K3ED/EgC+tb
 v/VUi3fQNJngIzlMBc0sNOPp7NgcnDXoozAkW5c2Bp7YURbzeU0oXmsMAxQnyzee
 MP4MY1fAHI3CYdQ7QVRRDpQsTc84bAXVD+te+zhUJejaNm3mWLojRVieYT98eZXi
 iCi4Q0aSuAh3H8rxaYgi9ZemUkSKn+5pLu4kIAyMkBtnTB50E1YqUXVxfPcqk48N
 Y3Fkd6AyZ2/HyxS3bBVAubT/+GxK8HgQNGUBaF7iS50QKd6fl8EKjEBK1tVbYrTj
 xH7lXJpBnLCIj2ygZE1mBLxG8UTLGTfdnpxVNfVkNsLZK4tdsMaQ/llOzVA1uBOY
 twaRAhJkC0RHKHak1KNIQ8gh6HPjqwfg+P6SXHvT347YlTbsKgZei9wHtnZy4lsD
 CAnSImfgJMbzXCoULSoQbgXW0PloRZ1Zz1+WdfxmNjcNsRSqBNoaS1CaPKr7f8to
 a5JEWrUY1D49YQ==
 =yBu+
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - a revert of a recent change to the PTE bits for 32-bit BookS, which
   broke swap.

 - a "fix" to disable STRICT_KERNEL_RWX for 64-bit in Kconfig, as it's
   causing crashes for some people.

Thanks to Christophe Leroy and Rui Salvaterra.

* tag 'powerpc-5.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Disable STRICT_KERNEL_RWX
  Revert "powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits."
2020-05-22 08:51:39 -07:00
Klaus Doth
7a839dbab1 misc: rtsx: Add short delay after exit from ASPM
DMA transfers to and from the SD card stall for 10 seconds and run into
timeout on RTS5260 card readers after ASPM was enabled.

Adding a short msleep after disabling ASPM fixes the issue on several
Dell Precision 7530/7540 systems I tested.

This function is only called when waking up after the chip went into
power-save after not transferring data for a few seconds. The added
msleep does therefore not change anything in data transfer speed or
induce any excessive waiting while data transfers are running, or the
chip is sleeping. Only the transition from sleep to active is affected.

Signed-off-by: Klaus Doth <kdlnx@doth.eu>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/4434eaa7-2ee3-a560-faee-6cee63ebd6d4@doth.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-22 13:27:38 +02:00
Tony Nguyen
5757cc7c8b ice: Rename build_ctob to ice_build_ctob
To make the function easier to identify as being part of the ice driver,
prepend ice to the function name.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Bruce Allan
c522d1f686 ice: remove unnecessary backslash
Self-explanatory.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Bruce Allan
86a2e00d20 ice: remove unnecessary check
The variable status cannot be zero due to a prior check of it; remove this
check.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Bruce Allan
92ace4824c ice: remove unnecessary expression that is always true
The else conditional expression is always true due to the if conditional
expression; remove it and add a comment to make it obvious still.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Lihong Yang
757976ab16 ice: Fix check for removing/adding mac filters
In function ice_set_mac_address, we will remove old dev_addr before
adding the new MAC. In the removing and adding process of the MAC,
there is no need to return error if the check finds the to-be-removed
dev_addr does not exist in the MAC filter list or the to-be-added mac
already exists, keep going or return success accordingly.

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Michal Swiatkowski
1b8f15b64a ice: refactor filter functions
Move filter functions to separate file.

Add functions that prepare suitable ice_fltr_info struct
depending on the filter type and add this struct to earlier created
list:
- ice_fltr_add_mac_to_list
- ice_fltr_add_vlan_to_list
- ice_fltr_add_eth_to_list
This functions are used in adding and removing filters.

Create wrappers for functions mentioned above that alloc list,
add suitable ice_fltr_info to it and call add or remove function.
- ice_fltr_prepare_mac
- ice_fltr_prepare_mac_and_broadcast
- ice_fltr_prepare_vlan
- ice_fltr_prepare_eth

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Eric Joyner
857a4f0e9f ice: Fix resource leak on early exit from function
Memory allocated in the ice_add_prof_id_vsig() function wasn't being
properly freed if an error occurred inside the for-loop in the function.

In particular, 'p' wasn't being freed if an error occurred before it was
added to the resource list at the end of the for-loop.

Signed-off-by: Eric Joyner <eric.joyner@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Jesse Brandeburg
53bb66983f ice: cleanup vf_id signedness
The vf_id variable is dealt with in the code in inconsistent
ways of sign usage, preventing compilation with -Werror=sign-compare.
Fix this problem in the code by always treating vf_id as unsigned, since
there are no valid values of vf_id that are negative.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Karol Kolacinski
88865fc4bb ice: Fix casting issues
Change min() macros to min_t() which has compare type specified and it
helps avoid precision loss.

In some cases there was precision loss during calls or assignments.
Some fields in structs were unnecessarily large and gave multiple
warnings.

There were also some minor type differences which are now fixed as well as
some cases where a simple cast was needed.

Callers were were passing data that is a u16 to
ice_sched_cfg_node_bw_alloc() but the function was truncating that to a u8.
Fix that by changing the function to take a u16.

Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Lihong Yang
0fee35774d ice: Provide more meaningful error message
When printing the ice status or AQ error codes, instead of printing out the
numerical value, provide the description of the error code. This provides
more info about the issue than a number.

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Anirudh Venkataramanan
de75135b5c ice: Fix probe/open race condition
As soon as the driver registers the PF netdev, userspace utilities
like NetworkManager try to bring up the associated interface. When
this happens, the driver may not have finished initializing fully,
resulting in a bunch of errors in the interface up flow.

The driver already has a mechanism to indicate if it's not up yet;
by setting the __ICE_DOWN bit in pf->state, but this bit gets
cleared too early in the current flow. So clear this bit only when
the driver is fully up. Also check for the same bit in the ice_open
flow, and return -EBUSY if the bit is set.

Also in ice_open, replace references of vsi->back with a local
variable.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Dave Ertman
46a316500e ice: only drop link once when setting pauseparams
Currently, the ice driver is setting a PHY configuration,
which causes a link drop, and then additionally it calls
for a nway_reset, which restarts auto-negotiation on the
link, which also causes a link drop.  These two link
events in such close timing is causing the FW to not be
able to generate a link interrupt for the driver to
respond to.

Remove the unnecessary auto-negotiation restart from the
set pauseparams flow.  Also remove error path that
would have performed an ice_down/ice_up as that is
also unnecessary.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Dave Ertman
891540024b ice: Fix check for contiguous TCs
The current implementation for contiguous TC check
is assuming that the UPs will be mapped to TCs in
a linear progressing fashion.  This is obviously
not always true.

Change the check to allow for various UP2TC mapping
configurations.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Avinash JD
610ed0e93e ice: Don't reset and rebuild for Tx timeout on PFC enabled queue
When there's a Tx timeout for a queue which belongs to a PFC enabled TC,
then it's not because the queue is hung but because PFC is in action.

In PFC, peer sends a pause frame for a specified period of time when its
buffer threshold is exceeded (due to congestion). Netdev on the other
hand checks if ACK is received within a specified time for a TX packet, if
not, it'll invoke the tx_timeout routine.

Signed-off-by: Avinash JD <avinash.dayanand@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:03 -07:00
Brett Creeley
01b5e89aab ice: Add VF promiscuous support
Implement promiscuous support for VF VSIs. Behaviour of promiscuous support
is based on VF trust as well as the, introduced, vf-true-promisc flag.

A trusted VF with vf-true-promisc disabled will be the default VSI, which
means that all traffic without a matching destination MAC address in the
device's internal switch will be forwarded to this VF VSI.

A trusted VF with vf-true-promisc enabled will go into "true promiscuous
mode". This amounts to the VF receiving all ingress and egress traffic
that hits the device's internal switch.

An untrusted VF will only receive traffic destined for that VF.

The vf-true-promisc-support flag cannot be toggled while any VF is in
promiscuous mode. This flag should be set prior to loading the iavf driver
or spawning VF(s).

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:03 -07:00
Tony Nguyen
a4e82a81f5 ice: Add support for tunnel offloads
Create a boost TCAM entry for each tunnel port in order to get a tunnel
PTYPE. Update netdev feature flags and implement the appropriate logic to
get and set values for hardware offloads.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:03 -07:00
Jacob Keller
f45a645fa6 ice: report netlist version in .info_get
The flash memory for the ice hardware contains a block of information
used for link management called the Netlist module.

As this essentially represents another section of firmware, add its
version information to the output of the driver's .info_get handler.

This includes both a version and the first few bytes of a hash of the
module contents.

  fw.netlist -> the version information extracted from the netlist module
  fw.netlist.build-> first 4 bytes of the hash of the contents, similar
                     to fw.mgmt.build

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:03 -07:00
Jakub Sitnicki
5cf65922bb flow_dissector: Drop BPF flow dissector prog ref on netns cleanup
When attaching a flow dissector program to a network namespace with
bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.

If netns gets destroyed while a flow dissector is still attached, and there
are no other references to the prog, we leak the reference and the program
remains loaded.

Leak can be reproduced by running flow dissector tests from selftests/bpf:

  # bpftool prog list
  # ./test_flow_dissector.sh
  ...
  selftests: test_flow_dissector [PASS]
  # bpftool prog list
  4: flow_dissector  name _dissect  tag e314084d332a5338  gpl
          loaded_at 2020-05-20T18:50:53+0200  uid 0
          xlated 552B  jited 355B  memlock 4096B  map_ids 3,4
          btf_id 4
  #

Fix it by detaching the flow dissector program when netns is going away.

Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20200521083435.560256-1-jakub@cloudflare.com
2020-05-21 17:52:45 -07:00
Alexei Starovoitov
29ae90d221 Merge branch 'improve-branch_taken'
John Fastabend says:

====================
This series adds logic to the verifier to handle the case where a
pointer is known to be non-null but then the verifier encountesr a
instruction, such as 'if ptr == 0 goto X' or 'if ptr != 0 goto X',
where the pointer is compared against 0. Because the verifier tracks
if a pointer may be null in many cases we can improve the branch
tracking by following the case known to be true.

The first patch adds the verifier logic and patches 2-4 add the
test cases.

v1->v2: fix verifier logic to return -1 indicating both paths must
still be walked if value is not zero. Move mark_precision skip for
this case into caller of mark_precision to ensure mark_precision can
still catch other misuses. And add PTR_TYPE_BTF_ID to our list of
supported types. Finally add one more test to catch the value not
equal zero case. Thanks to Andrii for original review.

Also fixed up commit messages hopefully its better now.
====================

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-05-21 17:47:33 -07:00
John Fastabend
d844a71bff bpf: Selftests, add printk to test_sk_lookup_kern to encode null ptr check
Adding a printk to test_sk_lookup_kern created the reported failure
where a pointer type is checked twice for NULL. Lets add it to the
progs test test_sk_lookup_kern.c so we test the case from C all the
way into the verifier.

We already have printk's in selftests so seems OK to add another one.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159009170603.6313.1715279795045285176.stgit@john-Precision-5820-Tower
2020-05-21 17:44:25 -07:00
John Fastabend
f9b16ec0ee bpf: Selftests, verifier case for non null pointer map value branch
When we have pointer type that is known to be non-null we only follow
the non-null branch. This adds tests to cover the map_value pointer
returned from a map lookup. To force an error if both branches are
followed we do an ALU op on R10.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159009168650.6313.7434084136067263554.stgit@john-Precision-5820-Tower
2020-05-21 17:44:25 -07:00
John Fastabend
c72b5cbb09 bpf: Selftests, verifier case for non null pointer check branch taken
When we have pointer type that is known to be non-null and comparing
against zero we only follow the non-null branch. This adds tests to
cover this case for reference tracking. Also add the other case when
comparison against a non-zero value and ensure we still fail with
unreleased reference.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/159009166599.6313.1593680633787453767.stgit@john-Precision-5820-Tower
2020-05-21 17:44:25 -07:00
John Fastabend
cac616db39 bpf: Verifier track null pointer branch_taken with JNE and JEQ
Currently, when considering the branches that may be taken for a jump
instruction if the register being compared is a pointer the verifier
assumes both branches may be taken. But, if the jump instruction
is comparing if a pointer is NULL we have this information in the
verifier encoded in the reg->type so we can do better in these cases.
Specifically, these two common cases can be handled.

 * If the instruction is BPF_JEQ and we are comparing against a
   zero value. This test is 'if ptr == 0 goto +X' then using the
   type information in reg->type we can decide if the ptr is not
   null. This allows us to avoid pushing both branches onto the
   stack and instead only use the != 0 case. For example
   PTR_TO_SOCK and PTR_TO_SOCK_OR_NULL encode the null pointer.
   Note if the type is PTR_TO_SOCK_OR_NULL we can not learn anything.
   And also if the value is non-zero we learn nothing because it
   could be any arbitrary value a different pointer for example

 * If the instruction is BPF_JNE and ware comparing against a zero
   value then a similar analysis as above can be done. The test in
   asm looks like 'if ptr != 0 goto +X'. Again using the type
   information if the non null type is set (from above PTR_TO_SOCK)
   we know the jump is taken.

In this patch we extend is_branch_taken() to consider this extra
information and to return only the branch that will be taken. This
resolves a verifier issue reported with C code like the following.
See progs/test_sk_lookup_kern.c in selftests.

 sk = bpf_sk_lookup_tcp(skb, tuple, tuple_len, BPF_F_CURRENT_NETNS, 0);
 bpf_printk("sk=%d\n", sk ? 1 : 0);
 if (sk)
   bpf_sk_release(sk);
 return sk ? TC_ACT_OK : TC_ACT_UNSPEC;

In the above the bpf_printk() will resolve the pointer from
PTR_TO_SOCK_OR_NULL to PTR_TO_SOCK. Then the second test guarding
the release will cause the verifier to walk both paths resulting
in the an unreleased sock reference. See verifier/ref_tracking.c
in selftests for an assembly version of the above.

After the above additional logic is added the C code above passes
as expected.

Reported-by: Andrey Ignatov <rdna@fb.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/159009164651.6313.380418298578070501.stgit@john-Precision-5820-Tower
2020-05-21 17:44:25 -07:00
Alexei Starovoitov
79917b242c Merge branch 'af_xdp-common-alloc'
Björn Töpel says:

====================
Overview
========

Driver adoption for AF_XDP has been slow. The amount of code required
to proper support AF_XDP is substantial and the driver/core APIs are
vague or even non-existing. Drivers have to manually adjust data
offsets, updating AF_XDP handles differently for different modes
(aligned/unaligned).

This series attempts to improve the situation by introducing an AF_XDP
buffer allocation API. The implementation is based on a single core
(single producer/consumer) buffer pool for the AF_XDP UMEM.

A buffer is allocated using the xsk_buff_alloc() function, and
returned using xsk_buff_free(). If a buffer is disassociated with the
pool, e.g. when a buffer is passed to an AF_XDP socket, a buffer is
said to be released. Currently, the release function is only used by
the AF_XDP internals and not visible to the driver.

Drivers using this API should register the XDP memory model with the
new MEM_TYPE_XSK_BUFF_POOL type, which will supersede the
MEM_TYPE_ZERO_COPY type.

The buffer type is struct xdp_buff, and follows the lifetime of
regular xdp_buffs, i.e.  the lifetime of an xdp_buff is restricted to
a NAPI context. In other words, the API is not replacing xdp_frames.

DMA mapping/synching is folded into the buffer handling as well.

@JeffK The Intel drivers changes should go through the bpf-next tree,
       and not your regular Intel tree, since multiple (non-Intel)
       drivers are affected.

The outline of the series is as following:

Patch 1 is a fix for xsk_umem_xdp_frame_sz().

Patch 2 to 4 are restructures/clean ups. The XSKMAP implementation is
moved to net/xdp/. Functions/defines/enums that are only used by the
AF_XDP internals are moved from the global include/net/xdp_sock.h to
net/xdp/xsk.h. We are also introducing a new "driver include file",
include/net/xdp_sock_drv.h, which is the only file NIC driver
developers adding AF_XDP zero-copy support should care about.

Patch 5 adds the new API, and migrates the "copy-mode"/skb-mode AF_XDP
path to the new API.

Patch 6 to 11 migrates the existing zero-copy drivers to the new API.

Patch 12 removes the MEM_TYPE_ZERO_COPY memory type, and the "handle"
member of struct xdp_buff.

Patch 13 simplifies the xdp_return_{frame,frame_rx_napi,buff}
functions.

Patch 14 is a performance patch, where some functions are inlined.

Finally, patch 15 updates the MAINTAINERS file to correctly mirror the
new file layout.

Note that this series removes the "handle" member from struct
xdp_buff, which reduces the xdp_buff size.

After this series, the diff stat of drivers/net/ is:
  27 files changed, 419 insertions(+), 1288 deletions(-)

This series is a first step of simplifying the driver side of
AF_XDP. I think more of the AF_XDP logic can be moved from the drivers
to the AF_XDP core, e.g. the "need wakeup" set/clear functionality.

Statistics when allocation fails can now be added to the socket
statistics via the XDP_STATISTICS getsockopt(). This will be added in
a follow up series.

Performance
===========

As a nice side effect, performance is up a bit as well.

  * i40e: 3% higher pps for rxdrop, zero-copy, aligned and unaligned
    (40 GbE, 64B packets).
  * mlx5: RX +0.8 Mpps, TX +0.4 Mpps

Changelog
=========

v4->v5:
  * Fix various kdoc and GCC warnings (W=1). (Jakub)

v3->v4:
    * mlx5: Remove unused variable num_xsk_frames. (Jakub)
    * i40e: Made i40e_fd_handle_status() static. (kbuild test robot)

v2->v3:
  * Added xsk_umem_xdp_frame_sz() fix to the series. (Björn)
  * Initialize struct xdp_buff member frame_sz. (Björn)
  * Add API to query the DMA address of a frame. (Maxim)
  * Do DMA sync for CPU till the end of the frame to handle possible
    growth (frame_sz). (Maxim)
  * mlx5: Handle frame_sz, use xsk_buff_xdp_get_frame_dma, use
    xsk_buff API for DMA sync on TX, add performance numbers. (Maxim)

v1->v2:
  * mlx5: Fix DMA address handling, set XDP metadata to invalid. (Maxim)
  * ixgbe: Fixed xdp_buff data_end update. (Björn)
  * Swapped SoBs in patch 4. (Maxim)

rfc->v1:
  * Fixed build errors/warnings for m68k and riscv. (kbuild test
    robot)
  * Added headroom/chunk size getter. (Maxim/Björn)
  * mlx5: Put back the sanity check for XSK params, use XSK API to get
    the total headroom size. (Maxim)
  * Fixed spelling in commit message. (Björn)
  * Make sure xp_validate_desc() is inlined for Tx perf. (Maxim)
  * Sorted file entries. (Joe)
  * Added xdp_return_{frame,frame_rx_napi,buff} simplification (Björn)

Thanks for all the comments/input/help!
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-05-21 17:31:35 -07:00
Björn Töpel
28bee21dc0 MAINTAINERS, xsk: Update AF_XDP section after moves/adds
Update MAINTAINERS to correctly mirror the current AF_XDP socket file
layout. Also, add the AF_XDP files of libbpf.

rfc->v1: Sorted file entries. (Joe)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Joe Perches <joe@perches.com>
Link: https://lore.kernel.org/bpf/20200520192103.355233-16-bjorn.topel@gmail.com
2020-05-21 17:31:27 -07:00
Björn Töpel
26062b185e xsk: Explicitly inline functions and move definitions
In order to reduce the number of function calls, the struct
xsk_buff_pool definition is moved to xsk_buff_pool.h. The functions
xp_get_dma(), xp_dma_sync_for_cpu(), xp_dma_sync_for_device(),
xp_validate_desc() and various helper functions are explicitly
inlined.

Further, move xp_get_handle() and xp_release() to xsk.c, to allow for
the compiler to perform inlining.

rfc->v1: Make sure xp_validate_desc() is inlined for Tx perf. (Maxim)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200520192103.355233-15-bjorn.topel@gmail.com
2020-05-21 17:31:27 -07:00
Björn Töpel
82c41671ca xdp: Simplify xdp_return_{frame, frame_rx_napi, buff}
The xdp_return_{frame,frame_rx_napi,buff} function are never used,
except in xdp_convert_zc_to_xdp_frame(), by the MEM_TYPE_XSK_BUFF_POOL
memory type.

To simplify and reduce code, change so that
xdp_convert_zc_to_xdp_frame() calls xsk_buff_free() directly since the
type is know, and remove MEM_TYPE_XSK_BUFF_POOL from the switch
statement in __xdp_return() function.

Suggested-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200520192103.355233-14-bjorn.topel@gmail.com
2020-05-21 17:31:27 -07:00
Björn Töpel
0807892ecb xsk: Remove MEM_TYPE_ZERO_COPY and corresponding code
There are no users of MEM_TYPE_ZERO_COPY. Remove all corresponding
code, including the "handle" member of struct xdp_buff.

rfc->v1: Fixed spelling in commit message. (Björn)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200520192103.355233-13-bjorn.topel@gmail.com
2020-05-21 17:31:27 -07:00
Björn Töpel
39d6443c8d mlx5, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL
Use the new MEM_TYPE_XSK_BUFF_POOL API in lieu of MEM_TYPE_ZERO_COPY in
mlx5e. It allows to drop a lot of code from the driver (which is now
common in AF_XDP core and was related to XSK RX frame allocation, DMA
mapping, etc.) and slightly improve performance (RX +0.8 Mpps, TX +0.4
Mpps).

rfc->v1: Put back the sanity check for XSK params, use XSK API to get
         the total headroom size. (Maxim)

v1->v2: Fix DMA address handling, set XDP metadata to invalid. (Maxim)

v2->v3: Handle frame_sz, use xsk_buff_xdp_get_frame_dma, use xsk_buff
        API for DMA sync on TX, add performance numbers. (Maxim)

v3->v4: Remove unused variable num_xsk_frames. (Jakub)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200520192103.355233-12-bjorn.topel@gmail.com
2020-05-21 17:31:27 -07:00
Björn Töpel
7117132b22 ixgbe, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL
Remove MEM_TYPE_ZERO_COPY in favor of the new MEM_TYPE_XSK_BUFF_POOL
APIs.

v1->v2: Fixed xdp_buff data_end update. (Björn)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20200520192103.355233-11-bjorn.topel@gmail.com
2020-05-21 17:31:27 -07:00
Björn Töpel
175fc43067 ice, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL
Remove MEM_TYPE_ZERO_COPY in favor of the new MEM_TYPE_XSK_BUFF_POOL
APIs.

v4->v5: Fixed "warning: Excess function parameter 'alloc' description
        in 'ice_alloc_rx_bufs_zc'" and "warning: Excess function
        parameter 'xdp' description in
        'ice_construct_skb_zc'". (Jakub)

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20200520192103.355233-10-bjorn.topel@gmail.com
2020-05-21 17:31:26 -07:00
Björn Töpel
3b4f0b66c2 i40e, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL
Remove MEM_TYPE_ZERO_COPY in favor of the new MEM_TYPE_XSK_BUFF_POOL
APIs. The AF_XDP zero-copy rx_bi ring is now simply a struct xdp_buff
pointer.

v4->v5: Fixed "warning: Excess function parameter 'bi' description in
        'i40e_construct_skb_zc'". (Jakub)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20200520192103.355233-9-bjorn.topel@gmail.com
2020-05-21 17:31:26 -07:00
Björn Töpel
be1222b585 i40e: Separate kernel allocated rx_bi rings from AF_XDP rings
Continuing the path to support MEM_TYPE_XSK_BUFF_POOL, the AF_XDP
zero-copy/sk_buff rx_bi rings are now separate. Functions to properly
allocate the different rings are added as well.

v3->v4: Made i40e_fd_handle_status() static. (kbuild test robot)
v4->v5: Fix kdoc for i40e_clean_programming_status(). (Jakub)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20200520192103.355233-8-bjorn.topel@gmail.com
2020-05-21 17:31:26 -07:00
Björn Töpel
e1675f9736 i40e: Refactor rx_bi accesses
As a first step to migrate i40e to the new MEM_TYPE_XSK_BUFF_POOL
APIs, code that accesses the rx_bi (SW/shadow ring) is refactored to
use an accessor function.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20200520192103.355233-7-bjorn.topel@gmail.com
2020-05-21 17:31:26 -07:00
Björn Töpel
2b43470add xsk: Introduce AF_XDP buffer allocation API
In order to simplify AF_XDP zero-copy enablement for NIC driver
developers, a new AF_XDP buffer allocation API is added. The
implementation is based on a single core (single producer/consumer)
buffer pool for the AF_XDP UMEM.

A buffer is allocated using the xsk_buff_alloc() function, and
returned using xsk_buff_free(). If a buffer is disassociated with the
pool, e.g. when a buffer is passed to an AF_XDP socket, a buffer is
said to be released. Currently, the release function is only used by
the AF_XDP internals and not visible to the driver.

Drivers using this API should register the XDP memory model with the
new MEM_TYPE_XSK_BUFF_POOL type.

The API is defined in net/xdp_sock_drv.h.

The buffer type is struct xdp_buff, and follows the lifetime of
regular xdp_buffs, i.e.  the lifetime of an xdp_buff is restricted to
a NAPI context. In other words, the API is not replacing xdp_frames.

In addition to introducing the API and implementations, the AF_XDP
core is migrated to use the new APIs.

rfc->v1: Fixed build errors/warnings for m68k and riscv. (kbuild test
         robot)
         Added headroom/chunk size getter. (Maxim/Björn)

v1->v2: Swapped SoBs. (Maxim)

v2->v3: Initialize struct xdp_buff member frame_sz. (Björn)
        Add API to query the DMA address of a frame. (Maxim)
        Do DMA sync for CPU till the end of the frame to handle
        possible growth (frame_sz). (Maxim)

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200520192103.355233-6-bjorn.topel@gmail.com
2020-05-21 17:31:26 -07:00
Björn Töpel
89e4a376e3 xsk: Move defines only used by AF_XDP internals to xsk.h
Move the XSK_NEXT_PG_CONTIG_{MASK,SHIFT}, and
XDP_UMEM_USES_NEED_WAKEUP defines from xdp_sock.h to the AF_XDP
internal xsk.h file. Also, start using the BIT{,_ULL} macro instead of
explicit shifts.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200520192103.355233-5-bjorn.topel@gmail.com
2020-05-21 17:31:26 -07:00