786020 Commits

Author SHA1 Message Date
Shay Agroskin
67daf11860 net/mlx5: Added "per_lane_error_counters" cap bit to PCAM
Added "Per lane raw errors" capability bit in
Ports Capabilities Mask (PCAM) enhanced features
layout.

This bit determines if the fields "phy_raw_errors_laneX"
in "Physical Layer statistical" counters group are supported.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18 13:32:36 -07:00
Jakub Kicinski
3ddeac6705 tools: bpftool: use 4 context mode for the NFP disasm
The nfp driver is currently always JITing the BPF for 4 context/thread
mode of the NFP flow processors.  Tell this to the disassembler,
otherwise some registers may be incorrectly decoded.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-18 22:16:02 +02:00
Shay Agroskin
6cfa946050 net/mlx5e: Ethtool driver callback for query/set FEC policy
Driver callback function for 'ethtool --show-fec',
'ethtool --set-fec' commands.

The query function returns active and configured FEC policy
for current link speed.

The set function sets FEC policy for all supported link
speeds.
1) If current link speed doesn't support requested FEC policy,
   the function fails.
2) If a different link speed doesn't support requested FEC
   policy, FEC capbilities for this speed are turned off.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18 13:13:31 -07:00
Shay Agroskin
2095b26414 net/mlx5e: Add port FEC get/set functions
Added functions to query and set link FEC policy.
To get/set FEC capabilities in PPLM reg we need to query
current link speed.
'mlx5_get_fec_speed_field' queries current link speed and returns
correct field offset.

FEC Query's return value is divided into 'active FEC policy', which is
the FEC policy used by the link, and 'configured FEC policy', which
is the FEC policy requested by the user.
The two values may differ if:
1) FEC policy was configured to 'auto',
   in which case the active FEC policy would be the default FEC policy
   for current link speed.

2) FEC policy was changed, but no link reset is performed. In which case,
   the active FEC policy would become the configured one after a link
   reset.

FEC set function sets FEC policy for all link speeds and perform link
reset.
1) If current link speed doesn't support requested FEC policy,
   the function fails.
2) If a different link speed doesn't support requested FEC policy,
   FEC capbilities for this speed are turned off and a warning message
   is printed.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18 13:13:31 -07:00
Shay Agroskin
4b5b9c7d97 net/mlx5: Add FEC fields to Port Phy Link Mode (PPLM) reg
Added FEC related fields to PPLM layout.
These fields are needed to set and query FEC policy
for different link speeds.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18 13:13:31 -07:00
Vlad Buslov
2a4c429802 net/mlx5: Remove counter from idr after removing it from list
Fs_counters list can temporary become unsorted when new counters are
created/deleted concurrently. Idr is used to quickly lookup position to
insert new counter in logarithmic time. However, if new flows are
concurrently inserted during time window when flows with adjacent ids are
already removed from idr but are still present in counters list,
mlx5_fc_stats_work() observes counters list in inconsistent state, which
results following warning:

[ 1839.561955] mlx5_core 0000:81:00.0: mlx5_cmd_fc_bulk_get:587:(pid 729): Flow counter id (0x102d5) out of range (0x1c0a8..0x1c10b). Counter ignored.

Move idr_remove() call to be executed synchronously with counter deletion
from list. Extract this code to mlx5_fc_stats_remove() helper function that
is called by workqueue job handler mlx5_fc_stats_work().

Fixes: 12d6066c3b29 ("net/mlx5: Add flow counters idr")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2018-10-18 13:13:31 -07:00
Vlad Buslov
fd33071303 net/mlx5: Take fs_counters dellist before addlist
In fs_counters elements from both addlist and dellist are removed by
mlx5_fc_stats_work() without any locking. This introduces race condition
when batch of new rules is created and then immediately deleted (for
example, when error occurred during flow creation). In such case some of
the rules might be in dellist, but not in addlist when mlx5_fc_stats_work()
is executed concurrently with tc, which will result rule deletion and
use-after-free on next iteration because deleted rules are still in
addlist.

Always take dellist first to guarantee that rules can only be deleted after
they were removed from addlist.

Fixes: 6e5e22839136 ("net/mlx5: Add new list to store deleted flow counters")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reported-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2018-10-18 13:13:31 -07:00
Tariq Toukan
4972e6fa3a net/mlx5: Refactor fragmented buffer struct fields and init flow
Take struct mlx5_frag_buf out of mlx5_frag_buf_ctrl, as it is not
needed to manage and control the datapath of the fragmented buffers API.

struct mlx5_frag_buf contains control info to manage the allocation
and de-allocation of the fragmented buffer.
Its fields are not relevant for datapath, so here I take them out of the
struct mlx5_frag_buf_ctrl, except for the fragments array itself.

In addition, modified mlx5_fill_fbc to initialise the frags pointers
as well. This implies that the buffer must be allocated before the
function is called.

A set of type-specific *_get_byte_size() functions are replaced by
a generic one.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18 13:13:31 -07:00
Peng Hao
1bd70d2eba selftests/bpf: fix file resource leak in load_kallsyms
FILE pointer variable f is opened but never closed.

Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-18 22:13:30 +02:00
Jeff Kirsher
828092ef77 Documentation: intel: Convert to RST format
Now that the documents have been updated to conform to the reStructured Text
guidelines, we can now change the file extensions and update the other
related references.

This converts all of the Intel wired LAN driver documentation to *.rst.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:41:29 -07:00
Jeff Kirsher
f12a84a9f6 Documentation: fm10k: Add kernel documentation
Added the fm10k kernel documentation, which apparently was missing.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:39:39 -07:00
Jeff Kirsher
1fae869bcf Documentation: ice: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:38:17 -07:00
Jeff Kirsher
7bacc01d3e Documentation: iavf: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:36:14 -07:00
Jeff Kirsher
1e06edcc2f Documentation: i40e: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
2018-10-18 12:34:28 -07:00
Jeff Kirsher
63e2ea2f89 Documentation: ixgbevf: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:28:35 -07:00
Jeff Kirsher
4d256e4d8a Documentation: ixgbe: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:27:56 -07:00
Jeff Kirsher
413548de58 Documentation: igbvf: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:24:18 -07:00
Jeff Kirsher
cf673eee90 Documentation: igb: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:22:39 -07:00
Jeff Kirsher
b87e7f2468 Documentation: e1000e: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:21:05 -07:00
Jeff Kirsher
8d59045f11 Documentation: ixgb: Prepare documentation for RST conversion
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.

Also fixed old/broken URLs to the correct or updated URL.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:17:32 -07:00
Jeff Kirsher
27642facf1 Documentation: e100, e1000: Add missing SPDX header
Add the SPDX-Lincense-Identifier to the Intel wired Ethernet *.rst
kernel documentation.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-10-18 12:15:27 -07:00
Corentin Labbe
c0e6f052f4 Documentation: networking: ixgb: Remove reference to IXGB_NAPI
NAPI is enabled by default and IXGB_NAPI was removed since
commit 6d37ab282e24 ("ixgb: make NAPI the only option and the default")
Update the doc accordingly.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-18 12:03:03 -07:00
Heiner Kallweit
6b839b6cf9 r8169: fix NAPI handling under high load
rtl_rx() and rtl_tx() are called only if the respective bits are set
in the interrupt status register. Under high load NAPI may not be
able to process all data (work_done == budget) and it will schedule
subsequent calls to the poll callback.
rtl_ack_events() however resets the bits in the interrupt status
register, therefore subsequent calls to rtl8169_poll() won't call
rtl_rx() and rtl_tx() - chip interrupts are still disabled.

Fix this by calling rtl_rx() and rtl_tx() independent of the bits
set in the interrupt status register. Both functions will detect
if there's nothing to do for them.

Fixes: da78dbff2e05 ("r8169: remove work from irq handler.")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 11:33:29 -07:00
David S. Miller
27faeebd00 sparc: Revert unintended perf changes.
Some local debugging hacks accidently slipped into the VDSO commit.

Sorry!

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 11:32:29 -07:00
David S. Miller
3a3295bfa6 Merge branch 'sctp-fix-sk_wmem_queued-and-use-it-to-check-for-writable-space'
Xin Long says:

====================
sctp: fix sk_wmem_queued and use it to check for writable space

sctp doesn't count and use asoc sndbuf_used, sk sk_wmem_alloc and
sk_wmem_queued properly, which also causes some problem.

This patchset is to improve it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 11:23:47 -07:00
Xin Long
cd305c74b0 sctp: use sk_wmem_queued to check for writable space
sk->sk_wmem_queued is used to count the size of chunks in out queue
while sk->sk_wmem_alloc is for counting the size of chunks has been
sent. sctp is increasing both of them before enqueuing the chunks,
and using sk->sk_wmem_alloc to check for writable space.

However, sk_wmem_alloc is also increased by 1 for the skb allocked
for sending in sctp_packet_transmit() but it will not wake up the
waiters when sk_wmem_alloc is decreased in this skb's destructor.

If msg size is equal to sk_sndbuf and sendmsg is waiting for sndbuf,
the check 'msg_len <= sctp_wspace(asoc)' in sctp_wait_for_sndbuf()
will keep waiting if there's a skb allocked in sctp_packet_transmit,
and later even if this skb got freed, the waiting thread will never
get waked up.

This issue has been there since very beginning, so we change to use
sk->sk_wmem_queued to check for writable space as sk_wmem_queued is
not increased for the skb allocked for sending, also as TCP does.

SOCK_SNDBUF_LOCK check is also removed here as it's for tx buf auto
tuning which I will add in another patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 11:23:47 -07:00
Xin Long
605c0ac182 sctp: count both sk and asoc sndbuf with skb truesize and sctp_chunk size
Now it's confusing that asoc sndbuf_used is doing memory accounting with
SCTP_DATA_SNDSIZE(chunk) + sizeof(sk_buff) + sizeof(sctp_chunk) while sk
sk_wmem_alloc is doing that with skb->truesize + sizeof(sctp_chunk).

It also causes sctp_prsctp_prune to count with a wrong freed memory when
sndbuf_policy is not set.

To make this right and also keep consistent between asoc sndbuf_used, sk
sk_wmem_alloc and sk_wmem_queued, use skb->truesize + sizeof(sctp_chunk)
for them.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 11:23:47 -07:00
Leo Li
4364bcb2cd drm: Get ref on CRTC commit object when waiting for flip_done
This fixes a general protection fault, caused by accessing the contents
of a flip_done completion object that has already been freed. It occurs
due to the preemption of a non-blocking commit worker thread W by
another commit thread X. X continues to clear its atomic state at the
end, destroying the CRTC commit object that W still needs. Switching
back to W and accessing the commit objects then leads to bad results.

Worker W becomes preemptable when waiting for flip_done to complete. At
this point, a frequently occurring commit thread X can take over. Here's
an example where W is a worker thread that flips on both CRTCs, and X
does a legacy cursor update on both CRTCs:

        ...
     1. W does flip work
     2. W runs commit_hw_done()
     3. W waits for flip_done on CRTC 1
     4. > flip_done for CRTC 1 completes
     5. W finishes waiting for CRTC 1
     6. W waits for flip_done on CRTC 2

     7. > Preempted by X
     8. > flip_done for CRTC 2 completes
     9. X atomic_check: hw_done and flip_done are complete on all CRTCs
    10. X updates cursor on both CRTCs
    11. X destroys atomic state
    12. X done

    13. > Switch back to W
    14. W waits for flip_done on CRTC 2
    15. W raises general protection fault

The error looks like so:

    general protection fault: 0000 [#1] PREEMPT SMP PTI
    **snip**
    Call Trace:
     lock_acquire+0xa2/0x1b0
     _raw_spin_lock_irq+0x39/0x70
     wait_for_completion_timeout+0x31/0x130
     drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper]
     amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu]
     commit_tail+0x3d/0x70 [drm_kms_helper]
     process_one_work+0x212/0x650
     worker_thread+0x49/0x420
     kthread+0xfb/0x130
     ret_from_fork+0x3a/0x50
    Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O)
    gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt
    fb_sys_fops ttm(O) drm(O)

Note that i915 has this issue masked, since hw_done is signaled after
waiting for flip_done. Doing so will block the cursor update from
happening until hw_done is signaled, preventing the cursor commit from
destroying the state.

v2: The reference on the commit object needs to be obtained before
    hw_done() is signaled, since that's the point where another commit
    is allowed to modify the state. Assuming that the
    new_crtc_state->commit object still exists within flip_done() is
    incorrect.

    Fix by getting a reference in setup_commit(), and releasing it
    during default_clear().

Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1539611200-6184-1-git-send-email-sunpeng.li@amd.com
2018-10-18 14:23:13 -04:00
David S. Miller
2d0f0ca2c7 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2018-10-17

This series adds support for the new igc driver.

The igc driver is the new client driver supporting the Intel I225
Ethernet Controller, which supports 2.5GbE speeds.  The reason for
creating a new client driver, instead of adding support for the new
device in e1000e, is that the silicon behaves more like devices
supported in igb driver.  It also did not make sense to add a client
part, to the igb driver which supports only 1GbE server parts.

This initial set of patches is designed for basic support (i.e. link and
pass traffic).  Follow-on patch series will add more advanced support
like VLAN, Wake-on-LAN, etc..
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 10:27:20 -07:00
David S. Miller
99e9acd85c mlx5-updates-2018-10-17
========================================================================
 
 From Or Gerlitz <ogerlitz@mellanox.com>:
 
 This series from Paul adds support to mlx5 e-switch tc offloading of multiple priorities and chains.
 
 This is made of four building blocks (along with few minor driver refactors):
 
 [1] Split FDB fast path prio to multiple namespaces
 
 Currently the FDB name-space contains two priorities, fast path (p0) and slow path (p1).
 The slow path contains the per representor SQ send-to-vport TX rule and the match-all
 RX miss rule. As a pre-step to support multi-chains and priorities, we split the FDB fast path
 to multiple namespaces  (sub namespaces), each with multiple priorities.
 
 [2] E-Switch chains and priorities
 
 A chain is a group of priorities. We use the fdb parallel sub-namespaces to implement chains,
 and a flow table for each priority in them.
 
 Because these namespaces are parallel and in series to the slow path
 fdb, the chains aren't connected to each other (but to the slow path),
 and one must use a explicit goto action to reach a different chain.
 
 Flow tables for the priorities are created on demand and destroyed
 once not used.
 
 [3] Add a no-append flow insertion mode, use it for TC offloads
 
 Enhance the driver fs core, such that if a no-append flag is set by the caller,
 we add a new FTE, instead of appending the actions of the inserted rule when
 the same match already exists.
 
 For encap rules, we defer the HW offloading till we have a valid neighbor. This can
 result in the packet hitting a lower priority rule in the HW DP. Use the no-append API
 to push these packets to the slow path FDB table, so they go to the TC kernel DP as done
 before priorities where supported.
 
 [4] Offloading tc priorities and chains for eswitch flows
 
 Using [1], [2] and [3] above we add the support for offloading both chains
 and priorities. To get to a new chain, use the tc goto action. We support
 a fixed prio range 1-16, and chains 0-3.
 =============================================================================
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbx6k1AAoJEEg/ir3gV/o+l40H/14rNaV27vefjuALgOvNX4DY
 iSI5UFv9ILnAemcD2xkVfJeGolwdzoRhCXJ5oyCylCPnP4tb9zgDgwu9V/WmIRG+
 DOaPLu+0V6jqfEGO5sXJPMhJNUR8WWAjfu66htJ0Nc1HV2OM5eYrcvjaYCfW4Egr
 QFWGyq4sPyYcpbb7wURbhmkfs8Vwxcj9c2cZIfXo3VJsKxULqU9Mj5hZnirI1OAy
 UhjLssb/8wfHmwNcqETI9ae7O+vPDMLkxdQvpviEBI+HJ7vZ6op2X4lVEsn/Bx2E
 /KrHGQObkwim8thTOYkQeJtqptWbiRvkpNnwryUV1fwjWPl6X1r3bXH7RdeRwCg=
 =aFCc
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

mlx5-updates-2018-10-17

========================================================================

From Or Gerlitz <ogerlitz@mellanox.com>:

This series from Paul adds support to mlx5 e-switch tc offloading of multiple priorities and chains.

This is made of four building blocks (along with few minor driver refactors):

[1] Split FDB fast path prio to multiple namespaces

Currently the FDB name-space contains two priorities, fast path (p0) and slow path (p1).
The slow path contains the per representor SQ send-to-vport TX rule and the match-all
RX miss rule. As a pre-step to support multi-chains and priorities, we split the FDB fast path
to multiple namespaces  (sub namespaces), each with multiple priorities.

[2] E-Switch chains and priorities

A chain is a group of priorities. We use the fdb parallel sub-namespaces to implement chains,
and a flow table for each priority in them.

Because these namespaces are parallel and in series to the slow path
fdb, the chains aren't connected to each other (but to the slow path),
and one must use a explicit goto action to reach a different chain.

Flow tables for the priorities are created on demand and destroyed
once not used.

[3] Add a no-append flow insertion mode, use it for TC offloads

Enhance the driver fs core, such that if a no-append flag is set by the caller,
we add a new FTE, instead of appending the actions of the inserted rule when
the same match already exists.

For encap rules, we defer the HW offloading till we have a valid neighbor. This can
result in the packet hitting a lower priority rule in the HW DP. Use the no-append API
to push these packets to the slow path FDB table, so they go to the TC kernel DP as done
before priorities where supported.

[4] Offloading tc priorities and chains for eswitch flows

Using [1], [2] and [3] above we add the support for offloading both chains
and priorities. To get to a new chain, use the tc goto action. We support
a fixed prio range 1-16, and chains 0-3.
=============================================================================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 10:25:37 -07:00
David S. Miller
8f18da4721 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2018-10-18

1) Remove an unnecessary dev->tstats check in xfrmi_get_stats64.
   From Li RongQing.

2) We currently do a sizeof(element) instead of a sizeof(array)
   check when initializing the ovec array of the secpath.
   Currently this array can have only one element, so code is
   OK but error-prone. Change this to do a sizeof(array)
   check so that we can add more elements in future.
   From Li RongQing.

3) Improve xfrm IPv6 address hashing by using the complete IPv6
   addresses for a hash. From Michal Kubecek.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 09:57:42 -07:00
David S. Miller
2ee653f644 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2018-10-18

1) Free the xfrm interface gro_cells when deleting the
   interface, otherwise we leak it. From Li RongQing.

2) net/core/flow.c does not exist anymore, so remove it
   from the MAINTAINERS file.

3) Fix a slab-out-of-bounds in _decode_session6.
   From Alexei Starovoitov.

4) Fix RCU protection when policies inserted into
   thei bydst lists. From Florian Westphal.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-18 09:55:08 -07:00
Ming Lei
744889b7cb block: don't deal with discard limit in blkdev_issue_discard()
blk_queue_split() does respect this limit via bio splitting, so no
need to do that in blkdev_issue_discard(), then we can align to
normal bio submit(bio_add_page() & submit_bio()).

More importantly, this patch fixes one issue introduced in a22c4d7e34402cc
("block: re-add discard_granularity and alignment checks"), in which
zero discard bio may be generated in case of zero alignment.

Fixes: a22c4d7e34402ccdf3 ("block: re-add discard_granularity and alignment checks")
Cc: stable@vger.kernel.org
Cc: Ming Lin <ming.l@ssi.samsung.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Xiao Ni <xni@redhat.com>
Tested-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-18 07:23:40 -06:00
Eric Sandeen
fa520c47ea fscache: Fix out of bound read in long cookie keys
fscache_set_key() can incur an out-of-bounds read, reported by KASAN:

 BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache]
 Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615

and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236

  BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline]
  BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171
  Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466

This happens for any index_key_len which is not divisible by 4 and is
larger than the size of the inline key, because the code allocates exactly
index_key_len for the key buffer, but the hashing loop is stepping through
it 4 bytes (u32) at a time in the buf[] array.

Fix this by calculating how many u32 buffers we'll need by using
DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation
buffer to hold the index_key, then using that same count as the hashing
index limit.

Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies")
Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18 11:32:21 +02:00
David Howells
1ff22883b0 fscache: Fix incomplete initialisation of inline key space
The inline key in struct rxrpc_cookie is insufficiently initialized,
zeroing only 3 of the 4 slots, therefore an index_key_len between 13 and 15
bytes will end up hashing uninitialized memory because the memcpy only
partially fills the last buf[] element.

Fix this by clearing fscache_cookie objects on allocation rather than using
the slab constructor to initialise them.  We're going to pretty much fill
in the entire struct anyway, so bringing it into our dcache writably
shouldn't incur much overhead.

This removes the need to do clearance in fscache_set_key() (where we aren't
doing it correctly anyway).

Also, we don't need to set cookie->key_len in fscache_set_key() as we
already did it in the only caller, so remove that.

Fixes: ec0328e46d6e ("fscache: Maintain a catalogue of allocated cookies")
Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
Reported-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18 11:32:21 +02:00
Al Viro
169b803397 cachefiles: fix the race between cachefiles_bury_object() and rmdir(2)
the victim might've been rmdir'ed just before the lock_rename();
unlike the normal callers, we do not look the source up after the
parents are locked - we know it beforehand and just recheck that it's
still the child of what used to be its parent.  Unfortunately,
the check is too weak - we don't spot a dead directory since its
->d_parent is unchanged, dentry is positive, etc.  So we sail all
the way to ->rename(), with hosting filesystems _not_ expecting
to be asked renaming an rmdir'ed subdirectory.

The fix is easy, fortunately - the lock on parent is sufficient for
making IS_DEADDIR() on child safe.

Cc: stable@vger.kernel.org
Fixes: 9ae326a69004 (CacheFiles: A cache that backs onto a mounted filesystem)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18 11:32:21 +02:00
Linus Torvalds
eb66ae0308 mremap: properly flush TLB before releasing the page
Jann Horn points out that our TLB flushing was subtly wrong for the
mremap() case.  What makes mremap() special is that we don't follow the
usual "add page to list of pages to be freed, then flush tlb, and then
free pages".  No, mremap() obviously just _moves_ the page from one page
table location to another.

That matters, because mremap() thus doesn't directly control the
lifetime of the moved page with a freelist: instead, the lifetime of the
page is controlled by the page table locking, that serializes access to
the entry.

As a result, we need to flush the TLB not just before releasing the lock
for the source location (to avoid any concurrent accesses to the entry),
but also before we release the destination page table lock (to avoid the
TLB being flushed after somebody else has already done something to that
page).

This also makes the whole "need_flush" logic unnecessary, since we now
always end up flushing the TLB for every valid entry.

Reported-and-tested-by: Jann Horn <jannh@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18 11:30:52 +02:00
Christoph Hellwig
19e6420e41 LICENSES: Remove CC-BY-SA-4.0 license text
Using non-GPL licenses for our documentation is rather problematic,
as it can directly include other files, which generally are GPLv2
licensed and thus not compatible.

Remove this license now that the only user (idr.rst) is gone to avoid
people semi-accidentally using it again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18 11:28:50 +02:00
Greg Kroah-Hartman
ca9f672f7c Merge branch 'ida-fixes-4.19-rc8' of git://git.infradead.org/users/willy/linux-dax
Matthew writes:
  "IDA/IDR fixes for 4.19

   I have two tiny fixes, one for the IDA test-suite and one for the IDR
   documentation license."

* 'ida-fixes-4.19-rc8' of git://git.infradead.org/users/willy/linux-dax:
  idr: Change documentation license
  test_ida: Fix lockdep warning
2018-10-18 11:24:32 +02:00
Balakrishna Godavarthi
c614ca3f74 Bluetooth: hci_qca: Add support for controller debug logs.
This patch will prevent error messages splashing on console.

[   78.426697] Bluetooth: hci_core.c:hci_acldata_packet() hci0: ACL packet for unknown connection handle 3804
[   78.436682] Bluetooth: hci_core.c:hci_acldata_packet() hci0: ACL packet for unknown connection handle 3804
[   78.446639] Bluetooth: hci_core.c:hci_acldata_packet() hci0: ACL packet for unknown connection handle 3804
[   78.456596] Bluetooth: hci_core.c:hci_acldata_packet() hci0: ACL packet for unknown connection handle 3804

QCA wcn3990 will send the debug logs in the form of ACL packets.
While decoding packet in qca_recv(), marking the received debug log
packet as diagnostic packet.

Signed-off-by: Harish Bandi <c-hbandi@codeaurora.org>
Signed-off-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-10-18 09:55:16 +02:00
Owen Lin
1411a26053 Bluetooth: btusb: Add support for 0cf3:535b QCA_ROME device
T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=02 Dev#=  3 Spd=12   MxCh= 0
D:  Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0cf3 ProdID=535b Rev= 0.01
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms

Signed-off-by: Owen Lin <olin@rivetnetworks.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-10-18 09:30:10 +02:00
Ingo Molnar
20e8e72d0f perf/urgent fixes:
- Stop fallbacking to kallsyms for vDSO symbols lookup, this wasn't
   being really used and is not valid in arches such as Sparc, where
   user and kernel space don't share the address space, relying only on
   cpumode to figure out what DSOs to lookup (Arnaldo Carvalho de Melo)
 
 - Align cpu map synthesized events properly, fixing SIGBUS in
   CPUs like Sparc (David Miller)
 
 - Fix use of alternatives to find JDIR (Jarod Wilson)
 
 - Store ids for events with their own cpus when synthesizing user
   level event details (scale, unit, etc) events, fixing a crash
   when recording a PMU event with a cpumask defined (Jiri Olsa)
 
 - Fix wrong filter_band* values for uncore Intel vendor events (Jiri Olsa)
 
 - Fix detection of tracefs path in systems without tracefs, where
   that path should be the debugfs mountpoint plus "/tracing/" (Jiri Olsa)
 
 - Pass build flags to traceevent build, allowing using alternative
   flags in distro packages, RPM, for instance (Jiri Olsa)
 
 - Fix 'perf report' crash on invalid inline debug information (Milian Wolff)
 
 - Synch kvm uapi copies (Arnaldo Carvalho de Melo)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCW8eytQAKCRCyPKLppCJ+
 Jz94AP9Ra7FFmnMuffimP5pIkUacfqkLXPG3Lymxa8+pm0FH6gD/cWUZCxNdchBN
 v4zFXT1i9iR2YCKu8/1iijVx2wtpZQw=
 =Dh50
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo-4.19-20181017' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

- Stop falling back to kallsyms for vDSO symbols lookup, this wasn't
  being really used and is not valid in arches such as Sparc, where
  user and kernel space don't share the address space, relying only on
  cpumode to figure out what DSOs to lookup (Arnaldo Carvalho de Melo)

- Align CPU map synthesized events properly, fixing SIGBUS in
  CPUs like Sparc (David Miller)

- Fix use of alternatives to find JDIR (Jarod Wilson)

- Store IDs for events with their own CPUs when synthesizing user
  level event details (scale, unit, etc) events, fixing a crash
  when recording a PMU event with a cpumask defined (Jiri Olsa)

- Fix wrong filter_band* values for uncore Intel vendor events (Jiri Olsa)

- Fix detection of tracefs path in systems without tracefs, where
  that path should be the debugfs mountpoint plus "/tracing/" (Jiri Olsa)

- Pass build flags to traceevent build, allowing using alternative
  flags in distro packages, RPM, for instance (Jiri Olsa)

- Fix 'perf report' crash on invalid inline debug information (Milian Wolff)

- Synch KVM UAPI copies (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-18 07:41:29 +02:00
Nikolay Aleksandrov
eddf016b91 net: ipmr: fix unresolved entry dumps
If the skb space ends in an unresolved entry while dumping we'll miss
some unresolved entries. The reason is due to zeroing the entry counter
between dumping resolved and unresolved mfc entries. We should just
keep counting until the whole table is dumped and zero when we move to
the next as we have a separate table counter.

Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: 8fb472c09b9d ("ipmr: improve hash scalability")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:35:42 -07:00
Gregory CLEMENT
06a36ecb5d net: mscc: ocelot: Fix comment in ocelot_vlant_wait_for_completion()
The ocelot_vlant_wait_for_completion() function is very similar to the
ocelot_mact_wait_for_completion(). It seemed to have be copied but the
comment was not updated, so let's fix it.

Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:33:43 -07:00
Xin Long
5660b9d9d6 sctp: fix the data size calculation in sctp_data_size
sctp data size should be calculated by subtracting data chunk header's
length from chunk_hdr->length, not just data header.

Fixes: 668c9beb9020 ("sctp: implement assign_number for sctp_stream_interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:32:21 -07:00
Gustavo A. R. Silva
82385b0d2d net: skbuff.h: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:31:30 -07:00
Arthur Kiyanovski
9fd255928d net: ena: enable Low Latency Queues
Use the new API to enable usage of LLQ.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:30:41 -07:00
Ake Koomsin
05c998b738 virtio_net: avoid using netif_tx_disable() for serializing tx routine
Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
introduces netif_tx_disable() after netif_device_detach() in order to
avoid use-after-free of tx queues. However, there are two issues.

1) Its operation is redundant with netif_device_detach() in case the
   interface is running.
2) In case of the interface is not running before suspending and
   resuming, the tx does not get resumed by netif_device_attach().
   This results in losing network connectivity.

It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for
serializing tx routine during reset. This also preserves the symmetry
of netif_device_detach() and netif_device_attach().

Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
Signed-off-by: Ake Koomsin <ake@igel.co.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:29:30 -07:00
Greg Kroah-Hartman
9bd871df56 This fixes two bugs:
- Fix size mismatch of tracepoint array
 
  - Have preemptirq test module use same clock source of the selftest
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW8eRhRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qkEgAP4vscLVMSYBTUuDNXX0+l8FVdrpPagL
 1tjTJpTUfG3QLQEA9XOl8vR/Yy/BywcU7K2R3zGbo7Qh6AgpWl2pJcmsGQk=
 =XS5E
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Steven writes:
  "tracing: Two fixes for 4.19

   This fixes two bugs:
    - Fix size mismatch of tracepoint array
    - Have preemptirq test module use same clock source of the selftest"

* tag 'trace-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Use trace_clock_local() for looping in preemptirq_delay_test.c
  tracepoint: Fix tracepoint array element size mismatch
2018-10-18 07:29:05 +02:00
Netanel Belgazal
8c590f9776 net: ena: Fix Kconfig dependency on X86
The Kconfig limitation of X86 is to too wide.
The ENA driver only requires a little endian dependency.

Change the dependency to be on little endian CPU.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 22:28:34 -07:00