linux

iv/linux

Author	SHA1	Message	Date
Conley Lee	47869e82c8	sun4i-emac.c: add dma support Thanks for your review. Here is the new version for this patch. This patch adds support for the emac rx dma present on sun4i. The emac is able to move packets from rx fifo to RAM by using dma. Change since v4. - rename sbk field to skb - rename alloc_emac_dma_req to emac_alloc_dma_req - using kzalloc(..., GPF_ATOMIC) in interrupt context to avoid sleeping - retry by using emac_inblk_32bit when emac_dma_inblk_32bit fails - fix some code style issues Change since v5. - fix some code style issue Signed-off-by: Conley Lee <conleylee@foxmail.com> Link: https://lore.kernel.org/r/tencent_DE05ADA53D5B084D4605BE6CB11E49EF7408@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 17:51:36 -08:00
Nikolay Aleksandrov	168fed986b	net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper We need to first check if the context is a vlan one, then we need to check the global bridge multicast vlan snooping flag, and finally the vlan's multicast flag, otherwise we will unnecessarily enable vlan mcast processing (e.g. querier timers). Fixes: 7b54aaaf53cb ("net: bridge: multicast: add vlan state initialization and control") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20211228153142.536969-1-nikolay@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 17:49:45 -08:00
Muchun Song	e22e45fc9e	net: fix use-after-free in tw_timer_handler A real world panic issue was found as follow in Linux 5.4. BUG: unable to handle page fault for address: ffffde49a863de28 PGD 7e6fe62067 P4D 7e6fe62067 PUD 7e6fe63067 PMD f51e064067 PTE 0 RIP: 0010:tw_timer_handler+0x20/0x40 Call Trace: <IRQ> call_timer_fn+0x2b/0x120 run_timer_softirq+0x1ef/0x450 __do_softirq+0x10d/0x2b8 irq_exit+0xc7/0xd0 smp_apic_timer_interrupt+0x68/0x120 apic_timer_interrupt+0xf/0x20 This issue was also reported since 2017 in the thread [1], unfortunately, the issue was still can be reproduced after fixing DCCP. The ipv4_mib_exit_net is called before tcp_sk_exit_batch when a net namespace is destroyed since tcp_sk_ops is registered befrore ipv4_mib_ops, which means tcp_sk_ops is in the front of ipv4_mib_ops in the list of pernet_list. There will be a use-after-free on net->mib.net_statistics in tw_timer_handler after ipv4_mib_exit_net if there are some inflight time-wait timers. This bug is not introduced by commit f2bf415cfed7 ("mib: add net to NET_ADD_STATS_BH") since the net_statistics is a global variable instead of dynamic allocation and freeing. Actually, commit 61a7e26028b9 ("mib: put net statistics on struct net") introduces the bug since it put net statistics on struct net and free it when net namespace is destroyed. Moving init_ipv4_mibs() to the front of tcp_init() to fix this bug and replace pr_crit() with panic() since continuing is meaningless when init_ipv4_mibs() fails. [1] https://groups.google.com/g/syzkaller/c/p1tn-_Kc6l4/m/smuL_FMAAgAJ?pli=1 Fixes: 61a7e26028b9 ("mib: put net statistics on struct net") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Cong Wang <cong.wang@bytedance.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211228104145.9426-1-songmuchun@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 17:46:43 -08:00
Jianguo Wu	add25d6d6c	selftests: net: Fix a typo in udpgro_fwd.sh $rvs -> $rcv Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests") Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Link: https://lore.kernel.org/r/d247d7c8-a03a-0abf-3c71-4006a051d133@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 17:33:48 -08:00
wujianguo	9c1952aeaa	selftests/net: udpgso_bench_tx: fix dst ip argument udpgso_bench_tx call setup_sockaddr() for dest address before parsing all arguments, if we specify "-p ${dst_port}" after "-D ${dst_ip}", then ${dst_port} will be ignored, and using default cfg_port 8000. This will cause test case "multiple GRO socks" failed in udpgro.sh. Setup sockaddr after parsing all arguments. Fixes: 3a687bef148d ("selftests: udp gso benchmark") Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/ff620d9f-5b52-06ab-5286-44b945453002@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 17:29:50 -08:00
Jakub Kicinski	e2dfb94f27	bluetooth-next pull request for net-next: - Add support for Foxconn MT7922A - Add support for Realtek RTL8852AE - Rework HCI event handling to use skb_pull_data -----BEGIN PGP SIGNATURE----- iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmHMzl8ZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKSpcD/0dP8yVmlPztQwV/O/DjIHc rVCplr+pzdOcuW4qXDsKhDt9y5E6KgIhIqnksHV3fM58yY1L6sm9n0Qpihys0mIS GHQ7VtqCE8WMpifkCTqdKV34Guy9BP0GGQpWTh7jNTeiwHT1t5pjA6ajYE3qaSS/ z4opVFz/eGqinPLPxjP7GBP4VqaLG0QMNo3pSz0hi+E/krIwnjT+VoKNKteDkHf9 klmHrGF2OYtXC8VezvquFgmUgoN+yL/63Ci1DlkF3mRUEc6jMHFMs7BiQov2eENt Hq9E1ye5Hlf5IedtJ80QdkLvW/JgE1jZKATB7x7E09j9QBP2gxJRP9z6sSJiVIVm 53MRRYaA9jOPCfaVPKGMp+4NQMa7F2gahoI1k7LVS0ksc0RGC/nixtdiZWSjwU6E cYiQOk5CD3hK+lPSYBvjMOnm+h4w4FZ5W70g5VKKnAGCUBvERSwh2qmUdpG4j1vq 4oeg64DsLP5n7MWbHn7zbtviiJVbMg+a0D4GlExlxu06OOx7yra3s+LbcHoZbG4B T3ivhx9oX+ho7fG3rjBPrENoZB62n48x/mzxsLyFHt/m3uSgsXVSCVIm9dwwsdBM P2OzX/dSx/3AhnH0wNPtENAoU4/ETuMOKuFCoU9pL0cjJbQ3M9FipoZakzF/B1se 18+x0pq2o4PT9W1JUPMfvA== =AVll -----END PGP SIGNATURE----- Merge tag 'for-net-next-2021-12-29' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for Foxconn MT7922A - Add support for Realtek RTL8852AE - Rework HCI event handling to use skb_pull_data * tag 'for-net-next-2021-12-29' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (62 commits) Bluetooth: MGMT: Fix spelling mistake "simultanous" -> "simultaneous" Bluetooth: vhci: Set HCI_QUIRK_VALID_LE_STATES Bluetooth: MGMT: Fix LE simultaneous roles UUID if not supported Bluetooth: hci_sync: Add check simultaneous roles support Bluetooth: hci_sync: Wait for proper events when connecting LE Bluetooth: hci_sync: Add support for waiting specific LE subevents Bluetooth: hci_sync: Add hci_le_create_conn_sync Bluetooth: hci_event: Use skb_pull_data when processing inquiry results Bluetooth: hci_sync: Push sync command cancellation to workqueue Bluetooth: hci_qca: Stop IBS timer during BT OFF Bluetooth: btusb: Add support for Foxconn MT7922A Bluetooth: btintel: Add missing quirks and msft ext for legacy bootloader Bluetooth: btusb: Add two more Bluetooth parts for WCN6855 Bluetooth: L2CAP: Fix using wrong mode Bluetooth: hci_sync: Fix not always pausing advertising when necessary Bluetooth: mgmt: Make use of mgmt_send_event_skb in MGMT_EV_DEVICE_CONNECTED Bluetooth: mgmt: Make use of mgmt_send_event_skb in MGMT_EV_DEVICE_FOUND Bluetooth: mgmt: Introduce mgmt_alloc_skb and mgmt_send_event_skb Bluetooth: btusb: Return error code when getting patch status failed Bluetooth: btusb: Handle download_firmware failure cases ... ==================== Link: https://lore.kernel.org/r/20211229211258.2290966-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 14:14:38 -08:00
Jakub Kicinski	f7397cd24c	Merge branch 'net-bridge-mcast-add-and-enforce-query-interval-minimum' Nikolay Aleksandrov says: ==================== net: bridge: mcast: add and enforce query interval minimum This set adds and enforces 1 second minimum value for bridge multicast query and startup query intervals in order to avoid rearming the timers too often which could lock and crash the host. I doubt anyone is using such low values or anything lower than 1 second, so it seems like a good minimum. In order to be compatible if the value is lower then it is overwritten and a log message is emitted, since we can't return an error at this point. Eric, I looked for the syzbot reports in its dashboard but couldn't find them so I've added you as the reporter. I've prepared a global bridge igmp rate limiting patch but wasn't sure if it's ok for -net. It adds a static limit of 32k packets per second, I plan to send it for net-next with added drop counters for each bridge so it can be easily debugged. Original report can be seen at: https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/ ==================== Link: https://lore.kernel.org/r/20211227172116.320768-1-nikolay@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 12:59:44 -08:00
Nikolay Aleksandrov	f83a112bd9	net: bridge: mcast: add and enforce startup query interval minimum As reported[1] if startup query interval is set too low in combination with large number of startup queries and we have multiple bridges or even a single bridge with multiple querier vlans configured we can crash the machine. Add a 1 second minimum which must be enforced by overwriting the value if set lower (i.e. without returning an error) to avoid breaking user-space. If that happens a log message is emitted to let the admin know that the startup interval has been set to the minimum. It doesn't make sense to make the startup interval lower than the normal query interval so use the same value of 1 second. The issue has been present since these intervals could be user-controlled. [1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/ Fixes: d902eee43f19 ("bridge: Add multicast count/interval sysfs entries") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 12:59:38 -08:00
Nikolay Aleksandrov	99b4061095	net: bridge: mcast: add and enforce query interval minimum As reported[1] if query interval is set too low and we have multiple bridges or even a single bridge with multiple querier vlans configured we can crash the machine. Add a 1 second minimum which must be enforced by overwriting the value if set lower (i.e. without returning an error) to avoid breaking user-space. If that happens a log message is emitted to let the administrator know that the interval has been set to the minimum. The issue has been present since these intervals could be user-controlled. [1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/ Fixes: d902eee43f19 ("bridge: Add multicast count/interval sysfs entries") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 12:59:37 -08:00
Tamir Duberstein	fb7bc92040	ipv6: raw: check passed optlen before reading Add a check that the user-provided option is at least as long as the number of bytes we intend to read. Before this patch we would blindly read sizeof(int) bytes even in cases where the user passed optlen<sizeof(int), which would potentially read garbage or fault. Discovered by new tests in https://github.com/google/gvisor/pull/6957 . The original get_user call predates history in the git repo. Signed-off-by: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20211229200947.2862255-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 12:32:56 -08:00
Jakub Kicinski	cfcad56b20	Merge branch 'net-define-new-hwtstamp-flag-and-return-it-to-userspace' Hangbin Liu says: ==================== net: define new hwtstamp flag and return it to userspace This patchset defined the new hwtstamp flag HWTSTAMP_FLAG_BONDED_PHC_INDEX to make userspace program build pass with old kernel header by settting ifdef. Let's also return the flag when do SIOC[G/S]HWTSTAMP to let userspace know that it's necessary for a given netdev. ==================== Link: https://lore.kernel.org/r/20211229080938.231324-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 12:31:39 -08:00
Hangbin Liu	cfe355c56e	Bonding: return HWTSTAMP_FLAG_BONDED_PHC_INDEX to notify user space If the userspace program is distributed in binary form (distro package), there is no way to know on which kernel versions it will run. Let's only check if the flag was set when do SIOCSHWTSTAMP. And return hwtstamp_config with flag HWTSTAMP_FLAG_BONDED_PHC_INDEX to notify userspace whether the new feature is supported or not. Suggested-by: Jakub Kicinski <kuba@kernel.org> Fixes: 085d61000845 ("Bonding: force user to add HWTSTAMP_FLAG_BONDED_PHC_INDEX when get/set HWTSTAMP") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 12:31:37 -08:00
Hangbin Liu	1bb412d46c	net_tstamp: define new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX As we defined the new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX as enum, it's not easy for userspace program to check if the flag is supported when build. Let's define the new flag so user space could build it on old kernel with ifdef check. Fixes: 9c9211a3fc7a ("net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 12:31:36 -08:00
Linus Torvalds	eec4df26e2	s390 update for 5.16-rc8 - fix s390 mcount regex typo in recordmcount.pl -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmHMLXcACgkQIg7DeRsp bsKb/w/8CtZwzz4orNTNTNbnLWauEQTmUiqznzRLusOrcWmzoE5hOVzYu/gpdGl6 fXGFMAfP0+2+juUsDQ94rlWLXZUswbEm1FN9W58H1AJS2rrHPYC4/vfpEkLcjF27 9rigQys5TtUmeHI+Xoxp1lLZc1PysIvFakLeJgTzXflHAlFZCzc99mJ/jCmXtkvA Su8vf2GEU2XeEJOuibO7JRKwHbbcf4imfgET3Nb0MNnYAOcJ41++2IkrHSvj0O7W Pe/OEE7hnTcDNU9Gf9DuZjvWyDWml752aGQjG990Xwk/oWQZ2bBVYRvUgnVMBWSc H2fVTfxadF9y2tH0C1ODKb1TNBqCSqED6Ld86gK+ymkDM85tmdkCzWzRg1ofXAJr FmJMEPAwRL6iR9ngKIX9qZMoqUfqt/pGc3TV67cUxUkaQ6T8TW0luSjm+l/C2Moz FzjWCPvDZaX+4e6b7gbD+ZDVreY7qEZIAeNEMoEGlr7hZumkrrmMvUTh72Sb8fpp /Wy+kMHstrr5xHmoRqgphx9qPwxwiGIh+Z7ssncTjb4CNxzp4xYK2KX7Wr2h1nOM VE435zyJvT2MmnYVXFxsQHQaY3bzaSROyM7UyZyVl98ZQX6Apo+MDkSd5KTUsK6y Na/Cb1qBd0acZCjf5LG1LNuUIUat4jqrhBFtoPkHtxrIVC6jSrY= =UL8C -----END PGP SIGNATURE----- Merge tag 's390-5.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fix from Heiko Carstens: - fix s390 mcount regex typo in recordmcount.pl * tag 's390-5.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: recordmcount.pl: fix typo in s390 mcount regex	2021-12-29 10:07:20 -08:00
Ruud Bos	38970eac41	igb: support EXTTS on 82580/i354/i350 Support for the PTP pin function on 82580/i354/i350 based adapters. Because the time registers of these adapters do not have the nice split in second rollovers as the i210 has, the implementation is slightly more complex compared to the i210 implementation. Signed-off-by: Ruud Bos <kernel.hbk@gmail.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-29 10:01:04 -08:00
Ruud Bos	1819fc753a	igb: support PEROUT on 82580/i354/i350 Support for the PEROUT PTP pin function on 82580/i354/i350 based adapters. Because the time registers of these adapters do not have the nice split in second rollovers as the i210 has, the implementation is slightly more complex compared to the i210 implementation. Signed-off-by: Ruud Bos <kernel.hbk@gmail.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-29 10:01:04 -08:00
Ruud Bos	cf99c1dd7b	igb: move PEROUT and EXTTS isr logic to separate functions Remove code duplication in the tsync interrupt handler function by moving this logic to separate functions. This keeps the interrupt handler readable and allows the new functions to be extended for adapter types other than i210. Signed-off-by: Ruud Bos <kernel.hbk@gmail.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-29 10:01:03 -08:00
Ruud Bos	8ab55aba31	igb: move SDP config initialization to separate function Allow reuse of SDP config struct initialization by moving it to a separate function. Signed-off-by: Ruud Bos <kernel.hbk@gmail.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-29 10:01:03 -08:00
Ciara Loftus	5bec7ca2be	xsk: Initialise xskb free_list_node This commit initialises the xskb's free_list_node when the xskb is allocated. This prevents a potential false negative returned from a call to list_empty for that node, such as the one introduced in commit 199d983bc015 ("xsk: Fix crash on double free in buffer pool") In my environment this issue caused packets to not be received by the xdpsock application if the traffic was running prior to application launch. This happened when the first batch of packets failed the xskmap lookup and XDP_PASS was returned from the bpf program. This action is handled in the i40e zc driver (and others) by allocating an skbuff, freeing the xdp_buff and adding the associated xskb to the xsk_buff_pool's free_list if it hadn't been added already. Without this fix, the xskb is not added to the free_list because the check to determine if it was added already returns an invalid positive result. Later, this caused allocation errors in the driver and the failure to receive packets. Fixes: 199d983bc015 ("xsk: Fix crash on double free in buffer pool") Fixes: 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20211220155250.2746-1-ciara.loftus@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-29 10:00:18 -08:00
Haimin Zhang	3ccdcee284	bpf: Add missing map_get_next_key method to bloom filter map. Without it, kernel crashes in map_get_next_key(). Fixes: 9330986c0300 ("bpf: Add bloom filter map implementation") Reported-by: TCS Robot <tcs_robot@tencent.com> Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Joanne Koong <joannekoong@fb.com> Link: https://lore.kernel.org/bpf/1640776802-22421-1-git-send-email-tcs.kernel@gmail.com	2021-12-29 09:38:31 -08:00
Jakub Kicinski	b6459415b3	net: Don't include filter.h from net/sock.h sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org	2021-12-29 08:48:14 -08:00
Rafał Miłecki	9ed319e411	of: net: support NVMEM cells with MAC in text format Some NVMEM devices have text based cells. In such cases MAC is stored in a XX:XX:XX:XX:XX:XX format. Use mac_pton() to parse such data and support those NVMEM cells. This is required to support e.g. a very popular U-Boot and its environment variables. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-29 11:32:58 +00:00
Gal Pressman	992d8a4e38	net/mlx5e: Fix wrong features assignment in case of error In case of an error in mlx5e_set_features(), 'netdev->features' must be updated with the correct state of the device to indicate which features were updated successfully. To do that we maintain a copy of 'netdev->features' and update it after successful feature changes, so we can assign it to back to 'netdev->features' if needed. However, since not all netdev features are handled by the driver (e.g. GRO/TSO/etc), some features may not be updated correctly in case of an error updating another feature. For example, while requesting to disable TSO (feature which is not handled by the driver) and enable HW-GRO, if an error occurs during HW-GRO enable, 'oper_features' will be assigned with 'netdev->features' and HW-GRO turned off. TSO will remain enabled in such case, which is a bug. To solve that, instead of using 'netdev->features' as the baseline of 'oper_features' and changing it on set feature success, use 'features' instead and update it in case of errors. Fixes: 75b81ce719b7 ("net/mlx5e: Don't override netdev features field unless in error flow") Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-12-28 22:42:50 -08:00
Roi Dayan	077cdda764	net/mlx5e: TC, Fix memory leak with rules with internal port Fix a memory leak with decap rule with internal port as destination device. The driver allocates a modify hdr action but doesn't set the flow attr modify hdr action which results in skipping releasing the modify hdr action when releasing the flow. backtrace: [<000000005f8c651c>] krealloc+0x83/0xd0 [<000000009f59b143>] alloc_mod_hdr_actions+0x156/0x310 [mlx5_core] [<000000002257f342>] mlx5e_tc_match_to_reg_set_and_get_id+0x12a/0x360 [mlx5_core] [<00000000b44ea75a>] mlx5e_tc_add_fdb_flow+0x962/0x1470 [mlx5_core] [<0000000003e384a0>] __mlx5e_add_fdb_flow+0x54c/0xb90 [mlx5_core] [<00000000ed8b22b6>] mlx5e_configure_flower+0xe45/0x4af0 [mlx5_core] [<00000000024f4ab5>] mlx5e_rep_indr_offload.isra.0+0xfe/0x1b0 [mlx5_core] [<000000006c3bb494>] mlx5e_rep_indr_setup_tc_cb+0x90/0x130 [mlx5_core] [<00000000d3dac2ea>] tc_setup_cb_add+0x1d2/0x420 Fixes: b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-12-28 22:42:50 -08:00
Andrii Nakryiko	5b3d729877	libbpf: Improve LINUX_VERSION_CODE detection Ubuntu reports incorrect kernel version through uname(), which on older kernels leads to kprobe BPF programs failing to load due to the version check mismatch. Accommodate Ubuntu's quirks with LINUX_VERSION_CODE by using Ubuntu-specific /proc/version_code to fetch major/minor/patch versions to form LINUX_VERSION_CODE. While at it, consolide libbpf's kernel version detection code between libbpf.c and libbpf_probes.c. [0] Closes: https://github.com/libbpf/libbpf/issues/421 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211222231003.2334940-1-andrii@kernel.org	2021-12-28 19:20:31 -08:00
Andrii Nakryiko	f60edf5b53	libbpf: Use 100-character limit to make bpf_tracing.h easier to read Improve bpf_tracing.h's macro definition readability by keeping them single-line and better aligned. This makes it easier to follow all those variadic patterns. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211222213924.1869758-2-andrii@kernel.org	2021-12-28 19:14:45 -08:00
Andrii Nakryiko	3cc31d7940	libbpf: Normalize PT_REGS_xxx() macro definitions Refactor PT_REGS macros definitions in bpf_tracing.h to avoid excessive duplication. We currently have classic PT_REGS_xxx() and CO-RE-enabled PT_REGS_xxx_CORE(). We are about to add also _SYSCALL variants, which would require excessive copying of all the per-architecture definitions. Instead, separate architecture-specific field/register names from the final macro that utilize them. That way for upcoming _SYSCALL variants we'll be able to just define x86_64 exception and otherwise have one common set of _SYSCALL macro definitions common for all architectures. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/bpf/20211222213924.1869758-1-andrii@kernel.org	2021-12-28 19:14:44 -08:00
Jakub Kicinski	9665e03a8d	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-12-28 This series contains updates to igc driver only. Vinicius disables support for crosstimestamp on i225-V as lockups are being observed. James McLaughlin fixes Tx timestamping support on non-MSI-X platforms. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: Fix TX timestamp support for non-MSI-X platforms igc: Do not enable crosstimestamping for i225-V models ==================== Link: https://lore.kernel.org/r/20211228182421.340354-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-28 16:19:10 -08:00
Jakub Kicinski	271d3be1c3	Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 10GbE Intel Wired LAN Driver Updates 2021-12-28 Alexander Lobakin says: napi_build_skb() I introduced earlier this year ([0]) aims to decrease MM pressure and the overhead from in-place kmem_cache_alloc() on each Rx entry processing by decaching skbuff_heads from NAPI per-cpu cache filled prior to that by napi_consume_skb() (so it is sort of a direct shortcut for free -> mm -> alloc cycle). Currently, no in-tree drivers use it. Switch all Intel Ethernet drivers to it to get slight-to-medium perf boosts depending on the frame size. ice driver, 50 Gbps link, pktgen + XDP_PASS (local in) sample: frame_size/nthreads 64/42 128/20 256/8 512/4 1024/2 1532/1 net-next (Kpps) 46062 34654 18248 9830 5343 2714 series 47438 34708 18330 9875 5435 2777 increase 2.9% 0.15% 0.45% 0.46% 1.72% 2.32% Additionally, e1000's been switched to napi_consume_skb() as it's safe and works fine there, and there's no point in napi_build_skb() without paired NAPI cache feeding point. [0] https://lore.kernel.org/all/20210213141021.87840-1-alobakin@pm.me * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ixgbevf: switch to napi_build_skb() ixgbe: switch to napi_build_skb() igc: switch to napi_build_skb() igb: switch to napi_build_skb() ice: switch to napi_build_skb() iavf: switch to napi_build_skb() i40e: switch to napi_build_skb() e1000: switch to napi_build_skb() e1000: switch to napi_consume_skb() ==================== Link: https://lore.kernel.org/r/20211228175815.281449-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-28 16:16:57 -08:00
Christophe JAILLET	140c7bc7d1	ionic: Initialize the 'lif->dbid_inuse' bitmap When allocated, this bitmap is not initialized. Only the first bit is set a few lines below. Use bitmap_zalloc() to make sure that it is cleared before being used. Fixes: 6461b446f2a0 ("ionic: Add interrupts and doorbells") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Shannon Nelson <snelson@pensando.io> Link: https://lore.kernel.org/r/6a478eae0b5e6c63774e1f0ddb1a3f8c38fa8ade.1640527506.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-28 16:07:35 -08:00
Aleksander Jan Bajkowski	4c46625bb5	net: lantiq_etop: add blank line after declaration This patch adds a missing line after the declaration and fixes the checkpatch warning: WARNING: Missing a blank line after declarations + int desc; + for (desc = 0; desc < LTQ_DESC_NUM; desc++) Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Link: https://lore.kernel.org/r/20211228220031.71576-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-28 16:00:08 -08:00
Aleksander Jan Bajkowski	723955913e	net: lantiq_etop: add missing comment for wmb() This patch adds the missing code comment for memory barrier call and fixes checkpatch warning: WARNING: memory barrier without comment + wmb(); Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Link: https://lore.kernel.org/r/20211228214910.70810-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-28 15:59:55 -08:00
Thomas Gleixner	1bd3277188	r8169: don't use pci_irq_vector() in atomic context Since referenced change pci_irq_vector() can't be used in atomic context any longer. This conflicts with our usage of this function in rtl8169_netpoll(). Therefore store the interrupt number in struct rtl8169_private. Fixes: 495c66aca3da ("genirq/msi: Convert to new functions") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/3cd24763-f307-78f5-76ed-a5fbf315fb28@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-28 15:59:05 -08:00
Linus Torvalds	e7c124bd04	selinux/stable-5.16 PR 20211228 -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmHLZPoUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXO+yw//XzQzscVHj35GDwmVYtSo4fyge3L4 bNLz498hNdnL15GWuPa6Rr97zCJ0QSWAi9Ecz/21EcNdPIvpL2x7X7SCjT4+X11Z gbfuZJWU4Dpg5qu7CJU4eeOen78XietjQFYpiqn5aDRgDZzHqXhtgUZUMDHw6O3z dCy93aOGsIZ5t2fkQcwao19EbhslKtjX9upl2uUoxkmXz8wnMaql3p3ciNjdCYOK T6x0Q/A311V7wW/2AYzk/17zOVIm/Fpv3YiYN/PCb1dM2lpKZw1AayMKZf8zwfcY Tp+jyJnyVbPAaBqG1i2hNUziQXVchGBQONXJDcT9LavgITFMZ9li0XGJV9aHTM0/ G+mAofFJVYSWlRva17lGKIpyauJqIgJirE7XERhkfBKIcAs2uXyw8kMbWKsLhx30 vn9LxWg8HHyg8ob5/+gwaDe1s1R4dAVP42D9pyk6boY7E/dX8i25lfQKGQebt2K1 BJOhxVMPfN3gna7kK173dY9ZT+xqs8Y1vQ5QHOPlg5H93sDmat9XBh7PPDKVhFeq 7yxjiXkGXJx/JGQpa33PgSFAhWEHzyB/ILV5jozy7lTaQNeBzz2RKoFfSDIkLJdb R9p9OoXz9FeVC6hdxXvw4EqVPAzT4tEC4/Ik8RhkUITLBxahpTVM5bULKiFbN7mj lS1GzEnwu8OcLuc= =gMGp -----END PGP SIGNATURE----- Merge tag 'selinux-pr-20211228' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fix from Paul Moore: "One more small SELinux patch to address an uninitialized stack variable" * tag 'selinux-pr-20211228' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: initialize proto variable in selinux_ip_postroute_compat()	2021-12-28 13:33:06 -08:00
Linus Torvalds	ecf71de775	A couple improvements for auxdisplay: - charlcd: checking for pointer reference before dereferencing. - charlcd: fixing coding style issue. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmHLLwMACgkQGXyLc2ht IW1aFA/+PqbolftHHUCq8fmfWnwWdyaBnyrmiLCxkxovPYAgR4GL/XuVh08wF9jL jjpUyGOnU3Dojp7btuQe0ybvdk4UvqTrtyWduN4p7xt5BMA/aiBBUoUSpCVt+j4O CTVsf4pMbm+6BuKoB8R7Ve1eblNCuJYgwa3s9rqvSb7wuEqS/sCxJpbevKGh2MiY DmFY6rK/CAVql+tVck/T1xoqwZPOLJuxSxIl2xOc4dQcBJT5tlL4yKmywUI8jtIX nHa9WrAOSL4GrTDytCoDiPqQ0DZS+WOFBjXMI36jCBI5cJy23w1U+3ksWQhpH9Jm /hfDeHJQaoWtnxjd+/iFCcExkpHiT6nlEgS3SfVQrvyrJzHgKnL+LcjQjGDs8TuD NcnHQOE/qZ6sPjVvHoYUWLGISGE4dtVKytNuNzCLdQwNZ/KwPLfGTv6sz08gCH/o Wz9MRghCCNHLrJWuraBRgEFO9AVGUYFRY7g/7yD7FwEu0ji5EaFckwvrPFqb1ALL pW/HSP+kQNtzAl/4JaXmPEzEoS+F0nzoL2s6MZdO71/py42oWGDENuOoBAywO8ZC XfmP/zq9331AO0NbWiRNnubbJUXU+/zmsTydV4exxElsZWaU6aLl8HFgZuEfypIw S3XyrtNmUH0JYnD30bz+8g7GGsXLIQpd+GrOUifDSecf9XuCKdc= =Bz/T -----END PGP SIGNATURE----- Merge tag 'auxdisplay-for-linus-v5.16' of git://github.com/ojeda/linux Pull auxdisplay fixes from Miguel Ojeda: "A couple of improvements for charlcd: - check pointer before dereferencing - fix coding style issue" * tag 'auxdisplay-for-linus-v5.16' of git://github.com/ojeda/linux: auxdisplay: charlcd: checking for pointer reference before dereferencing auxdisplay: charlcd: fixing coding style issue	2021-12-28 11:46:15 -08:00
Linus Torvalds	f651faaaba	powerpc fixes for 5.16 #5 Fix DEBUG_WX never reporting any WX mappings, due to use of an incorrect config symbol since we converted to using generic ptdump. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmHK5joTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgOC0EACbVv34c9dt2ODchr45RHJFmfDdp/oh xhQPpSFWQZltfqsLcMwsSRdbVWOvDUDb5eUVJF6PdBhK9BhO30nq3vP8477N5Whp PJX237ygZVjFWVbhb0cQB7nmWdMAJhJ7inqIvHxHcgkmzf/M9D21AV/YM3zrZLT9 NFgboT4TUHG7ll6s8xZce2AM+1Iq/3J8anFkh0AZJPPCdaefOh3JRAkAf7LDTYrV ARiVAImXW6zbtKcxZ7Qu5bDjPhHSB/wd00AKY3eqkGIEDwp78TDVItrIGVWX5Y+R U6abKgxkZwlMC1K+PCN0q53lg/eSX59Z4FEY/aiWZ2/zfNHU1CSIAVRu7SgL+JVN aypVpLuChp7TrWL0asngkJ8/ztFRx8SkBg0oxpLXOPkGU91KEhuedtBBlL60sfoG Xh55PvRxgjWUaNwiDX+f+G6Mx2zzaiGOUUeiwwexCtsji9Yt7Fe3NH2eteYLt45r S9hcyX/jfa/HMFTIoStqsb//pcQXi5fGA6xU/16aGNpIiRP8+nvrMcGSTuTO4CON RCO0mf6yd1Cdq+OrD7vX6lS8iXVhGJtfc49KQ6nPtMHkcMStEYS52jANlzfU3CKO ne3+mrX5raty7VkG0HRCfMauhgVPhHqbdlK1JNjSUagL0xXxf+88s3vVWuwRyIb2 32tNV4XvWtysYw== =Pt/J -----END PGP SIGNATURE----- Merge tag 'powerpc-5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "Fix DEBUG_WX never reporting any WX mappings, due to use of an incorrect config symbol since we converted to using generic ptdump" * tag 'powerpc-5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/ptdump: Fix DEBUG_WX since generic ptdump conversion	2021-12-28 11:42:01 -08:00
James McLaughlin	f85846bbf4	igc: Fix TX timestamp support for non-MSI-X platforms Time synchronization was not properly enabled on non-MSI-X platforms. Fixes: 2c344ae24501 ("igc: Add support for TX timestamping") Signed-off-by: James McLaughlin <james.mclaughlin@qsc.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-28 09:54:11 -08:00
Vinicius Costa Gomes	1e81dcc1ab	igc: Do not enable crosstimestamping for i225-V models It was reported that when PCIe PTM is enabled, some lockups could be observed with some integrated i225-V models. While the issue is investigated, we can disable crosstimestamp for those models and see no loss of functionality, because those models don't have any support for time synchronization. Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()") Link: https://lore.kernel.org/all/924175a188159f4e03bd69908a91e606b574139b.camel@gmx.de/ Reported-by: Stefan Dietrich <roots@gmx.de> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-28 09:54:10 -08:00
Alexander Lobakin	c155001989	ixgbevf: switch to napi_build_skb() napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order to save some cycles on freeing/allocating skbuff_heads on every new Rx or completed Tx. ixgbevf driver runs Tx completion polling cycle right before the Rx one and uses napi_consume_skb() to feed the cache with skbuff_heads of completed entries, so it's never empty and always warm at that moment. Switch to the napi_build_skb() to relax mm pressure on heavy Rx. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-28 09:42:33 -08:00
Alexander Lobakin	a39363367a	ixgbe: switch to napi_build_skb() napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order to save some cycles on freeing/allocating skbuff_heads on every new Rx or completed Tx. ixgbe driver runs Tx completion polling cycle right before the Rx one and uses napi_consume_skb() to feed the cache with skbuff_heads of completed entries, so it's never empty and always warm at that moment. Switch to the napi_build_skb() to relax mm pressure on heavy Rx. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-28 09:42:33 -08:00
Alexander Lobakin	4dd330a7e8	igc: switch to napi_build_skb() napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order to save some cycles on freeing/allocating skbuff_heads on every new Rx or completed Tx. igc driver runs Tx completion polling cycle right before the Rx one and uses napi_consume_skb() to feed the cache with skbuff_heads of completed entries, so it's never empty and always warm at that moment. Switch to the napi_build_skb() to relax mm pressure on heavy Rx. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-28 09:42:33 -08:00
Alexander Lobakin	fa441f0fa8	igb: switch to napi_build_skb() napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order to save some cycles on freeing/allocating skbuff_heads on every new Rx or completed Tx. igb driver runs Tx completion polling cycle right before the Rx one and uses napi_consume_skb() to feed the cache with skbuff_heads of completed entries, so it's never empty and always warm at that moment. Switch to the napi_build_skb() to relax mm pressure on heavy Rx. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-28 09:42:33 -08:00
Alexander Lobakin	5ce6663158	ice: switch to napi_build_skb() napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order to save some cycles on freeing/allocating skbuff_heads on every new Rx or completed Tx. ice driver runs Tx completion polling cycle right before the Rx one and uses napi_consume_skb() to feed the cache with skbuff_heads of completed entries, so it's never empty and always warm at that moment. Switch to the napi_build_skb() to relax mm pressure on heavy Rx. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-28 09:42:33 -08:00
Alexander Lobakin	ef687d61e0	iavf: switch to napi_build_skb() napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order to save some cycles on freeing/allocating skbuff_heads on every new Rx or completed Tx. iavf driver runs Tx completion polling cycle right before the Rx one and uses napi_consume_skb() to feed the cache with skbuff_heads of completed entries, so it's never empty and always warm at that moment. Switch to the napi_build_skb() to relax mm pressure on heavy Rx. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-28 09:42:33 -08:00
Alexander Lobakin	6e19cf7d38	i40e: switch to napi_build_skb() napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order to save some cycles on freeing/allocating skbuff_heads on every new Rx or completed Tx. i40e driver runs Tx completion polling cycle right before the Rx one and uses napi_consume_skb() to feed the cache with skbuff_heads of completed entries, so it's never empty and always warm at that moment. Switch to the napi_build_skb() to relax mm pressure on heavy Rx. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-28 09:42:33 -08:00
Alexander Lobakin	89a354c03b	e1000: switch to napi_build_skb() napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order to save some cycles on freeing/allocating skbuff_heads on every new Rx or completed Tx element. e1000 driver runs Tx completion polling cycle right before the Rx one. Now that e1000 uses napi_consume_skb() to put skbuff_heads of completed entries into the cache, it will never empty and always warm at that moment. Switch to the napi_build_skb() to relax mm pressure on heavy Rx and increase throughput. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-28 09:42:25 -08:00
Alexander Lobakin	dcb95f06ea	e1000: switch to napi_consume_skb() In order to take the best from per-cpu NAPI skbuff_head caches and CPU cycles, let's switch from dev_kfree_skb_any(), which passes skb back to the mm layer, to napi_consume_skb(), which feeds those caches on non-zero budget instead (falls back to the former on 0). Do the replacement in e1000_unmap_and_free_tx_resource(). There are 4 call sites of this function throughout the driver: * e1000_clean_tx_ring(). Slowpath, process context, cleans the whole Tx ring on ifdown. Use budget of 0 here; * e1000_tx_map(). Hotpath, net Tx softirq, unmaps the buffers in case of error. Use 0 as well; * e1000_clean_tx_irq(). Hotpath, NAPI Tx completion polling cycle. As the driver doesn't count completed Tx entries towards the NAPI budget, just use the poll budget of 64 to utilize caches. Apart from being a preparation for switching to napi_build_skb(), this is useful on its own as well, as napi_consume_skb() flushes skb caches by batches of 32 instead of one-at-a-time. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-12-28 09:41:57 -08:00
Colin Ian King	0f1eae8e56	net: caif: remove redundant assignment to variable expectlen Variable expectlen is being assigned a value that is never read, the assignment occurs before a return statement. The assignment is redundant and can be removed. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-28 12:52:24 +00:00
David S. Miller	16fa29aef7	Merge branch 'smc-fixes' Dust Li says: ==================== net/smc: fix kernel panic caused by race of smc_sock This patchset fixes the race between smc_release triggered by close(2) and cdc_handle triggered by underlaying RDMA device. The race is caused because the smc_connection may been released before the pending tx CDC messages got its CQEs. In order to fix this, I add a counter to track how many pending WRs we have posted through the smc_connection, and only release the smc_connection after there is no pending WRs on the connection. The first patch prevents posting WR on a QP that is not in RTS state. This patch is needed because if we post WR on a QP that is not in RTS state, ib_post_send() may success but no CQE will return, and that will confuse the counter tracking the pending WRs. The second patch add a counter to track how many WRs were posted through the smc_connection, and don't reset the QP on link destroying to prevent leak of the counter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-28 12:42:46 +00:00
Dust Li	349d43127d	net/smc: fix kernel panic caused by race of smc_sock A crash occurs when smc_cdc_tx_handler() tries to access smc_sock but smc_release() has already freed it. [ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88 [ 4570.696048] #PF: supervisor write access in kernel mode [ 4570.696728] #PF: error_code(0x0002) - not-present page [ 4570.697401] PGD 0 P4D 0 [ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111 [ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0 [ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30 <...> [ 4570.711446] Call Trace: [ 4570.711746] <IRQ> [ 4570.711992] smc_cdc_tx_handler+0x41/0xc0 [ 4570.712470] smc_wr_tx_tasklet_fn+0x213/0x560 [ 4570.712981] ? smc_cdc_tx_dismisser+0x10/0x10 [ 4570.713489] tasklet_action_common.isra.17+0x66/0x140 [ 4570.714083] __do_softirq+0x123/0x2f4 [ 4570.714521] irq_exit_rcu+0xc4/0xf0 [ 4570.714934] common_interrupt+0xba/0xe0 Though smc_cdc_tx_handler() checked the existence of smc connection, smc_release() may have already dismissed and released the smc socket before smc_cdc_tx_handler() further visits it. smc_cdc_tx_handler() \|smc_release() if (!conn) \| \| \|smc_cdc_tx_dismiss_slots() \| smc_cdc_tx_dismisser() \| \|sock_put(&smc->sk) <- last sock_put, \| smc_sock freed bh_lock_sock(&smc->sk) (panic) \| To make sure we won't receive any CDC messages after we free the smc_sock, add a refcount on the smc_connection for inflight CDC message(posted to the QP but haven't received related CQE), and don't release the smc_connection until all the inflight CDC messages haven been done, for both success or failed ones. Using refcount on CDC messages brings another problem: when the link is going to be destroyed, smcr_link_clear() will reset the QP, which then remove all the pending CQEs related to the QP in the CQ. To make sure all the CQEs will always come back so the refcount on the smc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced by smc_ib_modify_qp_error(). And remove the timeout in smc_wr_tx_wait_no_pending_sends() since we need to wait for all pending WQEs done, or we may encounter use-after- free when handling CQEs. For IB device removal routine, we need to wait for all the QPs on that device been destroyed before we can destroy CQs on the device, or the refcount on smc_connection won't reach 0 and smc_sock cannot be released. Fixes: 5f08318f617b ("smc: connection data control (CDC)") Reported-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-28 12:42:45 +00:00

... 2 3 4 5 6 ...

1062069 Commits