linux

iv/linux

Author	SHA1	Message	Date
Kirill A. Shutemov	23baf831a3	mm, treewide: redefine MAX_ORDER sanely MAX_ORDER currently defined as number of orders page allocator supports: user can ask buddy allocator for page order between 0 and MAX_ORDER-1. This definition is counter-intuitive and lead to number of bugs all over the kernel. Change the definition of MAX_ORDER to be inclusive: the range of orders user can ask from buddy allocator is 0..MAX_ORDER now. [kirill@shutemov.name: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [akpm@linux-foundation.org: fix another min_t warning] [kirill@shutemov.name: fixups per Zi Yan] Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name [akpm@linux-foundation.org: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-04-05 19:42:46 -07:00
Nick Child	92125c3a60	ibmvnic: Add hotpluggable CPU callbacks to reassign affinity hints When CPU's are added and removed, ibmvnic devices will reassign hint values. Introduce a new cpu hotplug state CPUHP_IBMVNIC_DEAD to signal to ibmvnic devices that the CPU has been removed and it is time to reset affinity hint assignments. On the other hand, when CPU's are being added, add a state instance to CPUHP_AP_ONLINE_DYN which will trigger a reassignment of affinity hints once the new CPU's are online. This implementation is based on the virtio_net driver. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com> Reviewed-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-14 10:47:07 +00:00
Nick Child	44fbc1b6e0	ibmvnic: Assign IRQ affinity hints to device queues Assign affinity hints to ibmvnic device queue interrupts. Affinity hints are assigned and removed during sub-crq init and teardown, respectively. This update should improve latency if utilized as interrupt lines and processing are more equally distributed among CPU's. This implementation is based on the virtio_net driver. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com> Reviewed-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-14 10:47:07 +00:00
Jakub Kicinski	0e55546b18	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/linux/netdevice.h net/core/dev.c `6510ea973d` ("net: Use this_cpu_inc() to increment net->core_stats") `794c24e992` ("net-core: rx_otherhost_dropped to core_stats") https://lore.kernel.org/all/20220428111903.5f4304e0@canb.auug.org.au/ drivers/net/wan/cosa.c `d48fea8401` ("net: cosa: fix error check return value of register_chrdev()") `89fbca3307` ("net: wan: remove support for COSA and SRP synchronous serial boards") https://lore.kernel.org/all/20220428112130.1f689e5e@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-04-28 13:02:01 -07:00
Dany Madden	aeaf59b787	Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits" This reverts commit `723ad91613` When client requests channel or ring size larger than what the server can support the server will cap the request to the supported max. So, the client would not be able to successfully request resources that exceed the server limit. Fixes: `723ad91613` ("ibmvnic: Add ethtool private flag for driver-defined queue limits") Signed-off-by: Dany Madden <drt@linux.ibm.com> Link: https://lore.kernel.org/r/20220427235146.23189-1-drt@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-04-28 09:46:18 -07:00
Sukadev Bhattiprolu	93b1ebb348	ibmvnic: Allow multiple ltbs in txpool ltb_set Allow multiple LTBs in the txpool's ltb_set. i.e rather than using a single large LTB, use several smaller LTBs. The first n-1 LTBs will all be of the same size. The size of the last LTB in the set depends on the number of buffers and buffer (mtu) size. This strategy hopefully allows more reuse of the initial LTBs and also reduces the chances of an allocation failure (of the large LTB) when system is low in memory. Suggested-by: Brian King <brking@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-04-15 14:02:06 -07:00
Sukadev Bhattiprolu	a75de82057	ibmvnic: Allow multiple ltbs in rxpool ltb_set Allow multiple LTBs in the rxpool's ltb_set. The first n-1 LTBs will all be of the same size. The size of the last LTB in the set depends on the number of buffers and buffer (mtu) size. Having a set of LTBs per pool provides a couple of benefits. First, with the current value of IBMVNIC_MAX_LTB_SIZE of 16MB, with an MTU of 9000, we need a LTB (DMA buffer) of that size but the allocation can fail in low memory conditions. With a set of LTBs per pool, we can use several smaller (8MB) LTBs and hopefully have fewer allocation failures. (See also comments in ibmvnic.h on the trade-off with smaller LTBs) Second since the kernel limits the size of the DMA buffer to 16MB (based on MAX_ORDER), with a single DMA buffer per pool, the pool is also limited to 16MB. This in turn limits the number of buffers per pool to 1763 when MTU is 9000. With a set of LTBs per pool, we can have upto the max of 4096 buffers per pool even when MTU is 9000. Suggested-by: Brian King <brking@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-04-15 14:02:05 -07:00
Sukadev Bhattiprolu	d6b4585090	ibmvnic: convert rxpool ltb to a set of ltbs Define and use interfaces that treat the long term buffer (LTB) of an rxpool as a set of LTBs rather than a single LTB. The set only has one LTB for now. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-04-15 14:02:05 -07:00
Sukadev Bhattiprolu	4219196d1f	ibmvnic: fix race between xmit and reset There is a race between reset and the transmit paths that can lead to ibmvnic_xmit() accessing an scrq after it has been freed in the reset path. It can result in a crash like: Kernel attempted to read user page (0) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc0080000016189f8 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c0080000016189f8] ibmvnic_xmit+0x60/0xb60 [ibmvnic] LR [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280 Call Trace: [c008000001618f08] ibmvnic_xmit+0x570/0xb60 [ibmvnic] (unreliable) [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280 [c000000000c9cfcc] sch_direct_xmit+0xec/0x330 [c000000000bfe640] __dev_xmit_skb+0x3a0/0x9d0 [c000000000c00ad4] __dev_queue_xmit+0x394/0x730 [c008000002db813c] __bond_start_xmit+0x254/0x450 [bonding] [c008000002db8378] bond_start_xmit+0x40/0xc0 [bonding] [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280 [c000000000c00ca4] __dev_queue_xmit+0x564/0x730 [c000000000cf97e0] neigh_hh_output+0xd0/0x180 [c000000000cfa69c] ip_finish_output2+0x31c/0x5c0 [c000000000cfd244] __ip_queue_xmit+0x194/0x4f0 [c000000000d2a3c4] __tcp_transmit_skb+0x434/0x9b0 [c000000000d2d1e0] __tcp_retransmit_skb+0x1d0/0x6a0 [c000000000d2d984] tcp_retransmit_skb+0x34/0x130 [c000000000d310e8] tcp_retransmit_timer+0x388/0x6d0 [c000000000d315ec] tcp_write_timer_handler+0x1bc/0x330 [c000000000d317bc] tcp_write_timer+0x5c/0x200 [c000000000243270] call_timer_fn+0x50/0x1c0 [c000000000243704] __run_timers.part.0+0x324/0x460 [c000000000243894] run_timer_softirq+0x54/0xa0 [c000000000ea713c] __do_softirq+0x15c/0x3e0 [c000000000166258] __irq_exit_rcu+0x158/0x190 [c000000000166420] irq_exit+0x20/0x40 [c00000000002853c] timer_interrupt+0x14c/0x2b0 [c000000000009a00] decrementer_common_virt+0x210/0x220 --- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c The immediate cause of the crash is the access of tx_scrq in the following snippet during a reset, where the tx_scrq can be either NULL or an address that will soon be invalid: ibmvnic_xmit() { ... tx_scrq = adapter->tx_scrq[queue_num]; txq = netdev_get_tx_queue(netdev, queue_num); ind_bufp = &tx_scrq->ind_buf; if (test_bit(0, &adapter->resetting)) { ... } But beyond that, the call to ibmvnic_xmit() itself is not safe during a reset and the reset path attempts to avoid this by stopping the queue in ibmvnic_cleanup(). However just after the queue was stopped, an in-flight ibmvnic_complete_tx() could have restarted the queue even as the reset is progressing. Since the queue was restarted we could get a call to ibmvnic_xmit() which can then access the bad tx_scrq (or other fields). We cannot however simply have ibmvnic_complete_tx() check the ->resetting bit and skip starting the queue. This can race at the "back-end" of a good reset which just restarted the queue but has not cleared the ->resetting bit yet. If we skip restarting the queue due to ->resetting being true, the queue would remain stopped indefinitely potentially leading to transmit timeouts. IOW ->resetting is too broad for this purpose. Instead use a new flag that indicates whether or not the queues are active. Only the open/ reset paths control when the queues are active. ibmvnic_complete_tx() and others wake up the queue only if the queue is marked active. So we will have: A. reset/open thread in ibmvnic_cleanup() and __ibmvnic_open() ->resetting = true ->tx_queues_active = false disable tx queues ... ->tx_queues_active = true start tx queues B. Tx interrupt in ibmvnic_complete_tx(): if (->tx_queues_active) netif_wake_subqueue(); To ensure that ->tx_queues_active and state of the queues are consistent, we need a lock which: - must also be taken in the interrupt path (ibmvnic_complete_tx()) - shared across the multiple queues in the adapter (so they don't become serialized) Use rcu_read_lock() and have the reset thread synchronize_rcu() after updating the ->tx_queues_active state. While here, consolidate a few boolean fields in ibmvnic_adapter for better alignment. Based on discussions with Brian King and Dany Madden. Fixes: `7ed5b31f4a` ("net/ibmvnic: prevent more than one thread from running in reset") Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-18 13:22:22 +00:00
Sukadev Bhattiprolu	fd98693cb0	ibmvnic: Allow queueing resets during probe We currently don't allow queuing resets when adapter is in VNIC_PROBING state - instead we throw away the reset and return EBUSY. The reasoning is probably that during ibmvnic_probe() the ibmvnic_adapter itself is being initialized so performing a reset during this time can lead us to accessing fields in the ibmvnic_adapter that are not fully initialized. A review of the code shows that all the adapter state neede to process a reset is initialized before registering the CRQ so that should no longer be a concern. Further the expectation is that if we do get a reset (transport event) during probe, the do..while() loop in ibmvnic_probe() will handle this by reinitializing the CRQ. While that is true to some extent, it is possible that the reset might occur _after_ the CRQ is registered and CRQ_INIT message was exchanged but _before_ the adapter state is set to VNIC_PROBED. As mentioned above, such a reset will be thrown away. While the client assumes that the adapter is functional, the vnic server will wait for the client to reinit the adapter. This disconnect between the two leaves the adapter down needing manual intervention. Because ibmvnic_probe() has other work to do after initializing the CRQ (such as registering the netdev at a minimum) and because the reset event can occur at any instant after the CRQ is initialized, there will always be a window between initializing the CRQ and considering the adapter ready for resets (ie state == PROBED). So rather than discarding resets during this window, allow queueing them - but only process them after the adapter is fully initialized. To do this, introduce a new completion state ->probe_done and have the reset worker thread wait on this before processing resets. This change brings up two new situations in or just after ibmvnic_probe(). First after one or more resets were queued, we encounter an error and decide to retry the initialization. At that point the queued resets are no longer relevant since we could be talking to a new vnic server. So we must purge/flush the queued resets before restarting the initialization. As a side note, since we are still in the probing stage and we have not registered the netdev, it will not be CHANGE_PARAM reset. Second this change opens up a potential race between the worker thread in __ibmvnic_reset(), the tasklet and the ibmvnic_open() due to the following sequence of events: 1. Register CRQ 2. Get transport event before CRQ_INIT completes. 3. Tasklet schedules reset: a) add rwi to list b) schedule_work() to start worker thread which runs and waits for ->probe_done. 4. ibmvnic_probe() decides to retry, purges rwi_list 5. Re-register crq and this time rest of probe succeeds - register netdev and complete(->probe_done). 6. Worker thread resumes in __ibmvnic_reset() from 3b. 7. Worker thread sets ->resetting bit 8. ibmvnic_open() comes in, notices ->resetting bit, sets state to IBMVNIC_OPEN and returns early expecting worker thread to finish the open. 9. Worker thread finds rwi_list empty and returns without opening the interface. If this happens, the ->ndo_open() call is effectively lost and the interface remains down. To address this, ensure that ->rwi_list is not empty before setting the ->resetting bit. See also comments in __ibmvnic_reset(). Fixes: `6a2fb0e99f` ("ibmvnic: driver initialization for kdump/kexec") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-25 10:57:47 +00:00
Sukadev Bhattiprolu	3a5d9db7fb	ibmvnic: remove unused ->wait_capability With previous bug fix, ->wait_capability flag is no longer needed and can be removed. Fixes: `249168ad07` ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-24 12:05:03 +00:00
Dany Madden	fe4c82a7e0	ibmvnic: remove unused defines IBMVNIC_STATS_TIMEOUT and IBMVNIC_INIT_FAILED are not used in the driver. Remove them. Suggested-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-14 12:58:02 +00:00
Sukadev Bhattiprolu	bbd809305b	ibmvnic: Reuse tx pools when possible Rather than releasing the tx pools on every close and reallocating them on open, reuse the tx pools unless the pool parameters (number of pools, size of each pool or size of each buffer in a pool) have changed. If the pool parameters changed, then release the old pools (if any) and allocate new ones. Specifically release tx pools, if: - adapter is removed, - pool parameters change during reset, - we encounter an error when opening the adapter in response to a user request (in ibmvnic_open()). and don't release them: - in __ibmvnic_close() or - on errors in __ibmvnic_open() in the hope that we can reuse them during this or next reset. With these changes reset_tx_pools() can be dropped because its optimization is now included in init_tx_pools() itself. cleanup_tx_pools() releases all the skbs associated with the pool and is called from ibmvnic_cleanup(), which is called on every reset. Since we want to reuse skbs across resets, move cleanup_tx_pools() out of ibmvnic_cleanup() and call it only when user closes the adapter. Add two new adapter fields, ->prev_mtu, ->prev_tx_pool_size to track the previous values and use them to decide whether to reuse or realloc the pools. Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-15 11:12:24 +01:00
Sukadev Bhattiprolu	489de956e7	ibmvnic: Reuse rx pools when possible Rather than releasing the rx pools on and reallocating them on every reset, reuse the rx pools unless the pool parameters (number of pools, size of each pool or size of each buffer in a pool) have changed. If the pool parameters changed, then release the old pools (if any) and allocate new ones. Specifically release rx pools, if: - adapter is removed, - pool parameters change during reset, - we encounter an error when opening the adapter in response to a user request (in ibmvnic_open()). and don't release them: - in __ibmvnic_close() or - on errors in __ibmvnic_open() in the hope that we can reuse them on the next reset. With these, reset_rx_pools() can be dropped because its optimzation is now included in init_rx_pools() itself. cleanup_rx_pools() releases all the skbs associated with the pool and is called from ibmvnic_cleanup(), which is called on every reset. Since we want to reuse skbs across resets, move cleanup_rx_pools() out of ibmvnic_cleanup() and call it only when user closes the adapter. Add two new adapter fields, ->prev_rx_buf_sz, ->prev_rx_pool_size to keep track of the previous values and use them to decide whether to reuse or realloc the pools. Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-15 11:12:24 +01:00
Sukadev Bhattiprolu	129854f061	ibmvnic: Use bitmap for LTB map_ids In a follow-on patch, we will reuse long term buffers when possible. When doing so we have to be careful to properly assign map ids. We can no longer assign them sequentially because a lower map id may be available and we could wrap at 255 and collide with an in-use map id. Instead, use a bitmap to track active map ids and to find a free map id. Don't need to take locks here since the map_id only changes during reset and at that time only the reset worker thread should be using the adapter. Noticed this when analyzing an error Dany Madden ran into with the patch set. Reported-by: Dany Madden <drt@linux.ibm.com> Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-15 11:12:24 +01:00
Sukadev Bhattiprolu	0df7b9ad8f	ibmvnic: Use/rename local vars in init_rx_pools To make the code more readable, use/rename some local variables. Basically we have a set of pools, num_pools. Each pool has a set of buffers, pool_size and each buffer is of size buff_size. pool_size is a bit ambiguous (whether size in bytes or buffers). Add a comment in the header file to make it explicit. Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-15 11:12:23 +01:00
Cristobal Forno	53f8b1b254	ibmvnic: Allow device probe if the device is not ready at boot Allow the device to be initialized at a later time if it is not available at boot. The device will be allowed to probe but will be given a "down" state. After completing device probe and registering the net device, the driver will await an interrupt signal from its partner device, indicating that it is ready for boot. The driver will schedule a work event to perform the necessary procedure and begin operation. Co-developed-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Cristobal Forno <cforno12@linux.ibm.com> Acked-by: Lijun Pan <lijunp213@gmail.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-10 14:35:07 -07:00
Lijun Pan	c82eaa4064	ibmvnic: clean up the remaining debugfs data structures Commit `e704f0434e` ("ibmvnic: Remove debugfs support") did not clean up everything. Remove the remaining code. Signed-off-by: Lijun Pan <lijunp213@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-12 13:29:10 -07:00
Jakub Kicinski	b646acd5eb	net: re-solve some conflicts after net -> net-next merge Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-16 23:12:23 -08:00
David S. Miller	d489ded1a3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-02-16 17:51:13 -08:00
Sukadev Bhattiprolu	4a41c421f3	ibmvnic: serialize access to work queue on remove The work queue is used to queue reset requests like CHANGE-PARAM or FAILOVER resets for the worker thread. When the adapter is being removed the adapter state is set to VNIC_REMOVING and the work queue is flushed so no new work is added. However the check for adapter being removed is racy in that the adapter can go into REMOVING state just after we check and we might end up adding work just as it is being flushed (or after). The ->rwi_lock is already being used to serialize queue/dequeue work. Extend its usage ensure there is no race when scheduling/flushing work. Fixes: `6954a9e419` ("ibmvnic: Flush existing work items before device removal") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Cc:Uwe Kleine-König <uwe@kleine-koenig.org> Cc:Saeed Mahameed <saeed@kernel.org> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-15 15:17:30 -08:00
Dany Madden	a6f2fe5f10	ibmvnic: change IBMVNIC_MAX_IND_DESCS to 16 The supported indirect subcrq entries on Power8 is 16. Power9 supports 128. Redefined this value to 16 to minimize the driver from having to reset when migrating between Power9 and Power8. In our rx/tx performance testing, we found no performance difference between 16 and 128 at this time. Fixes: `f019fb6392` ("ibmvnic: Introduce indirect subordinate Command Response Queue buffer") Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-12 17:17:05 -08:00
Lijun Pan	4bb9f2e482	ibmvnic: remove unused spinlock_t stats_lock definition stats_lock is no longer used. So remove it. Signed-off-by: Lijun Pan <lijunp213@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-11 13:03:00 -08:00
Lijun Pan	a369d96ca5	ibmvnic: add comments for spinlock_t definitions There are several spinlock_t definitions without comments. Add them. Signed-off-by: Lijun Pan <lijunp213@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-11 13:03:00 -08:00
Jakub Kicinski	55fd59b003	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 15:44:09 -08:00
Dany Madden	a86d5c682b	ibmvnic: no reset timeout for 5 seconds after reset Reset timeout is going off right after adapter reset. This patch ensures that timeout is scheduled if it has been 5 seconds since the last reset. 5 seconds is the default watchdog timeout. Fixes: `ed651a1087` ("ibmvnic: Updated reset handling") Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-28 13:26:48 -08:00
Sukadev Bhattiprolu	76cdc5c5d9	ibmvnic: track pending login If after ibmvnic sends a LOGIN it gets a FAILOVER, it is possible that the worker thread will start reset process and free the login response buffer before it gets a (now stale) LOGIN_RSP. The ibmvnic tasklet will then try to access the login response buffer and crash. Have ibmvnic track pending logins and discard any stale login responses. Fixes: `032c5e8284` ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-28 13:26:48 -08:00
Jakub Kicinski	5c39f26e67	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Trivial conflict in CAN, keep the net-next + the byteswap wrapper. Conflicts: drivers/net/can/usb/gs_usb.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-27 18:25:27 -08:00
Lijun Pan	3ada288150	ibmvnic: enhance resetting status check during module exit Based on the discussion with Sukadev Bhattiprolu and Dany Madden, we believe that checking adapter->resetting bit is preferred since RESETTING state flag is not as strict as resetting bit. RESETTING state flag is removed since it is verbose now. Fixes: `7d7195a026` ("ibmvnic: Do not process device remove during device reset") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 16:28:34 -08:00
Dwip N. Banerjee	9a87c3fca2	ibmvnic: Ensure that device queue memory is cache-line aligned PCI bus slowdowns were observed on IBM VNIC devices as a result of partial cache line writes and non-cache aligned full cache line writes. Ensure that packet data buffers are cache-line aligned to avoid these slowdowns. Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 19:50:34 -08:00
Thomas Falcon	c62aa3734f	ibmvnic: Clean up TX code and TX buffer data structure Remove unused and superfluous code and members in existing TX implementation and data structures. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 19:50:34 -08:00
Thomas Falcon	f019fb6392	ibmvnic: Introduce indirect subordinate Command Response Queue buffer This patch introduces the infrastructure to send batched subordinate Command Response Queue descriptors, which are used by the ibmvnic driver to send TX frame and RX buffer descriptors. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Acked-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 19:50:33 -08:00
Lijun Pan	b9cd795b0e	ibmvnic: set up 200GBPS speed Set up the speed according to crq->query_phys_parms.rsp.speed. Fix IBMVNIC_10GBPS typo. Fixes: `f8d6ae0d27` ("ibmvnic: Report actual backing device speed and duplex values") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 16:03:51 -07:00
Thomas Falcon	507ebe6444	ibmvnic: Fix use-after-free of VNIC login response buffer The login response buffer is freed after it is received and parsed, but other functions in the driver still attempt to read it, such as when the device is opened, causing the Oops below. Store relevant information in the driver's private data structures and use those instead. BUG: Kernel NULL pointer dereference on read at 0x00000010 Faulting instruction address: 0xc00800000050a900 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables ibmvnic ibmveth crc32c_vpmsum autofs4 CPU: 7 PID: 759 Comm: NetworkManager Not tainted 5.9.0-rc1-00124-gd0a84e1f38d9 #14 NIP: c00800000050a900 LR: c00800000050a8f0 CTR: 00000000005b1904 REGS: c0000001ed746d20 TRAP: 0300 Not tainted (5.9.0-rc1-00124-gd0a84e1f38d9) MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24428484 XER: 00000001 CFAR: c0000000000101b0 DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0 GPR00: c00800000050a8f0 c0000001ed746fb0 c008000000518e00 0000000000000000 GPR04: 00000000000000c0 0000000000000080 0003c366c60c4501 0000000000000352 GPR08: 000000000001f400 0000000000000010 0000000000000000 0000000000000000 GPR12: 0001cf0000000019 c00000001ec97680 00000001003dfd40 0000010008dbb22c GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000000edb6c8 GPR20: c000000004e73e00 c000000004fd2448 c000000004e6d700 c000000004fd2448 GPR24: c000000004fd2400 c000000004a0cd20 c0000001ed961860 c0080000005029d8 GPR28: 0000000000000000 0000000000000003 c000000004a0c000 0000000000000000 NIP [c00800000050a900] init_resources+0x338/0xa00 [ibmvnic] LR [c00800000050a8f0] init_resources+0x328/0xa00 [ibmvnic] Call Trace: [c0000001ed746fb0] [c00800000050a8f0] init_resources+0x328/0xa00 [ibmvnic] (unreliable) [c0000001ed747090] [c00800000050b024] ibmvnic_open+0x5c/0x100 [ibmvnic] [c0000001ed747110] [c000000000bdcc0c] __dev_open+0x17c/0x250 [c0000001ed7471b0] [c000000000bdd1ec] __dev_change_flags+0x1dc/0x270 [c0000001ed747260] [c000000000bdd2bc] dev_change_flags+0x3c/0x90 [c0000001ed7472a0] [c000000000bf24b8] do_setlink+0x3b8/0x1280 [c0000001ed747450] [c000000000bf8cc8] __rtnl_newlink+0x5a8/0x980 [c0000001ed7478b0] [c000000000bf9110] rtnl_newlink+0x70/0xb0 [c0000001ed7478f0] [c000000000bf07c4] rtnetlink_rcv_msg+0x364/0x460 [c0000001ed747990] [c000000000c68b94] netlink_rcv_skb+0x84/0x1a0 [c0000001ed747a00] [c000000000bef758] rtnetlink_rcv+0x28/0x40 [c0000001ed747a20] [c000000000c68188] netlink_unicast+0x218/0x310 [c0000001ed747a80] [c000000000c6848c] netlink_sendmsg+0x20c/0x4e0 [c0000001ed747b20] [c000000000b9dc88] ____sys_sendmsg+0x158/0x360 [c0000001ed747bb0] [c000000000ba1c88] ___sys_sendmsg+0x98/0xf0 [c0000001ed747d10] [c000000000ba1db8] __sys_sendmsg+0x78/0x100 [c0000001ed747dc0] [c000000000033820] system_call_exception+0x160/0x280 [c0000001ed747e20] [c00000000000d740] system_call_common+0xf0/0x27c Instruction dump: 3be00000 38810068 b1410076 3941006a 93e10072 fbea0000 b1210068 4bff9915 eb9e0ca0 eabe0900 393c0010 3ab50048 <7fa04c2c> 7fba07b4 7b431764 7b4917a0 ---[ end trace fbc5949a28e103bd ]--- Fixes: `f3ae59c0c0` ("ibmvnic: store RX and TX subCRQ handle array in ibmvnic_adapter struct") Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-24 15:56:57 -07:00
Cristobal Forno	f3ae59c0c0	ibmvnic: store RX and TX subCRQ handle array in ibmvnic_adapter struct Currently the driver reads RX and TX subCRQ handle array directly from a DMA-mapped buffer address when it needs to make a H_SEND_SUBCRQ hcall. This patch stores that information in the ibmvnic_sub_crq_queue structure instead of reading from the buffer received at login. The overall goal of this patch is to parse relevant information from the login response buffer and store it in the driver's private data structures so that we don't need to read directly from the buffer and can then free up that memory. Signed-off-by: Cristobal Forno <cforno12@linux.ibm.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-19 15:38:16 -07:00
Juliet Kim	7d7195a026	ibmvnic: Do not process device remove during device reset The ibmvnic driver does not check the device state when the device is removed. If the device is removed while a device reset is being processed, the remove may free structures needed by the reset, causing an oops. Fix this by checking the device state before processing device remove. Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-10 15:50:42 -07:00
Thomas Falcon	ff25dcb9a1	ibmvnic: Serialize device queries Provide some serialization for device CRQ commands and queries to ensure that the shared variable used for storing return codes is properly synchronized. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-26 13:19:31 -08:00
Juliet Kim	7ed5b31f4a	net/ibmvnic: prevent more than one thread from running in reset The current code allows more than one thread to run in reset. This can corrupt struct adapter data. Check adapter->resetting before performing a reset, if there is another reset running delay (100 msec) before trying again. Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-25 13:41:41 +02:00
Juliet Kim	b27507bb59	net/ibmvnic: unlock rtnl_lock in reset so linkwatch_event can run Commit `a5681e20b5` ("net/ibmnvic: Fix deadlock problem in reset") made the change to hold the RTNL lock during a reset to avoid deadlock but linkwatch_event is fired during the reset and needs the RTNL lock. That keeps linkwatch_event process from proceeding until the reset is complete. The reset process cannot tolerate the linkwatch_event processing after reset completes, so release the RTNL lock during the process to allow a chance for linkwatch_event to run during reset. This does not guarantee that the linkwatch_event will be processed as soon as link state changes, but is an improvement over the current code where linkwatch_event processing is always delayed, which prevents transmissions on the device from being deactivated leading transmit watchdog timer to time-out. Release the RTNL lock before link state change and re-acquire after the link state change to allow linkwatch_event to grab the RTNL lock and run during the reset. Fixes: `a5681e20b5` ("net/ibmnvic: Fix deadlock problem in reset") Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-25 13:41:41 +02:00
Thomas Gleixner	d5bb994bcd	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 114 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 8 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190523091650.663497195@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-24 17:39:01 +02:00
Thomas Falcon	62740e9788	net/ibmvnic: Update MAC address settings after adapter reset It was discovered in testing that the underlying hardware MAC address will revert to initial settings following a device reset, but the driver fails to resend the current OS MAC settings. This oversight can result in dropped packets should the scenario occur. Fix this by informing hardware of current MAC address settings following any adapter initialization or resets. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-10 15:11:43 -07:00
Murilo Fossa Vicentini	e56e251566	ibmvnic: Add device identification to requested IRQs The ibmvnic driver currently uses the same fixed name when using request_irq, this makes it hard to parse when multiple VNIC devices are available at the same time. This patch adds the unit_address as the device identification along with an id for each queue. The original idea was to use the interface name as an identifier, but it is not feasible given these requests happen at adapter probe, and at this point netdev is not yet registered so it doesn't have the proper name assigned to it. Signed-off-by: Murilo Fossa Vicentini <muvic@linux.ibm.com> Reviewed-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com> Reviewed-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 20:20:06 -04:00
Murilo Fossa Vicentini	f8d6ae0d27	ibmvnic: Report actual backing device speed and duplex values The ibmvnic driver currently reports a fixed value for both speed and duplex settings regardless of the actual backing device that is being used. By adding support to the QUERY_PHYS_PARMS command defined by the PAPR+ we can query the current physical port state and report the proper values for these feilds. Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Murilo Fossa Vicentini <muvic@linux.ibm.com> Reviewed-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-19 14:07:18 -07:00
Thomas Falcon	6c5c748908	ibmvnic: Convert reset work item mutex to spin lock ibmvnic_reset can create and schedule a reset work item from an IRQ context, so do not use a mutex, which can sleep. Convert the reset work item mutex to a spin lock. Locking debugger generated the trace output below. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908 in_atomic(): 1, irqs_disabled(): 1, pid: 120, name: kworker/8:1 4 locks held by kworker/8:1/120: #0: 0000000017c05720 ((wq_completion)"events"){+.+.}, at: process_one_work+0x188/0x710 #1: 00000000ace90706 ((linkwatch_work).work){+.+.}, at: process_one_work+0x188/0x710 #2: 000000007632871f (rtnl_mutex){+.+.}, at: rtnl_lock+0x30/0x50 #3: 00000000fc36813a (&(&crq->lock)->rlock){..-.}, at: ibmvnic_tasklet+0x88/0x2010 [ibmvnic] irq event stamp: 26293 hardirqs last enabled at (26292): [<c000000000122468>] tasklet_action_common.isra.12+0x78/0x1c0 hardirqs last disabled at (26293): [<c000000000befce8>] _raw_spin_lock_irqsave+0x48/0xf0 softirqs last enabled at (26288): [<c000000000a8ac78>] dev_deactivate_queue.constprop.28+0xc8/0x160 softirqs last disabled at (26289): [<c0000000000306e0>] call_do_softirq+0x14/0x24 CPU: 8 PID: 120 Comm: kworker/8:1 Kdump: loaded Not tainted 4.20.0-rc6 #6 Workqueue: events linkwatch_event Call Trace: [c0000003fffa7a50] [c000000000bc83e4] dump_stack+0xe8/0x164 (unreliable) [c0000003fffa7aa0] [c00000000015ba0c] ___might_sleep+0x2dc/0x320 [c0000003fffa7b20] [c000000000be960c] __mutex_lock+0x8c/0xb40 [c0000003fffa7c30] [d000000006202ac8] ibmvnic_reset+0x78/0x330 [ibmvnic] [c0000003fffa7cc0] [d0000000062097f4] ibmvnic_tasklet+0x1054/0x2010 [ibmvnic] [c0000003fffa7e00] [c0000000001224c8] tasklet_action_common.isra.12+0xd8/0x1c0 [c0000003fffa7e60] [c000000000bf1238] __do_softirq+0x1a8/0x64c [c0000003fffa7f90] [c0000000000306e0] call_do_softirq+0x14/0x24 [c0000003f3f87980] [c00000000001ba50] do_softirq_own_stack+0x60/0xb0 [c0000003f3f879c0] [c0000000001218a8] do_softirq+0xa8/0x100 [c0000003f3f879f0] [c000000000121a74] __local_bh_enable_ip+0x174/0x180 [c0000003f3f87a60] [c000000000bf003c] _raw_spin_unlock_bh+0x5c/0x80 [c0000003f3f87a90] [c000000000a8ac78] dev_deactivate_queue.constprop.28+0xc8/0x160 [c0000003f3f87ad0] [c000000000a8c8b0] dev_deactivate_many+0xd0/0x520 [c0000003f3f87b70] [c000000000a8cd40] dev_deactivate+0x40/0x60 [c0000003f3f87ba0] [c000000000a5e0c4] linkwatch_do_dev+0x74/0xd0 [c0000003f3f87bd0] [c000000000a5e694] __linkwatch_run_queue+0x1a4/0x1f0 [c0000003f3f87c30] [c000000000a5e728] linkwatch_event+0x48/0x60 [c0000003f3f87c50] [c0000000001444e8] process_one_work+0x238/0x710 [c0000003f3f87d20] [c000000000144a48] worker_thread+0x88/0x4e0 [c0000003f3f87db0] [c00000000014e3a8] kthread+0x178/0x1c0 [c0000003f3f87e20] [c00000000000bfd0] ret_from_kernel_thread+0x5c/0x6c Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-10 17:34:25 -08:00
Juliet Kim	a5681e20b5	net/ibmnvic: Fix deadlock problem in reset This patch changes to use rtnl_lock only during a reset to avoid deadlock that could occur when a thread operating close is holding rtnl_lock and waiting for reset_lock acquired by another thread, which is waiting for rtnl_lock in order to set the number of tx/rx queues during a reset. Also, we now setting the number of tx/rx queues during a soft reset for failover or LPM events. Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-19 18:56:31 -08:00
Thomas Falcon	723ad91613	ibmvnic: Add ethtool private flag for driver-defined queue limits When choosing channel amounts and ring sizes, the maximums in the ibmvnic driver are defined by the virtual i/o server management partition. Even though they are defined as maximums, the client driver may in fact successfully request resources that exceed these limits, which are mostly dependent on a user's hardware With this in mind, provide an ethtool flag that when enabled will allow the user to request resources limited by driver-defined maximums instead of limits defined by the management partition. The driver will try to honor the user's request but may not allowed by the management partition. In this case, the driver requests as close as it can get to the desired amount until it succeeds. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 23:31:16 -07:00
Thomas Falcon	20b5ba1f61	ibmvnic: Introduce driver limits for ring sizes Introduce driver-defined maximums for queue ring sizes. Devices available for IBM vNIC today will likely not allow this amount, but this should give us some leeway for future devices that may support larger ring sizes. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 23:31:16 -07:00
Thomas Falcon	ad95a240a1	ibmvnic: Increase maximum queue size limit Increase queue size limit to 16. Devices available for IBM vNIC today will not allow this amount, but this should give us some leeway for future devices that may support more RX or TX queues. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 23:31:16 -07:00
Thomas Falcon	79dabbb716	ibmvnic: Remove code to request error information When backing device firmware reports an error, it provides an error ID, which is meant to be queried for more detailed error information. Currently, however, an error ID is not provided by the Virtual I/O server and there are not any plans to do so. For now, it is always unfilled or zero, so request_error_information will never be called. Remove it. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-07 12:46:27 -07:00
Thomas Falcon	2770a7984d	ibmvnic: Introduce hard reset recovery Introduce a recovery hard reset to handle reset failure as a result of change of device context following a transport event, such as a backing device failover or partition migration. These operations reset the device context to its initial state. If this occurs during a reset, any initialization commands are likely to fail with an invalid state error as backing device firmware requests reinitialization. When this happens, make one more attempt by performing a hard reset, which frees any resources currently allocated and performs device initialization. If a transport event occurs during a device reset, a flag is set which will trigger a new hard reset following the completionof the current reset event. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00

1 2

98 Commits