IPoIB: Fix send lockup due to missed TX completion
Commit f0dc117abdfa ("IPoIB: Fix TX queue lockup with mixed UD/CM traffic") attempts to solve an issue where unprocessed UD send completions can deadlock the netdev. The patch doesn't fully resolve the issue because if more than half the tx_outstanding's were UD and all of the destinations are RC reachable, arming the CQ doesn't solve the issue. This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the ib_req_notify_cq(). If the rc is above 0, the UD send cq completion callback is called directly to re-arm the send completion timer. This issue is seen in very large parallel filesystem deployments and the patch has been shown to correct the issue. Cc: <stable@vger.kernel.org> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
This commit is contained in:
parent
a937536b86
commit
1ee9e2aa7b
@ -758,9 +758,13 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
|
||||
if (++priv->tx_outstanding == ipoib_sendq_size) {
|
||||
ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
|
||||
tx->qp->qp_num);
|
||||
if (ib_req_notify_cq(priv->send_cq, IB_CQ_NEXT_COMP))
|
||||
ipoib_warn(priv, "request notify on send CQ failed\n");
|
||||
netif_stop_queue(dev);
|
||||
rc = ib_req_notify_cq(priv->send_cq,
|
||||
IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
|
||||
if (rc < 0)
|
||||
ipoib_warn(priv, "request notify on send CQ failed\n");
|
||||
else if (rc)
|
||||
ipoib_send_comp_handler(priv->send_cq, dev);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
Loading…
x
Reference in New Issue
Block a user