97 lines
3.6 KiB
Diff
97 lines
3.6 KiB
Diff
From 18dc0ad80f9c8eb7e4f34d6da219072b88a9b046 Mon Sep 17 00:00:00 2001
|
|
From: Yonglong Liu <liuyonglong@huawei.com>
|
|
Date: Mon, 7 Aug 2023 19:34:52 +0800
|
|
Subject: [PATCH 269/283] net: hns3: fix deadlock issue when externel_lb and
|
|
reset are executed together
|
|
|
|
mainline inclusion
|
|
from mainline-v6.5-rc6
|
|
commit ac6257a3ae5db5193b1f19c268e4f72d274ddb88
|
|
category: bugfix
|
|
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I8EN49
|
|
|
|
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ac6257a3ae5db5193b1f19c268e4f72d274ddb88
|
|
|
|
--------------------------------
|
|
|
|
When externel_lb and reset are executed together, a deadlock may
|
|
occur:
|
|
[ 3147.217009] INFO: task kworker/u321:0:7 blocked for more than 120 seconds.
|
|
[ 3147.230483] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
|
|
[ 3147.238999] task:kworker/u321:0 state:D stack: 0 pid: 7 ppid: 2 flags:0x00000008
|
|
[ 3147.248045] Workqueue: hclge hclge_service_task [hclge]
|
|
[ 3147.253957] Call trace:
|
|
[ 3147.257093] __switch_to+0x7c/0xbc
|
|
[ 3147.261183] __schedule+0x338/0x6f0
|
|
[ 3147.265357] schedule+0x50/0xe0
|
|
[ 3147.269185] schedule_preempt_disabled+0x18/0x24
|
|
[ 3147.274488] __mutex_lock.constprop.0+0x1d4/0x5dc
|
|
[ 3147.279880] __mutex_lock_slowpath+0x1c/0x30
|
|
[ 3147.284839] mutex_lock+0x50/0x60
|
|
[ 3147.288841] rtnl_lock+0x20/0x2c
|
|
[ 3147.292759] hclge_reset_prepare+0x68/0x90 [hclge]
|
|
[ 3147.298239] hclge_reset_subtask+0x88/0xe0 [hclge]
|
|
[ 3147.303718] hclge_reset_service_task+0x84/0x120 [hclge]
|
|
[ 3147.309718] hclge_service_task+0x2c/0x70 [hclge]
|
|
[ 3147.315109] process_one_work+0x1d0/0x490
|
|
[ 3147.319805] worker_thread+0x158/0x3d0
|
|
[ 3147.324240] kthread+0x108/0x13c
|
|
[ 3147.328154] ret_from_fork+0x10/0x18
|
|
|
|
In externel_lb process, the hns3 driver call napi_disable()
|
|
first, then the reset happen, then the restore process of the
|
|
externel_lb will fail, and will not call napi_enable(). When
|
|
doing externel_lb again, napi_disable() will be double call,
|
|
cause a deadlock of rtnl_lock().
|
|
|
|
This patch use the HNS3_NIC_STATE_DOWN state to protect the
|
|
calling of napi_disable() and napi_enable() in externel_lb
|
|
process, just as the usage in ndo_stop() and ndo_start().
|
|
|
|
Fixes: 04b6ba143521 ("net: hns3: add support for external loopback test")
|
|
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
|
|
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
|
|
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Link: https://lore.kernel.org/r/20230807113452.474224-5-shaojijie@huawei.com
|
|
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Signed-off-by: Xiaodong Li <lixiaodong67@huawei.com>
|
|
---
|
|
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 14 +++++++++++++-
|
|
1 file changed, 13 insertions(+), 1 deletion(-)
|
|
|
|
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
|
|
index 43bd0c4b6108..18d0a218040a 100644
|
|
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
|
|
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
|
|
@@ -5467,6 +5467,9 @@ void hns3_external_lb_prepare(struct net_device *ndev, bool if_running)
|
|
if (!if_running)
|
|
return;
|
|
|
|
+ if (test_and_set_bit(HNS3_NIC_STATE_DOWN, &priv->state))
|
|
+ return;
|
|
+
|
|
netif_carrier_off(ndev);
|
|
netif_tx_disable(ndev);
|
|
|
|
@@ -5495,7 +5498,16 @@ void hns3_external_lb_restore(struct net_device *ndev, bool if_running)
|
|
if (!if_running)
|
|
return;
|
|
|
|
- hns3_nic_reset_all_ring(priv->ae_handle);
|
|
+ if (hns3_nic_resetting(ndev))
|
|
+ return;
|
|
+
|
|
+ if (!test_bit(HNS3_NIC_STATE_DOWN, &priv->state))
|
|
+ return;
|
|
+
|
|
+ if (hns3_nic_reset_all_ring(priv->ae_handle))
|
|
+ return;
|
|
+
|
|
+ clear_bit(HNS3_NIC_STATE_DOWN, &priv->state);
|
|
|
|
for (i = 0; i < priv->vector_num; i++)
|
|
hns3_vector_enable(&priv->tqp_vector[i]);
|
|
--
|
|
2.34.1
|
|
|