Notes on an onnxruntime error under WSL2

Notes on an environment pitfall under WSL: after more than an hour of fiddling, I stumbled on a strange fix.

Finding the problem

First, the environment:

import onnxruntime as ort
ort.get_available_providers()

The providers are reported as available:

['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

The failing line: onnxruntime cannot detect the GPU, although PyTorch in the same environment can use it.

session = ort.InferenceSession("yolov8n/best.onnx", providers=["CUDAExecutionProvider"])

ERROR

*************** EP Error ***************
EP Error /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:129 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; SUCCTYPE = cudaError; std::conditional_t<THRW, void, common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; SUCCTYPE = cudaError; std::conditional_t<THRW, void, common::Status> = void] CUDA failure 100: no CUDA-capable device is detected ; GPU=-1 ; hostname=DESKTOP-LBBLQ7H ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=282 ; expr=cudaSetDevice(info_.device_id); 

 when using ['CUDAExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
****************************************
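Since onnxruntime falls back to the CPU instead of raising here, it is worth checking which providers the session actually ended up with. A minimal sketch: `missing_providers` and `check_session` are hypothetical helper names of mine, while `session.get_providers()` is the onnxruntime call that lists the providers in use.

```python
def missing_providers(requested, active):
    """Return the requested providers that are not active, keeping order."""
    active_set = set(active)
    return [p for p in requested if p not in active_set]


def check_session(model_path, requested=("CUDAExecutionProvider",)):
    """Create a session and report providers that were dropped on fallback."""
    # Imported lazily so missing_providers can be used without onnxruntime.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=list(requested))
    dropped = missing_providers(requested, session.get_providers())
    if dropped:
        print("fell back, not active:", dropped)
    return session, dropped
```

With the failure above, `check_session("yolov8n/best.onnx")` would be expected to report `CUDAExecutionProvider` as dropped, since the log shows the session retrying on `CPUExecutionProvider`.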

Attempting to debug

After some reading, my guess was that onnxruntime could not find cuDNN, whereas PyTorch ships with its own cuDNN.

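Under WSL, CUDA goes through the Windows side: the GPU driver is the one installed on Windows, so once it is set up there it generally does not need to be installed again inside WSL.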

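I installed libraries with apt and edited environment variables ... after which I got a not found error. I kept installing what was missing and adjusting the environment variables, and then found that nvidia-smi no longer worked in the terminal, so I quickly commented the environment variables out again.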
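Before changing environment variables, it can help to see which CUDA-related libraries are visible on the search path at all. A rough sketch, assuming the libraries would be found through LD_LIBRARY_PATH or in /usr/lib/wsl/lib (where WSL normally exposes the Windows driver's libraries). `candidate_dirs` and `find_libs` are names I made up, and this does not cover everything the dynamic loader searches.

```python
import os
from pathlib import Path


def candidate_dirs(ld_library_path=None, extra=("/usr/lib/wsl/lib",)):
    """Directories to search: LD_LIBRARY_PATH entries plus extra paths."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    dirs = [d for d in ld_library_path.split(os.pathsep) if d]
    return dirs + [d for d in extra if d not in dirs]


def find_libs(prefix, dirs):
    """Return sorted paths of files named '<prefix>*.so*' in existing dirs."""
    found = []
    for d in dirs:
        p = Path(d)
        if p.is_dir():
            found.extend(str(f) for f in p.glob(prefix + "*.so*"))
    return sorted(found)
```

For example, `find_libs("libcudnn", candidate_dirs())` and `find_libs("libcuda", candidate_dirs())` show whether cuDNN and the driver library are reachable from those directories.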

An accidental fix

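I had planned to leave it at that, since running on the CPU (i7-13700) is enough for the server. But partway through, when I needed a data conversion, import torch failed with an undefined symbol error. At first I assumed I had broken the environment after all, but in a fresh .py file torch imported fine. So I tried adding one line before using onnxruntime: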

import torch  # added: import torch before onnxruntime
import onnxruntime as ort
session = ort.InferenceSession("yolov8n/best.onnx", providers=["CUDAExecutionProvider"])

To my surprise, it actually worked...
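Because the fix depends on import order, another module importing onnxruntime first could quietly undo it. A small sketch of a guard, assuming `sys.modules` keeps insertion order (dicts do from Python 3.7 on) and has not been modified by hand. `imported_before` is a hypothetical helper, not part of either library.

```python
import sys


def imported_before(first, second, modules=None):
    """True if `first` was imported before `second` in this process.

    Returns False if `first` has not been imported at all, and True if
    `first` has been imported but `second` has not.
    """
    names = list(sys.modules if modules is None else modules)
    if first not in names:
        return False
    if second not in names:
        return True
    return names.index(first) < names.index(second)
```

Calling `imported_before("torch", "onnxruntime")` right before creating the session gives a quick way to warn when the order is not the one that worked here.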

For the record, the code that triggers each error

onnxruntime: fails to detect the GPU

import onnxruntime as ort
session = ort.InferenceSession("yolov8n/best.onnx", providers=["CUDAExecutionProvider"])

pytorch: undefined symbol error

import onnxruntime
import torch
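Since a fresh .py file behaved differently from the broken session, each import order is best tried in its own interpreter. A minimal sketch using only the standard library: `run_fresh` and `SNIPPETS` are names I chose, and the snippets simply restate the two orders from this post.

```python
import subprocess
import sys

# The two import orders compared in this post.
SNIPPETS = {
    "torch first": "import torch\nimport onnxruntime\n",
    "onnxruntime first": "import onnxruntime\nimport torch\n",
}


def run_fresh(code, timeout=120):
    """Run `code` in a new interpreter; return (exit code, last stderr line)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    lines = proc.stderr.strip().splitlines()
    return proc.returncode, (lines[-1] if lines else "")
```

Looping over `SNIPPETS.items()` and printing `run_fresh(code)` for each label shows which order fails and with what final error line, without one attempt affecting the other.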