Check the Docker service log with: journalctl -xel -u docker
The log reports: docker status : error="runtime not found in config:nvidia"
Causes:
1. daemon.json was deleted
Copy daemon.json from another server into /etc/docker on this server:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Starting docker again then fails because /usr/bin/nvidia-container-runtime is missing. Go on to step 2: installing nvidia-docker2 puts this program in place.
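Both failures in this step (no daemon.json, then no runtime binary) can be caught before restarting docker. The following is a minimal sketch of such a check, not an official tool: check_nvidia_runtime is a hypothetical helper name, and it reads the "path" value with a plain-text match rather than a real JSON parser, which is assumed to be good enough for the small daemon.json shown above.

```shell
# Sketch: sanity-check daemon.json before restarting docker.
# check_nvidia_runtime is a hypothetical helper, not part of Docker.
# Usage: check_nvidia_runtime [path-to-daemon.json]
check_nvidia_runtime() {
    config="${1:-/etc/docker/daemon.json}"

    # Cause 1: the config file itself is gone.
    if [ ! -f "$config" ]; then
        echo "missing config: $config"
        return 1
    fi

    # Extract the value of the first "path" key (plain-text match, not a JSON parser).
    runtime_path=$(sed -n 's/.*"path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1)

    if [ -z "$runtime_path" ]; then
        echo "no runtime path in: $config"
        return 1
    fi

    # Cause 2: the config points at a binary that is not installed.
    if [ ! -x "$runtime_path" ]; then
        echo "runtime binary not found: $runtime_path"
        return 1
    fi

    echo "ok: $runtime_path"
}
```

Called with no argument it checks /etc/docker/daemon.json. A "runtime binary not found" result means step 2 is still needed.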
2. nvidia-docker2 is missing; reinstall it
curl https://get.docker.com | sh
sudo systemctl start docker && sudo systemctl enable docker

# Set up the stable repository and the GPG key:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# To access experimental features such as CUDA on WSL or the new MIG capability on A100,
# you may want to add the experimental branch to the repository list.
# Optional:
curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list

# After updating the package list, install the nvidia-docker2 package (and its dependencies):
sudo apt-get update
sudo apt-get install -y nvidia-docker2

# After setting the default runtime, restart the Docker daemon to complete the installation:
sudo systemctl restart docker
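After the restart, it can help to confirm that the daemon actually picked up the nvidia runtime as its default. This is a small sketch under the assumption that the docker CLI is on the PATH and the daemon is running. verify_default_runtime is a hypothetical helper name. It relies on the DefaultRuntime field of docker info.

```shell
# Sketch: confirm that Docker reports the expected default runtime.
# verify_default_runtime is a hypothetical helper, not part of Docker.
# Usage: verify_default_runtime [expected-runtime]
verify_default_runtime() {
    expected="${1:-nvidia}"

    # Ask the daemon which runtime it uses by default; empty if docker is unreachable.
    actual=$(docker info --format '{{.DefaultRuntime}}' 2>/dev/null)

    if [ "$actual" = "$expected" ]; then
        echo "default runtime is $expected"
    else
        echo "default runtime is '${actual:-unknown}', expected $expected"
        return 1
    fi
}
```

If the check reports a different runtime such as runc, the "default-runtime" entry in /etc/docker/daemon.json is the first thing to re-check.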