I.Basic resources

1.Git repository

Repository: https://github.com/nomewang/M3DM

2.Issues

The repository's Issues page

3.References

Reference article on CSDN

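II.Server information

1.GPU server

  • The GPU server can come with CUDA preinstalled (only if that option is selected when the instance is created)
  • Choose CUDA 11.3 or later
  • The GPU driver is installed automatically after the first login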

2.CUDA installation

The GPU server comes with CUDA preinstalled.

Checking the CUDA version:

[Figure: CUDA version check]
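The CUDA version can also be read from the command line. The sketch below assumes the NVIDIA tools are on the PATH and falls back to a message when they are not. nvidia-smi reports the highest CUDA version the driver supports, while nvcc reports the installed toolkit release. The function name show_cuda_versions is only illustrative.

```shell
# Print driver-side and toolkit-side CUDA information, if the tools exist.
show_cuda_versions() {
    # Driver view: the header shows the driver version and the highest CUDA version it supports.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi
    else
        echo "nvidia-smi not found: the GPU driver may not be installed yet"
    fi

    # Toolkit view: prints the installed CUDA toolkit release.
    if command -v nvcc >/dev/null 2>&1; then
        nvcc --version
    else
        echo "nvcc not found: the CUDA toolkit may be missing or not on the PATH"
    fi
}

show_cuda_versions
```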

3.Login information

Remove the stored key for a given host:

ssh-keygen -R 47.107.139.237

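ssh-keygen -R 47.107.139.237 removes the keys for the given host from the known_hosts file, where SSH stores the public keys of hosts it has seen. The first time you connect to a host, SSH adds that host's public key to known_hosts; on later connections it checks that the key still matches, which protects against connecting to an impostor. The -R option deletes the entries for the specified host, which is useful when the host's key has changed (for example after the server was reinstalled) or when stale keys need to be cleaned up.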
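To see the effect of -R without touching the real ~/.ssh/known_hosts, the sketch below works on a throwaway file. The address 192.0.2.10 is a documentation-only placeholder, and the temporary key pair and variable names are made up for the demonstration. It relies on the -F (look up a host) and -f (choose the known_hosts file) options of ssh-keygen.

```shell
# Work on a throwaway known_hosts file so the real one is untouched.
demo_dir=$(mktemp -d)
demo_hosts="$demo_dir/known_hosts"

# Generate a temporary key pair and register its public key for the placeholder host.
ssh-keygen -q -t ed25519 -N '' -f "$demo_dir/demo_key"
echo "192.0.2.10 $(cut -d' ' -f1,2 "$demo_dir/demo_key.pub")" > "$demo_hosts"

# -F looks a host up and exits 0 when an entry exists.
if ssh-keygen -F 192.0.2.10 -f "$demo_hosts" >/dev/null; then
    entry_before=present
else
    entry_before=absent
fi

# -R removes every entry for the host and keeps a backup in known_hosts.old.
ssh-keygen -R 192.0.2.10 -f "$demo_hosts" >/dev/null 2>&1

if ssh-keygen -F 192.0.2.10 -f "$demo_hosts" >/dev/null; then
    entry_after=present
else
    entry_after=absent
fi
echo "before: $entry_before, after: $entry_after"

rm -rf "$demo_dir"
```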

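Login:

# Password login through sshpass (the password is redacted)
sshpass -p xxxxx ssh -A -g root@47.107.139.237

# Connection details shared with a teammate: host, user, password
47.107.139.237
root
xxxxx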

4.Checking system information

[root@lavm-ikopaz5aoj ~]# uname -a
Linux lavm-ikopaz5aoj 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@lavm-ikopaz5aoj ~]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
[root@lavm-ikopaz5aoj ~]#

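III.Base environment

1.Installing git

# Debian/Ubuntu (the target environment for M3DM is Ubuntu 18.04)
sudo apt update
sudo apt install git
# On CentOS 7, as in the uname output above, use instead: sudo yum install git
git --version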
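The system information above shows CentOS 7, while the install commands use apt, so the right package manager depends on which machine is actually in use. A small sketch that detects it; the variable name pkg_mgr is only illustrative.

```shell
# Pick whichever package manager exists on this machine.
if command -v apt-get >/dev/null 2>&1; then
    pkg_mgr=apt-get      # Debian, Ubuntu
elif command -v dnf >/dev/null 2>&1; then
    pkg_mgr=dnf          # newer Fedora/RHEL-family systems
elif command -v yum >/dev/null 2>&1; then
    pkg_mgr=yum          # CentOS 7
else
    pkg_mgr=unknown
fi
echo "package manager: $pkg_mgr"
# Then, for example: sudo "$pkg_mgr" install -y git
```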

2.Target environment

Ubuntu 18.04
Python 3.8
PyTorch 1.9.0
CUDA 11.3

3.Installing conda

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
# Open ~/.bashrc, append the export line below, then reload the file
vim ~/.bashrc
export PATH=$PATH:~/miniconda3/bin
source ~/.bashrc

4.Installing Python

Installing Python through conda is recommended.

# Create the virtual environment
conda create -n m3dm python=3.8

# Activate the virtual environment
conda activate m3dm
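Before installing packages it helps to confirm that the m3dm environment really is the active one. A minimal sketch, relying on the CONDA_DEFAULT_ENV variable that conda sets on activation; the helper name in_conda_env is made up for this example.

```shell
# Succeeds when the named conda environment is the active one.
# conda exports CONDA_DEFAULT_ENV with the name of the activated environment.
in_conda_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if in_conda_env m3dm; then
    echo "m3dm is active: $(python --version 2>&1)"
else
    echo "m3dm is not active; run: conda activate m3dm"
fi
```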

5.Installing PyTorch

# torch version required by the M3DM GitHub README
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
# For CUDA 11.3, use PyTorch 1.12.0
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 -f https://download.pytorch.org/whl/torch_stable.html

Version compatibility:

Version compatibility table
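After either install command, a quick check shows which build was installed and whether the GPU is visible. This is a sketch that assumes python3 resolves to the interpreter of the active environment; check_torch is an illustrative name. It prints the torch version, the CUDA version the wheel was built for, and the result of torch.cuda.is_available().

```shell
# Print the installed torch build, its CUDA version, and GPU visibility.
# Falls back to a message when torch cannot be imported.
check_torch() {
    python3 -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())' 2>/dev/null \
        || echo "torch is not importable in this environment"
}

check_torch
```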

6.Network test

# Site that will be needed later
https://huggingface.co/

# Check that it is reachable
curl https://huggingface.co/

telnet huggingface.co 443
(m3dm) root@iZwz9c1tow6mi9lnah1hrtZ:/kwan/M3DM# telnet huggingface.co 443
Trying 162.125.7.1...
Connected to huggingface.co.
Escape character is '^]'.
Connection closed by foreign host.
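telnet only confirms that a TCP connection to port 443 can be opened. Asking curl for the HTTP status gives a stronger signal that the site really answers. A sketch under that idea; the helper name http_status and the 10-second timeout are arbitrary choices.

```shell
# Fetch only the HTTP status code; curl prints 000 when no response was received.
http_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1" 2>/dev/null || true
}

status=$(http_status https://huggingface.co/)
case "$status" in
    2??|3??) echo "huggingface.co reachable (HTTP $status)" ;;
    *)       echo "huggingface.co not reachable (status: ${status:-none})" ;;
esac
```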

IV.Steps

1.Creating directories

mkdir /kwan
cd /kwan
mkdir software

2.Getting the code

git clone https://github.com/nomewang/M3DM.git

3.Requirements

cd M3DM
pip install -r requirements.txt

4.Installing other dependencies

pip install ninja
pip install open3d

5.knn_cuda

# install knn_cuda
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl

6.pointnet2_ops_lib

# install pointnet2_ops_lib
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"

7.Uploading the dataset

# On the server
cd /kwan/M3DM
mkdir -p datasets/mvtec3d

# On the local machine
scp /Users/qinyingjie/Downloads/000-训练/dowel.tar.xz root@47.107.139.237:/kwan/M3DM/datasets/mvtec3d

8.Preprocessing

# Unpack the archive
cd /kwan/M3DM/datasets/mvtec3d
tar -xvf dowel.tar.xz

# Preprocess the dataset
cd /kwan/M3DM
python utils/preprocessing.py datasets/mvtec3d/

9.Weights

# On the server: create the folder /kwan/M3DM/checkpoints for the downloaded weights
cd /kwan/M3DM
mkdir -p checkpoints

# On the local machine: upload each weight file
scp /Users/qinyingjie/Downloads/001-资源/B_8-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_224.npz root@47.107.139.237:/kwan/M3DM/checkpoints

scp  /Users/qinyingjie/Downloads/001-资源/B_8-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0.npz root@47.107.139.237:/kwan/M3DM/checkpoints

scp  /Users/qinyingjie/Downloads/001-资源/dino_deitsmall8_pretrain.pth.zip root@47.107.139.237:/kwan/M3DM/checkpoints

scp  /Users/qinyingjie/Downloads/001-资源/dino_vitbase8_pretrain.pth root@47.107.139.237:/kwan/M3DM/checkpoints

scp  /Users/qinyingjie/Downloads/001-资源/Point-BERT.pth root@47.107.139.237:/kwan/M3DM/checkpoints

scp  /Users/qinyingjie/Downloads/001-资源/pointmae_pretrain.pth root@47.107.139.237:/kwan/M3DM/checkpoints

scp  /Users/qinyingjie/Downloads/001-资源/uff_pretrain.pth root@47.107.139.237:/kwan/M3DM/checkpoints
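Once the uploads finish, it is worth confirming on the server that the files actually arrived. A minimal sketch: missing_weights is a made-up helper that prints every expected file name it cannot find. The call uses three of the names from the scp commands above; extend the list as needed.

```shell
# List which of the expected weight files are missing from a directory.
# Prints one missing name per line and returns non-zero if any are missing.
missing_weights() {
    dir=$1
    shift
    missing=0
    for name in "$@"; do
        if [ ! -f "$dir/$name" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}

# Names taken from the scp commands above.
missing_weights /kwan/M3DM/checkpoints \
    dino_vitbase8_pretrain.pth \
    pointmae_pretrain.pth \
    uff_pretrain.pth \
    || echo "some weight files are missing"
```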

10.Training

mkdir -p datasets/patch_lib

# Start training
python3 main.py \
--method_name DINO+Point_MAE \
--memory_bank multiple \
--rgb_backbone_name vit_base_patch8_224_dino \
--xyz_backbone_name Point_MAE \
--save_feature
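Training can run for a long time, and a job started in an interactive SSH session stops when the connection drops. One common workaround, sketched here, is to launch the same command with nohup in the background and write its output to a log file. The log name train.log is an arbitrary choice, and the command must be run from the M3DM directory.

```shell
# nohup keeps the job alive after logout; all output goes to train.log.
if [ -f main.py ]; then
    nohup python3 main.py \
        --method_name DINO+Point_MAE \
        --memory_bank multiple \
        --rgb_backbone_name vit_base_patch8_224_dino \
        --xyz_backbone_name Point_MAE \
        --save_feature > train.log 2>&1 &
    echo "training started in the background, PID $!"
    # Follow progress with: tail -f train.log
else
    echo "main.py not found: change to the M3DM directory first"
fi
```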

Problem 1:

# AttributeError: module 'torch' has no attribute 'frombuffer'
# Upgrade torch (torch.frombuffer was added in PyTorch 1.10)
pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 -f https://download.pytorch.org/whl/torch_stable.html
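torch.frombuffer is only available from PyTorch 1.10 onward, so the installed version can be checked before deciding whether to upgrade. The sketch below compares version strings with sort -V (available in GNU coreutils and recent BSD sort). version_ge is an illustrative helper, and python3 is assumed to be the interpreter of the active environment.

```shell
# True when version $1 is greater than or equal to version $2.
# Any local build suffix such as "+cu111" is stripped before comparing.
version_ge() {
    have=${1%%+*}
    want=${2%%+*}
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

installed=$(python3 -c 'import torch; print(torch.__version__)' 2>/dev/null || true)
if [ -z "$installed" ]; then
    echo "torch is not installed"
elif version_ge "$installed" 1.10.0; then
    echo "torch $installed provides torch.frombuffer"
else
    echo "torch $installed predates torch.frombuffer; upgrade to 1.10 or later"
fi
```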

Problem 2:

[Figure: screenshot of the second error]

V.Dataset

1.Downloading the dataset

After downloading, put the dataset in the datasets folder.

2.Data preparation

To run the preprocessing:

python utils/preprocessing.py datasets/mvtec3d/

Preprocessing may take a few hours.

VI.Checkpoints and training

1.Checkpoints

M3DM uses several pretrained models; the ones used in this walkthrough are the files uploaded in the weights step above.

Put the checkpoint files in the checkpoints folder.

2.Training

Train and test the double lib version and save the features for UFF training:

mkdir -p datasets/patch_lib
python3 main.py \
--method_name DINO+Point_MAE \
--memory_bank multiple \
--rgb_backbone_name vit_base_patch8_224_dino \
--xyz_backbone_name Point_MAE \
--save_feature

03-18 06:10