A Complete Guide to Python Object Detection Inference with YOLOv5 and PyTorch

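1. Environment Setup and Dependency Installation

1.1 Base Environment

YOLOv5 inference requires Python 3.8 or later. Use a virtual environment (conda or venv) to isolate the project's dependencies. Linux (Ubuntu 20.04+) or Windows 10/11 is recommended. For GPU acceleration, install an NVIDIA driver that supports the CUDA version of your PyTorch build.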

1.2 Installing PyTorch

YOLOv5 is built on PyTorch. Choose the build that matches your hardware:

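# Default build from PyPI (CPU-only on Windows and macOS; the Linux wheel bundles CUDA)
pip install torch torchvision torchaudio
# GPU build for CUDA 11.7 (change the cuXXX suffix to match your CUDA version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117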

Verify the installation:

import torch
print(torch.__version__)  # prints the installed version, e.g. 1.13.1
print(torch.cuda.is_available())  # should be True on a working GPU setup
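
Before importing anything heavy, it can help to confirm which packages are actually importable in the active environment. The following is a minimal sketch using only the standard library; the function name check_environment and the default package list are illustrative assumptions, not part of YOLOv5 or PyTorch.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "torchvision", "cv2", "numpy")):
    """Report the Python version and whether each top-level package is installed."""
    report = {"python": ".".join(map(str, sys.version_info[:3]))}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A False entry means the package is missing from this environment, which usually points to the wrong virtual environment being active.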

1.3 YOLOv5 Source Code and Dependencies

Clone the official repository and install its dependencies:

git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt  # installs opencv-python, numpy, and the other dependencies

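2. Model Loading and Preprocessing

2.1 Choosing and Downloading a Model

YOLOv5 provides several pretrained models (such as yolov5s.pt and yolov5m.pt). Pick one based on your accuracy and latency budget:

yolov5s: lightweight (about 7.2M parameters), suited to edge devices.
yolov5l: higher accuracy (about 46.5M parameters), suited to server deployment.

If the weights file is missing, it is downloaded automatically from the GitHub Releases page. You can also download it manually and place it at the path you pass to the loader (for 'yolov5s.pt', that is the directory you run the script from).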

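2.2 Image Preprocessing

The model expects a float tensor of shape [batch, 3, H, W] with values in [0, 1]; the default input size is 640x640. The function below processes one image, and several outputs can be concatenated into a batch. It stretches the image with a plain resize, which does not preserve the aspect ratio; the official pipeline uses letterbox padding instead.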

import cv2
import torch
# Run from inside the cloned yolov5 directory so that these imports resolve.
from models.experimental import attempt_load
from utils.general import non_max_suppression
from utils.torch_utils import select_device

def preprocess_image(img_path, size=640):
    img = cv2.imread(img_path)  # BGR, HWC, uint8
    if img is None:
        raise FileNotFoundError(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # BGR -> RGB
    img = cv2.resize(img, (size, size))  # plain resize; aspect ratio is not preserved
    img = img.transpose(2, 0, 1)  # HWC -> CHW
    img = torch.from_numpy(img).float() / 255.0  # scale pixel values to [0, 1]
    return img.unsqueeze(0)  # add the batch dimension
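
When the aspect ratio matters, the image is usually scaled uniformly and padded to the target size (letterboxing). The sketch below only computes the geometry in plain Python: the scale, the resized dimensions, and the padding, plus the inverse mapping for a box. The function names letterbox_params and to_original are illustrative assumptions; this is not the repository's own letterbox implementation.

```python
def letterbox_params(orig_h, orig_w, target=640):
    """Return (scale, new_w, new_h, pad_left, pad_top) for an aspect-preserving resize."""
    scale = min(target / orig_h, target / orig_w)  # one factor for both axes
    new_w, new_h = round(orig_w * scale), round(orig_h * scale)
    pad_left = (target - new_w) // 2  # padding is split evenly on both sides
    pad_top = (target - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


def to_original(box, scale, pad_left, pad_top):
    """Map an (x1, y1, x2, y2) box from the padded canvas back to the source image."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_left) / scale, (y1 - pad_top) / scale,
            (x2 - pad_left) / scale, (y2 - pad_top) / scale)


if __name__ == "__main__":
    scale, new_w, new_h, pad_left, pad_top = letterbox_params(480, 640)
    print(scale, new_w, new_h, pad_left, pad_top)
    print(to_original((0, 80, 640, 560), scale, pad_left, pad_top))
```

With this approach the same scale and padding must be stored per image, because they are needed again when boxes are mapped back to the original resolution.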

3. Running Inference and Post-processing

3.1 Model Initialization and Inference

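device = select_device('')  # '' picks CUDA device 0 when available, otherwise the CPU
model = attempt_load('yolov5s.pt', device=device)  # older releases take map_location= instead of device=
img = preprocess_image('test.jpg').to(device)  # the input must be on the same device as the model
with torch.no_grad():
    pred = model(img)[0]  # raw predictions, before NMS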

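3.2 Non-Maximum Suppression (NMS)

NMS drops predictions below the confidence threshold and removes overlapping boxes, keeping the highest-confidence box in each group: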

conf_thres = 0.25  # confidence threshold
iou_thres = 0.45  # IoU threshold
pred = non_max_suppression(pred, conf_thres, iou_thres)  # list with one [n, 6] tensor per image
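
To make the two thresholds concrete, here is a minimal pure-Python sketch of greedy NMS. It is a simplified, class-agnostic illustration, not YOLOv5's non_max_suppression (which is batched and, by default, suppresses boxes per class); the names iou and simple_nms are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def simple_nms(dets, conf_thres=0.25, iou_thres=0.45):
    """dets: list of (x1, y1, x2, y2, conf). Returns the detections that survive."""
    # 1. Discard low-confidence boxes, then visit the rest from most to least confident.
    candidates = sorted((d for d in dets if d[4] > conf_thres),
                        key=lambda d: d[4], reverse=True)
    kept = []
    for det in candidates:
        # 2. Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(det[:4], k[:4]) <= iou_thres for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8), (50, 50, 60, 60, 0.7)]
    print(simple_nms(boxes))
```

Raising iou_thres keeps more overlapping boxes; raising conf_thres removes more weak detections before the overlap test runs.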

3.3 Parsing Results and Converting Coordinates

def postprocess(det, original_shape, input_size=640):
    # det: NumPy array of shape [num_boxes, 6] for one image: (x1, y1, x2, y2, conf, class)
    h, w = original_shape
    boxes = det[:, :4].copy()  # box corners in the 640x640 input space
    scores = det[:, 4]  # confidence
    classes = det[:, 5]  # class IDs
    # preprocess_image stretches the image to input_size x input_size,
    # so x and y are scaled back to the original size independently
    boxes[:, [0, 2]] *= w / input_size
    boxes[:, [1, 3]] *= h / input_size
    return boxes, scores, classes
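
Downstream code often wants detections as labeled records rather than raw arrays. The sketch below converts rows of (x1, y1, x2, y2, conf, class) into dictionaries and serializes them as JSON. The function name to_records and the record layout are assumptions chosen for illustration; the sample rows and class list are made-up values.

```python
import json


def to_records(detections, names):
    """detections: iterable of [x1, y1, x2, y2, conf, cls]; names: list or dict of labels."""
    records = []
    for x1, y1, x2, y2, conf, cls_id in detections:
        records.append({
            "label": names[int(cls_id)],  # works for a list or an {id: name} dict
            "confidence": round(float(conf), 4),
            "box": [float(x1), float(y1), float(x2), float(y2)],
        })
    return records


if __name__ == "__main__":
    # Hypothetical detections, in original-image pixel coordinates
    sample = [[10, 20, 110, 220, 0.91, 0], [300, 50, 400, 150, 0.42, 2]]
    print(json.dumps(to_records(sample, ["person", "bicycle", "car"]), indent=2))
```

Converting to plain Python floats first avoids JSON serialization errors when the rows come from NumPy arrays or tensors.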

4. Visualizing and Saving Results

4.1 Drawing Detection Boxes

Annotate the image with OpenCV:

def draw_detections(img_path, boxes, scores, classes, class_names):
    # class_names may be a list or a dict that maps class ID to label
    img = cv2.imread(img_path)
    for box, score, cls_id in zip(boxes, scores, classes):
        x1, y1, x2, y2 = map(int, box)
        label = f"{class_names[int(cls_id)]}: {score:.2f}"
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        # keep the label inside the image when the box touches the top edge
        cv2.putText(img, label, (x1, max(y1 - 10, 15)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    cv2.imwrite('result.jpg', img)
    return img

4.2 Complete Inference Example

if __name__ == '__main__':
    img_path = 'test.jpg'
    original_img = cv2.imread(img_path)
    h, w = original_img.shape[:2]
    # 1. Preprocess
    processed_img = preprocess_image(img_path)
    # 2. Inference
    device = select_device('')  # '' picks CUDA when available, otherwise the CPU
    model = attempt_load('yolov5s.pt', device=device)
    with torch.no_grad():
        pred = model(processed_img.to(device))[0]
    # 3. Post-process: NMS returns one [n, 6] tensor per image
    pred = non_max_suppression(pred, 0.25, 0.45)
    boxes, scores, classes = postprocess(pred[0].cpu().numpy(), (h, w))
    # 4. Visualize with the class names stored in the checkpoint (80 COCO classes for yolov5s)
    class_names = model.names
    result_img = draw_detections(img_path, boxes, scores, classes, class_names)

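5. Performance Optimization and Deployment Tips

5.1 Hardware Acceleration

GPU inference: make sure the CUDA and cuDNN versions match your PyTorch build. Setting torch.backends.cudnn.benchmark = True lets cuDNN auto-tune its convolution algorithms, which helps most when the input size stays fixed.
TensorRT: converting the PyTorch model to a TensorRT engine can speed up inference several-fold (3-5x is a commonly cited range); the actual gain depends on the GPU, the precision, and the batch size.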
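
Speedup figures are worth measuring on your own hardware. The following is a minimal latency benchmark sketch using only the standard library; the helper name benchmark and its defaults are assumptions. When timing GPU inference, the callable should end with torch.cuda.synchronize(), because CUDA kernels run asynchronously and the timer would otherwise stop too early.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=20):
    """Return the median latency of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up runs are discarded (caches, lazy initialization)
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; replace with a function that runs one inference call.
    latency_ms = benchmark(lambda: sum(range(100_000)))
    print(f"median latency: {latency_ms:.3f} ms")
```

The median is used rather than the mean so that an occasional slow iteration does not skew the result.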

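5.2 Model Quantization

Dynamic quantization can shrink a model and speed up CPU inference. It only converts the layer types you list (nn.Linear here) and does not cover convolutions, so the benefit for a convolution-heavy detector such as YOLOv5 is limited; FP16 or INT8 export through an inference runtime usually matters more.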

# Dynamic quantization targets CPU inference; the model should be on the CPU
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

5.3 Batch Inference

Combine several images into one batch to make better use of GPU parallelism:

batch_imgs = [preprocess_image(f'test_{i}.jpg') for i in range(4)]
batch_imgs = torch.cat(batch_imgs, dim=0).to(device)  # shape [4, 3, 640, 640]
with torch.no_grad():
    batch_pred = model(batch_imgs)[0]  # raw predictions for all four images
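
With more images than fit in one batch, the file list has to be split into fixed-size groups first. A minimal sketch in plain Python follows; the helper name chunked and the file names are illustrative assumptions.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    paths = [f"test_{i}.jpg" for i in range(10)]  # hypothetical file names
    for batch in chunked(paths, 4):
        # Each batch would be preprocessed, concatenated, and passed to the model.
        print(len(batch), batch)
```

The last group may be smaller than batch_size, so downstream code should not assume a fixed batch dimension.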

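6. Common Problems and Solutions

6.1 CUDA Out of Memory

Reduce the batch size (the examples here use 1) or the input resolution.
Call torch.cuda.empty_cache() to release cached memory that PyTorch is no longer using; it does not free memory held by live tensors.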
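
One way to apply the first tip automatically is to retry with a smaller batch when an out-of-memory error occurs. The sketch below is a generic pattern under stated assumptions: infer is any callable that takes a list and returns a list of results, and the error is recognized by the phrase "out of memory" in a RuntimeError message, which is how PyTorch reports CUDA allocation failures. The function name infer_with_backoff is hypothetical.

```python
def infer_with_backoff(infer, items, batch_size):
    """Run infer(batch) over items, halving batch_size after each out-of-memory error."""
    results, start = [], 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(infer(batch))
        except RuntimeError as err:
            # Re-raise anything that is not an OOM error, or an OOM at batch size 1.
            if "out of memory" not in str(err).lower() or batch_size == 1:
                raise
            # In real use, torch.cuda.empty_cache() would be called here before retrying.
            batch_size = max(1, batch_size // 2)
            continue  # retry the same items with the smaller batch
        start += len(batch)
    return results, batch_size
```

The function also returns the batch size that finally worked, so later calls can start from it instead of rediscovering the limit.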

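6.2 Jittering Detection Boxes

Adjust iou_thres (for example, from 0.5 to 0.4).
For video streams, smooth the boxes across frames (for example, with a moving average).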
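
As one form of inter-frame smoothing, the sketch below applies an exponential moving average to box coordinates, a common variant of the moving average mentioned above. It assumes the same object has already been matched across frames, which is a separate tracking problem not covered here; the function name smooth_box, the alpha value, and the sample boxes are illustrative.

```python
def smooth_box(previous, current, alpha=0.6):
    """Blend the current box with the previous smoothed box (exponential moving average)."""
    if previous is None:  # first frame: nothing to blend with
        return tuple(float(v) for v in current)
    # alpha near 1 follows the new box closely; alpha near 0 smooths more but lags.
    return tuple(alpha * c + (1.0 - alpha) * p for p, c in zip(previous, current))


if __name__ == "__main__":
    smoothed = None
    # Hypothetical raw boxes for one object over three frames
    for raw in [(100, 100, 200, 200), (104, 98, 206, 199), (99, 101, 198, 203)]:
        smoothed = smooth_box(smoothed, raw)
        print([round(v, 1) for v in smoothed])
```

Stronger smoothing makes boxes steadier but slower to follow fast-moving objects, so alpha is worth tuning per application.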

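6.3 Detecting Custom Classes

Editing the names field in data/coco.yaml does not make a pretrained model detect new classes. To restrict the output to a subset of the existing classes, pass their IDs through the classes argument of non_max_suppression. To detect new classes, train a custom model with a dataset config such as the following (a complete file also needs the train and val paths):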

# data/custom.yaml
names: ['cat', 'dog', 'bird']

7. Extended Use Cases

7.1 Real-Time Video Stream Detection

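cap = cv2.VideoCapture(0)  # or a path to a video file
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # Preprocess the frame, run inference, and draw boxes as in the steps above.
    # preprocess_image expects a file path, so adapt it to accept the frame array.
    # ...
    result_img = frame  # placeholder: replace with the annotated frame
    cv2.imshow('Detection', result_img)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()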
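
For a real-time loop it is useful to know the achieved frame rate. The sketch below is a small rolling FPS meter built on the standard library; the class name FPSMeter and the window size are assumptions. The optional now argument exists only so the meter can be exercised with fixed timestamps.

```python
import time
from collections import deque


class FPSMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        self.stamps = deque(maxlen=window)  # oldest timestamps drop off automatically

    def tick(self, now=None):
        """Record one frame and return the current FPS estimate (0.0 until two frames)."""
        self.stamps.append(time.perf_counter() if now is None else now)
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    meter = FPSMeter(window=10)
    for _ in range(5):
        time.sleep(0.01)  # stand-in for reading and processing one frame
        fps = meter.tick()
    print(f"approx. {fps:.1f} FPS")
```

Calling tick() once per loop iteration, after the frame is displayed, measures end-to-end throughput including capture, inference, and drawing.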

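7.2 Deployment on Embedded Devices

Export the model to ONNX, either with torch.onnx.export() or with the repository's export.py script.
Run it through TensorRT or OpenVINO on Jetson-series boards or a Raspberry Pi. TensorRT requires an NVIDIA GPU, so it applies to Jetson only.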

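8. Summary and Further Resources

This article walked through the full workflow for object detection with YOLOv5 and PyTorch: environment setup, model loading, inference, result visualization, and optimization. For readers who want to go further:

Consult the official YOLOv5 documentation for the latest features.
Train on a custom dataset with the train.py script.
Explore PyTorch distributed training to speed up training on large datasets.

With these techniques, developers can build an efficient object detection system and adapt it to areas such as security surveillance, autonomous driving, and industrial quality inspection.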