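A Step-by-Step Tutorial: Annotating a Traffic Light Dataset with Labelme and Converting It to YOLOv5 Training Format in One Step (Full Script Included) - 平芜编程栈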

Building a Traffic Light Detection Model from Scratch: A Hands-On Walkthrough of Labelme Annotation and YOLOv5 Format Conversion

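In computer vision, object detection has long been an active research area, and traffic light recognition is a key component of autonomous driving and intelligent transportation systems. This article walks you step by step through the whole process, from annotating raw images to preparing data for YOLOv5 training. It focuses on two core pain points: how to draw accurate polygon annotations with Labelme, and how to convert the results efficiently into the label format YOLOv5 expects.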

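1. Environment Setup and Tool Preparation

Good tools are a prerequisite for good work. Before annotating, set up a stable working environment. We recommend using Anaconda to create a dedicated Python environment so that nothing conflicts with the system installation: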

    conda create -n labelme python=3.8 -y
    conda activate labelme
    conda install pyqt=5.15.7 -y
    pip install labelme==5.1.1

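Once installation finishes, run the labelme command in a terminal to launch the annotation tool. To keep things manageable later, create the project directory with the following structure: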

    yolov5_traffic_light/
    ├── images/        # original images
    ├── annotations/   # JSON files generated by Labelme
    ├── labels/        # converted YOLO-format labels
    ├── scripts/       # conversion scripts
    └── dataset/       # final training dataset
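If you would rather create this layout from a script than by hand, the following minimal sketch does so with the standard library. The function name create_project is hypothetical and not part of the original tutorial; only the directory names come from the tree above.

```python
from pathlib import Path

# Sub-directory names taken from the project layout described above.
SUBDIRS = ("images", "annotations", "labels", "scripts", "dataset")


def create_project(root):
    """Create the project skeleton under `root` and return the created paths.

    `create_project` is a hypothetical helper used for illustration only.
    """
    root = Path(root)
    created = []
    for name in SUBDIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(path)
    return created


if __name__ == "__main__":
    for p in create_project("yolov5_traffic_light"):
        print(p)
```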

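2. High-Quality Annotation Techniques

2.1 Labelme Annotation in Practice

After launching Labelme, click "Open Dir" and select the images folder to load the images to annotate. When annotating traffic lights, pay particular attention to the following:

Polygon annotation tips:
- Zoom in (Ctrl + mouse wheel) until the edges of the light are clearly visible
- Close each polygon by clicking back on its first point so that it encloses a region
- For circular lights, place at least 12 points to keep the contour accurate
Label naming conventions:
- Stay consistent (for example, all lowercase)
- Use a color_state pattern (for example, red_on, green_off)
- Avoid spaces and special characters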
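The naming conventions above can be checked mechanically. The sketch below is one possible interpretation, assuming a label is valid when it consists of lowercase letters or digits joined by single underscores; the pattern and the helper names is_valid_label and find_bad_labels are illustrative assumptions, not something Labelme enforces.

```python
import re

# Assumed rule: lowercase alphanumeric words joined by underscores, e.g. "red_on".
LABEL_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_valid_label(label):
    """Return True if `label` follows the assumed color_state naming rule."""
    return LABEL_PATTERN.fullmatch(label) is not None


def find_bad_labels(labels):
    """Return the sorted, de-duplicated labels that break the naming rule."""
    return sorted({label for label in labels if not is_valid_label(label)})


if __name__ == "__main__":
    print(find_bad_labels(["red_on", "Green_On", "red on", "red_on"]))
```

Running such a check over all labels before conversion catches typos early, when they are still cheap to fix in Labelme.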

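Annotation example:
When a scene contains several traffic lights, annotate each one separately and assign it the correct label. In night scenes, take extra care to distinguish whether a light is actually lit.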

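2.2 Annotation Quality Control

To improve training results, keep the following in mind while annotating:

Completeness: make sure the annotation covers the whole light, including the halo at its edge
Consistency: annotate objects of the same class in the same way
Exclude distractors: do not annotate lights that are more than 30% occluded

After you save, Labelme writes one JSON file per image. Its key fields look like this (the file also stores imageHeight and imageWidth, which the conversion script in section 3.2 reads):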

    {
      "version": "5.1.1",
      "flags": {},
      "shapes": [
        {
          "label": "red_on",
          "points": [[302, 205], [310, 198], ...],
          "shape_type": "polygon"
        }
      ],
      "imagePath": "IMG_001.jpg",
      "imageData": null
    }
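To get a quick overview of what one annotation file contains, you can count its labels and shape types. This is a minimal sketch that works on an already parsed JSON dict; the helper name summarize_annotation is hypothetical, and the sample values (including the image size and the last two points) are made up for illustration.

```python
from collections import Counter


def summarize_annotation(data):
    """Return (label counts, shape-type counts) for one parsed Labelme JSON dict."""
    shapes = data.get("shapes", [])
    label_counts = Counter(shape["label"] for shape in shapes)
    type_counts = Counter(shape.get("shape_type", "polygon") for shape in shapes)
    return label_counts, type_counts


if __name__ == "__main__":
    # Illustrative sample only; in practice use json.load() on a file from annotations/.
    sample = {
        "shapes": [
            {
                "label": "red_on",
                "points": [[302, 205], [310, 198], [318, 205], [310, 212]],
                "shape_type": "polygon",
            }
        ],
        "imagePath": "IMG_001.jpg",
        "imageWidth": 640,   # assumed value
        "imageHeight": 480,  # assumed value
    }
    labels, types = summarize_annotation(sample)
    print(dict(labels), dict(types))
```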

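3. Core Format Conversion Techniques

3.1 How Labelme Output Maps to the YOLOv5 Format

YOLOv5 expects label files in TXT format, one object per line, containing:

<class_id> <x_center> <y_center> <width> <height>

Labelme, however, stores polygon vertex coordinates, so the following conversion is needed:

Reduce each polygon to its axis-aligned bounding rectangle (the min and max of its x and y coordinates)
Normalize absolute pixel coordinates to relative coordinates in the 0-1 range by dividing by the image width and height
Compute the center point, width, and height

Core logic of the conversion script: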

    def polygon_to_yolo(points, img_width, img_height):
        # Axis-aligned bounding box of the polygon, in pixels
        x_coords = [p[0] for p in points]
        y_coords = [p[1] for p in points]
        x_min, x_max = min(x_coords), max(x_coords)
        y_min, y_max = min(y_coords), max(y_coords)
        # Center and size, normalized by the image dimensions
        x_center = (x_min + x_max) / 2 / img_width
        y_center = (y_min + y_max) / 2 / img_height
        width = (x_max - x_min) / img_width
        height = (y_max - y_min) / img_height
        return [x_center, y_center, width, height]
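A worked example makes the arithmetic concrete. The sketch below repeats the function so that it runs on its own and applies it to a made-up diamond-shaped polygon in an assumed 640x480 image; the points and image size are illustrative, not taken from a real annotation.

```python
def polygon_to_yolo(points, img_width, img_height):
    """Same logic as the function above, repeated so this example is self-contained."""
    x_coords = [p[0] for p in points]
    y_coords = [p[1] for p in points]
    x_min, x_max = min(x_coords), max(x_coords)
    y_min, y_max = min(y_coords), max(y_coords)
    x_center = (x_min + x_max) / 2 / img_width
    y_center = (y_min + y_max) / 2 / img_height
    width = (x_max - x_min) / img_width
    height = (y_max - y_min) / img_height
    return [x_center, y_center, width, height]


if __name__ == "__main__":
    # Hypothetical polygon: x spans 302..318, y spans 198..212
    points = [[302, 205], [310, 198], [318, 205], [310, 212]]
    box = polygon_to_yolo(points, img_width=640, img_height=480)
    # x_center = 310 / 640, y_center = 205 / 480, width = 16 / 640, height = 14 / 480
    print(" ".join(f"{v:.6f}" for v in box))
    # prints: 0.484375 0.427083 0.025000 0.029167
```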

3.2 Complete Conversion Script

Create a json2yolo.py script to convert the files in batch:

    import os
    import json
    from tqdm import tqdm

    # polygon_to_yolo() from section 3.1 must be defined in (or imported into) this file.

    def convert(json_dir, output_dir, class_list):
        os.makedirs(output_dir, exist_ok=True)
        for json_file in tqdm(os.listdir(json_dir)):
            if not json_file.endswith('.json'):
                continue
            with open(os.path.join(json_dir, json_file), encoding='utf-8') as f:
                data = json.load(f)
            # One .txt label file per JSON file, with the same base name
            txt_name = os.path.splitext(json_file)[0] + '.txt'
            with open(os.path.join(output_dir, txt_name), 'w') as out:
                for shape in data['shapes']:
                    # Raises ValueError if the label is missing from class_list
                    class_id = class_list.index(shape['label'])
                    bbox = polygon_to_yolo(shape['points'], data['imageWidth'], data['imageHeight'])
                    out.write(f"{class_id} {' '.join(map(str, bbox))}\n")

    if __name__ == '__main__':
        convert('annotations', 'labels', ['red_on', 'yellow_on', 'green_on'])

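4. Dataset Splitting and Validation

4.1 Splitting the Dataset Sensibly

A sound dataset split is essential for evaluating the model. Recommended ratios:

Dataset	Ratio	Purpose
Training set	70%	Model training
Validation set	20%	Hyperparameter tuning
Test set	10%	Final evaluation

Implementation snippet: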

    import os
    import random

    def split_dataset(image_dir, label_dir, output_dir, ratios=(0.7, 0.2, 0.1)):
        files = [f for f in os.listdir(image_dir) if f.endswith('.jpg')]
        random.shuffle(files)  # call random.seed(...) beforehand for a reproducible split
        train_idx = int(len(files) * ratios[0])
        val_idx = train_idx + int(len(files) * ratios[1])
        splits = {
            'train': files[:train_idx],
            'val': files[train_idx:val_idx],
            'test': files[val_idx:],
        }
        for split, split_files in splits.items():
            os.makedirs(os.path.join(output_dir, split, 'images'), exist_ok=True)
            os.makedirs(os.path.join(output_dir, split, 'labels'), exist_ok=True)
            for file in split_files:
                # Copy the image and its label file into the matching directory
                ...
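The snippet leaves the copy step open. One way to fill it in is sketched below, assuming each label file shares its image's base name with a .txt extension and that the output follows the output_dir/split/images and output_dir/split/labels layout used above. The helper name copy_pair is hypothetical and is not the original author's implementation.

```python
import shutil
from pathlib import Path


def copy_pair(image_name, image_dir, label_dir, output_dir, split):
    """Copy one image and its same-named .txt label into output_dir/split/.

    Returns True if a label file was found and copied, False if only the
    image was copied. `copy_pair` is a hypothetical helper for illustration.
    """
    image_src = Path(image_dir) / image_name
    label_src = Path(label_dir) / (Path(image_name).stem + ".txt")
    image_dst = Path(output_dir) / split / "images"
    label_dst = Path(output_dir) / split / "labels"
    image_dst.mkdir(parents=True, exist_ok=True)
    label_dst.mkdir(parents=True, exist_ok=True)

    shutil.copy2(image_src, image_dst / image_src.name)
    if label_src.exists():  # an image may legitimately have no label file
        shutil.copy2(label_src, label_dst / label_src.name)
        return True
    return False
```

Inside the inner loop of split_dataset, a call such as copy_pair(file, image_dir, label_dir, output_dir, split) would then take the place of the placeholder.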

4.2 Data Consistency Checks

After conversion, always verify the quality of the data:

Image-label matching check:

    # Check that the file counts match
    ls images/*.jpg | wc -l
    ls labels/*.txt | wc -l
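Equal counts do not guarantee that every image has a label with the same base name. The sketch below compares file stems directly, assuming .jpg images and .txt labels as in the commands above; the function name find_unpaired is an illustrative assumption.

```python
from pathlib import Path


def find_unpaired(image_dir, label_dir, image_ext=".jpg"):
    """Return (images without a label, labels without an image) as sorted stem lists."""
    image_stems = {p.stem for p in Path(image_dir).glob(f"*{image_ext}")}
    label_stems = {p.stem for p in Path(label_dir).glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


if __name__ == "__main__":
    missing_labels, missing_images = find_unpaired("images", "labels")
    print("images without labels:", missing_labels)
    print("labels without images:", missing_images)
```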

Visual verification of annotations: use the following script to draw the YOLO-format labels on an image:

    import cv2

    def visualize(image_path, label_path, class_names):
        image = cv2.imread(image_path)
        if image is None:  # cv2.imread returns None when the file cannot be read
            raise FileNotFoundError(image_path)
        height, width = image.shape[:2]
        with open(label_path) as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                class_id, xc, yc, w, h = map(float, line.split())
                # Convert normalized center/size back to pixel corner coordinates
                x1 = int((xc - w/2) * width)
                y1 = int((yc - h/2) * height)
                x2 = int((xc + w/2) * width)
                y2 = int((yc + h/2) * height)
                cv2.rectangle(image, (x1, y1), (x2, y2), (0,255,0), 2)
                cv2.putText(image, class_names[int(class_id)], (x1, y1-10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0,255,0), 2)
        cv2.imshow('Preview', image)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

5. Engineering Practices for Efficient Annotation

5.1 Tips for Faster Annotation

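Keyboard shortcuts to remember:
- Ctrl+U opens a directory (Ctrl+O opens a single file)
- Ctrl+S saves the current annotation
- Ctrl + mouse wheel zooms quickly

Batch processing tips: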

    # Batch-check that every JSON file parses
    find annotations/ -name "*.json" -exec jq '.' {} \; > /dev/null
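If jq is not installed, the same integrity check can be done with Python's standard library. This is a minimal sketch, assuming the JSON files are UTF-8 encoded; the function name find_broken_json is hypothetical.

```python
import json
from pathlib import Path


def find_broken_json(json_dir):
    """Return a list of (file name, error message) for JSON files that fail to parse."""
    broken = []
    for path in sorted(Path(json_dir).rglob("*.json")):
        try:
            with open(path, encoding="utf-8") as f:
                json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            broken.append((path.name, str(exc)))
    return broken


if __name__ == "__main__":
    for name, error in find_broken_json("annotations"):
        print(f"{name}: {error}")
```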

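5.2 Common Problems and Solutions

Symptom	Likely cause	Solution
Coordinates fall outside the [0,1] range after conversion	Annotation points lie outside the image boundary	Check whether a point was accidentally placed outside the image
Converted bbox has zero width or height	Polygon not properly closed, leaving a degenerate shape	Make sure every polygon is closed and encloses an area
Wrong class IDs	class_list order does not match	Use the same class_list for conversion and training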
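The three problems in the table can be detected before conversion. The following sketch checks a single shape from a parsed Labelme file; the function name diagnose_shape and the wording of the returned messages are illustrative assumptions.

```python
def diagnose_shape(shape, img_width, img_height, class_list):
    """Return a list of problem descriptions for one Labelme shape dict."""
    points = shape.get("points", [])
    if not points:
        return ["no points"]

    problems = []
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Points outside the image would normalize to values outside [0, 1]
    if min(xs) < 0 or max(xs) > img_width or min(ys) < 0 or max(ys) > img_height:
        problems.append("points outside image")
    # A bounding box with no extent in x or y gives zero width or height
    if max(xs) == min(xs) or max(ys) == min(ys):
        problems.append("zero-size box")
    # class_list.index() would raise ValueError for such a label
    if shape.get("label") not in class_list:
        problems.append("unknown label")
    return problems


if __name__ == "__main__":
    classes = ["red_on", "yellow_on", "green_on"]
    shape = {"label": "red_off", "points": [[630, 10], [650, 10]]}  # made-up shape
    print(diagnose_shape(shape, 640, 480, classes))
```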

In real projects, it is advisable to maintain an annotation guideline document that covers: