Stop Converting by Hand: A Python Script for Converting LabelImg Annotations Between YOLO txt and VOC xml

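Data annotation is one of the most time-consuming yet essential steps in a computer vision project. Once you have finished labeling in LabelImg, you may find that different frameworks expect different formats: YOLO wants txt files, while PASCAL VOC wants xml. Converting by hand is slow and error-prone. This article walks through a Python script that converts between the two formats in both directions.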

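1. Understanding How the Two Annotation Formats Differ

1.1 The YOLO Coordinate System

A YOLO .txt file uses relative coordinates. Each line describes one annotated object in the following format: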

<class_id> <x_center> <y_center> <width> <height>

Every value except class_id is a fraction of the image width or height, between 0 and 1. For example:

0 0.435 0.512 0.120 0.300

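This means:

The object belongs to class ID 0
Its center lies at 43.5% of the image width and 51.2% of the image height
Its box spans 12% of the image width and 30% of the image height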
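To make the line format concrete, here is a minimal sketch that parses one YOLO line and scales the ratios to pixel quantities. The names `YoloBox` and `parse_yolo_line` and the 1920x1080 image size are illustrative assumptions, not part of LabelImg or any YOLO library.

```python
from typing import NamedTuple


class YoloBox(NamedTuple):
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_line(line: str) -> YoloBox:
    """Parse one '<class_id> <x_center> <y_center> <width> <height>' line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(p) for p in parts[1:])
    # All four geometry values are ratios, so they must lie in [0, 1]
    for value in (x_center, y_center, width, height):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"value outside [0, 1]: {value}")
    return YoloBox(class_id, x_center, y_center, width, height)


box = parse_yolo_line("0 0.435 0.512 0.120 0.300")

# Scale the ratios by an assumed 1920x1080 image to get pixel quantities
img_w, img_h = 1920, 1080
center_px = (box.x_center * img_w, box.y_center * img_h)
size_px = (box.width * img_w, box.height * img_h)
print(box.class_id, center_px, size_px)
```

On this assumed image size the center lands near pixel (835, 553) and the box is roughly 230 by 324 pixels.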

1.2 The VOC XML Structure

A PASCAL VOC .xml file uses absolute coordinates and carries full metadata about the image:

<annotation>
    <size>
        <width>1920</width>
        <height>1080</height>
    </size>
    <object>
        <name>person</name>
        <bndbox>
            <xmin>500</xmin>
            <ymin>300</ymin>
            <xmax>700</xmax>
            <ymax>800</ymax>
        </bndbox>
    </object>
</annotation>

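The key differences are:

It stores concrete pixel values rather than ratios
It includes the full image path and size information
It supports extra attributes (such as difficult and truncated)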
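As a rough illustration of how these fields are read back, the sketch below parses the sample annotation with the standard library's `xml.etree.ElementTree`. The function name `read_voc` and the returned tuple layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """
<annotation>
    <size><width>1920</width><height>1080</height></size>
    <object>
        <name>person</name>
        <bndbox>
            <xmin>500</xmin><ymin>300</ymin>
            <xmax>700</xmax><ymax>800</ymax>
        </bndbox>
    </object>
</annotation>
"""


def read_voc(xml_text):
    """Return (img_w, img_h, [(name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    img_w = int(size.findtext("width"))
    img_h = int(size.findtext("height"))
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(bnd.findtext("xmin")),
            int(bnd.findtext("ymin")),
            int(bnd.findtext("xmax")),
            int(bnd.findtext("ymax")),
        ))
    return img_w, img_h, boxes


print(read_voc(SAMPLE_XML))
# (1920, 1080, [('person', 500, 300, 700, 800)])
```

Note that the image size travels with the annotation here, whereas a YOLO txt file needs the image itself (or a known size) to recover pixel values.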

2. The Core Conversion Algorithms

2.1 The Math Behind YOLO to VOC

The heart of the conversion is turning relative coordinates into absolute ones:

xmin = (x_center - width/2) * image_width
xmax = (x_center + width/2) * image_width
ymin = (y_center - height/2) * image_height
ymax = (y_center + height/2) * image_height

The key Python code:

def yolo_to_voc(x_center, y_center, width, height, img_w, img_h):
    # int() truncates toward zero, so each edge may land up to one pixel low
    xmin = int((x_center - width/2) * img_w)
    xmax = int((x_center + width/2) * img_w)
    ymin = int((y_center - height/2) * img_h)
    ymax = int((y_center + height/2) * img_h)
    return xmin, ymin, xmax, ymax

2.2 The Inverse: VOC to YOLO

The reverse conversion uses these formulas:

x_center = ((xmin + xmax)/2) / image_width
y_center = ((ymin + ymax)/2) / image_height
width = (xmax - xmin) / image_width
height = (ymax - ymin) / image_height

The corresponding code:

def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height
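The two functions can be checked against each other. The sketch below repeats them so it runs on its own, converts the box from the VOC sample in section 1.2, and then measures how far a YOLO box drifts after a round trip. The drift comes from the `int()` truncation in `yolo_to_voc` and should stay under one pixel per value; the variable names and the 1920x1080 image size are assumptions for this example.

```python
def yolo_to_voc(x_center, y_center, width, height, img_w, img_h):
    xmin = int((x_center - width / 2) * img_w)
    xmax = int((x_center + width / 2) * img_w)
    ymin = int((y_center - height / 2) * img_h)
    ymax = int((y_center + height / 2) * img_h)
    return xmin, ymin, xmax, ymax


def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


img_w, img_h = 1920, 1080

# Worked example: the person box from the VOC sample (500, 300, 700, 800)
sample_yolo = voc_to_yolo(500, 300, 700, 800, img_w, img_h)
print(sample_yolo)

# Round trip: YOLO -> VOC pixels -> YOLO, then measure the drift in pixels
original = (0.435, 0.512, 0.120, 0.300)
pixels = yolo_to_voc(*original, img_w, img_h)
restored = voc_to_yolo(*pixels, img_w, img_h)
scales = (img_w, img_h, img_w, img_h)
errors_px = [abs(r - o) * s for r, o, s in zip(restored, original, scales)]
print(pixels, errors_px)
```

If sub-pixel drift matters for a dataset, replacing `int()` with `round()` is one option to consider, at the cost of output that no longer matches the article's version exactly.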

3. The Complete Script and Its Refinements

3.1 Layout of the Batch Conversion Tool

The following directory structure is recommended:

convert_tool/
├── input/
│   ├── images/      # source images
│   ├── yolo_txt/    # YOLO annotations
│   └── voc_xml/     # VOC annotations
├── output/
├── classes.txt      # class definitions
└── converter.py     # conversion script

3.2 An Enhanced Conversion Script

import os
import cv2
import xml.etree.ElementTree as ET
from tqdm import tqdm  # progress bar


class LabelConverter:
    def __init__(self, class_file):
        # One class name per line; the line index is the YOLO class ID
        with open(class_file) as f:
            self.classes = [line.strip() for line in f.readlines()]

    def txt_to_xml(self, txt_path, img_path, output_dir):
        """Convert one YOLO txt file to a VOC xml file."""
        img = cv2.imread(img_path)
        if img is None:  # cv2.imread returns None instead of raising
            raise FileNotFoundError(f'Cannot read image: {img_path}')
        h, w = img.shape[:2]

        xml_content = []
        xml_content.append('<annotation>')
        xml_content.append(f'<filename>{os.path.basename(img_path)}</filename>')
        xml_content.append('<size>')
        xml_content.append(f'<width>{w}</width>')
        xml_content.append(f'<height>{h}</height>')
        xml_content.append('<depth>3</depth>')
        xml_content.append('</size>')

        with open(txt_path) as f:
            for line in f:
                if not line.strip():  # skip blank lines
                    continue
                class_id, xc, yc, bw, bh = map(float, line.split())
                xmin, ymin, xmax, ymax = self._yolo_to_voc(xc, yc, bw, bh, w, h)
                xml_content.append('<object>')
                xml_content.append(f'<name>{self.classes[int(class_id)]}</name>')
                xml_content.append('<bndbox>')
                xml_content.append(f'<xmin>{xmin}</xmin>')
                xml_content.append(f'<ymin>{ymin}</ymin>')
                xml_content.append(f'<xmax>{xmax}</xmax>')
                xml_content.append(f'<ymax>{ymax}</ymax>')
                xml_content.append('</bndbox>')
                xml_content.append('</object>')

        xml_content.append('</annotation>')

        output_path = os.path.join(
            output_dir,
            os.path.splitext(os.path.basename(txt_path))[0] + '.xml')
        with open(output_path, 'w') as f:
            f.write('\n'.join(xml_content))

    def batch_convert(self, input_dir, output_dir, img_dir, mode='txt2xml'):
        """Entry point for batch conversion."""
        os.makedirs(output_dir, exist_ok=True)
        if mode == 'txt2xml':
            for txt_file in tqdm(os.listdir(input_dir)):
                if txt_file.endswith('.txt'):
                    # Assumes every image is a .jpg with the same base name
                    img_name = os.path.splitext(txt_file)[0] + '.jpg'
                    self.txt_to_xml(
                        os.path.join(input_dir, txt_file),
                        os.path.join(img_dir, img_name),
                        output_dir
                    )
        elif mode == 'xml2txt':
            # The XML-to-TXT direction is not implemented in this listing
            raise NotImplementedError('xml2txt mode is not implemented')

    @staticmethod
    def _yolo_to_voc(xc, yc, bw, bh, img_w, img_h):
        """Core coordinate conversion, clamped to the image bounds."""
        xmin = int((xc - bw/2) * img_w)
        xmax = int((xc + bw/2) * img_w)
        ymin = int((yc - bh/2) * img_h)
        ymax = int((yc + bh/2) * img_h)
        return max(0, xmin), max(0, ymin), min(img_w, xmax), min(img_h, ymax)

Note: the script clamps the converted coordinates so that they never fall outside the image.
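The xml2txt branch of batch_convert is left unimplemented in the listing above. One possible way to fill it in is sketched below as a standalone function. The name `voc_xml_to_yolo_lines`, the six-decimal output format, and the choice to raise on unknown class names are assumptions for this example, not the article's implementation.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_yolo_lines(xml_path, classes):
    """Read one VOC xml file and return its boxes as YOLO-format lines.

    `classes` is the list of class names; a name's index is its class ID.
    """
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    img_w = int(size.findtext("width"))
    img_h = int(size.findtext("height"))

    lines = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in classes:
            raise ValueError(f"unknown class {name!r} in {xml_path}")
        class_id = classes.index(name)
        bnd = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(bnd.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # Same formulas as voc_to_yolo in section 2.2
        xc = (xmin + xmax) / 2 / img_w
        yc = (ymin + ymax) / 2 / img_h
        bw = (xmax - xmin) / img_w
        bh = (ymax - ymin) / img_h
        lines.append(f"{class_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    return lines
```

Because the VOC file already records the image size, this direction does not need to open the image at all. Writing the result is then a matter of joining the returned lines with newlines and saving them under the same base name with a .txt extension.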

4. Troubleshooting in Practice

4.1 Common Errors and Fixes

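Symptom	Likely cause	Fix
Coordinates look wrong after conversion	Image size was read incorrectly	Check that cv2.imread actually loaded the image (it returns None on failure)
Class ID out of range	classes.txt does not match the annotations	Make sure the class file lists classes in the same order used during labeling
File path errors	Relative paths handled incorrectly	Convert paths to absolute ones with os.path.abspath
Out of memory	Batch-processing large images	Process in smaller batches or load images more economically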
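The class ID mismatch in the table can be caught before conversion starts. The following sketch scans annotation lines and reports any class ID that falls outside the range implied by the class list. The function name `find_invalid_class_ids` is an assumption for this example.

```python
def find_invalid_class_ids(lines, num_classes):
    """Return (line_number, class_id) pairs for out-of-range class IDs."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:  # ignore blank lines
            continue
        class_id = int(float(parts[0]))
        if not 0 <= class_id < num_classes:
            problems.append((line_no, class_id))
    return problems


sample = ["0 0.5 0.5 0.2 0.2", "", "3 0.1 0.1 0.05 0.05"]
print(find_invalid_class_ids(sample, num_classes=2))
# [(3, 3)]
```

Running this over every txt file, with num_classes set to the number of entries in classes.txt, gives a list of offending lines instead of an IndexError halfway through a batch.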

4.2 Advanced Debugging Tips

Visual verification: after converting, use the following code to check that the boxes are accurate:

import xml.etree.ElementTree as ET

import matplotlib.pyplot as plt
import matplotlib.patches as patches


def plot_boxes(img_path, xml_path):
    img = plt.imread(img_path)
    fig, ax = plt.subplots(1)
    ax.imshow(img)

    tree = ET.parse(xml_path)
    for obj in tree.findall('object'):
        box = obj.find('bndbox')
        xmin = int(box.find('xmin').text)
        ymin = int(box.find('ymin').text)
        xmax = int(box.find('xmax').text)
        ymax = int(box.find('ymax').text)
        # Rectangle takes the top-left corner, then width and height
        rect = patches.Rectangle(
            (xmin, ymin), xmax-xmin, ymax-ymin,
            linewidth=2, edgecolor='r', facecolor='none')
        ax.add_patch(rect)

    plt.show()

Performance: when processing large datasets:

Use multiple processes to speed things up:

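from multiprocessing import Pool


def parallel_convert(args):
    # Each task is a (converter, txt_path, img_path, output_dir) tuple
    converter, txt, img, out = args
    converter.txt_to_xml(txt, img, out)


if __name__ == '__main__':  # required where workers are spawned (e.g. Windows)
    task_list = []  # fill with the tuples described above
    with Pool(4) as p:  # 4 worker processes
        p.map(parallel_convert, task_list)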
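The multiprocessing snippet assumes a task_list already exists. A minimal sketch of how such a list might be built is shown below: it pairs each txt file with an image of the same base name and reports annotations that have no image. The function name `build_tasks` and the list of accepted extensions are assumptions for this example.

```python
import os


def build_tasks(txt_dir, img_dir, output_dir, exts=(".jpg", ".jpeg", ".png")):
    """Pair each .txt annotation with an image that shares its base name.

    Returns (tasks, missing): tasks is a list of
    (txt_path, img_path, output_dir) tuples, missing lists the txt file
    names for which no image was found.
    """
    tasks, missing = [], []
    for name in sorted(os.listdir(txt_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".txt":
            continue
        for img_ext in exts:
            img_path = os.path.join(img_dir, stem + img_ext)
            if os.path.exists(img_path):
                tasks.append((os.path.join(txt_dir, name), img_path, output_dir))
                break
        else:
            # No extension matched, so record the orphaned annotation
            missing.append(name)
    return tasks, missing
```

To match the four-element tuples that parallel_convert unpacks, the converter can be prepended to each task, for example `task_list = [(converter, *task) for task in tasks]`.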

Stronger error handling: add integrity checks around the conversion call:

import os


# Intended as a method of LabelConverter, hence the self parameter
def safe_convert(self, txt_path, img_path, output_dir):
    try:
        if not os.path.exists(img_path):
            raise FileNotFoundError(f"Missing image: {img_path}")

        # Skip annotation files that are empty
        if os.path.getsize(txt_path) == 0:
            print(f"Warning: Empty annotation {txt_path}")
            return

        # Run the conversion
        self.txt_to_xml(txt_path, img_path, output_dir)

    except Exception as e:
        # Report the failure and append it to a log so the batch can continue
        print(f"Error processing {txt_path}: {str(e)}")
        with open('conversion_errors.log', 'a') as f:
            f.write(f"{txt_path}\t{str(e)}\n")

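5. Further Applications

5.1 Integrating with Other Tools

The conversion script can be built into an annotation pipeline:

LabelImg plugin: modify the LabelImg source so that saving writes both formats automatically
CI/CD pipeline: unify the annotation format automatically before model training
Data augmentation pipeline: convert formats and augment images in the same pass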

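5.2 Supporting More Formats

The script can be extended to cover other popular formats:

COCO JSON: used by frameworks such as MMDetection
TFRecord: the standard TensorFlow format
CSV: a simplified tabular format

Adding a new format only requires implementing the matching coordinate logic. For COCO, for example: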

# Intended as a method of LabelConverter (relies on os, cv2, self.classes
# and self._yolo_to_voc from the script in section 3.2)
def to_coco(self, txt_path, img_path, img_id, ann_id):
    img = cv2.imread(img_path)
    h, w = img.shape[:2]

    with open(txt_path) as f:
        annotations = []
        for line in f:
            class_id, xc, yc, bw, bh = map(float, line.split())
            xmin, ymin, xmax, ymax = self._yolo_to_voc(xc, yc, bw, bh, w, h)
            annotations.append({
                "id": ann_id[0],
                "image_id": img_id,
                "category_id": int(class_id),
                # COCO boxes are [x, y, width, height] in pixels
                "bbox": [xmin, ymin, xmax-xmin, ymax-ymin],
                "area": (xmax-xmin)*(ymax-ymin),
                "iscrowd": 0
            })
            # ann_id is a one-element list used as a counter shared across calls
            ann_id[0] += 1

    return {
        "images": [{
            "id": img_id,
            "width": w,
            "height": h,
            "file_name": os.path.basename(img_path)
        }],
        "annotations": annotations,
        "categories": [
            {"id": i, "name": name}
            for i, name in enumerate(self.classes)
        ]
    }
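Since to_coco returns one dictionary per image, with the category list repeated each time, a dataset-level file needs a merge step. The sketch below shows one way this might look. The function name `merge_coco` is an assumption, and the sample dictionaries are made-up placeholders shaped like the to_coco output.

```python
import json


def merge_coco(parts):
    """Combine per-image COCO dicts into one dataset-level dict."""
    merged = {"images": [], "annotations": [], "categories": []}
    for part in parts:
        merged["images"].extend(part["images"])
        merged["annotations"].extend(part["annotations"])
        # Every part carries the same categories, so keep the first copy only
        if not merged["categories"]:
            merged["categories"] = list(part["categories"])
    return merged


# Placeholder inputs shaped like the dictionaries returned by to_coco
categories = [{"id": 0, "name": "person"}]
part_a = {
    "images": [{"id": 1, "width": 1920, "height": 1080, "file_name": "a.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 0,
                     "bbox": [500, 300, 200, 500], "area": 100000, "iscrowd": 0}],
    "categories": categories,
}
part_b = {
    "images": [{"id": 2, "width": 1920, "height": 1080, "file_name": "b.jpg"}],
    "annotations": [],
    "categories": categories,
}

dataset = merge_coco([part_a, part_b])
coco_json = json.dumps(dataset, indent=2)
print(len(dataset["images"]), len(dataset["annotations"]), len(dataset["categories"]))
# 2 1 1
```

The shared ann_id counter in to_coco keeps annotation IDs unique across images, which is what makes a plain concatenation like this safe.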