Want to Get Hands-On with Human Pose Estimation? Parsing the LSP Dataset from Scratch in Python (Full Code Included) - 平芜编程栈

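Parsing the LSP Dataset from Scratch: Hands-On Data Preprocessing for Human Pose Estimation in Python

Human pose estimation has long been one of the active research directions in computer vision. The first step into the field is usually not building a complex neural network but learning how to handle a raw dataset. Leeds Sports Pose (LSP) is a classic benchmark for human pose estimation: it contains 2000 images of sports scenes, each annotated with 14 keypoints, which makes it good practice material for beginners. Once you actually download it, though, the MATLAB-format .mat file and the directory layout can be confusing.

This article walks through parsing the LSP dataset in Python step by step, from reading the raw data, through visual verification, to converting it into the more widely used JSON format. Rather than simply introducing the dataset, it focuses on the practical question of what to do first once you have the data, with reusable code and a clear line of reasoning. Whether you are a student new to computer vision or a developer extending your skill set, this worked example covers the key data-handling skills.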

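1. Understanding the LSP dataset structure

After extraction, the LSP dataset usually contains the following files and directories:

LSP_dataset/
├── images/        # original images (2000 JPEGs)
├── visualized/    # images with the annotations drawn on them (2000)
├── joints.mat     # annotation data in MATLAB format
└── README.txt     # dataset description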

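The keypoint annotations are stored in joints.mat, a binary MATLAB file that contains a 3×14×2000 matrix, where:

The first dimension (3) holds the x coordinate, the y coordinate, and visibility (0 = not visible, 1 = visible)
The second dimension (14) indexes the 14 body keypoints
The third dimension (2000) indexes the 2000 images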

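The 14 keypoints always appear in this order:

Right ankle
Right knee
Right hip
Left hip
Left knee
Left ankle
Right wrist
Right elbow
Right shoulder
Left shoulder
Left elbow
Left wrist
Neck
Head top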
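Because the code in this article addresses keypoints by 0-based index, it can help to keep the order above in a small lookup table. The sketch below is only a convenience: the English identifiers are the ones used in the JSON export later in the article, while `LSP_JOINT_INDEX` is a hypothetical helper name, not something shipped with the dataset.

```python
# 0-based joint indices, in the order listed above.
LSP_JOINT_NAMES = [
    "right_ankle", "right_knee", "right_hip",
    "left_hip", "left_knee", "left_ankle",
    "right_wrist", "right_elbow", "right_shoulder",
    "left_shoulder", "left_elbow", "left_wrist",
    "neck", "head_top",
]

# Reverse lookup: joint name -> index (hypothetical helper for illustration).
LSP_JOINT_INDEX = {name: idx for idx, name in enumerate(LSP_JOINT_NAMES)}

print(LSP_JOINT_INDEX["neck"])   # 12
print(LSP_JOINT_NAMES[13])       # head_top
```

With such a table, an expression like `keypoints[LSP_JOINT_INDEX["left_knee"]]` reads more clearly than `keypoints[4]`.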

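2. Environment setup and dependencies

Before processing the data, set up a Python environment and install the required libraries. Python 3.7+ is recommended, in a clean virtual environment:

python -m venv lsp_env
source lsp_env/bin/activate  # Linux/Mac
# or: lsp_env\Scripts\activate  # Windows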

Install the dependencies:

pip install numpy scipy matplotlib opencv-python tqdm

These libraries are used as follows:

numpy: multi-dimensional array handling
scipy: reading MATLAB-format files
matplotlib: visualization
opencv-python: image processing
tqdm: progress bars
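One easy stumbling block is that a pip package name and its import name can differ: opencv-python is imported as `cv2`. The sketch below is a quick, optional check that every dependency can be found in the active environment. It uses only the standard library; `REQUIRED` and `check_dependencies` are hypothetical names chosen for this example.

```python
import importlib.util

# pip package name -> module name used in `import` statements
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "opencv-python": "cv2",
    "tqdm": "tqdm",
}


def check_dependencies(required=REQUIRED):
    """Return {pip_name: bool}, True when the module can be located."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in required.items()
    }


status = check_dependencies()
for pip_name, found in status.items():
    print(f"{pip_name:15s} {'ok' if found else 'MISSING'}")
```

Anything reported as MISSING can be installed with the pip command above.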

3. Parsing the MATLAB annotation file

The core of reading a .mat file in Python is the scipy.io.loadmat function. The following script extracts the keypoint information:

import numpy as np
from scipy.io import loadmat


def parse_lsp_mat(mat_path):
    """
    Parse the joints.mat file of the LSP dataset.

    Args:
        mat_path: path to joints.mat

    Returns:
        joints_array: numpy array of shape (2000, 14, 3):
            2000 images, 14 keypoints, (x, y, visibility) per keypoint
    """
    mat_data = loadmat(mat_path)
    joints = mat_data['joints']  # annotation matrix, shape (3, 14, 2000)

    # Reorder the axes to (image, keypoint, x/y/visibility)
    joints_array = np.transpose(joints, (2, 1, 0))
    return joints_array

Call the function and inspect the data:

joints_array = parse_lsp_mat('LSP_dataset/joints.mat')
print(f"Dataset shape: {joints_array.shape}")
print("First keypoint of the first image (x, y, visibility):")
print(joints_array[0, 0])
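Beyond printing the shape, a few summary statistics catch most parsing mistakes early, such as swapped axes or an unexpected flag encoding. The sketch below is a minimal check, with `summarize_joints` a hypothetical helper name. It runs on a synthetic array laid out like joints.mat so it works without the dataset; in practice you would pass the real `joints_array`. The meaning of the third value (0 or 1) is taken from the article, so it is worth comparing the counts and a few plotted images against README.txt and the visualized/ folder for your copy of the data.

```python
import numpy as np


def summarize_joints(joints_array):
    """Return basic statistics for an array of shape (num_images, 14, 3)."""
    if joints_array.ndim != 3 or joints_array.shape[1:] != (14, 3):
        raise ValueError(f"unexpected shape: {joints_array.shape}")
    xs, ys, flags = joints_array[:, :, 0], joints_array[:, :, 1], joints_array[:, :, 2]
    values, counts = np.unique(flags, return_counts=True)
    return {
        "num_images": int(joints_array.shape[0]),
        "x_range": (float(xs.min()), float(xs.max())),
        "y_range": (float(ys.min()), float(ys.max())),
        # How often each distinct flag value occurs
        "flag_counts": {float(v): int(c) for v, c in zip(values, counts)},
    }


# Synthetic stand-in with the joints.mat layout (3, 14, N); values are made up.
rng = np.random.default_rng(0)
num_images = 5
raw = np.empty((3, 14, num_images))
raw[0] = rng.uniform(0, 200, size=(14, num_images))  # x
raw[1] = rng.uniform(0, 200, size=(14, num_images))  # y
raw[2] = rng.integers(0, 2, size=(14, num_images))   # flag, 0 or 1

fake_joints = np.transpose(raw, (2, 1, 0))  # same reordering as parse_lsp_mat
summary = summarize_joints(fake_joints)
print(summary)
```

If `flag_counts` contains values other than 0 and 1, or the coordinate ranges look far larger than the images, the axes were probably reordered incorrectly.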

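4. Verifying the annotations visually

The best way to make sure the data was parsed correctly is to draw the keypoints on the original image. Here is a visualization function: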

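import cv2
import matplotlib.pyplot as plt


def visualize_keypoints(img_path, keypoints, visibility_threshold=0.5):
    """
    Show an image with its keypoints and skeleton.

    Args:
        img_path: path to the image
        keypoints: coordinates and visibility of the 14 keypoints, shape (14, 3)
        visibility_threshold: visibility threshold
    """
    img = cv2.imread(img_path)
    if img is None:  # cv2.imread returns None for a missing or unreadable file
        raise FileNotFoundError(f"Cannot read image: {img_path}")
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert BGR to RGB for matplotlib

    plt.figure(figsize=(10, 10))
    plt.imshow(img)

    # Keypoint connections (skeleton)
    skeleton = [
        (13, 12),  # head top to neck
        (12, 8),   # neck to right shoulder
        (8, 7),    # right shoulder to right elbow
        (7, 6),    # right elbow to right wrist
        (12, 9),   # neck to left shoulder
        (9, 10),   # left shoulder to left elbow
        (10, 11),  # left elbow to left wrist
        (12, 2),   # neck to right hip
        (2, 1),    # right hip to right knee
        (1, 0),    # right knee to right ankle
        (12, 3),   # neck to left hip
        (3, 4),    # left hip to left knee
        (4, 5)     # left knee to left ankle
    ]

    # Draw the keypoints
    for i, (x, y, vis) in enumerate(keypoints):
        if vis > visibility_threshold:
            plt.scatter(x, y, color='red', s=50)
            plt.text(x, y, str(i), color='white', fontsize=8)

    # Draw the skeleton lines
    for (i, j) in skeleton:
        if (keypoints[i, 2] > visibility_threshold and
                keypoints[j, 2] > visibility_threshold):
            plt.plot(
                [keypoints[i, 0], keypoints[j, 0]],
                [keypoints[i, 1], keypoints[j, 1]],
                linewidth=2, color='green'
            )

    plt.axis('off')
    plt.show()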

Usage example:

# Visualize the first image
img_idx = 0
img_path = f"LSP_dataset/images/im{img_idx+1:04d}.jpg"
keypoints = joints_array[img_idx]
visualize_keypoints(img_path, keypoints)
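The `img_idx+1` in the file name is easy to get wrong: array indices start at 0, while the image files are numbered from im0001.jpg. A pair of small helpers keeps that off-by-one in a single place. This is only a sketch; `lsp_image_name` and `lsp_image_index` are hypothetical names introduced here.

```python
def lsp_image_name(img_idx):
    """0-based array index -> file name, e.g. 0 -> 'im0001.jpg'."""
    if img_idx < 0:
        raise ValueError("img_idx must be non-negative")
    return f"im{img_idx + 1:04d}.jpg"


def lsp_image_index(file_name):
    """File name -> 0-based array index, e.g. 'im0001.jpg' -> 0."""
    if not (file_name.startswith("im") and file_name.endswith(".jpg")):
        raise ValueError(f"unexpected file name: {file_name}")
    return int(file_name[len("im"):-len(".jpg")]) - 1


print(lsp_image_name(0))              # im0001.jpg
print(lsp_image_name(1999))           # im2000.jpg
print(lsp_image_index("im0042.jpg"))  # 41
```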

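5. Converting to a general-purpose format

To make the data easier to use across frameworks, we can convert it to JSON. The following function converts the whole dataset: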

import json
import os

import cv2
from tqdm import tqdm


def convert_lsp_to_json(dataset_dir, output_path):
    """
    Convert the LSP dataset to JSON.

    Args:
        dataset_dir: dataset root directory
        output_path: path of the JSON file to write
    """
    # Parse the MATLAB file
    mat_path = os.path.join(dataset_dir, "joints.mat")
    joints_array = parse_lsp_mat(mat_path)

    # Prepare the JSON structure
    data = {
        "info": {
            "dataset": "Leeds Sports Pose (LSP)",
            "keypoints": [
                "right_ankle", "right_knee", "right_hip",
                "left_hip", "left_knee", "left_ankle",
                "right_wrist", "right_elbow", "right_shoulder",
                "left_shoulder", "left_elbow", "left_wrist",
                "neck", "head_top"
            ],
            "skeleton": [
                [13, 12], [12, 8], [8, 7], [7, 6],
                [12, 9], [9, 10], [10, 11],
                [12, 2], [2, 1], [1, 0],
                [12, 3], [3, 4], [4, 5]
            ]
        },
        "images": []
    }

    # Iterate over all images
    image_dir = os.path.join(dataset_dir, "images")
    total_images = joints_array.shape[0]
    for img_idx in tqdm(range(total_images), desc="Processing images"):
        img_name = f"im{img_idx+1:04d}.jpg"
        img_path = os.path.join(image_dir, img_name)

        # Read the image to get its size
        img = cv2.imread(img_path)
        if img is None:  # cv2.imread returns None for a missing or unreadable file
            raise FileNotFoundError(f"Cannot read image: {img_path}")
        height, width = img.shape[:2]

        # Flatten the keypoints to [x1, y1, vis1, x2, y2, vis2, ...]
        keypoints = []
        for x, y, vis in joints_array[img_idx]:
            keypoints.extend([float(x), float(y), float(vis)])

        # Append to the JSON structure
        data["images"].append({
            "file_name": img_name,
            "width": width,
            "height": height,
            "keypoints": keypoints
        })

    # Save as JSON
    with open(output_path, "w") as f:
        json.dump(data, f, indent=2)
    print(f"Conversion finished; result saved to {output_path}")

Call the function:

convert_lsp_to_json("LSP_dataset", "lsp_dataset.json")

The resulting JSON file has this structure:

{
  "info": {
    "dataset": "Leeds Sports Pose (LSP)",
    "keypoints": [
      "right_ankle", "right_knee", "right_hip",
      "left_hip", "left_knee", "left_ankle",
      "right_wrist", "right_elbow", "right_shoulder",
      "left_shoulder", "left_elbow", "left_wrist",
      "neck", "head_top"
    ],
    "skeleton": [
      [13, 12], [12, 8], [8, 7], [7, 6],
      [12, 9], [9, 10], [10, 11],
      [12, 2], [2, 1], [1, 0],
      [12, 3], [3, 4], [4, 5]
    ]
  },
  "images": [
    {
      "file_name": "im0001.jpg",
      "width": 202,
      "height": 202,
      "keypoints": [x1,y1,vis1, x2,y2,vis2, ..., x14,y14,vis14]
    },
    ...
  ]
}
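Since the keypoints are stored as one flat list per image, a consumer has to regroup them into (x, y, flag) triples. The sketch below shows one way to do that with the standard library only. The sample document is shortened to two keypoints with made-up values, and `unflatten_keypoints` is a hypothetical helper name.

```python
import json


def unflatten_keypoints(flat):
    """[x1, y1, v1, x2, y2, v2, ...] -> [(x1, y1, v1), (x2, y2, v2), ...]"""
    if len(flat) % 3 != 0:
        raise ValueError("keypoint list length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


# Same layout as the file above, shortened to two keypoints (values are made up).
sample = json.loads("""
{
  "info": {"keypoints": ["right_ankle", "right_knee"]},
  "images": [
    {"file_name": "im0001.jpg", "width": 202, "height": 202,
     "keypoints": [30.5, 180.0, 1.0, 35.0, 140.25, 0.0]}
  ]
}
""")

names = sample["info"]["keypoints"]
for entry in sample["images"]:
    triples = unflatten_keypoints(entry["keypoints"])
    for name, (x, y, flag) in zip(names, triples):
        print(f"{entry['file_name']} {name}: x={x}, y={y}, flag={flag}")
```

For the real file, replace `json.loads(...)` with `json.load` on the opened lsp_dataset.json.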

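6. Data augmentation and preprocessing tips

Feeding the raw data straight into training may not work well. Here are a few practical preprocessing techniques:

6.1 Keypoint normalization

Normalizing keypoint coordinates to the [0, 1] range makes the model independent of absolute image size: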

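def normalize_keypoints(keypoints, img_width, img_height):
    """
    Normalize keypoint coordinates.

    Args:
        keypoints: original keypoint array of shape (14, 3)
        img_width: image width
        img_height: image height

    Returns:
        the normalized keypoint array
    """
    # Float copy, so an integer input array also divides correctly
    normalized = keypoints.astype(float)
    normalized[:, 0] /= img_width   # normalize x
    normalized[:, 1] /= img_height  # normalize y
    return normalized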
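A model trained on normalized coordinates also predicts normalized coordinates, so the inverse mapping is needed before drawing predictions on an image. The sketch below shows that inverse and a round-trip check. `denormalize_keypoints` is a hypothetical name, and the sample values and image size are made up.

```python
import numpy as np


def denormalize_keypoints(normalized, img_width, img_height):
    """Map [0, 1] coordinates back to pixels; the third column is left untouched."""
    restored = np.array(normalized, dtype=float)  # np.array copies by default
    restored[:, 0] *= img_width
    restored[:, 1] *= img_height
    return restored


# Two made-up keypoints in a hypothetical 202x160 image
keypoints = np.array([[50.0, 100.0, 1.0],
                      [101.0, 20.5, 0.0]])
width, height = 202, 160

normalized = keypoints.copy()
normalized[:, 0] /= width
normalized[:, 1] /= height

restored = denormalize_keypoints(normalized, width, height)
print(np.allclose(restored, keypoints))  # True
```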

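6.2 Augmentation example: random horizontal flip

Horizontal flipping is a common augmentation in pose estimation, but left and right keypoints must be swapped accordingly: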

def horizontal_flip(image, keypoints):
    """
    Flip an image and its keypoints horizontally.

    Args:
        image: original image (numpy array)
        keypoints: original keypoint array of shape (14, 3)

    Returns:
        flipped_image: the flipped image
        flipped_keypoints: the flipped keypoints
    """
    # Flip the image
    flipped_image = cv2.flip(image, 1)

    # Index pairs of left/right symmetric keypoints
    left_right_pairs = [
        (0, 5),   # right ankle <-> left ankle
        (1, 4),   # right knee <-> left knee
        (2, 3),   # right hip <-> left hip
        (6, 11),  # right wrist <-> left wrist
        (7, 10),  # right elbow <-> left elbow
        (8, 9)    # right shoulder <-> left shoulder
    ]

    # Create the flipped keypoint array
    flipped_keypoints = keypoints.copy()
    width = image.shape[1]

    # Symmetric keypoints: swap the pair, then mirror x
    for i, j in left_right_pairs:
        flipped_keypoints[i], flipped_keypoints[j] = keypoints[j].copy(), keypoints[i].copy()
        flipped_keypoints[i, 0] = width - keypoints[j, 0]  # mirror the x coordinate
        flipped_keypoints[j, 0] = width - keypoints[i, 0]

    # Unpaired keypoints (neck and head top): mirror x only
    flipped_keypoints[12, 0] = width - keypoints[12, 0]  # neck
    flipped_keypoints[13, 0] = width - keypoints[13, 0]  # head top

    return flipped_image, flipped_keypoints
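A flip is easy to test because applying it twice must give back the original keypoints. The sketch below isolates the keypoint part of the flip (no image, so no OpenCV needed) and checks that property on random data. It keeps the same `width - x` convention as `horizontal_flip` above. `flip_keypoints` and `LEFT_RIGHT_PAIRS` are hypothetical names for this example.

```python
import numpy as np

# Same left/right index pairs as in horizontal_flip
LEFT_RIGHT_PAIRS = [(0, 5), (1, 4), (2, 3), (6, 11), (7, 10), (8, 9)]


def flip_keypoints(keypoints, width):
    """Mirror x for all keypoints, then swap each left/right pair of rows."""
    flipped = keypoints.copy()
    flipped[:, 0] = width - keypoints[:, 0]
    for i, j in LEFT_RIGHT_PAIRS:
        flipped[[i, j]] = flipped[[j, i]]  # swap whole rows (x, y, flag)
    return flipped


rng = np.random.default_rng(1)
width = 200
keypoints = np.column_stack([
    rng.uniform(0, width, 14),   # x
    rng.uniform(0, 150, 14),     # y
    rng.integers(0, 2, 14),      # flag
]).astype(float)

once = flip_keypoints(keypoints, width)
twice = flip_keypoints(once, width)

print(np.allclose(twice, keypoints))  # True: flipping twice is the identity
```

After one flip, index 0 (right ankle) holds the mirrored position of what was the left ankle, which is exactly what a model trained on the flipped image should see.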

6.3 Creating a PyTorch dataset class

To use the LSP dataset conveniently in PyTorch, we can write a custom Dataset class:

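import json
import os

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset


class LSPDataset(Dataset):
    def __init__(self, json_path, transform=None,
                 image_dir=os.path.join("LSP_dataset", "images")):
        """
        Initialize the LSP dataset.

        Args:
            json_path: path to the converted JSON file
            transform: augmentation callable taking and returning (image, keypoints)
            image_dir: directory that contains the image files
        """
        with open(json_path) as f:
            self.data = json.load(f)
        self.transform = transform
        self.image_dir = image_dir
        self.keypoint_names = self.data["info"]["keypoints"]
        self.skeleton = self.data["info"]["skeleton"]

    def __len__(self):
        return len(self.data["images"])

    def __getitem__(self, idx):
        img_info = self.data["images"][idx]
        img_path = os.path.join(self.image_dir, img_info["file_name"])

        # Load the image
        image = cv2.imread(img_path)
        if image is None:  # cv2.imread returns None for a missing or unreadable file
            raise FileNotFoundError(f"Cannot read image: {img_path}")
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        # Keypoints, shape [14, 3]
        keypoints = np.array(img_info["keypoints"]).reshape(-1, 3)

        # Apply the transform
        if self.transform:
            image, keypoints = self.transform(image, keypoints)

        # Convert to torch tensors (HWC uint8 -> CHW float in [0, 1]);
        # ascontiguousarray guards against negative strides left by a transform
        image = torch.from_numpy(np.ascontiguousarray(image)).permute(2, 0, 1).float() / 255.0
        keypoints = torch.from_numpy(np.ascontiguousarray(keypoints)).float()
        return image, keypoints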

Usage example:

dataset = LSPDataset("lsp_dataset.json")
print(f"Dataset size: {len(dataset)}")

# Fetch the first sample
image, keypoints = dataset[0]
print(f"Image shape: {image.shape}")
print(f"Keypoints shape: {keypoints.shape}")
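`LSPDataset` calls its `transform` with two arguments and expects an (image, keypoints) pair back, so several augmentations have to be chained by something that passes both values along. A minimal sketch of such a chain follows; `Compose` here is a small class written for this example, and the toy transforms operate on plain Python values purely to show the call order.

```python
class Compose:
    """Chain transforms that each take and return an (image, keypoints) pair."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, image, keypoints):
        for transform in self.transforms:
            image, keypoints = transform(image, keypoints)
        return image, keypoints


# Toy transforms on plain values, only to demonstrate the order of application.
def add_one(image, keypoints):
    return image + 1, keypoints


def double_keypoints(image, keypoints):
    return image, [k * 2 for k in keypoints]


pipeline = Compose([add_one, double_keypoints])
result = pipeline(1, [1, 2, 3])
print(result)  # (2, [2, 4, 6])
```

With the functions from this article, something like `LSPDataset("lsp_dataset.json", transform=Compose([horizontal_flip, resize_image_and_keypoints]))` would fit the same interface, since both take and return an image together with its keypoints.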

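7. Common problems and solutions

The following problems may come up when working with the LSP dataset:

7.1 Handling keypoint visibility

The visibility flags in the LSP dataset are sometimes inaccurate. A suggested strategy:

Training: set the (x, y) coordinates of invisible keypoints to zero and weight the loss by the visibility flag
Evaluation: compute accuracy on visible keypoints only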

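def prepare_for_training(keypoints):
    """
    Prepare keypoint data for training.

    Args:
        keypoints: original keypoint array of shape (14, 3)

    Returns:
        processed: keypoints with the coordinates of invisible points set to zero
    """
    processed = keypoints.copy()
    invisible = processed[:, 2] < 0.5  # visibility threshold
    processed[invisible, :2] = 0       # zero out coordinates of invisible points
    return processed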
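The two bullet points above, weighting the loss by visibility and scoring only visible keypoints, can be sketched in NumPy as follows. This is an illustration under stated assumptions, not a standard benchmark metric: it treats a flag above 0.5 as visible (as the article does), uses a plain mean squared error, and measures accuracy with a fixed pixel threshold. `masked_mse` and `visible_accuracy` are hypothetical names.

```python
import numpy as np


def masked_mse(pred_xy, target, vis_threshold=0.5):
    """Mean squared error over visible keypoints only.

    pred_xy: (K, 2) predicted coordinates; target: (K, 3) with (x, y, flag).
    """
    weights = (target[:, 2] > vis_threshold).astype(float)   # 1 = visible, 0 = ignored
    sq_err = np.sum((pred_xy - target[:, :2]) ** 2, axis=1)  # squared error per keypoint
    n_visible = weights.sum()
    if n_visible == 0:
        return 0.0  # nothing to learn from this sample
    return float(np.sum(weights * sq_err) / n_visible)


def visible_accuracy(pred_xy, target, max_dist, vis_threshold=0.5):
    """Fraction of visible keypoints predicted within max_dist pixels."""
    visible = target[:, 2] > vis_threshold
    if not visible.any():
        return float("nan")
    dist = np.linalg.norm(pred_xy[visible] - target[visible, :2], axis=1)
    return float(np.mean(dist <= max_dist))


# Three made-up keypoints; the second one is flagged as not visible.
target = np.array([[10.0, 10.0, 1.0],
                   [20.0, 20.0, 0.0],
                   [30.0, 30.0, 1.0]])
pred = np.array([[10.0, 13.0],
                 [0.0, 0.0],
                 [34.0, 33.0]])

loss = masked_mse(pred, target)
acc = visible_accuracy(pred, target, max_dist=4.0)
print(loss)  # 17.0 = (9 + 25) / 2, the invisible point contributes nothing
print(acc)   # 0.5: distances are 3 and 5 pixels, only the first is within 4
```

In a PyTorch training loop the same idea applies with tensors: multiply the per-keypoint loss by the visibility mask before averaging.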

7.2 Inconsistent image sizes

LSP images are similar in size, but not identical. Resizing them to a common size is recommended:

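def resize_image_and_keypoints(image, keypoints, target_size=(256, 256)):
    """
    Resize an image and scale its keypoint coordinates accordingly.

    Args:
        image: original image
        keypoints: original keypoint array of shape (14, 3)
        target_size: target size as (width, height)

    Returns:
        resized_image: the resized image
        resized_keypoints: the rescaled keypoints
    """
    h, w = image.shape[:2]
    new_w, new_h = target_size

    # Resize the image (cv2.resize takes the size as (width, height))
    resized_image = cv2.resize(image, target_size)

    # Scale the keypoint coordinates; float copy so integer input scales correctly
    resized_keypoints = keypoints.astype(float)
    resized_keypoints[:, 0] = keypoints[:, 0] * (new_w / w)
    resized_keypoints[:, 1] = keypoints[:, 1] * (new_h / h)

    return resized_image, resized_keypoints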

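7.3 Data imbalance

The LSP dataset covers several sports categories, and some poses may be underrepresented. Possible remedies:

Oversampling: duplicate samples of the minority categories
Data augmentation: apply more aggressive augmentation to the minority categories
Class weighting: assign different loss weights to different categories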

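import json


def analyze_pose_distribution(json_path):
    """
    Analyze the pose distribution (simplified).

    Args:
        json_path: path to the converted JSON file
    """
    with open(json_path) as f:
        data = json.load(f)

    # Simplified analysis; a real one should classify by image content,
    # for example by deriving the pose type from keypoint angles
    print("Total samples:", len(data["images"]))
    print("Keypoint names:", data["info"]["keypoints"])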
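The comment in the function above mentions judging pose type from keypoint angles. As a starting point, the sketch below computes the angle at one joint from three keypoints, here the right knee via right hip (index 2), right knee (1), and right ankle (0). It uses only the standard library. `joint_angle` and `right_knee_angle` are hypothetical helpers, the coordinates are made up, and any threshold for calling a leg "bent" would be your own assumption.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        return float("nan")  # undefined when two points coincide
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


def right_knee_angle(keypoints):
    """keypoints: sequence of (x, y, flag) in LSP order."""
    hip, knee, ankle = keypoints[2], keypoints[1], keypoints[0]
    return joint_angle(hip[:2], knee[:2], ankle[:2])


# Made-up coordinates: a straight leg and a leg bent at a right angle.
straight = [(100, 200, 1), (100, 150, 1), (100, 100, 1)]  # ankle, knee, hip
bent = [(150, 150, 1), (100, 150, 1), (100, 100, 1)]

print(right_knee_angle(straight))
print(right_knee_angle(bent))
```

Collecting such angles over all images gives a rough picture of which leg and arm configurations are common and which are rare.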

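In my experience with the LSP dataset, the visualization stage is where problems surface most easily, especially when keypoint coordinates fall outside the image, in which case the plot can come out wrong. A practical debugging trick is to add a bounds check: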

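def safe_visualize(img_path, keypoints, img_size=None):
    """
    Visualization with a bounds check.

    Args:
        img_path: path to the image
        keypoints: keypoint array of shape (14, 3)
        img_size: image size as (width, height); read from the file if None
    """
    if img_size is None:
        img = cv2.imread(img_path)
        if img is None:  # cv2.imread returns None for a missing or unreadable file
            raise FileNotFoundError(f"Cannot read image: {img_path}")
        img_size = (img.shape[1], img.shape[0])

    # Check which keypoints fall inside the image
    in_bounds = np.logical_and.reduce([
        keypoints[:, 0] >= 0,
        keypoints[:, 0] < img_size[0],
        keypoints[:, 1] >= 0,
        keypoints[:, 1] < img_size[1],
    ])
    visible = keypoints[:, 2] > 0.5

    # Count only visible keypoints that lie outside the image
    n_outside = int(np.sum(visible & ~in_bounds))
    if n_outside > 0:
        print(f"Warning: {n_outside} visible keypoints of {img_path} lie outside the image")

    # Call the original visualization function
    visualize_keypoints(img_path, keypoints)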