Python小白的数学建模课-07.选址问题实战：从P中位到集合覆盖的PuLP求解-平芜编程栈

1. 选址问题入门：从生活场景到数学模型

选址问题听起来高大上，其实在我们生活中随处可见。比如你家楼下要开一家便利店，老板就得考虑开在哪里能覆盖最多小区居民；疫情期间新建方舱医院，政府需要规划位置让病患能够快速到达；就连你玩《模拟城市》游戏，也得琢磨着把警察局、医院放在哪儿最合理。这些本质上都是选址问题。

作为数学建模中的经典题型，选址问题通常具备四个关键要素：

设施：就是你要布局的东西，比如消防站、仓库、5G基站
区域：设施和服务对象所在的物理范围
距离：不一定是实际距离，也可以是时间、成本等度量
目标：要优化的方向，比如总成本最低、响应时间最快

我第一次接触这类问题时，被各种专业术语搞得头晕。后来发现只要把握住一个核心：选址问题就是在特定约束下，找到让某个目标最优的设施位置安排。举个例子，某外卖平台要在杭州新增5个配送站，希望让全市骑手平均配送时间最短——这就是典型的P-中位问题。

2. 三大经典模型详解与适用场景

2.1 P-中位问题：精打细算的总成本控制

去年我参与过一个社区菜鸟驿站的选址项目，要求从15个候选点选出3个，让所有住户取件路程总和最小。这正是P-中位问题的典型应用。

数学模型的关键点在于：

决策变量设置：

x_j = 1 # 候选点j被选中 y_ij = 1 # 需求点i由候选点j服务

目标函数：最小化加权距离总和
核心约束：
- 只能选P个设施
- 每个需求点必须被服务
- 只有被选中的点才能提供服务

用PuLP建模时，我发现用字典推导式定义变量特别方便：

# 定义0-1变量 x = pulp.LpVariable.dicts("选址", candidate_points, cat="Binary") y = pulp.LpVariable.dicts("服务", [(i,j) for i in demand_points for j in candidate_points], cat="Binary")

2.2 P-中心问题：雪中送炭的公平性考量

疫情期间某市要新增3个核酸检测点，要求最远居民到达时间不超过30分钟。这种"保证最坏情况最好"的需求，就是P-中心问题的用武之地。

与P-中位的主要区别：

目标函数变为最小化最大距离
需要引入辅助变量D表示最大距离
每个需求点的服务距离都不能超过D

实际编程时要注意：

# 添加最大距离变量 D = pulp.LpVariable("最大距离", lowBound=0) # 约束条件要确保所有距离≤D for i in demand_points: prob += pulp.lpSum([distance[i][j] * y[(i,j)] for j in candidate_points]) <= D

2.3 集合覆盖问题：应急场景的保底方案

消防站选址是集合覆盖问题的经典案例。某区要在8个候选点中选最少的消防站，确保所有小区能在10分钟内获得救援。

这类问题的特点是：

事先定义覆盖范围（如10分钟车程）
目标是最少设施数量
约束条件要求每个需求点至少被一个设施覆盖

PuLP建模技巧：

# 创建覆盖参数矩阵 cover = { (i,j): 1 if distance[i][j] <= 10 else 0 for i in demand_points for j in candidate_points } # 添加覆盖约束 for i in demand_points: prob += pulp.lpSum([x[j] * cover[(i,j)] for j in candidate_points]) >= 1

3. PuLP实战：从数据准备到模型求解

3.1 数据准备与预处理

以某物流企业实际案例为例，我们需要：

收集需求点坐标和货物量
测量候选仓库位置
计算距离矩阵（考虑实际路网）

import pandas as pd from geopy.distance import geodesic # 读取Excel数据 demand_df = pd.read_excel("demand_points.xlsx") candidate_df = pd.read_excel("candidate_points.xlsx") # 计算距离矩阵 distance = { (i,j): geodesic((di.lat,di.lng), (cj.lat,cj.lng)).km for i, di in demand_df.iterrows() for j, cj in candidate_df.iterrows() }

3.2 完整建模流程示范

下面展示一个P-中位问题的完整求解代码：

import pulp # 初始化问题 prob = pulp.LpProblem("Warehouse_Location", pulp.LpMinimize) # 定义变量 x = pulp.LpVariable.dicts("candidate", candidate_df.index, cat="Binary") y = pulp.LpVariable.dicts("service", [(i,j) for i in demand_df.index for j in candidate_df.index], cat="Binary") # 目标函数：最小化加权距离 prob += pulp.lpSum([ demand_df.loc[i,"weight"] * distance[(i,j)] * y[(i,j)] for i in demand_df.index for j in candidate_df.index ]) # 约束条件 # 1. 只能选P个仓库 prob += pulp.lpSum([x[j] for j in candidate_df.index]) == 3 # 2. 每个需求点必须被服务 for i in demand_df.index: prob += pulp.lpSum([y[(i,j)] for j in candidate_df.index]) == 1 # 3. 只有被选中的仓库才能提供服务 for i in demand_df.index: for j in candidate_df.index: prob += y[(i,j)] <= x[j] # 求解 prob.solve() # 输出结果 print("Status:", pulp.LpStatus[prob.status]) for j in candidate_df.index: if x[j].varValue > 0: print(f"仓库 {j} 被选中")

3.3 结果分析与可视化

求解完成后，我们可以用matplotlib展示选址结果：

import matplotlib.pyplot as plt # 绘制底图 plt.figure(figsize=(10,8)) plt.scatter(demand_df.lng, demand_df.lat, c="blue", label="需求点") # 标记选中的仓库 selected = [j for j in candidate_df.index if x[j].varValue > 0] plt.scatter(candidate_df.loc[selected, "lng"], candidate_df.loc[selected, "lat"], c="red", marker="*", s=200, label="选中仓库") # 绘制服务关系 for i in demand_df.index: for j in candidate_df.index: if y[(i,j)].varValue > 0.9: plt.plot([demand_df.loc[i,"lng"], candidate_df.loc[j,"lng"]], [demand_df.loc[i,"lat"], candidate_df.loc[j,"lat"]], "g--", alpha=0.3) plt.legend() plt.title("仓库选址优化结果") plt.show()

4. 避坑指南与性能优化

4.1 常见错误排查

距离矩阵计算错误：
- 确保使用合适的地理距离计算方法
- 检查坐标单位是否统一（经纬度还是平面坐标）
模型无可行解：
- 检查约束条件是否矛盾
- 确认P值设置是否合理（比如P太小导致无法覆盖所有需求点）
求解时间过长：
- 尝试先放松整数约束求解线性规划
- 考虑使用启发式算法预处理

4.2 大规模问题优化技巧

当遇到几百个候选点时，可以尝试：

预处理减少变量：

# 只保留能覆盖至少一个需求点的候选点 valid_candidates = { j for j in candidate_df.index if any(distance[(i,j)] <= max_distance for i in demand_df.index) }

使用稀疏矩阵存储：

from scipy.sparse import dok_matrix # 创建稀疏距离矩阵 distance_matrix = dok_matrix((len(demand_df), len(candidate_df)), dtype=float) for (i,j), dist in distance.items(): if dist <= 30: # 只存储有效距离 distance_matrix[i,j] = dist

并行计算加速：

from multiprocessing import Pool def calculate_distance(args): i, di, j, cj = args return (i,j), geodesic((di.lat,di.lng), (cj.lat,cj.lng)).km with Pool(processes=4) as pool: args = [(i,di,j,cj) for i,di in demand_df.iterrows() for j,cj in candidate_df.iterrows()] distance = dict(pool.map(calculate_distance, args))

4.3 模型扩展与变种

实际项目中，我们经常需要处理更复杂的情况：

带容量约束的选址：

# 添加容量约束 for j in candidate_df.index: prob += pulp.lpSum([ demand_df.loc[i,"demand"] * y[(i,j)] for i in demand_df.index ]) <= candidate_df.loc[j,"capacity"] * x[j]

多目标优化：

# 同时考虑成本和服务质量 prob += pulp.lpSum([cost[j]*x[j] for j in candidate_df.index]), "成本" prob += pulp.lpSum([ demand_df.loc[i,"weight"] * distance[(i,j)] * y[(i,j)] for i in demand_df.index for j in candidate_df.index ]), "服务质量" # 转化为单目标 prob += 0.7 * prob.objective["成本"] + 0.3 * prob.objective["服务质量"]

动态选址问题：

# 分时段建模 time_periods = ["早高峰", "平峰", "晚高峰"] for t in time_periods: prob += pulp.lpSum([x[j,t] for j in candidate_df.index]) <= max_stations[t] # 添加时段特定约束...