Essential Deep Learning Skills: Just-Enough File Name and Path Handling

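Deep learning work inevitably involves handling all kinds of files and file names. There are many ways to do it; this is the set I have settled on out of habit.

1. File paths and file names

1.1 From a path to file names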

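First, decide on the root directory that holds the data:

    path = "/home/User/Script/Model_Test/"

Use pathlib to turn the path string into a Path object and list the files in that directory. files is a list, and each item is a Path object holding the full path of one file (absolute here, because path itself is absolute). Note that the pattern "*.*" only matches names that contain a dot.

    from pathlib import Path

    pathFolder = Path(path)
    files = list(pathFolder.glob("*.*"))
    print(type(files))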

    <class 'list'>

Contents of files:

    [PosixPath('/home/User/Script/Model_Test/2026_03_28_01.tar.gz'),
     PosixPath('/home/User/Script/Model_Test/2026_03_28_02.txt'),
     PosixPath('/home/User/Script/Model_Test/2026_03_28_03.txt'),
     PosixPath('/home/User/Script/Model_Test/2026_03_28_04.txt')]
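
As a small self-contained check of the listing step, the sketch below builds a throwaway directory with tempfile and compares glob("*.*") with iterdir(). The helper name list_files and the sample file names are made up for illustration. It also sorts the result, since glob does not promise any particular order.

```python
import tempfile
from pathlib import Path

def list_files(folder, pattern="*.*"):
    """Return the regular files in `folder` matching `pattern`, sorted by path."""
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file())

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical sample files, created only for this demonstration
    for name in ["2026_03_28_02.txt", "2026_03_28_01.tar.gz", "README"]:
        (root / name).touch()

    dotted = [p.name for p in list_files(root)]           # names containing a dot
    everything = sorted(p.name for p in root.iterdir())   # every directory entry

print(dotted)       # ['2026_03_28_01.tar.gz', '2026_03_28_02.txt']
print(everything)   # ['2026_03_28_01.tar.gz', '2026_03_28_02.txt', 'README']
```

The file README is skipped by "*.*" because its name has no dot; iterdir() (or glob("*")) is the safer choice when such files matter.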

Iterate over the files:

    for file in files:
        ...  # the statements shown in section 1.2 go here

1.2 Working with file names

A file name has the form stem.suffix. A Path object exposes the pieces as follows:

    # Full path
    print(type(file), file)
    # <class 'pathlib.PosixPath'> /home/User/Script/Model_Test/2026_03_28_01.tar.gz

    # File name including its suffixes
    print(type(file.name), file.name)
    # <class 'str'> 2026_03_28_01.tar.gz

    # Last suffix only
    print(type(file.suffix), file.suffix)
    # <class 'str'> .gz

    # Stem: the name without the last suffix
    print(type(file.stem), file.stem)
    # <class 'str'> 2026_03_28_01.tar

    # With a .tar.gz file, .stem is a plain str, so it has no .suffix of its own:
    # print(file.stem.suffix)
    # AttributeError: 'str' object has no attribute 'suffix'

    # Wrapping the stem in Path() again gives access to the inner suffix and stem
    print(type(Path(file.stem).suffix), Path(file.stem).suffix, '\n',
          type(Path(file.stem).stem), Path(file.stem).stem)
    # <class 'str'> .tar
    #  <class 'str'> 2026_03_28_01
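
As an alternative to wrapping the stem in Path() a second time, pathlib also provides the suffixes attribute, which lists every suffix at once. The sketch below uses it to strip all suffixes in one step; the helper name split_all_suffixes is hypothetical and not part of pathlib.

```python
from pathlib import Path

def split_all_suffixes(path):
    """Return (base_name, combined_suffix) with every suffix removed from the name."""
    p = Path(path)
    combined = "".join(p.suffixes)   # e.g. '.tar.gz'
    base = p.name[: len(p.name) - len(combined)] if combined else p.name
    return base, combined

file = Path("/home/User/Script/Model_Test/2026_03_28_01.tar.gz")
print(file.suffixes)              # ['.tar', '.gz']
print(split_all_suffixes(file))   # ('2026_03_28_01', '.tar.gz')
```

One caveat: suffixes treats every dot as a separator, so a name such as v1.2.tar.gz yields ['.2', '.tar', '.gz']. For names with dots in the stem, the Path(file.stem) approach or an explicit list of known extensions is more predictable.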

Matching on the file name of a Path object:
The attributes Path.stem, Path.name and Path.suffix are all plain strings, so take .stem and process it as a string.

    fileStem = Path(file.stem).stem
    print(type(fileStem), fileStem)
    # <class 'str'> 2026_03_28_01

The str.split(sep) method: suitable for simple file name schemes such as 2026_03_28_01.

    year = fileStem.split("_")[0]
    month = fileStem.split("_")[1]
    day = fileStem.split("_")[2]
    no = fileStem.split("_")[3]
    print(type(year), year, "\n", type(month), month, "\n",
          type(day), day, "\n", type(no), no)

Output:

    <class 'str'> 2026
     <class 'str'> 03
     <class 'str'> 28
     <class 'str'> 01

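str.rsplit(sep) splits starting from the right but still returns the pieces in left-to-right order. Without a maxsplit argument it gives the same result as str.split(sep).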
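
The difference between the two only shows once maxsplit is given. A minimal sketch, assuming a hypothetical stem whose prefix itself contains underscores:

```python
stem = "abc_def_2026_03_28_01"   # hypothetical stem with underscores in the prefix

# Without maxsplit, split and rsplit return the same list
same = stem.split("_") == stem.rsplit("_")

# With maxsplit=4, rsplit keeps everything left of the last four separators together
prefix, year, month, day, no = stem.rsplit("_", 4)

# split with the same maxsplit cuts from the left instead
left_first = stem.split("_", 4)

print(same)                           # True
print(prefix, year, month, day, no)   # abc_def 2026 03 28 01
print(left_first)                     # ['abc', 'def', '2026', '03', '28_01']
```

This makes rsplit with maxsplit a convenient choice when the fixed-format part sits at the end of the name.
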
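**Regular expression method:** when file names are more complex, such as abc219-DEF5687_2687.txt.tar.gz, use the regular expression library re.
Find every match: re.findall(pattern, string) returns a list; each item is a string when the pattern has no capture groups.
Find the first match: re.search(pattern, string) returns a match object, or None if nothing matches.
Split: re.split(pattern, string) returns a list of the pieces between the matches.
Match from the start: re.match(pattern, string) returns a match object only if the pattern matches at the beginning of the string, otherwise None.
Extract text from a match object: re.search(...).group() and re.match(...).group() return strings.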

    import re

    file = Path('/abc219-DEF2026_03_05_14_00.txt.tar.gz')
    fileName = file.stem  # 'abc219-DEF2026_03_05_14_00.txt.tar'

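Pattern	Description
`\d`	digit: a single digit
`\d{4}`	exactly four consecutive digits
`\d+`	one or more digits
`\w`	word character: a single letter, digit or underscore
`\w{4}`	exactly four consecutive letters/digits/underscores
`\w+`	one or more consecutive letters/digits/underscores
`\.`	a literal "." (must be escaped)
`^`	anchors the match at the start of the string, e.g. `^abc`
`$`	anchors the match at the end of the string, e.g. `abc$`
`.`	any single character except a newline
`.*`	any number of characters (including zero)
`()`	the part inside the parentheses is a capture group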

    print('d:\n',
          type(re.findall(r'\d', fileName)), type(re.findall(r'\d', fileName)[0]),
          re.findall(r'\d', fileName), type(re.findall(r'\d', fileName)[0]), '\n',
          type(re.search(r'\d+', fileName)), re.search(r'\d+', fileName), '\n',
          type(re.search(r'\d{3}', fileName)), re.search(r'\d{3}', fileName), '\n',
          'w:\n',
          type(re.findall(r'\w', fileName)), re.findall(r'\w', fileName),
          type(re.findall(r'\w', fileName)[0]), '\n',
          type(re.findall(r'\w{2}', fileName)), re.findall(r'\w{2}', fileName), '\n',
          type(re.findall(r'\w+', fileName)), re.findall(r'\w+', fileName), '\n',
          '\\.:\n',  # backslash doubled so the label is not an invalid escape
          type(re.findall(r'\.', fileName)), re.findall(r'\.', fileName), '\n',
          '^\\w+:\n',
          type(re.findall(r'^\w+', fileName)), re.findall(r'^\w+', fileName), '\n',
          '^abc.*:\n',
          type(re.search(r'^abc.*', fileName)), re.search(r'^abc.*', fileName), '\n',
          '.*.tar$:\n',  # the '.' before 'tar' is unescaped, so it matches any character
          type(re.findall(r'.*.tar$', fileName)), re.findall(r'.*.tar$', fileName), '\n')

Output:

    d:
     <class 'list'> <class 'str'> ['2', '1', '9', '2', '0', '2', '6', '0', '3', '0', '5', '1', '4', '0', '0'] <class 'str'>
     <class 're.Match'> <re.Match object; span=(3, 6), match='219'>
     <class 're.Match'> <re.Match object; span=(3, 6), match='219'>
     w:
     <class 'list'> ['a', 'b', 'c', '2', '1', '9', 'D', 'E', 'F', '2', '0', '2', '6', '_', '0', '3', '_', '0', '5', '_', '1', '4', '_', '0', '0', 't', 'x', 't', 't', 'a', 'r'] <class 'str'>
     <class 'list'> ['ab', 'c2', '19', 'DE', 'F2', '02', '6_', '03', '_0', '5_', '14', '_0', 'tx', 'ta']
     <class 'list'> ['abc219', 'DEF2026_03_05_14_00', 'txt', 'tar']
     \.:
     <class 'list'> ['.', '.']
     ^\w+:
     <class 'list'> ['abc219']
     ^abc.*:
     <class 're.Match'> <re.Match object; span=(0, 34), match='abc219-DEF2026_03_05_14_00.txt.tar'>
     .*.tar$:
     <class 'list'> ['abc219-DEF2026_03_05_14_00.txt.tar']
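
re.split and the difference between re.match and re.search are listed among the functions but do not appear in the output above. The sketch below fills that in with the same stem; the separator class [-_.] is my own choice for illustration.

```python
import re

fileName = "abc219-DEF2026_03_05_14_00.txt.tar"

# re.split cuts the string at every match of the pattern
parts = re.split(r"[-_.]", fileName)

# re.match only tries the start of the string; re.search scans the whole string
at_start = re.match(r"\d+", fileName)
anywhere = re.search(r"\d+", fileName)

print(parts)             # ['abc219', 'DEF2026', '03', '05', '14', '00', 'txt', 'tar']
print(at_start)          # None, because the string starts with a letter
print(anywhere.group())  # 219
```

Because both functions return None when nothing matches, calling .group() directly on the result is only safe when a match is guaranteed.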

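re.match and re.search return a match object. group(0) returns the whole matched text as a string, and group(n) returns the text captured by the n-th pair of parentheses in the pattern: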

    patterns = r"\w+\d+-\w{3}(\d+)_(\d+)_(\d+)_(\d+)_(\d+).*"
    match = re.match(patterns, fileName)
    # Usual idiom: test the result first, because re.match returns None on failure
    if match:
        print(f"all_data = {match.group(0)}, {type(match)}, {type(match.group(0))}")
        print(f"YY = {match.group(1)}, {type(match.group(1))}")
        print(f"MM = {match.group(2)}, {type(match.group(2))}")
        print(f"DD = {match.group(3)}, {type(match.group(3))}")
        print(f"hh = {match.group(4)}, {type(match.group(4))}")
        print(f"dd = {match.group(5)}, {type(match.group(5))}")

Output:

    all_data = abc219-DEF2026_03_05_14_00.txt.tar, <class 're.Match'>, <class 'str'>
    YY = 2026, <class 'str'>
    MM = 03, <class 'str'>
    DD = 05, <class 'str'>
    hh = 14, <class 'str'>
    dd = 00, <class 'str'>
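
Since every group comes back as a string, a common next step is to convert the fields into numbers or a datetime. The following is a sketch only: it assumes the five numeric fields mean year, month, day, hour and minute, which the file name itself does not confirm, and the names STAMP and parse_stamp are hypothetical.

```python
import re
from datetime import datetime

# Assumption: the five numeric fields are year, month, day, hour, minute
STAMP = re.compile(
    r"(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})_(?P<hour>\d{2})_(?P<minute>\d{2})"
)

def parse_stamp(name):
    """Return a datetime parsed from `name`, or None if no timestamp is found."""
    m = STAMP.search(name)
    if m is None:
        return None
    # Named groups map directly onto datetime's keyword arguments
    fields = {key: int(value) for key, value in m.groupdict().items()}
    return datetime(**fields)

print(parse_stamp("abc219-DEF2026_03_05_14_00.txt.tar"))  # 2026-03-05 14:00:00
print(parse_stamp("no_stamp.txt"))                        # None
```

Named groups (?P<name>...) keep the code readable when a pattern has many fields. Note that datetime raises ValueError for impossible values such as month 13, so real file names may need a try/except around the conversion.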

2. Text file operations

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

file: path of the file (a string or a Path object)
mode: how the file is opened
encoding: text encoding (for example 'utf-8')

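Text encoding and file open modes

Mode	Description
`'r'`	read only (default); the file must exist
`'w'`	write only; truncates the file if it exists, creates it otherwise
`'a'`	append; writes go to the end of an existing file, and the file is created if missing
`'x'`	exclusive creation; raises an error if the file already exists
`'b'`	binary mode (e.g. `'rb'`, `'wb'`)
`'t'`	text mode (default)
`'+'`	read and write (e.g. `'r+'`, `'w+'`)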

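Examples of combined modes

'rb': open for reading in binary mode
'wb': open for writing in binary mode (overwrites existing content)
'a+': open for reading and appending
'r+b': open for reading and writing in binary mode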
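
To make the differences between the modes concrete, here is a minimal sketch that runs them against a temporary file. The file name demo.txt is a placeholder, and the encoding is set explicitly to 'utf-8' as a precaution rather than relying on the platform default.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo.txt"   # hypothetical file name

    with open(target, "w", encoding="utf-8") as f:   # creates or truncates
        f.write("first\n")
    with open(target, "a", encoding="utf-8") as f:   # appends at the end
        f.write("second\n")

    try:
        open(target, "x", encoding="utf-8")          # fails: the file already exists
        x_failed = False
    except FileExistsError:
        x_failed = True

    with open(target, "r+", encoding="utf-8") as f:  # read and write, no truncation
        f.write("FIRST")                             # overwrites the first 5 characters
        f.seek(0)
        content = f.read()

print(x_failed)        # True
print(repr(content))   # 'FIRST\nsecond\n'
```

The 'r+' case is the one that most often surprises: writing starts at the beginning of the file and overwrites in place instead of inserting.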

    from pathlib import Path

    path = "/home/User/Script/Model_Test/"
    pathFolder = Path(path)
    files = list(pathFolder.glob("*.txt"))
    for file in files:
        print("file Name = ", file.name)
        with open(file, "r") as f:
            lines = f.readlines()
            print("readlinesType = ", type(lines), '\n', type(lines[0]), lines[0])
            for line in lines:
                print(len(line))        # length including the trailing newline
                cleanline = line.strip()
                print(len(cleanline))   # length after stripping whitespace

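with opens a context manager that takes care of the file automatically, so there is no need to close it by hand.
as binds the opened file object to f.
**f.readlines()** reads the whole file and returns a list in which each item is one line as a string. Note that each string still includes its trailing newline character.
**.strip()** removes the given characters from both ends of a string; by default it removes whitespace.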
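
For large files it is often enough to loop over the file object directly instead of loading every line with readlines(). The sketch below shows that variant together with strip(); the helper name read_clean_lines and the sample content are made up for illustration.

```python
import tempfile
from pathlib import Path

def read_clean_lines(path, encoding="utf-8"):
    """Return the non-empty lines of a text file with surrounding whitespace removed."""
    cleaned = []
    with open(path, "r", encoding=encoding) as f:
        for line in f:             # the file object yields one line at a time
            text = line.strip()    # drops the trailing '\n' and other edge whitespace
            if text:
                cleaned.append(text)
    return cleaned

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"  # hypothetical file
    sample.write_text("  alpha\n\nbeta  \n", encoding="utf-8")
    raw = sample.read_text(encoding="utf-8").splitlines(keepends=True)
    result = read_clean_lines(sample)

print(raw)      # ['  alpha\n', '\n', 'beta  \n']
print(result)   # ['alpha', 'beta']
```

If leading spaces carry meaning, for example in indented text, line.rstrip("\n") removes only the newline and leaves the rest untouched.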

Output:

    file Name =  abc-123456.txt
    readlinesType =  <class 'list'>
     <class 'str'> >ABC12345678(Defghij:klmn901 opqr:STU23456789 vwx:yza01:2345678-9012345)

    74
    73

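3. CSV file operations

Install the pandas library:

    conda install pandas

pd.read_csv(file_path, index_col=0, header=0) reads a CSV file.

file_path - path of the file
index_col=0 - use column 0 (the first column) as the row index
header=0 - use row 0 (the first row) as the column names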

Expression	Description	Return type
`df.loc[primary_key, attribute]`	the single value at row label primary_key and column label attribute	scalar; the concrete type depends on the data in the CSV file
`df.loc[primary_key]`	a whole row	`Series`
`df.loc[[primary_key1, primary_key2]]`	several rows	`DataFrame`
`df.loc[:, attribute]`	a whole column	`Series`
`df.loc[:, [attribute1, attribute2]]`	several columns	`DataFrame`
`df.loc[[primary_key1, primary_key2], [attribute1, attribute2]]`	several values (the first list holds row labels, the second column labels)	`DataFrame`

    import pandas as pd

    path = "/abc.csv"
    df = pd.read_csv(path, index_col=0, header=0)
    print(df.loc['a', 'b'])
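
To see what each df.loc form returns without needing a file on disk, the sketch below reads a small CSV from an in-memory string. The column names and values are invented for the example.

```python
import io
import pandas as pd

# Hypothetical CSV content; the first column becomes the row index
csv_text = """name,height,weight
a,170,60
b,180,75
c,165,55
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0, header=0)

value = df.loc["a", "height"]            # single cell
row = df.loc["a"]                        # one row
column = df.loc[:, "height"]             # one column
block = df.loc[["a", "c"], ["height"]]   # list selectors for rows and columns

print(value)   # 170
print(type(row).__name__, type(column).__name__, type(block).__name__)
# Series Series DataFrame
```

The rule of thumb is that a single label reduces a dimension (giving a Series or a scalar), while a list of labels keeps it, even when the list has only one item.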
