news 2025/12/25 15:40:18

How to Parse a CSV File in Bash

张小明

Front-end developer

1. Overview

In this tutorial, we’ll learn how to parse values from Comma-Separated Values (CSV) files with various Bash built-in utilities.

First, we’ll discuss the prerequisites to read records from a file. Then we’ll explore different techniques to parse CSV files into Bash variables and array lists.

Finally, we’ll examine a few third-party tools for advanced CSV parsing.

2. Prerequisites

Let’s briefly review the standards defined for CSV files:

  1. Each record is on a separate line, delimited by a line break.
  2. The last record in the file may or may not end with a line break.
  3. There may be an optional header line appearing as the first line of the file with the same format as regular record lines.
  4. Within the header and records, there may be one or more fields separated by a comma.
  5. Fields containing line breaks, double quotes, and commas should be enclosed in double-quotes.
  6. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.
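Rules 5 and 6 can be illustrated with a small sketch. The function name `csv_quote` is hypothetical and only used for illustration; it prints a single field, quoted only when the rules above require it:

```bash
#!/usr/bin/env bash

# csv_quote is a hypothetical helper name, not a standard utility.
# It prints one field, adding quotes only when rules 5 and 6 require them.
csv_quote() {
  local field=$1
  if [[ $field == *'"'* || $field == *,* || $field == *$'\n'* ]]; then
    # Rule 6: double every embedded double quote.
    field=${field//\"/\"\"}
    # Rule 5: enclose the whole field in double quotes.
    printf '"%s"' "$field"
  else
    printf '%s' "$field"
  fi
}

csv_quote 'plain'; echo
csv_quote '1007 Mountain Drive, Gotham'; echo
csv_quote 'He said "hi"'; echo
```

The first call prints the field unchanged, while the other two come back wrapped in double quotes, with the embedded quotes doubled in the last case.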

CSV files containing records with commas or line breaks within quoted strings aren’t in our scope; however, we’ll discuss them briefly in the last section of the article.

Now let’s set up our standard sample CSV file:

$ cat input.csv
SNo,Quantity,Price,Value
1,2,20,40
2,5,10,50

2.1. Reading Records From a File

We’ll run an example to read records from our input file:

#!/bin/bash
while read line
do
    echo "Record is : $line"
done < input.csv

Here we used the read command to read the line-break (\n) separated records of our CSV file. We’ll check the output from our script:

Record is : SNo,Quantity,Price,Value
Record is : 1,2,20,40
Record is : 2,5,10,50

As we can see, there’s a complication; the header of the file is also getting processed. So let’s dive into the solutions.

2.2. Ignoring the Header Line

We’ll run another example to exclude the header line from the output:

#!/bin/bash
while read line
do
    echo "Record is : $line"
done < <(tail -n +2 input.csv)

Here we used the tail command to read from the second line of the file. Subsequently, we passed the output as a file to the while loop using process substitution. The <(..) section enables us to specify the tail command and lets Bash read from its output like a file:

Record is : 1,2,20,40
Record is : 2,5,10,50

Now we’ll try another way to achieve the same result:

#!/bin/bash
exec < input.csv
read header
while read line
do
    echo "Record is : $line"
done

In this approach, we used the exec command to change the standard input to read from the file. Then we used the read command to process the header line. Subsequently, we processed the remaining file in the while loop.
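As a possible variation on the exec approach, the redirection can be limited to a brace group so that the rest of the script keeps its original standard input. The sketch below also keeps a last record that has no trailing line break (rule 2 of the CSV format). The temporary file and the `records` array are assumptions added so that the example is self-contained:

```bash
#!/usr/bin/env bash

# Recreate the sample file WITHOUT a final line break (an assumption made
# here to exercise rule 2 of the CSV format).
csv_file=$(mktemp)
printf 'SNo,Quantity,Price,Value\n1,2,20,40\n2,5,10,50' > "$csv_file"

records=()
{
  # Consume the header; it stays available in $header if needed later.
  IFS= read -r header
  # read returns non-zero on a final line with no line break, but still
  # fills $line, so the extra test keeps that last record.
  while IFS= read -r line || [[ -n $line ]]; do
    records+=("$line")
    echo "Record is : $line"
  done
} < "$csv_file"

rm -f "$csv_file"
```

Because a brace group runs in the current shell, the header and records variables remain set after the loop finishes.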

3. Parsing Values From a CSV File

So far, we’ve been reading line-break-separated records from CSV files. Henceforth, we’ll look at methods to read the values from each data record.

3.1. From All Columns

Let’s see how to store the field values as we loop through the CSV file:

#!/bin/bash
while IFS="," read -r rec_column1 rec_column2 rec_column3 rec_column4
do
    echo "Displaying Record-$rec_column1"
    echo "Quantity: $rec_column2"
    echo "Price: $rec_column3"
    echo "Value: $rec_column4"
    echo ""
done < <(tail -n +2 input.csv)

Note that we’re setting the Internal Field Separator (IFS) to “,” for the read command in the while loop. As a result, we can parse the comma-delimited field values into Bash variables using the read command.

We’ll also check the output generated on executing the above script:

Displaying Record-1
Quantity: 2
Price: 20
Value: 40

Displaying Record-2
Quantity: 5
Price: 10
Value: 50
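When the number of columns isn’t known in advance, one option is to let read fill an array instead of named variables. This is a minimal sketch under that assumption; the `fields` and `values` array names and the temporary file are illustrative only:

```bash
#!/usr/bin/env bash

# Recreate the article's sample file so the sketch is self-contained.
csv_file=$(mktemp)
printf 'SNo,Quantity,Price,Value\n1,2,20,40\n2,5,10,50\n' > "$csv_file"

values=()
while IFS="," read -r -a fields; do
  # fields[0] is the first column, fields[1] the second, and so on.
  echo "Record-${fields[0]} has ${#fields[@]} fields"
  values+=("${fields[3]}")
done < <(tail -n +2 "$csv_file")

echo "Value column: ${values[*]}"
rm -f "$csv_file"
```

With the -a option, each record is split on the comma into consecutive array elements, so the same loop works for files with any number of columns.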

3.2. From the First Few Columns

There can be instances where we’re interested in reading only the first few columns of the file for processing.

We’ll demonstrate this with an example:

#!/bin/bash
while IFS="," read -r rec_column1 rec_column2 rec_remaining
do
    echo "Displaying Record-$rec_column1"
    echo "Quantity: $rec_column2"
    echo "Remaining fields of Record-$rec_column1 : $rec_remaining"
    echo ""
done < <(tail -n +2 input.csv)

In this example, we can store the value in the first and second fields of the input CSV in the rec_column1 and rec_column2 variables, respectively. Notably, we stored the remaining fields in the rec_remaining variable.

Let’s look at the output of our script:

Displaying Record-1
Quantity: 2
Remaining fields of Record-1 : 20,40

Displaying Record-2
Quantity: 5
Remaining fields of Record-2 : 10,50

3.3. From Specific Column Numbers

Again, we’ll use process substitution to pass only specific columns to the while loop for reading. To fetch those columns, we’ll utilize the cut command:

#!/bin/bash
while IFS="," read -r rec1 rec2
do
    echo "Displaying Record-$rec1"
    echo "Price: $rec2"
done < <(cut -d "," -f1,3 input.csv | tail -n +2)

As a result, we can parse only the first and third columns of our input CSV.

We’ll validate it with the output:

Displaying Record-1
Price: 20
Displaying Record-2
Price: 10
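One detail worth keeping in mind is that cut always emits the selected fields in file order, regardless of the order given to -f. If the columns need to be rearranged, awk is one possible alternative. The sketch below compares the two; the temporary file and the variable names are assumptions for illustration:

```bash
#!/usr/bin/env bash

# Recreate the article's sample file so the sketch is self-contained.
csv_file=$(mktemp)
printf 'SNo,Quantity,Price,Value\n1,2,20,40\n2,5,10,50\n' > "$csv_file"

# cut prints selected fields in file order, so -f3,1 still yields SNo first.
cut_out=$(cut -d "," -f3,1 "$csv_file" | tail -n +2)

# awk prints fields in the order they are listed; NR > 1 skips the header.
awk_out=$(awk -F "," -v OFS="," 'NR > 1 { print $3, $1 }' "$csv_file")

echo "cut:"
echo "$cut_out"
echo "awk:"
echo "$awk_out"
rm -f "$csv_file"
```

Here the cut pipeline yields the records as SNo,Price, while the awk command yields them as Price,SNo.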

3.4. From Specific Column Names

There can be situations where we might need to parse the values from CSV based on column names in the header line.

We’ll illustrate this with a simple user-input-driven script:

#!/bin/bash
col_a='SNo'
read -p "Enter the column name to be printed for each record: " col_b
loc_col_a=$(head -1 input.csv | tr ',' '\n' | nl | grep -w "$col_a" | tr -d " " | awk -F " " '{print $1}')
loc_col_b=$(head -1 input.csv | tr ',' '\n' | nl | grep -w "$col_b" | tr -d " " | awk -F " " '{print $1}')
while IFS="," read -r rec1 rec2
do
    echo "Displaying Record-$rec1"
    echo "$col_b: $rec2"
    echo ""
done < <(cut -d "," -f${loc_col_a},${loc_col_b} input.csv | tail -n +2)

This script takes col_b as input from the user, and prints the corresponding column value for every record in the file.

We calculated the location of a column using a combination of the tr, awk, grep, and nl commands.

First, we converted the commas in the header line into line-breaks using the tr command. Then we appended the line number at the beginning of each line using the nl command. Next, we searched the column name in the output using the grep command, and truncated the preceding spaces using the tr command.

Finally, we used the awk command to get the first field, which corresponds to the column number.

We’ll save the above script as parse_csv.sh for execution:

$ ./parse_csv.sh
Enter the column name to be printed for each record: Price
Displaying Record-1
Price: 20

Displaying Record-2
Price: 10

As expected, when “Price” is given as the input, only the values of the column number corresponding to the string “Price” in the header are printed. This approach can be particularly useful when the sequence of columns in a CSV file isn’t guaranteed.
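The column lookup can also be sketched with Bash built-ins only, which avoids the pipeline of external commands. The function name `column_index` is hypothetical. Unlike grep -w, it compares the whole header field, so a header such as “Unit Price” would not match “Price”:

```bash
#!/usr/bin/env bash

# Recreate the article's sample file so the sketch is self-contained.
csv_file=$(mktemp)
printf 'SNo,Quantity,Price,Value\n1,2,20,40\n2,5,10,50\n' > "$csv_file"

# column_index is a hypothetical helper: it prints the 1-based position of
# a column name in the header line, or returns 1 if the name is absent.
column_index() {
  local name=$1 file=$2 header_fields i
  # Only the first line of the file is read and split on commas.
  IFS="," read -r -a header_fields < "$file"
  for i in "${!header_fields[@]}"; do
    if [[ ${header_fields[i]} == "$name" ]]; then
      echo $((i + 1))
      return 0
    fi
  done
  return 1
}

loc_price=$(column_index "Price" "$csv_file")
echo "Price is column $loc_price"
column_index "Discount" "$csv_file" || echo "Discount: no such column"
rm -f "$csv_file"
```

The returned number can be passed to cut -f in the same way as loc_col_a and loc_col_b, and the non-zero return status gives the script a way to reject unknown column names.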

4. Mapping Columns of CSV File into Bash Arrays

In the previous section, we parsed the field values into Bash variables for each record. Now we’ll check methods to parse entire columns of CSV into Bash arrays:

#!/bin/bash
arr_record1=( $(tail -n +2 input.csv | cut -d ',' -f1) )
arr_record2=( $(tail -n +2 input.csv | cut -d ',' -f2) )
arr_record3=( $(tail -n +2 input.csv | cut -d ',' -f3) )
arr_record4=( $(tail -n +2 input.csv | cut -d ',' -f4) )
echo "array of SNos  : ${arr_record1[@]}"
echo "array of Qty   : ${arr_record2[@]}"
echo "array of Price : ${arr_record3[@]}"
echo "array of Value : ${arr_record4[@]}"

We’re using command substitution to exclude the header line using the tail command, and then using the cut command to filter the respective columns. Notably, the outer set of parentheses is required to hold the output of the command substitution in the variable arr_record1 as an array.

Let’s check the script output:

array of SNos  : 1 2
array of Qty   : 2 5
array of Price : 20 10
array of Value : 40 50
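The unquoted command substitution above splits on any whitespace and expands glob characters, which is harmless for numeric columns but can break values that contain spaces or a `*`. A possible alternative for Bash 4 and above is mapfile, which stores one line per element. The sample data below is an assumption chosen to show the difference:

```bash
#!/usr/bin/env bash

# Hypothetical sample with a space and a glob character in the second column.
csv_file=$(mktemp)
printf 'SNo,Name\n1,Bruce Wayne\n2,*\n' > "$csv_file"

# -t strips the trailing line break from each element.
mapfile -t arr_names < <(tail -n +2 "$csv_file" | cut -d ',' -f2)

echo "number of names : ${#arr_names[@]}"
printf 'name : %s\n' "${arr_names[@]}"
rm -f "$csv_file"
```

With this approach, “Bruce Wayne” stays a single element and the `*` is stored literally instead of being expanded to file names.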

5. Parsing CSV File Into a Bash Array

There may be cases where we prefer to map the entire CSV file into an array. We can then use the array to process the records.

Let’s check the implementation:

#!/bin/bash
arr_csv=()
while IFS= read -r line
do
    arr_csv+=("$line")
done < input.csv

echo "Displaying the contents of array mapped from csv file:"
index=0
for record in "${arr_csv[@]}"
do
    echo "Record at index-${index} : $record"
    ((index++))
done

In this example, we read the line from our input CSV, and then appended it to the array arr_csv (+= is used to append the records to a Bash array). Then we printed the records of the array using a for loop.

Let’s check the output:

Displaying the contents of array mapped from csv file:
Record at index-0 : SNo,Quantity,Price,Value
Record at index-1 : 1,2,20,40
Record at index-2 : 2,5,10,50

For Bash versions 4 and above, we can also populate the array using the readarray command:

readarray -t array_csv < input.csv

This reads lines from input.csv into an array variable, array_csv. The -t option will remove the trailing newlines from each line.
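If the header shouldn’t end up in the array, readarray can discard it with the -s option, and each stored record can still be split into fields later. This is a minimal sketch; the temporary file and the sno, quantity, price, and value variable names are assumptions for illustration:

```bash
#!/usr/bin/env bash

# Recreate the article's sample file so the sketch is self-contained.
csv_file=$(mktemp)
printf 'SNo,Quantity,Price,Value\n1,2,20,40\n2,5,10,50\n' > "$csv_file"

# -s 1 discards the first line read (the header); -t strips line breaks.
readarray -t -s 1 array_csv < "$csv_file"

for record in "${array_csv[@]}"; do
  # Split one stored record into its fields on demand.
  IFS="," read -r sno quantity price value <<< "$record"
  echo "Record-$sno: $quantity x $price = $value"
done
rm -f "$csv_file"
```

Keeping whole records in the array and splitting them only when needed lets us loop over the data several times without reading the file again.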

6. Parsing CSV Files Having Line Breaks and Commas Within Records

So far, we’ve used the file input.csv for running all our illustrations.

Now we’ll create another CSV file containing commas within quoted strings:

$ cat address.csv
SNo,Name,Address
1,Bruce Wayne,"1007 Mountain Drive, Gotham"
2,Sherlock Holmes,"221b Baker Street, London"

There can be several more permutations and combinations of line-breaks, commas, and quotes within CSV files. For this reason, it’s a complex task to process such CSV files with only Bash built-in utilities. Generally, third-party tools, like csvkit, are employed for advanced CSV parsing.

However, another suitable alternative is Python’s csv module, as Python is generally pre-installed on most Linux distributions.
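For the limited case of address.csv, where quoted fields contain commas but no line breaks, a character-by-character parser can be sketched in Bash itself. It isn’t a replacement for csvkit or Python’s csv module. The function name `parse_csv_line` and the global array `csv_fields` are hypothetical:

```bash
#!/usr/bin/env bash

# parse_csv_line is a hypothetical helper. It splits ONE line into the
# global array csv_fields, honoring quoted commas and doubled quotes.
# It does not handle line breaks inside quoted fields.
parse_csv_line() {
  local line=$1 field="" in_quotes=0 i char
  csv_fields=()
  for ((i = 0; i < ${#line}; i++)); do
    char=${line:i:1}
    if ((in_quotes)); then
      if [[ $char == '"' ]]; then
        if [[ ${line:i+1:1} == '"' ]]; then
          field+='"'       # doubled quote: keep one literal quote
          i=$((i + 1))     # and skip the second quote character
        else
          in_quotes=0      # closing quote
        fi
      else
        field+=$char       # commas are ordinary characters here
      fi
    elif [[ $char == '"' ]]; then
      in_quotes=1          # opening quote
    elif [[ $char == ',' ]]; then
      csv_fields+=("$field")
      field=""
    else
      field+=$char
    fi
  done
  csv_fields+=("$field")   # the last field has no trailing comma
}

parse_csv_line '1,Bruce Wayne,"1007 Mountain Drive, Gotham"'
printf 'Name: %s\n' "${csv_fields[1]}"
printf 'Address: %s\n' "${csv_fields[2]}"
```

The address is returned as a single field with its comma intact and without the enclosing quotes. Records that span several lines would still need one of the tools mentioned above.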

7. Conclusion

In this article, we studied multiple techniques to parse values from CSV files.

First, we discussed the CSV standards and checked the steps to read records from a file. Next, we implemented several case-studies to parse the field values of a CSV file. We also explored ways to handle the optional header line of CSV files.

Then we presented techniques to store either columns or all the records of a CSV file into Bash arrays. Finally, we offered a brief introduction to some third-party tools for advanced CSV parsing.
