Suppose you have multiple *.csv files generated by your machine every day, and you need to combine them into one big .csv for further processing. Everybody knows Ctrl-C/Ctrl-V would do the trick, but what if we are talking about hundreds of ’em?
Pandas and glob offer a sound way to accomplish this task with ease.
In my example I have 4 .csv files in the same folder, each of which has the same header structure:
The first two rows of each .csv file are headers. When I combine the files, I don’t want to include these headers; only the clean data, starting from row 3, should be read and written to the output file.
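To see how pandas handles those two header rows, here is a minimal sketch using made-up data (the column names and values are hypothetical, just for illustration): skiprows = 1 discards the first row, and header = 0 then consumes the next row as column names, so the data really does begin at row 3.

```python
import io

import pandas as pd

# Simulate a .csv file whose first two rows are headers (hypothetical data).
csv_text = (
    "Report generated 2019-01-01\n"  # row 1: skipped by skiprows=1
    "machine,value\n"                # row 2: consumed as the header (header=0)
    "A,10\n"                         # row 3 onward: the clean data
    "B,20\n"
)

df = pd.read_csv(io.StringIO(csv_text), index_col=None, header=0, skiprows=1)
print(df)
```

The resulting DataFrame contains only the two data rows, with “machine” and “value” as column names.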
Here is a small piece of code that combines all .csv files in the current working folder into one .xlsx (Excel 2007+) file.
import os
import glob

import pandas as pd

wk_path = os.getcwd()
allFiles = glob.glob(wk_path + "/*.csv")

list_ = []
for file_ in allFiles:
    df = pd.read_csv(file_, index_col=None, header=0, skiprows=1)
    list_.append(df)

frame = pd.concat(list_, ignore_index=True)
frame.to_excel("combined.xlsx", index=False)  # output filename is an example; requires openpyxl
The “skiprows = 1” together with “header = 0” skips the first two rows in each file: one row is skipped outright, and the next is consumed as column names, so reading starts at the clean data. The for loop reads each file into the list_ list, and pd.concat then stacks every DataFrame in the list into one combined frame.
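The concat step can be sketched on its own with two tiny made-up DataFrames (the column names and values here are hypothetical): each per-file DataFrame is appended to a list, then pd.concat stacks them into a single frame.

```python
import pandas as pd

# Stand-ins for the per-file DataFrames the loop would produce (hypothetical data).
dfs = [
    pd.DataFrame({"machine": ["A"], "value": [10]}),
    pd.DataFrame({"machine": ["B"], "value": [20]}),
]

# ignore_index=True renumbers the combined rows 0..n-1 instead of
# repeating each file's original row labels.
frame = pd.concat(dfs, ignore_index=True)
print(frame)
```

Without ignore_index=True, the combined frame would keep each file’s own index (0, 0, …), which is rarely what you want in a merged output.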