To drop all empty columns (but still keeping the headers) using the Python Pandas library we can use the following 4-line script to read in the csv file, drop the columns where all the elements are missing, and save the data to a new csv file.
from pandas.io.parsers import read_csv
data = read_csv('remove-empty-columns.csv')
filtered_data = data.dropna(axis='columns', how='all')
filtered_data.to_csv('empty-columns-removed.csv', index=False)
As shown below, the sample data included in the csv file has 3 columns which contain missing values.
The second column, labeled bar, is completely empty except the header; columns like this should be dropped. The other columns contain data, but should not be dropped even though they contain some missing values.
data
foo | bar | baz | |
---|---|---|---|
0 | a | NaN | 1.0 |
1 | b | NaN | 2.0 |
2 | c | NaN | NaN |
3 | NaN | NaN | 4.0 |
4 | e | NaN | 5.0 |
Using the pandas.DataFrame.dropna() function with the columns axis we can drop any column where all the entries are NaN (missing values).
filtered_data = data.dropna(axis='columns', how='all')
filtered_data
foo | baz | |
---|---|---|
0 | a | 1.0 |
1 | b | 2.0 |
2 | c | NaN |
3 | NaN | 4.0 |
4 | e | 5.0 |