#!/usr/bin/env python
# coding: utf-8

# # Mteor 227 - Python - File Input Examples
# In this example, we will read in several different text files and work through some examples showing how we can manipulate the data.
#
# Let's start by reading in a simple text file that contains one number per line. The following line opens the file for reading. You will need two data files, scores.txt and station_data.txt, in your directory to run these examples.

# In[18]:


f=open('scores.txt','r')


# Next, read the complete file, store the lines in the object lines, and print the result.

# In[19]:


lines =f.readlines()
# Close the file once its contents have been read
f.close()


# In[20]:


print(lines)


# Notice that the values were read in as character strings. This will cause a problem when we try to do any calculations on the values, so let's convert them to integers using a simple, single-line loop (a list comprehension). Once converted, the values are displayed again.

# In[21]:


tests = [int(i) for i in lines]
tests


# There, that is much better. Now that the values are integers, we can do some simple calculations on them using built-in functions that already exist in Python. The sum() and len() functions are used here to find the average.

# In[22]:


avg_1 = sum(tests)/len(tests)
avg_1


# Honestly, however, this is a pretty clunky way of finding the average. A much better way is to use the functions that are contained in the numerical Python (numpy) package. To use this package, we use the import command. Also note that I have used the alias np for numpy. This will make it easier to use.
#
# Once the numpy package is imported, we can do all sorts of calculations on the data using the functions available in the package.

# In[23]:


import numpy as np
average = np.mean(tests)
std = np.std(tests)
# Use names that do not shadow the built-in max() and min() functions;
# otherwise this cell fails if it is run a second time
highest = max(tests)
lowest = min(tests)
average,std,highest,lowest


# To really demonstrate the power of Python, let's use some of what we have learned on our station data text file.
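#
# Before moving on to the station data, the scores steps so far can be collected into one short sketch. It is only an illustration and not part of the original exercise: the file name scores_demo.txt and its four sample values are made up so the block runs on its own, and the with statement (which closes the file automatically) is a swapped-in alternative to the bare open() call used above.

```python
import numpy as np

# Hypothetical sample file so the sketch is self-contained
with open('scores_demo.txt', 'w') as out:
    out.write('90\n80\n70\n100\n')

# The with statement closes the file when the block ends
with open('scores_demo.txt', 'r') as f:
    demo_lines = f.readlines()

# Convert each string (for example '90\n') to an integer
demo_tests = [int(line) for line in demo_lines]

# The plain-Python average and the numpy average should agree
demo_avg = sum(demo_tests) / len(demo_tests)
demo_mean = np.mean(demo_tests)
demo_std = np.std(demo_tests)
print(demo_avg, demo_mean, demo_std)
```

# Note that int() would raise a ValueError on a blank line, so a real file with trailing empty lines would need those filtered out first.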
#
# Let's try to open the file and read it in using the same method as we did for the file above.

# In[24]:


stations=open("station_data.txt",'r')


# In[25]:


data = stations.readlines()


# If we print the data with the line below, we see the results are not good. All of the data on each line is contained in a single string. Since there are multiple integer values on each line, we cannot simply convert each line to an integer. There is a way to split the values out, but this is more complicated and we really should look for a better way.

# In[27]:


print(data)


# In[29]:


print(data[0])


# One such way is to use the pandas Python data analysis library. Let's import that package and alias it as pd for ease of use.

# In[30]:


import pandas as pd


# To read the file, we will use the read_csv function and designate that the file is space delimited and that there is no header.

# In[31]:


data = pd.read_csv('station_data.txt',delimiter=" ",header=None)


# Printing out the data now shows us that we have stored 1440 lines of 11 columns. That sounds like exactly what we were looking for. Thank you, pandas!

# In[37]:


print(data)


# The temperature is in the sixth column of the dataset. To print out the temperature of the first line of input, we would use data[5][0]. With a pandas DataFrame, data[5] selects the column labeled 5, and the following [0] selects the row labeled 0 within that column, so the column is listed first and the row second. Because header=None was used, the columns are labeled with integers starting at 0, so the sixth column is labeled 5.

# In[38]:


print(data[5][0])


# Now that we know how the data is stored, we can use the tools in our other packages to do calculations on the data. For example, the line below finds the average temperature in the file. Note that the [:] tells Python to use all of the row values in this column.

# In[39]:


print(np.mean(data[5][:]))


# Pandas can also be used to read the simple text file we started this exercise with; however, read_csv is a bit too much for our simple file. Use read_table instead. Again, once the data is read, calculations can be performed on the data.
# In[40]:


last = pd.read_table('scores.txt',header=None)


# In[42]:


print(np.mean(last[0]))
last


# In[ ]:
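

# As a closing illustration of the two approaches to a multi-column file, the sketch below parses a small space-delimited sample both by hand with str.split and with pandas. The three sample rows are made up and are not taken from station_data.txt, and io.StringIO stands in for a file on disk. The .iloc indexer is shown as an alternative that lists the row first and the column second.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample rows (not real station data): three integer columns per line
sample_text = "1 10 21\n2 11 23\n3 12 25\n"

# Manual approach: split each line on whitespace, then convert each piece
rows = [[int(value) for value in line.split()] for line in sample_text.splitlines()]
manual_column = [row[2] for row in rows]

# pandas approach: read the same text as a space-delimited table with no header
sample = pd.read_csv(io.StringIO(sample_text), delimiter=" ", header=None)

# sample[2][0]: the column labeled 2 first, then the row labeled 0
first_value = sample[2][0]

# sample.iloc[0, 2]: row position first, then column position
same_value = sample.iloc[0, 2]

# Average of the third column
column_mean = np.mean(sample[2])
print(first_value, same_value, column_mean)
```

# Both approaches recover the same third column here; the pandas version simply saves the nested loop and keeps the columns addressable by label.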