Before going through this tutorial, work through the basic Python programming exercises on Codecademy.
Python places an emphasis on readability, simplicity and explicitness.
Everything in Python is an object. Every number, string, data structure and class is a Python object.
You can use comments to summarize code; anything after a # is ignored by Python. See the example below. To print a value, use the print statement. Strings can be enclosed in either single or double quotes.
print "Big data examiner" #Big data examiner is a one stop place to learn datascience.
Big data examiner
You can get the type of an object with the type function, and check whether an object is an instance of a particular type with the isinstance function.
a= 'Big data'
print type(a)
b= 'Examiner'
print type(b)
c= 4.5
print isinstance(a, str)
print isinstance(a,int)
print isinstance(c, (int, float))
<type 'str'> <type 'str'> True False True
The attributes and methods of a Python object can be accessed using object.attribute_name.
a = 'Bill gates'
# In IPython, type a. and press the Tab key to list the attributes and methods of a
You can import a Python module using the import statement.
import numpy as np # importing numpy under the alias np
data_new = [6, 7.5, 8, 0, 1]
data = np.array(data_new) # calling numpy through the alias np; here a list is converted to an array
data
array([ 6. , 7.5, 8. , 0. , 1. ])
Try the following operators; most of them are self-explanatory.
x= [1,2,3,4]
y = x
z=list(x)
print x is y
print x is not z
# you can use the following operators:
# x // y -> floor division; it drops the fractional remainder
# x ** y -> raises x to the power of y
# x <= y, x < y -> True if x is less than or equal to (or strictly less than) y. The same applies to >= and >
# the same pattern applies to the comparison operators == and != and to the bitwise operators &, |, ^
True True
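The operators listed in the comments above are not actually run in the example, so here is a minimal sketch that exercises a few of them. The variable names are mine, and the `__future__` import is only there so that print() behaves the same on Python 2 and Python 3.

```python
from __future__ import print_function  # keeps print() consistent on Python 2 and 3

x = [1, 2, 3, 4]
z = list(x)                 # a copy: equal contents, but a different object

same_value = (x == z)       # == compares contents
same_object = (x is z)      # is compares identity

floor_div = 7 // 2          # floor division drops the fractional remainder
power = 2 ** 5              # 2 raised to the power of 5
at_most = 3 <= 3            # True, because 3 is less than or equal to 3

print(same_value, same_object, floor_div, power, at_most)
```

The point to notice is that a copied list is equal to the original (==) without being the same object (is).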
Objects whose value can be changed after they are created are called mutable objects. Objects whose value cannot be changed after they are created are called immutable objects.
# lists, dicts and arrays are mutable
programming = ['Python', 'R', 'Java', 'Php']
programming[2] ='c++'
print programming
#Strings and tuples are immutable
z_tuple = (9, 10, 11, 23)
z_tuple[1] = 'twenty two' # you can't mutate a tuple
['Python', 'R', 'c++', 'Php']
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-33-1282c7c7a358> in <module>() 6 #Strings and tuples are immutable 7 z_tuple = (9, 10, 11, 23) ----> 8 z_tuple[1] = 'twenty two' TypeError: 'tuple' object does not support item assignment
# you can write multiline strings using triple quotes ''' or """
"""
Hi! learn Python it is fun
Data science and machine learning are amazing
"""
'\nHi! learn Python it is fun \nData science and machine learning are amazing\n'
# As I said before python strings are immutable.
x= ' This is big data examiner'
x[10] = 'f'
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-43-033ea51cd601> in <module>() 1 # As I said before python strings are immutable. 2 x= ' This is big data examiner' ----> 3 x[10] = 'f' TypeError: 'str' object does not support item assignment
x = 'Java is a powerful programming language'
y = x.replace('Java', 'Python')
y
'Python is a powerful programming language'
# many python objects can be converted to a string using 'str' function
x = 56664
y = str(x)
print y
print type(y)
# strings act like other sequences, such as lists and tuples
a = 'Python'
print list(a)
print a[:3] # you can slice a python string
print a[3:]
56664 <type 'str'> ['P', 'y', 't', 'h', 'o', 'n'] Pyt hon
#String concatenation is very important
p = "P is the best programming language"
q = ", I have ever seen"
z = p+q
z
'P is the best programming language, I have ever seen'
You have to do a lot of string formatting while doing data analysis. In a format string, %s formats an argument as a string, %d as an integer, and %.3f as a floating-point number with 3 decimal places.
print "Hii space left is just %.3f gb, and the data base is %s" %(0.987, 'mysql')
print "Hii space left is just %f gb, and the data base is %s" %(0.987, 'mysql')
print "Hii space left is just %d gb, and the data base is %s" %(0.987, 'mysql')
Hii space left is just 0.987 gb, and the data base is mysql Hii space left is just 0.987000 gb, and the data base is mysql Hii space left is just 0 gb, and the data base is mysql
# boolean values in python are written as True and False.
print True and True
print True or False
print True and False
True True False
#Empty containers and strings (list, dict, str, tuple, etc.) are treated as False when used as a condition in control flow (if, while, etc.); so is the number 0
print bool([]), bool([1,2,3])
print bool('Hello Python!'), bool('')
bool(0), bool(1)
False True True False
(False, True)
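The example above only calls bool() directly, so here is a small sketch of the same rule inside an if statement. The helper name describe is hypothetical, made up for this illustration.

```python
from __future__ import print_function  # keeps print() consistent on Python 2 and 3

def describe(items):
    # hypothetical helper: an empty container is falsy, so `not items` is True for it
    if not items:
        return 'empty'
    return 'has %d item(s)' % len(items)

results = [describe([]), describe([1, 2, 3]), describe(''), describe({'a': 1})]
print(results)
```

Writing `if not items:` is the usual way to test for an empty list, string or dict, instead of comparing the length to zero.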
x = '1729'
y = float(x)
print type(y)
print int(y)
print bool(y)
<type 'float'> 1729 True
#Python's datetime module provides the datetime, date and time types
from datetime import datetime, date, time
td = datetime(1989,6,9,5,1, 30)# avoid leading zeros such as 06; Python 2 reads them as octal (so 08 and 09 raise an invalid token error) and Python 3 rejects them
print td.day
print td.minute
print td.date()
print td.time()
td.strftime('%m/%d/%y %H:%M:%S')# the strftime method converts the date and time into a string
9 1 1989-06-09 05:01:30
'06/09/89 05:01:30'
from datetime import datetime, date, time
datetime.strptime('1989911', '%Y%m%d') # strings can be converted to datetime objects using strptime
td = datetime(1989,6,9,5,1, 30)
td.replace(hour =0 ,minute=0, second=30)# replace returns a new datetime with the given fields changed
datetime.datetime(1989, 6, 9, 0, 0, 30)
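The result of strptime is not shown above, so here is a minimal sketch of a round trip between a string and a datetime. The input string with explicit separators is my own example, chosen because it is less ambiguous than a run of digits.

```python
from __future__ import print_function  # keeps print() consistent on Python 2 and 3
from datetime import datetime

# strptime parses a string into a datetime, following the given format
parsed = datetime.strptime('1989-06-09 05:01:30', '%Y-%m-%d %H:%M:%S')

# strftime goes the other way and formats a datetime as a string
as_text = parsed.strftime('%m/%d/%y %H:%M:%S')

# replace returns a new datetime; the original object is left unchanged
midnight = parsed.replace(hour=0, minute=0, second=0)

print(parsed)
print(as_text)
print(midnight)
```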
from datetime import datetime, date, time
td = datetime(1989,6,9,5,1, 30)
td1 = datetime(1988,8, 31, 11, 2, 23)
new_time =td1 - td # subtracting two datetime objects gives a timedelta
print new_time
print type(new_time) # the type is datetime.timedelta
print td +new_time # adding a timedelta to a datetime gives a shifted datetime
-282 days, 6:00:53 <type 'datetime.timedelta'> 1988-08-31 11:02:23
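To make the timedelta result a little more concrete, here is a sketch that looks inside it, using the same two dates as above. The names delta and week_later are mine.

```python
from __future__ import print_function  # keeps print() consistent on Python 2 and 3
from datetime import datetime, timedelta

td = datetime(1989, 6, 9, 5, 1, 30)
td1 = datetime(1988, 8, 31, 11, 2, 23)

delta = td1 - td                    # a timedelta; negative because td1 is earlier
days = delta.days                   # whole days (can be negative)
seconds = delta.seconds             # leftover seconds, always between 0 and 86399
total = delta.total_seconds()       # the whole difference expressed in seconds

week_later = td + timedelta(days=7) # timedeltas can also be built directly

print(days, seconds, total)
print(week_later)
```

A negative timedelta is stored as a negative number of days plus a positive number of seconds, which is why the difference prints as -282 days, 6:00:53.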
Handling exceptions is just a fancy name for handling Python errors. In Python, many functions work only on certain types of input. For example, the float function raises a ValueError when you feed it a string that does not represent a number.
print float('7.968')
float('Big data')
7.968
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-8-e679c5a97125> in <module>() 1 print float('7.968') ----> 2 float('Big data') ValueError: could not convert string to float: Big data
# suppose we want our function to return the input unchanged when it cannot be converted; we can do this using the following code.
def return_float(x):
    try:
        return float(x)
    except: # a bare except catches every exception
        return x
print return_float('4.55')
print return_float('big data') # this time it did not raise a ValueError
4.55 big data
#print float((9,8)) -> this will raise a TypeError; remove the comment and check the output.
def return_float(x):
    try:
        return float(x)
    except (TypeError, ValueError): # only TypeError and ValueError are caught
        return x
print return_float((9,8)) # now it returns the tuple (9, 8) unchanged
(9, 8)
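As a small extension of the idea above, here is a sketch that also reports which exception was caught. The function to_float_or_none is a hypothetical variant of return_float, and returning None on failure is my own choice, not something the tutorial prescribes.

```python
from __future__ import print_function  # keeps print() consistent on Python 2 and 3

def to_float_or_none(value):
    # hypothetical variant of return_float: returns None when conversion fails
    try:
        return float(value)
    except (TypeError, ValueError) as err:
        # `as err` gives access to the exception object that was raised
        print('could not convert %r: %s' % (value, type(err).__name__))
        return None

converted = [to_float_or_none(v) for v in ['7.968', 'Big data', (9, 8), 3]]
print(converted)
```

Naming the exception types keeps unrelated errors, such as a typo in a variable name, from being swallowed silently.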
# this is called a ternary (conditional) expression
x = 'Life is short use python'
'This is my favourite quote' if x == 'Life is short use python' else 'I hate R'
'This is my favourite quote'
#Tuples are one-dimensional, fixed-length, immutable sequences of Python objects.
machine_learning = 77, 45, 67
print machine_learning
pythonista = (87, 56, 98), (78, 45, 33) #Nested Tuples
print pythonista
(77, 45, 67) ((87, 56, 98), (78, 45, 33))
#You can convert any sequence to a tuple by using the 'tuple' function
print tuple([4,0,2])
pythonista = tuple('Python')
print pythonista
pythonista[0] # you can access each element of a tuple by its index
(4, 0, 2) ('P', 'y', 't', 'h', 'o', 'n')
'P'
x = tuple(['Manu',[99,88], 'Jeevan'])
#x[2] = 'Prakash' # you can't modify a tuple like this
x[1].append(77)# but you can modify a mutable object, such as a list, that is stored inside a tuple
x
('Manu', [99, 88, 77], 'Jeevan')
y = ('Mean', 'Median', 'Mode')+('Chisquare', 'Annova') + ('statistical significance',) # you can concatenate tuples using the '+' operator.
print y
('Mean', 'Median') *4 # multiplying a tuple by an integer repeats its elements
('Mean', 'Median', 'Mode', 'Chisquare', 'Annova', 'statistical significance')
('Mean', 'Median', 'Mean', 'Median', 'Mean', 'Median', 'Mean', 'Median')
deep_learning =('Theano', 'Open cv', 'Torch') # you can unpack a tuple
x,y,z= deep_learning
print x
print y
print z
Theano Open cv Torch
countries ='Usa', 'India', ('Afghanistan',' Pakistan'),
a,b,(c,d) = countries
print a
print b
print c
print d
Usa India Afghanistan Pakistan
countries ='Usa', 'India', ('Afghanistan',' Pakistan'), 'Usa', 'Usa'
countries.count('Usa') # .count returns how many times a value occurs in a tuple
3
I haven't discussed lists, as they are covered in depth in the Codecademy tutorials. I am going through the concepts that are not discussed there. Some important list concepts are:
countries =['Usa', 'India','Afghanistan',' Pakistan']
countries.extend(['Britian', 'Canada', 'Uzbekistan', 'Turkey']) # extend appends every element of another list in place and returns None
countries.sort(key=len) # sort also works in place; countries are sorted according to number of characters
print countries
# extend is handy when your lists are large, because it does not build a new list the way + does.
['Usa', 'India', 'Canada', 'Turkey', 'Britian', ' Pakistan', 'Uzbekistan', 'Afghanistan']
import bisect
b = [9,9,9,9,5,6,3,5,3,2,1,4,7,8]
b.sort()
x =bisect.bisect(b,2) # bisect.bisect finds the index where an element should be inserted to keep the list sorted.
y= bisect.bisect(b, 5)
print x
print y
2 7
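bisect.bisect only reports the position, so here is a short sketch that also performs the insertion with bisect.insort, which is part of the same standard module. The list is a shortened version of the one above.

```python
from __future__ import print_function  # keeps print() consistent on Python 2 and 3
import bisect

b = [1, 2, 3, 3, 4, 5, 5, 6, 7, 8, 9]   # must already be sorted

right = bisect.bisect(b, 5)       # index just after the existing 5s
left = bisect.bisect_left(b, 5)   # index of the first existing 5

bisect.insort(b, 5)               # inserts 5 in place and keeps b sorted

print(left, right)
print(b)
```

Note that the bisect functions do not check that the list is sorted; on an unsorted list they silently return a meaningless position.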
# When iterating over a sequence; to keep track of the index of the current element, you can use 'enumerate'
languages = ['Bigdata', 'Hadoop', 'mapreduce', 'Nosql']
for i,val in enumerate(languages):
    print i,val
0 Bigdata 1 Hadoop 2 mapreduce 3 Nosql
#The sorted function returns a new sorted list built from the elements of any sequence
print sorted([89, 99,45,63,25,53,34,56])
print sorted('Big data examiner')
[25, 34, 45, 53, 56, 63, 89, 99] [' ', ' ', 'B', 'a', 'a', 'a', 'd', 'e', 'e', 'g', 'i', 'i', 'm', 'n', 'r', 't', 'x']
hot_job = ['Big_data', 'data science', 'data scientist', 'data base developer']
languages = ['c', 'c++', 'java', 'python']
statistics = ['Mean', 'Median', 'Mode', 'Chi square']
print zip(hot_job, languages, statistics)
for i, (x,y) in enumerate(zip(hot_job, languages)): # zip and enumerate can be used together
    print('%d: %s, %s' %(i,x,y))
[('Big_data', 'c', 'Mean'), ('data science', 'c++', 'Median'), ('data scientist', 'java', 'Mode'), ('data base developer', 'python', 'Chi square')] 0: Big_data, c 1: data science, c++ 2: data scientist, java 3: data base developer, python
# you can unzip a zipped sequence as follows
rockers = [('Jame', 'Manu'), ('Govind', 'Dheepan'),('Partha', 'Reddy')]
first_names, last_names = zip(*rockers)
print first_names
print last_names
('Jame', 'Govind', 'Partha') ('Manu', 'Dheepan', 'Reddy')
#Use the reversed function to iterate over a sequence in reverse order
list(reversed(range(20)))
[19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
# you can combine two dictionaries using 'update' method
d1 = {'a' : 'octave', 'b' : 'Java'}
d1.update({'c' : 'foo', 'd' : 12})
print d1
d2 = {'a' : 'octave', 'b' : 'Java'}
d2.update({'b' : 'foo', 'c' : 12}) # the dictionary passed to update overrides the value of 'b' in d2
print d2
{'a': 'octave', 'c': 'foo', 'b': 'Java', 'd': 12} {'a': 'octave', 'c': 12, 'b': 'foo'}
# the dict function accepts a sequence of (key, value) tuples
data_science = dict(zip(range(10), reversed(range(10)))) # zip pairs up keys and values, and dict turns the pairs into a dictionary
data_science
{0: 9, 1: 8, 2: 7, 3: 6, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1, 9: 0}
# The keys of a dictionary must be immutable (int, string, float, or tuples of immutable objects). The technical term for this is hashability
print hash('string')
print hash((1,2,3))
print hash([1,2,4]) # raises an error because lists are mutable
-1542666171 -378539185
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-148-27f144be1274> in <module>() 2 print hash('string') 3 print hash((1,2,3)) ----> 4 print hash([1,2,4]) TypeError: unhashable type: 'list'
# An easy way to use a list as a key is to convert it to a tuple
fg ={}
fg[tuple([3,4,5])] = 45
fg
{(3, 4, 5): 45}
# a set is an unordered collection of unique elements.
set([3,3,4,4,4,6,7,7,7,8])
{3, 4, 6, 7, 8}
#Sets support mathematical set operations like union, intersection, difference, and symmetric difference
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}
print a|b # union
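print a&b # intersection -> elements common to both sets
print a-b # difference -> elements of a that are not in b
print a^b # symmetric difference -> elements in exactly one of the two sets
print {1,2,3} =={3,2,1} # sets are equal if they contain the same elements, regardless of order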
set([1, 2, 3, 4, 5, 6, 7, 8]) set([3, 4, 5]) set([1, 2]) set([1, 2, 6, 7, 8]) True
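Each of the set operators above also has a method with a readable name. The sketch below uses the same two sets to show the pairs side by side, plus a membership and a subset test.

```python
from __future__ import print_function  # keeps print() consistent on Python 2 and 3

a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

union = a.union(b)                       # same as a | b
common = a.intersection(b)               # same as a & b
only_a = a.difference(b)                 # same as a - b
either = a.symmetric_difference(b)       # same as a ^ b

has_three = 3 in a                       # membership test
is_subset = {3, 4}.issubset(a)           # True if every element is also in a

print(sorted(union), sorted(common), sorted(only_a), sorted(either))
print(has_three, is_subset)
```

One practical difference is that the methods accept any iterable, for example a.union([9, 10]), while the operators need a set on both sides.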
football_clubs = ['Manchester', 'Liverpool', 'Arsenal', 'Chelsea', 'Mancity', 'Tottenham', 'Barcelona','Dortmund']
football ={}
for clubs in football_clubs:
    club = clubs[0] # gets the first character of the club name
    if club not in football: # first time this letter is seen, so start a new list
        football[club] = [clubs]
    else:
        football[club].append(clubs)
print football
{'A': ['Arsenal'], 'C': ['Chelsea'], 'B': ['Barcelona'], 'D': ['Dortmund'], 'M': ['Manchester', 'Mancity'], 'L': ['Liverpool'], 'T': ['Tottenham']}
# Usually, a Python dictionary throws a KeyError if you try to get an item with a key that is not currently in the dictionary.
#The defaultdict in contrast will simply create any items that you try to access (provided of course they do not exist yet). To create such a "default" item, it calls the function object that you pass in the constructor
#(more precisely, it's an arbitrary "callable" object, which includes function and type objects).
# The same operation can be done using defaultdict
from collections import defaultdict # defaultdict lives in the collections module
soccer = defaultdict(list)
for clubs in football_clubs:
    soccer[clubs[0]].append(clubs)
print soccer
defaultdict(<type 'list'>, {'A': ['Arsenal'], 'C': ['Chelsea'], 'B': ['Barcelona'], 'D': ['Dortmund'], 'M': ['Manchester', 'Mancity'], 'L': ['Liverpool'], 'T': ['Tottenham']})
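The comments above say that defaultdict calls whatever callable you pass to its constructor. Here is a sketch with int instead of list, which gives a simple counter. The names counts and missing are mine.

```python
from __future__ import print_function  # keeps print() consistent on Python 2 and 3
from collections import defaultdict

football_clubs = ['Manchester', 'Liverpool', 'Arsenal', 'Chelsea',
                  'Mancity', 'Tottenham', 'Barcelona', 'Dortmund']

counts = defaultdict(int)       # int() returns 0, so missing keys start at 0
for name in football_clubs:
    counts[name[0]] += 1        # no KeyError the first time a letter is seen

missing = counts['Z']           # merely reading a missing key creates it
print(dict(counts))
print(missing, 'Z' in counts)
```

The last two lines show the side effect worth remembering: looking up a missing key in a defaultdict adds that key to the dictionary.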
# a function can return multiple values
def b():
    x =34
    y =45
    z =89
    return x,y,z
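The function above is defined but never called, so here is a sketch of what the caller sees. I have renamed the function to three_values for readability; that name is hypothetical.

```python
from __future__ import print_function  # keeps print() consistent on Python 2 and 3

def three_values():
    x = 34
    y = 45
    z = 89
    return x, y, z              # the three values are packed into one tuple

result = three_values()         # a single tuple containing all three values
first, second, third = three_values()   # or unpack them into separate names

print(result)
print(first, second, third)
```

So "returning multiple values" really means returning one tuple, which the caller is free to unpack.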
Technically, a closure is a dynamically generated function returned by another function. Its main property is that the returned function has access to the variables in the local namespace where it was created. In layman's terms, a closure is a function defined inside another function that remembers the enclosing function's variables.
# Example of a closure. The returned function returns True if an element has been seen before.
def dict_funct():
    new_dict = {} # create a new dictionary
    def modifier(z):
        if z in new_dict: # if z is already in the dictionary
            return True
        else:
            new_dict[z]=True
            return False
    return modifier
x = dict_funct()
list_func = [5,4,6,5,3,4,6,2,1,5]
y = [x(i) for i in list_func]
print y
[False, False, False, True, False, True, True, False, False, True]
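Here is a second, smaller sketch of a closure, where the remembered variable is a plain number instead of a dictionary. The names make_multiplier, double and triple are hypothetical.

```python
from __future__ import print_function  # keeps print() consistent on Python 2 and 3

def make_multiplier(factor):
    def multiply(value):
        # factor is not an argument of multiply; it is remembered
        # from the call to make_multiplier that created this function
        return value * factor
    return multiply

double = make_multiplier(2)
triple = make_multiplier(3)

print(double(5), triple(5))
```

Each call to make_multiplier creates a separate inner function with its own factor, so double and triple do not interfere with each other.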
Raw data is messy, so you have to clean the data set to make it ready for analysis. Here we have a list of states that contains unnecessary punctuation, inconsistent capitalization and white space. First, I import Python's regular expression module, re. Second, I create a function called remove_functions, which uses re.sub to remove the unnecessary punctuation. Third, I create a list of three functions: [str.strip](http://www.tutorialspoint.com/python/string_strip.htm), remove_functions and str.title.
# If we are doing some data cleaning, we will have a messy data set like this.
import re
states = [' Kerala', 'Gujarat!', 'Delhi', 'Telengana', 'TriPUra', 'Tamil Nadu##', 'West Bengal?']
def remove_functions(strp):
    return re.sub('[!#?]', '', strp)
ooops = [str.strip, remove_functions, str.title] # create a list of functions
def clean_data(oops, funky): # takes two arguments: the data and the list of functions
    result = [] # create an empty list
    for data in oops: # loop over each element in states
        for fun in funky: # loop over the ooops list
            data = fun(data) # apply each function in ooops in turn
        result.append(data) # append the formatted state to the new list
    return result # return the list
x = clean_data(states, ooops)
print x
['Kerala', 'Gujarat', 'Delhi', 'Telengana', 'Tripura', 'Tamil Nadu', 'West Bengal']
# A lambda is a short way of writing a small, anonymous function.
def f(x):
    return x**2
print f(8)
#same function using lambda
y = lambda x: x**2
print y(9)
64 81
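Lambdas are most useful when a function is needed only once, for example as the key argument of sorted. The sketch below is my own illustration, with a shortened list of countries.

```python
from __future__ import print_function  # keeps print() consistent on Python 2 and 3

countries = ['Usa', 'India', 'Afghanistan', 'Pakistan']

# sort by the number of characters; equivalent to key=len
by_length = sorted(countries, key=lambda name: len(name))

# sort by the last letter; ties keep their original order
by_last_letter = sorted(countries, key=lambda name: name[-1])

print(by_length)
print(by_last_letter)
```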
def new_objjj():
    for x in xrange(100):
        yield x**2 # generator functions use yield instead of return
some_variable = new_objjj()
# The above generator can also be written as a generator expression
new_obj = (x**2 for x in range(100))
#Generator expressions can be passed to any Python function that accepts an iterable
y = sum(x**2 for x in xrange(100))
print y
dict((i,i**2) for i in xrange(5)) # xrange yields its values lazily, so it uses less memory than range in Python 2
328350
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
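The examples above never show a generator producing values one at a time, so here is a minimal sketch. The function squares and its limit parameter are hypothetical, and I use range so that the snippet also runs on Python 3 (in Python 2 you would normally use xrange).

```python
from __future__ import print_function  # keeps print() consistent on Python 2 and 3

def squares(limit):
    for x in range(limit):
        yield x ** 2            # execution pauses here until the next value is requested

gen = squares(5)                # nothing is computed yet

first = next(gen)               # runs the body up to the first yield
second = next(gen)              # resumes and runs up to the second yield
rest = list(gen)                # consumes whatever is left
exhausted = list(gen)           # a generator can only be consumed once

print(first, second, rest, exhausted)
```

This is why a generator can be passed to sum or dict only once; after that it is empty and has to be created again.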
rkeys=[1,2,3]
rvals=['South','Sardinia','North']
rmap={e[0]:e[1] for e in zip(rkeys,rvals)} # a dict comprehension over the pairs produced by zip
rmap
{1: 'South', 2: 'Sardinia', 3: 'North'}