Python Workshop: Basic Introduction for Psychology Researchers


INTRODUCTION

  • Introductions

    • Montréal-Python
      • Speaker
      • Events: Monthly meetup (MP), project nights, sprints, workshops, PyCon 2014!
      • Workshops
      • Sponsors
    • And you?
      • Students?
      • Programmers?
      • Professors?
  • Objectives

    • explore Python through experimentation and introspection in the interpreter
    • master basic notions : types, functions, conditionals, iterations...
    • write a script that we will execute in the interpreter
    • import Python's power
  • Documentation

  • Environment

    • text editor
    • python interpreter: python, ipython

I. THEORY (what)

Programming

  • data
    • typed, structured in variables
  • operations on data
    • functions transforming input data into output data
    • storing, retrieving from files, database, Internet
  • formal language
    • syntax
  • object oriented
    • modeling the world
    • to group in one place the definition of the kinds of things in the world
    • to group in one place the variables and functions relevant to objects of the same class

Programming for psychology researchers

  • Data analysis (pandas)
  • Experiments, tasks (raw_input, Tk, web)
  • Publishing (web, print, LaTeX... ipython notebook)
  • Modeling

Python

  • interpreted
  • object oriented
  • dynamic
  • strongly typed
  • ... used (almost) everywhere

  • Python 2 vs Python 3

Programming in Python

Environment

  • Wes McKinney, Python for Data Analysis, O'Reilly, 2013, p.11
  • ... on Integrated Development Environments (IDEs)

    • When asked about my standard development environment, I almost always say "IPython plus a text editor". I typically write a program and iteratively test and debug each piece of it in IPython. It is also useful to be able to play around with data interactively and visually verify that a particular set of data manipulations are doing the right thing. Libraries like pandas and NumPy are designed to be easy-to-use in the shell.

Approach

  1. Explore in the interactive interpreter IPython
  2. Write in your script what works
  3. Search for the tools you need (pypi, Internet, community, books...)
  4. Import (may need to install first) these tools and build more complex scripts, programs
  5. Create your modules or program
  6. Share it and collaboratively create better tools
  7. Do better research, do better science

Interpreter

  • python: vanilla interpreter
$ python
  • ipython : interactivity / introspection
$ ipython

Syntax


II. DEMO (how)

Exploring

  • data in basic variables : name, firstname, dob, has_psy, problems, treatments
  • transforming it with functions : age()
  • ... or methods (functions on objects) : name.upper()

Coding a script

  • create a complex data structure : a Person class
  • interact with user
    • a psychology example with a HEXACO self-report form
  • save data in a file

Import Python's power

  • going beyond with existing tools
    • importing from the standard library (shipped with Python) : datetime, csv
    • a psychology example with pandas library for data analysis on HEXACO data

III. PRACTICE (hands-on : do-it)

Start

  • launch the interpreter

Variables

Name, value, reference

  • variable names refer to values
In [1]:
a = 12
b = a
In [2]:
id(a)
Out[2]:
142708844
In [3]:
id(b)
Out[3]:
142708844
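
A minimal sketch of what sharing a reference means in practice, using a list rather than an integer so that the effect is visible. The names first, second and copy are illustrative and do not come from the workshop material. The single-argument print(...) form is used because it behaves the same in Python 2.7 and Python 3.

```python
# Two names bound to the same list object.
first = [1, 2, 3]
second = first              # no copy: both names refer to one object
second.append(4)            # mutate the object through one name...
same_object = first is second

# An explicit copy creates a new, independent object.
copy = list(first)
copy.append(5)

print(first)                # the change made via `second` is visible here
print(same_object)
print(copy)
```

Appending through second also changes what first shows, while the copy evolves on its own.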

Types

  • dynamic types (no need to declare)
In [4]:
n = None        # NoneType: special value meaning... nothing
b = True        # bool: boolean... True or False (case sensitive)

i = 15          # int: integer
f = 15.5        # float: non-integer values

s = "string"    # str: strings, written with "" or ''
u = u"string"   # unicode: unicode string, written with u"" or u''

l = []          # list: list of objects (ordered)
t = ()          # tuple: immutable list of objects (can't append to it)
d = {}          # dict: dictionary of data (unique keys, unordered)

st = set()      # set: a collection of objects (unique, unordered); {} alone creates a dict
  • unpacking
In [5]:
coord = (45.30, 73.34)
lat, lon = coord
  • strongly typed (no implicit casting)
  • casting:

    str(), int(), float(), bool(), list(), tuple(), dict(), set()

In [6]:
float(a)
Out[6]:
12.0
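
As an illustration of "no implicit casting", the sketch below tries to add a string and an integer, then does the same thing with explicit casts. The variable names are hypothetical. The single-argument print(...) form runs unchanged on Python 2.7 and Python 3.

```python
age_text = "15"             # a str, e.g. as typed by a user

# Mixing str and int is refused instead of being silently converted.
try:
    result = age_text + 1
except TypeError:
    result = None           # the addition failed, nothing was computed

# Explicit casts make the intent clear.
as_number = int(age_text) + 1       # arithmetic on integers
as_text = age_text + str(1)         # concatenation of strings

print(result)
print(as_number)
print(as_text)
```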

Containers

  • nesting
In [22]:
l = [[1,2,3],[4,'ohai',6],[7,8,9]]
d = {1611: {'lastname':u'Gutiérrez Hermoso', 'firstname':u'Jordi'},
     123: {'lastname':u'Leduc-Hamel', 'firstname':u'Mathieu'}}
  • indexing
In [23]:
l[2]
d[1611]
l[1][1]
Out[23]:
'ohai'
  • slicing
In [28]:
l[1:3]
Out[28]:
[[4, 'ohai', 6], [7, 8, 9]]
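
A few more access patterns that tend to come up with lists and dictionaries, shown as a small sketch. The data (scores, person) is made up for the example.

```python
scores = [3, 5, 2, 4]                      # hypothetical item scores
person = {'firstname': 'Ada', 'year': 1815}

last = scores[-1]                          # negative index counts from the end
first_two = scores[:2]                     # slice: start included, stop excluded
year = person['year']                      # lookup by key
sex = person.get('sex', 'unknown')         # .get() returns a default if the key is missing
has_name = 'firstname' in person           # membership test on the keys
```

Using person['sex'] here would raise a KeyError, which is why .get() with a default is convenient for optional data.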

Exercise : variables

  1. create a list of official psychological problems, a pseudo DSM
  2. create a dictionary representing a person, with these data
    • name
    • firstname
    • sex (or gender)
    • year of birth
    • whether they have a psychologist or not
    • a list of their diagnosed psychological problems (from the official list)

Functions

Function call

  • function name : open
  • function call : open()

Built-in functions

type() # returns object's type
dir() # returns the object's attributes
help() # gives an object's documentation
callable() # whether an object is function-like...

bool(), int(), str() # initialisation or casting
getattr()

isinstance(object, Type)# test the class (or type) of an object
issubclass()
super()

len()
min()
max()

open()
range()
raw_input()

enumerate()
zip()
sorted()
reversed()

print   # a statement in Python 2, not a function
del     # a statement: removes a name or an item
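
To give a feel for several of the built-ins listed above, here is a short sketch on made-up data (the trait names and scores are sample values, not workshop data). The list(...) wrappers make the results identical on Python 2.7 and Python 3.

```python
traits = ['honesty', 'emotionality', 'extraversion']   # sample data
scores = [4, 2, 5]

count = len(scores)
lowest, highest = min(scores), max(scores)
ranked = sorted(scores, reverse=True)        # returns a new list, scores is unchanged
pairs = list(zip(traits, scores))            # pair items position by position
numbered = list(enumerate(traits))           # (index, item) tuples
is_list = isinstance(scores, list)
```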

Declare a function

  • naming convention
  • output : None by default, one or many values
  • input : named or positional parameters
  • variables scope
  • *args, **kwargs
In []:
def my_function(param1, param2, param3=None, param4=0, *args, **kwargs):
    """This is my function."""
    output = True
    return output
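
One way to see where each kind of argument ends up is a function that simply returns what it received. The function name collect and its parameters are hypothetical, chosen only for this sketch.

```python
def collect(required, optional=None, *args, **kwargs):
    """Return the arguments as received, to show where each one lands."""
    return required, optional, args, kwargs

only_required = collect(1)                        # optional keeps its default
by_keyword = collect(1, optional=2)               # named parameter
with_extras = collect(1, 2, 3, 4, color="red")    # extras go to args and kwargs
```

Extra positional values are gathered in the tuple args, and extra named values in the dictionary kwargs.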

Exercise : functions

  1. create a function that returns the age of someone given their year of birth

Namespaces

Objects

  • object : everything is an object
  • attribute = variable on an object
  • method = function on an object
  • object.attribute
  • object.method()
  • object.attribute.method()

Introspection

  • variable. [+ tab]
  • variable?

  • type()

  • dir()
  • help()

  • exploring the types
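
The tab completion and the ? suffix are IPython features. The sketch below shows roughly the same exploration with plain built-ins, on a sample string chosen for the example.

```python
name = "gutierrez"                     # sample value

kind = type(name)                      # the object's type
attributes = dir(name)                 # names of its attributes and methods
has_upper = 'upper' in attributes
doc = name.upper.__doc__               # the docstring that help(name.upper) displays
public = [a for a in attributes if not a.startswith('_')]
```

Filtering out the names that start with an underscore leaves the methods meant for everyday use, such as upper or split.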

Exercise : namespaces

  1. add a problem to the list of problems
  2. count the total problems you have in your list
  3. create the fullname (from name and firstname) of your person, with the name in uppercase
  4. add a new key to the person dictionary for their treatments
In [2]:
name = u"Jordi Gutiérrez Hermoso"
firstname, paternalname, maternalname = name.split()
paternalname.upper()
paternalname.lower()
paternalname.ljust(30)
name = [firstname.lower(), paternalname.lower(), maternalname.lower()]
username = ".".join(name)

name = u"Jordi Gutiérrez Hermoso"
username = ".".join(name.split()).lower()

users = []
users.append(username)

jordigh = {'firstname':u'Jordi', 'lastname':u'Gutiérrez Hermoso'}
mathieu = {'firstname':u'Mathieu', 'lastname':u'Leduc-Hamel'}
jp = {'firstname':u'Jean-Philippe', 'lastname':u'Caissy'}

people = []
people.append(jordigh)
people.append(mathieu)
people.append(jp)

status = [
    (1, u'New'),
    (2, u'In progress'),
    (3, u'Rejected'),
    (4, u'Accepted'),
]

Flow and control

Comparison and logical operators

  • false = False, 0, "", (), [], {}, None
  • and, or, not
  • < > <= >= == !=
  • x < y <= z
  • is, is not
  • in, not in
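
A small sketch of how these operators behave, assuming made-up values. It checks that the listed "false" values really count as false, then contrasts == (same content) with is (same object).

```python
empties = [False, 0, "", (), [], {}, None]
all_false = not any(empties)               # every listed value counts as false

age = 25
in_range = 18 <= age < 65                  # chained comparison, like x < y <= z

a = [1, 2]
b = [1, 2]
equal = a == b                             # same content
identical = a is b                         # two distinct objects

nothing = None
is_none = nothing is None                  # the usual way to test for None
```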

Conditional

  • if, elif, else
In [2]:
numlist = range(6)
if 5 in numlist:
    print 'hooray 5'
elif 4 in numlist:
    print 'hooray 4'
else:
    print 'not hooray :-('
hooray 5

  • one-liner
In [3]:
'hooray 5' if 5 in numlist else 'not hooray :-('
Out[3]:
'hooray 5'

Iteration

  • while
In [7]:
year = 2012
while year <= 2015:
    print year
    year = year + 1   # year += 1
2012
2013
2014
2015

  • for
In [8]:
for i in range(2012, 2016):
    print i
2012
2013
2014
2015

Exercise : flow and control

  1. write a test to see if a problem is in the list of the person's problems
    • if so, add a proposed treatment (or measure) for this specific person's problem
      • the treatment is not modifiable (we want to make sure that for this problem, we've done that)
    • if not, give the person some advice (print it on screen)
  2. from all the problems that this person has, list only those that have not received a treatment yet

Scripts

In [18]:
#! /usr/bin/env python
# -*- encoding: utf-8 -*-

def sup(name):
    return u"Sup %s!" % (name)

if __name__ == '__main__':
    print u"--------------------------------------------------"
    print u"START the script"
    print u"--------------------------------------------------"
    name = raw_input("What is your name? ")
    print sup(name)
    print u"-----------------------------------------------"
    print u"END the script"
    print u"-------------------------------------------------"
--------------------------------------------------
START the script
--------------------------------------------------
-----------------------------------------------
END the script
-------------------------------------------------

  • shebang: #!/usr/bin/env python
  • encoding: # -*- encoding: utf-8 -*-
  • if __name__ == '__main__':
  • raw_input()

  • create a repository project/contacts

  • create in this repo a Python script called form.py which:

    • Asks the user their name, last name, and date of birth
    • Greets the user putting their full name in ALLCAPS
  • Execution in ipython:

run script
  • execution with python:
$ python script.py

Exercise : scripts

  1. create a file called lab.py
  2. add the working code you typed in the interpreter
    • variable for the official psychological problems
    • function to calculate the age
  3. ask the user all the questions you need to get the data to store in your dictionary
  4. end the interaction by telling the user something relevant about their age and problems
  5. run your script
  6. interact with your script
  7. close the interpreter
  8. re-open the interpreter
  9. re-run the script

Classes

Declare a class

In []:
class Person(object):
    
    def __init__(self, name, firstname, dob=None):
        self.name = name
        self.firstname = firstname
        self.dob = dob
        
    def age(self):
        # TODO : compute from dob and the current date
        return self.dob

Create objects

  • objects (instances) of a Class
In []:
mathieu = Person("Leduc-Hamel", "Mathieu")
davin = Person(firstname="Davin", name="Baragiotta")
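
Once an instance exists, its attributes and methods are reached with the dot notation. The sketch below repeats the constructor and adds a fullname method, which is a hypothetical helper for illustration and not part of the workshop's class. The sample person is also made up.

```python
class Person(object):

    def __init__(self, name, firstname, dob=None):
        self.name = name
        self.firstname = firstname
        self.dob = dob

    def fullname(self):
        # Hypothetical helper: first name followed by the name in uppercase.
        return "%s %s" % (self.firstname, self.name.upper())

ada = Person("Lovelace", "Ada")
label = ada.fullname()          # call a method on the instance
ada.dob = 1815                  # attributes can be set after creation
```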

Exercise : classes

  1. write a Person class in your script
  2. run the script
  3. create a few Person objects (instances) from your Person class
  4. when everything's fine, replace your person dictionary and your age function with your Person class

Import

  • import module
  • from module import name
  • from module import name as my_name

  • built-in: no need to import

  • standard library (shipped with python): import without installation
  • pypi : lots of modules to install that are just begging to be imported

  • importable if installed "in the path"
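
The three import forms side by side, using only standard library modules. The alias Tally is an arbitrary name picked for this sketch.

```python
import math                                  # import module
from datetime import date                    # from module import name
from collections import Counter as Tally     # from module import name as my_name

root = math.sqrt(16)            # the module name prefixes the function
day = date(2014, 3, 18)         # the imported name is used directly
tally = Tally("abca")           # the alias replaces the original name
```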

Standard library modules

In [1]:
from datetime import date

today = date.today()
print today
#year = ??
2014-03-18

In [21]:
import sys
sys.path
Out[21]:
['',
 '/usr/lib/python2.7',
 '/usr/lib/python2.7/plat-linux2',
 '/usr/lib/python2.7/lib-tk',
 '/usr/lib/python2.7/lib-old',
 '/usr/lib/python2.7/lib-dynload',
 '/home/jordi/.local/lib/python2.7/site-packages',
 '/usr/local/lib/python2.7/dist-packages',
 '/usr/lib/python2.7/dist-packages',
 '/usr/lib/python2.7/dist-packages/PIL',
 '/usr/lib/python2.7/dist-packages/gst-0.10',
 '/usr/lib/python2.7/dist-packages/gtk-2.0',
 '/usr/lib/pymodules/python2.7',
 '/usr/lib/python2.7/dist-packages/IPython/extensions']

Examples

  • re
  • datetime
  • collections
  • pprint
  • decimal
  • os.path
  • pickle
  • sqlite3
  • zipfile
  • csv
  • email
  • json
  • htmllib
  • urllib2
  • Tkinter, ttk
  • pdb
  • sys

Exercise : import from standard library

  1. import the date name from the datetime module of the standard library
  2. explore how to obtain the year from a date
  3. upgrade your age function to compute from date objects rather than integers
  4. explore how to deal with days (within a given year, your age depends on whether your birthday has passed or is still to come)
  5. upgrade your age function again to use date deltas

Your own modules

__init__.py __name__ __main__
  • __name__ : Module name. It's the file name (without .py) if imported, but it's __main__ if executed (useful for tests)
  • add the file __init__.py in a directory to turn it into a package (a module that can contain other modules)
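
A short sketch of how __name__ behaves, assuming the standard json module as the imported example. The function main is a hypothetical placeholder for whatever the script should do when run directly.

```python
import json

imported_name = json.__name__        # an imported module knows its own name

def main():
    """Code that should run only when the file is executed directly."""
    return "running as a script"

# True when this file is run with `python file.py`, False when it is imported.
if __name__ == '__main__':
    print(main())
```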

Exercise : import your module

  1. create a directory called inpsych
  2. create an __init__.py file in it
  3. move your lab.py file into the inpsych directory
  4. in the interpreter, import the age function living in the lab.py file in order to test it
    • make sure that the inpsych directory is "in the path" for your interpreter
    • launching the interpreter in the parent directory of your module puts your module "in the path", reachable by the interpreter

Third party modules

https://pypi.python.org/

  • pypi = Python Package Index
  • modules that we import live in packages that we install on our system
  • find the right package (on pypi or talking with a Pythonista)

Databases

  • DB : sqlite3, mysqldb, psycopg2
  • ORM : sqlalchemy

Web development

  • django (+south)
  • pyramid
  • flask

Science

  • numpy
  • matplotlib
  • pandas
  • scipy

Psychology

  • psychopy

System administration

  • fabric
  • celery
  • ...

Natural language processing

  • nltk
  • pattern

  • install with pip

  • pip = pip installs python

Installation with pip

  • pip installs packages on your machine
  • for better package management, use virtualenv (covered in another workshop)
$ pip install packagename

Exercise : import a third party module

  1. install psychopy on your (virtual) machine
    1. close the Python interpreter (or open a new terminal)
    2. install with pip
    3. relaunch the interpreter
  2. import psychopy
  3. explore psychopy

Files

  • open, manipulate, close
In [15]:
f = open('python.txt')
for line in f.readlines():
    print line,
f.close()
Python is a language that I love
and, it seems, its name comes from
the Monty Python troupe of British comedians
and not from the kind of snake called "python"


In [16]:
f = open("python.txt")
lines = f.readlines()
f.close()

target = 'python'
context = [line for line in lines if target in line]
comments = [line for line in lines if line.startswith('#')]
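
The open / manipulate / close pattern can also be written with a with block, which closes the file automatically even if an error occurs. This is a sketch under the assumption of a throwaway file in a temporary directory. The file name notes.txt is hypothetical.

```python
import os
import tempfile

# Work in a temporary directory so no existing file is touched.
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

with open(path, 'w') as f:          # 'w' opens the file for writing
    f.write("first line\n")
    f.write("# a comment\n")

with open(path) as f:               # default mode is reading
    lines = f.readlines()

comments = [line.strip() for line in lines if line.startswith('#')]
```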

Exercise : files

  1. introspect the open function to see how to open in writing, reading or append mode... : w, r, a...
  2. create a list of strings
  3. write these strings in a file, e.g.: test.txt

Syntactic sugar

List, dictionary and set comprehensions

  • Create a list, a dictionary or a set from an iterable using a one-liner
new_list = [v for v in old_list if v < 2]
new_dict = {k: v for k, v in old_dict.items() if k < 2}
new_set = {v for v in old_list if v < 2}
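
For a runnable version of the three comprehensions, the sketch below supplies sample values for old_list and old_dict (both made up). The dictionary form needs the key: value pair before the for. Dictionary and set comprehensions require Python 2.7 or later.

```python
old_list = [0, 1, 1, 2, 3]
old_dict = {0: 'zero', 1: 'one', 2: 'two'}

new_list = [v for v in old_list if v < 2]                 # keeps duplicates and order
new_dict = {k: v for k, v in old_dict.items() if k < 2}   # key: value pairs
new_set = {v for v in old_list if v < 2}                  # duplicates collapse
```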

String formatting

  • substitution : %
In [11]:
for n in range(10):
    print "%d to the 3rd power is: %d" % (n, n**3)

for p in people:
    print "Bonjour/Hello %s %s" % (p['firstname'], p['lastname'].upper())
0 to the 3rd power is: 0
1 to the 3rd power is: 1
2 to the 3rd power is: 8
3 to the 3rd power is: 27
4 to the 3rd power is: 64
5 to the 3rd power is: 125
6 to the 3rd power is: 216
7 to the 3rd power is: 343
8 to the 3rd power is: 512
9 to the 3rd power is: 729
Bonjour/Hello Jordi GUTIÉRREZ HERMOSO
Bonjour/Hello Mathieu LEDUC-HAMEL
Bonjour/Hello Jean-Philippe CAISSY
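
Besides the % operator, strings also have a format() method. The sketch below puts the two side by side on a sample dictionary (made-up data), and shows a width and precision specifier with %.

```python
person = {'firstname': 'Ada', 'lastname': 'Lovelace'}    # sample data

old_style = "Hello %s %s" % (person['firstname'], person['lastname'].upper())
new_style = "Hello {0} {1}".format(person['firstname'], person['lastname'].upper())
by_name = "Hello {firstname}".format(**person)   # fields filled from dictionary keys
padded = "%5.1f" % 3.14159                       # width 5, one decimal
```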

Exercise : string formatting

  1. use string formatting in the questions asked or feedback given to the user (e.g.: age)

Exceptions

  • try, except
In []:
try:
    15/0
except ZeroDivisionError as e:
    print "Dividing by zero is bad, m'kay?"
    print e
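
The same pattern applies to user input that may not be a number. This sketch wraps int() in a hypothetical helper named to_int. The except ... as form works on Python 2.6+ and Python 3, and the single-argument print(...) behaves the same on both.

```python
def to_int(text):
    """Return text converted to int, or None when it is not a number."""
    try:
        value = int(text)
    except ValueError as error:
        print("Cannot convert: %s" % error)
        return None
    else:
        # Runs only when no exception was raised in the try block.
        return value

good = to_int("42")
bad = to_int("forty-two")
```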

Exercise : exception

  1. in your age function, return an explicit error message when the input is not a date

Data permanence

  • files
  • serialisation: import pickle
In [23]:
import pickle

f = open('pickles', 'wb')   # binary mode, as pickle expects
pickle.dump(status, f)
pickle.dump(people, f)
f.close()

exit()

import pickle

f = open('pickles', 'rb')
status = pickle.load(f)     # objects come back in the order they were dumped
people = pickle.load(f)
f.close()
  • in database?
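
A self-contained round trip with pickle, as a sketch: the data is a shortened sample of the status list, and the file lives in a temporary directory so nothing in the working directory is overwritten.

```python
import os
import pickle
import tempfile

status = [(1, 'New'), (2, 'In progress')]            # sample data
path = os.path.join(tempfile.mkdtemp(), 'pickles')

with open(path, 'wb') as f:       # pickle files are opened in binary mode
    pickle.dump(status, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)     # a new object equal to the original
```

The restored list compares equal to the original but is a separate object, which is what makes pickle usable across interpreter sessions.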

Exercise

  1. in your lab.py program, store all the responses obtained from the users in a .csv file
    1. import the csv module of the standard library
    2. explore how to write a csv file with the csv module... csv.writer?
    3. make sure your program writes to the file in append mode so we don't lose data after each use
  2. test your program multiple times, check that .csv file grows with data

CONCLUSION

Going further

What we've learned

  • import antigravity
  • documentation + interactivity + introspection
  • scripts + modules + import

Getting help

  • helpful community
  • Montréal-Python:

    • join the community
    • upcoming events
  • enjoy!

EXERCISE

Objective

Create a script flux.py that returns the 5 latest stories posted on the Montréal-Python site: http://montrealpython.org/fr/feed/

Approach

  1. use feedparser
    • pip install feedparser
  2. launch the interpreter and follow the docs:
  3. introspect as needed (and print a dict to see its keys)
  4. code the script that will do the desired processing
  5. launch the script in the interpreter and confirm its proper execution
  6. serve cold

Algorithm

  • read the RSS feed from http://montrealpython.org
  • keep the number of desired items
  • process the kept items:
    • Create a single string with the data of the last modified items (title, URL... and possibly the date of last modification)

Solution

Take the time to code up the solution yourself... then you can compare with our solution.

As a hint, our solution only has 8 lines of Python.