# Tutorial on Python for scientific computing¶

Marcos Duarte
Renato Naville Watanabe
Laboratory of Biomechanics and Motor Control (http://pesquisa.ufabc.edu.br/bmclab)
Federal University of ABC, Brazil

This will be a very brief tutorial on Python.
For a complete (and much better) tutorial about Python see A Whirlwind Tour of Python and Python Data Science Handbook for a specific tutorial about Python for scientific computing.

To use Python for scientific computing we need the Python program itself with its main modules and specific packages for scientific computing. See this notebook on how to install Python for scientific computing.
Once you get Python and the necessary packages for scientific computing ready to work, there are different ways to run Python, the main ones are:

• open a terminal window in your computer and type python or ipython that the Python interpreter will start
• run the Jupyter notebook and start working with Python in a browser
• run Spyder, an interactive development environment (IDE)
• run the Jupyter qtconsole, a more featured terminal
• run Python online in a website such as https://www.pythonanywhere.com/ or Colaboratory
• run Python using any other Python editor or IDE

We will use the Jupyter Notebook for this tutorial but you can run almost all the things we will see here using the other forms listed above.

## Python as a calculator¶

Once in the Jupyter notebook, if you type a simple mathematical expression and press Shift+Enter it will give the result of the expression:

In [2]:
1 + 2 - 25

Out[2]:
-22
In [3]:
4/7

Out[3]:
0.5714285714285714

Using the print function, let's explore the mathematical operations available in Python:

In [3]:
print('1+2 = ', 1+2, '\n', '4*5 = ', 4*5, '\n', '6/7 = ', 6/7, '\n', '8**2 = ', 8**2, sep='')

1+2 = 3
4*5 = 20
6/7 = 0.8571428571428571
8**2 = 64


And if we want the square-root of a number:

In [4]:
sqrt(9)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-4-c836dfef5db4> in <module>()
----> 1 sqrt(9)

NameError: name 'sqrt' is not defined

We get an error message saying that the sqrt function if not defined. This is because sqrt and other mathematical functions are available with the math module:

In [1]:
import math

In [2]:
math.sqrt(9)

Out[2]:
3.0
In [3]:
from math import sqrt

In [4]:
sqrt(9)

Out[4]:
3.0

## The import function¶

We used the command 'import' to be able to call certain functions. In Python functions are organized in modules and packages and they have to be imported in order to be used.

A module is a file containing Python definitions (e.g., functions) and statements. Packages are a way of structuring Python’s module namespace by using “dotted module names”. For example, the module name A.B designates a submodule named B in a package named A. To be used, modules and packages have to be imported in Python with the import function.

Namespace is a container for a set of identifiers (names), and allows the disambiguation of homonym identifiers residing in different namespaces. For example, with the command import math, we will have all the functions and statements defined in this module in the namespace 'math.', for example, 'math.pi' is the π constant and 'math.cos()', the cosine function.

By the way, to know which Python version you are running, we can use one of the following modules:

In [5]:
import sys
sys.version

Out[5]:
'3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 17:14:51) \n[GCC 7.2.0]'

And if you are in an IPython session:

In [6]:
from IPython import sys_info
print(sys_info())

{'commit_hash': 'd774f565b',
'commit_source': 'installation',
'default_encoding': 'UTF-8',
'ipython_path': '/opt/miniconda3/lib/python3.6/site-packages/IPython',
'ipython_version': '7.4.0',
'os_name': 'posix',
'platform': 'Linux-4.15.0-47-generic-x86_64-with-debian-buster-sid',
'sys_executable': '/opt/miniconda3/bin/python',
'sys_platform': 'linux',
'sys_version': '3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 17:14:51) \n'
'[GCC 7.2.0]'}


The first option gives information about the Python version; the latter also includes the IPython version, operating system, etc.

## Object-oriented programming¶

Python is designed as an object-oriented programming (OOP) language. OOP is a paradigm that represents concepts as "objects" that have data fields (attributes that describe the object) and associated procedures known as methods.

This means that all elements in Python are objects and they have attributes which can be acessed with the dot (.) operator after the name of the object. We already experimented with that when we imported the module sys, it became an object, and we acessed one of its attribute: sys.version.

OOP as a paradigm is much more than defining objects, attributes, and methods, but for now this is enough to get going with Python.

## Python and IPython help¶

To get help about any Python command, use help():

In [7]:
help(math.degrees)

Help on built-in function degrees in module math:

degrees(...)
degrees(x)

Convert angle x from radians to degrees.



Or if you are in the IPython environment, simply add '?' to the function that a window will open at the bottom of your browser with the same help content:

In [8]:
math.degrees?


And if you add a second '?' to the statement you get access to the original script file of the function (an advantage of an open source language), unless that function is a built-in function that does not have a script file, which is the case of the standard modules in Python (but you can access the Python source code if you want; it just does not come with the standard program for installation).

So, let's see this feature with another function:

In [4]:
import scipy.fftpack
scipy.fftpack.fft??


To know all the attributes of an object, for example all the functions available in math, we can use the function dir:

In [14]:
print(dir(math))

['__doc__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']


### Tab completion in IPython¶

IPython has tab completion: start typing the name of the command (object) and press tab to see the names of objects available with these initials letters. When the name of the object is typed followed by a dot (math.), pressing tab will show all available attribites, scroll down to the desired attribute and press Enter to select it.

### The four most helpful commands in IPython¶

These are the most helpful commands in IPython (from IPython tutorial):

• ? : Introduction and overview of IPython’s features.
• %quickref : Quick reference.
• help : Python’s own help system.
• object? : Details about ‘object’, use ‘object??’ for extra details.

Comments in Python start with the hash character, #, and extend to the end of the physical line:

In [15]:
# Import the math library to access more math stuff
import math
math.pi  # this is the pi constant; a useless comment since this is obvious

Out[15]:
3.141592653589793

To insert comments spanning more than one line, use a multi-line string with a pair of matching triple-quotes: """ or ''' (we will see the string data type later). A typical use of a multi-line comment is as documentation strings and are meant for anyone reading the code:

In [16]:
"""Documentation strings are typically written like that.

A docstring is a string literal that occurs as the first statement
in a module, function, class, or method definition.

"""

Out[16]:
'Documentation strings are typically written like that.\n\nA docstring is a string literal that occurs as the first statement\nin a module, function, class, or method definition.\n\n'

A docstring like above is useless and its output as a standalone statement looks uggly in IPython Notebook, but you will see its real importance when reading and writting codes.

Commenting a programming code is an important step to make the code more readable, which Python cares a lot.
There is a style guide for writting Python code (PEP 8) with a session about how to write comments.

### Magic functions¶

IPython has a set of predefined ‘magic functions’ that you can call with a command line style syntax.
There are two kinds of magics, line-oriented and cell-oriented.
Line magics are prefixed with the % character and work much like OS command-line calls: they get as an argument the rest of the line, where arguments are passed without parentheses or quotes.
Cell magics are prefixed with a double %%, and they are functions that get as an argument not only the rest of the line, but also the lines below it in a separate argument.

## Assignment and expressions¶

The equal sign ('=') is used to assign a value to a variable. Afterwards, no result is displayed before the next interactive prompt:

In [17]:
x = 1


Spaces between the statements are optional but it helps for readability.

To see the value of the variable, call it again or use the print function:

In [18]:
x

Out[18]:
1
In [19]:
print(x)

1


Of course, the last assignment is that holds:

In [9]:
x = 2
x = 3
x

Out[9]:
3

In mathematics '=' is the symbol for identity, but in computer programming '=' is used for assignment, it means that the right part of the expresssion is assigned to its left part.
For example, 'x=x+1' does not make sense in mathematics but it does in computer programming:

In [14]:
x = x + 1
print(x)

6


A value can be assigned to several variables simultaneously:

In [11]:
x = y = 4
print(x)
print(y)

4
4


Several values can be assigned to several variables at once:

In [13]:
x, y = 5, 6
print(x)
print(y)

5
6


And with that, you can do (!):

In [24]:
x, y = y, x
print(x)
print(y)

6
5


Variables must be “defined” (assigned a value) before they can be used, or an error will occur:

In [25]:
x = z

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-25-cfba4031bce1> in <module>()
----> 1 x = z

NameError: name 'z' is not defined

## Variables and types¶

There are different types of built-in objects in Python (and remember that everything in Python is an object):

In [26]:
import types
print(dir(types))

['AsyncGeneratorType', 'BuiltinFunctionType', 'BuiltinMethodType', 'CodeType', 'CoroutineType', 'DynamicClassAttribute', 'FrameType', 'FunctionType', 'GeneratorType', 'GetSetDescriptorType', 'LambdaType', 'MappingProxyType', 'MemberDescriptorType', 'MethodType', 'ModuleType', 'SimpleNamespace', 'TracebackType', '_GeneratorWrapper', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_ag', '_calculate_meta', '_collections_abc', '_functools', 'coroutine', 'new_class', 'prepare_class']


Let's see some of them now.

### Numbers: int, float, complex¶

Numbers can an integer (int), float, and complex (with imaginary part).
Let's use the function type to show the type of number (and later for any other object):

In [27]:
type(6)

Out[27]:
int

A float is a non-integer number:

In [28]:
math.pi

Out[28]:
3.141592653589793
In [29]:
type(math.pi)

Out[29]:
float

Python (IPython) is showing math.pi with only 15 decimal cases, but internally a float is represented with higher precision.
Floating point numbers in Python are implemented using a double (eight bytes) word; the precison and internal representation of floating point numbers are machine specific and are available in:

In [30]:
sys.float_info

Out[30]:
sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

Be aware that floating-point numbers can be trick in computers:

In [31]:
0.1 + 0.2

Out[31]:
0.30000000000000004
In [32]:
0.1 + 0.2 - 0.3

Out[32]:
5.551115123125783e-17

These results are not correct (and the problem is not due to Python). The error arises from the fact that floating-point numbers are represented in computer hardware as base 2 (binary) fractions and most decimal fractions cannot be represented exactly as binary fractions. As consequence, decimal floating-point numbers are only approximated by the binary floating-point numbers actually stored in the machine. See here for more on this issue.

A complex number has real and imaginary parts:

In [33]:
1+2j

Out[33]:
(1+2j)
In [34]:
print(type(1+2j))

<class 'complex'>


Each part of a complex number is represented as a floating-point number. We can see them using the attributes .real and .imag:

In [35]:
print((1+2j).real)
print((1+2j).imag)

1.0
2.0


### Strings¶

Strings can be enclosed in single quotes or double quotes:

In [36]:
s = 'string (str) is a built-in type in Python'
s

Out[36]:
'string (str) is a built-in type in Python'
In [37]:
type(s)

Out[37]:
str

String enclosed with single and double quotes are equal, but it may be easier to use one instead of the other:

In [38]:
'string (str) is a Python's built-in type'

  File "<ipython-input-38-ca70e9285fe4>", line 1
'string (str) is a Python's built-in type'
^
SyntaxError: invalid syntax

In [39]:
"string (str) is a Python's built-in type"

Out[39]:
"string (str) is a Python's built-in type"

But you could have done that using the Python escape character '\':

In [40]:
'string (str) is a Python\'s built-in type'

Out[40]:
"string (str) is a Python's built-in type"

Strings can be concatenated (glued together) with the + operator, and repeated with *:

In [41]:
s = 'P' + 'y' + 't' + 'h' + 'o' + 'n'
print(s)
print(s*5)

Python
PythonPythonPythonPythonPython


Strings can be subscripted (indexed); like in C, the first character of a string has subscript (index) 0:

In [42]:
print('s[0] = ', s[0], '  (s[index], start at 0)')
print('s[5] = ', s[5])
print('s[-1] = ', s[-1], '  (last element)')
print('s[:] = ', s[:], '  (all elements)')
print('s[1:] = ', s[1:], '  (from this index (inclusive) till the last (inclusive))')
print('s[2:4] = ', s[2:4], '  (from first index (inclusive) till second index (exclusive))')
print('s[:2] = ', s[:2], '  (till this index, exclusive)')
print('s[:10] = ', s[:10], '  (Python handles the index if it is larger than the string length)')
print('s[-10:] = ', s[-10:])
print('s[0:5:2] = ', s[0:5:2], '  (s[ini:end:step])')
print('s[::2] = ', s[::2], '  (s[::step], initial and final indexes can be omitted)')
print('s[0:5:-1] = ', s[::-1], '  (s[::-step] reverses the string)')
print('s[:2] + s[2:] = ', s[:2] + s[2:], '  (because of Python indexing, this sounds natural)')

s[0] =  P   (s[index], start at 0)
s[5] =  n
s[-1] =  n   (last element)
s[:] =  Python   (all elements)
s[1:] =  ython   (from this index (inclusive) till the last (inclusive))
s[2:4] =  th   (from first index (inclusive) till second index (exclusive))
s[:2] =  Py   (till this index, exclusive)
s[:10] =  Python   (Python handles the index if it is larger than the string length)
s[-10:] =  Python
s[0:5:2] =  Pto   (s[ini:end:step])
s[::2] =  Pto   (s[::step], initial and final indexes can be omitted)
s[0:5:-1] =  nohtyP   (s[::-step] reverses the string)
s[:2] + s[2:] =  Python   (because of Python indexing, this sounds natural)


### len()¶

Python has a built-in functon to get the number of itens of a sequence:

In [43]:
help(len)

Help on built-in function len in module builtins:

len(obj, /)
Return the number of items in a container.


In [44]:
s = 'Python'
len(s)

Out[44]:
6

The function len() helps to understand how the backward indexing works in Python.
The index s[-i] should be understood as s[len(s) - i] rather than accessing directly the i-th element from back to front. This is why the last element of a string is s[-1]:

In [45]:
print('s = ', s)
print('len(s) = ', len(s))
print('len(s)-1 = ',len(s) - 1)
print('s[-1] = ', s[-1])
print('s[len(s) - 1] = ', s[len(s) - 1])

s =  Python
len(s) =  6
len(s)-1 =  5
s[-1] =  n
s[len(s) - 1] =  n


Or, strings can be surrounded in a pair of matching triple-quotes: """ or '''. End of lines do not need to be escaped when using triple-quotes, but they will be included in the string. This is how we created a multi-line comment earlier:

In [46]:
"""Strings can be surrounded in a pair of matching triple-quotes: \""" or '''.

End of lines do not need to be escaped when using triple-quotes,
but they will be included in the string.

"""

Out[46]:
'Strings can be surrounded in a pair of matching triple-quotes: """ or \'\'\'.\n\nEnd of lines do not need to be escaped when using triple-quotes,\nbut they will be included in the string.\n\n'

### Lists¶

Values can be grouped together using different types, one of them is list, which can be written as a list of comma-separated values between square brackets. List items need not all have the same type:

In [47]:
x = ['spam', 'eggs', 100, 1234]
x

Out[47]:
['spam', 'eggs', 100, 1234]

Lists can be indexed and the same indexing rules we saw for strings are applied:

In [48]:
x[0]

Out[48]:
'spam'

The function len() works for lists:

In [49]:
len(x)

Out[49]:
4

### Tuples¶

A tuple consists of a number of values separated by commas, for instance:

In [50]:
t = ('spam', 'eggs', 100, 1234)
t

Out[50]:
('spam', 'eggs', 100, 1234)

The type tuple is why multiple assignments in a single line works; elements separated by commas (with or without surrounding parentheses) are a tuple and in an expression with an '=', the right-side tuple is attributed to the left-side tuple:

In [51]:
a, b = 1, 2
print('a = ', a, '\nb = ', b)

a =  1
b =  2


Is the same as:

In [52]:
(a, b) = (1, 2)
print('a = ', a, '\nb = ', b)

a =  1
b =  2


### Sets¶

Python also includes a data type for sets. A set is an unordered collection with no duplicate elements.

In [53]:
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
fruit = set(basket)  # create a set without duplicates
fruit

Out[53]:
{'apple', 'banana', 'orange', 'pear'}

As set is an unordered collection, it can not be indexed as lists and tuples.

In [54]:
set(['orange', 'pear', 'apple', 'banana'])
'orange' in fruit  # fast membership testing

Out[54]:
True

### Dictionaries¶

Dictionary is a collection of elements organized keys and values. Unlike lists and tuples, which are indexed by a range of numbers, dictionaries are indexed by their keys:

In [55]:
tel = {'jack': 4098, 'sape': 4139}
tel

Out[55]:
{'jack': 4098, 'sape': 4139}
In [56]:
tel['guido'] = 4127
tel

Out[56]:
{'guido': 4127, 'jack': 4098, 'sape': 4139}
In [57]:
tel['jack']

Out[57]:
4098
In [58]:
del tel['sape']
tel['irv'] = 4127
tel

Out[58]:
{'guido': 4127, 'irv': 4127, 'jack': 4098}
In [59]:
tel.keys()

Out[59]:
dict_keys(['jack', 'guido', 'irv'])
In [60]:
'guido' in tel

Out[60]:
True

The dict() constructor builds dictionaries directly from sequences of key-value pairs:

In [61]:
tel = dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
tel

Out[61]:
{'guido': 4127, 'jack': 4098, 'sape': 4139}

## Built-in Constants¶

• False : false value of the bool type
• True : true value of the bool type
• None : sole value of types.NoneType. None is frequently used to represent the absence of a value.

In computer science, the Boolean or logical data type is composed by two values, true and false, intended to represent the values of logic and Boolean algebra. In Python, 1 and 0 can also be used in most situations as equivalent to the Boolean values.

## Logical (Boolean) operators¶

### and, or, not¶

• and : logical AND operator. If both the operands are true then condition becomes true. (a and b) is true.
• or : logical OR Operator. If any of the two operands are non zero then condition becomes true. (a or b) is true.
• not : logical NOT Operator. Reverses the logical state of its operand. If a condition is true then logical NOT operator will make false.

### Comparisons¶

The following comparison operations are supported by objects in Python:

• == : equal
• != : not equal
• < : strictly less than
• <= : less than or equal
• > : strictly greater than
• >= : greater than or equal
• is : object identity
• is not : negated object identity
In [62]:
True == False

Out[62]:
False
In [63]:
not True == False

Out[63]:
True
In [64]:
1 < 2 > 1

Out[64]:
True
In [65]:
True != (False or True)

Out[65]:
False
In [66]:
True != False or True

Out[66]:
True

## Indentation and whitespace¶

In Python, statement grouping is done by indentation (this is mandatory), which are done by inserting whitespaces, not tabs. Indentation is also recommended for alignment of function calling that span more than one line for better clarity.
We will see examples of indentation in the next session.

## Control of flow¶

### if...elif...else¶

Conditional statements (to peform something if another thing is True or False) can be implemmented using the if statement:

if expression:
statement
elif:
statement
else:
statement

elif (one or more) and else are optionals.
The indentation is obligatory.
For example:

In [67]:
if True:
pass


Which does nothing useful.

Let's use the if...elif...else statements to categorize the body mass index of a person:

In [68]:
# body mass index
weight = 100  # kg
height = 1.70  # m
bmi = weight / height**2

In [69]:
if bmi < 15:
c = 'very severely underweight'
elif 15 <= bmi < 16:
c = 'severely underweight'
elif 16 <= bmi < 18.5:
c = 'underweight'
elif 18.5 <= bmi < 25:
c = 'normal'
elif 25 <= bmi < 30:
c = 'overweight'
elif 30 <= bmi < 35:
c = 'moderately obese'
elif 35 <= bmi < 40:
c = 'severely obese'
else:
c = 'very severely obese'

print('For a weight of {0:.1f} kg and a height of {1:.2f} m,\n\
the body mass index (bmi) is {2:.1f} kg/m2,\nwhich is considered {3:s}.'\
.format(weight, height, bmi, c))

For a weight of 100.0 kg and a height of 1.70 m,
the body mass index (bmi) is 34.6 kg/m2,
which is considered moderately obese.


### for¶

The for statement iterates over a sequence to perform operations (a loop event).

for iterating_var in sequence:
statements
In [70]:
for i in [3, 2, 1, 'go!']:
print(i, end=', ')

3, 2, 1, go!,
In [71]:
for letter in 'Python':
print(letter),

P
y
t
h
o
n


#### The range() function¶

The built-in function range() is useful if we need to create a sequence of numbers, for example, to iterate over this list. It generates lists containing arithmetic progressions:

In [72]:
help(range)

Help on class range in module builtins:

class range(object)
|  range(stop) -> range object
|  range(start, stop[, step]) -> range object
|
|  Return an object that produces a sequence of integers from start (inclusive)
|  to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
|  start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
|  These are exactly the valid indices for a list of 4 elements.
|  When step is given, it specifies the increment (or decrement).
|
|  Methods defined here:
|
|  __bool__(self, /)
|      self != 0
|
|  __contains__(self, key, /)
|      Return key in self.
|
|  __eq__(self, value, /)
|      Return self==value.
|
|  __ge__(self, value, /)
|      Return self>=value.
|
|  __getattribute__(self, name, /)
|      Return getattr(self, name).
|
|  __getitem__(self, key, /)
|      Return self[key].
|
|  __gt__(self, value, /)
|      Return self>value.
|
|  __hash__(self, /)
|      Return hash(self).
|
|  __iter__(self, /)
|      Implement iter(self).
|
|  __le__(self, value, /)
|      Return self<=value.
|
|  __len__(self, /)
|      Return len(self).
|
|  __lt__(self, value, /)
|      Return self<value.
|
|  __ne__(self, value, /)
|      Return self!=value.
|
|  __new__(*args, **kwargs) from builtins.type
|      Create and return a new object.  See help(type) for accurate signature.
|
|  __reduce__(...)
|      helper for pickle
|
|  __repr__(self, /)
|      Return repr(self).
|
|  __reversed__(...)
|      Return a reverse iterator.
|
|  count(...)
|      rangeobject.count(value) -> integer -- return number of occurrences of value
|
|  index(...)
|      rangeobject.index(value, [start, [stop]]) -> integer -- return index of value.
|      Raise ValueError if the value is not present.
|
|  ----------------------------------------------------------------------
|  Data descriptors defined here:
|
|  start
|
|  step
|
|  stop


In [73]:
range(10)

Out[73]:
range(0, 10)
In [74]:
range(1, 10, 2)

Out[74]:
range(1, 10, 2)
In [75]:
for i in range(10):
n2 = i**2
print(n2),

0
1
4
9
16
25
36
49
64
81


### while¶

The while statement is used for repeating sections of code in a loop until a condition is met (this different than the for statement which executes n times):

while expression:
statement

Let's generate the Fibonacci series using a while loop:

In [76]:
# Fibonacci series: the sum of two elements defines the next
a, b = 0, 1
while b < 1000:
print(b, end='   ')
a, b = b, a+b

1   1   2   3   5   8   13   21   34   55   89   144   233   377   610   987

## Function definition¶

A function in a programming language is a piece of code that performs a specific task. Functions are used to reduce duplication of code making easier to reuse it and to decompose complex problems into simpler parts. The use of functions contribute to the clarity of the code.

A function is created with the def keyword and the statements in the block of the function must be indented:

In [77]:
def function():
pass


As per construction, this function does nothing when called:

In [78]:
function()


The general syntax of a function definition is:

def function_name( parameters ):
"""Function docstring.

The help for the function

"""

function body

return variables

A more useful function:

In [79]:
def fibo(N):
"""Fibonacci series: the sum of two elements defines the next.

The series is calculated till the input parameter N and
returned as an ouput variable.

"""

a, b, c = 0, 1, []
while b < N:
c.append(b)
a, b = b, a + b

return c

In [80]:
fibo(100)

Out[80]:
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
In [81]:
if 3 > 2:
print('teste')

teste


Let's implemment the body mass index calculus and categorization as a function:

In [82]:
def bmi(weight, height):
"""Body mass index calculus and categorization.

Enter the weight in kg and the height in m.
See http://en.wikipedia.org/wiki/Body_mass_index

"""

bmi = weight / height**2

if bmi < 15:
c = 'very severely underweight'
elif 15 <= bmi < 16:
c = 'severely underweight'
elif 16 <= bmi < 18.5:
c = 'underweight'
elif 18.5 <= bmi < 25:
c = 'normal'
elif 25 <= bmi < 30:
c = 'overweight'
elif 30 <= bmi < 35:
c = 'moderately obese'
elif 35 <= bmi < 40:
c = 'severely obese'
else:
c = 'very severely obese'

s = 'For a weight of {0:.1f} kg and a height of {1:.2f} m,\
the body mass index (bmi) is {2:.1f} kg/m2,\
which is considered {3:s}.'\
.format(weight, height, bmi, c)

print(s)

In [83]:
bmi(73, 1.70)

For a weight of 73.0 kg and a height of 1.70 m,    the body mass index (bmi) is 25.3 kg/m2,    which is considered overweight.


## Numeric data manipulation with Numpy¶

Numpy is the fundamental package for scientific computing in Python and has a N-dimensional array package convenient to work with numerical data. With Numpy it's much easier and faster to work with numbers grouped as 1-D arrays (a vector), 2-D arrays (like a table or matrix), or higher dimensions. Let's create 1-D and 2-D arrays in Numpy:

In [84]:
import numpy as np

In [85]:
x1d = np.array([1, 2, 3, 4, 5, 6])
print(type(x1d))
x1d

<class 'numpy.ndarray'>

Out[85]:
array([1, 2, 3, 4, 5, 6])
In [86]:
x2d = np.array([[1, 2, 3], [4, 5, 6]])
x2d

Out[86]:
array([[1, 2, 3],
[4, 5, 6]])

len() and the Numpy functions size() and shape() give information aboout the number of elements and the structure of the Numpy array:

In [87]:
print('1-d array:')
print(x1d)
print('len(x1d) = ', len(x1d))
print('np.size(x1d) = ', np.size(x1d))
print('np.shape(x1d) = ', np.shape(x1d))
print('np.ndim(x1d) = ', np.ndim(x1d))
print('\n2-d array:')
print(x2d)
print('len(x2d) = ', len(x2d))
print('np.size(x2d) = ', np.size(x2d))
print('np.shape(x2d) = ', np.shape(x2d))
print('np.ndim(x2d) = ', np.ndim(x2d))

1-d array:
[1 2 3 4 5 6]
len(x1d) =  6
np.size(x1d) =  6
np.shape(x1d) =  (6,)
np.ndim(x1d) =  1

2-d array:
[[1 2 3]
[4 5 6]]
len(x2d) =  2
np.size(x2d) =  6
np.shape(x2d) =  (2, 3)
np.ndim(x2d) =  2


Create random data

In [88]:
x = np.random.randn(4,3)
x

Out[88]:
array([[-0.36123769,  0.18896133, -0.53809885],
[-0.7332364 ,  0.47109317,  1.06194556],
[ 0.07331805,  0.72426922, -1.74606307],
[-0.48601252, -0.72308218, -0.98513516]])

Joining (stacking together) arrays

In [89]:
x = np.random.randint(0, 5, size=(2, 3))
print(x)
y = np.random.randint(5, 10, size=(2, 3))
print(y)

[[1 2 3]
[3 2 2]]
[[5 5 5]
[7 5 9]]

In [90]:
np.vstack((x,y))

Out[90]:
array([[1, 2, 3],
[3, 2, 2],
[5, 5, 5],
[7, 5, 9]])
In [91]:
np.hstack((x,y))

Out[91]:
array([[1, 2, 3, 5, 5, 5],
[3, 2, 2, 7, 5, 9]])

Create equally spaced data

In [92]:
np.arange(start = 1, stop = 10, step = 2)

Out[92]:
array([1, 3, 5, 7, 9])
In [93]:
np.linspace(start = 0, stop = 1, num = 11)

Out[93]:
array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ])

### Interpolation¶

Consider the following data:

In [94]:
y = [5,  4, 10,  8,  1, 10,  2,  7,  1,  3]


Suppose we want to create data in between the given data points (interpolation); for instance, let's try to double the resolution of the data by generating twice as many data:

In [95]:
t = np.linspace(0, len(y), len(y))       # time vector for the original data
tn = np.linspace(0, len(y), 2 * len(y))  # new time vector for the new time-normalized data
yn = np.interp(tn, t, y)                 # new time-normalized data
yn

Out[95]:
array([ 5.        ,  4.52631579,  4.05263158,  6.52631579,  9.36842105,
9.26315789,  8.31578947,  5.78947368,  2.47368421,  3.36842105,
7.63157895,  8.31578947,  4.52631579,  2.78947368,  5.15789474,
6.36842105,  3.52631579,  1.10526316,  2.05263158,  3.        ])

The key is the Numpy interp function, from its help:

interp(x, xp, fp, left=None, right=None)
One-dimensional linear interpolation.
Returns the one-dimensional piecewise linear interpolant to a function with given values at discrete data-points.



A plot of the data will show what we have done:

In [96]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(10,5))
plt.plot(t, y, 'bo-', lw=2, label='original data')
plt.plot(tn, yn, '.-', color=[1, 0, 0, .5], lw=2, label='interpolated')
plt.legend(loc='best', framealpha=.5)
plt.show()


For more about Numpy, see http://www.numpy.org/.

## Read and save files¶

There are two kinds of computer files: text files and binary files:

Text file: computer file where the content is structured as a sequence of lines of electronic text. Text files can contain plain text (letters, numbers, and symbols) but they are not limited to such. The type of content in the text file is defined by the Unicode encoding (a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems).

Binary file: computer file where the content is encoded in binary form, a sequence of integers representing byte values.

Let's see how to save and read numeric data stored in a text file:

Using plain Python

In [97]:
f = open("newfile.txt", "w")           # open file for writing
f.write("This is a test\n")            # save to file
f.write("And here is another line\n")  # save to file
f.close()
f = open('newfile.txt', 'r')           # open file for reading
f = f.read()                           # read from file
print(f)

This is a test
And here is another line


In [98]:
help(open)

Help on built-in function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
Open file and return a stream.  Raise IOError upon failure.

file is either a text or byte string giving the name (and the path
if the file isn't in the current working directory) of the file to
be opened or an integer file descriptor of the file to be
wrapped. (If a file descriptor is given, it is closed when the
returned I/O object is closed, unless closefd is set to False.)

mode is an optional string that specifies the mode in which the file
is opened. It defaults to 'r' which means open for reading in text
mode.  Other common values are 'w' for writing (truncating the file if
it already exists), 'x' for creating and writing to a new file, and
'a' for appending (which on some Unix systems, means that all writes
append to the end of the file regardless of the current seek position).
In text mode, if encoding is not specified the encoding used is platform
dependent: locale.getpreferredencoding(False) is called to get the
current locale encoding. (For reading and writing raw bytes use binary
mode and leave encoding unspecified.) The available modes are:

========= ===============================================================
Character Meaning
--------- ---------------------------------------------------------------
'r'       open for reading (default)
'w'       open for writing, truncating the file first
'x'       create a new file and open it for writing
'a'       open for writing, appending to the end of the file if it exists
'b'       binary mode
't'       text mode (default)
'+'       open a disk file for updating (reading and writing)
'U'       universal newline mode (deprecated)
========= ===============================================================

The default mode is 'rt' (open for reading text). For binary random
access, the mode 'w+b' opens and truncates the file to 0 bytes, while
'r+b' opens the file without truncation. The 'x' mode implies 'w' and
raises an FileExistsError if the file already exists.

Python distinguishes between files opened in binary and text modes,
even when the underlying operating system doesn't. Files opened in
binary mode (appending 'b' to the mode argument) return contents as
bytes objects without any decoding. In text mode (the default, or when
't' is appended to the mode argument), the contents of the file are
returned as strings, the bytes having been first decoded using a
platform-dependent encoding or using the specified encoding if given.

'U' mode is deprecated and will raise an exception in future versions
of Python.  It has no effect in Python 3.  Use newline to control
universal newlines mode.

buffering is an optional integer used to set the buffering policy.
Pass 0 to switch buffering off (only allowed in binary mode), 1 to select
line buffering (only usable in text mode), and an integer > 1 to indicate
the size of a fixed-size chunk buffer.  When no buffering argument is
given, the default buffering policy works as follows:

* Binary files are buffered in fixed-size chunks; the size of the buffer
is chosen using a heuristic trying to determine the underlying device's
"block size" and falling back on io.DEFAULT_BUFFER_SIZE.
On many systems, the buffer will typically be 4096 or 8192 bytes long.

* "Interactive" text files (files for which isatty() returns True)
use line buffering.  Other text files use the policy described above
for binary files.

encoding is the name of the encoding used to decode or encode the
file. This should only be used in text mode. The default encoding is
platform dependent, but any encoding supported by Python can be
passed.  See the codecs module for the list of supported encodings.

errors is an optional string that specifies how encoding errors are to
be handled---this argument should not be used in binary mode. Pass
'strict' to raise a ValueError exception if there is an encoding error
(the default of None has the same effect), or pass 'ignore' to ignore
errors. (Note that ignoring encoding errors can lead to data loss.)
See the documentation for codecs.register or run 'help(codecs.Codec)'
for a list of the permitted encoding error strings.

newline controls how universal newlines works (it only applies to text
mode). It can be None, '', '\n', '\r', and '\r\n'.  It works as
follows:

* On input, if newline is None, universal newlines mode is
enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
these are translated into '\n' before being returned to the
caller. If it is '', universal newline mode is enabled, but line
endings are returned to the caller untranslated. If it has any of
the other legal values, input lines are only terminated by the given
string, and the line ending is returned to the caller untranslated.

* On output, if newline is None, any '\n' characters written are
translated to the system default line separator, os.linesep. If
newline is '' or '\n', no translation takes place. If newline is any
of the other legal values, any '\n' characters written are translated
to the given string.

If closefd is False, the underlying file descriptor will be kept open
when the file is closed. This does not work when a file name is given
and must be True in that case.

A custom opener can be used by passing a callable as *opener*. The
underlying file descriptor for the file object is then obtained by
calling *opener* with (*file*, *flags*). *opener* must return an open
file descriptor (passing os.open as *opener* results in functionality
similar to passing None).

open() returns a file object whose type depends on the mode, and
through which the standard file operations such as reading and writing
are performed. When open() is used to open a file in a text mode ('w',
'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open
a file in a binary mode, the returned class varies: in read binary
mode, it returns a BufferedReader; in write binary and append binary
modes, it returns a BufferedWriter, and in read/write mode, it returns
a BufferedRandom.

It is also possible to use a string or bytearray as a file for both
reading and writing. For strings StringIO can be used like a file
opened in a text mode, and for bytes a BytesIO can be used like a file
opened in a binary mode.



Using Numpy

In [99]:
import numpy as np
data = np.random.randn(3,3)
np.savetxt('myfile.txt', data, fmt="%12.6G")    # save to file
data = np.genfromtxt('myfile.txt', unpack=True) # read from file
data

Out[99]:
array([[-0.141613 ,  0.0158789,  1.09553  ],
[-0.538208 ,  0.370273 , -0.0108841],
[-0.327891 ,  0.117738 ,  1.63674  ]])

## Ploting with matplotlib¶

Matplotlib is the most-widely used packge for plotting data in Python. Let's see some examples of it.

In [100]:
import matplotlib.pyplot as plt


Use the IPython magic %matplotlib inline to plot a figure inline in the notebook with the rest of the text:

In [104]:
%matplotlib inline

In [105]:
import numpy as np

In [106]:
t = np.linspace(0, 0.99, 100)
x = np.sin(2 * np.pi * 2 * t)
n = np.random.randn(100) / 5
plt.Figure(figsize=(12,8))
plt.plot(t, x, label='sine', linewidth=2)
plt.plot(t, x + n, label='noisy sine', linewidth=2)
plt.annotate(s='$sin(4 \pi t)$', xy=(.2, 1), fontsize=20, color=[0, 0, 1])
plt.legend(loc='best', framealpha=.5)
plt.xlabel('Time [s]')
plt.ylabel('Amplitude')
plt.title('Data plotting using matplotlib')
plt.show()


Use the IPython magic %matplotlib qt to plot a figure in a separate window (from where you will be able to change some of the figure proprerties):

In [107]:
%matplotlib qt

Warning: Cannot change to a different GUI toolkit: qt. Using notebook instead.

In [108]:
mu, sigma = 10, 2
x = mu + sigma * np.random.randn(1000)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(x, 'ro')
ax1.set_title('Data')
ax1.grid()

n, bins, patches = ax2.hist(x, 25, normed=True, facecolor='r') # histogram
ax2.set_xlabel('Bins')
ax2.set_ylabel('Probability')
ax2.set_title('Histogram')
fig.suptitle('Another example using matplotlib', fontsize=18, y=1)
ax2.grid()

plt.tight_layout()
plt.show()


And a window with the following figure should appear:

In [1]:
from IPython.display import Image
Image(url="./../images/plot.png")

Out[1]:

You can switch back and forth between inline and separate figure using the %matplotlib magic commands used above. There are plenty more examples with the source code in the matplotlib gallery.

In [110]:
# get back the inline plot
%matplotlib inline


## Signal processing with Scipy¶

The Scipy package has a lot of functions for signal processing, among them: Integration (scipy.integrate), Optimization (scipy.optimize), Interpolation (scipy.interpolate), Fourier Transforms (scipy.fftpack), Signal Processing (scipy.signal), Linear Algebra (scipy.linalg), and Statistics (scipy.stats). As an example, let's see how to use a low-pass Butterworth filter to attenuate high-frequency noise and how the differentiation process of a signal affects the signal-to-noise content. We will also calculate the Fourier transform of these data to look at their frequencies content.

In [111]:
from scipy.signal import butter, filtfilt
import scipy.fftpack
freq = 100.
t = np.arange(0,1,.01);
w = 2*np.pi*1 # 1 Hz
y = np.sin(w*t)+0.1*np.sin(10*w*t)
# Butterworth filter
b, a = butter(4, (5/(freq/2)), btype = 'low')
y2 = filtfilt(b, a, y)
# 2nd derivative of the data
ydd = np.diff(y,2)*freq*freq   # raw data
y2dd = np.diff(y2,2)*freq*freq # filtered data
# frequency content
yfft = np.abs(scipy.fftpack.fft(y))/(y.size/2);   # raw data
y2fft = np.abs(scipy.fftpack.fft(y2))/(y.size/2); # filtered data
freqs = scipy.fftpack.fftfreq(y.size, 1./freq)
yddfft = np.abs(scipy.fftpack.fft(ydd))/(ydd.size/2);
y2ddfft = np.abs(scipy.fftpack.fft(y2dd))/(ydd.size/2);
freqs2 = scipy.fftpack.fftfreq(ydd.size, 1./freq)


And the plots:

In [114]:
fig, ((ax1,ax2),(ax3,ax4)) = plt.subplots(2, 2, figsize=(12, 6))

ax1.set_title('Temporal domain', fontsize=14)
ax1.plot(t, y, 'r', linewidth=2, label = 'raw data')
ax1.plot(t, y2, 'b', linewidth=2, label = 'filtered @ 5 Hz')
ax1.set_ylabel('f')
ax1.legend(frameon=False, fontsize=12)

ax2.set_title('Frequency domain', fontsize=14)
ax2.plot(freqs[:int(yfft.size/4)], yfft[:int(yfft.size/4)],'r',  lw=2,label='raw data')
ax2.plot(freqs[:int(yfft.size/4)],y2fft[:int(yfft.size/4)],'b--',lw=2,label='filtered @ 5 Hz')
ax2.set_ylabel('FFT(f)')
ax2.legend(frameon=False, fontsize=12)

ax3.plot(t[:-2], ydd, 'r', linewidth=2, label = 'raw')
ax3.plot(t[:-2], y2dd, 'b', linewidth=2, label = 'filtered @ 5 Hz')
ax3.set_xlabel('Time [s]'); ax3.set_ylabel("f ''")

ax4.plot(freqs[:int(yddfft.size/4)], yddfft[:int(yddfft.size/4)], 'r', lw=2, label = 'raw')
ax4.plot(freqs[:int(yddfft.size/4)],y2ddfft[:int(yddfft.size/4)],'b--',lw=2, label='filtered @ 5 Hz')
ax4.set_xlabel('Frequency [Hz]'); ax4.set_ylabel("FFT(f '')")
plt.show()


## Symbolic mathematics with Sympy¶

Sympy is a package to perform symbolic mathematics in Python. Let's see some of its features:

In [115]:
from IPython.display import display
import sympy as sym
from sympy.interactive import printing
printing.init_printing()


Define some symbols and the create a second-order polynomial function (a.k.a., parabola):

In [116]:
x, y = sym.symbols('x y')
y = x**2 - 2*x - 3
y

Out[116]:
$$x^{2} - 2 x - 3$$

Plot the parabola at some given range:

In [117]:
from sympy.plotting import plot
%matplotlib inline
plot(y, (x, -3, 5));


And the roots of the parabola are given by:

In [118]:
sym.solve(y, x)

Out[118]:
$$\left [ -1, \quad 3\right ]$$

We can also do symbolic differentiation and integration:

In [119]:
dy = sym.diff(y, x)
dy

Out[119]:
$$2 x - 2$$
In [120]:
sym.integrate(dy, x)

Out[120]:
$$x^{2} - 2 x$$

For example, let's use Sympy to represent three-dimensional rotations. Consider the problem of a coordinate system xyz rotated in relation to other coordinate system XYZ. The single rotations around each axis are illustrated by:

In [2]:
from IPython.display import Image
Image(url="./../images/rotations.png")

Out[2]:

The single 3D rotation matrices around Z, Y, and X axes can be expressed in Sympy:

In [122]:
from IPython.core.display import Math
from sympy import symbols, cos, sin, Matrix, latex
a, b, g = symbols('alpha beta gamma')

RX = Matrix([[1, 0, 0], [0, cos(a), -sin(a)], [0, sin(a), cos(a)]])
display(Math(latex('\\mathbf{R_{X}}=') + latex(RX, mat_str = 'matrix')))

RY = Matrix([[cos(b), 0, sin(b)], [0, 1, 0], [-sin(b), 0, cos(b)]])
display(Math(latex('\\mathbf{R_{Y}}=') + latex(RY, mat_str = 'matrix')))

RZ = Matrix([[cos(g), -sin(g), 0], [sin(g), cos(g), 0], [0, 0, 1]])
display(Math(latex('\\mathbf{R_{Z}}=') + latex(RZ, mat_str = 'matrix')))

$$\mathbf{R_{X}}=\left[\begin{matrix}1 & 0 & 0\\0 & \cos{\left (\alpha \right )} & - \sin{\left (\alpha \right )}\\0 & \sin{\left (\alpha \right )} & \cos{\left (\alpha \right )}\end{matrix}\right]$$
$$\mathbf{R_{Y}}=\left[\begin{matrix}\cos{\left (\beta \right )} & 0 & \sin{\left (\beta \right )}\\0 & 1 & 0\\- \sin{\left (\beta \right )} & 0 & \cos{\left (\beta \right )}\end{matrix}\right]$$
$$\mathbf{R_{Z}}=\left[\begin{matrix}\cos{\left (\gamma \right )} & - \sin{\left (\gamma \right )} & 0\\\sin{\left (\gamma \right )} & \cos{\left (\gamma \right )} & 0\\0 & 0 & 1\end{matrix}\right]$$

And using Sympy, a sequence of elementary rotations around X, Y, Z axes is given by:

In [123]:
RXYZ = RZ*RY*RX
display(Math(latex('\\mathbf{R_{XYZ}}=') + latex(RXYZ, mat_str = 'matrix')))

$$\mathbf{R_{XYZ}}=\left[\begin{matrix}\cos{\left (\beta \right )} \cos{\left (\gamma \right )} & \sin{\left (\alpha \right )} \sin{\left (\beta \right )} \cos{\left (\gamma \right )} - \sin{\left (\gamma \right )} \cos{\left (\alpha \right )} & \sin{\left (\alpha \right )} \sin{\left (\gamma \right )} + \sin{\left (\beta \right )} \cos{\left (\alpha \right )} \cos{\left (\gamma \right )}\\\sin{\left (\gamma \right )} \cos{\left (\beta \right )} & \sin{\left (\alpha \right )} \sin{\left (\beta \right )} \sin{\left (\gamma \right )} + \cos{\left (\alpha \right )} \cos{\left (\gamma \right )} & - \sin{\left (\alpha \right )} \cos{\left (\gamma \right )} + \sin{\left (\beta \right )} \sin{\left (\gamma \right )} \cos{\left (\alpha \right )}\\- \sin{\left (\beta \right )} & \sin{\left (\alpha \right )} \cos{\left (\beta \right )} & \cos{\left (\alpha \right )} \cos{\left (\beta \right )}\end{matrix}\right]$$

Suppose there is a rotation only around X ($\alpha$) by $\pi/2$; we can get the numerical value of the rotation matrix by substituing the angle values:

In [124]:
r = RXYZ.subs({a: np.pi/2, b: 0, g: 0})
r

Out[124]:
$$\left[\begin{matrix}1 & 0 & 0\\0 & 6.12323399573677 \cdot 10^{-17} & -1.0\\0 & 1.0 & 6.12323399573677 \cdot 10^{-17}\end{matrix}\right]$$

And we can prettify this result:

In [125]:
display(Math(latex(r'\mathbf{R_{(\alpha=\pi/2)}}=') +
latex(r.n(chop=True, prec=3), mat_str = 'matrix')))

$$\mathbf{R_{(\alpha=\pi/2)}}=\left[\begin{matrix}1.0 & 0 & 0\\0 & 0 & -1.0\\0 & 1.0 & 0\end{matrix}\right]$$

For more about Sympy, see http://docs.sympy.org/latest/tutorial/.

## Data analysis with pandas¶

"pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python."

To work with labellled data, pandas has a type called DataFrame (basically, a matrix where columns and rows have may names and may be of different types) and it is also the main type of the software R. Fo ezample:

In [126]:
import pandas as pd

In [127]:
x = 5*['A'] + 5*['B']
x

Out[127]:
['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
In [128]:
df = pd.DataFrame(np.random.rand(10,2), columns=['Level 1', 'Level 2'] )
df['Group'] = pd.Series(['A']*5 + ['B']*5)
plot = df.boxplot(by='Group')

In [130]:
from pandas.plotting import scatter_matrix
df = pd.DataFrame(np.random.randn(100, 3), columns=['A', 'B', 'C'])
plot = scatter_matrix(df, alpha=0.5, figsize=(8, 6), diagonal='kde')


pandas is aware the data is structured and give you basic statistics considerint that and nicely formatted:

In [131]:
df.describe()

Out[131]:
A B C
count 100.000000 100.000000 100.000000
mean 0.101437 -0.004921 0.078482
std 1.017746 1.084727 1.023564
min -2.323379 -2.708453 -2.866506
25% -0.584179 -0.752746 -0.583681
50% 0.074468 -0.050982 0.115910
75% 0.845898 0.705415 0.674648
max 2.713227 2.302115 2.585530

For more on pandas, see this tutorial: http://pandas.pydata.org/pandas-docs/stable/10min.html.