1. Object-Oriented Interface
2. Pyplot State Machine
3. Pylab
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 10, 0.2)
y = np.sin(x)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y)
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 10, 0.2)
y = np.sin(x)
plt.plot(x, y)
%pylab
x = arange(0, 10, 0.2)
y = sin(x)
plot(x, y)
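The object-oriented and pyplot styles above drive the same underlying objects. As a minimal sketch (the `Agg` backend is chosen here only so the snippet runs without a display), pyplot's "current" figure and axes are the very objects you would otherwise hold on to explicitly:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.2)
y = np.sin(x)

# Object-oriented style: keep explicit references to the figure and axes.
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y)

# State-machine style: pyplot keeps track of the "current" figure and axes.
same_figure = plt.gcf() is fig
same_axes = plt.gca() is ax

# plt.plot draws into the current axes, so this adds a second line to `ax`.
plt.plot(x, np.cos(x))
n_lines = len(ax.lines)

print(same_figure, same_axes, n_lines)
```

Pylab goes one step further and imports these pyplot and numpy functions into the interactive namespace, which is why the third example needs no `plt.` or `np.` prefix.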
Consult the built-in documentation, for example:
>>> help(subplot)
Help on function subplot in module matplotlib.pyplot:
subplot(*args, **kwargs)
Return a subplot axes positioned by the given grid definition.
...
%pylab inline
Populating the interactive namespace from numpy and matplotlib
Use pandas to load a dataset that contains population data for four countries
import pandas as pd
populations = pd.read_csv(
'https://git.lumc.nl/courses/programming-course/raw/visualization-2018/visualization/data/populations.csv'
)
Take a quick look at the data
populations.head()
| | Year | Belgium | Denmark | Netherlands | Sweden |
|---|---|---|---|---|---|
| 0 | 1950 | 8.63930 | 4.28135 | 10.11365 | 7.01660 |
| 1 | 1951 | 8.67820 | 4.30370 | 10.26440 | 7.07040 |
| 2 | 1952 | 8.73040 | 4.33380 | 10.38210 | 7.12445 |
| 3 | 1953 | 8.77775 | 4.36930 | 10.49300 | 7.17145 |
| 4 | 1954 | 8.81940 | 4.40570 | 10.61535 | 7.21360 |
Let's plot the population of the Netherlands on the y-axis and the year on the x-axis
plot(populations['Year'], populations['Netherlands']);
plot(populations['Year'], populations['Netherlands'])
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)');
How about a thick orange line? (linewidth is given in points, so this one is 5 points wide)
plot(populations['Year'], populations['Netherlands'],
linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)');
Label at five-year intervals
Display the label vertically
plot(populations['Year'], populations['Netherlands'],
linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90);
plot(populations['Year'], populations['Netherlands'],
linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90)
xlim(1970, 1990);
plot(populations['Year'], populations['Netherlands'],
linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90)
xlim(1970, 1990)
ylim(13,15);
Integer tick labels
plot(populations['Year'], populations['Netherlands'],
linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90)
xlim(1970, 1990)
ylim(13,15)
yticks(range(13,16));
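The same title, tick and limit settings are also available as methods on an axes object. A minimal sketch, assuming only the five Netherlands rows printed by `populations.head()` above as data (so the limits are narrower than in the cells above) and the `Agg` backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# First five rows of the table shown earlier (population in millions).
years = [1950, 1951, 1952, 1953, 1954]
netherlands = [10.11365, 10.26440, 10.38210, 10.49300, 10.61535]

fig, ax = plt.subplots()
ax.plot(years, netherlands, linewidth=5, color="orange")
ax.set_title("Historical Population of The Netherlands")
ax.set_xlabel("Year")
ax.set_ylabel("Population (Millions)")

# One tick per year, with the tick labels rotated to vertical.
ax.set_xticks(range(1950, 1955))
ax.tick_params(axis="x", labelrotation=90)

# Restrict the visible range, then ask for integer ticks on the y-axis.
ax.set_xlim(1950, 1954)
ax.set_ylim(10, 11)
ax.set_yticks(range(10, 12))
```

Each pylab call used above (`title`, `xlabel`, `xticks`, `xlim`, ...) has a `set_`-prefixed counterpart on the axes, which is convenient once a figure holds more than one subplot.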
Calling plot multiple times within the same cell will add multiple series to the chart
Let's compare the Dutch with the Danes
plot(populations['Year'], populations['Netherlands'], color='orange')
plot(populations['Year'], populations['Denmark'], color='red')
title('Historical Populations of The Netherlands and Denmark')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90);
plot(populations['Year'], populations['Netherlands'],
color='orange', label='The Netherlands')
plot(populations['Year'], populations['Denmark'],
color='red', label='Denmark')
legend(loc='upper left')
title('Historical Populations of The Netherlands and Denmark')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90);
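The legend is built from the `label` given to each series. As a small sketch, again assuming only the five rows printed by `populations.head()` as data, a series plotted without a label is simply left out of the legend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# First five rows of the table shown earlier (population in millions).
years = [1950, 1951, 1952, 1953, 1954]
netherlands = [10.11365, 10.26440, 10.38210, 10.49300, 10.61535]
denmark = [4.28135, 4.30370, 4.33380, 4.36930, 4.40570]
belgium = [8.63930, 8.67820, 8.73040, 8.77775, 8.81940]

fig, ax = plt.subplots()
ax.plot(years, netherlands, color="orange", label="The Netherlands")
ax.plot(years, denmark, color="red", label="Denmark")
ax.plot(years, belgium, color="grey")  # no label, so no legend entry

legend = ax.legend(loc="upper left")
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```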
Let's load a different dataset and take a look at some different plot types
flowers = pd.read_csv('https://git.lumc.nl/courses/programming-course/raw/visualization-2018/visualization/data/iris.csv')
flowers.head()
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
A simple boxplot of the sepal-length distribution
boxplot(flowers['sepal_length'], labels=['Sepal_length']);
Distributions of multiple features
# make a list containing the numeric feature column names
features = list(flowers.columns[:-1])
features
['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
# plot the data
boxplot([flowers[f] for f in features], labels=features);
Let's change the shape of the boxplot
# make the figure 10 inches wide and 5 inches high
figsize(10, 5)
# plot the data
boxplot([flowers[f] for f in features], labels=features);
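`figsize` is a convenience that pylab mode places in the namespace, so it is not available in a plain Python script. A rough equivalent, sketched here with the `Agg` backend as an assumption so it runs without a display, is to pass the size when creating a figure or to change the default in `rcParams`:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# Size of a single figure, as (width, height) in inches.
fig = plt.figure(figsize=(10, 5))
width, height = fig.get_size_inches()

# Default size for every figure created from now on.
plt.rcParams["figure.figsize"] = (7, 4)
default_fig = plt.figure()
default_size = tuple(default_fig.get_size_inches())

print(width, height, default_size)
```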
figsize(7,4)
hist(flowers['petal_length'])
title('Petal Length Distribution')
xlabel('petal length')
ylabel('count');
Change the number of bins
hist(flowers['petal_length'], bins=20)
title('Petal Length Distribution')
xlabel('petal length')
ylabel('count');
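To see what `bins` changes, it can help to look at the values `hist` returns. This is a sketch on made-up data (the random sample below is a stand-in, not the iris measurements), using the `Agg` backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample standing in for a numeric column.
rng = np.random.default_rng(0)
values = rng.normal(loc=4.0, scale=1.5, size=150)

# hist returns the per-bin counts, the bin edges and the drawn patches.
counts10, edges10, _ = plt.hist(values)           # default number of bins
plt.figure()
counts20, edges20, _ = plt.hist(values, bins=20)  # twice as many, narrower bins

# Every value lands in exactly one bin, whatever the bin count.
print(len(counts10), len(counts20), int(counts10.sum()), int(counts20.sum()))
```

There is always one more edge than there are bins, and the counts add up to the number of observations in both cases; only the resolution of the histogram changes.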
hist(flowers['petal_length'], bins=20, facecolor='teal', edgecolor='black', alpha=0.7)
title('Petal Length Distribution')
xlabel('petal length')
ylabel('count');
for i in range(1, 5):
    subplot(2, 2, i)
    xticks([]), yticks([])
    text(0.5, 0.5, 'subplot(2, 2, %d)' % i, ha='center', size=18, alpha=0.75);
subplot(2, 2, 1) indicates the first cell of a 2 row x 2 column grid
subplot(2, 2, 4) indicates the fourth cell of the same 2 row x 2 column grid
Cells are numbered from 1, left to right and then top to bottom
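The cell number can be translated into a zero-based (row, column) position with a little arithmetic. The sketch below checks that arithmetic against what matplotlib reports for each subplot; it assumes a reasonably recent matplotlib, where the subplot spec exposes `rowspan` and `colspan`, and uses the `Agg` backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

nrows, ncols = 2, 2
plt.figure()

positions = {}
for index in range(1, nrows * ncols + 1):
    ax = plt.subplot(nrows, ncols, index)
    spec = ax.get_subplotspec()
    # Zero-based row and column occupied by this subplot.
    positions[index] = (spec.rowspan.start, spec.colspan.start)

# Cells are numbered row by row, so divmod gives the same answer.
expected = {i: divmod(i - 1, ncols) for i in range(1, nrows * ncols + 1)}
print(positions == expected)
```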
More complicated layouts
subplot(1, 3, 1) # 1 row, 3 columns, cell 1
xticks([]), yticks([])
text(0.5, 0.5, '(1, 3, 1)', ha='center', size=18, alpha=0.75)
subplot(2, 3, 3) # 2 rows, 3 columns, cell 3
xticks([]), yticks([])
text(0.5, 0.5, '(2, 3, 3)', ha='center', size=18, alpha=0.75)
subplot(3, 2, 6) # 3 rows, 2 columns, cell 6
xticks([]), yticks([])
text(0.5, 0.5, '(3, 2, 6)', ha='center', size=18, alpha=0.75)
subplot(3, 3, 5) # 3 rows, 3 columns, cell 5
xticks([]), yticks([])
text(0.5, 0.5, '(3, 3, 5)', ha='center', size=18, alpha=0.75);
Compare how the features are distributed by species
species = sorted(set(flowers.species))
print(species)
['setosa', 'versicolor', 'virginica']
# make a dataset for each species
setosa = flowers[flowers.species == 'setosa']
versicolor = flowers[flowers.species == 'versicolor']
virginica = flowers[flowers.species == 'virginica']
figsize(10, 8)
for cell, feature in enumerate(features):
    subplot(2, 2, cell + 1)
    # the labels must be listed in the same order as the datasets
    boxplot(
        [setosa[feature], versicolor[feature], virginica[feature]],
        labels=['setosa', 'versicolor', 'virginica']
    )
    ylabel(feature)
figsize(7,4)
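One way to keep the labels and the data in step is to build both from the same grouping. This is only a sketch: the small table below is a hypothetical stand-in for the iris data with illustrative values, and the tick labels are set with `set_xticklabels` rather than through `boxplot` itself.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the flowers table (illustrative values only).
flowers = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
    "petal_length": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
})

# groupby yields (name, rows) pairs, so each name stays attached to its data.
groups = {name: rows["petal_length"] for name, rows in flowers.groupby("species")}
names = list(groups)

fig, ax = plt.subplots()
ax.boxplot([groups[name].to_numpy() for name in names])
ax.set_xticklabels(names)  # same order as the data passed to boxplot
ax.set_ylabel("petal_length")

tick_labels = [label.get_text() for label in ax.get_xticklabels()]
print(tick_labels)
```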
Using xkcd mode
with xkcd():
    hist(flowers['petal_length'], bins=20, facecolor='teal', edgecolor='black')
    title('Petal Length Distribution')
    xlabel('petal length')
    ylabel('count');
Images can be saved to a file using savefig after the plotting commands:
savefig('myplot.pdf')
The format of the saved image will be inferred from the given file extension.
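As a quick illustration of the extension-based format choice, the sketch below saves one figure twice and inspects the first bytes of each file. The temporary directory, the toy data and the `dpi` value are assumptions made for the example:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Write into a throwaway directory so the example leaves nothing behind.
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "myplot.png")
pdf_path = os.path.join(out_dir, "myplot.pdf")

fig.savefig(png_path, dpi=150)  # raster output; dpi sets the resolution
fig.savefig(pdf_path)           # vector output

# Each file starts with the signature of the format named by its extension.
with open(png_path, "rb") as handle:
    png_magic = handle.read(8)
with open(pdf_path, "rb") as handle:
    pdf_magic = handle.read(5)

print(png_magic, pdf_magic)
```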
This lesson was based on previous work by Jeroen Laros and Martijn Vermaat