pyugm models

This notebook shows how to specify a simple discrete undirected probabilistic graphical model and perform common operations like marginalisation and calibration.
The aim of the package is to provide ways to quickly specify and test out undirected probabilistic graphical models.
At the moment it is too slow to tackle even medium-sized problems (like those in vision),
but I plan to move the main inference routines to Cython
soon, which should make the package more generally usable.
I hope to incorporate some of the nice features of my two favourite machine learning packages, sklearn and PyMC.
sklearn provides a uniform interface to different models and many of the common preprocessing steps. pyugm models should therefore have a similar interface and the necessary helpers to easily apply a model to actual data.
A drawback of sklearn, IMHO, is that almost none of the models are Bayesian (even for the Bayesian ridge regression model it is difficult to get the posterior or the predictive distribution).
PyMC, on the other hand, is fully Bayesian and a wonderful tool for many problems. I find, however, that it is sometimes difficult to specify models with many types of observed variables - it seems to me to be more aimed at models with output variables of a single type.
import numpy as np
from pyugm.factor import DiscreteFactor
# Specify the potential table
factor_data = np.array([[1, 2], [2, 1]])
# The variable names ("1" and "2") and cardinalities (2 and 2).
variables_names_and_cardinalities = [(1, 2), (2, 2)]
# Construct the factor
factor = DiscreteFactor(variables_names_and_cardinalities, data=factor_data)
print factor
F{1, 2}
factor.data # The potential table
array([[ 1., 2.], [ 2., 1.]])
# Marginalise out all the variables that are not named "1" (i.e. marginalise out variable "2")
marg = factor.marginalize([1])
print marg
print marg.data
F{1} [ 3. 3.]
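Under the hood, marginalising a discrete factor is just summing the potential table over the axes of the variables being removed. A minimal numpy sketch (independent of pyugm's internals) that reproduces the result above:

```python
import numpy as np

# Potential table for variables "1" (axis 0) and "2" (axis 1).
factor_data = np.array([[1.0, 2.0], [2.0, 1.0]])

# Marginalise out variable "2" by summing over its axis.
marg_1 = factor_data.sum(axis=1)
print(marg_1)  # [ 3.  3.]
```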
Beliefs are mutable Factors. They contain the current belief over the variables in the factor.
from pyugm.factor import DiscreteBelief
# Create a belief that is based on a factor
belief = DiscreteBelief(factor)
# Reduce the original factor by observing variable "1" taking on the value 0. [TODO: implement efficient factor reduction]
# Evidence is set by a dictionary where the key is a variable name and the value its observed value.
belief.set_evidence({1: 0})
print belief
print belief.data
F{1, 2} [[ 1. 2.] [ 0. 0.]]
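Conceptually, setting evidence on a discrete belief zeroes out every entry of the table that disagrees with the observed value. A numpy sketch of that reduction (an illustration of the idea, not pyugm's actual implementation):

```python
import numpy as np

belief_data = np.array([[1.0, 2.0], [2.0, 1.0]])

# Observe variable "1" (axis 0) taking on the value 0: keep the slice
# where axis 0 equals 0 and zero out everything else.
observed_axis, observed_value = 0, 0
mask = np.zeros_like(belief_data)
index = [slice(None)] * belief_data.ndim
index[observed_axis] = observed_value
mask[tuple(index)] = 1.0

reduced = belief_data * mask
print(reduced)  # [[ 1.  2.]
                #  [ 0.  0.]]
```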
Models are collections of factors. The model automatically builds a cluster graph by greedily adding the factor that has the largest separator set with a factor already in the graph. With this scheme you will often end up with a tree.
from pyugm.model import Model
factor1 = DiscreteFactor([(1, 2), (2, 2)], data=np.array([[1, 2], [2, 1]]))
factor2 = DiscreteFactor([(2, 2), ('variable3', 3)], # Variable names can also be strings
data=np.array([[0, 0.2, 0.3], [0.1, 0.5, 0.3]])) # Cardinalities of 2 and 3 means the factor table must be 2x3
# [TODO: cardinalities can be inferred from data shape when provided]
factor3 = DiscreteFactor([('variable3', 3), (4, 2)], data=np.array([[0, 1], [1, 2], [0.5, 0]]))
model = Model([factor1, factor2, factor3])
model.edges # returns a set of tuples
{(F{1, 2}, F{2, variable3}), (F{2, variable3}, F{variable3, 4})}
The graph:
factor1 -- factor2 -- factor3
has been built.
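The greedy construction can be sketched in plain Python: start with one factor, then repeatedly attach the unplaced factor that shares the most variables with a factor already in the graph. The function below is illustrative only (not pyugm's actual internals):

```python
def greedy_cluster_graph(factor_scopes):
    """factor_scopes: one set of variable names per factor.
    Returns edges as (placed_factor_index, new_factor_index) pairs."""
    placed = [0]  # start the graph with the first factor
    remaining = list(range(1, len(factor_scopes)))
    edges = []
    while remaining:
        # Pick the (placed, unplaced) pair with the largest separator set.
        best = max(((p, r) for p in placed for r in remaining),
                   key=lambda pr: len(factor_scopes[pr[0]] & factor_scopes[pr[1]]))
        edges.append(best)
        placed.append(best[1])
        remaining.remove(best[1])
    return edges

# The three factor scopes from the model above.
scopes = [{1, 2}, {2, 'variable3'}, {'variable3', 4}]
print(greedy_cluster_graph(scopes))  # [(0, 1), (1, 2)]
```

This reproduces the chain factor1 -- factor2 -- factor3: factor2 shares variable 2 with factor1, and factor3 then shares variable3 with factor2.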
Models contain immutable Factors, while Inference objects contain Beliefs. Inference objects provide the calibrate method to calibrate the Beliefs.
from pyugm.model import Model
from pyugm.infer_message import LoopyBeliefUpdateInference
Run loopy belief propagation on a new model. (Strictly speaking, it is not the message-passing version of belief propagation but the belief update algorithm.)
factor1 = DiscreteFactor([(1, 2), (2, 2)], data=np.array([[1, 2], [2, 1]]))
factor2 = DiscreteFactor([(2, 2), ('variable3', 3)], data=np.array([[0, 0.2, 0.3], [0.1, 0.5, 0.3]]))
factor3 = DiscreteFactor([('variable3', 3), (4, 2)], data=np.array([[0, 1], [1, 2], [0.5, 0.1]]))
model = Model([factor1, factor2, factor3])
inferrer = LoopyBeliefUpdateInference(model)
inferrer.calibrate()
<pyugm.infer_message.LoopyBeliefUpdateInference at 0x10771b2d0>
# Calibrated marginals
print inferrer.get_marginals(1)[0], inferrer.get_marginals(1)[0].data
print inferrer.get_marginals(2)[0], inferrer.get_marginals(2)[0].data
F{1} [ 0.56510417 0.43489583] F{2} [ 0.3046875 0.6953125]
# Natural logarithm of the normalizing factor
print inferrer.partition_approximation()
2.03861954626
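On a model this small the marginals and log partition function can be checked by brute force: multiply the three tables into the full joint and sum out the unwanted variables. A numpy sketch of that check:

```python
import numpy as np

f1 = np.array([[1, 2], [2, 1]])                  # variables 1, 2
f2 = np.array([[0, 0.2, 0.3], [0.1, 0.5, 0.3]])  # variables 2, variable3
f3 = np.array([[0, 1], [1, 2], [0.5, 0.1]])      # variables variable3, 4

# Unnormalised joint over (1, 2, variable3, 4) as a tensor product.
joint = np.einsum('ab,bc,cd->abcd', f1, f2, f3)

Z = joint.sum()
print(np.log(Z))                      # ~2.0386, matching partition_approximation()
print(joint.sum(axis=(1, 2, 3)) / Z)  # ~[ 0.5651  0.4349], matching F{1}
print(joint.sum(axis=(0, 2, 3)) / Z)  # ~[ 0.3047  0.6953], matching F{2}
```

Because the cluster graph here is a tree, loopy belief update gives the exact answers, which is why the brute-force numbers agree with the calibrated marginals above.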
What if variable3 = 1?

inferrer.calibrate(evidence={'variable3': 1})
<pyugm.infer_message.LoopyBeliefUpdateInference at 0x10771b2d0>
# Calibrated marginals
print inferrer.get_marginals(1)[0], inferrer.get_marginals(1)[0].data
print inferrer.get_marginals(2)[0], inferrer.get_marginals(2)[0].data
F{1} [ 0.50759607 0.49240393] F{2} [ 0.4772118 0.5227882]
# Natural logarithm of the normalizing factor
print inferrer.partition_approximation()
1.75892086647
Although many improvements are necessary I hope this gives a glimpse of what I'm aiming at. I'll discuss parameter learning and different update orderings in another notebook.