probfit Basic Tutorial

Lets start with a simple simple straight line

We can't really call this a fitting package without being able to fit a straight line, right?

In [1]:
from probfit import describe, Chi2Regression
import iminuit
import numpy as np
import numpy.random as npr
In [2]:
#lets make s straight line with gaussian(mu=0,sigma=1) noise
npr.seed(0)
x = linspace(0,10,20) 
y = 3*x+15+ randn(20)
err = np.array([1]*20)
errorbar(x,y,err,fmt='.');
In [3]:
#lets define our line
#first argument has to be independent variable
#arguments after that are shape parameters
def line(x,m,c): #define it to be parabolic or whatever you like
    return m*x+c
#We can make it faster but for this example this is plentily fast.
#We will talk about speeding things up later(you will need cython)
In [4]:
describe(line)
Out[4]:
['x', 'm', 'c']
In [5]:
#cost function
chi2 = Chi2Regression(line,x,y,err)
In [6]:
#Chi^2 regression is just a callable object nothing special about it
describe(chi2)
Out[6]:
['m', 'c']
In [7]:
#minimize it
#yes it gives you a heads up that you didn't give it initial value
#we can ignore it for now
minimizer = iminuit.Minuit(chi2) #see iminuit tutorial on how to give initial value/range/error
minimizer.migrad(); #very stable robust minimizer
#you can look at your terminal to see what it is doing;
-c:4: InitialParamWarning: Parameter m does not have initial value. Assume 0.
-c:4: InitialParamWarning: Parameter m is floating but does not have initial step size. Assume 1.
-c:4: InitialParamWarning: Parameter c does not have initial value. Assume 0.
-c:4: InitialParamWarning: Parameter c is floating but does not have initial step size. Assume 1.

FCN = 12.0738531135 NFCN = 36 NCALLS = 36
EDM = 1.10886029888e-21 GOAL EDM = 1e-05 UP = 1.0
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 m 2.886277e+00 7.367884e-02 0.000000e+00 0.000000e+00
2 c 1.613795e+01 4.309458e-01 0.000000e+00 0.000000e+00

In [8]:
#lets see our results
print minimizer.values
print minimizer.errors
{'c': 16.137947520534624, 'm': 2.8862774144823855}
{'c': 0.4309458211385722, 'm': 0.07367884284273937}
In [9]:
#or a pretty printing
minimizer.print_fmin()

FCN = 12.0738531135 NFCN = 36 NCALLS = 36
EDM = 1.10886029888e-21 GOAL EDM = 1e-05 UP = 1.0
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 m 2.886277e+00 7.367884e-02 0.000000e+00 0.000000e+00
2 c 1.613795e+01 4.309458e-01 0.000000e+00 0.000000e+00

Parabolic error

is calculated using the second derivative at the minimum This is good in most cases where the uncertainty is symmetric not much correlation exists. Migrad usually got this accurately but if you want ot be sure call minimizer.hesse() after migrad

Minos Error

is obtained by scanning chi^2 and likelihood profile and find the the point where chi^2 is increased by 1 or -ln likelihood is increased by 0.5 This error is generally asymmetric and is correct in all case. This is quite computationally expensive if you have many parameter. call minimizer.minos('variablename') after migrad to get this error

In [10]:
#let's visualize our line
chi2.draw(minimizer)
#looks good
In [11]:
#Sometime we want the error matrix
print 'error matrix:\n', minimizer.matrix()
#or correlation matrix
print 'correlation matrix:\n', minimizer.matrix(correlation=True)
#or a pretty html
#despite the method named error_matrix it's actually a correlation matrix
minimizer.print_matrix()
error matrix:
((0.005428571882645087, -0.027142859751431703), (-0.027142859751431703, 0.18571430075679826))
correlation matrix:
((1.0, -0.8548504260481388), (-0.8548504260481388, 1.0))
+
m
c
m 1.00 -0.85
c -0.85 1.00

Simple gaussian distribution fit

In high energy physics, we usually want to fit a distribution to a histogram. Let's look at simple gaussian distribution

In [12]:
import numpy.random as npr
import iminuit
from probfit import BinnedLH
In [13]:
#First lets make our data
npr.seed(0)
data = randn(10000)*4+1
#sigma = 4 and mean = 1
hist(data,bins=100,histtype='step');

Here is our PDF/model

In [14]:
#normalized gaussian
def myPDF(x,mu,sigma):
    return 1/sqrt(2*pi)/sigma*exp(-(x-mu)**2/2./sigma**2)
In [15]:
#build your cost function here we use binned likelihood
cost = BinnedLH(myPDF,data)
In [16]:
#Let's fit
minimizer = iminuit.Minuit(cost,sigma=3.) #notice here that we give initial value to sigma

#up parameter determine where it determine uncertainty
#1(default) for chi^2 and 0.5 for likelihood
minimizer.set_up(0.5)
minimizer.migrad() #very stable minimization algorithm
#like in all binned fit with long zero tail. It will have to do something about the zero bin
#dist_fit handle them gracefully but will give you a head up
Out[16]:
-c:2: InitialParamWarning: Parameter mu does not have initial value. Assume 0.
-c:2: InitialParamWarning: Parameter mu is floating but does not have initial step size. Assume 1.
-c:2: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.
-c:7: LogWarning: x is really small return 0

FCN = 20.9361422674 NFCN = 46 NCALLS = 46
EDM = 9.66960245303e-07 GOAL EDM = 5e-06 UP = 0.5
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 mu 9.258899e-01 3.962587e-02 0.000000e+00 0.000000e+00
2 sigma 3.940361e+00 2.835235e-02 0.000000e+00 0.000000e+00

({'hesse_failed': False, 'has_reached_call_limit': False, 'has_accurate_covar': True, 'has_posdef_covar': True, 'up': 0.5, 'edm': 9.66960245302778e-07, 'is_valid': True, 'is_above_max_edm': False, 'has_covariance': True, 'has_made_posdef_covar': False, 'has_valid_parameters': True, 'fval': 20.936142267424803, 'nfcn': 46},
 [{'is_const': False, 'name': 'mu', 'has_limits': False, 'value': 0.9258898864801999, 'number': 0L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 0.039625867907556483, 'is_fixed': False},
  {'is_const': False, 'name': 'sigma', 'has_limits': False, 'value': 3.9403605325352453, 'number': 1L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 0.028352352139778852, 'is_fixed': False}])
In [17]:
#let's see the result
print 'Value:', minimizer.values
print 'Error:', minimizer.errors
Value: {'mu': 0.9258898864801999, 'sigma': 3.9403605325352453}
Error: {'mu': 0.039625867907556483, 'sigma': 0.028352352139778852}
In [18]:
#That printout can get out of hand quickly
minimizer.print_fmin()
#and correlation matrix
#will not display well in firefox(tell them to fix writing-mode:)
minimizer.print_matrix()

FCN = 20.9361422674 NFCN = 46 NCALLS = 46
EDM = 9.66960245303e-07 GOAL EDM = 5e-06 UP = 0.5
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 mu 9.258899e-01 3.962587e-02 0.000000e+00 0.000000e+00
2 sigma 3.940361e+00 2.835235e-02 0.000000e+00 0.000000e+00

+
mu
sigma
mu 1.00 -0.00
sigma -0.00 1.00
In [19]:
#you can see if your fit make any sense too
cost.draw(minimizer)#uncertainty is given by symetric poisson
In [20]:
#how about making sure the error making sense
minimizer.draw_mnprofile('mu');
-c:2: LogWarning: x is really small return 0
In [21]:
#2d contour error
#you can notice that it takes sometime to draw
#we will this is because our PDF is defined in Python
#we will show how to speed this up later
minimizer.draw_mncontour('mu','sigma');
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/iminuit/_plotting.py:78: LogWarning: x is really small return 0
  sigma=this_sig)

How about Chi^2

Let's explore another popular cost function chi^2. Chi^2 is bad when you have bin with 0. ROOT just ignore. ROOFIT does something I don't remember. But's it's best to avoid using chi^2 when you have bin with 0 count.

In [22]:
import numpy.random as npr
from math import sqrt,exp,pi
import iminuit
from probfit import Extended, gaussian, BinnedChi2, describe
In [23]:
#we will use the same data as in the previous example
npr.seed(0)
data = npr.randn(10000)*4+1
#sigma = 4 and mean = 1
hist(data, bins=100, histtype='step');
In [24]:
#And the same PDF: normalized gaussian
def myPDF(x,mu,sigma):
    return 1/sqrt(2*pi)/sigma*exp(-(x-mu)**2/2./sigma**2)
In [25]:
#binned chi^2 fit only makes sense for extended fit
extended_pdf = Extended(myPDF)
In [26]:
#very useful method to look at function signature
describe(extended_pdf) #you can look at what your pdf means
Out[26]:
['x', 'mu', 'sigma', 'N']
In [27]:
#Chi^2 distribution fit is really bad for distribution with long tail
#since when bin count=0... poisson error=0 and blows up chi^2
#so give it some range
chi2 = BinnedChi2(extended_pdf,data,bound=(-7,10))
minimizer = iminuit.Minuit(chi2,sigma=1)
minimizer.migrad()
Out[27]:
-c:5: InitialParamWarning: Parameter mu does not have initial value. Assume 0.
-c:5: InitialParamWarning: Parameter mu is floating but does not have initial step size. Assume 1.
-c:5: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.
-c:5: InitialParamWarning: Parameter N does not have initial value. Assume 0.
-c:5: InitialParamWarning: Parameter N is floating but does not have initial step size. Assume 1.

FCN = 36.5025976822 NFCN = 245 NCALLS = 245
EDM = 1.60978285965e-09 GOAL EDM = 1e-05 UP = 1.0
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 mu 9.064821e-01 4.482535e-02 0.000000e+00 0.000000e+00
2 sigma 3.960930e+00 4.110841e-02 0.000000e+00 0.000000e+00
3 N 9.969682e+03 1.033724e+02 0.000000e+00 0.000000e+00

({'hesse_failed': False, 'has_reached_call_limit': False, 'has_accurate_covar': True, 'has_posdef_covar': True, 'up': 1.0, 'edm': 1.6097828596516295e-09, 'is_valid': True, 'is_above_max_edm': False, 'has_covariance': True, 'has_made_posdef_covar': False, 'has_valid_parameters': True, 'fval': 36.502597682222884, 'nfcn': 245},
 [{'is_const': False, 'name': 'mu', 'has_limits': False, 'value': 0.9064821049526899, 'number': 0L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 0.0448253527998543, 'is_fixed': False},
  {'is_const': False, 'name': 'sigma', 'has_limits': False, 'value': 3.960930426694066, 'number': 1L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 0.041108411444117775, 'is_fixed': False},
  {'is_const': False, 'name': 'N', 'has_limits': False, 'value': 9969.68167277957, 'number': 2L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 103.37239320160195, 'is_fixed': False}])
In [28]:
chi2.draw(minimizer)
In [29]:
#and usual deal
minimizer.print_fmin()
minimizer.print_matrix() 

FCN = 36.5025976822 NFCN = 245 NCALLS = 245
EDM = 1.60978285965e-09 GOAL EDM = 1e-05 UP = 1.0
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 mu 9.064821e-01 4.482535e-02 0.000000e+00 0.000000e+00
2 sigma 3.960930e+00 4.110841e-02 0.000000e+00 0.000000e+00
3 N 9.969682e+03 1.033724e+02 0.000000e+00 0.000000e+00

+
mu
sigma
N
mu 1.00 -0.10 -0.05
sigma -0.10 1.00 0.18
N -0.05 0.18 1.00

Unbinned Likelihood and How to speed things up

Unbinned likelihood is computationally very very expensive. It's now a good time that we talk about how to speed things up with cython

In [30]:
import numpy.random as npr
from probfit import UnbinnedLH, gaussian, describe
import iminuit
In [31]:
#same data
npr.seed(0)
data = npr.randn(10000)*4+1
#sigma = 4 and mean = 1
hist(data,bins=100,histtype='step');
In [32]:
#We want to speed things up with cython
#load cythonmagic
%load_ext cythonmagic
In [33]:
%%cython
cimport cython
from libc.math cimport exp,M_PI,sqrt
#same gaussian distribution but now written in cython
@cython.binding(True)#IMPORTANT:this tells cython to dump function signature too
def cython_PDF(double x,double mu,double sigma):
    #these are c add multiply etc not python so it's fast
    return 1/sqrt(2*M_PI)/sigma*exp(-(x-mu)**2/2./sigma**2)
In [34]:
#cost function 
ublh = UnbinnedLH(cython_PDF,data)
minimizer = iminuit.Minuit(ublh,sigma=2.)
minimizer.set_up(0.5)#remember this is likelihood
minimizer.migrad()#yes amazingly fast
ublh.show(minimizer)
minimizer.print_fmin()
minimizer.print_matrix() 
-c:3: InitialParamWarning: Parameter mu does not have initial value. Assume 0.
-c:3: InitialParamWarning: Parameter mu is floating but does not have initial step size. Assume 1.
-c:3: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.

FCN = 27927.1139471 NFCN = 69 NCALLS = 69
EDM = 5.05909350517e-09 GOAL EDM = 5e-06 UP = 0.5
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 mu 9.262679e-01 3.950226e-02 0.000000e+00 0.000000e+00
2 sigma 3.950224e+00 2.793227e-02 0.000000e+00 0.000000e+00


FCN = 27927.1139471 NFCN = 69 NCALLS = 69
EDM = 5.05909350517e-09 GOAL EDM = 5e-06 UP = 0.5
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 mu 9.262679e-01 3.950226e-02 0.000000e+00 0.000000e+00
2 sigma 3.950224e+00 2.793227e-02 0.000000e+00 0.000000e+00

+
mu
sigma
mu 1.00 0.00
sigma 0.00 1.00
In [35]:
#remember how slow it was?
#now it's super fast(and it's even unbinned likelihood)
minimizer.draw_mnprofile('mu');

But you really don't have to write your own gaussian there are tons of builtin function written in cython for you.

In [36]:
import probfit.pdf
print dir(probfit.pdf)
print describe(gaussian)
print type(gaussian)
['MinimalFuncCode', 'Polynomial', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__pyx_capi__', '__test__', 'argus', 'cauchy', 'cruijff', 'crystalball', 'describe', 'doublegaussian', 'gaussian', 'linear', 'novosibirsk', 'np', 'poly2', 'poly3', 'rtv_breitwigner', 'ugaussian']
['x', 'mean', 'sigma']
<type 'builtin_function_or_method'>
In [37]:
ublh = UnbinnedLH(gaussian,data)
minimizer = iminuit.Minuit(ublh,sigma=2.)
minimizer.set_up(0.5)#remember this is likelihood
minimizer.migrad()#yes amazingly fast
ublh.show(minimizer)
-c:2: InitialParamWarning: Parameter mean does not have initial value. Assume 0.
-c:2: InitialParamWarning: Parameter mean is floating but does not have initial step size. Assume 1.
-c:2: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.

FCN = 27927.1139471 NFCN = 69 NCALLS = 69
EDM = 5.07834778662e-09 GOAL EDM = 5e-06 UP = 0.5
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 mean 9.262679e-01 3.950226e-02 0.000000e+00 0.000000e+00
2 sigma 3.950224e+00 2.793227e-02 0.000000e+00 0.000000e+00

But... We can't normalize everything analytically and how to generate toy sample from PDF

When fitting distribution to a PDF. One of the common problem that we run into is normalization. Not all function is analytically integrable on the range of our interest. Let's look at crystal ball function.

In [38]:
from probfit import crystalball, gen_toy, Normalized, describe, UnbinnedLH
import numpy.random as npr
import iminuit
In [39]:
#lets first generate a crystal ball sample
#dist_fit has builtin toy generation capability
#lets introduce crystal ball function
#http://en.wikipedia.org/wiki/Crystal_Ball_function
#it's simply gaussian with power law tail
#normally found in energy deposited in crystals
#impossible to normalize analytically
#and normalization will depend on shape parameters
describe(crystalball)
Out[39]:
['x', 'alpha', 'n', 'mean', 'sigma']
In [40]:
npr.seed(0)
bound = (-1,2)
data = gen_toy(crystalball,10000,bound=bound,alpha=1.,n=2.,mean=1.,sigma=0.3,quiet=False)
#quiet = False tells it to plot out original function
#toy histogram and poisson error from both orignal distribution and toy
['x', 'alpha', 'n', 'mean', 'sigma']
In [41]:
#To fit this we need to tell normalized our crystal ball PDF
#this is done with trapezoid rule with simple cache mechanism
#can be done by Normalized functor
ncball = Normalized(crystalball,bound)
#this can also bedone with declerator
#@normalized_function(bound)
#def myPDF(x,blah):
#    return blah
print 'unnorm:', crystalball(1.0,1,2,1,0.3)
print '  norm:', ncball(1.0,1,2,1,0.3)
unnorm: 1.0
  norm: 1.10945677354
In [42]:
#it has the same signature as the crystalball
describe(ncball)
Out[42]:
['x', 'alpha', 'n', 'mean', 'sigma']
In [43]:
#now we can fit as usual
ublh = UnbinnedLH(ncball,data)
minimizer = iminuit.Minuit(ublh,
    alpha=1.,n=2.1,mean=1.2,sigma=0.3)
minimizer.set_up(0.5)#remember this is likelihood
minimizer.migrad()#yes amazingly fast Normalize is written in cython
ublh.show(minimizer)
#crystalball function is nortorious for its sensitivity on n parameter
#dist_fit give you a heads up where it might have float overflow
-c:4: InitialParamWarning: Parameter alpha is floating but does not have initial step size. Assume 1.
-c:4: InitialParamWarning: Parameter n is floating but does not have initial step size. Assume 1.
-c:4: InitialParamWarning: Parameter mean is floating but does not have initial step size. Assume 1.
-c:4: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.
-c:6: SmallIntegralWarning: (0.9689428295957161, 0.44026873052662185, -7.85278743872955, 0.9263532214909775, 0.29811259533420964)

FCN = 6154.3751674 NFCN = 178 NCALLS = 178
EDM = 1.37875641304e-06 GOAL EDM = 5e-06 UP = 0.5
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 alpha 1.012952e+00 5.216176e-02 0.000000e+00 0.000000e+00
2 n 1.812784e+00 2.145145e-01 0.000000e+00 0.000000e+00
3 mean 9.982477e-01 5.508616e-03 0.000000e+00 0.000000e+00
4 sigma 2.996569e-01 4.150674e-03 0.000000e+00 0.000000e+00

What if things went wrong

In [44]:
#crystalball is nortoriously sensitive to initial parameter
#now it is a good time to show what happen when things...doesn't fit
ublh = UnbinnedLH(ncball,data)
minimizer = iminuit.Minuit(ublh)#NO initial value
minimizer.set_up(0.5)#remember this is likelihood
minimizer.migrad()#yes amazingly fast but tons of read
#Remember there is a heads up
Out[44]:
-c:4: InitialParamWarning: Parameter alpha does not have initial value. Assume 0.
-c:4: InitialParamWarning: Parameter alpha is floating but does not have initial step size. Assume 1.
-c:4: InitialParamWarning: Parameter n does not have initial value. Assume 0.
-c:4: InitialParamWarning: Parameter n is floating but does not have initial step size. Assume 1.
-c:4: InitialParamWarning: Parameter mean does not have initial value. Assume 0.
-c:4: InitialParamWarning: Parameter mean is floating but does not have initial step size. Assume 1.
-c:4: InitialParamWarning: Parameter sigma does not have initial value. Assume 0.
-c:4: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.

FCN = 10986.1228867 NFCN = 230 NCALLS = 230
EDM = 0.0 GOAL EDM = 5e-06 UP = 0.5
Valid Valid Param Accurate Covar PosDef Made PosDef
False True False False False
Hesse Fail HasCov Above EDM Reach calllim
True True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 alpha 0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00
2 n 0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00
3 mean 0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00
4 sigma 0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00

({'hesse_failed': True, 'has_reached_call_limit': False, 'has_accurate_covar': False, 'has_posdef_covar': False, 'up': 0.5, 'edm': 0.0, 'is_valid': False, 'is_above_max_edm': False, 'has_covariance': True, 'has_made_posdef_covar': False, 'has_valid_parameters': True, 'fval': 10986.122886679766, 'nfcn': 230},
 [{'is_const': False, 'name': 'alpha', 'has_limits': False, 'value': 0.0, 'number': 0L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 1.0, 'is_fixed': False},
  {'is_const': False, 'name': 'n', 'has_limits': False, 'value': 0.0, 'number': 1L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 1.0, 'is_fixed': False},
  {'is_const': False, 'name': 'mean', 'has_limits': False, 'value': 0.0, 'number': 2L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 1.0, 'is_fixed': False},
  {'is_const': False, 'name': 'sigma', 'has_limits': False, 'value': 0.0, 'number': 3L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 1.0, 'is_fixed': False}])
In [45]:
ublh.show(minimizer)#it bounds to fails
In [46]:
minimizer.migrad_ok(), minimizer.matrix_accurate()
#checking these two method give you a good sign
Out[46]:
(False, False)
In [47]:
#fix it by giving it initial value/error/limit or fixing parameter see iminuit Tutorial
#now we can fit as usual
ublh = UnbinnedLH(ncball,data)
minimizer = iminuit.Minuit(ublh,
    alpha=1.,n=2.1,mean=1.2,sigma=0.3)
minimizer.set_up(0.5)#remember this is likelihood
minimizer.migrad()#yes amazingly fast. Normalize is written in cython
ublh.show(minimizer)
-c:5: InitialParamWarning: Parameter alpha is floating but does not have initial step size. Assume 1.
-c:5: InitialParamWarning: Parameter n is floating but does not have initial step size. Assume 1.
-c:5: InitialParamWarning: Parameter mean is floating but does not have initial step size. Assume 1.
-c:5: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.

FCN = 6154.3751674 NFCN = 178 NCALLS = 178
EDM = 1.37875641304e-06 GOAL EDM = 5e-06 UP = 0.5
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 alpha 1.012952e+00 5.216176e-02 0.000000e+00 0.000000e+00
2 n 1.812784e+00 2.145145e-01 0.000000e+00 0.000000e+00
3 mean 9.982477e-01 5.508616e-03 0.000000e+00 0.000000e+00
4 sigma 2.996569e-01 4.150674e-03 0.000000e+00 0.000000e+00

How do we guess initial value?

This is a hard question but visualization can helps us

In [48]:
from probfit import try_uml, Normalized, crystalball, gen_toy
In [49]:
bound = (-1,2)
ncball = Normalized(crystalball,bound)
data = gen_toy(crystalball,10000,bound=bound,alpha=1.,n=2.,mean=1.,sigma=0.3)
In [50]:
besttry = try_uml(ncball,data,alpha=1.,n=2.1,mean=1.2,sigma=0.3)
In [51]:
#or you can try multiple
#too many will just confuse you
besttry = try_uml(ncball,data,alpha=1.,n=2.1,mean=[1.2,1.1],sigma=[0.3,0.5])
print besttry #and you can find which one give you minimal unbinned likelihood
{'alpha': 1.0, 'mean': 1.1, 'sigma': 0.3, 'n': 2.1}

Extended Fit: 2 Gaussian with Polynomial Background

In [52]:
import numpy.random as npr
from probfit import gen_toy, gaussian, Extended, describe, try_binlh, BinnedLH
import numpy as np
import iminuit
In [53]:
peak1 = npr.randn(3000)*0.2
peak2 = npr.randn(5000)*0.1+4
bg = gen_toy(lambda x : (x+2)**2, 20000,(-2,5))
all_data = np.concatenate([peak1,peak2,bg])
hist((peak1,peak2,bg,all_data),bins=200,histtype='step',range=(-2,5));
In [54]:
%load_ext cythonmagic
The cythonmagic extension is already loaded. To reload it, use:
  %reload_ext cythonmagic
In [55]:
%%cython
cimport cython
from probfit import Normalized, gaussian

@cython.binding(True)
def poly(double x,double a,double b, double c):
    return a*x*x+b*x+c

#remember linear function is not normalizable for -inf ... +inf
nlin = Normalized(poly,(-1,5))

#our extended PDF for 3 types of signal
@cython.binding(True)
def myPDF(double x, 
    double a, double b, double c, double nbg,
    double mu1, double sigma1, double nsig1,
    double mu2, double sigma2, double nsig2):

    cdef double NBG = nbg*nlin(x,a,b,c)
    cdef double NSIG1 = nsig1*gaussian(x,mu1,sigma1) 
    cdef double NSIG2 = nsig2*gaussian(x,mu2,sigma2)
    return NBG + NSIG1 + NSIG2
In [56]:
print describe(myPDF)
['x', 'a', 'b', 'c', 'nbg', 'mu1', 'sigma1', 'nsig1', 'mu2', 'sigma2', 'nsig2']
In [57]:
#lets use what we just learned
#for complicated function good initial value(and initial step) is crucial
#if it doesn't converge try play with initial value and initial stepsize(error_xxx)
besttry = try_binlh(myPDF,all_data, 
    a=1.,b=2.,c=4.,nbg=20000.,
    mu1=0.1,sigma1=0.2,nsig1=3000.,
    mu2=3.9,sigma2=0.1,nsig2=5000.,extended=True, bins=300, bound=(-1,5) )
print besttry
{'a': 1.0, 'c': 4.0, 'b': 2.0, 'mu1': 0.1, 'mu2': 3.9, 'sigma1': 0.2, 'nsig2': 5000.0, 'nsig1': 3000.0, 'nbg': 20000.0, 'sigma2': 0.1}
In [58]:
binlh = BinnedLH(myPDF,all_data, bins=200, extended=True, bound=(-1,5))
#too lazy to type initial values from what we try?
#use keyword expansion **besttry
#need to put in initial step size for mu1 and mu2 
#with error_mu* otherwise it won't converge(try it yourself)
minimizer = iminuit.Minuit(binlh, error_mu1=0.1, error_mu2=0.1, **besttry)
-c:6: InitialParamWarning: Parameter a is floating but does not have initial step size. Assume 1.
-c:6: InitialParamWarning: Parameter b is floating but does not have initial step size. Assume 1.
-c:6: InitialParamWarning: Parameter c is floating but does not have initial step size. Assume 1.
-c:6: InitialParamWarning: Parameter nbg is floating but does not have initial step size. Assume 1.
-c:6: InitialParamWarning: Parameter sigma1 is floating but does not have initial step size. Assume 1.
-c:6: InitialParamWarning: Parameter nsig1 is floating but does not have initial step size. Assume 1.
-c:6: InitialParamWarning: Parameter sigma2 is floating but does not have initial step size. Assume 1.
-c:6: InitialParamWarning: Parameter nsig2 is floating but does not have initial step size. Assume 1.
In [59]:
minimizer.migrad();

FCN = 102.411500691 NFCN = 253 NCALLS = 253
EDM = 1.77181708742e-06 GOAL EDM = 5e-06 UP = 0.5
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 a 8.334438e-01 4.686769e-02 0.000000e+00 0.000000e+00
2 b 3.639166e+00 1.242921e-01 0.000000e+00 0.000000e+00
3 c 3.546744e+00 1.306977e-01 0.000000e+00 0.000000e+00
4 nbg 1.991540e+04 1.635680e+02 0.000000e+00 0.000000e+00
5 mu1 -1.478781e-03 4.662901e-03 0.000000e+00 0.000000e+00
6 sigma1 1.959101e-01 4.064030e-03 0.000000e+00 0.000000e+00
7 nsig1 2.989618e+03 6.949690e+01 0.000000e+00 0.000000e+00
8 mu2 4.003399e+00 2.071651e-03 0.000000e+00 0.000000e+00
9 sigma2 9.790340e-02 2.006122e-03 0.000000e+00 0.000000e+00
10 nsig2 5.049941e+03 1.008751e+02 0.000000e+00 0.000000e+00

In [60]:
binlh.show(minimizer)
In [61]:
minimizer.print_fmin()

FCN = 102.411500691 NFCN = 253 NCALLS = 253
EDM = 1.77181708742e-06 GOAL EDM = 5e-06 UP = 0.5
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 a 8.334438e-01 4.686769e-02 0.000000e+00 0.000000e+00
2 b 3.639166e+00 1.242921e-01 0.000000e+00 0.000000e+00
3 c 3.546744e+00 1.306977e-01 0.000000e+00 0.000000e+00
4 nbg 1.991540e+04 1.635680e+02 0.000000e+00 0.000000e+00
5 mu1 -1.478781e-03 4.662901e-03 0.000000e+00 0.000000e+00
6 sigma1 1.959101e-01 4.064030e-03 0.000000e+00 0.000000e+00
7 nsig1 2.989618e+03 6.949690e+01 0.000000e+00 0.000000e+00
8 mu2 4.003399e+00 2.071651e-03 0.000000e+00 0.000000e+00
9 sigma2 9.790340e-02 2.006122e-03 0.000000e+00 0.000000e+00
10 nsig2 5.049941e+03 1.008751e+02 0.000000e+00 0.000000e+00

Advance: Custom cost function and Simultaneous Fit

Sometimes, what we want to fit is the sum of likelihood /chi^2 of two PDF sharing some parameters. dist_fit doesn't provied a built_in facility to do this but it can be built easily.

In this example, we will fit two gaussian distribution where we know that the width are the same but the peak is at different places.

In [62]:
import numpy.random as npr
from probfit import UnbinnedLH, gaussian, describe, draw_compare_hist
import iminuit
In [63]:
npr.seed(0)
#Lets make two gaussian
data1 = npr.randn(10000)+3
data2 = npr.randn(10000)-2
hist([data1,data2],bins=100,histtype='step',label=['data1','data2']);
In [64]:
#remember this is nothing special about builtin cost function
#except some utility function like draw and show
ulh1 = UnbinnedLH(gaussian,data1)
ulh2 = UnbinnedLH(gaussian,data2)
print describe(ulh1)
print describe(ulh2)
['mean', 'sigma']
['mean', 'sigma']
In [65]:
#you can also use cython to do this
class CustomCost:
    def __init__(self,pdf1,data1,pdf2,data2):
        self.ulh1 = UnbinnedLH(pdf1,data1)
        self.ulh2 = UnbinnedLH(pdf2,data2)
    #this is the important part you need __call__ to calculate your cost
    #in our case it's sum of likelihood with sigma
    def __call__(self,mu1,mu2,sigma):
        return self.ulh1(mu1,sigma)+self.ulh2(mu2,sigma)
In [66]:
simul_lh = CustomCost(gaussian,data1,gaussian,data2)
In [67]:
minimizer = iminuit.Minuit(simul_lh,sigma=0.5)
minimizer.set_up(0.5)#remember it's likelihood
minimizer.migrad();
-c:1: InitialParamWarning: errordef is not given. Default to 1.
-c:1: InitialParamWarning: Parameter mu1 does not have initial value. Assume 0.
-c:1: InitialParamWarning: Parameter mu1 is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter mu2 does not have initial value. Assume 0.
-c:1: InitialParamWarning: Parameter mu2 is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.

FCN = 28184.0142876 NFCN = 97 NCALLS = 97
EDM = 2.24683294143e-09 GOAL EDM = 5e-06 UP = 0.5
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 mu1 2.981566e+00 9.903099e-03 0.000000e+00 0.000000e+00
2 mu2 -1.989012e+00 9.903099e-03 0.000000e+00 0.000000e+00
3 sigma 9.903098e-01 4.951551e-03 0.000000e+00 0.000000e+00

In [68]:
minimizer.print_fmin()
minimizer.print_matrix()
results = minimizer.values

FCN = 28184.0142876 NFCN = 97 NCALLS = 97
EDM = 2.24683294143e-09 GOAL EDM = 5e-06 UP = 0.5
Valid Valid Param Accurate Covar PosDef Made PosDef
True True True True False
Hesse Fail HasCov Above EDM Reach calllim
False True False False
+ Name Value Parab Error Minos Error- Minos Error+ Limit- Limit+ FIXED
1 mu1 2.981566e+00 9.903099e-03 0.000000e+00 0.000000e+00
2 mu2 -1.989012e+00 9.903099e-03 0.000000e+00 0.000000e+00
3 sigma 9.903098e-01 4.951551e-03 0.000000e+00 0.000000e+00

+
mu1
mu2
sigma
mu1 1.00 0.00 0.00
mu2 0.00 1.00 0.00
sigma 0.00 0.00 1.00
In [69]:
draw_compare_hist(gaussian,[results['mu1'],results['sigma']],data1,normed=True);
draw_compare_hist(gaussian,[results['mu2'],results['sigma']],data2,normed=True);
In []: