In this notebook we simulate the Poisson Pareto Burst Process (PPBP), modeling internet traffic.
We monitor a continuous time stochastic process $t\in[0,T]\mapsto B_t$, where $B_t$ is the number of active bursts in the traffic of data, at the time $t$.
The number of starting bursts (for example the number of requests for WEB download sesions) is a Poisson process of rate $\lambda$, and burst durations are independent and identical distributed random variables having Pareto distribution, Pareto($\alpha, \beta$), of shape $\alpha\in (1,2)$, and scale $\beta$.
The cumulative distribution function (cdf) of Pareto($\alpha, \beta$) is defined as follows:
$F(x)=1-(\beta/x)^\alpha$ for $x\geq \beta$, and $0$ otherwise. $\beta$ is the smallest duration of a burst.
For $\alpha\in(1,2)$ the burst duration random variable, $D$, has a finite expectation, $E(D)=\displaystyle\frac{\alpha\beta}{\alpha-1}$, and infinite variance.
At time $t=0$, when we are starting monitoring the burst process, there already exist active bursts and their number is a Poisson random variable of parameter $\Lambda=\lambda E(D)$. The durations of these pre-existent bursts are independent and identical distributed random variables having the probability distribution whose cdf is defined by:
$\alpha, \beta$ are the parameters of the above Pareto distribution. The probability distribution with this cdf is called distribution of Pareto forward recurrence times. It has infinite mean and variance.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as st
The graph of $G(x)$ is illustrated below:
alpha=1.7
beta=2
a=1.0/alpha
x=np.linspace(0.02, beta,50)
y=1-(alpha-1)*a*(1.0-x/beta)-a
xx=np.linspace(beta,8, 100)
yy=1-a*np.power(xx/beta, 1.0-alpha)
plt.plot(x,y, 'g')
plt.plot(xx, yy, 'r')
[<matplotlib.lines.Line2D at 0x37b4c90>]
A random variable having the cdf $G$ can be simulated by inversion method:
def rvsG(alpha, beta, n):
vals=[]
a=1-1.0/alpha
for i in range(n):
u=np.random.random()
if (u<a):
vals.append( beta*(alpha*a-1.0+u)/(a*alpha-1))
else:
vals.append(beta*np.power(alpha*(1-u), 1.0/(1-alpha)))
return vals
vals=rvsG(alpha,beta, 500)
histo=plt.hist(vals, bins=50, normed=True)
Since the Pareto distributed burst durations have infinite variance, the PPBP process exhibits Long Range Dependence (LRD), meaning that it has autocorrelations that persist over all times scales. LRD is characterized by the Hurst exponent, $H\in(0.5, 1)$. $H$ is related to the shape parameter of the Pareto distribution by $H=(3-\alpha)/2$.
In order to simulate the burst process we have to set:
From $H$ we get $\alpha=3-2H$, and from $E(D)$, the location parameter $\beta$ for the Pareto distribution.
def timesBurstInterval(ts, tf, h):
j=(int)(ts/h)+1 if np.fmod(ts, h) else (int)(ts/h)
k=(int)(tf/h) if np.fmod(tf, h) else (int)(tf/h)-1
return j,k
def internetTraffic(T, lam=5.0, H=0.65, meanD=4.0):#lam, meanD, H,
alpha=3.0-2*H
beta=(alpha-1)*meanD/alpha
h=beta/4
print beta, h
tm=np.arange(0, T, h)
B=np.zeros(tm.shape[0], dtype=float)
theta=1.0/lam
S=st.expon(scale=theta)
D=st.pareto(alpha)
n=st.poisson.rvs(lam*meanD)
initD= rvsG(alpha, beta, n)
for i in range(n):
j,k=timesBurstInterval(0, min(initD[i],T), h)
B[j:k+1]+=1.0
ts=0
tf=0
while(ts<T):
x=S.rvs(size=1)# length of interarrivals intervals
ts=ts+x# burst start time
if ts>=T:
break
d=beta*D.rvs(size=1)
tf=ts+d# burst stop time
j,k=timesBurstInterval(ts, min(tf,T),h)
B[j:k+1]+=1.0 #update the number of active bursts in [ts, tf]
return tm, B
tm, B=internetTraffic(2000)
plt.rcParams['figure.figsize'] = (15.0, 6.0)
plt.plot (tm, B)
plt.title('Internet traffic as a Poisson Pareto Burst Process. Hurst exponent 0.75')
plt.xlabel('time t')
plt.ylabel('Number of active bursts')
1.64705882353 0.411764705882
<matplotlib.text.Text at 0x3be7110>
Now we send the points of coordinates $(t, B_t)$ to a Plotly stream. Plotly's real-time streaming API is avalaible only starting with Plotly 1.0.
import plotly.plotly as py
import plotly.tools as tls
from plotly.graph_objs import Stream
from plotly.graph_objs import Figure, Data, Layout, Scatter
tls.set_credentials_file(username="empet",
api_key="xxxxxxxxxx")#replace with your username and api-key
# Read credentials
em_creds = tls.get_credentials_file()
#Sign in to Plotly from API
py.sign_in(em_creds['username'], em_creds['api_key'])
tls.set_credentials_file(stream_ids=["yyyyyyyyyy"])#replace with your stream_id
em_stream_ids = tls.get_credentials_file()['stream_ids']
em_stream_id = em_stream_ids[0]
em_stream = Stream(token=em_stream_id, maxpoints=700)
em_data = Data([Scatter(x=[], y=[], mode='lines+markers', stream=em_stream)])
em_layout = Layout(title='Internet Traffic as a Poisson Pareto Burst Process')
em_fig = Figure(data=em_data, layout=em_layout)
unique_url = py.plot(em_fig, filename='em_first-stream')
s = py.Stream(em_stream_id)
# Open the stream
s.open()
import time
import datetime
i=0
N=B.shape[0]
time.sleep(5) # delay start of stream by 5 seconds
while(i<N):
t=datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')
yy= B[i]
s.write(dict(x=t,y=yy)) # write to Plotly stream
i += 1
time.sleep(0.1)
s.close()
from IPython.display import Image
Image(filename='plotlygraph.png')
E. Petrisor
from IPython.core.display import HTML
def css_styling():
styles = open("./styles/custom.css", "r").read()
return HTML(styles)
css_styling()