Simple Timeseries Plots in `pandas`

This notebook uses a dummy dataset of random values with random timestamps.

First, let's create a function to generate a random datetime between two datetimes:

In [30]:

# https://stackoverflow.com/a/553448/454773
from random import randrange
from datetime import datetime, timedelta

def random_datetime(start=datetime.strptime('1/1/2019 1:30 PM', '%d/%m/%Y %I:%M %p'),
                    end=datetime.strptime('31/1/2019 4:50 AM', '%d/%m/%Y %I:%M %p')):
    """
    Return a random datetime between two datetime objects
    (start inclusive, end exclusive), at one-second resolution.
    """
    delta = end - start
    # Number of whole seconds in the interval
    int_delta = (delta.days * 24 * 60 * 60) + delta.seconds
    random_second = randrange(int_delta)
    return start + timedelta(seconds=random_second)

random_datetime()

Out[30]:

datetime.datetime(2019, 1, 29, 20, 40, 17)
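When many timestamps are needed at once, the same idea (pick a random whole-second offset from the start) can be vectorized. The following is only a sketch: `random_datetimes` and its `seed` parameter are hypothetical names introduced here, not part of the original notebook, and it assumes `numpy` and `pandas` are available.

```python
import numpy as np
import pandas as pd

def random_datetimes(start, end, n, seed=None):
    """Hypothetical helper: n random timestamps in [start, end), one-second resolution."""
    start = pd.Timestamp(start)
    end = pd.Timestamp(end)
    # Number of whole seconds in the interval
    total_seconds = int((end - start).total_seconds())
    rng = np.random.default_rng(seed)
    # One random offset (in seconds) per requested timestamp
    offsets = rng.integers(0, total_seconds, size=n)
    return start + pd.to_timedelta(offsets, unit='s')

stamps = random_datetimes('2019-01-01 13:30', '2019-01-31 04:50', n=5, seed=0)
print(stamps)
```

The result is a `DatetimeIndex`, which can be assigned directly to a dataframe column of matching length.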

Now we can create a dummy dataframe with a couple of columns of random data:

In [63]:

import numpy as np
import pandas as pd

numsamples = 1000

df = pd.DataFrame({'val1': np.random.rand(numsamples)})
df['datetime'] = [random_datetime() for _ in range(numsamples)]
# Add some periodicity to the second column of random numbers:
# scale val1 by a sinusoid with a 24-hour period, based on the hour of day
df['val2'] = df['val1'] * (1 + np.sin(2 * np.pi * (df['datetime'].dt.hour - 6) / 24))
df

Out[63]:

	val1	datetime	val2
0	0.231440	2019-01-02 07:55:19	0.291342
1	0.266321	2019-01-22 01:14:58	0.009075
2	0.952032	2019-01-09 08:51:14	1.428048
3	0.643093	2019-01-12 09:13:53	1.097828
4	0.354115	2019-01-07 10:05:28	0.660788
...	...	...	...
995	0.042920	2019-01-17 18:38:57	0.042920
996	0.208727	2019-01-30 04:19:08	0.104364
997	0.249041	2019-01-02 03:52:03	0.072942
998	0.484270	2019-01-07 17:22:07	0.609609
999	0.765707	2019-01-03 03:24:35	0.224270

1000 rows × 3 columns
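The periodic factor applied to `val1` can be checked in isolation. The sketch below uses a small hand-made frame (`demo`) and a helper named `hourly_multiplier`; both are hypothetical names introduced here for illustration. The factor should be 0 at hour 0, 1 at hours 6 and 18, and 2 at hour 12, so `val2` ranges from zero to twice `val1` over a day.

```python
import numpy as np
import pandas as pd

def hourly_multiplier(hour):
    """Hypothetical helper: the sinusoidal factor used to build val2."""
    return 1 + np.sin(2 * np.pi * (hour - 6) / 24)

# Four timestamps a quarter of a day apart, each with val1 fixed at 1.0
demo = pd.DataFrame({
    'val1': [1.0, 1.0, 1.0, 1.0],
    'datetime': pd.to_datetime(['2019-01-02 00:15', '2019-01-02 06:15',
                                '2019-01-02 12:15', '2019-01-02 18:15']),
})
# Only the hour of day enters the factor; minutes and seconds are ignored
demo['val2'] = demo['val1'] * hourly_multiplier(demo['datetime'].dt.hour)
print(demo)
```

Because only `.dt.hour` is used, every sample within the same hour gets the same factor.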

In [64]:

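# Median of each value column, grouped by hour of day.
# Selecting the value columns explicitly means the result does not depend on
# how the installed pandas version treats the non-numeric 'datetime' column.
df.groupby(df['datetime'].dt.hour)[['val1', 'val2']].median()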

Out[64]:

	val1	val2
datetime
0	0.401714	0.000000
1	0.611881	0.020849
2	0.552599	0.074034
3	0.539660	0.158063
4	0.441285	0.220642
5	0.557000	0.412838
6	0.588485	0.588485
7	0.471580	0.593633
8	0.509826	0.764738
9	0.498654	0.851255
10	0.366607	0.684097
11	0.431500	0.848297
12	0.379588	0.759176
13	0.523362	1.028892
14	0.436434	0.814397
15	0.413217	0.705405
16	0.329830	0.494745
17	0.686149	0.863737
18	0.514521	0.514521
19	0.485645	0.359951
20	0.472008	0.236004
21	0.496440	0.145404
22	0.566665	0.075919
23	0.535262	0.018239

In [67]:

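# Compute the hourly medians once, then draw both series on the same axes.
# After reset_index(), the 'datetime' column holds the hour of day (0-23).
hourly = df.groupby(df['datetime'].dt.hour)[['val1', 'val2']].median().reset_index()

ax = hourly.plot(kind='scatter', x='datetime', y='val1')
hourly.plot(kind='scatter', x='datetime', y='val2', color='red', ax=ax)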

Out[67]:

<matplotlib.axes._subplots.AxesSubplot at 0x11ff5d890>
