pandas
Using a dummy dataset...
First, let's create a function to generate a random datetime between two datetimes:
# https://stackoverflow.com/a/553448/454773
from random import randrange
from datetime import datetime, timedelta

def random_datetime(start=datetime.strptime('1/1/2019 1:30 PM', '%d/%m/%Y %I:%M %p'),
                    end=datetime.strptime('31/1/2019 4:50 AM', '%d/%m/%Y %I:%M %p')):
    """
    Return a random datetime between two datetime objects.
    """
    delta = end - start
    # Total span in whole seconds between the two datetimes
    int_delta = (delta.days * 24 * 60 * 60) + delta.seconds
    random_second = randrange(int_delta)
    return start + timedelta(seconds=random_second)
random_datetime()
datetime.datetime(2019, 1, 29, 20, 40, 17)
Now we can create a dummy dataframe with a couple of columns of random data:
import numpy as np
import pandas as pd

numsamples = 1000
df = pd.DataFrame({'val1': np.random.rand(numsamples)})
df['datetime'] = [random_datetime() for i in range(numsamples)]
# Add some periodicity to the second column of random numbers...
df['val2'] = df.apply(lambda x: x['val1'] * (1 + np.sin(2 * np.pi * (x['datetime'].hour - 6) / 24)), axis=1)
df
| | val1 | datetime | val2 |
|---|---|---|---|
0 | 0.231440 | 2019-01-02 07:55:19 | 0.291342 |
1 | 0.266321 | 2019-01-22 01:14:58 | 0.009075 |
2 | 0.952032 | 2019-01-09 08:51:14 | 1.428048 |
3 | 0.643093 | 2019-01-12 09:13:53 | 1.097828 |
4 | 0.354115 | 2019-01-07 10:05:28 | 0.660788 |
... | ... | ... | ... |
995 | 0.042920 | 2019-01-17 18:38:57 | 0.042920 |
996 | 0.208727 | 2019-01-30 04:19:08 | 0.104364 |
997 | 0.249041 | 2019-01-02 03:52:03 | 0.072942 |
998 | 0.484270 | 2019-01-07 17:22:07 | 0.609609 |
999 | 0.765707 | 2019-01-03 03:24:35 | 0.224270 |
1000 rows × 3 columns
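The periodic term applied to val2 depends only on the hour of day: it is 1 + sin(2π(hour − 6)/24), which is 0 at midnight, 1 at 06:00 and 18:00, and 2 at noon. As a minimal sketch, the same scaling can be computed in a vectorised way rather than row by row with apply. The helper name `hourly_factor` and the small `demo` frame are made up for illustration and are not part of the dataset above.

```python
import numpy as np
import pandas as pd

def hourly_factor(hours):
    """Scaling factor 1 + sin(2*pi*(hour - 6)/24) for an hour-of-day value or array."""
    hours = np.asarray(hours, dtype=float)
    return 1 + np.sin(2 * np.pi * (hours - 6) / 24)

# Small illustrative frame (hypothetical timestamps, not the random data above)
demo = pd.DataFrame({
    'val1': [1.0, 1.0, 1.0, 1.0],
    'datetime': pd.to_datetime(['2019-01-01 00:10', '2019-01-01 06:20',
                                '2019-01-01 12:30', '2019-01-01 18:40']),
})

# Vectorised equivalent of the row-wise apply: scale val1 by the hourly factor
demo['val2'] = demo['val1'] * hourly_factor(demo['datetime'].dt.hour)
```

With val1 fixed at 1, val2 here is just the factor itself, which makes it easy to see why the hourly medians of val2 should be lowest around midnight and highest around noon.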
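# Median value by hour (the numeric columns are selected explicitly so that
# the datetime column itself is not aggregated on newer pandas versions)
df.groupby(df['datetime'].dt.hour)[['val1', 'val2']].median()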
datetime | val1 | val2 |
---|---|---|
0 | 0.401714 | 0.000000 |
1 | 0.611881 | 0.020849 |
2 | 0.552599 | 0.074034 |
3 | 0.539660 | 0.158063 |
4 | 0.441285 | 0.220642 |
5 | 0.557000 | 0.412838 |
6 | 0.588485 | 0.588485 |
7 | 0.471580 | 0.593633 |
8 | 0.509826 | 0.764738 |
9 | 0.498654 | 0.851255 |
10 | 0.366607 | 0.684097 |
11 | 0.431500 | 0.848297 |
12 | 0.379588 | 0.759176 |
13 | 0.523362 | 1.028892 |
14 | 0.436434 | 0.814397 |
15 | 0.413217 | 0.705405 |
16 | 0.329830 | 0.494745 |
17 | 0.686149 | 0.863737 |
18 | 0.514521 | 0.514521 |
19 | 0.485645 | 0.359951 |
20 | 0.472008 | 0.236004 |
21 | 0.496440 | 0.145404 |
22 | 0.566665 | 0.075919 |
23 | 0.535262 | 0.018239 |
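Note that the index of the grouped result is labelled datetime even though it holds hours (0 to 23), because the grouping key is a Series derived from that column. One way to make the result easier to read, sketched here on a small made-up frame (the names `demo`, `hour` and `hourly_median` are illustrative assumptions), is to rename the key before grouping:

```python
import numpy as np
import pandas as pd

# Hypothetical demo data: one sample every 30 minutes over two days
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'datetime': pd.date_range('2019-01-01', periods=96, freq='30min'),
    'val1': rng.random(96),
})

# Rename the derived key so the resulting index is labelled 'hour'
hour = demo['datetime'].dt.hour.rename('hour')
hourly_median = demo.groupby(hour)[['val1']].median()
```

After `reset_index()`, the key then appears as a column called hour, which would also be the natural name to pass as `x` when plotting.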
# Compute the hourly medians once and reuse them for both scatter plots
hourly = df.groupby(df['datetime'].dt.hour)[['val1', 'val2']].median().reset_index()
ax = hourly.plot(kind='scatter', x='datetime', y='val1')
hourly.plot(kind='scatter', x='datetime', y='val2', color='red', ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x11ff5d890>