There's been some recent discussion of problems with importing dates from certain types of Excel files into R. It turns out that Excel has two different ways of storing dates, and for one of them, importing dates directly from the Excel file into R shifts every date by 4 years and 1 day. See the whole story told by Kara Woo. It takes a disciplined researcher to catch this kind of issue.
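To see where an offset of that size can come from, here is a minimal Python sketch. It assumes the two storage schemes are Excel's 1900 and 1904 date systems, which count days from different starting points. The epoch constants and helper names are illustrative and are not taken from XLConnect or from Kara's write-up.

```python
from datetime import date, timedelta

# Assumed day-zero of each Excel date system. 1899-12-30 is the effective
# epoch of the 1900 system for dates after 1900-03-01, because Excel counts
# a nonexistent 1900-02-29.
EPOCH_1900 = date(1899, 12, 30)
EPOCH_1904 = date(1904, 1, 1)


def to_serial(d, epoch):
    """Number of days between the epoch and the date d."""
    return (d - epoch).days


def from_serial(serial, epoch):
    """Date that lies `serial` days after the epoch."""
    return epoch + timedelta(days=serial)


# A date stored in a 1904-system workbook...
actual = date(1950, 5, 3)
serial = to_serial(actual, EPOCH_1904)

# ...but decoded as if the workbook used the 1900 system.
misread = from_serial(serial, EPOCH_1900)
offset_days = (actual - misread).days

print(misread, offset_days)
```

Under those assumptions the misread date lands 1462 days early, which is four years plus one day, so 1950-05-03 comes back as 1946-05-02.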
I was curious to see if the same problem occurred in Python, so I asked Kara to send me some sample data that triggered it. The actual dates run from 1950 to 2001. Importing in R using XLConnect gives dates from 1946 to 1997.
```python
import pandas
data = pandas.io.excel.read_excel("dummydata.xlsx", "Sheet1")
data
```
| | date | year | month | day | data |
|---|---|---|---|---|---|
| 0 | 1950-05-03 00:00:00 | 1950 | 5 | 3 | 4.5 |
| 1 | 1950-01-01 00:00:00 | 1950 | 1 | 1 | 8.9 |
| 2 | 1953-12-20 00:00:00 | 1953 | 12 | 20 | 2.2 |
| 3 | 1976-11-19 00:00:00 | 1976 | 11 | 19 | 5.6 |
| 4 | 1989-12-31 00:00:00 | 1989 | 12 | 31 | 6.6 |
| 5 | 2001-04-14 00:00:00 | 2001 | 4 | 14 | 4.1 |
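The imported dates run from 1950 to 2001 and match the year, month, and day columns, so the dates come through correctly. Thanks, pandas!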