I have data in a csv file with dates stored as strings in a standard UK format -
%d/%m/%Y - meaning they look like:
    12/01/2012
    30/01/2012
The examples above represent 12 January 2012 and 30 January 2012.
When I imported this data with pandas version 0.11.0, I applied the following transformation:
    import pandas as pd
    ...
    cpts.Date = cpts.Date.apply(pd.to_datetime)
but it converted the dates inconsistently. To use my existing example, 12/01/2012 converts to a datetime object representing 1 December 2012, but 30/01/2012 converts to 30 January 2012, which is what I want.
After looking at this question I tried:
    cpts.Date = cpts.Date.apply(pd.to_datetime, format='%d/%m/%Y')
but the results are exactly the same. The source code suggests I'm doing things right so I'm at a loss. Does anyone know what I'm doing wrong?
You can use the parse_dates option of read_csv to do the conversion directly while reading your data.
The trick here is to use dayfirst=True to indicate that your dates start with the day and not with the month. See here for more information: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html
When your dates have to be the index:
    >>> import pandas as pd
    >>> from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO
    >>> s = StringIO("""date,value
    ... 12/01/2012,1
    ... 12/01/2012,2
    ... 30/01/2012,3""")
    >>>
    >>> pd.read_csv(s, index_col=0, parse_dates=True, dayfirst=True)
                value
    date
    2012-01-12      1
    2012-01-12      2
    2012-01-30      3
Or when your dates are just in a certain column:
    >>> s = StringIO("""date
    ... 12/01/2012
    ... 12/01/2012
    ... 30/01/2012""")
    >>>
    >>> pd.read_csv(s, parse_dates=[0], dayfirst=True)  # parse column 0 as dates
                      date
    0  2012-01-12 00:00:00
    1  2012-01-12 00:00:00
    2  2012-01-30 00:00:00
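For readers on Python 3, here is a minimal sketch of the same idea. It assumes a recent pandas and uses `io.StringIO` in place of the Python 2 `StringIO` module; the names `csv_text` and `df` are only for illustration.

```python
from io import StringIO

import pandas as pd

# Small in-memory CSV with UK-style (day/month/year) dates.
csv_text = "date,value\n12/01/2012,1\n30/01/2012,3\n"

# parse_dates selects the column by name; dayfirst=True tells the parser
# to read 12/01/2012 as 12 January rather than 1 December.
df = pd.read_csv(StringIO(csv_text), parse_dates=["date"], dayfirst=True)

print(df["date"].dt.day.tolist())
print(df["date"].dt.month.tolist())
```

If both rows come back with month 1, the day-first interpretation was applied to the ambiguous date as well as the unambiguous one.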
I think you are calling it correctly, and I posted this as an issue on github.
You can just specify the format to to_datetime directly, for example:
    In [1]: import pandas as pd
    In [2]: s = pd.Series(['12/1/2012', '30/01/2012'])
    In [3]: pd.to_datetime(s, format='%d/%m/%Y')
    Out[3]:
    0   2012-01-12 00:00:00
    1   2012-01-30 00:00:00
    dtype: datetime64[ns]
Update: As the OP correctly points out, this doesn't work with NaN. If you are happy with dayfirst=True (which works with NaN too), you can pass that to to_datetime instead.
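A minimal sketch of the dayfirst=True variant with a missing value, assuming a reasonably recent pandas with NumPy available. The series contents are made up for illustration, reusing the dates from the question.

```python
import numpy as np
import pandas as pd

# Two UK-style date strings plus a missing value.
s = pd.Series(["12/01/2012", "30/01/2012", np.nan])

# dayfirst=True reads 12/01/2012 as 12 January; the NaN becomes NaT.
parsed = pd.to_datetime(s, dayfirst=True)

print(parsed)
```

The missing entry should come back as NaT rather than raising an error.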
It's worth noting that you have to be careful when using dayfirst (which is easier than specifying the exact format), since dayfirst isn't strict.
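To illustrate what "isn't strict" means here, a small sketch assuming a recent pandas: a string that is clearly month-first is still accepted with dayfirst=True, while an explicit format rejects it. The variable names are hypothetical, and the warning filter is only there because newer pandas versions may warn about the mismatch.

```python
import warnings

import pandas as pd

month_first = "01/30/2012"  # deliberately NOT in day/month/year order

# dayfirst=True is only a preference: 30 cannot be a month, so the parser
# quietly falls back to month-first instead of failing.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    lenient = pd.to_datetime(month_first, dayfirst=True)

# An explicit format is strict: the same string does not match %d/%m/%Y.
try:
    pd.to_datetime(month_first, format="%d/%m/%Y")
    strict_rejected = False
except ValueError:
    strict_rejected = True

print(lenient)
print(strict_rejected)
```

So if the data might contain a mix of day-first and month-first strings, the explicit format is the safer choice, because it fails loudly instead of guessing.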