Specifying date format when converting with pandas.to_datetime


Question

I have data in a CSV file with dates stored as strings in the standard UK format, %d/%m/%Y, so they look like:

12/01/2012
30/01/2012

The examples above represent 12 January 2012 and 30 January 2012.

After importing this data with pandas version 0.11.0, I applied the following transformation:

import pandas as pd
...
cpts.Date = cpts.Date.apply(pd.to_datetime)

but it converted the dates inconsistently. Using the examples above, 12/01/2012 was converted to a datetime object representing 1 December 2012, whereas 30/01/2012 was converted to 30 January 2012, which is what I want.

After looking at this question I tried:

cpts.Date = cpts.Date.apply(pd.to_datetime, format='%d/%m/%Y')

but the results are exactly the same. The source code suggests I'm doing things right so I'm at a loss. Does anyone know what I'm doing wrong?

5/23/2017 12:32:23 PM

Accepted Answer

You can use the parse_dates option of read_csv to do the conversion directly while reading your data.
The trick here is to use dayfirst=True to indicate that your dates start with the day and not with the month. See here for more information: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html

When your dates have to be the index:

>>> import pandas as pd
>>> from io import StringIO  # Python 3; on Python 2 use "from StringIO import StringIO"
>>> s = StringIO("""date,value
... 12/01/2012,1
... 12/01/2012,2
... 30/01/2012,3""")
>>> 
>>> pd.read_csv(s, index_col=0, parse_dates=True, dayfirst=True)
            value
date             
2012-01-12      1
2012-01-12      2
2012-01-30      3

Or when your dates are just in a certain column:

>>> s = StringIO("""date
... 12/01/2012
... 12/01/2012
... 30/01/2012""")
>>> 
>>> pd.read_csv(s, parse_dates=[0], dayfirst=True)
                 date
0 2012-01-12 00:00:00
1 2012-01-12 00:00:00
2 2012-01-30 00:00:00
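
For readers on Python 3 and a recent pandas, here is a minimal sketch of the same approach with the date column selected by name instead of by position. The sample data is copied from the example above; the names csv_text and df are only illustrative, and the exact printed representation may differ between pandas versions.

```python
import io

import pandas as pd

# Same sample data as above, with dates written day first.
csv_text = "date,value\n12/01/2012,1\n12/01/2012,2\n30/01/2012,3\n"

# parse_dates takes a list of column names; dayfirst=True asks the parser
# to read "12/01/2012" as 12 January rather than 1 December.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], dayfirst=True)

print(df["date"].dt.month.tolist())
```

If the dayfirst hint is being honoured, every parsed date falls in January.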
5/21/2013 2:31:23 PM

I think you are calling it correctly, and I posted this as an issue on GitHub.

You can just specify the format to to_datetime directly, for example:

In [1]: s = pd.Series(['12/1/2012', '30/01/2012'])

In [2]: pd.to_datetime(s, format='%d/%m/%Y')
Out[2]:
0   2012-01-12 00:00:00
1   2012-01-30 00:00:00
dtype: datetime64[ns]

Update: As the OP correctly points out, this doesn't work with NaN. If you are happy with dayfirst=True (which works with NaN too), you can use:

s.apply(pd.to_datetime, dayfirst=True)
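
As a rough sketch, the same conversion can also be applied to the whole Series in one call rather than element by element. This assumes a reasonably recent pandas version, in which missing values come back as NaT; the sample values and the names raw and converted are made up for illustration.

```python
import numpy as np
import pandas as pd

# Day-first date strings with one missing value in the middle.
raw = pd.Series(["12/01/2012", np.nan, "30/01/2012"])

# dayfirst=True tells the parser to prefer day/month/year for ambiguous strings.
converted = pd.to_datetime(raw, dayfirst=True)

print(converted)
```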

Note that you have to be careful when using dayfirst (which is easier than specifying the exact format), since dayfirst isn't strict.
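
If strictness matters, one possible approach (a sketch, not the only way) is to pass the exact format together with errors="coerce", so that any string that does not match day/month/year becomes NaT and can be inspected afterwards. The sample values and the names raw, parsed and bad_rows are hypothetical, and the behaviour described assumes a reasonably recent pandas version.

```python
import pandas as pd

# Two valid day-first dates, one month-first date, and one missing value.
raw = pd.Series(["12/01/2012", "30/01/2012", "01/30/2012", None])

# Strict parsing: every non-missing value must match day/month/year.
# errors="coerce" turns values that do not match into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

# Rows that had a value but still failed to parse are the suspicious ones.
bad_rows = raw[parsed.isna() & raw.notna()]

print(bad_rows.tolist())
```

Here "01/30/2012" is flagged because 30 is not a valid month, whereas a lenient day-first parse could end up reading it month first.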


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow