Python | Pandas Series.dt.to_pydatetime

Series.dt.to_pydatetime()

Warning: Python's datetime uses microsecond resolution, which is lower than pandas' nanosecond resolution. The values are truncated.
>>> s = pd.Series(pd.date_range('20180310', periods=2))
>>> s
0   2018-03-10
1   2018-03-11
dtype: datetime64[ns]
>>> s.dt.to_pydatetime()
array([datetime.datetime(2018, 3, 10, 0, 0),
       datetime.datetime(2018, 3, 11, 0, 0)], dtype=object)
>>> s = pd.Series(pd.date_range('20180310', periods=2, freq='ns'))
>>> s
0   2018-03-10 00:00:00.000000000
1   2018-03-10 00:00:00.000000001
dtype: datetime64[ns]
>>> s.dt.to_pydatetime()
array([datetime.datetime(2018, 3, 10, 0, 0),
       datetime.datetime(2018, 3, 10, 0, 0)], dtype=object)
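If the truncation matters, one option is to check for non-zero nanoseconds before converting. The following is a minimal sketch rather than anything from the pandas documentation; the helper name has_sub_microsecond is made up for illustration.

```python
import pandas as pd


def has_sub_microsecond(series: pd.Series) -> bool:
    """Hypothetical helper: True if any value has a non-zero nanosecond part."""
    return bool((series.dt.nanosecond != 0).any())


s = pd.Series(pd.date_range('20180310', periods=2, freq='ns'))
print(has_sub_microsecond(s))        # True: the second value ends in ...000000001

# Flooring to microseconds makes the truncation explicit before converting.
floored = s.dt.floor('us')
print(has_sub_microsecond(floored))  # False
```

Flooring first does not change what to_pydatetime() returns; it only makes the loss of precision a visible step in the code.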



Pandas '.to_pydatetime()' not working inside a DataFrame

StackOverflow question

I have strings like '03-21-2019' that I want to convert to the native Python datetime object: that is, of the datetime.datetime type. The conversion is easy enough through pandas:

import pandas as pd
import datetime as dt

date_str = '03-21-2019'
pd_Timestamp = pd.to_datetime(date_str)
py_datetime_object = pd_Timestamp.to_pydatetime()
print(type(py_datetime_object))

with the result

<class 'datetime.datetime'>

This is precisely what I want, since I want to compute timedeltas by subtracting one of these from another - perfectly well-defined in the native Python datetime.datetime class. However, my data is in a pd.DataFrame. When I try the following code:

import pandas as pd
import datetime as dt

df = pd.DataFrame(columns=['Date'])
df.loc[0] = ['03-21-2019']
df['Date'] = df['Date'].apply(lambda x:
                              pd.to_datetime(x).to_pydatetime())
print(type(df['Date'].iloc[0]))

the result is

<class 'pandas._libs.tslibs.timestamps.Timestamp'>

This is the WRONG type, and I can't for the life of me figure out why only part of the lambda expression is getting evaluated (that is, string-to-pandas-Timestamp), and not the last part (that is, pandas-Timestamp-to-datetime.datetime). It doesn't work if I define the function explicitly, either, instead of using a lambda expression:

import pandas as pd
import datetime as dt


def to_native_datetime(date_str: str) -> dt.datetime:
    return pd.to_datetime(date_str).to_pydatetime()


df = pd.DataFrame(columns=['Date'])
df.loc[0] = ['03-21-2019']
df['Date'] = df['Date'].apply(to_native_datetime)
print(type(df['Date'].iloc[0]))

The result is the same as before. It's definitely doing part of the function, as the result is not a string anymore. But I want the native Python datetime.datetime object, and I see no way of getting it. This looks like a bug in pandas, but I'm certainly willing to see it as user error on my part.

Why can't I get the native datetime.datetime object out of a pandas.DataFrame string column?

I have looked at this thread and this one, but neither of them answers my question.

[EDIT]: Here's something even more bizarre:

import pandas as pd
import datetime as dt


def to_native_datetime(date_str: str) -> dt.datetime:
    return dt.datetime.strptime(date_str, '%m-%d-%Y')


df = pd.DataFrame(columns=['Date'])
df.loc[0] = ['03-21-2019']
df['Date'] = df['Date'].apply(to_native_datetime)
print(type(df['Date'].iloc[0]))

Here I'm not even using pandas to convert the string, and I STILL get a

<class 'pandas._libs.tslibs.timestamps.Timestamp'>

out of it!

Many thanks for your time!

[FURTHER EDIT]: Apparently, in this thread, in Nehal J Wani's answer, it comes out that pandas automatically converts back to its native datetime format when you assign into a pd.DataFrame. This is not what I wanted to hear, but apparently, I'm going to have to convert on-the-fly when I read out of the pd.DataFrame.
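The behavior described in this edit can be checked directly. The sketch below uses a two-row frame made up for illustration; it keeps the column as a pandas datetime column and converts individual values only when they are read out. Note that pd.Timestamp is a subclass of datetime.datetime, so plain subtraction often works without any conversion at all.

```python
import datetime as dt

import pandas as pd

# Illustrative data, not taken from the original question
df = pd.DataFrame({'Date': ['03-21-2019', '03-25-2019']})
df['Date'] = pd.to_datetime(df['Date'], format='%m-%d-%Y')

# Values read out of a datetime column come back as pandas Timestamps ...
ts = df['Date'].iloc[0]
print(type(ts))

# ... which are themselves a subclass of datetime.datetime
print(isinstance(ts, dt.datetime))  # True

# Convert on the fly when an exact datetime.datetime is required
first = df['Date'].iloc[0].to_pydatetime()
second = df['Date'].iloc[1].to_pydatetime()
delta = second - first
print(type(first), delta)
```

Here delta is a standard-library datetime.timedelta, because both operands were converted before subtracting.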

Answer:

Depending on what your actual goal is, you've a couple options you didn't mention directly.

1) If you have a static datetime object or a column of (pandas) Timestamps, and you're willing to deal with the Pandas version of a Timedelta (pandas._libs.tslibs.timedeltas.Timedelta), you can do the subtraction directly in pandas:

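import pandas as pd
from datetime import datetime  # needed for datetime.now() below

df = pd.DataFrame(columns=['Date'])
df.loc[0] = [pd.to_datetime('03-21-2019')]
df.loc[:, 'Offset'] = pd.Series([datetime.now()])
df.loc[:, 'Diff1'] = df['Offset'] - df['Date']  # column minus column
df.loc[:, 'Diff2'] = df['Date'] - datetime.now()  # column minus a fixed datetime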

2) If you don't care about DataFrames, but are willing to deal with lists / numpy arrays, you can convert the datetimes to python-native datetimes by operating on the series rather than on individual elements. Below, arr is a numpy.ndarray of datetime.datetime objects. You can change it to a regular list of datetime with list(arr):

arr = df['Date'].dt.to_pydatetime()
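Option 2 can be wrapped in a small helper. This is a sketch under a few assumptions: the name to_native_list is hypothetical, the sample dates are made up, and the warning filter is there because recent pandas releases may emit a FutureWarning about a planned change in the return type of Series.dt.to_pydatetime(). Calling list() on the result keeps the helper independent of whether an array or a Series comes back.

```python
import datetime as dt
import warnings

import pandas as pd


def to_native_list(series: pd.Series) -> list:
    """Hypothetical helper: return the values as a list of datetime.datetime."""
    with warnings.catch_warnings():
        # Some pandas versions warn that the return type will change.
        warnings.simplefilter('ignore', FutureWarning)
        values = series.dt.to_pydatetime()
    return list(values)


dates = pd.Series(pd.to_datetime(['2019-03-21', '2019-03-25']))
native = to_native_list(dates)

# Plain Python arithmetic on the converted values
print(native[1] - native[0])
```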



to_pydatetime Source

def to_pydatetime(self) -> np.ndarray:
        """
        Return the data as an array of native Python datetime objects.

        Timezone information is retained if present.

        .. warning::

           Python's datetime uses microsecond resolution, which is lower than
           pandas (nanosecond). The values are truncated.

        Returns
        -------
        numpy.ndarray
            Object dtype array containing native Python datetime objects.

        See Also
        --------
        datetime.datetime : Standard library value for a datetime.

        Examples
        --------
        >>> s = pd.Series(pd.date_range('20180310', periods=2))
        >>> s
        0   2018-03-10
        1   2018-03-11
        dtype: datetime64[ns]

        >>> s.dt.to_pydatetime()
        array([datetime.datetime(2018, 3, 10, 0, 0),
               datetime.datetime(2018, 3, 11, 0, 0)], dtype=object)

        pandas' nanosecond precision is truncated to microseconds.

        >>> s = pd.Series(pd.date_range('20180310', periods=2, freq='ns'))
        >>> s
        0   2018-03-10 00:00:00.000000000
        1   2018-03-10 00:00:00.000000001
        dtype: datetime64[ns]

        >>> s.dt.to_pydatetime()
        array([datetime.datetime(2018, 3, 10, 0, 0),
               datetime.datetime(2018, 3, 10, 0, 0)], dtype=object)
        """
        return self._get_values().to_pydatetime()
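The same conversion is also available without the .dt accessor. The sketch below is illustrative only and relies on the public pd.DatetimeIndex.to_pydatetime() and pd.Timestamp.to_pydatetime() methods; the sample dates are made up, and it does not claim to mirror what _get_values() does internally.

```python
import datetime as dt

import pandas as pd

idx = pd.date_range('2018-03-10', periods=2, tz='UTC')

# DatetimeIndex.to_pydatetime() returns an object-dtype NumPy array
arr = idx.to_pydatetime()
print(arr.dtype)

# Timezone information is retained on each element
print(arr[0].tzinfo is not None)  # True

# A single Timestamp converts the same way
single = pd.Timestamp('2018-03-10 12:30').to_pydatetime()
print(type(single))
```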



Archived version

Series.dt can be used to access the values of a Series as datetime-like and return several properties. Pandas Series.dt.to_pydatetime() returns the data as an array of native Python datetime objects. Time zone information is retained if present.

Syntax: Series.dt.to_pydatetime()

Parameter: None

Returns: numpy.ndarray

Example #1: Use Series.dt.to_pydatetime() to return the given Series object as an array of native Python datetime objects.

# Import pandas
import pandas as pd

# Create the series
sr = pd.Series(['2012-12-31', '2019-1-1 12:30', '2008-02-2 10:30',
                '2010-1-1 09:25', '2019-12-31 00:00'])

# Create the index
idx = ['Day 1', 'Day 2', 'Day 3', 'Day 4', 'Day 5']

# Set the index
sr.index = idx

# Convert the underlying data to datetime
# (pandas 2.0 and later may need format='mixed' here, since the strings differ in format)
sr = pd.to_datetime(sr)

# Print the series
print(sr)

Output:

We will now use Series.dt.to_pydatetime() to return the data as an array of native Python datetime objects.

# Return the series data as an array of
# native Python datetime objects
result = sr.dt.to_pydatetime()

# Print the result
print(result)

Output:

As we can see from the output, Series.dt.to_pydatetime() successfully returned the underlying data of the given Series object as an array of native Python datetime objects.

Example #2: Use Series.dt.to_pydatetime() to return a time zone aware Series object as an array of native Python datetime objects.

# Import pandas
import pandas as pd

# Create the series
sr = pd.Series(pd.date_range('2012-12-31 00:00', periods=5, freq='D',
                             tz='US/Central'))

# Create the index
idx = ['Day 1', 'Day 2', 'Day 3', 'Day 4', 'Day 5']

# Set the index
sr.index = idx

# Print the series
print(sr)

Output:

We will now use Series.dt.to_pydatetime() to return the data as an array of native Python datetime objects.

# Return the series data as an array of
# native Python datetime objects
result = sr.dt.to_pydatetime()

# Print the result
print(result)

Output:

As we can see from the output, Series.dt.to_pydatetime() successfully returned the underlying data of the given Series object as an array of native Python datetime objects.
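Since the output of the second example is not reproduced here, the following sketch shows one way to inspect the converted values. It reuses the date range from Example #2; the warning filter is an assumption about recent pandas versions, which may emit a FutureWarning for this call, and list() is used so that the code does not depend on the exact container type returned.

```python
import datetime as dt
import warnings

import pandas as pd

sr = pd.Series(pd.date_range('2012-12-31 00:00', periods=5, freq='D',
                             tz='US/Central'))

with warnings.catch_warnings():
    # Some pandas versions warn that the return type will change.
    warnings.simplefilter('ignore', FutureWarning)
    result = list(sr.dt.to_pydatetime())

first = result[0]

# The time zone survives the conversion: US/Central is UTC-6 in winter
print(first.utcoffset())

# The wall-clock value without time zone information
print(first.replace(tzinfo=None))
```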




