Confusing conversion of types in pandas DataFrame

Suppose I have a list of lists of numbers that happen to be encoded as strings.

import pandas as pd
pylist = [['1', '43'], ['2', '42'], ['3', '41'], ['4', '40'], ['5', '39']]

Now I want a dataframe where these numbers are integers. I can see from the pandas documentation that I can force a data type via dtype, but when I run the following:

pyframe_1 = pd.DataFrame(pylist, dtype=int)

I get the following warning:

FutureWarning: Could not cast to int32, falling back to object. This behavior is deprecated. In a future version, when a dtype is passed to 'DataFrame', either all columns will be cast to that dtype, or a TypeError will be raised.

and by inspection via dtypes:

pytypes_1 = pyframe_1.dtypes.to_list() # dtype[object_] of numpy module

my columns are of object dtype (np.object_).

But I can cast my columns to integers in two ways:

The first is column by column:

pyframe_2 = pd.DataFrame(pylist)
pyframe_2[0] = pyframe_2[0].astype(int)
pyframe_2[1] = pyframe_2[1].astype(int)

The second is on the entire dataframe in a one-liner:

pyframe_3 = pd.DataFrame(pylist).astype(int)

Both give me a dataframe of integer columns from a list of lists of strings.
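For anyone who wants to check that claim, here is a minimal sketch that builds both frames and inspects them. The names raw, by_column, and whole_frame are only illustrative, and the exact integer width (int32 or int64) depends on the platform and pandas version, so the check only asks whether each column is some integer dtype.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

pylist = [['1', '43'], ['2', '42'], ['3', '41'], ['4', '40'], ['5', '39']]

# Start from the uncast frame, whose columns hold strings.
raw = pd.DataFrame(pylist)

# Approach 1: cast column by column.
by_column = raw.copy()
for col in by_column.columns:
    by_column[col] = by_column[col].astype(int)

# Approach 2: cast the whole frame at once.
whole_frame = raw.astype(int)

# Both should report an integer dtype for every column.
all_int_by_column = all(is_integer_dtype(t) for t in by_column.dtypes)
all_int_whole = all(is_integer_dtype(t) for t in whole_frame.dtypes)

print(by_column.dtypes)
print(whole_frame.dtypes)
```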

My question is: why does the first case, where I explicitly pass dtype when creating the dataframe, raise a warning (or error) and perform no type conversion? Why even have it as an option in the first place?

EDIT: The pandas version I'm running is 1.4.1.

EDIT: As per the suggestion of @mozway, one workaround is using

pyframe_1 = pd.DataFrame(pylist, dtype='Int32')

This does convert to integers. To me, it's somewhat unnatural to use a string ('Int32') to force a cast instead of the much more intuitive int. Inspecting the dtypes produced by each method, I get different integer types:

Casting with dtype='Int32' at instantiation gives me an Int32Dtype object from the pandas.core.arrays.integer module. (Upon closer inspection, it has a numpy_dtype attribute, which is a dtype[int32] object from the numpy module.)

Casting with .astype(int) gives me a dtype[int32] object from the numpy module.

So there's not much difference, I guess? I don't know.
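The two dtypes can be compared side by side with a short sketch like the one below. It is not from the original post: the variable names are illustrative, and the nullable frame is derived from the already-cast one with .astype('Int32') rather than at construction, to avoid depending on how a given pandas version parses strings in the constructor. The one practical difference it demonstrates is that the pandas 'Int32' extension dtype can hold missing values, whereas a plain integer Series with a missing entry falls back to float64.

```python
import numpy as np
import pandas as pd

pylist = [['1', '43'], ['2', '42'], ['3', '41'], ['4', '40'], ['5', '39']]

# NumPy-backed integers (width depends on the platform).
numpy_frame = pd.DataFrame(pylist).astype(int)
# pandas nullable integers, requested by the string alias 'Int32'.
nullable_frame = numpy_frame.astype('Int32')

numpy_dtype = numpy_frame[0].dtype        # an instance of np.dtype
nullable_dtype = nullable_frame[0].dtype  # an instance of pd.Int32Dtype

# The extension dtype wraps a NumPy dtype, exposed as numpy_dtype.
wrapped = nullable_dtype.numpy_dtype

# Missing values: the nullable dtype keeps integers and stores pd.NA ...
with_missing = pd.Series([1, None, 3], dtype='Int32')
# ... while default inference falls back to float64 with NaN.
fallback = pd.Series([1, None, 3])

print(type(numpy_dtype), type(nullable_dtype), wrapped)
print(with_missing.dtype, fallback.dtype)
```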

