Suppose I have a list of lists of numbers that happen to be encoded as strings.
import pandas as pd
pylist = [['1', '43'], ['2', '42'], ['3', '41'], ['4', '40'], ['5', '39']]
Now I want a dataframe where these numbers are integers. I can see from the pandas documentation that I can force a data type via dtype, but when I run the following:
pyframe_1 = pd.DataFrame(pylist, dtype=int)
I get the following warning:
FutureWarning: Could not cast to int32, falling back to object. This behavior is deprecated. In a future version, when a dtype is passed to 'DataFrame', either all columns will be cast to that dtype, or a TypeError will be raised.
and by inspection via dtypes:
pytypes_1 = pyframe_1.dtypes.to_list() # dtype[object_] of numpy module
my columns have the object dtype.
But I can cast my columns to integers in two ways:
The first is column by column:
pyframe_2 = pd.DataFrame(pylist)
pyframe_2[0] = pyframe_2[0].astype(int)
pyframe_2[1] = pyframe_2[1].astype(int)
The second is on the entire dataframe in a one-liner:
pyframe_3 = pd.DataFrame(pylist).astype(int)
Both give me a dataframe of integer columns from a list of lists of strings.
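For reference, here is a minimal sketch that runs both approaches side by side and checks the resulting dtypes. The names by_column and whole_frame are just illustrative, and I'm checking with pd.api.types.is_integer_dtype instead of comparing against a specific width, since int can map to int32 or int64 depending on the platform.

```python
import pandas as pd

pylist = [['1', '43'], ['2', '42'], ['3', '41'], ['4', '40'], ['5', '39']]

# Approach 1: build the frame of strings, then cast one column at a time.
by_column = pd.DataFrame(pylist)
for col in by_column.columns:
    by_column[col] = by_column[col].astype(int)

# Approach 2: cast the whole frame in a single call.
whole_frame = pd.DataFrame(pylist).astype(int)

# Both frames should now hold integer columns with identical values.
print(by_column.dtypes.to_list())
print(whole_frame.dtypes.to_list())
print(by_column.equals(whole_frame))
```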
My question is: why does the first case, where I explicitly pass dtype when creating the dataframe, raise a warning (or an error) without converting the types? Why even have it as an option in the first place?
EDIT: The pandas version I'm running is 1.4.1.
EDIT: As suggested by @mozway, one workaround is using
pyframe_1 = pd.DataFrame(pylist, dtype='Int32')
which does convert to integers. To me it feels kind of unnatural to use a string (which 'Int32' is) to force a cast instead of the much more intuitive int. Inspecting dtypes for each method, I get different integer types.
Casting with dtype='Int32' at instantiation gives me an Int32Dtype object from the pandas.core.arrays.integer module. (On closer inspection, it has a numpy_dtype attribute, which is a dtype[int32] object from the numpy module.)
Casting with .astype(int) gives me a dtype[int32] object from the numpy module. So there's not much difference, I guess? IDK.
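As far as I can tell, the practical difference is that 'Int32' is a pandas nullable extension dtype, while int maps to a plain NumPy integer dtype. A rough sketch of that difference follows. Note that I build the nullable frame with .astype('Int32') on an already converted frame, which is my own shortcut and not the dtype='Int32' constructor call above, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

pylist = [['1', '43'], ['2', '42'], ['3', '41'], ['4', '40'], ['5', '39']]

# Plain NumPy integers (the width depends on the platform).
numpy_frame = pd.DataFrame(pylist).astype(int)
# pandas nullable integers, converted from the NumPy-backed frame.
nullable_frame = numpy_frame.astype('Int32')

numpy_dtype = numpy_frame.dtypes.iloc[0]
nullable_dtype = nullable_frame.dtypes.iloc[0]

print(type(numpy_dtype))           # a numpy dtype class
print(type(nullable_dtype))        # pandas Int32Dtype
print(nullable_dtype.numpy_dtype)  # the underlying numpy int32 dtype

# The nullable dtype can hold a missing value and stay integer,
# whereas the default inference falls back to float64 with NaN.
with_missing = pd.Series([1, None, 3], dtype='Int32')
plain = pd.Series([1, None, 3])
print(with_missing.dtype, plain.dtype)
```

So the two look alike as long as there are no missing values. The nullable variant mostly matters once NA values can show up in the column.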