I'm very new to data transformation in Python with pandas, and I'm stuck on a problem. I'm using pandas to filter rows by value without eliminating them: I just want to select them, not delete them from the DataFrame column. However, the column contains a mix of floats (values with more than 2 decimal places) and integers (which are displayed with a single decimal place, as x.0). By default, pandas assigns this column the dtype float64, so it's hard for me to tell the two kinds of values apart with isinstance(). Is there an alternative function or approach for handling this kind of mixed-type column without eliminating the unwanted values?
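To illustrate the dtype issue described above, here is a minimal sketch. It assumes a small stand-in Series (the name `s` and its values are only for illustration) and shows that once integers and floats share a column, every element comes back as a float, so isinstance() cannot separate them:

```python
import pandas as pd

# Stand-in column mixing "real" floats and whole numbers (illustrative values)
s = pd.Series([1.234, 2.345, 1, 2])

# pandas upcasts the integers, so the whole column is float64
dtype_name = str(s.dtype)
print(dtype_name)

# Every element is now a float, including the former integers
is_float = [isinstance(v, float) for v in s]
print(is_float)
```

Because the integers are stored as 1.0 and 2.0, any type-based check sees them exactly like 1.234.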
Below is a mock of my dataset, structured as a DataFrame:
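    import pandas as pd

    # 'time' values are stored as strings; 'value' mixes floats and whole numbers
    data = {
        'time': ['00.00.01', '00.00.02', '00.00.03', '00.00.04', '00.00.05', '00.00.06',
                 '00.00.07', '00.00.08', '00.00.09', '00.00.10', '00.00.11', '00.00.12',
                 '00.00.13', '00.00.14', '00.00.15', '00.00.16', '00.00.17', '00.00.18',
                 '00.00.19', '00.00.20', '00.00.21', '00.00.22', '00.00.23', '00.00.24'],
        'value': [1.234, 2.345, 3.456, 4.567, 5.678, 6.789,
                  1, 2, 3, 4, 5, 6,
                  1.234, 2.345, 3.456, 4.567, 5.678, 6.789,
                  1, 2, 3, 4, 5, 6],
    }
    data = pd.DataFrame(data)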
When printed, the DataFrame looks like this:
>     time      value
> 1   00.00.1   1.234
> 2   00.00.2   2.345
> 3   00.00.3   3.456
> 4   00.00.4   4.567
> 5   00.00.5   5.678
> 6   00.00.6   6.789
> 7   00.00.7   1.0
> 8   00.00.8   2.0
> 9   00.00.9   3.0
> 10  00.00.10  4.0
> 11  00.00.11  5.0
> 12  00.00.12  6.0
> 13  00.00.13  1.234
> 14  00.00.14  2.345
> 15  00.00.15  3.456
> 16  00.00.16  4.567
> 17  00.00.17  5.678
> 18  00.00.18  6.789
> 19  00.00.19  1.0
> 20  00.00.20  2.0
> 21  00.00.21  3.0
> 22  00.00.22  4.0
> 23  00.00.23  5.0
> 24  00.00.24  6.0
I've tried the solution below to fix this problem. It removes the values with only one decimal place while keeping the values with more than one decimal place.
    value = data[data['value'].astype(str).str.split('.').str[1].str.len() > 1]
The printed output looks like this:
>     time      value
> 0   00.00.1   1.234
> 1   00.00.2   2.345
> 2   00.00.3   3.456
> 3   00.00.4   4.567
> 4   00.00.5   5.678
> 5   00.00.6   6.789
> 13  00.00.13  1.234
> 14  00.00.14  2.345
> 15  00.00.15  3.456
> 16  00.00.16  4.567
> 17  00.00.17  5.678
> 18  00.00.18  6.789
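A minimal sketch of why the rows disappear, assuming a shortened stand-in DataFrame (the names `frac_len` and `mask` are only for illustration). The string test itself only produces a boolean Series; it is the `data[...]` indexing with that Series that drops every row where it is False:

```python
import pandas as pd

# Shortened stand-in for the dataset above (illustrative values)
data = pd.DataFrame({'value': [1.234, 2.345, 1, 2]})

# Number of characters after the decimal point in the string form
frac_len = data['value'].astype(str).str.split('.').str[1].str.len()

# Boolean mask: True where there is more than one decimal digit
mask = frac_len > 1
print(mask.tolist())

# Boolean indexing keeps only the rows where the mask is True
filtered = data[mask]
print(len(data), len(filtered))
```

The mask has one entry per original row, so it can be kept and inspected on its own without shrinking the DataFrame.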
I'm expecting a DataFrame like the one below, where no rows are eliminated even though a filter has been applied to select only the values with one decimal place:
>     time      value
> 1   00.00.1   1.234
> 2   00.00.2   2.345
> 3   00.00.3   3.456
> 4   00.00.4   4.567
> 5   00.00.5   5.678
> 6   00.00.6   6.789
> 7   00.00.7   1.0
> 8   00.00.8   2.0
> 9   00.00.9   3.0
> 10  00.00.10  4.0
> 11  00.00.11  5.0
> 12  00.00.12  6.0
> 13  00.00.13  1.234
> 14  00.00.14  2.345
> 15  00.00.15  3.456
> 16  00.00.16  4.567
> 17  00.00.17  5.678
> 18  00.00.18  6.789
> 19  00.00.19  1.0
> 20  00.00.20  2.0
> 21  00.00.21  3.0
> 22  00.00.22  4.0
> 23  00.00.23  5.0
> 24  00.00.24  6.0
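One possible reading of this goal, sketched under assumptions: the whole-number rows are marked rather than filtered, so the DataFrame keeps its full length. The column name `is_whole` and the variable `whole_only` are hypothetical, and the data is a shortened stand-in:

```python
import pandas as pd

# Shortened stand-in for the dataset above (illustrative values)
data = pd.DataFrame({
    'time': ['00.00.06', '00.00.07', '00.00.08'],
    'value': [6.789, 1, 2],
})

# Hypothetical helper column: True where the value has no fractional part
data['is_whole'] = data['value'] % 1 == 0

# where() keeps every row and replaces non-matching values with NaN
whole_only = data['value'].where(data['is_whole'])

print(data)
print(whole_only)
```

This uses a numeric test (remainder after dividing by 1) instead of the string split, which is a swapped-in technique rather than the approach tried above; both mark the same rows in this small example.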
I really need your help with this problem. Thanks in advance!