Channel: Active questions tagged python - Stack Overflow

Is this a pandas dataframe filter bug?


I am working in pandas with a SQL database. I have a production DataFrame similar to this:

DATE    MACHINE_ID  TYPE  IS_NULL
Date 1  Id 1        15    0
Date 2  Id 2        7     1

I'm trying to clean the data so I'm doing some filtering.

I first tried to do the filtering directly with a SQL query:

SELECT * FROM MACHINE_TABLE WHERE TYPE IN (5,6,8,9) AND COALESCE(IS_NULL,0) = 0

and I get 13221 rows, but when I do it in pandas with the following code I get different results:

machine_data = machine_data[machine_data['TYPE'].isin({5,6,8,9}) & machine_data['IS_NULL'] == 0]
machine_data2 = machine_data[machine_data['TYPE'].isin({5,6,8,9})]

For the first DataFrame I get 14810 rows, and for the second DataFrame I get only 14794.

Neither of these matches the result of the SQL query (in number of rows).
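
One possible source of the mismatch with SQL is NULL handling: COALESCE(IS_NULL,0) = 0 treats NULL as 0, whereas in pandas comparing NaN with 0 gives False. Below is a minimal sketch on made-up data that mirrors the query more closely. The toy DataFrame and every variable name in it are assumptions for illustration, not the production table.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the real table is not shown in the question.
toy = pd.DataFrame({
    "TYPE": [5, 6, 8, 9, 99],
    "IS_NULL": [0, 1, np.nan, 0, np.nan],
})

type_mask = toy["TYPE"].isin([5, 6, 8, 9])

# NaN == 0 evaluates to False, so the TYPE 8 row (IS_NULL is NaN) is dropped.
without_fillna = toy[type_mask & (toy["IS_NULL"] == 0)]

# fillna(0) plays the role of COALESCE(IS_NULL, 0) in the SQL query.
sql_like = toy[type_mask & (toy["IS_NULL"].fillna(0) == 0)]

print(len(without_fillna), len(sql_like))
```

Whether this accounts for the whole difference depends on how many NULLs the real IS_NULL column contains.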

What strikes me as strange behavior is that when I apply a second condition to my DataFrame, I expect the number of rows to be less than or equal to the number without it.

Am I missing something?
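
One thing worth checking is operator precedence: in Python, & binds more tightly than ==, so a filter written as mask & column == 0 is evaluated as (mask & column) == 0. Here is a small sketch on invented data comparing the unparenthesized filter with the parenthesized one. The toy DataFrame and its variable names are assumptions, and the counts it prints say nothing about the real table.

```python
import ast
import pandas as pd

# The parser treats both spellings identically: & is applied before ==.
same_tree = ast.dump(ast.parse("a & b == 0")) == ast.dump(ast.parse("(a & b) == 0"))

# Toy data for illustration only.
toy = pd.DataFrame({
    "TYPE": [5, 6, 7, 8, 9, 15, 7],
    "IS_NULL": [0, 1, 0, 1, 0, 0, 1],
})
type_mask = toy["TYPE"].isin([5, 6, 8, 9])

# Single condition: only the TYPE filter.
type_only = toy[type_mask]

# Same shape as the expression in the question, without parentheses.
unparenthesized = toy[type_mask & toy["IS_NULL"] == 0]

# Each comparison wrapped in parentheses before combining with &.
parenthesized = toy[type_mask & (toy["IS_NULL"] == 0)]

print(same_tree, len(type_only), len(unparenthesized), len(parenthesized))
```

On this toy data the unparenthesized filter keeps rows whose TYPE is outside the set, which is how a filter with two conditions can return more rows than a filter with one.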

I looked at the rows that differ between these two DataFrames:

machine_data.apply(lambda x: machine_data.loc[~x.isin(machine_data2[x.name]),x.name])
DATE    MACHINE_ID  TYPE  IS_NULL
Date 1  Id 1        99    NaN
Date 2  Id 2        99    NaN

This seems even weirder to me.

