I am working in pandas with a SQL database. I have a production dataframe similar to this:

| DATE | MACHINE_ID | TYPE | IS_NULL |
|---|---|---|---|
| Date 1 | Id 1 | 15 | 0 |
| Date 2 | Id 2 | 7 | 1 |

I'm trying to clean the data, so I'm doing some filtering.
I previously tried to do the filtering directly with a SQL query:

    SELECT * FROM MACHINE_TABLE WHERE TYPE IN (5,6,8,9) AND COALESCE(IS_NULL,0) = 0

and I get 13221 rows. But when I do it in pandas with the code below, I get different results:

    machine_data = machine_data[machine_data['TYPE'].isin({5,6,8,9}) & machine_data['IS_NULL'] == 0]
    machine_data2 = machine_data[machine_data['TYPE'].isin({5,6,8,9})]

For the first dataframe I get 14810 rows, and for the second dataframe I get only 14794.
Neither of these matches the number of rows returned by the SQL query.
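
For comparison, this is a minimal sketch of what a pandas mask mirroring the SQL `WHERE` clause might look like. The frame `toy` and its values are invented for illustration (not the production table), and `fillna(0)` is used here as a stand-in for `COALESCE(IS_NULL, 0)`:

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; IS_NULL contains missing entries on purpose.
toy = pd.DataFrame({
    "TYPE":    [5, 6, 8, 9, 15, 99],
    "IS_NULL": [0, 1, np.nan, 0, 0, np.nan],
})

# COALESCE(IS_NULL, 0) = 0: treat missing values as 0, then compare.
is_null_zero = toy["IS_NULL"].fillna(0) == 0

# TYPE IN (5, 6, 8, 9) combined with the comparison computed above.
sql_like = toy[toy["TYPE"].isin([5, 6, 8, 9]) & is_null_zero]
sql_like_count = len(sql_like)
```

Computing the comparison into its own variable first means the `&` only ever combines two boolean Series.
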
What strikes me as strange behavior is that applying a second filter to my dataframe should give fewer rows, or at most the same number, not more.
Am I missing something?
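
One thing that may be worth checking is how Python groups the first expression: `&` binds more tightly than `==`, so `a & b == 0` is evaluated as `(a & b) == 0`. A small sketch on made-up data (the frame `df` and its values are assumptions for illustration only) compares the two groupings:

```python
import pandas as pd

# Toy data, invented for illustration; not the production table.
df = pd.DataFrame({
    "TYPE":    [5, 6, 8, 9, 15, 99, 7],
    "IS_NULL": [0, 1, 0, 1, 0, 0, 1],
})

wanted = df["TYPE"].isin([5, 6, 8, 9])

# Without parentheses: Python parses this as (wanted & df["IS_NULL"]) == 0.
unparenthesized = wanted & df["IS_NULL"] == 0

# With parentheses: the comparison is evaluated first, then combined with &.
parenthesized = wanted & (df["IS_NULL"] == 0)

rows_type_only = int(wanted.sum())
rows_unparenthesized = int(unparenthesized.sum())
rows_parenthesized = int(parenthesized.sum())
```

On this toy data the TYPE-only mask keeps 4 rows, the unparenthesized mask keeps 5, and the parenthesized mask keeps 2, so only the parenthesized version behaves like a second filter that narrows the result.
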
I looked at the rows that differ between these dataframes:

    machine_data.apply(lambda x: machine_data.loc[~x.isin(machine_data2[x.name]),x.name])

| DATE | MACHINE_ID | TYPE | IS_NULL |
|---|---|---|---|
| Date 1 | Id 1 | 99 | NaN |
| Date 2 | Id 2 | 99 | NaN |

This is even stranger to me.
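
As an alternative way to look at the difference, here is a minimal sketch that compares the two frames by index label rather than column by column. The frames `full` and `subset` and their values are hypothetical stand-ins for `machine_data` and `machine_data2`, and the sketch assumes the filtered frame keeps the original index labels:

```python
import pandas as pd

# Toy frames with invented values; `subset` plays the role of machine_data2.
full = pd.DataFrame(
    {"TYPE": [5, 6, 99, 8], "IS_NULL": [0, 0, float("nan"), 1]},
    index=[10, 11, 12, 13],
)
subset = full[full["TYPE"].isin([5, 6, 8, 9])]

# Rows whose index label appears in `full` but not in `subset`.
only_in_full = full.loc[full.index.difference(subset.index)]
```

Comparing by index returns whole rows, whereas the column-wise `isin` check tests each column's values independently of the row they came from.
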