`df_original` records the ice-cream inspector's visits at stations A and B:

    import polars as pl

    df_original = pl.DataFrame(
        {
            "station": ["A", "A", "A", "A", "B", "B", "B", "B"],
            "ice_cream_date": [1, 2, 3, 4, 1, 2, 3, 4],
            "customers": [10, 20, 30, 5, 5, 7, 4, 10],
            "event": [0, 1, 0, 1, 1, 0, 1, 0],
        }
    )

- `ice_cream_date` is the encoded date.
- `customers` is the number of customers.
- `event` is a binary encoding of where an inspector is present (1) or not (0).
`df_events` contains the rows of `df_original` where `event == 1`:

    df_events = pl.DataFrame(
        {
            "events": [1, 1, 1, 1],
            "ice_cream_date": [2, 4, 1, 3],
            "station": ["A", "A", "B", "B"],
            "customers": [20, 5, 5, 4],
            "evaluation_span": [10, 2, 2, 2],
            "evaluation_end_date": [4, 4, 3, 4],
            "evaluation_end_date_customer": [5, 5, 4, 10],
            "good_customer": [30, 10, 8, 12],
            "bad_customer": [5, 0, 4, 5],
        }
    )

- `evaluation_span` is the number of days after the inspection date over which the station is evaluated.
- `evaluation_end_date` is the latest day for the inspection. (If `ice_cream_date + evaluation_span` is greater than the max available date, `evaluation_end_date` is the max available date.)
- `evaluation_end_date_customer` is the number of customers on `evaluation_end_date` in `df_original`.
- `good_customer` is the good threshold an inspector uses to give the station a "good" rating for a particular `ice_cream_date`.
- `bad_customer` is the bad threshold.
Struggling with: I want to label whether a station has been given a good or bad rating for every non-zero event.
If an inspector sees the number of customers exceed (dip below) the good (bad) threshold first, the station is given a good (bad) rating, regardless of the length of the evaluation span.
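To pin the rule down for a single event, here is a plain-Python sketch. The function name `first_rating` and its arguments are hypothetical. It also assumes the comparisons are inclusive (`>=` / `<=`), which is inferred from the expected output further down (station A, date 2: 30 customers against a good threshold of 30 gives `good = 1`) rather than stated explicitly.

```python
from typing import Iterable, Optional, Tuple


def first_rating(
    window: Iterable[Tuple[int, int]],
    good_threshold: int,
    bad_threshold: int,
) -> Tuple[Optional[str], Optional[int]]:
    """Return the first rating reached and the date it was reached on.

    `window` holds (date, customers) pairs ordered by date, covering the days
    after the event date up to and including the evaluation end date.
    """
    for date, customers in window:
        # Assumption: thresholds are inclusive.
        if customers >= good_threshold:
            return "good", date
        if customers <= bad_threshold:
            return "bad", date
    # Neither threshold was reached inside the window.
    return None, None


# Station A, event on date 2: days 3 and 4 are inside the window.
print(first_rating([(3, 30), (4, 5)], good_threshold=30, bad_threshold=5))

# Station B, event on date 1: days 2 and 3 are inside the window.
print(first_rating([(2, 7), (3, 4)], good_threshold=8, bad_threshold=4))
```

Whichever threshold is hit first decides the rating, so later days in the window are never looked at.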
Expected output:

    shape: (4, 7)
    ┌────────────────┬─────────┬────────────────────┬──────┬─────┬────────────────┬────────────────────┐
    │ ice_cream_date ┆ station ┆ evaluation_custome ┆ good ┆ bad ┆ evaluation_end ┆ actual_evaluation_ │
    │ ---            ┆ ---     ┆ rs                 ┆ ---  ┆ --- ┆ ---            ┆ end_date           │
    │ i64            ┆ str     ┆ ---                ┆ i64  ┆ i64 ┆ i64            ┆ ---                │
    │                ┆         ┆ i64                ┆      ┆     ┆                ┆ i64                │
    ╞════════════════╪═════════╪════════════════════╪══════╪═════╪════════════════╪════════════════════╡
    │ 2              ┆ A       ┆ 30                 ┆ 1    ┆ 0   ┆ 0              ┆ 3                  │
    │ 4              ┆ A       ┆ 5                  ┆ 0    ┆ 0   ┆ 1              ┆ 4                  │
    │ 1              ┆ B       ┆ 4                  ┆ 0    ┆ 1   ┆ 1              ┆ 3                  │
    │ 3              ┆ B       ┆ 10                 ┆ 0    ┆ 0   ┆ 1              ┆ 4                  │
    └────────────────┴─────────┴────────────────────┴──────┴─────┴────────────────┴────────────────────┘

- `evaluation_customers` is the number of customers used for the evaluation.
- `good` is the good binary label, indicating that during `(event_date, event_date + evaluation_span]` an inspector sees the number of customers exceed the good threshold first.
- `bad` is the bad binary label.
- `evaluation_end` is the evaluation end label, indicating that the number of customers did not exceed (dip below) the good (bad) threshold during `(event_date, event_date + evaluation_span)`.
- `actual_evaluation_end_date` is the date on which the evaluation ends (the good or bad threshold is reached, or the window runs out).