
Complex Polars Operation Using Subqueries and Threshold First Hits

df_original holds the daily customer counts and inspector events for stations A and B:

import polars as pl

df_original = pl.DataFrame(
    {
        "station": ["A", "A", "A", "A", "B", "B", "B", "B"],
        "ice_cream_date": [1, 2, 3, 4, 1, 2, 3, 4],
        "customers": [10, 20, 30, 5, 5, 7, 4, 10],
        "event": [0, 1, 0, 1, 1, 0, 1, 0],
    }
)
  • ice_cream_date is the encoded date.
  • customers is the number of customers on that date.
  • event is a binary flag marking the dates on which an inspector carries out an inspection.

df_events covers the rows where event == 1 in df_original:

df_events = pl.DataFrame(
    {
        "events": [1, 1, 1, 1],
        "ice_cream_date": [2, 4, 1, 3],
        "station": ["A", "A", "B", "B"],
        "customers": [20, 5, 5, 4],
        "evaluation_span": [10, 2, 2, 2],
        "evaluation_end_date": [4, 4, 3, 4],
        "evaluation_end_date_customer": [5, 5, 4, 10],
        "good_customer": [30, 10, 8, 12],
        "bad_customer": [5, 0, 4, 5],
    }
)
  • evaluation_span is the number of days after the inspection date.
  • evaluation_end_date is the last day of the inspection. (If ice_cream_date + evaluation_span is greater than the maximum available date, evaluation_end_date is the maximum available date.)
  • evaluation_end_date_customer is the number of customers on evaluation_end_date in df_original.
  • good_customer is the good threshold for an inspector to give the station a "good" rating on a particular ice_cream_date.
  • bad_customer is the bad threshold.

What I am struggling with: I want to label whether a station has been given a good or a bad rating for every non-zero event.

If an inspector sees the number of customers exceed the good threshold (or dip below the bad threshold) first, the station is given a good (or bad) rating, regardless of the length of the evaluation span.
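
The first-hit rule can be written as a plain-Python scan for a single event, as a reference for what any Polars version has to reproduce. The function first_hit is hypothetical, and it assumes the thresholds are inclusive (>= good, <= bad); that assumption is inferred from the expected output, where 30 customers against a good threshold of 30 counts as "good".

```python
def first_hit(observations, event_date, end_date, good, bad):
    """Return (label, date, customers) for the first threshold hit.

    observations: (date, customers) pairs for one station, in any order.
    Only dates in (event_date, end_date] are inspected.
    Assumption: thresholds are inclusive.
    """
    for date, customers in sorted(observations):
        if not (event_date < date <= end_date):
            continue
        if customers >= good:
            return "good", date, customers
        if customers <= bad:
            return "bad", date, customers
    # No threshold reached: the evaluation runs to the end date.
    return None, end_date, None


station_a = [(1, 10), (2, 20), (3, 30), (4, 5)]
station_b = [(1, 5), (2, 7), (3, 4), (4, 10)]

print(first_hit(station_a, event_date=2, end_date=4, good=30, bad=5))
print(first_hit(station_b, event_date=1, end_date=3, good=8, bad=4))
```

For the event on station A at date 2 the scan stops at date 3 with a good rating, even though date 4 would also have hit the bad threshold; that is the "first hit wins" behaviour described above.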

Expected output:

shape: (4, 7)
┌────────────────┬─────────┬──────────────────────┬──────┬─────┬────────────────┬────────────────────────────┐
│ ice_cream_date ┆ station ┆ evaluation_customers ┆ good ┆ bad ┆ evaluation_end ┆ actual_evaluation_end_date │
│ ---            ┆ ---     ┆ ---                  ┆ ---  ┆ --- ┆ ---            ┆ ---                        │
│ i64            ┆ str     ┆ i64                  ┆ i64  ┆ i64 ┆ i64            ┆ i64                        │
╞════════════════╪═════════╪══════════════════════╪══════╪═════╪════════════════╪════════════════════════════╡
│ 2              ┆ A       ┆ 30                   ┆ 1    ┆ 0   ┆ 0              ┆ 3                          │
│ 4              ┆ A       ┆ 5                    ┆ 0    ┆ 0   ┆ 1              ┆ 4                          │
│ 1              ┆ B       ┆ 4                    ┆ 0    ┆ 1   ┆ 1              ┆ 3                          │
│ 3              ┆ B       ┆ 10                   ┆ 0    ┆ 0   ┆ 1              ┆ 4                          │
└────────────────┴─────────┴──────────────────────┴──────┴─────┴────────────────┴────────────────────────────┘
  • evaluation_customers is the number of customers used for the evaluation
  • good is the good binary label, indicating that during (event_date, event_date + evaluation_span], an inspector saw the number of customers reach the good threshold first.
  • bad is the bad binary label.
  • evaluation_end is the evaluation-end label, indicating that the number of customers neither exceeded the good threshold nor dipped below the bad threshold during (event_date, event_date + evaluation_span).
  • actual_evaluation_end_date is the date on which the evaluation ends (the good or bad threshold is reached).
