I am working on my pandas skills and have been stuck on an exercise for a while:
I have created a DataFrame with bike test data. Tests are ordered by time (ascending `test_id`). For each bike + test_type group, I would like to find the most recent failed test (max `test_id`) and count how many consecutive fails lead up to it, i.e. fails not interrupted by a pass for the same group. Additionally, I would like an `ends_with_pass` column that tells whether that run of fails is followed by a pass test for the same group. How can I achieve this? I have the following code but don't know how to proceed:

    import pandas as pd

    # keep only the failed tests
    fail = data[data["test_result"] == "fail"]
    # latest failing test_id per bike + test_type group
    max_fail_tests = fail.groupby(["bike", "test_type"])["test_id"].max().reset_index()
    res = pd.merge(max_fail_tests, data, on=["bike", "test_type"])
    # my attempt at counting fails in a row (this is where I am stuck)
    res["fails_in_a_row"] = res.groupby(["bike", "test_type"])["test_result"].apply(
        lambda x: (x.eq("fail") & x.shift().ne("pass")).cumsum()
    )

Given input:

| test_id | bike | test_type | test_result | (note) |
|---------|------|-----------|-------------|--------|
| 1 | a | slow | pass | |
| 1 | a | fast | pass | |
| 15 | c | fast | pass | |
| 15 | c | slow | pass | |
| 34 | b | slow | fail | <- |
| 34 | b | fast | fail | <- 1st fail for b |
| 36 | a | slow | pass | |
| 36 | a | fast | pass | |
| 37 | c | fast | fail | <- |
| 37 | c | slow | fail | <- 1st fail for c |
| 87 | c | fast | fail | <- |
| 87 | c | slow | fail | <- 2nd consecutive fail for c |
| 99 | b | slow | fail | <- |
| 99 | b | fast | fail | <- 2nd consecutive fail for b. Followed by pass, therefore `ends_with_pass` = `yes` |
| 124 | b | slow | pass | |
| 124 | b | fast | pass | |

Expected output:

| bike | test_type | fails_in_a_row | ends_with_pass |
|------|-----------|----------------|----------------|
| b | fast | 2 | yes |
| b | slow | 2 | yes |
| c | fast | 2 | no |
| c | slow | 2 | no |
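
For reproducibility, here is a minimal sketch that builds the sample input as a DataFrame and runs only the first step of my attempt (the latest failing `test_id` per group). The `rows` list is just a helper name introduced for this snippet. The rows are copied from the input table in the same order, and the annotations are left out because they are not part of the data.

```python
import pandas as pd

# Sample input from the question, in the same row order as the table.
rows = [
    (1, "a", "slow", "pass"),
    (1, "a", "fast", "pass"),
    (15, "c", "fast", "pass"),
    (15, "c", "slow", "pass"),
    (34, "b", "slow", "fail"),
    (34, "b", "fast", "fail"),
    (36, "a", "slow", "pass"),
    (36, "a", "fast", "pass"),
    (37, "c", "fast", "fail"),
    (37, "c", "slow", "fail"),
    (87, "c", "fast", "fail"),
    (87, "c", "slow", "fail"),
    (99, "b", "slow", "fail"),
    (99, "b", "fast", "fail"),
    (124, "b", "slow", "pass"),
    (124, "b", "fast", "pass"),
]
data = pd.DataFrame(rows, columns=["test_id", "bike", "test_type", "test_result"])

# First step of the attempt: latest failing test_id per bike + test_type group.
fail = data[data["test_result"] == "fail"]
max_fail_tests = (
    fail.groupby(["bike", "test_type"])["test_id"].max().reset_index()
)
print(max_fail_tests)
```

Bike `a` never fails, so it drops out at the filtering step, which matches its absence from the expected output. The counting of consecutive fails and the `ends_with_pass` flag are the parts still missing.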