Channel: Active questions tagged python - Stack Overflow

Calculating head-to-head records in a DataFrame based on target values in Python


I have a Python script that processes tennis match data stored in a Pandas DataFrame (tennis_data_processed). Each row represents a single match from 2010 to 2023, including details about the tournament, match, and the two players involved. There's also a target variable that indicates whether Player1 won (1) or Player1 lost (0).

I'm attempting to add two new features, player1_h2h and player2_h2h, which represent the head-to-head record of the players in each match. The idea is to count the number of wins each player has against the other before the current match.

The code I've implemented works well when the target is 1 (indicating Player1 won). However, there's an issue when the target is 0 (indicating Player1 lost and therefore player2 won). In this case, the head-to-head record is not updating correctly for the subsequent match, as it adds the win to the player that lost.

The table I have before trying to create the h2h features is:

tourney_date  player1_id  player2_id  target
2012-01-16    A           B           1
2012-01-16    C           D           0
2012-03-27    B           A           1
2012-03-27    D           C           0
2012-04-29    A           B           1

The table I want as a result (I'll show what it should look like for the head-to-head between two specific players, but it should be computed for all matches):

tourney_date  player1_id  player2_id  target  player1_h2h  player2_h2h
2012-01-16    A           B           0       0            0
2012-01-27    A           B           0       0            1
2012-03-14    B           A           1       2            0
2015-01-20    A           B           0       0            3
2020-10-07    B           A           1       4            0
2020-10-15    A           B           1       0            5
2020-10-15    B           A           1       5            1

To do this I have the following code:

def calculate_head2head(row, player1_col, player2_col, target_col):
    # Identify player1 and player2 based on the target
    player1_id = row[player1_col] if row[target_col] == 1 else row[player2_col]
    player2_id = row[player2_col] if row[target_col] == 1 else row[player1_col]

    # Identify the player who won in the previous row
    prev_target = 1 - row[target_col]  # Switch 1 to 0 and vice versa
    prev_won_player_id = row[player2_col] if prev_target == 1 else row[player1_col]

    # Filter relevant matches for head-to-head calculation
    matches = tennis_data_processed[
        ((tennis_data_processed[player1_col] == player1_id) & (tennis_data_processed[player2_col] == player2_id)) |
        ((tennis_data_processed[player1_col] == player2_id) & (tennis_data_processed[player2_col] == player1_id))
    ]

    # Count the number of wins for the players
    player1_wins = matches[(matches[target_col] == 1) & (matches['tourney_date'] < row['tourney_date'])].shape[0]
    player2_wins = matches[(matches[target_col] == 0) & (matches['tourney_date'] < row['tourney_date'])].shape[0]

    # Adjust wins if player1 is now in player2 column
    if row[target_col] == 0:
        player1_wins, player2_wins = player2_wins, player1_wins
        prev_won_player_id = player2_id  # Update the previous winner to player2

    prev_matches = tennis_data_processed.loc[
        (tennis_data_processed.index < row.name) &
        (tennis_data_processed[target_col] == prev_target)
    ].sort_values(by='tourney_date', ascending=False)

    if not prev_matches.empty:
        if row['tourney_date'] > prev_matches.iloc[0]['tourney_date']:
            if prev_target == 0:
                player2_wins += 1
            else:
                player1_wins += 1

    return player1_wins, player2_wins


# Apply the function row-wise to calculate head-to-head records
tennis_data_processed[['player1_h2h', 'player2_h2h']] = tennis_data_processed.apply(
    lambda row: calculate_head2head(row, 'player1_id', 'player2_id', 'target'),
    axis=1,
    result_type='expand'
)

But with this code the resulting DataFrame is:

tourney_date  player1_id  player2_id  target  player1_h2h  player2_h2h
2012-01-16    A           B           0       0            0
2012-01-27    A           B           0       1            0
2012-03-14    B           A           1       0            2
2015-01-20    A           B           0       2            1
2020-10-07    B           A           1       1            3
2020-10-15    A           B           1       4            1
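
One thing worth checking in the function above: `matches[target_col] == 1` counts the rows where whoever sits in the player1 column won. That is not the same as counting wins for one particular player, because the two players swap columns between matches. The short sketch below illustrates the difference using the A/B rows from the first table; the variable names are illustrative and not part of the original script.

```python
# A/B rows from the first table: (tourney_date, player1_id, player2_id, target)
ab_matches = [
    ("2012-01-16", "A", "B", 1),  # A is player1 and wins
    ("2012-03-27", "B", "A", 1),  # B is player1 and wins
    ("2012-04-29", "A", "B", 1),  # current match
]

# Matches played before the current (third) one
prior = ab_matches[:2]

# Counting target == 1 counts "player1-column wins", not wins by a given player
column1_wins = sum(1 for _, _, _, target in prior if target == 1)

# Deriving the winner's id per row separates the two players correctly
winners = [p1 if target == 1 else p2 for _, p1, p2, target in prior]
a_wins = winners.count("A")
b_wins = winners.count("B")

print(column1_wins)    # 2
print(a_wins, b_wins)  # 1 1
```

Before the third match A and B have one win each, yet the `target == 1` count is 2. Swapping the two counts afterwards cannot recover the per-player totals.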

With my code, when the target is 0 (Player1 lost, so Player2 won), the head-to-head record is updated for the player who lost the previous match. When the target is 1, the win is added to the correct player.

