I have a combined_vars_df, a surge_df, a wave_df, and a waterlevel_df for 235 counties on the "Atlantic" and "GulfOfMexico" coasts. A few rows of each are shown below. I take the dates from combined_vars_df, select the same dates in surge_df, wave_df, and waterlevel_df, and calculate distances (using the haversine package) from longitude_precip and latitude_precip to the latitudes and longitudes of the surge, wave, and waterlevel dataframes. I then select the nearest surge, wave, and waterlevel locations and add their corresponding percentiles and other columns to combined_vars_df. I use the following conditions for calculating distances and finding the nearest location and its corresponding percentiles:
Conditions:
If surge_points = Yes, find the closest storm surge data points with their corresponding columns for that location and add them to combined_vars_df.
If wave_points = Yes, find the closest wave data points with their percentiles for that location and add them to combined_vars_df.
If surge_points = Yes, find the closest waterlevel data points with their corresponding percentiles for that location and add them to combined_vars_df.
I also use a condition on the coast to choose which wave_df to use (coast == "GulfOfMexico" or coast == "Atlantic"), because I have two wave datasets: one covering all counties on the "Atlantic" coast and another covering the "GulfOfMexico" coast.
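To make the coast condition concrete, this selection happens before the nearest-point search, roughly like the sketch below (wave_atlantic_df and wave_gulf_df are placeholder names standing in for my two wave files):

import pandas as pd

# Placeholder frames standing in for my two wave datasets (Atlantic / Gulf)
wave_atlantic_df = pd.DataFrame()
wave_gulf_df = pd.DataFrame()

coast = "Atlantic"  # set per county
wave_df = wave_atlantic_df if coast == "Atlantic" else wave_gulf_df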
I want to add all columns from the surge, wave, and waterlevel data, with their values taken from the nearest locations. For example, this means creating new columns "Date_surge", "latitude_surge", "longitude_surge", "surge_percentiles", "Date_waterlevel", "latitude_waterlevel", "longitude_waterlevel", "waterlevel_percentiles", "Date_wave", "latitude_wave", "longitude_wave", and "waveHs_percentiles" in combined_vars_df.
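To clarify what I mean by the nearest location, the sketch below shows the kind of lookup I am after for a single precipitation point. It is a minimal, standalone example: the station coordinates are just a few of the surge locations from the sample further down, hard-coded for illustration.

from haversine import haversine, Unit

# One precipitation grid point from combined_vars_df: (latitude, longitude)
precip_point = (37.525, -75.925)

# A few surge station coordinates copied from the surge_df sample below
stations = [(44.634, -66.694), (41.235, -70.034), (39.067, -74.956)]

# Great-circle distance from the precipitation point to each station, in km
distances = [haversine(precip_point, s, unit=Unit.KILOMETERS) for s in stations]

# The closest station; its row (percentiles and the other surge columns)
# is what I want to copy into combined_vars_df for that date
nearest_station = stations[distances.index(min(distances))]
print(nearest_station)  # (39.067, -74.956) in this toy example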
combined_vars_df:
Time longitude_precip latitude_precip PRCP Percentiles_precip Date_precip longitude_soil_moisture latitude_soil_moisture swvl1 Percentiles_soil_moisture Date_soil_moisture time longitude_discharge latitude_discharge dis24 Percentiles_discharge Date_discharge surge_points wave_points
1/1/1980 -75.925 37.525 6.96E-08 0.351962956 1/1/1980 -75.925 37.525 0.015750297 0.424171425 1/1/1980 1/1/1980 -75.925 37.525 0.31250007 0.404584409 1/1/1980 Yes No
1/1/1980 -75.775 37.625 6.96E-08 0.359496898 1/1/1980 -75.775 37.625 0.09928183 0.425004301 1/1/1980 1/1/1980 -75.775 37.625 0.18750007 0.340279939 1/1/1980 Yes No
1/1/1980 -75.475 37.875 6.96E-08 0.354216519 1/1/1980 -75.475 37.875 0.22655162 0.392982732 1/1/1980 1/1/1980 -75.475 37.875 0.10937507 0.441995239 1/1/1980 Yes No
1/1/1980 -75.725 37.975 6.96E-08 0.376004926 1/1/1980 -75.725 37.975 0.25292561 0.398858982 1/1/1980 1/1/1980 -75.725 37.975 0.46875007 0.454758892 1/1/1980 Yes No
1/1/1980 -75.475 37.925 6.96E-08 0.356204666 1/1/1980 -75.475 37.925 0.23879522 0.391525624 1/1/1980 1/1/1980 -75.475 37.925 0.39062507 0.383258258 1/1/1980 Yes No
1/1/1980 -75.625 37.625 6.96E-08 0.351704216 1/1/1980 -75.625 37.625 0.09588592 0.425028448 1/1/1980 1/1/1980 -75.625 37.625 0.03125007 0.184426202 1/1/1980 Yes No
1/1/1980 -75.575 37.925 6.96E-08 0.360239297 1/1/1980 -75.575 37.925 0.24256245 0.396662633 1/1/1980 1/1/1980 -75.575 37.925 0.17187507 0.321553118 1/1/1980 Yes No
1/1/1980 -75.675 37.725 6.96E-08 0.35538695 1/1/1980 -75.675 37.725 0.16938361 0.424660739 1/1/1980 1/1/1980 -75.675 37.725 0.17187507 0.201397005 1/1/1980 Yes No
surge_df:
Date_surge waterlevel_surge latitude_surge longitude_surge surge waterlevel (tide)_surge surge_percentiles
1/1/1980 2107 44.634 -66.694 4.00000007 2103 0.7
1/1/1980 519 25.913 -81.753 -66.99999993 586 0.26
1/1/1980 106 41.235 -70.034 8.00000007 98 0.61
1/1/1980 1222 43.931 -69.302 -20.99999993 1243 0.595
1/1/1980 819 39.067 -74.956 46.00000007 773 0.786666667
1/1/1980 60.00000003 29.634 -91.567 -156.9999999 217 0.05
1/1/1980 156 29.78 -85.415 -92.99999993 249 0.1325
1/1/1980 1253 43.696 -70.21 -10.99999993 1264 0.6125
1/1/1980 631 30.308 -81.372 -109.9999999 741 0.225
1/1/1980 85.00000003 30.278 -88.755 -221.9999999 307 0.05
1/1/1980 159 29.048 -89.106 -76.99999993 236 0.105
1/1/1980 156 29.019 -89.282 -79.99999993 236 0.095
1/1/1980 100 30.22 -88.462 -184.9999999 285 0.05
waterlevel_df:
Date_waterlevel waterlevel latitude_waterlevel longitude_waterlevel surge_waterlevel waterlevel (tide)_waterlevel waterlevel_percentile
1/1/1980 2107 44.634 -66.694 4.00000007 2103 0.833571429
1/1/1980 519 25.913 -81.753 -66.99999993 586 0.515
1/1/1980 106 41.235 -70.034 8.00000007 98 0.4175
1/1/1980 1222 43.931 -69.302 -20.99999993 1243 0.80875
1/1/1980 819 39.067 -74.956 46.00000007 773 0.828333333
1/1/1980 60.00000003 29.634 -91.567 -156.9999999 217 0.05
1/1/1980 156 29.78 -85.415 -92.99999993 249 0.3175
1/1/1980 1253 43.696 -70.21 -10.99999993 1264 0.84125
1/1/1980 631 30.308 -81.372 -109.9999999 741 0.611666667
1/1/1980 85.00000003 30.278 -88.755 -221.9999999 307 0.083333333
1/1/1980 159 29.048 -89.106 -76.99999993 236 0.3425
1/1/1980 156 29.019 -89.282 -79.99999993 236 0.3275
wave_df:
latitude longitude station_name waveHs Date waveHs_percentiles
23.5 -83.5 61001 0.164062585 1/1/1980 23:00 0.137897756
17.5 -72 61030 0.664062572 1/1/1980 23:00 0.05
42.416672 -70.583298 63051 0.515625073 1/1/1980 23:00 0.209301515
29.1667 -80.75 63428 1.343750072 1/1/1980 23:00 0.65999999
36.5 -75.583298 63208 1.054687572 1/1/1980 23:00 0.435000004
34.666698 -76.25 63271 0.750000072 1/1/1980 23:00 0.124999995
32.833302 -79.25 63340 1.390625072 1/1/1980 23:00 0.689999989
39.416698 -74.083298 63139 0.593750072 1/1/1980 23:00 0.089999991
41.083328 -70.083298 63092 0.437500072 1/1/1980 23:00 0.05
31.25 -81 63387 1.062500072 1/1/1980 23:00 0.594030164
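In case a small reproducible input is easier to work with than the full files, the samples above can be rebuilt with something like this (only two rows each of surge_df and wave_df, values copied verbatim from the tables; the column names are exactly as they appear in my files):

import pandas as pd

# Two rows of the surge sample above
surge_df = pd.DataFrame({
    "Date_surge": ["1/1/1980", "1/1/1980"],
    "waterlevel_surge": [2107, 819],
    "latitude_surge": [44.634, 39.067],
    "longitude_surge": [-66.694, -74.956],
    "surge": [4.00000007, 46.00000007],
    "waterlevel (tide)_surge": [2103, 773],
    "surge_percentiles": [0.7, 0.786666667],
})

# Two rows of the wave sample above (note the Date column carries a time)
wave_df = pd.DataFrame({
    "latitude": [36.5, 39.416698],
    "longitude": [-75.583298, -74.083298],
    "station_name": [63208, 63139],
    "waveHs": [1.054687572, 0.593750072],
    "Date": ["1/1/1980 23:00", "1/1/1980 23:00"],
    "waveHs_percentiles": [0.435000004, 0.089999991],
})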
I am using the following code, but I am not getting any data for surge, wave, and water level: the new columns come back empty. Could you please help me solve this issue?
import pandas as pd
from datetime import timedelta
from haversine import haversine, Unit
from tqdm import tqdm


def find_nearest_percentiles(
    combined_vars_df, surge_df, wave_df, waterlevel_df, coast, sheldus_df, window_size=3
):
    """
    Update the combined_vars_df DataFrame with the nearest percentile values of
    storm surge, wave, and water level based on the conditions of 'surge_points'
    and 'wave_points'. If 'surge_points' or 'wave_points' is 'No', the respective
    entry will be set to a null value. Additionally, consider the coast condition
    to differentiate between Gulf of Mexico and Atlantic datasets.

    Parameters:
    -----------
    combined_vars_df : pd.DataFrame
        DataFrame containing combined variables data with 'longitude_precip',
        'latitude_precip', 'Date_precip', 'surge_points', and 'wave_points' columns.
    surge_df : pd.DataFrame
        DataFrame containing storm surge data with 'longitude_surge',
        'latitude_surge', 'Date_surge', and 'surge_percentiles' columns.
    wave_df : pd.DataFrame
        DataFrame containing wave data with 'longitude_wave', 'latitude_wave',
        'Date_wave', and 'waveHs_percentiles' columns.
    waterlevel_df : pd.DataFrame
        DataFrame containing water level data with 'longitude_waterlevel',
        'latitude_waterlevel', 'Date_waterlevel', and 'waterlevel_percentiles' columns.
    coast : str
        Coast information of the county ('GulfOfMexico' or 'Atlantic') to apply
        specific conditions.
    sheldus_df : pd.DataFrame
        DataFrame containing SHELDUS hazard data with 'Hazard_start' and
        'Hazard_end' columns.
    window_size : int, optional
        The number of days to consider before and after the hazard start and end
        dates (default is 3).

    Returns:
    --------
    pd.DataFrame
        The updated combined_vars_df with new columns 'nearest_surge',
        'nearest_wave', and 'nearest_waterlevel' containing the nearest percentile
        values based on the specified conditions and coast.
    """
    print("Finding the nearest percentile values for storm surge, wave, and water level...")

    # Initialize new columns for the nearest percentile values
    combined_vars_df["nearest_surge"] = None
    combined_vars_df["nearest_wave"] = None
    combined_vars_df["nearest_waterlevel"] = None

    # Convert date columns to datetime format for comparison
    combined_vars_df["Date_precip"] = pd.to_datetime(combined_vars_df["Date_precip"])
    surge_df["Date_surge"] = pd.to_datetime(surge_df["Date_surge"])
    wave_df["Date_wave"] = pd.to_datetime(wave_df["Date_wave"])
    waterlevel_df["Date_waterlevel"] = pd.to_datetime(waterlevel_df["Date_waterlevel"])

    # Remove the time from the Date column in wave_df and keep only the date
    # if there are time values present
    wave_df["Date_wave"] = wave_df["Date_wave"].dt.date

    # Limit latitude and longitude to 4 decimal places for comparison
    surge_df["latitude_surge"] = surge_df["latitude_surge"].round(4)
    surge_df["longitude_surge"] = surge_df["longitude_surge"].round(4)
    wave_df["latitude_wave"] = wave_df["latitude_wave"].round(4)
    wave_df["longitude_wave"] = wave_df["longitude_wave"].round(4)
    waterlevel_df["latitude_waterlevel"] = waterlevel_df["latitude_waterlevel"].round(4)
    waterlevel_df["longitude_waterlevel"] = waterlevel_df["longitude_waterlevel"].round(4)
    combined_vars_df["latitude_precip"] = combined_vars_df["latitude_precip"].round(4)
    combined_vars_df["longitude_precip"] = combined_vars_df["longitude_precip"].round(4)

    # Subset the dataframes based on the SHELDUS hazard start and end dates
    output_combined_vars_df = pd.DataFrame()
    output_surge_df = pd.DataFrame()
    output_wave_df = pd.DataFrame()
    output_waterlevel_df = pd.DataFrame()
    for i in range(len(sheldus_df)):
        # Get the start and end dates for the hazard event and the window dates
        start_date = sheldus_df["Hazard_start"].iloc[i]
        end_date = sheldus_df["Hazard_end"].iloc[i]
        first_window_date = start_date - timedelta(days=window_size)
        last_window_date = end_date + timedelta(days=window_size)
        window_dates = pd.date_range(first_window_date, last_window_date)

        # Subset the dataframes based on the window dates
        window_combined_vars = combined_vars_df.loc[
            combined_vars_df["Date_precip"].isin(window_dates)
        ]
        window_surge = surge_df.loc[surge_df["Date_surge"].isin(window_dates)]
        window_wave = wave_df.loc[wave_df["Date_wave"].isin(window_dates)]
        window_waterlevel = waterlevel_df.loc[
            waterlevel_df["Date_waterlevel"].isin(window_dates)
        ]

        # Concatenate the subsetted dataframes
        output_combined_vars_df = pd.concat(
            [output_combined_vars_df, window_combined_vars], axis=0
        )
        output_surge_df = pd.concat([output_surge_df, window_surge], axis=0)
        output_wave_df = pd.concat([output_wave_df, window_wave], axis=0)
        output_waterlevel_df = pd.concat(
            [output_waterlevel_df, window_waterlevel], axis=0
        )

    # Update the dataframes with the subsetted data
    combined_vars_df = output_combined_vars_df
    surge_df = output_surge_df
    wave_df = output_wave_df
    waterlevel_df = output_waterlevel_df

    # Reset the index of the combined_vars_df
    combined_vars_df.reset_index(drop=True, inplace=True)

    # Print the combined_vars_df
    print("Combined variables data for all sheldus events: ", combined_vars_df)

    # Print the surge_df and wave_df to check if "Date_surge" and "Date_wave"
    # columns are present in the data
    print("Surge data: ", surge_df.head())
    print("Wave data: ", wave_df.head())

    # # Save the subsetted data to a CSV file
    # combined_vars_df.to_csv(f"{base_path}data/combined_vars_df_sheldus_dates_{county_name}_{FIPS}.csv", index=False)

    # Calculate bounding box for the county
    buffer = 0.25  # Buffer in degrees to include a little more area
    min_lon, max_lon = (
        combined_vars_df["longitude_precip"].min() - buffer,
        combined_vars_df["longitude_precip"].max() + buffer,
    )
    min_lat, max_lat = (
        combined_vars_df["latitude_precip"].min() - buffer,
        combined_vars_df["latitude_precip"].max() + buffer,
    )

    # Filter surge, wave, and water level data within the bounding box
    surge_filtered = surge_df[
        (surge_df["longitude_surge"] >= min_lon)
        & (surge_df["longitude_surge"] <= max_lon)
        & (surge_df["latitude_surge"] >= min_lat)
        & (surge_df["latitude_surge"] <= max_lat)
    ]
    wave_filtered = wave_df[
        (wave_df["longitude_wave"] >= min_lon)
        & (wave_df["longitude_wave"] <= max_lon)
        & (wave_df["latitude_wave"] >= min_lat)
        & (wave_df["latitude_wave"] <= max_lat)
    ]
    waterlevel_filtered = waterlevel_df[
        (waterlevel_df["longitude_waterlevel"] >= min_lon)
        & (waterlevel_df["longitude_waterlevel"] <= max_lon)
        & (waterlevel_df["latitude_waterlevel"] >= min_lat)
        & (waterlevel_df["latitude_waterlevel"] <= max_lat)
    ]

    # Print the filtered surge, wave, and water level data
    print("Filtered surge data: ", surge_filtered.head())
    print("Filtered wave data: ", wave_filtered.head())
    print("Filtered water level data: ", waterlevel_filtered.head())

    # Function to find the nearest percentile value for a given row and condition
    def find_nearest_percentile(row, event_df, date_col, lat_col, lon_col, percentile_col):
        """
        Find the nearest percentile value for a given row and condition.

        Parameters:
        -----------
        row : pd.Series
            A single row from the combined_vars_df DataFrame containing the
            reference point.
        event_df : pd.DataFrame
            DataFrame containing event data (e.g., surge, wave, or water level)
            with date, latitude, longitude, and percentile columns.
        date_col : str
            Name of the column in event_df representing the date.
        lat_col : str
            Name of the column in event_df representing the latitude.
        lon_col : str
            Name of the column in event_df representing the longitude.
        percentile_col : str
            Name of the column in event_df representing the percentile values.

        Returns:
        --------
        float or None
            The nearest percentile value if found, or None if no matching date
            is found in event_df.
        """
        try:
            if row["Date_precip"].date() not in event_df[date_col].dt.date.values:
                return None
            event_same_date = event_df.loc[
                event_df[date_col].dt.date == row["Date_precip"].date()
            ]
            distances = event_same_date.apply(
                lambda event: haversine(
                    (row["latitude_precip"], row["longitude_precip"]),
                    (event[lat_col], event[lon_col]),
                    unit=Unit.KILOMETERS,
                ),
                axis=1,
            )
            nearest_index = distances.idxmin()
            return event_same_date.loc[nearest_index, percentile_col]
        except KeyError as e:
            print(f"KeyError occurred: {str(e)}")
            return None
        except Exception as e:
            print(f"An error occurred: {str(e)}")
            return None

    # Apply the function to each row based on the conditions and coast
    for index, row in tqdm(combined_vars_df.iterrows(), total=len(combined_vars_df)):
        if row["surge_points"] == "Yes":
            combined_vars_df.at[index, "nearest_surge"] = find_nearest_percentile(
                row,
                surge_filtered,
                "Date_surge",
                "latitude_surge",
                "longitude_surge",
                "surge_percentiles",
            )
            combined_vars_df.at[index, "nearest_waterlevel"] = find_nearest_percentile(
                row,
                waterlevel_filtered,
                "Date_waterlevel",
                "latitude_waterlevel",
                "longitude_waterlevel",
                "waterlevel_percentiles",
            )
        if row["wave_points"] == "Yes":
            if coast == "GulfOfMexico":
                combined_vars_df.at[index, "nearest_wave"] = find_nearest_percentile(
                    row,
                    wave_filtered,
                    "Date_wave",
                    "latitude_wave",
                    "longitude_wave",
                    "waveHs_percentiles",
                )
            elif coast == "Atlantic":
                combined_vars_df.at[index, "nearest_wave"] = find_nearest_percentile(
                    row,
                    wave_filtered,
                    "Date_wave",
                    "latitude_wave",
                    "longitude_wave",
                    "waveHs_percentiles",
                )

    return combined_vars_df
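For context, I call the function per county roughly like this (the SHELDUS dates below are placeholders; combined_vars_df, surge_df, wave_df, and waterlevel_df are the dataframes described above):

import pandas as pd

# Placeholder SHELDUS events; my real sheldus_df has one row per hazard
# event for the county, with 'Hazard_start' and 'Hazard_end' dates.
sheldus_df = pd.DataFrame({
    "Hazard_start": [pd.Timestamp("1980-01-01")],
    "Hazard_end": [pd.Timestamp("1980-01-03")],
})

combined_vars_df = find_nearest_percentiles(
    combined_vars_df,
    surge_df,
    wave_df,
    waterlevel_df,
    coast="Atlantic",   # or "GulfOfMexico", depending on the county
    sheldus_df=sheldus_df,
    window_size=3,
)

# These columns are created but end up entirely empty / None
print(combined_vars_df[["nearest_surge", "nearest_wave", "nearest_waterlevel"]].head())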