
How do I create an OLS result table?


I want to create an OLS regression result table that displays the unemployment rate for females after a natural disaster takes place, and another OLS regression result table for males. My issue is that the numbers in the two result tables turn out to be the same for males and females. Can someone please look over my code and advise on how to correct it? Thank you.

Here's the code I've tried so far to get the OLS regression result table displaying the unemployment rate for females:

    import os

    import pandas as pd
    import statsmodels.api as sm

    # data_path, df_unegeoafr_african and df_unegeoafr_african_copy are
    # assumed to be defined earlier.

    # Filter for females only
    df_females = df_unegeoafr_african[df_unegeoafr_african['Female'] == 1]

    # Create a binary variable indicating whether a disaster occurred
    df_females['Disaster Occurred'] = df_females.groupby('Reference area')['Year'].transform(lambda x: x.diff().eq(1).cumsum().gt(0))

    # Build the full path to the Excel file of climate disasters (2000-2023)
    # by joining the directory "data_path" and the file name "file"
    file = 'Climate_disasters_2000-2023.xlsx'
    path = os.path.join(data_path, file)

    # Read the Excel file at "path" into the DataFrame "df_dis"
    df_dis = pd.read_excel(path)

    # Convert the first 5 rows of "df_unegeoafr_african_copy" to CSV format
    df_unegeoafr_african_copy.head().to_csv()

    # Convert the first 5 rows of "df_dis" to CSV format
    df_dis.head().to_csv()

    # Filter for females only
    df_females = df_unegeoafr_african[df_unegeoafr_african['Female'] == 1]

    # Step 1: Create a binary variable indicating whether a disaster occurred
    disaster_occurred = df_females.groupby('Reference area')['Year'].transform(lambda x: x.diff().eq(1).cumsum().gt(0))

    # Step 2: Assign the result to a copy of the filtered DataFrame using .loc
    df_females = df_females.copy()
    df_females.loc[:, 'Disaster Occurred'] = disaster_occurred

    # Assuming df_dis has the columns 'Country', 'Start Year', and 'DisNo.':
    # group by 'Country' and 'Start Year' and check if any disaster occurred in each group
    df_result = df_dis.groupby(['Country', 'Start Year']).any().reset_index()

    # Create a new column 'Disaster Occurred' based on the presence of any disaster in the group
    df_result['Disaster Occurred'] = df_result['DisNo.'].astype(int)

    # Select only the necessary columns in the result DataFrame
    df_result = df_result[['Country', 'Start Year', 'Disaster Occurred']]

    # Rename the columns for clarity
    df_result = df_result.rename(columns={'Start Year': 'Year'})

    # Display the updated DataFrame df_result
    print(df_result)

    # Merge "df_unegeoafr_african_copy" and "df_result" on country and year.
    # how='left': keep all rows from the left DataFrame (df_unegeoafr_african_copy)
    #   and any matching rows from the right DataFrame (df_result).
    # left_on=['Reference area', 'Year']: key columns in the left DataFrame.
    # right_on=['Country', 'Year']: key columns in the right DataFrame.
    df_merged = pd.merge(df_unegeoafr_african_copy, df_result, how='left', left_on=['Reference area', 'Year'], right_on=['Country', 'Year'])

    # Fill missing (NaN) values in the 'Disaster Occurred' column with 0
    df_merged['Disaster Occurred'] = df_merged['Disaster Occurred'].fillna(0)

    # Display the DataFrame df_merged
    print(df_merged)

    # Convert the first 5 rows of df_merged to CSV format
    df_merged.head().to_csv()

    # Drop rows with missing values in the dependent or independent variable
    df_regression = df_merged.dropna(subset=['Unemployment Rate', 'Disaster Occurred'])

    # Define the independent variable (X) and the dependent variable (y)
    X = df_regression['Disaster Occurred']
    y = df_regression['Unemployment Rate']

    # Add a constant to the independent variable (this is the intercept term)
    X = sm.add_constant(X)

    # Create and fit the regression model with robust standard errors
    model = sm.OLS(y, X)
    results = model.fit(cov_type='HC3')

    # Display the regression results
    print(results.summary())

This is the code I used for the OLS regression table for the unemployment rate for males:

    import os

    import pandas as pd
    import statsmodels.api as sm

    # data_path, df_unegeoafr_african and df_unegeoafr_african_copy are
    # assumed to be defined earlier.

    # Filter for males only
    df_males = df_unegeoafr_african[df_unegeoafr_african['Female'] == 0]

    # Create a binary variable indicating whether a disaster occurred
    df_males['Disaster Occurred'] = df_males.groupby('Reference area')['Year'].transform(lambda x: x.diff().eq(1).cumsum().gt(0))

    # Build the full path to the Excel file of climate disasters (2000-2023)
    # by joining the directory "data_path" and the file name "file"
    file = 'Climate_disasters_2000-2023.xlsx'
    path = os.path.join(data_path, file)

    # Read the Excel file at "path" into the DataFrame "df_dis"
    df_dis = pd.read_excel(path)

    # Convert the first 5 rows of "df_unegeoafr_african_copy" to CSV format
    df_unegeoafr_african_copy.head().to_csv()

    # Convert the first 5 rows of "df_dis" to CSV format
    df_dis.head().to_csv()

    # Filter for males only
    df_males = df_unegeoafr_african[df_unegeoafr_african['Female'] == 0]

    # Step 1: Create a binary variable indicating whether a disaster occurred
    disaster_occurred = df_males.groupby('Reference area')['Year'].transform(lambda x: x.diff().eq(1).cumsum().gt(0))

    # Step 2: Assign the result to a copy of the filtered DataFrame using .loc
    df_males = df_males.copy()
    df_males.loc[:, 'Disaster Occurred'] = disaster_occurred

    # Assuming df_dis has the columns 'Country', 'Start Year', and 'DisNo.':
    # group by 'Country' and 'Start Year' and check if any disaster occurred in each group
    df_result = df_dis.groupby(['Country', 'Start Year']).any().reset_index()

    # Create a new column 'Disaster Occurred' based on the presence of any disaster in the group
    df_result['Disaster Occurred'] = df_result['DisNo.'].astype(int)

    # Select only the necessary columns in the result DataFrame
    df_result = df_result[['Country', 'Start Year', 'Disaster Occurred']]

    # Rename the columns for clarity
    df_result = df_result.rename(columns={'Start Year': 'Year'})

    # Display the updated DataFrame df_result
    print(df_result)

    # Merge "df_unegeoafr_african_copy" and "df_result" on country and year.
    # how='left': keep all rows from the left DataFrame (df_unegeoafr_african_copy)
    #   and any matching rows from the right DataFrame (df_result).
    # left_on=['Reference area', 'Year']: key columns in the left DataFrame.
    # right_on=['Country', 'Year']: key columns in the right DataFrame.
    df_merged = pd.merge(df_unegeoafr_african_copy, df_result, how='left', left_on=['Reference area', 'Year'], right_on=['Country', 'Year'])

    # Fill missing (NaN) values in the 'Disaster Occurred' column with 0
    df_merged['Disaster Occurred'] = df_merged['Disaster Occurred'].fillna(0)

    # Display the DataFrame df_merged
    print(df_merged)

    # Convert the first 5 rows of df_merged to CSV format
    df_merged.head().to_csv()

    # Drop rows with missing values in the dependent or independent variable
    df_regression = df_merged.dropna(subset=['Unemployment Rate', 'Disaster Occurred'])

    # Define the independent variable (X) and the dependent variable (y)
    X = df_regression['Disaster Occurred']
    y = df_regression['Unemployment Rate']

    # Add a constant to the independent variable (this is the intercept term)
    X = sm.add_constant(X)

    # Create and fit the regression model with robust standard errors
    model = sm.OLS(y, X)
    results = model.fit(cov_type='HC3')

    # Display the regression results
    print(results.summary())

I got this OLS result table:

[Image: OLS regression results table]

