Channel: Active questions tagged python - Stack Overflow

Extracting values from dataframe unintentionally adds index value


I'm trying to create a function in Python that uses dataframes from two other CSV files to determine population.

My Python file looks like this:

import numpy as np
import pandas as pd

# NB "index_col=False" is necessary because otherwise Python tries to use the first column as the index,
# and throws everything out by one column
df = pd.read_csv("mm_copy.csv", index_col=False)

countries_df = pd.read_csv("countries.csv")
south_america_df = countries_df.loc[countries_df.Continent == "South America"]
south_american_countries_list = south_america_df.Entity.tolist()  # Creates complete list of South American countries, using local definitions
print(south_american_countries_list)

population_df = pd.read_csv("population.csv", low_memory=False)
population_df.set_index("Location")

def calculate_crude_maternal_mortality(year):
    # Establishes total WHO female population of South America
    who_female_pop_slice = population_df.loc[(population_df.Location == "South America") & (population_df.Time == year)]
    who_female_pop = who_female_pop_slice["TPopulationFemale1July"]

    # Calc total South American female population in relevant year
    total_south_american_female_pop = []
    for country in south_american_countries_list:  # iterates through list of South American countries
        if population_df["Location"].str.contains(country).any():  # checks country is in population dataframe
            population_df_slice = population_df.loc[(population_df.Location == country) & (population_df.Time == year)]  # locates appropriate row which has country and year
            country_female_pop = population_df_slice["TPopulationFemale1July"]  # finds cell with female population
            total_south_american_female_pop.append(country_female_pop)  # adds this population to the totals list
    print(total_south_american_female_pop)
    # To add - adding the values up within the list
    print("Total South American female population, using WHO definitions = " + str(who_female_pop))

calculate_crude_maternal_mortality(2015)

The countries_df section is fine, and does indeed create a list of the required countries. I created it in this way so that I could iterate over it country by country.
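For readers without countries.csv, the filter-then-tolist step can be sketched on a toy frame. The three rows below are made up for illustration; only the column names Entity and Continent come from the code above.

```python
import pandas as pd

# Hypothetical stand-in for countries.csv (rows invented for illustration).
countries_df = pd.DataFrame(
    {
        "Entity": ["Brazil", "France", "Peru"],
        "Continent": ["South America", "Europe", "South America"],
    }
)

# Boolean mask keeps only the South American rows.
south_america_df = countries_df.loc[countries_df["Continent"] == "South America"]

# .tolist() turns the Entity column into a plain Python list of strings.
south_american_countries_list = south_america_df["Entity"].tolist()
print(south_american_countries_list)
```

The result is an ordinary list of strings, which is why iterating over it country by country works as expected.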

An excerpt from the population.csv file (and therefore dataframe) looks like this:

SortOrder,LocID,Notes,ISO3_code,ISO2_code,SDMX_code,LocTypeID,LocTypeName,ParentID,Location,VarID,Variant,Time,TPopulation1Jan,TPopulation1July,TPopulationMale1July,TPopulationFemale1July,PopDensity,PopSexRatio,MedianAgePop,NatChange,NatChangeRT,PopChange,PopGrowthRate,DoublingTime,Births,Births1519,CBR,TFR,NRR,MAC,SRB,Deaths,DeathsMale,DeathsFemale,CDR,LEx,LExMale,LExFemale,LE15,LE15Male,LE15Female,LE65,LE65Male,LE65Female,LE80,LE80Male,LE80Female,InfantDeaths,IMR,LBsurvivingAge1,Under5Deaths,Q5,Q0040,Q0040Male,Q0040Female,Q0060,Q0060Male,Q0060Female,Q1550,Q1550Male,Q1550Female,Q1560,Q1560Male,Q1560Female,NetMigrations,CNMR
247,170,,COL,CO,170,4,Country/Area,931,Colombia,2,Medium,2012,45550.12,45782.417,22632.42,23149.997,40.743,97.7642,27.2575,520.504,11.362,464.594,1.015,68.2904,748.734,163.722,16.344,1.9297,0.92,26.376,104.5,228.23,130.084,98.146,4.982,75.5973,72.4547,78.7871,62.1301,59.0969,65.1879,17.4558,16.0588,18.7051,7.1826,6.3333,7.824,10.868,14.5137,739.171,12.888,17.1791,61.6454,86.0621,36.5196,133.7882,174.8247,92.1336,66.6632,96.237,36.622,115.7444,155.5635,75.5928,-55.913,-1.221
247,170,,COL,CO,170,4,Country/Area,931,Colombia,2,Medium,2013,46014.714,46237.93,22850.386,23387.544,41.1484,97.7032,27.6756,511.117,11.047,446.432,0.966,71.7544,744.381,159.336,16.088,1.9049,0.9087,26.405,104.5,233.264,132.443,100.821,5.041,75.8267,72.7253,78.9723,62.3146,59.3192,65.3327,17.5411,16.1093,18.8282,7.2328,6.3674,7.8916,10.44,14.0247,735.204,12.359,16.581,59.6473,82.8304,35.7795,130.9528,170.3958,90.9944,64.6886,92.7506,36.1904,113.4276,151.6658,74.9431,-64.686,-1.398
247,170,,COL,CO,170,4,Country/Area,931,Colombia,2,Medium,2014,46461.146,46677.947,23060.58,23617.367,41.54,97.6425,28.0974,501.117,10.728,433.602,0.929,74.6122,739.615,152.064,15.834,1.883,0.8986,26.509,104.5,238.498,134.812,103.686,5.106,76.0427,72.9851,79.1389,62.4916,59.5363,65.4649,17.637,16.185,18.9449,7.285,6.4182,7.9467,10.055,13.5942,730.785,11.893,16.0617,57.8828,79.9829,35.12,128.4596,166.4828,90.0037,62.9794,89.7482,35.799,111.3895,148.2313,74.3682,-67.516,-1.445
247,170,,COL,CO,170,4,Country/Area,931,Colombia,2,Medium,2015,46894.748,47119.728,23272.374,23847.355,41.9331,97.5889,28.5193,491.031,10.417,449.96,0.955,72.5809,734.664,144.665,15.585,1.8627,0.8893,26.64,104.5,243.633,136.964,106.669,5.168,76.2573,73.2516,79.2916,62.6672,59.7603,65.5831,17.7591,16.3109,19.0601,7.3437,6.4971,7.9874,9.675,13.1695,726.176,11.428,15.5387,56.3993,77.6037,34.5513,126.3584,163.1564,89.1868,61.7138,87.4564,35.5774,109.7502,145.3989,73.9664,-41.064,-0.871
247,170,,COL,CO,170,4,Country/Area,931,Colombia,2,Medium,2016,47344.708,47625.955,23517.966,24107.989,42.3836,97.5526,28.9314,482.508,10.14,562.495,1.181,58.6915,730.565,140.179,15.353,1.8435,0.8806,26.686,104.5,248.057,138.68,109.378,5.213,76.4713,73.5187,79.4398,62.8419,59.9837,65.6973,17.9043,16.4802,19.1741,7.4081,6.5985,8.0166,9.316,12.7494,722.402,10.99,15.0258,55.193,75.7559,33.9994,124.5565,160.4191,88.3461,60.8743,85.9499,35.4094,108.4201,143.1839,73.5314,79.978,1.681
247,170,,COL,CO,170,4,Country/Area,931,Colombia,2,Medium,2017,47907.203,48351.67,23874.571,24477.1,43.0295,97.5384,29.3149,473.319,9.832,888.935,1.839,37.6915,726.008,135.564,15.081,1.8178,0.8684,26.75,104.5,252.689,140.463,112.226,5.249,76.6456,73.7592,79.5348,62.9901,60.1914,65.7739,18.0382,16.6428,19.2726,7.469,6.6958,8.0421,8.998,12.3818,718.136,10.637,14.615,54.3985,74.2385,33.9488,123.1977,157.9442,88.1227,60.3837,84.7024,35.6815,107.3689,141.0915,73.5224,415.618,8.633
247,170,,COL,CO,170,4,Country/Area,931,Colombia,2,Medium,2018,48796.138,49276.961,24331.054,24945.907,43.8529,97.5353,29.6736,467.285,9.53,961.647,1.952,35.5096,727.649,131.012,14.841,1.7868,0.8539,26.823,104.5,260.364,144.941,115.423,5.311,76.7479,73.8421,79.6597,63.05,60.2293,65.8597,18.1368,16.7471,19.3652,7.5196,6.7639,8.0783,8.716,11.9679,720.031,10.276,14.0874,54.3932,74.565,33.586,122.77,157.7746,87.356,61.1154,85.8714,35.9247,107.4454,141.4585,73.2214,494.364,10.083
247,170,,COL,CO,170,4,Country/Area,931,Colombia,2,Medium,2019,49757.785,50187.406,24779.301,25408.106,44.6632,97.5252,30.0377,463.436,9.271,859.243,1.712,40.4876,733.94,126.781,14.682,1.765,0.8434,26.872,104.6,270.504,151.165,119.339,5.411,76.7523,73.7998,79.721,63.0269,60.1591,65.8941,18.1682,16.7473,19.431,7.5553,6.7804,8.1323,8.511,11.5935,726.503,10.041,13.6627,55.0823,75.6392,33.8578,123.0687,158.3005,87.3372,62.4333,87.5412,36.8336,108.0507,142.2845,73.5153,395.803,7.918
247,170,,COL,CO,170,4,Country/Area,931,Colombia,2,Medium,2020,50617.028,50930.662,25139.588,25791.075,45.3246,97.474,30.4097,397.835,7.829,627.269,1.232,56.2619,733.491,121.663,14.435,1.7369,0.8305,26.912,104.6,335.656,191.26,144.396,6.606,74.7692,71.5365,78.1362,60.9525,57.7989,64.2239,16.6634,14.9753,18.2591,7.0308,6.229,7.6415,8.237,11.2177,726.305,9.697,13.1843,55.0287,74.9347,34.4943,140.9395,181.2286,100.096,68.1517,94.3241,41.5122,126.9074,166.2696,87.2104,229.437,4.515
247,170,,COL,CO,170,4,Country/Area,931,Colombia,2,Medium,2021,51244.297,51516.562,25415.242,26101.321,45.846,97.3715,30.7725,332.554,6.469,544.53,1.057,65.5768,730.203,118.645,14.204,1.7168,0.8197,26.935,104.6,397.649,226.497,171.151,7.735,72.8296,69.4041,76.4421,58.9838,55.6203,62.5192,15.9966,14.4072,17.485,6.9963,6.2707,7.536,8,10.943,723.229,9.471,12.9233,63.5047,87.68,38.4849,175.774,226.3475,123.8155,87.1411,121.5977,51.8059,162.2411,212.2602,111.0627,211.978,4.123
247,170,,COL,CO,170,4,Country/Area,931,Colombia,2,Medium,2022,51788.827,51874.024,25575.607,26298.417,46.1641,97.2515,31.171,338.317,6.512,170.394,0.328,,723.264,113.984,13.921,1.6922,0.8083,26.978,104.5,384.947,217.744,167.203,7.409,73.6595,70.3187,77.1432,59.8219,56.5496,63.2215,16.3778,14.8191,17.8162,6.9677,6.2137,7.5318,7.888,10.886,716.393,9.371,12.8868,61.5887,85.2531,37.0922,162.6828,209.9033,114.286,81.2504,113.7146,47.952,149.0061,195.5519,101.5067,-167.924,-3.232
247,170,,COL,CO,170,4,Country/Area,931,Colombia,2,Medium,2023,51959.221,52085.168,25668.667,26416.5,46.352,97.1691,31.6447,426.943,8.184,251.893,0.484,,714.422,109.523,13.694,1.6886,0.8092,27.014,104.5,287.479,159.288,128.191,5.51,77.5138,74.6074,80.4018,63.6629,60.8467,66.4437,18.4607,17.0113,19.7221,7.7129,6.8891,8.3167,7.431,10.3908,707.951,8.822,12.29,49.7935,68.8941,30.0663,114.364,147.9818,80.2788,57.0775,80.4526,33.2056,100.8394,133.3786,68.0231,-175.051,-3.355

But when I run this, the output isn't what I'm expecting, which is just a list of the relevant female population values (each drawn from a single "cell" in the CSV/dataframe). Instead, each entry also includes the row location number (which I think is a default index) and the datatype. Running the Python file outputs this:

['Argentina', 'Bolivia', 'Brazil', 'Chile', 'Colombia', 'Ecuador', 'Falkland Islands', 'French Guiana', 'Guyana', 'Paraguay', 'Peru', 'South Georgia and the South Sandwich Islands', 'Suriname', 'Uruguay', 'Venezuela']
[36241    21854.745
Name: TPopulationFemale1July, dtype: float64, 36393    5511.477
Name: TPopulationFemale1July, dtype: float64, 36545    104179.515
Name: TPopulationFemale1July, dtype: float64, 36697    9004.176
Name: TPopulationFemale1July, dtype: float64, 36849    23847.355
Name: TPopulationFemale1July, dtype: float64, 37001    8095.674
Name: TPopulationFemale1July, dtype: float64, 37153    1.71
Name: TPopulationFemale1July, dtype: float64, 37305    129.182
Name: TPopulationFemale1July, dtype: float64, 37457    383.41
Name: TPopulationFemale1July, dtype: float64, 37609    3071.251
Name: TPopulationFemale1July, dtype: float64, 37761    15496.198
Name: TPopulationFemale1July, dtype: float64, 37913    287.482
Name: TPopulationFemale1July, dtype: float64, 38065    1759.187
Name: TPopulationFemale1July, dtype: float64, 38217    15341.128
Name: TPopulationFemale1July, dtype: float64]
Total South American female population, using WHO definitions = 36089    208962.491
Name: TPopulationFemale1July, dtype: float64

I was just hoping for a list of numbers (the numbers at the end of each row here) so that I could add them up. That is, the numbers are correct, but I can't add them up because of all the extraneous information that comes along with each one.
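The behaviour described above can be reproduced on a small scale. Selecting a column from a filtered dataframe gives a pandas Series even when only one row matches, and printing a Series shows its index label, name and dtype. The sketch below is one possible way to get plain numbers instead, not necessarily the only or best one. The toy frame, the helper name female_pop and the Peru 2014 figure are invented for illustration; the other figures are taken from the excerpt and output shown earlier.

```python
import pandas as pd

# Toy stand-in for population.csv; the Peru 2014 value is made up.
population_df = pd.DataFrame(
    {
        "Location": ["Colombia", "Colombia", "Peru", "Peru"],
        "Time": [2014, 2015, 2014, 2015],
        "TPopulationFemale1July": [23617.367, 23847.355, 15300.0, 15496.198],
    }
)


def female_pop(df, location, year):
    """Hypothetical helper: return one female population figure as a float."""
    rows = df.loc[(df["Location"] == location) & (df["Time"] == year)]
    column = rows["TPopulationFemale1July"]  # still a Series, even with one row
    return float(column.iloc[0])  # first (here: only) value as a plain float


countries = ["Colombia", "Peru"]

# Option 1: collect plain floats in a list, then add them up.
values = [female_pop(population_df, country, 2015) for country in countries]
total = sum(values)

# Option 2: let pandas do the filtering and the sum in one step.
mask = population_df["Location"].isin(countries) & (population_df["Time"] == 2015)
total_vectorised = population_df.loc[mask, "TPopulationFemale1July"].sum()

print(values)
print(total, total_vectorised)
```

Note that .iloc[0] assumes at least one matching row and raises an IndexError otherwise, so a real script may want to check that the slice is not empty first.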

Can anyone offer any suggestions? Thanks

