Channel: Active questions tagged python - Stack Overflow

How to modify large amounts of data using pandas


As shown in this code, I need to use data from one table as the basis for modifying another table and adding some information to it. When the tables are large, this brute-force traversal is very inefficient. How should I change it? In addition, several sheets need to be compared, which makes it even slower.

import pandas as pd
from collections import namedtuple

df = pd.DataFrame({'id': [123, 321, 456, 543], 'name': ['xxx', 'yyy', 'zzz', 'www']})
df.set_index('id', inplace=True)

df_1 = pd.DataFrame({'id': [123, 321, 456, 543],
                     'name': ['xxx', 'yyy', 'zzz', 'www'],
                     'complete': ['yes', 'yes', 'yes', 'yes'],
                     'course_name': ['AA', 'BB', 'AA', 'DD'],
                     'complete_date': ['1.1', '1.2', '1.1', '1.5']})
df_1.set_index('id', inplace=True)

group_df = df_1.groupby('course_name')
info = dict()
Subscriber = namedtuple('Subscriber', ['name', 'complete_date'])

# Collect (name, complete_date) for every completed row, per course
for course_name, course_df in group_df:
    info[course_name] = []

    def process(row):
        info[course_name].append(Subscriber(*row.tolist()))

    get_info = course_df.loc[course_df["complete"] == "yes"]
    get_columns = ['name', 'complete_date']
    finish_df = get_info[get_columns]
    finish_df.apply(process, axis=1)

# print(info)
# {'AA': [Subscriber(name='xxx', complete_date='1.1'), Subscriber(name='zzz', complete_date='1.1')], 'BB': [Subscriber(name='yyy', complete_date='1.2')], 'DD': [Subscriber(name='www', complete_date='1.5')]}

'''modify df'''
names = set(df['name'])
for course in info.keys():
    for name, date in info[course]:
        if name in names:
            # One boolean-mask scan of df per (course, name) pair
            df.loc[df['name'] == name, course] = date + 'yes'

# print(df)
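
One possible direction, offered as a sketch rather than a definitive answer: the nested loops can probably be replaced by a filter, a reshape, and a single join. The example below reuses the sample data from the question. The names `sheets`, `done`, `status`, `wide` and `result` are made up for illustration. It assumes rows are matched on `name`, as in the loop, and that when a name appears more than once for a course the last row should win, which is what the repeated `.loc` assignment ends up doing.

```python
import pandas as pd

df = pd.DataFrame({'id': [123, 321, 456, 543],
                   'name': ['xxx', 'yyy', 'zzz', 'www']}).set_index('id')
df_1 = pd.DataFrame({'id': [123, 321, 456, 543],
                     'name': ['xxx', 'yyy', 'zzz', 'www'],
                     'complete': ['yes', 'yes', 'yes', 'yes'],
                     'course_name': ['AA', 'BB', 'AA', 'DD'],
                     'complete_date': ['1.1', '1.2', '1.1', '1.5']}).set_index('id')

# Assumption: every sheet has the same columns as df_1, so several sheets
# can be stacked once up front instead of being compared one by one.
sheets = [df_1]
all_rows = pd.concat(sheets)

# Keep completed rows only and build the cell value in one vectorized step.
done = all_rows.loc[all_rows['complete'] == 'yes'].copy()
done['status'] = done['complete_date'] + 'yes'

# Assumption: for a repeated (name, course) pair the last row wins.
done = done.drop_duplicates(['name', 'course_name'], keep='last')

# Reshape to one row per name and one column per course.
wide = done.pivot(index='name', columns='course_name', values='status')

# A single left join on the 'name' column; df keeps its 'id' index, and
# courses a person did not complete are left as NaN.
result = df.join(wide, on='name')
print(result)
```

If the sheets come from one Excel workbook, `pd.read_excel(path, sheet_name=None)` returns a dict of DataFrames whose values could be passed to `pd.concat` in place of the `sheets` list. Whether this is fast enough for the real data would need to be measured.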
