Channel: Active questions tagged python - Stack Overflow

How to groupBy on two columns and work out avg total value for each grouped column using pyspark


I have the following DataFrame and, using PySpark, I'm trying to get the following answers:

  1. Total Fare by Pick
  2. Total Tip by Pick
  3. Avg Drag by Pick
  4. Avg Drag by Drop
| Pick | Drop | Fare  | Tip   | Drag  |
|------|------|-------|-------|-------|
| 1    | 1    | 4.00  | 4.00  | 1.00  |
| 1    | 2    | 5.00  | 10.00 | 8.00  |
| 1    | 2    | 5.00  | 15.00 | 12.00 |
| 3    | 2    | 11.00 | 12.00 | 17.00 |
| 3    | 5    | 41.00 | 25.00 | 13.00 |
| 4    | 6    | 50.00 | 70.00 | 2.00  |

My Query is so far like this:

from pyspark.sql import functions as func
from pyspark.sql.functions import desc

df.groupBy('Pick', 'Drop') \
    .agg(
        func.sum('Fare').alias('FarePick'),
        func.sum('Tip').alias('TipPick'),
        func.avg('Drag').alias('AvgDragPick'),
        func.avg('Drag').alias('AvgDragDrop')) \
    .orderBy('Pick').show()

However, I don't think this is correct. I'm a bit stuck on (4), because the groupBy doesn't seem right for it. Can anyone suggest a correction?


