Channel: Active questions tagged python - Stack Overflow

Hadoop jar command to run python mapper and reducer


I'm trying to run a mapper and reducer to get the year and temperature as keys and values from zip files in my folder. I was able to write the mapper and reducer to print the values. However, I'm not getting any output in PuTTY. I've added the Python code for the mapper and reducer as well as the commands I used in PuTTY. Please help me find what's wrong. Thank you!

Here are the details:

Mapper:

mapper.py

    import sys
    import zipfile
    import io
    import os
    import re

    input_folder = sys.argv[1]  # Get the input folder path from command-line arguments

    # Iterating over zip files in the input folder
    for filename in os.listdir(input_folder):
        if filename.endswith('.zip'):
            with zipfile.ZipFile(os.path.join(input_folder, filename), 'r') as zip_file:
                for inner_filename in zip_file.namelist():
                    with zip_file.open(inner_filename) as file:
                        for line in io.TextIOWrapper(file):
                            # Process each line from the file
                            val = line.strip()
                            (year, temp) = (val[15:19], int(val[87:92]))
                            if temp == 9999:
                                sys.stderr.write("reporter:counter:Temperature,Missing,1\n")
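For comparison, here is a minimal sketch of a mapper in the shape Hadoop Streaming works with: it reads records from standard input and writes tab-separated key/value pairs to standard output, which the mapper above never does. It assumes each input line is already an uncompressed fixed-width record using the same column offsets as above, and it skips missing readings after counting them, which is a design choice rather than something the original code does. `parse_record` and `run_mapper` are hypothetical helper names.

```python
import sys

MISSING = 9999  # sentinel used above for a missing temperature reading


def parse_record(line):
    """Return (year, temp) from one fixed-width record, or None if unusable."""
    val = line.rstrip("\n")
    if len(val) < 92:
        return None  # line too short to hold both fields
    try:
        return val[15:19], int(val[87:92])
    except ValueError:
        return None  # temperature field is not an integer


def run_mapper(lines, out=sys.stdout, err=sys.stderr):
    """Write 'year<TAB>temp' for each valid record; count missing readings."""
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue
        year, temp = record
        if temp == MISSING:
            # Same counter line the original mapper writes to stderr
            err.write("reporter:counter:Temperature,Missing,1\n")
            continue  # assumption: missing readings are not emitted
        out.write(f"{year}\t{temp}\n")

# In a mapper script, the final line would be: run_mapper(sys.stdin)
```

Keeping the parsing in its own function makes it possible to check the column offsets against a few sample lines before involving Hadoop at all.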

Reducer:

    #reducer
    import sys

    # Function to process the input key and values
    def process_input(key, values):
        # Print the key and all the associated values
        for value in values:
            print(key, value)

    # Initializing variables to hold key-value pairs
    current_key = None
    current_values = []

    # Iterating over lines of input received from mapper
    for line in sys.stdin:
        # Splitting the line into key and value
        key, value = line.strip().split('\t', 1)

        # If the key has changed, process the previous key-value pair
        if key != current_key:
            if current_key is not None:
                process_input(current_key, current_values)
            current_key = key
            current_values = []

        # Add the value to the list of values for the current key
        current_values.append(value)

    # Processing the last key-value pair
    if current_key is not None:
        process_input(current_key, current_values)
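The same grouping can also be sketched with `itertools.groupby`, which relies on the reducer receiving its lines already sorted by key, as Hadoop Streaming delivers them. This version skips lines that contain no tab, whereas the reducer above would raise a `ValueError` on such a line. `parse_pairs` and `run_reducer` are hypothetical names chosen for illustration.

```python
import sys
from itertools import groupby


def parse_pairs(lines):
    """Yield (key, value) from 'key<TAB>value' lines, skipping malformed ones."""
    for line in lines:
        key, sep, value = line.rstrip("\n").partition("\t")
        if sep:  # no tab means the line is not a key/value pair
            yield key, value


def run_reducer(lines, out=sys.stdout):
    """Write every value next to its key, relying on input sorted by key."""
    for key, group in groupby(parse_pairs(lines), key=lambda pair: pair[0]):
        for _, value in group:
            out.write(f"{key} {value}\n")

# In a reducer script, the final line would be: run_reducer(sys.stdin)
```

Because `groupby` only merges adjacent equal keys, feeding it unsorted input would split one key into several groups, so it is only a drop-in replacement when the sort guarantee holds.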

Commands used:

    # setting the HADOOP_CLASSPATH environment variable
    export HADOOP_CLASSPATH=/home/student93/

    # copying CourseProjectData file and mapper and reducer python files from server hard drive to hdfs
    hdfs dfs -copyFromLocal /home/student93/Data /home/93student93/
    hdfs dfs -copyFromLocal /home/student93/project_mapper1.py /home/93student93/
    hdfs dfs -copyFromLocal /home/student93/project_reducer1.py /home/93student93/

    # hadoop jar command to run mapper and reducer on the input and saving to output
    hadoop jar hadoop-streaming-2.9.0.jar \
        -input /home/93student93/Data \
        -output /home/93student93/temperatures_years \
        -mapper project_mapper1.py \
        -reducer project_reducer1.py \
        -file project_mapper1.py \
        -file project_reducer1.py
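Before submitting a job like this, the map and reduce logic can be exercised without a cluster. The sketch below is a rough in-memory imitation of the map, sort-by-key, and reduce phases, not a model of how Hadoop actually schedules work. All three function names are hypothetical, and the map step assumes the same column offsets as the mapper above.

```python
from itertools import groupby


def simulate_streaming(records, map_fn, reduce_fn):
    """Run map, sort-by-key, then reduce in memory, like a one-node job."""
    # Map phase: each record may produce zero or more (key, value) pairs
    pairs = [pair for record in records for pair in map_fn(record)]
    # Shuffle/sort phase: pairs reach the reducer grouped by key
    pairs.sort(key=lambda pair: pair[0])
    # Reduce phase: call reduce_fn once per key with all of its values
    results = []
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        results.extend(reduce_fn(key, [value for _, value in group]))
    return results


def year_temp_map(record):
    """Hypothetical map step: year at [15:19], temperature at [87:92]."""
    if len(record) >= 92:
        yield record[15:19], int(record[87:92])


def echo_reduce(key, values):
    """Hypothetical reduce step: pass every (key, value) through unchanged."""
    return [(key, value) for value in values]
```

If a dry run like this returns an empty list for sample records, the problem lies in the Python logic rather than in the `hadoop jar` invocation.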

I tried changing both the Python code, since it deals with zip files, and the hadoop jar command, but I couldn't get the expected result of extracting and printing the years and temperatures from the zip files. After entering the hadoop jar command, the last message shown is INFO mapreduce.Job: Running job: and there is no further output.

