Channel: Active questions tagged python - Stack Overflow

Airflow HiveOperator Result Set


I'm new to both Airflow and Python, and I'm trying to configure a scheduled report. The report needs to pull data from Hive and email the results.

My code thus far:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.hive_operator import HiveOperator

default_args = {
    'owner': 'me',
    'depends_on_past': False,
    'start_date': datetime(2015, 1, 1),
    'email': ['email@example.com'],
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 3,
    'retry_delay': timedelta(hours=2)
}

dag = DAG(
    dag_id='hive_report',
    max_active_runs=1,
    default_args=default_args,
    schedule_interval='@once'
)

query = """
    #query goes here
"""

run_hive_query = HiveOperator(
    task_id="fetch_data",
    hql=query,
    dag=dag
)

I'm pretty sure I need to add an EmailOperator task to send the results, since the DAG as written only seems to be configured to email on failure or retry.

My question is this: what does the Hive operator do with the result set? What is the best way to pass the result set from one task to another?
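
For reference, here is a minimal sketch of one direction this could take, not a definitive answer. As far as I can tell, HiveOperator only runs the HQL through the Hive CLI and writes the output to the task log, so nothing is handed to downstream tasks. The sketch assumes an Airflow 1.x install where `HiveServer2Hook` (with its default connection) can reach the cluster, fetches the rows in a PythonOperator whose return value is pushed to XCom, and pulls that value into the templated `html_content` of an EmailOperator. The names `rows_to_html`, `fetch_report`, `add_report_tasks` and the task id `send_report` are made up for illustration.

```python
from html import escape


def rows_to_html(rows):
    """Render an iterable of row tuples as a bare HTML table."""
    body = "".join(
        "<tr>"
        + "".join("<td>{}</td>".format(escape(str(cell))) for cell in row)
        + "</tr>"
        for row in rows
    )
    return "<table>" + body + "</table>"


def fetch_report(hql, get_records=None):
    """Run the query and return the result as HTML.

    get_records is injectable so the function can be tried without a
    Hive cluster; by default it uses HiveServer2Hook (assumption:
    Airflow 1.x import path and the default HiveServer2 connection).
    """
    if get_records is None:
        from airflow.hooks.hive_hooks import HiveServer2Hook
        get_records = HiveServer2Hook().get_records
    return rows_to_html(get_records(hql))


def add_report_tasks(dag, hql):
    """Hypothetical helper: attach a fetch task and an email task to dag."""
    # Imported lazily so the helpers above work without Airflow installed.
    from airflow.operators.python_operator import PythonOperator
    from airflow.operators.email_operator import EmailOperator

    # The callable's return value is pushed to XCom automatically.
    fetch = PythonOperator(
        task_id="fetch_data",
        python_callable=fetch_report,
        op_kwargs={"hql": hql},
        dag=dag,
    )
    # html_content is a templated field, so the XCom value can be pulled in.
    send = EmailOperator(
        task_id="send_report",
        to=["email@example.com"],
        subject="Hive report",
        html_content="{{ ti.xcom_pull(task_ids='fetch_data') }}",
        dag=dag,
    )
    fetch >> send
    return fetch, send
```

Calling `add_report_tasks(dag, query)` in place of the HiveOperator task would wire both tasks into the DAG above. XCom values are stored in the Airflow metadata database, so this only seems reasonable for small result sets; a larger report would probably be better written to a file or table and attached or linked instead.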

