Channel: Active questions tagged python - Stack Overflow

Working Around Concurrency Limits in AWS Glue


I have a question about how best to manage concurrent job runs in AWS Glue.

I have a job defined like so:

job = client.create_job(
    Name='JOB_NAME',
    Role='the-role-name',
    ExecutionProperty={'MaxConcurrentRuns': 25},
    Command={
        'Name': 'glueetl',
        'ScriptLocation': script_location,
        'PythonVersion': '3'
    },
    Tags={'Application': 'app', 'Project': 'proj'},
    GlueVersion='2.0',
    WorkerType='G.2X',
    NumberOfWorkers=50
)

I want to start about 1000 runs of this job, like so:

def run_job(f):
    response = client.start_job_run(
        JobName=JOB_NAME,
        Arguments={
            '--start_date': start_date,
            '--end_date': end_date,
            '--factor': f
        }
    )
    return response

for f in factors:
    response = run_job(f)
    print(f"response: {response}")

This approach has two problems: (1) firing off all of these requests at once throws a throttling error, and (2) even if I sleep between job starts, I still run up against the concurrency limit, which is 50.
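One common way to handle the throttling side of this is to wrap each start_job_run call in a retry loop with exponential backoff and jitter. A minimal sketch, with hypothetical names (submit_with_backoff is not a boto3 API; in practice the retryable exceptions would be the throttling/ConcurrentRunsExceeded errors raised by the Glue client):

    import random
    import time

    def submit_with_backoff(submit, max_retries=8, base_delay=1.0,
                            retryable=(Exception,)):
        """Call submit(), retrying on retryable errors with exponential
        backoff plus jitter. Re-raises after max_retries attempts."""
        for attempt in range(max_retries):
            try:
                return submit()
            except retryable:
                if attempt == max_retries - 1:
                    raise
                # Sleep base_delay * 2^attempt, randomized so that many
                # callers retrying at once do not hammer the API in sync.
                time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

You would then call it as, for example, submit_with_backoff(lambda: run_job(f), retryable=(client.exceptions.ConcurrentRunsExceededException,)), assuming you catch whichever exception classes your throttling errors actually surface as.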

Does anyone know an easy way to work around these issues?
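For the concurrency-limit side, one option is to gate each submission on the number of runs currently active, only starting a new run when there is headroom. A minimal sketch under my own assumptions (run_when_capacity and count_running are hypothetical names; count_running would in practice page through glue.get_job_runs and count runs in STARTING/RUNNING states):

    import time

    def run_when_capacity(start, count_running, limit=25, poll_seconds=30):
        """Block until fewer than `limit` runs are active, then start one.

        `start` launches a single run (e.g. wraps start_job_run);
        `count_running` returns the number of currently active runs.
        """
        while count_running() >= limit:
            # Poll rather than sleep a fixed time between starts, so
            # submission speeds up when runs finish early.
            time.sleep(poll_seconds)
        return start()

Keeping limit a little below MaxConcurrentRuns leaves slack for runs that have been started but are not yet visible in the run listing.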


