I have a question about how best to manage concurrent job instances in AWS Glue.
I have a job defined like so:
```python
job = client.create_job(
    Name='JOB_NAME',
    Role='the-role-name',
    ExecutionProperty={'MaxConcurrentRuns': 25},
    Command={'Name': 'glueetl', 'ScriptLocation': script_location, 'PythonVersion': '3'},
    Tags={'Application': 'app', 'Project': 'proj'},
    GlueVersion='2.0',
    WorkerType='G.2X',
    NumberOfWorkers=50
)
```

I want to start about 1000 runs of this job like so:
```python
def run_job(f):
    response = client.start_job_run(
        JobName=JOB_NAME,
        Arguments={'--start_date': start_date, '--end_date': end_date, '--factor': f}
    )
    return response

for f in factors:
    response = run_job(f)
    print(f"response: {response}")
```

There are two problems with this approach: (1) firing off all of these requests at once throws a throttling error, and (2) even if I sleep between job starts, I still run up against the concurrent-runs limit, which is 50.
Does anyone know an easy way to work around these issues?
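To frame the kind of answer I'm after: one workaround I've been considering is wrapping each `start_job_run` call in a bounded exponential-backoff retry, so a run that fails on throttling or on the concurrency cap just waits and tries again. This is an untested sketch, not something I have working; `start_with_retry` and `retryable` are my own names, and I'm assuming the failures surface as a boto3 `ClientError` with codes like `ThrottlingException` or `ConcurrentRunsExceededException`:

```python
import time

def start_with_retry(submit, is_retryable, max_attempts=8, base_delay=1.0,
                     sleep=time.sleep):
    """Call submit(); on a retryable failure, back off exponentially and retry.

    submit       -- zero-arg callable, e.g. a lambda wrapping client.start_job_run(...)
    is_retryable -- predicate deciding whether an exception is worth retrying
    """
    for attempt in range(max_attempts):
        try:
            return submit()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise  # non-retryable error, or retries exhausted
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ... between attempts

# Assumed boto3 usage, reusing the names from my job definition above:
#
# from botocore.exceptions import ClientError
#
# def retryable(exc):
#     return isinstance(exc, ClientError) and exc.response['Error']['Code'] in (
#         'ThrottlingException', 'ConcurrentRunsExceededException')
#
# for f in factors:
#     start_with_retry(
#         lambda: client.start_job_run(
#             JobName=JOB_NAME,
#             Arguments={'--start_date': start_date,
#                        '--end_date': end_date,
#                        '--factor': f}),
#         retryable)
```

Because each submission retries until a concurrency slot frees up, this would also indirectly cap the number of in-flight runs, but whether this backoff approach is sensible for ~1000 runs is exactly what I'd like feedback on.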