I use the Apache Beam Python SDK together with Google Cloud Dataflow. I wrote a class that I use to process some business rules. I ran a pipeline with 9 different business rules, which all use the same class in a slightly different way, but the broad idea is always the same.
Now I added a 10th rule to the pipeline and all of a sudden Google Cloud is giving me the following error:
Root cause: Timed out waiting for an update from the worker. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors#worker-lost-contact. Worker ID: xxxxxxxxx-01180119-o444-harness-gqz6

The problem does not come from the new rule but from the number of rules. I tested this by removing an older rule while still running the new one (9 rules in total), and everything works.
I tried ingesting only half of the input data with the 10 business rules and still got the same error.
I upped the maximum number of workers to 40 and still got the same error (while with 9 rules, 20 workers suffice).
I tried giving the workers more disk space (disk_size_gb=1000) and using more powerful workers (n1-highcpu-4 or n1-highmem-4), but the same error keeps occurring once I add the 10th business rule.
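For reference, this is roughly how I pass those worker settings to the pipeline (a minimal sketch; the flag names are the standard Dataflow pipeline options, the values are the ones I tried):

```python
# Sketch of the Dataflow worker settings described above, expressed as
# pipeline flags. Note: disk_size_gb takes a plain integer, so 1000,
# not "1,000" with a thousands separator.
flags = [
    "--runner=DataflowRunner",
    "--max_num_workers=40",        # upped from 20
    "--disk_size_gb=1000",         # larger worker boot disk
    "--machine_type=n1-highmem-4", # also tried n1-highcpu-4
]
```

These flags would then be fed into the Beam pipeline options when constructing the pipeline.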
Does anybody have an idea what else I could try? Google Cloud is giving almost no info in the logs.