Channel: Active questions tagged python - Stack Overflow

Use Pydantic For Programmatic Data Transformation?


I'm exploring plugging Pydantic into an ETL application that takes multiple sources as inputs and transforms them into an Elasticsearch document as output.

The problem can be framed as a generic transformation problem: a given target field may be derived from multiple source fields and a given source field may be used to derive multiple target fields. What I would like to do is something like this:

    output = Model(source)
    upload(output.to_json())

Under the hood, each output field I defined in the model would take all the source fields it needs and combine them into the result.

I have been able to force the result I want out of Pydantic using validators with pre=True: when I instantiate the model, I do it like this:

    output = Model(a=input, b=input, c=input)

Then each validator grabs the appropriate fields from the input and does the necessary transformations. There is also nesting, so sometimes in the course of producing, say, a, another Pydantic model may need to be instantiated using a similar pattern.

In a specific case, a validator might look like this:

    @validator('brand_with_identifier', pre=True)
    @classmethod
    def transform_title(cls, value: SourceDoc, values, **kwargs):
        return f'{value.brand} {value.identifier}'

The crucial point is that many fields from the source I pass at instantiation may be used to compute a single output field. For simplicity of development, it would be nice not to clutter my code with routing every source field to every target field that needs it. I just need an orchestrator that invokes the transformer I specify for each target field defined in the model, and I am trying to make Pydantic fit the bill.
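To make the idea concrete, here is a minimal sketch of the kind of orchestration I mean, written with standard-library dataclasses rather than Pydantic. Everything in it is made up for illustration: the `price_cents` source field, the `TargetDoc` class, the `derives` decorator, the `TRANSFORMERS` registry, and the `from_source` constructor are hypothetical names, not anything Pydantic provides.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict


@dataclass
class SourceDoc:
    # Hypothetical source record; price_cents is invented for this sketch.
    brand: str
    identifier: str
    price_cents: int


# Registry mapping each target field name to the function that derives it.
TRANSFORMERS: Dict[str, Callable[[SourceDoc], Any]] = {}


def derives(field_name: str):
    """Register the decorated function as the transformer for one target field."""
    def register(func: Callable[[SourceDoc], Any]) -> Callable[[SourceDoc], Any]:
        TRANSFORMERS[field_name] = func
        return func
    return register


@dataclass
class TargetDoc:
    brand_with_identifier: str
    price: float

    @classmethod
    def from_source(cls, source: SourceDoc) -> "TargetDoc":
        # Call the registered transformer for every declared target field,
        # handing each one the whole source document.
        return cls(**{f.name: TRANSFORMERS[f.name](source) for f in fields(cls)})


@derives("brand_with_identifier")
def make_brand_with_identifier(source: SourceDoc) -> str:
    # One target field built from two source fields.
    return f"{source.brand} {source.identifier}"


@derives("price")
def make_price(source: SourceDoc) -> float:
    return source.price_cents / 100


source = SourceDoc(brand="Acme", identifier="X-100", price_cents=1999)
output = TargetDoc.from_source(source)
```

Here each transformer sees the whole source, so no per-field routing is needed at the call site, and the plain constructor `TargetDoc(...)` stays independent of any particular input. What this sketch lacks is the validation and serialization I would get from Pydantic, which is why I would prefer to express the same thing there.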

My main concern is that this approach couples the instantiation of the model to a particular input, which seems highly undesirable. What is extremely desirable, though, is the ability to programmatically produce all the fields in the output from a given input, with hooks that let me define how each field should be derived from that input.

Is there a way to leverage Pydantic for my use case? (Whenever I look up how to use validators for transformation online, the assumption is always that a validator will transform a single input, which doesn't fit the general form of my problem: to programmatically transform from multiple source fields into multiple target fields.)

