Channel: Active questions tagged python - Stack Overflow

Working with nested JSON in Great Expectations

I am trying to validate nested JSON with Great Expectations. I have it working with the following expectation suite, after adding flattened columns to the batch:

nestedjson_expectations_suite.json

{
  "data_asset_type": "Dataset",
  "expectation_suite_name": "default",
  "expectations": [
    {
      "expectation_type": "expect_column_values_to_be_between",
      "kwargs": {"column": "id", "max_value": 100, "min_value": 1},
      "meta": {}
    },
    {
      "expectation_type": "expect_column_values_to_be_unique",
      "kwargs": {"column": "id"},
      "meta": {}
    },
    {
      "expectation_type": "expect_column_values_to_match_regex",
      "kwargs": {"column": "name", "regex": "^[A-Za-z\\s]+$"},
      "meta": {}
    },
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {"column": "name"},
      "meta": {}
    },
    {
      "expectation_type": "expect_column_values_to_be_between",
      "kwargs": {"column": "details_age", "max_value": 120, "min_value": 0},
      "meta": {}
    },
    {
      "expectation_type": "expect_column_values_to_match_regex",
      "kwargs": {"column": "details_address_city", "regex": "^[A-Za-z\\s]+$"},
      "meta": {}
    },
    {
      "expectation_type": "expect_column_values_to_match_regex",
      "kwargs": {"column": "details_address_state", "regex": "^[A-Za-z\\s]+$"},
      "meta": {}
    }
  ],
  "ge_cloud_id": null,
  "meta": {"great_expectations_version": "0.18.8"}
}

nested.py

import great_expectations as ge
import openpyxl

# Load the Great Expectations context
context = ge.data_context.DataContext("../.")

# Load the JSON data into a Pandas DataFrame
data_file_path = "../../data/nested.json"
df = ge.read_json(data_file_path)

# Create a batch of data
batch = ge.dataset.PandasDataset(df)

# Create new columns for nested values
batch["details_age"] = batch["details"].apply(lambda x: x.get("age"))
batch["details_address_city"] = batch["details"].apply(lambda x: x.get("address").get("city"))
batch["details_address_state"] = batch["details"].apply(lambda x: x.get("address").get("state"))

result = batch.validate("nestedjson_expectations_suite.json")
print(result)
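For reference, the per-column lambdas above could be generalized. The following is only a minimal sketch of that idea, not something taken from the Great Expectations documentation: `flatten` is a hypothetical helper name, and it assumes every nested value is a plain dict (no lists of objects). It joins nested keys with an underscore, which yields the same column names the suite expects (`details_age`, `details_address_city`, `details_address_state`).

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Recursively flatten nested dicts, joining key paths with `sep`.

    Hypothetical helper for illustration; lists are left untouched.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into the nested object and merge its flattened keys
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Same records as nested.json
records = [
    {"id": 1, "name": "John Doe",
     "details": {"age": 30, "address": {"city": "New York", "state": "NY"}}},
    {"id": 2, "name": "Jane Smith",
     "details": {"age": 25, "address": {"city": "San Francisco", "state": "CA"}}},
]

flat_records = [flatten(r) for r in records]
print(flat_records[0])
```

If pandas is already in use, `pandas.json_normalize(records, sep="_")` should produce the same flattened column names directly as a DataFrame.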

nested.json

[
  {
    "id": 1,
    "name": "John Doe",
    "details": {
      "age": 30,
      "address": {"city": "New York", "state": "NY"}
    }
  },
  {
    "id": 2,
    "name": "Jane Smith",
    "details": {
      "age": 25,
      "address": {"city": "San Francisco", "state": "CA"}
    }
  }
]

The documentation is not very clear on how to proceed with batches:

  1. How do I run this as a checkpoint?
  2. How do I build Data Docs?
  3. How do I create a validator?
  4. How do I export the results to an output file, for example an Excel file rather than JSON?
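To make item 4 concrete, here is a rough sketch of the kind of export I have in mind. It assumes the validation result can be turned into a plain dict (for example via `result.to_json_dict()`) with a top-level `results` list whose entries carry `success`, `expectation_config` and `result` keys; that shape is my assumption, so it should be checked against the actual output. The names `summarize_results` and `export_to_excel` are hypothetical, and the sample dict is hand-written for illustration, not real output.

```python
from typing import Any, Dict, List


def summarize_results(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a validation-result dict into one flat row per expectation.

    Assumes the (unverified) shape described above; missing keys become None.
    """
    rows: List[Dict[str, Any]] = []
    for item in result.get("results", []):
        config = item.get("expectation_config", {})
        kwargs = config.get("kwargs", {})
        outcome = item.get("result", {})
        rows.append({
            "expectation_type": config.get("expectation_type"),
            "column": kwargs.get("column"),
            "success": item.get("success"),
            "unexpected_count": outcome.get("unexpected_count"),
        })
    return rows


def export_to_excel(rows: List[Dict[str, Any]], path: str) -> None:
    """Write the rows to an .xlsx file; requires pandas and openpyxl."""
    import pandas as pd  # imported lazily so the summary step has no dependencies

    pd.DataFrame(rows).to_excel(path, index=False)


# Hand-written sample in the assumed shape (not real Great Expectations output)
sample_result = {
    "success": False,
    "results": [
        {
            "success": True,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_be_unique",
                "kwargs": {"column": "id"},
            },
            "result": {"unexpected_count": 0},
        },
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_match_regex",
                "kwargs": {"column": "details_address_state", "regex": "^[A-Za-z\\s]+$"},
            },
            "result": {"unexpected_count": 1},
        },
    ],
}

rows = summarize_results(sample_result)
for row in rows:
    print(row)
```

Calling `export_to_excel(rows, "validation_results.xlsx")` would then write the summary to a spreadsheet; the file name is only a placeholder.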

I tried to modify the batch through the batch request, but that did not seem to work:

validator = context.get_validator(
    batch_request=BatchRequest(**batch_request),
    expectation_suite_name=expectation_suite_name,
)

Any suggestions would be appreciated.

