I am moving from a manual setup of my RAG solution in Azure to setting everything up programmatically with the Azure Python SDK. I have a container with a single PDF. With the manual setup, I see that the document count under the created index is 401 when the chunking is set to 256. When using my custom skillset:

    from azure.search.documents.indexes.models import (
        InputFieldMappingEntry,
        OutputFieldMappingEntry,
        SplitSkill,
    )

    split_skill = SplitSkill(
        name="split",
        description="Split skill to chunk documents",
        context="/document",
        text_split_mode="pages",
        default_language_code="en",
        maximum_page_length=300,  # why cannot this be set to 256 if I can do this with a manual setup?
        page_overlap_length=30,
        inputs=[
            InputFieldMappingEntry(name="text", source="/document/content"),
        ],
        outputs=[
            OutputFieldMappingEntry(name="textItems", target_name="pages"),
        ],
    )

I get a document count of 271. I want to mimic my manual chunking setup as closely as possible, since it already performs well. What am I missing?
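
For reference, here is a rough sketch of how I would expect the chunk count to depend on the page length and overlap. It assumes a naive fixed-size sliding window and that both setups measure length in the same unit. Neither assumption necessarily holds for `SplitSkill` or for the manual setup, and `estimate_chunk_count` is a hypothetical helper of mine, not part of the SDK.

```python
import math


def estimate_chunk_count(text_length: int, page_length: int, overlap: int) -> int:
    """Estimate the number of chunks for a naive sliding window.

    Assumption: every chunk is exactly `page_length` units long and each
    new chunk starts `page_length - overlap` units after the previous one.
    A real splitter may respect sentence boundaries and produce more chunks.
    """
    if page_length <= overlap:
        raise ValueError("page_length must be greater than overlap")
    if text_length <= 0:
        return 0
    if text_length <= page_length:
        return 1
    step = page_length - overlap
    # One chunk covers the first page; each further step covers `step` new units.
    return 1 + math.ceil((text_length - page_length) / step)


if __name__ == "__main__":
    # Hypothetical document length, only to compare the two settings.
    hypothetical_length = 70_000
    for page_length, overlap in [(256, 30), (300, 30)]:
        count = estimate_chunk_count(hypothetical_length, page_length, overlap)
        print(f"page_length={page_length}, overlap={overlap}: {count} chunks")
```

If the two setups used the same unit, going from 256 to 300 should change the count only modestly under this model, which is why the gap between 401 and 271 surprises me.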