I'm trying to use a function from the LangChain text splitters library to "chunk" (divide) a massive text file containing sci-fi books. I want to split it into chunks of a fixed length, with a fixed number of overlapping characters between consecutive chunks.
This is my code:
from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(chunk_size=30, chunk_overlap=5)

text_raw = """Water is life's matter and matrix, mother, and medium. There is no life without water.
Save water, secure the future.
Conservation is the key to a sustainable water supply.
Every drop saved today is a resource for tomorrow.
Let's work together to keep our rivers flowing and our oceans blue."""

chunks = text_splitter.split_text(text_raw)
print(chunks)
print(f'\n\n {len(chunks)}')

But this is my output:
["Water is life's matter and matrix, mother, and medium. There is no life without water.\nSave water, secure the future.\nConservation is the key to a sustainable water supply.\nEvery drop saved today is a resource for tomorrow.\nLet's work together to keep our rivers flowing and our oceans blue."]

 1

My intention is to split the text every 30 characters, with each chunk starting with the last 5 characters of the previous one.
For instance, if this is one chunk:

'This is one Chunk after text splitting ABC'

Then I want my following chunk to be something like:

'splitting ABC This is my Second Chunk ---'

Notice how the beginning of the next chunk overlaps the last characters of the previous chunk?
That's what I'm looking for, but it's obvious that this is not how the function works. I'd be very thankful if you could help me out. I am very new to LangChain; I have checked the official documentation but haven't found an example or tutorial like the one I'm looking for.
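To make the goal concrete, here is a plain-Python sketch of the behaviour I'm after. The helper name `split_with_overlap` is made up for illustration and is not part of LangChain; it simply slices the string into fixed-size windows, each starting `overlap` characters before the previous one ends.

```python
def split_with_overlap(text: str, chunk_size: int = 30, overlap: int = 5) -> list[str]:
    """Cut `text` into windows of at most `chunk_size` characters.

    Each window after the first begins with the last `overlap`
    characters of the window before it.
    Hypothetical helper for illustration only, not a LangChain API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []

    # Distance between the start positions of consecutive windows.
    step = chunk_size - overlap
    # Stop early enough that the last window is not just a repeat
    # of the overlap already contained in the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


sample = "Water is life's matter and matrix, mother, and medium."
chunks = split_with_overlap(sample, chunk_size=30, overlap=5)
for chunk in chunks:
    print(repr(chunk))
```

What I'd like to know is whether one of the LangChain splitters can be configured to behave like this.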
It would also be super helpful if you could point me to a LangChain function for saving the chunks locally, or tell me whether we have to stick to base Python to do that.
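If base Python turns out to be the answer, I assume it would look roughly like the sketch below, which writes the list of chunks to a JSON file and reads it back. The function names `save_chunks` and `load_chunks` and the file name `chunks.json` are hypothetical, not LangChain APIs.

```python
import json
import tempfile
from pathlib import Path


def save_chunks(chunks: list[str], path: str) -> None:
    """Write the chunks to `path` as a JSON array of strings."""
    Path(path).write_text(
        json.dumps(chunks, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )


def load_chunks(path: str) -> list[str]:
    """Read back a list of chunks written by `save_chunks`."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Round trip in a temporary directory so the demo leaves no files behind.
demo_chunks = ["Water is life's matter and mat", "d matrix, mother, and medium."]
with tempfile.TemporaryDirectory() as tmp_dir:
    target = str(Path(tmp_dir) / "chunks.json")
    save_chunks(demo_chunks, target)
    restored = load_chunks(target)

print(restored)
```

JSON keeps the chunk boundaries intact even when a chunk contains newlines, which a one-chunk-per-line text file would not.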