I'm trying to use the Refextract library to extract citations from the 'references' section of a PDF academic paper.
Here is my code:
    from refextract import extract_references_from_file

    def extract_citations(uploaded_file_location):
        try:
            citations = extract_references_from_file(uploaded_file_location)
            citations = merge_citations(citations)
            unique_citations = {}
            for citation in citations:
                raw_ref = citation['raw_ref'][0]
                if raw_ref not in unique_citations:
                    unique_citations[raw_ref] = citation
                else:
                    pass
            unique_citations_list = list(unique_citations.values())
            return unique_citations_list
        except Exception as e:
            print(f"An error occurred: {e}")
            return []
Please ignore the merge_citations function. The program runs successfully on my local machine (Ubuntu 22.04), but when I run it in Docker (using Docker Compose), the following error appears in the Docker logs:
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
Here is my Dockerfile:
    FROM python:latest
    WORKDIR /app
    COPY requirements.txt ./
    RUN pip install -r requirements.txt
    COPY . .
    CMD ["python", "main.py"]
I want to access a file in a Google Cloud bucket. I have already debugged and verified the file path, and it is correct, so the primary problem seems to be inside the extract_references_from_file function.
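To narrow down where the None comes from, a small diagnostic like the following could be run inside the container before calling extract_references_from_file. This is only a sketch under assumptions: the helper name diagnose_extraction_inputs is hypothetical, and the guess that an external tool such as pdftotext has to be on the PATH is an assumption about the environment, not something the logs above confirm.

```python
import os
import shutil


def diagnose_extraction_inputs(file_path, required_tools=("pdftotext",)):
    """Report which input might be None or missing before extraction.

    Hypothetical helper: `required_tools` defaults to an assumed
    dependency ("pdftotext"); adjust it to whatever the container
    is actually expected to provide.
    """
    report = {
        # False when file_path is None or some other non-path value.
        "path_is_pathlike": isinstance(file_path, (str, bytes, os.PathLike)),
        "path_exists": False,
        "missing_tools": [],
    }
    if report["path_is_pathlike"]:
        report["path_exists"] = os.path.isfile(file_path)
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            report["missing_tools"].append(tool)
    return report
```

Calling print(diagnose_extraction_inputs(uploaded_file_location)) both on the host and in the container would show whether the two environments differ in the path itself or in the installed tools.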
I've searched for a solution but haven't found one that works for me. Any help fixing this would be appreciated.