Channel: Active questions tagged python - Stack Overflow

Retrieving XMP metadata from PDF files with Python xmptools

I would like to use Python to retrieve metadata stored in PDF files. I am trying to use Python xmptools, but I find that I cannot extract all of the metadata. For example, this paper is available in PDF format. The following script tries to extract its metadata:

from xmptools import XMPMetadata, DC
xmp = XMPMetadata.fromFile("Leonard_2015_Comment_on_‘Dimensionless_units_in_the_SI’.pdf")[0]
print( xmp.getContainerItems(DC.publisher) )

This works fine. The result is [rdflib.term.Literal('IOP Publishing')]. However, if I change the last line to

print( xmp.getContainerItems(DC.identifier) )

then I get None as a result.

I think this may be due to the structure of the XML inside the PDF file. The XML relevant to these two queries is:

<dc:publisher>
  <rdf:Bag>
    <rdf:li>IOP Publishing</rdf:li>
  </rdf:Bag>
</dc:publisher>
<dc:identifier>doi:10.1088/0026-1394/52/4/613</dc:identifier>

In the case of publisher, the value is wrapped in an RDF container (an rdf:Bag holding an rdf:li item), whereas identifier holds its value directly as plain text, with no RDF container around it.
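To illustrate the difference independently of xmptools, here is a minimal sketch that parses the same fragment with the standard library only. It does not use xmptools at all. The rdf:Description wrapper around the fragment and the helper name property_values are my own additions for illustration, so treat this as a sketch under those assumptions rather than a description of how xmptools works internally.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for Dublin Core and RDF.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# The fragment from the PDF, wrapped in an rdf:Description element
# (an assumption made here so the fragment parses on its own).
SNIPPET = """
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/"
                 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <dc:publisher><rdf:Bag><rdf:li>IOP Publishing</rdf:li></rdf:Bag></dc:publisher>
  <dc:identifier>doi:10.1088/0026-1394/52/4/613</dc:identifier>
</rdf:Description>
"""

# RDF container element names that can wrap a list of rdf:li items.
CONTAINERS = ("rdf:Bag", "rdf:Seq", "rdf:Alt")


def property_values(description, qname):
    """Return a property's values as a list of strings.

    Hypothetical helper: handles both a container (rdf:Bag etc.)
    and a simple text value. Returns [] if the property is absent.
    """
    element = description.find(qname, NS)
    if element is None:
        return []
    # Container case: collect the text of each rdf:li item.
    for container in CONTAINERS:
        items = element.findall(f"{container}/rdf:li", NS)
        if items:
            return [(li.text or "").strip() for li in items]
    # Simple case: the value is the element's own text.
    text = (element.text or "").strip()
    return [text] if text else []


description = ET.fromstring(SNIPPET.strip())
print(property_values(description, "dc:publisher"))
print(property_values(description, "dc:identifier"))
```

With this fragment, the publisher value is found through the rdf:Bag branch and the identifier value through the plain-text branch, which is the distinction I suspect getContainerItems is not making.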

Is there a way for xmptools to read simple entries like this, where the value is not wrapped in an RDF container?

