Channel: Active questions tagged python - Stack Overflow

Convert HTML to PDF - PDFKit File Size too Large

I have one complete, static HTML file (it doesn't rely on any calls to the internet) that's under 900 KB in size, and I am currently using PDFKit to create a single PDF from it that ends up being about 100 pages long. The PDF is 30-40 MB, which frankly is way too large considering each page is just text and a small image repeated 4 times.

The way I create the PDF is pretty simple.

installation:

    apt-get install wkhtmltopdf -y
    pip install pdfkit==1.0.0
    pip install pypdf2==2.10.5

code:

    import pdfkit

    def html_to_pdf(html_path: str, pdf_path: str):
        pdfkit.from_file(
            input=html_path,
            output_path=pdf_path,
            configuration=pdfkit.configuration(),
            options={
                'zoom': '0.9588',  # seemed to be the right zoom through trial and error
                'disable-smart-shrinking': '',
                'page-size': 'Letter',
                'orientation': 'Landscape',
                'margin-top': '0',
                'margin-right': '0',
                'margin-left': '0',
                'margin-bottom': '0',
                'encoding': "UTF-8",
            })

    html_to_pdf(".my_html_file.html", "my_pdf_file.pdf")

I've tried resizing the image I pull in, shrinking it to about 30% of its original size, but there was no change at all in the size of the resulting PDF.

What I notice about the PDFs I generate with PDFKit is that they're not really PDFs in the usual sense: you can't search the text, highlight text blocks, etc. Each page acts like it's essentially one big image. When I print the HTML from my browser and save that as a PDF, I can do all of those things.
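
One way to confirm the "big image on every page" impression programmatically is to check whether each page has an extractable text layer. The sketch below assumes the PyPDF2 2.x release installed above (`PdfReader`, `page.extract_text()`); the function names are illustrative, and an empty result only suggests, rather than proves, that a page was rasterized.

```python
from typing import Iterable, List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the extractable text of each page (empty string if none)."""
    # Imported lazily so the pure helper below works without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def count_pages_with_text(page_texts: Iterable[str]) -> int:
    """Count pages whose extracted text contains non-whitespace characters."""
    return sum(1 for text in page_texts if text.strip())


# Usage (hypothetical file name):
#   texts = extract_page_texts("my_pdf_file.pdf")
#   print(f"{count_pages_with_text(texts)} of {len(texts)} pages have a text layer")
```

If nearly every page comes back empty, that would be consistent with the pages being stored as images, which would also go some way toward explaining the file size.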

I need to build this programmatically, so it has to be automated. Is there some setting I'm missing with PDFKit?

It's also worth noting that I have access to the actual string I use to make the HTML, so I don't have to read an HTML file. Would that make a difference?
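
For reference, pdfkit can also render directly from a string via `pdfkit.from_string`, which takes the same `options` dictionary as `from_file`. A minimal sketch, reusing the options from the call above; `build_options` and `html_string_to_pdf` are illustrative names. This only changes how the HTML reaches wkhtmltopdf, so there is no obvious reason it would change the output size by itself, and relative image paths may resolve differently when there is no source file on disk.

```python
from typing import Dict


def build_options() -> Dict[str, str]:
    """wkhtmltopdf options copied from the from_file call in the question."""
    return {
        "zoom": "0.9588",
        "disable-smart-shrinking": "",
        "page-size": "Letter",
        "orientation": "Landscape",
        "margin-top": "0",
        "margin-right": "0",
        "margin-left": "0",
        "margin-bottom": "0",
        "encoding": "UTF-8",
    }


def html_string_to_pdf(html: str, pdf_path: str) -> None:
    """Render an in-memory HTML string to a PDF file with pdfkit."""
    # Imported lazily; requires pdfkit and the wkhtmltopdf binary.
    import pdfkit

    pdfkit.from_string(html, pdf_path, options=build_options())


# Usage (hypothetical content and file name):
#   html_string_to_pdf("<html><body><p>Hello</p></body></html>", "my_pdf_file.pdf")
```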

I'm also open to not using PDFKit at all. I just need something that doesn't require a license.
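
Since printing from the browser already yields a searchable PDF, one candidate is to automate that same step with a headless Chromium-family browser, which is open source. A rough sketch under assumptions: the binary name `chromium` is a guess (it may be `chromium-browser` or `google-chrome` on a given system), the helper names are illustrative, and page size and margins are not set here, so they would need to be controlled separately, for example through the page's CSS.

```python
import subprocess
from pathlib import Path
from typing import List


def build_print_command(html_path: str, pdf_path: str, browser: str = "chromium") -> List[str]:
    """Build a headless-browser command that prints an HTML file to PDF."""
    return [
        browser,  # assumed binary name; adjust for the local install
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={pdf_path}",
        Path(html_path).resolve().as_uri(),  # file:// URL of the input
    ]


def html_to_pdf_via_browser(html_path: str, pdf_path: str) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_print_command(html_path, pdf_path), check=True)


# Usage (hypothetical file names):
#   html_to_pdf_via_browser("my_html_file.html", "my_pdf_file.pdf")
```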

