Converting an HTML page into a PDF file is frequently a crucial use case for businesses, particularly in tasks such as invoice and report generation. However, relying on the print option from a web browser might not always be practical, especially when the PDF generation needs to occur as a background activity. Typically, this task is accomplished using tools like wkhtmltopdf or by running a headless Chrome browser in the background.
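For comparison, both of those conventional routes are usually driven from the command line. The sketch below only builds the argument lists and offers an optional runner. The helper names are hypothetical, and the `chrome` executable name is an assumption (the binary may be called `google-chrome`, `chromium`, or something else on your system). Both tools have to be installed separately.

```python
import subprocess


def wkhtmltopdf_command(url, pdf):
    # wkhtmltopdf takes the input URL followed by the output file name
    return ["wkhtmltopdf", url, pdf]


def headless_chrome_command(url, pdf, chrome="chrome"):
    # --headless runs Chrome without a window;
    # --print-to-pdf=<file> writes the rendered page to that file.
    # The default executable name "chrome" is an assumption.
    return [chrome, "--headless", f"--print-to-pdf={pdf}", url]


def run(command, timeout=60):
    # Hypothetical helper: raises CalledProcessError on a non-zero exit
    # status and TimeoutExpired if the tool runs longer than `timeout` seconds
    subprocess.run(command, check=True, timeout=timeout)
```

Calling `run(wkhtmltopdf_command(url, "page.pdf"))` would then perform the conversion, provided the tool is on your PATH.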
Here, we'll build a basic solution of our own that converts a web page to a PDF. Note that this implementation covers only the fundamental features and is not intended for advanced use.
We will use QtWebEngine, a web rendering engine that can render an HTML page and generate PDF files from it. We will access it through Qt for Python (PySide6), the official Python bindings for Qt.
Before we get started, make sure you have Python installed on your system. Additionally, install the necessary Python packages using the following command:
pip install PySide6
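If you want to confirm that the installation worked before going further, a small check like the following may help. It is only a sketch: the helper name `missing_modules` is made up for this example, and it merely tests whether the modules can be located, not whether they work on your platform.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # ImportError: the parent package of a dotted name is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = ["PySide6.QtWebEngineCore", "PySide6.QtWebEngineWidgets"]
    print("Missing:", missing_modules(required) or "none")
```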
Importing Necessary Modules
Let's begin by importing the required modules. In your Python script, include the following lines:
import sys
from PySide6 import QtCore, QtWidgets, QtWebEngineCore, QtWebEngineWidgets, QtGui
These modules provide the foundation for creating a headless browser and handling GUI components.
Defining the Conversion Function
Now, let's define the function. This function will take a URL and a PDF file name as parameters and automate the process of converting the web page to PDF.
def url_to_pdf(url, pdf):
    # Create a QApplication instance
    app = QtWidgets.QApplication(sys.argv)
    # Set desktop user agent string
    profile = QtWebEngineCore.QWebEngineProfile.defaultProfile()
    profile.setHttpUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36")
    # Create the QWebEngineView with viewport size
    view = QtWebEngineWidgets.QWebEngineView()
    view.resize(1920, 1080)  # Adjust viewport size as needed
    page = QtWebEngineCore.QWebEnginePage(view)
The first part of the function customizes the headless browser by setting a desktop user agent string and adjusting the viewport size.
Define callback functions within the function to handle load and print events:
    # Callback function for handling print finished event
    def handle_print_finished(filename, status):
        print("finished", filename, status)
        app.quit()  # Quit the application after printing
    # Callback function for handling load finished event
    def handle_load_finished(status):
        if status:
            # Adjust print layout (QPageLayout and QPageSize live in QtGui)
            layout = QtGui.QPageLayout()
            layout.setPageSize(QtGui.QPageSize(QtGui.QPageSize.A4))  # Or desired page size
            layout.setOrientation(QtGui.QPageLayout.Landscape)  # If content is wider
            layout.setMargins(QtCore.QMarginsF(0, 0, 0, 0))  # Set zero margins
            page.printToPdf(pdf, layout)
        else:
            print("Failed to load page")
            app.quit()
These functions will be triggered upon the completion of loading the web page and finishing the PDF printing process.
Finally, still inside the function, connect the callbacks to the page's signals, load the URL, and start the event loop:
    # Connect signals and load the page
    page.pdfPrintingFinished.connect(handle_print_finished)
    page.loadFinished.connect(handle_load_finished)
    page.load(QtCore.QUrl(url))
    # Start the application event loop
    sys.exit(app.exec())
Command-Line Usage
Enable command-line usage by checking the number of provided arguments and extracting the URL and PDF file name:
if __name__ == "__main__":
    # Check if the correct number of command-line arguments is provided
    if len(sys.argv) != 3:
        print("Usage: python application.py <url> <name_of_pdf_file>")
        sys.exit(1)
    # Extract URL and PDF file name from command-line arguments
    url = sys.argv[1]
    pdf = sys.argv[2]
    # Call the function to convert the web page to PDF
    url_to_pdf(url, pdf)
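If you would rather not count arguments by hand, the same interface could be expressed with `argparse`, which also gives you `--help` for free. This is a sketch of an alternative, not part of the script in this post. The function name `parse_args` and the decision to accept only http(s) URLs are assumptions made for the example.

```python
import argparse
from urllib.parse import urlparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Convert a web page to a PDF file.")
    parser.add_argument("url", help="address of the page to convert")
    parser.add_argument("pdf", help="name of the PDF file to write")
    args = parser.parse_args(argv)
    # Assumption for this sketch: only http(s) URLs with a host are accepted
    parts = urlparse(args.url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        # parser.error prints the usage text and exits with status 2
        parser.error(f"not an http(s) URL: {args.url}")
    return args
```

With this in place, the main block would call `args = parse_args()` and pass `args.url` and `args.pdf` to `url_to_pdf`.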
Here is the full script:
# Import necessary modules
import sys
from PySide6 import QtCore, QtWidgets, QtWebEngineCore, QtWebEngineWidgets, QtGui

# Function to convert a web page to PDF
def url_to_pdf(url, pdf):
    # Create a QApplication instance
    app = QtWidgets.QApplication(sys.argv)
    # Set desktop user agent string
    profile = QtWebEngineCore.QWebEngineProfile.defaultProfile()
    profile.setHttpUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36")
    # Create the QWebEngineView with viewport size
    view = QtWebEngineWidgets.QWebEngineView()
    view.resize(1920, 1080)  # Adjust viewport size as needed
    page = QtWebEngineCore.QWebEnginePage(view)
    # Callback function for handling print finished event
    def handle_print_finished(filename, status):
        print("finished", filename, status)
        app.quit()  # Quit the application after printing
    # Callback function for handling load finished event
    def handle_load_finished(status):
        if status:
            # Adjust print layout (QPageLayout and QPageSize live in QtGui)
            layout = QtGui.QPageLayout()
            layout.setPageSize(QtGui.QPageSize(QtGui.QPageSize.A4))  # Or desired page size
            layout.setOrientation(QtGui.QPageLayout.Landscape)  # If content is wider
            layout.setMargins(QtCore.QMarginsF(0, 0, 0, 0))  # Set zero margins
            page.printToPdf(pdf, layout)
        else:
            print("Failed to load page")
            app.quit()
    # Connect signals and load the page
    page.pdfPrintingFinished.connect(handle_print_finished)
    page.loadFinished.connect(handle_load_finished)
    page.load(QtCore.QUrl(url))
    # Start the application event loop
    sys.exit(app.exec())

if __name__ == "__main__":
    # Check if the correct number of command-line arguments is provided
    if len(sys.argv) != 3:
        print("Usage: python application.py <url> <name_of_pdf_file>")
        sys.exit(1)
    # Extract URL and PDF file name from command-line arguments
    url = sys.argv[1]
    pdf = sys.argv[2]
    # Call the function to convert the web page to PDF
    url_to_pdf(url, pdf)
Save the script as, say, converter.py, then run it with the URL and the name of the output file:
python converter.py <url> <name_of_pdf_file>
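Since the motivation was to generate PDFs as a background activity, you may want to launch the script from another Python program and check the result. The following is a minimal sketch under a few assumptions: the script is saved as converter.py in the working directory, the helper names are invented for this example, and the check only looks at the `%PDF-` marker that PDF files start with.

```python
import subprocess
import sys
from pathlib import Path


def build_command(script, url, pdf):
    # Reuse the current interpreter so the same environment (and PySide6) is used
    return [sys.executable, script, url, pdf]


def looks_like_pdf(path):
    # A PDF file begins with the five bytes "%PDF-"
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(5) == b"%PDF-"


def convert(url, pdf, script="converter.py", timeout=120):
    # Returns True only if the script exited cleanly and left a PDF behind.
    # subprocess.run raises TimeoutExpired if the conversion hangs.
    result = subprocess.run(build_command(script, url, pdf), timeout=timeout)
    return result.returncode == 0 and looks_like_pdf(pdf)
```

The timeout matters here because the script only quits from its callbacks. A page that never finishes loading would otherwise leave the process running.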
The complete code can be found in this repository. I encourage you to test it out, share your feedback in the comments, and extend it with more features wherever you see room for improvement.
Cheers.