Unlocking the Power of WeasyPrint: Generating Dynamic PDF Reports with Python

Transforming Live Data into Beautifully Formatted PDFs: A Step-by-Step Guide with WeasyPrint

Introduction

When it comes to generating PDF reports from dynamic data, having the right tools at your disposal is crucial. In this blog post, we'll unlock the power of WeasyPrint, an impressive Python library that transforms HTML and CSS into stunning PDF documents. We will walk through a code snippet that demonstrates the process and discuss the essential concepts involved.

The Need for Dynamic PDF Reports

There are times when static PDFs just won't cut it. You may find yourself in situations where you need to generate reports that incorporate live data, adapt to changing conditions, or personalize content based on user inputs. That's where the magic of dynamic PDF reports comes in. With WeasyPrint as your ally, you'll be able to render HTML and CSS into beautifully formatted PDFs that are dynamically generated to meet your specific needs.

Prerequisites

Before we dive into the code, it's important to note that a basic understanding of Python and HTML will pave the way for a smoother journey.

Setting Up WeasyPrint

To embark on our PDF generation adventure, we first need WeasyPrint installed. Run the following command in your terminal or command prompt:

pip install WeasyPrint

Building a Data Dictionary for HTML Placeholder Substitution

With WeasyPrint installed, we can now proceed to the next step: creating a dictionary that stores the dynamic data to be inserted into the HTML document within our Python script.

This dictionary is the bridge between your Python script and the HTML content. It allows us to seamlessly integrate the data variables into the HTML template. Here's an example of how the dictionary can be structured:

report_data = {
    'placeholder1': dynamic_data1,
    'placeholder2': dynamic_data2,
    'placeholder3': dynamic_data3,
    ...
}

The keys in the dictionary correspond to the placeholders (e.g., {placeholder1}, {placeholder2}, etc.) within the HTML file. Ensure that the keys in the report_data dictionary match the placeholders in the HTML file, allowing for accurate substitution during the HTML formatting process.
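To make the key-to-placeholder matching concrete, here is a minimal sketch. The template string, the keys title and total_sales, and their values are hypothetical stand-ins, not part of the original script. It uses the standard library's string.Formatter to list the placeholders in a template and check them against the dictionary before formatting:

```python
from string import Formatter

# Hypothetical template and values, for illustration only.
template = "<h1>{title}</h1><p>Total sales: {total_sales}</p>"
report_data = {"title": "Monthly Report", "total_sales": 1250}

# Collect the placeholder names that appear in the template.
placeholders = {field for _, field, _, _ in Formatter().parse(template) if field}

# Any placeholder without a matching key would raise KeyError in .format().
missing = placeholders - set(report_data)

# Substitute the dictionary values into the template.
html = template.format(**report_data)
print(missing)
print(html)
```

Running a check like this before formatting can surface a mistyped key early, instead of as a KeyError in the middle of report generation.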

Integrating Images and Plots

But hey, why limit yourself to just text data? With WeasyPrint, you can take your PDF reports to the next level by including dynamic images and plots generated by your Python script.

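Let's say you're using a plotting library like Plotly. After generating a plot, you can save it as an image file in a designated directory, such as "figures". Note that Plotly's static image export needs an export engine such as the kaleido package to be installed. Here's an example: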

fig1.write_image('figures/fig1.png')

Now, let's seamlessly integrate these images into your PDF reports. First, we need to convert them into data URIs, which can be directly embedded into the HTML. Using the base64 module, we can encode the image files and create data URIs. Here's an example:

import base64
# Read the saved image and encode it as a base64 data URI
with open("figures/fig1.png", "rb") as f:
    fig1 = f"data:image/png;base64,{base64.b64encode(f.read()).decode()}"

Repeat this process for each plot, saving them as image files and converting them to data URIs. Make sure to adjust the file paths and variable names accordingly.
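If you have several plots, repeating the encoding step by hand gets tedious. One possible way to wrap it up is sketched below. The helper names image_to_data_uri and load_plots are hypothetical, and the sketch assumes every plot is a PNG file stored in a single directory:

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path):
    """Read an image file and return it as a base64 data URI."""
    path = Path(path)
    # Guess the MIME type from the file extension; fall back to PNG.
    mime_type = mimetypes.guess_type(path.name)[0] or "image/png"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def load_plots(directory):
    """Map each PNG's file stem (e.g. 'fig1') to its data URI."""
    png_files = sorted(Path(directory).glob("*.png"))
    return {p.stem: image_to_data_uri(p) for p in png_files}
```

Calling load_plots("figures") would return a dictionary keyed by file stem, so the placeholders in the HTML template would need to use those same names.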

Now that you have a collection of data URIs for your plots, you can populate the plots dictionary with variables representing the images:

plots = {
    'figure1': fig1,
    ...
}

In the plots dictionary, the keys represent the placeholders in your HTML template where the images will be inserted (e.g., <img src="{figure1}" alt="Figure">). The name inside the braces must match the dictionary key exactly. Assign to each key the variable that holds the data URI for the respective image.
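As a small illustration of how a key in the plots dictionary ends up inside an img tag, consider the sketch below. The bytes are a stand-in for a real image file and the one-line template is hypothetical:

```python
import base64

# Stand-in bytes instead of a real PNG file, for illustration only.
fake_png = b"\x89PNG\r\n\x1a\n"
fig1 = f"data:image/png;base64,{base64.b64encode(fake_png).decode()}"

plots = {"figure1": fig1}

# The placeholder name inside the braces must equal the dictionary key.
img_template = '<img src="{figure1}" alt="Figure 1">'
img_tag = img_template.format(**plots)
print(img_tag)
```

Because the image travels inside the src attribute itself, the rendered HTML does not depend on the image file still being on disk.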

Generating PDFs with HTML Templates

Now, let's move on to the fun part: creating PDFs with WeasyPrint and HTML templates. The code snippet below demonstrates the process:

# Import the necessary libraries
from weasyprint import HTML
# report_data, plots, and name are assumed to be defined earlier in the script
# Define a list of HTML files
html_files = ['file1.html', 'file2.html', 'file3.html']
# Create an empty list to store all the PDF pages
all_pages = []
# Iterate through each HTML file
for html_file in html_files:
    # Read the contents of the HTML file
    with open(f'htmls/{html_file}') as f:
        html = f.read()  
    # Apply dynamic data to the HTML template
    html = html.format(**report_data, **plots)
    # Create an HTML instance
    html_page = HTML(string=html, base_url=f'htmls/{html_file}')
    # Render the HTML and get the document
    doc = html_page.render()
    # Append the pages to the all_pages list
    all_pages += doc.pages

# Create a new document from the last rendered document (doc), keeping its metadata
new_doc = doc.copy(pages=all_pages)
# Save the new document to a file
new_doc.write_pdf(f"{name}_report.pdf")

In the code, we use the WeasyPrint library to transform HTML and CSS content into visually appealing PDF reports. Here's how the process works:

  1. Iterate over the list of HTML files: We loop through a list of HTML files (html_files) that contain the templates for our PDF reports.

  2. Read the HTML content: For each HTML file, we open and read its content.

  3. Replace placeholders with dynamic data: Using the .format() method on the html string and passing **report_data and **plots as arguments, we replace the placeholders within the HTML with their respective dynamic data values. The ** operator allows us to unpack the report_data and plots dictionaries and pass their key-value pairs as keyword arguments to the .format() method.

  4. Render the HTML: We create an instance of the HTML class from WeasyPrint, passing the formatted HTML string. The base_url parameter is set to the path of the HTML file, ensuring that any relative paths within the HTML are resolved correctly.

  5. Generate the document: By calling the render() method on the html_page instance, we obtain a document object (doc) that holds the laid-out pages. Nothing is written to disk at this point.

  6. Append pages to the list: We append the pages of the document to the all_pages list.

  7. Create a new document: We create a new document (new_doc) by calling copy() on doc, which after the loop refers to the last rendered document. The copy keeps that document's metadata and takes the combined all_pages list as its pages.

  8. Save the document as a PDF: Finally, we save the new document as a PDF file using the write_pdf() method. The resulting PDF report is generated, incorporating the dynamic data from the report_data and plots dictionaries. The filename includes the provided name variable for identification and organization purposes.
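One pitfall with step 3 is worth a quick sketch: .format() treats every pair of curly braces as a placeholder, so literal braces in the template, such as those in an inline CSS rule, have to be doubled. The template and values below are hypothetical and only illustrate the escaping:

```python
# Hypothetical template: the CSS braces are doubled so .format() leaves them alone.
template = """<style>
  h1 {{ color: navy; }}
</style>
<h1>{title}</h1>
<img src="{figure1}" alt="Figure 1">"""

report_data = {"title": "Quarterly Report"}
plots = {"figure1": "data:image/png;base64,AAAA"}

# Doubled braces come out as single braces; placeholders are substituted.
html = template.format(**report_data, **plots)
print(html)

# An unescaped CSS rule is parsed as a replacement field and fails.
try:
    "<style>h1 { color: navy; }</style>".format(**report_data)
except (KeyError, ValueError) as exc:
    print(f"Formatting failed: {exc!r}")
```

Keeping the CSS in a separate stylesheet avoids the doubling altogether. Also note that unpacking two dictionaries into one call raises a TypeError if they share a key, so report_data and plots need distinct key names.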

Conclusion

Congratulations, you're now armed with the magical powers of WeasyPrint and dynamic HTML templates to generate mesmerizing PDF reports! The snippet gives you a solid foundation: a data dictionary for text, data URIs for images, and a loop that renders each template and merges the pages into a single PDF. Feel free to customize and expand upon this code to meet your specific requirements and unleash the full potential of dynamic PDF generation.