I need to generate PDFs and my current pipeline expects XSL-FO (rendered by an FO engine), but my input content is HTML.
I’m trying to understand the right way to convert HTML → XSL-FO, ideally preserving common formatting like:
headings (h1-h6)
paragraphs, bold/italic
lists (ul/ol)
tables
basic CSS (margins/padding, font sizes, alignment)
What I’m looking for
Is there a reliable conversion approach (library/tool) for HTML → XSL-FO?
If direct conversion is not recommended, what’s the best practice pipeline to go from HTML to PDF when I have existing FO-based infrastructure?
How do people handle CSS, especially for tables and spacing, during the conversion?
Context / constraints
Input HTML may be user-generated, so it can be messy.
I can restrict the HTML/CSS subset if needed.
I can run conversion server-side (Java/Python/Node are all possible).
Output is XSL-FO XML, then rendered to PDF by an FO engine.
What I tried
Searching for “HTML to XSL-FO” mostly returns outdated references or partial converters.
I’m unsure whether I should:
convert HTML → well-formed XHTML → transform to FO (XSLT?)
use a dedicated converter
avoid FO and use an HTML-to-PDF renderer instead
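To make the first option concrete, here is a minimal sketch of the kind of direct mapping I have in mind, in Python using only the standard library (`html.parser` and `xml.etree.ElementTree`). The tag-to-property tables `BLOCKS` and `INLINES`, the class `HtmlToFo`, the function `html_to_fo`, and the A4 page setup are all placeholders made up for illustration, not part of any existing converter. It only covers headings, paragraphs, and bold/italic; lists, tables, and CSS are deliberately left out.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


# Hypothetical styling tables: HTML tag -> FO properties (values are assumptions).
BLOCKS = {
    "p": {"space-after": "6pt"},
    "h1": {"font-size": "24pt", "font-weight": "bold", "space-after": "12pt"},
    "h2": {"font-size": "18pt", "font-weight": "bold", "space-after": "10pt"},
    "h3": {"font-size": "14pt", "font-weight": "bold", "space-after": "8pt"},
}
INLINES = {
    "b": {"font-weight": "bold"},
    "strong": {"font-weight": "bold"},
    "i": {"font-style": "italic"},
    "em": {"font-style": "italic"},
}


class HtmlToFo(HTMLParser):
    """Maps a small HTML subset to fo:block / fo:inline; other tags are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element(fo("root"))
        masters = ET.SubElement(self.root, fo("layout-master-set"))
        page = ET.SubElement(masters, fo("simple-page-master"), {
            "master-name": "page", "page-width": "210mm",
            "page-height": "297mm", "margin": "20mm"})
        ET.SubElement(page, fo("region-body"))
        seq = ET.SubElement(self.root, fo("page-sequence"),
                            {"master-reference": "page"})
        self.flow = ET.SubElement(seq, fo("flow"),
                                  {"flow-name": "xsl-region-body"})
        self.stack = [("", self.flow)]  # (html tag, fo element) pairs
        self.skip = False               # True while inside <script>/<style>

    def _ensure_block(self):
        # fo:flow may not contain bare text or inlines, so wrap them in a block.
        if self.stack[-1][1] is self.flow:
            self.stack.append(("#implicit", ET.SubElement(self.flow, fo("block"))))

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip = True
        elif tag in BLOCKS:
            del self.stack[1:]  # blocks do not nest here; closes an unclosed <p>
            node = ET.SubElement(self.flow, fo("block"), BLOCKS[tag])
            self.stack.append((tag, node))
        elif tag in INLINES:
            self._ensure_block()
            node = ET.SubElement(self.stack[-1][1], fo("inline"), INLINES[tag])
            self.stack.append((tag, node))

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = False
            return
        tags = [t for t, _ in self.stack]
        if tag in tags:  # pop back to the matching open tag, ignore stray closes
            del self.stack[len(tags) - 1 - tags[::-1].index(tag):]

    def handle_data(self, data):
        if self.skip or (self.stack[-1][1] is self.flow and not data.strip()):
            return
        self._ensure_block()
        node = self.stack[-1][1]
        if len(node):  # text after a child element belongs in that child's tail
            node[-1].tail = (node[-1].tail or "") + data
        else:
            node.text = (node.text or "") + data


def html_to_fo(html):
    """Convert an HTML fragment to an XSL-FO document string."""
    parser = HtmlToFo()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return ET.tostring(parser.root, encoding="unicode")


if __name__ == "__main__":
    print(html_to_fo("<h1>Title</h1><p>Some <b>bold</b> text<p>Unclosed paragraph"))
```

Even this toy version has to guess at things like unclosed tags and stray text outside paragraphs, which is why I suspect a hand-rolled mapping gets painful once lists, tables, and CSS are involved.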