How to customize RegexBasedLocationExtractionStrategy to also return font information along with text location
04:56 25 Mar 2026

I'm trying to develop a "template" processor that "replaces" tags (for instance $key1) with variable text. To do this, I plan to use RegexBasedLocationExtractionStrategy, or a custom subclass of it, to find the tag locations and paste boxes containing the new text over them. The document is tabular, so it doesn't matter if the replacement text has a different length than the tag. However, the font should be preserved.

As is, RegexBasedLocationExtractionStrategy only returns the locations (rectangles) of the tags.

However, its Javadoc suggests that its toCRI() and toRectangles() methods may be overridden to retain additional information (the example given there is text color, but it should work just as well with font information).

Overriding toCRI() to carry the font info over from TextRenderInfo into a new MyCharacterRenderInfo subclass is no problem.

However, carrying the font info from MyCharacterRenderInfo through to the final result (IPdfTextLocation or Rectangle) proves to be more difficult, especially when multi-line matches are involved (Pattern.DOTALL flag):

  • the toRectangles() method is rather long, and would have to be duplicated in its entirety in the subclass

  • toRectangles() itself calls other methods that are package-private, and thus inaccessible from the subclass, such as TextChunk::sameLine()

Or is there another way this is supposed to be done?

Wouldn't it be easier if RegexBasedLocationExtractionStrategy provided one overridable method that just builds a single CharacterRenderInfo from a one-character TextRenderInfo, and another overridable method that builds (or merges) one IPdfTextLocation from a run of CharacterRenderInfos?
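To make the second half of that idea concrete, here is a minimal sketch in plain Java. It does not use iText at all: Box, FontChar, FontLocation, HookedStrategySketch, sameRun() and toLocation() are hypothetical stand-ins that I made up for Rectangle, a font-aware CharacterRenderInfo, a font-aware IPdfTextLocation and the strategy itself. The same-line test is a naive comparison of the lower edge of each box, which is an assumption of mine and not what TextChunk::sameLine() does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a rectangle: lower-left corner plus size.
final class Box {
    final float x, y, width, height;
    Box(float x, float y, float width, float height) {
        this.x = x; this.y = y; this.width = width; this.height = height;
    }
    // Smallest box that contains both this box and the other one.
    Box union(Box o) {
        float left = Math.min(x, o.x), bottom = Math.min(y, o.y);
        float right = Math.max(x + width, o.x + o.width);
        float top = Math.max(y + height, o.y + o.height);
        return new Box(left, bottom, right - left, top - bottom);
    }
}

// Hypothetical stand-in for a per-character info that remembers its font.
final class FontChar {
    final char ch; final Box box; final String font; final float size;
    FontChar(char ch, Box box, String font, float size) {
        this.ch = ch; this.box = box; this.font = font; this.size = size;
    }
}

// Hypothetical stand-in for a result location that also carries the font.
final class FontLocation {
    final Box box; final String text; final String font; final float size;
    FontLocation(Box box, String text, String font, float size) {
        this.box = box; this.text = text; this.font = font; this.size = size;
    }
}

class HookedStrategySketch {
    // Overridable hook: merge one run of characters into one location.
    protected FontLocation toLocation(List<FontChar> run) {
        FontChar first = run.get(0);
        Box box = first.box;
        StringBuilder text = new StringBuilder();
        for (FontChar c : run) {
            box = box.union(c.box);
            text.append(c.ch);
        }
        return new FontLocation(box, text.toString(), first.font, first.size);
    }

    // Overridable hook: do two consecutive characters belong to one run?
    // Naive assumption: same lower edge, same font name, same font size.
    protected boolean sameRun(FontChar a, FontChar b) {
        return Math.abs(a.box.y - b.box.y) < 0.01f
                && a.font.equals(b.font) && a.size == b.size;
    }

    // Fixed driver: splits the characters of one match into runs and
    // delegates the merging to toLocation(), so subclasses never copy it.
    final List<FontLocation> toLocations(List<FontChar> match) {
        List<FontLocation> out = new ArrayList<>();
        List<FontChar> run = new ArrayList<>();
        for (FontChar c : match) {
            if (!run.isEmpty() && !sameRun(run.get(run.size() - 1), c)) {
                out.add(toLocation(run));
                run = new ArrayList<>();
            }
            run.add(c);
        }
        if (!run.isEmpty()) {
            out.add(toLocation(run));
        }
        return out;
    }

    public static void main(String[] args) {
        // A match that wraps over two lines, as with Pattern.DOTALL.
        List<FontChar> match = new ArrayList<>();
        match.add(new FontChar('$', new Box(10, 100, 6, 12), "Helvetica", 12f));
        match.add(new FontChar('k', new Box(16, 100, 6, 12), "Helvetica", 12f));
        match.add(new FontChar('e', new Box(10, 88, 6, 12), "Helvetica", 12f));
        match.add(new FontChar('y', new Box(16, 88, 6, 12), "Helvetica", 12f));
        for (FontLocation loc : new HookedStrategySketch().toLocations(match)) {
            System.out.println(loc.text + " " + loc.font + " " + loc.size
                    + " at x=" + loc.box.x + ", y=" + loc.box.y);
        }
    }
}
```

With hooks shaped like this, a subclass would only override toLocation() (and, if needed, sameRun()) to attach whatever extra data it wants, while the run-splitting loop stays in the base class.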

itext