I'm building a DOM search highlighter that supports multi-word queries.
If the user searches:
power shell
I want:
The full phrase "power shell" to match first.
If the full phrase doesn’t exist, fall back to matching "power" OR "shell".
Here is the builder I'm using:
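function buildSearchRegex(q) {
  // Normalize the query: trim it and collapse whitespace runs to one space.
  q = (q || "").trim().replace(/\s+/g, " ");
  if (!q) return null;
  const tokens = q.split(" ").filter((t) => t.length >= 1);
  if (!tokens.length) return null;
  // Escape regex metacharacters so each token is matched literally.
  const escapedTokens = tokens.map((t) =>
    t.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
  );
  // Phrase branch (2+ tokens only): tokens separated by whitespace or NBSP.
  const phrasePattern =
    escapedTokens.length >= 2
      ? escapedTokens.join("[\\s\\u00A0]+")
      : null;
  const tokenPattern = escapedTokens.join("|");
  // The phrase comes first, so it wins when both branches could start at the same position.
  const pattern = phrasePattern
    ? `${phrasePattern}|${tokenPattern}`
    : tokenPattern;
  return new RegExp(pattern, "gi");
}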
However, because the phrase and tokens are in the same alternation group:
phrase|token1|token2
a token that appears on its own earlier in the text is matched before the phrase, so the tokens do not behave as a true fallback.
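To make this concrete, here is a small sketch that runs the same combined pattern against a sample string. The sample text is made up for illustration. The engine reports matches in text order, and the alternation order only decides which branch wins when both could start at the same position.

```javascript
// Same combined pattern the builder produces for the query "power shell".
const re = /power[\s\u00A0]+shell|power|shell/gi;

// Hypothetical sample text: a lone "shell" appears before the full phrase.
const text = "Open a shell, then start Power Shell.";

// Collect every match with its position in the text.
const matches = [...text.matchAll(re)].map((m) => ({
  index: m.index,
  text: m[0],
}));

console.log(matches);
// [ { index: 7, text: 'shell' }, { index: 25, text: 'Power Shell' } ]
```

The lone "shell" is reported first because it comes first in the text. "Power Shell" is still matched as a single phrase, not as two separate tokens.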
What is the correct way to:
Prefer full phrase matches
But still fall back to token matches
Without double-matching parts of the phrase?
Is this best solved via regex alone, or should I perform two passes (phrase first, then tokens)?
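For reference, the two-pass option could look roughly like the sketch below. It works on a plain text string, not on DOM nodes, and it assumes a strict fallback: tokens are only searched when the phrase is not found anywhere. The names `escapeRegex` and `findHighlights` are hypothetical helpers, not part of my current code.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns { mode, matches } for a plain text string.
function findHighlights(text, query) {
  const tokens = (query || "")
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map(escapeRegex);
  if (!tokens.length) return { mode: "none", matches: [] };

  const collect = (re) =>
    [...text.matchAll(re)].map((m) => ({ index: m.index, text: m[0] }));

  // Pass 1: the full phrase. In JavaScript, \s already covers U+00A0.
  if (tokens.length >= 2) {
    const phrase = collect(new RegExp(tokens.join("\\s+"), "gi"));
    if (phrase.length) return { mode: "phrase", matches: phrase };
  }

  // Pass 2: reached only when the phrase is absent or the query is one word.
  const loose = collect(new RegExp(tokens.join("|"), "gi"));
  return { mode: "tokens", matches: loose };
}

console.log(findHighlights("Open a shell, then start Power Shell.", "power shell"));
// { mode: 'phrase', matches: [ { index: 25, text: 'Power Shell' } ] }

console.log(findHighlights("A shell with more power.", "power  shell"));
// { mode: 'tokens', matches: [ { index: 2, text: 'shell' }, { index: 18, text: 'power' } ] }
```

In the first call the lone "shell" is ignored because the phrase exists. In the second call the phrase is missing, so both tokens are returned. If lone tokens should still be highlighted alongside phrase matches, the single combined regex already handles that, since matches from one global regex never overlap.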