How to parse fuzzy text descriptions into structured time-series data in Python?

02:03 07 May 2026

I am extracting streaming subscriber data from text using an LLM, and I get results like this:

```json
{
  "raw_extractions": [
    {
      "platform_mention": "Netflix",
      "year_mention": "2012",
      "subscriber_mention": "roughly 30 million subscribers worldwide"
    },
    {
      "platform_mention": "Netflix",
      "year_mention": "2020",
      "subscriber_mention": "just under 200 million"
    },
    {
      "platform_mention": "Netflix",
      "year_mention": "2022",
      "subscriber_mention": "hovered around 220 million subscribers"
    }
  ]
}
```
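Before any fuzzy parsing, the extraction payload can be loaded with the standard `json` module. Below is a minimal sketch that turns each record into a `(platform, year, phrase)` tuple. It assumes `year_mention` always contains an explicit four-digit year. `parse_year` is a hypothetical helper, and relative phrases such as "last year" or "early 2010s" simply come back as `None`.

```python
import json
import re
from typing import Optional

# The payload from the question, reformatted compactly as a string.
payload = """
{
  "raw_extractions": [
    {"platform_mention": "Netflix", "year_mention": "2012",
     "subscriber_mention": "roughly 30 million subscribers worldwide"},
    {"platform_mention": "Netflix", "year_mention": "2020",
     "subscriber_mention": "just under 200 million"},
    {"platform_mention": "Netflix", "year_mention": "2022",
     "subscriber_mention": "hovered around 220 million subscribers"}
  ]
}
"""


def parse_year(text: str) -> Optional[int]:
    """Return the first standalone year from 1900 to 2099, or None.

    Assumption: the mention contains an explicit year; "early 2010s" or
    "last year" do not match and yield None.
    """
    match = re.search(r"\b(?:19|20)\d{2}\b", text)
    return int(match.group(0)) if match else None


data = json.loads(payload)
records = [
    (item["platform_mention"].strip(),
     parse_year(item["year_mention"]),
     item["subscriber_mention"])
    for item in data["raw_extractions"]
]
for record in records:
    print(record)
```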

I need to convert this into clean time-series data for analysis (subscriber counts in millions):

| year | platform | subscribers_min | subscribers_max | confidence |
|------|----------|----------------|-----------------|------------|
| 2012 | Netflix  | 30             | 30              | medium     |
| 2020 | Netflix  | 195            | 200             | medium     |
| 2022 | Netflix  | 220            | 220             | medium     |
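Once each phrase has been parsed, producing this layout is ordinary bookkeeping. Below is a minimal standard-library sketch. It assumes the parsed rows are already dicts keyed by the column names above, with values in millions. Sorting by `(platform, year)` is what lets each platform read as a time series. `to_csv` is a hypothetical helper name.

```python
import csv
import io

# Parsed rows (values in millions), deliberately out of order.
rows = [
    {"year": 2020, "platform": "Netflix", "subscribers_min": 195,
     "subscribers_max": 200, "confidence": "medium"},
    {"year": 2012, "platform": "Netflix", "subscribers_min": 30,
     "subscribers_max": 30, "confidence": "medium"},
    {"year": 2022, "platform": "Netflix", "subscribers_min": 220,
     "subscribers_max": 220, "confidence": "medium"},
]

FIELDS = ["year", "platform", "subscribers_min", "subscribers_max", "confidence"]


def to_csv(records):
    """Sort by platform, then year, and render the rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for rec in sorted(records, key=lambda r: (r["platform"], r["year"])):
        writer.writerow(rec)
    return buf.getvalue()


print(to_csv(rows), end="")
```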

What is the best Python approach for parsing fuzzy phrases such as "roughly 30 million" or "just under 200 million" into numeric ranges?
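A rule-based approach is one straightforward option, though not the only one. A regex pulls out the number and its magnitude word, and the hedge word in front of the number is mapped to lower/upper factors and a confidence label. Below is a minimal sketch under stated assumptions. The factors (for example, "just under" meaning 97.5% to 100% of the stated figure) were chosen only to reproduce the table above. `parse_subscribers`, `HEDGES`, and the confidence labels are illustrative names, not part of any library. The sketch only reads the first number in a phrase, so ranges like "between 190 and 200 million" are not handled.

```python
import re
from typing import Optional, Tuple

# Number with optional thousands separators/decimals, then an optional
# magnitude word ("million" or "millions", etc.).
NUMBER_RE = re.compile(
    r"(\d+(?:,\d{3})*(?:\.\d+)?)\s*(?:(thousand|million|billion)s?)?\b"
)
MULTIPLIER = {None: 1, "thousand": 1_000, "million": 1_000_000,
              "billion": 1_000_000_000}

# (hedge regex, lower factor, upper factor, confidence).
# Assumption: these factors are illustrative and should be tuned.
# Order matters: specific phrases ("just under") precede looser ones ("under").
HEDGES = [
    (r"just under|just below|slightly less than|nearly|almost", 0.975, 1.0, "medium"),
    (r"just over|just above|slightly more than", 1.0, 1.025, "medium"),
    (r"under|below|less than|fewer than", 0.9, 1.0, "low"),
    (r"over|above|more than", 1.0, 1.1, "low"),
    (r"roughly|around|about|approximately|some|circa", 1.0, 1.0, "medium"),
]


def parse_subscribers(text: str) -> Optional[Tuple[float, float, str]]:
    """Turn a fuzzy phrase into (min, max, confidence), in millions."""
    lowered = text.lower()
    match = NUMBER_RE.search(lowered)
    if match is None:
        return None  # no figure found; flag for manual review
    count = float(match.group(1).replace(",", "")) * MULTIPLIER[match.group(2)]
    value = count / 1_000_000  # express in millions, as in the target table
    prefix = lowered[: match.start()]  # hedge words precede the number
    for pattern, low, high, confidence in HEDGES:
        if re.search(rf"\b(?:{pattern})\b", prefix):
            return round(value * low, 3), round(value * high, 3), confidence
    # Assumption: a bare number with no hedge word is treated as exact.
    return value, value, "high"


for phrase in ["roughly 30 million subscribers worldwide",
               "just under 200 million",
               "hovered around 220 million subscribers"]:
    print(phrase, "->", parse_subscribers(phrase))
```

Keeping the hedge vocabulary as a data table rather than as branching code makes it easy to extend once new phrasings appear in the LLM output. Phrases that return `None` can be logged or sent back to the LLM for a more explicit extraction.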

python fuzzy
