Dealing with Cumulative Phonetic Alignment Drift in a CTC-based Quranic Recitation Correction System
09:11 24 Jan 2026

Problem Context: I am building a Quranic recitation correction system using a fine-tuned Wav2Vec2-Bert model for Multi-level CTC decoding. The pipeline takes a user's audio, predicts phonemes (Sifat), and compares them against a reference generated by a phonetic transcriber (quran_phonetizer).

The Technical Challenge: In long Ayahs (sequences of more than 20 words), I am facing cumulative alignment drift (shift). CTC models don't provide reliable word boundaries, and Quranic recitation involves connected speech (Wasl), where phonemes drop or merge (e.g., Idgham, Hamzatul Wasl). As a result, the global alignment produced by difflib.SequenceMatcher starts to drift.

A single early mistake propagates: for example, an error in word #4 causes word #5 to be mapped to the phonemes of word #6. By the middle of a long Ayah, this produces "phantom" errors on words that were recited correctly and a massive drop in measured accuracy (sometimes below 5%).
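A minimal toy reproduction of the symptom may help. It assumes the predicted phoneme stream is cut at the reference word offsets without any realignment, which is a simplification and not necessarily what the real pipeline does. The phoneme symbols and the helper names `slice_by_reference_offsets` and `word_scores` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical toy data: each reference word is a list of phoneme symbols.
ref_words = [
    ["b", "i"],
    ["s", "m", "i"],
    ["l", "l", "aa", "h", "i"],
    ["r", "r", "a", "h"],
]
# Predicted stream in which one phoneme ("m") of the second word is missing.
hyp = ["b", "i", "s", "i", "l", "l", "aa", "h", "i", "r", "r", "a", "h"]


def slice_by_reference_offsets(ref_words, hyp):
    """Cut the predicted stream at the reference word offsets (no realignment)."""
    slices, start = [], 0
    for word in ref_words:
        end = start + len(word)
        slices.append(hyp[start:end])
        start = end
    return slices


def word_scores(ref_words, hyp_words):
    """Per-word similarity ratio between reference and predicted phonemes."""
    return [
        SequenceMatcher(None, ref, pred, autojunk=False).ratio()
        for ref, pred in zip(ref_words, hyp_words)
    ]


hyp_words = slice_by_reference_offsets(ref_words, hyp)
for ref, pred, score in zip(ref_words, hyp_words, word_scores(ref_words, hyp_words)):
    print(ref, pred, round(score, 3))
```

Only the second word contains a real mistake, yet the third and fourth words also score below 1.0 because every slice after the dropped phoneme is shifted by one position.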

Current Logic: I am using a weighted proportional mapping: I calculate the "phonetic weight" of each word, distribute the global reference index accordingly, and then apply SequenceMatcher…
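One possible reading of the weighted proportional mapping is sketched below. It assumes the "phonetic weight" of a word is simply its reference phoneme count, and that word boundaries in the predicted stream are placed at the same cumulative proportions. The actual weighting is not shown in this post, so the function names `proportional_boundaries` and `compare_per_word`, as well as the toy data, are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import accumulate


def proportional_boundaries(ref_words, hyp_len):
    """Place word boundaries in the prediction at the reference's cumulative proportions.

    Assumption: a word's weight is its number of reference phonemes.
    """
    weights = [len(word) for word in ref_words]
    total = sum(weights)
    ends = [round(c * hyp_len / total) for c in accumulate(weights)]
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def compare_per_word(ref_words, hyp):
    """Return (start, end, ratio) for each reference word against its predicted slice."""
    results = []
    bounds = proportional_boundaries(ref_words, len(hyp))
    for word, (start, end) in zip(ref_words, bounds):
        matcher = SequenceMatcher(None, word, hyp[start:end], autojunk=False)
        results.append((start, end, matcher.ratio()))
    return results


# Hypothetical toy data: one phoneme ("m") is missing from the second word.
ref_words = [
    ["b", "i"],
    ["s", "m", "i"],
    ["l", "l", "aa", "h", "i"],
    ["r", "r", "a", "h"],
]
hyp = ["b", "i", "s", "i", "l", "l", "aa", "h", "i", "r", "r", "a", "h"]

for word, (start, end, ratio) in zip(ref_words, compare_per_word(ref_words, hyp)):
    print(word, hyp[start:end], round(ratio, 3))
```

On this toy input the proportional boundaries recover by the last word, but the third word still loses a phoneme to its neighbour and is penalised although it was recited correctly. With more words and more dropped or merged phonemes, the rounding errors have more room to accumulate.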

python nlp speech-recognition difflib sequence-alignment