We have a project where the main problem is cleaning .md files, which is time-consuming to do by hand. The files were collected from books with OCR, for use with AI, and in some of them the structure of the data is mixed up or broken. We have managed to automate more than 70% of the cleanup, but we still have to do the remaining 30% by hand. I tried to fully automate it using LangChain, RAG, and an LLM. It fixed all the errors and all the broken structure, but hallucination was a problem. Can someone help?
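
To make the hallucination problem concrete, here is a minimal sketch of one possible guard, not something from the pipeline described above: after the LLM returns a cleaned file, flag every word in the output that has no close counterpart in the OCR input. Typo fixes stay close to an original word and pass, while invented words get flagged for manual review. The function names (`tokenize`, `suspicious_words`) and the `cutoff` value of 0.8 are assumptions chosen for illustration, and only the Python standard library is used.

```python
import difflib
import re


def tokenize(text):
    # Lowercase alphanumeric tokens; markdown symbols and punctuation are ignored.
    return re.findall(r"[a-z0-9]+", text.lower())


def suspicious_words(original, cleaned, cutoff=0.8):
    """Return words in `cleaned` with no close match in `original`.

    `cutoff` is a hypothetical similarity threshold (0..1); it would need
    tuning on real OCR data.
    """
    vocab = set(tokenize(original))
    vocab_list = sorted(vocab)
    flagged = []
    seen = set()
    for word in tokenize(cleaned):
        if word in vocab or word in seen:
            continue
        seen.add(word)
        # A corrected typo should still resemble some word from the OCR text.
        if not difflib.get_close_matches(word, vocab_list, n=1, cutoff=cutoff):
            flagged.append(word)
    return flagged


if __name__ == "__main__":
    ocr_text = "Chpater 1 The sturcture of the data is mixed"
    llm_text = "# Chapter 1\n\nThe structure of the data is mixed and published in 1987."
    print(suspicious_words(ocr_text, llm_text))
```

A file with an empty result could be accepted automatically, and anything else sent to the manual 30% queue. This check only looks at vocabulary, so it would not catch reordered or deleted text, and words the OCR split in two may be flagged even when the fix is correct.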