What are the default BM25 parameters in LlamaIndex BM25Retriever, and how can they be explicitly verified?

10:51 05 Jun 2026

I am using LlamaIndex’s BM25Retriever as part of a Retrieval-Augmented Generation (RAG) pipeline:

from llama_index.retrievers.bm25 import BM25Retriever

bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=3)

However, I cannot find a clear way to inspect or explicitly confirm the BM25 hyperparameters used (e.g., k1, b, tokenization settings).

From the literature, BM25 typically uses default Okapi parameters such as k1 between 1.2 and 2.0 and b = 0.75.

But in LlamaIndex, these parameters are not clearly exposed in the public API or object attributes.
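
To make explicit which parameters I mean, here is a minimal, library-independent sketch of Okapi BM25 scoring. The function name `bm25_scores`, the whitespace tokenizer, the IDF variant, and the values k1=1.5 and b=0.75 are my own assumptions for illustration. None of this is taken from LlamaIndex's implementation, and the values are not claimed to be its defaults.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    k1 controls term-frequency saturation; b controls length normalization.
    Both defaults here are illustrative, not LlamaIndex's values.
    """
    # Assumption: naive lowercase whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        # Length normalization factor; b=0 disables it.
        norm = 1 - b + b * len(toks) / avg_len
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # One common non-negative IDF variant (an assumption here).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


docs = ["the cat sat on the mat", "dogs chase cats", "the quick brown fox"]
default_scores = bm25_scores("cat", docs)
no_length_norm = bm25_scores("cat", docs, b=0.0)
```

Because the ranking depends on k1, b, the IDF variant, and the tokenizer, two BM25 implementations can score the same corpus differently, which is why I want to pin down what BM25Retriever actually uses.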

What are the exact default BM25 parameters used internally by BM25Retriever in LlamaIndex?
Are these parameters configurable through the public API?
If not exposed directly, where in the source code are they defined?
Is there a recommended way to verify or log these parameters for reproducibility in experiments?

python nlp information-retrieval llama-index