BM25, one knob at a time
BM25 scores how relevant a document is to a single query term; a full query score is the sum over its terms. It combines three ideas: rarity (IDF), term frequency with diminishing returns, and a length penalty. Drag the knobs and watch the score, and the curves, respond.
Matching a rare term is worth more than a common one (IDF).
Seeing a term more times in a doc helps, but with diminishing returns controlled by
k₁. And a match in a short, focused doc counts for more than the same match in a long
rambly one — the length penalty, controlled by b. DuckDB's defaults are k₁=1.2,
b=0.75.
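To make the three pieces concrete, here is a minimal sketch of a per-term BM25 score in Python. The function and parameter names (idf, bm25_term_score, num_docs, doc_freq, avg_doc_len) are made up for illustration, and the IDF formula is one common variant (the "+1 inside the log" form); it is an assumption, not necessarily the exact expression DuckDB uses.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    # Rarity: terms that appear in fewer documents get a larger weight.
    # Assumed variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf: float, doc_len: float, avg_doc_len: float,
                    num_docs: int, doc_freq: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    # Length penalty: 1.0 for an average-length doc, larger for longer docs.
    # b = 0 switches the penalty off; b = 1 applies it in full.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Saturating term frequency: grows with tf but levels off below k1 + 1.
    saturation = tf * (k1 + 1.0) / (tf + k1 * length_norm)
    return idf(num_docs, doc_freq) * saturation


# Hypothetical corpus: 1000 docs averaging 100 tokens each.
rare = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100, num_docs=1000, doc_freq=5)
common = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100, num_docs=1000, doc_freq=500)
short_doc = bm25_term_score(tf=2, doc_len=50, avg_doc_len=100, num_docs=1000, doc_freq=5)
long_doc = bm25_term_score(tf=2, doc_len=400, avg_doc_len=100, num_docs=1000, doc_freq=5)
print(rare > common, short_doc > long_doc)
```

Note that the length penalty lives inside the denominator of the term-frequency fraction rather than being a separate factor, which is why b changes how fast the tf curve saturates for long and short docs.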
1. Watch the two curves move
The left curve holds everything fixed except tf and sweeps it; this is the
saturation shape that k₁ controls. The right curve sweeps the document length |d|;
this is the length penalty that b controls. The green marker shows where your
current sliders sit.
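The two sweeps can be reproduced numerically with a short sketch. The helper name tf_component and the sample values (an average length of 100, tf fixed at 3 for the length sweep) are assumptions chosen for illustration. IDF is left out because it is a constant multiplier that does not change either curve's shape.

```python
def tf_component(tf: float, doc_len: float, avg_doc_len: float,
                 k1: float = 1.2, b: float = 0.75) -> float:
    # The part of BM25 that depends on tf and |d| (everything except IDF).
    length_norm = 1.0 - b + b * doc_len / avg_doc_len
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


AVG_LEN = 100  # assumed average document length

# Left curve: sweep tf with the doc at average length.
tf_curve = [tf_component(tf, AVG_LEN, AVG_LEN) for tf in range(1, 11)]

# Right curve: sweep |d| with tf held at 3.
lengths = [25, 50, 100, 200, 400]
len_curve = [tf_component(3, d, AVG_LEN) for d in lengths]

# With b = 0 the length penalty is off, so the right curve goes flat.
flat_curve = [tf_component(3, d, AVG_LEN, b=0.0) for d in lengths]

for tf, value in zip(range(1, 11), tf_curve):
    print(f"tf={tf:2d}  component={value:.3f}")
for d, value in zip(lengths, len_curve):
    print(f"|d|={d:3d}  component={value:.3f}")
```

The tf sweep rises quickly and then flattens toward k₁ + 1, while the |d| sweep falls as the doc gets longer; raising k₁ delays the flattening, and raising b steepens the fall.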