Spaced Repetition Algorithms: From Ebbinghaus to FSRS — A Deep Dive

The scheduler is the whole game

A card appears. You answer. The app picks the next interval — tomorrow, next week, three months from now. That single decision is the product. Get it right and you keep 95% of what you study with almost no waste. Get it wrong and you either forget (intervals too long) or grind through cards you already own (intervals too short).

Everything else in a flashcard app — the UI, the deck format, the gamification — is wallpaper around that one number. The scheduling engine is the load-bearing wall.

The lineage runs 140 years. A German psychologist memorizing nonsense syllables alone in his apartment in 1885. A Polish molecular biology student writing DOS code in 1987. A Chinese student named Jarrett Ye fitting memory-model weights to hundreds of millions of Anki reviews in 2022. Same problem the whole way: when does a memory next need attention.

This is the history, the math, and the engineering.

1885: Ebbinghaus and the forgetting curve

In 1885 Hermann Ebbinghaus published Über das Gedächtnis (Memory: A Contribution to Experimental Psychology). He was the subject and the experimenter. He memorized lists of 13 CVC nonsense trigrams (WID, ZOF, BUP — chosen because they carried no prior associations) and tested himself from 20 minutes out to 31 days. His metric was savings: how much less time relearning a list took versus learning it cold.

| Time Since Learning | Retention (%) | Lost (%) |
|---|---|---|
| 20 minutes | 58.2 | 41.8 |
| 1 hour | 44.2 | 55.8 |
| 8.8 hours | 35.8 | 64.2 |
| 1 day | 33.7 | 66.3 |
| 2 days | 27.8 | 72.2 |
| 6 days | 25.4 | 74.6 |
| 31 days | 21.1 | 78.9 |

Source: Ebbinghaus (1885), Section 29. Replicated by Murre & Dros (2015) in PLOS ONE with modern controls.

[Figure: the classic shape of Ebbinghaus's forgetting curve: rapid initial loss, then a long, slow tail. Source: Wikimedia Commons (public domain).]

The shape — cliff first, long tail after — has held up across thousands of studies for 140 years. Cepeda et al. (2006) ran a meta-analysis of 184 articles, 317 experiments. Spaced study beat massed study by 10–30% across virtually every condition tested.

Ebbinghaus's deeper claim is the one that matters here: each successful review resets the curve at a shallower slope. Forgetting still happens, but slower. The window before you forget grows. That is the spacing effect — the oldest, most replicated finding in experimental psychology. Everything downstream is engineering on top of it.

Exponential vs. power law — the curve was wrong for 30 years

What function describes the curve? The whole scheduler hangs on the answer.

Ebbinghaus himself fit his data to a power function: b = 100k / ((log t)^c + k). For decades after SuperMemo entered the scene, the field assumed forgetting was exponential: R(t) = e^(-t/S), where S is memory stability. SuperMemo rode this for 30 years.

In 2024 the FSRS team showed the assumption was wrong. A power function fits real-world data better:

R(t, S) = (1 + F · t/S)^C, where F = 19/81 and C = -0.5.

The mechanism: individual memories may decay exponentially, but a flashcard deck is a mixture of memories at different strengths. Mix exponentials and you get a power law in the aggregate. Ebbinghaus had quietly known this — his own data fits a power function better than an exponential. It took the SRS field 139 years to notice.

| Model | Equation | Tail Behavior | Fit to Real Data |
|---|---|---|---|
| Exponential | R = e^(-t/S) | Drops to near-zero quickly | Good for single-item, poor for mixed decks |
| Power law | R = (1 + t/(9S))^(-1) | Heavy tail, slower decay | Better fit across 10K+ user collections |

Source: A technical explanation of FSRS, Expertium's Blog.
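
The mixture claim is easy to check numerically. Below is a minimal sketch, with a made-up spread of stabilities: each memory decays exponentially on its own, yet the average hangs on far longer than a single exponential with a matched stability.

```ruby
# A deck is a mixture of memories at different stabilities. Each memory
# decays exponentially on its own, but the average develops a heavy tail.
stabilities = [1.0, 3.0, 10.0, 30.0, 100.0] # days; an illustrative spread

r_exp = ->(t, s) { Math.exp(-t / s) }                # single-memory exponential
r_pow = ->(t, s) { (1 + (19.0 / 81) * t / s)**-0.5 } # FSRS power curve

[1, 10, 100].each do |t|
  mixture = stabilities.sum { |s| r_exp.call(t, s) } / stabilities.size
  puts format("t=%3d days  mixture=%.3f  exp(S=30)=%.3f  power(S=30)=%.3f",
              t, mixture, r_exp.call(t, 30.0), r_pow.call(t, 30.0))
end
```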

1967: Pimsleur's graduated intervals

Before computers, Paul Pimsleur published a graduated interval recall schedule in 1967, designed for cassette-tape language instruction:

5 sec → 25 sec → 2 min → 10 min → 1 hour → 5 hours → 1 day → 5 days → 25 days → 4 months → 2 years

Each interval is roughly 5× the previous. Hand-tuned for audio, where you cannot flip back. Pimsleur courses still ship this schedule unchanged. Fixed. No adaptation to the learner. Effective anyway — proof that any exponential schedule beats no schedule.

1972: Leitner's boxes — spaced repetition with cardboard

Sebastian Leitner published So lernt man lernen in 1972. Five physical boxes of flashcards. Get a card right, it moves rightward (longer interval). Get it wrong, back to Box 1.

| Box | Review Frequency |
|---|---|
| 1 | Every day |
| 2 | Every 2 days |
| 3 | Every 4 days |
| 4 | Every 9 days |
| 5 | Every 14 days |

[Figure: the Leitner box system. A correct answer promotes the card one box rightward (less frequent review); a wrong answer demotes it back to box 1. Source: Wikimedia Commons (CC0).]

The Leitner system is the only spaced repetition algorithm you can run with cardboard and a kitchen drawer. The insight — spend your study budget on what you know least — is the seed every later algorithm grew from.
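
The whole system fits in a few lines of code. A minimal sketch, using the box intervals from the table above:

```ruby
# Leitner in miniature: a correct answer promotes the card one box,
# a wrong answer sends it straight back to box 1.
INTERVALS = { 1 => 1, 2 => 2, 3 => 4, 4 => 9, 5 => 14 } # box => days

def leitner_step(box, correct)
  next_box = correct ? [box + 1, INTERVALS.size].min : 1
  [next_box, INTERVALS[next_box]]
end

box = 1
box, days = leitner_step(box, true)  # => box 2, review in 2 days
box, days = leitner_step(box, false) # => box 1, review tomorrow
```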

1985–1987: Wozniak and SM-2

On February 25, 1985, a 22-year-old molecular biology student in Poznań named Piotr Wozniak — sick of forgetting English vocabulary and biochem facts — started hand-tracking optimal inter-repetition intervals in a notebook. Two and a half years later, on December 13, 1987, he shipped SuperMemo 1.0 for DOS. The first computer program that scheduled flashcards.

The algorithm inside, SM-2, tracks three variables per card:

  • n: repetition number
  • EF: easiness factor (init 2.5, adjusted by responses)
  • I: inter-repetition interval in days

The interval schedule:

I(1) = 1 day
I(2) = 6 days
I(n) = I(n-1) × EF    for n > 2

EF after each review:

EF' = EF + (0.1 - (5-q) × (0.08 + (5-q) × 0.02))

q is the 0–5 quality rating. EF floored at 1.3.

That is the whole algorithm. Two formulas, three variables, fits in a tweet. One person, no formal memory model, just empirical self-observation. Low-entropy keystrokes from the start.
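
Transcribed into code, it is barely longer than the prose. A sketch; the q < 3 branch follows Wozniak's published rule that a failed card restarts its repetitions with EF unchanged:

```ruby
# SM-2: three variables per card (n, EF, I), two formulas.
# q is the 0-5 response quality; q < 3 restarts repetitions, EF unchanged.
def sm2_review(n, ef, interval, q)
  return [1, ef, 1] if q < 3

  ef += 0.1 - (5 - q) * (0.08 + (5 - q) * 0.02)
  ef = 1.3 if ef < 1.3 # EF floor

  interval =
    case n
    when 1 then 1              # I(1) = 1 day
    when 2 then 6              # I(2) = 6 days
    else (interval * ef).round # I(n) = I(n-1) x EF
    end

  [n + 1, ef, interval]
end

n, ef, i = sm2_review(1, 2.5, 0, 5) # first review, rated 5 => [2, 2.6, 1]
```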

And it works. SM-2 has been running, nearly untouched, inside Anki since 2006, inside Mnemosyne since 2003, and inside dozens of clones. The most widely deployed spaced repetition algorithm in history — written by one student, given away.

The cracks:

| Limitation | Consequence |
|---|---|
| No probability model | Cannot predict how likely you are to recall a card at any moment |
| Fixed initial intervals (1, 6) | No adaptation to card difficulty before first review |
| Linear EF adjustment | Overreacts to single bad reviews; slow to recover |
| No per-user optimization | Same formula for a medical student and a casual hobbyist |
| No forgetting model | When you fail a card, it just resets — no signal about what went wrong |

1989–2016: the SuperMemo divergence

Wozniak kept iterating. SM-4 (1989) introduced an optimization matrix. SM-5 (1989) made it converge faster. SM-8 through SM-18 piled on two-component memory (stability + retrievability), neural-net optimization, and incremental reading.

All of it stayed locked inside a paid Windows product. The rest of the world kept shipping SM-2. The interesting algorithms sat behind a license screen for 20 years while every open-source flashcard app cargo-culted a 1987 design.

Wozniak's own history is one of the most remarkable single-author research programs in software. It is also a case study in what proprietary isolation does to a field.

2016: Duolingo's Half-Life Regression

In 2016 Burr Settles and Brendan Meeder at Duolingo published A Trainable Spaced Repetition Model for Language Learning (ACL 2016). The algorithm: Half-Life Regression (HLR).

HLR models each word's half-life in memory — the time until recall probability drops to 50%. Unlike SM-2 it:

  • Uses logistic regression with psycholinguistic features (word frequency, cognate status, user history)
  • Trains on millions of real review records
  • Predicts actual recall probabilities, not just "next time"

On Duolingo's data, HLR cut prediction error by more than 45% versus baselines. In live A/B tests:

| Metric | Improvement |
|---|---|
| Practice session retention | +9.5% |
| Lesson retention | +1.7% |
| Overall daily activity | +12% |

Source: Settles & Meeder (2016), ACL. Code: github.com/duolingo/halflife-regression.

Proof of concept: ML on real review data beats hand-tuned heuristics by a wide margin. But HLR was wedded to Duolingo's feature set and never escaped the building.
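
The shape of the model is easy to sketch. The weights and features below are invented for illustration; the production model used Duolingo-specific lexeme features:

```ruby
# Half-Life Regression: predicted half-life h = 2^(theta . x); recall
# probability after delta days is p = 2^(-delta / h).
def hlr_half_life(theta, x)
  2.0**theta.zip(x).sum { |w, f| w * f }
end

def hlr_recall(delta_days, half_life)
  2.0**(-delta_days / half_life)
end

theta = [0.3, -0.4, 1.0]        # weights for [times_correct, times_wrong, bias]
x     = [4, 1, 1]               # four successes, one failure so far
h     = hlr_half_life(theta, x) # 2^1.8, about 3.5 days
p     = hlr_recall(7.0, h)      # about 0.25 after a week
```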

2022–2025: FSRS, finally open

In 2022 Jarrett Ye released FSRS — Free Spaced Repetition Scheduler. Open-source, modern ML, written for Anki. By November 2023 Anki shipped it as a native option. By 2025 it was the default for new users.

FSRS models memory with the DSR (Difficulty, Stability, Retrievability) framework:

| Variable | Symbol | Definition | Range |
|---|---|---|---|
| Difficulty | D | How hard it is to increase stability for this card | 1–10 |
| Stability | S | Days for retrievability to drop from 100% to 90% | 0.1–36,500 |
| Retrievability | R | Probability of successful recall right now | 0–1 |

The core equations.

Forgetting curve (power function):

R(t, S) = (1 + F · t/S)^C, where F = 19/81 and C = -0.5

Stability after successful recall:

S'_r = S · (1 + e^(w₈) · (11 - D) · S^(-w₉) · (e^(w₁₀·(1-R)) - 1) · hard_penalty · easy_bonus)

where hard_penalty (w₁₅) applies when the review was rated Hard and easy_bonus (w₁₆) when it was rated Easy; both are 1 for Good.

Stability after forgetting (lapse):

S'_f = w₁₁ · D^(-w₁₂) · ((S+1)^(w₁₃) - 1) · e^(w₁₄·(1-R))

The 19 weights (w₀ through w₁₈) are optimized per-user via gradient descent on review history. That is the move: FSRS treats scheduling as a machine learning problem, with log loss between predicted and actual recall as the objective.

Source: The Algorithm (FSRS Wiki), ABC of FSRS.
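
Transcribed into code (a sketch: w holds the 19 per-user weights w[0]..w[18]; values come from the optimizer, and hard_penalty/easy_bonus correspond to w[15] and w[16]):

```ruby
# The three FSRS equations, almost line for line.
F_CONST = 19.0 / 81
C_CONST = -0.5

# R(t, S): probability of recall t days after the last review.
def retrievability(t, s)
  (1 + F_CONST * t / s)**C_CONST
end

# Stability after a successful review. hard_penalty (w[15], < 1) applies
# when the card was rated Hard; easy_bonus (w[16], > 1) when rated Easy.
def stability_on_recall(s, d, r, w, hard_penalty: 1.0, easy_bonus: 1.0)
  s * (1 + Math.exp(w[8]) * (11 - d) * s**(-w[9]) *
       (Math.exp(w[10] * (1 - r)) - 1) * hard_penalty * easy_bonus)
end

# Stability after a lapse.
def stability_on_lapse(s, d, r, w)
  w[11] * d**(-w[12]) * ((s + 1)**w[13] - 1) * Math.exp(w[14] * (1 - r))
end
```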

The benchmark: receipts

The open-spaced-repetition/srs-benchmark project scores algorithms on real Anki review data across thousands of user collections. Metric: log loss between predicted recall probability and binary outcome. Lower is better.

| Algorithm | Year | Model Type | Parameters | Log Loss ↓ | Notes |
|---|---|---|---|---|---|
| SM-2 (trainable) | 1987 | Linear EF | 2 | 0.346 | Added probability layer for benchmark |
| Leitner | 1972 | Fixed boxes | 0 | ~0.36 | No probability prediction natively |
| HLR (Duolingo) | 2016 | Logistic regression | 3+ | 0.327 | Feature-engineered |
| FSRS v3 | 2022 | DSR exponential | 13 | 0.332 | First release |
| FSRS v4 | 2023 | DSR power | 17 | 0.326 | Power curve, +4 params |
| FSRS-5 | 2024 | DSR power + same-day | 19 | 0.325 | Same-day review handling |
| FSRS-6 | 2025 | DSR power + flat curve | 21 | 0.324 | Optimizable curve flatness |

Source: Benchmark of Spaced Repetition Algorithms, Expertium's Blog. Dataset: 10,000+ Anki user collections.

Headline: FSRS-5 beats SM-2 in 97.4% of user collections. Against SM-17 — one of SuperMemo's recent proprietary algorithms — FSRS-6 wins in 83.3% of collections. The open one beats the locked one.

Translated into hours: users switching from SM-2 to FSRS report 20–30% fewer reviews for the same retention level. For someone doing 200 reviews a day, that is 40–60 fewer cards per session, compounded over years. That is real wall-clock time off your life.

Why FSRS works — three innovations

1. Per-user parameter optimization. SM-2 ships the same formula for everyone. FSRS trains 19 weights on your review history. If you consistently nail 30-day intervals, FSRS notices your stability grows fast and stretches your intervals. If you struggle with kanji compounds, it tightens them. The model adapts to the learner — finally.

2. Difficulty modulates stability growth, not just base interval. A difficult card (D=8) with high stability (S=90 days) gains less stability on a successful review than an easy card (D=3) at the same stability. Hard things need more reinforcement even after you "know" them. SM-2 could not see this; FSRS encodes it directly.

3. Retrievability-aware scheduling. FSRS knows your exact recall probability at any moment. Reviewing at R=0.70 produces a larger stability gain than reviewing at R=0.95 — because retrieving at lower confidence is a desirable difficulty. This is Robert Bjork's theory, implemented in code.
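
The curve also runs in reverse, which is how a desired retention setting becomes a due date. Solving R = (1 + F · t/S)^C for t:

t = (S/F) · (R_d^(1/C) - 1)

With F = 19/81 and C = -0.5 this is t = S · (81/19) · (R_d^(-2) - 1), and at the default target R_d = 0.9 it collapses to t = S exactly: the constants were chosen so that stability is, by definition, the 90%-retention interval.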

The science underneath: why spacing works

The spacing effect is not just an empirical regularity. Three converging lines of evidence:

The testing effect. Roediger & Karpicke (2006): students who tested themselves three times (STTT) recalled 61% after one week. Students who studied four times (SSSS) recalled 40%. Testing is not assessment — it is the most powerful encoding event you have. Rowland's 2014 meta-analysis of 159 studies pegged the effect at Hedges' g = 0.50.

Desirable difficulties. Robert and Elizabeth Bjork coined the framework in 1994: conditions that make learning harder short-term — spacing, interleaving, retrieval practice — produce better long-term retention. The difficulty is the mechanism, not a cost paid for the result.

Consolidation. Memory consolidation during sleep transfers labile hippocampal traces to stable neocortical representations. Spacing reviews across sleep cycles gives consolidation room to operate. Cramming competes with itself for the same resource.

| Study | Year | N | Key Finding | Effect Size |
|---|---|---|---|---|
| Ebbinghaus | 1885 | 1 | Forgetting follows power law decay | |
| Cepeda et al. (meta) | 2006 | 184 articles | Spacing produces 10–30% better retention | d = 0.42–0.77 |
| Roediger & Karpicke | 2006 | 120 | Testing beats restudying at 1 week (61% vs 40%) | large |
| Rowland (meta) | 2014 | 159 studies | Testing effect robust across conditions | g = 0.50 |
| Kornell & Bjork | 2008 | 120 | Interleaving doubles classification accuracy | d = 0.99 |

The canon

Five texts. If you want to go deep, start here.

| Text | Author(s) | Year | Why It Matters |
|---|---|---|---|
| Spaced Repetition for Efficient Learning | Gwern Branwen | 2009 | The definitive overview. 50,000+ words. History, research, practice, software. If you read one thing, read this. |
| Augmenting Long-term Memory | Michael Nielsen | 2018 | A working scientist using Anki daily for years. The "memory is a choice" frame that reset how people thought about SRS. |
| Make It Stick | Brown, Roediger, McDaniel | 2014 | The science of learning distilled for practitioners. Spacing, testing, interleaving, and why most study habits are theatre. |
| Andy Matuschak's notes | Andy Matuschak | 2019– | Frontier work on "mnemonic media" — embedding spaced repetition inside reading itself. |
| A Three-Day Journey from Novice to Expert | Jarrett Ye | 2023 | FSRS's creator walking you from zero to the DSR model in three sittings. |

Gwern earns special mention. Continuously updated since 2009, arguably the most thorough single piece ever written on the subject. His 5-minute rule for what to add to your deck: if you will spend more than five minutes over your lifetime looking the thing up or suffering from not knowing it, it is worth a card. That heuristic ends most "what should I Anki?" arguments.

Nielsen reframed the entire conversation in 2018: "The single biggest change that Anki brings about is that it means memory is no longer a haphazard event, to be left to chance. Rather, it guarantees I will remember something, with minimal effort. Anki makes memory a choice."

Timeline: 1885–2025

| Year | Event | Innovation |
|---|---|---|
| 1885 | Ebbinghaus publishes Über das Gedächtnis | Quantified forgetting for the first time |
| 1939 | H.F. Spitzer tests 3,600+ students | First large-scale spacing effect study |
| 1967 | Pimsleur's graduated interval recall | Hand-tuned schedule for audio learning |
| 1972 | Leitner's box system | Physical spaced repetition without computation |
| 1985 | Wozniak begins self-experiments | Birth of computational spaced repetition |
| 1987 | SuperMemo 1.0 / SM-2 | First computer scheduling algorithm |
| 1989 | SM-4, SM-5 | First adaptive algorithms (optimization matrix) |
| 1991 | SuperMemo 2.0 released as freeware | SM-2 spreads globally |
| 1994 | Bjork coins "desirable difficulties" | Theoretical framework for why spacing works |
| 2003 | Mnemosyne released | First open-source SRS (uses SM-2) |
| 2006 | Anki released (Damien Elmes) | SM-2 goes mainstream; 10M+ users eventually |
| 2006 | Roediger & Karpicke testing effect paper | Landmark retrieval practice evidence |
| 2016 | Duolingo's HLR paper | ML-based scheduling enters the literature |
| 2022 | Jarrett Ye releases FSRS v3 | Open-source DSR model for Anki |
| 2023 | FSRS v4 (power curve) | Power function replaces exponential |
| 2023 | Anki 23.10 ships native FSRS | FSRS reaches millions of users |
| 2024 | FSRS-5 (same-day reviews, 19 params) | Handles short-term memory |
| 2025 | FSRS-6 (21 params) | Optimizable curve flatness |

Implementing FSRS — it is small

Fernando Borretti wrote Implementing FSRS in 100 Lines. He was not exaggerating. The core algorithm fits in one file. The heavy part of the system is the optimizer that trains the 19 parameters against your review history — and even that is a few hundred lines of straightforward gradient descent.
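
A caricature of that optimizer, not Borretti's code: one weight (an initial-stability guess) stands in for all 19, fit by finite-difference gradient descent on the same log-loss objective. FSRS proper backpropagates through every stability update in the review history.

```ruby
# Toy optimizer: fit an initial-stability guess s0 to review outcomes by
# gradient descent on log loss. REVIEWS is fabricated data:
# [[elapsed_days, recalled? (1/0)], ...]
REVIEWS = [[1.0, 1], [3.0, 1], [7.0, 0], [2.0, 1]]

def predict(t, s)
  (1 + (19.0 / 81) * t / s)**-0.5
end

def loss(s0)
  total = REVIEWS.sum do |t, y|
    p = predict(t, s0).clamp(1e-6, 1 - 1e-6)
    -(y * Math.log(p) + (1 - y) * Math.log(1 - p))
  end
  total / REVIEWS.size
end

s0, lr, eps = 2.0, 0.5, 1e-5
200.times do
  grad = (loss(s0 + eps) - loss(s0)) / eps # finite-difference gradient
  s0 -= lr * grad
end
# s0 has drifted toward the stability that best explains the outcomes
```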

My Kanji implements FSRS-5 natively in Ruby. The study session system uses the DSR model to schedule kanji reviews, with per-user weight optimization. It lives in app/services/fsrs_scheduler.rb — single file, ~200 lines, no external dependencies. Evergreen knowledge ported into a Rails app.

What is still unsolved

Spaced repetition is good. It is not done.

1. Cold start. FSRS needs review history to optimize weights. A brand-new user gets defaults. Matuschak notes the first 100 reviews are essentially flying blind.

2. Inter-item interference. Learn 待 (wait) and 持 (hold) on the same day and they collide. No production algorithm models this. The FSRS team has discussed it; it remains open research.

3. Recall is not understanding. Current SRS asks "can you retrieve this?" — not whether you understand it in context, can use it productively, or have integrated it with the rest of your knowledge. Matuschak's mnemonic medium is the most interesting work pushing past pure recall.

4. Emotional engagement. Matuschak argues the critical thing to optimize in an SRS is emotional connection to the review session and its contents. No algorithm does this. The 200th card of a session feels different from the 5th, and the scheduler is blind to that.

5. The right retention target. FSRS lets you set a desired retention rate (default: 90%). Is 90% optimal? Higher means more reviews. Lower means more forgetting. The Expertium benchmark shows diminishing returns above 90%, but the right point depends on the learner's goals, time budget, and material. No algorithm adapts it dynamically yet.
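
Plugging targets into the inverted curve from earlier makes the tradeoff concrete: t(R_d) = S · (81/19) · (R_d^(-2) - 1) gives intervals of about 2.4·S at an 80% target, exactly S at 90%, and about 0.46·S at 95%. That is roughly a five-fold swing in interval length (and hence review frequency) between the 80% and 95% settings.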

The bottom line

Spaced repetition is the closest thing in learning to a free lunch. The spacing effect has been replicated for 140 years across every population, material, and condition anyone has tested. The only question left is how efficiently your algorithm exploits it.

SM-2 was a breakthrough in 1987 and still works. FSRS is measurably better — fewer reviews, accurate predictions, per-user adaptation — and it is open source. Over a year of daily study, a 20% reduction in review load is dozens of hours back. Time you spend learning new material instead of paying tax on a worse scheduler.

The algorithm decides when you forget. Pick a good one.
