Cloze Reader

Fill in the blanks to practice reading comprehension

About this Project

Wilson Taylor (1953) cut words from a passage at regular intervals, asked readers to reconstruct them from context, and named the result a cloze test. Decades later, Devlin et al. (2019) masked tokens the same way and trained a neural network to predict them from surrounding context. Taylor measured reading comprehension; Devlin induced vector representations. Harris (1954) and Firth (1957) had already established the shared premise, that words derive meaning from the company they keep, and both procedures operationalized it. Cloze Reader hands the fill-in-the-blank task back to human readers at a moment when machines perform it at scale.

Each round pulls a passage from Project Gutenberg and blanks out one or more words, chosen by an AI model rather than at fixed intervals. Read the surrounding context and infer what belongs in the gap. An embedded chat panel offers preset prompts about part of speech, sentence role, word category, and synonymy without revealing the answer. Advancing through levels stacks more blanks per passage and steepens the vocabulary from common to challenging. Scores and streaks accumulate across rounds, and a leaderboard captures the highest levels reached. Non-zero temperature decoding governs word selection, so the same passage can yield different blanks on different runs.
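The blanking step can be sketched as a pure function. This is a minimal sketch, assuming (hypothetically) that the model returns zero-based word positions to hide; the function name and return shape are illustrative, not the project's actual API:

```typescript
// Replace model-chosen words in a passage with blanks.
// `indices` are zero-based word positions returned by the model
// (an assumption for illustration); out-of-range indices are skipped.
function applyBlanks(
  passage: string,
  indices: number[]
): { text: string; answers: string[] } {
  const words = passage.split(/\s+/);
  const answers: string[] = [];
  for (const i of indices) {
    if (i >= 0 && i < words.length) {
      answers.push(words[i]); // keep the hidden word for scoring
      words[i] = "_____";
    }
  }
  return { text: words.join(" "), answers };
}
```

For example, `applyBlanks("It was the best of times", [3])` yields the text `"It was the _____ of times"` with `answers` of `["best"]`.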

Passages stream from the Hugging Face Datasets API into a Cloudflare Workers backend built with Hono. Word selection, hint generation, and passage contextualization each fire as separate calls to Gemma-3-27B (Gemma Team 2025), an open-weight model from Google, routed through OpenRouter. Each call runs in isolation, with no shared context between tasks, so word selection stays independent of hint generation. Gutenberg texts anchor the source archive as canonical pretraining data, present in The Pile (Gao et al. 2020) and its successors, already absorbed into parameter weights and stripped of their historically situated literary forms. Cloze Reader restores these texts to the surface and asks human readers to engage them slowly, one blank at a time. Source code at github.com/zmuhls/cloze-reader.
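One such isolated call can be sketched against OpenRouter's public chat-completions endpoint. This is a hedged sketch, not the project's implementation: the prompt text, temperature value, and function names are assumptions made for illustration.

```typescript
// Payload shape for OpenRouter's chat-completions API.
interface ChatPayload {
  model: string;
  messages: { role: string; content: string }[];
  temperature: number;
}

// Build a word-selection request. The prompt wording and temperature
// are illustrative; the non-zero temperature means repeat runs over
// the same passage can pick different blanks.
function buildWordSelectionPayload(passage: string, blanks: number): ChatPayload {
  return {
    model: "google/gemma-3-27b-it",
    messages: [
      {
        role: "user",
        content: `Choose ${blanks} words to blank out in this passage and return their positions:\n\n${passage}`,
      },
    ],
    temperature: 0.7,
  };
}

// Fire one isolated call: no shared context carries over between
// word selection, hint generation, and contextualization.
async function selectWords(
  passage: string,
  blanks: number,
  apiKey: string
): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildWordSelectionPayload(passage, blanks)),
  });
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```

Keeping each task as a separate stateless call, as the paragraph above describes, means the hint prompt never sees the selection prompt, so hints cannot leak the chosen word by accident.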

Works Cited

  • Devlin, J. et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT 2019.
  • Firth, J.R. (1957). A synopsis of linguistic theory, 1930–1955. Studies in Linguistic Analysis.
  • Gao, L. et al. (2020). The Pile: An 800GB dataset of diverse text for language modeling. arXiv:2101.00027.
  • Gemma Team (2025). Gemma 3 Technical Report. arXiv:2503.19786.
  • Harris, Z.S. (1954). Distributional structure. Word, 10(2–3), 146–162.
  • manu (2024). project_gutenberg. Hugging Face Datasets. Available at: huggingface.co/datasets/manu/project_gutenberg.
  • Project Gutenberg (2026). Project Gutenberg. Available at: gutenberg.org.
  • Taylor, W.L. (1953). "Cloze procedure": A new tool for measuring readability. Journalism Quarterly, 30(4), 415–433.