Jukujikun: Kanji Compounds That Don't Read Like Their Parts

hbaristr · 15 min read

Spelling, not arithmetic

Some kanji compounds aren't compounds at all — they're spelling.

大人 reads otona, not dai-jin or oo-hito. 今日 reads kyō, not kon-nichi (well, sometimes — see below). 紅葉 reads momiji. 田舎 reads inaka. The pronunciation is glued to the whole-word meaning, not assembled from the readings of the individual characters.

These are 熟字訓 (jukujikun, "compound-word kun-readings"). They look like anomalies if you assume kanji-readings build words. They stop being anomalies the moment you remember kanji were imported into a language that already had words. Jukujikun is the receipt.

The opening canon

A short list every Japanese reader recognizes on sight. The phonology column is the load-bearing one — it explains what the spoken word actually is, independent of the characters glued to it.

Surface | Kana | Romaji | Meaning | What's going on phonologically
大人 | おとな | otona | adult | Native wago otona "grown one" predates the literate scribe; 大 + 人 chosen for sense, not sound
今日 | きょう | kyō | today | Yamato deictic kepu > kyō; 今 + 日 = "this day" by gloss
昨日 | きのう | kinō | yesterday | OJ ki-no-pi "the day before"; 昨 means "previous", 日 "day"
明日 | あす | asu | tomorrow | OJ asu "morning, next day"; 明 "bright/dawn" + 日 "day"
紅葉 | もみじ | momiji | autumn leaves / maple | Verb momidu "to redden"; 紅 "crimson" + 葉 "leaf" gloss the meaning
田舎 | いなか | inaka | countryside | Compound of OJ ina "rice" + ka "place"; 田 "paddy" + 舎 "dwelling" gloss the sense
海老 | えび | ebi | shrimp / lobster | Native ebi; 海 "sea" + 老 "old (bent)" describe the body, not the sound
太刀 | たち | tachi | long sword | Native tati (cf. verb tatsu "to stand/strike"); 太 "thick" + 刀 "blade"
大和 | やまと | yamato | ancient Japan / "great harmony" | Place name yamatə (mountain-gate region); 大 + 和 added late as honorific spelling
山車 | だし | dashi | festival float | Verb dasu "to put out / pull out"; 山 "mountain" + 車 "cart" describe the float
雪崩 | なだれ | nadare | avalanche | Verb nadaru "to slide down"; 雪 "snow" + 崩 "collapse" gloss the event
五月雨 | さみだれ | samidare | early-summer rain | OJ sa-midare "rice-planting rain"; 五月 "fifth month" + 雨 "rain" gloss the season
二十歳 | はたち | hatachi | twenty years old | Yamato numeral hata- "twenty" + age suffix -chi; 二十 + 歳 spell out the meaning
七夕 | たなばた | tanabata | star festival (7 July) | OJ tanabata "loom-shelf maiden" (a deity); 七 + 夕 "seventh evening" date-gloss the festival
為替 | かわせ | kawase | exchange / remittance | Verb kawasu "to exchange"; 為 "do" + 替 "replace" gloss the action

In every row, the kana column is the historical fact. The kanji are a logograph chosen after the word existed, picked to encode the meaning rather than transcribe the sound. Otona is not made of o- and -tona the way gakkō is made of gaku + kō. It is one indivisible Yamato word with a two-character spelling.

The mechanism

Japanese had a spoken vocabulary for centuries before kanji arrived from the Korean peninsula in roughly the 5th century CE (Seeley 1991). When the scribal class began writing Japanese — not Chinese — they faced a forced choice for every native word:

  1. Pick a kanji whose Chinese sound resembled the Japanese word. This is ateji (当て字, "applied characters"). 寿司 → sushi is the classic case: 寿 reads ジュ/ス, 司 reads シ; together they ad-hoc-spell su-shi, with semantic flavor (寿 "longevity", 司 "to manage") as a bonus rather than the load-bearing layer.
  2. Pick kanji whose Chinese meaning matched the Japanese word, and ignore the sound entirely. The reader sees the characters, recognizes the meaning, and supplies the native pronunciation from memory. This is jukujikun.
  3. Coin a new on-reading compound from Sino-Japanese morphemes. This is the productive kango mechanism (学校 gakkō, 電話 denwa).

Path 2 is the load-bearing claim of this post. The kanji function as a logograph for the whole word; the phonology is a wago survival, untouched by the borrowing. Habein (1984) traces this layered orthography back to the Nara-period Kojiki (712) and Man'yōshū (c. 759), where the same scribes wrote the same word three different ways within a single text — sometimes phonetically, sometimes semantically, sometimes as jukujikun — depending on register and prosody.

Jukujikun vs ateji vs kango

These three are routinely confused. The crisp distinction is which of {sound, meaning, both} the kanji choice is tracking.

Type | Kanji track... | Reading source | Example | Decomposition
Phono-semantic compound (kango) | sound + meaning, compositionally | Sino-Japanese morphemes assembled | 学校 gakkō | 学 gaku "learn" + 校 kō "school"
Native compound (wago) | meaning, compositionally | native morphemes assembled | 山道 yamamichi | 山 yama "mountain" + 道 michi "road"
Ateji (phonetic) | sound only | the loaned/native word, sound-spelled | 寿司 sushi | 寿 ≈ su, 司 ≈ shi
Jukujikun | meaning only, holistically | the native word; the kanji's own readings are ignored | 大人 otona | 大 + 人 = "adult" as a unit; sound is wago survival

Joyce (2002) frames this as a difference in which side of the orthography-phonology mapping the morpheme lives on. In kango, a single kanji is a morpheme — a sound-meaning pair. In jukujikun, the compound is the morpheme: indivisible on the phonology side, even though it occupies multiple character slots on the orthography side. Tomita (1989), surveying ateji in Edo-period vernacular literature, distinguishes "phonetic ateji" (sound-driven, like 寿司) from "semantic ateji" — and the latter is essentially modern jukujikun under a different label. The terminological boundary kept getting renegotiated into the early 20th century.

The practical test: swap out one character. If the reading of the remaining character stays put (学校 gakkō and 学生 gakusei both keep gaku), it is a compositional compound. If the whole reading collapses because the unit was the gestalt (no part of otona survives in 大木 taiboku), you were looking at jukujikun.
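The test can also be run mechanically. The sketch below is a minimal illustration, not this dictionary's implementation: `KANJI_READINGS` is a hand-picked toy inventory and `segment` is a hypothetical helper. It asks whether a word's kana reading can be split into one known reading per character. It deliberately ignores gemination and rendaku, so a word like がっこう would need extra rules before it segmented.

```python
from typing import Dict, List, Optional, Set

# Toy per-kanji reading inventory (hiragana). Illustrative only; a real
# dictionary would carry far more readings per character.
KANJI_READINGS: Dict[str, Set[str]] = {
    "山": {"やま", "さん"},
    "道": {"みち", "どう"},
    "電": {"でん"},
    "話": {"わ", "はなし"},
    "大": {"おお", "だい", "たい"},
    "人": {"ひと", "じん", "にん"},
    "田": {"た", "でん"},
    "舎": {"しゃ"},
}


def segment(surface: str, reading: str) -> Optional[List[str]]:
    """Return one per-character split of `reading`, or None if none exists.

    No sound-change rules (gemination, rendaku) are applied, so a None
    result means "not a plain concatenation", which is only a hint that
    the word may be jukujikun.
    """
    if not surface:
        # Success only if the reading was consumed exactly.
        return [] if not reading else None
    head, rest = surface[0], surface[1:]
    for candidate in sorted(KANJI_READINGS.get(head, ())):
        if reading.startswith(candidate):
            tail = segment(rest, reading[len(candidate):])
            if tail is not None:
                return [candidate] + tail
    return None


print(segment("山道", "やまみち"))  # compositional wago
print(segment("電話", "でんわ"))    # compositional kango
print(segment("大人", "おとな"))    # no split exists
```

With this toy inventory, 山道 and 電話 split cleanly while 大人 and 田舎 return `None`: no combination of the listed readings for 大 and 人 spells おとな.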

Why so many are calendar and agricultural words

Run through any list of jukujikun and a pattern jumps out. They cluster on time, weather, agriculture, fauna, and place names.

Domain | Examples | Why
Calendar | 今日, 昨日, 明日, 七夕, 二十歳 | Time-deixis is core vocabulary; yamato words for "today/yesterday/tomorrow" predate writing by millennia
Weather | 五月雨, 雪崩, 時雨 shigure, 吹雪 fubuki | Climate vocabulary precedes literacy and resists replacement
Agriculture | 田舎, 紅葉, 早苗 sanae, 稲妻 inazuma | Rice-cycle vocabulary is the deepest stratum of Yamato Japanese
Fauna | 海老, 海月 kurage, 河豚 fugu, 百足 mukade | Native folk taxonomy; kanji written as descriptive glosses
Place / kin | 大和, 田舎, 海原 unabara, 父さん tōsan | Toponyms and kin terms are linguistic fossils

Not coincidence. Frellesvig (2010, ch. 1) classifies these as the deep-stratum native lexicon — morphemes attested in Old Japanese (8th century) and reconstructable into proto-Japonic. They were already entrenched, prosodically irregular, and resistant to reanalysis when Chinese characters arrived. Vovin (2010), reconstructing Old Japanese from Man'yōshū phonograms, shows that words like yamatə, ki-no-pi, sa-midare, and tanabata were stable centuries before any scribe needed to write them down. When literacy came, the path of least resistance was to assign kanji as semantic glosses and leave the spoken word alone.

The strongest predictor of jukujikun status is not orthographic. It's lexical: deep-stratum native vocabulary, especially in semantic domains that predate Chinese cultural influence. Compounds for things the Chinese imports actually introduced — bureaucracy, Buddhism, scholarship, technology — went the kango route. Compounds for things the Yamato had already named for centuries went jukujikun.

Fossils

Because the spoken layer is a wago survival untouched by Sino-Japanese phonology, jukujikun preserve archaic readings that have otherwise dropped out of the language. They are linguistic fossils. Historical phonology lives off them.

Modern jukujikun | OJ / proto-form | Note
大和 yamato | OJ yamatə (with reduced central vowel ə) | The final -to preserves an OJ ə that merged into o in most modern words; here it is locked in by the spelling
田舎 inaka | ina "rice (paddy)" + ka "place / locale" | ina- is the same morpheme behind inazuma "lightning" (lit. "rice-spouse") and inaho "ear of rice"; freestanding ina is otherwise extinct
今日 kyō | OJ kepu < proto ke-pi "this day" | The -pu > -fu > -u shift is regular; kepu is attested in the Man'yōshū
明日 asu | OJ asu / asita "morning, next day" | The doublet asu (formal/poetic) vs ashita (everyday) preserves an OJ register split
七夕 tanabata | tana-bata "shelf-loom", the loom-maiden deity | The free morpheme tana "shelf" survives; hata "loom" survives; the compound noun is otherwise dead
紅葉 momiji | OJ verb momidu "to redden/crimson" → nominalized momidi | Conjugation class lost; the noun is the only surviving form of the verb root

Vovin (2003), reconstructing Classical Japanese morphology, treats jukujikun as one of the most reliable sources of pre-Heian phonological evidence. The reasoning is mechanical: because the spelling is locked to the meaning rather than the pronunciation, the pronunciation is free to preserve forms that would otherwise have been overwritten by Sino-Japanese phonotactics. The orthography is conservative. The phonology is conservative. The result is a sealed time capsule.

The Man'yōshū as ground zero

The 8th-century Man'yōshū (万葉集, "Collection of Ten Thousand Leaves") is where the trichotomy first becomes legible. In the same poem — sometimes the same line — a scribe might write Japanese using kanji-as-phonogram (the man'yōgana that became kana), kanji-as-semantic-gloss, and kanji-as-jukujikun. Cranston (1993, A Waka Anthology, Volume One: The Gem-Glistening Cup) gives parallel transliterations on facing pages to make this layering visible.

Genryaku-bon (元暦校本) manuscript of the Man'yōshū, an 11th-century copy collated in 1184 of the 8th-century original. The Nara-period scribes who compiled it wrote Japanese with no native script — they shuttled freely between phonographic kanji (man'yōgana), semantic kanji, and jukujikun within a single poem. This is where the three orthographic strategies were negotiated in real time. Source: Wikimedia Commons.

The deeper background — how kanji even arrived on the archipelago, and why the scribal class needed three competing strategies at all — is in our history of kanji from oracle bones to the three scripts. For this post the relevant fact is that jukujikun is not a 20th-century pedagogical curiosity. It is a 1,300-year-old orthographic practice, present at the moment Japanese became a written language.

The reform threats, and why jukujikun survived

Postwar language-policy reformers, working under the tōyō kanji (当用漢字, 1946) and later jōyō kanji (常用漢字, 1981, revised 2010) lists, took a fundamentally compositional view: each kanji should have a small fixed set of readings, and compounds should be assembleable from those. Jukujikun violates this principle by definition — the compound's reading is not derivable from the parts. Several reformers in the 1946–1956 period proposed retiring jukujikun outright, replacing them with all-kana spellings (おとな for 大人, きのう for 昨日) or with phonetic ateji (Twine 1991, ch. 5).

Three reasons they survived:

Reason | Mechanism
Frequency | Words like 大人, 今日, 明日, 田舎 are in the top few thousand of any frequency list. Re-spelling the most common words in the language was a non-starter.
Pedagogy | Elementary curricula could absorb a small list of "special readings" (特別な読み, tokubetsu na yomi) as exceptions, taught by rote rather than rule, the same way English teaches "knight", "though", "colonel".
Furigana | Writers could (and do) annotate jukujikun with kana ruby above the kanji whenever ambiguity threatens. The cost of the exception is amortized across the writing system rather than borne by the lexicon.
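On the web, the furigana annotation is usually expressed with HTML `<ruby>` and `<rt>` elements. The sketch below is one possible way to emit it, assuming a hypothetical helper named `ruby`. A jukujikun gets a single annotation over the whole word, because there is no per-character reading to attach, while a compositional compound can be annotated character by character.

```python
from html import escape
from typing import List, Optional


def ruby(surface: str, reading: str, per_char: Optional[List[str]] = None) -> str:
    """Build HTML ruby markup for a word.

    If `per_char` supplies exactly one reading per character, annotate each
    character separately. Otherwise annotate the whole word as one unit,
    which is the only option for jukujikun.
    """
    if per_char is not None and len(per_char) == len(surface):
        return "".join(
            f"<ruby>{escape(ch)}<rt>{escape(rd)}</rt></ruby>"
            for ch, rd in zip(surface, per_char)
        )
    return f"<ruby>{escape(surface)}<rt>{escape(reading)}</rt></ruby>"


print(ruby("大人", "おとな"))                      # whole-word annotation
print(ruby("山道", "やまみち", ["やま", "みち"]))  # per-character annotation
```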

Gottlieb (1995, Kanji Politics) argues that the survival of jukujikun is one of the clearest signals that postwar Japanese script reform was incremental, not revolutionary: every proposal that demanded users unlearn high-frequency forms failed, while proposals that simply pruned low-frequency kanji largely succeeded. The pedagogical handling — which kanji get taught when, and how exceptions are introduced — is laid out in our post on the kyōiku kanji and how Japanese children learn them.

The 116 official jukujikun on the Joyo list

The 2010 revision of the jōyō kanji-hyō (常用漢字表) does not just list 2,136 characters with their officially-sanctioned on/kun readings. Appendix 2 (付表 fuhyō) lists 116 jukujikun and special-reading words explicitly recognized as standard Japanese, taught in school, and acceptable in government writing. They get their own appendix because they cannot be derived from the main reading tables. The appendix is the official acknowledgment that compositionality breaks here.

A representative sample:

Surface | Kana | Romaji | Meaning | Joyo status
明日 | あす | asu | tomorrow | Appendix 2
大人 | おとな | otona | adult | Appendix 2
今日 | きょう | kyō | today | Appendix 2
昨日 | きのう | kinō | yesterday | Appendix 2
田舎 | いなか | inaka | countryside | Appendix 2
紅葉 | もみじ | momiji | maple, autumn leaves | Appendix 2
七夕 | たなばた | tanabata | star festival | Appendix 2
二十歳 | はたち | hatachi | 20 years old | Appendix 2
五月雨 | さみだれ | samidare | early-summer rain | Appendix 2
雪崩 | なだれ | nadare | avalanche | Appendix 2

The full list is published by the Agency for Cultural Affairs at the Bunkachō 常用漢字表 page. Words not in Appendix 2 are not "wrong": 海老 ebi, 海月 kurage, 河豚 fugu, and 百足 mukade are entirely standard. They are just not on the list of jukujikun the government commits to teaching every student. Appendix 2 is the floor, not the ceiling.

Edge cases: one surface, multiple readings

Many jukujikun coexist with a kango reading of the same characters. The choice between them encodes register, formality, or domain.

Surface | Reading 1 | Reading 2 | Reading 3 | What flips it
今日 | きょう (kyō, jukujikun): everyday | こんにち (konnichi, kango): formal "the present day" | (none) | Register: news/letters use konnichi; speech uses kyō
明日 | あす (asu, jukujikun): poetic / formal | あした (ashita, wago): everyday | みょうにち (myōnichi, kango): very formal | Register and prosody (asu in waka, ashita in conversation)
一日 | ついたち (tsuitachi, jukujikun): "1st of the month" | いちにち (ichinichi, kango): "one day (duration)" | いちじつ (ichijitsu, kango): archaic literary | Calendrical vs durational meaning
大人 | おとな (otona, jukujikun): "adult" | だいにん (dainin, kango): "adult" as a fare or admission category | たいじん (taijin, kango): "person of stature" | Sense and domain: everyday "adult" vs fare-table "adult" vs honorific "great person"
紅葉 | もみじ (momiji, jukujikun): "maple, autumn leaves" | こうよう (kōyō, kango): "the leaves turning color" (verb-like) | (none) | Object (the tree/leaf) vs event (the seasonal change)
山車 | だし (dashi, jukujikun): "festival float" | さんしゃ (sansha, kango): not used for floats | (none) | Domain: festival vocabulary locks the jukujikun reading

Backhouse (1993) frames this as register stratification. The kango reading is the academic / formal / Sinitic option; the jukujikun is the native / everyday / lived-in option. Same surface, two different sociolinguistic registers, distinguished only by which reading the speaker activates. One of the cleanest demonstrations in any writing system that orthography and phonology are separable layers.
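One way to model the edge cases above is to key readings by surface form and tag each with its type and register. This is a minimal sketch under stated assumptions: the `Reading` tuple, the `READINGS` table, the register labels, and the `pick` function are all hypothetical names, and the two entries are copied from the rows for 今日 and 明日.

```python
from typing import Dict, List, NamedTuple, Optional


class Reading(NamedTuple):
    kana: str
    kind: str      # "jukujikun", "kango", or "wago"
    register: str  # simplified label, assumed for this sketch


# One surface form, several readings: the spelling never changes.
READINGS: Dict[str, List[Reading]] = {
    "今日": [
        Reading("きょう", "jukujikun", "everyday"),
        Reading("こんにち", "kango", "formal"),
    ],
    "明日": [
        Reading("あす", "jukujikun", "poetic"),
        Reading("あした", "wago", "everyday"),
        Reading("みょうにち", "kango", "very formal"),
    ],
}


def pick(surface: str, register: str) -> Optional[Reading]:
    """Return the first reading of `surface` tagged with `register`, if any."""
    for reading in READINGS.get(surface, []):
        if reading.register == register:
            return reading
    return None


print(pick("今日", "formal"))
print(pick("明日", "everyday"))
```

The lookup takes the register as an input because nothing in the written form carries it; the reader has to supply it from context.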

How this dictionary marks jukujikun

Every word here has a word_formation_type (kango / wago / jubako / yuto / jukujikun / mixed), and every per-character link in a word carries a reading_type (onyomi / kunyomi / nanori / jukujikun). For jukujikun words the per-character pronunciation field is intentionally nil — the kanji does not contribute a phonetic value. Only the whole-word reading is meaningful. The data model is the same shape as the linguistic claim.

You can see this on the show pages directly:

On a kanji's own show page, words are grouped by reading type. The kunyomi block shows kanji that contribute their kun reading to a compound; the jukujikun block shows compounds where the kanji is just spelling. The visual distinction is intentional. Phonological arithmetic and orthographic logogram are different kinds of facts.
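The data model described in this section could be sketched roughly as follows. The field names `word_formation_type` and `reading_type` and their allowed values come from the description above; the class names, the other field names, and the validation rule are assumptions made for illustration, and Python's `None` stands in for nil.

```python
from dataclasses import dataclass
from typing import List, Optional

FORMATION_TYPES = {"kango", "wago", "jubako", "yuto", "jukujikun", "mixed"}
READING_TYPES = {"onyomi", "kunyomi", "nanori", "jukujikun"}


@dataclass
class CharLink:
    """One kanji's contribution to a word (hypothetical shape)."""
    kanji: str
    reading_type: str
    pronunciation: Optional[str]  # None for jukujikun: no phonetic value

    def __post_init__(self) -> None:
        if self.reading_type not in READING_TYPES:
            raise ValueError(f"unknown reading_type: {self.reading_type}")
        if self.reading_type == "jukujikun" and self.pronunciation is not None:
            raise ValueError("jukujikun links carry no per-character pronunciation")


@dataclass
class Word:
    surface: str
    reading: str  # the whole-word reading, always present
    word_formation_type: str
    links: List[CharLink]

    def __post_init__(self) -> None:
        if self.word_formation_type not in FORMATION_TYPES:
            raise ValueError(f"unknown word_formation_type: {self.word_formation_type}")


otona = Word(
    surface="大人",
    reading="おとな",
    word_formation_type="jukujikun",
    links=[CharLink("大", "jukujikun", None), CharLink("人", "jukujikun", None)],
)
denwa = Word(
    surface="電話",
    reading="でんわ",
    word_formation_type="kango",
    links=[CharLink("電", "onyomi", "でん"), CharLink("話", "onyomi", "わ")],
)
```

Rejecting a per-character pronunciation on a jukujikun link at construction time keeps the records consistent with the claim that only the whole-word reading is meaningful.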

Word-formation typology more broadly — kango, wago, the two mixed types (jubako and yuto), and jukujikun — has its own dedicated post: word formation types: kango, wago, and jukujikun.

Jukujikun is what happens when a writing system is glued onto a language that already had centuries of words. Most of the time the glue is invisible — on-readings flow compositionally, kun-readings flow compositionally, and the system pretends to be modular. Jukujikun is where the pretense drops. The compound is one Yamato word with a two-character spelling, the kanji are along for the ride, and the only thing the reader can do is recognize the gestalt. For a system that gets accused of being unprincipled, this is one of the few places it actually is.

Further reading

External:

  • Seeley, C. (1991). A History of Writing in Japan. University of Hawai'i Press. The standard English-language history; chs. 2–4 trace the layered orthography from Nara to Heian.
  • Frellesvig, B. (2010). A History of the Japanese Language. Cambridge University Press. Ch. 1 on the deep-stratum native lexicon; ch. 6 on Sino-Japanese borrowing.
  • Vovin, A. (2003). A Reference Grammar of Classical Japanese Prose. RoutledgeCurzon. Treats jukujikun as a primary source of pre-Heian phonological evidence.
  • Habein, Y. S. (1984). The History of the Japanese Written Language. University of Tokyo Press.
  • Twine, N. (1991). Language and the Modern State: The Reform of Written Japanese. Routledge. Chs. 4–5 on the postwar reform debates over jukujikun and ateji.
  • Bunkachō (2010). 常用漢字表 (joyo kanji table). Appendix 2 is the official jukujikun list.
