Word Formation Types: Kango, Wago, Jūbako, Yutō, Jukujikun

hbaristr 14 min read

Six boxes, every compound in exactly one

Every Japanese compound noun falls into one of six buckets — and which bucket it sits in is the single most useful fact about it. 漢語 (kango, Sino-Japanese, on+on). 和語 (wago, native, kun+kun). 重箱 (jūbako, on+kun). 湯桶 (yutō, kun+on). 熟字訓 (jukujikun, meaning-glued). 外来語 (gairaigo, foreign).

This is not trivia. The bucket tells you how to predict the reading, what historical layer the word was minted in, what register it occupies, and whether it will undergo rendaku. The two "mixed" categories — jūbako and yutō — are self-naming: each is an everyday object whose own reading exemplifies the type. The kind of self-referential trick only Japanese lexicography would pull.

The six categories

| Type | Japanese term | Formula | Reading rule | Share of vocabulary | Example |
|---|---|---|---|---|---|
| Kango | 漢語 | on + on | Each kanji takes its Sino-Japanese reading | ~49% by dictionary entry, ~41% by token (formal text) | 学校 (gakkō, school) |
| Wago | 和語 | kun + kun | Each kanji takes its native reading | ~33% by dictionary entry, ~37% by token (formal text) | 山桜 (yamazakura, mountain cherry) |
| Jūbako-yomi | 重箱読み | on + kun | First kanji on, second kanji kun | ~3% (residual) | 本屋 (hon'ya, bookstore) |
| Yutō-yomi | 湯桶読み | kun + on | First kanji kun, second kanji on | ~2% (residual) | 場所 (basho, place) |
| Jukujikun | 熟字訓 | meaning-glued | Reading attaches to the whole compound, not its parts | <1% (closed class, ~500 words) | 大人 (otona, adult) |
| Gairaigo | 外来語 | foreign | Phonetic adaptation, usually written in katakana | ~10% by entry, ~5% in formal text | コンピュータ (konpyūta, computer) |

The diagnostic fact lives in the two-axis split: dictionary share against running-text share. Kango wins the lexicon by count. Wago wins speech by frequency, because the most common 1,000 words skew native. The numbers come from the National Institute for Japanese Language and Linguistics (NINJAL) corpus surveys, as summarized in Tamamura (2010) and Frellesvig (2010, ch. 9).

| Stratum | Share of dictionary entries | Share of running tokens | Register skew |
|---|---|---|---|
| Kango (Sino-Japanese) | 49% | 41% (formal text), 18–20% (speech) | Heavily formal/written |
| Wago (native) | 33% | 37% (formal text), ~50% (speech) | Heavily everyday/spoken |
| Gairaigo (foreign) | 9–10% | 5% (formal text), 4% (speech) | Tech, fashion, food |
| Mixed (jūbako + yutō + hybrid) | 6–8% | ~6% | Mixed |
| Jukujikun | <1% | <1%, but high in core vocabulary | Lexicalized fossils |

Sources: NINJAL "Word List by Semantic Principles" surveys (1964, 2004); Bunkachō registers; Tamamura (2010), §3.

The two mixed categories name themselves

The trick that makes the categories stick. The names are examples.

重箱 (jūbako, "stacked food box"): the lacquered tiered box used for osechi-ryōri at New Year. Read it: 重 in on'yomi (じゅう, jū) plus 箱 in kun'yomi (ばこ, bako). On + kun. The compound jūbako is itself a jūbako-yomi.

湯桶 (yutō, "hot-water bucket"): the wooden pourer used to serve sobayu, the water the soba was boiled in. Read it: 湯 in kun'yomi (ゆ, yu) plus 桶 in on'yomi (とう, tō). Kun + on. The compound yutō is itself a yutō-yomi.

A 重箱 (jūbako) — the stacked lacquer food box that gives the on+kun compound type its name. The reading itself is the example: 重 takes its on'yomi じゅう, 箱 takes its kun'yomi ばこ (with rendaku from はこ). Source: Wikimedia Commons.

Common jūbako-yomi (on + kun)

| Word | Reading | First kanji (on) | Second kanji (kun) | Meaning |
|---|---|---|---|---|
| 本屋 | ほんや (hon'ya) | 本 hon | 屋 ya | bookstore |
| 台所 | だいどころ (daidokoro) | 台 dai | 所 dokoro | kitchen |
| 額縁 | がくぶち (gakubuchi) | 額 gaku | 縁 buchi | picture frame |
| 王様 | おうさま (ōsama) | 王 ō | 様 sama | king (honorific) |
| 役場 | やくば (yakuba) | 役 yaku | 場 ba | town hall, office |
| 路肩 | ろかた (rokata) | 路 ro | 肩 kata | road shoulder |
| 客間 | きゃくま (kyakuma) | 客 kyaku | 間 ma | guest room |
| 番組 | ばんぐみ (bangumi) | 番 ban | 組 gumi | TV/radio program |

Common yutō-yomi (kun + on)

| Word | Reading | First kanji (kun) | Second kanji (on) | Meaning |
|---|---|---|---|---|
| 場所 | ばしょ (basho) | 場 ba | 所 sho | place |
| 相性 | あいしょう (aishō) | 相 ai | 性 shō | compatibility |
| 朝晩 | あさばん (asaban) | 朝 asa | 晩 ban | morning and evening |
| 雨具 | あまぐ (amagu) | 雨 ama | 具 gu | rain gear |
| 手帳 | てちょう (techō) | 手 te | 帳 chō | notebook, planner |
| 夕刊 | ゆうかん (yūkan) | 夕 yū | 刊 kan | evening paper |
| 見本 | みほん (mihon) | 見 mi | 本 hon | sample |
| 消印 | けしいん (keshiin) | 消 keshi | 印 in | postmark |

Watch the same kanji play both sides. 所 surfaces as kun (どころ, dokoro) in 台所 and as on (しょ, sho) in 場所. 場 is on (じょう, jō) in 工場 (kōjō, factory) and kun (ば, ba) in 場所. Position in the compound does not decide the reading layer; the lexicalized history of the specific compound does. Classification isn't optional: it's the whole game.
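A small data-structure view makes the same point: if the layer is stored per compound, with each kanji tagged by the layer it takes in that word, the contrast falls straight out of a lookup. This is a minimal Python sketch; the `COMPOUNDS` table and the `layers_of` helper are hypothetical names, and the entries are only the examples from this paragraph.

```python
# Hypothetical per-compound table: each kanji paired with the layer it takes in that word.
COMPOUNDS: dict[str, list[tuple[str, str]]] = {
    "台所": [("台", "on"), ("所", "kun")],  # daidokoro
    "場所": [("場", "kun"), ("所", "on")],  # basho
    "工場": [("工", "on"), ("場", "on")],   # kōjō
}

def layers_of(kanji: str) -> dict[str, str]:
    """Return {compound: layer} for every stored compound that contains `kanji`."""
    return {
        word: layer
        for word, parts in COMPOUNDS.items()
        for k, layer in parts
        if k == kanji
    }

print(layers_of("所"))  # {'台所': 'kun', '場所': 'on'}
print(layers_of("場"))  # {'場所': 'kun', '工場': 'on'}
```

Both 所 entries sit in second position and still come back with different layers, so neither a per-kanji nor a per-position rule could recover them.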

Kango: the Sino-Japanese substrate

Kango entered Japan in waves over twelve centuries. Buddhist scripture in the 5th–6th c. (Go-on layer). Tang scholarship in the 7th–9th c. (Kan-on layer). Song/Ming trade in the 12th–17th c. (Tō-on layer). Then came a Meiji explosion of newly coined Sino-Japanese vocabulary for translating Western concepts. See on'yomi vs kun'yomi for the four-stratum breakdown.

The Meiji coinages are the most consequential cultural fact in the lexicon. Late-19th-century intellectuals — Nishi Amane, Fukuzawa Yukichi, Inoue Tetsujirō — coined hundreds of two-kanji kango compounds to render European political, scientific, and philosophical vocabulary. These were then exported back to Chinese, where they replaced earlier transliterations and now form a sizable fraction of modern Mandarin technical vocabulary. Liu (1995) catalogues roughly 1,000 such reverse-borrowings. A representative sample:

| Japanese coinage | Reading | Meaning | Coined to render | Now standard in Mandarin |
|---|---|---|---|---|
| 社会 | しゃかい (shakai) | society | "society" (Spencer, Comte) | 社会 (shèhuì) |
| 経済 | けいざい (keizai) | economy | "political economy" (Mill) | 经济 (jīngjì) |
| 哲学 | てつがく (tetsugaku) | philosophy | "philosophy" (Nishi Amane, 1874) | 哲学 (zhéxué) |
| 科学 | かがく (kagaku) | science | "science" | 科学 (kēxué) |
| 文化 | ぶんか (bunka) | culture | "culture" (Tylor) | 文化 (wénhuà) |
| 革命 | かくめい (kakumei) | revolution | "revolution" | 革命 (gémìng) |
| 民主 | みんしゅ (minshu) | democracy | "democracy" | 民主 (mínzhǔ) |
| 共産 | きょうさん (kyōsan) | communism | "communism" | 共产 (gòngchǎn) |
| 自由 | じゆう (jiyū) | freedom | "liberty" | 自由 (zìyóu) |
| 権利 | けんり (kenri) | right (legal) | "right" (Mitsukuri Rinshō, 1868) | 权利 (quánlì) |
| 個人 | こじん (kojin) | individual | "individual" | 个人 (gèrén) |
| 主義 | しゅぎ (shugi) | -ism | "-ism" suffix | 主义 (zhǔyì) |

See Liu (1995); Saitō (2005); Howland (2002) for the translation politics. The "-ism" → 主義 mapping alone seeds dozens of compounds: 資本主義, 社会主義, 個人主義.

Kango has a phonological fingerprint you can spot on sight. Short morphemes, one or two morae each. On'yomi readings. Frequent gemination at the morpheme boundary (gakkō, ippon, kekka). Long vowels in -ei or -ō, many of them the ghosts of Middle Chinese final -ŋ. Resistance to rendaku. See rendaku for why kango blocks voicing.
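The gemination part of that fingerprint can be sketched as a rule on romanized morphemes. This is a simplified Python illustration under stated assumptions: Hepburn romaji input; a second-mora -ku/-ki (from a Middle Chinese -k final) geminates only before k; a second-mora -tsu/-chi (from a -t final) geminates before k, s, t, and h, with h hardening to p. Onsets such as ch- and all lexical exceptions are ignored, and `join_kango` is a hypothetical helper, not part of any library.

```python
def join_kango(first: str, second: str) -> str:
    """Join two on'yomi morphemes (Hepburn romaji) with a simplified sokuonbin rule."""
    if not second:
        return first
    onset = second[0]
    # A second-mora -ku/-ki reflects Middle Chinese final -k (学 gaku, 石 seki);
    # the length check skips one-mora morphemes such as 区 ku or 気 ki.
    if first.endswith(("ku", "ki")) and len(first) > 2 and onset == "k":
        return first[:-2] + "k" + second
    # A second-mora -tsu/-chi reflects Middle Chinese final -t (結 ketsu, 一 ichi);
    # the length check skips one-mora morphemes such as 地 chi.
    if first.endswith(("tsu", "chi")) and len(first) > 3 and onset in "ksth":
        if onset == "h":
            second = "p" + second[1:]  # h hardens to p after the geminate
        return first[:-3] + second[0] + second
    return first + second

for a, b in [("gaku", "kō"), ("ichi", "hon"), ("ketsu", "ka"), ("gaku", "sei")]:
    print(a, "+", b, "->", join_kango(a, b))
# gaku + kō -> gakkō
# ichi + hon -> ippon
# ketsu + ka -> kekka
# gaku + sei -> gakusei
```

The last pair shows the asymmetry the sketch encodes: -ku geminates before k (gakkō) but not before s (gakusei).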

Wago: the native substrate

Wago is what was already there. Pre-contact Japanese vocabulary, forming the core of everyday speech — verbs, basic body parts, kinship terms, weather, geography, food. Wago uses kun'yomi exclusively. That's not coincidence: kun'yomi is the native reading attached to a kanji that was glossed onto a pre-existing Japanese word.

Phonological signatures of wago, per Vance (2008, The Sounds of Japanese):

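  • Open syllables: apart from the moraic nasal ん and the first half of a geminate (っ), every syllable ends in a vowel.
  • Vowel-initial roots are common (asa, ame, ie, umi). What native roots avoid is an initial r-, and historically an initial voiced obstruent (g, z, d, b); kango roots begin with both freely (ryokō, gakkō, daigaku).
  • Native verbs end in /-u/ in the dictionary form (taberu, kaku, hashiru); kango verbs are formed only via the -suru auxiliary.
  • Rendaku is the morphophonological signature: wago + wago compounds voice the second element ~90% of the time.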

| Word | Reading | Kanji | Domain | Why wago |
|---|---|---|---|---|
| 山 | やま (yama) | 山 | geography | Standalone kun reading |
| 川 | かわ (kawa) | 川 | geography | Standalone kun reading |
| 食べる | たべる (taberu) | 食 + okurigana | basic verb | Native verb stem + inflection |
| 朝日 | あさひ (asahi) | 朝 + 日 | nature | Compound, kun + kun (asa + hi) |
| 山桜 | やまざくら (yamazakura) | 山 + 桜 | nature | kun + kun, with rendaku (sakura → zakura) |
| 手紙 | てがみ (tegami) | 手 + 紙 | everyday object | te (kun) + kami (kun), with rendaku (kami → gami) |
| 春雨 | はるさめ (harusame) | 春 + 雨 | nature | haru + ame, with an inserted s (ame → same) |

The wago/kango register split has a measurable consequence. Wago tokens dominate the most-frequent 500 words in spoken Japanese — over 70% wago. Kango takes over beyond the most-frequent 5,000 in formal text (Tamamura 2010, §3.4). Different lexicons for different rooms.

Gairaigo: the foreign layer

Almost universally written in katakana. Phonologically adapted to Japanese mora structure, with epenthetic vowels for illegal codas and consonant clusters. About 10% of the modern lexicon by entry count, concentrated heavily in domains opened up by Western contact — technology, fashion, food, sport, business.
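The epenthesis mechanism can be illustrated with a toy adapter. This Python sketch assumes the source word is already split into phoneme-like segments, maps /l/ to /r/, lets n close a syllable as moraic ん, and picks the epenthetic vowel by a common rule of thumb: o after t and d, i after ch and j, u elsewhere. It ignores long vowels, gemination (bed → beddo), and lexical exceptions such as older loans with i after k, so `adapt` is a hypothetical helper meant only to show the mechanism.

```python
VOWELS = {"a", "i", "u", "e", "o"}

def epenthetic_vowel(consonant: str) -> str:
    """Rule-of-thumb choice of the vowel inserted after a stranded consonant."""
    if consonant in ("t", "d"):
        return "o"  # tu/du would surface as tsu/zu, so o is used instead
    if consonant in ("ch", "j"):
        return "i"
    return "u"

def adapt(segments: list[str]) -> str:
    """Toy adaptation: give every consonant a following vowel, except moraic n."""
    out = []
    for i, seg in enumerate(segments):
        seg = "r" if seg == "l" else seg  # Japanese has a single liquid, written r
        out.append(seg)
        if seg in VOWELS:
            continue
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if nxt in VOWELS or seg == "n":
            continue  # consonant already has a vowel, or n stands as moraic ん
        out.append(epenthetic_vowel(seg))
    return "".join(out)

print(adapt(["s", "t", "r", "a", "i", "k"]))  # sutoraiku (ストライク, strike)
print(adapt(["m", "i", "l", "k"]))            # miruku (ミルク, milk)
print(adapt(["k", "l", "a", "b"]))            # kurabu (クラブ, club)
```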

| Word | Source language | Source word | Era | Domain |
|---|---|---|---|---|
| パン (pan) | Portuguese | pão | 16th c. (Nanban trade) | food |
| カステラ (kasutera) | Portuguese | Castela (pão de Castela) | 16th c. | food |
| タバコ (tabako) | Portuguese | tabaco | 16th c. | luxury goods |
| ガラス (garasu) | Dutch | glas | 17th c. (Edo Rangaku) | material |
| コーヒー (kōhī) | Dutch | koffie | 18th c. | food |
| ビール (bīru) | Dutch | bier | 19th c. | beverage |
| アルバイト (arubaito) | German | Arbeit | late 19th c. (Meiji) | work |
| カルテ (karute) | German | Karte | Meiji | medical |
| ズボン (zubon) | French | jupon | Meiji | clothing |
| アンケート (ankēto) | French | enquête | Meiji | survey |
| コンピュータ (konpyūta) | English | computer | postwar | tech |
| バイト (baito) | German (clipped) | Arbeit | postwar | work (slang) |
| パソコン (pasokon) | English (compound clip) | personal computer | 1980s | tech |
| スマホ (sumaho) | English (clip) | smartphone | 2010s | tech |

See Loveday (1996), Language Contact in Japan. Portuguese and Dutch loans cluster in the 16th–18th c. through Nanban trade and Rangaku; German dominates Meiji-era medical/academic borrowing; English dominates postwar.

Gairaigo combines freely with kango or wago: アルバイト学生 (arubaito gakusei, "student worker"; gairaigo + kango), 歯ブラシ (ha-burashi, "toothbrush"; wago + gairaigo), マイ箸 (mai-hashi, "my chopsticks", BYO; gairaigo + wago).

Jukujikun: reading glued to meaning, not to kanji

The special case. Jukujikun (熟字訓) is a compound whose reading attaches to the meaning rather than to the individual kanji. The kanji are chosen for their semantic content; the reading, usually a pre-existing native word, is slapped onto the whole compound. Decompose the compound and the reading does not decompose with it.

| Compound | Reading | "Naive" sum of parts | Why irregular |
|---|---|---|---|
| 大人 | おとな (otona) | dai-jin / oo-hito | "Adult": wago noun glued onto Sino kanji |
| 今日 | きょう (kyō) / こんにち (konnichi) | kon-nichi (also valid) | Dual reading; kyō is the jukujikun |
| 明日 | あした (ashita) / あす (asu) | mei-jitsu | Three readings (ashita, asu, myōnichi), two of which are jukujikun |
| 紅葉 | もみじ (momiji) | kō-yō (also valid) | Native word for autumn leaves |
| 田舎 | いなか (inaka) | den-sha | "Countryside" |
| 海老 | えび (ebi) | kai-rō | "Shrimp": semantic, not phonetic |
| 太刀 | たち (tachi) | tai-tō | "Long sword" |
| 土産 | みやげ (miyage) | do-san | "Souvenir" |
| 七夕 | たなばた (tanabata) | shichi-seki | Festival name; the reading is the festival's native name |
| 五月雨 | さみだれ (samidare) | go-getsu-u | "Early summer rain" |
| 大和 | やまと (yamato) | dai-wa | Ancient name for Japan |
| 寿司 | すし (sushi) | ju-shi | Ateji-style; reading is wago, kanji are decorative |
| 煙草 | たばこ (tabako) | en-sō | Loanword written with kanji for the meaning |
| 一寸 | ちょっと (chotto) | is-sun | "A little": adverbial, fully lexicalized |
| 流石 | さすが (sasuga) | ryū-seki | "As expected": fully opaque |
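One practical consequence is testable: a jukujikun reading cannot be split back into per-kanji readings. A minimal Python sketch of that test, assuming a small hand-made reading inventory (`READINGS`, hypothetical and far from complete), tries every way to align the reading with one listed reading per kanji; if no alignment exists, the word behaves like jukujikun. A miss can also mean the inventory simply lacks a reading, so treat the result as a heuristic rather than a classifier.

```python
# Hypothetical, deliberately tiny reading inventory (Hepburn romaji).
READINGS = {
    "大": ("dai", "tai", "ō"),
    "人": ("jin", "nin", "hito", "bito"),
    "口": ("kō", "ku", "kuchi", "guchi"),
    "土": ("do", "to", "tsuchi"),
    "産": ("san", "ubu"),
}

def alignments(kanji: str, reading: str) -> list[list[str]]:
    """Every way to split `reading` into one listed reading per kanji, in order."""
    if not kanji:
        return [[]] if reading == "" else []
    results = []
    for candidate in READINGS.get(kanji[0], ()):
        if reading.startswith(candidate):
            for rest in alignments(kanji[1:], reading[len(candidate):]):
                results.append([candidate] + rest)
    return results

def looks_like_jukujikun(kanji: str, reading: str) -> bool:
    """True when no per-kanji alignment exists, i.e. the reading does not decompose."""
    return not alignments(kanji, reading)

print(alignments("人口", "jinkō"))            # [['jin', 'kō']]
print(looks_like_jukujikun("大人", "daijin"))  # False
print(looks_like_jukujikun("大人", "otona"))   # True
print(looks_like_jukujikun("土産", "miyage"))  # True
```

The same pair 大人 aligns cleanly as dai-jin, the "naive" sum in the table, but not as otona, which is exactly the sense in which the reading belongs to the whole compound.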

The class is closed. The appendix to the official jōyō kanji table lists 116 jukujikun forms; non-jōyō forms push the total to about 500. The category is not productive: new jukujikun are essentially never coined. The existing ones are linguistic fossils that survived because the underlying native word is too entrenched to displace, even when the kanji choice is borrowed wholesale from Chinese.

For deeper treatment see the standalone post on jukujikun. On the historical mechanism, see Seeley (1991, A History of Writing in Japan, ch. 3) on the kun-doku tradition — reading Chinese text as if it were Japanese — which seeded jukujikun by pinning native words to whole Chinese phrases. Joyce (2002) treats the morphological status: jukujikun compounds are stored as single lexical items, not assembled compositionally.

Mixed compounds outside jūbako/yutō

Three- and four-kanji compounds, plus anything with a foreign component, escape the binary jūbako/yutō scheme. The app's word_formation_type = "mixed" covers these.

| Compound | Reading | Composition | Type |
|---|---|---|---|
| 歯ブラシ | はブラシ (ha-burashi) | wago (歯) + gairaigo (ブラシ) | wago + foreign |
| マイ箸 | マイはし (mai-hashi) | gairaigo (マイ) + wago (箸) | foreign + wago |
| アルバイト学生 | アルバイトがくせい (arubaito-gakusei) | gairaigo + kango | foreign + Sino |
| 生ビール | なまビール (nama-bīru) | wago (生) + gairaigo (ビール) | wago + foreign |
| 日米関係 | にちべいかんけい (nichibei-kankei) | kango (日米) + kango (関係) | kango + kango (4-char) |
| 大企業 | だいきぎょう (daikigyō) | kango + kango | productive kango compounding |
| 取扱説明書 | とりあつかいせつめいしょ (toriatsukai-setsumeisho) | wago (取扱, from the verb 取り扱う) + kango (説明書) | wago + kango |
| 朝シャン | あさシャン (asa-shan) | wago (朝) + gairaigo clip (シャン, from shampoo) | wago + foreign |

The four-kanji 四字熟語 (yojijukugo) are almost without exception pure kango: Confucian and Buddhist provenance, on+on+on+on, fixed idioms (一期一会, 切磋琢磨, 温故知新). Mixing in non-kango at the four-character level is rare and conspicuous when it happens.

Why classification predicts pronunciation

This is the payoff. Word formation type is the strongest single predictor of three phonological behaviors: rendaku, gemination, and reading-layer choice for ambiguous kanji.

Rendaku targets wago. In wago + wago compounds, the second morpheme voices ~90% of the time. In kango + kango, voicing is suppressed — the rate drops below 10%, and the two kango morphemes more often undergo gemination (sokuonbin) instead. See the rendaku post for the full mechanism, Lyman's Law, and the Right Branch Condition.

| Compound | Type | Phonological behavior | Result |
|---|---|---|---|
| 山 (yama) + 桜 (sakura) | wago + wago | Rendaku on second element | 山桜 yamazakura |
| 手 (te) + 紙 (kami) | wago + wago | Rendaku | 手紙 tegami |
| 青 (ao) + 空 (sora) | wago + wago | Rendaku | 青空 aozora |
| 学 (gaku) + 校 (kō) | kango + kango | No rendaku, gemination at boundary | 学校 gakkō |
| 一 (ichi) + 本 (hon) | kango + kango | Gemination (h → p) | 一本 ippon |
| 結 (ketsu) + 果 (ka) | kango + kango | Gemination | 結果 kekka |
| 場 (ba) + 所 (sho) | kun + on (yutō) | No rendaku | 場所 basho |
| 本 (hon) + 屋 (ya) | on + kun (jūbako) | Variable; the kun element often voices (cf. 台所 daidokoro), but y- has no voiced counterpart | 本屋 hon'ya |

The cross-type compounds (jūbako, yutō) sit in the middle — rendaku rates around 20–40%, depending on whether the kun-read element is in the second slot (more likely to voice) or the first slot (cannot voice, since rendaku targets the right element).
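The core of the rendaku behavior can be written as a short rule. This Python sketch assumes Hepburn romaji, voices the second element only when it is a wago morpheme, maps k→g, s→z, sh→j, t→d, ch→j, ts→z, h→b, f→b, and applies Lyman's Law (no voicing if the second element already contains g, z, j, d, or b). The Right Branch Condition, lexical exceptions, and the variable cross-type rates are not modeled; `rendaku` and `compound` are hypothetical helpers.

```python
VOICED_OBSTRUENT_LETTERS = set("gzjdb")

# Digraph onsets come first so that "sh" is matched before "s".
VOICING = [("sh", "j"), ("ch", "j"), ("ts", "z"),
           ("k", "g"), ("s", "z"), ("t", "d"), ("h", "b"), ("f", "b")]

def rendaku(morpheme: str) -> str:
    """Voice the initial obstruent of `morpheme`, unless Lyman's Law blocks it."""
    if VOICED_OBSTRUENT_LETTERS & set(morpheme):
        return morpheme  # Lyman's Law: already contains a voiced obstruent
    for plain, voiced in VOICING:
        if morpheme.startswith(plain):
            return voiced + morpheme[len(plain):]
    return morpheme  # vowel-, nasal-, or glide-initial: nothing to voice

def compound(first: str, second: str, second_is_wago: bool = True) -> str:
    """Join two morphemes; only a wago second element is a rendaku candidate."""
    return first + (rendaku(second) if second_is_wago else second)

print(compound("yama", "sakura"))  # yamazakura
print(compound("te", "kami"))      # tegami
print(compound("kami", "kaze"))    # kamikaze (Lyman's Law blocks voicing)
```

Passing `second_is_wago=False` models the on-read second slot of a yutō compound such as 場所, which comes out unvoiced as basho.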

Stratum predicts which on'yomi appears. A kanji like 明 has Go-on ミョウ (Buddhist register), Kan-on メイ (secular standard), and Tō-on ミン (later, trade-era borrowing). Knowing the compound is an old Buddhist coinage (光明 kōmyō) vs. a Meiji-era secular term (説明 setsumei) tells you which on'yomi to predict. See on'yomi vs kun'yomi for the four strata.
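Put as a data structure, the prediction is a lookup keyed on both the kanji and the stratum of the compound, not on the kanji alone. A minimal Python sketch: the 明 readings are the ones listed in this paragraph, while the per-compound stratum tags (`STRATUM_OF`) and the `predict_on` helper are hypothetical illustration data.

```python
# On'yomi of 明 by borrowing stratum, as listed in the paragraph.
ON_BY_STRATUM = {"明": {"go-on": "myō", "kan-on": "mei", "tō-on": "min"}}

# Hypothetical stratum tags for two compounds containing 明.
STRATUM_OF = {"光明": "go-on", "説明": "kan-on"}

def predict_on(kanji: str, compound: str) -> str:
    """Pick the on'yomi of `kanji` that matches the stratum of `compound`."""
    return ON_BY_STRATUM[kanji][STRATUM_OF[compound]]

print(predict_on("明", "光明"))  # myō (as in kōmyō)
print(predict_on("明", "説明"))  # mei (as in setsumei)
```

An untagged compound raises `KeyError` here; a fuller table would need a fallback, perhaps the Kan-on reading, since the paragraph calls it the secular standard.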

The app's word_formation_type field

This dictionary tags every entry in /words with word_formation_type — one of kango, wago, jubako, yuto, jukujikun, or mixed. Most learner dictionaries don't expose this. Standard JMdict-derived tools provide gloss and reading but not formation type, leaving learners to reverse-engineer the category from the reading itself.

The classification is computed at import time from the per-kanji reading_type field on the WordKanji join: every kanji read with onyomi → kango; every kanji read with kunyomi → wago; mixed pattern → jubako or yuto by position; any jukujikun reading_type → jukujikun. See app/models/word.rb and db/seeds/kanji/words.json for the data; the rake task `kanji:importwords` populates the field.
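The rule is small enough to restate as a function. This is a Python re-sketch for illustration only, not the code in app/models/word.rb, which may differ: it takes the per-segment reading types in written order, using the values onyomi, kunyomi, and jukujikun, plus a hypothetical foreign marker for katakana segments so that compounds like 歯ブラシ land in mixed. It assumes the word contains at least one kanji.

```python
def formation_type(reading_types: list[str]) -> str:
    """Derive a word_formation_type from per-segment reading types, in written order."""
    if not reading_types:
        raise ValueError("expected at least one segment")
    if "jukujikun" in reading_types:
        return "jukujikun"
    if "foreign" in reading_types:  # hypothetical marker for katakana segments
        return "mixed"
    kinds = set(reading_types)
    if kinds == {"onyomi"}:
        return "kango"
    if kinds == {"kunyomi"}:
        return "wago"
    if reading_types == ["onyomi", "kunyomi"]:
        return "jubako"
    if reading_types == ["kunyomi", "onyomi"]:
        return "yuto"
    return "mixed"  # longer on/kun mixtures escape the binary jūbako/yutō scheme

print(formation_type(["onyomi", "kunyomi"]))  # jubako (重箱)
print(formation_type(["kunyomi", "onyomi"]))  # yuto (湯桶)
print(formation_type(["kunyomi", "kunyomi", "onyomi", "onyomi", "onyomi"]))  # mixed (取扱説明書)
```

One design choice worth noting: the jukujikun check runs first, because a jukujikun word would otherwise be misread as whatever layer its individual kanji happen to carry. Under this sketch an all-on'yomi word of any length, such as 大企業, returns kango.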

A few representative type queries:

Knowing the type collapses the space of plausible readings from "one of 4–8 possibilities per kanji" to "one of 1–2 possibilities per kanji." That is what makes the classification load-bearing rather than ornamental.

Further reading


External:

  • Frellesvig, B. (2010). A History of the Japanese Language. Cambridge University Press. — Ch. 9 covers the Sino-Japanese borrowing layers and the wago/kango stratum split.
  • Vance, T.J. (2008). The Sounds of Japanese. Cambridge University Press. — Phonological signatures of wago vs. kango; rendaku environment.
  • Loveday, L.J. (1996). Language Contact in Japan: A Sociolinguistic History. Oxford University Press. — Definitive treatment of gairaigo by source language and era.
  • Seeley, C. (1991). A History of Writing in Japan. University of Hawai'i Press. — The kun-doku tradition that produced jukujikun.
  • Liu, L.H. (1995). Translingual Practice: Literature, National Culture, and Translated Modernity — China, 1900–1937. Stanford University Press. — Catalogues the Meiji kango re-exports into modern Chinese.
  • Tamamura, F. (玉村文郎) (2010). 『日本語の語彙・意味』 [Japanese Vocabulary and Meaning]. Meiji Shoin. — NINJAL corpus statistics on wago/kango/gairaigo proportions.
  • Joyce, T. (2002). "The Japanese Mental Lexicon: The Lexical Retrieval and Representation of Two-Kanji Compound Words." Brain and Language 81. — Experimental evidence that jukujikun are stored as single lexical items.
  • Howland, D.R. (2002). Translating the West: Language and Political Reason in Nineteenth-Century Japan. University of Hawai'i Press. — How Meiji intellectuals coined the kango that became modern East Asian political vocabulary.
