Phono-Semantic Compounds: The 80% Rule Kanji Guides Skip

hbaristr 15분 분량

Most kanji are not pictures. They are spellings.

The single most useful fact about kanji: roughly four in five are phono-semantic compounds — 形声 (keisei moji), one component classifying meaning, another component recording (or once recording) the sound. The Shuowen Jiezi, Xu Shen's Han-dynasty catalog from around 100 CE, comes in at ~82%. Modern Mandarin counts land at 80–90% (DeFrancis 1989, Boltz 1994). For the joyo set, 61% are explicitly classified keisei, and that figure understates it — most of the so-called 会意 (compound-ideograph) entries are reanalyzed as phonetic compounds under modern paleography.

So why does almost every English-language kanji curriculum teach the character as an opaque atom — a shape to be glued to a keyword via an invented story? The phonetic gets silently dropped. Heisig acknowledges it in a footnote and then walks away. The apps clone Heisig. Two thousand mnemonics later, the student still cannot read 請 on first sight.

The phonetic is not a vestigial historical artifact. It is a live retrieval cue that collapses thousands of on'yomi into roughly a thousand phonetic series. This post lays out the structure of those series, the recoverability rate, why the readings drifted in the first place, and how to actually use them.

The corpus claim, with numbers

Phono-semantic compounding is not "common." It is the dominant generative mechanism for the entire CJK character space. Four independent counts agree on the magnitude.

Corpus Sample size % phono-semantic Source
Shuowen Jiezi (Han, 100 CE) 9,353 chars ~82% Boltz (1994), Tab. 4
Modern Mandarin general ~7,000 chars ~81% DeFrancis (1989), p. 84
Modern Mandarin, all CJK Unicode ~50,000+ chars ~90% Williams (1965); Boltz (1994)
Joyo kanji (explicit class) 2,136 chars ~61% Kanji Portraits (2021)
Joyo kanji (incl. reanalyzed 会意) 2,136 chars ~80% Boltz (1994) reclassification applied to joyo

The joyo gap is curricular, not structural. Japanese elementary school front-loads pictographs — , , , — and compound ideographs because six-year-olds can draw them. By the time you reach grade 6 and 中学校以降, phono-semantic compounds dominate as heavily as in the wider character space. Yamashita & Maru (2000) and Joyce et al. (2014) both report that orthographic-neighborhood density in Japanese — the count of kanji sharing a phonetic — peaks for late-elementary and secondary kanji. Which is exactly the slice adult learners spend years grinding through.

The first-order conclusion is simple. If you are studying joyo from grade 3 onward without exploiting the phonetic component, you are throwing away the single largest source of structural redundancy in the writing system.

Worked example: the 青 series

The phonetic (sei / shō, "blue-green") is the canonical introductory case. The right-hand or upper slot is fixed at 青; the left-hand semantic radical varies. The phonetic predicts セイ in kan-on (the Tang stratum) and ショウ in go-on (the older Wu stratum). Read the table.

Kanji Semantic radical Radical sense On'yomi Gloss
(whole char.) セイ / ショウ blue-green
氵 (water) water-clean セイ / ショウ clean, pure
日 (sun) sky condition セイ clear weather
米 (rice) refined grain セイ / ショウ refined, essence
言 (speech) verbal request セイ / シン request
忄 (heart) felt state ジョウ / セイ emotion
争 (struggle, contracted) non-struggle セイ / ジョウ quiet, still
立 (stand) standing-firm セイ / ジョウ peaceful

Eight characters. One phonetic. One on'yomi family — セイ with a regular ジョウ go-on doublet on three of them. The semantic radical does the disambiguating: water + 青 = clean; sun + 青 = clear weather; heart + 青 = emotion. This is the architecture DeFrancis (1989, ch. 5) called the misnamed ideograph: the character is not an idea-picture, it is a spelling with a determinative bolted on. The same point Peter Boodberg had been making in Berkeley since the 1930s.

The 請 → シン and 情 → ジョウ outliers are not noise. They are the residue of borrowing the same Old Chinese phonetic at two different centuries (kan-on vs. go-on), which is the next section.

At least three more series

工 (kō) — high reliability with one famous defector

Kanji Semantic radical On'yomi Gloss
(whole) コウ / ク craft
氵 (water) コウ (large) river
糸 (silk) コウ / ク crimson
虫 (insect/serpent) コウ rainbow
力 (strength) コウ / ク merit, achievement
攵 (strike) コウ attack
穴 (hole) クウ sky, empty

Six of seven read コウ in kan-on. is the famous defector — クウ — because 空 underwent an extra vowel change in transit (Middle Chinese kʰuwŋ → クウ rather than コウ). And the defector is more memorable than the regulars, precisely because it breaks the pattern. File 空 as "the 工-series exception" and you have effectively learned eight characters from one phonetic.

寺 (ji) — series drift across layers

Kanji Semantic radical On'yomi Gloss
(whole) temple
日 (sun) time
言 (speech) poem
亻 (person) ジ / シ attendant, samurai
扌 (hand) hold
牛 (ox) トク special
竹 (bamboo) トウ rank, equal

The more interesting kind of series — the kind that punishes naive rule-following and rewards historical understanding. Three of seven (詩, 特, 等) deviate from ジ. The reason: 寺 in Old Chinese is reconstructed by Baxter & Sagart (2014) as something like *s-ɢəʔ, with an *s- prefix that conditioned different reflexes in different daughter dialects. By the time the Tang court was lecturing Japanese envoys on kan-on, that prefix had bleached out unevenly — 寺 itself stayed voiced (ジ), 詩 went to シ, 特 and 等 took an entirely different vocalic path (トク, トウ). The series did not fail. It fissioned. A learner who knows this still holds all seven characters together as a family of recognizable etymological cousins — a far stronger memory than seven unrelated stories about samurai and bamboo.

主 (shu) — clean kan-on series with one wago surprise

Kanji Semantic radical On'yomi Gloss
(whole) シュ / ス master
氵 (water) チュウ pour, note
亻 (person) ジュウ dwell
木 (tree) チュウ pillar
馬 (horse) チュウ halt, station
彳 (step) オウ go toward

Five of six read チュウ / ジュウ. 往 is the outlier — and the history is that the modern shape inherited a different OC root, with 主 functioning more as a graphical stabilizer than a true phonetic (see Schuessler 2007 for the *waŋʔ reconstruction). For a learner the heuristic is dead simple: see 主 on the right, try チュウ first, fall back to ジュウ. Right roughly 80% of the time across joyo. Above 90% within the kan-on layer.

Empirical recoverability

What rate of return are we actually getting? The most-cited modern Mandarin numbers come from Zhou Youguang (1978) and Hsiao & Shillcock (2006): perfect prediction in 26–39% of phono-semantic characters, "approximate" prediction (same initial or same rhyme) in 58–66%, no useful signal in roughly the remaining third. These figures exclude the ~18% of kanji that are pictographs or compound ideographs.

Sino-Japanese on'yomi are more complicated. Recoverability is higher within a single borrowing layer (go-on, kan-on, or tō-on) but lower across layers, because Japan borrowed each phonetic at a different century and each borrowing froze a different snapshot of Chinese phonology. The four-layer structure is treated in detail in On'yomi vs Kun'yomi: A Practical Guide; for now, the relevant point is that asking "does the phonetic predict the on'yomi" is the wrong question. The right question is which on'yomi.

A worked comparison for the 青 series:

Kanji Kan-on Go-on Both layers consistent?
セイ ショウ yes
セイ ショウ yes
セイ (rare) yes
セイ ショウ yes
セイ シン no (go-on shifted)
セイ ジョウ yes (regular doublet)
セイ ジョウ yes (regular doublet)

Within kan-on alone, the 青 series is 100% consistent across the seven characters above — every one reads セイ. Within go-on it is 6/7 consistent (ショウ or its regular ジョウ voicing partner), with 請 → シン the lone defector. Dramatically higher than the cross-layer ~67% Hamilton (2013) reports for joyo as a whole. That gap is the whole point: layer-aware analysis recovers signal that cross-layer analysis loses.

The pedagogically actionable form: when you meet a new kanji and you recognize its phonetic, ask first what is the kan-on of the phonetic? That single question — the kan-on, not the on'yomi in general — has the highest hit rate of any heuristic available to a non-native learner.

Why the readings drifted

Old Chinese (c. 1000 BCE) → Late Han Chinese (~200 CE, the Wu-derived go-on stratum) → Middle Chinese (~600 CE, the Tang-derived kan-on stratum) → Early Modern Mandarin (the tō-on stratum) → modern Mandarin and modern on'yomi. Five centuries of sound change separate go-on from kan-on alone. The single largest source of cross-layer divergence in phonetic series is consonantal: Old Chinese had voicing distinctions and consonant clusters that Middle Chinese collapsed, and Japanese phonotactics flattened them further. (The Swedish sinologist Bernhard Karlgren is the one who first reconstructed Middle Chinese systematically — every modern reconstruction is a descendant of his 1915–1926 work.)

The 各 (kaku) series is the cleanest worked example, because Baxter & Sagart (2014) reconstruct a stable Old Chinese root.

Kanji Old Chinese (B&S) Middle Chinese Kan-on Modern gloss
*kˤak kak カク each
*kʰˤrak khaek カク / キャク guest
*kˤrak kaek カク rank, framework
*kə.rˤak lak ラク fall
*kə.rˤak lak ラク bind, network
*[r]ak ljak リャク abbreviate

Notice the OC initial k- in 各, 客, 格 — preserved as カ- in kan-on. Then 落 and 絡, which had the cluster *kə.rˤak: the velar prefix dropped, the *r became Middle Chinese *l, and Japanese borrowed *l as ラ. Then 略, which had no velar at all (just *[r]ak), gives リャク. The 各 series is entirely coherent at the Old Chinese level — every member contains *-rak or *-ak — but it looks fragmented at the Japanese surface (カク vs. ラク vs. リャク) because Japanese borrowed the *output of a thousand years of Chinese consonant simplification, not the input.

For the historically curious: William Baxter and Laurent Sagart's Old Chinese: A New Reconstruction (2014) is the current reference. The Wiktionary Appendix on Old Chinese phonology is a serviceable open summary, and individual character entries on Wiktionary now routinely list B&S reconstructions in the etymology section. Schuessler's ABC Etymological Dictionary of Old Chinese (2007) is a usable second reference. For the phonological-squeeze rules governing the OC-to-on'yomi transition specifically, see the table in On'yomi vs Kun'yomi.

Page from a Song dynasty edition of the Shuowen Jiezi showing entries under the SPEECH (言) radical with their small-seal forms and glosses
A page from a Song dynasty edition of the Shuowen Jiezi (c. 100 CE), showing characters grouped under 言 (speech) with their small-seal forms and Xu Shen's glosses. The Shuowen is the first systematic attempt to classify the entire character space, and its tally — roughly 82% phono-semantic — is also the earliest surviving documentation of the 80% rule. Source: Wikimedia Commons.

Why Heisig dropped phonetics

James Heisig is explicit about the design choice. The introduction to Remembering the Kanji, Vol. 1 (1977; current 6th ed. 2011) defers all reading instruction to Volume II so the learner can first build "a permanent mnemonic foothold" on character form and English keyword. He grants the existence of phonetic series and reading regularity, then deliberately excludes them from Volume I on the theory that learning meaning, form, and reading simultaneously is too expensive.

The trade is real, and it is asymmetric. RTK Volume I produces faster initial production — the student can write 2,200 keyword-cued kanji within a few months — but pays for it in slower long-term reading. The Volume-I graduate has 2,200 mnemonics for shapes and zero mental model of how on'yomi cluster. That mental model is precisely what lets a Chinese-trained learner sight-read unfamiliar Japanese newspaper text at 80%+ accuracy. Encounter cold and the Heisig graduate looks it up; the phonetics-aware learner sees 青 on the right, hypothesizes セイ, and is correct. Multiplied across the long tail of low-frequency joyo kanji adult learners grind through for years, the two curricula produce strikingly different reading speeds at the five-year mark.

A more thorough critique appears in the review of the reference and method literature. The short version: RTK Volume I is a defensible design considered alone but a strange shape for a complete curriculum, and the honest move for an adult learner is to treat RTK as a writing primer and run a phonetics-first reading curriculum in parallel — not after.

How to actually use this in study

Three rules, in order of importance.

  1. When you meet a new kanji, find the phonetic first. The phonetic is most often the right-hand element (tsukuri) in a left-right composition, less often the bottom (ashi), least often the upper enclosure (tare). The semantic radical is most often the left-hand element (hen). This positional regularity is itself a structural feature — see The Architecture of Kanji for the seven-position taxonomy and the IDC operators.
  2. If you have seen another kanji with the same phonetic, try its kan-on first. Within a series the kan-on is the most consistent layer: 6/7 to 7/7 hit rates are typical, vs. 4/7 to 6/7 for cross-layer guesses. If kan-on fails, fall back to the regular voicing partner (セイ → ジョウ, ホウ → ボウ) before giving up.
  3. Note exceptions explicitly — they are memorable because they break the rule. clinches the 工 series precisely because it does not read コウ. does the same for the 寺 series. The deviation is not a failure of the heuristic; it is a free anchor.

The way to operationalize this: keep a per-phonetic study list, not a per-grade study list. Tomoko Toyoda's Kanji Networks (1995, Journal of the Association of Teachers of Japanese) recommends exactly this, and it is the spine of the Kanji Atlas — every kanji shown alongside its component graph, phonetic included when one is identified. Browsing the grade 4 atlas surfaces under-exploited phonetic families faster than working linearly through a textbook, because the atlas shows the families instead of burying them in alphabetical order.

The deeper psychological claim — empirically testable — is that the phonetic series is not just a study shortcut but the actual cognitive unit proficient L2 readers store. Joyce, Masuda & Ogawa (2014) found orthographic-neighborhood effects (faster recognition for kanji with many phonetic siblings) as strong in advanced L2 readers as in native readers, which means the phonetic-family representation gets built by exposure even when the explicit curriculum ignores it. Your choice as a learner is whether to build it deliberately or accidentally. Deliberate is faster.

Related reading

On the layered structure of on'yomi (kan-on vs. go-on vs. tō-on, and why a single phonetic can read three ways depending on borrowing era), see Understanding On'yomi vs Kun'yomi. On the formal grammar of how phonetics and semantic radicals combine — the seven positional slots, the twelve Unicode IDC operators, the ten most productive phonetic components — see The Architecture of Kanji. For the broader classificatory context, including Boltz's argument that even canonical 会意 characters are reanalyzed phono-semantic compounds, see The Six Types of Kanji.

For deeper reading: Boltz's The Origin and Early Development of the Chinese Writing System (1994) is the canonical scholarly treatment of phono-semantic compounding in early Chinese, and it is where the radical claim — no genuine compound ideographs exist — was first made in modern Sinology. Baxter & Sagart's Old Chinese: A New Reconstruction (2014) is the reference for phonetic reconstruction. If you want one book that lets you read Wiktionary's etymology sections meaningfully, it is this one.

References

  • Baxter, W.H. & Sagart, L. (2014). Old Chinese: A New Reconstruction. Oxford University Press. WorldCat.
  • Boltz, W.G. (1994). The Origin and Early Development of the Chinese Writing System. American Oriental Society. WorldCat.
  • DeFrancis, J. (1989). Visible Speech: The Diverse Oneness of Writing Systems. University of Hawai'i Press. WorldCat.
  • Hamilton, N. (2013). "Identifying Useful Phonetic Components of Kanji." academia.edu.
  • Heisig, J.W. (2011). Remembering the Kanji, Vol. 1: A Complete Course on How Not to Forget the Meaning and Writing of Japanese Characters, 6th ed. University of Hawai'i Press.
  • Hsiao, J.H. & Shillcock, R. (2006). "Analyses of a Chinese phonetic-component database: Implications for orthographic processing." Journal of Psycholinguistic Research, 35(5), 405–426.
  • Joyce, T., Masuda, H. & Ogawa, T. (2014). "Jōyō kanji as core building blocks of the Japanese writing system." Written Language and Literacy, 17(2), 173–194.
  • Schuessler, A. (2007). ABC Etymological Dictionary of Old Chinese. University of Hawai'i Press.
  • Toyoda, T. (1995). "Kanji networks: An exploration of compounding errors." Journal of the Association of Teachers of Japanese, 29(1), 71–87. JSTOR.
  • Williams, C.A.S. (1965). Outlines of Chinese Symbolism and Art Motives. Tuttle.
  • Yamashita, H. & Maru, Y. (2000). "Compounds and word boundaries in Japanese." Working paper, Tohoku University.
  • Zhou, Y.G. (1978). "现代汉字中声旁的表音功能问题" [On the phonetic function of phonetic components in modern Chinese characters]. Zhongguo Yuwen, 1978(3), 172–177.

피드백 보내기

선택사항 — 답변이 필요한 경우에만.