Pitch Accent: The Prosody Western Learners Are Told to Ignore

The feature hiding in plain sight

Tokyo Japanese is a pitch-accent language. Every word carries either a single binary drop — a fall in pitch from one mora to the next — or no drop at all. The drop is lexical. It distinguishes 橋 (はし, bridge) from 箸 (はし, chopsticks) the way English stress separates récord from recórd. Native ears hear it instantly.

The drop is also invisible to the kanji. You cannot read it off the page. That single structural fact — not laziness, not malice — is why pitch accent ends up in the "advanced/optional" appendix of most learner materials. The kanji is the organizing unit of every syllabus, and the kanji carries no prosody. So prosody gets deferred.

The system itself is small. Four patterns. Mechanical rules. Measurable.

The four Tokyo patterns

Standard Tokyo has exactly four accent shapes. Every word is one of them. Diagram convention: H = high mora, L = low mora, the drop is the H→L transition, and the descending vertical line sits between the last H and the first L (は↓し).

Pattern	漢字	Definition	Example	Kana	Diagram (with particle)
Heiban (flat)	平板	No drop. L on mora 1, H from mora 2 onward, particle stays H.	桜	さくら + が	`L H H H — H` (sa-ku-ra-ga)
Atamadaka (head-high)	頭高	Drop after mora 1. H on mora 1, L from mora 2 onward.	雨	あめ + が	`H L — L` (a-me-ga)
Nakadaka (middle-high)	中高	Drop after some internal mora 2..N-1.	心	こころ + が	`L H L — L` (ko-ko-ro-ga)
Odaka (tail-high)	尾高	Drop after the final mora. Distinguishable from heiban only when a particle follows.	山	やま + が	`L H — L` (ya-ma-ga)

The key observation: heiban and odaka look identical on the bare noun — both are L H for a two-mora word. The contrast surfaces only when a particle attaches. Sakura ga keeps rising; yama ga falls on the ga. Pitch accent is a property of the phrase, not the lexeme in isolation. Dictionaries encode the drop position, not the surface contour, for exactly this reason.

A subtler property of the Tokyo system, formalized by McCawley (1968): the first two mora of any word are obligatorily distinct in pitch. If mora 1 is L, mora 2 is H. If mora 1 is H (atamadaka), mora 2 is L. No H H —. No L L —. This is a phonotactic constraint, not a tendency — and it is the proof that the four patterns above exhaust the system. There is no fifth shape.

Minimal pairs

English uses pitch for prosody — questions, emphasis, sarcasm — but never for lexical contrast. Japanese does both. The same segmental string with a different drop position is a different word.

Surface	Kana	Pattern	Drop	Meaning	English analog
橋	はし	odaka	after 2	bridge	—
箸	はし	atamadaka	after 1	chopsticks	—
端	はし	heiban	none	edge, tip	—
雨	あめ	atamadaka	after 1	rain	—
飴	あめ	heiban	none	candy	—
神	かみ	heiban	none	god, deity	—
紙	かみ	atamadaka (some dialects: heiban)	after 1	paper	—
鼻	はな	heiban	none	nose	—
花	はな	odaka	after 2	flower	—

The right column is empty by design. English has nothing equivalent. Présent vs presént contrasts via duration, vowel quality, and pitch acting together, restricted to a handful of disyllabic noun/verb pairs. Japanese minimal pairs are pure pitch, span the full lexicon, and cross part-of-speech freely. はし alone has three.

The はし trio is the canonical example because all three are high-frequency native nouns still used daily. The 神/紙 pair is more interesting historically — both are old native words with no Sino-Japanese alternant, and the accent contrast is attested as far back as the 12th-century Ruijumyōgishō (類聚名義抄), one of the earliest tone-marked Japanese dictionaries.

The mora, not the syllable

Pitch in Japanese is counted in mora (拍, haku), not syllables. This one fact dissolves a great deal of confusion. A mora is roughly "one beat" in traditional Japanese verse, and the rules are mechanical: every CV unit is one mora, the moraic nasal ん is one mora, the geminate っ is one mora, and the second half of a long vowel is one mora.

Word	Surface	Syllables (English ear)	Mora (Japanese ear)	Mora breakdown
東京	とうきょう	2 (Tō-kyō)	4	と・う・きょ・う
学校	がっこう	2 (gak-kō)	4	が・っ・こ・う
新聞	しんぶん	2 (shin-bun)	4	し・ん・ぶ・ん
病院	びょういん	3 (byō-i-n)	4	びょ・う・い・ん
切手	きって	2 (kit-te)	3	き・っ・て

An English ear hears Tōkyō as two heavy syllables. A Japanese ear counts four equal beats. The pitch contour for 東京 is heiban (L H H H), drop never coming. Collapse it to two syllables in your head and you can't place the accent — there's no "second mora" in your model. This is why pitch instruction on Roman-letter approximations (Tokyo) underperforms instruction on kana: the romanization smuggles in syllabic counting.

Vance (2008) Chapter 6 is the standard English-language treatment. Labrune (2012) Chapter 5 derives mora-counting from a more general phonological theory and shows that even allegedly anomalous units — the diphthong-like えい, the palatalized きょ — decompose cleanly under a moraic analysis. The mora is not a folk category. It is the unit the accent grammar operates on.

This app's mora splitter lives in lib/kanjidict/mora.rb and is what pitch_accent indexes into when the descending-line diagram is rendered. A pitch_accent of 2 for とうきょう would mean "drop after the second mora" — と・う↓きょ・う. The actual stored value is 0 (heiban). Tokyo is, fittingly, flat.

Why kanji can't predict pitch

Pitch is a property of the word, not the characters that spell it. The same kanji contributes to different accent patterns depending on the word.

Word	Kana	Pattern	Drop after	Source kanji
雨	あめ	atamadaka	1	雨
大雨	おおあめ	nakadaka	3	大 + 雨
雨水	あまみず	nakadaka	1	雨 + 水
雨具	あまぐ	nakadaka	2	雨 + 具
梅雨	つゆ	heiban	none	梅 + 雨 (jukujikun)

Five compounds, the kanji 雨 in every one, and the drop sits in five different places — or nowhere. The kanji is graphically constant; the pronunciation, the reading-type, and the accent are all functions of the whole word. There is no rule of the form "雨 contributes pitch X." Compounds follow rules, but the rules apply at the compound level.

McCawley (1968) proved this formally: the lexical accent of a Japanese compound is computed from the accent classes of its components plus a small set of compound rules, but it is not a function of any spelling property. The orthography is downstream of the phonology. The reverse mapping doesn't exist.

Compound accent rules

When two morphemes combine, predictability emerges. Kubozono (2008) "Japanese Accent" in The Handbook of Japanese Linguistics gives the modern formulation. First-order generalization: in N+N compounds, the second element determines the compound's accent class.

Second element class	Compound rule	Worked example
Short (1–2 mora), accented	Drop falls on the second element's accented mora	山 (yama, odaka) + 川 (kawa, atamadaka) → やまかわ (drop after 3)
Short (1–2 mora), heiban	Drop falls on the last mora of the first element	山 (yama) + 桜 (sakura, heiban) → やまざくら (drop after 3)
Long (3+ mora), accented	Compound inherits second element's accent	東 + 京都 (kyōto, atamadaka) → とうきょうと, drop after 4
Long (3+ mora), heiban	Compound becomes heiban	西 + 日本 (nihon) → にしにほん (heiban)
Deaccenting suffix	Compound goes flat regardless of input	魚 + 屋 → さかなや (heiban, despite 魚 being heiban + 屋 being odaka)

山桜 is the cleanest illustration. A heiban second element (sakura, L H H H) attached to an odaka first element (yama, L H) yields yamazakura with the drop pulled back onto the last mora of yama. The compound is L H H L L L. The rightward search for an accent failed, so the system installed one at the morpheme boundary. This is the N+N accent rule and it covers the bulk of native compounds.

Exceptions are real and not negligible. Deaccenting suffixes (-ya 屋, -san さん, certain place-name endings) wipe accent. Loanword compounds and recent neologisms often default to atamadaka regardless of inputs. A small inherited class of "preaccenting" suffixes shifts the drop to the mora immediately before the suffix. Kubozono (2008) catalogs roughly 15 productive classes. Rule-governed, but not closed-form.

Tokyo vs Kansai: the inversion

The same words have different drop positions in Kansai (Osaka/Kyoto) Japanese. The two systems are historically related but cannot be derived from one another by a simple shift. Treat them as independent prosodic grammars. Kubozono (2012) gives the typology: Tokyo is a single-accent system (one drop or none); Kansai is a two-tone system (each word carries both an initial register — high or low — and a drop position).

A rough mapping for the words in our table:

Word	Tokyo	Kansai (Kyoto)
桜 (sakura)	heiban (`L H H`)	high-initial atamadaka (`H L L`)
雨 (ame)	atamadaka (`H L`)	low-initial heiban-like (`L H`)
山 (yama)	odaka (`L H`, drops on particle)	high-initial atamadaka (`H L`)
心 (kokoro)	nakadaka after 2 (`L H L`)	high-initial atamadaka (`H L L`)
橋 (hashi, bridge)	odaka (`L H`, drops on particle)	high-initial heiban (`H H`, no drop)
箸 (hashi, chopsticks)	atamadaka (`H L`)	high-initial atamadaka (`H L`)

A first-pass heuristic that comes close but misleads on details: many Tokyo heiban words are atamadaka in Kansai, and vice versa. The "inversion" mnemonic is useful but factually loose. The real correspondence runs through the historical kei class system preserved in the Ruijumyōgishō tone marks. The two modern systems are reflexes of a common medieval ancestor — Kansai conserves the initial-register distinction, Tokyo neutralized it but refined the drop position. OJAD, the University of Tokyo's online accent dictionary, lets you toggle between Tokyo and Kansai readings for tens of thousands of entries.

What this app does

The dictionary stores pitch_accent per word as an integer drop position: 0 = heiban (no drop), 1 = atamadaka (drop after mora 1), n = drop after mora n. The convention follows the NHK Hassei Akusento Shin Jiten (2016) and the kanjium dataset.

sakura  → pitch_accent = 0   →  さくら     LHHH —      (heiban)
ame     → pitch_accent = 1   →  あ↓め     HL —        (atamadaka)
kokoro  → pitch_accent = 2   →  こ こ↓ろ  LHL —       (nakadaka)
yama    → pitch_accent = 2   →  や ま↓    LH — L      (odaka, drop on particle)
hashi   → pitch_accent = 1   →  は↓し     HL —        (chopsticks)
hashi   → pitch_accent = 2   →  は し↓    LH — L      (bridge, odaka)

The descending-line diagram on each word page is rendered server-side by PitchAccentHelper#pitch_diagram from lib/kanjidict/mora.rb (mora splitter) and the integer drop. No JavaScript, no audio dependency. ~9,700 words have accent data, sourced from the public-domain kanjium project, which reconciles NHK 2016, Daijirin, and the Sanseidō Shin Meikai Akusento Jiten. Where sources disagree — regional and generational drift is real — kanjium tracks the variants and the app surfaces the consensus form.

Try it on the minimal pair: /words/hashi for the bridge sense, then the same orthographic input under different lemmas. The mora splitter is what powers the diagram; pitch_accent is meaningless without an answer to "what counts as one beat," which is why the two pieces of code live next to each other.

Three honest reasons for the omission

The deferral is not arbitrary. Three real reasons drive it. Naming them clarifies what to do.

1. Dialectal variance. Teaching pitch is teaching one variety. NHK 2016 codifies an idealized educated Tokyo, but Kansai, Tōhoku, Kyūshū, and the so-called mukei (accentless) zones of central Honshū all differ. An author writing for a global audience can credibly argue: "there is no single accent to teach." Honest. Also a cop-out — the same argument applies to vowel length, and pitch is no more variable than other features that are taught.

2. Notational hostility. IPA-style tone diagrams are visually noisy; the kana-with-arrow conventions (は↓し) require monospace rendering and a willingness to read accent marks. Atkinson (1975) on second-language acquisition observed that prosodic features get sacrificed when authors face a tradeoff between phonetic precision and beginner accessibility. Vowels, consonants, and tones all follow the same rule: precision loses. For Japanese, pitch lost.

3. Functional sufficiency. You can be understood without correct pitch. Native listeners use redundancy — lexical context, particle cues, situational pragmatics — to reconstruct the intended word from a flat-accent rendering. The hashi/hashi/hashi trio is famous precisely because it is unusual; most minimal pairs are disambiguated by context within the first half-second of conversation. The cost in comprehensibility is roughly zero. The cost is paid elsewhere.

The cost: native listeners notice every misplaced drop. Idemaru & Guion (2008) ran perception experiments and found that even after lexical disambiguation, an L2 speaker with systematically wrong pitch is rated as less native, less competent, and less trustworthy on bizarrely wide social dimensions. The drop position is a sociolinguistic marker. Get it wrong consistently and you are a foreigner forever — not because you are unintelligible, but because you sound like one.

How to learn it

Three habits, in order of marginal value per hour.

Look up the pattern for every word you learn. Add the drop position to your flashcard along with the kana. One extra integer per card. The benefit accumulates linearly across your vocabulary. This dictionary surfaces it on every word page; for words not yet in the corpus, OJAD covers ~150,000 lemmas.
Shadow NHK news. NHK announcers are trained on the NHK accent dictionary. They are the only mass-market Japanese audio that reliably uses standard Tokyo pitch — anime, dorama, J-pop, and most YouTube content do not. Twenty minutes of shadowing per day for a month retrains your perception faster than any reading material.
Use suzuki-kun. OJAD's suzuki-kun synthesizer takes arbitrary Japanese text, marks the predicted accent on each word, and synthesizes audio with correct prosody. The only tool that solves the compound-rule problem in real time. Paste in a sentence; it diagrams the accent of the whole phrase, not just the dictionary lemmas.

The canonical reference, if you read Japanese, is the NHK 日本語発音アクセント新辞典 (NHK Pronunciation and Accent Dictionary, 2016 revision). It supersedes the 1998 and 1985 editions by reflecting a generation of measured drift in standard Tokyo speech — notably, many traditionally atamadaka loanwords have flattened to heiban over the last 30 years. ~6,000 yen in print. The entries also appear in OJAD.

Look up the drop. Add it to the card. Shadow the news. Compound and repeat.