Rules for Simple Placement of Japanese Ruby (Draft)

The latest version of this document is now maintained at the W3C.

Foreword

Ruby is the name given to the small annotations in Japanese content that are rendered alongside base text, usually to provide a pronunciation guide, but sometimes to provide other information. (See the article “What is ruby” by the Internationalization Working Group for more information.)

The Difficulties of Ruby Processing

When performing ruby layout in Japanese, the following factors need to be considered in order to decide on the position:

How to handle the correspondence between the base characters and the ruby
What to do when the string of base characters is longer than the ruby string
What to do when the string of base characters is shorter than the ruby string
Protrusion of ruby from base characters
In movable type typesetting, ruby characters protruding from their base characters may not hang over neighboring kanji, but often were allowed to hang over neighboring kana. However, when the ruby is katakana, some publishers would set it so that it would not hang over the kana neighboring the base character. Also, when the characters around the base characters were kanji on one side and kana on the other side, for the sake of balance the ruby would sometimes be set so as to hang neither over the kanji nor over the kana (which would therefore both be spaced away from the base character).

When the ruby string protrudes from the base character string, whether it can be allowed to be laid over the characters preceding or following, and whether this affects the position of the base characters
When the ruby string protrudes from the base character string, and the base character string is at the start or the end of the line, whether the base character string or the ruby string should be aligned with the line edge
Wrap opportunities
In computer-based typesetting, however, rather than always suppressing line wrap opportunities, they would be allowed in cases like compound words. This is because suppressing them may otherwise trigger very large spacing adjustments during justification.

When there are multiple base characters, whether there can be line wrap opportunity between them

In movable type typography, such matters were resolved based on generic principles, and could always be corrected during the proofreading phase. Essentially, each case was adjusted individually in a flexible manner.

In computer-based typesetting, the layout needs to be more or less determined based on predetermined rules, but it remained necessary to adjust the results in certain cases, for example by changing the association between base characters and the ruby string, or by switching to a different placement policy.

Web Ruby placement

When thinking about computing placement for web content, it is not practical to decide on the positioning case by case as was done in movable type typography. It is therefore necessary to decide upon comprehensive rules that provide solutions to all the problems listed above, so that placement may be determined fully automatically. A system covering all the possibilities that existed in movable type typesetting would need to be very complex.

However, when considering the ideal positioning of ruby, it seems inevitable that exceptions will occur, causing issues.

In such cases, rather than ideal positioning, we must at least make sure that the positioning causes no misunderstanding; there are also practical limits to how complex the system can be in order to be practically implementable.

The following is a proposal for a simple processing system. The target audience is implementers and specification writers. It is expected that a full system may be more complex than what is described here, both due to the interaction with other features or other writing systems, and because those designing such systems may wish to provide alternative options. Note that the terminology is based on that defined in JLReq.

Matters considered by the simple placement rules

Here are the fundamental assumptions underlying the simple placement rules.

Ruby is used to display the reading or the meaning of the base characters. Therefore, the number one priority here is to avoid misreadings.
The method detailed in this document attempts to reduce exceptions as much as possible. Therefore, there is no requirement for complex processing.
The method is agnostic to horizontal vs vertical writing, and will use the same logic in either case.
The method places the ruby string relative to the base character string the same way when they occur in the middle, start, or end of the line. Moreover, this method does not change the relative position of the ruby string to the base character string depending on preceding or subsequent characters. In other words, this method calculates a position for the ruby relative to the base string that does not change depending on context.
Generally speaking, the processing method is based on JIS X 4051 (Formatting rules for Japanese documents). However, in some cases, optional steps are used.
The ruby font size is set to half of the base character’s size as a default. However, the method supports using different sizes than 1/2.
While cases of ruby on both sides of the base string exist, the method defined here only handles ruby on one side. Handling both sides is left as a future exercise.

Types of ruby

Ruby in Japanese may be divided into the following 3 different types, based on the relationship between the ruby and the base characters (see JLReq “3.3.1 Usage of Ruby”).

Mono-ruby
Jukugo-ruby
Group-ruby

Which one to use depends on the relationship between the ruby and the base characters. Mono-ruby is used to connect ruby to a single base character, Jukugo-ruby is used when multiple base characters each have a corresponding ruby and at the same time the whole group needs to be processed together, and group-ruby is used when ruby is attached to a group of base characters together (see fig. 1). Each is used when specified.

Rules for Simple Placement of Japanese Ruby

Ruby character size and character placement

The size of the ruby characters and their placement in the inline direction relative to the base characters is as follows:

The size of the ruby is by default set to half of the size of the base characters.
In vertical text, ruby is placed to the right of the base characters, and the character frame of the ruby is placed flush against the character frame of the base characters. (see fig. 2)
In horizontal text, ruby is placed above the base characters, and the character frame of the ruby is placed flush against the character frame of the base characters. (see fig. 3)

The following sections describe in detail the placement of mono-ruby, jukugo-ruby, and group-ruby. However, since jukugo-ruby is more complex, it is explained last.

Placement of mono-ruby

Mono-ruby is placed as follows:

Fig. 4
Example mono-ruby with western characters

When the ruby is made of two or more characters, each character in the ruby string is placed immediately next to its neighboring character, without any inter-letter spacing. Furthermore, when the ruby is composed of characters such as Grouped numerals (cl-24), Unit symbols (cl-25), Western word space (cl-26), or Western characters (cl-27) which have their own individual width, they are placed based on each character’s metrics. (see fig. 4)
The center of the ruby string and of the base character string are aligned in the inline direction. (see fig. 5).
Since the base character and its associated ruby form a single unit there is no line wrapping opportunity inside a mono-ruby.
Protruding over surrounding characters
The main placement method defined in JIS X 4051 allows some amount of overhang over the preceding and following base characters, but recognizes the method defined here as an allowed variant.

Fig. 5
Example 1 of mono-ruby protruding

Fig. 6
Example 2 of mono-ruby protruding

When the ruby string is longer than the base character string, the part of the ruby string that extends beyond the base characters must not hang over the characters preceding or following, if they are ideographic characters (cl-19), Hiragana (cl-15), Katakana (cl-16), etc. Space is introduced accordingly between these preceding or following characters and the base characters. (see fig. 5) However, in the following cases, the ruby characters do hang over the preceding or following characters. (see fig. 6)
- If the character preceding the base character is one of: Closing brackets (cl-02), Full stops (cl-06), Commas (cl-07), Full-width ideographic space (cl-14), or Middle dots (cl-05), then the ruby must hang over the blank portion at the end of the character. (This blank portion is usually half the character’s width, except in the case of Middle dots (cl-05) where it is a fourth of the character width). However, if this blank part has been compressed due to justification or similar processing of the line, then the ruby may only hang over the resulting compressed blank space (e.g. if it was reduced from half to a quarter em, hang at most a quarter em).
- If the character following the base character is one of: Opening brackets (cl-01), Full-width ideographic space (cl-14), or Middle dots (cl-05), then the ruby must hang over the blank portion at the start of the character. (This blank portion is usually half the character’s width for Opening brackets (cl-01), or a quarter of the character’s width for Middle dots (cl-05).) However, if this blank part has been compressed due to justification or similar processing of the line, then the ruby may only hang over the resulting compressed blank space (e.g. if it was reduced from half to a quarter em, hang at most a quarter em).
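
The overhang rule above can be sketched in code. The following is a minimal illustration, not a normative algorithm: the function name overhang, its parameters, and the two lookup tables are hypothetical, lengths are expressed in em of the neighboring character, and the half-em blank assumed for a following Full-width ideographic space (cl-14) is an assumption made by symmetry with the preceding case.

```python
# Blank portion (as a fraction of the neighbor's width) that ruby may hang over.
# Neighbor BEFORE the base: the blank sits at the end of that character.
TRAILING_BLANK = {
    "cl-02": 0.5,   # closing brackets
    "cl-06": 0.5,   # full stops
    "cl-07": 0.5,   # commas
    "cl-14": 0.5,   # full-width ideographic space
    "cl-05": 0.25,  # middle dots
}
# Neighbor AFTER the base: the blank sits at the start of that character.
LEADING_BLANK = {
    "cl-01": 0.5,   # opening brackets
    "cl-14": 0.5,   # full-width ideographic space (assumed, by symmetry)
    "cl-05": 0.25,  # middle dots
}


def overhang(protrusion, neighbor_class, side, neighbor_size=1.0, compressed_by=0.0):
    """Split a ruby protrusion into (hang, gap).

    hang: how far the ruby lies over the neighboring character.
    gap:  space that must be inserted between the neighbor and the base.
    side: "before" if the neighbor precedes the base, "after" if it follows.
    compressed_by: amount of the blank already removed by justification.
    """
    table = TRAILING_BLANK if side == "before" else LEADING_BLANK
    # Classes not in the table (kanji, hiragana, katakana, ...) allow no overhang.
    blank = table.get(neighbor_class, 0.0) * neighbor_size
    available = max(blank - compressed_by, 0.0)
    hang = min(protrusion, available)
    return hang, protrusion - hang


# Ruby protruding by half an em next to a kanji: nothing hangs, a gap is inserted.
print(overhang(0.5, "cl-19", "before"))
# Next to an opening bracket whose blank was compressed to a quarter em.
print(overhang(0.5, "cl-01", "after", compressed_by=0.25))
```

Keeping the allowance in a lookup table means that a change of policy for one character class stays a one-line edit.
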
Fig. 7
Example of mono-ruby at the line start

Fig. 8
Example of mono-ruby at the line end

When the ruby string is longer than the base character string, and the ruby falls at the start of the line, then the start of the ruby string is aligned with the line’s start edge (see fig. 7), while if the ruby falls at the end of the line, then the end of the ruby string is aligned with the line’s end edge (see fig. 8).
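
As an illustration of the centering and line-edge rules, the sketch below computes inline positions for a mono-ruby unit. It assumes that lengths are measured in em of the base font along the inline direction, and that the unit occupies a box as long as the longer of the two strings; the function names are hypothetical.

```python
def mono_ruby_offsets(base_len, ruby_len):
    """Return (base_offset, ruby_offset) from the start of the unit's box.

    The box is as long as the longer string, and both strings are centered
    in it, so the relative position never depends on context.
    """
    unit_len = max(base_len, ruby_len)
    return (unit_len - base_len) / 2, (unit_len - ruby_len) / 2


def place_at_line_start(line_start, base_len, ruby_len):
    """Positions of (base, ruby) when the unit is the first item on the line."""
    base_off, ruby_off = mono_ruby_offsets(base_len, ruby_len)
    return line_start + base_off, line_start + ruby_off


def place_at_line_end(line_end, base_len, ruby_len):
    """Positions of (base, ruby) when the unit is the last item on the line."""
    origin = line_end - max(base_len, ruby_len)
    base_off, ruby_off = mono_ruby_offsets(base_len, ruby_len)
    return origin + base_off, origin + ruby_off


# One base character (1 em) with three half-size ruby characters (1.5 em).
print(place_at_line_start(0.0, 1.0, 1.5))  # ruby starts on the line edge
print(place_at_line_end(10.0, 1.0, 1.5))   # ruby ends on the line edge
```

Because the ruby is the longer string in this example, it is the ruby that touches the line edge, and the base character is inset by the protrusion.
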

Placement of group-ruby

Group-ruby is placed as follows:

When the ruby string and the base character string are composed of characters such as Hiragana (cl-15), Katakana (cl-16), Ideographic characters (cl-19), and so on, excluding characters like Grouped numerals (cl-24), Unit symbols (cl-25), Western word space (cl-26), or Western characters (cl-27) which have their own individual width, the way they are positioned depends on how their respective lengths would compare if they were each laid out without any inter-letter spacing:
- Fig. 9
  Example 1 of group-ruby
  
  When their respective lengths would be the same, both are laid out without inter-letter spacing and placed such that their respective centers in the inline direction are aligned (see fig. 9).
- Fig. 10
  Example 2 of group-ruby
  
  Fig. 11
  Example 3 of group-ruby
  
  When the ruby string is shorter than the base character string, space is inserted between every character in the ruby string as well as at the start and the end of the ruby string so that it becomes the same length as the base character string, then their centers in the inline direction are aligned. The size of the space inserted between each of the ruby characters is twice the size of the space inserted at the end and at the start (see fig. 10). However, the size of the space inserted at the start and end must be capped at no more than half the size of one base character, and the space inserted between each ruby character is enlarged to compensate (see fig. 11).
- Fig. 12
  Example 4 of group-ruby
  
  When the ruby string is longer than the base character string, space is inserted between every character in the base character string as well as at the start and the end of the base character string so that it becomes the same length as the ruby string, then their centers in the inline direction are aligned. The size of the space inserted between each of the base characters is twice the size of the space inserted at the end and at the start (see fig. 12).
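
The space distribution described above can be worked out as follows. This is a sketch under stated assumptions: every character is taken to be full-width (one em of its own font size), lengths are in em of the base font, and the names distribute and group_ruby_spacing are hypothetical.

```python
def distribute(extra, n_chars, edge_cap=None):
    """Split `extra` length into (edge, gap).

    edge: space added at the start and at the end of the string.
    gap:  space added between each pair of adjacent characters.
    Uncapped, gap == 2 * edge, which follows from
        2 * edge + (n_chars - 1) * (2 * edge) == extra.
    """
    if extra <= 0:
        return 0.0, 0.0
    if n_chars < 2:
        return extra / 2, 0.0  # a single character is simply centered
    edge = extra / (2 * n_chars)
    if edge_cap is not None and edge > edge_cap:
        edge = edge_cap  # cap the ends; the gaps absorb the remainder
    gap = (extra - 2 * edge) / (n_chars - 1)
    return edge, gap


def group_ruby_spacing(n_base, n_ruby, base_size=1.0, ruby_ratio=0.5):
    """Decide which string is spaced out, and by how much."""
    base_len = n_base * base_size
    ruby_len = n_ruby * base_size * ruby_ratio
    if ruby_len < base_len:
        # The end space of the ruby is capped at half a base character.
        edge, gap = distribute(base_len - ruby_len, n_ruby, edge_cap=base_size / 2)
        return {"spaced": "ruby", "edge": edge, "gap": gap}
    if ruby_len > base_len:
        edge, gap = distribute(ruby_len - base_len, n_base)
        return {"spaced": "base", "edge": edge, "gap": gap}
    return {"spaced": None, "edge": 0.0, "gap": 0.0}


print(group_ruby_spacing(3, 4))  # ruby shorter, 1:2:1 distribution
print(group_ruby_spacing(4, 2))  # ruby shorter, end space capped
print(group_ruby_spacing(2, 6))  # ruby longer, base is spaced out
```

With three base characters and four ruby characters, the ruby is 1 em short, so each end receives 0.125 em and each gap 0.25 em. With four base characters and two ruby characters, the uncapped end space would be 0.75 em, so it is capped at 0.5 em and the single gap grows to 2 em.
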
Fig. 13
Example of ruby with western characters

When the base character string is composed of characters like Grouped numerals (cl-24), Unit symbols (cl-25), Western word space (cl-26), or Western characters (cl-27) which have their own individual width, and the ruby string is composed of characters such as Hiragana (cl-15), Katakana (cl-16), Ideographic characters (cl-19), and so on, the placement depends on the following (see fig. 13):
- When their respective lengths would be the same, both are laid out without inter-letter spacing and placed such that their respective centers in the inline direction are aligned.
- When the ruby string is shorter than the base character string, space is inserted between every character in the ruby string as well as at the start and the end of the ruby string so that it becomes the same length as the base character string, then their centers in the inline direction are aligned. The size of the space inserted between each of the ruby characters is twice the size of the space inserted at the end and at the start.
- When the ruby string is longer than the base character string, both are laid out without inter-letter spacing and placed such that their respective centers in the inline direction are aligned. In this case, the ruby string protrudes from the base character string.
When the ruby string is composed of characters like Grouped numerals (cl-24), Unit symbols (cl-25), Western word space (cl-26), or Western characters (cl-27) which have their own individual width, and the base character string is composed of characters such as Hiragana (cl-15), Katakana (cl-16), Ideographic characters (cl-19), and so on, the placement depends on the following (see fig. 13):
- When their respective lengths would be the same, both are laid out without inter-letter spacing and placed such that their respective centers in the inline direction are aligned.
- When the ruby string is shorter than the base character string, both are laid out without inter-letter spacing and placed such that their respective centers in the inline direction are aligned.
- When the ruby string is longer than the base character string, space is inserted between every character in the base character string as well as at the start and the end of the base character string so that it becomes the same length as the ruby string, then their centers in the inline direction are aligned. The size of the space inserted between each of the base characters is twice the size of the space inserted at the end and at the start.
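
The cases involving characters with their own individual width reduce to one observation: space is only ever inserted into the shorter string, and only when that string is not made of such characters. The sketch below encodes this reading; the function name and its boolean flags are hypothetical, and the case where both strings are made of western characters is not covered by the rules above, so the result for it (no spacing) is an assumption.

```python
def string_to_space(base_len, ruby_len, base_is_western, ruby_is_western):
    """Return "ruby", "base", or None (no inter-letter spacing is inserted).

    Lengths are those of each string laid out without inter-letter spacing.
    In every case the two strings are then centered on each other.
    """
    if ruby_len == base_len:
        return None
    shorter = "ruby" if ruby_len < base_len else "base"
    shorter_is_western = ruby_is_western if shorter == "ruby" else base_is_western
    # Proportional (western) strings keep their own metrics and are never spaced.
    return None if shorter_is_western else shorter


# Western base, kana ruby that is longer: nothing is spaced, the ruby protrudes.
print(string_to_space(2.0, 3.0, base_is_western=True, ruby_is_western=False))
# Kanji base, western ruby that is longer: the base is spaced out.
print(string_to_space(2.0, 3.0, base_is_western=False, ruby_is_western=True))
```
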
Fig. 14
Example of protruding group-ruby

When the ruby string is longer than the base character string and protrudes, whether and how it hangs over characters preceding or following the base character string is handled in the same way as with mono-ruby (see fig. 14). Also, when the ruby string is longer than the base character string, protrudes, and is located at the start or end of the line, the resulting layout is also identical to that of mono-ruby.
Wrap opportunities in group-ruby
As group-ruby is treated as a unit, there is no wrap opportunity. However, there are examples where allowing wrapping may be desirable. In such cases, based on appropriate association of base characters and ruby characters, handling the wrapping opportunities the same way they are handled for jukugo-ruby may be appropriate (see fig. 15).

Fig. 15
Wrapping group-ruby

In the case of group ruby, the base character string and its associated ruby string are treated as a unit, so there is no line wrapping opportunity inside either string.

Placement of Jukugo-ruby

Jukugo-ruby is placed as follows:

Fig. 16
Example 1 of jukugo-ruby

With jukugo-ruby, each base character is associated with its own ruby string. When none of these ruby strings, laid out without inter-letter spacing, is longer than its corresponding base character, placement is determined as follows:
- When the ruby string associated with an individual base character is 1 character long, the ruby character and the base character are placed such that their respective centers in the inline direction are aligned (see fig. 16).
- When the ruby string associated with an individual base character is 2 characters long or more, the ruby string is laid out without inter-letter spacing, and placed such that its center and the center of its base character are aligned in the inline direction (see fig. 16).
Fig. 17
Example 2 of jukugo-ruby

Fig. 18
Example 3 of jukugo-ruby

For simple ruby implementations, if even a single ruby string is longer than its corresponding base character when laid out without inter-letter spacing, the resulting layout would look identical to group-ruby. (see fig. 17 and 18).
With jukugo-ruby, individual base characters and their associated ruby string are treated as a unit, and line wrap opportunities are allowed between two base characters. When such a line wrap occurs, if a single base character that is part of the jukugo is placed alone at the end or at the start of a line, it is laid out identically to mono-ruby; conversely when several base characters that are part of the jukugo are placed together at the end or start of a line, they are laid out together as has been described in this section about jukugo-ruby (see fig. 19).

Fig. 19
Example of wrapping jukugo-ruby
When the ruby string is longer than the base character string and protrudes, whether and how it hangs over characters preceding or following the base character string is handled in the same way as with mono-ruby. Also, when the ruby string is longer than the base character string, protrudes, and is located at the start or end of the line, the resulting layout is also identical to that of mono-ruby.
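
The choice between the layouts described in this section can be sketched as a small decision function. This is an illustration under assumptions: each ruby character is counted as one full-width character at the ruby font size, the input is a list holding one ruby string per base character, and the names jukugo_layout and layout_after_wrap are hypothetical.

```python
def jukugo_layout(ruby_per_base, base_size=1.0, ruby_ratio=0.5):
    """Return the layout to apply to a run of jukugo base characters.

    ruby_per_base: one ruby string per base character, in order.
    """
    ruby_size = base_size * ruby_ratio
    if len(ruby_per_base) == 1:
        return "mono"  # a lone base character is laid out like mono-ruby
    if any(len(ruby) * ruby_size > base_size for ruby in ruby_per_base):
        return "group"  # one ruby string is too long: same result as group-ruby
    return "per-character"  # each ruby string is centered on its own base


def layout_after_wrap(ruby_per_base, break_index):
    """Layouts of the two segments when a line wraps before `break_index`."""
    if not 0 < break_index < len(ruby_per_base):
        raise ValueError("a wrap is only possible between two base characters")
    head, tail = ruby_per_base[:break_index], ruby_per_base[break_index:]
    return jukugo_layout(head), jukugo_layout(tail)


print(jukugo_layout(["へい", "わ"]))      # 平和: every ruby string fits
print(jukugo_layout(["とう", "きょう"]))  # 東京: きょう is wider than 京
print(layout_after_wrap(["りゅう", "こう", "か"], 2))  # 流行歌 wrapped before 歌
```

Each segment produced by a wrap is evaluated on its own, which matches the rule that a single base character left alone at the end or start of a line is laid out identically to mono-ruby.
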

Ruby and Accessibility

Accessibility Improvements Using Ruby

Ruby plays a role in improving accessibility for people with visual impairments, and other sources of reading difficulties. Therefore, this section examines the relationship between ruby and accessibility.

Reading difficulties can be caused by a variety of factors, and therefore, requirements to improve accessibility also vary. For example, here are some common requirements:

General-Ruby and Para-Ruby
See JLReq section “3.2.2 Choice of Base Characters to be Annotated by Ruby” for an explanation of “general-ruby” and “para-ruby”.

To accommodate young children who cannot read any kanji, general-ruby must be added to all kanji.
The Need for Ruby
According to the results of the 2017 DAISY survey of general users of textbooks, 61% of children require general-ruby. Based on the same survey, para-ruby is found to be sufficient for 32% of children. This means that after having read general-ruby multiple times, having ruby on difficult kanji only is found sufficient. Moreover, printed textbooks use para-ruby, and faithful digital reproduction is needed.

Kanji Studied by Elementary and Middle School Students
Kanji to be learned during primary school are defined in the “Primary School Learning Guidelines”. Those kanji are often called “educational kanji”. In the list published in 2017, 1026 kanji are listed and spread across the various school years according to the “Kanji Allotment Table by Grade”. For middle school, the split between each grade is undefined, but students are required to study the 1110 “characters in common use” not included in educational kanji so as to have studied all 2136 characters in common use by the end of middle school.

As studies progress, a greater number of kanji is known. After having read general-ruby many times, ruby on difficult kanji only becomes sufficient. Therefore para-ruby on only some of the kanji is required.
Some people have difficulties in visually distinguishing between ruby characters and the base characters to which they are attached, and misread the combination as a different character altogether. There must be a display method that enables clearly distinguishing between the two. Also, for those who already know how to read the kanji, there must be a way to hide the ruby.
As inline parenthesised annotations can be used instead, there is no strong need for double-sided ruby.

Ruby Display Requirements for Accessibility

Based on the above, we can gather the following ruby display requirements for accessibility:

Support for general-ruby is required.
Support for para-ruby is required. Moreover, as the number of kanji known increases with the level of studies, based on the content and on the level of the reader, it must be possible to only display ruby for kanji assigned to a particular school year (or later).
Support for hiding ruby is required.
Considering the cost of production, distribution, and of user management, it is necessary to support ruby-less display, general-ruby display, and para-ruby display with the same content.
A method to clearly visually distinguish the ruby characters and their base characters, such as displaying them in different colors, is required.
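
One way to serve ruby-less, general-ruby, and para-ruby display from the same content is to decide at display time whether each annotation is shown. The sketch below is only an illustration of that idea: the function names, the mode strings, and the grade table are hypothetical, the table holds a tiny sample that a real system would replace with the full “Kanji Allotment Table by Grade”, and treating a base string as eligible when any one of its kanji qualifies is an assumption.

```python
UNLISTED = float("inf")  # kanji outside the table count as later than any grade


def is_kanji(ch):
    """Rough check limited to the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def show_ruby(base, mode, grade_of, min_grade=1):
    """Decide whether the ruby attached to `base` is displayed.

    mode: "none" (hide all ruby), "general" (show all ruby), or "para"
          (show ruby only for kanji assigned to `min_grade` or later).
    grade_of: mapping from a kanji to the school year in which it is taught.
    """
    if mode == "none":
        return False
    if mode == "general":
        return True
    if mode != "para":
        raise ValueError("unknown mode: " + mode)
    return any(
        grade_of.get(ch, UNLISTED) >= min_grade
        for ch in base
        if is_kanji(ch)
    )


# Tiny sample table for illustration only.
sample_grades = {"山": 1, "海": 2}

print(show_ruby("山", "para", sample_grades, min_grade=2))  # learned earlier
print(show_ruby("海", "para", sample_grades, min_grade=2))  # assigned to year 2
print(show_ruby("鬱", "para", sample_grades, min_grade=2))  # not in the table
```
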

Table of Contents

Foreword

The Difficulties of Ruby Processing

Web Ruby placement

Matters considered by the simple placement rules

Types of ruby

Rules for Simple Placement of Japanese Ruby

Ruby character size and character placement

Placement of mono-ruby

Placement of group-ruby

Placement of Jukugo-ruby

Ruby and Accessibility

Accessibility Improvements Using Ruby

Ruby Display Requirements for Accessibility