Character directionality

Every character has an inherent directionality, which, in broad terms, is one of:

  • LTR (for example, Western European and East Asian (CJK) characters);

  • RTL (for example, Arabic and Hebrew characters);

  • Weak (for example, number characters);

  • Neutral (for example, punctuation and whitespace).

Characters with an inherent LTR or RTL directionality are said to be strongly typed. The directionality of weak and neutral characters is determined from the directionality of the surrounding context.
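The broad categories above can be inspected programmatically. The following is a minimal sketch, assuming Python's standard unicodedata module, which reports the Unicode bidirectional class of a character. The grouping of those classes into the four broad categories, and the helper name broad_directionality, are assumptions made for illustration. Note that Unicode assigns some separators, such as comma and full stop, to class CS, which this grouping reports as Weak even though such characters usually behave like neutrals in running text.

```python
import unicodedata

# Assumed grouping of Unicode bidirectional classes into broad categories.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}


def broad_directionality(ch):
    """Return the broad directionality category of a single character."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in STRONG_LTR:
        return "LTR"
    if bidi_class in STRONG_RTL:
        return "RTL"
    if bidi_class in WEAK:
        return "Weak"
    if bidi_class in NEUTRAL:
        return "Neutral"
    # Explicit formatting characters and unassigned code points land here.
    return "Other"


# Latin letter, CJK ideograph, Hebrew letter, Arabic letter, digit,
# space, exclamation mark.
samples = ["f", "\u4e2d", "\u05d1", "\u0628", "5", " ", "!"]
for ch in samples:
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), broad_directionality(ch))
```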

A sequence of neutral characters positioned between two strongly typed characters of the same directionality assumes that directionality. In the example below, a comma (U+002C) is positioned between the RTL characters U+05D1 and U+05DC. The resolved direction for the comma is RTL:

A sequence of neutral characters positioned between two strongly typed characters with different directionality assumes the directionality of the prevailing base direction. In the example below, the base direction is LTR. A sequence of full stop (U+002E) characters is positioned between an LTR character (U+0066) and a character with weak directionality (U+0035). This is followed by the RTL character (U+05D1). The resolved direction for both the neutral and weak characters is LTR and the enclosed characters are rendered as part of a single LTR directional run:
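The two rules described above can be sketched in code. This is a deliberately simplified illustration and not an implementation of the Unicode Bidirectional Algorithm: it treats weak characters the same way as neutrals, ignores embedding levels and explicit formatting characters, and uses the base direction at the start and end of the text. The function names strong_direction and resolve_directions are hypothetical.

```python
import unicodedata


def strong_direction(ch):
    """Return "LTR" or "RTL" for strongly typed characters, else None."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "LTR"
    if bidi_class in ("R", "AL"):
        return "RTL"
    return None


def resolve_directions(text, base="LTR"):
    """Resolve a direction for every character (simplified sketch).

    Each run of non-strong characters takes the direction of its strong
    neighbours when they agree, and the base direction otherwise.
    """
    directions = [strong_direction(ch) for ch in text]
    resolved = list(directions)
    i = 0
    while i < len(text):
        if directions[i] is not None:
            i += 1
            continue
        # Find the end of this run of weak or neutral characters.
        j = i
        while j < len(text) and directions[j] is None:
            j += 1
        before = directions[i - 1] if i > 0 else base
        after = directions[j] if j < len(text) else base
        run_direction = before if before == after else base
        for k in range(i, j):
            resolved[k] = run_direction
        i = j
    return resolved


# Comma between two RTL characters (U+05D1 and U+05DC).
print(resolve_directions("\u05d1,\u05dc"))
# Full stops and a digit between an LTR and an RTL character, LTR base.
print(resolve_directions("f...5\u05d1", base="LTR"))
```

Under these simplifying assumptions, the comma in the first string resolves to RTL, and the full stops and digit in the second string resolve to LTR, matching the two cases described above.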

If it is necessary to change the way neutral characters are grouped with respect to text direction, the recommended practice is to insert a strongly typed directional mark character within the rendered content:

Code      Name
U+200E    LEFT-TO-RIGHT MARK (ZERO WIDTH)
U+200F    RIGHT-TO-LEFT MARK (ZERO WIDTH)

In the previous example, inserting an RTL mark (U+200F) immediately before the sequence of full stop (U+002E) characters forces that sequence and the following weakly typed character (U+0035) to be processed as part of a separate RTL directional run:
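Inserting a mark is a plain string operation. The following is a minimal sketch of how the insertion described above might be done in Python. The helper name insert_mark_before is hypothetical, and the sample string is reconstructed from the code points named in the text. The sketch only builds the string and confirms that the marks are strongly typed; how the result is rendered depends on the displaying application.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def insert_mark_before(text, target, mark):
    """Insert `mark` immediately before the first occurrence of `target`.

    Returns `text` unchanged when `target` is not found.
    """
    index = text.find(target)
    if index == -1:
        return text
    return text[:index] + mark + text[index:]


# U+0066, three full stops, U+0035, U+05D1.
original = "f...5\u05d1"
marked = insert_mark_before(original, "...", RLM)

# List the code points to make the invisible mark visible.
print([f"U+{ord(ch):04X}" for ch in marked])

# Both marks carry a strong bidirectional class.
print(unicodedata.name(LRM), unicodedata.bidirectional(LRM))
print(unicodedata.name(RLM), unicodedata.bidirectional(RLM))
```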