Character map

XML documents can contain any of the characters defined in the Unicode standard, but a given font is unlikely to contain a glyph for every one of them. This is particularly true of Type 1 fonts, which cannot encode more than 256 glyphs in a single font and which do not use a Unicode encoding.

Note

TopLeaf currently supports only characters in the Basic Multilingual Plane (U+0000 to U+FFFF).

Declaring a character map file

TopLeaf uses a character map to specify the action taken when a character is not present in the currently selected font. It also allows you to force certain characters to use a particular font, overriding the font specified by the mappings.

The default character map is found in the file charmap in the data\sgml subfolder of the TopLeaf installation folder.

Do not edit this file, because it is replaced each time you update TopLeaf. Instead, copy it into your TopLeaf repository as a file named charmap.loc and make your changes there.

If this file is present, TopLeaf uses it instead of the default charmap file. For example, if your TopLeaf repository is located at C:\TopLeaf, you could create a local character map file at C:\TopLeaf\charmap.loc. A file in this location applies to the whole repository. You can also place it in other locations as described in “Configuration files”.

If a definition for the character cannot be found after applying the rules in the character map, an error is generated and a default character is drawn. The default character depends on the currently selected font.

Character map file structure

A character map contains <range> and <replace> elements. Use a range element to map characters to a specific font. Use a replace element to specify an alternate character in the same font.

The information in the range elements is applied first. If this does not result in a match, the replace elements are examined.

Warning

Although the charmap file uses a syntax that is compatible with XML, it is not read using a full XML parser. Do not put anything other than tags and comments in this file, and only use ASCII characters.

The <range> element

A range element maps a contiguous sequence of Unicode characters to a contiguous sequence of data points in a font. The Unicode range is specified by the ustart and uend attributes. The target data points are specified by the data attribute (only the start of the sequence is required, because it is always the same length as the Unicode sequence). All of these attributes are interpreted as hexadecimal values if they start with “x”, or as decimal values if not.

Note

If the replacement characters do not form a contiguous sequence in the same order as the original characters, then you must use multiple <range> elements to specify the map.
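As an illustration, the first five uppercase Greek letters are not laid out in Unicode order in the Symbol font, so they need three elements. This sketch assumes the standard Adobe Symbol encoding (Alpha, Beta, Gamma, Delta and Epsilon at data points 41, 42, 47, 44 and 45); check the data points against the font you actually use. The typeface attribute and the optional uend attribute are described below.

```xml
<!-- U+0391, U+0392 (Alpha, Beta) to data points 41, 42 -->
<range ustart="x391" uend="x392" data="x41" typeface="Symbol"/>
<!-- U+0393 (Gamma) sits out of sequence at data point 47 -->
<range ustart="x393" data="x47" typeface="Symbol"/>
<!-- U+0394, U+0395 (Delta, Epsilon) to data points 44, 45 -->
<range ustart="x394" uend="x395" data="x44" typeface="Symbol"/>
```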

The typeface attribute determines the target font. The font used is the regular style font for the nominated typeface (normal weight and not italic).

For example:

<range ustart="x39A" uend="x39D" data="x4B" typeface="Symbol"/>

defines the action for the Unicode characters with hexadecimal codes 39A, 39B, 39C and 39D. When one of these characters is encountered and it is not present in the current font, TopLeaf switches to the Symbol font and draws the character at data point 4B, 4C, 4D or 4E, respectively.
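Because attribute values without a leading “x” are read as decimal, the same rule could presumably be written as follows. The decimal values are simply conversions of the hexadecimal ones above (x39A = 922, x39D = 925, x4B = 75); this is an illustrative rewrite, not an additional rule from the default charmap.

```xml
<!-- Decimal form of the rule above: U+039A to U+039D mapped to Symbol data points 75 to 78 -->
<range ustart="922" uend="925" data="75" typeface="Symbol"/>
```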

Note

The typeface must appear in the font configuration.

The ustart attribute must be present. If the uend attribute is not present, the action applies to the single character given by ustart. If the data attribute is not present, it defaults to the value of ustart.

Note

Characters in the range U+0020 to U+007F (ASCII, from the space character onward) cannot be mapped. If a rule uses these code points, an error is generated.

When the target font uses the Unicode character set, the data attribute can be omitted to indicate that the code point is not changed. However, it can still be used to map to a different code point if required.
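A minimal sketch of both cases, assuming a Unicode-encoded typeface. The name ExampleUnicode is hypothetical and stands for any typeface that appears in your font configuration; the code points are chosen only for illustration.

```xml
<!-- data omitted: the arrows U+2190 to U+21FF keep their own code points -->
<range ustart="x2190" uend="x21FF" typeface="ExampleUnicode"/>
<!-- data given: U+2015 (horizontal bar) is drawn using the glyph at U+2014 (em dash) -->
<range ustart="x2015" data="x2014" typeface="ExampleUnicode"/>
```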

You can also force the substitution to happen in every case by including the select attribute with the value always, for example:

<range ustart="9986" data="x22" typeface="Wingdings" select="always"/>

which causes the character with decimal code 9986 (U+2702) to always produce the “scissors” character from the Wingdings font, regardless of whether the current font contains this character.

If the sequences specified by the <range> elements overlap, the elements that appear later in the file take precedence over the earlier ones. In other words, define the most general rules first and the more specific rules last.

For example:

<range ustart="x4E00" uend="x9FBF"
       typeface="CJKStandard" select="always"/>
<range ustart="x4EAC"
       typeface="CJKSpecial" select="always"/>

maps all of the characters in the range U+4E00 to U+9FBF (the CJK Unified Ideographs) to a standard typeface, but maps a specific character in this range to a different typeface.

The <replace> element

The replace element must contain two attributes. The char attribute defines a character code point. The alt attribute defines an alternate code point that is used if the font does not contain the character. Both attributes are interpreted as hexadecimal values if they start with “x”, or as decimal values if not.

This is intended to be used to identify characters with identical appearance that can be substituted without changing the meaning of the output. For example:

<replace char="x2011" alt="x002D"/>

This indicates that if the font does not contain the non-breaking hyphen (U+2011) a hyphen-minus (U+002D) can be used instead. Note that this only changes the appearance of the character, not its meaning. The composition engine will still treat it as a non-breaking character.
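Further rules follow the same pattern. The sketch below is illustrative and is not taken from the default charmap: the first rule falls back from the hyphen (U+2010) to the hyphen-minus, and the second is the decimal form of the rule shown above (x2011 = 8209, x002D = 45).

```xml
<!-- Use hyphen-minus when the font has no U+2010 hyphen -->
<replace char="x2010" alt="x002D"/>
<!-- Same as char="x2011" alt="x002D", written in decimal -->
<replace char="8209" alt="45"/>
```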