I’ve come across people who do not think that CSS is related to internationalisation at all, but if you think about it, internationalisation is more than translating the content on your site into multiple languages and calling it a day. There are various nuances to the presentation of that content which affect the experience of a native speaker using your site.
There is no single canonical definition for internationalisation but the W3C offers the following guidance:
Internationalisation is the design and development of a product, application or document that enables easy localisation for target audiences that vary in culture, region, or language.
This is a lot of ground to cover, from use of Unicode and character encodings, to the technical implementations of serving translated content, as well as the presentation of said content. Today I’m only going to cover the CSS-related aspects of multilingual support.
CSS is used to describe the presentation of a web page by telling the browser how elements on the page ought to be styled and laid out. There are several methods we can use to apply different styles to different languages on a multilingual page with CSS.
In addition, there are CSS properties that provide layout and typographic capabilities for scripts and writing systems beyond the Latin-based horizontal top-to-bottom ones that are predominantly seen on the web today.
So buckle up, because this might end up being a pretty lengthy article. ¯\_(ツ)_/¯
Language-related styling
Have you ever wondered how Chrome knows to ask you if you’d like a web page’s content to be translated? No? Okay, maybe it’s just me then. But it’s because of the lang attribute on the <html> element.
The lang attribute is a pretty important one because it identifies the language of textual content on the web, and this information is used in many places: the aforementioned built-in translation in Chrome, search engines looking for language-specific content, and screen readers.
Ah hah, maybe the screen reader bit didn’t occur to you. If you’re not a screen reader user and don’t know folks who are, it probably isn’t top of mind. Screen readers make use of language information so they can read out the content with the appropriate accent and correct pronunciation.
The key to language-related styling lies in appropriate usage of the lang attribute in your page markup. The lang attribute recognises the ISO 639 language codes as values.
Update: Tobias Bengfort pointed out that the lang attribute uses an IETF specification called BCP 47, which is largely based on ISO 639.
For most cases, you would use a two-letter code like zh for Chinese, but Chinese (among other languages like Arabic) is considered a macrolanguage that consists of a number of languages with more specific primary language subtags.
Refer to Language tags in HTML and XML for an in-depth explanation on how to construct language tags.
The general guidance is that the html element must always have a lang attribute set, which is then inherited by all other elements.
<html lang="zh">
It is not uncommon to see content of different languages on the same page. In this case, you would wrap that content with a <span> or a <div> and apply the correct lang attribute onto that wrapping element.
<p>The fourth animal in the Chinese Zodiac is Rabbit (<span lang="zh">兔子</span>).</p>
Now that we’ve got that sorted, the following techniques will bear the assumption that lang attributes have been implemented responsibly.
:lang() pseudo-class selector
Turns out the :lang() pseudo-class selector is not that well-known.
But this pseudo-class selector is pretty cool because it recognises the language of the content even if the language is declared outside the element.
For example, a line of markup with two languages like so:
<p>We use <em>italics</em> to emphasise words in English, <span lang="zh">但是中文则是用<em>着重号</em></span>.</p>
Can be styled with the following:
em:lang(zh) {
  font-style: normal;
  text-emphasis: dot;
}
If your browser supports the text-emphasis CSS property, you should be able to see emphasis marks (a typographic symbol traditionally used for emphasis on a run of East Asian text) added to each Chinese character within the <em>. Chrome needs the -webkit- prefix, boo.
But the point is, the lang attribute was not applied on the <em> element, but on its parent. The pseudo-class still works. If we used the more commonly known attribute selectors, e.g. [lang="zh"], this attribute must be on the <em> element to take effect.
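To make the difference concrete, here’s a minimal sketch that reuses the markup from the example above. The declarations are placeholders I picked purely for illustration.

```css
/* Markup assumed:
   <span lang="zh">但是中文则是用<em>着重号</em></span> */

/* Matches: :lang() looks at the language of the element,
   which the <em> picks up from its parent <span>. */
em:lang(zh) {
  font-style: normal;
}

/* Does not match: the <em> itself carries no lang attribute. */
em[lang="zh"] {
  outline: 1px solid red;
}

/* Matches: the attribute selector targets the <span> that
   actually has the attribute, then descends to the <em>. */
[lang="zh"] em {
  letter-spacing: 0.1em;
}
```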
Using attribute selectors
Which brings me to our next technique, using attribute selectors. These let us select elements with certain attributes or attributes of a certain value. (shameless plug time, for more on attribute selectors, try this Codrops CSS reference entry written by yours truly)
There are seven ways to match attribute selectors, but I’ll only talk about those I think are more relevant to the lang attribute specifically. All my examples will use Chinese as the targeted language, so zh and its variants.
Update: Amelia Bellamy-Royds pointed out that my examples make it seem like attribute selectors are necessary to do partial language tag matching, but the :lang() pseudo-class covers that use-case already.
First, we can match the lang attribute value exactly using this syntax:
[lang="zh"]
/* will match only zh */
I mentioned earlier that Chinese is considered a macrolanguage, which means its language tag can be composed with additional specifics, e.g. script subtags Hans or Hant (W3C says only use script subtags if they are necessary to make a distinction you need, otherwise don’t), region subtag HK or TW and so on.
The point is, language tags can be longer than just two letters. But the most generalised category always comes first, so to target attribute values that start with a particular string, we use this syntax involving the ^:
[lang^="zh"]
/* will match zh, zh-HK, zh-Hans, zhong, zh123…
* basically anything with zh as the first 2 characters */
There is another syntax involving the |, which will match the exact value in your selector or a value that starts with your value immediately followed by a -. This seems like it was made just for language subcode matching, no?
[lang|="zh"]
/* will match zh, zh-HK, zh-Hans, zh-amazing, zh-123 */
Do remember that for attribute selectors, the attribute has to be on the element you want styled, it won’t work if it’s on a parent or ancestor. Note that the examples I've come up with for partial language tag matching can already be done with the :lang() pseudo-class.
In other words, :lang(en) will match lang="en-US", lang="en-GB" and so on, in addition to lang="en". I will update the examples when I can come up with better ones. Meanwhile, go for the :lang() pseudo-class.
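Here’s a rough sketch of that matching behaviour for Chinese, assuming markup that uses tags like zh, zh-HK, zh-Hans or zh-Hant-TW. The font names are placeholders, so swap in whatever you actually use.

```css
/* Matches lang="zh", lang="zh-HK", lang="zh-Hans" and so on,
   including elements that inherit the language from an ancestor. */
:lang(zh) {
  font-family: "Noto Sans CJK SC", sans-serif;
}

/* A longer language range narrows things down: this matches
   lang="zh-Hant" and lang="zh-Hant-TW", but not lang="zh"
   or lang="zh-Hans". */
:lang(zh-Hant) {
  font-family: "Noto Sans CJK TC", sans-serif;
}
```

Both selectors have the same specificity, so the second rule wins for Traditional Chinese content simply because it comes later in the stylesheet.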
How about normal classes or ids?
Yes. You can use normal classes or ids. Though you’d no longer be making use of the convenience of what is already on your element. (Again, my assumption is the lang attribute is being applied correctly and responsibly) But sure, go ahead and give your elements class names for applying specific language-related styling if you really want to, nobody will stop you.
CSS properties
Okay, selectors covered. Let’s talk about the styles we want applied to elements that match those selectors.
Writing mode
The default value for writing-mode is horizontal-tb. Perfectly logical because the web was born at CERN, where the official languages are English and French. Moreover, most web technologies were pioneered in English-speaking countries anyway (I think).
But the wondrous-ness of humanity gave us more than 3000 written languages with scripts and writing directions beyond a horizontal top-to-bottom orientation.
The traditional Mongolian script runs vertically from left-to-right, while East Asian languages like Japanese, Chinese and Korean, when written vertically, run from right-to-left. The writing-mode values that let you do that are vertical-lr and vertical-rl respectively.
There are also the values of sideways-lr and sideways-rl, which rotate the glyphs sideways. Every Unicode character has a vertical orientation property that informs rendering engines how the glyph should be oriented by default.
We can change the character’s orientation with the text-orientation property. This usually comes into play when you have vertically typeset East Asian text interspersed with Latin-based words or characters. For abbreviations, you have the option of using text-combine-upright to squeeze the letters into one character space.
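Put together, a minimal sketch of these properties might look like the following. The .vertical and .tcy class names are made up for this example, and I’m assuming Mongolian in the traditional script is tagged as mn-Mong.

```css
/* Vertical columns that run from right to left */
.vertical:lang(zh),
.vertical:lang(ja) {
  writing-mode: vertical-rl;
  /* mixed is the initial value: East Asian glyphs stay upright,
     while Latin runs are rotated sideways */
  text-orientation: mixed;
}

/* Traditional Mongolian: vertical columns that run left to right */
.vertical:lang(mn-Mong) {
  writing-mode: vertical-lr;
}

/* Squeeze a short run like "12" or "CSS" into one character space */
.vertical .tcy {
  text-combine-upright: all;
}
```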
Some of you might wonder about right-to-left languages like Arabic, Hebrew or Farsi (just to name a few), and whether CSS is applicable for those scripts as well. The short answer is CSS should not be used for bi-directional styling. Guidance from the W3C is as follows:
Because directionality is an integral part of the document structure, markup should be used to set the directionality for a document or chunk of information, or to identify places in the text where the Unicode bidirectional algorithm alone is insufficient to achieve desired directionality.
This is because styling applied via CSS has the possibility of being turned off, being overridden, going unrecognised, or being changed/replaced in different contexts. Instead, use of the dir attribute to set the base direction of text for display is the recommended approach.
I highly recommend referring to Structural markup and right-to-left text in HTML, CSS vs. markup for bidi support and Inline markup and bidirectional text in HTML for more detailed explanations and implementation details.
Logical properties
Everything on a web page is a box, and CSS has always used the physical directions of top, bottom, left and right to indicate which side of the box we’re targeting. But when writing modes are not oriented in the default horizontal top-to-bottom direction, these values tend to get confusing.
Because the specification is still in Editor's Draft status, the syntax may change moving forward. Even now, the current browser implementation is different from what is in the specification, so be sure to double-check with MDN: CSS Logical Properties and Values on the most updated syntax.
Update: David Baron pointed out I was using the old syntax in the previous version of the specification and the syntax implemented in browsers is actually the one in the Editor's Draft. Table has been updated accordingly.
The matrix of writing directions and their corresponding values for a box’s physical sides and logical sides for positioning are as follows (the table has been lifted from the specification as of time of writing):
The logical top of a container uses inset-block-start, while the logical bottom of a container uses inset-block-end. The logical left of a container uses inset-inline-start, while the logical right of a container uses inset-inline-end.
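As a small sketch of the positioning properties, assuming a hypothetical absolutely positioned .badge element:

```css
.badge {
  position: absolute;
  /* In horizontal-tb with left-to-right text,
     this is the same as top: 0; right: 0; */
  inset-block-start: 0;
  inset-inline-end: 0;
}
```

Under vertical-rl the block direction starts on the right and the inline direction ends at the bottom, so the same two declarations should pin the badge to the bottom-right corner instead, with no extra rules needed.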
There are also corresponding mappings for borders, margins and paddings, which are:
- top to block-start
- right to inline-end
- bottom to block-end
- left to inline-start
And the mappings for sizing are width to inline-size and height to block-size.
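Here’s how those mappings could look on a hypothetical .card component. The comments note the physical property each declaration corresponds to in the default horizontal-tb, left-to-right case.

```css
.card {
  /* width and height */
  inline-size: 20em;
  block-size: 12em;

  /* margin-top and margin-bottom */
  margin-block-start: 1em;
  margin-block-end: 1em;

  /* padding-left and padding-right */
  padding-inline-start: 1.5em;
  padding-inline-end: 1.5em;

  /* border-left */
  border-inline-start: 4px solid currentColor;
}
```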
Lists and counters
Numeral systems are writing systems for expressing numbers, and even though the most commonly used system of numerals is the Hindu-Arabic numeral system (0, 1, 2, 3 and so on), CSS allows us to display ordered lists with other numeral systems as well.
Predefined counter styles can be used with the list-style-type property, which covers 174 numeral systems from afar to urdu. You can check out the full list at MDN.
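Combined with the :lang() pseudo-class from earlier, a sketch of language-specific list numbering could look like this. The pairing of languages to counter styles is my own pick for illustration, and browser support varies between the individual keywords.

```css
/* Simplified and Traditional Chinese informal numerals */
ol:lang(zh-Hans) { list-style-type: simp-chinese-informal; }
ol:lang(zh-Hant) { list-style-type: trad-chinese-informal; }

/* Japanese hiragana ordering */
ol:lang(ja) { list-style-type: hiragana; }

/* Arabic-Indic digits */
ol:lang(ar) { list-style-type: arabic-indic; }

/* Thai digits */
ol:lang(th) { list-style-type: thai; }
```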
If you’re interested in CSS counters, I wrote about them some time last year where I explore the “Heavenly-stem” and “Earthly-branch” numeral system used in traditional Chinese contexts (as well as a fizzbuzz implementation in CSS because why not).
Text decoration
As mentioned earlier, East Asian languages do not have the concept of italics. Instead, we have emphasis dots. They can be placed above or beneath characters to emphasise the text, strengthen the tone of voice or avoid ambiguity.
When Chinese is written in horizontal writing mode, these dots are placed underneath the characters, and when written in vertical writing mode, these dots are placed to the right of the characters.
Japanese, on the other hand, places emphasis dots above the characters in horizontal writing mode. In order to make the CSS property more generalised, text-emphasis-style, text-emphasis-position and text-emphasis-color were introduced in CSS Text Decoration Module Level 3.
You can use symbols other than dots, like circle, triangle or even a single character as a string. Position and colour can also be tweaked with their respective properties.
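A minimal sketch of the three longhands, following the Chinese and Japanese conventions described above. The .hollow and .starred class names are hypothetical, and you may need to add -webkit- prefixed versions depending on the browsers you target.

```css
/* Chinese: dots under the characters in horizontal text,
   to the right of them in vertical text */
em:lang(zh) {
  font-style: normal;
  text-emphasis-style: filled dot;
  text-emphasis-position: under right;
}

/* Japanese: dots above the characters in horizontal text */
em:lang(ja) {
  font-style: normal;
  text-emphasis-style: filled dot;
  text-emphasis-position: over right;
}

/* A different shape */
.hollow {
  text-emphasis-style: open triangle;
}

/* A single character as a string, with its own colour */
.starred {
  text-emphasis-style: "★";
  text-emphasis-color: crimson;
}
```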
Line decoration is also covered in the same specification, and level 4 of the spec gives developers more granular control of underlines and overlines. This is especially useful for scripts that have ascenders or descenders that regularly spill over the baseline.
text-decoration-skip, covered in CSS Text Decoration Module Level 4, controls how overlines and underlines are drawn when they cross over a glyph. Again, this is something that happens less frequently for languages like English, but greatly affects aesthetics for scripts like Burmese, for example.
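Here’s a hedged sketch of what this kind of control could look like. I’m using text-decoration-skip-ink, which as far as I know is the part of the skipping behaviour browsers have actually shipped, alongside text-underline-offset and text-underline-position. The Burmese rule (language tag my) is my own illustration, so do check it against real content and fonts.

```css
a {
  text-decoration-line: underline;
  /* Interrupt the line where it would cross a glyph */
  text-decoration-skip-ink: auto;
  /* Nudge the underline further away from the baseline */
  text-underline-offset: 0.2em;
}

/* Place the underline below the descenders instead of through them */
a:lang(my) {
  text-underline-position: under;
}
```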
Font variations
There are two categories of CSS properties for accessing OpenType features, high-level properties and low-level properties. The specification recommends use of high-level properties whenever possible. This is mostly predicated on browser support.
For example, font-variant-east-asian allows for control over glyph forms for characters that have variants, like Simplified Chinese glyphs versus Traditional Chinese glyphs. It is the same character, but it can be written differently.
There is also font-variant-ligatures, which provides numerous pre-defined options for ligatures and contextual forms, like discretionary-ligatures, historical-ligatures or contextual.
The low-level properties are accessed via font-feature-settings where you would use the 4-letter OpenType tags to toggle the features you want (this does depend on whether your font has those features to begin with, but let’s assume it does).
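To show how the two levels relate, here’s a sketch that pairs a few high-level declarations with what I understand to be their low-level equivalents. The class names are hypothetical, and none of this does anything unless the font ships the corresponding feature.

```css
.traditional-forms {
  font-variant-east-asian: traditional;
  /* low-level equivalent: font-feature-settings: "trad" 1; */
}

.fancy-ligatures {
  font-variant-ligatures: discretionary-ligatures;
  /* low-level equivalent: font-feature-settings: "dlig" 1; */
}

.slashed-zero {
  font-variant-numeric: slashed-zero;
  /* low-level equivalent: font-feature-settings: "zero" 1; */
}
```

One reason to prefer the high-level properties is that font-feature-settings is set as a whole: a later declaration replaces the entire list of tags rather than adding to it.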
There are 141 feature tags from Alternative Fractions to Justification Alternates to Ruby Notation Forms to Slashed Zero. These CSS properties are closely related to features within the font file itself, so there is that external dependency that lies upon your choice of font. Something to keep in mind.
Wrapping up
This post got really long, so I’ll have a second part where I go into more specifics on how we could build up a layout using the selectors we covered to make sure our layout is robust even if the language changes. Modern layout properties like Flexbox and Grid are well suited for use cases like this.
One of the things I find most interesting about CSS is how we can combine them in different ways to achieve a myriad of outcomes, and with more than 500 CSS properties in existence, that’s a lot of possibilities. I’m not saying anything goes, because often, there are numerous ways to reach the same result, and some ways are more appropriate than others.
However, it is up to us to make an informed decision regarding which is the most appropriate method to use for our context, by understanding the mechanics behind each technique, its pros and cons, and being aware of why we chose to do things a certain way.
I still believe that, after more than three decades, the web remains an informational medium where content is key. Hence, the presentation of that content should be optimised regardless of what language or script it is in. And I’m glad that CSS is continually developing to provide developers a means to do just that.
Anyway, stay tuned for part 2.
Coming…
Eventually.
References
- W3C Internationalization (i18n)
- Internationalization techniques: Authoring HTML & CSS
- Using the HTML lang attribute
- Styling using language attributes
- Localization vs. Internationalization
- Using the Unicode BIDI Algorithm to Handle Complexities in Typesetting Multi-Script Vertical Text
- OpenType features in CSS