Typing non-English letters

Overview
EurKEY layout
US-International layout
Using non-US layouts
Unicode input
Appendix: Unicode codepoints

¿ á æ ç ñ α β γ ?

Overview

Writing in languages other than English often requires letters and symbols not found on the standard US keyboard. This post describes several ways to type these symbols with QMK keyboards.

EurKEY layout

For typing most symbols in major Western European languages, a popular solution is to set the computer to use the EurKEY layout. Then, at the top of your keymap.c, include the header:

#include "keymap_eurkey.h"

(EurKEY types the ASCII symbols in the same way as QWERTY, so there is no need for a Sendstring header.) Use these EU-prefixed keycodes in your keymap. An incomplete list:

Keycode	Symbol	Keycode	Symbol
`EU_AACU`	á	`EU_AE`	æ
`EU_EACU`	é	`EU_CCED`	ç
`EU_IACU`	í	`EU_NTIL`	ñ
`EU_OACU`	ó	`EU_OSTR`	ø
`EU_UACU`	ú	`EU_SS`	ß
`EU_ADIA`	ä	`EU_EURO`	€
`EU_ODIA`	ö	`EU_IQUE`	¿
`EU_UDIA`	ü	`EU_IEXL`	¡

US-International layout

Another option for major Western European languages is the US-International layout. Note that in US-International, ' and ` (among others) are dead keys for typing accented letters. At the top of your keymap.c, include the headers:

#include "keymap_us_international.h"
#include "sendstring_us_international.h"

This adds US-prefixed keycode definitions for the additional symbols that US-International can type. An incomplete list:

Keycode	Symbol	Keycode	Symbol
`US_AACU`	á	`US_AE`	æ
`US_EACU`	é	`US_CCED`	ç
`US_IACU`	í	`US_NTIL`	ñ
`US_OACU`	ó	`US_OSTR`	ø
`US_UACU`	ú	`US_SS`	ß
`US_ADIA`	ä	`US_EURO`	€
`US_ODIA`	ö	`US_IQUE`	¿
`US_UDIA`	ü	`US_IEXL`	¡
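To illustrate the dead-key behavior noted above, here is a minimal sketch of what the host does under US-International. It is plain C, not QMK code: compose_dead_key is a hypothetical helper, and it models only a handful of combinations.

```c
#include <stdint.h>

// Hypothetical model, not QMK code: the codepoint a host set to
// US-International produces when a dead key is followed by another key.
// Only a handful of combinations are modelled here.
uint32_t compose_dead_key(char dead, char base) {
    if (base == ' ') {
        return (uint32_t)dead;  // Dead key, then Space: the plain symbol.
    }
    if (dead == '\'') {  // Acute accent.
        switch (base) {
            case 'a': return 0x00E1;  // á
            case 'e': return 0x00E9;  // é
            case 'i': return 0x00ED;  // í
            case 'o': return 0x00F3;  // ó
            case 'u': return 0x00FA;  // ú
        }
    } else if (dead == '`') {  // Grave accent.
        switch (base) {
            case 'a': return 0x00E0;  // à
            case 'e': return 0x00E8;  // è
        }
    }
    return 0;  // Combination not modelled in this sketch.
}
```

The practical consequence is that typing a plain apostrophe under US-International takes an extra Space press after the ' key.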

Using non-US layouts

It’s preferable to use the EurKEY or US-International layout, since they are (mostly) compatible with the usual KC-prefixed keycodes. But perhaps you need other symbols, like Cyrillic or Greek letters, or for other reasons must set the computer to another keyboard layout. If you do this, beware that the correspondence between KC-prefixed keycodes and the symbols they actually type may be mixed up.

Instead of the KC keycodes, you should use keycodes corresponding to your layout and (if you use SEND_STRING) the corresponding Sendstring implementation. For example, if the computer is set to the German keyboard layout, then at the top of keymap.c add

#include "keymap_german.h"
#include "sendstring_german.h"

The first header defines alternative “DE”-prefixed keycodes, including keycodes like DE_SS for ß to type non-English symbols. Use these DE keycodes in your keymap instead of the KC keycodes. The second header modifies SEND_STRING such that ASCII strings type correctly under the German layout. For further explanation, see Additional language support and Sendstring support.

See Language-specific Keycodes for the full list of such headers. See the headers themselves to find keycodes for the additional symbols a particular layout can type. For instance, keymap_ukrainian.h defines UA_SHCH to type Щ.

For non-Latin languages, SEND_STRING is limited to (a subset of) the ASCII character set. Layouts for these languages lack a corresponding “sendstring_” header because they have no keys to enter most ASCII characters.

Side note: Why do we have this complication of different keycodes for each language? When making keyboards with other layouts, it is standard practice to reuse US QWERTY keyboard firmware and simply print different keycap labels. This mismatch is resolved by configuring the host computer to map key codes to the intended layout. From the Universal Serial Bus HID Usage Tables, section 10:

Where this list is not specific for a key function in a language, the closest equivalent key position should be used, so that a keyboard may be modified for a different language by simply printing different keycaps. One example is the Y key on a North American keyboard. In Germany this is typically Z. Rather than changing the keyboard firmware to put the Z Usage into that place in the descriptor list, the vendor should use the Y Usage on both the North American and German keyboards. This continues to be the existing practice in the industry, in order to minimize the number of changes to the electronics to accommodate other languages.

If the computer is set to the German QWERTZ keyboard layout, then a QMK keyboard sending the KC_Y keycode will be interpreted by the computer as typing z. Indeed, the German keycode DE_Z is an alias of KC_Y.
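The Y/Z example can be sketched from the host's point of view. This is a toy model in plain C, not QMK or OS code: host_layout_t and host_translate are hypothetical names, and only two keys are modelled. The usage IDs 0x1C and 0x1D are the Y and Z entries from the HID keyboard usage page.

```c
#include <stdint.h>

// HID usage IDs for two letter keys (USB HID Usage Tables, Keyboard page).
enum { HID_USAGE_Y = 0x1C, HID_USAGE_Z = 0x1D };

// Hypothetical model of the host's layout setting; not part of QMK.
typedef enum { HOST_LAYOUT_US, HOST_LAYOUT_GERMAN } host_layout_t;

// Returns the letter the host produces for a usage ID,
// or 0 for keys this sketch does not model.
char host_translate(host_layout_t layout, uint8_t usage) {
    switch (usage) {
        case HID_USAGE_Y:
            return (layout == HOST_LAYOUT_GERMAN) ? 'z' : 'y';
        case HID_USAGE_Z:
            return (layout == HOST_LAYOUT_GERMAN) ? 'y' : 'z';
        default:
            return 0;
    }
}
```

The keyboard sends the same usage ID in both cases. Only the host's interpretation changes, which is why the layout-specific keycodes are defined as aliases of the KC ones.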

Unicode input

An entirely different approach to typing non-English characters is through QMK’s Unicode input feature. This can type letters from any language, math symbols, arrows, emojis, and other Unicode symbols.

To give fair warning, the implementation is a hack. Each major OS has an input method where the user may type a Unicode symbol by manually entering its codepoint number. Example: on Linux, the symbol 好 (U+597D) can be typed as “Ctrl+Shift+U, 5, 9, 7, D, space.” QMK literally sends such a key sequence to type each Unicode character. Consequently:

You need to configure which Unicode input mode QMK should use, since the exact key sequence needed depends on the OS. This is fussy if you regularly use your keyboard with more than one OS. Additionally on Windows, you need to either install WinCompose or make a registry edit to enable Unicode input.
There is often a visible flicker of digits on the screen as a Unicode symbol is typed. This may be distracting.
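To make the mechanism concrete, here is a minimal sketch in plain C that spells out the Linux key sequence for a given codepoint. describe_linux_sequence is a hypothetical helper for illustration only. QMK itself sends key events rather than building a string.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper, not a QMK function: writes a readable description
// of the Linux Unicode input sequence for one codepoint into `out`.
// Returns the length written, or -1 if the buffer is too small.
int describe_linux_sequence(uint32_t codepoint, char *out, size_t size) {
    char hex[9];  // Up to 8 hex digits plus the terminator.
    int digits = snprintf(hex, sizeof hex, "%lX", (unsigned long)codepoint);

    // Prefix, then ", X" per hex digit, then the closing ", space".
    size_t needed = strlen("Ctrl+Shift+U") + 3 * (size_t)digits
                    + strlen(", space") + 1;
    if (needed > size) {
        return -1;
    }

    char *p = out;
    p += sprintf(p, "Ctrl+Shift+U");
    for (int i = 0; i < digits; i++) {
        p += sprintf(p, ", %c", hex[i]);  // One key press per hex digit.
    }
    p += sprintf(p, ", space");
    return (int)(p - out);
}
```

Every hex digit is a separate key press, which is where the visible flicker of digits comes from.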

Another limitation is that holding the Shift key during Unicode input does not automatically capitalize the symbol. To make a shift-able symbol key, you need to tell QMK explicitly which Unicode codepoint to send when shifted.

That said, it does work. Here is how to set it up. One further complication is that QMK has not one but three Unicode “input subsystems.” I’ll describe Unicode Map, since it has a convenient method for defining shift-able symbols.

Step 1. In rules.mk, add

UNICODEMAP_ENABLE = yes

Step 2. In config.h, define which input method QMK should use. For Mac, use UNICODE_MODE_MACOS and enable Unicode Hex Input under System Preferences → Keyboard → Input Sources. For Linux, use UNICODE_MODE_LINUX. For Windows, use UNICODE_MODE_WINCOMPOSE and install WinCompose. It is also possible to list multiple methods here, useful if you use your keyboard with more than one OS. See the Unicode input modes documentation for further details.

// Set Unicode input method for Linux.
#define UNICODE_SELECTED_MODES UNICODE_MODE_LINUX
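If you use the keyboard with more than one OS, a config.h entry listing several modes might look like the following sketch. The choice of macOS plus Linux is just an example. As I understand the QMK documentation, the listed modes can then be cycled with keycodes such as UC_NEXT, but check the Unicode input modes documentation for the details.

```c
// Sketch: select two Unicode input modes, separated by a comma.
// Which modes to list depends on the systems you actually use.
#define UNICODE_SELECTED_MODES UNICODE_MODE_MACOS, UNICODE_MODE_LINUX
```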

Step 3. In keymap.c, define the unicode_map array, which lists the Unicode codepoint for each symbol you want to use. For shift-able symbols, define both the lowercase and uppercase versions of the symbol. As an example, I’ll implement keys for ß, ñ, ç, and ¿. Codepoints for a few symbols of interest are listed below in the appendix. Codepoints are conventionally denoted like “U+00E7” with the number in hexadecimal, so I have written them here with a 0x prefix to make C hex constants.

enum unicode_names {
  U_SS_LOWER,
  U_SS_UPPER,
  U_NTIL_LOWER,
  U_NTIL_UPPER,
  U_CCED_LOWER,
  U_CCED_UPPER,
  U_IQUE_SYM,
};

const uint32_t unicode_map[] PROGMEM = {
  [U_SS_LOWER]   = 0x00df,  // ß
  [U_SS_UPPER]   = 0x1e9e,  // ẞ
  [U_NTIL_LOWER] = 0x00f1,  // ñ
  [U_NTIL_UPPER] = 0x00d1,  // Ñ
  [U_CCED_LOWER] = 0x00e7,  // ç
  [U_CCED_UPPER] = 0x00c7,  // Ç
  [U_IQUE_SYM]   = 0x00bf,  // ¿
};

Step 4. Define keycodes for the symbols. For ß, ñ, ç, use the UP(i, j) macro to represent the pair of unshifted and shifted symbols associated with the key. For ¿, let’s simply ignore shifting and use the UM(i) macro:

// ß and ẞ keycode.
#define U_SS UP(U_SS_LOWER, U_SS_UPPER)
// ñ and Ñ keycode.
#define U_NTIL UP(U_NTIL_LOWER, U_NTIL_UPPER)
// ç and Ç keycode.
#define U_CCED UP(U_CCED_LOWER, U_CCED_UPPER)
// ¿ keycode.
#define U_IQUE UM(U_IQUE_SYM)
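Conceptually, a paired keycode picks one of its two table entries depending on whether Shift is held. The following plain C sketch models just that choice. It repeats the table from Step 3 without PROGMEM so it builds outside QMK, and codepoint_for_pair is a hypothetical stand-in, not QMK's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

enum unicode_names {
    U_SS_LOWER,
    U_SS_UPPER,
    U_NTIL_LOWER,
    U_NTIL_UPPER,
    U_CCED_LOWER,
    U_CCED_UPPER,
    U_IQUE_SYM,
};

// Same codepoints as in Step 3, minus PROGMEM so this builds as plain C.
const uint32_t unicode_map[] = {
    [U_SS_LOWER]   = 0x00df,  // ß
    [U_SS_UPPER]   = 0x1e9e,  // ẞ
    [U_NTIL_LOWER] = 0x00f1,  // ñ
    [U_NTIL_UPPER] = 0x00d1,  // Ñ
    [U_CCED_LOWER] = 0x00e7,  // ç
    [U_CCED_UPPER] = 0x00c7,  // Ç
    [U_IQUE_SYM]   = 0x00bf,  // ¿
};

// Hypothetical stand-in for what a paired keycode does on press:
// use the upper entry when Shift is held, the lower entry otherwise.
uint32_t codepoint_for_pair(uint8_t lower, uint8_t upper, bool shift_held) {
    return unicode_map[shift_held ? upper : lower];
}
```

A UM(i) key corresponds to always using the same entry, which is why U_IQUE types ¿ whether or not Shift is held.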

Step 5. Finally, use the keycodes U_SS, U_NTIL, U_CCED, U_IQUE in your keymap.

A few remarks:

It is of course laborious to look up codepoints and make all these definitions, even for just a handful of symbols. Check first whether the EurKEY layout has the symbols you need, since this is an easier solution.
Some Unicode symbols consist of a sequence of multiple codepoints rather than a single codepoint. Use send_unicode_string() to type multi-codepoint symbols. See this emoji macro for an example.
The UP() macro is limited to the first 128 table entries of unicode_map. This is enough to define a fair number of symbols, but keep this in mind if you’re planning something elaborate.
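The 128-entry limit makes sense once you recall that a QMK keycode is 16 bits: a keycode carrying two table indices has room for about 7 bits each. Here is a sketch of that kind of packing in plain C. The base value and the sketch_ names are assumptions for illustration, not QMK's definitions.

```c
#include <stdint.h>

// Illustrative base value marking the "pair" keycode range. Treat it as
// an assumption for this sketch rather than QMK's actual definition.
#define SKETCH_PAIR_BASE 0xC000u

// Pack two table indices into one 16-bit keycode, 7 bits each.
uint16_t sketch_pack_pair(uint8_t lower, uint8_t upper) {
    return (uint16_t)(SKETCH_PAIR_BASE
                      | (lower & 0x7Fu)
                      | ((uint16_t)(upper & 0x7Fu) << 7));
}

// Recover the unshifted (lower) index from the low 7 bits.
uint8_t sketch_lower_index(uint16_t keycode) {
    return (uint8_t)(keycode & 0x7Fu);
}

// Recover the shifted (upper) index from the next 7 bits.
uint8_t sketch_upper_index(uint16_t keycode) {
    return (uint8_t)((keycode >> 7) & 0x7Fu);
}
```

With this layout, an index of 128 or more loses its top bit when packed, so only entries 0 through 127 can be referenced by a pair.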

Appendix: Unicode codepoints

This section lists Unicode codepoints for a few symbols of interest. Codepoints are represented in hexadecimal. To find the codepoint numbers for other symbols, there are many online references and tools that can help, like unicodelookup.com.
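If you would rather compute a codepoint than look it up, a small UTF-8 decoder does the job. This is a minimal sketch in plain C, unrelated to QMK: utf8_decode is a hypothetical helper that reads the first symbol of a UTF-8 string, and it does not reject every malformed input (overlong encodings, for instance).

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: decodes the first codepoint of a UTF-8 string.
// Returns the number of bytes consumed, or 0 for empty or malformed input.
// Overlong encodings and surrogates are not rejected in this sketch.
size_t utf8_decode(const char *s, uint32_t *codepoint) {
    const unsigned char *b = (const unsigned char *)s;
    size_t len;
    uint32_t cp;

    if (b[0] == 0) {
        return 0;  // Empty string.
    } else if (b[0] < 0x80) {
        len = 1; cp = b[0];          // ASCII.
    } else if ((b[0] & 0xE0) == 0xC0) {
        len = 2; cp = b[0] & 0x1F;   // Two-byte sequence.
    } else if ((b[0] & 0xF0) == 0xE0) {
        len = 3; cp = b[0] & 0x0F;   // Three-byte sequence.
    } else if ((b[0] & 0xF8) == 0xF0) {
        len = 4; cp = b[0] & 0x07;   // Four-byte sequence.
    } else {
        return 0;  // Not a valid lead byte.
    }

    for (size_t i = 1; i < len; i++) {
        if ((b[i] & 0xC0) != 0x80) {
            return 0;  // Missing continuation byte (or string ended early).
        }
        cp = (cp << 6) | (b[i] & 0x3Fu);
    }
    *codepoint = cp;
    return len;
}
```

Print the result with a format like "U+%04lX" to get the conventional notation. A symbol for which the string continues after the first decoded codepoint may be one of the multi-codepoint symbols mentioned earlier.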

Punctuation

Symbol	Codepoint
¿	U+00BF
¡	U+00A1
«	U+00AB
»	U+00BB
– (en dash)	U+2013
— (em dash)	U+2014

Accented vowels

Symbol	Codepoint	Symbol	Codepoint
Á	U+00C1	á	U+00E1
É	U+00C9	é	U+00E9
Í	U+00CD	í	U+00ED
Ó	U+00D3	ó	U+00F3
Ú	U+00DA	ú	U+00FA
Ä	U+00C4	ä	U+00E4
Ë	U+00CB	ë	U+00EB
Ï	U+00CF	ï	U+00EF
Ö	U+00D6	ö	U+00F6
Ü	U+00DC	ü	U+00FC

Misc Western European

Symbol	Codepoint	Symbol	Codepoint
Ç	U+00C7	ç	U+00E7
Ñ	U+00D1	ñ	U+00F1
ẞ	U+1E9E	ß	U+00DF
£	U+00A3	€	U+20AC

Greek

Symbol	Codepoint	Symbol	Codepoint
Α	U+0391	α	U+03B1
Β	U+0392	β	U+03B2
Γ	U+0393	γ	U+03B3
Δ	U+0394	δ	U+03B4
Ε	U+0395	ε	U+03B5
Ζ	U+0396	ζ	U+03B6
Η	U+0397	η	U+03B7
Θ	U+0398	θ	U+03B8
Ι	U+0399	ι	U+03B9
Κ	U+039A	κ	U+03BA
Λ	U+039B	λ	U+03BB
Μ	U+039C	μ	U+03BC
Ν	U+039D	ν	U+03BD
Ξ	U+039E	ξ	U+03BE
Ο	U+039F	ο	U+03BF
Π	U+03A0	π	U+03C0
Ρ	U+03A1	ρ	U+03C1
Σ	U+03A3	σ	U+03C3
Τ	U+03A4	τ	U+03C4
Υ	U+03A5	υ	U+03C5
Φ	U+03A6	φ	U+03C6
Χ	U+03A7	χ	U+03C7
Ψ	U+03A8	ψ	U+03C8
Ω	U+03A9	ω	U+03C9
