QMK: Typing non-English letters
Pascal Getreuer, 2022-09-13 (updated 2022-12-29)
¿ á æ ç ñ α β γ ?
Overview
Writing in languages other than English often requires letters and symbols not found on the standard US keyboard. This section describes several ways to type these symbols with QMK keyboards.
US-International layout
For typing most symbols in major Western European languages, a good solution is to set the computer to use the US-International layout. Use US-International if possible, since with this layout you can still refer to keys in QMK using the usual KC_-prefixed keycodes. Beware that this is not the case for most other layouts; see Using non-US layouts below.
At the top of your keymap.c, include the headers:
#include "keymap_us_international.h"
#include "sendstring_us_international.h"
This adds US-prefixed keycode definitions for the additional symbols that US-International can type. An incomplete list:
Keycode | Symbol | Keycode | Symbol |
---|---|---|---|
US_AACU | á | US_AE | æ |
US_EACU | é | US_CCED | ç |
US_IACU | í | US_NTIL | ñ |
US_OACU | ó | US_OSTR | ø |
US_UACU | ú | US_SS | ß |
US_ADIA | ä | US_MICR | µ |
US_ODIA | ö | US_IQUE | ¿ |
US_UDIA | ü | US_IEXL | ¡ |
Additionally, accented letters can be typed using ' ` " ^ ~ as dead keys. For instance, tapping ' and then u types ú. To type a literal ', tap the ' key followed by space.
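To make the dead-key behavior concrete, here is a minimal standalone C sketch of the composition the host performs. It is only an illustration: the function name `compose_dead_key` and the rule table are hypothetical, the table covers a handful of combinations rather than the full layout, and none of this is QMK code, since the composing happens on the computer and not in the keyboard firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// One dead-key rule: dead key + base letter -> composed codepoint.
typedef struct {
    char     dead;
    char     base;
    uint32_t codepoint;
} dead_key_rule_t;

// Illustrative subset only; the real layout has many more combinations.
static const dead_key_rule_t dead_key_rules[] = {
    {'\'', 'a', 0x00E1},  // á
    {'\'', 'u', 0x00FA},  // ú
    {'`',  'a', 0x00E0},  // à
    {'"',  'o', 0x00F6},  // ö
    {'^',  'e', 0x00EA},  // ê
    {'~',  'n', 0x00F1},  // ñ
};

// Returns the codepoint produced by tapping `dead` and then `base`.
// Space after a dead key yields the literal dead-key character.
// Returns 0 for combinations this sketch does not cover.
uint32_t compose_dead_key(char dead, char base) {
    assert(base != '\0');  // A base key must follow the dead key.
    if (base == ' ') {
        return (uint32_t)(unsigned char)dead;
    }
    size_t count = sizeof(dead_key_rules) / sizeof(dead_key_rules[0]);
    for (size_t i = 0; i < count; ++i) {
        if (dead_key_rules[i].dead == dead && dead_key_rules[i].base == base) {
            return dead_key_rules[i].codepoint;
        }
    }
    return 0;
}
```

With this table, `compose_dead_key('\'', 'u')` gives U+00FA (ú), and `compose_dead_key('\'', ' ')` gives the plain apostrophe.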
Using non-US layouts
As mentioned above, the US-International layout is preferable in order to use the usual KC_-prefixed keycodes. But perhaps you need other symbols, like Cyrillic or Greek letters, or for other reasons must set the computer to another keyboard layout. If you do this, beware that correspondence between KC_-prefixed keycodes and the keys they actually type may be mixed up.
Instead of the KC_ keycodes, you should use keycodes corresponding to your layout and (if you use SEND_STRING) the corresponding Sendstring implementation. For example, if the computer is set to the German keyboard layout, then at the top of keymap.c add
#include "keymap_german.h"
#include "sendstring_german.h"
The first header defines alternative “DE_”-prefixed keycodes, including keycodes like DE_SS for ß to type non-English symbols. Use these DE_ keycodes in your keymap instead of the KC_ keycodes. The second header modifies SEND_STRING such that ASCII strings type correctly under the German layout. For further explanation, see Additional language support and Sendstring support.
See Language-specific Keycodes for the full list of such headers. See the headers themselves to find keycodes for the additional symbols a particular layout can type. For instance, keymap_ukrainian.h defines UA_SHCH to type Щ. Here is a sampling:
- keymap_czech.h
- keymap_french.h
- keymap_german.h
- keymap_greek.h*
- keymap_hebrew.h*
- keymap_italian.h
- keymap_russian.h*
- keymap_swedish.h
- keymap_ukrainian.h*
* SEND_STRING is limited to (a subset of) the ASCII character set. Layouts for non-Latin languages lack a corresponding “sendstring_” header because they have no keys to enter most ASCII characters.
Side note: Why do we have this complication of different keycodes for each language? When making keyboards with other layouts, it is standard practice to reuse US QWERTY keyboard firmware and simply print different keycap labels. This mismatch is resolved by configuring the host computer to map key codes to the intended layout. From the Universal Serial Bus HID Usage Tables, section 10:
Where this list is not specific for a key function in a language, the closest equivalent key position should be used, so that a keyboard may be modified for a different language by simply printing different keycaps. One example is the Y key on a North American keyboard. In Germany this is typically Z. Rather than changing the keyboard firmware to put the Z Usage into that place in the descriptor list, the vendor should use the Y Usage on both the North American and German keyboards. This continues to be the existing practice in the industry, in order to minimize the number of changes to the electronics to accommodate other languages.
If the computer is set to the German QWERTZ keyboard layout, then a QMK keyboard sending the KC_Y keycode will be interpreted by the computer as typing z. Indeed, the German keycode DE_Z is an alias of KC_Y.
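The following standalone C sketch models this host-side translation. The names `host_translate` and `host_layout_t` are hypothetical and exist only for illustration; the usage IDs for letters (0x04 for A through 0x1D for Z) come from the HID Usage Tables. The point is that the keyboard sends the same usage ID either way, and only the host's layout setting decides which character appears.

```c
#include <assert.h>
#include <stdint.h>

// HID keyboard usage IDs for the letters A..Z run from 0x04 to 0x1D.
enum { HID_A = 0x04, HID_Y = 0x1C, HID_Z = 0x1D };

typedef enum { HOST_LAYOUT_US, HOST_LAYOUT_GERMAN } host_layout_t;

// Models how the host turns a letter usage ID into a character.
char host_translate(uint8_t usage, host_layout_t layout) {
    assert(usage >= HID_A && usage <= HID_Z);  // Letters only in this sketch.
    if (layout == HOST_LAYOUT_GERMAN) {
        // QWERTZ swaps the characters on the Y and Z key positions.
        if (usage == HID_Y) return 'z';
        if (usage == HID_Z) return 'y';
    }
    return (char)('a' + (usage - HID_A));
}
```

Here `host_translate(HID_Y, HOST_LAYOUT_US)` is 'y' while `host_translate(HID_Y, HOST_LAYOUT_GERMAN)` is 'z', which is why a German keymap header can define its Z keycode as an alias of KC_Y.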
Unicode input
An entirely different approach to typing non-English characters is through QMK’s Unicode input feature. This can type letters from any language, math symbols, arrows, emojis, and other Unicode symbols.
To give fair warning, the implementation is a hack. Each major OS has an input method where the user may type a Unicode symbol by manually entering its codepoint number. Example: on Linux, the symbol 好 (U+597D) can be typed as “Ctrl+Shift+U, 5, 9, 7, D, space.” QMK literally sends such a key sequence to type each Unicode character. Consequently:
- You need to configure which Unicode input mode QMK should use, since the exact key sequence needed depends on the OS. This is fussy if you regularly use your keyboard with more than one OS. Additionally on Windows, you need to either install WinCompose or make a registry edit to enable Unicode input.
- There is often a visible flicker of digits on the screen as a Unicode symbol is typed. This may be distracting.
Another limitation is that holding the Shift key during Unicode input does not automatically capitalize the symbol. To make a shift-able symbol key, you need to tell QMK explicitly which Unicode codepoint to send when shifted.
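To illustrate the key-sequence mechanism, here is a standalone C sketch that spells out the Linux sequence for a given codepoint. The function `describe_linux_unicode_input` is hypothetical and only formats a description as text; it is not how QMK is implemented internally, and the uppercase hex digits simply follow the example above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Writes a readable description of the Linux input sequence for
// `codepoint` into `out`. Returns the number of characters written
// (excluding the terminator), or -1 if the buffer is too small.
int describe_linux_unicode_input(uint32_t codepoint, char *out, size_t size) {
    assert(out != NULL && size > 0);
    char hex[9];  // Up to 8 hex digits plus the terminator.
    snprintf(hex, sizeof(hex), "%" PRIX32, codepoint);

    // Start the sequence with the Unicode entry shortcut.
    int n = snprintf(out, size, "Ctrl+Shift+U");
    if (n < 0 || (size_t)n >= size) return -1;
    size_t pos = (size_t)n;

    // Then one key press per hex digit of the codepoint.
    for (const char *d = hex; *d != '\0'; ++d) {
        n = snprintf(out + pos, size - pos, ", %c", *d);
        if (n < 0 || (size_t)n >= size - pos) return -1;
        pos += (size_t)n;
    }

    // Space confirms the entry.
    n = snprintf(out + pos, size - pos, ", space");
    if (n < 0 || (size_t)n >= size - pos) return -1;
    pos += (size_t)n;
    return (int)pos;
}
```

For U+597D this writes "Ctrl+Shift+U, 5, 9, 7, D, space", matching the sequence described above. Every Unicode character costs several key events this way, which is where the flicker of digits comes from.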
That said, it does work. Here is how to set it up. One further complication is that QMK has not one but three Unicode “input subsystems.” I’ll describe Unicode Map, since it has a convenient method for defining shift-able symbols.
Step 1. In rules.mk, add
UNICODEMAP_ENABLE = yes
Step 2. In config.h, define which input method QMK should use. For Mac, use UNICODE_MODE_MACOS and enable Unicode Hex Input under System Preferences → Keyboard → Input Sources. For Linux, use UNICODE_MODE_LINUX. For Windows, use UNICODE_MODE_WINCOMPOSE and install WinCompose. It is also possible to list multiple methods here, useful if you use your keyboard with more than one OS. See the Unicode input modes documentation for further details.
// Set Unicode input method for Linux.
#define UNICODE_SELECTED_MODES UNICODE_MODE_LINUX
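For a keyboard shared between several OSes, the multiple-methods variant might look like the fragment below. This is a sketch of a config.h line, assuming the modes are given as a comma-separated list in UNICODE_SELECTED_MODES; the particular order is just an example. To my understanding, QMK also provides a keycode (UC_NEXT) to cycle through the selected modes at runtime, but check the Unicode documentation for your QMK version before relying on that name.

```c
// config.h: list every Unicode input mode you want to switch between.
// Assumption: modes are comma-separated, and the first is the default.
#define UNICODE_SELECTED_MODES UNICODE_MODE_LINUX, UNICODE_MODE_MACOS, UNICODE_MODE_WINCOMPOSE
```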
Step 3. In keymap.c, define the unicode_map array, which lists the Unicode codepoint for each symbol you want to use. For shift-able symbols, define both the lowercase and uppercase versions of the symbols. As an example, I’ll implement keys for ß, ñ, ç, and ¿. Codepoints for a few symbols of interest are listed below in the appendix. Codepoints are conventionally denoted like “U+00E7” with the number in hexadecimal, so I have written them here with a 0x prefix to make C hex constants.
// Indices into unicode_map, one per symbol.
enum unicode_names {
    U_SS_LOWER,
    U_SS_UPPER,
    U_NTIL_LOWER,
    U_NTIL_UPPER,
    U_CCED_LOWER,
    U_CCED_UPPER,
    U_IQUE_SYM,
};

// Codepoint for each symbol, indexed by the enum above.
const uint32_t unicode_map[] PROGMEM = {
    [U_SS_LOWER]   = 0x00df,  // ß
    [U_SS_UPPER]   = 0x1e9e,  // ẞ
    [U_NTIL_LOWER] = 0x00f1,  // ñ
    [U_NTIL_UPPER] = 0x00d1,  // Ñ
    [U_CCED_LOWER] = 0x00e7,  // ç
    [U_CCED_UPPER] = 0x00c7,  // Ç
    [U_IQUE_SYM]   = 0x00bf,  // ¿
};
Step 4. Define keycodes for the symbols. For ß, ñ, ç, use the UP(i, j) macro to represent the pair of unshifted and shifted symbols associated with the key. For ¿, let’s simply ignore shifting and use the UM(i) macro:
// ß and ẞ keycode.
#define U_SS UP(U_SS_LOWER, U_SS_UPPER)
// ñ and Ñ keycode.
#define U_NTIL UP(U_NTIL_LOWER, U_NTIL_UPPER)
// ç and Ç keycode.
#define U_CCED UP(U_CCED_LOWER, U_CCED_UPPER)
// ¿ keycode.
#define U_IQUE UM(U_IQUE_SYM)
Step 5. Finally, use the keycodes U_SS, U_NTIL, U_CCED, U_IQUE in your keymap.
A few remarks:
- It is of course a laborious project looking up codepoints and making all these definitions, even for just a handful of symbols. Check whether the US-International layout has the symbols you need, since this is an easier solution.
- Some Unicode symbols consist of a sequence of multiple codepoints rather than a single codepoint. Use send_unicode_string() to type multi-codepoint symbols. See this emoji macro for an example.
- The UP() macro is limited to the first 128 table entries of unicode_map. This is enough to define a fair number of symbols, but keep this in mind if you’re planning something elaborate.
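One way to see where a limit of 128 can come from: a keycode is 16 bits, and a pair keycode has to carry two table indices, which leaves about 7 bits (values 0 to 127) for each. The standalone C sketch below models that idea. All names here are hypothetical, and the marker value and bit layout are assumptions for illustration, not copied from QMK’s source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical marker bits for "pair" keycodes (an assumption).
#define SKETCH_PAIR_BASE 0xC000u
// 7 bits per index, so valid indices are 0..127.
#define SKETCH_INDEX_MASK 0x7Fu

// Packs two unicode_map indices into one 16-bit keycode.
uint16_t sketch_pack_pair(uint8_t lower, uint8_t upper) {
    assert(lower <= SKETCH_INDEX_MASK && upper <= SKETCH_INDEX_MASK);
    return (uint16_t)(SKETCH_PAIR_BASE | lower | ((uint16_t)upper << 7));
}

// Recovers the index to send, choosing the second entry when shifted.
uint8_t sketch_unpack_index(uint16_t keycode, bool shifted) {
    uint16_t index = shifted ? (uint16_t)(keycode >> 7) : keycode;
    return (uint8_t)(index & SKETCH_INDEX_MASK);
}
```

In this model, packing indices 5 and 6 and then unpacking gives 5 when unshifted and 6 when shifted. An index of 128 or more would not fit in its 7-bit field, which is the kind of constraint behind the limit noted above.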
Appendix: Unicode codepoints
This section lists Unicode codepoints for a few symbols of interest. Codepoints are represented in hexadecimal. To find the codepoint numbers for other symbols, there are many online references and tools that can help, like Xah Lee - Unicode Search and unicodelookup.com.
Punctuation
Symbol | Codepoint |
---|---|
¿ | U+00BF |
¡ | U+00A1 |
« | U+00AB |
» | U+00BB |
– (en dash) | U+2013 |
— (em dash) | U+2014 |
Accented vowels
Symbol | Codepoint | Symbol | Codepoint |
---|---|---|---|
Á | U+00C1 | á | U+00E1 |
É | U+00C9 | é | U+00E9 |
Í | U+00CD | í | U+00ED |
Ó | U+00D3 | ó | U+00F3 |
Ú | U+00DA | ú | U+00FA |
Ä | U+00C4 | ä | U+00E4 |
Ë | U+00CB | ë | U+00EB |
Ï | U+00CF | ï | U+00EF |
Ö | U+00D6 | ö | U+00F6 |
Ü | U+00DC | ü | U+00FC |
Misc Western European
Symbol | Codepoint | Symbol | Codepoint |
---|---|---|---|
Ç | U+00C7 | ç | U+00E7 |
Ñ | U+00D1 | ñ | U+00F1 |
ẞ | U+1E9E | ß | U+00DF |
Greek
Symbol | Codepoint | Symbol | Codepoint |
---|---|---|---|
Α | U+0391 | α | U+03B1 |
Β | U+0392 | β | U+03B2 |
Γ | U+0393 | γ | U+03B3 |
Δ | U+0394 | δ | U+03B4 |
Ε | U+0395 | ε | U+03B5 |
Ζ | U+0396 | ζ | U+03B6 |
Η | U+0397 | η | U+03B7 |
Θ | U+0398 | θ | U+03B8 |
Ι | U+0399 | ι | U+03B9 |
Κ | U+039A | κ | U+03BA |
Λ | U+039B | λ | U+03BB |
Μ | U+039C | μ | U+03BC |
Ν | U+039D | ν | U+03BD |
Ξ | U+039E | ξ | U+03BE |
Ο | U+039F | ο | U+03BF |
Π | U+03A0 | π | U+03C0 |
Ρ | U+03A1 | ρ | U+03C1 |
Σ | U+03A3 | σ | U+03C3 |
Τ | U+03A4 | τ | U+03C4 |
Υ | U+03A5 | υ | U+03C5 |
Φ | U+03A6 | φ | U+03C6 |
Χ | U+03A7 | χ | U+03C7 |
Ψ | U+03A8 | ψ | U+03C8 |
Ω | U+03A9 | ω | U+03C9 |