| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| UTF-16 - Glossary \| MDN | 1/3 | https://developer.mozilla.org/en-US/docs/Glossary/UTF-16 | reference | web, html, css, javascript, documentation | 2026-05-05T05:47:59.541050+00:00 | kb-cron |
UTF-16
UTF-16 is a character encoding standard for Unicode. It encodes each Unicode code point using either one or two code units. Each code unit is a 16-bit value.
Code points whose values are less than 2^16 (0x10000) are encoded as a single code unit that is numerically equal to the code point's value. These code points make up the Basic Multilingual Plane (BMP), which contains the most commonly used characters, such as Latin, Greek, Cyrillic, and many East Asian characters.
For example, the Latin character "A" is assigned the code point U+0041 in Unicode, and it is represented in UTF-16 as the single code unit 0x0041.
Code points whose values are 2^16 or greater (U+10000 through U+10FFFF) are encoded using a pair of code units, called a surrogate pair. The values used in surrogate pairs (0xD800 to 0xDFFF) are reserved and never assigned to characters as code points, so a surrogate cannot be mistaken for a BMP character.
For example, the emoji character "🦊" (Fox Face) is assigned the code point U+1F98A in Unicode, and it is represented in UTF-16 as the surrogate pair 0xD83E 0xDD8A.
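The encoding rule for BMP characters and surrogate pairs can be sketched in JavaScript. The helper name toUtf16CodeUnits is hypothetical and exists only for illustration; in practice String.fromCodePoint() performs this conversion. The sketch reproduces both examples: U+0041 becomes one code unit, and U+1F98A becomes the pair 0xD83E 0xDD8A.

```javascript
// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Illustration only; real code would use String.fromCodePoint().
function toUtf16CodeUnits(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError(`Not a valid code point: ${codePoint}`);
  }
  if (codePoint < 0x10000) {
    // BMP: a single code unit, numerically equal to the code point.
    return [codePoint];
  }
  // Outside the BMP: subtract 0x10000 to get a 20-bit value,
  // then split it into two 10-bit halves.
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10); // high (leading) surrogate
  const low = 0xdc00 + (offset & 0x3ff); // low (trailing) surrogate
  return [high, low];
}

// Format code units as 4-digit hex strings for display.
const hex = (units) => units.map((u) => u.toString(16).padStart(4, "0"));

console.log(hex(toUtf16CodeUnits(0x41))); // ["0041"]
console.log(hex(toUtf16CodeUnits(0x1f98a))); // ["d83e", "dd8a"]
```

The subtraction of 0x10000 is what lets a pair of 10-bit halves cover exactly the 2^20 code points above the BMP.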
UTF-16 in JavaScript
Strings in JavaScript are represented using UTF-16, and many String APIs operate on code units, not code points. For example, String.length returns 2 for a string containing a single Unicode character that is outside the BMP, because that character is stored as a surrogate pair.
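A minimal illustration, assuming any modern JavaScript runtime such as a browser console or Node.js. The Fox Face emoji from the example above has a length of 2, while spreading the string, which iterates by code point, yields a single element:

```javascript
const letter = "A"; // U+0041, inside the BMP
const fox = "🦊"; // U+1F98A, outside the BMP

console.log(letter.length); // 1
console.log(fox.length); // 2: one code point, but two UTF-16 code units

// The string iterator (used by spread and for...of) walks code points,
// so the surrogate pair is counted once.
console.log([...fox].length); // 1
```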
The String.charCodeAt() method returns the code unit at a given index, while the String.codePointAt() method returns the code point that starts at a given index.
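A short sketch of how the two methods differ on the same emoji. Note that codePointAt() at index 1 lands on the low surrogate, so it returns that code unit on its own rather than a complete code point:

```javascript
const fox = "🦊"; // U+1F98A, stored as the pair 0xD83E 0xDD8A

// charCodeAt() returns individual 16-bit code units.
console.log(fox.charCodeAt(0).toString(16)); // d83e (high surrogate)
console.log(fox.charCodeAt(1).toString(16)); // dd8a (low surrogate)

// codePointAt() combines a surrogate pair that starts at the given index.
console.log(fox.codePointAt(0).toString(16)); // 1f98a

// Index 1 holds only the low surrogate, so that code unit is returned alone.
console.log(fox.codePointAt(1).toString(16)); // dd8a
```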
See the "UTF-16 characters, Unicode code points, and grapheme clusters" section of the String reference to learn more about working with UTF-16 strings in JavaScript.
UTF-16 and UTF-8
UTF-8 is an alternative encoding for Unicode, which uses one to four bytes for each Unicode code point. UTF-8 is a much more common encoding for documents on the Web than UTF-16.
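The size difference between the two encodings can be checked with TextEncoder, which is available in browsers and Node.js and always produces UTF-8. In this sketch the UTF-16 size is computed as twice the string's length, since each UTF-16 code unit is 2 bytes. UTF-8 is more compact for ASCII text, but some BMP characters take more bytes in UTF-8 than in UTF-16:

```javascript
// TextEncoder always encodes strings to UTF-8 bytes.
const encoder = new TextEncoder();

// "A" (U+0041), "é" (U+00E9), "€" (U+20AC), "🦊" (U+1F98A)
const samples = ["A", "\u00e9", "\u20ac", "\u{1F98A}"];

for (const s of samples) {
  const utf8Bytes = encoder.encode(s).length;
  const utf16Bytes = s.length * 2; // each UTF-16 code unit is 2 bytes
  console.log(s, utf8Bytes, utf16Bytes);
}
// A 1 2
// é 2 2
// € 3 2
// 🦊 4 4
```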
UTF-16 and UCS-2
UCS-2 is an obsolete encoding for Unicode. It is the same as UTF-16, except that it does not support surrogate pairs, so it cannot encode code points outside the BMP.
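In JavaScript, the effect of ignoring surrogate pairs can be observed by comparing an API that treats a string purely as a sequence of 16-bit code units with one that understands surrogate pairs. This is an analogy, not a claim that JavaScript strings are UCS-2: split("") sees roughly what a UCS-2 decoder, which has no notion of surrogate pairs, would see.

```javascript
const fox = "\u{1F98A}"; // one code point outside the BMP

// Splitting on "" works code unit by code unit, so the surrogate pair
// is broken into two lone surrogates (a UCS-2-like view).
const units = fox.split("");
console.log(units.length); // 2

// Array.from() uses the string iterator, which keeps surrogate pairs together.
const codePoints = Array.from(fox);
console.log(codePoints.length); // 1
console.log(codePoints[0] === fox); // true
```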