kb/data/developer.mozilla.org/en-US/docs/Glossary/UTF-16-0.md

13 KiB

title chunk source category tags date_saved instance
UTF-16 - Glossary | MDN 1/3 https://developer.mozilla.org/en-US/docs/Glossary/UTF-16 reference web, html, css, javascript, documentation 2026-05-05T05:47:59.541050+00:00 kb-cron

UTF-16

UTF-16 is a character encoding standard for Unicode. It encodes each Unicode code point using either one or two code units, each of which is a 16-bit value.

Code points whose values are less than 2^16 are encoded as a single code unit that is numerically equal to the code point's value. These code points make up the Basic Multilingual Plane (BMP), which contains the most common characters, including Latin, Greek, Cyrillic, and many East Asian characters. For example, the Latin character "A" is assigned the code point U+0041 in Unicode, and this is represented in UTF-16 as the single code unit 0041.

Code points whose values are 2^16 or greater are encoded using a pair of code units, called a surrogate pair. The values used for surrogate pairs (D800 to DFFF) are not assigned to Unicode characters, which avoids ambiguity. For example, the emoji character "🦊" (Fox Face) is assigned the code point U+1F98A in Unicode, and this is represented in UTF-16 as the surrogate pair d83e dd8a.
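The encoding rule described above can be sketched in a few lines of JavaScript. This is only an illustration: the helper names toUtf16CodeUnits and hex are hypothetical and not part of any standard API, and the choice to reject surrogate values as input is an assumption of this sketch.

```javascript
// Hypothetical helper (not a built-in): encode one Unicode code point
// as an array of one or two UTF-16 code units.
function toUtf16CodeUnits(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError("Not a Unicode code point: " + codePoint);
  }
  // Assumption of this sketch: surrogate values are rejected as input,
  // because they are reserved for building surrogate pairs.
  if (codePoint >= 0xd800 && codePoint <= 0xdfff) {
    throw new RangeError("Surrogate values are reserved");
  }
  // BMP code points: a single code unit equal to the code point's value.
  if (codePoint < 0x10000) {
    return [codePoint];
  }
  // Other code points: subtract 0x10000, then split the remaining
  // 20 bits into a high (leading) and a low (trailing) surrogate.
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);
  const low = 0xdc00 + (offset & 0x3ff);
  return [high, low];
}

// Hypothetical formatting helper: show code units as 4-digit hex.
const hex = (units) =>
  units.map((u) => u.toString(16).padStart(4, "0")).join(" ");

console.log(hex(toUtf16CodeUnits(0x41)));    // "0041"
console.log(hex(toUtf16CodeUnits(0x1f98a))); // "d83e dd8a"
```

In practice, the built-in String.fromCodePoint() performs this conversion when it constructs a string, so a hand-written encoder like this is mainly useful for understanding the format.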

UTF-16 in JavaScript

Strings in JavaScript are represented using UTF-16, and many String APIs operate on code units, not code points. For example, String.length returns 2 for a string containing a single Unicode character that is not in the BMP. The String.charCodeAt() method returns the code unit at the given index, and the String.codePointAt() method returns the code point that starts at the given index.

See UTF-16 characters, Unicode code points, and grapheme clusters to learn more about working with UTF-16 strings in JavaScript.
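A short example of the String behavior described in this section, using the fox emoji from the introduction. The variable name fox is chosen only for illustration.

```javascript
const fox = "🦊"; // U+1F98A, a code point outside the BMP

// length counts UTF-16 code units, not characters.
console.log(fox.length); // 2

// charCodeAt() returns individual code units (the two surrogates).
console.log(fox.charCodeAt(0).toString(16)); // "d83e"
console.log(fox.charCodeAt(1).toString(16)); // "dd8a"

// codePointAt() combines a surrogate pair that starts at the index.
console.log(fox.codePointAt(0).toString(16)); // "1f98a"

// Index 1 falls in the middle of the pair, so only the
// trailing surrogate is returned.
console.log(fox.codePointAt(1).toString(16)); // "dd8a"

// String iteration works on code points, so spreading the
// string yields one element.
console.log([...fox].length); // 1
```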

UTF-16 and UTF-8

UTF-8 is an alternative encoding for Unicode, which uses one to four bytes for each Unicode code point. UTF-8 is a much more common encoding for documents on the Web than UTF-16.
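To make the size difference concrete, the following sketch compares how many bytes each encoding needs for a few characters. It assumes an environment that provides the standard TextEncoder interface (browsers and recent Node.js versions); the function name byteSizes is hypothetical.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: byte counts of the same text in both encodings.
function byteSizes(text) {
  return {
    utf8: encoder.encode(text).length,
    utf16: text.length * 2, // each UTF-16 code unit occupies 2 bytes
  };
}

console.log(byteSizes("A"));         // utf8: 1, utf16: 2
console.log(byteSizes("\u00e9"));    // "é": utf8: 2, utf16: 2
console.log(byteSizes("\u4e2d"));    // "中": utf8: 3, utf16: 2
console.log(byteSizes("\u{1F98A}")); // "🦊": utf8: 4, utf16: 4
```

For text that is mostly ASCII, such as HTML markup, UTF-8 needs about half as many bytes as UTF-16.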

UTF-16 and UCS-2

UCS-2 is an obsolete encoding for Unicode. It is the same as UTF-16, except it does not support surrogate pairs, so is not able to encode code points outside the BMP.
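As a rough illustration of this limitation, the sketch below checks whether a string contains only BMP code points, which is the condition for it to be representable in UCS-2. The function name fitsInUcs2 is hypothetical.

```javascript
// Hypothetical helper: returns true if every code point in the text
// lies in the BMP, so no surrogate pair would be needed.
function fitsInUcs2(text) {
  // for...of iterates a string by code point, not by code unit.
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xffff) {
      return false;
    }
  }
  return true;
}

console.log(fitsInUcs2("A"));         // true
console.log(fitsInUcs2("\u4e2d"));    // true ("中" is in the BMP)
console.log(fitsInUcs2("\u{1F98A}")); // false ("🦊" needs a surrogate pair)
```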

See also