UTF-8


UTF-8 (Unicode Transformation Format - 8-bit) is a variable-width character encoding used to represent every character in the Unicode character set. It is designed to be backward-compatible with ASCII and capable of encoding all possible characters, or code points, in Unicode.

UTF-8 encodes each Unicode character as one to four bytes. The first 128 code points of Unicode, which correspond to the ASCII character set, are encoded as single bytes, making any valid ASCII text also valid UTF-8. Characters beyond the ASCII set are encoded using two to four bytes, allowing UTF-8 to represent a vast range of characters from different languages and symbol sets.

Example: Encoding and decoding a string using UTF-8 in JavaScript:

// Example string
const text = "Hello, 世界";

// Encoding the string to UTF-8
const encoder = new TextEncoder();
const encodedText = encoder.encode(text);
console.log(encodedText); // Output: Uint8Array containing UTF-8 encoded bytes

// Decoding the UTF-8 bytes back to string
const decoder = new TextDecoder('utf-8');
const decodedText = decoder.decode(encodedText);
console.log(decodedText); // Output: "Hello, 世界"

Explanation of the Example:

  • TextEncoder: Encodes a string into a Uint8Array of UTF-8 bytes. Here, encoder.encode(text) converts the string text into its UTF-8 byte representation: 13 bytes in total, since "Hello, " takes 7 single-byte ASCII characters and 世 and 界 take 3 bytes each.
  • TextDecoder: Decodes a Uint8Array of UTF-8 bytes back into a string. Here, decoder.decode(encodedText) recovers the original string from those bytes.