Text Encoder Explorer — Unicode, UTF-8, UTF-16 Byte Inspector

Explore how text is encoded in UTF-8, UTF-16, and Unicode code points. View per-character byte breakdowns, hex dumps, and encoding statistics. Free online tool, runs entirely in your browser.

Characters: 12
UTF-8 Bytes: 19
UTF-16 Bytes: 26

Char     Code Point  UTF-8        UTF-16       Decimal  Name
H        U+0048      48           00 48        72       ASCII printable
e        U+0065      65           00 65        101      ASCII printable
l        U+006C      6C           00 6C        108      ASCII printable
l        U+006C      6C           00 6C        108      ASCII printable
o        U+006F      6F           00 6F        111      ASCII printable
,        U+002C      2C           00 2C        44       ASCII printable
(space)  U+0020      20           00 20        32       Space
世       U+4E16      E4 B8 96     4E 16        19990    U+4E16
界       U+754C      E7 95 8C     75 4C        30028    U+754C
!        U+0021      21           00 21        33       ASCII printable
(space)  U+0020      20           00 20        32       Space
🌍       U+1F30D     F0 9F 8C 8D  D8 3C DF 0D  127757   U+1F30D

What is Text Encoding?

Text encoding converts characters into byte sequences that computers can store and transmit. Unicode assigns each character a unique number called a code point, and aims to cover every writing system in use. UTF-8 and UTF-16 are two of the most widely used encoding forms: UTF-8 uses 1 to 4 bytes per code point and is the dominant encoding on the web, while UTF-16 uses 2 or 4 bytes per code point and is the internal string representation in JavaScript, Java, and Windows. This tool shows exactly how each character in your text is represented in both encodings, with code points, byte sequences, and a hex dump view.
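
The per-character breakdown can be reproduced in a few lines. The following is a minimal sketch in Python, not the tool's own browser code; the function name `describe` is hypothetical, and the sample string is the default text implied by the code points in the table. UTF-16 bytes are shown in big-endian order without a byte order mark, which matches the table's `00 48` layout.

```python
# Default sample assumed from the table: ASCII, two CJK characters, one emoji.
text = "Hello, 世界! 🌍"

def describe(ch):
    """Return (code point label, UTF-8 hex, UTF-16 big-endian hex, decimal)."""
    cp = ord(ch)  # Unicode code point as an integer
    utf8 = ch.encode("utf-8").hex(" ").upper()
    utf16 = ch.encode("utf-16-be").hex(" ").upper()  # no BOM is added
    return f"U+{cp:04X}", utf8, utf16, cp

# One row per code point, in the same column order as the table.
for ch in text:
    print(repr(ch), *describe(ch))

# Summary stats: Python's len() on a str counts code points.
char_count = len(text)
utf8_bytes = len(text.encode("utf-8"))
utf16_bytes = len(text.encode("utf-16-be"))
print(char_count, utf8_bytes, utf16_bytes)
```

Note that counting code points is only one definition of "character"; a language that counts UTF-16 code units, as JavaScript's `length` does, would report 13 for this sample because the emoji occupies two units.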

How to Use the Text Encoder Explorer

  1. Enter or paste text into the input field. A default sample with ASCII, CJK, and emoji characters is provided.
  2. View the character table to see each character's Unicode code point, UTF-8 bytes, UTF-16 bytes, and decimal value.
  3. Check the summary stats for total character count, UTF-8 byte count, and UTF-16 byte count.
  4. Switch to the Hex Dump tab to see the raw UTF-8 bytes in a traditional hexdump format similar to xxd.
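
For readers who want a similar view outside the browser, here is a rough Python sketch of a hex dump over the UTF-8 bytes. The `hexdump` helper and its column layout are assumptions for illustration; the layout is loosely modeled on xxd (offset, hex bytes, printable ASCII) and is not guaranteed to match the tool's Hex Dump tab or xxd's exact grouping.

```python
def hexdump(data, width=16):
    """Format bytes as 'offset: hex bytes  ascii' lines, `width` bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable and non-ASCII bytes are shown as '.'.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last line.
        lines.append(f"{offset:08x}: {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)

print(hexdump("Hello, 世界! 🌍".encode("utf-8")))
```

Each multi-byte character appears as a run of dots in the ASCII column, which makes it easy to spot where non-ASCII content sits in the byte stream.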

Common Use Cases

  • Debugging encoding issues — Inspect byte sequences to diagnose mojibake, incorrect character display, or encoding mismatches between systems.
  • Understanding multi-byte characters — See exactly how CJK characters, emoji, and accented letters are encoded in UTF-8 vs UTF-16.
  • Estimating storage and bandwidth — Compare UTF-8 and UTF-16 byte sizes to choose the most efficient encoding for your data.
  • Learning Unicode and encoding fundamentals — A hands-on reference for students and developers learning how text encoding works at the byte level.
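
As an illustration of the debugging use case, the sketch below reproduces one common kind of mojibake in Python: UTF-8 bytes decoded as Windows-1252. The choice of Windows-1252 as the wrong encoding is an assumption for the example; real mismatches may involve other legacy encodings, and the round-trip repair only works when no bytes were lost along the way.

```python
original = "世界"

# Correct bytes on the wire: E4 B8 96 E7 95 8C
utf8_bytes = original.encode("utf-8")

# A receiver that wrongly assumes Windows-1252 turns each byte into one character.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# If the garbled text is intact, reversing the wrong step recovers the original.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)
```

Comparing the byte sequence in the inspector with the bytes a system actually received is often enough to tell which side applied the wrong encoding.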

FAQ

What is the difference between UTF-8 and UTF-16?
UTF-8 encodes each code point in 1 to 4 bytes, with ASCII characters taking just 1 byte. UTF-16 uses 2 bytes for characters in the Basic Multilingual Plane (U+0000 to U+FFFF) and 4 bytes, as a surrogate pair, for supplementary characters such as most emoji. UTF-8 is more compact for Latin-heavy text, while UTF-16 is more compact for CJK-heavy text, where most characters take 3 bytes in UTF-8 but only 2 in UTF-16.
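
The size trade-off is easy to check directly. This is a small Python sketch with made-up sample strings; the `sizes` helper is hypothetical. It uses the `utf-16-le` codec so that no 2-byte byte order mark is counted.

```python
def sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count without BOM)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

latin = "encoding"  # 8 ASCII letters
cjk = "文字编码"     # 4 CJK ideographs, all in the Basic Multilingual Plane
emoji = "🌍"         # 1 supplementary-plane code point

print(sizes(latin))  # UTF-8 is half the size of UTF-16 here
print(sizes(cjk))    # UTF-16 is smaller here
print(sizes(emoji))  # same size in both encodings
```
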
Why do some characters show multiple bytes?
Characters outside the ASCII range (code points above U+007F) require 2 to 4 bytes in UTF-8. For example, accented Latin letters such as é use 2 bytes, most Chinese characters use 3 bytes, and emoji with code points above U+FFFF use 4 bytes. Those emoji also take 4 bytes in UTF-16, where they are stored as a surrogate pair of two 2-byte code units.
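
The byte counts follow from fixed code point ranges, and the surrogate pair follows from simple bit arithmetic. The Python sketch below spells both out; the function names `utf8_length` and `surrogate_pair` are hypothetical, and the code is a teaching aid rather than a replacement for a real encoder (it does not, for instance, reject lone surrogate code points).

```python
def utf8_length(cp):
    """Number of UTF-8 bytes needed for a code point."""
    if cp <= 0x7F:
        return 1      # ASCII
    if cp <= 0x7FF:
        return 2      # e.g. accented Latin, Greek, Cyrillic
    if cp <= 0xFFFF:
        return 3      # rest of the Basic Multilingual Plane, incl. most CJK
    return 4          # supplementary planes, incl. most emoji

def surrogate_pair(cp):
    """Split a supplementary code point into UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

high, low = surrogate_pair(0x1F30D)    # the 🌍 emoji from the sample
print(f"{high:04X} {low:04X}")
```

For U+1F30D this yields the same two code units, D83C and DF0D, that appear in the UTF-16 column of the character table.
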
Is my text sent to a server?
No. All encoding analysis happens entirely in your browser using the JavaScript TextEncoder API. No data is transmitted to any server.
