Unicode Inspector

Inspect any string at the Unicode codepoint level: names, General_Category, Script, Block, bidi class, and all four normalization forms (NFC/NFD/NFKC/NFKD). Grapheme cluster breakdown via Intl.Segmenter for emoji-accurate analysis.

How to Use Unicode Inspector in 3 Steps

Configure. Paste any text — emoji, mixed scripts, invisible characters. The tool breaks it into codepoints with U+XXXX notation, official name, and category.
Process. Compare the 4 normalization forms side-by-side. NFC (canonical composition) is the form the W3C recommends for web content; NFKD decomposes compatibility forms like full-width characters and superscripts. If the forms differ from the input or from each other, the string can compare unequal to text that looks identical.
Export. View the Intl.Segmenter grapheme breakdown: the family emoji 👨‍👩‍👧 shows as 1 grapheme but 5 codepoints (8 UTF-16 code units), which is why JavaScript's .length reports 8 instead of 1.
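
The codepoint breakdown in the steps above can be approximated in plain JavaScript. This is a minimal sketch, not Pixlane's implementation: `toCodepoints` is a hypothetical helper name, and it only produces the U+XXXX notation. Official names and categories require the Unicode Character Database, which JavaScript does not ship.

```javascript
// Hypothetical helper: list each codepoint of a string in U+XXXX notation.
function toCodepoints(text) {
  // Array.from iterates a string by codepoint, not by UTF-16 code unit,
  // so surrogate pairs (e.g. emoji) stay together.
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
  });
}

// Family emoji: man, ZWJ, woman, ZWJ, girl
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
console.log(toCodepoints(family));
// [ 'U+1F468', 'U+200D', 'U+1F469', 'U+200D', 'U+1F467' ]
```

The two U+200D entries are the zero-width joiners that glue the three emoji into one visible character.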

Why Unicode Inspector on Pixlane

Unicode is full of invisible footguns: combining marks, zero-width joiners, homoglyphs, and normalization mismatches that make equal-looking strings compare unequal. Pixlane's Unicode Inspector shows every codepoint in a string with its official name (Unicode 16 data), General_Category, Script, and Block. The side-by-side NFC/NFD/NFKC/NFKD comparison surfaces normalization issues (a common source of equality bugs), and Intl.Segmenter (part of ECMA-402, the ECMAScript Internationalization API) produces accurate grapheme cluster views so ZWJ sequences and variation selectors are grouped as users perceive them.

Unicode 16 Codepoint DB — Uses up-to-date codepoint names, categories, scripts, and blocks. Includes Unicode 16 additions (Sept 2024): Garay script, new CJK additions, symbols for legacy computing.
4 Normalization Forms — NFC / NFD / NFKC / NFKD side-by-side. Spot invisible differences: é (U+00E9) vs e + combining acute (U+0065 U+0301) look identical but compare unequal.
Grapheme Clusters — Intl.Segmenter (granularity: grapheme) groups codepoints into user-perceived characters. The gold standard for character counting, cursor movement, and truncation.
Homoglyph Detection — Flags visually identical characters from different scripts (Cyrillic а U+0430 vs Latin a U+0061). Critical for security-sensitive contexts like domain names and usernames.
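
As a rough illustration of the homoglyph idea, the sketch below flags strings that mix scripts, using Unicode property escapes in regular expressions. It is an assumption-laden heuristic, not the tool's actual method: `scriptsUsed` and `looksMixedScript` are hypothetical names, only three scripts are checked, and production-grade confusable detection would typically rely on the confusables data from Unicode Technical Standard #39.

```javascript
// Scripts to look for; a real checker would cover many more.
const SCRIPTS = {
  Latin: /\p{Script=Latin}/u,
  Cyrillic: /\p{Script=Cyrillic}/u,
  Greek: /\p{Script=Greek}/u,
};

// Hypothetical helper: return the names of the listed scripts found in text,
// in order of first appearance.
function scriptsUsed(text) {
  const found = new Set();
  for (const ch of text) {
    for (const [name, pattern] of Object.entries(SCRIPTS)) {
      if (pattern.test(ch)) found.add(name);
    }
  }
  return [...found];
}

// Hypothetical helper: a string using more than one script is suspicious.
function looksMixedScript(text) {
  return scriptsUsed(text).length > 1;
}

// "paypal" with the first "a" replaced by Cyrillic а (U+0430)
console.log(scriptsUsed("p\u0430ypal"));      // [ 'Latin', 'Cyrillic' ]
console.log(looksMixedScript("p\u0430ypal")); // true
console.log(looksMixedScript("paypal"));      // false
```

Mixed-script checks produce false positives for legitimately multilingual text, so they work best as a warning rather than a hard rejection.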

Frequently Asked Questions

What is a codepoint?

A number in the range 0 to 0x10FFFF that Unicode assigns to an abstract character or symbol (not every number in the range is assigned). Written as U+XXXX in hex. A single user-perceived character (grapheme) may be made of multiple codepoints, especially for emoji, accented letters, and complex scripts.

What's the difference between NFC and NFKD?

NFC is canonical composition: the composed form recommended for web content, and usually the shortest equivalent. NFKD is compatibility decomposition: it also breaks apart compatibility variants like ⅓ (U+2153) → 1⁄3 (U+0031 U+2044 U+0033) and full-width Ａ (U+FF21) → A (U+0041). Use NFC for equality, NFKD for matching/searching.
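
The difference is easy to see with the built-in String.prototype.normalize. The sketch below is illustrative only; `hex` and `normalizationTable` are hypothetical helper names, not part of any Pixlane API.

```javascript
const FORMS = ["NFC", "NFD", "NFKC", "NFKD"];

// Hypothetical helper: render a string as space-separated U+XXXX codepoints.
function hex(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

// Hypothetical helper: the four normalization forms of a string, side by side.
function normalizationTable(text) {
  return Object.fromEntries(
    FORMS.map((form) => [form, hex(text.normalize(form))])
  );
}

const composed = "\u00E9";    // é as a single codepoint
const decomposed = "e\u0301"; // e followed by a combining acute accent

console.log(composed === decomposed); // false
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true

console.log(normalizationTable("\u2153")); // ⅓
// { NFC: 'U+2153', NFD: 'U+2153', NFKC: 'U+0031 U+2044 U+0033', NFKD: 'U+0031 U+2044 U+0033' }

console.log(normalizationTable("\uFF21")); // full-width Ａ
// { NFC: 'U+FF21', NFD: 'U+FF21', NFKC: 'U+0041', NFKD: 'U+0041' }
```

The canonical forms (NFC/NFD) leave ⅓ and Ａ untouched; only the compatibility forms (NFKC/NFKD) rewrite them.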

Why does my string length look wrong?

JavaScript's .length counts UTF-16 code units (not codepoints, not graphemes). An emoji like 😀 is 2 code units. A family emoji like 👨‍👩‍👧 is 8 code units but 1 grapheme. Use Intl.Segmenter for correct counting — Pixlane's Text Counter does this.
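
The three ways of counting can be compared directly. This is a minimal sketch assuming a runtime with Intl.Segmenter (current browsers and recent Node.js); `measure` is a hypothetical helper name, and the code does not claim to mirror how Pixlane's Text Counter is built.

```javascript
// Hypothetical helper: count a string three different ways.
function measure(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    codeUnits: text.length,                                 // UTF-16 code units
    codepoints: Array.from(text).length,                    // Unicode codepoints
    graphemes: Array.from(segmenter.segment(text)).length,  // user-perceived characters
  };
}

console.log(measure("\u{1F600}")); // 😀
// { codeUnits: 2, codepoints: 1, graphemes: 1 }

console.log(measure("\u{1F468}\u200D\u{1F469}\u200D\u{1F467}")); // 👨‍👩‍👧
// { codeUnits: 8, codepoints: 5, graphemes: 1 }
```

For limits that users see, such as a character counter, the grapheme count is usually the one that matches expectations; storage limits are more often expressed in code units or bytes.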

Is this tool free?

Yes. Unicode Inspector on Pixlane is completely free with no signup required.