Encoding Text in UTF-8 – How Unicode Works (Part 2)
In Part 1 of this article I covered the idea of creating character sets and different strategies for encoding them, including the UTF-32 and UTF-16 encodings and the benefits and drawbacks of each. For most documents, however, UTF-8 is by far the most popular encoding, even though it is more complicated to implement.
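To make the trade-off between the three encodings concrete, here is a minimal Python sketch that counts how many bytes the same text needs in each one. The helper name `encoded_sizes` and the sample strings are my own choices for illustration; only `str.encode` and the codec names come from Python itself.

```python
# Compare how many bytes the same text needs in UTF-8, UTF-16 and UTF-32.
# `encoded_sizes` and the sample strings are hypothetical, for illustration only.

def encoded_sizes(text):
    # The "-le" codec variants do not emit a byte order mark,
    # so only the bytes of the text itself are counted.
    codecs = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(text.encode(name)) for name in codecs}

samples = ["hello", "h\u00e9llo", "\u65e5\u672c\u8a9e", "\U0001F600"]

for text in samples:
    # ascii() keeps the output printable on consoles that cannot show every character.
    print(ascii(text), encoded_sizes(text))
```

For plain ASCII text such as "hello", UTF-8 needs one byte per character, while UTF-16 needs two and UTF-32 needs four. The gap narrows or even reverses for other scripts, which is part of why the choice of encoding depends on the kind of text being stored.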
For a quick recap, a code point is the basic unit of meaning in Unicode. A code point can represent a single…