Datasets - How Neural Machine Translation Works

I’m currently working on training client-side translation models for Firefox. These are small distilled models that our users can reasonably download and run directly on their machines using the CPU. In this article I’m going to explain from a high level how machine translation works. The data: In order for a translation model to begin learning, it first needs data, and a lot of it. This data comes in the form of “sentence pairs”. A “sentence”…

Encoding Text in UTF-8 – How Unicode Works (Part 2)

In part 1 of this article I covered the idea of creating character sets, and different strategies for encoding them. The article covered the UTF-32 and UTF-16 encodings with the benefits and drawbacks of each. However, UTF-8 is by far the most popular encoding for most documents, though it is more complicated in its implementation. For a quick recap, a code point is a base unit of meaning in Unicode. A code point can represent a single…

Diacritical Marks in Unicode

I won’t bury the lede: by the end of this article you should be able to write your name in crazy diacritics like this: Ḡ͓̟̟r̬e̱̬͔͑g̰ͮ̃͛ ̇̅T̆a̐̑͢ṫ̀ǔ̓͟m̮̩̠̟. This article is part of the Unicode and i18n series motivated by my work with internationalization in Firefox and the Unicode ICU4X sub-committee. Unicode is made up of a variety of code points that can represent many things beyond just a simple letter. The code point itself is a numeric…

Encoding Text, UTF-32 and UTF-16 – How Unicode Works (Part 1)

For many years, the standard for representing human writing was ASCII, the American Standard Code for Information Interchange. This representation reserved 7 bits for encoding a character. This served early computing well, but did not scale as computers were used in more and more languages and cultures across the world. This article explains how this simple encoding grew into a standard that aims to represent the writing systems of every culture on Earth.

Better Code Reviews with Mercurial History Rewriting

This is a companion piece to Better Code Reviews with git History Rewriting, but this time with Mercurial. The intro from that post works for this one as well with a few minor changes: The following post is a walkthrough on how I take a larger [changeset] and break it down into well-ordered commits. At Mozilla, there is a pretty strong code review culture. Each line of code added to a project gets reviewed by…

Better Code Reviews with git History Rewriting

The following post is a walkthrough on how I take a larger pull request, and break it down into well-ordered commits. At Mozilla, there is a pretty strong code review culture. Each line of code added to a project gets reviewed by a peer before it is merged in. There can be several cycles of code review where changes are requested. This happens asynchronously, as it’s a globally distributed organization. It can be tough coming…

Drawing ASCII Art to Test a Physics System

Generating my own ASCII art in programming projects is a great way to solve certain hard problems. Diagrams made from PNGs with some kind of rendered documentation are great, but they have a high barrier to entry. Plus, this kind of documentation does not live with the code, which makes it easy to miss and forget about. I have started a series of posts detailing various strategies on how I draw with ASCII. Testing a Physics…

Documenting Regex with ASCII Art

Generating my own ASCII art in programming projects is a great way to solve certain hard problems. Diagrams made from PNGs with some kind of rendered documentation are great, but they have a high barrier to entry. Plus, this kind of documentation does not live with the code, which makes it easy to miss and forget about. I am going to start a series of posts detailing various strategies on how I draw with ASCII. Regular…

WebGL Model View Projection

As part of a fellowship with MDN in 2016, I wrote content around the use of the model, view, and projection matrices for WebGL code. This article explores how to take data within a WebGL project and project it into the proper spaces to display it on the screen. It assumes a knowledge of basic matrix math using translation, scale, and rotation matrices. It explains the three core matrices that are typically used when composing a…

Matrix Math for the Web

As part of a fellowship with MDN in 2016, I wrote content around the use of matrix math, but with a web content spin. Matrices can be used to represent transformations of objects in space, and are used for performing many key types of computation when constructing images and visualizing data on the Web. This article explores how to create matrices and how to use them with CSS transforms and the matrix3d transform type. While this article…

How to Draw Beautiful Things in the Browser

In 2014 I wrote a pretty detailed post on getting into creative coding. It featured live code examples, and I spoke briefly about some of my philosophy on approaching creativity with programming. Beautiful code is a joy to write, but it is difficult to share that joy with other programmers, not to mention with non-programmers. In my free time between my day job and family time I’ve been playing around with the idea of a programming poem…