Tuesday, November 29, 2011

Base64 for Unicode UTF16

Recently I needed to store a JPEG file in a Unicode UTF16 string. In UTF8 or ASCII, this is a trivial problem: use base64 encoding. But base64 carries only 6 bits of data per character. That makes it 75% efficient in a single-byte or multibyte character set, where each base64 character takes one byte, but only 37.5% efficient in UTF16, where each character takes two bytes. The storage required for my JPEG data would almost triple. Since I potentially had to hold thousands of images in memory simultaneously, this was a problem.

I ran across a great blog entry by Markus Scherer about a technique called Base16k. This seemed to be exactly what I needed, so I translated the JavaScript sample code into C++.
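To give a feel for how this kind of encoding works, here is a minimal sketch of the core idea: pack the input bits into 14-bit groups and store each group in one UTF16 code unit. This is not the code from the download, and it is not guaranteed to be compatible with the format in Markus Scherer's article. The names `encode14` and `decode14` are made up for illustration, the offset U+5000 (giving the range U+5000..U+8FFF, which avoids surrogates) is an assumption, and the byte count is passed to the decoder separately as a simplification.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: each 14-bit value is mapped onto one of the 16384 code
// points U+5000..U+8FFF, none of which are surrogates.
constexpr char16_t kFirst = 0x5000;
constexpr char16_t kLast = 0x8FFF;
constexpr unsigned kBitsPerUnit = 14;
constexpr std::uint32_t kMask = (1u << kBitsPerUnit) - 1;

// Hypothetical helper: packs bytes into 14-bit groups, one per code unit.
std::u16string encode14(const std::vector<std::uint8_t>& data) {
    std::u16string out;
    out.reserve((data.size() * 8 + kBitsPerUnit - 1) / kBitsPerUnit);
    std::uint32_t acc = 0;  // bit accumulator; only the low `bits` bits are pending
    unsigned bits = 0;      // number of pending bits in acc
    for (std::uint8_t byte : data) {
        acc = (acc << 8) | byte;
        bits += 8;
        if (bits >= kBitsPerUnit) {
            bits -= kBitsPerUnit;
            out.push_back(static_cast<char16_t>(kFirst + ((acc >> bits) & kMask)));
        }
    }
    if (bits > 0) {
        // Pad the final partial group with zero bits on the right.
        out.push_back(static_cast<char16_t>(
            kFirst + ((acc << (kBitsPerUnit - bits)) & kMask)));
    }
    return out;
}

// Hypothetical helper: reverses encode14. The caller supplies the original
// byte count so that the padding bits in the last code unit are ignored.
std::vector<std::uint8_t> decode14(const std::u16string& text,
                                   std::size_t byteCount) {
    std::vector<std::uint8_t> out;
    out.reserve(byteCount);
    std::uint32_t acc = 0;
    unsigned bits = 0;
    for (char16_t unit : text) {
        if (out.size() == byteCount) break;  // everything recovered already
        if (unit < kFirst || unit > kLast) {
            throw std::invalid_argument("code unit outside the assumed range");
        }
        acc = (acc << kBitsPerUnit) | static_cast<std::uint32_t>(unit - kFirst);
        bits += kBitsPerUnit;
        while (bits >= 8 && out.size() < byteCount) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((acc >> bits) & 0xFF));
        }
    }
    if (out.size() != byteCount) {
        throw std::invalid_argument("encoded text is shorter than byteCount");
    }
    return out;
}
```

With this sketch, 7 input bytes (56 bits) fit exactly into 4 code units, so `encode14` on a 7-byte buffer returns a string of length 4, and `decode14(encoded, 7)` gives the original bytes back.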

Note that this technique is designed specifically for UTF16. The efficiency gain disappears if you write the string to a file as UTF8. If your data ends up in UTF8, don't use Base16k. Use base64 instead.
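The arithmetic behind that advice can be sketched as follows. It assumes Base16k stores 14 bits of data per character and that its characters sit at U+0800 or above in the Basic Multilingual Plane, so each one takes two bytes in UTF16 and three bytes in UTF8. The function and constant names are made up for illustration.

```cpp
// Data bits carried per bit of storage, as a percentage.
constexpr double efficiency(double dataBits, double storageBits) {
    return 100.0 * dataBits / storageBits;
}

// base64 carries 6 data bits per character.
constexpr double kBase64InUtf8 = efficiency(6, 8);    // 1 byte per character
constexpr double kBase64InUtf16 = efficiency(6, 16);  // 2 bytes per character

// Assumption: Base16k carries 14 data bits per character.
constexpr double kBase16kInUtf16 = efficiency(14, 16);  // 2 bytes per character
constexpr double kBase16kInUtf8 = efficiency(14, 24);   // 3 bytes per character

static_assert(kBase64InUtf8 == 75.0, "base64 in UTF8");
static_assert(kBase64InUtf16 == 37.5, "base64 in UTF16");
static_assert(kBase16kInUtf16 == 87.5, "Base16k in UTF16");
static_assert(kBase16kInUtf8 < kBase64InUtf8, "base64 wins in UTF8");
```

Under those assumptions, Base16k in UTF8 works out to roughly 58%, which is worse than the 75% that base64 already gives you there.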

You can download the source code and sample project from: