Converting between strings and ArrayBuffers
Is there a commonly accepted technique for efficiently converting JavaScript strings to ArrayBuffers and vice-versa? Specifically, I'd like to be able to write the contents of an ArrayBuffer to localStorage
and to read it back.
Solution 1:
Update 2016 - five years on there are now new methods in the specs (see support below) to convert between strings and typed arrays using proper encoding.
TextEncoder
The TextEncoder
represents:
The
TextEncoder
interface represents an encoder for a specific method, that is a specific character encoding, likeutf-8
,An encoder takes a stream of code points as input and emits a stream of bytes.iso-8859-2
,koi8
,cp1261
,gbk
, ...
Change note since the above was written: (ibid.)
Note: Firefox, Chrome and Opera used to have support for encoding types other than utf-8 (such as utf-16, iso-8859-2, koi8, cp1261, and gbk). As of Firefox 48 [...], Chrome 54 [...] and Opera 41, no other encoding types are available other than utf-8, in order to match the spec.*
*) Updated specs (W3) and here (whatwg).
After creating an instance of the TextEncoder
it will take a string and encode it using a given encoding parameter:
if (!("TextEncoder" in window))
alert("Sorry, this browser does not support TextEncoder...");
var enc = new TextEncoder(); // always utf-8
console.log(enc.encode("This is a string converted to a Uint8Array"));
You then of course use the .buffer
parameter on the resulting Uint8Array
to convert the underlaying ArrayBuffer
to a different view if needed.
Just make sure that the characters in the string adhere to the encoding schema, for example, if you use characters outside the UTF-8 range in the example they will be encoded to two bytes instead of one.
For general use you would use UTF-16 encoding for things like localStorage
.
TextDecoder
Likewise, the opposite process uses the TextDecoder
:
The
TextDecoder
interface represents a decoder for a specific method, that is a specific character encoding, likeutf-8
,iso-8859-2
,koi8
,cp1261
,gbk
, ... A decoder takes a stream of bytes as input and emits a stream of code points.
All available decoding types can be found here.
if (!("TextDecoder" in window))
alert("Sorry, this browser does not support TextDecoder...");
var enc = new TextDecoder("utf-8");
var arr = new Uint8Array([84,104,105,115,32,105,115,32,97,32,85,105,110,116,
56,65,114,114,97,121,32,99,111,110,118,101,114,116,
101,100,32,116,111,32,97,32,115,116,114,105,110,103]);
console.log(enc.decode(arr));
The MDN StringView library
An alternative to these is to use the StringView
library (licensed as lgpl-3.0) which goal is:
- to create a C-like interface for strings (i.e., an array of character codes — an ArrayBufferView in JavaScript) based upon the JavaScript ArrayBuffer interface
- to create a highly extensible library that anyone can extend by adding methods to the object StringView.prototype
- to create a collection of methods for such string-like objects (since now: stringViews) which work strictly on arrays of numbers rather than on creating new immutable JavaScript strings
- to work with Unicode encodings other than JavaScript's default UTF-16 DOMStrings
giving much more flexibility. However, it would require us to link to or embed this library while TextEncoder
/TextDecoder
is being built-in in modern browsers.
Support
As of July/2018:
TextEncoder
(Experimental, On Standard Track)
Chrome | Edge | Firefox | IE | Opera | Safari
----------|-----------|-----------|-----------|-----------|-----------
38 | ? | 19° | - | 25 | -
Chrome/A | Edge/mob | Firefox/A | Opera/A |Safari/iOS | Webview/A
----------|-----------|-----------|-----------|-----------|-----------
38 | ? | 19° | ? | - | 38
°) 18: Firefox 18 implemented an earlier and slightly different version
of the specification.
WEB WORKER SUPPORT:
Experimental, On Standard Track
Chrome | Edge | Firefox | IE | Opera | Safari
----------|-----------|-----------|-----------|-----------|-----------
38 | ? | 20 | - | 25 | -
Chrome/A | Edge/mob | Firefox/A | Opera/A |Safari/iOS | Webview/A
----------|-----------|-----------|-----------|-----------|-----------
38 | ? | 20 | ? | - | 38
Data from MDN - `npm i -g mdncomp` by epistemex
Solution 2:
Although Dennis and gengkev solutions of using Blob/FileReader work, I wouldn't suggest taking that approach. It is an async approach to a simple problem, and it is much slower than a direct solution. I've made a post in html5rocks with a simpler and (much faster) solution: http://updates.html5rocks.com/2012/06/How-to-convert-ArrayBuffer-to-and-from-String
And the solution is:
function ab2str(buf) {
return String.fromCharCode.apply(null, new Uint16Array(buf));
}
function str2ab(str) {
var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
var bufView = new Uint16Array(buf);
for (var i=0, strLen=str.length; i<strLen; i++) {
bufView[i] = str.charCodeAt(i);
}
return buf;
}
EDIT:
The Encoding API helps solving the string conversion problem. Check out the response from Jeff Posnik on Html5Rocks.com to the above original article.
Excerpt:
The Encoding API makes it simple to translate between raw bytes and native JavaScript strings, regardless of which of the many standard encodings you need to work with.
<pre id="results"></pre>
<script>
if ('TextDecoder' in window) {
// The local files to be fetched, mapped to the encoding that they're using.
var filesToEncoding = {
'utf8.bin': 'utf-8',
'utf16le.bin': 'utf-16le',
'macintosh.bin': 'macintosh'
};
Object.keys(filesToEncoding).forEach(function(file) {
fetchAndDecode(file, filesToEncoding[file]);
});
} else {
document.querySelector('#results').textContent = 'Your browser does not support the Encoding API.'
}
// Use XHR to fetch `file` and interpret its contents as being encoded with `encoding`.
function fetchAndDecode(file, encoding) {
var xhr = new XMLHttpRequest();
xhr.open('GET', file);
// Using 'arraybuffer' as the responseType ensures that the raw data is returned,
// rather than letting XMLHttpRequest decode the data first.
xhr.responseType = 'arraybuffer';
xhr.onload = function() {
if (this.status == 200) {
// The decode() method takes a DataView as a parameter, which is a wrapper on top of the ArrayBuffer.
var dataView = new DataView(this.response);
// The TextDecoder interface is documented at http://encoding.spec.whatwg.org/#interface-textdecoder
var decoder = new TextDecoder(encoding);
var decodedString = decoder.decode(dataView);
// Add the decoded file's text to the <pre> element on the page.
document.querySelector('#results').textContent += decodedString + '\n';
} else {
console.error('Error while requesting', file, this);
}
};
xhr.send();
}
</script>
Solution 3:
You can use TextEncoder
and TextDecoder
from the Encoding standard, which is polyfilled by the stringencoding library, to convert string to and from ArrayBuffers:
var uint8array = new TextEncoder().encode(string);
var string = new TextDecoder(encoding).decode(uint8array);
Solution 4:
Blob is much slower than String.fromCharCode(null,array);
but that fails if the array buffer gets too big. The best solution I have found is to use String.fromCharCode(null,array);
and split it up into operations that won't blow the stack, but are faster than a single char at a time.
The best solution for large array buffer is:
function arrayBufferToString(buffer){
var bufView = new Uint16Array(buffer);
var length = bufView.length;
var result = '';
var addition = Math.pow(2,16)-1;
for(var i = 0;i<length;i+=addition){
if(i + addition > length){
addition = length - i;
}
result += String.fromCharCode.apply(null, bufView.subarray(i,i+addition));
}
return result;
}
I found this to be about 20 times faster than using blob. It also works for large strings of over 100mb.