Binary/B
STATUS: PROPOSED, DISCUSSED, SOME IMPLEMENTATIONS
- Implementations
- [[Implementations/Flusspferd|]], [[Implementations/GPSEE|]], [[Implementations/NarwhalRhino|]] (0.2), [[Implementations/NarwhalJSC|]] (0.2), [[Implementations/NarwhalNode|]] (0.5), [[Implementations/RingoJS|]], [[Implementations/TeaJS|]], [[Implementations/common-node|]]
Draft 2.
All platforms must support two types for interacting with binary data: ByteArray and ByteString. The ByteArray type resembles the interface of Array in that it is mutable, extensible, and indexing will return number values for the byte in the given position, zero by default, or undefined if the index is out of bounds. The ByteString type resembles the interface of String in that it is immutable and indexing returns a ByteString of length 1. These types are exported by the 'binary' top-level module and both types are subtypes of Binary, which is not instantiable but exists only for the convenience of referring to both ByteArray and ByteString. (The idea of using these particular two types and their respective names originated with Jason Orendorff in the Binary API Brouhaha discussion.)
Philosophy
This proposal is not an object-oriented variation on pack and unpack with notions of inherent endianness, read/write head position, or intrinsic codec or charset information. The objects described in this proposal are merely for the storage and direct manipulation of strings and arrays of byte data. Some object-oriented conveniences are provided, but implementing pack, unpack, or an object-oriented analog thereof is left to a future proposal of a more abstract type or a 'struct' module (as mentioned by Ionut Gabriel Stan on the list). This goes against most mentioned prior art.
This proposal also does not provide named member functions for any particular subset of the possible charsets, codecs, compression algorithms, or consistent hash digests that might operate on a byte string or array. Instead, convenience member functions are provided for interfacing with any named charset, with the IANA charset name space, and with the possibility of eventually employing a system of modular extensions for other codecs or digests, requiring that the given module exports a specified interface. (As supported originally by Robert Schultz, Davey Waterson, Ross Boucher, and tacitly myself, Kris Kowal, on the First proposition thread on the mailing list). This proposal does not address the need for stream objects to support pipelined codecs and hash digests (mentioned by Tom Robinson and Robert Schultz in the same conversation).
This proposal also reflects both group sentiment and a pragmatic point about properties. This isn't a decree that properties like "length" should be consistently used throughout the CommonJS APIs. However, given that all platforms support properties at the native level (to host String and Array objects) and that byte strings and arrays will require support at the native level, pursuing client-side interoperability is beyond the scope of this proposal and therefore properties have been specified. (See comments by Kris Zyp about the implementability of properties in all platforms, comments by Davey Waterson from Aptana about the counter-productivity of attempting to support this API in browsers, and support for properties over accessor and mutator functions by Ionut Gabriel Stan and Cameron McCormack on the mailing list).
The byte types provide functions for encoding, decoding, and transcoding, but they are all shallow interfaces that defer to a charset manager module, and may in turn use a system level charset or use a pair of pure JavaScript modules to transcode through an array or stream of canonical Unicode code points. This behavior may be specified further in the future.
In accordance with Daniel Friesen's Binary/C, a high priority in this proposal was String/ByteString and Array/ByteArray interoperability. For a defined subset of the String and Array APIs, Strings are interoperable with ByteStrings and Arrays with ByteArrays. To this end, it is necessary to support valueAt as an alternative to charAt on Strings with which ByteStrings would interoperate. Supporting charAt on ByteString would be counter-intuitive. Similarly, codeAt must be implemented on both String and ByteString as an alternative to charCodeAt.
The following methods are interoperable between ByteString and String:
- codeAt returning Number (added to String)
- valueAt returning ByteString or String respectively (added to String)
- get returning ByteString or String respectively (added to String)
- indexOf
- lastIndexOf
- split
- slice
- substr
- substring
The following methods are interoperable between ByteArray and Array:
- get returning Number (added to Array)
- indexOf
- lastIndexOf
- concat
- pop
- push
- shift
- unshift
- reverse
- slice
- sort
- splice
- split
- forEach
- every
- some
- map
- reduce
- reduceRight
The following methods are interoperable between ByteString and ByteArray:
- codeAt returning Number
- valueAt returning ByteString
- get returning ByteString or Number respectively
- toByteString
- toByteArray
- toString
- toArray
- decodeToString
- indexOf
- lastIndexOf
- slice returning ByteString or ByteArray respectively
- split returning an Array of ByteStrings or ByteArrays respectively
- concat returning ByteString or ByteArray respectively
Specification
The "binary" top-level module must export "Binary", "ByteArray" and "ByteString".
(In multi-sandbox scenarios, the implementation should strive to consistently export the same "Binary", "ByteArray", and "ByteString".)
Binary
A Binary function must exist. The Binary function must throw a ValueError if it is ever called or constructed. The Binary type exists only to affirm that ByteString and ByteArray instances are instances of Binary.
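As a rough illustration of a base type that cannot be instantiated but still takes part in instanceof checks, consider the sketch below. ValueError is named by this proposal but not defined by it, so the class here is an assumption, and ByteStringSketch is a hypothetical stand-in rather than a real ByteString.

```javascript
// ValueError is not a built-in; this definition is an assumption.
class ValueError extends Error {
  constructor(message) {
    super(message);
    this.name = "ValueError";
  }
}

// Binary throws whether it is called as a function or used with new.
function Binary() {
  throw new ValueError("Binary is not instantiable");
}

// Hypothetical stand-in for ByteString. Its prototype chain passes through
// Binary.prototype, so instanceof works without ever invoking Binary.
function ByteStringSketch() {}
ByteStringSketch.prototype = Object.create(Binary.prototype, {
  constructor: { value: ByteStringSketch, writable: true, configurable: true }
});

const sample = new ByteStringSketch();
console.log(sample instanceof Binary); // true
```

Linking the prototypes with Object.create avoids calling Binary during setup, which would otherwise throw.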
ByteString
A ByteString is an immutable, fixed-width representation of a C unsigned char (byte) array that does not implicitly terminate at the first 0-valued byte. Indexing returns a byte substring of length 1. ByteString instances are instances of ByteString, Binary, and Object. The typeof a ByteString is "object". Object.prototype.toString.call(byteString) returns "[object ByteString]".
Constructor
- ByteString()
- Construct an empty byte string.
- ByteString(byteString)
- Copies byteString.
- ByteString(byteArray)
- Use the contents of byteArray.
- ByteString(arrayOfNumbers)
- Use the numbers in arrayOfNumbers as the bytes.
- If any element is outside the range 0...255, a RangeError is thrown.
- ByteString(string, charset)
- Convert a string. The ByteString will contain string encoded with charset.
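The range check on arrayOfNumbers might look roughly like the sketch below. checkBytes is a hypothetical helper, not part of this proposal, and rejecting non-integer values is an added assumption, since the text above only mentions the range 0...255.

```javascript
// Hypothetical helper: returns a copy of the input if every element is a
// valid byte value, otherwise throws a RangeError.
function checkBytes(arrayOfNumbers) {
  return arrayOfNumbers.map(function (value, index) {
    const isByte =
      typeof value === "number" &&
      Math.floor(value) === value && // assumption: integers only
      value >= 0 &&
      value <= 255;
    if (!isByte) {
      throw new RangeError("element " + index + " is not a byte: " + value);
    }
    return value;
  });
}

checkBytes([0, 127, 255]); // accepted
try {
  checkBytes([1, 256]);
} catch (error) {
  console.log(error instanceof RangeError); // true
}
```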
Constructor methods
- join(array, delimiter)
- Like Array.prototype.join, but for Binarys. Returns a ByteString.
The following are equivalent expressions.
ByteString.join([1,2,3], 0)
ByteString([1, 0, 2, 0, 3])
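The joining behaviour can be sketched over plain arrays of byte values. joinBytes is a hypothetical helper that returns an Array rather than a ByteString, and it assumes that each item and the delimiter are either a Number or an array-like of Numbers.

```javascript
// Hypothetical helper: joins byte sequences with a delimiter and returns a
// plain Array of byte values (a real implementation would return a ByteString).
function joinBytes(items, delimiter) {
  const toBytes = function (item) {
    return typeof item === "number" ? [item] : Array.from(item);
  };
  const separator = delimiter === undefined ? [] : toBytes(delimiter);
  const result = [];
  items.forEach(function (item, index) {
    if (index > 0) {
      result.push(...separator);
    }
    result.push(...toBytes(item));
  });
  return result;
}

joinBytes([1, 2, 3], 0);               // [1, 0, 2, 0, 3]
joinBytes([[1, 2], [3]], [255, 255]);  // [1, 2, 255, 255, 3]
```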
Instance properties
- length
- The length in bytes. Immutable.
Instance operators
- [] ByteString
- the immutable [] operator returning ByteStrings
Instance methods (in prototype)
- toByteArray()
- Returns a byte for byte copy in a ByteArray.
- toByteArray(sourceCharset, targetCharset)
- Returns a transcoded copy in a ByteArray.
- toByteString()
- Returns itself, since there's no need to copy an immutable ByteString.
- toByteString(sourceCharset, targetCharset)
- Returns a transcoded copy.
- toArray()
- Returns an array containing the bytes as numbers.
- toArray(charset)
- Returns an array containing the decoded Unicode code points.
- toString()
- Returns a debug representation like "[ByteString 10]", where 10 is the length of the byte string. Alternative debug representations are valid too, as long as (1) this method never fails and (2) the length is included.
- toSource()
- returns "ByteString([])" for a null byte string or "ByteString([0, 1, 2])" for a byte string of length 3 with bytes 0, 1, and 2.
- decodeToString(charset)
- Returns the decoded ByteString as a string.
- indexOf(byte)
- indexOf(byte, start)
- indexOf(byte, start, stop)
- Returns the index of the first occurrence of byte (a Number or a ByteString or ByteArray of any length) or -1 if none was found. If start and/or stop are specified, only elements between the indexes start and stop are searched.
- lastIndexOf(byte)
- lastIndexOf(byte, start)
- lastIndexOf(byte, start, stop)
- Returns the index of the last occurrence of byte (a Number or a ByteString or ByteArray of any length) or -1 if none was found. If start and/or stop are specified, only elements between the indexes start and stop are searched.
- codeAt(offset) Number
- Return the byte at offset as a Number.
- byteAt(offset) ByteString
- valueAt(offset) ByteString
- get(offset) ByteString
- Return the byte at offset as a ByteString. get(offset) is analogous to indexing with brackets.
- copy(target, start, stop, targetStart)
- copies the Number values from this ByteString between start and stop to a target ByteArray or Array at the targetStart offset. start, stop, and targetStart must be Numbers, undefined, or omitted. If omitted, start is presumed to be 0. If omitted, stop is presumed to be the length of this ByteString. If omitted, targetStart is presumed to be 0.
- split(delimiter, [options])
- Split at delimiter, which can be a Number, a ByteString, a ByteArray or an Array of the prior (containing multiple delimiters, i.e., "split at any of these delimiters"). Delimiters can have arbitrary size.
- Options is an optional object parameter with the following optional properties:
- count - Maximum number of elements (ignoring delimiters) to return. The last returned element may contain delimiters.
- includeDelimiter - Whether the delimiter should be included in the result.
- Returns an array of ByteStrings.
- slice()
- slice(start)
- slice(start, stop)
- See Array.prototype.slice
- concat(...ByteString/ByteArray/Array...)
- returns a ByteString composed of itself concatenated with the given ByteString, ByteArray, and Array values.
- substr(start)
- See String.prototype.substr
- substr(start, length)
- See String.prototype.substr
- substring(first)
- See String.prototype.substring
- substring(first, last)
- See String.prototype.substring
ByteString does not implement toUpperCase() or toLowerCase() since they are not meaningful without the context of a charset.
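The indexOf search described in the method list above, with a needle of any length and optional start and stop bounds, could be sketched over plain arrays as follows. indexOfBytes is a hypothetical helper, and treating start as inclusive and stop as exclusive (as slice does) is an assumption, since the text does not say.

```javascript
// Hypothetical helper over plain arrays of byte values.
// Assumption: start is inclusive and stop is exclusive, as with slice.
function indexOfBytes(haystack, needle, start, stop) {
  const pattern = typeof needle === "number" ? [needle] : Array.from(needle);
  const from = start === undefined ? 0 : Math.max(0, start);
  const to = stop === undefined ? haystack.length : Math.min(stop, haystack.length);
  // Only positions where the whole pattern fits before `to` are candidates.
  for (let i = from; i + pattern.length <= to; i++) {
    let matched = true;
    for (let j = 0; j < pattern.length; j++) {
      if (haystack[i + j] !== pattern[j]) {
        matched = false;
        break;
      }
    }
    if (matched) {
      return i;
    }
  }
  return -1;
}

const data = [1, 2, 3, 2, 3, 4];
indexOfBytes(data, 3);            // 2
indexOfBytes(data, [2, 3], 2);    // 3
indexOfBytes(data, [3, 4], 0, 5); // -1, the match would cross the stop bound
```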
ByteArray
A ByteArray is a mutable, flexible representation of a C unsigned char (byte) array. Instances of ByteArray are instances of ByteArray, Binary, and Object. The typeof a ByteArray is "object". Object.prototype.toString.call(byteArray) returns "[object ByteArray]".
Constructor
- ByteArray()
- New, empty ByteArray.
- ByteArray(length)
- New ByteArray filled with length zero bytes.
- ByteArray(byteArray)
- Copy byteArray.
- ByteArray(byteString)
- Copy contents of byteString.
- ByteArray(arrayOfBytes)
- Use numbers in arrayOfBytes as contents.
- Throws an exception if any element is outside the range 0...255 (TODO).
- ByteArray(string, charset)
- Create a ByteArray from a JavaScript string, the result being encoded with charset.
Unlike the Array, the ByteArray is not variadic so that its initial length constructor is not ambiguous with its copy constructor.
All values within the length of the array are numbers stored as bytes that default to 0 if they have not been explicitly set. Assigning beyond the bounds of a ByteArray implicitly grows the array, just like an Array. Retrieving a value from an index that is out of the bounds of the array, that is, lower than 0 or at or beyond the length, returns undefined. Assigning an index with a value that is larger than fits in a byte will be implicitly and silently masked against 0xFF. Negative numbers will be bit extended to a byte in two's complement form and likewise masked.
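The masking rule can be checked with a one-line helper. toByte is a hypothetical name used only for this illustration. It relies on JavaScript's bitwise AND, which converts its operands to 32-bit two's complement integers before masking.

```javascript
// Hypothetical helper: the value a ByteArray slot would hold after assignment.
function toByte(value) {
  return value & 0xFF;
}

toByte(255);  // 255
toByte(256);  // 0
toByte(511);  // 255
toByte(-1);   // 255
toByte(-128); // 128
```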
Instance properties
- mutable length property
- extending a byte array fills the new entries with 0.
Instance Operators
- [] Number
- The mutable [] operator for numbers
Instance methods (in prototype)
- toArray()
- an array of the byte values
- toArray(charset)
- an array of the code points, decoded
- toString()
- A string debug representation like "[ByteArray 10]", where 10 is the length of the byte array. This method must not throw an error.
- toSource()
- returns "ByteArray([])" for a null byte array or "ByteArray([0, 1, 2])" for a byte array of length 3 with bytes 0, 1, and 2.
- decodeToString(charset String)
- returns a String from its decoded bytes in a given charset.
- toByteArray()
- returns a copy
- toByteArray(sourceCharset, targetCharset)
- transcoded
- toByteString()
- byte for byte copy
- toByteString(sourceCharset, targetCharset)
- transcoded
- byteAt(offset) ByteString
- valueAt(offset) ByteString
- Return the byte at offset as a ByteString.
- get(offset) Number
- codeAt(offset) Number
- Return the byte at offset as a Number. get(offset) is analogous to indexing with brackets.
- copy(target, start, stop, targetStart)
- copies the Number values from this ByteArray between start and stop to a target ByteArray or Array at the targetStart offset. start, stop, and targetStart must be Numbers, undefined, or omitted. If omitted, start is presumed to be 0. If omitted, stop is presumed to be the length of this ByteArray. If omitted, targetStart is presumed to be 0.
- fill(value, start, stop)
- fills each of the contained bytes with the given value (either a ByteString or ByteArray of length 1, or a Number) from the "start" offset to the "stop" offset. "start" and "stop" must be numbers, undefined, or omitted. If omitted, "start" is presumed to be 0. If omitted, "stop" is presumed to be the length of this ByteArray.
- concat(other ByteArray|ByteString|Array) ByteArray
- pop() -> byte(Number)
- See Array.prototype.pop
- push(...variadic Numbers...) -> count(Number)
- See Array.prototype.push
- extendRight(...variadic Numbers / Arrays / ByteArrays / ByteStrings ...)
- equivalent to this.push.apply(this, arguments)
- shift() -> byte(Number)
- See Array.prototype.shift
- unshift(...variadic Numbers...) -> count(Number)
- See Array.prototype.unshift
- extendLeft(...variadic Numbers / Arrays / ByteArrays / ByteStrings ...)
- equivalent to this.unshift.apply(this, arguments)
- reverse() in place reversal
- See Array.prototype.reverse
- slice()
- See Array.prototype.slice
- sort()
- See Array.prototype.sort
- splice()
- See Array.prototype.splice
- indexOf()
- See Array.prototype.indexOf
- lastIndexOf()
- See Array.prototype.lastIndexOf
- split()
- Returns an array of ByteArrays instead of ByteStrings.
- filter(callback[, thisObject])
- See Array.prototype.filter
- forEach(callback[, thisObject])
- See Array.prototype.forEach
- every(callback[, thisObject])
- See Array.prototype.every
- some(callback[, thisObject])
- See Array.prototype.some
- map(callback[, thisObject])
- See Array.prototype.map
- reduce(callback[, initialValue])
- See Array.prototype.reduce
- reduceRight(callback[, initialValue])
- See Array.prototype.reduceRight
- displace(start, stop, values/ByteStrings/ByteArrays/Arrays...) -> length
- start/stop are specified like for slice. Can be used like splice but does not return the removed elements.
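The relationship between displace and splice can be sketched over a plain Array of byte values. The free function below is a hypothetical stand-in for the method. It assumes that Numbers are inserted as single bytes, that array-like arguments contribute all of their elements, and that the return value is the new length.

```javascript
// Hypothetical stand-in for ByteArray.prototype.displace, operating on a
// plain Array. It mutates `bytes` in place and returns the new length.
function displace(bytes, start, stop, ...values) {
  const length = bytes.length;
  // Resolve negative or missing offsets the way Array.prototype.slice does.
  const resolve = function (offset, fallback) {
    if (offset === undefined) {
      return fallback;
    }
    return offset < 0 ? Math.max(length + offset, 0) : Math.min(offset, length);
  };
  const from = resolve(start, 0);
  const to = Math.max(resolve(stop, length), from);
  // Flatten the replacement values into one list of bytes.
  const replacement = [];
  values.forEach(function (value) {
    if (typeof value === "number") {
      replacement.push(value);
    } else {
      replacement.push(...Array.from(value));
    }
  });
  // splice takes a delete count, displace takes a stop offset.
  bytes.splice(from, to - from, ...replacement);
  return bytes.length;
}

const buffer = [1, 2, 3, 4, 5];
displace(buffer, 1, 3, 9, [8, 7]); // 6, buffer is now [1, 9, 8, 7, 4, 5]
```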
String
The String prototype will be extended with the following members:
- toByteArray(charset)
- Converts a string to a ByteArray encoded in charset.
- toByteString(charset)
- Converts a string to a ByteString encoded in charset.
- charCodes()
- Returns an array of Unicode code points (as numbers).
- codeAt(offset) Number
- an alias for charCodeAt that interoperates with ByteString's codeAt.
- get(offset) String
- an alias for the index operator and interoperable with ByteString, ByteArray, and Array.
Array
The Array prototype will be extended with the following members:
- toByteArray(charset)
- Converts an array of Unicode code points to a ByteArray encoded in charset.
- toByteString(charset)
- Converts an array of Unicode code points to a ByteString encoded in charset.
- get(offset)
- an alias for the index operator and interoperable with ByteString, ByteArray, and String.
General Requirements
None of the specified prototypes or augmentations to existing prototypes are enumerable.
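One way to keep prototype augmentations out of for...in loops is Object.defineProperty with enumerable set to false. The sketch below applies this to two of the String extensions. defineHidden is a hypothetical helper, and the method bodies are minimal illustrations rather than complete implementations.

```javascript
// Hypothetical helper: installs a method without making it enumerable.
function defineHidden(target, name, fn) {
  Object.defineProperty(target, name, {
    value: fn,
    enumerable: false,
    writable: true,
    configurable: true
  });
}

// codeAt as an alias for charCodeAt, get as an alias for the index operator.
defineHidden(String.prototype, "codeAt", function (offset) {
  return this.charCodeAt(offset);
});
defineHidden(String.prototype, "get", function (offset) {
  return this[offset];
});

// The new members do not appear when enumerating a string's properties.
const seen = [];
for (const key in "ab") {
  seen.push(key);
}
// seen contains only the index keys "0" and "1"

"abc".codeAt(1); // 98
"abc".get(2);    // "c"
```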
Any operation that requires encoding, decoding, or transcoding among charsets may throw an error if that charset is not supported by the implementation. All implementations MUST support "us-ascii" and "utf-8".
Charset strings are as defined by IANA http://www.iana.org/assignments/character-sets.
Charsets are case insensitive.
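Case-insensitive charset lookup might be handled by normalising names before consulting a table of supported charsets. In the sketch below, resolveCharset and the SUPPORTED list are hypothetical, only the two mandatory charsets are registered, IANA aliases are not handled, and the use of a plain Error is an assumption because the error type is not specified.

```javascript
// Hypothetical registry holding only the charsets every implementation
// must support.
const SUPPORTED = ["us-ascii", "utf-8"];

// Hypothetical helper: returns the normalised name or throws if the
// charset is not registered.
function resolveCharset(name) {
  const canonical = String(name).toLowerCase();
  if (SUPPORTED.indexOf(canonical) === -1) {
    throw new Error("unsupported charset: " + name);
  }
  return canonical;
}

resolveCharset("UTF-8");    // "utf-8"
resolveCharset("Us-Ascii"); // "us-ascii"
```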
Tests
- CommonJS tests compatible with this test framework. (targeting draft 1)