Binary/D
STATUS: PROPOSAL
This proposal, based on Binary/B, defines types and methods for constructing, manipulating, and transcoding binary data in both mutable and immutable forms. ByteString resembles its immutable text analog, String, and ByteArray resembles its mutable analog, Array. Both types constrain their data to the byte domain and provide a memory-safe interface to underlying contiguous, random-access byte collections.
An overview of the supported types, methods, signatures, and corresponding return types is on Google Docs.
Specification
The "binary" top-level module must export ByteArray, ByteString, BitArray, BitString, Range, a charset support checking function, and the following alphabets for radix encodings.
- ByteString
- immutable, byte quantized
- ByteArray
- mutable, explicitly resizable, byte quantized
- BitString
- immutable, bit quantized
- BitArray
- mutable, explicitly resizable, bit quantized
- supports(charset String) Boolean
- returns whether the given charset is supported for all encoding, decoding, or transcoding operations.
- base8 String
- The base 8 alphabet, including padding, "01234567=".
- base16 String
- The base 16 alphabet, "0123456789abcdef".
- base32 String
- The default alphabet for base 32 encoding as specified by Doug Crockford. See the toString(radix, alphabet) methods of ByteString and ByteArray.
- rfc4648 String
- An alternate alphabet for base 32 encoding as specified by RFC 4648. See the toString(radix, alphabet) methods of ByteString and ByteArray.
- base64 String
- The base 64 alphabet, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".
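As an illustration of how the alphabets above are meant to be used, the following sketch encodes byte values in base 16 with the base16 alphabet. The function encodeBase16 is a hypothetical helper that works on a plain Array of Numbers; it is not part of this proposal, where the same role is played by toString(radix, alphabet_opt).

```javascript
// The base 16 alphabet as listed in this proposal.
var base16 = "0123456789abcdef";

// Hypothetical helper, not part of the proposal: each byte becomes two
// characters, the high four bits first and the low four bits second.
function encodeBase16(bytes) {
    var text = "";
    for (var i = 0; i < bytes.length; i++) {
        text += base16.charAt(bytes[i] >> 4) + base16.charAt(bytes[i] & 0x0F);
    }
    return text;
}

encodeBase16([0, 255, 16]); // "00ff10"
```

Base 16 needs no padding because every byte maps to a whole number of characters; the other radixes do not divide eight bits evenly, which is why their alphabets may carry a padding character.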
ByteString
A ByteString is an immutable, fixed-width representation of a C unsigned char (byte) array that does not implicitly terminate at the first 0-valued byte.
ByteString instances are comparable with the == and === operators based on equal order and respective values of their content.
Constructor
A ByteString function must exist. When called as a function, it must return a byte string. When constructed with "new", it must throw a TypeError.
ByteString(); assert.throws(function () { new ByteString(); }, TypeError);
- ByteString()
- Construct an empty byte string.
- ByteString(byteString)
- Copies byteString.
- ByteString(byteArray)
- Use the contents of byteArray. For performance, transparently to the layer specified here, the "ByteArray" may relinquish ownership of its underlying byte buffer and switch to a copy-on-write mode.
- ByteString(arrayOfNumbers)
- Use the numbers in arrayOfNumbers as the bytes. Coerces Numbers to bytes by selecting their least significant eight bits.
- ByteString(string, charset)
- Convert a string. The ByteString will contain string encoded with charset.
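The coercion rule for arrays of Numbers can be illustrated without the binary module. This is a minimal sketch, assuming a hypothetical helper named toByte (not part of this proposal) that keeps the least significant eight bits of a Number.

```javascript
// Hypothetical helper, not part of the proposal: keep only the least
// significant eight bits of a Number, as the constructor is described
// as doing for arrays of Numbers.
function toByte(number) {
    return number & 0xFF;
}

// Values outside 0..255 wrap around rather than throwing.
var coerced = [0, 255, 256, 257, -1].map(toByte);
// coerced is [0, 255, 0, 1, 255]
```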
Prototype Properties
- toByteArray() ByteArray
- Returns a byte for byte copy in a ByteArray.
- Note: for best performance, implementations should transfer ownership of the underlying buffer and flag the original owner to copy-on-write.
- toByteArray(sourceCharset, targetCharset) ByteArray
- Returns this transcoded from the source charset to the target charset as a ByteArray. The source and target charset are internally cast to String with the String constructor.
- toByteString() ByteString
- Returns this ByteString.
- toByteString(sourceCharset, targetCharset) ByteString
- Returns this transcoded from the source charset to the target charset as a ByteString. The source and target charset are internally cast to String with the String constructor.
- toBitArray() BitArray
- Returns this as a BitArray by composing each byte, in consistent order, into groups of eight bits from most significant to least significant.
- ByteArray([3]).toBitArray() and BitArray([0, 0, 0, 0, 0, 0, 1, 1]) are equivalent.
- toBitString() BitString
- Returns this as a BitString by composing each byte, in consistent order, into groups of eight bits from most significant to least significant.
- ByteString([3]).toBitString() == BitString([0, 0, 0, 0, 0, 0, 1, 1]).
- toString() String
- Returns a debug representation like "[ByteString 10]", where 10 is the length of this ByteString.
- toString(charset) String
- Decodes this ByteString according to the given character set and returns the String of the corresponding Unicode code points. May throw a ValueError if the byte string is malformed or if any of the code points are out of the implementation's supported range.
- toString(radix, alphabet_opt) String
- Encodes this as base 2, 8, 16, 32, or 64 with the given alphabet. The default padding character is "=", but it can be overridden by supplying an alternate padding character at the index of the radix in the alphabet. Throws a ValueError if the length of the alphabet is neither the radix nor one larger than the radix. For base 32, the default alphabet is Doug Crockford's base 32 alphabet for human-error-resistant binary communication, and "rfc4648" is an alternate alphabet. These alphabets are exported by the "binary" module as "base32" and "rfc4648" respectively.
- toArray() Array
- Returns an Array containing the bytes as Numbers.
- valueOf(endian_opt) Number
- Returns this as a Number of the highest precision storable, treating these bytes as ordered in the given endianness. endian may be omitted, undefined, "LE" or "BE". If omitted or undefined, defaults to "BE". If "LE", these bytes are treated as little-endian, or least to most significant. If "BE", these bytes are treated as big-endian (network byte order), or most to least significant.
- toSource() String
- Returns "require("binary").ByteString([])" for an empty byte string or "require("binary").ByteString([0, 1, 2])" for a byte string of length 3 with bytes 0, 1, and 2.
- valueAt(offset_opt) Number
- Returns the byte at offset as a Number. The offset is coerced internally with the Number constructor. Thus, if it is omitted or passed as undefined, it defaults to getting the value at offset 0.
- byteStringAt(offset_opt) ByteString
- Returns a unary (length 1) ByteString of the byte at the given offset. The offset is coerced internally with the Number constructor. Thus, if it is omitted or passed as undefined, it defaults to getting the value at offset 0.
- indexOf(values, start_opt, stop_opt) Number
- Returns the index of the first occurrence of a byte or consecutive bytes as represented by a Number, ByteString, or ByteArray of any length, or returns -1 if no match was found. If start and/or stop are specified, only elements between the indexes start and stop are searched. start defaults to 0 and stop defaults to the length of this.
- lastIndexOf(values, start_opt, stop_opt) Number
- Returns the index of the last occurrence of a byte or consecutive bytes as represented by a Number, ByteString, or ByteArray of any length, or returns -1 if no match was found. If start and/or stop are specified, only elements between the indexes start and stop are searched. start defaults to 0 and stop defaults to the length of this.
- range(start_opt, stop_opt) Range
- Returns a Range object. The value of stop defaults to the length of this. The value of start defaults to 0. Uses this for the Range's ref property.
- copy(target, targetStart, start, stop) ByteString
- Copies the Number values of each byte from this between start and stop to a target ByteArray or Array at the targetStart offset. If undefined or omitted, stop is presumed to be the length of this. targetStart, start, and stop are internally coerced to Numbers with the Number constructor. Throws a TypeError if the target is not a ByteArray or Array. Returns this.
- slice(start_opt, stop_opt) ByteString
- See Array.prototype.slice
- substr(start_opt, length_opt) ByteString
- See String.prototype.substr
- substring(start_opt, last_opt) ByteString
- See String.prototype.substring
- concat(...) ByteString
- Returns a ByteString composed of itself concatenated with the given ByteString, ByteArray, and Array values. Throws a TypeError if any of the arguments are not a ByteString, ByteArray, or Array of Numbers. Coerces Numbers to bytes by selecting their least significant eight bits.
- join(array) ByteString
- Returns a ByteString of the ByteStrings, ByteArrays, Arrays of Numbers, or a combination thereof, delimited by this. Arrays of Numbers are converted to ByteArrays. Coerces Numbers to bytes by selecting their least significant eight bits.
- split(delimiter_opt, max_opt) Array
- Returns an Array of the ByteStrings that lie between occurrences of the given delimiter in this ByteString, splitting at up to the maximum number of delimiters, starting from the left. If omitted, the maximum defaults to Infinity. The delimiter defaults to an empty ByteString. The delimiter is internally coerced to a ByteString with the ByteString constructor.
- splitRight(delimiter_opt, max_opt) Array
- Returns an Array of the ByteStrings that lie between occurrences of the given delimiter in this ByteString, splitting at up to the maximum number of delimiters, starting from the right. If omitted, the maximum defaults to Infinity. The delimiter defaults to an empty ByteString. The delimiter is internally coerced to a ByteString with the ByteString constructor.
- Content
- an alias of ByteString, indicating that ByteString is the type of what [[Get]] returns.
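The endianness rule given for valueOf(endian_opt) above can be sketched on a plain Array of byte values. The function bytesToNumber is a hypothetical stand-in, not part of this proposal, and it assumes every element is already in the range 0 to 255.

```javascript
// Hypothetical stand-in for valueOf(endian_opt), operating on a plain
// Array of byte values instead of a ByteString.
function bytesToNumber(bytes, endian) {
    // Assumption of this sketch: anything other than "LE" is treated as
    // big-endian. The proposal only defines omitted, undefined, "LE",
    // and "BE".
    var ordered = endian === "LE" ? bytes.slice().reverse() : bytes;
    var value = 0;
    // Accumulate from the most significant byte to the least.
    for (var i = 0; i < ordered.length; i++) {
        value = value * 256 + ordered[i];
    }
    return value;
}

bytesToNumber([0x01, 0x02]);       // 258, that is 0x0102
bytesToNumber([0x01, 0x02], "LE"); // 513, that is 0x0201
```

Because the result is an ordinary Number, integer values above 2^53 cannot be represented exactly, which is one reading of "the highest precision storable".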
Internal Properties
- [[Get]] ByteString
- Returns a unary (length 1) ByteString of the byte at the given offset.
- [[Put]]
- throws a TypeError.
Instance Properties
- length
- The length in bytes. Not [[Writable]], not [[Configurable]], not [[Enumerable]].
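The three attributes listed for length correspond to property descriptor fields. A minimal sketch, assuming a hypothetical factory named makeFixedLength that builds a plain object rather than a real ByteString:

```javascript
// Hypothetical factory, not part of the proposal: builds a plain object
// whose length property has the attributes listed above.
function makeFixedLength(length) {
    var object = {};
    Object.defineProperty(object, "length", {
        value: length,
        writable: false,     // not [[Writable]]
        configurable: false, // not [[Configurable]]
        enumerable: false    // not [[Enumerable]]
    });
    return object;
}

var fixed = makeFixedLength(10);
var visibleKeys = Object.keys(fixed); // empty, because length is not enumerable
```

With these attributes, an assignment to fixed.length is ignored in non-strict code and throws a TypeError in strict code, and the property cannot be deleted or redefined.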
ByteArray
A ByteArray is a mutable, flexible (explicitly growable and shrinkable) representation of a C unsigned char (byte) array.
Constructor
A ByteArray function must exist. Calling and constructing a ByteArray both return a new ByteArray instance.
ByteArray() instanceof ByteArray; new ByteArray() instanceof ByteArray; typeof ByteArray() === "object"; typeof new ByteArray() === "object";
- ByteArray()
- New, empty ByteArray.
- ByteArray(length Number, fill_opt Number)
- New ByteArray of the given length, with the given fill number at all offsets. The default filler is 0.
- ByteArray(byteArray)
- Copy byteArray.
- ByteArray(byteString)
- Copy contents of byteString.
- ByteArray(arrayOfBytes)
- Use numbers in arrayOfBytes as contents. Coerces Numbers to bytes by selecting their least significant eight bits.
- ByteArray(string, charset)
- Create a "ByteArray" from a "String", the result being encoded with charset.
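The requirement that calling and constructing both return a new instance can be met with a common JavaScript pattern. The sketch below uses a hypothetical constructor named SketchArray that stores its bytes in a plain Array; it only illustrates the calling convention and the (length, fill_opt) form, not the proposal's ByteArray.

```javascript
// Hypothetical constructor, not the proposal's ByteArray: returns a new
// instance whether or not it is invoked with "new".
function SketchArray(length, fill) {
    if (!(this instanceof SketchArray)) {
        return new SketchArray(length, fill);
    }
    this.bytes = [];
    for (var i = 0; i < (length || 0); i++) {
        // The default filler is 0; fill values are reduced to one byte.
        this.bytes.push((fill || 0) & 0xFF);
    }
}

var called = SketchArray(2, 7);
var constructed = new SketchArray(2);
// called.bytes is [7, 7] and constructed.bytes is [0, 0]
```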
Prototype Properties
- toByteArray() ByteArray
- Returns a byte for byte copy in a ByteArray.
- Note: for best performance, implementations should transfer ownership of the underlying buffer and flag the original owner to copy-on-write.
- toByteArray(sourceCharset, targetCharset) ByteArray
- Returns this transcoded from the source charset to the target charset as a ByteArray. The source and target charset are internally cast to String with the String constructor.
- toByteString() ByteString
- Returns a byte for byte copy in a ByteString.
- Note: for best performance, implementations should transfer ownership of the underlying buffer and flag the original owner to copy-on-write.
- toByteString(sourceCharset, targetCharset) ByteString
- Returns this transcoded from the source charset to the target charset as a ByteString. The source and target charset are internally cast to String with the String constructor.
- toBitArray() BitArray
- Returns this as a BitArray by composing each byte, in consistent order, into groups of eight bits from most significant to least significant.
- ByteArray([3]).toBitArray() and BitArray([0, 0, 0, 0, 0, 0, 1, 1]) are equivalent.
- toBitString() BitString
- Returns this as a BitString by composing each byte, in consistent order, into groups of eight bits from most significant to least significant.
- ByteString([3]).toBitString() == BitString([0, 0, 0, 0, 0, 0, 1, 1]).
- toString() String
- Returns a debug representation like "[ByteArray 10]", where 10 is the length of this ByteArray.
- toString(charset) String
- Decodes this according to the given character set and returns the String of the corresponding Unicode code points. May throw a ValueError if the byte array is malformed or if any of the code points are out of the implementation's supported range.
- toString(radix, alphabet_opt) String
- Encodes this as base 2, 8, 16, 32, or 64 with the given alphabet. The default padding character is "=", but it can be overridden by supplying an alternate padding character at the index of the radix in the alphabet. Throws a ValueError if the length of the alphabet is neither the radix nor one larger than the radix. For base 32, the default alphabet is Doug Crockford's base 32 alphabet for human-error-resistant binary communication, and "rfc4648" is an alternate alphabet. These alphabets are exported by the "binary" module as "base32" and "rfc4648" respectively.
- toArray() Array
- Returns an Array containing the bytes as Numbers.
- valueOf(endian_opt) Number
- Returns this as a Number of the highest precision storable, treating these bytes as ordered in the given endianness. endian may be omitted, undefined, "LE" or "BE". If omitted or undefined, defaults to "BE". If "LE", these bytes are treated as little-endian, or least to most significant. If "BE", these bytes are treated as big-endian (network byte order), or most to least significant.
- toSource() String
- Returns "require("binary").ByteArray([])" for an empty byte array or "require("binary").ByteArray([0, 1, 2])" for a byte array of length 3 with bytes 0, 1, and 2.
- valueAt(offset_opt) Number
- Returns the byte at offset as a Number. The offset is coerced internally with the Number constructor. Thus, if it is omitted or passed as undefined, it defaults to getting the value at offset 0.
- byteStringAt(offset_opt) ByteString
- Returns a unary (length 1) ByteString of the byte at the given offset. The offset is coerced internally with the Number constructor. Thus, if it is omitted or passed as undefined, it defaults to getting the value at offset 0.
- indexOf(values, start_opt, stop_opt) Number
- Returns the index of the first occurrence of a byte or consecutive bytes as represented by a Number, ByteString, or ByteArray of any length, or returns -1 if no match was found. If start and/or stop are specified, only elements between the indexes start and stop are searched. start defaults to 0 and stop defaults to the length of this.
- lastIndexOf(values, start_opt, stop_opt) Number
- Returns the index of the last occurrence of a byte or consecutive bytes as represented by a Number, ByteString, or ByteArray of any length, or returns -1 if no match was found. If start and/or stop are specified, only elements between the indexes start and stop are searched. start defaults to 0 and stop defaults to the length of this.
- range(start_opt, stop_opt) Range
- Returns a Range object. The value of stop defaults to the length of this. The value of start defaults to 0. Uses this for the Range's ref property.
- copy(target, targetStart, start, stop) ByteArray
- Copies the Number values of each byte from this between start and stop to a target ByteArray or Array at the targetStart offset. If undefined or omitted, stop is presumed to be the length of this. targetStart, start, and stop are internally coerced to Numbers with the Number constructor. Throws a TypeError if the target is not a ByteArray or Array. Returns this.
- copyFrom(source, sourceStart, start, stop) ByteArray
- Copies the Number values of each byte from a given ByteArray or ByteString into this from start to stop at the given sourceStart offset. If undefined or omitted, stop is presumed to be the length of this. sourceStart, start, and stop are internally coerced to Numbers with the Number constructor. Throws a TypeError if the source is not a ByteArray or ByteString. Returns this.
- fill(value, start_opt, stop_opt) ByteArray
- Fills each of the contained bytes with the given value (either a unary ByteString, ByteArray, or a Number) from the start offset to the stop offset. start and stop must be numbers, undefined, or omitted. If omitted, start is presumed to be 0. If omitted, stop is presumed to be the length of this ByteArray. If omitted, "value" is presumed to be 0. Returns this.
- splice(start_opt, stop_opt, ...values) ByteArray
- See Array.prototype.splice
- slice(start_opt, stop_opt) ByteArray
- See Array.prototype.slice
- split(delimiter_opt, max_opt) Array
- Returns an Array of the ByteArrays that lie between occurrences of the given delimiter in this ByteArray, splitting at up to the maximum number of delimiters, starting from the left. If omitted, the maximum defaults to Infinity. The delimiter defaults to an empty ByteArray. The delimiter is internally coerced to a ByteArray with the ByteArray constructor.
- splitRight(delimiter_opt, max_opt) Array
- Returns an Array of the ByteArrays that lie between occurrences of the given delimiter in this ByteArray, splitting at up to the maximum number of delimiters, starting from the right. If omitted, the maximum defaults to Infinity. The delimiter defaults to an empty ByteArray. The delimiter is internally coerced to a ByteArray with the ByteArray constructor.
- forEach(callback[, thisObject])
- See Array.prototype.forEach
- every(callback[, thisObject])
- See Array.prototype.every
- some(callback[, thisObject])
- See Array.prototype.some
- map(callback[, thisObject])
- See Array.prototype.map
- Content
- an alias of Number, indicating that Number is the type of what [[Get]] returns.
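The search described for indexOf(values, start_opt, stop_opt) above accepts either a single byte or a run of consecutive bytes. A minimal sketch on plain Arrays, assuming a hypothetical function named indexOfBytes and assuming that a match must lie entirely between start and stop (the proposal does not spell this out):

```javascript
// Hypothetical stand-in for indexOf(values, start_opt, stop_opt) on plain
// Arrays of byte values. The needle may be a single Number or an Array.
function indexOfBytes(haystack, values, start, stop) {
    var needle = typeof values === "number" ? [values] : values;
    if (start === undefined) { start = 0; }
    if (stop === undefined) { stop = haystack.length; }
    // Assumption: the whole match must fit inside [start, stop).
    for (var i = start; i + needle.length <= stop; i++) {
        var matched = true;
        for (var j = 0; j < needle.length; j++) {
            if (haystack[i + j] !== needle[j]) {
                matched = false;
                break;
            }
        }
        if (matched) {
            return i;
        }
    }
    return -1;
}

indexOfBytes([1, 2, 3, 2, 3], [2, 3]);    // 1
indexOfBytes([1, 2, 3, 2, 3], [2, 3], 2); // 3
indexOfBytes([1, 2, 3, 2, 3], 9);         // -1
```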
Internal Properties
- [[Get]] Number
- Returns the Number value of the byte at the given offset.
- [[Put]] Number
- Sets the byte value at the given offset. Throws a ValueError if the index is beyond the current bounds of the byte array (byte arrays can be grown or shrunk by explicitly assigning a new length).
Instance Properties
- length
- The length in bytes. Not [[Configurable]], not [[Enumerable]]. Assigning to the length of a ByteArray causes the array to reallocate its underlying buffer to the given size, copying the original buffer up to the length of the new buffer if it is less, and filling all bytes beyond the length of the original buffer with the value 0.
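The effect of assigning a new length can be modeled with a typed array as the underlying buffer. The helper resizeBuffer below is hypothetical and returns a new buffer instead of mutating an object in place; it is a sketch of the copy and zero-fill rule, not of how an implementation must manage memory.

```javascript
// Hypothetical helper, not part of the proposal: models what assigning a
// new length is described as doing, using a Uint8Array as the buffer.
function resizeBuffer(buffer, newLength) {
    var resized = new Uint8Array(newLength); // zero-filled on creation
    // Copy only as many bytes as fit in the new buffer.
    resized.set(buffer.subarray(0, Math.min(buffer.length, newLength)));
    return resized;
}

var original = new Uint8Array([1, 2, 3]);
var grown = resizeBuffer(original, 5);  // holds 1, 2, 3, 0, 0
var shrunk = resizeBuffer(original, 2); // holds 1, 2
```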
BitString
A "BitString" is an immutable, fixed-width representation of any number of bits (not necessarily a multiple of 8).
BitString instances are comparable with the == and === operators based on equal order and respective values of their content.
Constructor
A "BitString" function must exist. When called as a function, it must return a bit string. When constructed with "new", it must throw a "TypeError".
BitString(); assert.throws(function () { new BitString(); }, TypeError);
- BitString()
- Construct an empty bit string.
- BitString(bitString)
- Copies bitString.
- BitString(bitArray)
- Use the contents of bitArray.
- BitString(array)
- Constructs a bit string in which each bit is on if Boolean(value) is true for the corresponding value of the given array.
- BitString(bytes)
- Returns the given ByteString or ByteArray as a BitString by composing each byte, in consistent order, into groups of eight bits from most significant to least significant.
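The composition of bytes into bits, most significant bit first, can be sketched on plain Arrays. The function bytesToBits is a hypothetical helper, not part of this proposal, and assumes each element is a byte value from 0 to 255.

```javascript
// Hypothetical helper: expand an Array of byte values into an Array of
// bits, eight per byte, most significant bit first.
function bytesToBits(bytes) {
    var bits = [];
    for (var i = 0; i < bytes.length; i++) {
        for (var shift = 7; shift >= 0; shift--) {
            bits.push((bytes[i] >> shift) & 1);
        }
    }
    return bits;
}

bytesToBits([3]); // [0, 0, 0, 0, 0, 0, 1, 1]
```

This reproduces the equivalence stated in this proposal between the byte 3 and the bits 0, 0, 0, 0, 0, 0, 1, 1.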
Prototype Properties
- toByteArray() ByteArray
- Returns a ByteArray composed of these bits in groups of eight from most to least significant. Throws a ValueError if the length of this is not a multiple of 8.
- toByteString() ByteString
- Returns a ByteString composed of these bits in groups of eight from most to least significant. Throws a ValueError if the length of this is not a multiple of 8.
- toBitArray() BitArray
- Returns this as a BitArray with equivalent respective bit values.
- toBitString() BitString
- Returns this BitString.
- toString() String
- Returns this represented as a String of 0s and 1s.
- toArray() Array
- Returns an Array containing the bits as the Numbers 0 and 1.
- valueOf() Number
- Returns this as a Number of the highest precision storable, treating these bits as ordered from most to least significant. The result of valueOf must be equivalent to the Number arrived at by initializing an accumulator with 0 and iteratively multiplying the accumulator by 2 and adding the 0 or 1 Number value of each bit.
- toSource() String
- Returns "require("binary").BitString([])" for an empty bit string or "require("binary").BitString([0, 1, 0])" for a bit string of length 3 with bits 0, 1, and 0.
- valueAt(offset_opt) Number
- Returns the bit at offset as a Number. The offset is coerced internally with the Number constructor. Thus, if it is omitted or passed as undefined, it defaults to getting the value at offset 0.
- indexOf(values, start_opt, stop_opt) Number
- Returns the index of the first occurrence of a bit or consecutive bits as represented by a Number, BitString, or BitArray of any length, or returns -1 if no match was found. If start and/or stop are specified, only elements between the indexes start and stop are searched. start defaults to 0 and stop defaults to the length of this.
- lastIndexOf(values, start_opt, stop_opt) Number
- Returns the index of the last occurrence of a bit or consecutive bits as represented by a Number, BitString, or BitArray of any length, or returns -1 if no match was found. If start and/or stop are specified, only elements between the indexes start and stop are searched. start defaults to 0 and stop defaults to the length of this.
- range(start_opt, stop_opt) Range
- Returns a Range object. The value of stop defaults to the length of this. The value of start defaults to 0. Uses this for the Range's ref property.
- copy(target, targetStart, start, stop) BitString
- Copies the Number values of each bit from this between start and stop to a target BitArray or Array at the targetStart offset. If undefined or omitted, stop is presumed to be the length of this. targetStart, start, and stop are internally coerced to Numbers with the Number constructor. Throws a TypeError if the target is not a BitArray or Array. Returns this.
- slice(start_opt, stop_opt) BitString
- See Array.prototype.slice
- substr(start_opt, length_opt) BitString
- See String.prototype.substr
- substring(start_opt, last_opt) BitString
- See String.prototype.substring
- concat(...) BitString
- Returns a BitString composed of itself concatenated with the given BitString, BitArray, and Array values. Throws a TypeError if any of the arguments are not a BitString, BitArray, or Array of Numbers. Coerces nonzero Numbers to 1.
- join(array) BitString
- Returns a BitString of the BitStrings, BitArrays, Arrays of Numbers, or a combination thereof, delimited by this. Arrays of Numbers are converted to BitArrays, so this function may throw a RangeError.
- split(delimiter_opt, max_opt) Array
- Returns an Array of the BitStrings that lie between occurrences of the given delimiter in this BitString, splitting at up to the maximum number of delimiters, starting from the left. If omitted, the maximum defaults to Infinity. The delimiter defaults to an empty BitString. The delimiter is internally coerced to a BitString with the BitString constructor.
- splitRight(delimiter_opt, max_opt) Array
- Returns an Array of the BitStrings that lie between occurrences of the given delimiter in this BitString, splitting at up to the maximum number of delimiters, starting from the right. If omitted, the maximum defaults to Infinity. The delimiter defaults to an empty BitString. The delimiter is internally coerced to a BitString with the BitString constructor.
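The accumulator procedure that defines valueOf() for bits translates directly into code. The function bitsToNumber is a hypothetical helper working on a plain Array of 0 and 1 values, not on a BitString.

```javascript
// Hypothetical helper: the accumulator described for valueOf(), applied
// to a plain Array of 0/1 Numbers ordered most to least significant.
function bitsToNumber(bits) {
    var accumulator = 0;
    for (var i = 0; i < bits.length; i++) {
        accumulator = accumulator * 2 + bits[i];
    }
    return accumulator;
}

bitsToNumber([1, 0, 1]);                // 5
bitsToNumber([0, 0, 0, 0, 0, 0, 1, 1]); // 3
```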
Internal Properties
- [[Get]] BitString
- Returns a unary (length 1) BitString of the bit at the given offset.
Instance Properties
- length
- The length in bits. Not [[Writable]], not [[Configurable]], not [[Enumerable]].
BitArray
A BitArray is a mutable, flexible (explicitly growable and shrinkable) representation of an ordered collection of bits, not necessarily with a length that is a multiple of 8.
Constructor
A BitArray function must exist. Calling and constructing a BitArray both return a new BitArray instance.
BitArray() instanceof BitArray; new BitArray() instanceof BitArray; typeof BitArray() === "object"; typeof new BitArray() === "object";
- BitArray()
- New, empty BitArray.
- BitArray(length Number, fill_opt Number)
- New BitArray of the given length, with the given fill number at all offsets. The default filler is 0.
- BitArray(bitArray)
- Copy bitArray.
- BitArray(array)
- Constructs a bit array in which each bit is on if Boolean(value) is true for the corresponding value of the given array.
- BitArray(bytes, endian)
- Returns the given ByteString or ByteArray as a BitArray by composing each byte, in consistent order, into groups of eight bits from most significant to least significant.
Prototype Properties
- toByteArray() ByteArray
- Returns a ByteArray composed of these bits in groups of eight from most to least significant. Throws a ValueError if the length of this is not a multiple of 8.
- toByteString() ByteString
- Returns a ByteString composed of these bits in groups of eight from most to least significant. Throws a ValueError if the length of this is not a multiple of 8.
- toBitArray() BitArray
- Returns a copy of this.
- toBitString() BitString
- Returns this as a BitString with equivalent respective bit values.
- toString() String
- Returns this represented as a String of 0s and 1s.
- toArray() Array
- Returns an Array containing the bits as the Numbers 0 and 1.
- valueOf() Number
- Returns this as a Number of the highest precision storable, treating these bits as ordered from most to least significant. The result of valueOf must be equivalent to the Number arrived at by initializing an accumulator with 0 and iteratively multiplying the accumulator by 2 and adding the 0 or 1 Number value of each bit.
- toSource() String
- Returns "require("binary").BitArray([])" for an empty bit array or "require("binary").BitArray([0, 1, 0])" for a bit array of length 3 with bits 0, 1, and 0.
- valueAt(offset_opt) Number
- Returns the bit at offset as a Number. The offset is coerced internally with the Number constructor. Thus, if it is omitted or passed as undefined, it defaults to getting the value at offset 0.
- indexOf(values, start_opt, stop_opt) Number
- Returns the index of the first occurrence of a bit or consecutive bits as represented by a Number, BitString, or BitArray of any length, or returns -1 if no match was found. If start and/or stop are specified, only elements between the indexes start and stop are searched. start defaults to 0 and stop defaults to the length of this.
- lastIndexOf(values, start_opt, stop_opt) Number
- Returns the index of the last occurrence of a bit or consecutive bits as represented by a Number, BitString, or BitArray of any length, or returns -1 if no match was found. If start and/or stop are specified, only elements between the indexes start and stop are searched. start defaults to 0 and stop defaults to the length of this.
- range(start_opt, stop_opt) Range
- Returns a Range object. The value of stop defaults to the length of this. The value of start defaults to 0. Uses this for the Range's ref property.
- copy(target, targetStart, start, stop) BitArray
- Copies the Number values of each bit from this between start and stop to a target BitArray or Array at the targetStart offset. If undefined or omitted, stop is presumed to be the length of this. targetStart, start, and stop are internally coerced to Numbers with the Number constructor. Throws a TypeError if the target is not a BitArray or Array. Returns this.
- copyFrom(source, sourceStart, start, stop) BitArray
- Copies the Number values of each bit from a given BitArray or BitString into this from start to stop at the given sourceStart offset. If undefined or omitted, stop is presumed to be the length of this. sourceStart, start, and stop are internally coerced to Numbers with the Number constructor. Throws a TypeError if the source is not a BitArray or BitString. Returns this.
- fill(value, start_opt, stop_opt) BitArray
- Fills each of the contained bits with the given value (either a unary BitString, BitArray, or a Number) from the start offset to the stop offset. start and stop must be numbers, undefined, or omitted. If omitted, start is presumed to be 0. If omitted, stop is presumed to be the length of this BitArray. If omitted, "value" is presumed to be 0. Returns this.
- splice(start_opt, stop_opt, ...values) BitArray
- See Array.prototype.splice
- slice(start_opt, stop_opt) BitArray
- See Array.prototype.slice
- split(delimiter_opt, max_opt) Array
- Returns an Array of the BitArrays that lie between occurrences of the given delimiter in this BitArray, splitting at up to the maximum number of delimiters, starting from the left. If omitted, the maximum defaults to Infinity. The delimiter defaults to an empty BitArray. The delimiter is internally coerced to a BitArray with the BitArray constructor.
- splitRight(delimiter_opt, max_opt) Array
- Returns an Array of the BitArrays that lie between occurrences of the given delimiter in this BitArray, splitting at up to the maximum number of delimiters, starting from the right. If omitted, the maximum defaults to Infinity. The delimiter defaults to an empty BitArray. The delimiter is internally coerced to a BitArray with the BitArray constructor.
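The conversion from bits back to bytes described for toByteArray() and toByteString() above, including the multiple-of-8 check, can be sketched as follows. The function bitsToBytes is a hypothetical helper on plain Arrays. JavaScript has no built-in ValueError, so this sketch assumes a plain Error whose name is set to "ValueError".

```javascript
// Hypothetical helper: pack an Array of 0/1 values into byte values,
// eight bits per byte, most significant bit first.
function bitsToBytes(bits) {
    if (bits.length % 8 !== 0) {
        // Assumption: a plain Error named "ValueError" stands in for the
        // ValueError this proposal refers to.
        var error = new Error("bit length must be a multiple of 8");
        error.name = "ValueError";
        throw error;
    }
    var bytes = [];
    for (var i = 0; i < bits.length; i += 8) {
        var value = 0;
        for (var j = 0; j < 8; j++) {
            value = value * 2 + bits[i + j];
        }
        bytes.push(value);
    }
    return bytes;
}

bitsToBytes([0, 0, 0, 0, 0, 0, 1, 1]); // [3]
```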
Internal Properties
- [[Get]] Number
- Returns the Number value of the bit at the given offset.
- [[Put]] Number
- Sets the bit value at the given offset. Throws a ValueError if the index is beyond the current bounds of the bit array (bit arrays can be grown or shrunk by explicitly assigning a new length).
Instance Properties
- length
- The length in bits. Not [[Configurable]], not [[Enumerable]]. Assigning to the length of a bit array causes the array to reallocate its underlying buffer to the given size, copying the original buffer up to the length of the new buffer if it is less, and filling all bits beyond the length of the original buffer with the value 0.
Range
A range object is a convenience for copying ranges from one ByteArray or ByteString to another ByteArray.
Constructor
- new Range({ref, start, stop})
- returns a Range object with the respective properties.
Prototype Properties
- copy(target, start_opt, stop_opt)
- generically calls this.ref.copy(target, this.start, Math.min(this.stop, this.start + target.length - start), start). The default value of stop is the target length.
- copyFrom(source, start_opt, stop_opt)
- generically calls this.ref.copyFrom(source, this.start, Math.min(this.stop, this.start + source.length - start), start). The default value of stop is the source length.
- fill(value)
- generically calls this.ref.fill(value, this.start, this.stop)
- splice(start_opt, stop_opt, ...values)
- generically calls this.ref.splice.apply(this.ref, arguments)
String
Prototype Properties
- toByteArray(charset) ByteArray
- Returns a ByteArray of these Unicode code points as the corresponding bytes in the given character set. Must throw an error if the given character set is not supported.
- toByteArray(radix, alphabet_opt, lookup_opt) ByteArray
- Returns a ByteArray of these characters encoded in base 2, 8, 16, 32, or 64. If omitted or undefined, the alphabet defaults to the standard alphabet, or the Doug Crockford base 32 alphabet. Optionally accepts a lookup object or function that normalizes ambiguous characters, as in the default behavior of Doug Crockford's base 32 encoding for the characters "o" to 0, and "l" to 1, and all lower-case letters to their upper-case analogs. Must throw a ValueError if any contained character cannot be expressed in the requested radix's alphabet.
- toByteString(charset) ByteString
- Returns a ByteString of these Unicode code points as the corresponding bytes in the given character set. Must throw an error if the given character set is not supported.
- toByteString(radix, alphabet_opt, lookup_opt) ByteString
- Returns a ByteString of these characters encoded in base 2, 8, 16, 32, or 64. If omitted or undefined, the alphabet defaults to the standard alphabet, or the Doug Crockford base 32 alphabet. Optionally accepts a lookup object or function that normalizes ambiguous characters, as in the default behavior of Doug Crockford's base 32 encoding for the characters "o" to 0, and "l" to 1, and all lower-case letters to their upper-case analogs. Must throw a ValueError if any contained character cannot be expressed in the requested radix's alphabet.
- valueAt(offset_opt) Number
- Returns the Unicode code point Number at the given offset. The offset is coerced internally with the Number constructor. Thus, if it is omitted or passed as undefined, it defaults to getting the value at offset 0.
- charCodes() Array
- Returns an array of Unicode code points for each respective contained character.
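The radix conversions with a lookup step might work roughly as follows for base 32. This is a sketch under stated assumptions, not the proposal's implementation: `decodeBase32Sketch` and `crockfordLookup` are hypothetical names, a plain array of numbers stands in for a ByteArray, `RangeError` stands in for the ValueError the proposal names, and trailing bits that do not fill a whole byte are simply dropped.

```javascript
// Doug Crockford's base 32 alphabet (it omits I, L, O, and U).
const CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Normalizes ambiguous characters: "o" to 0, "l" to 1, lower case to upper case.
function crockfordLookup(ch) {
  const upper = ch.toUpperCase();
  if (upper === "O") return "0";
  if (upper === "L") return "1";
  return upper;
}

// Hypothetical helper: decodes base 32 text into an array of byte values.
function decodeBase32Sketch(text, alphabet = CROCKFORD_ALPHABET, lookup = crockfordLookup) {
  const bytes = [];
  let buffer = 0; // pending bits, most significant first
  let bits = 0;   // number of pending bits in buffer
  for (const ch of text) {
    const digit = alphabet.indexOf(lookup(ch));
    if (digit < 0) {
      // Stand-in for the ValueError required by the proposal.
      throw new RangeError("character not in alphabet: " + ch);
    }
    buffer = (buffer << 5) | digit; // each character carries 5 bits
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return bytes;
}
```

With the default lookup, `decodeBase32Sketch("zz")` and `decodeBase32Sketch("ZZ")` yield the same bytes, while a character such as "U" that has no place in the alphabet raises the error.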
Array
Prototype Properties
- toByteArray(charset)
- Returns a ByteArray of these Unicode code points as the corresponding bytes in the given character set. Must throw an error if the given character set is not supported.
- toByteString(charset)
- Returns a ByteString of these Unicode code points as the corresponding bytes in the given character set. Must throw an error if the given character set is not supported.
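As an illustration of converting an Array of code points to bytes, the sketch below handles only "utf-8" and returns a `Uint8Array` in place of a ByteArray. The function name `codePointsToBytes` is hypothetical; `String.fromCodePoint` and `TextEncoder` are standard platform facilities, assumed to be available in the host.

```javascript
// Hypothetical helper: encodes an array of Unicode code points as UTF-8 bytes.
function codePointsToBytes(codePoints, charset) {
  // Charset names are compared case-insensitively.
  if (String(charset).toLowerCase() !== "utf-8") {
    throw new Error("unsupported charset: " + charset);
  }
  const text = String.fromCodePoint(...codePoints);
  return new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
}

// U+0048 needs one byte in UTF-8, U+00E9 needs two.
const encoded = codePointsToBytes([0x48, 0xe9], "UTF-8");
```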
Number
Prototype Properties
- toByteArray(width, endian_opt) ByteArray
- Returns this as a ByteArray of the given width, extending the most significant bits of a negative Number arithmetically. The endian parameter may be omitted, undefined, "LE" or "BE". If omitted or undefined, the endian parameter defaults to "BE". If the endianness is "BE", the bytes are ordered from most to least significant, and if the endianness is "LE", they are ordered from least to most significant. If this Number translates to bits that cannot be represented
- toByteString(width, endian_opt) ByteString
- Returns this as a ByteString of the given width, extending the most significant bits of a negative Number arithmetically. The endian parameter may be omitted, undefined, "LE" or "BE". If omitted or undefined, the endian parameter defaults to "BE". If the endianness is "BE", the bytes are ordered from most to least significant, and if the endianness is "LE", they are ordered from least to most significant. If this Number translates to bits that cannot be represented
- toBitArray(length) BitArray
- Returns this as a BitArray of the given length, extending the most significant bits of a negative Number arithmetically. The bits are ordered from most to least significant.
- toBitString(length) BitString
- Returns this as a BitString of the given length, extending the most significant bits of a negative Number arithmetically. The bits are ordered from most to least significant.
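The width, sign extension, and byte order rules for Number conversions can be sketched as below. `numberToBytes` is a hypothetical name and a plain array stands in for a ByteArray. Values too large for the width are simply truncated here, which is an assumption of the sketch rather than a requirement of the proposal.

```javascript
// Hypothetical helper: two's complement bytes of an integer, in a given width.
function numberToBytes(value, width, endian = "BE") {
  const order = String(endian).toUpperCase(); // endianness is case-insensitive
  if (order !== "BE" && order !== "LE") {
    throw new TypeError("endian must be \"BE\" or \"LE\"");
  }
  const bytes = [];
  let rest = Math.trunc(value);
  for (let i = 0; i < width; i++) {
    // Take the low 8 bits; the double modulo keeps the result in 0..255
    // for negative numbers as well.
    bytes.push(((rest % 256) + 256) % 256);
    // Flooring division acts as an arithmetic shift, so negative numbers
    // keep producing 0xff bytes (sign extension).
    rest = Math.floor(rest / 256);
  }
  // bytes were collected least significant first
  return order === "BE" ? bytes.reverse() : bytes;
}

const big = numberToBytes(258, 4);          // 0, 0, 1, 2
const little = numberToBytes(258, 4, "le"); // 2, 1, 0, 0
const negative = numberToBytes(-2, 2);      // 255, 254
```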
General Requirements
None of the properties specified on prototypes are [[Enumerable]] on instances.
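Where an implementation adds these properties from JavaScript, the non-enumerable requirement could be met with `Object.defineProperty`. The sketch below uses a hypothetical `defineHidden` helper and a throwaway `Sample` type rather than touching any built-in prototype.

```javascript
// Hypothetical helper: installs a method that for-in loops will not visit.
function defineHidden(proto, name, fn) {
  Object.defineProperty(proto, name, {
    value: fn,
    enumerable: false,
    writable: true,
    configurable: true,
  });
}

function Sample(items) {
  this.items = items;
}
defineHidden(Sample.prototype, "valueAt", function (offset) {
  return this.items[offset];
});

const sample = new Sample([10, 20, 30]);
const visibleKeys = [];
for (const key in sample) visibleKeys.push(key);
// visibleKeys contains "items" only; sample.valueAt(1) still works
```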
Any operation that requires encoding, decoding, or transcoding among charsets may throw an error if that charset is not supported by the implementation. All implementations MUST support "us-ascii", "utf-8", and "utf-16".
Charset strings are as defined by IANA http://www.iana.org/assignments/character-sets.
Charsets are case insensitive.
Endianness arguments are case insensitive.
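A minimal sketch of a charset support check that honors case insensitivity follows. `supportsSketch` is a hypothetical name, and the set below contains only the three charsets every implementation must support; a real implementation would presumably recognize more.

```javascript
// The charsets that the proposal requires of all implementations.
const REQUIRED_CHARSETS = new Set(["us-ascii", "utf-8", "utf-16"]);

// Hypothetical helper: case-insensitive membership test.
function supportsSketch(charset) {
  return REQUIRED_CHARSETS.has(String(charset).toLowerCase());
}
```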
Rationale
The idea of using the particular ByteString and ByteArray types and their respective names originated with Jason Orendorff in the Binary API Brouhaha discussion.
Structures
This proposal is not an object-oriented variation on the pack, unpack, and calcsize functions of Perl, PHP, Python, or Ruby, with notions of inherent endianness, read/write head position, or intrinsic codec or charset information. The objects described in this proposal are merely for the storage and direct manipulation of strings and arrays of byte data. Some object-oriented conveniences are made, but implementing pack, unpack, or an object-oriented analog thereof is left as an exercise for a future proposal of a more abstract type or a 'struct' module (as mentioned by Ionut Gabriel Stan on the list). This goes against most mentioned prior art.
Encoding Methods
This proposal also does not provide separate member functions for any particular subset of the possible charsets, encodings, compression algorithms, or consistent hash digests that might operate on a byte string or array. For example, "toBase64", "toMd5", and "toUtf8" are not specified. Instead, "toString" accepts IANA charset names and radix numbers for charsets and encodings. The intent is that implementations will opt to make this extensible by falling back to an "encodings" module, searching for modules by the same name, and calling "encode" or "decode" exports on those modules if they exist (as supported originally by Robert Schultz, Davey Waterson, Ross Boucher, and tacitly myself, Kris Kowal, on the First proposition thread on the mailing list). This proposal does not address the need for stream objects to support pipelined codecs and hash digests (mentioned by Tom Robinson and Robert Schultz in the same conversation).
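The fallback to named encoding modules could look something like the following. Everything here is hypothetical: a `Map` stands in for module lookup, `bytesToString` stands in for a byte collection's "toString", and the single registered codec handles ISO-8859-1, where each byte value equals its code point.

```javascript
// Stand-in for an "encodings" module search; a real system might load
// a module by name instead of consulting a Map.
const encodingModules = new Map([
  ["iso-8859-1", {
    decode: (bytes) => String.fromCharCode(...bytes),
    encode: (text) => Array.from(text, (ch) => {
      const code = ch.charCodeAt(0);
      if (code > 0xff) throw new RangeError("not representable: " + ch);
      return code;
    }),
  }],
]);

// Hypothetical dispatcher: finds a codec by charset name and decodes with it.
function bytesToString(bytes, charset, modules = encodingModules) {
  const codec = modules.get(String(charset).toLowerCase());
  if (!codec || typeof codec.decode !== "function") {
    throw new Error("unsupported charset: " + charset);
  }
  return codec.decode(bytes);
}
```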
[[Get]] and [[Put]]
This proposal also reflects both group sentiment and a pragmatic point about properties. This isn't a decree that properties like "length" should be consistently used throughout the CommonJS APIs. However, given that all platforms support properties at the native level (to host String and Array objects) and that byte strings and arrays will require support at the native level, pursuing client-side interoperability is beyond the scope of this proposal and therefore properties have been specified. (See comments by Kris Zyp about the implementability of properties in all platforms, comments by Davey Waterson from Aptana about the counter-productivity of attempting to support this API in browsers, and support for properties over accessor and mutator functions by Ionut Gabriel Stan and Cameron McCormack on the mailing list.)
Future Proofing
This proposal suggests that the specified data types should be exported by a "binary" module. The intent is that eventually ECMAScript will specify native types to replace these, and these new types would be hosted as "primordials", free variables available in all top-level scopes. Because these types are specified to be exported by the "binary" module, there is some ambiguity about whether "instanceof" relationships would be maintained when references are passed among global contexts. It would be valuable for these identities to be preserved. For module systems that share a common global scope, this would suggest that the binary types should be patched into some commonly shared object, like the global scope, only if they do not yet exist. That would permit the first, permissive context to construct the types, and all subsequent sandboxes to share them. Secure sandboxes would have to make other accommodations.
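The "patch only if absent" idea can be sketched as follows. `installOnce` is a hypothetical helper and a plain object stands in for the shared global scope.

```javascript
// Hypothetical helper: creates a shared type the first time it is requested
// and returns the existing one on every later request.
function installOnce(scope, name, create) {
  if (!Object.prototype.hasOwnProperty.call(scope, name)) {
    scope[name] = create();
  }
  return scope[name];
}

const sharedScope = {}; // stand-in for a commonly shared object
const fromFirstContext = installOnce(sharedScope, "ByteString", () => function ByteString() {});
const fromLaterSandbox = installOnce(sharedScope, "ByteString", () => function ByteString() {});
// Both contexts hold the same function, so identity checks agree.
```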
Memory Optimization
In contrast to Maciej Stachowiak's proposal, there are no methods for explicitly managing the mutability of the underlying buffer. Since it is possible for implementations to explicitly use ownership and copy-on-write flags, it is desirable to keep the memory management beneath the surface of JavaScript. For example, with freezing semantics as proposed by Maciej, freezing a mutable byte array could produce an immutable byte string that usurps ownership of the original byte array's buffer, so there would be no need for allocation. However, this would render the original byte array unusable, all persisting references to that array would be broken, and we would have to define failure modes for that object. Alternatively, the byte array and byte string could share the buffer. The byte string would be guaranteed to not modify the content of the buffer, and if the owner of the original byte array were to perform a modifying operation on the byte array, the implementation could at that time incur the cost of copying the buffer. Similar optimizations could be applied by all ByteString and ByteArray constructors and conversion functions.
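The shared-buffer alternative amounts to copy-on-write. The sketch below models it in JavaScript for illustration only; a real engine would do this natively. `CowByteArray` is a hypothetical name, and the object returned by its `toByteString` is a minimal read-only view rather than a real ByteString.

```javascript
// Hypothetical mutable byte collection with copy-on-write sharing.
class CowByteArray {
  constructor(values) {
    this.buffer = Uint8Array.from(values);
    this.shared = false; // true while a read-only view may alias this.buffer
  }

  get(index) {
    return this.buffer[index];
  }

  set(index, value) {
    if (this.shared) {
      // First write after sharing: pay for the copy now.
      this.buffer = this.buffer.slice();
      this.shared = false;
    }
    this.buffer[index] = value;
  }

  // Returns a read-only view that aliases the current buffer without copying.
  toByteString() {
    const buffer = this.buffer;
    this.shared = true;
    return { length: buffer.length, get: (index) => buffer[index] };
  }
}

const mutable = new CowByteArray([1, 2, 3]);
const frozenView = mutable.toByteString(); // no allocation of a new buffer
mutable.set(0, 9);                         // triggers the deferred copy
// frozenView.get(0) is still 1, while mutable.get(0) is 9
```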
Genericity
In accordance with Daniel Friesen's Binary/C, a high priority in this proposal was duck typing.
- a generic way to get a value in the Content type of a ByteArray, ByteString, or String
- [[Get]]
- a generic way to get the type of what is returned by [[Get]] for ByteArray, ByteString, and String (Number, ByteString, and String respectively)
- Content
- a generic way to get a ByteString for the value at an offset for either ByteArray or ByteString
- byteStringAt(offset) ByteString
- a generic way to get a ByteArray for the value at an offset for either ByteArray or ByteString
- would have been wasteful to include since byte arrays are meant for assembling large byte buffers through copying data from other binary collections, not for allocation churn.
- a generic way to get a Number for the value at an offset for any ByteArray, ByteString, Array, or String
- valueAt(offset) Number
- a generic way to get an object of length one in the same type that contains just the value at a given offset for any ByteArray, ByteString, Array, or String
- slice(offset, offset + 1)
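The last item in the list above is the easiest to demonstrate with today's built-in types. In this sketch, `unitAt` is a hypothetical helper, and `Uint8Array` stands in for the binary types because it already shares `slice` with String and Array.

```javascript
// Hypothetical helper: returns a collection of length one, of the same type
// as the input, holding the value at the given offset.
function unitAt(collection, offset) {
  return collection.slice(offset, offset + 1);
}

const fromString = unitAt("abc", 1);                 // "b"
const fromArray = unitAt([1, 2, 3], 1);              // [2]
const fromBytes = unitAt(Uint8Array.of(1, 2, 3), 1); // Uint8Array holding 2
```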
The following methods and properties are interoperable among ByteArray, ByteString, Array, and String:
- [[Get]]
- indexOf
- lastIndexOf
- slice
The following methods and properties are interoperable among ByteArray, ByteString, and String:
- [[Get]]
- Content
- valueAt
- join
- split
- splitRight
- indexOf
- lastIndexOf
- slice
The following methods and properties are interoperable between ByteArray and ByteString:
- [[Get]]
- Content
- valueAt
- byteStringAt
- indexOf
- lastIndexOf
- slice
- copy
- range
The following methods and properties are interoperable between ByteString and String:
- [[Get]]
- Content
- valueAt
- indexOf
- lastIndexOf
- slice
- substr
- substring
The following methods are interoperable between ByteArray and Array:
- [[Get]]
- indexOf
- lastIndexOf
- slice
- forEach
- every
- some
- map
Idempotence
Binary/B had a "decodeToString" method and "toString" was required to be distinct, to avoid decoding and encoding hazards. However, in keeping with the existing Number.prototype.toString(radix), for subjective aesthetic reasons having worked with prototypes of both, and because it is easier to remember which way encoding and decoding go with "toString", "toByteString", and "toByteArray" than with "encode" and "decode", this proposal eschews "decodeToString". This has the side effect of making the generic "toByteString" and "toString" methods idempotent for converting to and from a charset. For example, if you receive an object that may be a ByteString or a String, but if it is a String it will need to be converted to a ByteString with a given charset, you can simply call "toByteString" with that charset, even repeatedly. Likewise, if you receive a ByteString or a String and you need it to be a String, you can simply call "toString" with the desired charset, allowing a succession of adapters or decorators to make the conversion at any point along the way.
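The idempotence described here can be sketched with a free function. `toBytesSketch` is a hypothetical name, `Uint8Array` stands in for ByteString, and only "utf-8" is handled, via the standard `TextEncoder`.

```javascript
// Hypothetical helper: converts a string to bytes, or returns bytes unchanged.
function toBytesSketch(value, charset) {
  if (value instanceof Uint8Array) {
    return value; // already bytes, so the charset is ignored
  }
  if (String(charset).toLowerCase() !== "utf-8") {
    throw new Error("unsupported charset: " + charset);
  }
  return new TextEncoder().encode(String(value));
}

const once = toBytesSketch("h\u00e9", "utf-8");
const twice = toBytesSketch(once, "utf-8");
// Applying the conversion again changes nothing: twice is the same object.
```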
Binary Object
This proposal omits the "Binary" type from Binary/B in anticipation that "ByteArray" and "ByteString" may become native types, in which case they would not be objects. Presently, any "String" constructed with the "new" keyword is not a native string, but a "boxed" wrapper object for a String.
assert.equals(typeof "", "string");
assert.equals("" instanceof String, false);
assert.equals(new String() instanceof String, true);
assert.equals(typeof new String(), "object");
assert.equals(new String().slice() instanceof String, false);
assert.equals(Object.prototype.toString.call(""), "[object String]");
This anticipates that the following would be true in an ECMA specification:
assert.equals(typeof "", "byteString");
assert.equals("" instanceof ByteString, false);
assert.equals(new ByteString() instanceof ByteString, true);
assert.equals(typeof new ByteString(), "object");
assert.equals(new ByteString().slice() instanceof ByteString, false);
assert.equals(Object.prototype.toString.call(""), "[object ByteString]");
This means that this specification does not provide a facility for checking whether a reference is either of the binary types: byte string or byte array, any more than it provides a direct facility for checking whether an object is either a String or a ByteString. However, it should serve in practice to call "toByteString(charset)" or "toString(charset)" to coerce either type to "ByteString" or "String". If the "charset" is not a useful argument, as it would be to convert a byte string to a byte string, this specification mandates that it be ignored.
This specification also deliberately avoids specifying whether the "typeof" operator on binary types should return "object" or some other value, and whether "instanceof" between a binary instance and its factory function returns true, as it is anticipated that this will be false in the future, but may be onerous to prevent from being true in the short term. This is to permit CommonJS implementations to take a breadth of approaches to providing the type today. This specification, however, does require constructing binary types with "new" to throw an error, in anticipation that that code pattern will eventually return a boxed type.
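One way an implementation written in JavaScript might reject construction with "new" is to check `new.target`. In this sketch, `ByteStringSketch` is a hypothetical factory and a frozen array stands in for an immutable byte string.

```javascript
// Hypothetical factory: callable as a function, but not constructible.
function ByteStringSketch(values) {
  if (new.target) {
    throw new TypeError("ByteStringSketch must not be called with new");
  }
  return Object.freeze(Array.from(values || []));
}

const plainCall = ByteStringSketch([1, 2, 3]); // works

let newError = null;
try {
  new ByteStringSketch([1, 2, 3]);
} catch (error) {
  newError = error; // a TypeError
}
```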
Miscellaneous
"ByteString" does not implement "toUpperCase" or "toLowerCase" since they are not meaningful without the context of a charset.
Unlike the "Array", the "ByteArray" is not variadic so that its initial length constructor is not ambiguous with its copy constructor.
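The ambiguity being avoided is visible in the built-in Array constructor, where one numeric argument means a length and several arguments mean contents. `Uint8Array`, used below only as a familiar non-variadic analog, separates the two cases by argument type instead.

```javascript
// Variadic Array: the meaning depends on how many arguments are passed.
const byLength = new Array(3);    // length 3, no elements set
const byValues = new Array(3, 4); // length 2, elements 3 and 4

// Non-variadic analog: a number is always a length, a collection is always copied.
const zeroed = new Uint8Array(3);   // three zero bytes
const copied = new Uint8Array([3]); // one byte with the value 3
```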
The Binary/B proposal, at Ash Berlin's recommendation, had split methods on both ByteStrings and ByteArrays that accepted, as their optional second argument, an object of options for both the number of delimiters to match and whether to include the delimiter on the right side of each term, presumably including a terminal delimiter that would prevent an empty collection from being returned as the last value. This has been left as an exercise for a byte reader stream's "readLine(delimiter)" method, so that this proposal's split and splitRight methods may more closely resemble their cousins on existing Strings in JavaScript and other languages.
The "join" methods on "ByteArray" and "ByteString" differ from what you would expect in JavaScript based on Array joins, in a way that will be familiar, and probably upsetting, coming from Python. The delimiter is the left side of the expression. This is because the Array joining method cannot be practically extended to determine whether to return a String, ByteString, or ByteArray based on the types of all of the values it contains. It is far more practical to multiplex the return type based on the type of the left-hand side of the expression.
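A sketch of a delimiter-first join, where the delimiter's type selects the return type, follows. `joinSketch` is a hypothetical free function rather than the proposal's method, and `Uint8Array` stands in for the binary types.

```javascript
// Hypothetical helper: the type of the delimiter decides the type of the result.
function joinSketch(delimiter, parts) {
  if (typeof delimiter === "string") {
    return parts.join(delimiter); // String delimiter gives a String
  }
  // Byte delimiter gives bytes: interleave the delimiter between the parts.
  const out = [];
  parts.forEach((part, index) => {
    if (index > 0) out.push(...delimiter);
    out.push(...part);
  });
  return Uint8Array.from(out);
}

const text = joinSketch(", ", ["a", "b"]); // "a, b"
const bytes = joinSketch(Uint8Array.of(0), [Uint8Array.of(1, 2), Uint8Array.of(3)]);
// bytes holds 1, 2, 0, 3
```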
This proposal calls for extensions to existing types. This does not mean that this specification encourages the practice of monkey-patching. Ideally, this proposal would be implemented at the very lowest levels of each engine, augmenting the existing primordial types before a line of JavaScript is evaluated. However, this specification does not prevent implementations from performing these augmentations either as a stop-gap or as standard practice (as in V8) in JavaScript as part of the engine's bootstrapping process. This suggests that it is not the responsibility of the "binary" module to construct these types, but merely to host them. If multiple module systems share the same primordial objects, instantiating the binary module should not cause these types to be recreated, nor cause them to stop being referentially identical.
The "Content" property begins with a capital letter to distinguish it as a factory method like the constructor function to which it always refers, whether ByteString, ByteArray, or Number. To date, this remains a point of discussion. Daniel Friesen's proposal Binary/C uses the name "contentConstructor".
Relevant Discussions
- ByteArray and ByteString proposal
- ByteArray: byteAt method
- Binary/B Extension Proposals and Implementation Notes
- Binary D ready for review
- Binary/D Draft 2
Todo
- chronicle the rationale for bit and byte types supported at this architecture layer
- write a section on what kinds of additional requirements could be expected in a future revision of the specification