Strings

Text fields come in two layouts: a fixed-size string stored inline on the stack, and a variable-length string stored on the heap behind a pointer.

Encodings

| Encoding | Notes |
| --- | --- |
| utf8 | UTF-8 |
| ascii | 7-bit ASCII |
| latin1 | 8-bit Latin-1 |
| utf16le | UTF-16, little-endian |

Fixed sizes are byte counts

A fixed encoding[N] size is a number of bytes, not characters. utf8[4] reserves 4 bytes, and a value that encodes to more than 4 bytes is truncated. For content outside 7-bit ASCII, size a fixed field to the encoded byte length, not the character count. Variable-length (*) strings have no fixed size and never truncate.
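
Because the limit is counted in bytes, it can help to check a value's encoded length before writing it into a fixed field. The helpers below are a hypothetical sketch and not part of xstruct; they use only the standard `TextEncoder` (which always produces UTF-8) and assume the `utf8[N]` rule described above.

```typescript
// Hypothetical helpers (not part of xstruct): measure a value in UTF-8 bytes
// and check whether it fits a utf8[N] field without truncation.
const utf8Encoder = new TextEncoder(); // standard API, always encodes to UTF-8

function utf8ByteLength(value: string): number {
    return utf8Encoder.encode(value).length;
}

function fitsUtf8Field(value: string, size: number): boolean {
    return utf8ByteLength(value) <= size;
}

utf8ByteLength('cafe');        // 4 bytes for 4 characters
utf8ByteLength('caf\u00e9');   // 5 bytes for 4 characters ('é' takes 2 bytes)
fitsUtf8Field('caf\u00e9', 4); // false: 'café' would be truncated in utf8[4]
fitsUtf8Field('caf\u00e9', 5); // true
```

A check like this belongs at the boundary where untrusted text enters a fixed-width record.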

Fixed-size

encoding[N] reserves exactly N bytes inline on the stack.

ts
import { Struct } from '@remotex-labs/xstruct';

const record = new Struct<{ code: string; title: string }>({
    code: 'ascii[4]',  // 4 bytes
    title: 'utf8[32]'  // 32 bytes
});

A value shorter than N bytes is zero-padded, and those padding bytes are part of the decoded string. A value longer than N bytes is truncated. Choose N to fit the encoded byte length of your longest expected value.

ts
const s = new Struct<{ tag: string }>({ tag: 'utf8[4]' });

s.toObject(s.toBuffer({ tag: 'ABCD' })).tag; // 'ABCD'  (fills exactly)
s.toObject(s.toBuffer({ tag: 'AB' })).tag;   // 'AB\0\0'  (zero-padded)
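
If the padding is not wanted in application code, one option is to strip trailing NUL characters after decoding. This is a sketch built around a hypothetical helper, `stripZeroPadding`, which is not an xstruct function; it assumes that real values never end in `\0`.

```typescript
// Hypothetical helper (not part of xstruct): drop trailing zero padding from
// a decoded fixed-size string. Assumes valid values never end in '\0'.
function stripZeroPadding(decoded: string): string {
    return decoded.replace(/\0+$/, '');
}

stripZeroPadding('AB\0\0'); // 'AB'
stripZeroPadding('ABCD');   // 'ABCD' (nothing to strip)
stripZeroPadding('A\0B\0'); // 'A\0B' (only trailing NULs are removed)
```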

Variable-length (heap)

Prefix with * to store the string on the heap behind a pointer. The stack holds one pointerSize-byte slot; the bytes live in the heap region appended after the struct. This is the form to use when the length is not known in advance.

ts
const user = new Struct<{ name: string }>({ name: '*utf8' });

user.toObject(user.toBuffer({ name: 'Ada Lovelace' })).name; // 'Ada Lovelace'

The payload is sized by its encoded byte length, so any encoding round-trips without truncation, including multi-byte UTF-8 and UTF-16:

ts
const u = new Struct<{ t: string }>({ t: '*utf8' });

u.toObject(u.toBuffer({ t: 'café ☕' })).t; // 'café ☕'
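
To get a feel for how large such a payload is, the encoded length of a value can be computed with standard APIs. The sketch below is independent of xstruct and says nothing about any header or terminator the library may add; `utf8Bytes` and `utf16leBytes` are hypothetical helpers, and the second relies on JavaScript strings being sequences of UTF-16 code units.

```typescript
// Encoded byte length of a string, using only standard APIs.
function utf8Bytes(value: string): number {
    return new TextEncoder().encode(value).length;
}

// JavaScript strings are UTF-16 code units, so UTF-16 needs 2 bytes per unit.
function utf16leBytes(value: string): number {
    return value.length * 2;
}

const text = 'caf\u00e9 \u2615'; // 'café ☕'
utf8Bytes(text);    // 9  (c, a, f, space: 1 byte each; é: 2 bytes; ☕: 3 bytes)
utf16leBytes(text); // 12 (6 code units, 2 bytes each)
```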

See Heap & Pointers for how pointer payloads are stored and how pointerSize bounds their length.

String arrays

A second [M] after a fixed string repeats it M times. Each element keeps the same fixed byte width.

ts
const codes = new Struct<{ list: string[] }>({ list: 'ascii[4][2]' });
//                                                      ^^^^ ^^^
//                                          4 bytes each, 2 elements

codes.toObject(codes.toBuffer({ list: [ 'ABCD', 'EFGH' ] })).list;
// [ 'ABCD', 'EFGH' ]
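
Assuming the M elements are laid out back to back (an assumption; this page only states that each element keeps the same width), element i starts at byte offset i × N and the whole field occupies N × M bytes. The sketch below illustrates that arithmetic on a plain `Uint8Array`, without xstruct; `splitFixedAscii` is a hypothetical helper.

```typescript
// Hypothetical helper: split a buffer of back-to-back, fixed-width ASCII
// strings into its elements. Element i covers bytes [i * width, (i + 1) * width).
function splitFixedAscii(bytes: Uint8Array, width: number): string[] {
    const out: string[] = [];
    for (let offset = 0; offset + width <= bytes.length; offset += width) {
        const slice = bytes.subarray(offset, offset + width);
        // One byte maps to one ASCII character.
        out.push(Array.from(slice, (b) => String.fromCharCode(b)).join(''));
    }
    return out;
}

// Under this assumption, 'ascii[4][2]' occupies 4 * 2 = 8 bytes.
const raw = new TextEncoder().encode('ABCDEFGH');
splitFixedAscii(raw, 4); // [ 'ABCD', 'EFGH' ]
```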

A pointer array *utf8[M] is M pointer slots, each addressing its own variable-length heap string.

ts
const tags = new Struct<{ tags: string[] }>({ tags: '*utf8[3]' });

tags.toObject(tags.toBuffer({ tags: [ 'aa', 'bbbb', 'c' ] })).tags;
// [ 'aa', 'bbbb', 'c' ]

See Arrays.

Choosing a layout

| Need | Use |
| --- | --- |
| Constant record width | utf8[N] (fixed-size) |
| Variable-length text | *utf8 (heap pointer) |
| Fixed list of fixed strings | utf8[N][M] |
| List of variable strings | *utf8[M] |
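
The table can also be read as a small decision rule with two inputs: whether the byte width is fixed, and whether the field is a list. The function below is a hypothetical convenience, not an xstruct API; it only builds the descriptor strings shown on this page.

```typescript
// Hypothetical helper (not part of xstruct): build a utf8 descriptor string.
// `bytes`: fixed byte width per string, or undefined for variable length.
// `count`: number of list elements, or undefined for a single string.
function utf8Descriptor(bytes?: number, count?: number): string {
    const base = bytes === undefined ? '*utf8' : `utf8[${bytes}]`;
    return count === undefined ? base : `${base}[${count}]`;
}

utf8Descriptor(32);           // 'utf8[32]'    constant record width
utf8Descriptor();             // '*utf8'       variable-length text
utf8Descriptor(4, 2);         // 'utf8[4][2]'  fixed list of fixed strings
utf8Descriptor(undefined, 3); // '*utf8[3]'    list of variable strings
```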

Released under the Mozilla Public License 2.0