Interpret bytes as a string.

[plum.str] Module Reference

The plum.str module provides the StrX transform, which converts strings into bytes and bytes into strings. This reference page demonstrates how to create and use StrX transforms and provides API details.

The examples shown on this page require the following setup:

>>> from plum.bigendian import uint8
>>> from plum.str import StrX
>>> from plum.structure import member, sized_member, Structure
>>> from plum.utilities import pack, unpack

Basic Usage

The StrX transform accepts the following arguments:

encoding: codecs name (e.g. “ascii”, “utf-8”)
errors: codecs error handling (e.g. “strict”)
nbytes: format size in bytes
pad: pad byte
zero_termination: zero termination byte present
name: transform name (for representations including dump format column)

The encoding parameter accepts the name of any standard codecs encoding and controls the conversion between strings and bytes. The nbytes argument accepts any positive integer and sets the expected size, in bytes, of the converted string:

>>> ascii_5 = StrX(encoding="ascii", nbytes=5)

Then pass the transform as the format to the pack() and unpack() utility functions, or use it within other high-level transforms:

>>> fmt = [ascii_5, ascii_5]
>>>
>>> unpack(fmt, b'HelloWorld')
['Hello', 'World']
>>>
>>> pack(['Hello', 'World'], fmt)
b'HelloWorld'

The errors parameter accepts “strict”, “ignore”, “replace”, or any other name registered with the codecs.register_error() function. It defaults to “strict”, meaning that encoding errors raise a UnicodeError.
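
Because these handler names come from Python's standard codecs machinery, their effect can be previewed with plain str.encode(), independent of plum. The following is a minimal sketch using only the standard library; the variable names are illustrative and it does not show how StrX applies the handler internally:

```python
text = "café"  # contains one character outside the ASCII range

# "strict" refuses to encode the non-ASCII character.
try:
    text.encode("ascii", errors="strict")
    strict_failed = False
except UnicodeEncodeError:  # a subclass of UnicodeError
    strict_failed = True

# "replace" substitutes "?" for the character; "ignore" drops it.
replaced = text.encode("ascii", errors="replace")
ignored = text.encode("ascii", errors="ignore")

print(strict_failed, replaced, ignored)
```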

Pad Byte

For strings that occupy a fixed number of bytes, but do not always completely fill them, the pad parameter accepts a single byte. When packing, it fills in any remaining space with as many pad bytes as needed. When unpacking, a pad byte signals the early end of the string:

>>> padded_ascii = StrX(encoding="ascii", nbytes=8, pad=b"\x00")
>>>
>>> fmt = [padded_ascii, padded_ascii]
>>>
>>> unpack(fmt, b'Hello\x00\x00\x00World!\x00\x00')
['Hello', 'World!']
>>>
>>> pack(['Hello', 'World!'], fmt)
b'Hello\x00\x00\x00World!\x00\x00'
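
To make the byte-level behavior concrete, the padding rule described above can be sketched with the standard library alone. The helpers pack_padded() and unpack_padded() are hypothetical names for illustration, not part of plum, and raising ValueError for an oversized string is an assumption of this sketch rather than documented StrX behavior:

```python
def pack_padded(text: str, nbytes: int, pad: bytes = b"\x00",
                encoding: str = "ascii") -> bytes:
    """Encode text and fill the remaining space with the pad byte."""
    data = text.encode(encoding)
    if len(data) > nbytes:
        # Assumption for this sketch: oversized strings are rejected.
        raise ValueError(f"string needs {len(data)} bytes, only {nbytes} available")
    return data.ljust(nbytes, pad)


def unpack_padded(buffer: bytes, pad: bytes = b"\x00",
                  encoding: str = "ascii") -> str:
    """Decode the bytes that precede the first pad byte, if any."""
    data, _, _ = buffer.partition(pad)
    return data.decode(encoding)


packed = pack_padded("Hello", 8)
print(packed, unpack_padded(packed))
```

Note that a string filling all nbytes carries no pad byte at all, which is why the unpacking helper falls back to decoding the whole buffer.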

Zero Termination

To allow the string size to vary, leave nbytes at its default of None and set zero_termination=True. This adds a zero termination byte at the end when packing, and uses the zero termination byte as the signal to stop unpacking:

>>> ascii_zt = StrX(encoding="ascii", zero_termination=True)
>>>
>>> fmt = [ascii_zt, ascii_zt]
>>>
>>> unpack(fmt, b'Hello\x00World!\x00')
['Hello', 'World!']
>>>
>>> pack(['Hello', 'World!'], fmt)
b'Hello\x00World!\x00'
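
The same framing can be sketched without plum to show where each string ends in the buffer. The function names below are hypothetical and the code is only an illustration of zero-terminated framing, not StrX's implementation:

```python
def pack_zero_terminated(strings, encoding="ascii"):
    """Encode each string and append a zero termination byte to it."""
    return b"".join(s.encode(encoding) + b"\x00" for s in strings)


def unpack_zero_terminated(buffer, count, encoding="ascii"):
    """Read `count` zero-terminated strings; return them and the bytes consumed."""
    strings = []
    offset = 0
    for _ in range(count):
        # bytes.index raises ValueError if no terminator remains.
        end = buffer.index(b"\x00", offset)
        strings.append(buffer[offset:end].decode(encoding))
        offset = end + 1  # skip past the terminator
    return strings, offset


buffer = pack_zero_terminated(["Hello", "World!"])
print(buffer, unpack_zero_terminated(buffer, 2))
```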

Sized Strings

When nbytes is left at its default of None (and zero termination is not enabled), the transform becomes “greedy”. When packing, the transform converts a string of any size into bytes. When unpacking, it consumes all remaining bytes and converts them into a string. Within a structure, the sized_member() function accepts this type of greedy string transform as its format and keeps its greed in check. When unpacking, the member limits the buffer bytes available to the transform to the size held by a separate member of the structure; the size argument of sized_member() identifies which member definition supplies that size.

>>> greedy_ascii = StrX(encoding="ascii")
>>> class SizedStruct(Structure):
...     size: int = member(fmt=uint8, compute=True)
...     string: str = sized_member(fmt=greedy_ascii, size=size)
...     bookend: int = member(fmt=uint8)
...
>>> struct = unpack(SizedStruct, b'\x0cHello World!\x99')
>>> struct.dump()
+--------+----------+----------------+-------------------------------------+-------------------------+
| Offset | Access   | Value          | Bytes                               | Format                  |
+--------+----------+----------------+-------------------------------------+-------------------------+
|        |          |                |                                     | SizedStruct (Structure) |
|  0     | size     | 12             | 0c                                  | uint8                   |
|        | string   |                |                                     | str (ascii)             |
|  1     |   [0:12] | 'Hello World!' | 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 |                         |
| 13     | bookend  | 153            | 99                                  | uint8                   |
+--------+----------+----------------+-------------------------------------+-------------------------+

Passing compute=True when defining the size member allows the size to be left uninitialized when constructing the structure. When packing, the size member is computed automatically, in this case from the length of the packed string member:

>>> struct = SizedStruct(string="Hello World!", bookend=0x99)
>>> struct.dump()
+--------+----------+----------------+-------------------------------------+-------------------------+
| Offset | Access   | Value          | Bytes                               | Format                  |
+--------+----------+----------------+-------------------------------------+-------------------------+
|        |          |                |                                     | SizedStruct (Structure) |
|  0     | size     | 12             | 0c                                  | uint8                   |
|        | string   |                |                                     | str (ascii)             |
|  1     |   [0:12] | 'Hello World!' | 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 |                         |
| 13     | bookend  | 153            | 99                                  | uint8                   |
+--------+----------+----------------+-------------------------------------+-------------------------+
>>> pack(struct)
b'\x0cHello World!\x99'

See the Sized Structure Member tutorial for additional features of the sized_member() function, such as specifying size ratios and offsets.
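
For readers who want to check the byte layout in the dumps above by hand, the size-prefixed format can be reproduced with the standard struct module. This is a sketch of the wire format only; pack_sized() and unpack_sized() are hypothetical helpers, not plum functions:

```python
import struct


def pack_sized(text: str, bookend: int) -> bytes:
    """Emit a one-byte length, the ASCII string, then a one-byte bookend."""
    data = text.encode("ascii")
    # "B" is one unsigned byte, matching the uint8 size and bookend members.
    return struct.pack("B", len(data)) + data + struct.pack("B", bookend)


def unpack_sized(buffer: bytes):
    """Read the length first, then only that many bytes for the string."""
    (size,) = struct.unpack_from("B", buffer, 0)
    text = buffer[1:1 + size].decode("ascii")
    (bookend,) = struct.unpack_from("B", buffer, 1 + size)
    return size, text, bookend


packed = pack_sized("Hello World!", 0x99)
print(packed, unpack_sized(packed))
```

The length prefix plays the role of the size member: it is derived from the encoded string when packing and bounds how many bytes the string may consume when unpacking.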

API Reference

class plum.str.StrX(encoding: str, errors: str = 'strict', nbytes: Optional[int] = None, pad: bytes = b'', zero_termination: bool = False, name: Optional[str] = None)

String to bytes and bytes to string transform.

name

Transform format name (for repr and dump “Format” column).

encoding

Codecs encoding name.

errors

Codecs error handling.

nbytes

Transform format size in bytes.

pad

Pad byte.

zero_termination

Zero termination byte present.

pack(value: Any) → bytes

Pack value as formatted bytes.

Raises: PackError if type error, value error, etc.

pack_and_dump(value: Any) → Tuple[bytes, plum.dump.Dump]

Pack value as formatted bytes and produce bytes summary.

Raises: PackError if type error, value error, etc.

unpack(buffer: bytes) → Any

Unpack value from formatted bytes.

Raises: UnpackError if insufficient bytes, excess bytes, or value error

unpack_and_dump(buffer: bytes) → Tuple[Any, plum.dump.Dump]

Unpack value from bytes and produce packed bytes summary.

Raises: UnpackError if insufficient bytes, excess bytes, or value error