Interpret bytes as a string.

[plum.str] Module Reference

The plum.str module provides the StrX transform, which converts strings to bytes and bytes to strings. This reference page demonstrates how to create and use a StrX transform and provides API details.

Basic Usage
Pad Byte
Zero Termination
Sized Strings
API Reference

The examples shown on this page require the following setup:

>>> from plum.bigendian import uint8
>>> from plum.str import StrX
>>> from plum.structure import member, sized_member, Structure
>>> from plum.utilities import pack, unpack

Basic Usage

The StrX transform accepts the following arguments:

encoding: codecs encoding name (e.g. “ascii”, “utf-8”)

errors: codecs error handling (e.g. “strict”)

nbytes: format size in bytes

pad: pad byte

zero_termination: zero termination byte present

name: transform name (for representations, including the dump “Format” column)

The encoding parameter accepts any name from the codecs module's standard encodings and controls how strings and bytes convert to one another. The nbytes argument accepts any positive integer and sets the expected size of the converted bytes:

>>> ascii_5 = StrX(encoding="ascii", nbytes=5)

Then use the transform as the format for the pack() and unpack() utility functions, or within other high-level transforms:

>>> fmt = [ascii_5, ascii_5]
>>>
>>> unpack(fmt, b'HelloWorld')
['Hello', 'World']
>>>
>>> pack(['Hello', 'World'], fmt)
b'HelloWorld'
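To make the fixed-size behavior concrete without the library, here is a minimal standard-library sketch of what a transform like ascii_5 does in spirit. The helpers pack_fixed() and unpack_fixed() are hypothetical and not part of plum; the sketch assumes every string must encode to exactly nbytes bytes and that the buffer must hold exactly the expected number of bytes, loosely mirroring the “insufficient bytes” and “excess bytes” errors listed in the API reference.

```python
def pack_fixed(values, nbytes, encoding="ascii", errors="strict"):
    """Hypothetical helper: encode each string, requiring exactly nbytes per value."""
    out = bytearray()
    for value in values:
        data = value.encode(encoding, errors)
        if len(data) != nbytes:
            raise ValueError(f"expected {nbytes} bytes, got {len(data)}")
        out += data
    return bytes(out)


def unpack_fixed(buffer, count, nbytes, encoding="ascii", errors="strict"):
    """Hypothetical helper: slice count fixed-size fields from buffer and decode each."""
    if len(buffer) != count * nbytes:
        # covers both the too-few-bytes and too-many-bytes cases
        raise ValueError(f"expected {count * nbytes} bytes, got {len(buffer)}")
    return [
        buffer[i * nbytes:(i + 1) * nbytes].decode(encoding, errors)
        for i in range(count)
    ]


print(unpack_fixed(b"HelloWorld", 2, 5))   # ['Hello', 'World']
print(pack_fixed(["Hello", "World"], 5))   # b'HelloWorld'
```

The real transform reports failures through its own exception types (see the API reference); the ValueError here is only a stand-in.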

The errors parameter accepts “strict”, “ignore”, “replace”, or any other error handler name registered with the codecs.register_error() function. Error handling defaults to “strict”, meaning that encoding errors raise a UnicodeError.
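Because errors names a standard codecs error handler, its effect can be previewed with plain str.encode() and bytes.decode(). The sketch below assumes StrX passes the name through to the codec unchanged, which this page implies but does not spell out. The handler name “dash” and the function dash_handler() are hypothetical, registered only for illustration.

```python
import codecs

text = "café"

# "strict" (the default) raises; UnicodeEncodeError is a subclass of UnicodeError
try:
    text.encode("ascii")
except UnicodeError as exc:
    print(type(exc).__name__)                  # UnicodeEncodeError

print(text.encode("ascii", errors="ignore"))   # b'caf'
print(text.encode("ascii", errors="replace"))  # b'caf?'

# decoding with "replace" substitutes U+FFFD for each undecodable byte
decoded = b"caf\xe9".decode("ascii", errors="replace")  # 'caf\ufffd'


def dash_handler(exc):
    # replace each unencodable character with "-" and resume after the bad span
    return ("-" * (exc.end - exc.start), exc.end)


codecs.register_error("dash", dash_handler)     # hypothetical handler name
print(text.encode("ascii", errors="dash"))      # b'caf-'
```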

Pad Byte

For strings that occupy a fixed number of bytes but do not always fill them completely, set the pad parameter to a single byte. When packing, the transform fills any remaining space with as many pad bytes as needed. When unpacking, a pad byte signals the early end of the string:

>>> padded_ascii = StrX(encoding="ascii", nbytes=8, pad=b"\x00")
>>>
>>> fmt = [padded_ascii, padded_ascii]
>>>
>>> unpack(fmt, b'Hello\x00\x00\x00World!\x00\x00')
['Hello', 'World!']
>>>
>>> pack(['Hello', 'World!'], fmt)
b'Hello\x00\x00\x00World!\x00\x00'
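The padding rule can be sketched with standard bytes methods. The helpers pack_padded() and unpack_padded() are hypothetical, not plum functions. This sketch simply ignores whatever follows the first pad byte when unpacking; whether plum also validates those trailing bytes is not stated on this page.

```python
def pack_padded(value, nbytes, pad=b"\x00", encoding="ascii"):
    """Hypothetical helper: encode value and fill the remaining space with pad bytes."""
    data = value.encode(encoding)
    if len(data) > nbytes:
        raise ValueError(f"{value!r} needs {len(data)} bytes, only {nbytes} available")
    return data.ljust(nbytes, pad)


def unpack_padded(field, pad=b"\x00", encoding="ascii"):
    """Hypothetical helper: decode a fixed-size field, stopping at the first pad byte."""
    end = field.find(pad)
    if end != -1:
        field = field[:end]
    return field.decode(encoding)


buffer = pack_padded("Hello", 8) + pack_padded("World!", 8)
print(buffer)                                            # b'Hello\x00\x00\x00World!\x00\x00'
print([unpack_padded(buffer[i:i + 8]) for i in (0, 8)])  # ['Hello', 'World!']
```

A string that exactly fills the field contains no pad byte, so unpack_padded() decodes the whole field in that case.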

Zero Termination

To allow the string size to vary, leave nbytes at its default of None and set zero_termination=True. When packing, the transform appends a zero termination byte; when unpacking, it stops at the zero termination byte:

>>> ascii_zt = StrX(encoding="ascii", zero_termination=True)
>>>
>>> fmt = [ascii_zt, ascii_zt]
>>>
>>> unpack(fmt, b'Hello\x00World!\x00')
['Hello', 'World!']
>>>
>>> pack(['Hello', 'World!'], fmt)
b'Hello\x00World!\x00'
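A standard-library sketch of the same zero-termination scheme follows. The helpers pack_zt() and unpack_zt() are hypothetical. Unpacking returns the offset just past the terminator so consecutive strings can be read from one buffer. Searching for a single zero byte is only safe for encodings where 0x00 cannot occur inside another character (such as ASCII or UTF-8); how plum treats wider encodings like UTF-16 is not covered here.

```python
def pack_zt(value, encoding="ascii"):
    """Hypothetical helper: encode value and append one zero termination byte."""
    return value.encode(encoding) + b"\x00"


def unpack_zt(buffer, offset=0, encoding="ascii"):
    """Hypothetical helper: decode up to the next zero byte.

    Returns (string, offset just after the terminator).
    """
    end = buffer.index(b"\x00", offset)  # raises ValueError if no terminator exists
    return buffer[offset:end].decode(encoding), end + 1


buffer = pack_zt("Hello") + pack_zt("World!")
first, pos = unpack_zt(buffer)
second, pos = unpack_zt(buffer, pos)
print(buffer)           # b'Hello\x00World!\x00'
print([first, second])  # ['Hello', 'World!']
```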

Sized Strings

When nbytes is left at its default of None, the transform becomes “greedy”: when packing, it converts a string of any length into bytes, and when unpacking, it consumes all remaining bytes and converts them into a string. Within a structure, the sized_member() function accepts this kind of greedy transform as its format and keeps its greed in check. When unpacking, the member limits the buffer bytes available for consumption to the size held by a separate member of the structure (the size argument of sized_member() specifies which member provides that size).

>>> greedy_ascii = StrX(encoding="ascii")

>>> class SizedStruct(Structure):
...     size: int = member(fmt=uint8, compute=True)
...     string: str = sized_member(fmt=greedy_ascii, size=size)
...     bookend: int = member(fmt=uint8)
...
>>> struct = unpack(SizedStruct, b'\x0cHello World!\x99')
>>> struct.dump()
+--------+----------+----------------+-------------------------------------+-------------------------+
| Offset | Access   | Value          | Bytes                               | Format                  |
+--------+----------+----------------+-------------------------------------+-------------------------+
|        |          |                |                                     | SizedStruct (Structure) |
|  0     | size     | 12             | 0c                                  | uint8                   |
|        | string   |                |                                     | str (ascii)             |
|  1     |   [0:12] | 'Hello World!' | 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 |                         |
| 13     | bookend  | 153            | 99                                  | uint8                   |
+--------+----------+----------------+-------------------------------------+-------------------------+

Passing compute=True when defining the size member allows the size member to be left uninitialized when constructing the structure. When the structure is packed, the size member is computed automatically, in this case from the length of the packed string member:

>>> struct = SizedStruct(string="Hello World!", bookend=0x99)
>>> struct.dump()
+--------+----------+----------------+-------------------------------------+-------------------------+
| Offset | Access   | Value          | Bytes                               | Format                  |
+--------+----------+----------------+-------------------------------------+-------------------------+
|        |          |                |                                     | SizedStruct (Structure) |
|  0     | size     | 12             | 0c                                  | uint8                   |
|        | string   |                |                                     | str (ascii)             |
|  1     |   [0:12] | 'Hello World!' | 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 |                         |
| 13     | bookend  | 153            | 99                                  | uint8                   |
+--------+----------+----------------+-------------------------------------+-------------------------+
>>> pack(struct)
b'\x0cHello World!\x99'
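For comparison, the wire layout of SizedStruct (a uint8 size, the string bytes, then a uint8 bookend) can be reproduced with the standard struct module. The helpers pack_sized() and unpack_sized() are hypothetical and only mimic the byte layout; they are not how plum implements sized_member().

```python
import struct


def pack_sized(string, bookend, encoding="ascii"):
    """Hypothetical helper: uint8 length prefix, encoded string, uint8 bookend."""
    data = string.encode(encoding)
    # the size is derived from the encoded length, similar in spirit to compute=True;
    # struct.pack raises struct.error if the string needs more than 255 bytes
    return struct.pack(">B", len(data)) + data + struct.pack(">B", bookend)


def unpack_sized(buffer, encoding="ascii"):
    """Hypothetical helper: read the size byte, then consume exactly that many bytes."""
    (size,) = struct.unpack_from(">B", buffer, 0)
    string = buffer[1:1 + size].decode(encoding)
    # raises struct.error if the buffer ends before the bookend
    (bookend,) = struct.unpack_from(">B", buffer, 1 + size)
    if len(buffer) != size + 2:
        raise ValueError("excess bytes after bookend")
    return size, string, bookend


packed = pack_sized("Hello World!", 0x99)
print(packed)                # b'\x0cHello World!\x99'
print(unpack_sized(packed))  # (12, 'Hello World!', 153)
```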

See the Sized Structure Member tutorial for additional features of the sized_member() function, such as specifying size ratios and offsets.

API Reference

class plum.str.StrX(encoding: str, errors: str = 'strict', nbytes: Optional[int] = None, pad: bytes = b'', zero_termination: bool = False, name: Optional[str] = None)

String to bytes and bytes to string transform.

name: Transform format name (for repr and dump “Format” column).

encoding: Codecs encoding name.

errors: Codecs error handling.

nbytes: Transform format size in bytes.

pad: Pad byte.

zero_termination: Zero termination byte present.

pack(value: Any) → bytes

Pack value as formatted bytes.

Raises:	`PackError` if type error, value error, etc.

pack_and_dump(value: Any) → Tuple[bytes, plum.dump.Dump]

Pack value as formatted bytes and produce bytes summary.

Raises:	`PackError` if type error, value error, etc.

unpack(buffer: bytes) → Any

Unpack value from formatted bytes.

Raises:	`UnpackError` if insufficient bytes, excess bytes, or value error.

unpack_and_dump(buffer: bytes) → Tuple[Any, plum.dump.Dump]

Unpack value from bytes and produce packed bytes summary.

Raises:	`UnpackError` if insufficient bytes, excess bytes, or value error.
