Interpret bytes as a string.
[plum.str] Module Reference
The plum.str module provides the StrX transform, which converts strings into
bytes and bytes into strings. This reference page demonstrates how to create
and use an StrX transform and provides API details.
The examples shown on this page require the following setup:
>>> from plum.bigendian import uint8
>>> from plum.str import StrX
>>> from plum.structure import member, sized_member, Structure
>>> from plum.utilities import pack, unpack
Basic Usage
The StrX transform accepts the following arguments:
encoding: codecs name (e.g. “ascii”, “utf-8”)
errors: codecs error handling (e.g. “strict”)
nbytes: format size in bytes
pad: pad byte
zero_termination: zero termination byte present
name: transform name (for representations including dump format column)
The encoding parameter accepts the name of any of the standard encodings
supported by the codecs module and controls the conversion format between
strings and bytes. The nbytes argument accepts any positive integer and sets
the expected size, in bytes, of the converted string:
>>> ascii_5 = StrX(encoding="ascii", nbytes=5)
Then use the transform as the format when calling the pack() and unpack()
utility functions or when using other high-level transforms:
>>> fmt = [ascii_5, ascii_5]
>>>
>>> unpack(fmt, b'HelloWorld')
['Hello', 'World']
>>>
>>> pack(['Hello', 'World'], fmt)
b'HelloWorld'
The errors parameter accepts “strict”, “ignore”, “replace”, or any other name
registered with the codecs.register_error() function. Error handling defaults
to “strict”, meaning that encoding errors raise a UnicodeError.
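The error handler names are the ones used by Python's own codecs machinery, so their effect can be previewed with plain str.encode() and bytes.decode() calls. The sketch below uses only the standard library and does not involve StrX. It assumes, without confirmation from this page, that the transform hands the errors value to the codec unchanged.

```python
# Plain standard-library behaviour; StrX is not used here.
text = "café"

# "strict" raises a UnicodeError subclass when a character cannot be encoded.
try:
    text.encode("ascii", errors="strict")
    strict_failed = False
except UnicodeError:
    strict_failed = True

# "ignore" drops the offending character; "replace" substitutes "?".
ignored = text.encode("ascii", errors="ignore")
replaced = text.encode("ascii", errors="replace")

# When decoding, "replace" substitutes U+FFFD for each undecodable byte.
decoded = b"caf\xe9".decode("ascii", errors="replace")

print(strict_failed, ignored, replaced, repr(decoded))
```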
Pad Byte
For strings that occupy a fixed number of bytes, but do not always completely
fill them, the pad parameter accepts a single byte. When packing, it fills in
any remaining space with as many pad bytes as needed. When unpacking, a pad
byte signals the early end of the string:
>>> padded_ascii = StrX(encoding="ascii", nbytes=8, pad=b"\x00")
>>>
>>> fmt = [padded_ascii, padded_ascii]
>>>
>>> unpack(fmt, b'Hello\x00\x00\x00World!\x00\x00')
['Hello', 'World!']
>>>
>>> pack(['Hello', 'World!'], fmt)
b'Hello\x00\x00\x00World!\x00\x00'
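The padding rule described above can be sketched in plain Python. This is an illustration of the semantics only, not plum's implementation: the helper names pack_padded and unpack_padded are hypothetical, and raising ValueError for an oversized string is an assumption (the API reference on this page lists PackError for the real transform).

```python
def pack_padded(text: str, nbytes: int, pad: bytes = b"\x00",
                encoding: str = "ascii") -> bytes:
    """Encode text and fill the remaining space with pad bytes (hypothetical helper)."""
    data = text.encode(encoding)
    if len(data) > nbytes:
        # Assumption: an oversized string is treated as an error.
        raise ValueError(f"encoded string needs {len(data)} bytes, only {nbytes} available")
    return data + pad * (nbytes - len(data))


def unpack_padded(buffer: bytes, pad: bytes = b"\x00",
                  encoding: str = "ascii") -> str:
    """Decode up to the first pad byte, or the whole buffer if none is present."""
    end = buffer.find(pad)
    data = buffer if end < 0 else buffer[:end]
    return data.decode(encoding)


packed = pack_padded("Hello", 8)
print(packed, unpack_padded(packed))
```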
Zero Termination
To allow the string size to vary, leave nbytes at its default of None and set
zero_termination=True. This adds a zero termination byte at the end when
packing, and uses the zero termination byte as a signal to stop unpacking:
>>> ascii_zt = StrX(encoding="ascii", zero_termination=True)
>>>
>>> fmt = [ascii_zt, ascii_zt]
>>>
>>> unpack(fmt, b'Hello\x00World!\x00')
['Hello', 'World!']
>>>
>>> pack(['Hello', 'World!'], fmt)
b'Hello\x00World!\x00'
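A plain-Python sketch of the same framing may help show why consecutive zero-terminated strings can share one buffer. The helpers pack_zt and unpack_zt are hypothetical names introduced for illustration and say nothing about how plum implements the transform.

```python
from typing import List, Tuple


def pack_zt(text: str, encoding: str = "ascii") -> bytes:
    """Encode text and append a single zero termination byte (hypothetical helper)."""
    return text.encode(encoding) + b"\x00"


def unpack_zt(buffer: bytes, offset: int = 0,
              encoding: str = "ascii") -> Tuple[str, int]:
    """Decode from offset up to the next zero byte; return the string and the next offset."""
    end = buffer.index(b"\x00", offset)  # raises ValueError if the terminator is missing
    return buffer[offset:end].decode(encoding), end + 1


def unpack_all_zt(buffer: bytes) -> List[str]:
    """Unpack consecutive zero-terminated strings until the buffer is exhausted."""
    strings, offset = [], 0
    while offset < len(buffer):
        text, offset = unpack_zt(buffer, offset)
        strings.append(text)
    return strings


packed = b"".join(pack_zt(s) for s in ["Hello", "World!"])
print(packed, unpack_all_zt(packed))
```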
Sized Strings
When nbytes is left at its default of None, the transform becomes “greedy”.
When packing, the transform converts a string of any size into bytes. When
unpacking, it consumes all remaining bytes and converts them into a string.
Within a structure, the sized_member() function accepts this type of greedy
string transform as its format but keeps its greed in check: when unpacking,
the member limits the buffer bytes available to consume to the size held by a
separate member of the structure (the size argument of the sized_member()
function specifies which member the size comes from).
>>> greedy_ascii = StrX(encoding="ascii")
>>> class SizedStruct(Structure):
...     size: int = member(fmt=uint8, compute=True)
...     string: str = sized_member(fmt=greedy_ascii, size=size)
...     bookend: int = member(fmt=uint8)
...
>>> struct = unpack(SizedStruct, b'\x0cHello World!\x99')
>>> struct.dump()
+--------+----------+----------------+-------------------------------------+-------------------------+
| Offset | Access   | Value          | Bytes                               | Format                  |
+--------+----------+----------------+-------------------------------------+-------------------------+
|        |          |                |                                     | SizedStruct (Structure) |
| 0      | size     | 12             | 0c                                  | uint8                   |
|        | string   |                |                                     | str (ascii)             |
| 1      |   [0:12] | 'Hello World!' | 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 |                         |
| 13     | bookend  | 153            | 99                                  | uint8                   |
+--------+----------+----------------+-------------------------------------+-------------------------+
Passing compute=True when defining the size member allows the size member to
be left uninitialized when constructing the structure. When packing, the
structure computes that member automatically, in this case from the length of
the packed string member:
>>> struct = SizedStruct(string="Hello World!", bookend=0x99)
>>> struct.dump()
+--------+----------+----------------+-------------------------------------+-------------------------+
| Offset | Access   | Value          | Bytes                               | Format                  |
+--------+----------+----------------+-------------------------------------+-------------------------+
|        |          |                |                                     | SizedStruct (Structure) |
| 0      | size     | 12             | 0c                                  | uint8                   |
|        | string   |                |                                     | str (ascii)             |
| 1      |   [0:12] | 'Hello World!' | 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 |                         |
| 13     | bookend  | 153            | 99                                  | uint8                   |
+--------+----------+----------------+-------------------------------------+-------------------------+
>>> pack(struct)
b'\x0cHello World!\x99'
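For readers who want to see the byte layout without the structure machinery, the same size-prefixed format can be sketched with the standard library alone. The functions pack_sized and unpack_sized are hypothetical helpers written for this illustration. They hard-code the uint8 size, ASCII encoding, and one-byte bookend used in the example above, and the ValueError checks are assumptions rather than plum behaviour.

```python
from typing import Tuple


def pack_sized(text: str, bookend: int) -> bytes:
    """Build size byte + ASCII payload + bookend byte (hypothetical helper)."""
    payload = text.encode("ascii")
    if len(payload) > 0xFF:
        # Assumption: the size must fit in the single uint8 size byte.
        raise ValueError("payload too long for a uint8 size prefix")
    # The size is computed from the encoded payload, mirroring compute=True.
    return bytes([len(payload)]) + payload + bytes([bookend])


def unpack_sized(buffer: bytes) -> Tuple[str, int]:
    """Read the size byte, then exactly that many payload bytes, then the bookend."""
    if not buffer:
        raise ValueError("empty buffer")
    size = buffer[0]
    if len(buffer) != size + 2:
        raise ValueError("buffer length does not match the size prefix")
    payload = buffer[1:1 + size]
    return payload.decode("ascii"), buffer[1 + size]


packed = pack_sized("Hello World!", 0x99)
print(packed, unpack_sized(packed))
```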
See the Sized Structure Member tutorial for additional features of the
sized_member() function, such as specifying size ratios and offsets.
API Reference
class plum.str.StrX(encoding: str, errors: str = 'strict', nbytes: Optional[int] = None, pad: bytes = b'', zero_termination: bool = False, name: Optional[str] = None)
    String to bytes and bytes to string transform.
    name
        Transform format name (for repr and dump “Format” column).
    encoding
        Codecs encoding name.
    errors
        Codecs error handling.
    nbytes
        Transform format size in bytes.
    pad
        Pad byte.
    zero_termination
        Zero termination byte present.
    pack(value: Any) → bytes
        Pack value as formatted bytes.
        Raises: PackError if type error, value error, etc.
    pack_and_dump(value: Any) → Tuple[bytes, plum.dump.Dump]
        Pack value as formatted bytes and produce bytes summary.
        Raises: PackError if type error, value error, etc.
    unpack(buffer: bytes) → Any
        Unpack value from formatted bytes.
        Raises: UnpackError if insufficient bytes, excess bytes, or value error.
    unpack_and_dump(buffer: bytes) → Tuple[Any, plum.dump.Dump]
        Unpack value from bytes and produce packed bytes summary.
        Raises: UnpackError if insufficient bytes, excess bytes, or value error.