[plum] Tutorial: Unpacking Bytes

This tutorial demonstrates the methods plum provides for unpacking bytes into the convenient forms that plum types support.

The tutorial uses the following representative plum type, but the examples apply to all plum types or derivatives of them.

>>> from plum.structure import Member, Structure
>>> from plum.int.big import UInt8, UInt16
>>>
>>> class Sample(Structure):
...     m1: int = Member(cls=UInt16)
...     m2: int = Member(cls=UInt8)
...
>>>

Unpack Method

[Plum Type Class Method]

Off-the-shelf plum types (e.g. UInt16) and derivatives of plum types (e.g. Sample from above) offer an unpack() class method. It accepts a bytes-like buffer (e.g. bytes, bytearray, or memoryview) and unpacks an item of that type from it. For example:

>>> # off the shelf plum type unpacked from a "bytes" buffer
>>> buffer = b'\x00\x01'
>>> UInt16.unpack(buffer)
1
>>>
>>> # derivative plum type unpacked from a "bytearray" buffer
>>> buffer = bytearray([0, 1, 2])
>>> Sample.unpack(buffer)
Sample(m1=1, m2=2)
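To illustrate the shape of a class-method unpack API without depending on plum, here is a minimal stdlib sketch. SampleSketch is a hypothetical dataclass, not a plum class. Its layout is assumed to be a big-endian unsigned 16-bit integer followed by an unsigned 8-bit integer, matching the plum.int.big types used in Sample.

```python
import struct
from dataclasses import dataclass


@dataclass
class SampleSketch:
    """Hypothetical stdlib stand-in for Sample (not a plum class)."""

    m1: int
    m2: int

    # Unannotated, so dataclass does not treat it as a field.
    # Assumed layout: big-endian ('>') UInt16 ('H') followed by UInt8 ('B').
    _FORMAT = struct.Struct('>HB')

    @classmethod
    def unpack(cls, buffer) -> "SampleSketch":
        # Accepts any bytes-like object (bytes, bytearray, memoryview).
        # Struct.unpack raises struct.error unless len(buffer) == 3.
        return cls(*cls._FORMAT.unpack(buffer))


from_bytes = SampleSketch.unpack(b'\x00\x01\x02')
from_view = SampleSketch.unpack(memoryview(bytearray([0, 1, 2])))
print(from_bytes)  # SampleSketch(m1=1, m2=2)
print(from_view)   # SampleSketch(m1=1, m2=2)
```

Unlike plum, this sketch hard-codes its byte layout in a struct format string rather than deriving it from member types.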

When insufficient bytes are present in the buffer for the type being unpacked, the unpack operation raises a plum.UnpackError exception.

>>> UInt16.unpack(b'\x00')
Traceback (most recent call last):
  ...
plum._exceptions.UnpackError:
<BLANKLINE>
+--------+----------------------+-------+--------+
| Offset | Value                | Bytes | Type   |
+--------+----------------------+-------+--------+
| 0      | <insufficient bytes> | 00    | UInt16 |
+--------+----------------------+-------+--------+
<BLANKLINE>
InsufficientMemoryError occurred during unpack operation:
<BLANKLINE>
1 too few bytes to unpack UInt16 (2 needed, only 1 available)

Similarly, when too many bytes are present in the buffer for the type being unpacked, the unpack operation raises a plum.UnpackError exception.

>>> UInt16.unpack(b'\x00\x01\x02')
Traceback (most recent call last):
    ...
plum._exceptions.UnpackError:
<BLANKLINE>
+--------+----------------+-------+--------+
| Offset | Value          | Bytes | Type   |
+--------+----------------+-------+--------+
| 0      | 1              | 00 01 | UInt16 |
+--------+----------------+-------+--------+
| 2      | <excess bytes> | 02    |        |
+--------+----------------+-------+--------+
<BLANKLINE>
ExcessMemoryError occurred during unpack operation:
<BLANKLINE>
1 unconsumed bytes
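Both failures come down to comparing the buffer length with the number of bytes the type needs. The stdlib sketch below makes that comparison explicit. check_size() is a hypothetical helper whose messages only imitate the wording above, and the '>H' format is an assumed equivalent of UInt16. For contrast, struct.unpack() raises a single struct.error for either mismatch.

```python
import struct


def check_size(fmt: str, buffer: bytes) -> str:
    """Hypothetical helper: describe how len(buffer) compares to the struct size."""
    needed = struct.calcsize(fmt)
    available = len(buffer)
    if available < needed:
        return (f"{needed - available} too few bytes "
                f"({needed} needed, only {available} available)")
    if available > needed:
        return f"{available - needed} unconsumed bytes"
    return "ok"


print(check_size('>H', b'\x00'))          # 1 too few bytes (2 needed, only 1 available)
print(check_size('>H', b'\x00\x01\x02'))  # 1 unconsumed bytes
print(check_size('>H', b'\x00\x01'))      # ok

# struct.unpack() demands an exact size and raises struct.error otherwise.
raised = 0
for bad in (b'\x00', b'\x00\x01\x02'):
    try:
        struct.unpack('>H', bad)
    except struct.error:
        raised += 1
print(raised)  # 2
```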

Note

In both cases, the exception's __context__ attribute holds a reference to the underlying exception (InsufficientMemoryError or ExcessMemoryError, respectively).
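The stdlib-only sketch below shows the Python mechanism behind that note. When an exception is raised while another is being handled, Python records the original in the new exception's __context__ attribute (implicit exception chaining). InsufficientMemoryError and UnpackError here are local stand-ins, not plum's classes, and unpack_uint16() is a hypothetical helper.

```python
class InsufficientMemoryError(Exception):
    """Local stand-in for plum's exception of the same name."""


class UnpackError(Exception):
    """Local stand-in for plum.UnpackError."""


def unpack_uint16(buffer: bytes) -> int:
    """Hypothetical helper: unpack a big-endian unsigned 16-bit integer."""
    try:
        if len(buffer) < 2:
            raise InsufficientMemoryError(f"{2 - len(buffer)} too few bytes")
        return int.from_bytes(buffer[:2], 'big')
    except InsufficientMemoryError:
        # Raising inside the handler stores the original in __context__.
        raise UnpackError("error occurred during unpack operation")


caught = None
try:
    unpack_uint16(b'\x00')
except UnpackError as exc:
    caught = exc

print(type(caught.__context__).__name__)  # InsufficientMemoryError
print(caught.__context__)                 # 1 too few bytes
```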

Unpack Utility Function

The unpack() utility function mirrors the standard library struct.unpack() function, except that it accepts plum types instead of a format string. It offers the same ability to unpack a single item and follows the same exception behaviors as the unpack() class method. For example:

>>> from plum import unpack
>>>
>>> # off the shelf plum type unpacked from a "bytes" buffer
>>> buffer = b'\x00\x01'
>>> unpack(UInt16, buffer)
1
>>>
>>> # derivative plum type unpacked from a "bytearray" buffer
>>> buffer = bytearray([0, 1, 2])
>>> unpack(Sample, buffer)
Sample(m1=1, m2=2)
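For reference, the struct.unpack() counterpart always returns a tuple, even for a single value. The examples above show plum's unpack() returning the item itself when given a single type. The '>H' format string below is an assumed stdlib equivalent of UInt16.

```python
import struct

# struct.unpack() returns a tuple even when the format holds one value.
result = struct.unpack('>H', b'\x00\x01')  # '>H' assumed equivalent of UInt16
print(result)  # (1,)

# Unpacking the one-element tuple yields the bare value.
(value,) = result
print(value)  # 1
```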

Tip

Using the utility function instead of the class method may result in a slight performance benefit.

The unpack() utility function also supports unpacking multiple items at once from a bytes-like buffer. The items can be returned in one of two forms: a tuple/list or a dictionary.

Unpacking a Tuple/List

Instead of passing unpack() a single plum type, pass an iterable (e.g. list or tuple) of plum types along with the bytes-like buffer. unpack() returns a tuple or list, matching the container passed in, that holds one unpacked item per provided type. For example:

>>> from plum.int.big import UInt8
>>>
>>> buffer = b'\x01\x02\x03'
>>>
>>> # provide types as tuple, get tuple of values
>>> unpack((UInt8, UInt8, UInt8), buffer)
(1, 2, 3)
>>>
>>> # provide types as list, get list of values
>>> unpack([UInt8] * 3, buffer)
[1, 2, 3]
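A rough stdlib analogue of this container-matching behavior is sketched below. unpack_sequence() is hypothetical, and 'B' is the struct code assumed to correspond to UInt8. It joins the per-item codes into one big-endian format string and returns a list only when it was given a list.

```python
import struct


def unpack_sequence(codes, buffer):
    """Hypothetical sketch: unpack one value per struct code (big-endian),
    returning a tuple for tuple input and a list for list input."""
    values = struct.unpack('>' + ''.join(codes), buffer)
    return values if isinstance(codes, tuple) else list(values)


buffer = b'\x01\x02\x03'
as_tuple = unpack_sequence(('B', 'B', 'B'), buffer)
as_list = unpack_sequence(['B'] * 3, buffer)
print(as_tuple)  # (1, 2, 3)
print(as_list)   # [1, 2, 3]
```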

Unpacking a Dict

Alternatively, pass a dictionary of name/type pairs, and unpack() returns a dictionary that maps each name to the item unpacked for its type. For example:

>>> unpack({'a': UInt8, 'b': UInt8, 'c': UInt8}, buffer)
{'a': 1, 'b': 2, 'c': 3}
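With the standard library, the same name-to-value mapping can be approximated by zipping names with struct results. This relies on dicts preserving insertion order (guaranteed since Python 3.7). The 'B' code is an assumed stand-in for UInt8.

```python
import struct

buffer = b'\x01\x02\x03'
fmt = {'a': 'B', 'b': 'B', 'c': 'B'}  # 'B' assumed equivalent of UInt8

# Dict order is insertion order, so names and values line up.
values = struct.unpack('>' + ''.join(fmt.values()), buffer)
result = dict(zip(fmt.keys(), values))
print(result)  # {'a': 1, 'b': 2, 'c': 3}
```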

Arbitrarily Nested Structure

The unpack() format specifier supports arbitrary nesting of tuples, lists, and dicts, so the unpacked values can be returned in a more convenient, organized form. For example:

>>> fmt = {'a': (UInt8, UInt8), 'b': {'i': UInt8, 'ii': UInt8}}
>>>
>>> unpack(fmt, bytearray(b'\x01\x02\x03\x04'))
{'a': (1, 2), 'b': {'i': 3, 'ii': 4}}
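One way to picture the nesting is as a recursive walk that consumes bytes in order. The sketch below is a hypothetical stdlib re-creation of that idea, not plum's implementation. Leaves are struct format codes (assumed 'B' for UInt8) instead of plum types. The function also returns the final offset, so a caller could compare it with the buffer length to detect excess bytes.

```python
import struct


def unpack_nested(fmt, buffer, offset=0):
    """Hypothetical sketch: leaves are struct codes such as 'B'; tuples,
    lists, and dicts may nest. Returns (value, offset_after_value)."""
    if isinstance(fmt, str):
        code = '>' + fmt
        (value,) = struct.unpack_from(code, buffer, offset)
        return value, offset + struct.calcsize(code)
    if isinstance(fmt, dict):
        result = {}
        for name, sub_fmt in fmt.items():
            result[name], offset = unpack_nested(sub_fmt, buffer, offset)
        return result, offset
    # Tuple or list: unpack each element, then rebuild the same container type.
    items = []
    for sub_fmt in fmt:
        item, offset = unpack_nested(sub_fmt, buffer, offset)
        items.append(item)
    return (tuple(items) if isinstance(fmt, tuple) else items), offset


fmt = {'a': ('B', 'B'), 'b': {'i': 'B', 'ii': 'B'}}
data = bytearray(b'\x01\x02\x03\x04')
value, end = unpack_nested(fmt, data)
print(value)  # {'a': (1, 2), 'b': {'i': 3, 'ii': 4}}
print(end == len(data))  # True, so no excess bytes
```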

Unpacking a Portion of a Bytes Buffer

The unpack_from() utility function and the unpack_from() plum type class method share the unpacking behaviors discussed in the previous sections, with two differences. First, in addition to a bytes-like buffer, they accept a byte offset that sets where in the buffer unpacking starts. Second, they do not raise an exception when bytes remain in the buffer after the unpacked item(s). For example:

>>> from plum import unpack_from
>>> from plum.int.big import UInt8
>>>
>>> buffer = b'\x00\x00\x01\x02\x00'
>>> OFFSET = 2
>>>
>>> # class method
>>> UInt8.unpack_from(buffer, OFFSET)
1
>>> # utility function
>>> fmt = (UInt8, UInt8)
>>> unpack_from(fmt, buffer, OFFSET)
(1, 2)
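The standard library offers a direct counterpart, struct.unpack_from(format, buffer, offset). It also starts at a byte offset and ignores any trailing bytes, as long as enough bytes remain for the format. The '>B' code below is an assumed equivalent of UInt8.

```python
import struct

buffer = b'\x00\x00\x01\x02\x00'
OFFSET = 2

# Start reading at OFFSET; the trailing 0x00 byte is ignored, not rejected.
one = struct.unpack_from('>B', buffer, OFFSET)
two = struct.unpack_from('>BB', buffer, OFFSET)
print(one)  # (1,)
print(two)  # (1, 2)
```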

Incremental Unpacking

BytesIO

The unpack_from() variants support incremental unpacking when used with an io.BytesIO instance. As each item is unpacked, the operation advances the stream's current position past the bytes it consumed. This removes the need to track or pass a byte offset. For example:

>>> from io import BytesIO
>>> from plum import unpack_from
>>> from plum.int.big import UInt8
>>>
>>> buffer = BytesIO(b'\x03\x01\x02\x03\x04')
>>>
>>> array_length = UInt8.unpack_from(buffer)
>>> array_length
3
>>> fmt = [UInt8] * array_length
>>> array = unpack_from(fmt, buffer)
>>> array
[1, 2, 3]
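The stream-position bookkeeping can be seen with io.BytesIO alone. In the sketch below, read_uint8() is a hypothetical helper, not a plum function. Each read() advances the position, so the length-prefixed array needs no explicit offset. Afterwards, tell() shows that the trailing 0x04 byte is still unread.

```python
from io import BytesIO


def read_uint8(stream: BytesIO) -> int:
    """Hypothetical helper: read one byte and advance the stream position."""
    data = stream.read(1)
    if len(data) < 1:
        raise EOFError("1 too few bytes to unpack UInt8")
    return data[0]


buffer = BytesIO(b'\x03\x01\x02\x03\x04')
array_length = read_uint8(buffer)
array = [read_uint8(buffer) for _ in range(array_length)]
print(array_length, array)  # 3 [1, 2, 3]
print(buffer.tell())        # 4
leftover = buffer.read()
print(leftover)             # b'\x04'
```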

Plum Buffer

In the previous example, the buffer contained 1 extra byte (0x04) that went unnoticed. To detect unconsumed bytes, create a plum.Buffer instead and use it as a context manager, calling the buffer’s unpack() method within the context as normal. If any bytes remain unconsumed when the context exits, the context manager raises an ExcessMemoryError exception. For example:

>>> from plum import Buffer
>>>
>>> with Buffer(b'\x03\x01\x02\x03\x04') as buffer:
...     array_length = buffer.unpack(UInt8)
...     array = buffer.unpack([UInt8] * array_length)
...
Traceback (most recent call last):
    ...
plum._exceptions.ExcessMemoryError: 1 unconsumed bytes
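The check-on-exit pattern can be sketched with a small context manager. StrictReader is hypothetical, meant only to illustrate the idea behind plum.Buffer rather than reproduce its API. It raises ValueError in place of ExcessMemoryError. It skips the leftover check when the body already raised, so the original error is not masked.

```python
from io import BytesIO


class StrictReader:
    """Hypothetical stand-in illustrating the idea behind plum.Buffer (not its API)."""

    def __init__(self, data: bytes):
        self._stream = BytesIO(data)
        self._size = len(data)

    def read_uint8(self) -> int:
        # Read one byte, advancing the stream position.
        data = self._stream.read(1)
        if not data:
            raise EOFError("1 too few bytes to unpack UInt8")
        return data[0]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Only check for leftovers if the body finished without an error.
        if exc_type is None:
            leftover = self._size - self._stream.tell()
            if leftover:
                raise ValueError(f"{leftover} unconsumed bytes")
        return False  # never suppress exceptions raised in the body


message = None
try:
    with StrictReader(b'\x03\x01\x02\x03\x04') as reader:
        array_length = reader.read_uint8()
        array = [reader.read_uint8() for _ in range(array_length)]
except ValueError as exc:
    message = str(exc)

print(message)  # 1 unconsumed bytes
```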

Obtaining Bytes Summary Dump

For every unpack method/function shown in the previous sections, an “and_dump” variation exists that returns a bytes summary dump in addition to the unpacked item(s). The following example shows the utility function variations, but equivalent variations exist for all the plum type and plum.Buffer unpacking methods.

>>> from plum import unpack_and_dump, unpack_from_and_dump
>>> from plum.int.big import UInt8
>>>
>>> item, dump = unpack_and_dump(Sample, bytearray([0, 1, 2]))
>>> item
Sample(m1=1, m2=2)
>>> print(dump)
+--------+-----------+-------+-------+--------+
| Offset | Access    | Value | Bytes | Type   |
+--------+-----------+-------+-------+--------+
|        |           |       |       | Sample |
| 0      | [0] (.m1) | 1     | 00 01 | UInt16 |
| 2      | [1] (.m2) | 2     | 02    | UInt8  |
+--------+-----------+-------+-------+--------+
>>>
>>> item, dump = unpack_from_and_dump(UInt8, bytearray([0, 1, 2]), offset=1)
>>> item
1
>>> print(dump)
+--------+-------+-------+-------+
| Offset | Value | Bytes | Type  |
+--------+-------+-------+-------+
| 1      | 1     | 01    | UInt8 |
+--------+-------+-------+-------+

Tip

The “and_dump” unpacking variations exist primarily for ease of use and performance. Producing a dump as a separate operation after unpacking an item would redo similar work. In some cases, it would also require the extra work of instantiating the type with the unpacked value just to access its dump property.
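The savings come from recording dump information during the same pass that unpacks the bytes, instead of re-reading them afterwards. The sketch below illustrates that single-pass idea with struct. The helper name and the (offset, value, hex bytes) row layout are hypothetical and do not reproduce plum's dump format.

```python
import struct


def unpack_uint8s_and_dump(count: int, buffer, offset: int = 0):
    """Hypothetical sketch: unpack `count` UInt8 values and, in the same pass,
    record an (offset, value, hex bytes) row for each one."""
    values, rows = [], []
    for _ in range(count):
        (value,) = struct.unpack_from('>B', buffer, offset)
        values.append(value)
        rows.append((offset, value, buffer[offset:offset + 1].hex()))
        offset += 1
    return values, rows


values, rows = unpack_uint8s_and_dump(2, bytearray([0, 1, 2]), offset=1)
print(values)  # [1, 2]
for row in rows:
    print(*row)  # prints "1 1 01", then "2 2 02"
```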