[plum] Tutorial: Unpacking Bytes
This tutorial demonstrates the various methods and capabilities of
unpacking bytes into the convenient forms supported by the various plum types.
The tutorial uses the following representative plum type, but the
examples apply to all plum types or derivatives of them.
>>> from plum.structure import Member, Structure
>>> from plum.int.big import UInt8, UInt16
>>>
>>> class Sample(Structure):
...     m1: int = Member(cls=UInt16)
...     m2: int = Member(cls=UInt8)
...
>>>
Unpack Method
[Plum Type Class Method]
Off-the-shelf plum types (e.g. UInt16) or derivatives of plum types
(e.g. Sample from above) offer an unpack() class method that accepts a
bytes-like buffer (e.g. bytes, bytearray, or memoryview) and unpacks an
item of that type from it. For example:
>>> # off the shelf plum type unpacked from a "bytes" buffer
>>> buffer = b'\x00\x01'
>>> UInt16.unpack(buffer)
1
>>>
>>> # derivative plum type unpacked from a "bytearray" buffer
>>> buffer = bytearray([0, 1, 2])
>>> Sample.unpack(buffer)
Sample(m1=1, m2=2)
When insufficient bytes are present in the buffer for the type being unpacked,
the unpack operation raises a plum.UnpackError exception.
>>> UInt16.unpack(b'\x00')
Traceback (most recent call last):
...
plum._exceptions.UnpackError:
<BLANKLINE>
+--------+----------------------+-------+--------+
| Offset | Value                | Bytes | Type   |
+--------+----------------------+-------+--------+
| 0      | <insufficient bytes> | 00    | UInt16 |
+--------+----------------------+-------+--------+
<BLANKLINE>
InsufficientMemoryError occurred during unpack operation:
<BLANKLINE>
1 too few bytes to unpack UInt16 (2 needed, only 1 available)
Similarly, when too many bytes are present in the buffer for the type being unpacked,
the unpack operation raises a plum.UnpackError exception.
>>> UInt16.unpack(b'\x00\x01\x02')
Traceback (most recent call last):
...
plum._exceptions.UnpackError:
<BLANKLINE>
+--------+----------------+-------+--------+
| Offset | Value          | Bytes | Type   |
+--------+----------------+-------+--------+
| 0      | 1              | 00 01 | UInt16 |
+--------+----------------+-------+--------+
| 2      | <excess bytes> | 02    |        |
+--------+----------------+-------+--------+
<BLANKLINE>
ExcessMemoryError occurred during unpack operation:
<BLANKLINE>
1 unconsumed bytes
Note
In both cases, the __context__ attribute of the exception instance holds a
reference to the underlying exception, InsufficientMemoryError or
ExcessMemoryError (respectively).
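The chaining described in the note is standard Python behavior: an exception raised while another one is being handled records the earlier one in __context__. The sketch below illustrates only that mechanism. The Demo* classes and demo_unpack_uint16() are hypothetical stand-ins, not plum's classes or its implementation.

```python
class DemoInsufficientMemoryError(Exception):
    """Hypothetical stand-in for the underlying, low-level error."""


class DemoUnpackError(Exception):
    """Hypothetical stand-in for the summary error raised to the caller."""


def demo_unpack_uint16(buffer):
    """Unpack a big-endian unsigned 16-bit integer from the first two bytes."""
    try:
        if len(buffer) < 2:
            raise DemoInsufficientMemoryError(
                f"{2 - len(buffer)} too few bytes "
                f"(2 needed, only {len(buffer)} available)"
            )
        return int.from_bytes(buffer[:2], "big")
    except DemoInsufficientMemoryError:
        # Raising inside an "except" block makes Python store the exception
        # being handled in the new exception's __context__ attribute.
        raise DemoUnpackError("error occurred during unpack operation")


try:
    demo_unpack_uint16(b"\x00")
except DemoUnpackError as exc:
    underlying = exc.__context__

print(type(underlying).__name__)
print(underlying)
```

Inspecting __context__ this way lets calling code tell the two underlying causes apart while still catching a single outer exception type.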
Unpack Utility Function
The unpack() utility function mirrors the standard library struct.unpack()
function, except that instead of accepting a format string, it accepts plum
types. The utility function provides the same ability to unpack a single item
and follows the same exception behaviors as the unpack() class method.
For example:
>>> from plum import unpack
>>>
>>> # off the shelf plum type unpacked from a "bytes" buffer
>>> buffer = b'\x00\x01'
>>> unpack(UInt16, buffer)
1
>>>
>>> # derivative plum type unpacked from a "bytearray" buffer
>>> buffer = bytearray([0, 1, 2])
>>> unpack(Sample, buffer)
Sample(m1=1, m2=2)
Tip
Using the utility function instead of the class method may result in a slight performance benefit.
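For comparison, here is roughly what the same two unpack operations look like with the standard library struct.unpack() and its format strings. This is a side-by-side sketch using only the standard library. The mapping of ">H" to UInt16 and ">HB" to the Sample layout is an assumption based on the big-endian types imported earlier.

```python
import struct

# ">H" is a big-endian unsigned 16-bit integer (the role UInt16 plays above).
# struct.unpack() always returns a tuple, even for a single item.
(value,) = struct.unpack(">H", b"\x00\x01")

# ">HB" adds an unsigned 8-bit integer, matching the layout of Sample.
m1, m2 = struct.unpack(">HB", bytearray([0, 1, 2]))

# struct.unpack() also insists that the buffer size match the format exactly.
try:
    struct.unpack(">H", b"\x00\x01\x02")
except struct.error as exc:
    size_error = exc

print(value, (m1, m2))
```

The format string carries the layout in struct, whereas the plum utility function receives the types themselves and returns values in the shape those types define.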
The unpack() utility function also supports unpacking multiple items
at once from a bytes-like buffer. Two choices exist: unpacking items in the
form of a tuple/list, or unpacking them in the form of a dictionary.
Unpacking a Tuple/list
Instead of passing unpack() a single plum type, pass an iterable
(e.g. list or tuple) of plum types along with the bytes-like buffer.
unpack() returns a tuple or list with the unpacked items that resulted
from unpacking each of the provided types. For example:
>>> from plum.int.big import UInt8
>>>
>>> buffer = b'\x01\x02\x03'
>>>
>>> # provide types as tuple, get tuple of values
>>> unpack((UInt8, UInt8, UInt8), buffer)
(1, 2, 3)
>>>
>>> # provide types as list, get list of values
>>> unpack([UInt8] * 3, buffer)
[1, 2, 3]
Unpacking a Dict
Alternatively, pass a dictionary of name/type pairs and unpack()
returns a dictionary with unpacked items that resulted from unpacking
each of the provided types. For example:
>>> unpack({'a': UInt8, 'b': UInt8, 'c': UInt8}, buffer)
{'a': 1, 'b': 2, 'c': 3}
Arbitrarily Nested Structure
The unpack() format specifier supports arbitrary nesting of tuple/lists and
dicts to provide the unpacked values in a more convenient and organized form.
For example:
>>>
>>> fmt = {'a': (UInt8, UInt8), 'b': {'i': UInt8, 'ii': UInt8}}
>>>
>>> unpack(fmt, bytearray(b'\x01\x02\x03\x04'))
{'a': (1, 2), 'b': {'i': 3, 'ii': 4}}
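One way to picture how a nested format can be handled is a recursive walk that rebuilds each container with the same shape while advancing a byte offset. The sketch below is an illustration only, not plum's implementation. unpack_nested() is a hypothetical helper, and its leaves are single-item struct format strings rather than plum types.

```python
import struct


def unpack_nested(fmt, buffer, offset=0):
    """Return (value, next_offset) for a nested format description.

    Hypothetical helper: leaves are single-item struct format strings,
    while tuples, lists, and dicts are rebuilt with the same shape.
    """
    if isinstance(fmt, str):
        (value,) = struct.unpack_from(fmt, buffer, offset)
        return value, offset + struct.calcsize(fmt)
    if isinstance(fmt, dict):
        result = {}
        for name, sub_fmt in fmt.items():
            result[name], offset = unpack_nested(sub_fmt, buffer, offset)
        return result, offset
    if isinstance(fmt, (tuple, list)):
        items = []
        for sub_fmt in fmt:
            value, offset = unpack_nested(sub_fmt, buffer, offset)
            items.append(value)
        # Preserve the container type that the caller supplied.
        return type(fmt)(items), offset
    raise TypeError(f"unsupported format element: {fmt!r}")


U8 = ">B"  # big-endian unsigned 8-bit integer
fmt = {"a": (U8, U8), "b": {"i": U8, "ii": U8}}
values, consumed = unpack_nested(fmt, bytearray(b"\x01\x02\x03\x04"))
print(values)
```

Because every branch returns the updated offset, the bytes are consumed strictly in the order the format is written, regardless of how deeply it is nested.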
Unpacking a Portion of a Bytes Buffer
The unpack_from() utility function and the unpack_from() plum type class
method have the same unpacking behaviors discussed in the previous sections.
But in addition to accepting a bytes-like buffer, the unpack_from() variants
accept a byte offset that controls the location within the bytes buffer to
start unpacking from. Also, the unpack_from() variants do not raise
exceptions when they do not consume the entire remaining portion of the
bytes buffer. For example:
>>> from plum import unpack_from
>>> from plum.int.big import UInt8
>>>
>>> buffer = b'\x00\x00\x01\x02\x00'
>>> OFFSET = 2
>>>
>>> # class method
>>> UInt8.unpack_from(buffer, OFFSET)
1
>>> # utility function
>>> fmt = (UInt8, UInt8)
>>> unpack_from(fmt, buffer, OFFSET)
(1, 2)
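The standard library offers a similar offset-based call, struct.unpack_from(), which may help readers who already know struct. The sketch below repeats the example with a format string in place of plum types. The ">BB" format is an assumed equivalent of (UInt8, UInt8).

```python
import struct

buffer = b"\x00\x00\x01\x02\x00"
OFFSET = 2

# Start at byte 2 and read two unsigned 8-bit integers. The trailing byte
# is left alone, and no error is raised for it.
pair = struct.unpack_from(">BB", buffer, OFFSET)
print(pair)
```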
Incremental Unpacking
BytesIO
The unpack_from() variants support incremental unpacking when used
with an io.BytesIO instance. As bytes are consumed unpacking items from
the buffer, the unpacking operation adjusts the current stream position as it
goes. This eliminates the need for tracking or passing a buffer byte offset.
For example:
>>> from io import BytesIO
>>> from plum import unpack_from
>>> from plum.int.big import UInt8
>>>
>>> buffer = BytesIO(b'\x03\x01\x02\x03\x04')
>>>
>>> array_length = UInt8.unpack_from(buffer)
>>> array_length
3
>>> fmt = [UInt8] * array_length
>>> array = unpack_from(fmt, buffer)
>>> array
[1, 2, 3]
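The stream position bookkeeping can be observed directly with the standard library alone. The sketch below reads the same length-prefixed array with plain read() calls and reports the position with tell(). It shows only the io.BytesIO behavior that incremental unpacking relies on, not what plum does internally.

```python
from io import BytesIO

stream = BytesIO(b"\x03\x01\x02\x03\x04")

# Each read() advances the stream position by the number of bytes returned.
array_length = stream.read(1)[0]
position_after_length = stream.tell()

array = list(stream.read(array_length))
position_after_array = stream.tell()

# One byte (0x04) is still unread at this point.
remaining = len(stream.getvalue()) - position_after_array
print(array_length, array, remaining)
```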
Plum Buffer
In the previous example, the buffer contained 1 extra byte (0x04). To detect
unconsumed bytes, create a plum.Buffer instead and use it as a context
manager. Then use the buffer’s unpack() method as normal within the
context. If any bytes remain unconsumed in the buffer, the context manager raises
an exception when the context is exited. For example:
>>> from plum import Buffer
>>>
>>> with Buffer(b'\x03\x01\x02\x03\x04') as buffer:
...     array_length = buffer.unpack(UInt8)
...     array = buffer.unpack([UInt8] * array_length)
...
Traceback (most recent call last):
...
plum._exceptions.ExcessMemoryError: 1 unconsumed bytes
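The general pattern of checking for leftovers when a context exits can be sketched with a small context manager. Everything below is hypothetical and built on the standard library only. CheckedReader and DemoExcessBytesError are illustrative names, and nothing here is meant to describe how plum.Buffer is implemented.

```python
from io import BytesIO


class DemoExcessBytesError(Exception):
    """Hypothetical error for bytes left over when the context exits."""


class CheckedReader:
    """Hypothetical reader that complains about unconsumed bytes on exit."""

    def __init__(self, data):
        self._stream = BytesIO(data)
        self._size = len(data)

    def read_uint8(self):
        chunk = self._stream.read(1)
        if not chunk:
            raise EOFError("no bytes left to read")
        return chunk[0]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        leftover = self._size - self._stream.tell()
        # Only report leftovers when the body finished without another error.
        if exc_type is None and leftover:
            raise DemoExcessBytesError(f"{leftover} unconsumed bytes")
        return False


try:
    with CheckedReader(b"\x03\x01\x02\x03\x04") as reader:
        array_length = reader.read_uint8()
        array = [reader.read_uint8() for _ in range(array_length)]
except DemoExcessBytesError as exc:
    message = str(exc)

print(array, message)
```

Deferring the check to the end of the context means that any number of incremental reads can happen first, with a single completeness check afterwards.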
Obtaining Bytes Summary Dump
For every unpack method/function shown in the previous sections, an “and_dump”
variation exists to provide a bytes summary dump in addition to the unpacked
item(s). The following example shows the utility function variations, but
variations exist for all the plum type and plum.Buffer unpacking methods.
>>> from plum import unpack_and_dump, unpack_from_and_dump
>>> from plum.int.big import UInt8
>>>
>>> item, dump = unpack_and_dump(Sample, bytearray([0, 1, 2]))
>>> item
Sample(m1=1, m2=2)
>>> print(dump)
+--------+-----------+-------+-------+--------+
| Offset | Access    | Value | Bytes | Type   |
+--------+-----------+-------+-------+--------+
|        |           |       |       | Sample |
| 0      | [0] (.m1) | 1     | 00 01 | UInt16 |
| 2      | [1] (.m2) | 2     | 02    | UInt8  |
+--------+-----------+-------+-------+--------+
>>>
>>> item, dump = unpack_from_and_dump(UInt8, bytearray([0, 1, 2]), offset=1)
>>> item
1
>>> print(dump)
+--------+-------+-------+-------+
| Offset | Value | Bytes | Type  |
+--------+-------+-------+-------+
| 1      | 1     | 01    | UInt8 |
+--------+-------+-------+-------+
Tip
The “and_dump” unpacking variations exist primarily for ease of use
and performance. Otherwise a dump as a separate operation after an
item is unpacked would redo similar work (and in some cases require
extra work to instantiate the type with the unpacked value to have
access to the dump property).
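The idea of collecting summary information in the same pass as the unpacking can be sketched as follows. unpack_and_describe() is a hypothetical helper built on the standard library struct module. Its row layout (offset, name, value, hex bytes) is an assumption chosen to resemble the dump columns above, and it is not plum's dump implementation.

```python
import struct


def unpack_and_describe(layout, buffer):
    """Return (values, rows); each row is (offset, name, value, hex bytes).

    Hypothetical helper: layout is a list of (name, struct format) pairs,
    where each format describes exactly one item.
    """
    values, rows, offset = {}, [], 0
    for name, fmt in layout:
        size = struct.calcsize(fmt)
        (value,) = struct.unpack_from(fmt, buffer, offset)
        raw = bytes(buffer[offset:offset + size])
        values[name] = value
        # The summary row is recorded while the bytes are already at hand,
        # so no second pass over the buffer is needed.
        rows.append((offset, name, value, raw.hex(" ")))
        offset += size
    return values, rows


values, rows = unpack_and_describe(
    [("m1", ">H"), ("m2", ">B")], bytearray([0, 1, 2])
)
for row in rows:
    print(row)
```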