[plum.str] String Tutorial: Basic String Usage

This tutorial covers the basics of the pre-built plum string types, AsciiStr and Utf8Str. These types pack (encode) strings into bytes and unpack (decode) bytes into strings for the two most common encodings. plum supports other encodings as well; to use them, read this page first and then refer to the Create Custom Types tutorial.

The standard Python str supports both encoding and decoding. The plum string types extend that capability so that they can be used with the pack() and unpack() functions as well as with plum aggregation types (e.g. as an Array item type or a Structure member type). Because plum string types are greedy by default (they consume as many bytes or characters as they are given), using them in arrays or structures requires additional work. Read this tutorial first, then refer to the remaining String Tutorials to learn how.
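To make "greedy" concrete, here is a minimal sketch that uses only the Python standard library, not plum. The buffer layout (ASCII text followed by a 16-bit integer) is a hypothetical example chosen for illustration. It shows why a decoder that consumes every byte it is given needs a size limit when another field follows.

```python
import struct

# Hypothetical layout: ASCII text followed by a 16-bit big-endian integer.
buffer = b'fox' + struct.pack('>H', 42)

# A greedy decode consumes every byte, including the integer's two bytes.
greedy = buffer.decode('ascii')

# Limiting the decode to a known size leaves the remaining bytes for the next field.
text = buffer[:3].decode('ascii')
(number,) = struct.unpack('>H', buffer[3:])
```

In this sketch the greedy result contains the integer's bytes as stray characters, while the size-limited decode recovers both fields.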

Unpacking

When passed as the first argument to the unpack() function, a plum string type decodes the provided bytes using the encoding pre-configured within the type. As its name suggests, AsciiStr decodes bytes as ASCII characters:

>>> from plum import unpack, unpack_and_dump
>>> from plum.str import AsciiStr
>>>
>>> s = unpack(AsciiStr, b'The quick brown fox jumped over the lazy dog.')
>>> s
'The quick brown fox jumped over the lazy dog.'
>>>
>>> s, dump = unpack_and_dump(AsciiStr, b'The quick brown fox jumped over the lazy dog.')
>>> print(dump)
+--------+---------+--------------------+-------------------------------------------------+----------+
| Offset | Access  | Value              | Bytes                                           | Type     |
+--------+---------+--------------------+-------------------------------------------------+----------+
|        |         |                    |                                                 | AsciiStr |
|  0     | [0:16]  | 'The quick brown ' | 54 68 65 20 71 75 69 63 6b 20 62 72 6f 77 6e 20 |          |
| 16     | [16:32] | 'fox jumped over ' | 66 6f 78 20 6a 75 6d 70 65 64 20 6f 76 65 72 20 |          |
| 32     | [32:45] | 'the lazy dog.'    | 74 68 65 20 6c 61 7a 79 20 64 6f 67 2e          |          |
+--------+---------+--------------------+-------------------------------------------------+----------+

When it encounters non-ASCII bytes, AsciiStr raises an exception whose message indicates where the problem lies:

>>> unpack(AsciiStr, b'Ahoj sv\xc4\x9bte')
Traceback (most recent call last):
    ...
plum._exceptions.UnpackError:
<BLANKLINE>
+--------+-----------+-----------+----------------------+----------+
| Offset | Access    | Value     | Bytes                | Type     |
+--------+-----------+-----------+----------------------+----------+
|        |           |           |                      | AsciiStr |
|  0     | [0:7]     | 'Ahoj sv' | 41 68 6f 6a 20 73 76 |          |
|  7     | --error-- |           | c4 9b 74 65          |          |
+--------+-----------+-----------+----------------------+----------+
<BLANKLINE>
UnicodeDecodeError occurred during unpack operation:
<BLANKLINE>
'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
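For comparison, the same failure can be reproduced with the standard library alone. The sketch below does not use plum; it relies on the start attribute of Python's UnicodeDecodeError to locate the first offending byte, which corresponds to the offset of the error row in the dump above.

```python
data = b'Ahoj sv\xc4\x9bte'

try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte that could not be decoded.
    bad_offset = exc.start
    # Everything before that index is valid ASCII.
    valid_prefix = data[:exc.start].decode('ascii')
    # The undecoded remainder begins with the non-ASCII byte.
    remainder = data[exc.start:]
```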

Instantiating & Packing

plum string types follow the str API. The constructor accepts a string. When packed, the type encodes the string into bytes using the encoding pre-configured within the type:

>>> from plum.str import Utf8Str
>>>
>>> s = Utf8Str('Ahoj světe')
>>> s.pack()
bytearray(b'Ahoj sv\xc4\x9bte')

Like the standard Python str, plum string types accept encoding and errors parameters for use in combination with a bytes input. Note that packing encodes the string using the type's pre-configured encoding (UTF-8 in this example), not the encoding used during instantiation:

>>> s = Utf8Str(b'\xff\xfeA\x00h\x00o\x00j\x00 \x00s\x00v\x00\x1b\x01t\x00e\x00', encoding='utf-16')
>>> s.pack()
bytearray(b'Ahoj sv\xc4\x9bte')
>>> s.dump()
+--------+--------+--------------+----------------------------------+---------+
| Offset | Access | Value        | Bytes                            | Type    |
+--------+--------+--------------+----------------------------------+---------+
|        |        |              |                                  | Utf8Str |
|  0     | [0:10] | 'Ahoj světe' | 41 68 6f 6a 20 73 76 c4 9b 74 65 |         |
+--------+--------+--------------+----------------------------------+---------+
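The two steps involved here (decoding the input with one encoding, then packing with another) can be sketched with the standard str type alone. This is an illustration of the equivalent standard-library operations, not of plum's internals.

```python
raw_utf16 = b'\xff\xfeA\x00h\x00o\x00j\x00 \x00s\x00v\x00\x1b\x01t\x00e\x00'

# Step 1: the encoding argument only controls how the input bytes are decoded.
text = str(raw_utf16, encoding='utf-16')

# Step 2: encoding to UTF-8 is independent of the encoding used in step 1.
utf8_bytes = text.encode('utf-8')
```

The resulting bytes match the packed output shown above, even though the input was UTF-16.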

Direct Packing

The pack() function also allows the string type and the string value to be passed separately. This may offer a slight performance benefit since it avoids instantiating the plum string type:

>>> from plum import pack
>>> pack(Utf8Str, 'Ahoj světe')
bytearray(b'Ahoj sv\xc4\x9bte')

String Features

plum string types subclass Python’s str type and inherit all string methods and behaviors. Apart from the ability to plug into other plum types such as Array or Structure, the plum string types offer no additional functionality.
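The general idea of a str subclass that carries a pre-configured encoding can be sketched as follows. The EncodedStr class and its attributes are hypothetical names used for illustration only; this is not plum's implementation.

```python
class EncodedStr(str):
    """Hypothetical str subclass with a pre-configured encoding (not plum's code)."""

    encoding = 'utf-8'

    def pack(self):
        # Encode using the class-level encoding rather than a per-call argument.
        return bytearray(self.encode(self.encoding))


s = EncodedStr('Ahoj světe')

# Inherited str methods work unchanged; they return plain str objects.
upper = s.upper()

# The added method produces bytes in the pre-configured encoding.
packed = s.pack()
```

Because the subclass adds behavior without overriding anything, every existing str method remains available on its instances.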