
Boost.Python Pickle Support

Pickle is a Python module for object serialization, also known as persistence, marshalling, or flattening.

It is often necessary to save the contents of an object to a file and restore them later. One approach is to write a pair of functions that write the data to a file in a special format and read it back. A powerful alternative is to use Python's pickle module. Exploiting Python's capacity for introspection, the pickle module recursively converts nearly arbitrary Python objects into a stream of bytes that can be written to a file.
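As a minimal illustration of the standard pickle module on its own (plain Python 3, no Boost.Python involved; the dictionary contents and the file name `data.pkl` are arbitrary choices for this sketch), a nested object can be written to a file and read back:

```python
import os
import pickle
import tempfile

# Nested built-in objects are converted recursively into a byte stream.
data = {"name": "world", "sizes": [1, 2, 3], "pair": (4.5, None)}

path = os.path.join(tempfile.mkdtemp(), "data.pkl")
with open(path, "wb") as f:              # pickle streams are binary
    pickle.dump(data, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == data)                  # True
```

The restored object is a new object that compares equal to the original, not the same object.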

The Boost Python Library supports the pickle module by emulating the interface implemented by Jim Fulton's ExtensionClass module that is included in the ZOPE distribution. This interface is similar to that for regular Python classes as described in detail in the Python Library Reference for pickle.


The Boost.Python Pickle Interface

At the user level, the Boost.Python pickle interface involves three special methods:
__getinitargs__
When an instance of a Boost.Python extension class is pickled, the pickler tests if the instance has a __getinitargs__ method. This method must return a Python tuple (it is most convenient to use a boost::python::tuple). When the instance is restored by the unpickler, the contents of this tuple are used as the arguments for the class constructor.

If __getinitargs__ is not defined, the class constructor will be called without arguments.
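The construction step can be sketched in pure Python. The classes `World` and `Counter` and the helpers `saved_initargs` and `restore` are hypothetical stand-ins for what the pickler and unpickler do; they are not Boost.Python code. Note also that Python 3's own pickle module ignores __getinitargs__ (it was honored only for old-style classes in Python 2), so the helpers perform the step explicitly.

```python
# Hypothetical sketch of the construction step; not Boost.Python code.

class World:
    def __init__(self, country="nowhere"):
        self.country = country

    def __getinitargs__(self):
        # Everything the constructor needs, packed into one tuple.
        return (self.country,)


class Counter:
    # No __getinitargs__: restoration calls the constructor without arguments.
    def __init__(self):
        self.count = 0


def saved_initargs(obj):
    """Pickling side: the tuple to store, or () if there is none."""
    getinitargs = getattr(obj, "__getinitargs__", None)
    return getinitargs() if getinitargs is not None else ()


def restore(cls, args):
    """Unpickling side: call the class constructor with the saved tuple."""
    return cls(*args)


w = restore(World, saved_initargs(World("Germany")))
print(w.country)                         # Germany
print(saved_initargs(Counter()))         # ()
```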

__getstate__
When an instance of a Boost.Python extension class is pickled, the pickler tests if the instance has a __getstate__ method. This method should return a Python object representing the state of the instance.

If __getstate__ is not defined, the instance's __dict__ is pickled (if it is not empty).

__setstate__
When an instance of a Boost.Python extension class is restored by the unpickler, it is first constructed using the result of __getinitargs__ as arguments (see above). Subsequently the unpickler tests if the new instance has a __setstate__ method. If so, this method is called with the result of __getstate__ (a Python object) as the argument.

If __setstate__ is not defined, the result of __getstate__ must be a Python dictionary. The items of this dictionary are added to the instance's __dict__.

If both __getstate__ and __setstate__ are defined, the Python object returned by __getstate__ need not be a dictionary. The two methods are free to use any representation of the state, as long as __setstate__ understands what __getstate__ returns.
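The state step can be sketched the same way. `defines`, `capture_state`, and `apply_state` are hypothetical helpers that follow the rules listed above, and `Point` and `Plain` are made-up classes; this is plain Python 3, not Boost.Python's implementation. `defines` skips `object` because, since Python 3.11, `object` itself provides a default __getstate__ that should not count as "defined".

```python
# Hypothetical sketch of the state step; not Boost.Python's implementation.

def defines(cls, name):
    """True if cls or a user-defined base class defines `name`."""
    # `object` is skipped: since Python 3.11 it has a default __getstate__.
    return any(name in vars(k) for k in cls.__mro__ if k is not object)


def capture_state(obj):
    """Pickling side: the state to save, or None if there is nothing."""
    if defines(type(obj), "__getstate__"):
        return obj.__getstate__()
    return dict(obj.__dict__) or None    # __dict__ is saved only if non-empty


def apply_state(obj, state):
    """Unpickling side: runs after the instance has been constructed."""
    if state is None:
        return
    if defines(type(obj), "__setstate__"):
        obj.__setstate__(state)
    else:
        obj.__dict__.update(state)       # state must be a dictionary here


class Point:
    def __init__(self):
        self.x, self.y = 0, 0

    def __getstate__(self):
        return (self.x, self.y)          # a tuple, not a dictionary...

    def __setstate__(self, state):
        self.x, self.y = state           # ...so __setstate__ is required


class Plain:
    # No __getstate__/__setstate__: the instance __dict__ is the state.
    def __init__(self):
        self.tag = None


p = Point()
p.x, p.y = 3, 4
q = Point()                              # constructed first, then restored
apply_state(q, capture_state(p))
print(q.x, q.y)                          # 3 4

a = Plain()
a.tag = "red"
b = Plain()
apply_state(b, capture_state(a))
print(b.tag)                             # red
```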

Pitfalls and Safety Guards

In Boost.Python extension modules with many extension classes, providing complete pickle support for all classes would be a significant overhead. In general, complete pickle support should only be implemented for extension classes that will eventually be pickled. However, the author of a Boost.Python extension module might not correctly anticipate which classes need pickle support. Unfortunately, the pickle protocol described above has two important pitfalls that the end user of a Boost.Python extension module might not be aware of:
Pitfall 1: Neither __getinitargs__ nor __getstate__ is defined.
In this situation the unpickler calls the class constructor without arguments and then adds the __dict__ that was pickled by default to that of the new instance.

However, most C++ classes wrapped with Boost.Python will have member data that are not restored correctly by this procedure. To alert the user to this problem, a safety guard is provided. If neither __getinitargs__ nor __getstate__ is defined, Boost.Python tests whether the class has an attribute __dict_defines_state__. An exception is raised if this attribute is not defined:

    RuntimeError: Incomplete pickle support (__dict_defines_state__ not set)
In the rare cases where this is not the desired behavior, the safety guard can be deliberately disabled. For example, in C++:
    class_builder<your_class> py_your_class(your_module, "your_class");
    py_your_class.dict_defines_state();
The safety guard can also be overridden at the Python level, e.g.:
    import your_bpl_module
    class your_class(your_bpl_module.your_class):
      __dict_defines_state__ = 1
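The rule behind this safety guard can be mirrored in a few lines of Python. `check_guard_1` and the classes `Wrapped` and `Trusted` are hypothetical; the check only restates the condition described above and is not Boost.Python's actual implementation.

```python
# Hypothetical sketch of the Pitfall 1 guard; not Boost.Python code.

def defines(cls, name):
    """True if cls or a user-defined base class defines `name`."""
    # `object` is skipped: since Python 3.11 it has a default __getstate__.
    return any(name in vars(k) for k in cls.__mro__ if k is not object)


def check_guard_1(obj):
    cls = type(obj)
    if (not defines(cls, "__getinitargs__")
            and not defines(cls, "__getstate__")
            and not hasattr(cls, "__dict_defines_state__")):
        raise RuntimeError(
            "Incomplete pickle support (__dict_defines_state__ not set)")


class Wrapped:
    # Stands in for an extension class with no pickle methods at all.
    def __init__(self):
        self.label = "x"


class Trusted(Wrapped):
    __dict_defines_state__ = 1           # the Python-level override


try:
    check_guard_1(Wrapped())
    message = None
except RuntimeError as exc:
    message = str(exc)
print(message)   # Incomplete pickle support (__dict_defines_state__ not set)

check_guard_1(Trusted())                 # passes silently
```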

Pitfall 2: __getstate__ is defined and the instance's __dict__ is not empty.
The author of a Boost.Python extension class might provide a __getstate__ method without considering the possibilities that:

  • the class is used in Python as a base class. The __dict__ of instances of the derived class then most likely needs to be pickled in order to restore those instances correctly.

  • the user adds items to the instance's __dict__ directly. Again, the __dict__ of the instance then needs to be pickled.

To alert the user to this non-obvious problem, a safety guard is provided. If __getstate__ is defined and the instance's __dict__ is not empty, Boost.Python tests whether the class has an attribute __getstate_manages_dict__. An exception is raised if this attribute is not defined:

    RuntimeError: Incomplete pickle support (__getstate_manages_dict__ not set)
To resolve this problem, first establish that the __getstate__ and __setstate__ methods manage the instance's __dict__ correctly; this can be done at either the C++ or the Python level. Then deliberately override the safety guard, e.g. in C++:
    class_builder<your_class> py_your_class(your_module, "your_class");
    py_your_class.getstate_manages_dict();
In Python:
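    import your_bpl_module
    class your_class(your_bpl_module.your_class):
      __getstate_manages_dict__ = 1
      def __getstate__(self):
        # your code here: return the C++ state together with self.__dict__
        pass
      def __setstate__(self, state):
        # your code here: restore the C++ state and self.__dict__ from state
        pass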
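The Pitfall 2 guard can be sketched in pure Python as well. `check_guard_2` and `Snapshot` are hypothetical; `Snapshot` keeps its data in a slot, standing in for C++ member data, so its __dict__ starts out empty, as it would for a wrapped extension class. The sketch shows the guard staying quiet until the user adds an attribute to __dict__.

```python
# Hypothetical sketch of the Pitfall 2 guard; not Boost.Python code.

def defines(cls, name):
    """True if cls or a user-defined base class defines `name`."""
    # `object` is skipped: since Python 3.11 it has a default __getstate__.
    return any(name in vars(k) for k in cls.__mro__ if k is not object)


def check_guard_2(obj):
    cls = type(obj)
    if (defines(cls, "__getstate__")
            and obj.__dict__
            and not hasattr(cls, "__getstate_manages_dict__")):
        raise RuntimeError(
            "Incomplete pickle support (__getstate_manages_dict__ not set)")


class Snapshot:
    # `_value` lives in a slot, standing in for C++ member data, so the
    # instance __dict__ starts out empty.
    __slots__ = ("_value", "__dict__")

    def __init__(self, value=0):
        self._value = value

    def __getstate__(self):
        return self._value               # __dict__ is deliberately left out

    def __setstate__(self, state):
        self._value = state


s = Snapshot(7)
check_guard_2(s)                         # passes: __dict__ is empty
s.note = "added by the user"             # __dict__ is no longer empty
try:
    check_guard_2(s)
    message = None
except RuntimeError as exc:
    message = str(exc)
print(message)   # Incomplete pickle support (__getstate_manages_dict__ not set)
```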

Practical Advice


Examples

There are three files in boost/libs/python/example that show how to provide pickle support.

pickle1.cpp

The C++ class in this example can be fully restored by passing the appropriate argument to the constructor. Therefore it is sufficient to define the pickle interface method __getinitargs__.

pickle2.cpp

The C++ class in this example contains member data that cannot be restored by any of the constructors. Therefore it is necessary to provide the __getstate__/__setstate__ pair of pickle interface methods.

For simplicity, the __dict__ is not included in the result of __getstate__. This is not generally recommended, but it is a valid approach if the object's __dict__ is expected to always be empty. Note that the safety guards will catch the cases where this assumption is violated.

pickle3.cpp

This example is similar to pickle2.cpp. However, the object's __dict__ is included in the result of __getstate__. This requires more code but is unavoidable if the object's __dict__ is not always empty.
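The idea behind pickle3.cpp, returning the instance __dict__ together with the C++ state, can be sketched in pure Python 3 and round-tripped through the standard pickle module. `Account` is a hypothetical class whose slot `_balance` stands in for C++ member data. Python 3's pickle creates the instance without calling __init__ and ignores __getinitargs__, so this sketch shows only the __getstate__/__setstate__ half of the protocol, not the exact Boost.Python sequence.

```python
import pickle


class Account:
    # `_balance` lives in a slot, standing in for C++ member data that is
    # invisible to __dict__; `__dict__` holds attributes added from Python.
    __slots__ = ("_balance", "__dict__")
    __getstate_manages_dict__ = 1

    def __init__(self, balance=0):
        self._balance = balance

    def __getstate__(self):
        # Save the "C++" state together with the instance __dict__.
        return (self._balance, self.__dict__)

    def __setstate__(self, state):
        balance, instance_dict = state
        self._balance = balance
        self.__dict__.update(instance_dict)


acct = Account(100)
acct.owner = "alice"                     # attribute added from Python
restored = pickle.loads(pickle.dumps(acct))
print(restored._balance, restored.owner) # 100 alice
```

Because __getstate__ returns a tuple rather than a dictionary, __setstate__ must be defined, as described in the interface section.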
© Copyright Ralf W. Grosse-Kunstleve 2001. Permission to copy, use, modify, sell and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.

Updated: March 21, 2001