Serialization
Serialization is an important part of NuPIC. Saving trained models matters both for sharing in the community and for many potential applications. The key aspects are:
- Speed - we want to be able to save to and load from disk as fast as possible.
- Durability - a model should be completely identical after deserialization to the model that was saved.
- Compatibility - we want a format that is easy to maintain as it evolves (previously saved models should still load) and that works across languages.
The current format is implementation-specific. It uses a combination of Python pickling (cPickle module) and direct file writing in C++. Models using different implementations of the classifier or other components will not have the same serialization format and can only be deserialized into the same implementation.
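To make the durability goal concrete, here is a minimal sketch of a save/load round trip with Python pickling. It is not NuPIC code: the `state` dictionary is a hypothetical stand-in for a model's state, and it uses the Python 3 `pickle` module rather than `cPickle`.

```python
import pickle

# Hypothetical model state; a real model holds far more than this.
state = {"weights": [0.1, 0.25, 0.5], "iteration": 42}


def round_trip(obj):
    """Serialize to bytes and load back, as a save/load cycle would."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return pickle.loads(data)


restored = round_trip(state)

# Durability: the restored state must be equal to the original,
# yet be an independent copy rather than the same object.
is_identical = restored == state and restored is not state
print(is_identical)
```

When a class instance is pickled instead of a plain dictionary, pickle records the class's module and name and looks that class up again on load, which is one reason a pickled model is tied to the implementation that wrote it.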
We are currently exploring options for a new format that better fits these goals. Specifically, the new format must work across languages and be at least as fast as, and ideally faster than, the current method.
Options:
- MessagePack
- Protocol Buffers
- Cap'n Proto
- JSON / BSON
- Avro
- Thrift
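One way to compare candidates against the speed and durability goals is a small benchmark harness. The sketch below is an assumption about how such a comparison could be set up, not an existing NuPIC tool. It uses only formats from the Python standard library (`pickle` as the baseline and `json` as one listed option), and the `state` dictionary is a hypothetical stand-in for a model. Other formats could be plugged in by passing their own dump and load functions.

```python
import json
import pickle
import time


def benchmark(name, dumps, loads, state, repeats=20):
    """Time `repeats` save/load cycles and check the round trip is lossless."""
    start = time.perf_counter()
    for _ in range(repeats):
        data = dumps(state)
        restored = loads(data)
    elapsed = time.perf_counter() - start
    return {
        "format": name,
        "seconds_per_cycle": elapsed / repeats,
        "size_bytes": len(data),
        "lossless": restored == state,
    }


# Hypothetical model state: a few thousand floating-point weights.
state = {"weights": [i / 7.0 for i in range(5000)], "iteration": 42}

results = [
    benchmark("pickle", pickle.dumps, pickle.loads, state),
    benchmark(
        "json",
        lambda s: json.dumps(s).encode("utf-8"),
        lambda b: json.loads(b.decode("utf-8")),
        state,
    ),
]

for row in results:
    print(row)
```

Timings from a toy state like this say little about real models, so any actual comparison should use saved models of realistic size.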
Cap'n Proto is a very new data serialization format with some potentially nice properties. Specifically, it uses the same in-memory representation across languages and potentially supports mmap, which would make it fast to serialize modifications.
It still appears to lack a reliable release, but it is in active development, so it will be re-reviewed after the other options.
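The appeal of a fixed in-memory layout combined with mmap can be illustrated with a small sketch. It uses only the Python standard library, not any of the libraries listed above, and the file layout (a count followed by 64-bit floats) is a made-up assumption chosen for the example. The point is that a single value can be updated on disk without rewriting the whole file.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed layout: a little-endian uint32 count followed by
# `count` float64 values. Any language can read it given the same layout.
HEADER = struct.Struct("<I")
VALUE = struct.Struct("<d")


def write_weights(path, weights):
    """Write the full file once, in the fixed layout."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(weights)))
        for w in weights:
            f.write(VALUE.pack(w))


def update_weight(path, index, value):
    """Overwrite one value in place through a memory map."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            (count,) = HEADER.unpack_from(mm, 0)
            if not 0 <= index < count:
                raise IndexError(index)
            VALUE.pack_into(mm, HEADER.size + index * VALUE.size, value)
            mm.flush()


def read_weights(path):
    """Read every value back through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            (count,) = HEADER.unpack_from(mm, 0)
            return [
                VALUE.unpack_from(mm, HEADER.size + i * VALUE.size)[0]
                for i in range(count)
            ]


path = os.path.join(tempfile.mkdtemp(), "weights.bin")
write_weights(path, [0.5, 1.5, 2.5])
update_weight(path, 1, 9.0)  # touches 8 bytes, not the whole file
weights = read_weights(path)
print(weights)
```

A real format would also need versioning and nested structures, which is the part a serialization library provides.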