A header-only, C++11 data conversion library
cppdatalib provides implementations of several serialization formats designed for hierarchical data (and some that aren't), and can easily convert to and from a standard internal representation. Adapters also exist to integrate smoothly with the following frameworks:
- Boost.Compute
- Boost.Container
- Qt
- POCO
- ETL
- The C++ STL
Supported formats include:
- JSON
- UBJSON
- Bencode
- plain text property lists
- CSV
- Binn
- MessagePack
- MySQL (database/table retrieval and writing)
- XML property lists (write-only)
- XML-RPC (write-only)
- XML-XLS (write-only)
- BJSON (write-only)
- Netstrings (write-only)
- Transenc
- CBOR
- TSV
cppdatalib offers a variety of filters that can be applied to stream handlers. These include the following:
- `buffer_filter`: Optionally buffers strings, arrays, and objects (or any combination of the three), or acts as a pass-through filter
- `automatic_buffer_filter`: Automatically determines the correct settings for the underlying `buffer_filter` based on the output stream handler
- `tee_filter`: Splits an input stream to two output stream handlers
- `view_filter`: Applies a view function to every element of the specified type. Essentially the same as `custom_converter_filter`, but can't edit the value and is more efficient
- `range_filter`: Pass-through filter that computes the maximum and minimum values of the specified type, as well as the midpoint (only applicable for numeric values)
- `mean_filter`: Pass-through filter that computes the arithmetic, geometric, and harmonic means of numeric values
- `dispersal_filter`: Pass-through filter that computes the variance and standard deviation of numeric values (subclass of `mean_filter`, so both central tendency and dispersal can be calculated with this class)
- `array_sort_filter`: Sorts all arrays deeper than the specified nesting level (or all arrays, if 0 is specified), in either ascending or descending order
- `table_to_array_of_maps_filter`: Converts a table to an array of maps, using an external column-name list. Also supports converting single-dimension arrays to object-wrapped values with a specified column key
- `duplicate_key_check_filter`: Ensures the input stream only provides unique keys in objects. This filter supports complex keys, including nested objects
- `converter_filter`: Converts from one internal type to another, for example, all integers to strings. This filter has built-in conversions
- `custom_converter_filter`: Converts the specified internal type, using a user-specified converter function. This filter supports varying output types, including the same type as the input
- `generic_converter_filter`: Sends all scalar values to a user-specified function for conversion. Arrays and objects cannot be converted with this filter
Filters can be stacked on top of other filters; the number of stacked filters is limited only by the runtime environment.
cppdatalib supports streaming with a small memory footprint. Most conversions require little or no buffering, and there is no limit to the nesting depth of arrays or objects. This makes cppdatalib well suited to large datasets.
Using the library is simple. Everything is under the main namespace `cppdatalib`, and underneath are the `core` namespace and individual format namespaces (e.g. `json`). It is recommended to use `using` statements to pull in format namespaces.
For example, the following programs all read a JSON structure from STDIN and write it to STDOUT:
Read through value class:
```cpp
#include <cppdatalib/cppdatalib.h>

int main() {
    using namespace cppdatalib; // Parent namespace
    using namespace json;       // Format namespace

    core::value my_value;       // Global cross-format value class

    try {
        json::parser p(std::cin);         // Initialize parser
        json::stream_writer w(std::cout); // Initialize writer
        p >> my_value;                    // Read in to core::value as JSON
        w << my_value;                    // Write core::value out as JSON
    } catch (const core::error &e) {
        std::cerr << e.what() << std::endl; // Catch any errors that might have occurred (syntax or logical)
    }

    return 0;
}
```
Read without a parser object (still uses an intermediate value, the result of `from_json`):
```cpp
#include <cppdatalib/cppdatalib.h>

int main()
{
    using namespace cppdatalib; // Parent namespace
    using namespace json;       // Format namespace

    try {
        json::stream_writer(std::cout) << from_json(std::cin); // Write core::value out to STDOUT as JSON
    } catch (const core::error &e) {
        std::cerr << e.what() << std::endl; // Catch any errors that might have occurred (syntax or logical)
    }

    return 0;
}
```
Read without intermediate value (extremely memory efficient):
```cpp
#include <cppdatalib/cppdatalib.h>

int main()
{
    using namespace cppdatalib; // Parent namespace
    using namespace json;       // Format namespace

    try {
        json::parser(std::cin) >> json::stream_writer(std::cout); // Convert JSON on STDIN directly to JSON on STDOUT
    } catch (const core::error &e) {
        std::cerr << e.what() << std::endl; // Catch any errors that might have occurred (syntax or logical)
    }

    return 0;
}
```
To use the lower-level stream-handling (parse-on-the-fly) classes, see the example below:
```cpp
#include <cppdatalib/cppdatalib.h>

int main() {
    using namespace cppdatalib; // Parent namespace
    using namespace json;       // Format namespace
    using namespace ubjson;     // Another format namespace

    core::value my_value;
    core::value_builder builder(my_value); // Set up value builder stream handler
                                           // It acts as a stream handler that writes all data into the
                                           // internal value structure
    ubjson::stream_writer writer(std::cout); // UBJSON writer to standard output

    try {
        json::parser parser(std::cin);
        parser >> writer;  // Convert from JSON on standard input to UBJSON on standard output
                           // Note that this DOES NOT READ the entire stream before writing!
                           // The data is read and written at the same time
        parser >> builder; // Convert from JSON on standard input to internal representation in
                           // `my_value`. Note that my_value is also accessible by using `builder.value()`
        core::convert(my_value, writer); // Convert from internal representation to UBJSON on standard output
    } catch (const core::error &e) {
        std::cerr << e.what() << std::endl; // Catch any errors that might have occurred (syntax or logical)
    }

    try {
        json::parser parser(std::cin);
        core::stream_filter<core::null, core::string> filter(writer);
        // Set up filter on UBJSON output that converts all `null` values to empty strings
        parser >> filter; // Convert from JSON on standard input to UBJSON on standard output, converting `null`s to empty strings
        // When using a filter, write to the filter, instead of the handler the filter is modifying
        // (i.e. don't write to `writer` here unless you don't want to employ the filter)
        // Note that this DOES NOT READ the entire stream before writing!
        // The data is read and written at the same time
    } catch (const core::error &e) {
        std::cerr << e.what() << std::endl; // Catch any errors that might have occurred (syntax or logical)
    }

    try {
        json::parser parser(std::cin);
        core::stream_filter<core::null, core::string> filter(writer);
        // Set up filter on UBJSON output that converts all `null` values to empty strings
        auto lambda = [](core::value &v)
        {
            if (v.is_string() && v.get_string().find('a') == 0)
                v.set_string("");
        };
        core::generic_stream_filter<decltype(lambda)> generic_filter(filter, lambda);
        // Set up filter on top of previous filter that clears all strings beginning with lowercase 'a'
        parser >> generic_filter; // Convert from JSON on standard input to UBJSON on standard output,
                                  // converting `null`s to empty strings, and clearing all strings beginning with 'a'
        // Again, note that this does not read the entire stream before writing
        // The data is read and written at the same time
    } catch (const core::error &e) {
        std::cerr << e.what() << std::endl; // Catch any errors that might have occurred (syntax or logical)
    }

    try {
        json::parser parser(std::cin);
        core::stream_filter<core::boolean, core::integer> second_filter(writer);
        // Set up filter on UBJSON output that converts booleans to integers
        core::stream_filter<core::integer, core::real> first_filter(second_filter);
        // Set up filter on top of previous filter that converts all integers to reals
        parser >> first_filter; // Convert from JSON on standard input to UBJSON on standard output,
                                // converting booleans to integers, and converting integers to reals
        // Note that the order of filters is important. The last filter enabled will be the first to be called.
        // If the filter order were switched, all booleans and integers would become reals.
    } catch (const core::error &e) {
        std::cerr << e.what() << std::endl; // Catch any errors that might have occurred (syntax or logical)
    }

    try {
        json::parser parser(std::cin);
        core::tee_filter tee(writer, builder);
        // Set up tee filter. Tee filters split the input to two handlers, which can be stream_filters or other tee_filters.
        // In this case, we'll parse the JSON input once, output UBJSON on standard output, and build an internal
        // representation simultaneously.
        parser >> tee; // Convert from JSON on standard input to UBJSON on standard output,
                       // as well as building internal representation in my_value
    } catch (const core::error &e) {
        std::cerr << e.what() << std::endl; // Catch any errors that might have occurred (syntax or logical)
    }

    return 0;
}
```
Almost every part of cppdatalib is extensible.
To create a new output format, follow these guidelines:

- Create a new namespace under `cppdatalib` for the format.
- In the new namespace, define a class `stream_writer` that inherits `core::stream_writer` and `core::stream_handler`. The stream writer class can reimplement the following functions:
  - `void begin_();`: Called to initialize the format. The default implementation does nothing.
  - `void end_();`: Called to deinitialize the format. The default implementation does nothing.
  - `bool write_(const core::value &v, bool is_key);`: Called when a value is written by `write()`, with the value to be written passed in `v`. `is_key` is true if the specified value is an object key. Reimplementations of this function should return `true` if the value was written, and `false` if the value still needs to be processed. The default implementation returns `false`.
  - `void begin_item_(const core::value &v);`: Called when starting to parse any non-key value. The opposite of `begin_key_()`. The default implementation does nothing.
  - `void end_item_(const core::value &v);`: Called when ending parsing of any non-key value. The opposite of `end_key_()`. The default implementation does nothing.
  - `void begin_scalar_(const core::value &v, bool is_key);`: Called when starting to parse any scalar value. This includes all value types except arrays and objects. `is_key` is true if the specified value is an object key. The default implementation does nothing.
  - `void end_scalar_(const core::value &v, bool is_key);`: Called when ending parsing of any scalar value. This includes all value types except arrays and objects. `is_key` is true if the specified value is an object key. The default implementation does nothing.
  - `void begin_key_(const core::value &v);`: Called when starting to parse any key value. The opposite of `begin_item_()`. The default implementation does nothing.
  - `void end_key_(const core::value &v);`: Called when ending parsing of any key value. The opposite of `end_item_()`. The default implementation does nothing.
  - `void null_(const core::value &v);`: Called when a scalar `null` is written. The default implementation does nothing.
  - `void bool_(const core::value &v);`: Called when a scalar `boolean` is written. The default implementation does nothing.
  - `void integer_(const core::value &v);`: Called when a scalar `integer` is written. The default implementation does nothing.
  - `void uinteger_(const core::value &v);`: Called when a scalar `uinteger` is written. The default implementation does nothing.
  - `void real_(const core::value &v);`: Called when a scalar `real` is written. The default implementation does nothing.
  - `void begin_string_(const core::value &v, core::int_t size, bool is_key);`: Called when starting to parse a string. The size, if known, is passed in `size`. If the size is unknown, `size` is equal to `stream_handler::unknown_size`. If `v.size()` is equal to `size`, the entire string is available for analysis. `is_key` is true if the string is an object key. The default implementation does nothing.
  - `void string_data_(const core::value &v, bool is_key);`: Called when data is available to append to a string. The string is contained in `v`. `is_key` is true if the string currently being parsed is an object key. The default implementation does nothing.
  - `void end_string_(const core::value &v, bool is_key);`: Called when ending parsing of a string. `is_key` is true if the string is an object key. The default implementation does nothing.
  - `void begin_array_(const core::value &v, core::int_t size, bool is_key);`: Called when starting to parse an array. The size, if known, is passed in `size`. If the size is unknown, `size` is equal to `stream_handler::unknown_size`. If `v.size()` is equal to `size`, the entire array is available for analysis. `is_key` is true if the array is an object key. The default implementation does nothing.
  - `void end_array_(const core::value &v, bool is_key);`: Called when ending parsing of an array. `is_key` is true if the array is an object key. The default implementation does nothing.
  - `void begin_object_(const core::value &v, core::int_t size, bool is_key);`: Called when starting to parse an object. The size, if known, is passed in `size`. If the size is unknown, `size` is equal to `stream_handler::unknown_size`. If `v.size()` is equal to `size`, the entire object is available for analysis. `is_key` is true if the object is an object key. The default implementation does nothing.
  - `void end_object_(const core::value &v, bool is_key);`: Called when ending parsing of an object. `is_key` is true if the object is an object key. The default implementation does nothing.
- All data should be written via the member function `core::ostream &output_stream();`.
- Required buffering features should be provided by the member function `unsigned int required_features() const;`. The available features are `const` members of the `core::stream_handler` class. The default implementation returns `stream_handler::requires_none`.
Since the default implementations of all these functions do nothing, an instance of `core::stream_handler` can be used as a dummy output sink that does nothing.
To create a new filter, follow the guidelines for output formats, but place the filter in the `cppdatalib::core` namespace. Inherit all filters from `core::stream_filter_base`. Writing should be done exclusively via the member variable `output`. The reimplementation rules all still apply, but `core::stream_filter_base` provides pass-through writing of all values to `output`. This allows you to reimplement just what you need, and pass everything else through unmodified.
Below is a list of compile-time type adjustments supported by cppdatalib:
- `CPPDATALIB_BOOL_T`: The underlying boolean type of the implementation. Should be able to store a true and false value. Defaults to `bool`.
- `CPPDATALIB_INT_T`: The underlying integer type of the implementation. Should be able to store a signed integral value. Defaults to `int64_t`.
- `CPPDATALIB_UINT_T`: The underlying unsigned integer type of the implementation. Should be able to store an unsigned integral value. Defaults to `uint64_t`.
- `CPPDATALIB_REAL_T`: The underlying floating-point type of the implementation. Should be able to store at least an IEEE-754 value. Defaults to `double`.
- `CPPDATALIB_CSTRING_T`: The underlying C-style string type of the implementation. Defaults to `const char *`.
- `CPPDATALIB_STRING_T`: The underlying string type of the implementation. Defaults to `std::string`.
- `CPPDATALIB_ARRAY_T`: The underlying array type of the implementation. Defaults to `std::vector<cppdatalib::core::value>`.
- `CPPDATALIB_OBJECT_T`: The underlying object type of the implementation. Defaults to `std::multimap<cppdatalib::core::value, cppdatalib::core::value>`.
- `CPPDATALIB_SUBTYPE_T`: The underlying subtype type of the implementation. Must be able to store all subtypes specified in the `core` namespace. Defaults to `int16_t`.
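For example, the default multimap-backed object type could be swapped for an ordered `std::map` by defining the macro before the first include. This is a sketch of the usual define-before-include pattern for such macros; it assumes the header only supplies the default when the macro is not already defined:

```cpp
// Override the default object type before including the library header.
// The macro is expanded inside the header, so cppdatalib::core::value
// does not need to be declared at this point.
#include <map>
#define CPPDATALIB_OBJECT_T std::map<cppdatalib::core::value, cppdatalib::core::value>
#include <cppdatalib/cppdatalib.h>
```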
Enable/disable flags are listed below:

- `CPPDATALIB_ENABLE_MYSQL`: Enables inclusion of the MySQL interface library. If defined, the MySQL headers must be available in the include path.
- `CPPDATALIB_DISABLE_WRITE_CHECKS`: Disables nesting checks in the `stream_handler` class. If write checks are disabled and the generating code is buggy, it may generate corrupted output without catching the errors, but disabling the checks can result in better performance. Use at your own risk.
- `CPPDATALIB_ENABLE_FAST_IO`: Swaps usage of the `std::ios` classes for trimmed-down, more performant, custom I/O classes. Although they act as a drop-in replacement for the STL, they only implement a subset of the features (but the features they do implement should be usage-compatible). Use at your own risk.
- `CPPDATALIB_DISABLE_FAST_IO_GCOUNT`: Disables calculation of `gcount()` in the fast input classes. This removes the `gcount()` function altogether. This flag only has an effect if `CPPDATALIB_ENABLE_FAST_IO` is defined.
- `CPPDATALIB_OPTIMIZE_FOR_NUMERIC_SPACE`: Trims value sizes down to optimize space for large numeric arrays. This theoretically slows down string values somewhat, but saves space.
- `CPPDATALIB_ENABLE_BOOST_COMPUTE`: Enables the Boost.Compute adapters, to smoothly integrate with Boost.Compute types. The Boost source tree must be in the include path.
- `CPPDATALIB_ENABLE_BOOST_CONTAINER`: Enables the Boost.Container adapters, to smoothly integrate with Boost.Container types. The Boost source tree must be in the include path.
- `CPPDATALIB_ENABLE_QT`: Enables the Qt adapters, to smoothly integrate with the most common Qt types. The Qt source tree must be in the include path.
- `CPPDATALIB_ENABLE_POCO`: Enables the POCO adapters, to smoothly integrate with POCO types. The "Poco" source tree must be in the include path.
- `CPPDATALIB_ENABLE_ETL`: Enables the ETL adapters, to smoothly integrate with ETL types. The "etl" source tree must be in the include path.
- `CPPDATALIB_ENABLE_STL`: Enables the STL adapters, to smoothly integrate with all STL types.
Please note that custom datatypes are a work-in-progress. Defining custom types may work, or may not work at all.
If a type is unsupported in a serialization format, the type is not converted to something recognizable by the format; instead, an error is thrown, describing the failure. However, if a subtype is not supported, the value is processed as if it had no subtype (i.e. if a value is a `string` with the unsupported subtype `date`, the value is processed as a meaningless string and the subtype metadata is removed).
If a format-defined limit is reached, such as an object key length limit, an error will be thrown.
- JSON supports `null`, `bool`, `uint`, `int`, `real`, `string`, `array`, and `object`.
  Notes:
  - `string` subtype `bignum` is fully supported, for both reading and writing.
  - There is no limit to the magnitude of numbers supported. `int` is attempted first, then `uint`, then `real`, then `bignum`.
  - Numerical metadata is lost when converting to and from JSON.
- Bencode supports `uint`, `int`, `string`, `array`, and `object`.
  Notes:
  - `null`, `bool`, and `real` values are not supported.
  - Integers are limited to two's complement 64-bit integers when reading.
  - No subtypes are supported.
  - Numerical metadata is lost when converting to and from Bencode.
- Plain text property lists support `bool`, `uint`, `int`, `real`, `string`, `array`, and `object`.
  Notes:
  - `null` values are not supported.
  - Integers are limited to two's complement 64-bit integers when reading.
  - `string` subtypes `date`, `time`, and `datetime` are supported.
  - Numerical metadata is lost when converting to and from plain text property lists.
- Binn supports `null`, `bool`, `int`, `real`, `string`, `array`, and `object`.
  Notes:
  - `string` subtypes `date`, `time`, `datetime`, `bignum`, `blob`, and `clob` are supported.
  - `object` subtype `map` is supported.
  - Map keys are limited to signed 32-bit integers.
  - Object keys are limited to 255 characters or fewer.
  - Custom subtypes for `null`, `bool`, `int`, `string`, and `array` are supported, as long as they are at least equal to the value of `core::user`.
- XML property lists support `bool`, `uint`, `int`, `real`, `string`, `array`, and `object`.
  Notes:
  - `null` values are not supported.
  - `string` subtypes `date`, `time`, and `datetime` are supported.
  - Numerical metadata is lost when converting to XML property lists.
- XML-RPC supports `bool`, `uint`, `int`, `real`, `string`, `array`, and `object`.
  Notes:
  - `null` values are not supported.
  - No subtypes are supported.
  - Numerical metadata is lost when converting to XML-RPC.
- CSV supports `null`, `bool`, `uint`, `int`, `real`, `string`, and `array`.
  Notes:
  - `object` values are not supported.
  - `uint` values are fully supported when reading.
  - No subtypes are supported.
  - Numerical metadata is lost when converting to and from CSV.
- UBJSON supports `null`, `bool`, `uint`, `int`, `real`, `string`, `array`, and `object`.
  Notes:
  - `string` subtype `bignum` is supported.
  - `uint` values are limited to the `int` value range when reading or writing, due to lack of format support for unsigned types.
- XML-XLS supports `null`, `bool`, `uint`, `int`, `real`, `string`, and `array`.
  Notes:
  - `object` values are not supported.
  - `string` subtypes `date`, `time`, and `datetime` are supported.
  - Numerical metadata is lost when converting to and from XML-XLS.
- MessagePack supports `null`, `bool`, `uint`, `int`, `real`, `string`, `array`, and `object`.
  Notes:
  - MessagePack extensions are not currently supported.
  - `string` subtypes `blob` and `clob` are supported.
  - MessagePack timestamps are not currently supported.
- BJSON supports `null`, `bool`, `uint`, `int`, `real`, `string`, `array`, and `object`.
  Notes:
  - `string` subtypes `blob` and `clob` are supported.
  - Numerical metadata is lost when converting to and from BJSON.
- Netstrings support `null`, `bool`, `uint`, `int`, `real`, `string`, `array`, and `object`.
  Notes:
  - No subtypes are supported.
  - Type information is lost when converting to Netstrings.