C++ FAQ
Section 36:
[36.8] How do I serialize objects that are part of an inheritance hierarchy and that don't contain pointers to other objects?

Suppose you want to serialize a "shape" object, where Shape is an abstract class with derived classes Rectangle, Ellipse, Line, Text, etc. You would declare a pure virtual function serialize(std::ostream&) const within class Shape, and make sure the first thing done by each override is to write out the class's identity. For example, Ellipse::serialize(std::ostream&) const would write out the identifier Ellipse (perhaps as a simple string, but there are several alternatives discussed below).
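Below is a minimal sketch of the writing side. It assumes a whitespace-separated text format. The data members rx_ and ry_ and the exact output layout are hypothetical and exist only for illustration. The ordering is what matters: the override writes the class identifier before anything else.

```cpp
#include <ostream>
#include <sstream>   // std::ostringstream, for trying it out

// Abstract base: every concrete shape must know how to write itself.
class Shape {
public:
    virtual ~Shape() {}
    // Each override must write its class identifier first.
    virtual void serialize(std::ostream& ostr) const = 0;
};

// Hypothetical derived class; the radii are illustrative only.
class Ellipse : public Shape {
public:
    explicit Ellipse(double rx = 0, double ry = 0) : rx_(rx), ry_(ry) {}
    void serialize(std::ostream& ostr) const override {
        ostr << "Ellipse" << ' ';            // 1. the class identity
        ostr << rx_ << ' ' << ry_ << '\n';   // 2. then this object's own data
    }
private:
    double rx_, ry_;
};
```

For example, calling serialize() on Ellipse(2, 3) with a std::ostringstream produces the text "Ellipse 2 3" followed by a newline.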

Things get a little trickier when unserializing the object. You typically start with a static member function in the base class such as Shape::unserialize(std::istream& istr). This is declared to return a Shape* or perhaps a smart pointer such as Shape::Ptr. It reads the class-name identifier, then uses some sort of creational pattern to create the object. For example, you might have a table that maps from the class name to an object of the class, then use the Virtual Constructor Idiom to create the object.

Here's a concrete example: Add a pure virtual method create(std::istream&) const within base class Shape, and define each override as a one-liner that uses new to allocate an object of the appropriate derived class. E.g., Ellipse::create(std::istream& istr) const would be { return new Ellipse(istr); }. Add a static std::map<std::string,Shape*> object that maps from the class name to a representative (AKA prototype) object of the appropriate class; e.g., "Ellipse" would map to a new Ellipse(). Function Shape::unserialize(std::istream& istr) would read the class name, throw an exception if it's not in the map (if (theMap.count(className) == 0) throw ...something...), then look up the associated Shape* and call its create() method: return theMap[className]->create(istr).
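Here is one way the prototype approach might look. It is a sketch, not a definitive implementation. It assumes the same hypothetical text format and uses std::runtime_error as the "...something..." exception. Anticipating the static initialization caveat below, it also wraps the map in a function so the map is constructed on first use. It calls find() instead of count() followed by operator[], which avoids a second lookup without changing the behavior. The accessors rx() and ry() are hypothetical.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>    // std::istringstream, for trying it out
#include <stdexcept>
#include <string>

class Shape {
public:
    virtual ~Shape() {}
    virtual void serialize(std::ostream& ostr) const = 0;
    // Virtual Constructor Idiom: a prototype creates a new object of its own class.
    virtual Shape* create(std::istream& istr) const = 0;
    static Shape* unserialize(std::istream& istr);
    // Class name -> prototype. Built on first use and deliberately never destroyed.
    static std::map<std::string, Shape*>& theMap();
};

std::map<std::string, Shape*>& Shape::theMap() {
    static std::map<std::string, Shape*>* m = new std::map<std::string, Shape*>();
    return *m;
}

Shape* Shape::unserialize(std::istream& istr) {
    std::string className;
    if (!(istr >> className))
        throw std::runtime_error("unserialize: missing class name");
    auto it = theMap().find(className);
    if (it == theMap().end())
        throw std::runtime_error("unserialize: unknown class " + className);
    return it->second->create(istr);   // delegate to the prototype
}

class Ellipse : public Shape {
public:
    Ellipse() : rx_(0), ry_(0) {}                  // used for the prototype
    explicit Ellipse(std::istream& istr) : rx_(0), ry_(0) {
        if (!(istr >> rx_ >> ry_))                 // read what serialize() wrote
            throw std::runtime_error("Ellipse: bad data");
    }
    void serialize(std::ostream& ostr) const override {
        ostr << "Ellipse " << rx_ << ' ' << ry_ << '\n';
    }
    Shape* create(std::istream& istr) const override { return new Ellipse(istr); }
    double rx() const { return rx_; }
    double ry() const { return ry_; }
private:
    double rx_, ry_;
};
```

Register a prototype with Shape::theMap()["Ellipse"] = new Ellipse();. After that, calling Shape::unserialize() on a stream containing "Ellipse 2 3" returns a new Ellipse with radii 2 and 3, and an unknown name such as "Hexagon" throws.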

The map is typically populated during static initialization. For example, if file Ellipse.cpp contains the code for derived class Ellipse, it would also contain a static object whose ctor adds that class to the map: theMap["Ellipse"] = new Ellipse().
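Here is a sketch of that registration. It assumes the map is reached through a function that constructs it on first use, so the registering object's ctor never touches an unconstructed map. The name EllipseRegistrar and the use of an anonymous namespace are illustrative choices, not part of any library. For brevity, the two "files" appear together in one block.

```cpp
#include <istream>
#include <map>
#include <sstream>    // std::istringstream, for trying it out
#include <string>

// ---- Shape.h (trimmed to what registration needs) ----
class Shape {
public:
    virtual ~Shape() {}
    virtual Shape* create(std::istream& istr) const = 0;
    // Construct On First Use: whichever file registers first triggers construction.
    static std::map<std::string, Shape*>& theMap() {
        static std::map<std::string, Shape*>* m = new std::map<std::string, Shape*>();
        return *m;
    }
};

// ---- Ellipse.cpp ----
class Ellipse : public Shape {
public:
    Ellipse() {}
    explicit Ellipse(std::istream& /*istr*/) { /* read Ellipse's data here */ }
    Shape* create(std::istream& istr) const override { return new Ellipse(istr); }
};

namespace {
// A static object whose ctor adds a prototype Ellipse to the map.
struct EllipseRegistrar {
    EllipseRegistrar() { Shape::theMap()["Ellipse"] = new Ellipse(); }
};
EllipseRegistrar ellipseRegistrar;   // constructed during static initialization
}
```

Once static initialization has run, Shape::theMap() contains an "Ellipse" entry without any other code having to name the class.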

Notes and caveats:

  • It adds a little flexibility if Shape::unserialize() passes the class name to the create() method. In particular, that would let a derived class be used with two or more names, each with its own "network format." For example, derived class Ellipse could be used for both "Ellipse" and "Circle" (a "Circle" needs to store only one radius), which might be useful to save space in the output stream or for other reasons.
  • It's usually easiest to handle errors during unserialization by throwing an exception. You can return NULL if you want, but since a ctor can't return NULL, you will need to move the code that reads the input stream out of the derived classes' ctors and into the corresponding create() methods, and the result is often more complicated code.
  • You must be careful to avoid the static initialization order fiasco with the map used by Shape::unserialize(). This normally means using the Construct On First Use Idiom for the map itself.
  • For the map used by Shape::unserialize(), I personally prefer the Named Constructor Idiom over the Virtual Constructor Idiom; it simplifies a few steps, and the map no longer needs prototype objects. Details: I usually define a typedef within Shape such as typedef Shape* (*Factory)(std::istream&). This means Shape::Factory is a "pointer to a function that takes a std::istream& and returns a Shape*." I then define the map as std::map<std::string,Factory>. Finally I populate that map using lines like theMap["Ellipse"] = Ellipse::create (where Ellipse::create(std::istream&) is now a static member function of class Ellipse, that is, the Named Constructor Idiom). You'd change the return statement in Shape::unserialize(std::istream& istr) from return theMap[className]->create(istr) to return theMap[className](istr).
  • If you might need to serialize a NULL pointer, that's usually easy: you already write out a class identifier, so you can just as easily write out a pseudo class identifier like "NULL". You might need an extra if statement in Shape::unserialize(), but if you chose my preference from the previous bullet, you can eliminate that special case (and generally keep your code clean) by defining a static member function Shape* Shape::nullFactory(std::istream&) { return NULL; }. You add that function to the map like any other: theMap["NULL"] = Shape::nullFactory;.
  • You can make the serialized form smaller and a little faster if you tokenize the class-name identifiers. For example, write a class name only the first time it is seen, and for subsequent uses write only a corresponding integer index. A mapping such as std::map<std::string,unsigned> unique makes this easy: if a class name is already in the map, write unique[className]; otherwise set a variable unsigned n = unique.size(), write n, write the class name, and set unique[className] = n. (Note: be sure to copy the size into a separate variable. Do not say unique[className] = unique.size()! You have been warned! Reason: before C++17 the compiler might evaluate unique[className] before unique.size(), and if so, operator[] will already have inserted the new name, so size() comes out one too large. C++17 evaluates the right-hand side of = first, but the separate variable is clearer and correct under every standard.) When unserializing, use std::vector<std::string> unique, read the number n, and if n == unique.size(), read a name and add it to the vector. Either way the name will be unique[n]. You can also pre-populate the first N slots in these tables with the N most common names; that way streams won't need to contain any of those strings.
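The Named Constructor Idiom variant from the bullets above might be sketched as follows. The sketch assumes the same whitespace-separated format and uses std::runtime_error for errors. It keeps the map behind a construct-on-first-use function and registers Shape::nullFactory under the pseudo class name "NULL", so unserialize() needs no special case. The accessors rx() and ry() are hypothetical, and serialize() is omitted to keep the example short.

```cpp
#include <cstddef>    // NULL
#include <istream>
#include <map>
#include <sstream>    // std::istringstream, for trying it out
#include <stdexcept>
#include <string>

class Shape {
public:
    // Shape::Factory: pointer to a function taking std::istream& and returning Shape*.
    typedef Shape* (*Factory)(std::istream&);

    virtual ~Shape() {}
    static Shape* unserialize(std::istream& istr);
    static std::map<std::string, Factory>& theMap();
    // Factory for the pseudo class "NULL": reads nothing, returns a null pointer.
    static Shape* nullFactory(std::istream&) { return NULL; }
};

std::map<std::string, Shape::Factory>& Shape::theMap() {
    // Construct On First Use: allocated on the first call, never destroyed.
    static std::map<std::string, Factory>* m = new std::map<std::string, Factory>();
    return *m;
}

Shape* Shape::unserialize(std::istream& istr) {
    std::string className;
    if (!(istr >> className))
        throw std::runtime_error("unserialize: missing class name");
    auto it = theMap().find(className);
    if (it == theMap().end())
        throw std::runtime_error("unserialize: unknown class " + className);
    return it->second(istr);   // call the factory directly; no prototype object
}

class Ellipse : public Shape {
public:
    // Named Constructor Idiom: a static member function matching Shape::Factory.
    static Shape* create(std::istream& istr) { return new Ellipse(istr); }
    double rx() const { return rx_; }
    double ry() const { return ry_; }
private:
    explicit Ellipse(std::istream& istr) : rx_(0), ry_(0) {
        if (!(istr >> rx_ >> ry_))
            throw std::runtime_error("Ellipse: bad data");
    }
    double rx_, ry_;
};
```

Register both factories with Shape::theMap()["Ellipse"] = Ellipse::create; and Shape::theMap()["NULL"] = Shape::nullFactory;. Reading the stream "Ellipse 2 3 NULL" then yields a new Ellipse followed by a NULL pointer.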
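The class-name tokenization from the last bullet could look like the sketch below. NameWriter and NameReader are hypothetical names. The sketch assumes that class names contain no whitespace. It also assumes that any pre-populated list of common names is identical on the writing and reading sides and contains no duplicates.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>    // std::ostringstream / std::istringstream, for trying it out
#include <stdexcept>
#include <string>
#include <vector>

class NameWriter {
public:
    // Optionally pre-populate slots 0..N-1 with the N most common names.
    explicit NameWriter(const std::vector<std::string>& common = std::vector<std::string>()) {
        for (std::size_t i = 0; i < common.size(); ++i)
            unique_[common[i]] = static_cast<unsigned>(i);
    }
    void write(std::ostream& ostr, const std::string& className) {
        auto it = unique_.find(className);
        if (it != unique_.end()) {
            ostr << it->second << ' ';                // seen before: index only
        } else {
            unsigned n = static_cast<unsigned>(unique_.size());  // copy the size first
            ostr << n << ' ' << className << ' ';     // first time: index, then name
            unique_[className] = n;
        }
    }
private:
    std::map<std::string, unsigned> unique_;
};

class NameReader {
public:
    explicit NameReader(const std::vector<std::string>& common = std::vector<std::string>())
        : unique_(common) {}
    std::string read(std::istream& istr) {
        unsigned n;
        if (!(istr >> n))
            throw std::runtime_error("NameReader: missing index");
        if (n == unique_.size()) {                    // a new name follows the index
            std::string name;
            if (!(istr >> name))
                throw std::runtime_error("NameReader: missing class name");
            unique_.push_back(name);
        } else if (n > unique_.size()) {
            throw std::runtime_error("NameReader: index out of range");
        }
        return unique_[n];
    }
private:
    std::vector<std::string> unique_;
};
```

A fresh NameWriter that writes "Ellipse", "Rectangle", "Ellipse" produces "0 Ellipse 1 Rectangle 0 ". A fresh NameReader that reads that text returns the same three names.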