C++ FAQ Celebrating Twenty-One Years of the C++ FAQ!!!
[36.2] How do I select the best serialization technique?

There are lots and lots (and lots) of ifs, ands and buts, and in reality there is a whole continuum of techniques with lots of dimensions. Because I have a finite amount of time (translation: I don't get paid for any of this), I've simplified it to a decision between using human-readable ("text") or non-human-readable ("binary") format, followed by a list of five techniques arranged more-or-less in increasing order of sophistication.

You are, of course, not limited to those five techniques. You will probably end up mixing ideas from several techniques. And certainly you can always use a more sophisticated (higher numbered) technique than is actually needed. In fact it might be wise to use a more sophisticated technique than is minimally needed if you believe future changes will require the greater sophistication. So think of this list merely as a good starting point.

There's a lot here, so get ready!

  1. Decide between human-readable ("text") and non-human-readable ("binary") formats. The tradeoffs are non-trivial. Later FAQs show how to write simple types in text format and how to write simple types in binary format.
  2. Use the least sophisticated solution (solution #1) when the objects to be serialized aren't part of an inheritance hierarchy (that is, when they're all of the same class) and when they don't contain pointers to other objects.
  3. Use the second level of sophistication (solution #2) when the objects to be serialized are part of an inheritance hierarchy, but when they don't contain pointers to other objects.
  4. Use the third level of sophistication (solution #3) when the objects to be serialized contain pointers to other objects, but when those pointers form a tree with no cycles and no joins.
  5. Use the fourth level of sophistication (solution #4) when the objects to be serialized contain pointers to other objects, and when those pointers form a graph with no cycles, and with joins at the leaves only.
  6. Use the most sophisticated solution (solution #5) when the objects to be serialized contain pointers to other objects, and when those pointers form a graph that might have cycles or joins.
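
The last three cases hinge on the shape of the pointer graph, so here is a rough sketch of how you might check that shape. It assumes a "join" means an object that is pointed to from more than one place, and a "cycle" means a chain of pointers that leads back to an object already on the current path. The names Node, Shape, Mark, walk and classify are made up for this illustration; they are not part of any library.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical node type: each object holds raw pointers to other objects.
struct Node {
    std::vector<Node*> out;
};

// Tree: no cycles, no joins. LeafJoins: no cycles, joins at leaves only.
// General: anything else (cycles, or joins at non-leaf objects).
enum class Shape { Tree, LeafJoins, General };

enum class Mark { InProgress, Done };

// Depth-first walk from n. Counts incoming pointers for every object reached
// and returns true as soon as a pointer leads back to an object on the
// current path (that is, a cycle).
bool walk(const Node* n,
          std::unordered_map<const Node*, Mark>& mark,
          std::unordered_map<const Node*, int>& incoming) {
    mark[n] = Mark::InProgress;
    for (const Node* next : n->out) {
        if (next == nullptr) continue;   // null pointers point at nothing
        ++incoming[next];
        auto it = mark.find(next);
        if (it == mark.end()) {
            if (walk(next, mark, incoming)) return true;
        } else if (it->second == Mark::InProgress) {
            return true;                 // pointer back to an ancestor
        }
    }
    mark[n] = Mark::Done;
    return false;
}

// Classifies the pointer graph reachable from root.
Shape classify(const Node& root) {
    std::unordered_map<const Node*, Mark> mark;
    std::unordered_map<const Node*, int> incoming;
    if (walk(&root, mark, incoming)) return Shape::General;

    bool leafJoin = false;
    for (const auto& entry : incoming) {
        if (entry.second < 2) continue;                       // not a join
        if (!entry.first->out.empty()) return Shape::General; // join at a non-leaf
        leafJoin = true;
    }
    return leafJoin ? Shape::LeafJoins : Shape::Tree;
}
```

Under those assumptions, Shape::Tree corresponds to the third level of sophistication, Shape::LeafJoins to the fourth, and Shape::General to the most sophisticated solution. In real code you would usually know the shape from your design rather than discover it at run time; the sketch is only meant to pin down the vocabulary.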

Here's that same information arranged like an algorithm:

  1. The first step is to make an eyes-open decision between text and binary formats.
  2. If your objects aren't part of an inheritance hierarchy and don't contain pointers, use solution #1.
  3. Else if your objects don't contain pointers to other objects, use solution #2.
  4. Else if the graph of pointers within your objects contains neither cycles nor joins, use solution #3.
  5. Else if the graph of pointers within your objects doesn't contain cycles and if the only joins are to terminal (leaf) nodes, use solution #4.
  6. Else use solution #5.
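
If it helps to see the if/else chain as code, here is a minimal sketch of steps 2 through 6. The ObjectTraits struct and the chooseSolution function are hypothetical names invented for this example; the function just returns the solution number and leaves the text-versus-binary decision (step 1) to you.

```cpp
// Hypothetical summary of the objects you need to serialize.
struct ObjectTraits {
    bool inHierarchy;        // objects are part of an inheritance hierarchy
    bool hasPointers;        // objects contain pointers to other objects
    bool hasCycles;          // the pointer graph might contain cycles
    bool hasJoins;           // the pointer graph might contain joins
    bool joinsOnlyAtLeaves;  // every join is to a terminal (leaf) node
};

// Returns the solution number (1 through 5) picked by the steps above.
int chooseSolution(const ObjectTraits& t) {
    if (!t.inHierarchy && !t.hasPointers) return 1;
    if (!t.hasPointers) return 2;
    if (!t.hasCycles && !t.hasJoins) return 3;
    if (!t.hasCycles && t.joinsOnlyAtLeaves) return 4;
    return 5;
}
```

For example, chooseSolution({true, false, false, false, false}) yields 2, and a hierarchy-free class whose pointers form an acyclic graph with joins at the leaves only, chooseSolution({false, true, false, true, true}), yields 4. Treat the result as a minimum: nothing stops you from picking a higher number.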

Remember: feel free to mix and match, to add to the above list, and, if you can justify the added expense, to use a more sophisticated technique than is minimally required.

One more thing: the issues of inheritance and of pointers within the objects are logically unrelated, so there's no theoretical reason for #2 to be any less sophisticated than #3-5. However, in practice it often (not always) works out that way. So please do not think of these categories as somehow sacred; they're somewhat arbitrary, and you are expected to mix and match the solutions to fit your situation. This whole area of serialization has far more variants and shades of gray than can be covered in a few questions/answers.