C++ Trees Part 1
Understanding the core::tree<> Implementation
by Justin Gottschlich

1. Introduction

For all of C++'s brilliance of strongly type-safe generic programming design, its standard library selection leaves much to be desired. Unlike Java, which has libraries for just about everything, C++ only has a handful. These C++ standard libraries are for containers, algorithms, streams and the like. Surprisingly, C++'s list of standard containers does not include a tree container [Musser1].

While hopes exist that C++0x may come with tree container support, C++'s current lack of native tree container support is enough to push programmers away from correctly designed systems using trees, in favor of poorly designed systems that use currently available standard containers. I created the core::tree<> container primarily to overcome this hurdle and add a missing piece to C++'s already powerful and elegantly designed generic containers.

After presenting the arguments made within this article to senior software engineers at Quark, Inc., they began adopting the core::tree<> family (tree, multitree, tree_pair, multitree_pair). Quark has since licensed the core::tree family (royalty free) and has been using it since 2002. They are currently using the core::tree family in their world class software, QuarkXPress 6.0 and expanding its use greatly for QuarkXPress 7.0 [Quark1].

This article is only part one of a series of articles written on the core::tree<> family. Part one primarily focuses on explaining the limitations of using C++'s map<> and multimap<> as tree containers and show the advantages of using the core::tree<> in its stead.

Additionally, simplistic sample core::tree<> code is given to increase basic familiarity with the core::tree<> design. Later installments of to the core::tree<> series will explain more complex usage of the core::tree<> and its flexibility in design.

Lastly, the core::tree<> source code is included for anyone to use, royalty free. The only request is that the licensing agreement found in the tree.h header is followed.

The layout of this article is as follows:

Introduction
std::map<> versus core::tree<>
Using the core::tree<>
Conclusion
References
core::tree<> source (tree.h header)

2. std::map<> versus core::tree<>

Many experienced C++ programmers use C++'s standard template library's (STL) map<> and multimap<> to generate trees. Yet, these containers make poor substitutes for true tree designs for a variety of reasons. This section will take a closer look at the pitfalls of using C++'s std::map<> and std::multimap<> for tree containment.

2.1 Basic std::map<> tree implementation

Before beginning the implementation of a tree using std::map<>, a framework must be defined in which the tree will work. For the purposes of this article the following namespace will be used to control the number of leaves generated at the root level and maximum branch depth of the trees constructed:

namespace nTreeData
{
   const int kMaxLeaves = 10;
   const int kMaxDepth = 5;
}
Figure 2.1

Additionally, the following class defined (in figure 2.2) will be used as the leaf node, which will be inserted at every branch of the tree.

class Leaf
{
public:

   Leaf() : value_(0) {}
   explicit Leaf(const int &value) : value_(value) {}

   const int &value() const { return value_; }

   bool operator==(const Leaf &rhs) const
      { return this->value() == rhs.value(); }
   bool operator<(const Leaf &rhs) const
      { return this->value() < rhs.value(); }

private:
   int value_;
};
Figure 2.2

Combining the definitions provided in figure 2.1 and figure 2.2, an example of an std::map<> tree can be constructed:

#include <map>
#include <iostream>

typedef std::map<Leaf, int> LeafMapConcrete;
typedef std::map<Leaf, int>* LeafMapPointer;
typedef std::map<Leaf, LeafMapPointer > LeafMap;

void fun()
{
   using namespace nTreeData;
   LeafMap leafTree;

   ////////////////////////////////////////////////////
   // create a simple leaf tree
   ////////////////////////////////////////////////////
   for (int i = 0; i < kMaxLeaves; ++i)
   {
      // insert a starter leaf
      LeafMapPointer p = new LeafMapConcrete;
      leafTree.insert(LeafMap::value_type(Leaf(i), p));
      LeafMap::iterator iter = leafTree.find(Leaf(i));

      // continue inserting children inside of children
      for (int depth = 0; depth < kMaxDepth; ++depth)
      {
         LeafMapPointer inner = new LeafMapConcrete;
         LeafMap* outer = (LeafMap*)(iter->second);
         outer->insert(LeafMap::value_type(Leaf(depth), inner));

         iter = outer->find(Leaf(depth));
      }
   }

   ////////////////////////////////////////////////////
   // deallocate the leaf tree
   ////////////////////////////////////////////////////
   for (LeafMap::iterator destroy = leafTree.begin();
        destroy != leafTree.end();
        ++destroy)
   {
      LeafMap::const_iterator inner = destroy;
      LeafMap* iterMap = (LeafMap*)(destroy->second);
      LeafMap* lastMap;

      for (inner = iterMap->begin(); inner != iterMap->end();
           inner = iterMap->begin())
      {
         lastMap = iterMap;
         // move the iterMap forward
         iterMap = (LeafMap*)inner->second;
         delete lastMap;
      }
   }   
}
Figure 2.3

Figure 2.3 demonstrates how to use a std::map<> to implement a tree in C++. The above implementation is a common method for overcoming STL's lack of native tree container support. Unfortunately, it has many problems:

Dynamic memory allocation and deallocation must be implemented for the tree by the programmer. Additional code must be added in figure 2.3 to do full depth and breadth tree iteration to allocate and deallocate each branch. Currently, figure 2.3 only provides simple deallocation (since we're assuming knowledge of the tree layout). Correct and complete memory deallocation requires more work. Additionally, it is very common for programmers to make mistakes when implementing complex memory management, which leads to memory leaks. One of the advantages of using STL containers is that they perform memory management internally [Josuttis1]. However, when using STL maps to construct trees in this manner, the memory management advantage of STL is lost.
Many C-style / reinterpret_cast<> type-casts are required. Type-casting is dangerous (especially, C-style casting and reinterpret_cast<>). Accidental incorrect type-casting can cause unexpected run time behavior. As Stroustrup reminds us, we should avoid explicit casting [Stroustrup1].
The code is complex. Simply doing the construction and destruction of the tree is rather hard to understand. If the code to generate and populate a simple tree is this difficult to write, how difficult will it be to write more complex trees? What about the difficulty of maintaining this tree code (especially by someone who didn't originally write the code)? Sutter and Alexandrescu point out in their "C++ Coding Standards", simple should always be preferred over complex [Sutter1]. Figure 2.3's tree implementation clearly violates this rule.
There is a great deal of wasted space. For each branch of the tree generated, an extra integer is generated serving no purpose except as a filler. Unfortunately, there is no way to remove this extra filler. This is another clue that the design is flawed - unnecessary pieces are present in the implementation that do not aid the design.

Consider now the same tree being generated using the core::tree<> container:

#include "tree.h"
#include <iostream>

void fun()
{
   using namespace nTreeData;
   using namespace core;

   core::tree<Leaf> leafTree;

   ////////////////////////////////////////////////////
   // create a simple leaf tree
   ////////////////////////////////////////////////////
   for (int i = 0; i < kMaxLeaves; ++i)
   {
      // insert a starter leaf
      tree<Leaf>::iterator iter = leafTree.insert(Leaf(i));

      // continue inserting children each time
      for (int depth = 0; depth < kMaxDepth; ++depth)
      {
         // insert and step into the newly created branch 
         iter = iter.insert(Leaf(depth));
      }
   }
}
Figure 2.4

The code within figure 2.4 implements the same tree as the std::map<> solution within figure 2.3. However, the tree constructed using the core::tree<> container (figure 2.4) is less complex than the tree constructed using std::map<>. Furthermore, the core::tree<> implementation requires less code than the std::map<> implementation. Lastly, the core::tree<> implementation has none of the pitfalls the std::map<> implementation has:

No dynamic memory allocation or deallocation is needed, it is all handled internally.
Not even a single type-cast is used.
The code is very straightforward.
There is no wasted space.

Reviewing the above implementations of trees, it is clear the tree implementation using std::map<> is error-prone, complex and run time unsafe. However, the tree implementation using the core::tree<> is simple and elegant, and solves the tree design problem with no additional programmatic overhead. The core::tree<> code is also easier to understand than the std::map<> code which leads to easier code maintenance.

3. Using the core::tree<>

C++'s STL implementation is genius. The advantages and subtleties of its implementation are so numerous, it is impossible to cover them in a single article, or even in a single book. It was with STL in mind, that the core::tree<> was designed. The core::tree<> container follows as many of the STL design practices as possible, while still ensuring its own tree behavior remains correct.

The core::tree<> implements both const_iterators and iterators for tree iteration. By default, it uses operator<() and operator==() for inserts and finds/removes, but allows predicates to be used to overload that functionality. The core::tree<> implements begin() and end() on its containers as well as post-increment and pre-increment on its iterators. For moving in and out within the tree, methods in() and out() can be called – this makes tree depth iteration very simple. Many other powerful pieces of functionality are implemented as well, such as size(), level() and, of course, full tree copying (just like all STL containers) by use of operator=().

Perhaps the biggest downfall of the std::map<> tree implementation is its lack of simple "complete" tree copying. Calling operator=() on the std::map<> would result in pointer copying, not object copying and implementing a full tree copy would require even more dynamical memory allocation introducing more possibilities for erroneous programmer made memory management mistakes (more memory leaks).

Again, the above problems dissolve when using the core::tree<>. Copying a tree into another tree is as simple as:

   core::tree<Leaf> tree1;
   // ... do work on tree1 here
   // now copy the entire contents of tree1 into tree2
   core::tree<Leaf> tree2 = tree1;

Thus, the core::tree<> container's learning curve is very low for anyone who is already familiar with C++'s STL containers.

The below sample code demonstrates the ease of use of the core::tree<> container. The code in figure 3.1 performs simple tree construction and tree output.

#include <iostream>
#include "tree.h"

void fun()
{
   // the tree containers all sit inside the core namespace
   using namespace core;
   using namespace nTreeData;

   tree<Leaf> leafTree;

   ////////////////////////////////////////////////////
   // create a simple leaf tree
   ////////////////////////////////////////////////////
   for (int i = 0; i < kMaxLeaves; ++i)
   {
      // insert a starter leaf
      tree<Leaf>::iterator iter = leafTree.insert(Leaf(i));

      // continue inserting children each time
      for (int depth = 0; depth < kMaxDepth; ++depth)
      {
         // insert and step into the newly created branch 
         iter = iter.insert(Leaf(depth));
      }
   }

   ////////////////////////////////////////////////////
   // output the leaf tree
   ////////////////////////////////////////////////////
   for (tree<Leaf>::const_iterator iter = leafTree.begin();
        iter != leafTree.end();
        ++iter)
   {
      std::cout << iter.data().value() << std::endl;

      tree<Leaf>::const_iterator inner = iter;
      // tree's iterators are containers themselves - use the
      // same iterator to traverse inwards through the tree
      for (inner = inner.in(); inner != inner.end();
           inner = inner.in())
      {
         for (int tabs = 1; tabs < inner.level(); ++tabs)
            std::cout << "\t";

         std::cout << (*inner).value() << std::endl;
      }
   }
}
Figure 3.1

For clarity, a step-by-step analysis of the core::tree<> code in figure 3.1 is given below.

Construct the tree container:
```
tree<Leaf> leafTree;
```
Populate the tree with data. The outer for loop inserts branches into the root tree. The inner for loop inserts branches into each additional branch inserted. Thus, the outer for loop is a breadth population and the inner for loop is a depth population. Notice that the inner for loop assigns the inserted iterator to itself: iter = iter.insert(). This forces iter to continue stepping inward, becoming the iterator of the branch inserted by the iter.insert() operation.
```
for (int i = 0; i < kMaxLeaves; ++i)
{
   // insert a starter leaf
   tree<Leaf>::iterator iter = leafTree.insert(Leaf(i));

   // continue inserting children each time
   for (int depth = 0; depth < kMaxDepth; ++depth)
   {
      // insert and step into the newly created branch 
      iter = iter.insert(Leaf(depth));
   }
}
```

Iterate through the tree. The outer for loop iterates through the branches of the root tree. The inner for loop iterates inward. Again, notice how inner = inner.in() is performed within the nested for loop. This forces the inner iterator to continue to step into itself until there are no further branches inward.

for (tree<Leaf>::const_iterator iter = leafTree.begin();
     iter != leafTree.end();
     ++iter)
{
   std::cout << iter.data().value() << std::endl;

   tree<Leaf>::const_iterator inner = iter;
   // tree's iterators are containers themselves - use
   // the same iterator to traverse inwards through the tree
   for (inner = inner.in(); inner != inner.end();
        inner = inner.in())
   {
      for (int tabs = 1; tabs < inner.level(); ++tabs)
         std::cout << "\t";

      std::cout << (*inner).value() << std::endl;
   }
}

The output of the leafTree that's built above is as follows:

As you can see from the above output the root of the tree has ten branches. Each of those ten branches contain five inner branches, all nested within each other. Analyzing the output in step 4 can aid in understanding the code in step 3.

Due to the core::tree<> design following some of the STL container paradigms, certain STL algorithms can be followed. For example, the std::for_each algorithm can be used to iterate across any breadth of tree (keep in mind, tree depth iteration requires a bit more work). Figure 3.2 demonstrates a simple core::tree<> implementation using std::for_each.

#include <algorithm>
#include <iostream>
#include "tree.h"

////////////////////////////////////////////////////
void outputLeaf(const Leaf &l)
{
   std::cout << l.value() << std::endl;
}

////////////////////////////////////////////////////
void fun()
{
   core::tree<Leaf> leafTree;

   for (int i = 0; i < nTreeData::kMaxLeaves; ++i)
      leafTree.insert(Leaf(i));
   std::for_each(leafTree.begin(), leafTree.end(), outputLeaf);
}
Figure 3.2

To perform complete tree output iteration (both breadth and depth) in a single function, a minor amount of recursion is generally suggested (although it can be performed iteratively). However, the code is still very straightforward and easy to write:

#include <iostream>
#include "tree.h"

////////////////////////////////////////////////////
void outputLeaf(core::tree<Leaf>::const_iterator &tree)
{
   // a tree iterator can check itself for its end
   for (core::tree<Leaf>::const_iterator i = tree.begin();
        i != tree.end(); ++i)
   {
      for (int tabs = 1; tabs < i.level(); ++tabs)
         std::cout << "\t";
      std::cout << (*i).value() << std::endl;

      outputLeaf(i);
   }
}
   
////////////////////////////////////////////////////
void fun()
{
   // the tree containers all sit inside the core namespace
   using namespace core;
   using namespace nTreeData;

   core::tree<Leaf> leafTree;

   //for (int i = 0; i < nTreeData::kMaxLeaves; ++i)
   //   leafTree.insert(Leaf(i));
   ////////////////////////////////////////////////////
   // create a simple leaf tree
   ////////////////////////////////////////////////////
   for (int i = 0; i < kMaxLeaves; ++i)
   {
      // insert a starter leaf
      tree<Leaf>::iterator iter = leafTree.insert(Leaf(i));

      // continue inserting children each time
      for (int depth = 0; depth < kMaxDepth; ++depth)
      {
         // insert a 100 placeholder leaf, then insert a leaf
         // and step into it
         iter.insert(Leaf(100));
         iter = iter.insert(Leaf(depth));
      }
   }

   outputLeaf(static_cast<core::tree<Leaf>::const_iterator >(leafTree));
}
Figure 3.3

Figure 3.3 will properly output a tree with any number of branches within any given branch. Additionally, figure 3.3 demonstrates a fundamental difference between core::tree<> containers and STL containers:

core::trees<> are core::tree<>::iterators.

The last line of code in figure 3.3 shows the static_cast<> operation converting a core::tree<> into a core::tree<>const_iterator. This is a fundamental deviation and necessity for the core::tree<> implementation. This concept alone, is what makes the core::tree<> container possible. More detailed explanation is given about this concept in the next portion of the core::tree<> series.

4. Conclusion

C++'s STL containers are superior when implementing the role they were designed to perform. However, using STL containers to perform actions they weren't designed to perform (as with anything else) will not only result in a poorly designed systems, it will likely cause many implementation flaws to arise (as seen in section 2.1).

C++'s std::map<> was built for key/value tables, not for N-depth tree implementations. While programmers, especially C++ programmers, are notorious for inventive solutions, generic frameworks should be used in the manner they were meant to be used. Using these frameworks for purposes other than their intended roles may (and often will) result in trouble.

The core::tree<> container is far from perfect. It certainly has limitations and flaws in its implementation; it will not work with all STL algorithms and it may even have trouble compiling on some systems. Yet, when considering what is available in C++ today and the advantages the core::tree<> brings with it, it is a great replacement for std::map<> tree implementations. Additionally, any problems encountered can be fixed directly by those using it as the entire source is at your disposal.

While the core::tree<> implementation of trees can surely be improved (as with most anything) the core::tree<> is 1) easy for most C++ programmers familiar with STL to use and 2) is already being used in two commercial pieces of software: Quark XPress 6.0 and Nodeka. This shows, at the very least, its stability of operation. If you have the need for generic tree containers in your C++ software, you should consider using the core::tree<>.

5. References

[Josuttis1] Josuttis, Nicolai M. The C++ Standard Library. Addison-Wesley, Upper Saddle River, NJ 1999 . pp 31-32.

[Musser1] Musser, David and Atul Saini. STL Tutorial and Reference Guide. Addison- Wesley, Upper Saddle River, NJ 1996. pp 69.

[Quark1] Quark, Inc. currently uses the core::tree_pair and core::multitree_pair in their commercial software for tree implementations (mostly XML trees) as the core::tree and core::multitree weren't available at the time they licensed the software. Quark has recently (April 2004) requested increased licensing permission due to their expanding need of the core::tree family for its use anywhere within their commercial software.

[Stroustrup1] Stroustrup, Bjarne. The C++ Programming Language, Special Edition. Addison-Wesley, Upper Saddle River, NJ 1997. pp 417-425.

[Sutter1] Sutter, Herb and Andrei Alexandrescu. C++ Coding Standards. Addison-Wesley, Upper Saddle River, NJ 2005.

6. core::tree<> source (tree.h header)

tree.h

About the Author

Justin has a BS in Computer Science from CSU. He is currently working on an ME in Computer Engineering / Computer Science at CU-Boulder while working full time at Raytheon where he is a Senior Software Engineer. He has been doing professional software development for over six years. He worked at Quark, Inc. for four years as a Senior Software Engineer as well as working for KORE, Inc. (startup company) for two years as a Senior Engineer. Additionally, he is also the founder/creator of the online game, Nodeka (www.nodeka.com). Nodeka is currently over 300,000 lines of C++ source, solely written by Justin.

Discuss this article in the forums

Date this article was posted to GameDev.net: 1/16/2005
(Note that this date does not necessarily correspond to the date the article was written)

See Also:
C and C++
Featured Articles