Using XML Technologies For Enhancing Log Files
by Jack "jollyjeffers" Hoxley and Oli "evolutional" Wilkinson

Basic requirements and background

Before I get stuck in with explaining the finer points of how to apply XML technologies to software logging there is some background and foundation work to get through.

It is out of the scope of this article to explain all of the technologies from scratch – everything is pretty easy to learn, and I'd say it's even possible for any moderately intelligent individual to pick up everything they need from this article alone.

In order to generate a powerful logging tool it is not necessary to understand all of the technologies fully – there are some parts that just aren't that important. For example, one assumption (that may be frowned upon by some!) is that we will be completely skipping validation with DTDs or schema. This assumption is based on the idea that a computer program will be writing the source XML, and provided the code is written correctly the output should also be correct. However, if you desire this extra validation or wish to do further work on the XML, it is advised that you look into creating your own DTD for your logfiles.

As a quick summary, competency in the following 4 areas is necessary:

XML
- Basic XML tag definitions/rules
- Not the DTD/Schema validation
- Read through the information at [3] and [6]
HTML
- Basic HTML document structure
- Text formatting (bold, underline, italic, fonts)
- How to construct tables
- Read through the description at [7] and the tutorials at [8]
3. XSLT
- Basic structure and usage of XSLT to produce an HTML output
- Use of parameters
- Conditional branching ('choose' and 'if' statements)
- Read through the description at [9] and the tutorials at [10]
4. JavaScript
- How to execute JavaScript from within an HTML context
- Basic JavaScript function definitions and variable declarations
- Read through the information at [11] and the tutorials at [12]

If you feel you're fairly proficient on all of these, you can skip to the next section, otherwise I suggest you read the recommended links and have a go at this article later.

Now that I've mentioned all of the core technologies that will be used in this article, it's necessary to mention a few of the necessary tools.

As previously stated, the required tools are all free and readily available. All you really need is a text editor and a web browser. Given that we're going to be integrating this directly with the software, I assume you already have that set up and working, it may well be that you can use your development tool suite.

Using Microsoft's Visual Studio 2002 (and later) it is possible to have your HTML page to be viewed in the browser component (where MSDN usually pops up), then another tab with the HTML source code, another tab with the XML source (for reference) and a final tab with the XSLT code. Changing something in one of these tabs, saving changes and then refreshing the original browser pane worked extremely well for me.

A simpler set of tools would be to use Notepad (or equivalent under Unix/Linux) and a browser such as FireFox [13].

During the course of developing this technology, I came across a variety of problems with cross browser compatibility (a.k.a. making it run on Mozilla and Internet Explorer) and it is quite easy to write a very clever logging system and have it fail to operate on a different browser. This article has been tested with FireFox and Internet Explorer 6 under WindowsXP SP2 and appears to work fine. One small caveat is that under SP2, Internet Explorer can get quite picky (with respect to security considerations) about what it will and won't allow – so you can end up having the browser refusing to view your own files!!

Writing the data to an XML file

Right, background aside, we can get onto the "meat" of this article – writing the XML based log files.

My personal implementation of this technology is done in C++ using a singleton class representing my log. C++ also offers some extremely powerful features via the C Pre-Processor directives that make coding log file output a breeze.

However, it is important to note that you can write XML data from pretty much any computer language currently in use – for the simple reason that it is just text data. It can get a little hairy when messing with Unicode/ASCII output, but this'll probably only bite you if you're working on internationalisation and/or your software logs its output in a language that doesn't fit into the ASCII table.

There are a variety of additional libraries that can be used within your language of choice to aid in reading/writing XML. Most Java distributions have implementations of SAX/DOM ([14] and [15]) parsers and writers, and the TinyXML library [16] comes well recommended for C++ developers. For this article, using one of these libraries may well be useful – but not required at all. Streams (C++) and Writers (Java) are perfectly sufficient to output the necessary information to an ".xml" text file.

Before I get into describing a C++ mechanism for writing the output to a file, an output format needs to be formed. The following is the general structure used by my log system:

<RunTimeLog>
  <LogHeader>
    …
  </LogHeader>
  <LogEvent>
    …
  </LogEvent>
</RunTimeLog>

That is, we wrap everything up in a single top-level RunTimeLog node, and then we have a header element, LogHeader, that describes general session and configuration information. The real log information is contained in a long list of LogEvent elements. For a short "test run" of my program I get around 1000 LogEvents recorded – a proper test session can generate a 1000 times this number!

The LogHeader element is included in the design to cover general-purpose information that can be used to identify the log file as well as general session/configuration information so that the environment under which the test was run can be recorded.

<LogHeader>
  <OutputLevel> … </OutputLevel>
  <Session>
    <Started>
      <Time> … </Time>
      <Date> … </Date>
    </Started>
    <Configuration>
      <Environment> … </Environment>
      <Processor>
        <Family> … </Family>
        <Vendor> … </Vendor>
        <Features> … </Features>
        <ClockSpeed> … </ClockSpeed>
      </Processor>
      <Memory>
        <Total> … </Total>
        <Available> … </Available>
      </Memory>
    </Configuration>
  </Session>
</LogHeader>

As you can see from the above listing, a substantial amount of information is recorded that will later be used in the HTML output/representation of the log file. Actually gathering the information that fills in the fields is somewhat platform specific – if you're a windows (Win32) developer then there are a substantial number of functions listed in the MSDN libraries.

The information proves to be particularly interesting – the Memory/Available field helped me find a problem with our platform when it was running on a (somewhat ancient!) 128mb windows laptop. The log file was full of warnings/errors that didn't appear on near-identical systems, but the inclusion of information about the environment pointed it out quite quickly.

The next element in the output XML structure is that which contains information each time a log-worthy event occurs:

<LogEvent id="…">
  <Type> … </Type>
  <TimeIndex> … </TimeIndex>
  <NameSpace> … </NameSpace>
  <File> … </File>
  <Function> … </Function>
  <LineNumber> … </LineNumber>
  <Message> … </Message>
</LogEvent>

The design of this particular element is going to be specific to your application – but the above outline should cover most people's needs. It's also dependent on what information is available to you at runtime.

It is worth noting that the information provided within the tags in these structures will later appear directly injected into the HTML representation. XSLT has some ability for mathematical formatting/processing, but it is much more limited than you might be used to. One particular area that I noticed this was in the TimeIndex field. Internally my engine runs on a millisecond accurate clock – such that "1min 13secs 228ms" is actually "73228" internally. This doesn't work too well when you want a human-readable log file format so it's worth making sure that you convert any internal values to ones that are easily read by a human.

In order to write this data to a text file, you can get away with having a simple function definition along the lines of:

void WriteLogEntry(int iEntryType,
                   std::string strNameSpace,
                   std::string strSourceFile,
                   std::string strFunction,
                   int iSourceLine,
                   std::string strMessage );

And then use some very straightforward std::ofstream output:

ofsXMLDatabase << "<NameSpace>"
               << strNameSpace
               << "</NameSpace>"
               << std::endl;

ofsXMLDatabase << "<File>"
               << strSourceFile
               << "</File>"
               << std::endl;

ofsXMLDatabase.flush( );

Note the flush() command at the end – this is quite important (equivalents also exist in other languages, such as Java). If you flush() the recently written output then it's committed to disk and no longer buffered somewhere in the runtime/OS. If your program manages to squeeze out a vital error message just before a massive crash but hasn't actually been committed to disk you can end up staring at a log file with no apparent errors. Simply because your process' memory has been reclaimed by the OS and the buffer that you "wrote" the log entry to has been eliminated before it made its way to the actual disk.

This next part is going to be quite specific to C/C++ programmers – some of the "magic" can be found in other languages, but it's particularly prevalent in C/C++. That is, making use of the Pre Processor Directives offered by the C language. Usually these are frowned upon as a brilliant way to complicate code and introduce some very obscure bugs. In this instance, it can be used to both control the amount of log file output as well as simplify the work a programmer has to do in order to commit valuable information to the log.

In our software library we have a MasterInclude.h header file that includes the log-file related macros and a few other useful utility macros and definitions. This particular header file finds it's way into the top of every translation unit that makes up the final product. Thus a few timesaving devices exist here…

//configure some basic precompilation defines
  //0 - no debugging wanted
  //1 - very basic debug output
  //2 - comprehensive debug output
  //3 - all output, and a transcript of all messages
#ifdef _DEBUG
  //Set the output level for 'DEBUG' builds
  #define SYSTEM_DEBUG_LEVEL 2
#else
  //Set the output level for 'RELEASE' builds
  #define SYSTEM_DEBUG_LEVEL 1
#endif

With those in place, anywhere in the project that someone wants to add additional debug output it can be wrapped up like this:

#if SYSTEM_DEBUG_LEVEL >= 2
  //Perform some extra debug verification here
#endif

The bonus of this is that for a normal run, where we believe things to be mostly "in order" we can reduce the level to '1' and look at a log file with about 150 entries detailing the key events and statistics (as well as major errors). If we're testing a new component, or a problem has been identified we can ramp up the output to '2' or '3' and look at absolutely everything the software did prior to the error/bug. Granted it does require a recompile for that particular component, but for our team this doesn't prove to be a substantial problem.

The next trick that I made use of with the C macro language was to use some of the system-defined macro values that the compiler exposes. In particular, __LINE__ and __FILE__. Microsoft's compilers also expose the __FUNCTION__ macro that is very useful; I also defined a __NAMESPACE__ value for each of the translation units.

If you notice, these macro defined values correspond with various fields in the XML's LogEvent element, as well as several of the parameters of the WriteLogEntry( ) function prototype. If we can get the compiler to generate this information for us, then all the better! So I define the following macro:

#define Log_Write_L1( linetype, linetext ) \
  CLogFile::GetSingletonPtr()->WriteLogEntry( \
      (linetype), \
      __NAMESPACE__, \
      __FILE__, \
      __FUNCTION__, \
      __LINE__, \
      (linetext) )

To further aid development, the valid linetypes were included as constant members of CLogFile. The final trick with the pre processor is as so:

#if SYSTEM_DEBUG_LEVEL == 3
  //enable all macros
  #define Log_Write_L1( linetype, linetext ) …
  #define Log_Write_L2( linetype, linetext ) …
  #define Log_Write_L3( linetype, linetext ) …
      
#elif SYSTEM_DEBUG_LEVEL == 2
  //enable levels 1..2 macros
  #define Log_Write_L1( linetype, linetext ) …
  #define Log_Write_L2( linetype, linetext ) …
  #define Log_Write_L3( linetype, linetext )
      
#elif SYSTEM_DEBUG_LEVEL == 1
  //enable level 1 macros
  #define Log_Write_L1( linetype, linetext ) …
  #define Log_Write_L2( linetype, linetext )
  #define Log_Write_L3( linetype, linetext )
      
#else
  //disable macros
  #define Log_Write_L1( linetype, linetext )
  #define Log_Write_L2( linetype, linetext )
  #define Log_Write_L3( linetype, linetext )
      
#endif

Where, in the above bit of code, "…" is the full expansion mentioned above (but with a slightly different suffix). To save space I omitted it from the above listing. Note that under varying SYSTEM_DEBUG_LEVELs some of the macros actually expand to nothing – that is, no code will be inserted in their place. This is by intention – throwing a number of strings and writing to files can be a slow process (our platform, at level 3, runs at about 40% of the speed it does at level 0) so when the log entries aren't actually wanted they, quite literally, don't even exist in the build.

When actually developing the code, the programmer can now write something like:

Log_Write_L3( CLogFile::DEBUG, "Function entered" );

The compiler will fill in all the details about the functions name and location for us. Less work for the programmer often means that there is even less of an excuse to document the flow/state of your software at runtime…

If you're developing in a different language then there are various tricks along similar lines that you can employ. Defining constant/static Boolean's will often see compilers removing them as "dead code" where appropriate.

Prettifying XML using XSLT

Contents

	Introduction
	Basic requirements and background
	Prettifying XML using XSLT
	Enhancing XSLT using JavaScript
	Conclusion

	Printable version
	Discuss this article