How to avoid learning XML


May 18, 2003

Revision History
Revision 1.1, May 18, 2004
Mentioning new editors like XMLMind's XXE and Syntext's Serna. Other minor updates.
Revision 1.0, March 25, 2003
Version delivered to Online SIG at local STC meeting.
Revision 0.1, March 23, 2003
Initial draft.

The title of this talk is an allusion to Mike Smith's article Don't learn XML, where he suggests using the DocBook DTD and stylesheets to get started with XML for documentation rather than starting from scratch and writing your own DTD, stylesheets, and so on. My title is more than a little misleading. I'll explain how to get started with DocBook without learning much XML, but I also hope to show you how to start learning the technologies that make a DocBook-based system tick.

What is this “DocBook” you speak of?

More broadly defined, DocBook is the DTD mentioned above and the mechanisms to transform DocBook documents to a useful format.

Of the existing mechanisms, two are free (as in freedom and beer) open source and a third is commercial:

XML facilitates creating multiple outputs from a single source. All of the documents linked below were created from the identical XML file. This talk is itself an example of its subject: I wrote it in a form of DocBook (the slides DTD) and used XSL stylesheets to transform it into various output types.

  • The slide show you're looking at now/a single, monolithic html: If you are viewing this page on the web, open this page in the Opera web browser and press F11 to view it as a full-screen slide show.

  • “chunked” html (example) : each chapter, section, and so on is broken into a separate html page to create an online book.

  • HTML Help or RoboHELP's WebHelp (example) : We also created chms from our server books because that format was more convenient in some circumstances. We used a less booklike variant for help sets. If you have a chm file, you can easily create WebHelp by opening the .hhp file in RoboHELP and generating the WebHelp. This requires owning RoboHELP.

  • Print output using the stock DocBook XSLs: the output of the DocBook XSL stylesheets if you don't customize anything.

  • The Motive pdf (example) : this is the output from the stylesheets we use at Motive. After Motive acquired BroadJump, I customized the DocBook stylesheets to mimic the FrameMaker template they were using.

  • Eclipse documentation plugins.

  • The BroadJump pdf (example) : this is the output from the stylesheets we used at BroadJump.

  • The BroadJump “technical whitepaper” pdf (example) : We created this stylesheet for a series of whitepapers that described our product's architecture.

  • The DocBook 'slides' XSLs : A browser-based slide show that, unlike the one you're viewing now, doesn't depend on Opera. There are XSLs to create html with and without frames as well as fo/pdf.

  • You can even create a Word doc out of it. (example)
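The single-source idea behind the list above can be sketched in a few lines. This is only an illustration, assuming a tiny made-up DocBook-like document and two hand-written renderers (`to_html` and `to_text` are hypothetical names); the real outputs listed above come from the DocBook XSL stylesheets, not from Python code like this.

```python
import xml.etree.ElementTree as ET
from html import escape

# A tiny DocBook-like source. The content is invented for illustration.
SOURCE = """<article>
  <title>Install Guide</title>
  <para>Unpack the archive.</para>
  <para>Run the installer.</para>
</article>"""


def to_html(root):
    """Render the source tree as a small HTML fragment."""
    lines = ["<h1>%s</h1>" % escape(root.findtext("title"))]
    lines += ["<p>%s</p>" % escape(para.text) for para in root.findall("para")]
    return "\n".join(lines)


def to_text(root):
    """Render the same tree as plain text with an underlined title."""
    title = root.findtext("title")
    lines = [title, "=" * len(title)]
    lines += [para.text for para in root.findall("para")]
    return "\n".join(lines)


# One parsed source, two different presentations.
root = ET.fromstring(SOURCE)
html_output = to_html(root)
text_output = to_text(root)
```

The source never says how it should look. Each renderer decides that for its own output format, which is the same division of labor that exists between a DocBook file and the stylesheets.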

WYSIWYG is ultimately impossible any time you produce multiple outputs from a single source. This subject is discussed every few months on one of the XML or DocBook lists.

See a list of DocBook authoring tools at the DocBook Wiki.

In addition to helping you author instances of DocBook XML (or any DTD) by indicating which elements are valid at which points, these tools validate documents. When a tool validates a document, it checks whether the XML conforms to its DTD. If the document does not conform, the tool typically tries to indicate what and where the problem is, though it can't always tell you exactly what is wrong.
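To make the idea of validation concrete, here is a deliberately simplified sketch. It does not read a real DTD: the `ALLOWED_CHILDREN` table is a hypothetical, hand-written stand-in for a content model, and the `validate` function is an illustrative name. A real validating parser also checks element order, repetition, and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified content model: element -> allowed children.
# A real DTD also constrains order, repetition, and attributes.
ALLOWED_CHILDREN = {
    "article": {"title", "section", "para"},
    "section": {"title", "para"},
    "title": set(),
    "para": {"emphasis"},
    "emphasis": set(),
}


def validate(element, path=""):
    """Return a list of messages for elements the toy model does not allow."""
    here = "%s/%s" % (path, element.tag)
    if element.tag not in ALLOWED_CHILDREN:
        return ["%s: unknown element" % here]
    problems = []
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[element.tag]:
            problems.append("%s: <%s> is not allowed here" % (here, child.tag))
        else:
            problems.extend(validate(child, here))
    return problems


good = ET.fromstring("<article><title>T</title><para>Hi</para></article>")
bad = ET.fromstring("<article><title><para>Oops</para></title></article>")

good_problems = validate(good)
bad_problems = validate(bad)
```

The messages report where the problem was found, much as an editor does. As with real tools, the location of the error is not always the place where the author actually went wrong.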

There are a few other editors at various levels of maturity. I hear Oxygen and Morphon mentioned sometimes.

FO (Formatting Objects) is part of the w3c's XSL specification. It is like html in that it describes how content should be presented, but unlike html, fo focuses on the printed page, so it provides for headers, footers, page numbers, and so on.


XSL is a language for expressing stylesheets. It consists of three parts: XSL Transformations (XSLT): a language for transforming XML documents, the XML Path Language (XPath), an expression language used by XSLT to access or refer to parts of an XML document. (XPath is also used by the XML Linking specification). The third part is XSL Formatting Objects: an XML vocabulary for specifying formatting semantics. An XSL stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary.

-- w3c  
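To give a feel for what XPath expressions look like, here is a small sketch using Python's standard `xml.etree.ElementTree` module. Note the assumptions: ElementTree implements only a limited subset of XPath, whereas an XSLT processor supports the full language, and the sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample document; element names follow DocBook conventions.
DOC = ET.fromstring("""<book>
  <title>Server Guide</title>
  <chapter id="install">
    <title>Installing</title>
    <para>Step one.</para>
  </chapter>
  <chapter id="config">
    <title>Configuring</title>
    <para>Step two.</para>
    <para>Step three.</para>
  </chapter>
</book>""")

# "chapter/title": the title child of each chapter child of the root.
chapter_titles = [title.text for title in DOC.findall("chapter/title")]

# ".//para": every para element at any depth below the root.
all_paras = DOC.findall(".//para")

# "chapter[@id='config']/title": title of the chapter whose id is "config".
config_title = DOC.findtext("chapter[@id='config']/title")
```

A stylesheet uses expressions of this kind to say which parts of the document a template applies to.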

See the DocBook Wiki for a complete list.

The XSL stylesheet distribution comes with several sets of stylesheets. The folder names indicate their purpose: fo, html, htmlhelp. See Profiling for information about profiling.


In the html folder, use docbook.xsl to generate a flat html file.

Use profile-docbook.xsl to generate a flat html file using profiling.

Use chunk.xsl...

Use profile-chunk.xsl...


In the htmlhelp folder, use htmlhelp.xsl to generate the collection of html pages, .hhc, .hhp, and .hhk files necessary to make a chm.

Use profile-htmlhelp.xsl to generate the collection of html pages, .hhc, .hhp, and .hhk files necessary to make a chm using profiling.


In the fo folder, use docbook.xsl to generate an xsl-fo file.

Use profile-docbook.xsl to generate an xsl-fo file using profiling.
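The choice of stylesheet above follows a simple pattern: pick a folder for the output type, then add the `profile-` prefix if you need profiling. The sketch below captures that pattern. The helper names, the output keys, and the `/opt/docbook-xsl` location are all hypothetical. The command it builds uses `xsltproc` with its `--output` option, but any XSLT processor would do, and the command is only assembled here, not run.

```python
# Stylesheet file names come from the list above. The folders (html,
# htmlhelp, fo) are the ones named in the stylesheet distribution.
STYLESHEETS = {
    "html": ("html", "docbook.xsl"),
    "chunked-html": ("html", "chunk.xsl"),
    "htmlhelp": ("htmlhelp", "htmlhelp.xsl"),
    "fo": ("fo", "docbook.xsl"),
}


def stylesheet_path(xsl_root, output, profiling=False):
    """Return the stylesheet for an output type, with or without profiling."""
    folder, name = STYLESHEETS[output]
    if profiling:
        name = "profile-" + name
    return "/".join([xsl_root.rstrip("/"), folder, name])


def xsltproc_command(xsl_root, output, source, result, profiling=False):
    """Build (but do not run) an xsltproc command as a list of arguments."""
    stylesheet = stylesheet_path(xsl_root, output, profiling)
    return ["xsltproc", "--output", result, stylesheet, source]


# Hypothetical install location and file names, for illustration only.
command = xsltproc_command("/opt/docbook-xsl", "fo", "guide.xml", "guide.fo")
```

Keeping the mapping in one table makes it easy to see that profiling never changes the folder, only the file name.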

The problem with DocBook is not a lack of documentation. In fact, well-meaning people have probably posted too many getting-started guides on the web over the years, so you may have difficulty figuring out which of them is current, which addresses your needs, and so on.

Notice that you have to go to two separate places to get the DTD and stylesheets. That may seem strange (and even inconvenient), but there's a reason behind it. The DTD dictates the semantics of a DocBook document. Many elements carry processing expectations, but these are not requirements, and the DTD does not presuppose what tool will be used to process the document. The stylesheets maintained at the DocBook Open Repository are not the last word on processing DocBook documents. You are free to build or buy a different implementation because you own your data.