DocBook Tutorial

Using DocBook, you can produce documents that are both well structured and beautiful

Adrian Giurca


1. Why ?

The main selling point for DocBook is its portability. A document written in DocBook markup can be converted into HTML, PostScript, PDF, RTF, DVI, and plain ASCII text easily and quickly without any expensive tools. In fact, DocBook and all of the tools used to work with DocBook are freely available under open source licenses. DocBook documents are plain text and can be edited with any text editor or word processor that can save documents as plain ASCII text. If you use a word processor, take extra care to save DocBook documents as plain text; otherwise they will not parse correctly. If you want to use your documentation in more than one format, such as print and online, you'll find DocBook is a great solution.

Another advantage of DocBook is that it frees the author from worrying about the formatting and layout of a document. DocBook is only concerned with the structure of a document. For instance, an author simply marks text that should be emphasized with the <emphasis> tag and leaves its rendering to the formatting tools. This is one less thing for the author to worry about while writing a document.

2. What is DocBook ?

DocBook is an SGML dialect developed by O'Reilly and HaL Computer Systems in 1991. It is currently maintained by the Organization for the Advancement of Structured Information Standards (OASIS). DocBook describes the content of articles, books, technical manuals, and other documents. Although DocBook is focused on technical writing styles, it is general enough to describe most prose writing. In this tutorial, I'll discuss an XML variant of the DocBook DTD that is also available.

The first and ultimate key to time-resistant documents is using open standards, such as XML/SGML, for document formats. These open standards comprise two elements:

  • Syntax, or what a document must look like. The syntax of a DocBook document is wholly contained in the simple rules of XML markup and in the DocBook DTD that every DocBook document references.

  • Semantics, or what a document means. The semantics are slightly less distinct. For example, the DTD contains certain semantic features that determine which elements can or must occur inside other elements. The DocBook tags are applied so that they have a certain "common sense" semantic content, at least to English speakers. But other, more detailed semantic issues rely on specific publication guidelines, common usage rules, and editorial judgments (for example, governing the type of list that is appropriate in a certain place in the text). Note that the DocBook manuals can give you some information on general semantic guidelines, but various publications may have more specific guidelines.

It is important to keep in mind that a DocBook document annotates the semantics of the document, not its typography or appearance. This focus on document semantics stands in contrast to the focus of word processors. Word processors often allow style sheets that help you mark conceptual categories like "Header, Level 2," but increasingly they attempt to deliver "what you see is what you get" (WYSIWYG). Even style sheets are rarely uniform across documents. This approach makes broad assumptions about things such as page size and layout, available fonts, and typestyles of elements. Most of these assumptions have little to do with the actual conceptual meaning of the text. And almost all of them make it more difficult to adapt the document to a different format -- whether it be a different printed layout, onscreen display, speech-synthesized version, or an index for Web robots. HTML, originally similar (albeit simpler) to DocBook, has added more and more typographic tags, so that it is currently a mixture of semantics and typography (for example, <h2> versus <b>).
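
As a small illustration of semantic markup, the fragment below is a sketch of a paragraph in which each inline element says what a piece of text is, not how it should look. The sentence itself is invented for illustration; the elements used all appear in the para content model of Simplified DocBook.

```xml
<!-- Each inline element names a role; none of them names a font or a style. -->
<para>To list the files in <filename>/tmp</filename>, run
<command>ls</command> with the <option>-l</option> option. Type the
command <emphasis>exactly</emphasis> as shown.</para>
```

Whether a file name ends up in a monospaced font, in italics, or spoken in a different voice is left to the stylesheet or processor that produces the output.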

3. Creating a document with DocBook

Creating a document with DocBook is easy. We'll focus on creating a document using the XML DTD. With the exception of the document declaration, everything in this article should apply to the SGML DTD as well as the XML DTD. There are two kinds of DocBook DTD: Complete DocBook (http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd) and Simplified DocBook (http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd). The examples in this section were made with the simplified version of DocBook.

To begin, fire up your favorite text editor[1] and create a new document. First, we must write a document declaration; to start with, use the simplified format of the DocBook DTD. Note that every DocBook document requires a document declaration to be considered valid. Here is a DOCTYPE declaration for a DocBook article using the simplified DTD:

<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN"
"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd">

Here is a DOCTYPE declaration for a DocBook article using the complete DTD:

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">

Now we'll start adding a little meat to the document. We'll start with a title, author information, and a short paragraph. This brief example shows a few of the basic DocBook tags, or elements, in use. While DocBook elements may look similar to HTML tags, remember that DocBook parsers are much more demanding than your average Web browser. While you can get away with not declaring an HTML document, or even skipping some "required" tags, DocBook is not quite so forgiving. Be careful to include all required elements and use them in their proper order.

Example 3.1. Using elements

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN"
"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd">
<article>
  <title>DocBook Tutorial</title>
  <articleinfo>
    <author>
      <firstname>Adrian</firstname>
      <surname>Giurca</surname>
    </author>
    <date>April 5, 2005</date>
  </articleinfo>
  <section>
    <title>What is DocBook ?</title>
    <para>DocBook is an SGML dialect developed by O'Reilly and HaL Computer
    Systems in 1991. It is currently maintained by the Organization for the
    Advancement of Structured Information Standards (OASIS). DocBook describes
    the content of articles, books, technical manuals, and other documents.
    Although DocBook is focused on technical writing styles, it is general
    enough to describe most prose writing. In this article, I'll discuss an
    XML variant of the DocBook DTD that is also available.
    </para>
  </section>
</article>

Each section of the document has a <title> element that holds the title of the section. Most of the elements used here are self-explanatory. Some elements are only valid when nested inside particular parent elements. The <firstname> and <surname> elements, for example, are valid within their parent element <author> but would not be valid within the <para> element. The complete DocBook standard also provides five numbered section elements, <sect1> through <sect5>, one for each level of nesting; these must be nested in order, so you cannot skip from a level one section to a level three section. The next element is the <para> element, which is easy to remember because it stands for paragraph. You will probably find that the majority of DocBook elements make sense, and you probably won't need to look up the common elements after writing one or two documents with DocBook. Take a look at another example:

Example 3.2. Using attributes

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN"
"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd">
<article class="techreport" lang="en" status="draft">
  <title>DocBook Tutorial</title>
  <articleinfo>
    <author>
      <firstname>Adrian</firstname>
      <surname>Giurca</surname>
    </author>
    <date>April 5, 2005</date>
  </articleinfo>
  <section label="1" revisionflag="added" status="finished">
    <title>Why ?</title>
    <para>The main selling point for <acronym>DocBook</acronym> is its
    portability. A document written in <acronym>DocBook</acronym> markup can
    be converted into <acronym>HTML</acronym>, <acronym>PostScript</acronym>,
    <acronym>PDF</acronym>, <acronym>RTF</acronym>, <acronym>DVI</acronym>,
    and plain <acronym>ASCII</acronym> .....</para>
    <para>Another advantage of <acronym>DocBook</acronym> is that ....
    For instance, an author simply uses the
    <acronym>DocBook</acronym> markup to indicate text that should be
    <emphasis>emphasized</emphasis> with the
    <literal>&lt;emphasis&gt;</literal> tag. This is one less thing for the
    author to worry about while writing a document.</para>
    <section>
      <title>A subsection</title>
      <para></para>
    </section>
  </section>
  <section label="2" revisionflag="changed" status="unfinished">
    <title>What is DocBook ?</title>
    <para><acronym>DocBook</acronym> is an ...</para>
    <para>The first and ultimate key to time-resistant documents is using open
    standards, such as XML/SGML, for document formats. These open standards
    comprise two elements:</para>
    <itemizedlist>
      <listitem>
        <para>Syntax, or what a document must look like</para>
      </listitem>
      <listitem>
        <para>Semantics, or what a document means</para>
      </listitem>
    </itemizedlist>
  </section>
</article>

Some elements in DocBook can also include attributes that further describe the element. The <section> elements in the example above carry label, revisionflag, and status attributes. The example also introduces the inline elements <acronym>, <emphasis>, and <literal>. To structure the text, we use a list defined by the <itemizedlist> element, with each item in a <listitem> element.

<section label="2" revisionflag="changed" status="unfinished">

Generally, elements have optional attributes. However, some elements like the <ulink> element require an attribute.

<ulink url="http://www.oasis-open.org">OASIS</ulink>

If you're unsure, check the official DocBook documentation to see what attributes are applicable to the elements you are using (see References).

4. Some of the most used DocBook elements

This section gives a short presentation of the most used elements in the Simplified DocBook DTD.

4.1. <article>

An article.

4.1.1. Content model

article ::=
((title,subtitle?,titleabbrev?)?,
 articleinfo?,
 (((itemizedlist|orderedlist|variablelist|note|literallayout|
    programlisting|para|blockquote|mediaobject|informaltable|
    example|figure|table|sidebar|abstract|authorblurb|epigraph)+,
   section*)|
  section+),
 ((appendix)|
  bibliography)*)

4.1.2. Description

The article element is a general-purpose container for articles. The content model is both quite complex and rather loose in order to accommodate the wide range of possible article structures. An Article is composed of a header and a body. The body may include a table of contents and multiple lists of tables, figures, and so on, before the main text of the article and may include a number of common end-matter components at the end.

4.1.3. Attributes

  • class - Class identifies the type of article. The allowed values are: faq, journalarticle, productsheet, specification, techreport, whitepaper.

  • parentbook - ParentBook holds the ID of an enclosing Book, if applicable.

  • status - Status identifies the editorial or publication status of the Article. Publication status might be used to control formatting (for example, printing a "draft" watermark when the value is draft) or processing (perhaps a document with a status of final should not include any components that are not final).
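
A minimal sketch of these attributes in use, assuming the Simplified DocBook DTD from Section 3. The title and paragraph text are hypothetical, and "draft" is only an illustrative status value, since the text above does not list a fixed set of status values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN"
"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd">
<!-- class takes one of the listed values; status is free text ("draft" is an assumption) -->
<article class="faq" status="draft">
  <title>Frequently Asked Questions</title>
  <para>Answers will be added once the draft has been reviewed.</para>
</article>
```

A stylesheet could test the status attribute to decide whether to print a watermark, but that behaviour belongs to the processing tools, not to the markup.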

4.1.4. Processing expectations

Formatted as a displayed block. Frequently causes a forced page break in print media. May be numbered separately and presented in the table of contents.

4.1.5. Children

The following elements occur in article: abstract, appendix, articleinfo, authorblurb, bibliography, blockquote, epigraph, example, figure, informaltable, itemizedlist, literallayout, mediaobject, note, orderedlist, para, programlisting, section, sidebar, subtitle, table, title, titleabbrev, variablelist.

4.1.6. Example

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<article>
<articleinfo>
  <author><firstname>Norman</firstname><surname>Walsh</surname></author>
  <authorinitials>ndw</authorinitials>
  <artpagenums>339-343</artpagenums>
  <volumenum>15</volumenum>
  <issuenum>3</issuenum>
  <publisher><publishername>The TeX User's Group</publishername></publisher>
  <pubdate>1994</pubdate>
  <title>A World Wide Web Interface to CTAN</title>
  <titleabbrev>CTAN-Web</titleabbrev>
  <revhistory>
     <revision>
        <revnumber>1.0</revnumber>
        <date>28 Mar 1994</date>
        <revremark>Submitted.</revremark>
     </revision>
     <revision>
        <revnumber>0.5</revnumber>
        <date>15 Feb 1994</date>
        <revremark>First draft for review.</revremark>
     </revision>
  </revhistory>
</articleinfo>
<para>
The body of the article &hellip;
</para>
</article>

4.2. <articleinfo>

Meta-information for an Article.

4.2.1. Content model

articleinfo ::=
((mediaobject|legalnotice|subjectset|keywordset|abbrev|abstract|
  author|authorgroup|bibliomisc|copyright|corpauthor|date|edition|
  editor|issuenum|othercredit|pubdate|publishername|releaseinfo|
  revhistory|subtitle|title|titleabbrev|volumenum|citetitle|
  honorific|firstname|surname|lineage|othername|affiliation|
  authorblurb)+)

4.2.2. Attributes

Common attributes

4.2.3. Description

The articleinfo element is a wrapper for a large collection of meta-information about an article. Much of this data is bibliographic in nature.

4.2.4. Processing expectations

Suppressed. Many of the elements in this wrapper may be used in presentation, but they are not generally printed as part of the formatting of the wrapper. It merely serves to identify where they occur.

4.2.5. Parents

These elements contain articleinfo: article.

4.2.6. Children

The following elements occur in articleinfo: abbrev, abstract, affiliation, author, authorblurb, authorgroup, bibliomisc, citetitle, copyright, corpauthor, date, edition, editor, firstname, honorific, issuenum, keywordset, legalnotice, lineage, mediaobject, othercredit, othername, pubdate, publishername, releaseinfo, revhistory, subjectset, subtitle, surname, title, titleabbrev, volumenum.

4.2.7. Example

<articleinfo>
  <author>
    <firstname>Adrian</firstname>
    <surname>Giurca</surname>
  </author>
  <date>April 5, 2005</date>
  <affiliation>
    <orgname>Dept. of Computer Science, BTU Cottbus</orgname>
  </affiliation>
</articleinfo>

4.3. <revhistory>

A history of the revisions to a document.

4.3.1. Content model

revhistory ::=
(revision+)

4.3.2. Attributes

Common attributes.

4.3.3. Description

revhistory is a structure for documenting a history of changes, specifically, a history of changes to the document or section in which it occurs. DocBook does not mandate an order for revisions: ascending order by date, descending order by date, and orders based on some other criteria are all equally acceptable.
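
Because a revhistory describes the document or section in which it occurs, it can also be attached to a single section through sectioninfo, which is listed among its parents below. The following fragment is a sketch under that assumption; the section title, revision number, date, and remark are hypothetical.

```xml
<section>
  <sectioninfo>
    <!-- This history applies to the enclosing section only -->
    <revhistory>
      <revision>
        <revnumber>1.1</revnumber>
        <date>5 Apr 2005</date>
        <revremark>Clarified the installation steps.</revremark>
      </revision>
    </revhistory>
  </sectioninfo>
  <title>Installation</title>
  <para>Unpack the archive and run the installer.</para>
</section>
```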

4.3.4. Processing expectations

Formatted as a displayed block. A tabular or list presentation is most common. The order of revisions within a revhistory (ascending or descending date order, for example) is not mandated by DocBook.

4.3.5. Parents

These elements contain revhistory: articleinfo, bibliomixed, bibliomset, objectinfo, sectioninfo, subtitle, title, titleabbrev.

4.3.6. Children

The following elements occur in revhistory: revision.

4.3.7. Example

<!DOCTYPE revhistory PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<revhistory>
 <revision>
  <revnumber>0.91</revnumber>
  <date>11 Dec 1996</date>
  <authorinitials>ndw</authorinitials>
  <revremark>Bug fixes</revremark>
 </revision>
 <revision>
  <revnumber>0.90</revnumber>
  <date>30 Nov 1996</date>
  <authorinitials>ndw</authorinitials>
  <revremark>First beta release</revremark>
 </revision>
</revhistory>

4.4. <section>

A recursive section.

4.4.1. Content model

section ::=
(sectioninfo?,
 (title,subtitle?,titleabbrev?),
 (((itemizedlist|orderedlist|variablelist|note|literallayout|
    programlisting|para|blockquote|mediaobject|informaltable|
    example|figure|table|sidebar|abstract|authorblurb|epigraph)+,
   section*)|
  section+))

4.4.2. Attributes

  • label - Label specifies an identifying string for presentation purposes. Generally, an explicit label attribute is used only if the processing system is incapable of generating the label automatically. If present, the label is normative; it will be used even if the processing system is capable of automatic labelling.

  • status - Status identifies the editorial or publication status of the Section. See also the <article> element.

4.4.3. Description

Section is one of the top-level sectioning elements in a component. There are three types of sectioning elements in DocBook:

  1. Explicitly numbered sections, sect1 through sect5, which must be properly nested and can only be five levels deep.

  2. Recursive Sections, which are an alternative to the numbered sections and have unbounded depth.

  3. SimpleSects, which are terminal. SimpleSects can occur as the "leaf" sections in either recursive sections or any of the numbered sections, or directly in components.

Sections may be more convenient than numbered sections in some authoring environments because they can be moved around in the document hierarchy without renaming. None of the sectioning elements is allowed to "float" in a component. You can place paragraphs and other block elements before a section, but you cannot place anything after it.
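
To make the difference concrete, here are two fragments that sketch the same two-level structure, first with numbered sections and then with recursive sections. The numbered form assumes the complete DocBook DTD, since sect1 and sect2 do not appear in the Simplified content models shown in this tutorial; the titles and text are invented.

```xml
<!-- Fragment 1, numbered sections: the depth is encoded in the element name -->
<sect1>
  <title>Installation</title>
  <para>An overview of the installation.</para>
  <sect2>
    <title>Requirements</title>
    <para>What you need before you start.</para>
  </sect2>
</sect1>

<!-- Fragment 2, recursive sections: the depth follows from the nesting alone -->
<section>
  <title>Installation</title>
  <para>An overview of the installation.</para>
  <section>
    <title>Requirements</title>
    <para>What you need before you start.</para>
  </section>
</section>
```

Promoting "Requirements" to the top level means renaming sect2 to sect1 in the first fragment, while in the second the section can simply be moved.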

4.4.4. Processing expectations

Formatted as a displayed block. Sometimes sections are numbered. Use of deeply nested Sections may cause problems in some processing systems.

4.4.5. Parents

These elements contain section: appendix, article, section.

4.4.6. Children

The following elements occur in section: abstract, authorblurb, blockquote, epigraph, example, figure, informaltable, itemizedlist, literallayout, mediaobject, note, orderedlist, para, programlisting, section, sectioninfo, sidebar, subtitle, table, title, titleabbrev, variablelist.

4.4.7. Example

<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<chapter>
  <title>Test Chapter</title>
  <para>This chapter uses recursive sections.</para>
  <section>
   <sectioninfo>
    <abstract><para>A trivial example of recursive sections.</para>
    </abstract>
   </sectioninfo>
   <title>Like a Sect1</title>
   <subtitle>Or How I Learned to Let Go of Enumeration and Love to Recurse</subtitle>
   <para>This section is like a Sect1.</para>
   <section>
     <title>Like a Sect2</title>
     <para>This section is like a Sect2.</para>
     <section>
      <title>Like a Sect3</title>
       <para>This section is like a Sect3.</para>
        <section>
         <title>Like a Sect4</title>
         <para>This section is like a Sect4.</para>
         <section>
           <title>Like a Sect5</title>
           <para>This section is like a Sect5.</para>
           <section>
            <title>Would be like a Sect6</title>
            <para>This section would be like a Sect6, if there was one.</para>
           </section>
          </section>
        </section>
       </section>
      </section>
   </section>
</chapter>

4.5. <para>

A paragraph.

4.5.1. Content model

para ::=
(#PCDATA|footnoteref|xref|abbrev|acronym|citetitle|emphasis|
 footnote|phrase|quote|trademark|link|ulink|command|
 computeroutput|email|filename|literal|option|replaceable|
 systemitem|userinput|inlinemediaobject)*

4.5.2. Attributes

Common attributes.

4.5.3. Description

A para is a paragraph. Paragraphs in DocBook may contain almost all inlines and most block elements. Sectioning and higher-level structural elements are excluded. DocBook offers two variants of paragraph: simpara, which cannot contain block elements, and formalpara, which has a title. Some processing systems may find the presence of block elements in a paragraph difficult to handle. On the other hand, it is frequently most logical, from a structural point of view, to include block elements, especially informal block elements, in the paragraphs that describe their content. There is no easy answer to this problem.

4.5.4. Processing expectations

Formatted as a displayed block.

4.5.5. Parents

These elements contain para: abstract, appendix, article, authorblurb, bibliodiv, bibliography, blockquote, caption, entry, epigraph, example, footnote, legalnotice, listitem, note, revdescription, section, sidebar, textobject.

4.5.6. Children

The following elements occur in para: abbrev, acronym, citetitle, command, computeroutput, email, emphasis, filename, footnote, footnoteref, inlinemediaobject, link, literal, option, phrase, quote, replaceable, systemitem, trademark, ulink, userinput, xref.

4.5.7. Example

In the complete DocBook DTD, ordinary paragraphs can contain most block elements:

<!DOCTYPE para PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<para>
 The component suffered from three failings:
 <itemizedlist>
   <listitem>
    <para>It was slow</para>
   </listitem>
   <listitem>
    <para>It ran hot</para>
   </listitem>
   <listitem>
     <para>It didn't actually work</para>
   </listitem>
 </itemizedlist>
 Of these three, the last was probably the most important.
</para>

Formal paragraphs include a title:

<!DOCTYPE formalpara PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<formalpara>
  <title>A Test</title>
  <para>
    This is a test.  This is only a test.  Had this been a real example, it would have made 
    more sense.
  </para>
</formalpara>

Simple paragraphs may not contain block elements:

<!DOCTYPE simpara PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<simpara>
 Just the text, ma'am.
</simpara>

4.6. <orderedlist>

A list in which each entry is marked with a sequentially incremented label.

4.6.1. Content model

orderedlist ::=
((title,titleabbrev?)?,
 listitem+)

4.6.2. Attributes

Common attributes and

  • continuation - If Continuation is specified, it indicates how list numbering should begin relative to the immediately preceding list. Restarts, the default, indicates that numbering should begin again at 1. Continues indicates that numbering should begin where the preceding list left off.

  • inheritnum - In a nested list, InheritNum indicates whether or not the enumeration of interior lists should include the numbers of containing list items. If InheritNum is Inherit then the third item of a list inside the second item of a list inside the fourth item of a list might be enumerated as "4.2.3". If it is Ignore, the default, then it would be simply "3". (The Numeration attribute controls the actual format of the item numbers, of course.)

  • numeration - Numeration specifies the style of numbering to be used for items in the current OrderedList.

  • spacing - Spacing indicates whether or not the vertical space in the list should be minimized.
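
The fragment below is a sketch of how these attributes might be combined; the list items are invented. The second list uses continuation="continues", so a processor that honours the attribute would be expected to resume numbering where the first list left off, and the nested list asks for inherited numbers with inheritnum="inherit".

```xml
<orderedlist numeration="arabic" spacing="compact">
  <listitem><para>Unpack the archive.</para></listitem>
  <listitem><para>Read the release notes.</para></listitem>
</orderedlist>

<para>After a short interruption, the steps continue.</para>

<!-- Numbering continues from the preceding list instead of restarting at 1 -->
<orderedlist continuation="continues">
  <listitem>
    <para>Configure the build:</para>
    <!-- The nested list requests labels that include the number of the containing item -->
    <orderedlist numeration="loweralpha" inheritnum="inherit">
      <listitem><para>Choose an installation prefix.</para></listitem>
      <listitem><para>Select the optional features.</para></listitem>
    </orderedlist>
  </listitem>
</orderedlist>
```

How the inherited labels are actually formatted depends on the stylesheet, so the exact output is not shown here.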

4.6.3. Description

In an orderedlist, each member of the list is marked with a numeral, letter, or other sequential symbol (such as Roman numerals).

4.6.4. Processing expectations

Formatted as a displayed block. If no value is specified for numeration, Arabic numerals (1, 2, 3, ...) are to be used. In nested lists, DocBook does not specify the sequence of numerations. Note that the attributes of orderedlist have a significant influence on the processing expectations.

4.6.5. Parents

These elements contain orderedlist: appendix, article, bibliodiv, bibliography, blockquote, caption, entry, example, footnote, legalnotice, listitem, note, revdescription, section, sidebar, textobject.

4.6.6. Children

The following elements occur in orderedlist: listitem, title, titleabbrev.

4.6.7. Example

<!DOCTYPE orderedlist PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<orderedlist numeration="lowerroman">
 <listitem>
   <para>One</para>
 </listitem>
 <listitem>
  <para>Two</para>
 </listitem>
 <listitem>
  <para>Three</para>
 </listitem>
 <listitem>
  <para>Four</para>
 </listitem>
</orderedlist>

4.7. <itemizedlist>

A list in which each entry is marked with a bullet or other dingbat.

4.7.1. Content model

itemizedlist ::=
((title,titleabbrev?)?,
 listitem+)

4.7.2. Attributes

  • mark - Mark contains a keyword indicating the type of mark to be used on items in this ItemizedList. DocBook does not provide a fixed list of appropriate keywords.

  • spacing - Spacing indicates whether or not the vertical space in the list should be minimized.

4.7.3. Description

In an itemizedlist, each member of the list is marked with a bullet, dash, or other symbol.

4.7.4. Processing expectations

Formatted as a displayed block. DocBook specifies neither the initial mark nor the sequence of marks to be used in nested lists. If explicit control is desired, the mark attribute should be used. The values of the mark attribute are expected to be keywords, not representations (numerical character references, entities, and so on) of the actual mark. In order to enforce a standard set of marks at your organization, it may be useful to construct a customization layer that limits the values of the mark attribute to an enumerated list.
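
As a sketch of explicit control over nested lists, the fragment below sets a mark keyword on each level. The keywords bullet and opencircle are the ones used in the example for this element; whether a given stylesheet recognizes them is an assumption, and the list content is invented.

```xml
<itemizedlist mark="bullet" spacing="compact">
  <listitem>
    <para>Markup languages</para>
    <!-- The inner list names its own mark instead of relying on a default sequence -->
    <itemizedlist mark="opencircle" spacing="compact">
      <listitem><para>DocBook</para></listitem>
      <listitem><para>HTML</para></listitem>
    </itemizedlist>
  </listitem>
  <listitem>
    <para>Typesetting systems</para>
  </listitem>
</itemizedlist>
```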

4.7.5. Parents

These elements contain itemizedlist: appendix, article, bibliodiv, bibliography, blockquote, caption, entry, example, footnote, legalnotice, listitem, note, revdescription, section, sidebar, textobject.

4.7.6. Children

The following elements occur in itemizedlist: listitem, title, titleabbrev.

4.7.7. Example

<!DOCTYPE itemizedlist PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<itemizedlist mark='opencircle'>
<listitem>
<para>
TeX and LaTeX
</para>
</listitem>
<listitem override='bullet'>
<para>
Troff
</para>
</listitem>
<listitem>
<para>
Lout
</para>
</listitem>
</itemizedlist>

4.8. <literal>

Inline text that is some literal value.

4.8.1. Content model

literal ::=
(#PCDATA|link|ulink|command|computeroutput|email|filename|literal|
 option|replaceable|systemitem|userinput|inlinemediaobject)*

4.8.2. Attributes

Common attributes and

  • moreinfo - If MoreInfo is set to refentry, it implies that a RefEntry exists which further describes the literal.

4.8.3. Description

A literal is some specific piece of data, taken literally, from a computer system. It is similar in some ways to userinput and computeroutput, but is somewhat more of a general classification. The sorts of things that constitute literals vary by domain.

4.8.4. Processing expectations

Formatted inline. A literal is frequently distinguished typographically and literal is often used wherever that typographic presentation is desired. The moreinfo attribute can help generate a link or query to retrieve additional information.

4.8.5. Parents

These elements contain literal: attribution, bibliomisc, citetitle, command, computeroutput, emphasis, entry, lineannotation, link, literal, literallayout, para, phrase, programlisting, quote, subtitle, term, title, titleabbrev, trademark, ulink, userinput.

4.8.6. Children

The following elements occur in literal: command, computeroutput, email, filename, inlinemediaobject, link, literal, option, replaceable, systemitem, ulink, userinput.

4.8.7. Example

<!DOCTYPE para PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<para>There are several undocumented settings for <varname>debug</varname>,
among them <literal>3.27</literal> to enable a complete trace and
<literal>3.8</literal> to debug the spell checker. For a complete
list of the possible settings,
see <filename class="headerfile">edit/debug.h</filename>.</para>

4.9. <programlisting>

A literal listing of all or part of a program.

4.9.1. Content model

programlisting ::=
(#PCDATA|footnoteref|xref|abbrev|acronym|citetitle|emphasis|
 footnote|phrase|quote|trademark|link|ulink|command|
 computeroutput|email|filename|literal|option|replaceable|
 systemitem|userinput|inlinemediaobject|lineannotation)*

4.9.2. Attributes

  • format - The format attribute applies the linespecific notation to all ProgramListings. All white space and line breaks must be preserved.

  • linenumbering - (in version [4.0]) Line numbering indicates whether or not the lines of a ProgramListing are to be automatically numbered. The details of numbering (every line or only selected lines, on the left or right, etc.) are left up to the processing application. Be aware that not all processors are capable of numbering lines.

  • width - The width attribute specifies the width (in characters) of the longest line in this ProgramListing (formatters may use this value to determine scaling or rotation).

4.9.3. Description

A programlisting is a verbatim environment for program source or source fragment listings. ProgramListings are often placed in Examples or Figures so that they can be cross-referenced from the text.
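
A sketch of that pattern follows: a programlisting wrapped in an example that carries an id, and a paragraph that points to it with xref. The id value ex.hello and the C program are hypothetical. Note that the angle brackets in the program are written as entities, because markup inside a programlisting is still recognized.

```xml
<example id="ex.hello">
  <title>A minimal C program</title>
  <programlisting>#include &lt;stdio.h&gt;

int main(void)
{
    printf("Hello, DocBook\n");
    return 0;
}</programlisting>
</example>

<!-- linkend must match the id of the example above -->
<para>The program in <xref linkend="ex.hello"/> prints a greeting.</para>
```

The listing starts immediately after the opening tag so that no unwanted blank line appears in the verbatim output.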

4.9.4. Processing expectations

Formatted as a displayed block. This element is displayed "verbatim"; whitespace and linebreaks within this element are significant. ProgramListings are usually displayed in a fixed width font. Other markup within a ProgramListing is recognized. Contrast this with systems like LaTeX, in which verbatim environments disable markup recognition. If you want to disable markup recognition, you must use a CDATA section:

<programlisting>
<![CDATA[
This is a programlisting so white space and line breaks are significant.  
But it is also a CDATA section so <emphasis>tags</emphasis> and &entities;
are not recognized.  The only markup that is recognized is the end-of-section marker, 
which is two "]"'s in a row followed by a >.
]]>
</programlisting>

Two markup tags have special significance in ProgramListings: co and lineannotation. A co identifies the location of a Callout. A lineannotation is a comment added by the documentor, not by the programmer.
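
The following fragment is a small sketch of a lineannotation inside a listing; the shell loop is invented for illustration. The co element is left out because it does not appear in the programlisting content model shown above.

```xml
<programlisting>for f in *.xml; do    <lineannotation>every XML file in this directory</lineannotation>
    echo "$f"
done</programlisting>
```

The annotation is part of the documentation, not of the program, so a reader copying the code should not copy it; a formatter may set it in a different typeface to make that clear.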

4.9.5. Parents

These elements contain programlisting: appendix, article, bibliodiv, bibliography, blockquote, caption, entry, example, figure, footnote, legalnotice, listitem, note, revdescription, section, sidebar, textobject.

4.9.6. Children

The following elements occur in programlisting: abbrev, acronym, citetitle, command, computeroutput, email, emphasis, filename, footnote, footnoteref, inlinemediaobject, lineannotation, link, literal, option, phrase, quote, replaceable, systemitem, trademark, ulink, userinput, xref.

4.9.7. Example

See the <example> Section.

4.10. <ulink>

A link that addresses its target by means of a URL (Uniform Resource Locator).

4.10.1. Content model

ulink ::=
(#PCDATA|footnoteref|xref|abbrev|acronym|citetitle|emphasis|
 footnote|phrase|quote|trademark|link|ulink|command|
 computeroutput|email|filename|literal|option|replaceable|
 systemitem|userinput|inlinemediaobject)*

4.10.2. Attributes

Common attributes and

  • type - Type is available for application-specific customization of the linking behavior.

  • url - URL specifies the Uniform Resource Locator that is the target of the ULink.

4.10.3. Description

The ulink element forms the equivalent of an HTML anchor (<A HREF="...">) for cross reference by a Uniform Resource Locator (URL).

4.10.4. Processing expectations

Formatted inline. When rendered online, it is natural to make the content of the ulink element an active link. When rendered in print media, the URL might be ignored, printed after the text of the link, or printed as a footnote. When the content of the ulink element is empty, that is, when it is written as either <ulink url="..."/> or <ulink url="..."></ulink>, the content of the url attribute should be rendered as the text of the link.

Linking elements must not be nested within other linking elements (including themselves). Because DocBook is harmonizing towards XML, this restriction cannot easily be enforced by the DTD. The processing of nested linking elements is undefined.
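
The two cases can be put side by side in a short sketch. The URL is the one listed in the Bibliography of this tutorial; the wording of the paragraph is only illustrative.

```xml
<para>
 <!-- Empty ulink: the URL itself should be rendered as the link text. -->
 The reference guide is available at
 <ulink url="http://www.docbook.org/tdg/index.html"/>.
 <!-- ulink with content: the content becomes the link text. -->
 You can also read it by following
 <ulink url="http://www.docbook.org/tdg/index.html">this link</ulink>.
</para>
```

In print output the second form is the one where a formatter has to decide whether to drop the URL, print it after the text, or move it to a footnote.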

4.10.5. Parents

These elements contain ulink: abbrev, acronym, attribution, authorinitials, bibliomisc, citetitle, command, computeroutput, corpauthor, date, edition, email, emphasis, entry, figure, firstname, holder, honorific, issuenum, jobtitle, lineage, lineannotation, link, literal, literallayout, orgname, othername, para, phrase, programlisting, pubdate, publishername, quote, releaseinfo, replaceable, revnumber, revremark, subtitle, surname, term, title, titleabbrev, trademark, ulink, userinput, volumenum, year.

4.10.6. Children

The following elements occur in ulink: abbrev, acronym, citetitle, command, computeroutput, email, emphasis, filename, footnote, footnoteref, inlinemediaobject, link, literal, option, phrase, quote, replaceable, systemitem, trademark, ulink, userinput, xref.

4.10.7. Example

<!DOCTYPE para PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<para>
For more information, see the O'Reilly catalog entry for
<ulink url="http://www.ora.com/catalog/tex/"><citetitle>Making TeX
Work</citetitle></ulink>.
</para>

4.11. <link>

A hypertext link.

4.11.1. Content model

link ::=
(#PCDATA|footnoteref|xref|abbrev|acronym|citetitle|emphasis|
 footnote|phrase|quote|trademark|link|ulink|command|
 computeroutput|email|filename|literal|option|replaceable|
 systemitem|userinput|inlinemediaobject)*

4.11.2. Attributes

  • endterm - Endterm points to the element whose content is to be used as the text of the link. If endterm is supplied on a link which has content, the value of endterm should be ignored.

  • linkend - Linkend points to the target of the link.

  • type - Type is available for application-specific customization of the linking behavior.

4.11.3. Description

link is a general purpose hypertext element. Usually, link surrounds the text that should be made "hot," (unlike XRef which must generate the text) but the endterm attribute can be used to copy text from another element.

4.11.4. Processing expectations

Formatted inline.

If the link element has content, then that content is processed for output as the "hot" text. If the link element has content and an endterm attribute, then the content is used and the endterm is ignored. If the link element has an endterm attribute and no content, then the content of the element pointed to by endterm should be repeated at the location of the link and used as the "hot" text.

Linking elements must not be nested within other linking elements (including themselves). Because DocBook is harmonizing towards XML, this restriction cannot easily be enforced by the DTD. The processing of nested linking elements is undefined.

4.11.5. Parents

These elements contain link: abbrev, acronym, attribution, authorinitials, bibliomisc, citetitle, command, computeroutput, corpauthor, date, edition, email, emphasis, entry, figure, firstname, holder, honorific, issuenum, jobtitle, lineage, lineannotation, link, literal, literallayout, orgname, othername, para, phrase, programlisting, pubdate, publishername, quote, releaseinfo, replaceable, revnumber, revremark, subtitle, surname, term, title, titleabbrev, trademark, ulink, userinput, volumenum, year.

4.11.6. Children

The following elements occur in link: abbrev, acronym, citetitle, command, computeroutput, email, emphasis, filename, footnote, footnoteref, inlinemediaobject, link, literal, option, phrase, quote, replaceable, systemitem, trademark, ulink, userinput, xref.

4.11.7. Example

<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<section>
 <title>Examples of <literal>link</literal></title>
 <para>
  In this sentence <link linkend='nextsect'>this</link> word is
  hot and points to the following section.
 </para>
 <para>
  There is also a link to the section called 
   <quote>
     <link linkend='nextsect' endterm="nextsect.title"/>
   </quote>
  in this sentence.
 </para>
 <section id='nextsect'>
  <title id='nextsect.title'>A Subsection</title>
  <para>
   This section only exists to be the target of a couple of links.
  </para>
 </section>
</section>

4.12. <mediaobject>

A displayed media object (video, audio, image, etc.).

4.12.1. Content model

mediaobject ::=
(objectinfo?,
 (videoobject|audioobject|imageobject),
 (videoobject|audioobject|imageobject|textobject)*,
 caption?)

4.12.2. Attributes

Common attributes.

4.12.3. Description

This element contains a set of alternative "media objects." Additional textual descriptions may be provided with TextObjects.

4.12.4. Processing expectations

Formatted as a displayed block. The primary purpose of the mediaobject is to provide a wrapper around a set of alternative presentations of the same information. If possible, the processing system should use the content of the first object within the mediaobject. If the first object cannot be used, the remaining objects should be considered in the order that they occur. A processor should use the first object that it can, although it is free to choose any of the remaining objects if the primary one cannot be used.

Under no circumstances should more than one object in a mediaobject be used or presented at the same time.

For example, a mediaobject might contain a video, a high resolution image, a low resolution image, a long text description, and a short text description. In a "high end" online system, the video is used. For print publishing, the high resolution image is used. For other online systems, either the high or low resolution image is used, possibly including the short text description as the online alternative. In a text-only environment, either the long or short text descriptions are used.

4.12.5. Parents

These elements contain mediaobject: appendix, article, articleinfo, bibliodiv, bibliography, blockquote, entry, example, figure, footnote, informaltable, listitem, note, objectinfo, revdescription, section, sectioninfo, sidebar, table.

4.12.6. Children

The following elements occur in mediaobject: audioobject, caption, imageobject, objectinfo, textobject, videoobject.

4.12.7. Example

<!DOCTYPE mediaobject PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<mediaobject>
 <videoobject>
  <videodata fileref='movie.avi'/>
 </videoobject>
 <audioobject>
  <objectinfo>
   <title>Accordionists</title>
  </objectinfo>
  <audiodata fileref="acordeon.wav"/>
 </audioobject>
 <imageobject>
  <imagedata fileref='movie-frame.gif'/>
 </imageobject>
 <imageobject>
  <imagedata fileref="eiffeltower.eps" format="EPS"/>
 </imageobject>
 <imageobject>
  <imagedata fileref="eiffeltower.png" format="PNG"/>
 </imageobject>
 <textobject>
  <para>This video was recorded on a trip to Paris.</para>
  <warning>
   <para>It was made with an amateur camera.</para>
  </warning>
 </textobject>
 <caption>
  <para>Designed by Gustave Eiffel in 1889, the Eiffel Tower is one of the most widely recognized
   buildings in the world.
  </para>
 </caption>
</mediaobject>

4.13. <example>

A formal example, with a title.

4.13.1. Content model

example ::=
((title,titleabbrev?),
 (itemizedlist|orderedlist|variablelist|literallayout|
  programlisting|para|blockquote|mediaobject|informaltable)+)

4.13.2. Attributes

  • label - Label specifies an identifying string for presentation purposes.

  • width - Width specifies the width (in characters) of the longest line in this example (formatters may use this value to determine scaling or rotation).

4.13.3. Description

example is a formal example with a title. Examples often contain ProgramListings or other large, block elements. Frequently they are given IDs and referenced from the text with xref or link.
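
A minimal sketch of such a reference is shown here. The id value ex.filter, the section title and the paragraph text are hypothetical names introduced only for this illustration.

```xml
<section>
 <title>Filtering node lists</title>
 <para>
  <!-- xref generates its own text from the target, here the example. -->
  The function in <xref linkend="ex.filter"/> keeps only the
  elements whose names appear in a given list.
 </para>
 <example id="ex.filter"><title>A DSSSL Function</title>
  <para>The body of the example goes here.</para>
 </example>
</section>
```

With xref the processing system generates the cross-reference text; a link element with the same linkend would instead use whatever text the author wraps in it.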

4.13.4. Processing expectations

Formatted as a displayed block. DocBook does not specify the location of the example within the final displayed flow of text; it may float or remain where it is located.

A list of examples may be generated at the beginning of a document.

4.13.5. Parents

These elements contain example: appendix, article, bibliodiv, bibliography, blockquote, listitem, note, revdescription, section, sidebar.

4.13.6. Children

The following elements occur in example: blockquote, informaltable, itemizedlist, literallayout, mediaobject, orderedlist, para, programlisting, title, titleabbrev, variablelist.

4.13.7. Example

<!DOCTYPE example PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<example><title>A DSSSL Function</title>
<programlisting>
(define (node-list-filter-by-gi nodelist gilist)
  ;; Returns the node-list that contains every element of the original
  ;; nodelist whose gi is in gilist
  (let loop ((result (empty-node-list)) (nl nodelist))
    (if (node-list-empty? nl)
        result
        (if (member (gi (node-list-first nl)) gilist)
            (loop (node-list result (node-list-first nl))
                  (node-list-rest nl))
            (loop result (node-list-rest nl))))))
</programlisting>
</example>

4.14. <figure>

A formal figure, generally an illustration, with a title.

4.14.1. Content model

figure ::=
((title,titleabbrev?),
 (literallayout|programlisting|blockquote|mediaobject|
  informaltable|link|ulink)+)

4.14.2. Attributes

  • float - If float has the value 1 (true), then the processing system is free to move the figure to a convenient location. (Where convenient location may be described in the style sheet or may be application dependent.) A value of 0 (false) indicates that the figure should be placed precisely where it occurs in the flow.

  • label - label specifies an identifying string for presentation purposes.

  • pgwide - If pgwide has the value 0 (false), then the Figure is rendered in the current text flow (with flow column width). A value of 1 (true) specifies that the figure should be rendered across the full text page.

4.14.3. Description

figure is a formal illustration with a title. Figures often contain graphics or other large display elements. Frequently they are given IDs and referenced from the text with xref or link.

4.14.4. Processing expectations

Formatted as a displayed block.

Figures may contain multiple display elements. DocBook does not specify how these elements are to be presented with respect to one another.

DocBook does not specify the location of the figure within the final displayed flow of text; it may float or remain where it is located.

A list of figures may be generated at the beginning of a document.

4.14.5. Parents

These elements contain figure: appendix, article, bibliodiv, bibliography, blockquote, listitem, note, revdescription, section, sidebar.

4.14.6. Children

The following elements occur in figure: blockquote, informaltable, link, literallayout, mediaobject, programlisting, title, titleabbrev, ulink.

4.14.7. Example

<!DOCTYPE figure PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<figure><title>A geometrical figure</title>
<graphic fileref="figures/geom.png"/>
</figure>
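
The example above relies on the graphic element of the full DocBook DTD, which does not appear in the content model listed in 4.14.1. A sketch of the same figure written with mediaobject, which that content model does list, might look as follows. The file name is reused from the example above; the float value and the text alternative are assumptions made for illustration.

```xml
<!DOCTYPE figure PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<!-- float="0" asks that the figure stay where it occurs in the flow. -->
<figure float="0"><title>A geometrical figure</title>
 <mediaobject>
  <imageobject>
   <imagedata fileref="figures/geom.png" format="PNG"/>
  </imageobject>
  <!-- Fallback for text-only output. -->
  <textobject>
   <phrase>A geometrical figure</phrase>
  </textobject>
 </mediaobject>
</figure>
```

Wrapping the image in a mediaobject also leaves room for alternative renderings, such as an EPS version for print, without changing the figure itself.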

4.15. <table>

A formal table in a document.

4.15.1. Content model

table ::=
((title,
  (mediaobject+|tgroup+)))

4.15.2. Attributes

  • colsep - If colsep has the value 1 (true), then a rule will be drawn to the right of all columns in this table. A value of 0 (false) suppresses the rule. The rule to the right of the last column in the table is controlled by the frame attribute, not the colsep.

  • frame - Specifies how the table is to be framed. Below are the possible values:

    • all - Frame all four sides of the table. In some environments with limited control over table border formatting, such as HTML, this may imply additional borders.

    • bottom - Frame only the bottom of the table.

    • none - Place no border on the table. In some environments with limited control over table border formatting, such as HTML, this may disable other borders as well.

    • sides - Frame the left and right sides of the table.

    • top - Frame the top of the table.

    • topbot - Frame the top and bottom of the table.

  • label - Label specifies an identifying string for presentation purposes. Generally, an explicit label attribute is used only if the processing system is incapable of generating the label automatically. If present, the label is normative; it will be used even if the processing system is capable of automatic labelling.

  • orient - Specifies the orientation of the table. An orientation of port (portrait) is "upright", the same orientation as the rest of the text flow. An orientation of land (landscape) is 90 degrees counterclockwise from the upright orientation.

  • pgwide - If pgwide has the value 0 (false), then the table is rendered in the current text flow (with flow column width). A value of 1 (true) specifies that the table should be rendered across the full text page.

  • rowsep - If rowsep has the value 1 (true), then a rule will be drawn below all the rows in the table (unless other, interior elements, suppress some or all of the rules). A value of 0 (false) suppresses the rule. The rule below the last row in the table is controlled by the frame attribute and the rowsep of the last row is ignored.

  • shortentry - If shortentry has the value 1 (true), then the table's titleabbrev will be used in generated lists such as the list of tables and the index. A value of 0 (false) indicates that the full title should be used in those places.

  • tabstyle - Holds the name of a table style defined in a stylesheet (e.g., a FOSI) that will be used to process this document.

  • tocentry - If tocentry has the value 1 (true), then the Table will appear in a generated list of tables. The default value of 0 (false) indicates that it will not.

4.15.3. Description

The table element identifies a formal table. DocBook uses the CALS table model, which describes tables geometrically using rows, columns, and cells.

Tables may include column headers and footers, but there is no provision for row headers.

4.15.4. Processing expectations

Formatted as a displayed block. This element is expected to obey the semantics of the CALS Table Model Document Type Definition, as specified by OASIS Technical Memorandum TM 9502:1995.

4.15.5. Parents

These elements contain table: appendix, article, bibliodiv, bibliography, blockquote, listitem, note, revdescription, section, sidebar.

4.15.6. Children

The following elements occur in table: mediaobject, tgroup, title.

4.15.7. Example

<!DOCTYPE table PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<table frame='all'><title>Sample Table</title>
<tgroup cols='5' align='left' colsep='1' rowsep='1'>
<colspec colname='c1'/>
<colspec colname='c2'/>
<colspec colname='c3'/>
<colspec colnum='5' colname='c5'/>
<thead>
<row>
  <entry namest="c1" nameend="c2" align="center">Horizontal Span</entry>
  <entry>a3</entry>
  <entry>a4</entry>
  <entry>a5</entry>
</row>
</thead>
<tfoot>
<row>
  <entry>f1</entry>
  <entry>f2</entry>
  <entry>f3</entry>
  <entry>f4</entry>
  <entry>f5</entry>
</row>
</tfoot>
<tbody>
<row>
  <entry>b1</entry>
  <entry>b2</entry>
  <entry>b3</entry>
  <entry>b4</entry>
  <entry morerows='1' valign='middle'><para>  <!-- Pernicious Mixed Content -->
  Vertical Span</para></entry>
</row>
<row>
  <entry>c1</entry>
  <entry namest="c2" nameend="c3" align='center' morerows='1' valign='bottom'>Span Both</entry>
  <entry>c4</entry>
</row>
<row>
  <entry>d1</entry>
  <entry>d4</entry>
  <entry>d5</entry>
</row>
</tbody>
</tgroup>
</table>

4.16. <bibliography>

A bibliography.

4.16.1. Content model

bibliography ::=
((title,subtitle?,titleabbrev?)?,
 (itemizedlist|orderedlist|variablelist|note|literallayout|
  programlisting|para|blockquote|mediaobject|informaltable|
  example|figure|table|sidebar|abstract|authorblurb|epigraph)*,
 (bibliodiv+|bibliomixed+))

4.16.2. Attributes

  • status - Status identifies the editorial or publication status of the Bibliography. Publication status might be used to control formatting (for example, printing a "draft" watermark on drafts) or processing (perhaps a document with a status of "final" should not include any components that are not final).

4.16.3. Description

A DocBook bibliography may contain some prefatory matter, but its main content is a set of bibliography entries (either biblioentry or bibliomixed).

4.16.4. Processing expectations

Formatted as a displayed block. A bibliography in a book frequently causes a forced page break in print media.

Some systems may display only those entries within a Bibliography that are cited in the containing document. This may be an interchange issue.

The two styles of bibliography entry have quite different processing expectations. BiblioEntrys are "raw;" they contain a database-like collection of named fields. BiblioMixed entries are "cooked;" the fields occur in the order in which they will be presented and additional punctuation may be sprinkled between the fields.
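
To make the contrast concrete, here is a sketch of a "cooked" entry. It reuses the book details from the biblioentry example in 4.16.7; the punctuation and the order of the fields are choices made for this illustration, not a prescribed style.

```xml
<!DOCTYPE bibliography PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<bibliography>
 <title>A Test Bibliography</title>
 <!-- Fields appear in presentation order, with punctuation typed in. -->
 <bibliomixed>
  <abbrev>AhoSethiUllman96</abbrev>
  <author><firstname>Alfred V.</firstname> <surname>Aho</surname></author>,
  <author><firstname>Ravi</firstname> <surname>Sethi</surname></author>, and
  <author><firstname>Jeffrey D.</firstname> <surname>Ullman</surname></author>.
  <title>Compilers, Principles, Techniques, and Tools</title>.
  <publishername>Addison-Wesley Publishing Company</publishername>.
  ISBN <isbn>0-201-10088-6</isbn>.
 </bibliomixed>
</bibliography>
```

A stylesheet is expected to output such an entry more or less as written, whereas for a biblioentry it has to choose the field order and punctuation itself.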

4.16.5. Parents

These elements contain bibliography: article.

4.16.6. Children

The following elements occur in bibliography: abstract, authorblurb, bibliodiv, bibliomixed, blockquote, epigraph, example, figure, informaltable, itemizedlist, literallayout, mediaobject, note, orderedlist, para, programlisting, sidebar, subtitle, table, title, titleabbrev, variablelist.

4.16.7. Example

<!DOCTYPE bibliography PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<bibliography>
 <title>A Test Bibliography</title>
 <bibliodiv>
  <title>Books</title>
  <biblioentry>
   <abbrev>AhoSethiUllman96</abbrev>
   <authorgroup>
    <author><firstname>Alfred V.</firstname><surname>Aho</surname></author>
    <author><firstname>Ravi</firstname><surname>Sethi</surname></author>
    <author><firstname>Jeffrey D.</firstname><surname>Ullman</surname></author>
   </authorgroup>
   <copyright><year>1996</year>
   <holder>Bell Telephone Laboratories, Inc.</holder></copyright>
   <editor><firstname>James T.</firstname><surname>DeWolf</surname></editor>
   <isbn>0-201-10088-6</isbn>
   <publisher>
    <publishername>Addison-Wesley Publishing Company</publishername>
   </publisher>
   <title>Compilers, Principles, Techniques, and Tools</title>
  </biblioentry>

  <biblioentry xreflabel="Kites75">
   <authorgroup>
    <author><firstname>Andrea</firstname><surname>Bahadur</surname></author>
    <author><firstname>Mark</firstname><surname>Shwarek</surname></author>
   </authorgroup>
   <copyright><year>1974</year><year>1975</year>
    <holder>Product Development International Holding N. V.</holder>
   </copyright>
   <isbn>0-88459-021-6</isbn>
   <publisher>
    <publishername>Plenary Publications International, Inc.</publishername>
   </publisher>
   <title>Kites</title>
   <subtitle>Ancient Craft to Modern Sport</subtitle>
   <pagenums>988-999</pagenums>
  </biblioentry>
 </bibliodiv>
</bibliography>

5. Publishing DocBook documents

This section of the tutorial explains how to convert (export) DocBook files into other file types. DocBook documents are not designed to be viewed directly. FreeBSD and most Linux distributions come with conversion tools (collectively called a tool chain) for converting DocBook files to presentation formats such as PostScript, HTML, PDF, DVI, roff (the native man page format), HTML Help, JavaHelp and text.

DocBook files are validated, parsed and translated by a combination of applications collectively called a DocBook tool chain. The core function of a tool chain is to read the DocBook markup and transform it to a presentation format (for example HTML, PDF, HTML Help) using a set of rules and stylesheets.

A wide range of user output format requirements coupled with a choice of available tools and stylesheets results in many valid tool chain combinations.

Assembling a working DocBook tool chain requires a fairly detailed understanding of how the tools work together. Popular Linux distributions come with both toolchains and toolchain wrapper scripts.

Wrapper scripts tie together the various toolchain commands and simplify the underlying complexity. Red Hat/Fedora distributions have shipped fully configured toolchains along with both the xmlto(1) and the jw(1) toolchain wrappers (which process DocBook XML and DocBook SGML documents using XSL and DSSSL stylesheets respectively). xmlto(1) and related tools are standard Cygwin packages and are recommended for Microsoft Windows users.

5.1. Toolchain Components

Here are the commands and packages I use to generate the HTML, PDF and HTML Help documentation files:

  • DocBook XSL Stylesheets. This package contains a set of XSL stylesheets for converting DocBook XML documents to HTML, XSL-FO and HTML Help source.

  • xsltproc. xsltproc is a command-line XSLT processor for applying XSLT stylesheets (in our case the DocBook XSL Stylesheets) to XML documents. It is part of libxslt, the XSLT C library for GNOME (see http://www.xmlsoft.org).

  • FOP. The Apache Formatting Objects Processor converts XSL-FO (*.fo) files to PDF files (see the FOP section).

  • Microsoft Help Compiler. The Microsoft HTML Help Compiler (hhc.exe) is a command-line tool that converts HTML Help source files to a single HTML Help (*.chm) file. It runs on MS Windows platforms and can be downloaded from http://www.microsoft.com.

5.2. Exporting DocBook

You will have noticed that the distributed PDF, HTML and HTML Help documentation files (for example ./doc/asciidoc.html) are not the plain outputs produced using the default DocBook XSL Stylesheets configuration. This is because they have been processed using customized DocBook XSL Stylesheet drivers.

You'll find these DocBook XSL drivers in the distribution ./doc directory. The examples which follow are executed from the distribution ./doc directory:

  • common.xsl - Shared driver parameters. This file is not used directly but is included in all the following drivers.

  • chunked.xsl - Generate chunked XHTML (separate HTML pages for each document section) in the ./doc/chunked directory. For example:

    $ xsltproc --nonet chunked.xsl mydocbook.xml
    
  • fo.xsl - Generate XSL Formatting Object (*.fo) files for subsequent PDF file generation using FOP. For example:

    $ xsltproc --nonet fo.xsl mydocbook.xml > mydocbook.fo
    $ fop.sh mydocbook.fo mydocbook.pdf
    
  • htmlhelp.xsl - Generate Microsoft HTML Help source files for the MS HTML Help Compiler in the ./doc/htmlhelp directory. See the article at http://www.codeproject.com/winhelp/docbook_howto.asp. This example is run on MS Windows from a Cygwin shell prompt:

    $ xsltproc --nonet htmlhelp.xsl mydocbook.xml
    $ c:/Program\ Files/HTML\ Help\ Workshop/hhc.exe htmlhelp.hhp
    $ mv htmlhelp.chm mydocbook.chm
    
  • xhtml.xsl - Convert a DocBook XML file to a single XHTML file. For example:

    $ xsltproc --nonet xhtml.xsl mydocbook.xml > mydocbook.html
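
The drivers themselves are not reproduced in this tutorial. As a rough sketch of what such a customization driver can look like, the hypothetical file below imports the stock XHTML stylesheet and overrides two parameters. It is not the actual xhtml.xsl from the distribution. The import URI is assumed to resolve through a local XML catalog (the --nonet option forbids fetching it from the network), and mystyle.css is a placeholder name.

```xml
<?xml version="1.0"?>
<!-- Hypothetical driver: stock DocBook XSL plus a few local settings. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Stock single-file XHTML stylesheet; assumed to be mapped to a
       local copy by the XML catalog. -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml/docbook.xsl"/>

  <!-- Number sections automatically. -->
  <xsl:param name="section.autolabel" select="1"/>

  <!-- Link the generated pages to a CSS file (placeholder name). -->
  <xsl:param name="html.stylesheet" select="'mystyle.css'"/>

</xsl:stylesheet>
```

Keeping local settings in a small driver like this, rather than editing the stock stylesheets, means the stylesheets can be upgraded without redoing the customization.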
    

5.3. About Apache Formatting Object Processor (FOP)

XSL Stylesheets can be used to generate FO (Formatting Object) files, which in turn can be used to produce PDF files using the Apache Formatting Object Processor program (FOP). The FOP home page is at http://xml.apache.org/fop/.

5.3.1. Five reasons to use FOP

  1. You can produce PDF on both Windows and POSIX platforms.

  2. The PDF quality is on a par with that produced by jw(1).

  3. PDF files are about half the size of those produced by jw(1) and friends.

  4. Handles images and the table of contents, and inserts PDF bookmarks and active hypertext links.

  5. Uses DocBook XML (no need to produce DocBook SGML).

5.3.2. Installing FOP on Windows

  1. Download the latest FOP distribution from http://xml.apache.org/fop/.

  2. Unzip to C:\bin.

  3. Edit the distribution fop.bat file and put it in the search PATH:

    set LOCAL_FOP_HOME=C:\bin\fop-0.20.5
    
  4. Download the JIMI image processing library from http://java.sun.com/products/jimi/.

  5. Extract the JimiProClasses.jar library from the JIMI distribution and copy to the FOP ./lib directory.

  6. Edit the distribution fop.bat file again and add the JIMI library to LOCALCLASSPATH:

    set LOCALCLASSPATH=%LOCALCLASSPATH%;%LIBDIR%\JimiProClasses.jar
    
  7. You should now be able to run FOP from a DOS prompt - execute it without arguments to get a list of command options:

    > fop.bat
    

5.3.3. Installing FOP on Linux

  1. Download the latest FOP distribution from http://xml.apache.org/fop/.

  2. Install the FOP distribution:

    $ su
    # cd /usr/local/lib
    # unzip ~srackham/tmp/fop-0.20.5-bin.zip
    # cp /usr/local/lib/fop-0.20.5/fop.sh /usr/local/bin
    # chmod +x /usr/local/bin/fop.sh
    
  3. Edit the FOP start script fop.sh, adding this line to the start of the script:

    FOP_HOME=/usr/local/lib/fop-0.20.5
    
  4. Download the JIMI image processing library from http://java.sun.com/products/jimi/.

  5. Extract the JimiProClasses.jar library from the JIMI distribution and copy to the FOP ./lib directory.

    # cp ~srackham/tmp/JimiProClasses.jar /usr/local/lib/fop-0.20.5/lib/
    
  6. You should now be able to run FOP - execute it without arguments to get a list of command options:

    $ fop.sh
    

Example 3. A complete example of using FOP (Windows)

Soon.

5.4. Using XML Mind Editor

XMLMind Editor (the standard edition is free) runs on Linux, Windows and Mac OS X. It requires Java, but you need Java anyway or you won't be able to build the docs from the sources.

Features:

  • Tree view (all elements collapsible) and Styled view (chapters and sections collapsible). The latter is what I usually work in: it shows the document in a basic but clear word-processor-like layout, defined in a stylesheet that comes with the program. Both views can be active simultaneously.

  • DocBook mode won't let you enter anything non-DocBook.

  • Element chooser.

  • Attribute editor.

  • Edit and Search functions.

  • Spell checker.

  • Special character picker.

  • Speedbuttons to create frequently used elements like sections, lists, tables, etc.

What I miss is a plain text XML source view.

Hint: After exporting your DocBook document to HTML with XMLMind, you can convert the result to XHTML using, for example, HTML Tidy or the HTML Kit editor. You can also apply your own stylesheet file.

6. That's not all...

This has been only a brief overview of DocBook; it is by no means an exhaustive look at all of the elements or the potential that DocBook has. Hopefully, however, this tutorial will suffice to get you started. After following along you should be able to create basic DocBook documents and use the tool chain described in Section 5 to produce usable output from them. As an exercise, try to write your own DocBook document and export it to XHTML and PDF.

For more information on DocBook, you can consult the online documentation at DocBook.org (see the Bibliography). If you would like to tinker a bit more with DocBook, a good place to start might be the Linux Documentation Project. Most of the documents in the LDP have a DocBook version available online that you can examine for more detailed usage of DocBook.

Bibliography

[thesite] Visit the DocBook site at http://www.docbook.org/tdg/index.html

[ldp] The Linux Documentation Project (www.tldp.org) contains many documents written in DocBook.

[oasis] OASIS DocBook pages (http://www.oasis-open.org/specs/index.php#dbv4.1), where you'll find the DocBook standard. OASIS is the Organization for the Advancement of Structured Information Standards, a non-profit, international consortium that creates interoperable industry specifications based on public standards such as XML and SGML.

[xmlmind] XML Mind Editor, standard edition, http://www.xmlmind.com/

TeX (http://www.tug.org) is an important tool whose purpose overlaps DocBook's. The focus of TeX is closer to typography, but TeX also has many elements of semantic markup, especially for mathematics.



[1] Recommendation: Use XML Mind Editor from XMLMind to easily edit and convert DocBook documents.