EXSLFO Project: Requirements

Last revised: 4 Dec 2002 $Revision: 1.5 $



FO Extension Requirements

This page captures the specific requirements for extensions that have been identified, either through community discussion or by the existence of extensions in existing FO implementations.


General Requirements

This section captures requirements that are not specific to any particular delivery media or tool.

Auxiliary data output ("side files")

There are a number of use cases that require the ability to correlate original XML input elements with artifacts of the paginated document in which they are rendered, including the numbers of the pages on which they occur and the values of markers on those pages. These use cases include:

  • Creating cross-document cross references with page numbers
  • Creating multi-document master indexes with page numbers
  • Using page position in conditions in XSLT style sheets (e.g., "if this element is on the same page as that element, then do X, otherwise do Y")
  • Capturing page-aware data in databases
  • Using pagination information to do billing or costing

The most general statement of this requirement is:

  1. It shall be possible, as part of the FO pagination process, to create auxiliary output data streams in XML syntax containing arbitrary markup that includes at least the following composition-specific information:
    • Current page number
    • Value of any marker as would be generated by fo:retrieve-marker used within static content on the same page
    • Computed property values
    • Page numbers as returned by fo:page-number-citation
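
As an illustration only, the following Python sketch shows what writing such a side file might look like once the formatter knows the page and marker values for each source element. The element and attribute names (side-file, item, marker, source-id) and the sample records are hypothetical; nothing here is defined by the FO specification or by any existing implementation.

```python
import xml.etree.ElementTree as ET


def build_side_file(records):
    """Build an auxiliary XML tree from (source_id, page, markers) records.

    All element and attribute names are hypothetical, chosen for illustration.
    """
    root = ET.Element("side-file")
    for source_id, page, markers in records:
        item = ET.SubElement(
            root, "item", {"source-id": source_id, "page": str(page)}
        )
        # One child per marker class that is in effect on the element's page.
        for marker_class, value in sorted(markers.items()):
            marker = ET.SubElement(item, "marker", {"class": marker_class})
            marker.text = value
    return root


# Hypothetical data a formatter could report during pagination.
records = [
    ("sec-intro", 1, {"chapter-title": "Introduction"}),
    ("fig-overview", 3, {"chapter-title": "Introduction"}),
]
side_file_xml = ET.tostring(build_side_file(records), encoding="unicode")
print(side_file_xml)
```

A later XSLT pass, or any other XML-aware tool, could then read this stream to build cross-document references or master indexes.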

Page-Aware Conditional Processing

There are a number of use cases in which the same element produces different layout results depending on that element's positional relationship to other elements in the FO instance. These use cases include:

  1. Repeating the same footnote content, with the same or different callout, on each page on which that footnote is referenced, but so that a given footnote content is presented at most once on any given page. That is, given that a particular footnote content occurs multiple times in an FO instance, the footnote body should only be processed if it has not yet been processed on the current page. This can also be thought of as making a choice between processing an fo:footnote or simply generating a callout based on whether or not a related footnote has already been processed on the current page.
  2. Generating or not generating a page number citation in a cross reference if the target of the cross reference is or is not on the current page.
  3. Generating text such as "on this page", "above", and "below" based on the relative position of a target element.
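
The decisions described in these use cases can be sketched in Python as follows. This is a minimal illustration that assumes the formatter can supply resolved page positions; the function names and inputs are hypothetical and do not correspond to any existing FO extension.

```python
def cross_reference_text(ref_page, target_page, target_precedes_ref):
    """Choose cross-reference wording from resolved page positions.

    ref_page and target_page are the pages on which the reference and its
    target were placed; target_precedes_ref says whether the target comes
    first in the flow. All three are hypothetical formatter-supplied inputs.
    """
    if ref_page == target_page:
        # Same page: no page number citation (use case 2),
        # relative wording instead (use case 3).
        return "above" if target_precedes_ref else "below"
    return f"on page {target_page}"


def plan_footnotes(references):
    """Decide, per footnote reference, whether the footnote body is rendered.

    references is a list of (page, footnote_id) pairs in flow order. The body
    is rendered only the first time a footnote is referenced on a given page;
    later references on that page get a callout only (use case 1).
    """
    seen = set()
    plan = []
    for page, footnote_id in references:
        key = (page, footnote_id)
        plan.append((page, footnote_id, key not in seen))
        seen.add(key)
    return plan


print(cross_reference_text(5, 5, True))
print(plan_footnotes([(1, "fn-a"), (1, "fn-a"), (2, "fn-a")]))
```

The difficulty in practice is that these decisions can themselves change where page breaks fall, so a real implementation would need to resolve them during pagination rather than afterwards.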

Partial sum or carried forward

Some financial documents, such as invoices or bank statements, use a special header or footer when they span more than one page. This header or footer carries a "carried forward" summary containing partial information accumulated up to the page break, such as a partial sum or a partial count of items. In general this requires some correlation between the source XML tree and the formatter. However, if the stylesheet author prepares the partial information for every possible break as markers, the requirement reduces to the ability to retrieve markers from within a table footer, header, or caption.
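
Preparing the partial information for every possible break amounts to computing a running total after each row. The Python sketch below illustrates that arithmetic only; the function name and the sample amounts are hypothetical, and in practice the computation would normally be done in the stylesheet that emits the markers.

```python
from decimal import Decimal
from itertools import accumulate


def carried_forward_values(amounts):
    """Return the running total after each row.

    Each value is the "carried forward" sum to show if the page breaks
    immediately after the corresponding row.
    """
    return list(accumulate(amounts))


# Hypothetical invoice line amounts.
amounts = [Decimal("10.00"), Decimal("25.50"), Decimal("4.50")]
carried = carried_forward_values(amounts)
for row_number, total in enumerate(carried, start=1):
    print(f"break after row {row_number}: carried forward {total}")
```

Each value would be attached to its row as a marker, and the formatter would retrieve the one belonging to the last row placed on the page.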


PDF Requirements

These requirements are specific to the functions provided by Adobe PDF. PDF is a proprietary format for final-form delivery of documents. However, PDF is in such wide use that it is essentially a standard. Adobe publishes the PDF specification, and there are a number of open-source solutions for both viewing PDF documents and accessing PDF documents programmatically. PDF is the primary rendition target for most FO implementations.

PDF provides a number of useful features for online document delivery, including hyperlinks, bookmarks, interactive forms, and so on. There is a general requirement to be able to take advantage of these features of PDF using FO-based composition systems. Some of these features have a direct mapping from generic FO constructs (e.g., basic-link maps to PDF links) but many do not. Most of the PDF-producing FO implementations, certainly all the commercial ones, have recognized this and already provide proprietary extensions for many of these requirements.

Bookmark generation

PDF provides facilities for creating "bookmarks", which are navigation aids that are typically used in the same way that traditional tables of contents are used. Most of the existing FO implementations provide some facility for creating PDF bookmarks from documents. Thus there is a clear requirement for a general mechanism for creating PDF bookmarks. Detailed requirements are:

  1. It shall be possible to create a bookmark that links to any rendered FO component.
  2. It shall be possible to create bookmarks that are ordered and structured independently of the order and structure of the FO components to which they link (for example, it must be possible to generate a list of figure bookmarks at the beginning or end of a list of bookmarks that reflects the hierarchical divisions in the document).
  3. It should be possible to control the zoom and positioning of the bookmark target. In PDF a link links to a rectangular region of a page, not to a semantic structure within the page, so it is possible to specify, within the PDF linking structures, exactly how the target of the link is displayed when the link is traversed.
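
One way to picture these requirements is as a bookmark tree kept separately from the document flow. The Python sketch below is an illustration under that assumption; the Bookmark class, its fields, and the sample identifiers are hypothetical and do not describe any existing implementation's extension.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bookmark:
    title: str
    target_id: str                # id of a rendered FO component (hypothetical)
    zoom: Optional[float] = None  # None: leave the viewer's zoom unchanged
    children: List["Bookmark"] = field(default_factory=list)


def outline(bookmarks, depth=0):
    """Flatten a bookmark tree into indented 'title -> target' lines."""
    lines = []
    for bookmark in bookmarks:
        lines.append("  " * depth + f"{bookmark.title} -> {bookmark.target_id}")
        lines.extend(outline(bookmark.children, depth + 1))
    return lines


# The figure list is its own group after the chapters, even though the
# figure itself would appear inside Chapter 1 in the document flow.
bookmarks = [
    Bookmark("Chapter 1", "ch1", children=[
        Bookmark("Section 1.1", "sec1-1", zoom=1.5),
    ]),
    Bookmark("List of Figures", "lof", children=[
        Bookmark("Figure 1", "fig1"),
    ]),
]
outline_lines = outline(bookmarks)
print("\n".join(outline_lines))
```

Keeping the tree separate from the flow is what allows the ordering in requirement 2, and the per-bookmark zoom field is one possible place to carry the display control in requirement 3.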

Annotation generation

PDF provides a generic "annotation" mechanism, by which different types of annotations can be applied to documents, including highlighting of text, notes overlaid on a page, and arbitrary pen-line marks.

It would be useful if annotations could be created as part of the PDF generation process. For example, given an online review system in which annotations were created in some generic form, documents could be rendered to PDF with the annotations reflected as PDF annotations rather than simply as base components of the rendered pages.

While it is possible to add annotations to PDF documents, either using the Adobe Acrobat product or programmatically, it would be difficult to do this for annotations associated only with the original XML document without some mapping between the original input XML elements and the pages those elements were eventually rendered to (see general requirements).

Logical Page Numbers

Originally submitted by David Cramer via the exslfo-discussion list. From David's post:

An important feature of pdfs that I would like to see in renderers is logical page numbering. Currently, if your frontmatter is paginated with lowercase romans and the first chapter starts at arabic page 1, your pdf pagination and the page numbers that appear on the page will be off by the number of frontmatter pages. So if one user of the document tells another user to go to page 52, he would have to specify which page 52 he means, the pdf 52 or the page on which the number 52 appears in the header/footer.

From the Acrobat User Guide:

Use Logical Page Numbers allows you to set page numbering in a PDF document using the Document > Number Pages command. You typically do this when you want PDF page numbering to match the numbering printed on the pages. A page's number, followed by the page position in parentheses, appears in the status bar and in the Go To Page, Delete Pages, and Print dialog boxes. For example, if the first page in a document is numbered "i", it might appear as "i(1 of 10)". If this option is not selected, Acrobat ignores page numbering information in documents and numbers pages using arabic numbers starting at 1.
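
The mismatch David describes can be made concrete with a short Python sketch that derives the logical label for each physical page, assuming lowercase roman front matter followed by a body that restarts at arabic 1. The function names and page counts are hypothetical; the label format imitates the "i(1 of 10)" example quoted above.

```python
def to_roman(number):
    """Convert a positive integer to a lowercase roman numeral."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    result = []
    for value, symbol in numerals:
        count, number = divmod(number, value)
        result.append(symbol * count)
    return "".join(result)


def page_labels(front_matter_pages, body_pages):
    """Return 'label(position of total)' for every physical page."""
    total = front_matter_pages + body_pages
    labels = []
    for position in range(1, total + 1):
        if position <= front_matter_pages:
            label = to_roman(position)
        else:
            # Body numbering restarts at 1 after the front matter.
            label = str(position - front_matter_pages)
        labels.append(f"{label}({position} of {total})")
    return labels


# Hypothetical document: 4 front matter pages, 6 body pages.
labels = page_labels(4, 6)
print(labels)
```

A renderer that already knows each page sequence's number format and initial page number has the information needed to emit such labels into the PDF.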

Layout Requirements

This section captures requirements for layout functionality not currently provided by the FO specification. Many of these requirements are already booked by the XSL Working Group and will likely be addressed in future versions of the XSL FO specification. At such time as the XSL Working Group takes up any of the requirements in this section, their discussion here shall be retired. Any implementations of these requirements should be considered as being in the service of gaining implementation experience in advance of formal standardization.

Revision marks

Many technical documentation business practices, especially in military and manufacturing, use revision marks to identify sub-block portions of text that have been changed from an earlier version of the same information. These revision marks are usually implemented as vertical rules rendered in the margin vertically-aligned with the text sequences they apply to:

[Figure: mockup of revision marks]

"Table continued" header/footer

When tables span multiple pages, there is often a requirement either to create a separate "table continued" message above or below the table or to add something like "Continued" to the table caption. This requirement can be met in a weak way using markers in the static content, but the result is not completely satisfactory, especially for messages placed at the bottom of the table (e.g., "Table continued on next page"): because the vertical extent of a continued table cannot be absolutely controlled, there is no way to reliably place the message immediately below the table. Thus, meeting this requirement calls for an extension to FO that provides a way either to retrieve markers within the table caption, table header, or table footer, or to define an additional component of the table caption to hold text for use on pages after the first. This requirement could be partially satisfied by using repeating before floats, as provided by the FO implementation in Epic Editor 4.3.

Index Generation Requirements

XSLT provides all the grouping and sorting functions needed to generate back-of-the-book indexes from index entry markup in an XML document (see the DocBook XSL Stylesheets project for an example of how to generate indexes with XSLT). However, such a generated index will necessarily contain index entries (in the composed index) with multiple references to the same page whenever several index markers for the same entry fall on one page. This is for the simple reason that there is no way to know, at XSLT processing time, what page a given index marker in the document flow will resolve to.

Thus, to be able to create proper indexes, there must be a way to eliminate duplicate page numbers from the list of page numbers associated with a given index entry.

This requirement can be met using a clever post-processing mechanism developed by Ken Holman (Ken--need a pointer to your write up on this). However, this process requires human intervention and so is not generally useful in lights-out production environments.
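
The de-duplication itself is simple once page numbers are known, as the Python sketch below illustrates. It is not Ken Holman's mechanism, which is not described here; the function names and sample data are hypothetical, and the sketch assumes the resolved page numbers for an entry are already available in document order.

```python
def unique_pages(pages):
    """Drop repeated page numbers while keeping first-seen order."""
    return list(dict.fromkeys(pages))


def format_entry(term, pages):
    """Format one index entry with each page number listed once."""
    return f"{term}, " + ", ".join(str(page) for page in unique_pages(pages))


# Hypothetical resolved pages for six index markers of one entry.
entry = format_entry("pagination", [12, 12, 15, 15, 15, 40])
print(entry)
```

The hard part is not this step but obtaining the resolved page numbers at the point where the index is composed, which is what the requirement above asks of the formatter.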