Last revised: 4 Dec 2002 $Revision: 1.5 $
This page captures the specific requirements for extensions that have been identified, either through community discussion or from extensions already present in existing FO implementations.
This section captures requirements that are not specific to any particular delivery medium or tool.
A number of use cases require the ability to correlate original XML input elements with artifacts of the paginated document they are rendered onto, including the numbers of the pages on which they occur and the values of markers on those pages. These use cases include:
The most general statement of this requirement is:
There are a number of use cases in which the same element produces different layout results depending on that element's positional relationship to other elements in the FO instance. These use cases include:
Some financial documents, such as invoices or bank statements, use a special header or footer when they span more than one page. This header or footer carries a "carried forward" summary containing partial information up to the page break, such as a partial sum or a partial count of items. In general this requires some kind of correlation between the source XML tree and the formatter. However, it can be reduced to a requirement to retrieve markers from within the table footer, header, or caption, provided the stylesheet author prepares, as markers, the partial information to be formatted for every possible break.
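The "partial information for all possible breaks" amounts to a running subtotal computed for every row, since a page break could fall after any of them. The following is a minimal sketch in Python of that computation only. The function name `carried_forward` and the sample amounts are illustrative assumptions, and in practice the stylesheet would compute these values and emit each one as marker content.

```python
from decimal import Decimal


def carried_forward(amounts):
    """Return, for each row, the subtotal of all rows up to and including it.

    Hypothetical helper: each returned value is what a marker attached to
    that row would hold, so that whichever row ends a page supplies the
    "carried forward" figure for that page.
    """
    running = Decimal("0")
    subtotals = []
    for amount in amounts:
        running += Decimal(amount)
        subtotals.append(running)
    return subtotals


# Illustrative invoice rows (amounts as strings to avoid float rounding).
rows = ["19.99", "5.01", "100.00"]
for amount, subtotal in zip(rows, carried_forward(rows)):
    print(f"{amount:>8}  carried forward: {subtotal}")
```

Using `Decimal` rather than floats keeps the subtotals exact, which matters for financial figures.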
These requirements are specific to the functions provided by Adobe PDF. PDF is a proprietary format for final-form delivery of documents. However, PDF is in such wide use that it is essentially a standard. Adobe publishes the PDF specification, and there are a number of open-source solutions for both viewing PDF documents and accessing PDF documents programmatically. PDF is the primary rendition target for most FO implementations.
PDF provides a number of useful features for online document delivery, including hyperlinks, bookmarks, interactive forms, and so on. There is a general requirement to be able to take advantage of these features of PDF using FO-based composition systems. Some of these features have a direct mapping from generic FO constructs (e.g., basic-link maps to PDF links) but many do not. Most of the PDF-producing FO implementations, certainly all the commercial ones, have recognized this and already provide proprietary extensions for many of these requirements.
PDF provides facilities for creating "bookmarks", which are navigation aids that are typically used in the same way that traditional tables of contents are used. Most of the existing FO implementations provide some facility for creating PDF bookmarks from documents. Thus there is a clear requirement for a general mechanism for creating PDF bookmarks. Detailed requirements are:
PDF provides a generic "annotation" mechanism, by which different types of annotations can be applied to documents, including highlighting of text, notes overlaid on a page, and arbitrary pen-line marks.
It would be handy if annotations could be created as part of the PDF generation process. For example, given an online review system in which annotations were created in some generic form, documents could be rendered to PDF with the annotations reflected as PDF annotations rather than simply as base components of the rendered pages.
While it is possible to add annotations to PDF documents, either using the Adobe Acrobat product or programmatically, it would be difficult to do this for annotations associated only with the original XML document without some mapping between the original input XML elements and the pages those elements were eventually rendered to (see general requirements).
Originally submitted by David Cramer via the exslfo-discussion list. From David's post:
An important feature of pdfs that I would like to see in renderers is logical page numbering. Currently, if your frontmatter is paginated with lowercase romans and the first chapter starts at arabic page 1, you're pdf pagination and the page numbers that appear on the page will be off by the number of frontmatter pages. So if one user of the document tells another user to go to page 52, he would have to specify which page 52 he means, the pdf 52 or the page on which the number 52 appears in the header/footer.
From the Acrobat User Guide:
Use Logical Page Numbers allows you to set page numbering in a PDF document using the Document > Number Pages command. You typically do this when you want PDF page numbering to match the numbering printed on the pages. A page's number, followed by the page position in parentheses, appears in the status bar and in the Go To Page, Delete Pages, and Print dialog boxes. For example, if the first page in a document is numbered "i", it might appear as "i(1 of 10)". If this option is not selected, Acrobat ignores page numbering information in documents and numbers pages using arabic numbers starting at 1.
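The mismatch described above can be made concrete with a small sketch. It assumes a document described as a list of numbering ranges, each restarting at 1 in either lowercase roman or arabic style, and computes the logical label for each physical page. The names `to_roman` and `page_labels` and the range format are hypothetical and are not taken from any FO implementation or from the PDF specification.

```python
def to_roman(n):
    """Convert a positive integer to a lowercase roman numeral."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def page_labels(ranges):
    """Return one logical label per physical page.

    `ranges` is a list of (page_count, style) pairs, where style is
    "roman" or "arabic"; numbering restarts at 1 in each range.
    """
    labels = []
    for page_count, style in ranges:
        for n in range(1, page_count + 1):
            labels.append(to_roman(n) if style == "roman" else str(n))
    return labels


# Ten front-matter pages in roman numerals, then sixty body pages.
labels = page_labels([(10, "roman"), (60, "arabic")])
physical = labels.index("52") + 1  # physical pages count from 1
print(f'The page printed as "52" is physical page {physical}')
```

With ten pages of front matter, the page printed as "52" is the 62nd physical page, which is exactly the ambiguity the logical page numbering requirement is meant to remove.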
This section captures requirements for layout functionality not currently provided by the FO specification. Many of these requirements are already booked by the XSL Working Group and will likely be addressed in future versions of the XSL FO specification. At such time as the XSL Working Group takes up any of the requirements in this section, their discussion here shall be retired. Any implementations of these requirements should be considered as being in the service of gaining implementation experience in advance of formal standardization.
Many technical documentation business practices, especially in military and manufacturing, use revision marks to identify sub-block portions of text that have been changed from an earlier version of the same information. These revision marks are usually implemented as vertical rules rendered in the margin vertically-aligned with the text sequences they apply to:
When tables span multiple pages, there is often a requirement to create either a separate "table continued" message above or below the table or to add something like "Continued" to the table caption. This requirement can be met in a weak way using markers in the static content, but the result is not completely satisfactory, especially for messages placed at the bottom of the table (e.g., "Table continued on next page"): because the vertical extent of a continued table cannot be absolutely controlled, there is no way to reliably place the message immediately below the table. Thus, satisfying this requirement needs an extension to FO that would provide a way either to retrieve markers within the table caption, table header, or table footer, or to define an additional component of the table caption to hold text for use on pages after the first. This requirement could be partially satisfied by using repeating before floats, as provided by Epic editor 4.3's FO implementation.
XSLT provides all the grouping and sorting functions needed to generate back-of-the-book indexes from index entry markup in an XML document (see the DocBook XSL Stylesheets project for an example of how to generate indexes with XSLT). However, such a generated index will necessarily contain index entries (in the composed index) with multiple references to the same page. This is for the simple reason that there is no way to know, at XSLT processing time, what page a given index marker in the document flow will resolve to.
Thus, to be able to create proper indexes, there must be a way to eliminate duplicate page numbers from the list of page numbers associated with a given index entry.
This requirement can be met using a clever post-processing mechanism developed by Ken Holman (Ken--need a pointer to your write up on this). However, this process requires human intervention and so is not generally useful in lights-out production environments.
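The duplicate elimination itself is simple once resolved page numbers are known; the difficulty is that they are only known after formatting. As a minimal sketch, assume the formatter could hand back the resolved page numbers for one index entry as a list of strings in document order. The function name `unique_pages` and the sample data are illustrative assumptions, and this is not a description of Ken Holman's mechanism.

```python
def unique_pages(pages):
    """Drop repeated page numbers while keeping first-seen order.

    Page numbers are treated as opaque strings so that formatted
    numbers such as "iv" or "A-3" work as well as plain digits.
    """
    seen = set()
    result = []
    for page in pages:
        if page not in seen:
            seen.add(page)
            result.append(page)
    return result


# Hypothetical resolved references for one index entry: three index
# markers landed on page 15 and two on page 12.
resolved = ["12", "12", "15", "15", "15", "47"]
print(", ".join(unique_pages(resolved)))
```

An extension that met this requirement would have to perform the equivalent step inside the formatter, after page numbers are resolved and before the index entry is laid out.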