• Support many document formats
• Incorporate non-Digital conversion modules
• Select conversion modules dynamically
• Build on existing CDA Toolkit services
The converter architecture is based on a conversion hub model. A front-end module converts an input document to a hub format. A back-end module then converts the hub format to an output format. The hub format is either DDIF or DTIF.
The hub model has the significant advantage that the number of conversion modules increases relative to the sum, rather than the product, of the number of input and output document formats supported. However, the model can be used only when the hub format can fully express the semantics of input documents; otherwise, loss occurs on conversion to the hub format. The CDA formats were designed to avoid this problem.
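To make the scaling argument concrete, the following minimal sketch compares the two module counts. The function names and the example figures (8 input formats, 6 output formats) are illustrative assumptions, not numbers taken from the toolkit.

```python
def modules_with_hub(inputs: int, outputs: int) -> int:
    """One front end per input format plus one back end per output format."""
    return inputs + outputs


def modules_without_hub(inputs: int, outputs: int) -> int:
    """One dedicated converter for every (input, output) format pair."""
    return inputs * outputs


if __name__ == "__main__":
    # Hypothetical example: 8 input formats and 6 output formats.
    print(modules_with_hub(8, 6))     # 14
    print(modules_without_hub(8, 6))  # 48
```

Adding a ninth input format costs one new front end under the hub model, but six new converters without it.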
The in-memory aggregate structures defined for document transfer to and from encodings are the same structures used for document transfer through the conversion modules. Thus, the converter architecture is an extension of the toolkit procedural interface.
As in the case of the CDA encoding and decoding interface, a variety of document sources and destinations are available. Files and streams must be supported by conversion modules. In-memory aggregate trees are supported when the source or destination format is a CDA format.
The following sections present more details about the document conversion procedure within the model and the translation that occurs from one hub format to another, also called domain crossing. Front- and back-end interfaces are also discussed further.
Conversion Control Procedure
A single control procedure that is packaged with the toolkit initiates a document conversion. The application provides
• The names of the source and destination document formats
• Identification of the data source and destination
• Parameters that are ultimately passed through to the specific conversion modules to control their operation
The control procedure prepares for conversion by determining the location of the requested front end and back end. Three means of determining the location are possible. First, the toolkit contains a CDA front end and CDA back end for use with DDIF or DTIF source or destination formats. The toolkit's function is only to encode or decode the aggregate stream. Second, the application can specify the address of a procedure to act as the front end or back end. Third, the control procedure uses the specified format name to locate and activate an external program image that contains the front end or back end.
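The three lookup mechanisms can be sketched as a single resolution function. This is a minimal illustration, not the toolkit's code: the names `locate_front_end`, `builtin_cda_front_end`, and the `installed` dictionary (a stand-in for locating an external program image by format name) are hypothetical, and the order in which the checks are tried is an assumption, since the text lists the mechanisms without stating their precedence.

```python
from typing import Callable, Dict, Optional

# A conversion module is modelled as a plain callable (hypothetical type).
Module = Callable[..., object]

CDA_FORMATS = {"DDIF", "DTIF"}


def builtin_cda_front_end(*args, **kwargs):
    """Stand-in for the CDA front end that ships inside the toolkit."""
    return "builtin"


def locate_front_end(
    format_name: str,
    supplied: Optional[Module] = None,
    installed: Optional[Dict[str, Module]] = None,
) -> Module:
    """Resolve a front end for format_name using the three mechanisms."""
    # Assumed precedence: a procedure passed by the application wins.
    if supplied is not None:
        return supplied
    # CDA formats are handled by the module built into the toolkit.
    if format_name.upper() in CDA_FORMATS:
        return builtin_cda_front_end
    # Otherwise look the module up by format name; the dictionary stands in
    # for finding and activating an external program image.
    registry = installed or {}
    try:
        return registry[format_name.upper()]
    except KeyError:
        raise LookupError(f"no front end installed for {format_name!r}") from None
```

A back end could be resolved the same way with its own built-in module and registry.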
Thus, as illustrated in Figure 1, the converter control procedure assembles the complete conversion program at execution time based on the conversion modules installed and available at the time they are referenced. Figure 2 shows document data flow through the conversion.
Figure 1 Complete Document Conversion Program (diagram not reproduced; surviving label: Converter Control Procedure, CDA Toolkit)
Each operating system includes a small number of essential conversion modules as standard equipment. Because the procedural interface is published and the binding of conversion modules is dynamic, Digital and other vendors can easily make available an unlimited range of optional conversion modules that increase the value of their products. Many application developers who use CDA conversion services have added logic that scans for all available conversion modules and displays a conversion format menu dynamically tailored to the actual environment.
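The menu-building logic mentioned above might look like the following sketch. The naming convention used here (`read_<format>` for front ends, `write_<format>` for back ends) is purely hypothetical and chosen for illustration; the toolkit's actual module names are not given in this text. The function takes a list of names rather than scanning a directory so that the example stays self-contained.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical naming convention: "read_<format>" marks a front end and
# "write_<format>" marks a back end. The real convention may differ.
_MODULE_NAME = re.compile(r"^(read|write)_([a-z0-9]+)$", re.IGNORECASE)


def build_format_menu(module_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group installed module names into sorted input and output format lists."""
    menu: Dict[str, List[str]] = {"input": [], "output": []}
    for name in module_names:
        match = _MODULE_NAME.match(name)
        if match is None:
            continue  # not a conversion module; ignore it
        direction = match.group(1).lower()
        fmt = match.group(2).upper()
        key = "input" if direction == "read" else "output"
        if fmt not in menu[key]:
            menu[key].append(fmt)
    for formats in menu.values():
        formats.sort()
    return menu
```

Because the menu is derived from whatever modules are present, installing a new module extends it without any change to the application.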
Domain Crossing
As we have seen, the toolkit supports both DDIF and DTIF conversion hub formats. Stated differently, a front-end or back-end module operates in either the DDIF domain or DTIF domain, depending upon which hub format the module produces or consumes.
In many cases, the front end and back end operate in the same domain. For example, in a word processing document conversion, the front and back ends both operate in the DDIF domain. In a spreadsheet conversion, both operate in the DTIF domain. Clearly, however, a user might wish to print a spreadsheet or incorporate it into a memorandum. In this case,
the front-end module produces DTIF format, and the back-end module expects DDIF format.
We added logic to the converter control procedure to permit such domain-crossing conversions. We also developed a DTIF-to-DDIF conversion module. A domain conversion module receives aggregates from one domain and translates them to aggregates of another domain. A domain conversion module does not process files; its inputs and outputs are in-memory aggregates.
If the converter control procedure determines that the front and back ends operate in different domains, it attempts to locate a domain conversion module that converts the input domain to the output domain.
If one is available, the control procedure alters the conversion data flow such that aggregates flow from the front end into the domain conversion module, and from the domain conversion module into the back end. The invocation of a domain conversion is transparent to the front end, the back end, and the application requesting the conversion service. Again, the control procedure makes a dynamic search with a stylized name so that the set of available domain conversion modules is extensible.
The DTIF-to-DDIF domain conversion module is thus the single point that performs report writing and formatting operations for tabular data. Just as DTIF-compliant applications rely on the converter architecture to interchange tables with other formats, they can rely on this module for printing and document viewing. Each application need not contain this logic.
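The rerouting of the data flow can be sketched with generators. Everything named here is an assumption made for illustration: aggregates are modelled as dictionaries, `dtif_to_ddif` is a toy converter that only re-labels its input, and the `DOMAIN_CONVERTERS` dictionary stands in for the dynamic search by stylized name.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Aggregate = dict  # hypothetical stand-in for an in-memory aggregate
DomainConverter = Callable[[Iterable[Aggregate]], Iterator[Aggregate]]


def dtif_to_ddif(aggregates: Iterable[Aggregate]) -> Iterator[Aggregate]:
    """Toy domain converter: re-labels each table aggregate as a document one."""
    for aggregate in aggregates:
        yield {"domain": "DDIF", "content": aggregate["content"]}


# Keyed by (input domain, output domain); stands in for the dynamic lookup.
DOMAIN_CONVERTERS: Dict[Tuple[str, str], DomainConverter] = {
    ("DTIF", "DDIF"): dtif_to_ddif,
}


def connect(
    front_domain: str,
    back_domain: str,
    aggregates: Iterable[Aggregate],
    converters: Dict[Tuple[str, str], DomainConverter] = DOMAIN_CONVERTERS,
) -> Iterator[Aggregate]:
    """Return the aggregate stream that the back end should consume."""
    if front_domain == back_domain:
        return iter(aggregates)  # same domain: pass aggregates straight through
    try:
        convert = converters[(front_domain, back_domain)]
    except KeyError:
        raise LookupError(
            f"no domain conversion module for {front_domain} to {back_domain}"
        ) from None
    return convert(aggregates)  # splice the converter into the data flow
```

Neither end of the pipeline changes: the front end still yields aggregates and the back end still consumes them, which is what makes the crossing transparent.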
Front-end Procedural Interface
The primary function of a front end is to read the input document format that it supports, translate the document semantics to the hub format, and return content aggregates one by one upon demand. Therefore, a front end must present a defined interface to the converter control procedure. Front ends must define four procedures: initialize, get-aggregate, terminate, and get-position.
The initialize procedure initializes the front end for the conversion. Because the procedure has a known name, it can be located by the control procedure. Typically, this procedure opens an input file, allocates and initializes a context block, and processes options passed to it by the control procedure. The context block is used to maintain current state information about the conversion. The initialize procedure must also return the addresses of the get-aggregate, terminate, and get-position procedures to the control procedure.
Figure 2 Data Flow in Conversion
The get-aggregate procedure performs most of the conversion. It reads from the input file and produces DDIF or DTIF aggregates. This procedure does not convert the entire input document in a single call; rather, it reads from the input document until it is able to produce the next sequential top-level content aggregate. This aggregate is then returned to the control procedure. Subsequent calls to this procedure continue to build top-level aggregates until the end of document is reached.
The control procedure calls the terminate procedure when an end-of-document is returned from the get-aggregate procedure or when an error has occurred. The terminate procedure typically closes the input file and deallocates its context block.
Finally, the get-position procedure is used by applications that must report the progress of a conversion. For example, a document viewer can use this information to position a scroll bar within the document window.
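The four procedures and the way the control procedure drives them can be sketched as follows. This is a toy model under stated assumptions, not the toolkit's interface: `TextFrontEnd` and `run_conversion` are hypothetical names, each non-empty text line is treated as one top-level aggregate, end of document is signalled by `None`, and position is measured in characters consumed.

```python
import io


class TextFrontEnd:
    """Toy front end: each non-empty line of text becomes one aggregate."""

    def initialize(self, stream, options=None):
        # Context block: the state kept between calls.
        self.context = {"stream": stream, "options": options or {}, "consumed": 0}
        # Hand the other three entry points back to the control procedure.
        return self.get_aggregate, self.terminate, self.get_position

    def get_aggregate(self):
        """Read only until one aggregate is complete; None at end of document."""
        while True:
            line = self.context["stream"].readline()
            if line == "":
                return None
            self.context["consumed"] += len(line)
            if line.strip():
                return {"type": "paragraph", "text": line.strip()}

    def get_position(self):
        """Characters consumed so far (an assumed measure of progress)."""
        return self.context["consumed"]

    def terminate(self):
        """Close the input and release the context block."""
        self.context["stream"].close()
        self.context = None


def run_conversion(front_end, stream, back_end):
    """Pull aggregates one at a time and pass each to the back end."""
    get_aggregate, terminate, get_position = front_end.initialize(stream)
    try:
        while True:
            aggregate = get_aggregate()
            if aggregate is None:
                break
            back_end(aggregate, get_position())
    finally:
        terminate()  # runs at end of document and also when an error occurs
```

For example, `run_conversion(TextFrontEnd(), io.StringIO("first\n\nsecond\n"), print)` passes two paragraph aggregates to `print`, skipping the blank line, and then closes the stream.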