

²³ In the computer-science denotation of the term, i.e. some action is being performed on each node of the tree.

²⁴ Read: “write in correct order”

3.1.1 Dealing with embedded media

The ReqIF file format allows external media to be embedded into rich text content (see Section 2.1.3). In order to maximize compatibility across different RM-tools, ReqIF contains different layers of content for each media-artifact. On the lowest level, each such artifact is represented by an XHTML-formatted string which is expected to be digestible by all conceivable RM-tools (the embedded pictures in Figure 10 on page 47 are displayed this way). The next level is always a PNG-image, and the last, optional level is a file of arbitrary format. While rendering, RM-tools are required to start with the highest available layer, but may fall back onto the preceding one if they fail to handle it. The entire process is described in more detail in [Obj13, clause 10.8.20, point 2].
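The fallback rule for the content layers can be illustrated with a minimal sketch. It is not taken from ReqIF or from the tool: the layer names in `LAYERS` and the `can_render` callback are hypothetical, and the sketch only assumes that the layers are ordered from the lowest (XHTML) to the highest (arbitrary file).

```python
from typing import Callable, Optional, Set

# Hypothetical layer names, ordered from the lowest to the highest level.
LAYERS = ["xhtml", "png", "original"]


def pick_layer(available: Set[str], can_render: Callable[[str], bool]) -> Optional[str]:
    """Return the highest available layer that the RM-tool can handle.

    The search starts at the highest layer and falls back to the
    preceding one whenever a layer is missing or cannot be rendered.
    """
    for layer in reversed(LAYERS):
        if layer in available and can_render(layer):
            return layer
    return None  # not even the lowest layer could be rendered
```

An RM-tool that cannot interpret the arbitrary-format file would, for instance, call `pick_layer({"xhtml", "png", "original"}, lambda layer: layer != "original")` and end up with the PNG layer.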

Currently, the tool only writes the first two layers. This implies that all embedded media which is not already in PNG format needs to receive special treatment. For this purpose the media subfolder contains two CSV-files:

images.csv deals with all graphical objects which can be extracted as a separate file from the DOC input. Those raw (unconverted) files are saved alongside the CSV and are usually of Windows Metafile (WMF) or Windows Enhanced Metafile (EMF) format.

Each line in images.csv represents a reference to an individual object and stores the target dimensions (width and height) along with it. By feeding this file to a dedicated macro designed for Microsoft Visio, all those objects can be batch-converted into PNG.

The tool can also be reconfigured to use different conversion approaches which do not rely on proprietary software from the Microsoft Office family. However, those alternatives (namely: ImageMagick’s convert on Windows with GDI-support and libwmf on Unix, both of which are open-source) do not provide comparable quality.

shapes.csv deals with shapes in the so-called “Office Drawing Binary Format” as specified in [Mic14b]. These are commonly created through the drawing tools natively provided by Microsoft Word. Such shapes cannot exist in isolation (i.e. they cannot be extracted and legally saved into a separate file) [Mic11]. Thus, shapes.csv only states offsets (similar to the startOffset used for backward tracing in Section 2.2) of those objects in the original input DOC together with the filename where the resulting PNG is expected to go. The actual extraction is performed by another macro, which requires both the original DOC-file and shapes.csv as its input. Although this macro runs inside Microsoft Word, it needs Microsoft Visio to be present as well.

There is no viable alternative²⁵ for the handling of such content, except for one special kind of drawing (see Section 3.3.3). Formalized directly by the tool, these drawings use a very limited subset of the drawing format discussed above and are therefore exempted from the file shapes.csv. Hence, this is the only time when the tool must rely on external proprietary software.
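To make the role of the two CSV-files more tangible, the following sketch reads both of them and derives an ImageMagick call for the open-source conversion route. The column layouts (`filename,width,height` and `offset,png_name`), the comma delimiter, the pixel unit and all Python names are assumptions made for illustration only; the text does not specify the actual file layout.

```python
import csv
import io
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class ImageRef:
    filename: str  # raw WMF/EMF file saved next to the CSV
    width: int     # target width (unit assumed to be pixels)
    height: int    # target height (unit assumed to be pixels)


@dataclass
class ShapeRef:
    offset: int    # position of the shape in the original input DOC
    png_name: str  # where the resulting PNG is expected to go


def read_images(text: str) -> List[ImageRef]:
    # Assumed layout: filename,width,height (one object per line, no header)
    rows = csv.reader(io.StringIO(text))
    return [ImageRef(r[0], int(r[1]), int(r[2])) for r in rows if r]


def read_shapes(text: str) -> List[ShapeRef]:
    # Assumed layout: offset,png_name (one shape per line, no header)
    rows = csv.reader(io.StringIO(text))
    return [ShapeRef(int(r[0]), r[1]) for r in rows if r]


def convert_command(ref: ImageRef) -> List[str]:
    # ImageMagick call; the trailing "!" forces the exact target size.
    target = str(Path(ref.filename).with_suffix(".png"))
    return ["convert", ref.filename, "-resize", f"{ref.width}x{ref.height}!", target]
```

The command list returned by `convert_command` could be handed to `subprocess.run`. No comparable command exists for the entries of shapes.csv, since these shapes can only be rendered from within the original DOC.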

In the example of Listing 5 both CSV-files are explicitly referenced (Lines 44–45). If the input file happens to contain only one kind of media or no media at all, the non-applicable lines are omitted and the corresponding CSV-file will not be present either.
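Any script that post-processes the media subfolder therefore has to cope with missing CSV-files. A minimal sketch of such a check follows; the function name and the returned mapping are hypothetical and not part of the tool.

```python
from pathlib import Path
from typing import Dict, Optional

CSV_NAMES = ("images.csv", "shapes.csv")


def locate_media_csvs(media_dir: Path) -> Dict[str, Optional[Path]]:
    """Map each CSV name to its path, or to None if the file is absent."""
    found: Dict[str, Optional[Path]] = {}
    for name in CSV_NAMES:
        candidate = media_dir / name
        found[name] = candidate if candidate.is_file() else None
    return found
```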

As stated in Section 2.3.2, the input documents contain a fair amount of OLE-data. Using the approach outlined above, these data will always be flattened to WMF or EMF²⁶. By utilizing ReqIF’s third content layer, which can hold arbitrary data, one could also link these original OLE-BLOBs to the ReqIF output file. However, only a few RM-tools can actually take advantage of this option. Besides, the focus of the tool was primarily on providing a decent input to implementers of a system, rather than to authors of a specification willing to alter the embedded graphics (which is why one would embed the original OLE-data in the first place). Lastly, this approach will not work for the non-independent data referenced by shapes.csv unless it is embedded into an artificial wrapper document²⁷, which is quite an onerous task.

²⁵ Although LibreOffice/OpenOffice, or rather their headless variant unoconv, claim to support such drawings, they in fact fail miserably with those embedded in the Subset-026.

²⁶ In fact, this is performed by Microsoft Word automatically in order to display something meaningful in case the

Extracting the original MTEF-representation of equations is also unlikely to be worthwhile, since edits can only be performed using Microsoft’s own Equation Editor for as long as those objects are embedded in an Office document. Alternatively, Design Science’s MathType software, from which the Microsoft editor is derived, may still be used even after they have been extracted. However, that is a rather exotic piece of proprietary software without any significant market penetration. Alas, a truly useful formalization of such equations as TeX- or MathML-markup is hard to obtain because only a limited open-source implementation of the MTEF file format is available [SP12] and ReqIF lacks support for any of the aforementioned markups. Fortunately, this situation has somewhat improved with the XML-based successor of the DOC file format (*.docx), where equations are stored in the openly documented Office MathML (OMML) format, a competitor to MathML [Mur06].
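To illustrate how much more accessible equations are in the XML-based format, the following sketch collects all OMML equations from a *.docx file using only the Python standard library. It is not part of the tool described here; the function name is hypothetical, and the sketch assumes that the main document part is stored as `word/document.xml`, as is usual for Word documents.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import BinaryIO, List, Union

# XML namespace of Office MathML (OMML) elements
OMML_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def extract_omml(docx: Union[str, BinaryIO]) -> List[str]:
    """Return every <m:oMath> element of a *.docx file as an XML string."""
    # A *.docx file is a ZIP archive containing several XML parts.
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [
        ET.tostring(equation, encoding="unicode")
        for equation in root.iter(f"{{{OMML_NS}}}oMath")
    ]
```

The returned strings are raw OMML fragments; turning them into MathML or TeX would require a separate conversion step which is not shown here.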