5.3 Bridging Context-free Grammars and Object-oriented Modeling

5.3.1 Conceptual Mapping

Structural Mapping. As outlined in the description of the general mapping problem of technical spaces (cf. Sect. 5.1.1), structural concepts are central when transferring information from one space to another. Therefore, the structural concepts of both context-free grammars and object-oriented models are depicted in Fig. 5.4, together with a potential mapping between them.

One can recognize concepts such as productions, terminals, and non-terminals for the grammar space and terms like classifier, reference, and attribute for the modeling space. Before mapping the concepts to each other, it is worth examining them more closely.

First, models defined by modeling languages are composed of elements that in turn consist of attribute values and other contained elements. In addition, cross-references between elements can exist. Which elements are allowed is defined in a metamodel through classifiers. Containment references between classifiers define the valid containment relations. Which attribute values can be defined for an element is declared by attributes in the metamodel. Classifiers are also connected through superclass relationships, which express exchangeability. Possible cross-references are defined by non-containment references.

Second, sentences defined by context-free grammars consist of a sequence of elements, where an element is either a single symbol (i.e., a set of connected characters) or another nested sequence, which again consists of symbols and possibly other nested sequences. Valid parts are defined through productions. Referencing other productions through non-terminals in a sequence defines the possible nesting of elements. The symbols that may appear in a sequence are defined by terminals. Choices between different non-terminals express exchangeability.

Based on [2], the following mapping can be derived for the core structural concepts of both technical spaces:

• Classifier—Production (1–1)

Element types are defined through classifiers in one space and by productions in the other.

• Containment-Reference—Non-Terminal (2–2)

The composition of elements from others is expressed by containment references on the metamodel side. Non-terminals that appear in a sequence express a similar relationship on the grammar side.

• Attribute—Terminal (3–3)

Attribute values are also part of a model element and can therefore be expressed by symbols.

• Superclass—Choice (4–4)

The superclass relationship can be mapped to choices (i.e., alternatives) because both express exchangeability.

• Non-Containment-Reference—Terminal (5–3)

For cross-references, there is no direct correspondence in grammars. They can, however, be mapped to terminal symbols as well. In this case, the symbol is an identifier that names the element to be referenced.

• Multiplicity—Sequence (6–6)

Features of metaclasses can hold multiple values depending on their cardinality. In grammars, sequences can be used to express that elements occur multiple times.

A closer look at this mapping reveals that the structural concepts of models and context-free grammars are quite similar. With a single exception (the mapping of non-containment references to terminals), all concepts have an exact counterpart in the opposite technical space. Furthermore, all concrete concepts (i.e., all except Reference and Feature) can be mapped, which indicates that the mapping of concepts is complete.
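The six correspondences listed above can be sketched in code. The following Python fragment is only an illustration under stated assumptions, not the implementation of any tool: the classes `Feature` and `Classifier`, the function `to_production`, and the EBNF-like output notation are hypothetical names introduced here. It assumes that a classifier with subclasses is abstract and has no features of its own, and that every multi-valued feature is rendered with a trailing `*`.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str
    kind: str            # "attribute", "containment", or "cross" (hypothetical labels)
    target: str = ""     # name of the referenced classifier, for references only
    many: bool = False   # True if the feature can hold multiple values


@dataclass
class Classifier:
    name: str
    features: list = field(default_factory=list)
    subclasses: list = field(default_factory=list)


def to_production(c: Classifier) -> str:
    """Derive one production per classifier (Classifier to Production)."""
    if c.subclasses:
        # Superclass to Choice: assumes the superclass has no own features.
        return f"{c.name} ::= " + " | ".join(c.subclasses)
    parts = []
    for f in c.features:
        if f.kind == "containment":
            symbol = f.target          # Containment-Reference to Non-Terminal
        elif f.kind == "attribute":
            symbol = f.name.upper()    # Attribute to Terminal
        else:
            symbol = "ID"              # Non-Containment-Reference to Terminal
        # Multiplicity to Sequence: repeat the symbol if many values are allowed.
        parts.append(symbol + ("*" if f.many else ""))
    return f"{c.name} ::= " + " ".join(parts)


# Hypothetical example metamodel.
statement = Classifier("Statement", subclasses=["Assignment", "Call"])
block = Classifier("Block", [Feature("statements", "containment", "Statement", many=True)])
call = Classifier("Call", [Feature("callee", "cross", "Function")])
function = Classifier("Function", [Feature("name", "attribute"),
                                   Feature("body", "containment", "Block")])
```

Note that both the attribute `name` and the cross-reference `callee` end up as terminals, which is the non-injective case in the mapping.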

The only discrepancy with respect to the structure of artifacts from both spaces is that both non-containment references and attributes are mapped to terminals (i.e., the mapping is non-injective). This situation is caused by the mathematical structures that underlie the two spaces. Models are based on graphs, whereas grammars use trees as the primary means of constructing complex structures. Trees do not allow arbitrary connections between nodes, whereas graphs do. Non-containment references are exactly the connections that trees do not allow. Therefore, no direct equivalent can be found in the context-free grammar space.

However, it is common practice to use terminals to represent such connections to other nodes in the grammar space. For example, most programming languages refer to declared elements using identifiers. Even though context-free grammars offer no explicit support for stating that an identifier references certain other elements, this connection is often part of the static semantics of the language defined by the grammar. In contrast, models use explicit references: a reference to a declared element is stored in the model itself, rather than a symbolic identifier.
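The step from symbolic identifiers to explicit references can be illustrated with a small sketch. The classes `Variable` and `VariableUse` and the function `resolve` are hypothetical, and the sketch assumes a single flat scope in which names are unique. Real languages usually define more elaborate scoping rules as part of their static semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variable:
    name: str


@dataclass
class VariableUse:
    identifier: str                     # symbolic name as it appears in the text
    target: Optional[Variable] = None   # explicit reference as stored in a model


def resolve(declarations, uses):
    """Replace symbolic identifiers by explicit references to declared elements."""
    by_name = {d.name: d for d in declarations}
    for use in uses:
        # An unknown identifier leaves target as None; reporting is left to the caller.
        use.target = by_name.get(use.identifier)
    return uses


declarations = [Variable("x"), Variable("y")]
uses = resolve(declarations, [VariableUse("y"), VariableUse("z")])
```

After resolution, the first use points at the declared `Variable` object itself, whereas the second use remains unresolved because no element named `z` was declared.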

In summary, the formalism for building complex structures in one space can be reproduced in the other.

Mapping Primitive Types. After having established the mapping of the structural concepts, the relation between the primitive types available in context-free grammars and in models needs to be clarified. The former have only one primitive type: strings. There is no explicit distinction between numbers, dates, or booleans. The basic building blocks of all sentences are characters and sequences of characters. Models, on the other hand, are based on the MOF standard, which imports a set of predefined primitive types from the UML infrastructure specification [21]. This set includes Integer, String, Boolean, and UnlimitedNatural¹.

Since grammars use only one primitive type, whereas models allow multiple, a conversion is needed. The type of the concrete attribute that is mapped to a terminal determines the target of this conversion. For example, if a boolean attribute is mapped to a specific terminal, the values of this terminal must be transformed to either true or false, the two possible values of boolean-typed attributes.
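Such a conversion might look as follows. This is a sketch under assumptions that a bridge designer would have to confirm for a concrete language: the function name `convert` is hypothetical, booleans are assumed to be spelled `true` and `false`, integers are assumed to be decimal, and the unlimited value of UnlimitedNatural is represented by `math.inf`.

```python
import math


def convert(text: str, target_type: str):
    """Convert the string value of a terminal to the type of the mapped attribute."""
    if target_type == "String":
        return text
    if target_type == "Boolean":
        if text in ("true", "false"):   # assumed lexical forms
            return text == "true"
        raise ValueError(f"not a boolean literal: {text!r}")
    if target_type == "Integer":
        return int(text)                # assumes decimal notation
    if target_type == "UnlimitedNatural":
        if text == "*":                 # the asterisk denotes infinity
            return math.inf
        value = int(text)
        if value < 0:
            raise ValueError(f"negative value not allowed: {text!r}")
        return value
    raise ValueError(f"unknown primitive type: {target_type}")
```

The attribute type is passed in explicitly because the terminal alone carries no type information.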

Naturally, concrete conversions depend on the semantics of the context-free grammar at hand. As there is no information about the meaning of terminals in the grammar itself, the designer of a concrete bridge between a grammar and a metamodel must decide how to convert string values of terminals to the primitive types used by attributes in models. The conversion of primitive types therefore cannot be handled generically, as the information required to do so is not formally available.

¹ In contrast to Integers, UnlimitedNaturals do not allow for negative values, but additionally include infinity, which is denoted by an asterisk.

Mapping Semantics. Besides the structural aspects found in technical spaces, the semantics of the artifacts that reside in such spaces is an important issue when bridging spaces. Even when structures can be mapped from one space to another, this does not necessarily imply that two artifacts have equivalent meaning.

For the case at hand, the semantic mapping is rather narrow. Grammars, being developed primarily to describe syntax rather than semantics, do not have a complex semantics. The single important property that mappings of grammars to another space should preserve is the set of rules by which productions can be expanded (i.e., the construction of sentences from a grammar). Informally, these rules are as follows:

• Starting from the start symbol, any non-terminal on the right side of a rule can be replaced by the right side of the rule for this non-terminal. This rewriting continues until all non-terminals have been replaced by terminals.

• Choices in grammar rules must be replaced by a single option.

• Choices and non-terminals can be picked non-deterministically during replacement.

When looking at these rewrite rules in the context of our mapping to the modeling space, the following observations can be made. First, metamodels have no notion of a start symbol. One can start the creation of a model by creating an instance of an arbitrary metaclass. In further steps, this instance can be extended by adding more instances and by adding references to them. Thus, grammars are more restrictive in the sense that they identify a particular top-level language element, the start symbol. Metamodels are less restrictive because they allow models to be constructed starting from arbitrary, potentially multiple root elements.

The actual expansion of grammars and models is preserved by our mapping. Replacing a non-terminal by the right side of the respective production is equivalent to creating an instance of the metaclass that corresponds to that production. Of course, the new instance must be referenced. In addition, choices, which allow the definition of alternative syntax in grammar rules, were mapped to the inheritance relation between metaclasses. This mapping is also valid, since the semantics of subclassing matches that of choices: one must pick exactly one of the given possibilities (i.e., either an option or a subclass, respectively).
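How expanding a grammar runs alongside creating model elements can be made concrete with a small sketch. The tables `CHOICES` and `PRODUCTIONS`, the function `expand`, and the dictionary representation of model elements are hypothetical. The sketch assumes a tiny acyclic grammar in which every choice is a choice between non-terminals, and it keeps terminals as token names rather than concrete values.

```python
import random

# Choices correspond to superclasses; their options correspond to subclasses.
CHOICES = {"Statement": ["Assignment", "Call"]}

# Ordinary productions correspond to concrete metaclasses.
PRODUCTIONS = {
    "Program": ["Statement", ";"],
    "Assignment": ["ID", "=", "NUMBER"],
    "Call": ["ID", "(", ")"],
}


def expand(symbol, choose=random.choice):
    """Expand a symbol; return the derived tokens and the matching model element."""
    if symbol in CHOICES:
        # Exactly one option is picked, just as exactly one subclass is instantiated.
        return expand(choose(CHOICES[symbol]), choose)
    if symbol not in PRODUCTIONS:
        return [symbol], None            # terminal: no instance is created
    element = {"type": symbol, "children": []}
    tokens = []
    for part in PRODUCTIONS[symbol]:
        part_tokens, child = expand(part, choose)
        tokens.extend(part_tokens)
        if child is not None:
            # The new instance is referenced by the element that contains it.
            element["children"].append(child)
    return tokens, element
```

By default, options are picked non-deterministically via `random.choice`. Passing a deterministic `choose` function, such as `lambda options: options[0]`, makes a derivation reproducible.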

Consequently, if grammar and metamodel concepts are mapped according to Fig. 5.4, the expansion of the grammar parallels the creation of a model. This property is valuable because it guarantees that there is a valid textual representation for every model and that a model exists for every sentence of the grammar. The former requires that the model is constructed starting at the root metaclass (i.e., the metaclass that corresponds to the start symbol of the grammar).

If models have multiple root elements, a gap between the two spaces exists. However, this gap can easily be closed. For example, one can introduce an artificial start symbol into the grammar that produces all the non-terminals corresponding to root metaclasses. Or, the other way around, an artificial start metaclass can be introduced that serves as superclass for all root metaclasses. In both cases, slight modifications of the grammar or metamodel are needed to establish a mapping. Because these modifications are very lightweight, the start symbol problem is not severe.
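The first of these two options, the artificial start symbol, can be sketched as follows. The function `add_start_symbol`, the default name `Start`, and the dictionary representation of a grammar (non-terminal name mapped to a list of alternatives) are hypothetical. The sketch assumes that each sentence holds exactly one root element. If several roots may appear in one sentence, the new rule would need a repetition instead.

```python
def add_start_symbol(grammar, root_nonterminals, start="Start"):
    """Return a copy of the grammar with an artificial start rule added."""
    if start in grammar:
        raise ValueError(f"non-terminal {start!r} already exists")
    extended = dict(grammar)   # the original grammar is left unchanged
    # One alternative per non-terminal that corresponds to a root metaclass.
    extended[start] = [[root] for root in root_nonterminals]
    return extended


grammar = {
    "Class": [["class", "ID"]],
    "Package": [["package", "ID"]],
}
extended = add_start_symbol(grammar, ["Class", "Package"])
```

Only one rule is added and the existing productions stay untouched, which is why the modification is lightweight.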

5.3.2 Conclusion

Context-free grammars and models form a pair of technical spaces that is highly relevant in the context of RTE. The former is the most widespread formalism for capturing the structure of software artifacts formally. The latter is gaining significance with the advent of MDSD. The bridge that was conceptually developed above and implemented in the EMFText tool makes it possible to apply modeling tools (editors, analyzers, or transformation engines) to grammar-based artifacts. This is not only useful in the context of this thesis, where the main focus is RTE, but also opens up many other application areas. For example, reverse engineering of existing applications becomes possible [179].

Even though context-free grammars are only one technical space out of many, their flexibility and, consequently, their wide usage are striking. The vast number of artifacts that can be described by context-free grammars makes this particular bridge especially important. It makes it possible to represent different kinds of artifacts using a common syntax formalism (i.e., metamodels). Once all artifacts reside in the same technical space, synchronization procedures can be applied uniformly. Also, the technical and the logical translation (cf. Sect. 3.4.3) are strictly separated.

Although this bridge makes many artifacts accessible to RTE, one must be aware of the limitations of the mapping between the two spaces. When facing the concrete task of establishing a bridge between a grammar-based language and a corresponding metamodel, manual labor is often required. Depending on the complexity of the language involved, this may range from very little specification effort to enormous, time-consuming work needed to fill the gap between a grammar and a metamodel. The reason for this manual labor is the lack of important information in the grammar of the language at hand.

First, context-free grammars are based on typed trees. Thus, the first missing piece of information is the set of cross-links needed to establish a graph structure. This graph structure is often defined by the semantics of the language, but not necessarily captured formally. As a consequence, cross-links are part of the language, but no description of them is readily available. Therefore, building a bridge involves defining the cross-references that form the graph structure of the artifact at hand.