THE HERA SOFTWARE ARCHITECTURE FOR GENERATING HYPERMEDIA APPLICATIONS FOR AD-HOC DATABASE OUTPUT

Geert-Jan Houben* and Pim Lemmens

Eindhoven University of Technology

Department of Computing Science

PO Box 513, NL 5600 MB Eindhoven, the Netherlands

(* also at University of Antwerp – UIA)

E-mail: {g.j.houben,w.j.m.lemmens}@tue.nl

ABSTRACT

The HERA research project investigates the automatic generation of hypermedia (Web) presentations for data resulting from ad-hoc database queries. In earlier work we have proposed a heuristic approach to generate navigation structures for multimedia database output, based on ideas from RMM. Due to the ad-hoc nature of the data, only part of the RMM design methodology can be automated deterministically. Therefore, generally relevant intelligence is provided to be applied in the generation process. Also, a query language extension has been designed to allow users on an ad-hoc basis to explicitly override the general properties of the generation process, e.g. the heuristics. This paper describes the architecture of the accompanying HERA software tool.

INTRODUCTION

The use of a hypermedia platform, such as the World Wide Web, can help to present the less structured and not purely textual information one typically finds in applications such as employee databases, museum databases, geographic information systems, and mail-order catalogs and services. Although the actual data in these applications are less strictly structured than in traditional applications, many of these legacy applications are based on a structured data model, e.g. the relational model. Part of the application design concerns the presentation of query results in such a way that end-users can efficiently browse the output of their queries. In earlier work (Houben and De Bra 1997; Houben and De Bra 1999) we have described our approach to automatically generate World Wide Web (hypermedia) applications for data resulting from ad-hoc database queries.

• One target application area deals with collections of products or services on offer, e.g. the houses a real-estate agent can offer for sale, or the items on sale in an auction. While for the results of some standard queries Web presentations can be carefully designed by hand, this research concerns the automatic generation of a presentation for the result of an unforeseen ad-hoc query.

• A second target application area is investigated in the Dynamo project (with Philips and CWI), which concerns Electronic (TV) Program Guides (EPGs). With lots of data available on the programs broadcast on the various TV channels, the goal is to produce an appropriate presentation of relevant data: an on-the-fly generation process produces a personalized, time-dependent EPG (Van Doorn et al. 1999).

An essential characteristic of our target applications is the volatility of the data to be presented. The data and their internal structure can change over time, and therefore the presentation of the query results cannot be carefully designed beforehand.

Since the design and construction of a hypermedia application involves the representation of relationships between information objects, the approach for generating the navigation structure is based on the ideas of RMM (Isakowitz et al. 1995; Isakowitz et al. 1998). RMM combines elements from the Entity-Relationship model (Elmasri and Navate 1990) and HDM (Garzotto et al. 1991) to effectively manage relationships between objects. Generating a hypermedia presentation for volatile data includes the presentation of the records from the query result and the presentation of a record’s relationships with other records. RMM itself and its supporting tool RMCase (Díaz et al. 1995) do not help in the case of results to arbitrary ad-hoc queries. The main problem is that the dynamically generated structure of a query result cannot be (trivially) translated into a hypermedia presentation.

The main subject of this paper is a description of the architecture for the software implementing this approach. We describe how we handle the communication between a client browser and the underlying database containing the relational data. We also describe how we have set up a prototype for the software tool HERA to experiment with the architecture of the process to generate a hypermedia presentation for the data retrieved from the database.

AUTOMATIC PRESENTATION DESIGN

There are multiple reasons to produce a mapping from data retrieved from a relational database to data presented in a hypermedia structure. One may have developed a hypermedia application that one wants to populate with data available from a legacy database. Alternatively, one may have a legacy database full of data and want to produce presentations for query results accessible through a hypermedia browser. Either way, one wants to produce an alternative presentation for the given relational data that uses relationships (possibly implemented by hypermedia links) to offer an internal structure that improves the user-friendliness of the presentation.

When you generate data automatically, it is not feasible to come up with a presentation designed by hand. This is certainly the case in our target applications where we look at results to ad-hoc database queries. Since we cannot foresee exactly which queries are going to be relevant, we cannot come up with a specific hypermedia presentation for each one of them.

We have chosen an approach in which we automatically derive a hypermedia presentation for these ad-hoc queries. The volatile nature of the data has led us to a heuristics-based routine for the generation of the presentation, specifically the relationships. In this routine we combine “intelligence” from different sources.

• General presentation design intelligence to be used in the generation process is included in the heuristics embedded in the application or system. In the real-estate example the heuristics can for instance capture guidelines how to present collections of houses.

• The actual data to be presented can themselves also have specific properties that require specific design decisions. In the real-estate example, one can think of the situation that for some houses the description is not available through a picture, but a video.

• A third kind of intelligence is provided on an ad-hoc basis by the user. These explicit user directives can be used to influence or override the general properties of the generation process. If a user of the real-estate application is not satisfied with the index-based approach that the process offers for a set of houses, the user can ask the system to give a guided tour instead.

In general, the generation process is based on several kinds of intelligence. In the associated Dynamo project we also consider properties related to the platform the user is using: viewing a program guide on a TV with a large screen, or on a handheld machine with a small screen, may make a difference for the presentation, just as the bandwidth available for sending the data. Another kind of intelligence is the set of general user preferences: knowing that a user prefers a guided tour over an index enables a better adaptation to the user.
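The layering of these sources of intelligence can be illustrated with a small Python sketch. It is not taken from the HERA software: the function name `choose_access_structure`, the threshold of 20 records, and the precedence order (general heuristic, then general user preference, then explicit ad-hoc directive) are assumptions made for the example.

```python
from typing import Optional

# Hypothetical names for two of the inter-record access structures.
INDEX = "index"
GUIDED_TOUR = "guided tour"


def choose_access_structure(
    record_count: int,
    user_preference: Optional[str] = None,
    user_directive: Optional[str] = None,
) -> str:
    """Pick an access structure by layering the sources of intelligence.

    Assumed precedence, lowest to highest:
    general heuristic < general user preference < explicit ad-hoc directive.
    """
    # General heuristic (assumed rule): small results are toured, large ones indexed.
    choice = GUIDED_TOUR if record_count <= 20 else INDEX
    # General user preferences adapt the default to the user.
    if user_preference is not None:
        choice = user_preference
    # An explicit ad-hoc directive overrides everything else.
    if user_directive is not None:
        choice = user_directive
    return choice
```

In this sketch a user who is offered an index for a large result can still obtain a guided tour by passing it as `user_directive`.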

[Figure: generation of the presentation for an ad-hoc query. Labels: ad-hoc query; interpret; related standard query; specific query details; get; standard presentations; presentation for standard query; reuse; generate; heuristics; standard design; ad-hoc design; presentation for ad-hoc query.]

In this specific approach, the heuristics represent the “best practice” in generating hypermedia presentations. As an example, the automatic design process wants to reuse as much as possible from the result of the manual presentation design process for some standard queries. Assuming that a human designer has carefully designed a hypermedia presentation for the contents of each of the base tables, the generation process can treat an ad-hoc query on a base table as a minor deviation of that table, and base its generation on the presentation available for that base table.
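The reuse of a hand-designed presentation for a base table might be sketched as follows. This is an illustrative assumption, not the HERA heuristics themselves: the `Presentation` type, the `STANDARD_PRESENTATIONS` registry, and the rule for ordering fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Presentation:
    """A hand-designed presentation: which fields to show and how to reach records."""
    fields: List[str]
    access: str  # e.g. "index" or "guided tour"


# Hypothetical registry of presentations designed by hand, one per base table.
STANDARD_PRESENTATIONS: Dict[str, Presentation] = {
    "house": Presentation(fields=["address", "price", "picture"], access="index"),
}


def derive_presentation(table: str, selected: List[str]) -> Presentation:
    """Treat an ad-hoc query on a base table as a minor deviation of that table."""
    base = STANDARD_PRESENTATIONS[table]
    # Keep the designer's field order, restricted to the attributes actually queried.
    kept = [f for f in base.fields if f in selected]
    # Attributes the designer did not foresee are appended at the end.
    extra = [f for f in selected if f not in base.fields]
    return Presentation(fields=kept + extra, access=base.access)
```

The derived presentation inherits the access structure of the base table and only adjusts the set of fields.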

NAVIGATION DESIGN

Navigation structures form the most characteristic aspect of a hypermedia application. DHymE (Bieber 1998) shows the role that navigation structures can play in adding value to a legacy application. The associated Relationship-Navigation Analysis method (RNA) uses a rich taxonomy of relationship types to assure that the analysis process results in an appropriate set of relationships. Subsequently, a set of navigation structures is chosen to “implement” the relationships, e.g. hyperlinks on the World Wide Web.

In our target applications we are dealing with data resulting from queries on relational databases. In the process of generating a hypermedia presentation for those data, we add hyperlinks to those data. This leads to a hypermedia presentation where the nodes basically represent the data elements, in this case the records, and where the hyperlinks between the nodes represent the structural relationships between those data elements. The motivation for constructing such a hypermedia representation is to increase the ease with which users can access the query result: in addition to the straightforward set-of-records presentation other structural relationships between the data elements can be made accessible. Note that while RNA suggests a “very rich”, or even “complete” set of relationships, in our project we try to have the heuristics produce a “reasonable” set of relationships.

We use RMM’s data model (Isakowitz et al. 1998) to express the (structural) relationships between the data elements. Its Application Diagrams use the concept of slice (m-slice) to specify the different mechanisms available for connecting the data elements and thus offering the users navigational access to those data elements. The navigation structures that we typically encounter are a combination of access routines that relate records to each other, the inter-record navigation, and access routines that relate parts of a single record to each other, the intra-record navigation.

Based on RMM the mechanisms and tools available for inter-record navigation are: index, guided tour, and indexed guided tour. An index is used to access records by referring to some key identifier. By choosing an identifier from the index the user asks the system to navigate to the corresponding record. Typically the index is composed of words or icons that naturally identify the associated record. A guided tour is used to access records in a given order. The records are connected in a chain-like manner, and the user can follow that chain and thus access the different records. An indexed guided tour is a combination of an index and a guided tour, where the user has both options to choose from: navigating using the identifiers in the index, or following the predefined path of the guided tour.
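The three inter-record mechanisms can be expressed compactly in code. The following Python sketch is only an illustration under assumed names (`build_index`, `tour_neighbours`, `indexed_guided_tour`); records are represented as plain dictionaries, which is an assumption of the example.

```python
from typing import Dict, List, Optional, Tuple


def build_index(records: List[dict], key: str) -> Dict[str, int]:
    """Index: map each key identifier to the position of its record."""
    return {str(r[key]): i for i, r in enumerate(records)}


def tour_neighbours(position: int, size: int) -> Tuple[Optional[int], Optional[int]]:
    """Guided tour: previous and next positions in the chain (None at the ends)."""
    prev_pos = position - 1 if position > 0 else None
    next_pos = position + 1 if position < size - 1 else None
    return prev_pos, next_pos


def indexed_guided_tour(records: List[dict], key: str, position: int) -> dict:
    """Indexed guided tour: offer both the index and the tour links."""
    prev_pos, next_pos = tour_neighbours(position, len(records))
    return {"index": build_index(records, key), "previous": prev_pos, "next": next_pos}
```

For a result of three records, the middle record gets both a previous and a next link, while the index still allows a direct jump to any record.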

In comparison to RMM, there is one special mechanism. The user is able to specify how records are displayed on pages in the hypermedia environment. In practice it is often convenient to display a number of records on one page, if the size of the representation of the records allows this.

Turning to the intra-record navigation, we use RMM’s slices as a way to divide the presentation of a data element, e.g. a record, into multiple parts such that each part can be presented in a natural and easily comprehensible manner. The presentation of all properties of one element on a single page is often not feasible. Each slice offers the user a view on a part of the element, typically including a number of attributes, but possibly also some relationships with other slices. In order to move the view towards other parts of the element, there is a link structure available that connects the different slices of an element. These slice links form the intra-record link structure, which is orthogonal to the inter-record connections.
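A slice and its links to sibling slices might look as follows in a minimal Python sketch. The slice names, the attribute lists in `HOUSE_SLICES`, and the function `view_slice` are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, List

# Hypothetical slice definitions: slice name -> attributes shown in that slice.
HOUSE_SLICES: Dict[str, List[str]] = {
    "summary": ["address", "price"],
    "media": ["picture"],
}


def view_slice(record: dict, slices: Dict[str, List[str]], current: str) -> dict:
    """Return the attributes of the current slice plus links to the other slices."""
    return {
        "data": {attr: record[attr] for attr in slices[current]},
        # Slice links: every other slice of the same record.
        "links": [name for name in slices if name != current],
    }
```

Because the links only name other slices of the same record, this structure is independent of whichever inter-record mechanism is in use.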

SOFTWARE ARCHITECTURE

The HERA software environment has two main functions. First, it needs to interpret a user query to retrieve the required data. Next, it needs to present these data using hypermedia functionality. For this second function the general intelligence (heuristics) and the explicit, ad-hoc user directives are exploited to generate a hypermedia presentation. Therefore, we have technically conceived our system as two relatively independent parts, a data manager and a presentation manager.

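[Figure: the HERA architecture, with a presentation manager between the browser and a data manager, and the data manager in front of the database. Labels: browser; database; presentation manager; data manager; browsing; question; query interpretation; query management; session management; presentation generation; data management; query; data; presentation; retrieval; session info; ad-hoc presentation requirements.]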

Presentation Manager

The main task of the presentation manager is to take care of the display of data, hence its name. It presents data to the user, more specifically to the user’s browser (in standard HTML pages and forms). In order to do so, it has to generate a hypermedia presentation for the data to be presented. This means that the presentation manager receives data to be displayed from the data manager, it generates a hypermedia presentation for those data on the basis of the appropriate intelligence, and it sends this presentation to the user’s browser. This intelligence includes the general heuristics, for example to derive new navigation structures, and the ad-hoc user directives, for example to explicitly override general design suggestions. Also, the actual layout and styling of the presentations are part of the presentation manager’s task.

Secondly, as the presentation manager communicates with the user’s browser, it is also made responsible for the translation of the user input (in the form of HTTP requests) into formal requests for the data manager. The motivation is that the presentation manager handles all communication between the user and the underlying system, while the user is browsing the hypermedia presentation generated for one query result, i.e. during one browsing session. This implies that the presentation manager is not only responsible for the processing of the initially asked question, but that during the browsing session it is also responsible for the processing of user actions (e.g. mouse clicks) that lead to the presentation of other parts of the same query result. To translate this user input into requests to the data manager the presentation manager should know what the state is of the browsing session, what the original query has been and what part of the data the user is viewing.
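The translation of browser input into formal requests could be sketched as below. The paper does not specify the HTTP parameter names, so `query`, `slice`, `record`, and `level` are assumptions of this example, as is the dictionary representation of a request.

```python
from urllib.parse import parse_qs


def translate_request(query_string: str) -> dict:
    """Translate browser input into a formal request for the data manager."""
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    # Hypothetical parameter names: 'query', 'slice', 'record', 'level'.
    level = int(params.get("level", 0))
    if "query" in params:
        # A new question starts a session.
        return {"type": "retrieval", "query": params["query"]}
    if "slice" in params:
        # Another slice of the current record.
        return {"type": "slice", "level": level, "slice": params["slice"]}
    if "record" in params:
        # The current slice of another record.
        return {"type": "record", "level": level, "choice": int(params["record"])}
    raise ValueError("unrecognized request: " + query_string)
```

A click on a slice menu entry would then arrive as something like `slice=media&level=1` and leave the presentation manager as a slice request.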

Data Manager

The data manager, in turn, translates the formal requests it receives from the presentation manager into queries to the underlying database. So, the data manager communicates with the DBMS of the underlying database and asks this DBMS to retrieve data from its database. This communication involves standard database queries and responses, using JDBC. Subsequently, the data manager delivers the output from the database to the presentation manager.

The output delivered to the presentation manager represents sets of records retrieved from the database. In correspondence with our use of the slice as the basic unit of data, the output is organized in terms of slices. This implies that the output delivered to the presentation manager contains slice data (for the actual data), lists of slice names (for the links to other slices) and index data (for the inter-record navigation). As the communication between data manager and presentation manager is realized at the level of slices, actual data values are transferred in terms of slices. The data manager maintains a cache of retrieved data from which it serves the slices the users can ask for.
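A slice-oriented data manager with a cache can be sketched in a few lines. The example below uses Python's `sqlite3` module as a stand-in for the JDBC connection mentioned above; the `house` table, the `SLICES` definitions, and the `DataManager` class are hypothetical and not the HERA implementation.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical slice definitions for a 'house' table: slice name -> columns.
SLICES: Dict[str, List[str]] = {"summary": ["address", "price"], "media": ["picture"]}


class DataManager:
    """Serves slices from a relational database and caches what it has retrieved."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection
        self.cache: Dict[Tuple[int, str], dict] = {}
        self.queries_run = 0  # counts slice queries sent to the database

    def index(self) -> List[int]:
        """Index data for inter-record navigation: the keys of all records."""
        rows = self.connection.execute("SELECT id FROM house ORDER BY id")
        return [row[0] for row in rows]

    def get_slice(self, record_id: int, slice_name: str) -> dict:
        """Return slice data plus the names of the record's other slices."""
        key = (record_id, slice_name)
        if key not in self.cache:
            columns = SLICES[slice_name]  # column names come from SLICES, not from users
            sql = "SELECT {} FROM house WHERE id = ?".format(", ".join(columns))
            row = self.connection.execute(sql, (record_id,)).fetchone()
            self.queries_run += 1
            self.cache[key] = dict(zip(columns, row))
        others = [name for name in SLICES if name != slice_name]
        return {"data": self.cache[key], "slices": others}
```

Asking twice for the same slice of the same record results in a single database query; the second request is answered from the cache.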

PRESENTATION

The data displayed by the presentation manager is arranged in frames. A frame typically contains subframes for the actual slice contents, the inter-record navigation and the intra-record navigation.

• The first subframe is the data frame. It displays data of the current record. This frame is used for the inspection of the actual data by the user.

• The second subframe is the inter-record navigation frame. It presents navigation mechanisms to allow the user to navigate to specific records. These records are the result of the user’s query, and together they contain the information the user has asked for. This subframe is meant to help the user access records from that result set, for example with an index showing a list (set) of the records. This inter-record navigation subframe can also contain next and previous buttons to realize a guided tour, if that is the chosen inter-record access structure. As an option (depending on the definitions in the heuristics) an index can be made conditional in the sense that it will not be shown when it contains too few items.

• The third subframe, the intra-record navigation subframe, shows a list (menu) of alternative slices (of the current record). It allows the user to navigate to those slices, thus accessing other parts of the same record. The use of the navigational links in this subframe is basically orthogonal to the use of links in the inter-record navigation frame: the user can navigate through the slices of a record independently from the navigation through the set of records.

The standard construction of three subframes applies to standard queries that ask for records from one single table. Part of our heuristics deals with queries, sometimes called join queries, where multiple tables are involved. Then, the heuristics result in more subframes:

• There is one data subframe per base record to display the data (slice) from that record.

• For every base record there is one intra-record navigation subframe (for navigating to its other slices).

• Since a base record can be involved in multiple “joined” records, there is a subframe that offers the navigation to associated base records (within the joined record). Usually, the heuristics propose the use of an indexed guided tour for this.

• In addition, there is one subframe for the inter-record navigation to access the complete set of joined records. Usually, the heuristics choose the preferred access mechanism of the “first” base table.
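The resulting frame layout can be summarized in a short Python sketch. The function `subframes` and its string labels are hypothetical; in particular, the sketch assumes that the subframe for associated base records occurs once per base table, which the description above leaves open.

```python
from typing import List


def subframes(base_tables: List[str]) -> List[str]:
    """List the subframes for a query over one table or a join of several."""
    if len(base_tables) == 1:
        # Standard construction: data, inter-record and intra-record navigation.
        return ["data", "inter-record navigation", "intra-record navigation"]
    frames = []
    for table in base_tables:
        frames.append("data: " + table)
        frames.append("intra-record navigation: " + table)
        # Navigation to associated base records within the joined record
        # (assumed here to occur once per base table).
        frames.append("associated records: " + table)
    # One frame to access the complete set of joined records.
    frames.append("inter-record navigation: joined records")
    return frames
```

Under these assumptions a join of two tables yields seven subframes instead of the standard three.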

Within each of these frames there are a number of fields (attributes), each representing a piece of data, that need to be arranged into a meaningful layout and presented in an adequate style. The presentation manager receives the relevant data from the data manager and then uses its own presentation intelligence to derive and produce a presentation format. It decides on the actual format for the presentations by considering for example the size of the data items and the number of slices to be presented.

SESSIONS

Since the presentation manager delivers only part of the query result directly to the browser, in the technical architecture the presentation manager at first only receives part of the query result (the “first” slice). Subsequently, on the basis of user interaction, it can ask the data manager for other parts of the query result (other slices). This so-called session management is also part of the responsibility of the presentation manager, since we chose to use the presentation manager for the direct communication with the users. Together with the browsers with which it communicates, the presentation manager maintains a context for each client. It is however only concerned with the actual contents of slices and indexes in so far as they influence their size and style of presentation. Note that since the system is required to concurrently serve an arbitrary number of users, it should keep user records as limited as possible. If it is necessary to store user dependent state information, it does so in a centrally coordinated manner.

The dialogue mentioned above applies whenever a user is browsing through the presentation generated for a query result. Now, we consider the entire session dialogue in more detail.

• When a user starts a session by specifying a question or query, the client sends it as an HTTP POST request to the presentation manager, which wraps it in an XML object that represents the session, and passes it to the data manager.

• The data manager retrieves the query, executes it, puts its response in the session object and returns it to the presentation manager.

• The presentation manager passes this session object back to the data manager with each request.

   o When a request involves a choice through an index for another record, the state of the session object is changed to indicate that the chosen record becomes the “current” record. The data manager returns, in the updated session object, the record’s contents, or rather the contents from the current slice of this record.

   o If the user chooses a different slice of the record, this choice will change the “current slice type” in the session state: this current slice type represents the slice that is considered the current one (for all records). The current record will be unchanged: the focus shifts to a different slice, but from the same record.

So, the elements sent from the presentation manager to the data manager are either (initial) queries, or requests for the same slice but from a different record (instance), or requests for a different slice from the same record.
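The two state changes described above can be made concrete with a small Python sketch. The `Session` class, its field names, and the default slice name `summary` are assumptions of this example; the actual session object in HERA is an XML document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """State of one browsing session, carried along with every request."""
    query: str
    record_keys: List[str] = field(default_factory=list)
    current_record: int = 0
    current_slice: str = "summary"  # hypothetical name of the "first" slice


def choose_record(session: Session, position: int) -> Session:
    """Record request: same slice type, different record."""
    if not 0 <= position < len(session.record_keys):
        raise IndexError("no such record in the query result")
    session.current_record = position
    return session


def choose_slice(session: Session, slice_name: str) -> Session:
    """Slice request: different slice; the current record stays unchanged."""
    session.current_slice = slice_name
    return session
```

Choosing a record leaves the current slice type untouched, and choosing a slice leaves the current record untouched, which mirrors the orthogonality of inter-record and intra-record navigation.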

The architecture has been developed in such a way that the data manager does not send the complete set of records contained in the query result: it sends only that portion of the query result that the presentation manager will display, while keeping the rest of the data in stock. So, the data manager decomposes the given query result into navigation information, i.e. lists of record keys for the indexes, and into subqueries that contain the data for the individual slices and that may be activated when asked for by the presentation manager. Each time the presentation manager requests a new slice instance, either the same slice of a new record or a new slice of the same record, the data manager adjusts the session data (stored in the session object) to reflect the state change. The data manager also contains a caching mechanism that deals with repetitive access to the same data. However, it does not itself keep track of the state of the individual sessions in which the system is involved: that state travels in the session object.

DATA TYPES

There are four types of data involved in the communication between the presentation manager and the data manager:

1. Session object: The initial query is passed from the presentation manager to the data manager in a newly created session object, together with an indication that it concerns a new query. The session object contains a query string with placeholders for parameters whose value depends on choices made by the user. The values resulting from these choices are provided alongside the parameterized query specification. In addition to the query, this session object contains data representing the current state of the session, including:

• data on the record navigation (e.g. a frame containing an index)

• data on the contents (e.g. a value frame)

• data on the slice navigation (e.g. a slice menu)

• additional data in the case of a join query (dependent on the heuristics for join queries)

In addition, it indicates which record is the currently chosen record (chosen from the index) and which slice is the currently displayed slice (chosen from the slice menu). For this object type an XML DTD has been defined that allows easy storage and retrieval on some background medium, as well as exchange over a network. Session objects are passed from presentation manager to data manager with each request.

2. Retrieval request: This request represents the start of a session. It indicates to the data manager that it should execute the query contained in the accompanying session object.

3. Slice request: An object of this type originates in the presentation manager and asks for a certain slice. Technically, it consists of a number to indicate the level of the request (to position the base record in a joined record) and the name of the slice type that is chosen at that level.

4. Record request: An object of the record request type is passed from the presentation manager to the data manager, asking for a certain record. Technically, it contains an indication of the position of the base record concerned and a number indicating which of the alternatives for that record has been chosen.
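To give an impression of how a session object could be exchanged as XML, the following Python sketch serializes and reads back a minimal session. The DTD itself is not reproduced in this paper, so the element and attribute names used here (`session`, `query`, `param`, `state`, `currentRecord`, `currentSlice`) are purely hypothetical.

```python
import xml.etree.ElementTree as ET


def session_to_xml(query: str, params: dict, current_record: int, current_slice: str) -> str:
    """Serialize a session object; element and attribute names are hypothetical."""
    root = ET.Element("session", {"new": "true"})
    # Query string with placeholders; the chosen values travel alongside it.
    ET.SubElement(root, "query").text = query
    for name, value in params.items():
        ET.SubElement(root, "param", {"name": name}).text = str(value)
    state = ET.SubElement(root, "state")
    ET.SubElement(state, "currentRecord").text = str(current_record)
    ET.SubElement(state, "currentSlice").text = current_slice
    return ET.tostring(root, encoding="unicode")


def xml_to_state(document: str) -> tuple:
    """Read the current record and slice back from a serialized session."""
    root = ET.fromstring(document)
    state = root.find("state")
    return int(state.findtext("currentRecord")), state.findtext("currentSlice")
```

Because the whole state is contained in the document, either manager can reconstruct the current record and slice from the session object alone.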

CONCLUSION

In this paper we have discussed the architecture of the HERA tool for the generation of hypermedia applications for data from a relational database. The tool supports a generation process that involves the use of different kinds of intelligence to adapt the presentation to the circumstances. We have shown the central issues in this approach and the way in which the software architecture is designed to support this. The HERA system architecture primarily contains two parts, a data manager and a presentation manager, and we have indicated how the functionality of the system is distributed over these parts and how they interact. Specifically, we have shown what data are exchanged and in which order. The result is a flexible and reliable system that allows us to change parts and protocols very easily without compromising the functionality of the other parts. Since the main focus of this investigation concerns the use of these different kinds of intelligence, we need a system in which we can easily experiment with different sets of intelligence. The current software appears useful in evaluations of different versions of the generation process.

REFERENCES

(Bieber 1998) Bieber, M. 1998. “Hypertext and Web Engineering.” In Proc. ACM Hypertext ’98, ACM, 277-278.

(Díaz et al. 1995) Díaz, A., T. Isakowitz, V. Maiorana and G. Gilabert. 1995. “RMC: A Tool to Design WWW Applications.” In Proc. 4th International World Wide Web Conference, 559-566.

(Elmasri and Navate 1990) Elmasri, R. and S. Navate. 1990. Fundamentals of Database Systems, second edition. Benjamin/Cummings Publishing Company.

(Garzotto et al. 1991) Garzotto, F., P. Paolini and D. Schwabe. 1991. “HDM: A Model for the Design of Hypertext Applications.” In Proc. ACM Hypertext ’91 Conference, ACM, 313-328.

(Houben and De Bra 1997) Houben, G.J. and P. De Bra. 1997. “World Wide Web Presentations for Volatile Hypermedia Database Output.” In Proc. WebNet97, the World Conference of the WWW, Internet, and Intranet, AACE, 229-234.

(Houben and De Bra 1999) Houben, G.J. and P. De Bra. 1999. “Retrieval of Volatile Database Output through Hypermedia Applications.” In Proc. 32nd Hawaii International Conference on Systems Sciences, IEEE Computer Society, CD.

(Isakowitz et al. 1995) Isakowitz, T., E. Stohr and P. Balasubramanian. 1995. “RMM: A Methodology for Structured Hypermedia Design.” Comm. of the ACM, 38, no. 8: 34-44.

(Isakowitz et al. 1998) Isakowitz, T., A. Kamis and M. Koufaris. 1998. “The Extended RMM Methodology for Web Publishing.” Working Paper IS-98-18, NYU, Center for Information-Intensive Systems.

(Van Doorn et al. 1999) Van Doorn, M., J. Gielen and W. Ten Kate. 1999. “WebEPG: Applying XML technology to tailor TV program information presentation”. Submitted to Ninth