
MISTI – An Integrated Web Content Management System

Qiang Lin, Ph.D


Abstract-- The Multi-Industry Supply-chain Transaction Infrastructure (MISTI) has been developed to facilitate today's business-to-business (B2B) e-commerce [1]. Its architecture seamlessly integrates three key components: a Web content extraction tool, a Web Content Management (WCM) database, and a controlled content delivery Web mechanism. This article introduces the MISTI architecture and each of its three key components, focusing on MISTI's design principles rather than on implementation details. It also discusses lessons learned to help guide future research and development efforts.

Index Terms—WCM, MISTI, and B2B.

I. INTRODUCTION

The growth of the Internet and the increasing sophistication of Web-based tools have changed the purpose and production of Web sites. Traditional content management systems include airline reservation systems and hotel chain management systems. In contrast, an efficient and capable WCM system deployed on today's e-commerce Web sites must handle an explosive amount of information distributed across the Internet.

Fig. 1 WCM Tasks

This work was originally funded by the U.S. Defense Advanced Research Projects Agency.

Q. Lin is with Science Applications International Corporation, 5180 Parkstone Drive, Mail Stop T1-7-3, Chantilly, Virginia USA 20151 (telephone: 703-761-4033 ext 162).

ICITA2002 ISBN: 1-86467-114-9

Content management can be defined as a combination of well-defined roles, formal processes, and supporting system architecture that together help an organization contribute, collaborate on, and control information assets [2]. The key content management tasks, illustrated in Fig. 1, are contribution of assets, collaboration across organizations, and delivery control.

MISTI has been developed to implement all three key tasks shown in Fig. 1. Its goal is to let customers and suppliers in any industry use the Web to publish and search for product information, and so establish supply chains. MISTI initially focused on the missile industry, but it can be deployed to support B2B e-commerce in general.

The remainder of this article introduces the MISTI architecture, including its Web content extraction tool, content management database, and controlled content delivery Web mechanism. It then discusses lessons learned to help future research and development efforts.

II. MISTI ARCHITECTURE

Fig. 2 shows the MISTI architecture. Each Web site or data source in Fig. 2 is managed by its own supplier, independently of the others, and is not managed by MISTI. MISTI acts as a bridge among the suppliers' Web sites and holds meta-information about each supplier's products. Together, these Web sites and data sources contribute assets to MISTI.

Fig. 2 Architecture of MISTI

Fig. 3 gives a closer view of the MISTI architecture. It shows the major components of MISTI, which are the building blocks for collaborating across suppliers and for controlling the delivery of product information to the intended customers.


[Fig. 2 labels: Web Site/Data Source 1 to N; Internet; MISTI Database Server; MISTI Web Server; Web Users 1 to 3]


MISTI comprises two groups of components:

1. Web server components, including the MISTI Universal Commerce Language and Protocol (UCLP) page generator, the Digital Equipment Corporation AltaVista server, Microsoft Active Server Pages (ASP), and the IONA Common Object Request Broker Architecture (CORBA) server.

2. Database server components, including the MISTI UCLP parser, the MISTI ontology, and meta-information about each supplier's products.

Fig. 3 Major Components of MISTI

The following sections discuss each of these components in more detail, organized by the three WCM tasks.

III. CONTRIBUTE ASSETS

Contributing assets to MISTI involves two processes:

1. Creating UCLP pages with the UCLP page generator, and

2. Crawling the UCLP pages and storing the crawled information with AltaVista.

A. UCLP Page Generator

For each participating supplier to contribute assets (that is, product information) to MISTI, a universal language is needed. It must explicitly express semi-structured product information such as organization name, product name, part number, and available quantity. The MISTI UCLP serves as this language and is similar to today's Extensible Markup Language (XML) [3]. Fig. 4 illustrates the basic concepts of the MISTI UCLP.

Fig. 4 UCLP Basic Concepts

Therefore, any supplier that wishes to contribute its product information to MISTI must use the MISTI UCLP page generator to create a UCLP page. A UCLP page resembles a HyperText Markup Language (HTML) page, with UCLP tags embedded alongside the standard HTML tags. The generator provides a set of standard templates that guide the supplier in building new UCLP pages or modifying existing ones. It is driven by the predefined UCLP tags and by the MISTI ontology stored in the MISTI database.

B. AltaVista

AltaVista is a well-known Web search engine. It includes three parts:

• an automatic crawler that collects information from given Web pages,
• a database that stores and indexes the crawled information, and
• a program that searches and ranks the stored information.

This makes AltaVista a natural tool for MISTI to crawl, periodically, the registered suppliers' Web sites that host UCLP pages. However, the commercial AltaVista engine had to be tailored to recognize the UCLP tags embedded in those pages.
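Tailoring the crawler comes down to picking UCLP tags out of otherwise ordinary HTML. The sketch below uses Python's standard `html.parser` as a stand-in for the customized AltaVista crawler; it is not how AltaVista itself was modified. Only the UC tag is documented in this article, so the tag set `UCLP_TAGS`, the helper names, and the sample page are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: "uc" is the only UCLP tag name documented in the article; a real
# tailored crawler would recognize the full (hypothetical here) UCLP tag set.
UCLP_TAGS = {"uc"}


class UCLPTagExtractor(HTMLParser):
    """Collect UCLP tags embedded among ordinary HTML tags."""

    def __init__(self):
        super().__init__()
        self.uclp_tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, so <UC ...> arrives as "uc".
        if tag in UCLP_TAGS:
            self.uclp_tags.append((tag, dict(attrs)))


def extract_uclp_tags(page_html):
    """Return (tag, attributes) pairs for every UCLP tag found in one page."""
    parser = UCLPTagExtractor()
    parser.feed(page_html)
    parser.close()
    return parser.uclp_tags


# Hypothetical supplier page: standard HTML with one embedded UC tag.
page = """<html><body>
<h1>Example Supplier</h1>
<UC domain="products" version="2.0" class="electronics/power supply/unregulated" status="obsolete">
<p>Part 1234</p>
</body></html>"""

print(extract_uclp_tags(page))
```

Ordinary HTML tags such as `<h1>` and `<p>` pass through untouched. The crawler only records the UCLP tags it was taught to recognize.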

C. CORBA

The main purpose of CORBA is to locate and access geographically distributed MISTI databases when more than one MISTI database is deployed for large-scale e-commerce applications. CORBA automates many common network programming tasks [4]:

• object registration, location, and activation;
• request demultiplexing;
• framing and error handling;
• parameter marshalling and demarshalling; and
• operation dispatching.

Integrating CORBA into the MISTI architecture therefore allows MISTI to scale to real-world WCM deployments.

D. ASP

ASP is an open, compile-free application environment in which one can combine HTML, scripts, and reusable ActiveX server components to create dynamic and powerful Web-based business solutions. Active Server Pages enables server-side scripting for Microsoft Internet Information Server (IIS), with native support for both VBScript and JScript [5]. ASP serves as the Web interface between MISTI's customers and suppliers on one side and the MISTI database on the other, which makes MISTI a true WCM system.

IV. COLLABORATE ACROSS ORGANIZATIONS

Because MISTI is designed to collect product information from many suppliers in different industries, it must be able to collaborate across them. MISTI relies on its ontology to ensure that no supplier's information is overwritten. This matters when MISTI processes incoming UCLP pages and stores the parsed UCLP information in the MISTI database.

A. MISTI Ontology

The MISTI ontology starts with independent domains as its roots. Under each domain, a set of classes and subclasses forms a tree structure. Each class or subclass has its own property, that is, its own set of attributes. Both IS-A and COMPOSED-OF relationships are allowed in the tree. UCLP was developed to support the MISTI ontology closely. For example, a battery can be classified under the domain "Product" with the class path "electronics/power supply/unregulated/battery".

The MISTI ontology is stored in the MISTI database and can be updated and modified as needed. As mentioned earlier, the UCLP page generator must work exactly from the information provided by the MISTI ontology and stay in step with it. Otherwise, product information stored under a particular class node may become an orphan if that node is deleted or moved to a different part of the tree. Because its classes carry their own properties and are linked by IS-A and COMPOSED-OF relationships, the MISTI ontology can be regarded as an object-oriented tree structure.
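The following is a minimal sketch of the ontology as a tree of domains and classes, together with the orphan problem described above. The names `Ontology`, `OntologyNode`, and `orphans` and the attribute names are hypothetical. Only IS-A links are modeled; COMPOSED-OF links are left out for brevity.

```python
class OntologyNode:
    """One domain or class node; `attributes` is the class's own property."""

    def __init__(self, name, attributes=()):
        self.name = name
        self.attributes = list(attributes)
        self.children = {}  # IS-A subclasses, keyed by name (COMPOSED-OF omitted)


class Ontology:
    """Hypothetical in-memory ontology: independent domains are the roots."""

    def __init__(self):
        self.domains = {}

    def add_class(self, domain, path, attributes=()):
        """Create (or update) the class at `path`, creating intermediate classes."""
        node = self.domains.setdefault(domain, OntologyNode(domain))
        for name in path.split("/"):
            node = node.children.setdefault(name, OntologyNode(name))
        node.attributes = list(attributes)
        return node

    def find(self, domain, path):
        """Return the node at `path`, or None if any step of the path is missing."""
        node = self.domains.get(domain)
        for name in path.split("/"):
            if node is None:
                return None
            node = node.children.get(name)
        return node

    def delete(self, domain, path):
        """Remove the class at `path` together with its whole subtree."""
        parent_path, _, leaf = path.rpartition("/")
        parent = self.find(domain, parent_path) if parent_path else self.domains.get(domain)
        if parent is not None:
            parent.children.pop(leaf, None)


def orphans(ontology, products):
    """Products whose (domain, class path) no longer resolves to an ontology node."""
    return [p for p in products if ontology.find(p["domain"], p["class"]) is None]


onto = Ontology()
onto.add_class("Product", "electronics/power supply/unregulated/battery", ["voltage", "chemistry"])
products = [{"part": "1234", "domain": "Product",
             "class": "electronics/power supply/unregulated/battery"}]

print(orphans(onto, products))  # [] while the class exists
onto.delete("Product", "electronics/power supply/unregulated")
print(orphans(onto, products))  # now reports the battery record
```

Deleting or moving an intermediate node such as "unregulated" orphans every product stored anywhere below it, not just products of that exact class.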

B. UCLP Parser

The product information crawled into MISTI by AltaVista is ultimately stored in the MISTI database. The UCLP parser processes each UCLP tag in turn. It extracts the information from the tag and stores it as an attribute value of a class instance, following the domain and class hierarchy specified in each UCLP page. The domain and class information is given in the UCLP UC tag, in the following format:

<UC domain=<domain> version=<version> class=<class> [status=<status>] [privacy=<privacy>]>

An example of a UC tag is:

<UC domain="products" version="2.0" class="electronics/power supply/unregulated" status="obsolete">
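Below is a sketch of how a parser might check a UC tag against the format above and split its class path. It takes the tag's attributes as a dictionary, as an HTML parser would produce them. The function name `parse_uc_attributes` and the decision to reject attributes other than the five listed are assumptions, not documented UCLP behavior.

```python
REQUIRED = ("domain", "version", "class")
OPTIONAL = ("status", "privacy")


def parse_uc_attributes(attrs):
    """Validate one UC tag's attributes and split its class path (hypothetical helper).

    Raises ValueError if a required attribute is missing or empty, or if an
    attribute outside the documented format appears (an assumed strictness rule).
    """
    missing = [name for name in REQUIRED if not attrs.get(name)]
    if missing:
        raise ValueError(f"UC tag missing required attribute(s): {', '.join(missing)}")
    unknown = set(attrs) - set(REQUIRED) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"UC tag has unknown attribute(s): {', '.join(sorted(unknown))}")
    return {
        "domain": attrs["domain"],
        "version": attrs["version"],
        "class_path": attrs["class"].split("/"),  # root-to-leaf class names
        "status": attrs.get("status"),            # optional
        "privacy": attrs.get("privacy"),          # optional
    }


record = parse_uc_attributes({
    "domain": "products",
    "version": "2.0",
    "class": "electronics/power supply/unregulated",
    "status": "obsolete",
})
print(record["class_path"])  # ['electronics', 'power supply', 'unregulated']
```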

C. Product Meta Information

The product information on a UCLP page is usually limited to items such as company name, product name, part number, contact phone number, and Web Uniform Resource Locator (URL). MISTI therefore treats this information as product meta information. After the UCLP parser processes it, this information is stored and indexed in the MISTI database tables. Compared with a file system, a database-backed CMS offers Web site developers and users several benefits, including [6]:

• Concurrency management and version control

• A consistent yet customizable look and feel throughout the site

• Access control and

• Fast and effective searches.

If a customer wants more detailed information about a product, such as its price or quantity discounts, the stored URL leads the customer to the supplier's own Web site to inquire further. Because MISTI holds only loosely coupled product meta information, its administration and security requirements are relaxed: each supplier manages and secures its own proprietary information. Each supplier can also publish product information selectively and change it whenever necessary.
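To illustrate the loosely coupled design, the sketch below stores only product meta information and returns the supplier's URL for everything else. SQLite (Python's built-in `sqlite3`) stands in for the ObjectStore and Oracle databases that MISTI actually used. The table name `product_meta`, its columns, the function `find_supplier_urls`, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the meta information a UCLP page typically carries.
conn.execute("""
    CREATE TABLE product_meta (
        company_name  TEXT NOT NULL,
        product_name  TEXT NOT NULL,
        part_number   TEXT NOT NULL,
        contact_phone TEXT,
        supplier_url  TEXT NOT NULL
    )""")
# Index the columns customers are expected to search on.
conn.execute("CREATE INDEX idx_part ON product_meta (part_number)")
conn.execute("CREATE INDEX idx_product ON product_meta (product_name)")

conn.executemany(
    "INSERT INTO product_meta VALUES (?, ?, ?, ?, ?)",
    [
        ("Acme Power", "9V battery", "AP-9V-01", "555-0100", "http://acme.example/batteries"),
        ("Volt Co", "AA battery", "VC-AA-02", "555-0101", "http://volt.example/aa"),
    ],
)


def find_supplier_urls(conn, part_number):
    """Return (company, supplier URL) pairs; price and other details stay on the supplier's site."""
    rows = conn.execute(
        "SELECT company_name, supplier_url FROM product_meta WHERE part_number = ?",
        (part_number,),
    )
    return rows.fetchall()


print(find_supplier_urls(conn, "AP-9V-01"))
```

Price and discount columns are absent on purpose. The supplier keeps that proprietary information, and the customer reaches it through `supplier_url`.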

The MISTI meta product information was originally stored in ObjectStore, an object-oriented database, because the MISTI ontology is based on an object-oriented tree structure. The latest version of MISTI uses an Oracle relational database instead, a change driven by new system requirements and lessons learned.

V. CONTROL DELIVERY

Any Web customer can visit the MISTI Web site and search with a list of keywords, such as a domain name, class name, or product name. The search can match any of the words, all of the words, or the exact phrase. The AltaVista search engine performs the actual search. Like any regular Web search engine, it returns a scored list of matching products from the MISTI database.
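The three keyword-matching modes can be illustrated with a small positional inverted index. This is a sketch of the general technique, not AltaVista's implementation, and it omits scoring. The functions `tokenize`, `build_index`, and `search` and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Positional inverted index: word -> {doc_id: [positions of the word]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def search(index, query, mode):
    """Return ids of documents matching `query` in mode 'any', 'all', or 'phrase'."""
    words = tokenize(query)
    if not words:
        return set()
    postings = [set(index.get(w, {})) for w in words]
    if mode == "any":
        return set().union(*postings)
    candidates = set.intersection(*postings)
    if mode == "all":
        return candidates
    if mode == "phrase":
        # Keep documents where the query words appear at consecutive positions.
        return {
            d for d in candidates
            if any(all(start + i in index[w][d] for i, w in enumerate(words))
                   for start in index[words[0]][d])
        }
    raise ValueError(f"unknown search mode: {mode!r}")


docs = {
    1: "unregulated power supply battery 9 volt",
    2: "regulated power supply",
    3: "battery charger with power cord",
}
idx = build_index(docs)
print(search(idx, "power battery", "any"))    # {1, 2, 3}
print(search(idx, "power battery", "all"))    # {1, 3}
print(search(idx, "power supply", "phrase"))  # {1, 2}
```

Storing word positions, and not just document ids, is what makes the exact-phrase mode possible. The "any" and "all" modes need only the document sets.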

A powerful and distinctive feature of MISTI is ranking. Suppose a customer is searching for a type of resistor. The ranking process works as follows:

1. The customer opens a rank template and fills in any number of criteria that describe the ideal resistor.
2. The customer clicks the "Rank" button.
3. MISTI ranks the filled-in template against the product meta information stored in the MISTI database, which in effect is fuzzy pattern matching. It is as if the customer had created a UCLP page on the fly and submitted it to MISTI to find matching preprocessed UCLP pages.
4. For each resistor that matches the criteria, MISTI retrieves its product meta information from the database. Each result carries a score ranging downward from 100%, together with the supplier's URL.
5. The customer can click on each matched product for more detailed information, or follow the supplier's URL to the supplier's Web site.
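The article does not specify the fuzzy matching algorithm behind ranking, so the sketch below substitutes a simple scheme. Each criterion is scored between 0 and 1: relative closeness for numbers, and `difflib.SequenceMatcher` similarity for text. A product's score is the mean of these values, expressed as a percentage. The functions `similarity` and `rank`, the attribute names, the parts, and the URLs are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(wanted, actual):
    """Similarity in [0, 1] between one template criterion and a stored value (assumed scheme)."""
    if actual is None:
        return 0.0  # the product does not record this attribute at all
    if isinstance(wanted, (int, float)) and isinstance(actual, (int, float)):
        # Relative closeness for numeric criteria: equal values score 1.0.
        scale = max(abs(wanted), abs(actual)) or 1.0
        return max(0.0, 1.0 - abs(wanted - actual) / scale)
    # Approximate string similarity for textual criteria.
    return SequenceMatcher(None, str(wanted).lower(), str(actual).lower()).ratio()


def rank(template, products):
    """Score each product against the filled-in template; best match first."""
    if not template:
        raise ValueError("the rank template needs at least one criterion")
    results = []
    for product in products:
        scores = [similarity(value, product.get(attr)) for attr, value in template.items()]
        percent = round(100 * sum(scores) / len(scores))
        results.append((percent, product["part_number"], product["url"]))
    return sorted(results, reverse=True)


# Hypothetical rank template and stored resistor meta information.
template = {"resistance_ohms": 1000, "tolerance_pct": 5, "package": "axial"}
products = [
    {"part_number": "R-1K2-1S", "resistance_ohms": 1200, "tolerance_pct": 1,
     "package": "smd", "url": "http://b.example/r1k2"},
    {"part_number": "R-1K-5A", "resistance_ohms": 1000, "tolerance_pct": 5,
     "package": "axial", "url": "http://a.example/r1k"},
]
for percent, part, url in rank(template, products):
    print(f"{percent:3d}%  {part}  {url}")
```

Under this scheme, a product that meets every criterion exactly scores 100%, and partial matches score proportionally lower. Because the template lists only the criteria the customer filled in, unspecified attributes do not affect the score.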

VI. LESSONS LEARNED

MISTI has had several major releases since 1997. Over these years, several major technologies have entered the mainstream. They include Java 2 Platform, Enterprise Edition (J2EE) application servers, JavaServer Pages (JSP), XML, and improved Oracle relational database management systems (RDBMS). Some technologies used in earlier MISTI releases therefore need to be reviewed in light of these new technologies. This section discusses several lessons learned from MISTI.

A. Database Technology

Because the MISTI ontology is object-oriented, MISTI was originally designed to store the meta product information in ObjectStore, an object-oriented database. ObjectStore proved easy to use with object-oriented languages such as C++ and Java, and its performance was very efficient. However, ObjectStore has several major drawbacks:


1. … world e-commerce applications.

2. Limited security – it does not provide adequate security for real-world e-commerce applications either.

3. Limited search ability – it does not offer efficient keyword search, so a separate AltaVista database had to be incorporated for keyword searches.

4. Fairly rigid – it cannot be used to hold the UCLP specification or other XML specifications, because these specifications are closer to a relational model. As a result, even a minor change to the UCLP specification becomes a major task, and applying a new XML specification to MISTI requires almost complete reprogramming.

Based on these experiences, the latest MISTI uses an Oracle relational database, which appears to have solved the ObjectStore problems listed above. MISTI has also added extra internal error checking through the Oracle database. However, using Oracle introduces two new issues:

1. C++ and Java can access the relational database tables only indirectly. A mapping layer must therefore be created between the Oracle relational database and the MISTI CORBA engine, which increases system complexity.

2. Overall system performance with Oracle has not matched that of ObjectStore, even after intensive Oracle database tuning. With the same hardware and software configuration, MISTI with ObjectStore can crawl and process about 50,000 UCLP pages in about 3.5 hours. MISTI with Oracle takes about 7 hours for the same set of pages. Part of the slowdown appears to come from the extra error checking, because each check involves executing an SQL statement.
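As a rough check on these figures, the reported times (which cover crawling and processing together) correspond to the following throughputs and per-page costs:

$$
\frac{50{,}000\ \text{pages}}{3.5 \times 3600\ \text{s}} \approx 3.97\ \text{pages/s (ObjectStore)}, \qquad
\frac{50{,}000\ \text{pages}}{7 \times 3600\ \text{s}} \approx 1.98\ \text{pages/s (Oracle)}
$$

$$
\frac{7 \times 3600\ \text{s}}{50{,}000} - \frac{3.5 \times 3600\ \text{s}}{50{,}000} = 0.504\ \text{s} - 0.252\ \text{s} \approx 0.25\ \text{s per page}
$$

The Oracle configuration therefore halves throughput, adding roughly a quarter of a second to each page.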

B. Data Consistency

As mentioned in previous sections, earlier releases of MISTI stored the crawled UCLP page information in two different databases:

1. The AltaVista inverted-index database, which holds page URLs and selected keywords so that AltaVista can perform regular keyword searches like any Web search engine.

2. The ObjectStore object database, which holds the parsed meta product information so that MISTI users can rank a set of parsed UCLP pages against their own criteria.

The MISTI ontology, stored in a flat file, was also a critical component of MISTI because:

1. Where the parsed meta product information is stored depends on the MISTI ontology.

2. The UCLP page generator also relies on it.

With this architecture, the two databases and the flat file must be kept consistent at all times, which is difficult and error-prone. The latest MISTI therefore changed its architecture substantially so that it uses a single Oracle database to:

• Store the MISTI ontology.

• Store UCLP parsing information, which can easily be changed when a new UCLP specification is released, or extended when a new XML specification is required.

• Store and search the key words.

• Store the meta product information for ranking.

By using a single Oracle database, MISTI has eliminated most potential data inconsistency problems. One issue remains when MISTI users want to generate a new UCLP page. The only way to guarantee data consistency is to have users connect directly to the MISTI Oracle database and access the live MISTI ontology. This prevents users from creating new UCLP pages when no database connection is available. It may also create a new bottleneck if too many users connect to the MISTI database at the same time.
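The sketch below shows why a single database helps. When the ontology and the product meta information live in the same database, a foreign key can stop a class from being deleted while products still reference it. Related changes can also be committed in one transaction. SQLite stands in for Oracle here; the tables `ontology_class` and `product_meta` and the helper `try_delete_class` are a hypothetical simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
# Hypothetical, simplified schema: ontology and meta information in one database.
conn.executescript("""
    CREATE TABLE ontology_class (
        class_id  INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES ontology_class(class_id),
        name      TEXT NOT NULL
    );
    CREATE TABLE product_meta (
        part_number  TEXT PRIMARY KEY,
        class_id     INTEGER NOT NULL REFERENCES ontology_class(class_id),
        keywords     TEXT,
        supplier_url TEXT
    );
""")

# One transaction: the ontology rows and the product row commit together or not at all.
with conn:
    conn.execute("INSERT INTO ontology_class VALUES (1, NULL, 'electronics')")
    conn.execute("INSERT INTO ontology_class VALUES (2, 1, 'battery')")
    conn.execute("INSERT INTO product_meta VALUES ('AP-9V-01', 2, 'battery 9 volt', "
                 "'http://acme.example/batteries')")


def try_delete_class(conn, class_id):
    """Delete an ontology class unless product meta information still refers to it."""
    try:
        with conn:
            conn.execute("DELETE FROM ontology_class WHERE class_id = ?", (class_id,))
        return True
    except sqlite3.IntegrityError:
        return False  # the delete was rolled back, so no product became an orphan


print(try_delete_class(conn, 2))  # False: AP-9V-01 still uses class 2
```

With separate stores (an index database, an object database, and a flat file), no single engine can enforce a rule like this across all three.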

C. Managing Distributed Data Sources

As mentioned in the previous section, ObjectStore has limited object capacity. To overcome this limit and scale to true e-commerce applications, more than one MISTI database is necessary. CORBA, the core technology employed in MISTI since 1997, was intended to access these geographically distributed MISTI databases. One key challenge of deploying multiple MISTI databases is creating and maintaining a universal MISTI ontology. That ontology must be usable across all MISTI databases while still distinguishing one MISTI database from another.

Now that the latest MISTI has moved from ObjectStore to Oracle, a single MISTI Oracle database can scale to store information for many different e-commerce applications. J2EE-based application servers are gaining popularity and offer more features than CORBA. MISTI can therefore benefit from deploying more than one J2EE-based application server connected to a single central Oracle database, and CORBA may be replaced by such servers in future versions of MISTI. J2EE-based application servers would also allow MISTI to use JSP instead of ASP, which provides better security and scalability. They would also let MISTI expand beyond Microsoft Windows to other popular operating systems, such as Unix or Linux.

D. Managing Ontology

The ontology is key to the success of MISTI. Before the latest MISTI moved it into the Oracle database, the ontology was stored and maintained in a flat file. Maintaining it, for example adding or deleting an entry, was therefore difficult and error-prone. Searching it could also be very slow, because a flat file does not support indexing. With Oracle database support, ontology management has become much easier and more efficient. However, several issues still need to be resolved for MISTI.

1) Class Naming

Naming a class seems simple, but it involves serious considerations. In an Oracle database, each class can be assigned a unique sequence number. This avoids confusion anywhere in the ontology and makes adding or deleting a class node trivial. However, when MISTI users create a new UCLP page based on the ontology, they need a meaningful class name, not just a decimal number. The MISTI ontology is a tree with many levels, so it is not feasible to build a class name simply by concatenating the names of all classes from the root node down. Each Oracle text field normally has a limited length (up to 8 Kbytes), so long concatenated names will sooner or later run out of space. On the other hand, class names that are too short may be duplicated within the ontology. Adding or deleting a class in the MISTI ontology may also invalidate an entire set of existing UCLP pages.

2) Managing Class Property

As discussed in previous sections, each MISTI class has its own property, that is, a set of attributes defined in the MISTI ontology. When a new UCLP page is created, those attributes serve as the available data fields for each class instance. If the class property later needs to change, for example by deleting or renaming an attribute, all existing pages associated with that class must be changed too. The meta product information stored in the MISTI database can be updated accordingly. The affected pages themselves, however, are distributed across many different suppliers' Web sites. Fixing them requires running programs on those Web sites based on the updated ontology, which involves detailed procedures and permissions. It also requires coordination, and even specific memoranda, between MISTI and all participating suppliers.
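The sketch below shows the first step of such a coordinated change: finding which supplier pages use an attribute that the new class property no longer has. The functions `property_change` and `affected_pages` and the page-to-attribute mapping are hypothetical. In practice, that mapping would come from the parsed meta product information.

```python
def property_change(old_attrs, new_attrs):
    """Split a class-property change into (removed, added) attribute names.

    A renamed attribute shows up as one removal plus one addition.
    """
    old, new = set(old_attrs), set(new_attrs)
    return old - new, new - old


def affected_pages(pages, removed):
    """URLs of supplier pages that still fill in an attribute that was removed.

    `pages` (hypothetical) maps a page URL to the attribute names its class instance uses.
    """
    return sorted(url for url, used in pages.items() if removed & set(used))


removed, added = property_change(["voltage", "chemistry", "weight"],
                                 ["voltage", "cell_chemistry", "weight"])
pages = {
    "http://acme.example/9v.html": ["voltage", "chemistry"],
    "http://volt.example/aa.html": ["voltage", "weight"],
}
print(removed, added)                  # {'chemistry'} {'cell_chemistry'}
print(affected_pages(pages, removed))  # ['http://acme.example/9v.html']
```

The resulting list tells MISTI which suppliers must be contacted. The database side can be updated centrally, but each listed page has to be fixed on the supplier's own site.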

3) Managing Inheritance

Creating and deleting classes in the MISTI ontology becomes more challenging when an Oracle database is used. Because each class may have its own property, a new child class should inherit its parent's property. This is trivial with an object-oriented database such as ObjectStore but difficult with a relational database such as Oracle. Both IS-A and COMPOSED-OF relationships are built into an object-oriented database, but they must be programmed explicitly in a relational database. This entails extra tasks, effort, and cost.
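One common way to implement IS-A inheritance in a relational database uses an adjacency list, in which each class row points to its parent. A recursive query then collects the attributes of every ancestor. The sketch below uses SQLite's `WITH RECURSIVE` as a stand-in for Oracle, which offers its own hierarchical queries. The schema, the helper `effective_attributes`, and the data are hypothetical, and COMPOSED-OF links would need a separate table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: parent_id encodes the IS-A link; attributes are stored per class.
conn.executescript("""
    CREATE TABLE ontology_class (
        class_id  INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES ontology_class(class_id),
        name      TEXT NOT NULL
    );
    CREATE TABLE class_attribute (
        class_id  INTEGER NOT NULL REFERENCES ontology_class(class_id),
        attr_name TEXT NOT NULL
    );
    INSERT INTO ontology_class VALUES
        (1, NULL, 'electronics'),
        (2, 1, 'power supply'),
        (3, 2, 'unregulated'),
        (4, 3, 'battery');
    INSERT INTO class_attribute VALUES
        (1, 'manufacturer'),
        (2, 'output_voltage'),
        (4, 'chemistry');
""")

# Walk from the class up to the root, then gather every ancestor's attributes,
# listing inherited attributes (furthest ancestor first) before the class's own.
INHERITED_ATTRIBUTES = """
    WITH RECURSIVE ancestors(class_id, depth) AS (
        SELECT class_id, 0 FROM ontology_class WHERE class_id = ?
        UNION ALL
        SELECT c.parent_id, a.depth + 1
        FROM ontology_class AS c JOIN ancestors AS a ON c.class_id = a.class_id
        WHERE c.parent_id IS NOT NULL
    )
    SELECT attr.attr_name
    FROM ancestors JOIN class_attribute AS attr ON attr.class_id = ancestors.class_id
    ORDER BY ancestors.depth DESC, attr.attr_name
"""


def effective_attributes(conn, class_id):
    """Own plus inherited attribute names for one class."""
    return [row[0] for row in conn.execute(INHERITED_ATTRIBUTES, (class_id,))]


print(effective_attributes(conn, 4))  # ['manufacturer', 'output_voltage', 'chemistry']
```

Inheritance here is computed at query time, so adding an attribute to a parent class immediately applies to all of its descendants. The trade-off is a recursive query on every lookup.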

VII. CONCLUSION

MISTI has evolved and improved significantly since it was started in 1997, and it can be regarded as a prototype of an integrated WCM system. Several pilot sites are testing different releases of MISTI, both ObjectStore-based and Oracle-based. The lessons learned leave many research topics open:

1. How to name a class properly, especially when a tree (sometimes called a mini-tree) is attached under a node of another tree.

2. How to implement inheritance in the MISTI ontology properly and efficiently when an Oracle database is used. Oracle has its own object extensions, such as object tables and object columns. However, Oracle database objects do not support inheritance.

3. Whether AltaVista is still needed. Given the powerful search capability of the Oracle database, AltaVista may become less useful. A much simpler crawler (spider) program could be tested and incorporated into MISTI to crawl the UCLP information from UCLP pages on product suppliers' Web sites.

4. How to architect and implement a new MISTI using J2EE-based application servers, especially with JSP as the Web interface.

5. How to incorporate commercial XML parsers into MISTI so that it can accommodate real-world e-commerce applications more flexibly.

REFERENCES

[1] "Welcome to the MISTI Web Site," SAIC, http://misti.apo.saic.com/default0.asp.

[2] "How do I manage my web content effectively," Intel, http://www.intel.com/eBusiness/business/manage/2/hi15039.htm.

[3] "Extensible Markup Language (XML)," W3C, http://www.w3.org/XML.

[4] Douglas C. Schmidt, "Overview of CORBA," Washington University in St. Louis, http://www.cs.wustl.edu/~schmidt/corba-overview.html.

[5] "An ASP You Can Grasp: The ABCs of Active Server Pages," Microsoft, http://msdn.microsoft.com/library/default.asp?URL=/library/en-us/dnasp/html/ASPover.asp.

[6] "Managing Web Content – From File System to Database," Oracle,
