Asset Locator –
A Framework for Enterprise Software Asset Management
Avi Yaeli, Alex Akilov, Sara Porat, Iftach Ragoler, Shlomit Shachor-Ifergan, Gabi Zodik
IBM, Haifa Labs, Haifa University, Mount Carmel, Haifa 31905, Israel
{aviy, akilov, porat, ragoler, shlomiti, zodik}@il.ibm.com
Abstract
This paper introduces the Enterprise Software Asset Management (ESAM) paradigm, which defines an approach to automated software asset management. ESAM is a comprehensive, integrated solution supporting search and reuse, collaboration, knowledge sharing, impact analysis, and other enterprise-centric services. We describe Asset Locator, a low-cost, scalable, and extensible solution that realizes ESAM. Asset Locator uses a set of autonomous, scheduled crawlers that scan enterprise repositories to discover development resources. A set of domain-specific analyzers processes the discovered resources by identifying and extracting semantic features. Powerful search and navigation engines enable clients to explore the analyzed information. The design of Asset Locator as an extensible framework has enabled its easy integration into several IBM product offerings.
1 Introduction
Now that the boom of the IT industry appears to be behind us, many organizations are reassessing their IT expenditures, searching for ways to realign their existing investments in software assets to serve their future IT needs. Yet, as pressures of time to market and efficient execution have not abated, this realignment should not come at the expense of such factors.
To address these needs effectively, software asset management tools will become increasingly important.
This paper introduces the Enterprise Software Asset Management (ESAM for short) paradigm, which defines an approach to automated software asset management. ESAM leverages the various software artifacts that comprise the applications running in the enterprise. These artifacts play roles throughout the software life-cycle, from design documents, source code, compiled code, and executables up to applications deployed on production servers. ESAM is introduced as a comprehensive, integrated solution that provides developers with a wide range of enterprise-centric services, including search and reuse, sharing and collaboration, impact analysis, and knowledge management (KM), among others. ESAM is based on an automated mechanism that integrates with existing development processes and systems to provide a low-cost, low-maintenance solution. An initial description of ESAM appeared in [1].
A fair number of research efforts and tools specialize in certain aspects of asset management. For example, ComponentSource ([2]), Component Manager ([3]), and Component Registry ([4]) cover aspects of component reuse and component stores. Others, such as Sourceforge ([5]), Alexandria ([6]), and CodeBeamer ([7]), provide a web-based collaborative environment for project management, often tightly integrated with CVS. Program understanding and impact analysis are addressed by CodeBeamer, Cast Application Miner ([8]), and SmallWorlds ([9]).
To the best of our knowledge, none of these tools attempts to provide a comprehensive and integrated solution such as ESAM. ESAM covers a wider range of asset types and domains, provides a broader set of asset management services, and can be easily integrated with existing tools and processes. The comprehensive and integrated nature of ESAM is a major differentiator in helping organizations adopt this paradigm to manage and control their IT investments.
Asset Locator is an extensible and scalable solution that has been developed at the IBM Haifa Research Lab as a first step towards achieving ESAM. One of the major services that Asset Locator provides is search and reuse. Though previous efforts to force a reuse philosophy on organizations have largely failed, a more pragmatic and automatic approach that facilitates reuse and collaboration may be the key to success in the future. If it is to meet organizational requirements without raising objections from managers and accountants, the solution must be cost-effective, unobtrusive, integrated with the day-to-day tools in use by the organization, and introduce little or no overhead. Asset Locator evolved from the eCollabra work presented in [10].
The rest of the paper is organized as follows: Section 2 describes the ESAM paradigm and its major phases. Section 3 describes the Asset Locator architecture and how it correlates with ESAM. Section 4 covers the extensible framework design of Asset Locator, and Section 5 presents how Asset Locator can be used to facilitate reuse. Finally, Section 6 summarizes our achievements and outlines future plans.
2 The Enterprise Software Asset Management Paradigm
Enterprise Software Asset Management (ESAM) is a paradigm for the management of software assets in the enterprise. ESAM leverages these assets in order to provide developers with search and reuse, collaboration, impact analysis, knowledge management, and other capabilities. ESAM is based on an automated mechanism which integrates with existing development processes and systems to provide a low-cost, low-maintenance solution.
At the heart of the ESAM paradigm lies a Central Repository capable of persisting and maintaining meta-data about the software artifacts that need to be managed. The Central Repository should be scalable enough to support the large number of assets that typically exist in the enterprise.
Based on the Central Repository, ESAM further defines the following major phases:
• Discovery - involves discovery of software assets from enterprise repositories and systems, and collection of relevant information from these repositories. This type of activity is complex because the assets may be distributed among multiple locations and exist in various formats. To make the discovery process effective, it must be performed with minimum user/administrator interaction. The process must also ensure that the Central Repository is kept in sync with the discovered domain as assets are constantly being added, deleted, and modified. If established organizational processes exist where users contribute additional assets or descriptive information (e.g., a component library), these should be collected as well.
• Analysis - entails extraction of textual and semantic features (meta-data) from the discovered assets. For ESAM to be effective, it is essential to properly identify the relevant meta-data for each asset type so that it can be leveraged for different uses (e.g., search, impact analysis). The meta-data and analysis results are persisted in the Central Repository.
• Repository Analysis - provides additional types of analysis that take into consideration a global view of the entire domain. This can only be performed after the Central Repository has been populated. Repository Analysis works on the meta-data that has been collected and can include finding relationships between assets, finding duplications, accumulating statistics, performing data mining, enforcing coding conventions, etc. The result of the Repository Analysis is also stored in the Central Repository. In some cases, user input may be required to complement the analysis results (e.g., due to static analysis limitations) and achieve the better precision that may be needed for certain services (described below).
• Service Implementation - encapsulates implementation of capabilities as services that leverage the information accumulated in the Central Repository. The services can be utilized by developers and development tools. For example, this may include a search interface for finding reusable code, navigation interfaces for exploring the enterprise repositories, etc. As users begin to work with the system, usage profiles and other runtime information, such as statistics, can be recorded in the Central Repository and leveraged for other types of activities. The Service Implementation must provide an API/Protocol to allow easy integration of the capabilities into existing tools and processes.
• Service Integration - entails integration of the implemented services into existing processes and tools such as IDEs, Portals, Knowledge Management systems, etc. This phase often includes implementation of client-side components that utilize the Service Implementation via the API/Protocol.
The following diagram illustrates the phases of the ESAM paradigm:
[Figure 1 (diagram not reproduced). Labels in the diagram: phases Discovery, Analysis, Repository Analysis, Service Implementation, Service Integration; Enterprise Repositories (File Systems, CM Systems, Runtime Servers); Information Source Discovery; Information Source Analysis; User Input; Central Repository; Dependency Analysis, Categorization, Metrics & Statistics; Search & Reuse, Navigation, Impact Analysis, KM & Skill Location, Sharing & Collaboration, Statistics; IDEs, Portals, KM Systems, Reuse Systems.]
Figure 1: ESAM Phases
3 Asset Locator Architecture
Asset Locator is a low-cost, low-maintenance, extensible, and scalable solution that has been developed at the IBM Haifa Research Lab as a first step towards achieving ESAM. Currently, Asset Locator mainly addresses aspects of search and reuse.
The Asset Locator Central Repository is implemented using the DB2 relational database and an index server that maintains the indexed textual information. The latter leverages the integrated DB2 Text Extender to provide powerful and flexible full-text linguistic search functionality.
Asset Locator consists of two major modules: the Information Gathering module, which implements the first three ESAM phases (Discovery, Analysis, and Repository Analysis), and the Service Provider module, which implements the remaining two ESAM phases (Service Implementation and Service Integration).
The following describes the special architectural issues and features in Asset Locator and how they correlate with the ESAM phases.
3.1 The Information Gathering Module
This module corresponds to the first three ESAM phases:
• Asset Locator Discovery - Asset Locator uses a set of autonomous crawlers that crawl enterprise repositories to discover files that play a role in the software development process (e.g., design documents, source files, test suites). The process is set up via configuration files that determine the crawling roots, that is, the seeds in the repositories from which crawlers begin to discover resources. No other user input is required at this stage. Asset Locator Discovery is scheduled to run automatically at predefined time slots, e.g., at night, and is incremental in nature. It identifies new development resources in the repositories and keeps previously crawled resources in sync with the crawled domain via time stamp comparison. Asset Locator ships with built-in Crawlers for file systems, for Configuration Management (CM) systems such as TeamConnection ([11]) and ClearCase ([12]), and with WebDAV ([13]) support for other common CM systems.
• Asset Locator Analysis - each file is recognized as corresponding to a specific type, usually identified by its extension or via content analysis. The Analysis phase is carried out by a set of Analyzers, each responsible for analyzing resources of a certain type. Each Analyzer extracts meta-data specific to the corresponding type of resource. For example, the Java Analyzer extracts class names, super classes/interfaces, defined/used methods, comments, etc. This lays the necessary foundations for the search engine, facilitating free-text search as well as search for semantic information. Asset Locator ships with built-in Analyzers for Java, Java bytecode, COBOL, C/C++, JSP, HTML, XML, J2EE artifacts (EAR, WAR, EJBJar, JSP Tag Library), and text files. However, it can be made to support a much wider set of resource types using the extension mechanism described in Section 4.
• Asset Locator Repository Analysis - the meta-data populated in the Asset Locator Central Repository is further analyzed in this phase. Asset Locator provides two basic repository analyses:
  • Categorization of development resources into predefined (Yahoo-like) domain taxonomies. Asset Locator ships with support for Java, HTML, and XML taxonomies. Main categories in the Java taxonomy include Applets, Data Structures, EJBs, I/O, GUI, Multimedia, Networking, Security, and Servlets.
  • Dependency Analysis that is performed against the entire Asset Locator Central Repository and discovers relationships between the resources, both between assets of the same type (e.g., hierarchy and reference relationships between Java classes) and between distinct types of resources (e.g., relationships between HTML/JSP, Servlets, and JSP Tag Libraries).
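The incremental discovery step described above (new resources found, existing ones kept in sync via time stamp comparison) can be illustrated with a minimal sketch. This is not Asset Locator code: it assumes a plain file-system repository, and the names `crawl`, `known`, `write_file`, and `demo` are hypothetical. The `known` dictionary stands in for the state that the Central Repository would hold between scheduled runs.

```python
import os
import tempfile


def crawl(roots, known):
    """Scan the crawling roots and compare against the previous run.

    `known` maps path -> modification time recorded by the last crawl
    (a stand-in for state kept in the Central Repository).
    Returns (new, modified, deleted) and updates `known` in place.
    """
    seen = {}
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                path = os.path.join(dirpath, filename)
                seen[path] = os.path.getmtime(path)

    new = sorted(p for p in seen if p not in known)
    # Time stamp comparison: a differing mtime marks the resource as modified.
    modified = sorted(p for p in seen if p in known and seen[p] != known[p])
    deleted = sorted(p for p in known if p not in seen)

    known.clear()
    known.update(seen)
    return new, modified, deleted


def write_file(path, mtime):
    """Create a file and pin its modification time (keeps the demo deterministic)."""
    with open(path, "w") as handle:
        handle.write("placeholder")
    os.utime(path, (mtime, mtime))


def demo():
    """Run two crawls over a temporary directory and report what each one found."""
    def names(paths):
        return [os.path.basename(p) for p in paths]

    with tempfile.TemporaryDirectory() as root:
        a, b, c = (os.path.join(root, n) for n in ("A.java", "B.java", "C.java"))
        write_file(a, 1000)
        write_file(b, 1000)
        known = {}
        first = tuple(names(group) for group in crawl([root], known))

        write_file(a, 2000)   # modified since the last crawl
        os.remove(b)          # deleted since the last crawl
        write_file(c, 2000)   # newly added
        second = tuple(names(group) for group in crawl([root], known))
    return first, second
```

In this sketch the first crawl reports every file as new, while the second reports only the differences, which is what makes a nightly scheduled run cheap on a mostly unchanged repository.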
The following picture outlines the architecture of the Information Gathering module.
[Figure 2 (diagram not reproduced). Labels in the diagram: Enterprise Repositories (File System, ClearCase, WebDAV, TC); Discovery with a scheduler and Crawlers (File System, ClearCase, WebDAV, TeamConnection); Analysis with Analyzers (Java, C++, COBOL, HTML, XML, JSP, Text, Java ClassFile, EAR, WAR, EJB, JSP Tag Libraries); Repository Analysis (Dependency Analysis, Categorization); Central Repository (DB2 database & Index Server).]
Figure 2: Information Gathering Module
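To make the Analysis phase concrete, the following sketch dispatches a resource to an analyzer by file extension and extracts a few of the Java features named above (class name, super class, interfaces, defined methods). It is only an illustration under strong assumptions: the regular expressions are a crude stand-in for a real Java parser, and the names `ANALYZERS`, `analyze`, `analyze_java`, `analyze_text`, and `SAMPLE` are hypothetical.

```python
import os
import re

# Crude patterns for illustration only; a real analyzer would parse the source.
CLASS_RE = re.compile(
    r"\bclass\s+(\w+)"
    r"(?:\s+extends\s+(\w+))?"
    r"(?:\s+implements\s+([\w\s,]+?))?\s*\{"
)
METHOD_RE = re.compile(
    r"\b(?:public|protected|private)\s+(?:static\s+)?[\w<>\[\]]+\s+(\w+)\s*\("
)


def analyze_java(text):
    """Extract semantic features from Java source text."""
    match = CLASS_RE.search(text)
    if match is None:
        return {"class": None, "extends": None, "implements": [], "methods": []}
    name, parent, interfaces = match.groups()
    return {
        "class": name,
        "extends": parent,
        "implements": [i.strip() for i in interfaces.split(",")] if interfaces else [],
        "methods": METHOD_RE.findall(text),
    }


def analyze_text(text):
    """Fallback analyzer for plain text: only free-text terms are extracted."""
    return {"terms": sorted(set(re.findall(r"\w+", text.lower())))}


# One analyzer per resource type, keyed by file extension.
ANALYZERS = {".java": analyze_java, ".txt": analyze_text}


def analyze(path, text):
    """Pick the analyzer by extension; return None for unsupported types."""
    extension = os.path.splitext(path)[1].lower()
    analyzer = ANALYZERS.get(extension)
    return analyzer(text) if analyzer else None


SAMPLE = """
public class Button extends Component implements Clickable, Serializable {
    public void addMouseListener(MouseListener l) { }
    private int count() { return 0; }
}
"""
```

Calling `analyze("Button.java", SAMPLE)` yields a feature dictionary that a search service could later query; supporting a new resource type amounts to adding one entry to `ANALYZERS`.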
3.2 The Service Provider Module
This module corresponds to the two ESAM phases that address services and their integration:
• Asset Locator Service Implementation - a semantic Search Service enables clients, whether users or applications, to invoke queries against the information that resides in the Asset Locator Central Repository. This service supports searching for assets using both linguistic free-text constraints and semantic constraints, which correspond to the semantic features that were extracted for each type of resource. Besides the Search Service, Asset Locator incorporates a Navigation Service that provides navigation capabilities over resource relationships and domain taxonomies, as discovered during the Repository Analysis phase. A set of Result Formatters is provided to support a variety of formats for presentation and integration of query results. Asset Locator ships with built-in Result Formatters for general XML, HTML, and SVG (Scalable Vector Graphics), and with portlet support to facilitate integration into Portal pages. Both services are implemented within a query server, using an open protocol over HTTP, thus enabling easy integration into client applications.
• Asset Locator Service Integration - Asset Locator services are integrated within a set of client systems:
  • An integrated view in the client component of TeamConnection, IBM’s CM system.
  • A web client, based on HTML, JavaScript, and SVG, that supports various repository search and exploration views (further presented in Section 5). An early version of this client has been integrated in IBM DeveloperWorks ([14]) to support crawling of IBM RedBook content.
  • An IDE-integrated Eclipse plugin that provides a natural incorporation of query results into the IDE editors. This implementation uses the general XML result format and specialized Eclipse graphical features.
  • A Portlet-based client that integrates with IBM WebSphere Portal Server.
The following picture outlines the architecture of the Service Provider module.
[Figure 3 (diagram not reproduced). Labels in the diagram: Central Repository (DB2 database & Index Server); Query Server; Service Implementation with Search (Semantic & Free-text), Navigation (Taxonomies, Dependencies), and Result Formatters (XML, HTML, SVG, Portlet); Service Integration with TeamConnection, WebSphere Portal Server, Web Browser, and Eclipse IDE.]
Figure 3: Service Provider Module
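The role of the Result Formatters can be sketched as a small registry that renders the same query results in different output formats. The element names, result fields, and function names below (`format_xml`, `format_html`, `FORMATTERS`, `format_results`, `RESULTS`) are assumptions made for illustration; the paper does not specify the actual XML schema or HTML layout.

```python
import html
import xml.etree.ElementTree as ET


def format_xml(results):
    """Render results as a generic XML document (hypothetical element names)."""
    root = ET.Element("results")
    for result in results:
        ET.SubElement(root, "asset", name=result["name"], type=result["type"])
    return ET.tostring(root, encoding="unicode")


def format_html(results):
    """Render results as an HTML list, escaping any markup in the values."""
    items = "".join(
        "<li>{}: {}</li>".format(html.escape(r["type"]), html.escape(r["name"]))
        for r in results
    )
    return "<ul>" + items + "</ul>"


# Registry of formatters keyed by the format a client asks for.
FORMATTERS = {"xml": format_xml, "html": format_html}


def format_results(results, fmt):
    """Look up the requested formatter and apply it to the query results."""
    try:
        formatter = FORMATTERS[fmt]
    except KeyError:
        raise ValueError("no formatter registered for " + repr(fmt))
    return formatter(results)


RESULTS = [
    {"name": "Stack<T>", "type": "java"},
    {"name": "index.html", "type": "html"},
]
```

Keeping formatting separate from query execution means that one query server can feed an IDE plugin (XML) and a web client (HTML) without the search logic knowing about either.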
4 Extensible Framework Design
In order to support the changing demands of customers (e.g. supporting analysis of additional asset types) and time-to-market pressures, Asset Locator has been designed as an extensible framework that would let us, as well as customers and third party vendors, extend and configure its functionality. This framework design proved to be effective as it enabled integration of Asset Locator into several IBM product offerings, where each product introduced a different set of requirements.
The Asset Locator framework provides several extension points through which new functionality can be added:
• Analyzers Extension Point - enables analyzing additional types of assets.
• Crawlers Extension Point - enables discovering assets in additional types of enterprise repositories.
• Repository Analysis Extension Point - enables performing additional types of analyses that work on the collected meta-data.
• Search Service Extension Point - enables configuring and modifying the query logic that will be carried out by the semantic Search Service (e.g., recursive queries, weighted search).
• Result Formatters Extension Point - enables formatting query results according to client needs.
One of the goals in designing the extension point mechanism was to keep it as simple as possible for developers and administrators to use. Most extension points are implemented via a plugin mechanism where the developer supplies a description of the plugin module in an Asset Locator configuration file, along with an implementation class for the plugin module. At startup, Asset Locator reads the definitions of the available plugin modules from the configuration files and dynamically loads the implementation classes.
In addition, the framework attempts to automate much of the common behavior in each of the extension points so that developers need to write only a minimal amount of code.
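The plugin mechanism described above (a configuration entry naming an implementation class, which is loaded dynamically at startup) can be sketched as follows. The configuration format, its field names, and the functions `load_class` and `load_plugins` are all assumptions; the paper does not show Asset Locator's actual configuration syntax. To keep the sketch runnable on its own, two standard-library classes serve as stand-ins for real plugin implementations.

```python
import importlib
import json

# Hypothetical configuration file content. The "class" values point at
# standard-library classes purely as stand-ins for real plugin classes.
CONFIG = """
{
  "plugins": [
    {"extension_point": "analyzers", "name": "html",
     "class": "html.parser.HTMLParser"},
    {"extension_point": "result_formatters", "name": "json",
     "class": "json.JSONEncoder"}
  ]
}
"""


def load_class(dotted_name):
    """Resolve 'package.module.ClassName' to the class object at runtime."""
    module_name, _, class_name = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def load_plugins(config_text):
    """Read plugin definitions and instantiate each implementation class.

    Returns {extension_point: {plugin_name: instance}}.
    """
    registry = {}
    for entry in json.loads(config_text)["plugins"]:
        plugin_class = load_class(entry["class"])
        point = registry.setdefault(entry["extension_point"], {})
        point[entry["name"]] = plugin_class()
    return registry
```

With this shape, adding a plugin requires only a new configuration entry and a class on the import path; the framework code that calls `load_plugins(CONFIG)` does not change.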
5 Using Asset Locator for Reuse
Asset Locator facilitates reuse by providing capabilities in three main stages:
Repository Exploration - involves exploring the repository to find potential reuse candidates. Exploration can be performed using simple or advanced Search Services (described in Section 5.1), and through various Navigation Services. Asset Locator supports navigation of the repository via predefined domain taxonomies (a Yahoo-like directory, see Figure 4) and through navigation of dependencies between resources (described in Section 5.2).
Resource Examination - involves examining the reuse candidates. Since recognizing the reusability of existing software resources is a much more complex process than simply examining the code itself, Asset Locator provides several views to assist in examining the reuse candidates. A detailed view (Figure 5), which is unique to each type of asset, presents additional information on the candidate asset. For Java classes, the detailed view also includes a graphical representation of the immediate inheritance and reference relationships, which helps in understanding the context of the candidate class.
Resource Adaptation - this is the act of reuse itself, which involves integration or modification of the candidate resources for new purposes. Asset Locator supports adaptation by enabling downloading of resources from the crawled enterprise repositories. Furthermore, our IDE-integrated Eclipse client allows developers to import the inspected asset directly into their working project and immediately start editing it.
Figure 4: Repository Exploration through Categorization Taxonomy
Figure 5: WAR file Detailed View
5.1 Semantic Search
As mentioned above, Asset Locator allows searching for resources in the repository using either a Simple Search or an Advanced Search.
The Simple Search enables users to search for occurrences of terms in assets, in similar fashion to how common search engines provide search support for documents. The user supplies the search terms and chooses a resource type (e.g. Java, C++, HTML). Asset Locator then uses its Index Server (see Section 3) to find all resources that contain those terms. Optionally, the user can choose to perform a federated search across all resource types.
The Advanced Search mechanism leverages the semantic attributes that were extracted by each Analyzer. For each type of asset, the user can supply conditions on features that were extracted by the corresponding Analyzer. For example, the user may compose a query for all Java classes that define the method “addMouseListener”. Furthermore, the user can combine this with additional conditions on other Java features, using boolean operators. These semantic search capabilities enable users to precisely define the characteristics of the reuse candidate and narrow down the results returned by Simple Search.
Figure 6 illustrates a query in the Advanced Search Service.
Figure 6: Advanced Search Service
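The difference between Simple Search and Advanced Search can be illustrated with a small in-memory sketch. It assumes that each asset carries its raw text plus a dictionary of analyzer-extracted features; the `Asset` class, the predicate helpers, and the sample repository are hypothetical and do not reflect the actual query protocol or the DB2-backed index.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    type: str                                     # resource type, e.g. "java"
    text: str                                     # content for free-text search
    features: dict = field(default_factory=dict)  # analyzer-extracted meta-data


def simple_search(assets, terms, asset_type=None):
    """Return names of assets containing all terms; None searches all types."""
    wanted = {t.lower() for t in terms}
    hits = []
    for asset in assets:
        if asset_type is not None and asset.type != asset_type:
            continue
        words = set(re.findall(r"\w+", asset.text.lower()))
        if wanted <= words:
            hits.append(asset.name)
    return hits


# Semantic conditions are plain predicates over an asset's features.
def defines_method(name):
    return lambda asset: name in asset.features.get("methods", [])


def extends(name):
    return lambda asset: asset.features.get("extends") == name


# Boolean operators combine conditions into larger ones.
def all_of(*conditions):
    return lambda asset: all(c(asset) for c in conditions)


def any_of(*conditions):
    return lambda asset: any(c(asset) for c in conditions)


def negate(condition):
    return lambda asset: not condition(asset)


def advanced_search(assets, asset_type, condition):
    """Return names of assets of one type that satisfy a semantic condition."""
    return [a.name for a in assets if a.type == asset_type and condition(a)]


REPO = [
    Asset("Button", "java", "Button with mouse support",
          {"extends": "Component", "methods": ["addMouseListener", "paint"]}),
    Asset("Canvas", "java", "Canvas for drawing",
          {"extends": "Component", "methods": ["paint"]}),
    Asset("index.html", "html", "mouse demo page"),
]
```

For example, `advanced_search(REPO, "java", defines_method("addMouseListener"))` selects only `Button`, whereas a free-text search for "mouse" across all types also returns the HTML page. That is the precision gain the text attributes to semantic conditions.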
5.2 Navigating through Resource Relationships
Users can also navigate through the relationships that were discovered during the Dependency Analysis phase. References have an associated type. For example, a Java class may reference another Java class via a Class Reference, whereas a J2EE WAR (Web Archive) references an EJB via a JNDI Reference. In addition, there is a Containment relationship that the user can use to navigate into compound resources. For example, a WAR contains Servlet classes and HTML files, whereas a J2EE EAR (Enterprise Archive) contains WARs and EJBJars. Figure 7 shows an example of Containment relationships.
Figure 7: Containment Relationship in the Dependency Navigator
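Navigation over typed relationships can be sketched as a small labeled graph. The relationship type names follow the text (Class Reference, JNDI Reference, Containment), while the `RelationshipGraph` class, its methods, and the sample resource names are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


class RelationshipGraph:
    """Directed graph whose edges carry a relationship type."""

    def __init__(self):
        # (source, relationship type) -> list of target resources
        self._edges = defaultdict(list)

    def add(self, source, rel_type, target):
        self._edges[(source, rel_type)].append(target)

    def targets(self, source, rel_type):
        """Resources reachable from `source` via one edge of the given type."""
        return list(self._edges.get((source, rel_type), []))

    def contents(self, source):
        """All resources transitively contained in a compound resource."""
        found, seen, stack = [], set(), [source]
        while stack:
            node = stack.pop()
            for child in self.targets(node, "Containment"):
                if child not in seen:
                    seen.add(child)
                    found.append(child)
                    stack.append(child)
        return found


def build_sample():
    """Hypothetical J2EE application used only for demonstration."""
    graph = RelationshipGraph()
    graph.add("shop.ear", "Containment", "shop.war")
    graph.add("shop.ear", "Containment", "orders-ejb.jar")
    graph.add("shop.war", "Containment", "OrderServlet.class")
    graph.add("shop.war", "Containment", "index.html")
    graph.add("shop.war", "JNDI Reference", "OrderBean")
    graph.add("OrderServlet.class", "Class Reference", "HttpServlet.class")
    return graph
```

Keeping the relationship type on each edge lets a navigator offer separate views, for example drilling into an EAR by Containment or following only JNDI References from a WAR to the EJBs it uses.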