Enterprise Application Integration
(Middleware)
Cesare Pautasso
Computer Science Department
Swiss Federal Institute of Technology (ETHZ)
http://www.iks.inf.ethz.ch/
©IKS, ETH Zürich. 2
EAI Course Administration
Lecture: Tuesdays 13:15 - 15:00 (HRS F5)
Discussion and Exercises: Thursdays 10:10 - 11:55 (HRS F5)
Web site:
http://www.iks.inf.ethz.ch/education/ws04/eai
Getting in touch with us:
Cesare Pautasso, HRS G7, pautasso@inf, 01 632 0879
Thomas Heinis, HRS G8, heinist@inf, 01 632 4693
Daniel Jönsson, HRS G12, jodaniel@inf, 01 632 7259
Practical exercises:
Designing, building and programming a composite Web service
Exercise is mandatory
Course material:
Script of the lecture (download from the course website)
Book (recommended)
Exam:
EAI Text Book
Available from
Frau Schuemperlin, HRS G10 (01 632 4531) 50.- CHF
Goals of the EAI Course
The course aims at introducing and discussing in depth several important topics related to distributed information systems in general and enterprise application integration in particular. In many ways, the course explores the synergy between information and communication systems and how this synergy can be best exploited for EAI and B2B integration.
The course is more practical than theoretical. The objective is to give a clear overview of the problems and their nature, how they can be solved, and how these solutions are implemented in practice. While we will spend some time understanding the theoretical underpinnings of the ideas discussed, the emphasis will be on how these ideas can be implemented in practice. An important part of the course will be devoted to how technology has evolved and the reason why existing systems are the way they are.
You will have the opportunity to program a relatively complex integrated information system. Without taking part in the exercises you will not be allowed to take the exam.
The lectures, discussions and presentations form an integral part of the course. If you take the time to learn from them, you will get much more out of this course. Take advantage of the opportunity!
Motivation for the EAI Course
The architecture of the information systems we use is becoming increasingly complex.
The access methods, the capabilities, the goals, and the available technology are continuously changing. What can we learn that will remain valuable in the years to come?
One example: 70 - 90 % of the software costs are maintenance costs. Using the right abstractions helps! Databases used as services remove about 40 % of the code of commercial applications.
Another example: software reuse is truly efficient and makes economic sense at a large granularity. How can we build systems that can be tailored to the user needs and yet are applicable in a wide range of areas and environments?
Communications: Today’s systems are no longer isolated. Communications play a key role in their use. New access methods also change the nature of the problems.
Demand: The demands on the existing systems keep growing: centralized solutions are not always feasible; cooperation among systems is a must.
Components: System integration is the most challenging aspect of the IT world. Programming today means combining already existing, heterogeneous systems.
[Figure: multi-tier architecture with the protocols used between adjacent tiers. Component labels: web client, WAP client, Java client; api; business object; wrapper; db]
CLIENT TIER: WWW and WAP browsers, specialized clients (Java, .NET), Eclipse RCP, SMS ...
    HTML, SOAP, XML
ACCESS TIER: WWW servers, J2EE, CGI, JAVA Servlets, API
    MOM, HTML, IIOP, RMI-IIOP, SOAP, XML
APP TIER: TP-Monitors, stored procedures, programs, scripts, beans
    MOM, IIOP, RMI-IIOP, XML
INTEGRATION TIER: system federations, filters, object monitors, MOM
    ODBC, JDBC, RPC, MOM, IIOP, RMI-IIOP
RESOURCE TIER: databases, multi-tier systems, backends, mainframes
Understanding the Layers
A client is any user or program that wants to perform an operation over the system. To support a client, the system needs to have a presentation layer through which the user can submit operations and obtain a result.
The application logic establishes what operations can be performed over the system and how they take place. It takes care of enforcing the business rules and establishes the business processes. The application logic can be expressed and implemented in many different ways: constraints, business processes, server with encoded logic ...
The resource manager deals with the organization (storage, indexing, and retrieval) of the data necessary to support the application logic. This is typically a database, but it can also be a text retrieval system or any other data management system providing querying capabilities and persistence.
[Figure: the three layers, each annotated with a time span]
Presentation logic: clients and external interface (presentation, access channels), 1-2 years
Application logic: application (system’s logic), 2-5 years
Resource manager: data management systems (operational and strategic data), ~10 years
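The separation of the three layers described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the "withdraw" operation, the request format and the business rule are all hypothetical and do not come from the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Resource management layer: stores and retrieves data (here an in-memory dict)."""
    accounts: dict = field(default_factory=dict)

    def read(self, key):
        return self.accounts[key]

    def write(self, key, value):
        self.accounts[key] = value


class ApplicationLogic:
    """Application logic layer: defines the operations and enforces a business rule."""

    def __init__(self, resources):
        self.resources = resources

    def withdraw(self, account, amount):
        balance = self.resources.read(account)
        # Hypothetical business rule: no overdrafts, no non-positive amounts.
        if amount <= 0 or amount > balance:
            raise ValueError("withdrawal rejected")
        self.resources.write(account, balance - amount)
        return balance - amount


class Presentation:
    """Presentation layer: accepts a client request and formats the result."""

    def __init__(self, logic):
        self.logic = logic

    def handle(self, request):
        # A request is a plain string such as "withdraw alice 30" (assumed format).
        operation, account, amount = request.split()
        if operation != "withdraw":
            return "ERROR unknown operation"
        try:
            new_balance = self.logic.withdraw(account, int(amount))
        except ValueError as err:
            return f"ERROR {err}"
        return f"OK balance={new_balance}"
```

Each layer only talks to the one directly below it, so the in-memory dict could be replaced by a real database without touching the presentation code.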
A modern e-commerce platform
[Figure: server layout of an e-commerce platform. Labels: FARM A and FARM B, each with ASP, SSL, Games/Music, Videos, Comp/Soft, Books, Music, SQL Product Server and ASP File Server; Cache Server; Basket/Ad/Surplus; Receipt/Fulfillment; Monitor and cache; Search Servers]
Diagram courtesy of Robert Barnes, Microsoft
Scale-up versus Scale-out
[Figures: Scale-up and Scale-out. Diagrams courtesy of Jim Gray, Microsoft]
• Scale up is based on using a bigger computer as the load increases. This requires parallel computers (SMP) with more and more processors.
• Scale out is based on using more computers as the load increases instead of using a bigger computer.
• Both are usually combined! Scale out can be applied at any level of the scale up.
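Scale out, as described above, can be illustrated with a toy dispatcher that spreads requests over a growing set of servers. This is a minimal sketch: round-robin is just one possible dispatching policy, and the class and function names are assumptions made for the example.

```python
from collections import Counter
from itertools import cycle


class RoundRobinDispatcher:
    """Assigns each incoming request to the next server in turn."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._turn = cycle(self.servers)

    def dispatch(self, request):
        # The request content is ignored here; only the target server is returned.
        return next(self._turn)


def scale_out(dispatcher, extra_servers):
    """Scale out: add more computers instead of replacing them with a bigger one."""
    return RoundRobinDispatcher(dispatcher.servers + list(extra_servers))


def load_per_server(dispatcher, n_requests):
    """Count how many of n_requests each server receives."""
    return Counter(dispatcher.dispatch(i) for i in range(n_requests))
```

With two servers, six requests put three on each; after `scale_out(d, ["s3"])` the same six requests put two on each.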
Challenges of Integration
A lot of the problems to be addressed in Enterprise Application Integration stem from having to integrate standalone applications which have been developed independently, operate autonomously, and were not originally intended to be integrated with one another.
Heterogeneous: each application implements its own data model. Concepts may be shared, but representation mismatches are to be expected. Mappings and transformations are required.
Autonomous: applications update their state independently without coordinating with each other. The systems to be integrated are maintained independently and upgraded at different times.
Distributed: in the worst case, every application runs in a completely separate environment, e.g., database storage is not shared among applications. Message-based communication is the only way to exchange information.
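The kind of mapping required between heterogeneous data models can be sketched as follows. Both record layouts (a CRM record and a billing record) and all field names are hypothetical, chosen only to show a shared concept with mismatching representations.

```python
from decimal import Decimal


def crm_to_billing(crm_record):
    """Map a hypothetical CRM customer record onto a hypothetical billing record.

    The concept (a customer with a balance) is shared, but the representation
    differs: string id vs. integer id, one name field vs. two, and an amount
    in CHF as a decimal string vs. an integer number of cents.
    """
    first, _, last = crm_record["full_name"].partition(" ")
    return {
        "customer_id": int(crm_record["id"]),
        "first_name": first,
        "last_name": last,
        # Decimal avoids binary floating point rounding when converting to cents.
        "balance_cents": int(Decimal(crm_record["balance_chf"]) * 100),
    }
```

For example, `crm_to_billing({"id": "0042", "full_name": "Ada Lovelace", "balance_chf": "12.50"})` yields a record with `customer_id` 42 and `balance_cents` 1250.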
Ideal integration
The purpose of integration technology is to provide the illusion of an ideal integration scenario, hiding the shortcomings of the real world.
Secure and Reliable Messaging. Many technologies and protocols have been developed to achieve secure, exactly-once message delivery over unreliable and insecure networks. In an ideal scenario, there would be a single network connecting all partners and systems and providing such features at the network interface layer.
Uniform Semantic Data Model. Ideally, all applications would share the same schema, providing a unique and well-defined model of the data and avoiding all misinterpretation problems. Translations, mappings, and transformations between different formats and mismatching representations would no longer be necessary.
Homogeneous interface processes. The message-based interaction between different systems would always happen in the same way. The external interfaces of all systems would follow the same public business processes, so that they can be seamlessly interconnected.
[Figure labels: Semantics, Interaction, Message Delivery]
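One common way to approximate exactly-once delivery over an unreliable network is to let the sender retry (at-least-once delivery) and have the receiver discard duplicates by message identifier. The sketch below shows only the receiver side; it is an illustration under assumptions, not a technique prescribed by the lecture, and the class name and message format are hypothetical.

```python
class IdempotentReceiver:
    """Processes each message id at most once, even if it is delivered several times."""

    def __init__(self, handler):
        self.handler = handler
        # Assumption: ids are kept in memory. A real system would have to persist
        # them together with the effects of the handler to survive a crash.
        self.seen = set()

    def receive(self, message_id, payload):
        """Return True if the payload was processed, False if it was a duplicate."""
        if message_id in self.seen:
            return False
        self.handler(payload)
        self.seen.add(message_id)
        return True
```

If a sender retransmits message "m1" because the acknowledgement was lost, the second `receive("m1", ...)` returns False and the handler is not invoked again.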
Why integration matters
Useful information systems evolve over time by growing in size and by incorporating functionality of existing standalone systems. Applications originally intended to operate separately are later required to interoperate with others.
Technology change affects all layers; legacy does not go away so easily.
The architecture of the enterprise information system depends on constraints related to the technology but also to the organization.
In the case of B2B, each company owns its information system and will not open it up more than strictly necessary, as it is part of its competitive advantage. For example, not all business processes are going to be shared, as business processes are mostly kept secret.
Within an enterprise, each department may have its own IT infrastructure, systems and databases, which are maintained independently. Integrating them may bring additional value to the company.
Mergers, acquisitions and spin-offs leave a long-lasting trace in the information systems of the corresponding companies.
EAI in Context
[Figure: EAI in context]
How to build applications from scratch: Databases, Networking, Software Engineering, Programming Languages
How to integrate two or more existing applications: Enterprise Application Integration, Middleware
Kinds of Integration
Given two (or more) applications, how can you integrate them? It depends on the assumptions and on whether you can change the applications. Some examples:
• Manual integration with copy & paste
• File based integration
• API extraction and publishing
• Script different command lines
• Wrap existing software (screen scraping)
• Data transformation and conversion
• Message based integration
• Point to point, centralized, peer to peer
There are many different ways of doing EAI. Also, Integration can be
[Figure: supply chain example. Participants: Customer 1, Customer 2, Retailer, Manufacturer 1, Manufacturer 2, Supplier 1, Supplier 2. Messages: "Buy products #23, #45 and part #101"; "Get products #23 and #45"; "Build product #3, according to specs."; "Order parts #R2, #101, #ES-01, #G7, #G11"; "Order parts #A1, #H2, #G7, #G11, #B42"; "Get parts #A1, #B42, #H2, #R2"; "Get parts #G7, #G11, #ES-01, #R2"]
Astronomy
Our ability to produce data exceeds our capacity to explain how the data was produced.
Scientific Method?
[Figure: HEDC architecture (www.hedc.ethz.ch), in three layers]
PRESENTATION LAYER: thin client (HTTP, web browser), StreamCorder (HTTP), Java client with local DB
APPLICATION LAYER: HEDC web server (Apache), front end (HTTP, RMI), processing logic (PL) and data management (DM) with server manager, directory services, reference manager, archive manager, filters, and IDL servers with temporary storage space
RESOURCE MANAGEMENT LAYER: DBMS 1 and DBMS 2 (Oracle) with DB space, images and raw data on a network file system, less relevant data on a tape archive
Planets outside the solar system
The Grid
Course philosophy
Addressing the increasing need for connectivity and the ever growing demand, and facing the challenge of component based software design, requires solving a number of data management issues.
By learning to identify the problems and being aware of the state of the art and possible solutions, both theoretical and practical, a system designer will be in a much better position to deal with evolving technology.
[Figure labels: Design Problem, Technical Solutions, System Design]
The future of distributed IS
Why distributed information systems?
• Computer environments: distributed, heterogeneous, autonomous nodes linked by a network (intranet, internet; emphasis on communication).
• Technology advances: in computing power (powerful clients) and in networks (reliability, speed; ATM, ISDN …).
• Application demands: larger and larger applications, decentralized corporations, need for autonomy.
• New environments and business models: WWW, distributed service providers, Java, CORBA, Workflow Management.
• Basic services: a great deal of work is being invested in producing the type of standards and reusable software needed to make this a reality (SOAP/WSDL/UDDI).
Distributed IS applications:
• Emphasis on interoperability: combine your data with that of the rest of the world.
• Emphasis on distribution: Intranet, Internet are here to stay.
Huge demand for this functionality:
• Lotus Notes (applications built on replicated databases).
• WWW+Java+persistence (distributed service providers).
• TP-Monitors (OLTP, OLAP, transactional processing).
• Queuing Systems (applications on top of reliable, asynchronous communications).
• CORBA (applications on top of a TP-Monitor like object oriented system).
• Workflow
• Web Services … and more
The Web services stack
[Figure: the Web services stack, with technologies grouped into ebXML, Semantic Web and WSDL-based families]
Security: SAML, S/MIME, WS-Security
Transactions: BTP, WS-Transactions
Discovery: ebXML registries, UDDI
Contracts: ebXML CPA
Business processes: BPML, BPEL4WS, WSFL/XLANG
Choreography: WS-Coordination
Conversations: ebXML BPSS, WSCI, WSCL
Nonfunctional description: DAML-S, WSEL
Description: ebXML CPP, RDF, WSDL
Messaging: ebXML MSS, SOAP
The distributed systems dilemma
Theoretical advantages of distributed systems:
• Locality of reference: With the proper data placement, most accesses should be to local data, which improves response time and throughput.
• Scalability/Processing capacity: With better hardware available, the overall processing power should be a function of the number of nodes in the system (see parallelism). If more power is needed, add more nodes.
• Availability/Fault tolerance: A distributed system should be able to provide services even when part of the system is down (unlike centralized systems). This is important for large installations and mission critical applications (24x7 computing).
In theory, a distributed system is faster (better response time and throughput), bigger (more capacity), and more reliable (built-in redundancy). But, in practice, this is not true.
• Centralized (mainframe based): the old-fashioned approach. Most of the valuable data is still in mainframes, although it is only 1 % of all existing data (mainframes are still a good business).
• Client/Server (a variation of the centralized version): a first approach to distribution. It made too many promises and is now suffering from its lack of success. Servers are not mainframes and quickly become a bottleneck. Applications move towards distribution, and find there is no support for it.
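The availability claim in the list of theoretical advantages can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that nodes fail independently and all have the same availability; both are simplifying assumptions that are not made in the lecture, and the function names are hypothetical.

```python
def availability_any(node_availability, n):
    """Service is up if at least one of n independent replicas is up."""
    return 1 - (1 - node_availability) ** n


def availability_all(node_availability, n):
    """Service is up only if all n independent nodes are up at the same time."""
    return node_availability ** n
```

With nodes that are each available 90 % of the time, two redundant replicas give about 99 %, while a request that needs both nodes to be up succeeds only about 81 % of the time. Under these assumptions, distribution increases reliability only when the nodes are truly redundant.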
Course Organization
[Figure: course timeline covering October, November, December and January]
Topics:
• EAI in industry (Guest Lecture)
• TP-Monitors
• Message Oriented Middleware
• Workflow Management Systems
• The role of the WWW in EAI
• SOAP
• WSDL
• UDDI
• Limitations of SOAP, WSDL, UDDI
• Distributed Information Systems
• Middleware
Concrete goals for the course
Provide a basic understanding of the problems associated with distributed environments (many of the ideas we will discuss apply in many areas, not just typical commercial applications).
Provide the conceptual tools required to understand commercial products (basic idea behind a product, what its weaknesses are, how to solve them). Understanding how technology has evolved and why products are the way they are is the key to understanding what might happen in the future.
Develop the skills and know-how necessary to participate in an enterprise application integration effort: motivation, vocabulary, systems, some programming experience.
Gain sufficient awareness of the state of the art. Some of the problems covered in the course are very hard and many people have worked on them for years. It is very useful to know what has been done so far and how it can be used.