2.5 Grid Computing: Large-scale Coordination, Collaboration and Sharing

2.5.4 Key Grid Technologies

Messaging is the most important part of any distributed system and is what defines it. The capabilities of the technologies used to represent, structure, and convey data directly influence the ability of distributed computing resources to interoperate. The key technology of the Grid is the Extensible Markup Language (XML) which, along with XML Schemas and SOAP, provides a standardised, powerful and flexible messaging mechanism for the Grid. In addition, these technologies form the basis for many other Grid technologies.

2.5.4.1 Extensible Markup Language (XML)

The Extensible Markup Language (XML) [6] is an application-independent, simple and flexible text format for data. It is the most important technology in Grid computing because it defines standard methods for structuring, self-describing and defining formats for information and data that may be read by almost any computer or software application.

XML uses plain text exclusively for all document markup, including data structures and data values; an example XML document is given in Listing 2.2. The features of XML that make it well suited to data transfer are summarised below:

• its use of plain-text markup makes it both human- and machine-readable,

• its Unicode support allows documents to be written in any human language,

• it can represent fundamental computer science data structures such as records, lists and trees,

• its self-documenting format describes structure, field names and specific values, and

• its strict syntax and parsing requirements make the parsing of documents simple, efficient, and consistent
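As a minimal sketch of these properties, the following Python fragment parses a small cluster-status document using the standard library's xml.etree.ElementTree module. The document itself is hypothetical: the cluster, job and host names are invented for illustration, and it is only assumed to resemble the kind of document shown in Listing 2.2.

```python
import xml.etree.ElementTree as ET

# Namespace prefix map used when searching the parsed tree.
NS = {"ca": "http://www.e-science.soton.ac.uk/computation"}

# Hypothetical document: all names and values are illustrative only.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<ca:Cluster xmlns:ca="http://www.e-science.soton.ac.uk/computation" Name="alpha">
  <ca:Job Name="analyse-données">
    <ca:MachineHost>node1.example.org</ca:MachineHost>
    <ca:Status>Running</ca:Status>
  </ca:Job>
  <ca:Job Name="render-02">
    <ca:MachineHost>node2.example.org</ca:MachineHost>
    <ca:Status>Queued</ca:Status>
  </ca:Job>
</ca:Cluster>
"""

# Parse from UTF-8 bytes so the encoding declaration is honoured.
root = ET.fromstring(DOC.encode("utf-8"))

# The tree maps directly onto familiar structures: the cluster is a
# record, its jobs form a list, and each job is itself a small record.
cluster_name = root.get("Name")
jobs = [
    {
        "name": job.get("Name"),
        "host": job.findtext("ca:MachineHost", namespaces=NS),
        "status": job.findtext("ca:Status", namespaces=NS),
    }
    for job in root.findall("ca:Job", NS)
]

for job in jobs:
    print(cluster_name, job["name"], job["host"], job["status"])
```

The same text is readable without any tooling, which is the human-readable half of the first point above; the non-ASCII job name illustrates the Unicode support.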

XML is also heavily used as a format for document storage and processing, both online and offline, and offers several benefits:

• its robust, logically-verifiable format is based on international standards,

• the hierarchical structure is suitable for most (but not all) types of documents,

• it manifests as plain text files, unencumbered by licenses or restrictions,

• it is platform-independent, thus relatively immune to changes in technology, and

• it and its predecessor, SGML, have been in use since 1986, so there is extensive experience and software available

However, XML suffers from the following weaknesses:

• the high verbosity and partial redundancy of its syntax often lower human readability and application efficiency, and increase storage needs. In addition, the larger size of XML-formatted documents makes XML unsuitable for bandwidth-restricted networks, such as mobile phone networks.

• its strict, descriptive and partially redundant syntax requires that all parsers, even for the most basic XML usage, recurse through arbitrarily nested data structures and perform additional checks to detect improperly formatted or differently ordered syntax or data. Consequently, XML parsers have significant processing and memory demands that limit their use on devices with restricted resources, such as embedded devices. In addition, malformed XML documents can cause resource exhaustion and stack overflows. Therefore, security considerations arise when XML input comes from untrustworthy sources.

    <?xml version="1.0" encoding="UTF-8"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified" attributeFormDefault="unqualified"
        targetNamespace="http://www.e-science.soton.ac.uk/computation"
        xmlns:ca="http://www.e-science.soton.ac.uk/computation">
      <xsd:element name="Cluster">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="Job" type="ca:JobType" maxOccurs="unbounded"/>
          </xsd:sequence>
          <xsd:attribute name="Name" type="xsd:string"/>
        </xsd:complexType>
      </xsd:element>
      <xsd:complexType name="JobType">
        <xsd:sequence>
          <xsd:element name="MachineHost" type="xsd:string" minOccurs="1"
              maxOccurs="1"/>
          <xsd:element name="Status" type="xsd:string" minOccurs="1"
              maxOccurs="1"/>
        </xsd:sequence>
        <xsd:attribute name="Name" type="xsd:string"/>
      </xsd:complexType>
    </xsd:schema>

Listing 2.3: Example XML Schema that defines data structures for describing the status of jobs running within a computational cluster. Listing 2.2 shows an XML document that employs this schema.

• it does not define a wide array of data types, so additional parsing is required to extract the desired data from a document. For instance, XML does not specify whether the value “485.12” is a number or a six-character string. In addition, it does not come with out-of-the-box support for rich data types; the scientist must build them.

• it uses a hierarchical model for representation, which is limited compared to the relational model, since it only gives a fixed view of the actual information.
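The data typing weakness can be illustrated with a short sketch. The element name CpuSeconds is hypothetical; the point is only that a parser working without a schema hands back character data, and the application must decide how to interpret it.

```python
import xml.etree.ElementTree as ET

# Hypothetical element; the value is the one used in the text above.
elem = ET.fromstring("<CpuSeconds>485.12</CpuSeconds>")

# Without a schema, the parser returns the content as a plain string.
raw = elem.text

# Both readings are equally valid as far as XML itself is concerned;
# choosing between them is extra work left to the application.
as_number = float(raw)      # numeric interpretation
as_text_length = len(raw)   # string interpretation: six characters

print(type(raw).__name__, as_number, as_text_length)
```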

The XML specification includes a simple type and data structure definition model, known as Document Type Definitions (DTDs) [45]. DTDs allow common document formats and types to be shared, and enable document consistency checking. However, they lack a way to define complex data structures. To address the failings of DTDs and to improve XML’s usefulness, the XML Schema Language (XSD) [46] was developed as a more advanced, feature-rich and flexible alternative.

XML’s versatility as a tool for distributed computing is only fully realised when it is used in conjunction with XML Schemas (see the example in Listing 2.3). Standardisation of data formats is essential to the sharing of data. Many important new distributed communication protocols and description languages utilise XML-based documents and the XML Schema Language to specify their data and messaging formats.

XML Schema uses XML itself to define the types and structures of XML data. It provides rich support for basic data types such as integer and string, as well as for the common data structures available in computer programming languages. It is also possible to construct user-defined data formats, such as postcodes. The flexibility of XML enables XML Schema to support the sophisticated data structures necessary to define complex user types.
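To make concrete what a schema contributes, the sketch below hand-codes the structural rules that Listing 2.3 expresses declaratively: a Cluster root containing one or more Job elements, each holding exactly one MachineHost followed by exactly one Status. This is not an XML Schema validator, and the function name check_cluster is hypothetical; a schema-aware parser would derive these checks, together with the data typing, directly from the schema document.

```python
import xml.etree.ElementTree as ET

NS = "http://www.e-science.soton.ac.uk/computation"


def q(tag):
    """Return the namespace-qualified form of a tag name."""
    return f"{{{NS}}}{tag}"


def check_cluster(xml_bytes):
    """Return a list of problems; an empty list means the document fits."""
    root = ET.fromstring(xml_bytes)
    if root.tag != q("Cluster"):
        return [f"root element is {root.tag}, expected Cluster"]
    problems = []
    children = list(root)
    if not children:
        problems.append("Cluster must contain at least one Job")
    for index, job in enumerate(children):
        if job.tag != q("Job"):
            problems.append(f"child {index} is not a Job element")
            continue
        # xsd:sequence fixes both the content and the order of children.
        if [c.tag for c in job] != [q("MachineHost"), q("Status")]:
            problems.append(f"Job {index} must hold MachineHost then Status")
    return problems


# Hypothetical test documents.
VALID = (
    f'<ca:Cluster xmlns:ca="{NS}" Name="alpha"><ca:Job Name="j1">'
    "<ca:MachineHost>node1</ca:MachineHost><ca:Status>Running</ca:Status>"
    "</ca:Job></ca:Cluster>"
).encode("utf-8")
INVALID = (
    f'<ca:Cluster xmlns:ca="{NS}" Name="alpha"><ca:Job Name="j1">'
    "<ca:Status>Running</ca:Status></ca:Job></ca:Cluster>"
).encode("utf-8")

print(check_cluster(VALID))
print(check_cluster(INVALID))
```

Every application that consumed this format without a schema would need to repeat checks of this kind, which is the duplication that a shared schema removes.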

2.5.4.2 Simple Object Access Protocol (SOAP)

The Simple Object Access Protocol (SOAP) [47] provides a simple and extensible framework that defines how an XML message is structured. SOAP is designed to be a lightweight protocol for the exchange of information in a decentralised and distributed environment. It was also designed to be independent of any particular transport mechanism. However, to facilitate message passing, careful attention was paid to ensuring interoperability with commonly supported transfer protocols, such as HTTP [26] and SMTP [48].

A SOAP message consists of an envelope that contains an optional header and a body. SOAP allows security systems, such as firewalls, to identify XML messages without needing to understand their contents, so such messages need not be blocked as unknown HTTP requests. SOAP also provides rich semantics for indicating encoding style, array structure, and data types. SOAP is currently under review by the W3C and is a prototype of the future XML Protocol (XMLP) [49].
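The message structure can be sketched as follows. In SOAP the header and body are wrapped in an enclosing Envelope element; the envelope namespace used here is the SOAP 1.1 one, which is an assumption about the version being described. The application namespace and the RequestId, GetJobStatus and JobName elements are hypothetical.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (assumed version).
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical application namespace for the payload.
APP_NS = "http://example.org/jobs"

ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("job", APP_NS)


def build_envelope(job_name, request_id):
    """Build a SOAP message as UTF-8 bytes: Envelope > Header + Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Header: metadata about the message rather than its payload.
    ET.SubElement(header, f"{{{APP_NS}}}RequestId").text = request_id
    # Body: the application-level request itself.
    call = ET.SubElement(body, f"{{{APP_NS}}}GetJobStatus")
    ET.SubElement(call, f"{{{APP_NS}}}JobName").text = job_name
    return ET.tostring(envelope, encoding="utf-8")


message = build_envelope("render-02", "req-0001")

# An intermediary can recognise the message from the envelope alone,
# without understanding the application elements inside the body.
parsed = ET.fromstring(message)
is_soap = parsed.tag == f"{{{SOAP_NS}}}Envelope"
parts = [child.tag.split("}")[1] for child in parsed]

print(is_soap, parts)
```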

Both SMTP and HTTP are valid application layer protocols for SOAP; however, HTTP has gained wider acceptance due to its extensive capabilities and Internet infrastructure support. Importantly, HTTP enables SOAP to work through network firewalls, which is a major advantage over other distributed protocols such as GIOP/IIOP or DCOM, which are normally filtered by firewalls. A key issue under discussion is whether or not HTTP is the right transport, given its inherently synchronous nature.
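A minimal sketch of the HTTP binding is given below. It only constructs the request object and never sends it. The endpoint URL, the action URI and the message payload are hypothetical, and the text/xml content type and SOAPAction header follow the SOAP 1.1 convention, which is an assumption about the version in use.

```python
import urllib.request

# Hypothetical SOAP message; the payload elements are illustrative only.
ENVELOPE = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body>"
    '<GetJobStatus xmlns="http://example.org/jobs">'
    "<JobName>render-02</JobName>"
    "</GetJobStatus>"
    "</soap:Body>"
    "</soap:Envelope>"
).encode("utf-8")


def make_soap_request(url, envelope, action):
    """Wrap a SOAP message in an ordinary HTTP POST request object."""
    return urllib.request.Request(
        url,
        data=envelope,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{action}"',
        },
        method="POST",
    )


request = make_soap_request(
    "http://grid.example.org/soap",          # hypothetical endpoint
    ENVELOPE,
    "http://example.org/jobs/GetJobStatus",  # hypothetical action URI
)

print(request.get_method(), request.full_url)
```

To a firewall this is ordinary web traffic on the usual HTTP port. Passing the request to urllib.request.urlopen would block the caller until the response arrived, which is the synchronous request-response behaviour questioned above.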

As discussed in Section 2.5.4.1, XML’s verbose syntax can be both a benefit and a drawback. Compared with CORBA, GIOP and DCOM (see Section 2.3.2), which use much shorter, binary message formats, SOAP’s XML messages take much longer to process, making SOAP the slower messaging mechanism. Nonetheless, hardware appliances are available to accelerate the processing of XML messages. It has been suggested [50, 51] that Binary XML may offer a way to improve the performance of SOAP messaging, although this creates its own set of problems, including the loss of readability and confusion over little-endian or big-endian byte representation.
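The trade-off can be illustrated with a small sketch that encodes the same record as XML text and as a fixed binary layout. The record (a job identifier and a CPU time) and its field names are hypothetical, and the binary layout is an ad hoc one built with Python's struct module; it is not Binary XML, nor the wire format of GIOP or DCOM.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record.
job_id, cpu_seconds = 42, 485.12

# Text encoding: self-describing, but every field carries its tags.
root = ET.Element("Job")
ET.SubElement(root, "Id").text = str(job_id)
ET.SubElement(root, "CpuSeconds").text = repr(cpu_seconds)
xml_bytes = ET.tostring(root, encoding="utf-8")

# Binary encoding: big-endian 4-byte unsigned integer followed by an
# 8-byte double. Compact, but meaningless without the layout.
binary_bytes = struct.pack(">Id", job_id, cpu_seconds)

# A receiver that wrongly assumes little-endian order reads a
# different identifier from the same four bytes.
(misread_id,) = struct.unpack("<I", binary_bytes[:4])
(correct_id,) = struct.unpack(">I", binary_bytes[:4])

print(len(xml_bytes), len(binary_bytes), correct_id, misread_id)
```

The binary form is several times smaller, but it cannot be read without prior agreement on field order, field widths and byte order, whereas the XML form carries its own field names.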