4.3 Knowledge storage
5.1.1 Content on the WWW
The hypertext markup language
The documents on the web are defined in terms of the hypertext markup language (HTML).
HTML is a text-oriented markup scheme designed to embed formatting, link and other information into the informational content of the document itself. HTML tags are used to distinguish between informational content and markup data [30].
In brief, HTML is the language used to create documents for the web. As with any language, HTML has a specific structure and syntax, defined by the World Wide Web Consortium (W3C) [31]. For a given document, a web browser parses and interprets the HTML tags embedded in the document and then displays the document as specified by the tags.
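The separation between markup and informational content can be illustrated with a small sketch. It uses the `HTMLParser` class from the Python standard library to collect tag names and text fragments separately; the class name `TagContentSplitter`, the helper `split_markup` and the sample document are assumptions made for illustration, and the sketch is far simpler than the parsing a real browser performs.

```python
from html.parser import HTMLParser


class TagContentSplitter(HTMLParser):
    """Collects markup (tag names) and informational content (text) separately."""

    def __init__(self):
        super().__init__()
        self.tags = []  # start-tag names, in document order
        self.text = []  # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        # Called once for every opening tag, e.g. <p> or <a href="...">.
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for the text found between tags.
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def split_markup(document):
    """Return (tags, text) for an HTML document given as a string."""
    parser = TagContentSplitter()
    parser.feed(document)
    parser.close()
    return parser.tags, parser.text


# Hypothetical sample document, used only for illustration.
sample = (
    "<html><body><h1>Welcome</h1>"
    '<p>See the <a href="next.html">next page</a>.</p>'
    "</body></html>"
)
tags, text = split_markup(sample)
print(tags)
print(text)
```

The first list holds only markup data, the second only informational content; a browser combines the two to decide how each piece of content should be displayed.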
As the web has matured since its inception, so has the technology for creating web documents.
On the web today, HTML documents can be classified into two broad groups, namely static HTML and dynamic HTML. The differences between the two are discussed below.
Static HTML documents In the early days of the web, authors created documents manually: they wrote the content for their web pages and then inserted the required HTML tags by hand. Documents produced in this manner are called static HTML documents, because the content (and markup data) does not change unless the author modifies the document.
To create pages in this way, authors needed extensive knowledge of the syntax and structure of HTML, which made publishing on the web feasible for only a select few. Tools were subsequently developed for creating HTML documents, enabling more authors to publish their content on the web without specific knowledge of HTML.
Currently on the web, much of the content is still defined in terms of static HTML documents.
One of the biggest problems with static documents is that they are very labour intensive to maintain and quite inflexible. In order to address these problems, scripting languages (or scripts) were developed.
Two scripting models exist: client-side scripting and server-side scripting. Client-side scripts are interpreted by a web browser, whereas server-side scripts are executed by a web server [30].
Client-side scripts allow authors to extend their HTML documents by embedding code segments inside an HTML document. These code segments can then be used for a variety of purposes, from calculating simple values to rendering graphics. Server-side scripts are also useful for a variety of purposes, one of the most important of which is discussed below [30].
Dynamic HTML (DHTML) documents Both client and server side scripts can be used to programmatically generate HTML content. Server side scripts are a popular choice for this as the script code is executed when a web browser requests a certain page from the server. The script then typically outputs HTML text to the browser. Documents that are created in this way are called dynamic HTML documents, because of the dynamic nature in which they are generated.
The inherent flexibility of DHTML documents has made them a popular choice for content authors [30].
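As a minimal sketch of server-side generation, the function below builds the HTML text for a page at the moment it is requested. The function name `render_page`, its parameters and the page layout are hypothetical and chosen only for illustration; a real server-side script would run inside a web server and write this text to the browser's connection.

```python
from datetime import datetime
from html import escape


def render_page(visitor_name, requested_at):
    """Build the HTML text a server-side script might send to the browser."""
    # Escape user-supplied text so it cannot be mistaken for markup.
    safe_name = escape(visitor_name)
    timestamp = requested_at.strftime("%Y-%m-%d %H:%M")
    return (
        "<html>\n"
        "<head><title>Greeting</title></head>\n"
        "<body>\n"
        f"<h1>Hello, {safe_name}</h1>\n"
        f"<p>Page generated at {timestamp}.</p>\n"
        "</body>\n"
        "</html>\n"
    )


# Each request can yield a different document from the same script.
page = render_page("Ada & Bob", datetime(2004, 5, 17, 9, 30))
print(page)
```

The same script produces a different document whenever its inputs change, which is what distinguishes such output from a static HTML file stored on the server.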
Client-side and server-side scripts are, however, not the only means of providing active content on the current web. One of the alternatives is discussed in the next section.
The extensible hypertext markup language (XHTML) The extensible hypertext markup language is the reformulation of HTML 4.01 in XML. The XHTML family is seen as the next step in the evolution of the Internet, providing content authors the benefits and power of XML while still providing backward and future content compatibility [32].
One of the main benefits of XHTML for content authors is the ability to introduce new elements or additional element attributes, in contrast to HTML 4 where elements and attributes are predefined. It is also possible for XHTML documents to utilize scripts or other active content.
There are distinct differences between HTML 4.01 and XHTML. These and more are discussed in greater detail in the XHTML specification [32].
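Because XHTML is expressed in XML, an XHTML document can be processed by a generic XML parser. The sketch below assumes a hypothetical extension namespace (`http://example.org/notes`) and a hypothetical `remark` element to illustrate an author-introduced element alongside standard XHTML elements; it also shows that markup which is not well-formed XML is rejected. The helper `is_well_formed` is likewise an assumption for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
NOTE_NS = "http://example.org/notes"  # hypothetical extension namespace

# Hypothetical XHTML document mixing standard and author-defined elements.
document = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:n="http://example.org/notes">
  <head><title>Sample</title></head>
  <body>
    <p>Standard paragraph.</p>
    <n:remark priority="high">Custom element.</n:remark>
  </body>
</html>"""


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(document)
# ElementTree addresses namespaced elements as {namespace}localname.
title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title").text
remark = root.find(f".//{{{NOTE_NS}}}remark")

print(title)
print(remark.text, remark.get("priority"))
print(is_well_formed("<p>line one<br>line two</p>"))  # unclosed <br>
```

The unclosed `<br>` in the last line is commonly tolerated in HTML 4.01 but is not well-formed XML, so an XML parser rejects it.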
Other types of Active Content
In addition to the methods discussed above, other types of active content exist on the current web.
Two of the most fundamental of these are Java Applets and Macromedia Flash applications.
Java Applets The Java programming language is a machine-independent language designed to be flexible enough for the construction of standalone applications as well as embedded applications for the web. These embedded applications are called applets. To explain how applets are embedded, a brief background on the Java language is given below [30].
Java can be described as both a compiled and an interpreted language. Java source code is compiled to a universal format called byte-code, which is then interpreted by an interpreter called the Java virtual machine (JVM). The JVM is quite lightweight and can be implemented either as a separate application or embedded into another application, such as a web browser. This enables the browser to use the JVM to execute compiled Java code and display its output as part of the content of a web page [30].
Applets constitute active content as a user is able to interact with the running program and extract usable information from it. Applets can also be used as navigational aids on a web page or to perform some processing function [30].
Macromedia Flash Macromedia Flash is a vector graphics based drawing system designed by the Macromedia corporation. The idea behind Flash content is similar to that of Java applets in the sense that the content is embedded inside a web page. A special plug-in is then needed for the web browser to enable it to display the content [33].
Flash animations are encapsulated in a special file format called Shockwave Flash (SWF) (pronounced “swiff”). One of the main goals of the SWF format is delivery over a network with limited and unpredictable bandwidth. This makes it an excellent choice for content providers who wish to produce highly graphical front-ends for their websites and still remain accessible over the web [33].
Flash animations are not limited to static vector graphics; users can also interact with the animation. This gives Flash enough flexibility to be used for full screen navigation interfaces or high-impact content presentation [33].