www.tugraz.at
Web Architecture I
03.12.2014
Outline
• Development of the Web
• Quality Requirements
• HTTP Protocol
• Web Architecture
• A Changing Web
• Web Applications and State Management
• Web n-Tier Architecture
• Web Data Management
History of the web
• Devised 1989 to deliver static content
• Hypermedia: documents linked into a web
• Navigate by following links
• Underlying standards
• HTTP (Hyper Text Transfer Protocol)
• HTML (Hyper Text Mark-up Language)
• URL (Uniform Resource Locator)
• All underlying standards were simple
World Wide Web vs. Internet
Growth of the Web I
Growth of the Web II
• Time to reach 50 million people
• Telephone: 75 years
• Radio: 35 years
• TV: 13 years
• WWW: 4 years
Growth of the Web III
[Figure: Global ICT developments, 2001-2014* (per 100 inhabitants; 2014* = estimate). 2014 values: mobile-cellular telephone subscriptions 95.5; individuals using the Internet 40.4; active mobile-broadband subscriptions 32.0; fixed-telephone subscriptions 15.8; fixed (wired)-broadband subscriptions 9.8]
Quality attributes I
• Usability - it must be very easy to use
• I.e. very easy to create, structure and reference information
• Participation was voluntary, so ease of use was the only way to attract users
• Very forgiving of errors in structuring and referencing, because many users had no technical background
• Some things might look different from today’s point of view
Quality attributes II
• Technical simplicity - it must be very easy for developers to implement
• All components simple and text-based
• E.g. in the first version of HTTP, servers only needed to respond to the GET method
• HTML was very simple: easy to write parsers and browsers
• URLs extremely simple
Quality attributes III
• Extensibility - it must be easy to add new features
• The first versions of the components (standards) were very simple - improvements were needed
• User requirements change even in a closed environment
• On a global scale, change is the only constant
• Examples:
• Users wanted a search facility in addition to browsing
• Interaction with the content
Quality attributes IV
• Scalability - it needs to match the Internet-scale
• anarchic scalability (think about growth rate)
• The Internet is not under control of a single organization – it is totally decentralized
• Need to continue operating under unanticipated load, or when given malformed or maliciously constructed data
• Examples:
• 40,000 Google search queries every second
• https://en.wikipedia.org/wiki/List_of_most_viewed_YouTube_videos
Quality attributes V
• Anarchic scalability - consequences
• Clients cannot be expected to maintain knowledge of all servers ⇒ Make it searchable!
• Servers cannot be expected to retain knowledge of state across requests ⇒ Make it stateless!
• Documents cannot have back-links: the number of references to a resource is proportional to the number of people interested in that information (Google PageRank)
Development of the Web
• The original Web was not designed to meet all of the requirements and quality attributes defined above
• It also lacked an architectural vision that would meet these ambitious requirements
• The World Wide Web Consortium (W3C) was founded to solve these problems
• A lot of researchers worked on defining an architecture to meet these needs
• Conclusion of all the quality attributes: security and encryption were not mentioned at all
Overview
• Content
• HTML (Hyper Text Mark-up Language)
• Identification
• URL (Uniform Resource Locator)
• Communication / information exchange
• HTTP (Hyper Text Transfer Protocol)
• Based on TCP connections; TCP itself is based on IP
• Original design was completely stateless
HTTP Characteristics
• Text-based protocol, human readable
• Request consists of:
• Method
• Number of headers (key-value pairs)
• Some methods allow a payload
• Response includes
• Status code
• Number of headers (key-value pairs)
• Depending on the request, a payload is returned
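To make the request structure above concrete, here is a minimal parsing sketch in Python. It assumes the complete request is already in memory as bytes, that each header name occurs only once and that the body is not chunked; `parse_http_request` is a hypothetical helper written for this illustration, not a library API.

```python
def parse_http_request(raw: bytes) -> dict:
    """Split a raw HTTP/1.x request into request line, headers and body."""
    # An empty line (CRLF CRLF) separates the header section from the payload
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: method, request target and protocol version
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "target": target, "version": version,
            "headers": headers, "body": body}


raw = (b"POST /orders HTTP/1.1\r\n"
       b"Host: example.com\r\n"
       b"Content-Type: application/json\r\n"
       b"Content-Length: 12\r\n"
       b"\r\n"
       b'{"item": 42}')
req = parse_http_request(raw)
print(req["method"], req["target"], req["headers"]["host"])
```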
HTTP Examples
• Request
GET /webpage/index.html HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) …
Cookie: JSESSIONID=9C6694142332E65F0CB175BDF1758243;
• Response
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 64
Date: Wed, 03 Dec 2014 14:15:05 GMT

<content>
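The response above can be produced by a few lines of code. This sketch (the helper `build_response` is hypothetical) derives `Content-Length` from the payload and formats the `Date` header with the standard library's `email.utils.formatdate`; the `Server` header is left out for brevity.

```python
import calendar
from email.utils import formatdate
from typing import Optional


def build_response(body: bytes, status: str = "200 OK",
                   content_type: str = "text/html;charset=ISO-8859-1",
                   timestamp: Optional[float] = None) -> bytes:
    """Serialise a minimal HTTP/1.1 response (status line, headers, payload)."""
    header_lines = [
        f"HTTP/1.1 {status}",
        f"Content-Type: {content_type}",
        # The length is counted in bytes of the encoded payload
        f"Content-Length: {len(body)}",
        # HTTP date in GMT; timestamp=None means "now"
        f"Date: {formatdate(timestamp, usegmt=True)}",
    ]
    return ("\r\n".join(header_lines) + "\r\n\r\n").encode("iso-8859-1") + body


# 03 Dec 2014 14:15:05 UTC, the date used in the example above
ts = calendar.timegm((2014, 12, 3, 14, 15, 5, 0, 0, 0))
response = build_response(b"<html></html>", timestamp=ts)
print(response.decode("iso-8859-1"))
```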
HTTP Versions
• HTTP/0.9 - released in 1991
• HTTP/1.0 - released in 1996
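• Stateless; by default, each request is sent over a new TCP connection
• HTTP/1.1 - today's standard
• Reusing TCP connections (persistent connections, "keep-alive") can increase throughput
• The length of each message must be known (e.g. from a Content-Length header), so that the client can tell where one response ends on a reused connection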
• HTTP/2 - several drafts are already being tested
• HTTP/3 – talks have already started
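Why HTTP/1.1 needs the message length can be seen when two responses share one connection. The sketch below simulates a persistent connection with an in-memory byte stream and assumes every response carries a `Content-Length` header (chunked transfer encoding is not handled); `read_responses` is a hypothetical helper.

```python
import io
from typing import BinaryIO, List


def read_responses(stream: BinaryIO, count: int) -> List[bytes]:
    """Read `count` consecutive responses from one persistent connection."""
    bodies = []
    for _ in range(count):
        stream.readline()  # status line, e.g. b"HTTP/1.1 200 OK\r\n"
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b""):  # empty line ends the header section
                break
            name, _, value = line.decode("iso-8859-1").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Without this length the client could not tell where the body ends
        # and the next response on the same connection begins
        bodies.append(stream.read(int(headers["content-length"])))
    return bodies


# Two responses arriving back-to-back on the same (simulated) connection
wire = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
        b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nbye")
print(read_responses(io.BytesIO(wire), 2))
```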
Deriving the Web architecture
• Introducing constraints on the Web architecture to obtain an optimal solution to the requirements and quality attributes
• Each constraint will have advantages and disadvantages
• The whole design process is then a balancing process
• Optimisation to obtain a best-match for the Web architecture
Client-Server: Separation of concerns I
Client-Server: Separation of concerns II
Client-Server: Separation of concerns III
• Separates user-interface from data manipulation concerns
• Supports independent evolvability
• Clients and servers can be developed independently and across organizational boundaries
• E.g. someone uses Google Maps on their own homepage
• Supports the Internet-scale attribute
Stateless I
Stateless II
• Communication must be stateless in nature
• Each request from the client must contain all the information needed to process that request
• I.e. it cannot take advantage of session information stored on the server
• Session state is kept entirely on the client
• Possible drawback
• Information might need to be sent multiple times
• Important benefits are visibility, reliability and scalability
Stateless III
• Visibility:
• Only look at a single request to determine the full nature of the request
• Reliability:
• It eases the task of recovering from partial failures
• Scalability:
• Server can free resources after each request
• Simplifies implementation because servers do not need to manage information across multiple requests
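The difference between server-side session state and self-contained requests can be sketched with a paginated list. Both `StatefulServer` and `stateless_page` are hypothetical illustrations, not part of any framework: the stateful variant must remember a position per client, while the stateless one receives the page number with every request.

```python
ITEMS = [f"item-{i}" for i in range(10)]
PAGE_SIZE = 3


class StatefulServer:
    """Remembers where each client is: state lives on the server."""

    def __init__(self):
        self.next_page_by_session = {}  # session id -> next page number

    def next_page(self, session_id: str) -> list:
        page = self.next_page_by_session.get(session_id, 0)
        self.next_page_by_session[session_id] = page + 1
        return ITEMS[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]


def stateless_page(page: int) -> list:
    """The request itself carries everything needed: the page number."""
    return ITEMS[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]


server = StatefulServer()
server.next_page("s1")
print(server.next_page("s1"))  # result depends on the earlier request
print(stateless_page(1))       # same answer from any server, at any time
```

Any replica can answer `stateless_page(1)` identically, which is what makes load balancing and recovery simple; the stateful variant keeps memory per session and ties the client to the server holding it.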
Cache I
Cache II
• Information can be labeled (by servers) as cacheable
• If a response is cacheable, then a client cache is given the right to reuse that response data for later, equivalent requests
• Advantage: improves efficiency, scalability and user-perceived performance
• Disadvantage: decreases reliability if the cached data no longer matches the data on the server (stale data)
• Middle ground: ask the server whether the data has changed (conditional request)
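The middle ground corresponds to HTTP's conditional requests: the server labels a response with a validator (`ETag`) and a cacheability hint (`Cache-Control`), and the client later sends `If-None-Match` to ask whether its copy is still current. The handler below is a simplified sketch; using a truncated SHA-256 hash as the entity tag and a 60-second `max-age` are assumptions made for illustration.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Hypothetical choice: a shortened content hash as entity tag
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(body: bytes, request_headers: dict) -> tuple:
    """Serve a resource, answering 304 if the client's copy is still valid."""
    etag = make_etag(body)
    # Label the response as cacheable for 60 seconds (illustrative value)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""  # Not Modified: reuse the cached copy
    return 200, headers, body


page = b"<html>timetable</html>"
status, headers, body = handle_get(page, {})
# Later, the client revalidates its cached copy instead of downloading it again
revalidated = handle_get(page, {"If-None-Match": headers["ETag"]})
print(status, revalidated[0])  # 200 304
```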
Uniform interface I
Uniform interface II
• Uniform interface between components
• Advantages:
• Visibility of interactions is improved
• Simplifies the overall architecture
• Decouples implementations from the services
• Improves Internet-scale
• Disadvantages:
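• Degrades efficiency, since information is transferred in a standardized form rather than one specific to an application's needs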
Uniform interface III
• Prerequisites for a uniform interface
• Unambiguous Identification of resources (URL)
• Manipulation of resources through representations
• In the beginning: HTML
• Later: Extensible Markup Language (XML - still widely used)
• Now: JavaScript Object Notation (JSON)
• Self-descriptive messages
• HTTP Methods describe the action (GET, POST, PUT, DELETE)
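A minimal sketch of these prerequisites, assuming an in-memory store: resources are identified by URL paths, manipulated only through `GET`, `PUT` and `DELETE`, and exchanged as JSON representations. `ResourceStore` is hypothetical; `POST` and content negotiation are left out to keep it short.

```python
import json


class ResourceStore:
    """Resources identified by URL paths, manipulated only through the same
    few HTTP methods and exchanged as JSON representations."""

    def __init__(self):
        self.resources = {}

    def handle(self, method: str, path: str, body: str = "") -> tuple:
        if method == "GET":
            if path not in self.resources:
                return 404, ""
            return 200, json.dumps(self.resources[path])
        if method == "PUT":
            created = path not in self.resources
            self.resources[path] = json.loads(body)
            return (201 if created else 200), ""
        if method == "DELETE":
            if path not in self.resources:
                return 404, ""
            del self.resources[path]
            return 204, ""
        return 405, ""  # method not supported by this sketch (e.g. POST)


store = ResourceStore()
print(store.handle("PUT", "/users/7", '{"name": "Ada"}'))  # (201, '')
print(store.handle("GET", "/users/7"))                      # (200, '{"name": "Ada"}')
print(store.handle("DELETE", "/users/7"))                   # (204, '')
```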
Layered system I
Layered system II
• Improves Internet-scale
• The application is composed of layers, each aware only of its neighbouring components, not of the complete system
• Bounds complexity and promotes independence between components
• Each layer
• Uses the service of the underlying layer
• Provides a service to the layer above
Layered system III
• Supports scalability through the introduction of proxies, shared caches and gateways
• E.g. load balancing behind a gateway
• Disadvantage: intermediaries add processing overhead, which can reduce user-perceived performance
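The layering idea can be illustrated with a gateway that forwards requests to backends in round-robin order; the client only talks to the gateway and never learns which backend answered. `Backend` and `Gateway` are hypothetical classes for this sketch, and a real gateway would forward HTTP requests over the network rather than call methods.

```python
import itertools


class Backend:
    def __init__(self, name: str):
        self.name = name

    def handle(self, path: str) -> str:
        return f"{self.name} served {path}"


class Gateway:
    """Intermediate layer: clients only know the gateway, not the backends."""

    def __init__(self, backends):
        self._next_backend = itertools.cycle(backends)  # round-robin order

    def handle(self, path: str) -> str:
        # Each extra layer costs a little processing time per request
        return next(self._next_backend).handle(path)


gateway = Gateway([Backend("A"), Backend("B")])
print([gateway.handle("/index.html") for _ in range(3)])
```

Round-robin distribution is only this simple because requests are stateless; with server-side sessions the gateway would have to route each client back to the same backend.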
Code on demand I
Code on demand II
• Client functionality is extended by downloading code
• Advantages:
• Improves extensibility
• Independent development
• Be aware of security concerns!
• Technologies
• JavaScript (by far most important)
• Flash (is losing ground fast)
• Java applets (already dead)
The Web evolved as a platform I
• The Web evolved as a platform
• Started out with simple homepages containing static documents (1990s)
• Became more and more interactive (2000s)
• Today the Web is a complex system of different types of applications and services
• Two faces of the Web nowadays
• The Web as an application platform
• The Web as a huge distributed database
The Web evolved as a platform II
Web Applications and State Management
What are the issues when building Web applications?
• User requirements
• User interface and usability
• Application state (manage state) and hypertext (navigate)
• Addressability
• Architecture
• Scalability
• Performance
• Fast development cycles
Traditional Stack of Web-Applications
Example: Apache Tomcat
[Figure: layered stack, from bottom to top: Virtual Machine / Hardware, Operating System, Web Server, Web Application Server, Web Applications 1-3]
• Web applications: application logic; answer the requests; manage the sessions; "Servlets" packaged in a WAR file
• Web server: stateless connection handling (HTTP); the "Coyote" part of Tomcat
• Web application server: support for session handling; servlet container
Modern Stack of Web-Applications
Example: Dropwizard
[Figure: on top of Virtual Machine / Hardware and the Operating System (OS), each Web application runs together with its own Web server]
• Each Web application is an application on OS level
• Web server functionality is provided by a library / framework (here: part of Dropwizard)
• The result is one complete Java application packaged as a JAR file
• Session handling is done by the application (with a library / framework)
• Solves scaling issues of the traditional stack
• Provides better isolation of applications
Session Tracking
• HTTP is stateless
• Sessions are tracked by unique identifiers (session IDs)
• Session IDs are transmitted between client and server
• As part of the URL (URL rewriting, permalink)
• In the HTTP header (cookies)
• Sessions must be tracked either by the
• Application (on the server)
• Client
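Both transport variants for session IDs can be sketched with Python's standard library. The cookie name `JSESSIONID` is taken from the HTTP example earlier; the query parameter name `sid` used for URL rewriting is a hypothetical choice, as are the helper functions.

```python
import secrets
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def new_session_id() -> str:
    # 128 random bits from a cryptographically secure source, hex-encoded
    return secrets.token_hex(16)


def as_cookie_header(session_id: str) -> str:
    """Transport variant 1: a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie["JSESSIONID"] = session_id
    cookie["JSESSIONID"]["path"] = "/"
    cookie["JSESSIONID"]["httponly"] = True  # not readable from JavaScript
    return cookie.output()


def session_from_cookie(cookie_header: str) -> str:
    """Read the id back from the Cookie header of a later request."""
    return SimpleCookie(cookie_header)["JSESSIONID"].value


def rewrite_url(url: str, session_id: str) -> str:
    """Transport variant 2: URL rewriting (hypothetical parameter name 'sid')."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query["sid"] = [session_id]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


sid = new_session_id()
print(as_cookie_header(sid))
print(rewrite_url("http://example.com/cart?item=3", sid))
```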
Session Tracking on the server
• Cookies or URL rewriting can be used
• Web server provides only low-level tracking
• I.e. it provides the framework for session tracking, not the full logic
• Application server has other responsibilities as well
• Can lead to serious scalability problems
• Load balancing between servers becomes complicated
• Handing a session over from one server to another becomes difficult or even impossible
Session Tracking on the client
• URL rewriting can be used
• Transfer parts of the application logic into the client (Code on demand)
• Manage it there with AJAX (Asynchronous JavaScript and XML)
• But other problems arise
• How to recover a state in a new session, when AJAX applications typically have a single URL?
• How to return to a previous state (the browser back-button problem)?
Session Tracking on the client and server
• The optimal solution is typically somewhere in the middle:
• Manage only important states on the server
• Give each state its own URL
• Use linking to relate states to each other
• No management of the state on the server: no scalability problems
• No management of the state on the client: no recovery problems
Session Management - URL Rewriting
• Advantages
• Meaningful, readable and easier for humans
• URLs can be bookmarked and shared with others
• Search engines can retrieve the different parts and index them
• Advantages for service integration, since services can be linked to each other
• Make different content representations addressable (HTML for humans, XML or JSON for services)
• Disadvantages
• Links can become too long; browser limits are usually 2048 or 4096 characters
Session Management - Cookies
• Advantages
• Can store more data
• Limit depends on browser (4kB to 10MB per domain)
• Short URLs are kept
• Disadvantages
• Might be difficult for the user to grasp, as cookies are not visible
• Legal issues
• Must be used with care; use URL rewriting whenever possible
Session Management - Example
• Google Maps uses AJAX to maintain a permalink
• Any action that you execute changes the permalink
• The permalink is kept as a part of HTML
• This is the equivalent of the address bar
Session Management - Example
• A little bit of extra DOM/JavaScript work keeps the Permalink up to date as you navigate
• Every point on the map is a separate application state that has its own URL
• AJAX destroyed the addressability of application states, but application design put it back
• It allowed communities to grow around the Google Maps application
• Only because of proper management of application states with URLs
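The permalink technique can be sketched as two functions that map application state to a URL and back. In the browser this would be JavaScript updating a link or the address bar; Python is used here for consistency with the other examples. The base URL `https://maps.example.com/` and the parameter names `lat`, `lng` and `z` are hypothetical, not Google Maps' actual URL format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://maps.example.com/"  # hypothetical application URL


def state_to_url(lat: float, lng: float, zoom: int) -> str:
    """Encode the visible map state as a permalink."""
    query = urlencode({"lat": f"{lat:.5f}", "lng": f"{lng:.5f}", "z": zoom})
    return f"{BASE_URL}?{query}"


def url_to_state(url: str) -> tuple:
    """Restore the map state from a bookmarked or shared permalink."""
    q = parse_qs(urlsplit(url).query)
    return float(q["lat"][0]), float(q["lng"][0]), int(q["z"][0])


link = state_to_url(47.06983, 15.4503, 15)
print(link)  # https://maps.example.com/?lat=47.06983&lng=15.45030&z=15
print(url_to_state(link))
```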
Starting point - 2-layer applications
• Everything runs on the server
• One and the same script implements both the application logic and the presentation (e.g. generating HTML)
• Application / Presentation
• Scripts (e.g. PHP)
• Data Management
• Relational database
Problems of 2-layer applications
• Mixture of application and presentation related functionality
• Changes in application logic lead to changes in presentation functionality and vice versa
• E.g. changing a table that presents some application data leads to changes in the return values of some application-specific functions
• Even more dangerous: the presentation layer talks directly to the database via a data manipulation language (DML)
⇒ Better modularity is achieved with the third layer
Evolvement - 3-layer applications
• Separation between Application and Presentation layer
• No direct connection between Presentation and Data Management
• Decoupling of Application and Presentation layer
• Possibility to exchange Presentation layers
• Example:
• Making a Web gateway to an existing application
• Old GUI (e.g. a standalone GUI) is replaced with a Web GUI
3-layer applications - Architecture
• Presentation tier
• HTML, templates and scripts to generate HTML
• Application logic tier
• The actual application, i.e. the business logic
• Data access tier
• Manages persistent application data
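A minimal sketch of the three tiers, assuming an in-memory SQLite database and a made-up `product` table: presentation code never touches SQL, and data access code never produces HTML. The names and the 20% tax rule are hypothetical.

```python
import sqlite3
from html import escape


class ProductRepository:
    """Data access tier: the only place that knows about SQL."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def all_products(self) -> list:
        return self.conn.execute(
            "SELECT name, price FROM product ORDER BY name").fetchall()


def prices_with_tax(repo: ProductRepository, tax_rate: float = 0.2) -> list:
    """Application logic tier: business rules, no SQL and no HTML."""
    return [(name, round(price * (1 + tax_rate), 2))
            for name, price in repo.all_products()]


def render_html(products: list) -> str:
    """Presentation tier: turns results into HTML."""
    items = "".join(f"<li>{escape(name)}: {price:.2f}</li>"
                    for name, price in products)
    return f"<ul>{items}</ul>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [("Pen", 2.5), ("Book", 10.0)])
page = render_html(prices_with_tax(ProductRepository(conn)))
print(page)
```

Replacing `render_html` with, say, a JSON serialiser would change the presentation without touching the SQL or the business rule, which is the decoupling the 3-layer architecture aims for.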
3-layer applications - Surroundings
• User interacts via the Web browser
• All the work is done in the Web application
• Provide the GUI
• Do the actual logic
• Load & store data
• Persistence backend realised with a relational database
[Figure: Web Browser ⇄ (HTTP) ⇄ Web Application (User Interface, Process Logic, Data management) ⇄ (SQL) ⇄ Database]
3-layer applications - Client-side / Browser inclusion I
• With the introduction of AJAX, there are different possibilities for where to situate the tiers
• E.g. presentation in browser: HTML + (presentation) JavaScript, application and data access on server
• E.g. presentation and application in browser: HTML + (presentation and application) JavaScript, data access on server
• Note: this may require additional security considerations (if the application logic runs on the client)
3-layer applications - Client-side / Browser inclusion II
[Figure: variants of distributing User Interface, Process Logic and Data management between the Web Browser and the Web Application (connected via HTTP), with the Database accessed via SQL]
3-layer applications - Model-View-Controller
• There are numerous architecture variants built on top of N-tier architectures
• In traditional software engineering, user-oriented database applications are built with an N-tier architecture
• The most important for Web applications:
Model-View-Controller architecture
• It was invented in the early days of GUIs
• To decouple the graphical interface from the application data and logic
• Also very useful for Web applications
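A minimal MVC sketch, following the classic observer-based variant: the model notifies registered views when it changes, and the controller turns request parameters into model calls. The counter example and all class names are hypothetical; server-side Web frameworks often let the controller select and render a view per request instead of using observers.

```python
class CounterModel:
    """Model: application data and logic, unaware of any user interface."""

    def __init__(self):
        self.value = 0
        self._listeners = []

    def subscribe(self, listener) -> None:
        self._listeners.append(listener)

    def increment(self, step: int = 1) -> None:
        self.value += step
        for listener in self._listeners:  # notify views of the change
            listener(self)


class CounterView:
    """View: renders the model, here as an HTML fragment."""

    def __init__(self):
        self.html = ""

    def update(self, model: CounterModel) -> None:
        self.html = f"<p>Count: {model.value}</p>"


class CounterController:
    """Controller: translates user input (request parameters) into model calls."""

    def __init__(self, model: CounterModel):
        self.model = model

    def handle(self, params: dict) -> None:
        self.model.increment(int(params.get("step", "1")))


model = CounterModel()
view = CounterView()
model.subscribe(view.update)
CounterController(model).handle({"step": "2"})
print(view.html)  # <p>Count: 2</p>
```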
3-layer applications - Current state
[Figure: a Client (User Interface, static content) and other Web apps communicate with Web applications over HTTP using HTML and JSON/XML; the Web applications (Process Logic, Data management) use SQL and NoSQL databases and call further Web applications over HTTP]
• Browsers combine static content (HTML) with dynamic data
• Other Web applications only use the dynamic data
• Web Application provides different endpoints for static and dynamic content
• Combination of existing DBs/services with new ones
Data Backbone
• Often Web applications deal with relational databases
• Need to manage relational data in object-oriented applications
• Use design patterns such as Data Access Object (DAO)
• Use object/relational mapping (ORM), e.g. the Hibernate framework or the Java Persistence API (JPA)
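The DAO pattern can be sketched in Python with the built-in `sqlite3` module instead of Java; an ORM such as Hibernate or JPA would generate much of this mapping code automatically. The `Book` class, the `book` table and the sample data are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:
    isbn: str
    title: str


class BookDao:
    """Data Access Object: the rest of the application sees Book objects,
    while the SQL needed to store and load them stays in this class."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL)")

    def save(self, book: Book) -> None:
        self.conn.execute("INSERT OR REPLACE INTO book (isbn, title) VALUES (?, ?)",
                          (book.isbn, book.title))
        self.conn.commit()

    def find(self, isbn: str) -> Optional[Book]:
        row = self.conn.execute(
            "SELECT isbn, title FROM book WHERE isbn = ?", (isbn,)).fetchone()
        return Book(*row) if row else None  # map the relational row to an object


dao = BookDao(sqlite3.connect(":memory:"))
dao.save(Book("0000000000", "Example Title"))  # placeholder data
print(dao.find("0000000000"))
```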
Web as a database
• The Web we use is full of data
• Book information, opinions, prices, arrival times, blogs, tags, tweets, etc.
• The data is organized around a simple data model: node-link model
• Each node is a data item that has a unique address and a representation
• Representation formats are e.g. HTML, PDF,... for humans, or e.g. XML, JSON for programs
• Nodes can be interlinked using their unique addresses
Information retrieval
• How to find what I’m looking for (again)?
• The mainstream approach is search engines with full-text processing
• Other approaches analyze links
• Links in databases, or within/between documents/sites
• Mixed approach: full-text and links, e.g. Google
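Link analysis can be illustrated with a simplified version of the PageRank idea: a page is important if important pages link to it. The sketch below uses plain power iteration with the commonly cited damping factor of 0.85 on a hypothetical three-page graph; it is not Google's production ranking, which combines many more signals.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base share ("random jump")
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                # A page passes its rank on, split evenly over its out-links
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # Dangling page without out-links: spread its rank over all pages
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))  # most linked-to page first
```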
Managing Metadata
• Metadata is data about other data, often semi-structured
• On the web
• Tag information items (everything that you can access via URL) in a structured manner
• Social Web 2.0 applications, e.g. http://del.icio.us or http://www.flickr.com
• Semantic annotation of Web content (Microformats)
⇒ Search inside metadata