
CSE 135 Server Side Web Languages Lecture # 12. Web Performance Notes


Academic year: 2021


Core Ideas

•  Given the trade-offs of server-side work we really need to think about time, but interestingly most gains come on the client side!
•  To a user, time passed matters, not bytes sent
•  There is a difference between perceived time and actual time
•  Page paint time matters
•  Amount of screen refresh matters
–  Frames, emphasis on reflows in HTML/CSS parse
•  How the screen refreshes matters
–  All at once vs. incrementally
•  Application pacing matters

Core Ideas

•  To some Web site owners, bytes sent may matter quite a bit as well because of cost.
•  Obviously the cost is bandwidth
–  How much does 50K cost? Nothing
–  How much does 50K * thousands of customers cost? Maybe something
•  Note the design focus of e-commerce sites: $ and bytes go into content, not navigation
•  Heavy pages don't just cost bandwidth; they may cost hardware in terms of scalability. Servers can't be done with a connection as quickly, thus you will need more of them more quickly
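The "50K times many customers" arithmetic can be sketched in a few lines. This is only a back-of-the-envelope helper: the view count and the price per GB below are made-up illustration values, not figures from the lecture.

```python
def monthly_transfer_gb(page_kb, page_views):
    """Total transfer in gigabytes, counting 1 GB as 1,000,000 KB."""
    return page_kb * page_views / 1_000_000


def monthly_cost(page_kb, page_views, price_per_gb):
    """Bandwidth bill for the given traffic at a flat (assumed) price per GB."""
    return monthly_transfer_gb(page_kb, page_views) * price_per_gb


# Hypothetical traffic: an extra 50K on a page that is viewed 3 million times a month.
extra_gb = monthly_transfer_gb(50, 3_000_000)
# Hypothetical price of 0.10 per GB; substitute your provider's real rate.
extra_cost = monthly_cost(50, 3_000_000, 0.10)
print(extra_gb, extra_cost)
```

One view of 50K is noise; the same 50K multiplied by every page view is a line item.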

Core Idea – Golden Rule

Golden Rule of Optimization: Less data, less often, and close by!

•  Send as little as you need to, as infrequently as you need to, if you want to go faster
–  Example: compression -> less to send
–  Example: caching -> less frequent requests

Scale != Speed

•  Adding more servers doesn't make a site faster
–  Scaling does not mean faster unless things are overloaded
•  If a server is overloaded, offloading it may make it appear to go faster, but in that sense the server was not operating at optimal efficiency to begin with
•  Do you know the tolerance of a server, connection, etc.?
–  Number of concurrent connections it can handle, amount of traffic before the pipe is saturated, etc.

Upgrading from 5 to 10 Mbps gets only about a 5% load time improvement!

Each 20 ms cut in round-trip time gives a roughly linear load time improvement
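A toy model helps show why latency tends to dominate. The sketch below assumes load time is simply transfer time plus a fixed number of sequential round trips; the page size, round-trip count, and RTT values are illustrative assumptions, and the model is not meant to reproduce the measured percentages quoted above.

```python
def load_time_s(page_bytes, bandwidth_mbps, rtt_ms, round_trips):
    """Very rough page load estimate: time on the wire plus time spent waiting."""
    transfer = page_bytes * 8 / (bandwidth_mbps * 1_000_000)  # seconds moving bits
    waiting = round_trips * rtt_ms / 1000                     # seconds of round trips
    return transfer + waiting


# Assumed page: 500 KB fetched with 30 sequential round trips.
baseline = load_time_s(500_000, 5, 100, 30)      # 5 Mbps, 100 ms RTT
more_pipe = load_time_s(500_000, 10, 100, 30)    # double the bandwidth
less_delay = load_time_s(500_000, 5, 80, 30)     # shave 20 ms off the RTT
print(baseline, more_pipe, less_delay)
```

In this model, doubling the bandwidth only halves the small transfer term, while every millisecond of RTT removed is multiplied by the number of round trips.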

Simple (Re)view to Think about Optimization

[Diagram: a user agent of some sort sends an HTTP request to the Web server (hardware & software: Apache, IIS, Zeus, etc.), which runs a server-side programming technology (CGI; Apache module, ISAPI; scripting tech such as PHP) against a backend system (e.g. database) and returns an HTTP response.]

Shouldn't this be in the client-side class? Yes, but…

•  Steve Souders (and others) have stated that 80-90%* of your user response time is client-side
–  Start there: most gains
–  Simple to do
–  Easy to measure
•  http://stevesouders.com/

Web Overview: Steps 1 & 2 Issues

•  Main challenge is DNS, which is both fragile and robust
–  Don't skimp on DNS servers
–  Consider DNS replication or managed services
•  UltraDNS (www.ultradns.com)
–  Consider using shortened and contingency names to help users
•  Forget the www
–  Minimal domains (e.g. www.pint.com ~ pint.com)
•  Contingency hosts
•  w, ww, wwww
•  All pretty much free, just DNS entries
•  Contingency domains
–  Expanded out: powellinternet.com
–  Products and brands: www.ipod.com
–  Typos: www.gooogle.com, www.amazom.com
–  Misspellings: www.zerox.com

Step 3 Notes

•  To improve this step, reduce travel time
–  close the gap
–  closer -> fewer hops, less distance
–  geographical/network geographical sensitivity
–  implies edge servers a la CDNs (content distribution networks)
•  Getting beer at the ball park example
•  Reduce the request size
•  make payload smaller; analyze payload
•  no savings here as the request is small; the response, though, will be large
–  Increase bandwidth?
•  Not really helpful

Step 4 Issues

•  Bottleneck: Server Capacity
•  Can the server take my request?
–  No, too busy right now
•  Not enough capacity or incoming bandwidth
–  hardware/software
–  flash traffic
–  too many requests
–  holding requests open too long (many slow downloaders)
–  taking too long to fulfill requests
»  processing times of requests are significant

Server Capacity Notes

•  The simple solution of DNS round-robin is often used for sites with only a few servers
•  While easy to set up, it has cons
–  Sub-optimal distribution of traffic across the servers
–  Does not deal well with hung servers
•  To improve the situation you can create a cluster using software or hardware
–  Hardware often makes more sense (except for cost)
–  Many intelligent switch/load balancer vendors
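The hung-server weakness of round-robin is easy to see in a small simulation. This is a simplified model that just hands out addresses in strict rotation; real DNS round-robin also involves resolver caching and TTLs, and the addresses below are made up.

```python
from itertools import cycle


def round_robin(addresses, n_requests):
    """Hand out addresses in strict rotation, with no idea which servers are healthy."""
    rotation = cycle(addresses)
    return [next(rotation) for _ in range(n_requests)]


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical server farm
hung = {"10.0.0.2"}                             # this box has stopped answering

assigned = round_robin(servers, 9)
lost = [addr for addr in assigned if addr in hung]
print(f"{len(lost)} of {len(assigned)} requests went to a hung server")
```

Because the rotation never consults server health, a third of the traffic keeps going to the dead box until someone edits DNS.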

Server Capacity Notes

•  A load balancer will distribute load across a server farm based upon metrics such as least busy, most available, closest network-wise, fastest response, etc.
•  While you may not be able to afford a load balancer, you can segment server traffic as well
•  Consider a machine like images.pint.com to handle your image traffic, store.pint.com to handle your SSL traffic, and so on. Then, just by changing the links in your HTML, you will distribute some load around
–  This particular approach with images may actually have a benefit in user download speed as well, since browsers will open new connections to the other domain and parallelize your
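As a rough illustration of the "least busy" metric, the sketch below picks the server with the fewest active connections. The server names and counts are invented, and a real load balancer combines several metrics and health checks rather than this single rule.

```python
def pick_least_busy(active_connections):
    """Return the name of the server with the fewest active connections."""
    return min(active_connections, key=active_connections.get)


# Hypothetical farm: server name -> connections currently open.
farm = {"web1": 12, "web2": 4, "web3": 9}

target = pick_least_busy(farm)
farm[target] += 1  # the new request is now counted against that server
print(target, farm)
```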

Server Capacity Contd.

•  Can the server take my request now?
–  Yes but…
•  Response is still slow!
–  Static content
»  disk problems
»  disk to network copy delays
•  Get a faster disk drive
–  Most Web servers are not CPU bound; they are disk and network bound

Server Capacity Contd.

•  Can the server take my request now?
•  Yes but…
–  Dynamic content problems
–  Generation times
–  It does take time to build a page on the fly, and if you are doing a lot of this your box may be CPU bound
•  A significant problem here is the so-called "static dynamic" page
–  http://www.xyz.com/pressrelease.asp?id=5 on most sites is the same page regardless of the user, yet it is rebuilt on every single visit to the site, which costs time and CPU resources
•  Solution: self-generate content into HTML
•  Solution: use a reverse proxy cache

Step 4 Solution – Cache Issues

•  Note the similarity between the interpretation vs. compilation trade-off in coding and the caching solution
•  A cache functions by responding to requests for Web objects. If the object requested is unknown, it is fetched from the origin server; if it has been fetched recently, the request is served from the cache.
•  What you can't put into a proxy cache easily:
–  Extremely dynamic content, particularly personalized content
•  You can do it and recode pages to let the proxy know what is cacheable and what is not, but the work involved may be
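The cache behavior described above (unknown object: fetch from origin; recently fetched: serve from cache) can be sketched as a tiny in-memory class. The class name, the 60-second lifetime, and the `build_page` stand-in for the origin server are all assumptions for illustration; a real reverse proxy also honors HTTP cache headers.

```python
import time


class ObjectCache:
    """Minimal time-based cache sitting in front of a slow origin."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # url -> (expires_at, body)

    def get(self, url):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]                      # fetched recently: serve from cache
        body = self._fetch(url)                  # unknown or expired: go to the origin
        self._store[url] = (now + self._ttl, body)
        return body


origin_hits = []


def build_page(url):
    """Stand-in for the expensive 'static dynamic' page build."""
    origin_hits.append(url)
    return f"<html>page for {url}</html>"


cache = ObjectCache(build_page)
first = cache.get("/pressrelease.asp?id=5")
second = cache.get("/pressrelease.asp?id=5")
print(len(origin_hits))
```

The second request never reaches `build_page`, which is exactly the CPU the "static dynamic" page was wasting.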

Step 4 Solution – Connection Offload

•  Many sites get network bound since they cannot let go of a connection until the last ACK packet is back from the user
–  Given the mix of fast and slow users, you may find that a box will get saturated sooner than it should
–  Solution: network stack tuning
–  Solution: TCP termination at the load balancer
•  Terminating and muxing the connections gives the servers orderly workloads to handle
•  Add in the overhead of crypto (SSL) and you really want to offload to a server with a special card or have SSL decryption in your terminating device

On to the real problem

•  Once the request has been processed, either by the server or a cache, and is ready to be sent, you hit the real trouble spot in the process: result delivery

Payload Issues

•  The return is composed of headers and data
•  The bulk of the payload is of course the data of a Web page, which is composed of two types of components

Text              Binary
HTML/XHTML/XML    Images (GIF, JPEG, PNG)
CSS               Animations (Animated GIFs, Flash, Shockwave)
JavaScript        Video
                  Audio
                  PDF

Addressing Payload Issues

•  Once again our network-aware Web development mantra
–  Send less, less often
•  Reduce content sent
•  Do you really need that image, Flash splash page, etc.?
•  Cool design may = more money in delivery
–  The fallacy of flat bandwidth rates
–  Incremental bandwidth costs
•  Real world examples: goodbye graphic rollovers, CSS-oriented design, advertisements as a higher % of byte payload

Addressing Payload Issues Contd.

•  Compress images properly
–  Beware of decompression time with images of large physical dimensions. Paint delay may far outweigh delivery savings.
–  Don't be packet stupid
•  The envelope holds a basic minimum amount, about 1K; making the file smaller than that doesn't help!
–  Designers beware: some acceleration devices will recompress your images, so, sorry to say, what you do may not make it to the end user the way you intended.
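The "packet stupid" point is just ceiling division. The sketch below takes the slide's "about 1K" envelope as the default payload size; the actual usable payload per packet depends on the network path, so treat the number as an assumption.

```python
def packets_needed(size_bytes, payload_bytes=1024):
    """Packets required to carry an object; even a tiny object costs one packet."""
    return max(1, -(-size_bytes // payload_bytes))  # ceiling division


# Squeezing a 900-byte image down to 600 bytes saves nothing on the wire,
# while getting a 1100-byte image under the envelope size saves a packet.
for size in (600, 900, 1100):
    print(size, packets_needed(size))
```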

Addressing Payload Issues Contd.

•  Crunch HTML
–  Who needs white space, comments, etc.?
–  Some types of <meta> tags are wasted bytes
–  Color remap #ff0000 -> red
•  In some cases the other way: name -> hex
–  Entity remap &curren; -> &#164;
•  In some cases the other way as well
–  Most changes would not hurt search engines, users, etc.
–  Most of these byte shaves have to be done automatically to be of value, but they add up
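A naive version of the whitespace and comment crunch can be written with a few regular expressions. This is only a sketch: it would damage <pre> blocks, inline scripts, and conditional comments, and removing whitespace between inline tags can change rendering, which is why real tools parse the markup instead.

```python
import re


def crunch_html(html):
    """Strip comments and surplus whitespace from simple markup."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)  # drop comments
    html = re.sub(r">\s+<", "><", html)                      # whitespace between tags
    html = re.sub(r"\s+", " ", html)                         # collapse remaining runs
    return html.strip()


page = """<html>
  <!-- navigation -->
  <body>
    <p>Hello,   world</p>
  </body>
</html>"""

crunched = crunch_html(page)
print(len(page), "->", len(crunched))
```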

Addressing Payload Issues Contd.

•  Crunch CSS
–  Same whitespace and comment issues as HTML
–  You can also use shorter id and class names
•  .P1 instead of .Paragraph1 or .Tc instead of .TableCell
•  This should be done automatically because of readability
–  Color condensing
•  #FF0000 can become #f00
–  Rule condensing
•  Shorthand rules: background, not background-image
•  Rule rewriting to take advantage of repetition
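Color condensing is a mechanical rewrite, so it is a good candidate for automation. The sketch below shortens six-digit hex colors only when each channel is a doubled digit; the function name is made up, and it makes no attempt to skip hex-looking text inside strings or URLs.

```python
import re

# Matches #rrggbb where each pair repeats the same digit, e.g. #FF0000 or #aabbcc.
DOUBLED_HEX = re.compile(
    r"#([0-9a-f])\1([0-9a-f])\2([0-9a-f])\3\b", flags=re.IGNORECASE
)


def condense_colors(css):
    """Rewrite condensable six-digit hex colors to the three-digit form."""
    return DOUBLED_HEX.sub(lambda m: "#" + "".join(m.groups()).lower(), css)


print(condense_colors("h1{color:#FF0000} p{color:#aabbcc} a{color:#ff0001}"))
```

Colors such as #ff0001 are left alone because they have no three-digit equivalent.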

Addressing Payload Issues Contd.

•  Crunch JavaScript
–  Same whitespace and comment issues as HTML & CSS
–  Variable and function renaming can produce significant savings: function validation() becomes function v() or similar
–  Some basic dead code elimination
–  Semicolon removal in some places
–  Object remapping
•  var d = document; d.write(); d.write(); etc.
–  Script roll-up would be very useful and would also reduce a request: <script src="one.js">, <script src="two.js"> becomes <script src="three.js">
•  On most Web sites, separate JS files are a developer value, not a delivery value

Addressing Payload Issues Contd.

•  URL and filename optimizations
–  Index file removal
•  <a href="products/index.html"> becomes <a href="products/">
–  Issue with having a Web server around during development
–  Dependent file renaming
•  Instead of <img src="bnRolloveron.gif"> remap to <img src="b.gif">
–  Path reduction
•  Instead of paths like ../../images/logo.gif remap to /i/logo.gif or better yet /i/l.gif
–  Saves huge amounts of space file-wise since the names are often repeated all over the place
–  User never types these in so no harm there; some obfuscation benefit as well
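Index file removal is another rewrite best left to a script. The sketch below assumes the server's default document is index.html and that attributes use double quotes; both are assumptions, and the links only keep working when served by a Web server, which is the development issue noted above.

```python
import re

# href="…/index.html" or href="index.html", double-quoted attributes only.
INDEX_HREF = re.compile(r'(href=")([^"]*/)?index\.html(")')


def strip_index_files(html):
    """Shorten links that name the directory's default document explicitly."""
    def shorten(match):
        directory = match.group(2) or "./"  # bare index.html becomes the current dir
        return match.group(1) + directory + match.group(3)
    return INDEX_HREF.sub(shorten, html)


print(strip_index_files('<a href="products/index.html">Products</a>'))
```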

Addressing Payload Issues Contd.

•  Source optimization of (X)HTML can actually be more beneficial than it would seem because it tends to be the root document from which future requests are made
–  Slow it down and you add small delays to everything else
–  Do not confuse source optimization with obfuscation; they have different goals
•  Unless you are a massive traffic site, you should do these types of optimizations automatically using a tool like w3compiler.com; otherwise it just isn't worth the effort most of the time.
•  With Web programming: develop for maintenance, but prepare for delivery
•  Other aspects of a site like PDF and Flash can be compressed with tools, but I have been placing focus on that which is most commonly used.

HTTP Compression

•  Transparent and harmless given that browsers send an Accept-Encoding header to negotiate this
–  Some of the biggest sites use this: Google, Amazon, etc.
–  Only works on text formats: HTML, CSS, JS, PDF, and some Office formats
–  Savings as high as 70%
–  Implementations: Apache: mod_gzip; IIS: httpZip; or use an appliance like a Redline
•  Will use CPU cycles to do this, but your server isn't doing much
•  It does increase the Time to First Byte (TTFB) but significantly decreases the Time to Last Byte (TTLB)
–  Latency issue with compression (LAN vs. dial-up value)
–  Saves bandwidth no matter what
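The negotiation can be sketched with Python's standard gzip module. This is a simplification, not how mod_gzip or any appliance is implemented: the helper name is made up, and the header parsing ignores q-values, so a client sending "gzip;q=0" would be handled incorrectly.

```python
import gzip


def maybe_compress(body: bytes, accept_encoding: str):
    """Gzip the body only if the client's Accept-Encoding header offers gzip."""
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    if "gzip" in offered:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(body), headers
    return body, {}


# Repetitive markup, like most HTML, compresses very well.
html = b"<tr><td class='cell'>row</td></tr>\n" * 200
sent, headers = maybe_compress(html, "gzip, deflate")
print(len(html), "->", len(sent), headers)
```

The whole body has to be compressed before the first byte goes out in this sketch, which is the TTFB cost mentioned above.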

Addressing Payload Issues Contd. - Caching

•  Why do you keep sending me that logo?!
–  Signed, your browser
•  304 Not Modified and network chatter issues
•  Control the cache and download only what you need, when you need it
–  JS and CSS in the page is not so good; linked files with good cache control headers are much better
•  Unfortunately, little control is possible unless we design for it in the first place
–  Design a cache control policy: when do things expire?
–  Consider organizing your site to help this: /images/cached
•  Note: caching and compression address different things and can sometimes be in conflict.
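A cache control policy organized around site structure might look like the sketch below. The /images/cached/ prefix comes from the slide; the lifetimes and the choice of directives are assumptions to be tuned per site.

```python
def cache_headers(path):
    """Pick a Cache-Control header from where the object lives on the site."""
    if path.startswith("/images/cached/"):
        # Long-lived art: 30 days (assumed lifetime).
        return {"Cache-Control": "public, max-age=2592000"}
    if path.endswith((".js", ".css")):
        # Linked scripts and styles: 1 day (assumed lifetime).
        return {"Cache-Control": "public, max-age=86400"}
    # Base documents: always revalidate, which may still end in a cheap 304.
    return {"Cache-Control": "no-cache"}


for p in ("/images/cached/logo.gif", "/js/site.js", "/index.html"):
    print(p, cache_headers(p))
```

Putting long-lived objects under one directory means the policy is a single rule rather than a per-file decision.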

Addressing Payload Issues Contd. - Caching

•  The danger of a stale cache
–  The browser or some intermediary is holding my image until next Tuesday and I really need to update it!
–  Solution: rename the object
•  <img src="logo.gif"> becomes <img src="logo1.gif">
•  Lots of work, but easy in CMS systems
•  Watch out with caching your base documents then!
–  Fineground has an interesting automatic caching policy generation technique
•  Some browsers can cut corners though, and this can cause you trouble
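The rename can be automated so nobody has to remember to bump logo.gif to logo1.gif. The sketch below swaps the slide's manual counter for a content hash in the filename, which is a different (though common) technique; the function name and the 8-character digest length are arbitrary choices.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename, content: bytes):
    """Embed a short content hash so a changed file gets a new URL."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:8]
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old = versioned_name("images/logo.gif", b"old logo bytes")
new = versioned_name("images/logo.gif", b"new logo bytes")
print(old, new)
```

The base document that references the image still has to be fetched fresh, hence the warning about caching base documents.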

Addressing Payload Issues Contd. – Exotic Stuff

•  Delta encoding
–  Notice that most pages have similar structures and sometimes even content
–  Why do we keep sending the same HTML tags, tables, etc.?
–  You don't have to: you could send a base page and then send only the differences from page to page
–  Read about the idea here: http://webreference.com/internet/software/servers/http/deltaencoding/intro/
–  Some AFE (Application Front End) appliances implement this using a proxy and JavaScript, and it produces amazing results, though it is obviously more dangerous than other solutions
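The base-page-plus-differences idea can be tried out with Python's difflib. This is a toy illustration of the concept only; it is not the delta format used by HTTP delta encoding or by any appliance, and the two pages are invented.

```python
from difflib import SequenceMatcher


def make_delta(base, page):
    """Describe page as copies from base plus the literal text that is new."""
    ops = []
    matcher = SequenceMatcher(None, base, page, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # client already has base[i1:i2]
        elif tag != "delete":                   # "replace" or "insert"
            ops.append(("add", page[j1:j2]))    # only this text must be sent
    return ops


def apply_delta(base, ops):
    """Rebuild the page on the client from the base page and the delta."""
    return "".join(base[op[1]:op[2]] if op[0] == "copy" else op[1] for op in ops)


base = "<html><body><table><tr><td>Widget A</td><td>$10</td></tr></table></body></html>"
page = "<html><body><table><tr><td>Widget B</td><td>$12</td></tr></table></body></html>"

delta = make_delta(base, page)
literal_chars = sum(len(op[1]) for op in delta if op[0] == "add")
print(literal_chars, "new characters instead of", len(page))
```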

Still having troubles…

•  You can do all this and still have a slow site, at least for some users
–  Point-source Web serving will always have latency problems
•  You could set up multiple Web farms around the world and then perform global load balancing between them
–  Redirection choices based upon server availability, network distance, geography, or some mixture of these metrics
•  The downside to multiple farms is of course increased data center and hardware costs

Solving the Latency Issue: CDNs

•  Because hardware and co-location costs go way up, some people use CDN services.
•  CDNs replicate and move content to the edge of the network, improving reliability, scalability, and performance.
•  In order to redirect to an edge cache
–  DNS must be modified, or special URLs must be used [e.g. ARL]
–  Obviously the second takes more effort, but may allow for more flexibility in caching decisions.

CDN Solution

Move content to

Implications of CDN Besides Cost

•  Even with a CDN you still have last-mile issues, which can be significant.
•  Another problem is dynamic content assembly at the edge
–  Indicate what's cacheable and what is not
–  Edge Side Includes (www.esi.org)
•  Suggests that edge caches may become more intelligent edge servers in the future, thus moving us to a distributed computing style

Request Reduction

•  In modern broadband situations the number of requests can significantly affect the performance of a page
•  Bundling dependent objects can potentially tremendously improve the performance of a site
•  Sometimes the separation frankly is more out of convention than being appropriate
–  Example – JavaScript
•  <script src="file1.js"></script> <script src="file2.js"></script> becomes
•  <script src="filebundle.js"></script>
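The roll-up itself is little more than concatenation done at build time. The sketch below works on in-memory strings to stay self-contained; the file names and contents are placeholders, and a real build step would read the files from disk and probably minify as well.

```python
def bundle_scripts(sources):
    """Concatenate scripts in order into one file body.

    A lone ';' after each file guards against a file that ends without one
    running into the next file's first statement.
    """
    parts = []
    for name, code in sources.items():
        parts.append(f"/* {name} */\n{code.rstrip()}\n;")
    return "\n".join(parts)


# Placeholder contents standing in for file1.js and file2.js.
sources = {
    "file1.js": "var d = document",
    "file2.js": "function v() { return true; }",
}

filebundle = bundle_scripts(sources)
print(filebundle)
```

Developers keep editing the separate files; only the bundle is delivered, so the page makes one request instead of two.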

Request Reduction Contd.

•  For CSS files you see a similar situation as with JS
•  For images you could adopt an idea called CSS sprites, where you make one large image tile of all the independent pieces and then show portions of the image

<img src="pixel.gif"
style="background:url('image_1878169298.png') -3095px 0px no-repeat; height: 45px; width: 33px;" />

Next image

<img src="pixel.gif"
style="background:url('image_1878169298.png') -2896px 0px no-repeat; height: 32px; width: 199px;" />
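The inline styles in a sprite setup are tedious to write by hand but trivial to generate. The sketch below assumes a simple left-to-right strip layout; the helper names and the piece sizes in the usage example are invented, and the first call just reproduces the style shown in the markup above.

```python
def sprite_style(sheet_url, x, y, width, height):
    """CSS that shows the width x height piece located at (x, y) in the sheet."""
    return (f"background:url('{sheet_url}') {-x}px {-y}px no-repeat; "
            f"height: {height}px; width: {width}px;")


def pack_horizontal(widths):
    """Left edge of each piece when pieces are laid side by side in one strip."""
    offsets, x = [], 0
    for width in widths:
        offsets.append(x)
        x += width
    return offsets


print(sprite_style("image_1878169298.png", 3095, 0, 33, 45))
print(pack_horizontal([199, 33, 120]))  # hypothetical piece widths
```

The background offsets are negative because the sheet is shifted left and up behind a window the size of the piece.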

All About End Users?

•  Bytes vs. time
•  Read, decide, click, wait, repeat
•  Download ahead of time
–  Flash preloaders, cache tricks, JS preload
–  Mozilla prefetch
•  http://www.mozilla.org/projects/netlib/Link_Prefetching_FAQ.html
–  Precache example
•  http://ajaxref.com/ch8/longscroll.html
–  Don't forget browser bulk
–  IE vs. Opera vs. Mozilla vs. Safari: they are different pieces of software with different qualities of execution

How do you know you are doing well?

•  Measure server time, network time, and paint time
–  Server time is easy, network time is harder, and paint time requires a JavaScript injection to start and stop a timer
–  http://ajaxref.com/ch6/connectionspeed.html
–  Interesting to note that such features are coming directly to browsers now (http://blogs.msdn.com/b/ie/archive/2010/06/28/measuring-web-page-performance.aspx) and the W3C is creating a performance working group (
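Server time really is the easy one: wrap the page-building code in a timer. The sketch below is one possible way to do it with a decorator; the decorator name and the `build_page` handler are hypothetical, and it measures only generation time, not network or paint time.

```python
import time


def timed(handler):
    """Record how long each call to a page-building function takes."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = handler(*args, **kwargs)
        wrapper.last_ms = (time.perf_counter() - start) * 1000
        return result
    wrapper.last_ms = None  # no call measured yet
    return wrapper


@timed
def build_page():
    """Stand-in for real server-side page generation."""
    return "<html>" + "".join(f"<p>{i}</p>" for i in range(1000)) + "</html>"


body = build_page()
print(f"server time: {build_page.last_ms:.3f} ms for {len(body)} bytes")
```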

SPDY, Sockets and Beyond

•  Can we fix HTTP?
–  How hard would it be to do HTTP 2.0?
–  Pretty hard; we can't even do simple HTTP as it stands
•  Evidence: proxies, GET/POST, header issues, compression, etc.
–  SPDY offers some solution using an SSL tunnel
•  If you can't fix it then offer something else
–  Parallel protocol?
•  WebSocket?
•  Does this get at the underlying file protocol vs. app protocol difference? I think so.
