2010 Exceptional Web Experience

(1)

TECH-B21

Search Engine Optimization and WebSphere Portal - Best Practices

Andreas Prokoph, Lead architect – Search in Portal and WCM, Portal development

(2)

Agenda

Introduction

User patterns using Internet search engines

What makes a webpage 'relevant'?

What features does WebSphere Portal provide to support SEO

Reference materials

(3)
(4)

“Why aren’t our pages on the top in Google?”

… asks your boss

You did your best – you thought ...

Even you were wondering what was going on

… Keywords are there …

… Metadata polished …

… beautiful URLs …

(5)

Search marketing versus ‘organic search’

Paid placement

Organic search result

(6)

“Search and your Internet presence”

Typically, two steps are involved in attracting traffic and keeping or winning users and customers:

The first step is to attract users and get them to your website

Internet Search

The second step starts once they are at your site: they might have already found what they were looking for, or they might look for more

Site search

Google attracts users to a website

If it is interesting enough, they are likely to ‘search’ for more information at that website

(7)
(8)

The golden triangle from an eye-tracking study

Aggregate map: all consumer search activity

Red is most-viewed; black is unviewed.

Source: Enquiro

(9)

Ideally this is where you want your page to score: in the Golden Triangle

Visitors view their results for an average of 6.3 seconds before clicking on a link.

Just enough time to scan the first three to five results and the top one or two ads. Chances are: if

(10)

What makes a webpage a good webpage?

Good content and information

Good, focused content and information

… and finally … interesting enough for others (external)

(11)

Why do search engines keep their ranking metrics a secret?

If the metrics were published, too many would take advantage of them to boost their pages

In the end, Internet search engines would be worthless to the broad community!

A bad example from the past:

AltaVista emphasized keywords in titles and metadata

Within 1 year AltaVista was ‘dead’ because the quality of its search results was declining rapidly

Imagine you were ‘Google’ – would you take chances to ruin your

(12)

What does not work ….

Metadata usage …

stuffing metadata fields like ‘Keywords’ and ‘Description’ with all kinds of (unrelated)

keywords

‘Alt tag’ stuffing …

used to describe what a certain image is about ..

example:

<img alt="windows, ABC consulting, windows, developer, tutorials, ABC consulting, developer, windows, tutorial, tutorial, tutorials, resources, windows, tutorials, developer" />

.. and there is also: ‘search engine friendly URLs’

a widespread misconception as to how search engines work

for a crawler: if it gets a return code '200', then that URL is OK

(13)

A word about metadata usage by Internet search engines

Title and description information is important for the initial representation of a webpage in the search result list

However: neither these two nor any other metadata fields are relevant in any way for determining the webpage's relevancy

one could still argue about ‘Title’ – again: in the past that was one of the metrics documented by AltaVista and ... we know how that went ...

For an example of 'official' webmaster recommendations, see:

(14)

… User friendly URLs .. why would you think it’s relevant?

When looking at the presentation of a Google search result, the assumption is that Google highlights whatever has contributed to a webpage’s relevance ranking

Truth is: the highlighting is straightforward and simply marks in bold everything that matches one of the keywords, assuming the user will then be better able to judge a page's relevance for themselves...
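The point about highlighting can be illustrated with a minimal sketch. It assumes nothing about how any real search engine renders results; the `highlight` function and the sample URL are purely hypothetical. It shows that plain keyword matching is enough to make a URL appear "bold" in a result list, with no ranking involved.

```python
import re


def highlight(text, query):
    """Wrap every case-insensitive occurrence of a query keyword in <b> tags.

    Hypothetical helper: pure string matching, no relevance scoring at all.
    """
    keywords = [re.escape(word) for word in query.split()]
    if not keywords:
        return text
    pattern = re.compile("|".join(keywords), re.IGNORECASE)
    return pattern.sub(lambda match: "<b>" + match.group(0) + "</b>", text)


# A keyword that happens to occur in the URL is marked in bold,
# regardless of whether the URL contributed to the ranking.
print(highlight("www.example.com/portal/shoes", "portal shoes"))
```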

(15)

How search engines work –

having discussed what DOESN'T work,

(16)

The following are the main metrics that get applied

Page or document relevancy

term frequency times inverse document frequency: tf x idf

To some degree … Hypertext-Matching Analysis: analyzes the full content of a page and also analyzes the content of linked local pages

PageRank

link popularity - how important other Internet users think a specific page is

Note that Google specifically has a set of more than 100 rules that potentially can get applied (for various purposes)

Note further: if one of the rules is seen to be mis-used, then its
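As an illustration of the tf x idf metric listed on this slide, here is a minimal sketch of one common textbook variant (relative term frequency times the natural logarithm of the inverse document frequency). The exact weighting used by any particular search engine is not public, so the function and the tiny corpus below are assumptions for demonstration only.

```python
import math
from collections import Counter


def tf_idf(term, document, corpus):
    """Score `term` for `document` using one common tf x idf variant.

    tf  = occurrences of the term / number of words in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    words = document.lower().split()
    if not words:
        return 0.0
    tf = Counter(words)[term] / len(words)
    containing = sum(1 for doc in corpus if term in doc.lower().split())
    if containing == 0:
        return 0.0
    idf = math.log(len(corpus) / containing)
    return tf * idf


# Hypothetical three-document corpus.
corpus = ["portal search portal", "search engine", "search sitemap"]
print(tf_idf("search", corpus[0], corpus))  # term occurs in every document
print(tf_idf("portal", corpus[0], corpus))  # term occurs in one document only
```

A term that appears in every document gets an idf of zero in this variant, which is why repeating a very common word does little for relevance.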

(17)

Basic relevance – about ‘precision’ and ‘recall’

Comparison between two search engines: one real-world, the other the ideal one (figure labels: optimal, good)
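For readers unfamiliar with the two terms, the following sketch computes them using the standard definitions: precision is the share of returned results that are relevant, and recall is the share of relevant documents that were returned. The page identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result list against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four pages are truly relevant to the query.
relevant_pages = {"p1", "p2", "p3", "p4"}

# A real-world engine returns two relevant pages and two irrelevant ones.
print(precision_recall(["p1", "p2", "p8", "p9"], relevant_pages))

# An ideal engine returns exactly the relevant pages.
print(precision_recall(["p1", "p2", "p3", "p4"], relevant_pages))
```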

(18)

PageRank – the obvious …

Webpages A, B, C

Which webpage would you think is the more important one?

(19)

How PageRank is calculated

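PageRank calculation for the three pages shown on the left (with d = 0.85):

PR(A) = 0.15 + 0.85 * PR(C)

PR(B) = 0.15 + 0.85 * (PR(A)/2)

PR(C) = 0.15 + 0.85 * (PR(B) + PR(A)/2)

PageRank formula, put simply:

PR(A) = (1-d)/N + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

Note: the calculation above uses the constant term (1-d) = 0.15, i.e. the variant of the formula without the division by N.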

Definition:

PR(A) is the PageRank of document A

PR(Ti) is the PageRank of document Ti, which includes a link to document A

N is the count of qualifying documents

C(Ti) is the total count of links on page Ti, and

d is a confidence value (commonly called the damping factor), where 0 ≤ d ≤ 1
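The three equations on this slide can be solved by simple iteration. The sketch below is not how any search engine implements PageRank; it only reproduces the slide's example. The link structure (A links to B and C, B links to C, C links to A) is read off the three equations, and, like the worked example, the code uses the constant term (1 - d) with d = 0.85 rather than (1 - d)/N.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iteratively evaluate PR(X) = (1 - d) + d * sum(PR(T) / C(T)).

    `links` maps each page to the list of pages it links to.
    Every page is assumed to have at least one outgoing link.
    """
    rank = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_rank = {}
        for page in links:
            # Sum the shares passed on by every page T that links to `page`.
            incoming = sum(
                rank[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_rank[page] = (1 - d) + d * incoming
        rank = new_rank
    return rank


# Link structure implied by the slide's equations.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

With these inputs page C ends up with the highest score, followed by A and then B: C receives links from both other pages, while B only receives half of A's share.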

(20)

Can I influence my PageRank?

Webpages A, B, C

With this: again, which is the more important webpage?

and also 100 links on it, then a referenced (linked) page would only receive 2/100ths of its PageRank score … which is .. just about nothing ..

(21)

OK, so what can I do? – Part 1

Ensure proper crawling of your website

Redirects only if required!

Don't even think of redirecting only crawlers

‘What the crawler gets is the same as what the user gets’!!

no JavaScript to generate content or URLs

have good navigation – e.g. crawlers like a Sitemap!

Sitemaps 0.90 protocol support

see also: WebSphere Portal ‘Search Sitemap Utility’ portlet available

(22)

OK, so what can I do? – Part 2

Publish appropriate content

not too little not too much information on a web page

note that Flash objects and images might hide essential information from the crawlers

Focus on what you want to tell your users or customers first!

Then think about what keywords users (not you!) might choose to find similar content/information elsewhere

reconsider in cases of mismatch to adjust the keywords on your web

(23)

What MORE can I do to improve my PageRanks?

Let’s face it: Not much!

Seek relationships with trusted web sites and share information with one another

Register your web site with Yahoo!

Make your web pages easy to cross-link

Note that even more web sites – based on the platform, like WebSphere Portal – have URLs which somehow reference the user’s history (e.g. session object/ID)

consider making use of 'user friendly URLs' for users to pick up and use

(24)

A brief word on the Sitemaps 0.90 protocol ..

<?xml version="1.0" encoding="UTF-8"?>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

<url>

<loc>http://www.example.com/</loc>

<lastmod>2019-06-09</lastmod>

<changefreq>weekly</changefreq>

<priority>0.8</priority>

</url>

</urlset>

No … it doesn't influence the PageRank score of that page
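A sitemap file like the one on this slide can be generated with a few lines of code. The sketch below uses only Python's standard library. The `build_sitemap` helper and its dictionary-based input are hypothetical, and it is unrelated to how the Portal's Search Sitemap Utility portlet produces its output.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a Sitemaps 0.90 document from a list of dictionaries.

    Each entry needs a 'loc' key; 'lastmod', 'changefreq' and 'priority'
    are optional and are written only when present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    {"loc": "http://www.example.com/", "changefreq": "weekly", "priority": 0.8},
]))
```

Building the document through an XML library rather than string concatenation ensures that characters such as `&` in URLs are escaped correctly.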

(25)
(26)

How do I promote my pages?

Question: how will users know what information there is at my web site if they don’t find it easily (top) through Google?

As mentioned in the beginning: this is also a two-phase approach:

get the high-level pages in good shape with all the important keywords you have selected

once users get to your homepage, they might be inspired to look for more information

How: does your site have good search?

If so: they’ll most likely find the information – if it’s there!

If what they find is good – they might consider pointing to those

(27)

WebSphere Portal and SEO enablement

Search engine crawler awareness since Portal V6.0

Portal Server will externalize for the crawler ‘normalized URLs’

URLs which do not maintain e.g. navigational state

Sitemaps 0.90 protocol support

www.sitemaps.org

developerWorks article:

http://www.ibm.com/developerworks/library/x-sitemaps/index.html

Search Sitemap Utility portlet download:

https://greenhouse.lotus.com/plugins/plugincatalog.nsf/assetDetails.xsp?action=editDocument&documentId=A1FF51D2C2E82CBE852576AB006ED590

Remains: bookmark support for

(28)

… almost forgot .. for all those still insisting on getting 'search engine friendly URLs'

(29)

User friendly-URLs (rather than 'search engine' friendly-URLs)

Friendly-URLs result in human readable URL prefixes that lead to portal pages

Each content node might have a friendly name assigned

The friendly-URL is a hierarchical path constructed from these names based on the content topology (see URL mappings)

Every URL that is generated by WP APIs will contain the friendly-path automatically

It is even guaranteed that every URL that leads to a particular page will start with the page‘s friendly-path

Content Nodes: root, home, shop, info, shoes

/wps/portal/home

/wps/portal/home/shop
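The idea of a friendly path built from the content topology can be sketched as follows. This is not the WebSphere Portal API. The `friendly_path` function and the parent/child relationships for "info" and "shoes" are assumptions, since the slide only shows that "shop" sits below "home". As a simplification, nodes without a friendly name (here: the root) are skipped.

```python
def friendly_path(node, parents, friendly_names, prefix="/wps/portal"):
    """Build a hierarchical friendly path by walking from a node up to the root.

    `parents` maps each node to its parent (None for the root);
    `friendly_names` maps nodes to their assigned friendly names.
    """
    segments = []
    while node is not None:
        name = friendly_names.get(node)
        if name:
            segments.append(name)
        node = parents.get(node)
    return prefix + "".join("/" + segment for segment in reversed(segments))


# Assumed topology: root > home > (shop > shoes, info).
parents = {"root": None, "home": "root", "shop": "home", "info": "home", "shoes": "shop"}
friendly_names = {"home": "home", "shop": "shop", "info": "info", "shoes": "shoes"}

print(friendly_path("home", parents, friendly_names))
print(friendly_path("shop", parents, friendly_names))
```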

(30)
(31)

Public Portal pages and how Internet search engines work

Web crawlers

Search indexes

Homepage or Sitemap

Crawler follows ‘hrefs’

only 'GET' requests

no Javascript interpreted

Portal Server recognizes the crawler and triggers URLs to be normalized.
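To make the crawler's view concrete, here is a minimal sketch of link extraction that, like the crawlers described on this slide, only looks at 'href' attributes in the delivered HTML and never executes JavaScript. It uses Python's standard `html.parser` module. The sample page and its URLs are made up.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href values of all <a> tags found in the static HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(html):
    collector = HrefCollector()
    collector.feed(html)
    return collector.hrefs


# The second link only exists after a browser runs the script,
# so a crawler that does not interpret JavaScript never sees it.
page = """
<a href="/wps/portal/home">Home</a>
<script>document.write('<a href="/wps/portal/hidden">Hidden</a>');</script>
"""
print(extract_hrefs(page))
```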

(32)

The fundamental problem – for web-crawlers!

Welcome Page link!

Pages: Welcome page, Page A, Page B, Page C

Links: URL-A, URL-B, URL-C, URL-D, URL-E

Portal encodes in URLs additional information about the navigational state of the user:

like: which page he comes from and how he left it – e.g. a portlet was maximized

Information encoded within URLs:

URL-A – Target: Page A, coming from Welcome page

URL-B – Target: Page B, coming from Welcome page

URL-C – Target: Page C, coming from Page A

URL-D – Target: Page A, coming from Page C

URL-E – Target: Page B, coming from Page C

A crawler would want to assume:

URL-A and URL-D to be identical

(33)

A thing of the past - How Internet search engines had seen a Portal site

Web crawlers

Search indexes

This set of pages represents the structure of the Portal site.

This set of pages the crawler retrieves and assumes to be unique based on the link structure of the site.

(34)

WebSphere Portal V6 – crawlability enablement!

Web crawlers

Search indexes

Portal Server recognizes the crawler and triggers URLs to be normalized.

Result:

no more ‘duplicate’ pages

all linked and public Portal pages are crawled and indexed

Normalized URL = all navigational state information is discarded from the URL
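The effect of normalization on the five URLs from the earlier "fundamental problem" slide can be sketched as follows. The URL layout and the `/!state/` marker are invented purely for illustration and do not reflect how WebSphere Portal actually encodes navigational state. The point is only that discarding the state part collapses several URLs into one per page.

```python
# Hypothetical marker separating the page path from the encoded state.
STATE_MARKER = "/!state/"


def normalize(url):
    """Discard everything from the (hypothetical) state marker onwards."""
    base, _, _state = url.partition(STATE_MARKER)
    return base


crawled = [
    "/wps/portal/pageA/!state/from=welcome",  # URL-A
    "/wps/portal/pageB/!state/from=welcome",  # URL-B
    "/wps/portal/pageC/!state/from=pageA",    # URL-C
    "/wps/portal/pageA/!state/from=pageC",    # URL-D
    "/wps/portal/pageB/!state/from=pageC",    # URL-E
]

# Without normalization the crawler sees five distinct URLs;
# with normalization only the three real pages remain.
unique_pages = sorted({normalize(url) for url in crawled})
print(unique_pages)
```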

(35)

Crawlability enablement for Search Engines

Crawler awareness - the Portal Server will recognize a crawler by its web agent identifier. A default list is already available, covering the 50 most popular crawlers (via pattern matching - thus potentially more are enabled).

The Portal will then transform all URLs that are output on the pages into so-called normalized URLs, thus making them unique. In addition, action URLs are nullified, thus not allowing crawlers to execute actions such as 'delete document' or 'login'.

A Sitemap portlet - which a crawler can be pointed at. For efficiency reasons it might be advisable to also place appropriate robot directives into the theme to ensure that the crawler will only follow the links available in the Sitemap, and thus does not have to re-evaluate links found on each of the pages.

Search Engine Utility portlet – provides support for the ‘Sitemap 0.90’ protocol. Supported by Google, Yahoo! and Microsoft Live Search

In Summary – Portal will provide the means to allow for a complete crawl of a portal site (public pages) and the tools to allow for adequate linking of portal pages from an external site to support PageRanks.
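Recognizing a crawler by pattern matching on its agent identifier, as described above, might look like the sketch below. The three patterns are well-known crawler tokens chosen for illustration; they are not the Portal's default list, and `is_crawler` is a hypothetical helper rather than a Portal API.

```python
import re

# Illustrative patterns only; not WebSphere Portal's actual default list.
CRAWLER_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"googlebot", r"slurp", r"msnbot")
]


def is_crawler(user_agent):
    """Return True if the User-Agent header matches a known crawler pattern."""
    return any(pattern.search(user_agent or "") for pattern in CRAWLER_PATTERNS)


print(is_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(is_crawler("Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"))
```

Because substring patterns are used instead of exact strings, new versions of a crawler are matched without updating the list, which is the "potentially more enabled" effect mentioned on the slide.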

(36)

Export as Google Sitemap

WebSphere Portal – Search Engine Utility portlet

Added feature – Export to Google Sitemap

Export the Sitemap to a Google Sitemap XML file.

Click on the Browse button to specify where in the filesystem to store the output XML sitemap file.

(37)

Search Engine Utility portlet – configuration/editing option

Default values:

Update frequency: Weekly

Priority: 0.5

(38)

Dynamic and personalized content – what crawlers will not get hold of …

Crawlers try to not fetch dynamic or personalized content

there might be spoofing involved !?!

in the past this was the main reason for truncation of URLs after the first ‘?’

What can be done:

have the Web content management system generate a link list of the non-default (dynamic) content

append or reference via the website’s sitemap

(39)

Summary

WebSphere Portal allows for safe and efficient crawling of Portal sites

Efficiency increased through support for Sitemaps 0.90 protocol

Good pages are determined by their content

link popularity is the additional boost factor

Consult the 'webmaster guides' that the search engine sites publish

if an SEO consultant suggests metadata 'spamming' or 'pretty URLs'

ask them for proof – either webpages applying such techniques (before and after), or official documentation by Google et al.

'dynamic Portal URLs' prevent from getting adequate ranking

(40)

Excellent book on ‘Search Engine Marketing’!

Search Engine Marketing, Inc.: Driving Search Traffic to Your Company's Web Site, Mike Moran, Bill Hunt, IBM Press

http://www.amazon.de/Search-Marketing-Driving-Traffic-Companys/dp/0131852922/ref=sr_1_1?ie=UTF8&s=books-intl-de&qid=1202128301&sr=1-1

Acknowledgement: Overview slides taken from Mike’s SEO presentation

developerWorks articles – Basics on SEO – Part 1-4

http://www.ibm.com/developerworks/search/searchResults.jsp?searchType=1&pageLang=&displaySearchScope=dW&searchSite=dW&lastUserQuery1=search+engine+optimization&lastUserQuery2=&lastUserQuery3=&lastUserQuery4=&query=search+engine+optimization+basics&searchScope=dW&Go.x=0&Go.y=0

(41)

For More Information (1)

WebSphere Portal – IBM Site

http://www-3.ibm.com/software/genservers/portal/

WebSphere Portal Information Center

http://www.ibm.com/developerworks/websphere/zones/portal/proddoc.html

WebSphere Portal Business Solutions Catalog (on Lotus Greenhouse)

https://greenhouse.lotus.com/catalog/home_full.xsp?fProduct=WebSphere%20Portal

WebSphere and Lotus Web Content Management Portal Open Beta

https://www14.software.ibm.com/iwm/web/cc/earlyprograms/lotus/portalopenbeta/

WebSphere Portal Blog

(42)

For More Information (2)

IBM Lotus Connections

http://www.ibm.com/software/lotus/products/connections

IBM Lotus Forms

http://www.ibm.com/software/lotus/forms

IBM Lotus Quickr

http://www.ibm.com/lotus/quickr

IBM Lotus Sametime

http://www.ibm.com/lotus/sametime

WebSphere Commerce

http://www.ibm.com/websphere/commerce

WebSphere Process Server and Business Process Automation

(43)

Please complete the session survey for this session:

TECH-B21

Session Speakers:

Andreas Prokoph

(44)

© IBM Corporation 2010. All Rights Reserved.

The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.

All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

IBM, the IBM logo, Lotus, Lotus Notes, Notes, Domino, Quickr, Sametime, WebSphere, UC2, PartnerWorld are trademarks of International Business Machines Corporation in the United States, other countries, or both. Unyte is a trademark of WebDialogs, Inc., in the United States, other countries, or both.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
