• No results found

Building a large scale SaaS app

N/A
N/A
Protected

Academic year: 2021

Share "Building a large scale SaaS app"

Copied!
47
0
0

Loading.... (view fulltext now)

Full text

(1)

Building a large scale SaaS app

Open Source, Storage and Scalability Dan Hanley, CTO

http://www.magus.co.uk

(2)

Agenda

Who are Magus?

ƒ What do we do? ƒ Who do we do it for? How do we do it? ƒ SOA ƒ Scalability ƒ Storage ƒ F/OSS 2

(3)

The Magus proposition

• Leading provider of innovative web-content engineering solutions to global corporationsg g g p • Specialise in managed applications that help

clients build value from their online assets and from clients build value from their online assets and from the wider web

• Three main applications:Three main applications:

ƒ ActiveStandards

RemoteSearch

ƒ RemoteSearch

ƒ CrucialInformation

(4)

Our managed applications

Delivering Software-as-a-service (ASP model)

z ActiveStandards designed to help companies stay on-brand, on-line

by tracking and managing corporate web standards compliance, worldwide

z RemoteSearch a multi-site search engine providing integrated search z RemoteSearch a multi site search engine, providing integrated search

frameworks for enterprise websites

C i lI f ti i t i d li i

z CrucialInformation a premium current awareness service delivering

high-quality, strategic intelligence from the web and syndicated services

(5)
(6)

RemoteSearch

(7)
(8)

Social Networking

(9)
(10)

Technically - where we were

1 product

W b d i b i

• Web design business • All home grown

• No appservers • No failoverNo failover • No common infrastructure infrastructure • Scalability worries No ersion control • No version control • Unclear methodology 10

(11)

Technically – where we are now

• 3 main applicationspp • Bespoke capability • Common • Common infrastructure • Platform of services • Platform of services • Fault tolerant • Scalable

• Defined process & methodology

(12)

Approach

• Do a lot with a little – 35 people, punching p p , p g above our weight

• Don't reinvent the wheel

• Extract commonality – keep it DRY

(13)

The components of the stack

• Trawl Harvest • Routing Store • Harvest • Index • Search • Store • Quartz • ClientEngine Search • Analysis • Monitor ClientEngine • Profile • LinkChecker

(14)

Logical architecture

REST (not SOAP)

(15)

Trawl

• Responsible for managing the gathering of data in its raw form into the Store.

• Currently have Trawlers for:

ƒ HTTP ƒ FTP (several flavors) RSS A ƒ RSS, Atom etc ƒ SMTP G l ƒ Google ƒ Technorati M ƒ Moreover ƒ FT (several flavors)

(16)

Trawler service

Pluggable architecture based on JMX Mbean service

(17)

Harvest

• Responsible for extracting explicit data from Links

d t i th fi ld d d t i th d t b d th

and storing the fielded data in the database, and the non fielded data in the Store.

(18)

Harvest service

Pluggable architecture based on JMX Mbean service

(19)

Index

(20)

Search

• Responsible for searching indices and delivering results.

(21)

Analysis

• Responsible for deriving scores for information implicit in the page

ƒ Sentiment

ƒ Sentiment

ƒ Readability

(22)

Monitor

− Badly named, should be called “Classifier”

− Responsible for creating filings between Links and Categories. A Li k b b k k i bl i l

− A Link can be a bookmark, news item, blog article etc.

− A Category can be Users Bookmarks, News Topic, an AST Guideline etc.

(23)

Classifier (monitor) service

(24)

LinkChecker

• Responsible for checking the life of links and removing them correctly from the system when they have expired

from the system when they have expired.

(25)

Routing

• Manages the workflow of jobs through the stack

Has the capability to dynamically loadbalance workloads • Has the capability to dynamically loadbalance workloads.

(26)

Content stores

z We needed a multiple terabyte (currently 24 TB) distributed, fail

f fil

safe, filesystem

z NFS was crumbling under load z ZFS was vapourware

z ZFS was vapourware z Lustre was too complex z We built our own!

z Magus Contentstores, responsible for holding both the raw and

processed non fielded content of links which have been trawled and harvested

and harvested

(27)

Content stores - configuration

<mbean code="uk.co.magus.store.service.StoreService" name="magus.service.store:service=StoreServiceLocalCalls"> <attribute name="JndiName">magus/services/StoreServiceLocalCalls</attribute> <attribute name="Config"> <TryEachStripeStore> <List> <MirrorStore> <List> <List> <RemoteStore>nas:1299;StoreServiceRemoteCallsInvokeTarget</RemoteStore> <RemoteStore>m4:1099;StoreServiceRemoteCallsInvokeTarget</RemoteStore> </List> </MirrorStore> <MirrorStore> <List> <RemoteStore>nas:1199;StoreServiceRemoteCallsInvokeTarget</RemoteStore> <RemoteStore>m5:1099;StoreServiceRemoteCallsInvokeTarget</RemoteStore> </List> </List> </MirrorStore> </List> </TryEachStripeStore> </attribute> <depends>jboss:service=Naming</depends> </mbean>

(28)
(29)
(30)

Store JMX Beans

(31)

Contentstore - engines

Can use many types of engine on a node Currently supports:

Currently supports:

ƒ Mysql

ƒ SleepyCatSleepyCat

ƒ Filesystem

(32)

Content Store Classes

(33)

Quartz

• Responsible for firing messages on time. • The “heartbeat” of the stack.

(34)

Client Engine

• Responsible for stack based processing for Client

A li ti

Applications.

• Keeps “heavy lifting” out of the Web Tier.

• Coordinates Client Applications requests across multiple stack services.

(35)

Management Application

zManage taxonomyg y zManage rules

zManage scheduling

zManage scheduling

zFocus on managing the business

L i t t JMX b

zLeave service management to JMX or web

consoles

(36)
(37)
(38)
(39)
(40)

Profile

• An internal service used to collect metrics on

t id f

system wide performance

(41)
(42)

Infrastructure

hit

t

architecture

(43)

Methodology

• Agility – sprintsg y p

• Issue tracking – Jira • Issue tracking – Jira

R l h d l d d l t

• Regular, scheduled, deployments • Consolidated build & version control

(44)

Deployment

Deployment

Subversion (Code repository ) 1. C hec k out 2. C ode /

Developer Local Box

2. C ode / Loc al T es t 3. C hec k In D ev eloper

4. auto C hec k out

6 & 12 N otify Bamboo 5 . Build / U nit T es ts / Metr ic s 7 . Publis h r es ults 8. D eploy

10. D eploy D ependenc ies

P a y t o $

R eleas e N ote 13 . Pr epar e R eleas e N ote

D ev C lus ter 9 & 11 . F IT T es ts Pr oduc t O w ner / D ev T L R eleas e N ote G r anite T L 14. G et R eleas e N ote T es t C lus ter 18. D eploy Applic ations

Str es s T es t 15 . R ejec t R eleas e

17 . Manage T es t & Pr oduc tion Env ir onments

16. G et Applic ation Ar tifac ts

Pr oduc tion 19 . D eploy Applic ations

Jboss ON

(45)

Throughput

• 11,000 sources in system, y

• ~16 000 000 pages rolling store

• 16,000,000 pages rolling store

200 000 d

• ~200,000 new pages per day

• Average < 2 minutes from page detection to fully classified and indexed.

(46)

Cost comparisons

• Apples and oranges?pp g

Proprietary Licence Free

Product Per CPU CPUs Total Product Per CPU CPUs Total

O l 20 000 00 10 $200 000 M S l $0 00 10 $0 00

Oracle 20,000.00 10 $200,000 MySql $0.00 10 $0.00

Weblogic AS 10,000.00 38 $380,000 Jboss AS $0.00 38 $0.00

MS Windows Server 3,919.00 48 $188,112 Redhat/Apa $0.00 48 $0.00

Visual Team Studio 1,000.00 12 $12,000 Eclipse $0.00 12 $0.00

ClearCase 4,125.00 1 $4,125 Subversion $0.00 1 $0.00

Jira 2 000 00 1 $2 000 Trac $0 00 1 $0 00

Jira 2,000.00 1 $2,000 Trac $0.00 1 $0.00

Autonomy IDOL bundl 75,000.00 2 $150,000 Carrot2 $0.00 12 $0.00

IBM Intelligent Datami 132,000.00 1 $132,000 LingPipe $0.00 12 $0.00

Verity K2 50,000.00 2 $100,000 Lucene $0.00 8 $0.00 UIMA $0.00 12 $0.00 $1,068,237 $0.00 £580,531.26 €849,629.77 46

(47)

Thank you

Questions? Questions?

References

Related documents

Electronic Journal of Science Education ejse.southwestern.edu difficulty &lt; -0.5); Medium (item difficulty between -0.5  0.5); or Difficult (item difficulty &gt; 0.5);

To create an internal link to another page on our site, we use the same format but just type the file name directly into the href attribute:.. &lt;a href=&#34; page2.html

In contrast to the normal data format, each sparse instance tag contains a type attribute with the value sparse:.

• To input data using check boxes, use the &lt;input /&gt; tag with the &#34;type&#34; attribute set to &#34;checkbox.&#34; The variable names used with each check box must be

The following code illustrates how the images object can be cached by using a variable:. &lt;SCRIPT language=&#34;JavaScript&#34;&gt; var myimages=document.images

 &lt;add key=&#34;PassiveNode&#34; value=&#34;active&#34; /&gt; - Web interface can have data written as normal, and Passwordstate Windows Service does not process anything

The provider dropdown list on the Service Type Eligibility Benefit Request Form will include a special &#34;&lt;&lt; Select From Provider Reference File &gt;&gt;&#34; item to

&#34;multipart/form-data&#34; is used when a form requires binary data, like the contents of a file, to be uploaded.. • The type=&#34;file&#34; attribute of the &lt;input&gt;