
(1)

Return on Experience
on Cloud Computing Issues
… a stairway to clouds …
Experts Workshop
Nov. 21st, 2013

(2)

Agenda

InGeoCloudS Software Stack
InGeoCloudS Elasticity and Scalability
– Elastic File Server
– Elastic Database Server
– Elastic Web Server
– Elastic Map Server
– Elastic Linked Data Store
InGeoCloudS Monitoring and Accounting

(3)

What is Cloud Computing

• Cloud computing comes from the convergence of:
  – service-oriented architectures
    • ... loose coupling of services with operating systems and technologies ...
  – parallel computing
    • large-scale data analysis, up to thousands of machines
  – virtualization
    • independence from physical hardware

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. (NIST)

(4)

InGeoCloudS Challenges and Cloud Computing

Diverse software requirements
Diverse resource requirements
Resource requirements vary over time
Reduce costs

(5)

InGeoCloudS Challenges and Cloud Computing

Diverse software requirements <-> Virtualization
• To support a larger number of software requirements

Diverse resource requirements <-> Scalability
• To support large data volumes and high throughput
• To support increasing dataset sizes

Resource requirements vary over time <-> Elasticity
• To support a varying number of users
• To support on-demand computations (e.g., shake-map)

Reduce costs <-> Pay-as-you-go
• To reduce infrastructure costs during low platform usage

(6)

InGeoCLOUDS Architecture: Auto-Scaling Layers

[Architecture diagram, flattened during extraction. Recoverable elements:]
– <<Auto-Scaling Layer>> Elastic File Server: three <<File Server>> GlusterFS nodes
– <<Auto-Scaling Layer>> Elastic DataBase Server: one <<DB Server>> PG-Pool II node and two <<DB Server>> PostgreSQL nodes
– <<Auto-Scaling Layer>> Elastic Map Server: three <<Web Server>> Mapserver nodes
– <<Auto-Scaling Layer>> Elastic Linked Data Storage: three <<Triple Store>> Virtuoso nodes
– <<Auto-Scaling Layer>> InGeoCLOUDS Web Portal: <<Web Server>> Apache, <<Web Server>> Tomcat, <<Web Server>> Jetty
– <<Auto-Scaling Layer>> Geo-Computational-Layer
– <<Virtual Instance>> Data Provider Service (shown twice)
– <<Virtual Instance>> InGeoCLOUDS Backend: <<Web Server>> Tomcat + SPRING, <<Web Archive>> IGC API Implementation
– <<storage device>> Cloud Permanent Storage
– <<Virtual Image>> Web Server, Data Provider Service, Virtuoso, Mapserver, PostgreSQL, GlusterFS
– <<Data Snapshot>> Back-up

(7)

Choice of the Cloud Computing Platform

• Estimated resources:
  – 12 instances, 500 GB storage, 35 GB/month network
• We analyzed several Cloud providers:
  – Amazon AWS, SigmaCloud, Atlantic.Net, Flexiant Flexiscale, GoGrid, Google App Engine, Joyent, Microsoft Azure, OpSource, Rackspace, OVH Public Cloud.
• On the basis of several criteria:
  – Functional/Software Requirements, Elasticity Model, As-a-Service Model, Maturity and Diffusion, Migration Cost Model
• Including Monthly Cost:
  – E.g., Amazon AWS €900, Rackspace €1600
• We observed a 15-20% drop in costs over the last year

[Platform overview diagram, flattened during extraction. Recoverable labels:]
– Components: Cloud Computing Platform, Elastic Compute, Elastic File Server, Elastic Database Server, Elastic Map Server, Elastic Web Server, IGC Middleware, Data Management, Data import, Data Integration & Linking, Data Publication, Geospatial Metadata and Catalog Services, Portal & Tools, IGC Management, IGC Administration, WebGIS Client, Accounting, Monitoring, Data Providers Services
– Interfaces: Cloud Platform API, ODBC/JDBC/SQL, NFS/GFS, SPARQL, HTTP/S, FTP/S, OGC:WMS, OGC:WFS, OGC:CSW
– IGC-API paths: /elasticfs, /elasticdb, /elasticcomp, /master, /metadata/md, /metadata/db, /data-import/fs, /data-import/db, /data-import/harvests, /mapfiles, /layertemplates

(8)

InGeoCloudS Elastic Compute

This is the gateway to the Cloud Platform Services
– Transparent access and portability to new cloud providers

Exposed Services:
– Virtual Instances Management
  • Run a new instance, stop an instance, attach a storage device, Elastic IP, automatically mount the distributed file system.
– Auto-Scaling Layer Management
  • Manage an elastic pool of servers, including load balancing
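The portability goal above can be illustrated with a provider-neutral interface. The following is a minimal sketch, not the IGC API itself: the class names, method names, image name and volume ID are all hypothetical, and the in-memory backend exists only so the example runs without any cloud account.

```python
from abc import ABC, abstractmethod
from itertools import count


class CloudProvider(ABC):
    """Hypothetical provider-neutral interface (all names are illustrative)."""

    @abstractmethod
    def run_instance(self, image):
        """Start an instance from an image and return its identifier."""

    @abstractmethod
    def stop_instance(self, instance_id):
        """Stop a running instance."""

    @abstractmethod
    def attach_storage(self, instance_id, volume_id):
        """Attach a storage device to an instance."""


class InMemoryProvider(CloudProvider):
    """Stand-in backend used only to make the sketch runnable."""

    def __init__(self):
        self._ids = count(1)
        self.instances = {}

    def run_instance(self, image):
        instance_id = f"i-{next(self._ids)}"
        self.instances[instance_id] = {
            "image": image, "state": "running", "volumes": [],
        }
        return instance_id

    def stop_instance(self, instance_id):
        self.instances[instance_id]["state"] = "stopped"

    def attach_storage(self, instance_id, volume_id):
        self.instances[instance_id]["volumes"].append(volume_id)


def start_map_server(provider):
    """Caller code depends only on the interface, not on a vendor SDK."""
    instance_id = provider.run_instance("mapserver-image")  # hypothetical image
    provider.attach_storage(instance_id, "vol-data")        # hypothetical volume
    return instance_id
```

Under this design, moving to a new cloud provider would mean writing one more `CloudProvider` subclass while callers such as `start_map_server` stay unchanged.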


(9)

InGeoCloudS scalable services:
– Elastic File Server
– Elastic Database Server
– Elastic Web Server
– Elastic Map Server
– Elastic Linked Data Store

All of the above are hot topics from a technological and scientific point of view.

(10)

Elastic File Server

• We evaluated several technologies:
  – S3FS, S3Backer, pNFS, LUSTRE, …
• Our choice was GlusterFS
  – No single point of failure
    • No file metadata server
  – Scalable
    • Servers can be added as needed at any time.
  – Can use standard protocols (e.g. NFS)
  – Includes some optimizations, e.g., read ahead, write behind, async I/O, scheduling, caching
• It is currently sponsored by Red Hat
• Other Cloud-based storage solutions are based on the key-value access pattern, which is incompatible with every other technology on the Geo-Spatial Software stack
  – This is almost a research challenge!

(11)

Transparent access for applications
– Similar to NFS. Automatic set-up on IGC instances.

(12)

Elastic File Server Scalability

[Figure: Throughput (MB/s, axis 0 to 800) against Number of Servers (1, 2, 4, 8). Data labels in extraction order: 55, 77, 210, 344, 78, 125, 342, 730; the series legend did not survive extraction.]

(13)

PostgreSQL (+PostGIS)
PgPool
Load balancer
– Master/Slave architecture
– Streaming replication
Scalability
– Parallel read operations
– Servers can be added as needed at any time.
Reliability
– Automatic fail-over
– A slave replaces the Master
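The load-balancing and fail-over behaviour listed above can be modelled with a toy router. This is a sketch under simplifying assumptions, not PgPool's actual implementation: the class name, the server names and the "statement starts with SELECT" rule are all illustrative.

```python
from itertools import cycle


class ReplicatedPool:
    """Toy model of a master/slave database pool (illustrative only)."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._reads = cycle(self.slaves)

    def route(self, sql):
        """Send SELECTs to the slaves round-robin, everything else to the master."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.slaves:
            return next(self._reads)
        return self.master

    def fail_over(self):
        """Promote the first slave when the master is lost."""
        if not self.slaves:
            raise RuntimeError("no slave available to promote")
        self.master = self.slaves.pop(0)
        self._reads = cycle(self.slaves)  # rebuild the read rotation
        return self.master
```

Adding a slave increases read capacity only; writes still go through the single master, which is why the slide lists parallel read operations under scalability.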

(14)

Data Publication Objectives

Simplify the process of “transforming” geo-data into geo-services
Guarantee the geo-service compliance with OGC standards and INSPIRE requirements
3 components in the Data Publication:
– Read-only services with OGC:WMS (image) and OGC:WFS (data)
– CRUD API to manage the configuration of each service by data provider
– Metadata management (ISO 1911 + OGC:CSW)

(15)

Data Publication Component Architecture

[Architecture diagram, flattened during extraction. Recoverable labels:]
– Elastic FS and DB
– ELASTIC GEOSPATIAL SERVER CLUSTER: several Mapserver servers (WMS, WFS) behind an HTTP load balancer
– Mapserver servers: mounting FS for all data providers; read-only access to DB (port 3306)
– Data publication (HTTP/API): mounting FS for all data providers, with write access

(16)

Example with the number of requests with a WMS GetMap

InGeoCloudS INSPIRE Florence Workshop, June 26, 2013

Small Amazon instance: 6
Large Amazon instance: 50

WMS
– Performance: GetMap 800x600 < 5 s
– Capacity: simultaneous requests > 20/s
– Availability: 99%
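For readers unfamiliar with the request being benchmarked, the sketch below builds a WMS 1.3.0 GetMap URL for an 800x600 image. The query parameter names come from the OGC WMS standard; the endpoint URL, layer name and bounding box in the usage note are made-up placeholders, not values from the project.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layers, bbox, width=800, height=600,
               crs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap request URL.

    bbox is (min1, min2, max1, max2) in the axis order of the chosen CRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"
```

For example, `getmap_url("https://example.org/wms", ["geology"], (40.0, 10.0, 45.0, 15.0))` yields a URL that a load-testing tool could issue repeatedly to measure how many GetMap requests an instance sustains.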

(17)

Elasticity Experiment: Elastic Web Server

[Figure: Issued Requests (requests/min, axis 0 to 12000) and System Load (average CPU utilization, axis 0 to 100) plotted together with No. Servers and Load Threshold; x-axis ticks 1 to 51. The pool grows from 1 server to 2 servers to 3 servers.]

(18)

[Figure: average CPU utilization (axis 0 to 100) and requests/min (axis 0 to 12000) over Time (ticks 1 to 51), with regions for 1 server, 2 servers, 3 servers and 4 servers. Annotations: "System load increases quickly" and "System load increases slowly: the system can sustain peak loads more easily".]
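The experiment above scales the pool when the measured load crosses a threshold. A minimal sketch of such a rule follows; the threshold values, the one-server step and the pool limits are assumptions chosen for illustration, since the slides do not state the actual policy.

```python
def desired_servers(current, avg_cpu, scale_out_at=70.0, scale_in_at=30.0,
                    min_servers=1, max_servers=4):
    """Return the pool size for the next period from average CPU utilization (%).

    All thresholds and limits are hypothetical defaults.
    """
    if avg_cpu > scale_out_at:
        target = current + 1   # load above threshold: add one server
    elif avg_cpu < scale_in_at:
        target = current - 1   # load well below threshold: remove one server
    else:
        target = current       # inside the band: keep the pool as it is
    # Never leave the allowed pool size range.
    return max(min_servers, min(max_servers, target))
```

Keeping a gap between the scale-out and scale-in thresholds avoids oscillation when the load hovers around a single value.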

(19)

Data Integration and Linking

Purpose:
– integrate, describe and query heterogeneous data in a uniform way
Approach:
– Creation of a Conceptual Model to integrate and cover all the thematic fields
– Map the source relational data into RDF data compliant with the Conceptual Model
– Rely on a scalable RDF Triple Store (Virtuoso) to enforce the mappings and enable the storage and querying of the RDF data
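The relational-to-RDF step can be pictured with a very small direct mapping that turns one table row into N-Triples lines. This is only a sketch: the project maps data to its own Conceptual Model, which is not reproduced here, and the base IRI, table name and column names below are invented for the example.

```python
def row_to_ntriples(table, key, row, base="http://example.org/igc/"):
    """Turn one relational row (a dict) into N-Triples lines.

    The subject IRI is built from the table name and primary key value;
    each remaining column becomes one predicate with a plain literal.
    """
    subject = f"<{base}{table}/{row[key]}>"
    triples = []
    for column, value in row.items():
        if column == key or value is None:
            continue  # the key is already in the subject; NULLs yield no triple
        predicate = f"<{base}{table}#{column}>"
        # Escape backslashes and double quotes inside the literal.
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} {predicate} "{literal}" .')
    return triples
```

Lines produced this way could then be bulk-loaded into a triple store and queried with SPARQL.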


(20)

Linked Open Data as Service

Data Integration Layer
• Abstraction layer for data access
  – abstracts the applications from the specific setup of the data management service (such as local vs. remote, federation, and distribution)
• Beyond Data Access
• Enabling automation of discovery, composition, and use of datasets
• Data Markets
• Online Visualization Services
• Data Publishing Solutions
• Data Aggregators
• BI / Analytics as a Service

[Diagram, flattened during extraction. Recoverable labels: Rel DB, Rel DB, Excel files, XML files; Query Engine; API (Query, Update, Import, Export); Linked Data; Extensible Application Pool (Visualization, Collaboration, Cross Data sets' Querying).]

November 26, 2013

(21)

We are using a Nagios-based solution
– Every instance has specific Nagios clients generating the indicators to be monitored
– The information received by Nagios is then stored in Amazon RDS
– We can analyze the monitoring indicators at any point in time, even when the platform is not running
– Indicators include:
  • Avg. CPU load, memory, disk usage, response time, etc.
– We developed a dedicated interface
  • Which is intended for admin use
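As an illustration of how one indicator reaches Nagios, the sketch below follows the usual Nagios plugin convention: a numeric state (0 OK, 1 WARNING, 2 CRITICAL) plus one status line whose part after `|` carries performance data. The function name and the threshold values are assumptions; the project's actual checks are not shown in the slides.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def check_cpu_load(load, warn=4.0, crit=8.0):
    """Return (state, status line) for a load average, Nagios-plugin style.

    warn and crit are hypothetical thresholds.
    """
    if load >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif load >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text before '|' is the human-readable status; text after it is
    # performance data in the form label=value;warn;crit.
    line = f"CPU {label} - load average: {load:.2f} | load={load:.2f};{warn};{crit}"
    return code, line
```

A real plugin would print the status line and exit with the returned state, for example by feeding `os.getloadavg()[0]` into this function; storing the performance data is what makes later analysis possible when the platform is not running.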

(22)

Monitoring  


(23)

We can obtain per-service costs from Amazon billing
Elastic Database Server cost:
– Compute hours/month ... XXX $
– Storage GB/month ... XXX $
– Data transfer ... XXX $
This allows us to estimate the cost of the IGC platform components
– Also useful for your own private IGC platform deployment
We need more:
– Per-user split of costs

(24)

Accounting Service

IGC provides Accounting APIs
– They provide a detailed breakdown of each user's share of the cost
For each Data Provider:
– Elastic Web Server ... XXX $
– Elastic Map Server ... XXX $
– Other ...
– GRAND TOTAL ... $ not a lot $
This is computed:
– By directly measuring storage occupancy (both DB and FS)
– By using application logs to estimate usage shares of indivisible services (e.g., compute hours of Map Server)
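The two computation rules above can be sketched as follows: storage is billed from measured occupancy, while the cost of an indivisible service is split in proportion to each data provider's share of logged requests. The function names, the proportional rule and the figures in the usage note are assumptions for illustration, not the IGC Accounting API.

```python
def storage_cost(gb_used, price_per_gb_month):
    """Cost of directly measured storage occupancy for one data provider."""
    return gb_used * price_per_gb_month


def split_service_cost(service_cost, requests_by_provider):
    """Split an indivisible service cost in proportion to logged requests.

    requests_by_provider maps a data provider name to its request count.
    """
    total = sum(requests_by_provider.values())
    if total == 0:
        # No recorded usage: nothing to attribute to anyone.
        return {name: 0.0 for name in requests_by_provider}
    return {
        name: service_cost * n / total
        for name, n in requests_by_provider.items()
    }
```

For instance, with a monthly Map Server bill of 100 and logs showing 300 requests from one provider and 100 from another, `split_service_cost(100.0, {"a": 300, "b": 100})` attributes 75 and 25 respectively.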

(25)

So… how much does it cost?

We will discuss this later in the session “InGeoCloudS Sustainability, Costs, and Opportunities for Cooperation and Trials”

(26)

Conclusions

InGeoCloudS is an interesting and evolving cloud-based platform for geo-data providers
The IGC platform was designed on the basis of actual data providers' use cases:
– To support multiple applications
– To enable fast porting to the cloud
It provides scalable services and on-demand computation, by taking advantage of:
– Cloud “infinite” resources
– Pay-as-you-go cost model
The platform can support a much larger number of users than the project consortium size
– The more users, the smaller the cost!

(27)
