appscale: open-source platform-level
cloud computing
I2 Joint Techs February 2nd, 2010
Chandra Krintz
Computer Science Dept.
cloud computing
•
Remote access to distributed and shared cluster resources
Potentially owned by someone else (e.g. Amazon, Google, …)
Users rent access to vast resources
Advertised service-level-agreements (SLAs)
Resources are opaque and isolated
Highly scalable, fault tolerant
Service-oriented, utility computing
Relies on OS, network, and storage virtualization
cloud computing
•
3 types: as-a-Service (aaS)
Infrastructure: Amazon Web Services (EC2, S3, EBS)
Virtualized, isolated (CPU, Network, Storage) systems on which
users execute entire runtime stacks
Fully customer self-service
Open APIs (IaaS standard), scalable services
Platform: Google App Engine, Microsoft Azure
Scalable program-level abstractions via well-defined interfaces
Enable construction of network-accessible applications
Process-level (sandbox) isolation, complete software stack
Software: Salesforce.com
Applications provided to thin clients over a network
an opening in the clouds
•
Open-source cloud computing systems from the
UCSB Computer Science Department
Goal: Bring popular cloud fabrics to “on-premise” clusters that
are easy to use and are transparent
To facilitate investigation of
Energy-efficient cloud computing
Services, underlying device technology, support technologies
Customization (availability, performance, application behavior)
Hybrid cloud solutions (public and on-premise)
By emulating key cloud layers from the commercial sector
Engender user community, access to real applications/users
Leverage extant software technologies
cloud computing from UCSB
•
IaaS: Eucalyptus
Open-source implementation of all AWS APIs
Robust, highly-available, scalable emulation
Cluster/data center support over Xen, KVM, VMware
•
PaaS: AppScale
Open-source implementation of Google App Engine APIs
Pluggable (services), scalable, fault tolerant
Runs over virtualization or IaaS layer: AWS, Eucalyptus
google app engine
[Diagram: GAE application (Python, Java) hosted at MyApp.appspot.com on Google App Engine (GAE). Services: Images, IM, Memcache, Mail, Users, URL Fetch, Datastore, Blobstore, Cron, Tasks. Also shown: Administrator Console, Protobuf data APIs, and SDC to private, enterprise data.]
google app engine: the sdk
[Diagram: GAE application (Python, Java) over the open-source Google App Engine Software Development Kit (SDK), shown alongside Google App Engine (GAE). Services: Images, URL Fetch, Users, Mail, IM, Memcache, Tasks, Cron, Datastore, Blobstore.]
python2.5 dev_appserver.py --port=8181 MyApp
google app engine: run/test locally
[Diagram: GAE application (Python, Java) over the open-source Google App Engine Software Development Kit (SDK), shown alongside Google App Engine (GAE). Services: Images, URL Fetch, Users, Mail, IM, Memcache, Tasks, Cron, Datastore, Blobstore. Per-service annotations: sendmail, curl/wget, framework lib, auth, on console, no.]
python2.5 dev_appserver.py --port=8181 MyApp
Legend: simulation of actual API functionality using localhost (flat file, in-memory hash (Memcache))
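The "in-memory hash" style of local simulation can be illustrated with a small stand-in for a memcache-like service. This is only a sketch in plain Python 3: the class name `LocalMemcacheStub` and its methods are hypothetical and are not taken from the SDK's source.

```python
import time


class LocalMemcacheStub:
    """Hypothetical local stand-in for a memcache-style service.

    Not the SDK's implementation: it only shows how a hosted cache API
    can be simulated on localhost with an in-memory hash.
    """

    def __init__(self, clock=time.time):
        self._data = {}      # key -> (value, expiry timestamp or None)
        self._clock = clock  # injectable so callers can control time

    def set(self, key, value, ttl=None):
        expires = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires)
        return True

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def delete(self, key):
        # True if the key was present, False otherwise.
        return self._data.pop(key, None) is not None


cache = LocalMemcacheStub()
cache.set("greeting", "hello")
print(cache.get("greeting"))  # hello
print(cache.get("missing"))   # None
```

Everything lives in one process and disappears on restart, which is acceptable for local testing but is exactly what a real deployment replaces with a distributed service.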
google app engine: upload to google
[Diagram: GAE application (Python, Java) deployed on Google App Engine (GAE) and reached by GAE app users via the Internet. Services: Images, URL Fetch, Users, Mail, IM, Memcache, Tasks, Cron, Datastore, Blobstore. Also shown: Administrator Console, and SDC to private, enterprise data.]
appcfg.py update MyApp/
sandbox restrictions
[Diagram: GAE application (Python, Java) on Google App Engine (GAE) at MyApp.appspot.com]
• Pure Python or Java; whitelist of library calls to framework
• No thread/subprocess spawning or system calls
• No writes to file system; reads only of static files uploaded with the app
• Storage using key-value, schema-free datastore (Bigtable-based)
• HTTP/S communication only, CGI to handle page requests
• Limit on number of datastore elements accessed per request
• Limit on response duration, task frequency, request rate
• Enforced quotas (BW, CPU, requests/s, files, app size, …)
• Other things to consider
• Your code and data on Google resources
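Two of the restrictions above (schema-free key-value storage and a cap on datastore elements accessed per request) can be pictured with a toy model. The sketch below is plain Python 3 and is not how Google enforces its limits; `SandboxedDatastore`, `FetchLimitExceeded`, and the limit of 2 are hypothetical.

```python
class FetchLimitExceeded(Exception):
    """Raised when one request reads more entities than allowed."""


class SandboxedDatastore:
    """Toy schema-free key-value store with a per-request read cap.

    Hypothetical illustration only; the real limits and their
    enforcement are internal to the platform.
    """

    def __init__(self, max_fetch_per_request):
        self._entities = {}            # key -> dict of arbitrary properties
        self._max = max_fetch_per_request
        self._fetched = 0

    def begin_request(self):
        # Reset the counter at the start of every page request.
        self._fetched = 0

    def put(self, key, **properties):
        # Schema-free: each entity may carry a different set of properties.
        self._entities[key] = dict(properties)

    def get(self, key):
        if self._fetched >= self._max:
            raise FetchLimitExceeded(
                "request exceeded %d datastore reads" % self._max)
        self._fetched += 1
        entity = self._entities.get(key)
        return dict(entity) if entity is not None else None


store = SandboxedDatastore(max_fetch_per_request=2)
store.put("user:1", name="Ada", visits=3)
store.put("user:2", name="Lin")  # no 'visits' property, and that is fine

store.begin_request()
print(store.get("user:1"))  # {'name': 'Ada', 'visits': 3}
print(store.get("user:2"))  # {'name': 'Lin'}
```

A third `get` in the same request would raise `FetchLimitExceeded`, which is the kind of failure an application has to design around under such quotas.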
from gae to appscale
•
GAE SDK extensions
Pluggable using open-source distributed database technologies
HBase, Hypertable, Cassandra, Voldemort, MongoDB, MemcacheDB, MySQL
MemcacheD library (Python and Java)
From console or as background thread (automatically)
Interface to Hadoop (MapReduce)
Multi-language support: Python, Java, Ruby, Perl, soon: X10
Translator to Linux Cron job, similar to Tasks
[Diagram labels: Memcache, Tasks, Datastore]
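One way to picture a "pluggable" datastore is a small common interface that each database technology implements behind the scenes. The sketch below is an assumption about the general shape of such a design, not AppScale's actual interface; `DatastoreBackend`, `InMemoryBackend`, `BACKENDS`, and `make_backend` are hypothetical names, and the only backend shown is an in-memory one.

```python
from abc import ABC, abstractmethod


class DatastoreBackend(ABC):
    """Hypothetical minimal interface a pluggable backend would satisfy."""

    @abstractmethod
    def put(self, table, key, value):
        ...

    @abstractmethod
    def get(self, table, key):
        ...

    @abstractmethod
    def delete(self, table, key):
        ...


class InMemoryBackend(DatastoreBackend):
    """Stand-in backend; a real plug-in would wrap a database client."""

    def __init__(self):
        self._tables = {}

    def put(self, table, key, value):
        self._tables.setdefault(table, {})[key] = value

    def get(self, table, key):
        return self._tables.get(table, {}).get(key)

    def delete(self, table, key):
        return self._tables.get(table, {}).pop(key, None) is not None


# Registry mapping a configuration name to a backend class.
BACKENDS = {"memory": InMemoryBackend}


def make_backend(name):
    """Instantiate the backend selected by name."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError("unknown datastore backend: %r" % name)


db = make_backend("memory")
db.put("Greeting", "g1", {"text": "hello"})
print(db.get("Greeting", "g1"))  # {'text': 'hello'}
```

Application code only talks to the interface, so swapping the database becomes a configuration change rather than a code change.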
appscale
[Diagram labels: GAE App Developer (AppScale Admin), GAE App Users, HTTPS, App Controller, ALB, AS, DB S/P, AppScale Cloud]
•
Distributed system with four key components
AppLoadBalancer (ALB)
Database Master/Peer (DB M/P)
AppServer (AS)
Database Slave/Peer (DB S/P)
•
Services
Automatic deployment, database replication, node & front-end scaling
Over Eucalyptus, EC2, and virtualization (Xen, KVM)
System-wide performance/availability monitoring, user/admin console
Tasks (e.g. Map Reduce)
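As a rough illustration of how a load balancer in front of several AppServers might spread requests, and how front-end scaling adds capacity, here is a round-robin sketch in plain Python 3. The slides do not state which policy the ALB uses, so round-robin is an assumption, and `RoundRobinBalancer` and the server names are hypothetical.

```python
class RoundRobinBalancer:
    """Hypothetical balancer that hands out AppServer addresses in turn."""

    def __init__(self, app_servers):
        self._servers = list(app_servers)
        self._next = 0  # count of requests routed so far

    def add_server(self, address):
        # Front-end scaling: a newly started AppServer joins the rotation.
        self._servers.append(address)

    def pick(self):
        if not self._servers:
            raise RuntimeError("no AppServers registered")
        server = self._servers[self._next % len(self._servers)]
        self._next += 1
        return server


balancer = RoundRobinBalancer(["as-1", "as-2"])
print([balancer.pick() for _ in range(3)])  # ['as-1', 'as-2', 'as-1']
```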
appscale
[Diagram labels: GAE App Developer, App, ALB, DB M/P, Tasks (e.g. Map Reduce), AppScale Cloud]
•
Implements every AppScale component
Can instantiate as a particular role (ALB, AS, DB)
Can change functionality and instantiate itself as another
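The idea of one unit that can start in any role and later switch to another can be sketched as follows. This is an illustrative model in plain Python 3, not AppScale's code; `Role`, `Node`, and `start_as` are hypothetical names, with the role labels taken from the component list in these slides.

```python
from enum import Enum


class Role(Enum):
    ALB = "AppLoadBalancer"
    AS = "AppServer"
    DB_MASTER = "Database Master/Peer"
    DB_SLAVE = "Database Slave/Peer"


class Node:
    """Hypothetical node that carries every component but runs one role."""

    def __init__(self, name):
        self.name = name
        self.role = None
        self.history = []  # roles this node has held, in order

    def start_as(self, role):
        """Take on a role and return the role held before (or None)."""
        if not isinstance(role, Role):
            raise TypeError("role must be a Role member")
        previous = self.role
        self.role = role
        self.history.append(role)
        return previous


node = Node("node-1")
node.start_as(Role.AS)
previous = node.start_as(Role.DB_SLAVE)  # re-instantiate as another role
print(previous.value, "->", node.role.value)
# AppServer -> Database Slave/Peer
```

Because every node can take any role, rebalancing a deployment becomes a matter of reassigning roles rather than provisioning differently built machines.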
appscale performance
•
2 VCPUs 2.83GHz, 4GB RAM, 16GB disk
[Figure: Average Time to Query a Database of Size 1000. Y-axis: Query Time [s]. Series: HBase, MongoDB, MemcacheDB, and Google, each with 1 accessor and with 3 accessors.]
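An average query time of this kind can be collected with a small timing harness. The sketch below is plain Python 3 and does not reproduce the measurement setup behind the figure; `average_query_time` is a hypothetical helper, and the 1000-entry dictionary merely stands in for a database of size 1000.

```python
import time
from statistics import mean


def average_query_time(query, repetitions=5, timer=time.perf_counter):
    """Run `query` several times and return the mean duration in seconds."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    durations = []
    for _ in range(repetitions):
        start = timer()
        query()
        durations.append(timer() - start)
    return mean(durations)


# Stand-in for a database holding 1000 entities.
database = {i: {"id": i} for i in range(1000)}


def scan_all():
    return list(database.values())


avg = average_query_time(scan_all)
print("average query time: %.6f s" % avg)
```

Measuring several accessors at once would mean running such a loop from multiple concurrent clients and comparing the per-client averages.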
appscale projects: http://appscale.cs.ucsb.edu
•
Open-source community management
Bug fixes, feature additions, releases, user support
•
Research (currently only internally available)
Automatic scaling of load, demand, other metrics
Scheduling and load balancing of apps, tasks, components
Hybrid cloud solutions (public/private, multi-zone)
Tunable fault-tolerance and availability
Efficient communication across isolation boundaries
Alternative application domains (streaming, HPC)
Distributed profiling/sampling, feedback-driven optimization
PaaS/IaaS integration and cooperation
Customized, dynamic/adaptive SLAs
appscale
http://appscale.cs.ucsb.edu
•
Thanks!
Leads: Chris Bunch, Navraj Chohan
Development and research team: Jovan Chohan, Nupur Garg,
Matt Hubert, Jonathan Kupferman, Puneet Lakhina, Yiming Li, Nagy Mostafa, Yoshihide Nomura (Fujitsu), Michal Weigel