Data Lab System Architecture

Academic year: 2021

Data Lab Architecture

[Figure: four-layer architecture diagram. Presentation Layer: Astronomer’s Desktop (Legacy Apps, User Code, Cmdline Tools, Web Page) and Data Lab Ops (User Mgmt, Monitoring). Services Layer: Public Services (Resource Resolver, Storage Mgr, Query Manager, Job Manager, Authentication) and Private Services (Ops Monitor, Private Repo, Public Repo). Data Access Layer: Data Access Services (VOSpace, SCS, SSA, SIA, TAP, SQL Service), several labeled UWS. Resources Layer: Storage Resource (User Space, Virtual Space), Compute Resource (Compute Jobs), External Resources (VO Data, VO Svcs, NSA), Databases (Data Pub, Ops DBs, Large Cats, MyDB).]

(5)

Presentation Layer

This layer contains the primary user interfaces.

Astronomer’s Desktop

–  Web clients -- data query forms, content browsers, monitors, etc.

–  Command-line tools -- for local desktop access

–  Legacy Apps -- including scripting environments such as Python

–  User-written code -- custom science clients

–  Login shells

Operators Tools

–  System Monitoring / Administration

–  User and Resource management

(6)

Services Layer

This layer provides interfaces used mostly by software.

Public Services

–  Authentication / Authorization – controlled access to D/L

–  Job Manager – manage compute jobs

–  Query Manager – manage large data queries

–  Storage Manager – manage virtual storage resource

–  Resource Resolver – locate services / resources within D/L

Private Services

(7)

Data Access Layer

This layer provides interfaces to data services.

Simple VO data services

–  Catalog/images/spectra – positional (+constraint) based query

–  Anonymous access allowed
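A positional query of this kind can be sketched as a request URL builder. This is a minimal sketch, assuming the service follows the IVOA Simple Cone Search convention of `RA`, `DEC` and `SR` parameters in decimal degrees; the base URL and the extra `constraints` keywords are hypothetical and not taken from the Data Lab.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg, **constraints):
    """Build a cone-search style request URL.

    RA, DEC and SR are the positional parameters of the IVOA Simple Cone
    Search convention. Any extra keyword constraints are hypothetical and
    are simply appended as additional query parameters.
    """
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must be non-negative")

    params = {"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg}
    params.update(constraints)
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


# Hypothetical endpoint; no credentials are attached because the slide
# states that anonymous access is allowed for the simple services.
url = cone_search_url("https://datalab.example.org/scs/catalog", 10.5, -20.0, 0.1)
```

No network call is made here; the resulting URL could be handed to any HTTP client.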

Advanced Catalog Services

–  Full SQL query capability

•  VO standard interface (public access)

•  Custom SQL interface (authorized access)

Virtual storage

(8)

Service vs. Access Layers

Why the need for different layers?

Service Layer Access Layer

Astronomer Friendly X

Authorized Access X

Anonymous Access X X

Direct VO Protocols X

Job Control X Depends

Data Lab API X X

Virtual Observatory API X

Web Interface X* X

Programmatic (Desktop) Interface X* X*

(9)

Resources Layer

This layer describes physical / logical resources in the D/L.

Databases

–  Large (distributed) Catalog DB

–  Personal DB (similar to SDSS MyDB)

–  User-published datasets

–  Operational DB

Physical Storage

–  Persistent user storage

–  Virtual storage

Compute Resources

–  Servers for processing workflows

External Services

(10)

Large Catalogs

•  Require a low-cost, scalable and reliable solution

•  No viable turnkey system available

•  The LSST QServ project will give us valuable experience

•  Presents a “normal” DB interface to client

-  Can put TAP/SQL service in front of it

•  Can optimize data partitioning through experimentation

•  Requires dedicated hardware for each catalog instance

(11)

Virtual Storage

•  Implemented using disk filesystem as back-end

–  Simplifies exported service for use on local user file systems

–  Provides options for D/L operations:

•  User-based partition scheme

•  Legacy code can bypass VOSpace protocols (via FUSE mounted filesystem)

•  Cons: Potential synchronization issues

•  Containers used to package service

–  Bundle dependencies

–  FUSE mounts for other containers

•  Exploit protocol’s support of:

–  Capabilities

–  Views

[Figure: Virtual Storage Service Container. Labels: Image/Table Support Apps, Data Lab Interfaces, Python, VOSpace, Database, Base Docker OS, Local Disk Container.]

(12)

Example - Bringing It All Together

[Figure: example deployment. NOAO Data Lab: DL Tasks, Virtual Storage Svcs, Large Catalog Svcs, Data Publication Svcs, PI/Survey, NSA, MyDB. User 1 Desktop: Virtual Storage Svc, DL Tasks, MyDB. User 2 Laptop: Virtual Storage Svc, Legacy Tools, Data Publication Svc. Flows are labeled 1(a), 1(b), 1(c) and 2(a).]

(13)

Compute Services / Virtualization

Task Containers

•  Why are they interesting?

–  Provide task-level virtualization

–  Much smaller in size, faster to start up

–  Bundles / isolates dependencies

–  Container images can be layered

•  E.g. a “base Python 2.7 environment”

–  Containers have their own IP address

–  Users can “login” to a container

–  Can be deployed to other Clouds easily

–  Growing user / developer community

–  Repository of public containers available

[Figure: Task Container. Labels: Tasking Interface, <<Task>>, Data Lab Support Code, Base OS Image, Disk Cache Mount, Virtual Storage (FUSE), Params, Results.]

(14)

Compute Services / Virtualization

Task Containers

•  What can you contain?

–  Web applications

–  Desktop Tools

–  Almost anything….

Tasking Interface

–  Handles UWS communications with the Job Manager

•  Allows for setting of parameters, results collection, timeouts

–  Redirects stdio streams back to calling client

Container Storage

–  Persistent cache container shared in a workflow

–  Virtual storage can be mounted as part of environment
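The tasking behaviour described above (parameters in, results out, a timeout, stdio returned to the caller) can be sketched with a plain subprocess. This is only an illustration under stated assumptions: the function name `run_task`, the `--key=value` parameter convention and the status strings are hypothetical, and a real Tasking Interface would run inside a container and speak UWS to the Job Manager.

```python
import subprocess
import sys


def run_task(argv, params=None, timeout_s=30.0):
    """Run a task as a child process and hand its stdio back to the caller.

    Parameters are passed as hypothetical --key=value arguments. The
    returned dict plays the role of the "Results" object on the slide.
    """
    cmd = list(argv) + [f"--{key}={value}" for key, value in (params or {}).items()]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return {"status": "TIMEOUT", "returncode": None, "stdout": "", "stderr": ""}
    status = "COMPLETED" if proc.returncode == 0 else "ERROR"
    return {
        "status": status,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


# Usage: a trivial "task" that echoes its first parameter.
echo_task = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
result = run_task(echo_task, params={"band": "g"})
```

Capturing the streams rather than inheriting them is what lets the caller forward them to a remote client.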


(15)

Compute Services / Job Manager

Job Manager

•  Parallelizes a request based on user parameters

–  User-defined independent input list to parallelize

•  Initializes a job on the remote compute server

•  Executes as sync or async job

–  UWS for job control

•  Polls for completion

•  Gets result objects

•  Returns results to client

–  Or, creates new transfer job

•  Manages hundreds of jobs
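The fan-out over a user-defined input list can be sketched locally with a thread pool. This is a minimal sketch, not the Job Manager itself: `run_parallel` and its status strings are hypothetical names, and blocking on each future stands in for the UWS polling that the slide describes for remote compute servers.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(task, inputs, max_workers=4):
    """Apply `task` to each independent input and collect per-input results.

    Returns a list of (input, status, payload) tuples in input order, where
    status is "COMPLETED" or "ERROR" (hypothetical labels).
    """
    inputs = list(inputs)
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task, item) for item in inputs]
        for item, future in zip(inputs, futures):
            try:
                # Blocks until this job finishes; a remote job would be polled instead.
                results.append((item, "COMPLETED", future.result()))
            except Exception as exc:
                # One failed input must not discard the other results.
                results.append((item, "ERROR", str(exc)))
    return results


# Usage: square a short list of inputs in parallel.
squares = run_parallel(lambda x: x * x, [1, 2, 3])
```

Keeping a status per input lets a client retry only the failed items.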

[Figure: Job Manager diagram. Labels: UWS Client, Job Manager, ssh, Tasking Interface, fork(), <<Task>>, stdio streams.]

(16)

Query Manager / SQL Service

Query Manager

•  Provides a high-level, uniform interface for clients to query data services

–  Hides the sync/async job handling and VO protocols from clients

–  Orchestrates result handling (download, save to virtual storage, etc.)

SQL Service

•  Provides job control for query by implementing UWS

•  Offers options for query-result handling

–  Store to personal database, virtual storage, direct download, etc.

–  Download format options (FITS, etc.)

•  Offers alternative to VO TAP
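Hiding async job handling from the client mostly comes down to a submit, poll, fetch loop. The sketch below assumes the standard UWS execution phases (`PENDING`, `QUEUED`, `EXECUTING`, `COMPLETED`, `ERROR`, `ABORTED`); the callables `submit`, `get_phase` and `fetch_result` are hypothetical stand-ins for whatever transport the real service uses.

```python
import time

# UWS phases after which a job will not change state again.
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED"}


def wait_for_job(get_phase, poll_interval_s=1.0, timeout_s=60.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll `get_phase()` until the job reaches a terminal phase."""
    deadline = clock() + timeout_s
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if clock() >= deadline:
            raise TimeoutError(f"job still in phase {phase} after {timeout_s} s")
        sleep(poll_interval_s)


def run_query(submit, get_phase, fetch_result, **poll_options):
    """Present an async job to the caller as one blocking call."""
    job_id = submit()
    phase = wait_for_job(lambda: get_phase(job_id), **poll_options)
    if phase != "COMPLETED":
        raise RuntimeError(f"job {job_id} ended in phase {phase}")
    return fetch_result(job_id)


# Usage with an in-memory fake service (no network involved).
_phases = iter(["QUEUED", "EXECUTING", "COMPLETED"])
rows = run_query(
    submit=lambda: "job-1",
    get_phase=lambda job_id: next(_phases),
    fetch_result=lambda job_id: [("m31", 10.68, 41.27)],
    poll_interval_s=0.0,
)
```

Result handling options such as saving to virtual storage would hang off `fetch_result` in this shape.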

(17)

Data Publication

•  Capability is used in multiple contexts

–  Public access to high-level data products (static)

–  Private access used in workflows (transient)

–  Semi-private access within a collaboration (shared)

•  Shared responsibility between D/L and Users

–  D/L provides tools, resources and a publishing framework

–  Users provide the content and the scientific curation

•  Low-cost, simple services for all datasets

(18)

Storage Manager

•  Provides a simple interface for user applications

–  Hides details of the Virtual Storage implementation (VOSpace)

–  Can map to idiomatic filesystem interfaces easily (i.e. get, put, list)

•  Abstracts easily to web, desktop and programmatic APIs

•  Provides authenticated access to data holdings

•  Manages the details for other Data Lab services
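The get / put / list mapping can be sketched over a plain directory tree, in the spirit of the disk-filesystem back-end and user-based partitioning mentioned for Virtual Storage. The class below is an illustration only: its name, constructor arguments and path checks are assumptions, and it does not speak VOSpace or perform authentication.

```python
from pathlib import Path


class StorageManager:
    """Hypothetical get/put/list facade over a per-user directory."""

    def __init__(self, root, user):
        # User-based partition: each user gets a subtree under the root.
        self._base = (Path(root) / user).resolve()
        self._base.mkdir(parents=True, exist_ok=True)

    def _resolve(self, name):
        # Refuse names that escape the user's partition (e.g. "../other").
        path = (self._base / name).resolve()
        if path != self._base and self._base not in path.parents:
            raise PermissionError(f"{name!r} is outside the user's space")
        return path

    def put(self, name, data):
        path = self._resolve(name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, name):
        return self._resolve(name).read_bytes()

    def list(self, prefix=""):
        base = self._resolve(prefix)
        if not base.is_dir():
            return []
        return sorted(
            p.relative_to(self._base).as_posix()
            for p in base.rglob("*")
            if p.is_file()
        )
```

A caller would write `StorageManager(root, "alice").put("images/m31.fits", data)` and never see how the bytes are stored.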

(19)

Authentication / Authorization

•  Deferred implementation in Year-1 due to potential landmines in a changing landscape

–  General user support not needed, trusted-users only

–  Y1 services to use null interface to identify need for service in the code without requiring a working service

–  Various authentication methods under discussion

•  Requests to public services passed through automatically

–  Implies the service knows public vs private services

•  Manages user- and group-level access to resources
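A null interface of the kind planned for Year-1 can be sketched as a stand-in that approves everything while recording where authorization was requested. All names below are hypothetical, and the set of public services is an assumption made for illustration rather than the Data Lab's actual list.

```python
from abc import ABC, abstractmethod


class AuthService(ABC):
    """Interface that service code calls wherever a check will be needed."""

    @abstractmethod
    def is_authorized(self, user, resource, action):
        ...


class NullAuthService(AuthService):
    """Approves every request but logs it, marking each call site in the code."""

    def __init__(self):
        self.calls = []

    def is_authorized(self, user, resource, action):
        self.calls.append((user, resource, action))
        return True


# Assumed set of public services; the real list is not given in the slides.
PUBLIC_SERVICES = frozenset({"scs", "sia", "ssa", "tap"})


def check_access(auth, user, service, action="read"):
    """Pass public requests through; defer everything else to the auth service."""
    if service in PUBLIC_SERVICES:
        return True
    return auth.is_authorized(user, service, action)


# Usage: an anonymous public request and a trusted-user private request.
auth = NullAuthService()
public_ok = check_access(auth, None, "tap")
private_ok = check_access(auth, "alice", "mydb", "write")
```

Swapping `NullAuthService` for a working implementation later would not require touching the call sites.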
