(1)

Big Data in OpenTopography

Vishu Nandigam

San Diego Supercomputer Center

NSF Big Data in Education Workshop

(2)

Presentation Overview

• Lidar and OpenTopography

• Data and Workflow

• Cyberinfrastructure

• Data Growth and Challenges

• Data Insights

(3)

LIDAR

• LIght Detection And Ranging (aka airborne laser swath mapping)

• Billions of accurate distance measurements with a scanning laser rangefinder + GPS + Inertial Measurement Unit (IMU)

Point cloud (x, y, z coordinates) = fundamental LIDAR data product
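As a minimal sketch of what "point cloud as the fundamental product" means in practice, the example below holds returns as plain (x, y, z) tuples and bins them into square cells to get a mean elevation per cell. The function name `grid_mean_elevation`, the cell size, and the sample points are illustrative assumptions, not something taken from the slides or from OpenTopography's own processing code.

```python
import math
from collections import defaultdict

def grid_mean_elevation(points, cell_size):
    """Bin (x, y, z) points into square cells and average z in each cell.

    Hypothetical helper for illustration only.
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [sum of z, point count]
    for x, y, z in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        sums[cell][0] += z
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}

# Made-up sample returns (x, y, z), in arbitrary units
points = [(0.2, 0.3, 10.0), (0.8, 0.1, 12.0), (1.5, 0.4, 20.0), (1.1, 1.9, 30.0)]
dem = grid_mean_elevation(points, cell_size=1.0)
```

Averaging is only one possible choice; a minimum or maximum per cell would give a different surface from the same points.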

(4)
(5)

OpenTopography

• NSF Earth Science Facility: 3-year support in 2009. Renewed in 2012 (Award No. 1226353 & 1225810 EAR/IF)

• CI and Science Collaboration

– SDSC, ASU & UNAVCO

• Related research efforts

– NASA ROSES: Extend to satellite-based LIDAR (waveform data)

– NSF SI2 CyberGIS: OpenTopo as an exemplar of cyber GIS

– NSF CluE: Investigate computer science issues in big data

• Partnerships with state and local agencies to support

(6)
(7)

OpenTopography Data

• Large user community with variable needs and levels of sophistication.

• Goal: maximize access to data to achieve greatest scientific impact.

• Big data

– Treat data as an asset that can be used and reused

– Co-locate data with on-demand

(8)

Data Workflow

1. Original source data from collector (e.g. NCALM)

2. Extract relevant data products

3. Source data is archived in Chronopolis digital preservation network

   1. UCSD Library (Library of Congress)

   2. Three geographically distributed copies of the data

4. Extracted data products go through QA/QC

5. Data transformation and optimization

   1. Error correction

   2. Projection conversion

6. Update metadata: ISO 19115 (Data)

7. Generate additional derived products (e.g. GE hillshades)
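The middle of the workflow above (QA/QC, transformation, metadata update) can be sketched as a chain of small functions over a dataset record. Everything here is an assumption made for illustration: the `Dataset` class, the function names, the "drop non-finite coordinates" check standing in for QA/QC, and the simple coordinate shift standing in for a real projection conversion.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    points: list                                   # (x, y, z) tuples
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)    # steps applied so far

def qa_qc(ds):
    """Stand-in for step 4: drop returns with non-finite coordinates."""
    ds.points = [p for p in ds.points if all(math.isfinite(v) for v in p)]
    ds.history.append("qa_qc")
    return ds

def transform(ds, convert):
    """Stand-in for step 5: apply a caller-supplied coordinate conversion."""
    ds.points = [convert(p) for p in ds.points]
    ds.history.append("transform")
    return ds

def update_metadata(ds):
    """Stand-in for step 6: record point count and bounding box."""
    xs = [p[0] for p in ds.points]
    ys = [p[1] for p in ds.points]
    ds.metadata.update(count=len(ds.points),
                       bbox=(min(xs), min(ys), max(xs), max(ys)))
    ds.history.append("update_metadata")
    return ds

# Made-up input with one bad return
raw = Dataset("demo", [(1.0, 2.0, 3.0), (float("nan"), 0.0, 0.0), (4.0, 6.0, 5.0)])

def shift(p):
    # Hypothetical placeholder for a projection conversion
    return (p[0] + 100.0, p[1] + 200.0, p[2])

result = update_metadata(transform(qa_qc(raw), shift))
```

Keeping each step as a separate function that returns the dataset makes it easy to reorder steps or insert new ones, and the `history` list gives a simple processing record.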

   

(9)

Catalog Service for the Web / DOI

(8 & 9 of the Data Workflow)

• CSW Catalog – ESRI Geoportal Server

• ISO 19115 (Data)

• CZO, CyberGIS, Thomson Reuters Web of Science

Protocol: 'CSW' Profile: 'ArcGIS Server Geoportal Extension'

DOI: http://dx.doi.org/10.5069/G9ST7MR9

EZID is a service of UC Curation Center of the CDL
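For readers unfamiliar with CSW, the sketch below builds a CSW 2.0.2 `GetRecords` request as a key-value URL using only the Python standard library. The endpoint `https://example.org/geoportal/csw` is a placeholder, not the OpenTopography catalog address, and the helper name `csw_getrecords_url` is hypothetical. No request is sent; the code only constructs and re-parses the URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def csw_getrecords_url(endpoint, keyword, max_records=10):
    """Build a CSW 2.0.2 GetRecords KVP URL with a full-text CQL filter.

    Hypothetical helper; the endpoint is supplied by the caller.
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "summary",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText like '%{keyword}%'",
        "maxRecords": str(max_records),
    }
    return endpoint + "?" + urlencode(params)

# Placeholder endpoint, assumed for illustration
url = csw_getrecords_url("https://example.org/geoportal/csw", "lidar")
query = parse_qs(urlsplit(url).query)
```

A harvesting client would issue requests of this shape against a catalog endpoint and read back the matching metadata records.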

(10)

Current Data Holdings

• Lidar Point Cloud Datasets

– 770 billion+ lidar returns

– Each return has additional attributes

– On-demand processing capabilities

• SRTM, Raster (multiple layers)

e.g. Sonoma: several intensity products, canopy height, bare earth, hydro-enforced bare earth, canopy top, etc.

• 30+ TB of on-demand online data

(Excluding big custom job runs, original archived data, preprocessing data and user-generated products)

(11)

OT CyberInfrastructure

[Architecture diagram with two tiers: Infrastructure Tier and Application Tier.]

Diagram labels: OT Resources; Data and Processing; XSEDE Resources; Gordon Nodes; SDSC Resources; SDSC Cloud; Point Cloud Data; PC Data Access Service; PC Data Processing Service; DOI; CSW; SRTM Dataset; SRTM Data Access Service; SRTM Data Processing Service

(12)

OT Challenges

(Keeping pace with data and user growth and advancing science!)

(13)

LIDAR Point Cloud Data Growth

[Chart: OpenTopography lidar point cloud data by season, Summer 2005 to Spring 2014; vertical axis 0 to 160.]

(14)

Sensor Hardware Technology

• Rapid evolution of laser scanner technology

• Data is being collected on multiple channels (different wavelengths) and capturing full waveform data.

• Early datasets were collected with scanners operating at less than 33 kHz. Current systems collect data at ~900 kHz.

Greater Resolution = Larger Data Volumes
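A rough back-of-the-envelope calculation shows how the pulse rates quoted above translate into data volume. It assumes one return per pulse and continuous firing for a full hour, both simplifications introduced here and not stated in the slides; the helper `points_per_hour` is hypothetical.

```python
def points_per_hour(pulse_rate_hz, returns_per_pulse=1.0):
    """Returns collected in one hour of continuous scanning (idealized)."""
    return pulse_rate_hz * returns_per_pulse * 3600

early = points_per_hour(33_000)      # early scanners, upper bound of "< 33 kHz"
current = points_per_hour(900_000)   # current systems, "~900 kHz"
ratio = current / early              # growth factor from pulse rate alone
```

Under these assumptions the pulse rate alone accounts for roughly a 27-fold increase in points per flight hour, before multiple channels or full waveform capture are counted.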

(15)

Support for diverse and complex datasets

• Discrete return lidar

• Full waveform lidar

• Optical imagery (R, G, B orthophotography)

• Hyperspectral imagery

NEON hyperspectral imagery collects light reflected across the electromagnetic spectrum, for a total of 426 bands of information. Image: Nathan Leisso, NEON AOP

(16)

User Growth

[Chart: user growth from Aug-08 to Nov-14; vertical axis 0 to 7000. Annotations: CyberGIS, NASA/UNAVCO communities; Service Level Access.]

(17)

Pluggable Services Infrastructure

• Methods for scientific data processing are evolving

• Users demanding more processing services

• Pluggable services infrastructure

– OT development sandbox

– Assist researchers with their code

– Deployment of the algorithm as a service

– Update processing workflow and UI
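One common way to make processing services pluggable is a registry that maps service names to functions, so a new algorithm can be added without touching the dispatch code. The sketch below illustrates that general pattern only; it is not a description of how OpenTopography implements its services, and the names `SERVICES`, `register`, `run_job`, and the two sample algorithms are all hypothetical.

```python
SERVICES = {}  # service name -> processing function

def register(name):
    """Decorator that plugs a processing function into the registry."""
    def wrap(func):
        SERVICES[name] = func
        return func
    return wrap

@register("min_z")
def min_elevation(points):
    """Lowest elevation among (x, y, z) points."""
    return min(z for _, _, z in points)

@register("relief")
def relief(points):
    """Difference between highest and lowest elevation."""
    zs = [z for _, _, z in points]
    return max(zs) - min(zs)

def run_job(service_name, points):
    """Dispatch a job to a registered service by name."""
    if service_name not in SERVICES:
        raise KeyError(f"unknown service: {service_name}")
    return SERVICES[service_name](points)

# Made-up sample points
sample = [(0.0, 0.0, 5.0), (1.0, 0.0, 9.0), (0.0, 1.0, 7.0)]
relief_value = run_job("relief", sample)
```

With this pattern, deploying a researcher's algorithm as a service amounts to wrapping it in a function and registering it under a new name.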

(18)

Data Insights

What can we learn from:

40,000+ custom PC jobs

1.2 trillion lidar points processed

15,000+ custom raster jobs (past year)

(19)

Data Access Patterns

(20)

Access Patterns in LIDAR Scientific Datasets

B4 – San Andreas Fault

[Chart; vertical axis 0 to 160.]

(21)

Access Patterns in LIDAR Scientific Datasets

Event-based datasets – El Mayor-Cucapah Earthquake

[Chart; vertical axis 0 to 140.]

(22)

Access Patterns in LIDAR Scientific Datasets

Event-based datasets – Haiti Earthquake

[Chart; vertical axis 0 to 160.]

(23)

Data Usage Analytics

Northern San Andreas Fault

Social networking with data

Recommendation system

(24)

Understanding Traffic Patterns

Impact of dataset releases and workshops on traffic

[Chart: Jan-13 to Dec-13, with two vertical axes (0 to 200 and 0 to 14000) labeled Registered Users and Page Visits. Series: New Users, Registered users, Daily visits. Annotation: SRTM Global Dataset Released.]

(25)

Tiered storage based on Data Access

• Activity-based data ranking and tiered cloud-integrated storage

SSD: New and existing most active data

Regular Disk: Between Hot and Cool

Cloud
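Activity-based ranking can be sketched as sorting datasets by how often they are accessed and filling the fastest tier first. The slot counts, the dataset names, the access numbers, and the function `assign_tiers` are all made up for illustration. Treating the cloud tier as the home for the least active data is also an assumption, since the slide does not label that tier.

```python
def assign_tiers(access_counts, ssd_slots, disk_slots):
    """Rank datasets by access count and place them into storage tiers.

    Hypothetical policy: the most active datasets fill SSD, the next
    group goes to regular disk, and everything else goes to cloud.
    """
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    tiers = {}
    for rank, name in enumerate(ranked):
        if rank < ssd_slots:
            tiers[name] = "ssd"
        elif rank < ssd_slots + disk_slots:
            tiers[name] = "disk"
        else:
            tiers[name] = "cloud"
    return tiers

# Made-up access counts for placeholder dataset names
counts = {"dataset_a": 150, "dataset_b": 90, "dataset_c": 40,
          "dataset_d": 5, "dataset_e": 1}
tiers = assign_tiers(counts, ssd_slots=1, disk_slots=2)
```

Re-running the assignment on fresh access counts would move a dataset between tiers as its activity changes, for example after a new release or a workshop.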

(26)

Cloud Computing

• Cost effectiveness & feasibility of data science facilities on the cloud

• Microsoft Azure for Research Award: Integration of cloud-based on-demand geospatial processing services into community earth science data facilities.

(27)

Leveraging HPC

• Dedicated Gordon nodes via XSEDE (democratization of supercomputing resources)

GEO Data and Cyberinfrastructure Imperative: Harness the Power of Computing and Computational Infrastructure.

(28)

Summary

• OpenTopography is a modern, agile data facility

• Cyberinfrastructure driven by science use cases – CI and science collaboration

• Big Data needs to be usable: the community wants not only access to data but also tools for processing these data.

• Concept of OT can be used as a template for other

(29)

Thank You!

@OpenTopography

[email protected]

[email protected]

White River, IN. Credit: Indiana Geological Survey/The State of Indiana
