
CASA Analysis and Visualization

Synthesis

Current Status

General Goals and Challenges

Immediate Goals

Harnessing Community Development

 

Synthesis  

 

We summarize the capabilities of, and the challenges facing, CASA visualization and image analysis: immediate needs, large data sets with spectral axes covering many lines, and limited resources for development. We describe the path forward that these concerns suggest: one where basic functionality is added over each six-month development cycle, focusing on cube exploration, basic data access and manipulation, and high-quality output. We also describe the interface with community development, which has two components: (1) providing easy access to the data for scientist-scale development and for interfacing with existing tools and packages, including VO servers, and (2) serving as a functional, immediately available bridge until external projects (ALMA development or otherwise) reach maturity.

Current Status

CASA includes an interactive data browser, the CASA viewer, and a suite of core image analysis tools wrapped into ~15 analysis-oriented tasks that address basic image processing needs.

 

The viewer can be run as a standalone application or from within CASA. It can be scripted from the shell, but the set of features accessible while scripting remains limited. The viewer reads CASA image files and FITS files. It registers images on loading and provides the ability to explore data in the following ways:

 

1. Scroll among channel maps, zoom, and pan to move through an image cube.
2. Examine spectra of individual lines of sight or averaged over regions.
3. Identify spectral lines by comparing to the Splatalogue database.
4. Interactively fit a spectral line.
5. Interactively specify, or load from disk, regions of interest for statistical analysis, spectral analysis, or use as a mask.
6. Calculate image statistics from regions of interest.
7. Calculate moment maps on the fly from image cubes.
8. Compare multiple cubes via contours, blinking, or multiple panels.

A separate viewer mode, currently accessible only via the CLEAN task, allows the user to interactively construct or edit a “mask” image from the union of hand-specified regions. That mode also gives the user control over the CLEAN loop, and the two capabilities can be used hand in hand. Extensions in the next release will include histograms of pixel values (for use with noise estimation, color table manipulation, etc.); the ability to calculate position-velocity cuts; and continuing improvements to the integration with the Splatalogue spectral line database and to the general interface. The user friendliness of these capabilities still needs to improve over future cycles, but the viewer is already a best-in-class cube viewer in many ways.
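To make the link between pixel histograms and noise estimation concrete, here is a minimal sketch in plain NumPy. It is not how the viewer computes noise: the median-absolute-deviation (MAD) estimator, the function name `robust_noise`, and the synthetic image are all assumptions chosen for illustration.

```python
import numpy as np


def robust_noise(pixels):
    """Estimate the Gaussian noise level from the median absolute deviation.

    Hypothetical helper for illustration; not part of CASA.
    """
    pixels = np.asarray(pixels, dtype=float).ravel()
    pixels = pixels[np.isfinite(pixels)]  # ignore blanked (NaN) pixels
    median = np.median(pixels)
    mad = np.median(np.abs(pixels - median))
    return 1.4826 * mad  # MAD-to-sigma factor for a Gaussian distribution


# Synthetic image: Gaussian noise with sigma = 2 plus a small bright source.
rng = np.random.default_rng(0)
image = rng.normal(0.0, 2.0, size=(300, 300))
image[140:150, 140:150] += 50.0

sigma = robust_noise(image)

# Histogram of pixel values within +/- 5 sigma, as a viewer might display it.
counts, edges = np.histogram(image, bins=50, range=(-5 * sigma, 5 * sigma))
```

The robust estimate stays close to the true noise level even though the bright source inflates the plain standard deviation of the image, which is why a histogram-based view is useful when tuning imaging.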

 

Many of the current image analysis tasks focus on manipulation of image cubes to allow the user to produce their desired final image product. These include tasks to:

 

1. “Collapse” a data cube to an image in several ways.
2. Regrid a cube to a new astrometric grid or ordering.
3. Smooth a data cube spatially (but not yet spectrally).
4. Extract regions of interest to a python array or a new sub-image.
5. Manipulate masks for moment creation and deconvolution.
6. Perform a large set of pixelwise mathematical operations on an image.
7. Import and export data from FITS (to CASA images).
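As an illustration of what collapsing a cube means numerically, the following minimal sketch computes integrated-intensity (moment 0) and intensity-weighted velocity (moment 1) maps with plain NumPy. It is not the CASA implementation: the (channel, y, x) axis order, the function name `collapse_moments`, and the synthetic cube are assumptions made for the example.

```python
import numpy as np


def collapse_moments(cube, velocities):
    """Collapse a (channel, y, x) cube into moment 0 and moment 1 maps.

    Hypothetical helper for illustration; assumes evenly spaced channels.
    """
    cube = np.asarray(cube, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    dv = abs(velocities[1] - velocities[0])  # channel width

    # Moment 0: integrated intensity along the spectral axis.
    weight = cube.sum(axis=0)
    mom0 = weight * dv

    # Moment 1: intensity-weighted mean velocity, left as NaN where the
    # summed intensity is zero.
    weighted = (cube * velocities[:, None, None]).sum(axis=0)
    mom1 = np.full(weight.shape, np.nan)
    np.divide(weighted, weight, out=mom1, where=weight != 0)
    return mom0, mom1


# Synthetic 5-channel, 2x2-pixel cube with all emission in channel 3.
velocities = np.array([-20.0, -10.0, 0.0, 10.0, 20.0])  # km/s
cube = np.zeros((5, 2, 2))
cube[3] = 2.0
mom0, mom1 = collapse_moments(cube, velocities)
```

In practice a moment calculation would also apply a mask or intensity threshold before summing, which is why mask manipulation appears in the same list of tasks.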

 

There is also a limited set of more directly analysis-focused tasks that allow one to:

1. Fit Gaussian sources to an image.
2. Fit a spectrum with line profiles or polynomials.
3. Derive statistics from a region of interest.
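For the polynomial case, a minimal sketch of fitting a spectrum is shown below, using `numpy.polyfit` on channels assumed to be free of line emission. The function name `subtract_polynomial_baseline`, the choice of line-free channels, and the synthetic spectrum are assumptions for illustration and do not describe the CASA fitting task.

```python
import numpy as np


def subtract_polynomial_baseline(channels, spectrum, line_free, order=1):
    """Fit a polynomial to line-free channels and subtract it everywhere.

    Hypothetical helper for illustration. `line_free` is a boolean array
    marking the channels used in the fit.
    """
    channels = np.asarray(channels, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    coeffs = np.polyfit(channels[line_free], spectrum[line_free], order)
    baseline = np.polyval(coeffs, channels)
    return spectrum - baseline, coeffs


# Synthetic spectrum: a sloping baseline plus one Gaussian line at channel 50.
channels = np.arange(100)
true_baseline = 0.5 + 0.01 * channels
line = 3.0 * np.exp(-0.5 * ((channels - 50) / 4.0) ** 2)
spectrum = true_baseline + line

# Fit only the channels well away from the line.
line_free = (channels < 30) | (channels > 70)
residual, coeffs = subtract_polynomial_baseline(channels, spectrum, line_free)
```

After subtraction the residual spectrum contains only the line, so its peak and width can be measured or fitted with a line profile.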

Current development is mainly carried out by two developers, with support from the original viewer developer. One focuses mainly on enhancing the viewer and exposing analysis functionality graphically. The other focuses on adding image analysis functionality and exposing it via shell-level tasks.

 

Qualitatively, the state of the CASA viewer is comparable to that of the karma and ds9 packages. Unlike karma, the viewer is being actively developed. Its cube browsing features (including integration with Splatalogue) exceed those of ds9, but overall ds9’s image browsing capabilities (color table and coordinate manipulation, interactive analysis) exceed those of the viewer. The integration of the viewer with the CASA image format, and thus with the data reduction/imaging loop, represents a unique, currently irreplaceable aspect of the viewer.


General Goals and Challenges

The most conservative goals for the viewer and analysis packages are to allow ALMA and VLA users to:

 

1. Interactively explore their data, especially data cubes: examine spectra and movies, derive image statistics, identify spectral lines, compare multiple lines, and identify unknown lines.

2. Manipulate their cubes and gain easy access to whatever form or part of the data they want. This includes smoothing along the spatial and frequency axes, regridding, subcube extraction/slicing, extraction of moment maps, extraction of arbitrary position-velocity slices, and export to FITS or other data structures appropriate for further analysis.

3. Visualize the data in a way that is appropriate for publication: produce high-quality, labeled images (at least) and plots, and export them to an appropriate format.

 

This is the absolute minimum bar for CASA’s visualization and analysis: tools to look at the data produced by the NRAO telescopes and export the desired form of that data to the field-standard format. This list does not include model fitting (species fitting, rotation curves, density profiles), novel data browsing techniques (like wireframe surfaces or 3-d rotation), source extraction (clumpfinding) and characterization, or other advanced analysis.

Key challenges to meet these basic goals are:

1. The Spectral Axis. To state the obvious: robust handling of a data cube (rather than just an image) is a key requirement and strength of the CASA viewer compared to, say, ds9. This means that capabilities like three-dimensional mask creation for imaging, position-velocity cuts, spectral browsing, and moment creation are core functionality. These require careful coordinate handling, which is one reason that non-astronomy software is not a trivial solution.

 

2. Large Data Sets. A single-field ALMA observation can currently produce a cube with natural size ~250x250x4000 per baseband (x4 per data set). Mosaics increase the spatial dimensions to as many as a few thousand pixels, while spectral averaging and/or considering only a single line at a time reduces the spectral dimension substantially. Realistically, data cubes with hundreds of elements along each spectral and spatial dimension and sizes of several hundred MB are now commonplace. Data cubes with several thousand elements along one or both axes and sizes of several GB are no longer exceptional. In the near future, cubes with sizes of several tens of GB and thousands of elements along both spatial and spectral axes have the potential to become commonplace. This is especially true if the ALMA pipeline makes the decision to archive imaging of entire spectral windows.

3. Multiple Lines. The large spectral coverage and sensitivity of ALMA and the VLA mean that these large data sets now often cover multiple spectral lines in a single data set, often including unexpected lines that need to be identified. The viewer and analysis software need to allow for ready line identification, extraction of single-line data sets, and easy comparison among different lines within a data cube or drawn from several data cubes.

 

4. Immediate Need: ALMA and the VLA are working now. Both NRAO facilities are now producing these large, complex data cubes, and the software needs to be in place for image inspection to evaluate reduction, tune the imaging (requiring noise estimates and CLEAN mask creation), and create the output to be used in scientific analysis. CASA’s viewer already plays an indispensable role in this process for both telescopes.

 

5. Limited Resources. CASA has two developers devoting most of their time to this topic, one focused on graphical user interfaces for data inspection and one focused on analysis code (a third expert developer assists but is focused elsewhere). These developers attempt to deploy professional code that has undergone some quality assurance on six-month cycles.

 

6. Complement the Community. The community has developed a large array of tools to analyze spectroscopic and image radio data over the years. This largely does not include the kind of basic visualization and infrastructure routines discussed above, but it does include tools for model fitting and other science analysis. The existing tools are often not adequate to extract the full scientific return from rich ALMA or VLA data, but the community can be expected to develop new tools as it is presented with new data. These “end tools” also do not represent widespread points of failure if they are not maintained.
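The cube sizes quoted under Large Data Sets follow from simple arithmetic, sketched below. The 4-byte (single-precision) pixel size and the 2000-pixel mosaic are assumptions made for illustration; the cube dimensions for a single field come from the text.

```python
def cube_size_bytes(nx, ny, nchan, bytes_per_pixel=4):
    """Return the size of a cube in bytes.

    Assumes 4-byte (single-precision) pixels unless told otherwise.
    """
    return nx * ny * nchan * bytes_per_pixel


GB = 1e9  # decimal gigabytes

single_baseband = cube_size_bytes(250, 250, 4000)
full_data_set = 4 * single_baseband  # four basebands per data set
mosaic = cube_size_bytes(2000, 2000, 4000)  # hypothetical 2000-pixel mosaic

print(f"single baseband: {single_baseband / GB:.1f} GB")
print(f"four basebands:  {full_data_set / GB:.1f} GB")
print(f"2000x2000 mosaic, 4000 channels: {mosaic / GB:.1f} GB")
```

Under these assumptions a single baseband is about 1 GB, a full four-baseband data set about 4 GB, and a full-resolution mosaic of this size about 64 GB, consistent with the progression from "several GB" to "several tens of GB" described above.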

 

These concerns naturally suggest that in the short term, CASA and NRAO adopt an “evolutionary, not revolutionary” approach to visualization and analysis. Given the scarce resources within the CASA project and the very clear, very immediate need for these tools (they are actively used by the ALMA and VLA projects every day), it makes sense for CASA to focus its efforts on providing continuous improvement in the basic exploration and data manipulation tools. These are unlikely to be developed by the community to a standard of robustness, user-friendliness, and adaptability that NRAO could adopt and maintain, and they are essential to move our data to a science-ready state. In parallel, CASA is making an effort to expose data to users in as many ways as possible to facilitate ready integration with scientist-developed tools.

 

This focus on improving existing tools and adding basic functionality complements work being pursued by ALMA Development Studies, which are exploring the creation of entirely new server-side environments and new analysis packages to explore highly multi-dimensional data. This functionality may eventually be key to making the most of rich ALMA and VLA data sets, but it does not appear to be best pursued by a small software development team in the context of the six-month development cycle, and it does not appear to be at all imminent (if it ever materializes). The sensible vision for CASA is to continue to make conservative, widely useful, well-tested additions and to serve as a mainstay package until some more ambitious next-generation package is ready for deployment. This ensures that the community has access to a well-maintained, full-featured, immediately available data exploration tool for the years that it will take to develop the viewer’s successor.

Immediate Goals

The practice for the last year (since the addition of the second developer) has been to keep the priorities above in mind and identify pressing needs that can be addressed during a 6-month CASA development cycle, drawing inspiration from other data browsing programs like karma, ds9, GAIA, GILDAS, and AIPS/AIPS++. The last few cycles have seen the following non-exhaustive list of additions:

 

1. A spectral line browser linked to the main viewer that can be used to examine spectra for points or regions and navigate the cube.
2. Integration with the Splatalogue database for shell-level and interactive line identification.
3. A rich, robust region system exposed through the GUI.
4. Improved registration of data cubes comparing different transitions.
5. The addition of interactive moment creation and performance enhancements to moment creation.
6. New tasks for subcube extraction and better exposure of the data directly to python (with full coordinate information).

 

These exemplify the “incremental” approach described above. They are all general-use functionality that can be developed, tested, and deployed during a six-month development cycle. They improve the exploration infrastructure for almost all users and allow easier access to the data for specialized analysis.

 

Following this model, the current development cycle has seen improved browsing of cube values via pixel histograms, which are also being coupled to moment map creation and the color table. At the task/shell level, CASA has added the ability to extract position-velocity cuts, which has been one of the most requested features for several cycles. Priorities for the next cycle have not been set yet, but will almost certainly include exposing these cuts via a graphical interface inside the viewer and the ability to browse intensity profiles along cuts or user-defined paths.
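A position-velocity cut can be pictured as sampling the cube along a path on the sky and keeping the full spectral axis at each sample. The sketch below shows this with nearest-pixel sampling in plain NumPy. It is not the CASA task: the function name `extract_pv_slice`, the (channel, y, x) axis order, and the straight-line path in pixel coordinates are assumptions for illustration.

```python
import numpy as np


def extract_pv_slice(cube, start, end, n_samples):
    """Sample a (channel, y, x) cube along a straight line.

    Hypothetical helper for illustration. `start` and `end` are (x, y)
    pixel positions. Returns an array of shape (channel, n_samples)
    built with nearest-pixel sampling.
    """
    cube = np.asarray(cube)
    xs = np.linspace(start[0], end[0], n_samples)
    ys = np.linspace(start[1], end[1], n_samples)
    # Round to the nearest pixel and keep indices inside the image.
    ix = np.clip(np.rint(xs).astype(int), 0, cube.shape[2] - 1)
    iy = np.clip(np.rint(ys).astype(int), 0, cube.shape[1] - 1)
    return cube[:, iy, ix]


# Synthetic cube whose pixel values encode (channel, y, x) as 100c + 10y + x.
nchan, ny, nx = 4, 5, 5
c, y, x = np.indices((nchan, ny, nx))
cube = 100 * c + 10 * y + x

# Cut along the image diagonal.
pv = extract_pv_slice(cube, start=(0, 0), end=(4, 4), n_samples=5)
```

A production implementation would work in world coordinates, interpolate between pixels, and optionally average over a finite slit width, which is where the careful coordinate handling discussed earlier comes in.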

 


 

1. Improved GUI exposure of subcube and position-velocity slice extraction, and the addition of new viewer windows to explore position-velocity slices (analogous to the spectrum browser).
2. A general version of the GUI mask creation utility available through CLEAN.
3. Task-level exposure of basic masking/thresholding capabilities for use in clean mask creation.
4. Adding the ability to spectrally smooth images (in addition to the current spatial smoothing).
5. Improvements to the interface to register images on a common coordinate frame and compare different lines within a data set.
6. Added functionality for users to manipulate their data directly.
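Two of the items above, spectral smoothing and threshold-based masking, can be sketched together in a few lines of NumPy. This is a minimal illustration rather than a description of planned CASA behavior: the boxcar kernel, the function names, the (channel, y, x) axis order, and the noise value are all assumptions.

```python
import numpy as np


def boxcar_smooth_spectral(cube, width):
    """Boxcar-smooth a (channel, y, x) cube along the spectral axis.

    Hypothetical helper for illustration; `width` should be odd. Uses
    'same'-mode convolution, so edge channels are averaged against zeros.
    """
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda spectrum: np.convolve(spectrum, kernel, mode="same"), 0, cube
    )


def threshold_mask(cube, noise, nsigma=3.0):
    """Return a boolean mask of pixels brighter than nsigma times the noise."""
    return np.asarray(cube) > nsigma * noise


# Synthetic cube with emission in a single channel.
cube = np.zeros((7, 2, 2))
cube[3] = 9.0

smoothed = boxcar_smooth_spectral(cube, width=3)
mask = threshold_mask(smoothed, noise=0.5, nsigma=3.0)
```

Smoothing before thresholding spreads the emission over neighboring channels, so the resulting mask is broader in velocity than the original signal. That is often the desired behavior for a clean mask, which should enclose faint line wings.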

In addition to these goals, there are three major areas that can be expected to become main focus items during upcoming development cycles:

1. Production of publication-quality plots.
2. The ability to script viewer functionality from the command line.
3. Exposure of the viewer to scientist-designed code.

 

The ability to produce publication-quality plots has been a long-standing target, and the viewer has seen a continuous stream of minor cosmetic improvements. This can be expected to continue and could rise to a main focus of development depending on community feedback. That is, the viewer makes plots now. Once the demand for better plots exceeds the demand for other basic functionality, development focus on this item will increase.

 

The viewer has basic scripting capability via the IMVIEW task, but the exposure of new functionality to command line calls has lagged and will require a substantial investment of developer time. Eventually, this is key to interfacing the visualization capabilities of the viewer with automated scripting. This level of reproducibility is especially important for the publication-quality plot angle - imagine wanting to change one aspect of a plot during revision of a paper or (a very common occurrence) to cycle through a survey using the viewer.

 

Finally, if CASA development focuses on infrastructure and basic functionality, then the viewer needs to be exposed to scientist developers in a way that makes it possible for them to interface their code with it in real time. A basic example would be spectrum fitting or overplotting of models on a spectrum. The Splatalogue team has developed a test case along these lines, working out an LTE calculation that predicts relative line strengths for specified species. In general, the issue is to define how the viewer would interact with user-contributed code. A closely related issue is the interface of the viewer and CASA tasks with VO servers to allow the integration of cube exploration with existing databases.


The relative priority of these three broader goals still has to be weighed at each cycle by gauging community need². Broadly, along with basic cube exploration and manipulation, these represent the core goals of the “evolution” approach. In an ideal world, scripting and exposure of a viewer interface might allow next-generation analysis approaches developed by the community to meld with CASA.

Harnessing Community Development

We have tried to emphasize that an incremental approach within CASA makes the most sense to slowly build up viewer and analysis capabilities. This ensures that limited developer effort aids the largest possible fraction of the user base. In this picture, community development (or development at NRAO outside the main CASA context, see the Kern white paper) has a critical role to play. This has three main aspects: (1) development studies outside CASA aiming for the “next big thing” in data cube analysis, (2) interfacing with existing infrastructure like SciPy, NumPy, Matplotlib, and AstroPy to avoid duplicate development, and (3) creating an environment (both in CASA and in the community) that fosters small-scale scientist development and sharing of code.

 

We have already discussed the role of ALMA development programs in fostering the “next big thing” for analysis, e.g., server-side analysis, high-dimensional analysis, unique interfaces, and interfaces with large databases (though basic interfacing with VO-compliant servers is definitely within the scope of the 6-month cycle development).

The second point is that a large amount of infrastructure for plotting, fitting data, and even advanced statistical calculations and image processing exists already. CASA has already taken advantage of the matplotlib / pylab plotting infrastructure (e.g., in PLOTANTS, PLOTCAL, PLOTBANDPASS, and the AnalysisUtils packages). These are now standard astronomical analysis tools with a good pedigree. Rather than invest large amounts of developer effort in duplicating them inside CASA, it makes most sense to expose the data in ways that allow users to interface with these tools. This is already well underway via the toolkit and tasks like IMVAL but will require continued refinement and documentation/examples. A key area to watch will be the development of AstroPy, a community project (led by staff members at STScI and MPIA) to develop a set of astronomical python modules that complement SciPy and fill a similar niche to Goddard’s IDL libraries.

 

Specialized, small-scale community development is just as critical. Under the framework that we describe, it will be left up to the community to develop sub-field-specific applications (e.g., an ammonia spectrum fitter, a rotation curve fitter, etc.).

² The recent ALMA User Survey provides some help, but the North American responses were not very focused on analysis - probably because the survey preceded the availability of archive data or many deliveries.


NRAO and CASA have two critical roles to play here. First, CASA needs to provide the cleanest and easiest possible access to the data. This means both reading and writing, easy coordinate access, and p-v cut and spectrum extraction to FITS and python data structures in automated ways. These goals have already been detailed above. Second, for maximum impact NRAO should directly foster sharing of community code without adopting responsibility for maintenance. This represents mostly scientist, rather than developer, effort. Some movement in this direction has already occurred in the establishment of the (so far lightly used) NRAO forums and the addition of contributed code areas to the CASA guides. Over the next year, a goal of both CASA scientific staff and NAASC will be to work out the right approach to collect and distribute community analysis code in a way that complements CASA without adding to the already strained developer load.
