Analysis Programs
DPDAK and DAWN
An Overview
Gero Flucke
FS-EC
PNI-HDRI Spring Meeting April 13-14, 2015
Outline
• Introduction
• Overview of Analysis Programs:
  ◦ DPDAK
  ◦ DAWN
• Summary
Introduction
Background
• Data rates in modern X-ray experiments are rising.
  ⇒ Raw data might not be “exportable” to users’ home institutes.
• Online (or near-real-time) analysis becomes more important for efficient use of beam time.
• Standardisation of the data format is ongoing: NeXus.
⇒ Provide support for data analysis tools.
Introduction (ctd.)
PaN-Data WP D5.3: Investigate Existing Solutions
Two suitable programs with different strengths:
• DPDAK:
  ◦ set up flexible tool chains for online and offline data processing and analysis,
  ◦ 1D/2D visualisation, e.g. for monitoring,
  ◦ easy to extend.
• DAWN:
  ◦ generic data browsing, slicing and 1D/2D/3D visualisation,
  ◦ rich set of tools for ROIs, profiling, fitting, etc.,
  ◦ toolbox “for everything”.
My Involvement
• DPDAK: after the core developer left (Jan 2014), I took over maintenance and development last summer.
• DAWN: I am the contact person at DESY PETRA III, doing tests and bug reports, and have provided two minor contributions.
DPDAK
Directly Programmable Data Analysis Kit
• Open-source Python program using ‘standard’ packages:
  ◦ NumPy, SciPy, matplotlib, wxPython, fabio, h5py, pyFAI.
• Cooperation between DESY (PETRA III P03) and MPI KG.
• “(Online) analysis of 2D scattering data”.
• Runs on Windows and Linux.
• Core idea: sequential data processing and visualisation.
• This talk covers version 1.1.0, whose release is imminent.
J. Appl. Cryst. 47, 1797 (2014) [doi:10.1107/S1600576714019773].
DPDAK
Concept and Core
• Minimalistic start-up GUI.
• Three plugin types for
  ◦ data processing steps,
  ◦ GUI tools for data display etc.,
  ◦ data export.
• Storage of processed data:
  ◦ in-memory “database”,
  ◦ types: scalars, 1D/2D arrays, strings, file/directory paths,
  ◦ for images, just their paths are stored.
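To make the storage model concrete, here is a minimal sketch (not DPDAK's actual code) of an in-memory store keyed by plugin and output name, holding one value per processed frame and referencing images only by their path. The class `FrameDatabase` and the type tags are hypothetical.

```python
from pathlib import Path

# Hypothetical type tags; DPDAK's actual internal names may differ.
ALLOWED_TYPES = {
    "scalar": (int, float),
    "array": (list, tuple),   # 1D or 2D (nested) sequences
    "string": (str,),
    "path": (Path,),          # images are stored by path only
}


class FrameDatabase:
    """Hypothetical toy store: one value per (plugin, output, frame index)."""

    def __init__(self):
        self._data = {}  # (plugin, output) -> {frame index: value}

    def store(self, plugin, output, frame, value, kind):
        if not isinstance(value, ALLOWED_TYPES[kind]):
            raise TypeError(f"{plugin}.{output}: expected {kind}, "
                            f"got {type(value).__name__}")
        self._data.setdefault((plugin, output), {})[frame] = value

    def get(self, plugin, output, frame):
        return self._data[(plugin, output)][frame]

    def series(self, plugin, output):
        """All stored values of one output, ordered by frame index."""
        values = self._data.get((plugin, output), {})
        return [values[i] for i in sorted(values)]
```

Because every value of every frame stays in memory, this design is simple but does not scale; that is exactly the limitation listed in the Outlook.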
DPDAK: Configuration of Processing Chain
• Select plugins from a list.
• Select input from the output of other plugins (type match ensured).
• Set parameters.
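The “type match ensured” rule can be sketched as follows, assuming each plugin declares typed inputs and outputs as simple string tags. `PluginSpec`, `compatible_outputs` and `connect` are hypothetical names for illustration, not DPDAK's API.

```python
from dataclasses import dataclass, field


@dataclass
class PluginSpec:
    """Hypothetical description of one plugin in the chain."""
    name: str
    inputs: dict = field(default_factory=dict)   # input name -> type tag
    outputs: dict = field(default_factory=dict)  # output name -> type tag
    params: dict = field(default_factory=dict)   # parameter name -> value


def compatible_outputs(plugins, consumer, input_name):
    """Outputs of other plugins whose type matches consumer.inputs[input_name]."""
    wanted = consumer.inputs[input_name]
    return [(p.name, out) for p in plugins if p is not consumer
            for out, tag in p.outputs.items() if tag == wanted]


def connect(links, consumer, input_name, producer, output_name):
    """Record a link, refusing it if the type tags differ."""
    wanted = consumer.inputs[input_name]
    offered = producer.outputs[output_name]
    if wanted != offered:
        raise TypeError(f"{producer.name}.{output_name} is {offered!r}, but "
                        f"{consumer.name}.{input_name} needs {wanted!r}")
    links[(consumer.name, input_name)] = (producer.name, output_name)
    return links
```

A GUI built this way would only offer the entries returned by `compatible_outputs` in its selection list, so a mismatched link cannot be chosen in the first place.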
DPDAK
Processing
• Data input via ordinary plugins,
  ◦ “asked” to provide the n’th data item.
• Start, pause or stop the processing chain interactively.
• If the input plugin has delivered all items ⇒ processing stops.
• If the n’th data item is not found, retry (for online analysis).
• Store database and configuration for later reload (Python cPickle).
• Can run in batch mode without GUI.
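The processing loop described above can be sketched like this, assuming the input plugin is a callable `provide(n)` that returns the item, `None` if it does not exist yet, or a sentinel once everything has been delivered. All names are hypothetical; Python 3's `pickle` stands in for Python 2's `cPickle`.

```python
import pickle
import time

DONE = object()  # hypothetical sentinel: the input plugin has delivered everything


def run_chain(provide, steps, retry_delay=1.0, max_retries=10):
    """Drive a linear chain: ask provide(n) for item n and pass it through steps.

    provide(n) returns the n'th item, None if it does not exist yet (online
    case, so we retry), or DONE once all input has been delivered.
    """
    results = []
    n, retries = 0, 0
    while True:
        item = provide(n)
        if item is DONE:
            break
        if item is None:
            retries += 1
            if retries > max_retries:
                break  # stop waiting; a GUI would let the user decide
            time.sleep(retry_delay)
            continue
        retries = 0
        for step in steps:
            item = step(item)
        results.append(item)
        n += 1
    return results


def save_session(path, database, config):
    """Store database and configuration together for later reload."""
    with open(path, "wb") as fh:
        pickle.dump({"database": database, "config": config}, fh)


def load_session(path):
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Since nothing in `run_chain` touches a GUI, the same loop serves for batch mode.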
DPDAK
Parallel Processing
• Enabled using Python’s multiprocessing module.
• The number of worker processes is an option, set via GUI or command line.
• Each worker process
  ◦ instantiates all plugins,
  ◦ runs on a well-defined subset of the data.
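A minimal sketch of this scheme with `multiprocessing.Pool`, assuming contiguous index ranges per worker; `make_chain`, `process_subset` and `run_parallel` are hypothetical, and the “read” step is a stand-in.

```python
from multiprocessing import Pool


def split_indices(n_items, n_workers):
    """Contiguous, non-overlapping index ranges, one per worker."""
    base, extra = divmod(n_items, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        ranges.append(range(start, stop))
        start = stop
    return ranges


def make_chain():
    """Hypothetical plugin chain; each worker builds its own instances."""
    return [lambda x: x * 2, lambda x: x + 1]


def process_subset(indices):
    chain = make_chain()  # plugins are instantiated inside the worker
    results = []
    for n in indices:
        item = n  # stand-in for "read the n'th data item"
        for step in chain:
            item = step(item)
        results.append((n, item))
    return results


def run_parallel(n_items, n_workers):
    subsets = split_indices(n_items, n_workers)
    if n_workers == 1:
        return process_subset(subsets[0])
    with Pool(n_workers) as pool:
        parts = pool.map(process_subset, subsets)
    return sorted(pair for part in parts for pair in part)
```

Because each worker is a separate process, the plugins (including the lambdas in `make_chain`) are created there rather than pickled across. On platforms that spawn workers, `run_parallel` must be called from under an `if __name__ == "__main__":` guard.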
DPDAK
Input Data Formats
• Images:
  ◦ using the fabio library from ESRF (tif, edf, ...),
  ◦ numpy, mar300.
• Fio text files (DESY format).
• Two-column text (“chi”) files.
• 2D data out of a 3D Hdf5 stack:
  ◦ as a 2D array (but most plugins take an image path as input),
  ◦ or converted to an image (tif) file.
• New plugins are needed for other data formats (but that is rather easy to do).
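As an illustration of how little a new input format needs, here is a sketch of a two-column (“chi”-style) text reader. It assumes that header lines (title, axis labels, point count) do not start with two numbers and can simply be skipped. `read_two_column` is a hypothetical helper, not DPDAK's plugin.

```python
def read_two_column(lines):
    """Hypothetical helper: parse x/y pairs from two-column text.

    Any line whose first two fields are not both numbers (titles, axis
    labels, a single point count) is treated as header and skipped.
    """
    xs, ys = [], []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            x, y = float(parts[0]), float(parts[1])
        except ValueError:
            continue
        xs.append(x)
        ys.append(y)
    return xs, ys
```

Passing an open file object works as well as a list of strings, since both iterate over lines.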
DPDAK Tools
• Choose from menu.
• Each opens in a separate window,
  ◦ to show processed data, or
  ◦ to provide e.g. powder diffraction calibration.
• During data processing:
  ◦ used for monitoring, since tools are notified to update regularly.
DPDAK Tools
Image Display
• Scroll through processed images.
• Or directly open an image file.
• Displays sector ROI.
• Edit ROI coordinates.
DPDAK Tools
Interplay: Tools and Processing Parameters
• Tools are notified if parameters of processing plugins are edited.
• Tools can set parameters of processing plugins.
• Example: inner and outer radius of sector ROI.
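This two-way interplay is essentially the observer pattern. A minimal sketch, assuming a parameter set that notifies subscribers and a tool that edits it; `ParameterSet` and `SectorRoiTool` are hypothetical names.

```python
class ParameterSet:
    """Hypothetical parameters of one processing plugin, with change notification."""

    def __init__(self, **values):
        self._values = dict(values)
        self._listeners = []

    def subscribe(self, callback):
        """callback(name, value, source) is called after every change."""
        self._listeners.append(callback)

    def __getitem__(self, name):
        return self._values[name]

    def set(self, name, value, source=None):
        if self._values.get(name) == value:
            return  # no change, no notification
        self._values[name] = value
        for callback in self._listeners:
            callback(name, value, source)


class SectorRoiTool:
    """Hypothetical image-display tool that draws a sector ROI."""

    def __init__(self, params):
        self.params = params
        self.drawn = self._radii()
        params.subscribe(self.on_parameter_changed)

    def _radii(self):
        return (self.params["inner_radius"], self.params["outer_radius"])

    def on_parameter_changed(self, name, value, source):
        if source is self:
            return  # our own edit; the drawing is already up to date
        self.drawn = self._radii()

    def drag_outer_radius(self, value):
        """User drags the ROI edge: the tool writes back to the plugin parameter."""
        self.drawn = (self.drawn[0], value)
        self.params.set("outer_radius", value, source=self)
```

Passing `source` lets a tool ignore the echo of its own edit, which avoids redundant redraws and update loops.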
DPDAK Tools
1D Plots
• Select x and y from the output of processing plugins.
• If array: show
  ◦ the result of a single processed data item,
  ◦ or the stack of all.
• If scalar: show
  ◦ vs. index of processed data item,
  ◦ or vs. another scalar.
DPDAK Tools
2D Colour Plot
• 1D distribution of each processed data item (“frame”).
• Attach one after another to create a 2D colour plot.
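A sketch of the stacking step plus a linear mapping onto colour-map indices, assuming all frames share the same x axis. Both functions are hypothetical helpers; in practice a plotting call such as matplotlib's `imshow` would take the stacked grid and apply the colour map itself.

```python
def stack_frames(profiles):
    """Hypothetical helper: stack 1D profiles into a 2D grid (row i = frame i)."""
    width = len(profiles[0]) if profiles else 0
    for i, profile in enumerate(profiles):
        if len(profile) != width:
            raise ValueError(f"frame {i} has {len(profile)} points, expected {width}")
    return [list(p) for p in profiles]


def colour_indices(grid, n_colours=256):
    """Hypothetical helper: map values linearly onto indices 0..n_colours-1."""
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat grid
    return [[min(int((v - lo) / span * n_colours), n_colours - 1) for v in row]
            for row in grid]
```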
DPDAK Fitting
Processing Plugin
• Restrict fit range.
• Configure model as sum of function components (e.g. Pseudo-Voigt).
• Set start values (optionally fixed).
Peak Fit Display Tool
• Original distribution (fit region highlighted).
• Fitted curve and its components.
• Can show the function with start parameters.
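A sketch of a model built as a sum of components, evaluated on a restricted range. It assumes a peak-height pseudo-Voigt in which Gaussian and Lorentzian share one FWHM and `eta` is the Lorentzian fraction. DPDAK's own parametrisation may differ. The actual fit would hand `model` to a least-squares optimiser such as `scipy.optimize.curve_fit`, which is not shown here.

```python
import math


def pseudo_voigt(x, amplitude, center, fwhm, eta):
    """Peak height `amplitude` at `center`; `eta` = Lorentzian fraction (assumed form)."""
    u = (x - center) / fwhm
    gauss = math.exp(-4.0 * math.log(2.0) * u * u)  # 0.5 at center +/- fwhm/2
    lorentz = 1.0 / (1.0 + 4.0 * u * u)             # 0.5 at center +/- fwhm/2
    return amplitude * (eta * lorentz + (1.0 - eta) * gauss)


def linear_background(x, offset, slope):
    return offset + slope * x


def model(x, components):
    """Sum of (function, parameter dict) components, e.g. peaks plus background."""
    return sum(func(x, **params) for func, params in components)


def restrict(xs, ys, lo, hi):
    """Keep only the points inside the fit range [lo, hi]."""
    pairs = [(x, y) for x, y in zip(xs, ys) if lo <= x <= hi]
    return [p[0] for p in pairs], [p[1] for p in pairs]
```

Evaluating `model` with the start parameters gives the “function with start parameters” curve, and evaluating each component on its own gives the component curves shown in the display tool.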
DPDAK Export
Export Plugins
• Can access the complete database.
• User dialogues may specify what to export to which file.
• Generic plugins to export data (except 2D) to text files:
  ◦ Single Plugin Text Export,
  ◦ DB Text Export.
• Further plugins for further formats could be added.
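A sketch of a generic text export, assuming the user dialogue has already selected some scalar outputs, each given as a list of per-frame values. `export_scalars` and the `plugin.output` column naming are hypothetical; the exact file layout of DPDAK's text export plugins may differ.

```python
import csv


def export_scalars(fh, columns):
    """Hypothetical export: one row per frame, one tab-separated column per output.

    `columns` maps a "plugin.output" label to its list of per-frame values.
    """
    names = list(columns)
    n_frames = len(next(iter(columns.values()), []))
    writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
    writer.writerow(["frame"] + names)
    for i in range(n_frames):
        writer.writerow([i] + [columns[name][i] for name in names])
```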
DPDAK: Extendability
User Plugins
• DPDAK is easily extendable: (almost) everything is a plugin.
• New plugins can be added:
  ◦ write a Python class inheriting from one of the plugin base classes (processing, tools, export),
  ◦ put the code into a directory,
  ◦ add this directory to the list of user plugin directories.
• User plugins are treated exactly like DPDAK’s core plugins.
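A sketch of how such plugin directories can be scanned with the standard `importlib` machinery. `ProcessingPlugin` is a hypothetical stand-in for a DPDAK base class. To keep the example self-contained, the loader injects the base class into each module's namespace. A real plugin file would instead import the base class from the DPDAK package.

```python
import importlib.util
import inspect
from pathlib import Path


class ProcessingPlugin:
    """Hypothetical stand-in for one of DPDAK's plugin base classes."""

    def process(self, item):
        raise NotImplementedError


def discover_plugins(directories, base=ProcessingPlugin):
    """Import every *.py file in `directories` and collect subclasses of `base`."""
    found = {}
    for directory in directories:
        for path in sorted(Path(directory).glob("*.py")):
            spec = importlib.util.spec_from_file_location(f"user_plugin_{path.stem}", path)
            module = importlib.util.module_from_spec(spec)
            # Simplification: expose the base class so plugin files need no import.
            setattr(module, base.__name__, base)
            spec.loader.exec_module(module)
            for name, obj in inspect.getmembers(module, inspect.isclass):
                if issubclass(obj, base) and obj is not base:
                    found[name] = obj
    return found
```

Core and user directories can be passed in the same list, which is one way to treat user plugins exactly like core plugins.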
DPDAK: Outlook
User Defined Fit Functions
• So far only a fixed set of functions is available.
⇒ Using the mechanism of user plugin directories, it is easy to let users extend the set.
In-Memory Database
• All output of all processing plugins, for all “frames”, is kept in memory ⇒ not scalable.
⇒ A prototype replacement with an Hdf5 file back-end exists.
Stored Configuration not Human Readable
• Batch processing needs the DPDAK GUI to edit the configuration.
• No problem in principle to create a text version: the logic is already established for the Hdf5 file back-end.
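A sketch of what a human-readable configuration could look like, using JSON from the standard library. The configuration layout (a list of plugins with `name`, `inputs` given as `plugin.output`, and `params`) is a hypothetical assumption. It is not DPDAK's stored format.

```python
import json


def config_to_text(config):
    """Human-readable form of a processing-chain configuration (hypothetical layout)."""
    return json.dumps(config, indent=2, sort_keys=True)


def config_from_text(text):
    """Parse a configuration edited as text and check that input links resolve."""
    config = json.loads(text)
    names = {p["name"] for p in config["plugins"]}
    for plugin in config["plugins"]:
        for input_name, source in plugin.get("inputs", {}).items():
            producer = source.split(".", 1)[0]
            if producer not in names:
                raise ValueError(f"{plugin['name']}.{input_name} refers to "
                                 f"unknown plugin {producer!r}")
    return config
```

With such a file, a batch run could be re-parametrised in a text editor, for example by changing a bin count, without opening the GUI.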
DAWN
Data Analysis WorkbeNch
• Open-source Java program, based on Eclipse RCP.
• Mainly developed by Diamond Light Source, with contributions from ESRF and others.
• “Implements sophisticated support for:
  ◦ Visualization of data in 1D, 2D and 3D,
  ◦ Python script development, debugging and execution,
  ◦ Workflows for analyzing scientific data calling Python and binary codes.”
• “By and for the synchrotron community - overlap with other communities like neutron scattering, photon science, etc.”
• Windows, Linux, (Mac OS).
J. Synchr. Rad. 22 (2015) [doi:10.1107/S1600577515002283]
DAWN: Core Concepts
Data and File Formats
• Abstract dataset class, similar to numpy arrays in Python.
• Plugin approach to load data of various formats:
  ◦ images, .edf, ascii, hdf5/nexus, .fio, ...
  ◦ DAWN finds out which loader is needed.
• Lazy loading of user-selected sections of large datasets.
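DAWN itself is written in Java. The following is only a Python illustration of the lazy-loading idea, not DAWN's dataset API. Nothing is read when the dataset is created, and a slice reads just the frames it covers. `LazyDataset` and the `read_frame` callback are hypothetical.

```python
class LazyDataset:
    """Hypothetical lazily loaded stack of frames: a frame is read only when indexed.

    `read_frame(i)` is an assumed loader callback, e.g. reading one image
    of a large HDF5 stack; nothing is read at construction time.
    """

    def __init__(self, n_frames, read_frame):
        self.n_frames = n_frames
        self._read_frame = read_frame
        self.reads = []  # indices actually loaded, kept only for illustration

    def __len__(self):
        return self.n_frames

    def __getitem__(self, key):
        if isinstance(key, slice):
            return [self[i] for i in range(*key.indices(self.n_frames))]
        if not isinstance(key, int):
            raise TypeError("index must be an int or a slice")
        if key < 0:
            key += self.n_frames
        if not 0 <= key < self.n_frames:
            raise IndexError(key)
        self.reads.append(key)
        return self._read_frame(key)
```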
Visualisation: Plotting System
• Line graphs, scatter plots, images, surfaces, ...
• Includes ROIs: lines, rectangles, sectors, ...
• Includes tools to “act” on displayed data:
  ◦ fitting, derivatives, profiles, masking, ...
DAWN: 1D Visualisation, Expressions
• Choose x and y data, using up to two y scales.
• Define an expression: a “virtual” dataset as a function of others.
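A Python sketch of a “virtual” dataset, again illustrating the idea rather than DAWN's Java implementation. The expression is stored and evaluated element by element on access, so no new data are written. `VirtualDataset` is hypothetical, and normalising intensity by monitor counts is an example use.

```python
class VirtualDataset:
    """Hypothetical dataset defined by an expression over other 1D datasets."""

    def __init__(self, func, *sources):
        lengths = {len(s) for s in sources}
        if len(lengths) != 1:
            raise ValueError("sources must have equal length")
        self.func = func
        self.sources = sources
        self.length = lengths.pop()

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        # Evaluated on access: changes in the sources show up immediately.
        return self.func(*(s[i] for s in self.sources))

    def to_list(self):
        return [self[i] for i in range(len(self))]
```

For example, `VirtualDataset(lambda i, m: i / m, intensity, monitor)` yields a normalised intensity that can be plotted like any other dataset.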
DAWN: Slicing, 2D Visualisation
• Select indices (or ranges) of multi-dimensional datasets.
• Slice across a stack of images.
DAWN: Surface Plot
Live 3D view (instantly updated):
• Rotate as you want.
• Select a box.
DAWN: Hyper3D “Box and Line”
• The image on the left is the average of the images selected by the blue shaded area on the right.
• The red curve on the right displays, as a function of the image index, the average of the region of interest selected by the red square on the left.
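The two views can be expressed as two reductions of an image stack. The sketch below assumes the stack is a list of equally sized 2D lists. `mean_image` and `roi_trace` are hypothetical helpers, not DAWN functions.

```python
def mean_image(stack, start, stop):
    """'Box': pixel-wise average of images stack[start:stop]."""
    frames = stack[start:stop]
    if not frames:
        raise ValueError("empty index range")
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / len(frames) for c in range(cols)]
            for r in range(rows)]


def roi_trace(stack, row_range, col_range):
    """'Line': average over a rectangular ROI, one value per image index."""
    (r0, r1), (c0, c1) = row_range, col_range
    n_pixels = (r1 - r0) * (c1 - c0)
    return [sum(img[r][c] for r in range(r0, r1) for c in range(c0, c1)) / n_pixels
            for img in stack]
```

Moving the blue range re-evaluates `mean_image`, and moving the red square re-evaluates `roi_trace`.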
DAWN: Profiling
• Various image profiling tools: line, box, radial, azimuthal, ...
• Live update when the region of interest is moved.
DAWN: Line Profile Masked
The mask is taken into account in profiles.
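A sketch of a masked radial profile: the mean intensity in concentric rings around a beam centre, skipping masked pixels. It assumes `True` in the mask marks an excluded pixel. `radial_profile` is a hypothetical helper. Production code would use an optimised integrator such as pyFAI.

```python
import math


def radial_profile(image, center, bin_width=1.0, mask=None):
    """Mean intensity in rings of width `bin_width` around `center` = (row, col).

    mask[r][c] == True excludes a pixel (assumed convention). Returns the bin
    centres and the mean per bin; a bin with only masked pixels gives None.
    """
    cy, cx = center
    sums, counts = {}, {}
    max_bin = 0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            b = int(math.hypot(r - cy, c - cx) // bin_width)
            max_bin = max(max_bin, b)
            if mask is not None and mask[r][c]:
                continue  # masked pixels never enter the average
            sums[b] = sums.get(b, 0.0) + value
            counts[b] = counts.get(b, 0) + 1
    centres = [(b + 0.5) * bin_width for b in range(max_bin + 1)]
    means = [sums[b] / counts[b] if counts.get(b) else None
             for b in range(max_bin + 1)]
    return centres, means
```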
DAWN: Peak Fitting
Interactively define the range and the maximum number of peaks to be fitted (can be many!).
DAWN: Partitioning of Functionality
Perspectives
• Synchrotron data and its analysis are very diverse.
• A single entry point to all features swamps the user:
  ⇒ group functionality in “perspectives”.
• Re-use common functionality (data, visualisation):
  ⇒ the user easily gets accustomed to it.
DAWN: Perspectives
Generic Core Perspectives
• Data Browsing/DExplore:
  ◦ view 1D and 2D data,
  ◦ slices/subsets of data of any dimensionality,
  ◦ expressions to apply mathematical calculations ⇒ creating “virtual datasets”.
• Trace:
  ◦ simplified work with line traces from multiple files.
• Workflow:
  ◦ graphically design data processing steps (similar to LabVIEW),
  ◦ can call Python code.
• Processing:
  ◦ set up a chain of data processing steps.
• PyDev:
  ◦ IDE to develop and debug Python/Jython scripts.
DAWN: Data Browsing Perspective
a) explorer
b) data/slice selection
c) plot (incl. sector ROI, mask)
d) colour mapping
e) radial profile
f) result of fitted peaks
DAWN
Generic Data Processing: Two approaches
• Workflows: graphically compose processing using actors,
  + can also use actors running (C)Python code,
  - limited workflow control (and limited documentation of it),
  - graphical output (e.g. for monitoring) not supported.
• New: processing perspective, so far for diffraction only.
DAWN: Python Development
• DAWN includes PyDev, a Python IDE for Eclipse.
DAWN: More Perspectives
Specific Science Cases
• Powder Diffraction Calibration
• PEEMA: Photo-emission electron microscopy analysis
• Tomography Reconstruction
• NCD (Non-Crystalline Diffraction) Data Calibration/Reconstruction
• XAFS (X-ray absorption fine structure)
• ...
DAWN: Diffraction Calibration
Summary
• At PETRA III we are going to make both DPDAK and DAWN centrally available.
• DPDAK:
  flexible tool chains for online and offline data monitoring, processing and analysis.
• DAWN:
  generic data browsing (Hdf5/NeXus), visualisation and analysis.
Backup
DPDAK
General Image Configuration
• rotation, axis flipping,
• masks,
• background.
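A sketch of such a general image configuration applied to each image before processing. It assumes background subtraction and masking happen in detector coordinates before rotation and flipping. DPDAK's actual order and options may differ. Masked pixels are replaced by a fill value (here `None`), and all function names are hypothetical.

```python
def rotate90(image):
    """Rotate a 2D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]


def apply_image_config(image, quarter_turns=0, flip_lr=False, flip_ud=False,
                       background=None, mask=None, fill=None):
    """Hypothetical per-image configuration: background, mask, then geometry."""
    img = [list(row) for row in image]
    if background is not None:
        img = [[v - b for v, b in zip(row, brow)] for row, brow in zip(img, background)]
    if mask is not None:
        img = [[fill if m else v for v, m in zip(row, mrow)]
               for row, mrow in zip(img, mask)]
    for _ in range(quarter_turns % 4):
        img = rotate90(img)
    if flip_lr:
        img = [row[::-1] for row in img]
    if flip_ud:
        img = img[::-1]
    return img
```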