
3.8 Extensions

3.9.5 Combining All Attributes

All attributes are subsequently combined:

• Concatenate Attributes 204021 combines the spectroscopic attributes, the photometric attributes, the concentration indices, the absolute magnitudes, the Cartesian positions and the densities.

• Select Attributes 204031 selects the density, the inverse concentration, absolute magnitudes and sky position and distance.

3.9.6 Processing

The node at the end of the dependency graph is the final target Source Collection which represents the catalog data that will be used in plots etc. The information system reorganizes the dependency graph to build the target Source Collection as efficiently as possible (figure 3.4):

• A temporary, transient copy has been made of the entire dependency graph, as can be seen from the negative numbers used as identifiers.

• The density calculator had all its data stored and has been converted into External -17380791.

• The Select Attributes at the end of the dependency graph has been moved upwards. The part of the graph where comoving positions were calculated was no longer required and has therefore been removed from the dependency graph.

• The Filter Sources that performs the final sample selection had the identifiers of its sources stored. It has therefore been converted into a Select Sources with External -95651700 as selected sources. This Select Sources has subsequently been moved through the dependency graph, resulting in several copies that limit the required processing in all parts of the graph.

• The other Filter Sources Source Collections are removed entirely, since the final sample is determined by the last Filter Sources.

• The Source Collections are subsequently processed. The transient copies of the Attribute Calculators represent a subset of the originals, requiring less processing.

• The target Source Collection is assembled last and the catalog data is returned to the scientist.

[Figure 3.4: node labels of the reorganized dependency graph, one node per line; the edges between nodes are not recoverable from the extracted text.]

-46947968 SelectAttributes f DEC RA SDSS_z HTM
-48319168 ConcatenateAttributes f DEC RA SDSS_z HTM SDSS_extinction_u ...
-66729982 AttributeCalculator f iC C
-84782923 SelectAttributes f iC
-95651700 External f
-96091343 SelectSources f density_volume
-71885730 SelectSources f SDSS_rowc_g SDSS_rowc_i SDSS_rowc_u SDSS_rowc_r SDSS_rowc_z ...
-29090978 SelectSources f SDSS_mag_0 SDSS_mag_1 SDSS_mag_2 SDSS_nGood SDSS_targetID ...
-39128171 SelectAttributes f SDSS_extinction_u SDSS_extinction_z SDSS_extinction_g SDSS_extinction_i SDSS_extinction_r ...
-5656675 AttributeCalculator f kcorr_u absMag_u amivar_u kcorr_g absMag_g ...
-14089549 SelectAttributes f absMag_u absMag_g absMag_r
-68033885 SelectAttributes f DEC RA SDSS_z HTM
-91576662 RenameAttributes f DEC RA redshift HTM
-6532187 SelectAttributes f SDSS_rowc_g SDSS_rowc_i SDSS_rowc_u SDSS_rowc_r SDSS_rowc_z ...
-74904541 ConcatenateAttributes f SDSS_mag_0 SDSS_mag_1 SDSS_mag_2 SDSS_nGood SDSS_targetID ...
-81441469 AttributeCalculator f R transverse
-79896725 ConcatenateAttributes f DEC RA R density_volume absMag_u ...
-22398028 SelectAttributes f R
-26721864 External a SDSS_mag_0 SDSS_mag_1 SDSS_mag_2 SDSS_nGood SDSS_targetID ...
-99817931 External b SDSS_rowc_g SDSS_rowc_i SDSS_rowc_u SDSS_rowc_r SDSS_rowc_z ...
-9188502 RelabelSources c SDSS_rowc_g SDSS_rowc_i SDSS_rowc_u SDSS_rowc_r SDSS_rowc_z ...
-69543474 RenameAttributes f DEC RA redshift HTM SDSS_extinction_u ...
-83168062 SelectAttributes f DEC RA
-17380791 External f density_volume
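The pruning idea behind this reorganization can be illustrated with a toy example. The sketch below is not the information system's implementation: the `Node` class, the `prune` function and the node and attribute names are all hypothetical, and it covers only two of the steps listed above (dropping branches whose attributes are not requested, and not descending below a node whose data is already stored). Moving Select Sources through the graph is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a Source Collection (hypothetical structure)."""
    name: str
    parents: list = field(default_factory=list)
    # Attributes this node creates, mapped to the input attributes they need.
    derives: dict = field(default_factory=dict)
    # True if the node's data was stored earlier, so it acts like an External.
    stored: bool = False

    def attributes(self):
        """Attributes offered: own derived ones, or everything from the parents."""
        if self.derives:
            return set(self.derives)
        return set().union(*(p.attributes() for p in self.parents))


def prune(node, wanted, kept=None):
    """Return {node name: attributes to process}, walking up from the target."""
    kept = {} if kept is None else kept
    kept.setdefault(node.name, set()).update(wanted)
    if node.stored:
        return kept  # stored data: nothing upstream has to be processed
    needed = set()
    for attr in wanted:
        # Derived attributes need their inputs; others are passed through.
        needed |= node.derives.get(attr, {attr})
    for parent in node.parents:
        from_parent = needed & parent.attributes()
        if from_parent:
            prune(parent, from_parent, kept)
    return kept


# Hypothetical miniature of the use case in this chapter.
catalog = Node("External", derives={a: set() for a in ("RA", "DEC", "z", "mag")})
positions = Node("PositionCalculator", [catalog], {"xyz": {"RA", "DEC", "z"}})
density = Node("DensityCalculator", [positions], {"density": {"xyz"}}, stored=True)
absmag = Node("AbsMagCalculator", [catalog], {"absMag": {"mag", "z"}})
target = Node("ConcatenateAttributes", [catalog, positions, density, absmag])

kept = prune(target, {"RA", "DEC", "density", "absMag"})
for name, attrs in kept.items():
    print(name, sorted(attrs))
```

Because the density values are marked as stored, the position calculator never appears in `kept`, which mirrors the removal of the comoving-position branch described above.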

Chapter 4

Comparison of Density Estimation Methods for Astronomical Datasets

Abstract¹: Galaxies are strongly influenced by their environment. Quantifying the galaxy density is a difficult but critical step in studying the properties of galaxies. We aim to determine differences in density estimation methods and their applicability in astronomical problems. We study the performance of four density estimation techniques: k-nearest neighbors (kNN), adaptive Gaussian kernel density estimation (DEDICA), a special case of adaptive Epanechnikov kernel density estimation (MBE), and the Delaunay tessellation field estimator (DTFE).

The density estimators are applied to six artificial datasets and to three astronomical datasets: the Millennium Simulation and two samples from the Sloan Digital Sky Survey (SDSS). We compare the performance of the methods in two ways: first, for the artificial datasets, by measuring the integrated squared error and the Kullback–Leibler divergence of each method with respect to the known parametric densities; second, for the SDSS datasets, by examining how well the estimated densities can be used to study the properties of galaxies in relation to their environment.

The adaptive-kernel-based methods, especially MBE, estimate the density more accurately than the other methods and have stronger predictive power in astronomical use cases.

We recommend the Modified Breiman Estimator as a fast and reliable method to quantify the environment of galaxies.

¹ Authors: B. J. Ferdosi, H. Buddelmeijer, S. C. Trager, M. H. F. Wilkinson, J. B. T. M. Roerdink (Astronomy & Astrophysics, accepted, Ferdosi et al. 2011)
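The two error measures named in the abstract can be made concrete with a small numerical sketch. It uses the standard textbook definitions, ISE = ∫ (f̂ − f)² dx and KL(f ‖ f̂) = ∫ f ln(f / f̂) dx, approximated by sums on a regular grid. The function names, the grid, the direction of the divergence and the one-dimensional Gaussian test case are assumptions made for illustration; the exact discretization used in this chapter is not specified in this section.

```python
import numpy as np


def integrated_squared_error(estimate, truth, cell_volume):
    """Grid approximation of the integral of (estimate - truth)^2."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sum((estimate - truth) ** 2) * cell_volume)


def kullback_leibler(truth, estimate, cell_volume, floor=1e-300):
    """Grid approximation of the integral of truth * ln(truth / estimate)."""
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mask = truth > 0  # cells with zero true density contribute nothing
    ratio = truth[mask] / np.maximum(estimate[mask], floor)
    return float(np.sum(truth[mask] * np.log(ratio)) * cell_volume)


def gaussian(x, sigma):
    """Zero-mean normal density, used here as a known 'parametric' density."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Illustrative case: the truth is N(0, 1), the 'estimate' is slightly too broad.
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
truth = gaussian(x, 1.0)
estimate = gaussian(x, 1.2)

ise = integrated_squared_error(estimate, truth, dx)
kl = kullback_leibler(truth, estimate, dx)
print(f"ISE = {ise:.5f}, KL = {kl:.5f}")
```

Both measures are zero for a perfect estimate and grow as the estimate departs from the true density. The ISE weights all regions by absolute error, whereas the Kullback–Leibler divergence weights errors by the true density and penalizes relative rather than absolute deviations.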

4.1 Introduction

Estimating densities in datasets is a critical first step in making progress in many areas of astronomy. For example, a galaxy’s environment apparently plays an important role in its evolution, as seen in the morphology–density relation (e.g., Hubble and Humason 1931; Dressler 1980) or the color–density and color–concentration–density relations (e.g., Baldry et al. 2006). For these relations, a consistent, repeatable (and hopefully accurate) estimate of the local density of galaxies is an important datum. As another example, reconstruction of the large-scale structure of the Universe requires a proper estimation of the cosmic density field (e.g., Romano-Díaz and van de Weygaert 2007). Even simulations require density estimation: smoothed particle hydrodynamics (SPH) is a method to create simulated astronomical data using astrophysical fluid dynamical computation (Gingold and Monaghan, 1977; Lucy, 1977), in which kernel-based density estimation is used to solve the hydrodynamical equations. Density estimation is required not only for analyzing structures in the spatial domain but also for structures in other spaces, such as finding bound structures in six-dimensional phase space in simulations of cosmic structure formation (Maciejewski et al., 2009) or in three-dimensional projections of phase space in simulations of the accretion of satellites by large galaxies (Helmi and de Zeeuw, 2000).

In the current work we are motivated by a desire to quantify the three-dimensional density distribution of galaxies in large surveys (like the Sloan Digital Sky Survey, York et al. 2000, hereafter SDSS) in order to study environmental effects on galaxy evolution. We are also interested in finding structures in higher-dimensional spaces, like six-dimensional phase space or even higher-dimensional spaces in large astronomical databases (such as the SDSS database itself). We are therefore interested in accurate and (computationally) efficient density estimators for astronomical datasets in multiple dimensions.

In this paper we investigate the performance of four density estimation methods:

• k-nearest neighbors (kNN);

• a 3D implementation of adaptive Gaussian kernel density estimation, called DEDICA (Pisani, 1996);

• a modified version of the adaptive kernel density estimation of Breiman et al. (1977), called the modified Breiman estimator (MBE); and

• the Delaunay tessellation field estimator (DTFE: Schaap and van de Weygaert 2000).

The first method is well known to astronomers and involves determining densities by counting the number of nearby neighbors to a point under consideration. This method is typically used in studies of the morphology–density relation and other observational studies of the relation between environment and galaxy properties (e.g., Dressler 1980; Balogh et al. 2004; Baldry et al. 2006; Ball et al. 2008; Cowan and Ivezic 2008; Deng et al. 2009, to mention just a few studies). The second and third methods are both adaptive-kernel density estimators, in which a kernel whose size adapts to local conditions (usually isotropically), depending on criteria set before or iteratively during the estimation process, is used to smooth the point distribution so that typical densities can be estimated. The fourth method, like the first, uses the positions of nearby neighbors to estimate local densities. We compare the methods using artificial datasets with known densities and three astronomical datasets, comprising the Millennium Simulation of Springel et al. (2005) and two samples of real galaxies drawn from SDSS.
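As an illustration of the first method, the following sketch computes a basic kNN number density: the k neighbors of each point divided by the volume of the d-dimensional ball that reaches the k-th neighbor. It is a minimal brute-force version written for clarity and is not necessarily the variant evaluated in this chapter. The function name `knn_density`, the choice to exclude the point itself from the count, and the random test data are assumptions made for the example.

```python
from math import gamma, pi

import numpy as np


def knn_density(points, k):
    """Number density at each point from the distance to its k-th neighbor.

    points: array of shape (n, d); k: number of neighbors (1 <= k < n).
    Duplicate points give a zero distance and hence an infinite density.
    """
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    # All pairwise distances, O(n^2) memory; fine for small illustrative samples.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k is the distance to the k-th nearest neighbor.
    r_k = np.sort(dist, axis=1)[:, k]
    unit_ball_volume = pi ** (d / 2) / gamma(d / 2 + 1)
    return k / (unit_ball_volume * r_k ** d)


# Illustrative use on random 3D positions (not data from this chapter).
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 1.0, size=(500, 3))
densities = knn_density(positions, k=10)
print(densities.min(), np.median(densities), densities.max())
```

Dividing the result by the number of points turns the number density into a probability density. For large catalogs the pairwise distance matrix becomes impractical, and a spatial index such as a k-d tree would normally replace the brute-force search.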

This paper is organized as follows. Section 4.2 discusses the four density estimation methods under consideration. Section 4.3 describes the datasets we used. Section 4.4 contains a comparison between the methods based on datasets with both known and unknown underlying density fields. Finally, in Section 4.5 we summarize our findings and draw conclusions.

We point out that our goal here is not to quantify the shape of the environments of objects in datasets, but rather to estimate the density field or the densities at specific points in those datasets (see below). Information about the shapes of the structures found in the datasets is beyond the scope of this work; we refer the interested reader to recent excellent studies by, e.g., Jasche et al. (2009), Aragon-Calvo et al. (2010) and Sousbie et al. (2009).