Graduate School of Oceanography Faculty
Publications Graduate School of Oceanography
2019
Deriving inherent optical properties from
decomposition of hyperspectral non-water
absorption
Brice K. Grunert Colleen B. Mouw
University of Rhode Island, [email protected]
See next page for additional authors
Follow this and additional works at:https://digitalcommons.uri.edu/gsofacpubs The University of Rhode Island Faculty have made this article openly available.
Please let us knowhow Open Access to this research benefits you. This is a pre-publication author manuscript of the final, published article. Terms of Use
This article is made available under the terms and conditions applicable towards Open Access Policy Articles, as set forth in ourTerms of Use.
This Article is brought to you for free and open access by the Graduate School of Oceanography at DigitalCommons@URI. It has been accepted for inclusion in Graduate School of Oceanography Faculty Publications by an authorized administrator of DigitalCommons@URI. For more information, please [email protected].
Citation/Publisher Attribution
Grunert, B. K., Mouw, C. B., Ciochett, A. B. (2019). Deriving inherent optical properties from decomposition of hyperspectral non-water absorption. Remote Sensing of Environment, 225, 193-206 .doi: 10.1016/j.rse.2019.03.004
1
Deriving inherent optical properties from decomposition of hyperspectral non-water 2
absorption 3
4
Brice K. Grunert1*, Colleen B. Mouw2, Audrey B. Ciochetto2 5
6
1Michigan Technological University, Department of Geological and Mining Engineering and 7
Sciences, 1400 Townsend Drive, Houghton, MI 49931, USA 8
9
2 University of Rhode Island, Graduate School of Oceanography, 215 South Ferry Road, 10
Narragansett, RI 02882, USA 11
12
*Corresponding author: [email protected], +1 414-322-7506 13 14 15 16 17 18 19 20 21
Keywords: hyperspectral, PACE, inherent optical properties, colored dissolved organic matter, 22
phytoplankton, ocean biogeochemistry 23
24
Abstract 25
Semi-analytical algorithms (SAAs) developed for multispectral ocean color sensors have 26
benefited from a variety of approaches for retrieving the magnitude and spectral shape of inherent 27
optical properties (IOPs). SAAs generally follow two approaches: 1) simultaneous retrieval of all 28
IOPs, resulting in pre-defined bio-optical models and spectral dependence between IOPs and 2) 29
retrieval of bulk IOPs (absorption and backscattering) first followed by decomposition into 30
separate components, allowing for independent retrievals of some components. Current algorithms 31
used to decompose hyperspectral remotely-sensed reflectance into IOPs follow the first strategy. 32
Here, a spectral deconvolution algorithm for incorporation into the second strategy is presented 33
that decomposes at-w() from in situ measurements and estimates absorption due to phytoplankton 34
(aph()) and colored detrital material (adg()) free of explicit assumptions. The algorithm described 35
here, Derivative Analysis and Iterative Spectral Evaluation of Absorption (DAISEA), provides 36
estimates of aph() and adg() over a spectral range from 350-700 nm. Estimated aph() and adg() 37
showed an average normalized root mean square difference of <30% and <20%, respectively, from 38
350-650 nm for the majority of optically distinct environments considered. Estimated Sdg median 39
difference was less than 20% for all environments considered, while distribution of Sdg uncertainty 40
suggests that biogeochemical variability represented by Sdg can be estimated free of bias. DAISEA 41
results suggest that hyperspectral satellite ocean color data will improve our ability to track 42
biogeochemical processes affiliated with variability in adg() and Sdg free of explicit assumptions. 43
1. Introduction 44
Dissolved organic matter (DOM) comprises the largest pool of fixed carbon in the ocean, 45
roughly equivalent to the reservoir of atmospheric CO2 (~670 Pg C; Hansell et al. 2009; Ogawa et 46
al. 2001). Yet, sources and cycling of DOM in the global ocean remain poorly constrained due to 47
difficulty in assigning origin and tracking changes in composition to a complex mixture of organic 48
compounds composed of up to ~20,000 molecular formulas in a sample (Andrew et al. 2013; 49
Mentges et al. 2017; Riedel and Dittmar 2014). A portion of DOM is optically active, colored 50
dissolved organic matter (CDOM), and displays distinct spectral variability between uniquely 51
sourced material, namely terrestrial and marine-derived, and different degradation pathways, such 52
as microbial or photodegradation (Catalá et al. 2016; Danhiez et al. 2017; Helms et al. 2013; Helms 53
et al. 2008; Zhao et al. 2017). Due to its interaction with light, CDOM can be rapidly characterized 54
using optical sensors and is observable from autonomous and satellite platforms (e.g., Siegel et al. 55
2005; Xing et al. 2012). These observations are crucial to adequately model ocean biogeochemical 56
and physical processes due to the influence of CDOM on distribution and spectral quality of light 57
in the water column and heating of the surface ocean (Chang and Dickey 2004; Dutkiewicz et al. 58
2015; Kim et al. 2016). 59
CDOM absorption (ag(), m-1; denotes wavelength) at visible wavelengths also tracks 60
the spectral shape of ag (Sg) and dissolved organic carbon concentration ([DOC], mgL-1) in coastal 61
waters where a strong gradient of relatively degraded, terrestrial-derived material and conservative 62
mixing produce a clear, observable signal across unique pools of CDOM (Cory and Kling 2018; 63
Fichot and Benner 2011; Mannino et al. 2014; Stedmon and Markager 2003). This continuous 64
dilution of ag() in coastal waters presents predictive capability of CDOM molecular weight, 65
degradation state and terrestrial biomarkers (e.g., lignin) using ag() due to unique spectral features 66
present in terrestrial material relative to CDOM of marine origin (Fichot et al. 2016; Fichot et al. 67
2013; Helms et al. 2008; Vantrepotte et al. 2015). While these relationships are strong in coastal 68
waters, open ocean waters do not display a consistent relationship between ag(), Sg and [DOC] 69
due to relatively low production rates and strong photodegradation in surface ocean waters (Helms 70
et al. 2013; Nelson et al. 2010). This disconnect between single wavelength estimates of ag() and 71
Sg currently limits our ability to accurately track production and degradation of CDOM across 72
broad spatial scales while also introducing significant uncertainty in estimates of aph() and derived 73
products (e.g., chlorophyll-a concentration). Additionally, increasing observations of ag() have 74
shown that Sg displays significant variability and is capable of characterizing CDOM of unique 75
source, environmental conditions and degradation state (Asmala et al. 2018; Danhiez et al. 2017; 76
Grunert et al. 2018; Helms et al. 2008, 2013). Considering this, it is likely that this parameter 77
contains very useful information regarding food web processes and marine carbon cycling relevant 78
to understanding the balance of the marine DOM carbon reservoir. 79
Hyperspectral ocean color observations from in situ measurements including flow-through 80
systems and proposed satellite sensors such as the German Aerospace Center’s Environmental 81
Mapping and Analysis Program sensor and NASA’s Plankton, Aerosol, Cloud and ocean 82
Ecosystem (PACE) sensor provide the potential to observe inherent optical properties (IOP’s), 83
including phytoplankton absorption (aph(), m-1), non-algal particulate (NAP) absorption (ad(), 84
m-1) and ag(), with greater accuracy across the global ocean. Hyperspectral satellite observations 85
have the proven ability to characterize unique phytoplankton functional groups (Bracher et al. 86
2009; Sadeghi et al. 2012) while flow-through systems have provided an unprecedented view of 87
phytoplankton productivity and physiology at a global scale (Chase et al. 2013; Werdell et al. 88
2013). Additional work including derivative analysis has also shown potential for estimating 89
pigment concentrations, ag(), Sg, ad() and the spectral shape of ad() (Sd; Wang et al. 2016; Chase 90
et al. 2017; Vandermeulen et al. 2017; Wang et al. 2018). To date, satellite algorithms use an 91
assumed value or starting point for Sdg, the combined spectral slope term for ad() and ag(), based 92
on global or regional observations and/or constrain solutions within a pre-defined space (Lee et al. 93
2002; Werdell et al. 2013; Dong et al. 2013; Zhang et al. 2015). These approaches are all made 94
possible by a variety of existing inversion approaches developed for multispectral data outlined by 95
Werdell et al. (2018). 96
Hyperspectral approaches are still scarce but apply bottom-up strategies on in situ Rrs() 97
capable of estimating pigment concentrations and separating ag()/Sg and ad()/Sd using assumed 98
starting points and lower/upper bounds on variables. Bottom-up strategies provide accurate 99
solutions but result in IOP retrievals that are spectrally dependent on each other (Mouw et al. 100
2015). Here, we provide a top-down approach that independently estimates Sdg, adg() and aph() 101
free of explicit assumptions from total non-water absorption (at-w()) using derivative analysis, 102
iterative spectral evaluation and Gaussian decomposition of total non-water absorption spectra. 103
Beyond estimation of Sdg and more accurate spectral retrievals of aph(), such a method provides 104
clearer spectral features for the derivation of phytoplankton functional types, including Gaussian 105
fitting and second or fourth derivative analysis of phytoplankton pigments (Chase et al. 2017; 106
Vandermeulen et al. 2017; Wang et al. 2017). We focus on accurate retrieval of Sdg and adg() to 107
represent biogeochemical variability in NAP and CDOM absorption represented by the spectral 108
shape and magnitude of adg(). Our results suggest the algorithm, Derivative Analysis and Iterative 109
Spectral Evaluation of Absorption (DAISEA), will work well with future top-down hyperspectral 110
inversion approaches. 111
2. Methods 113
2.1 Data 114
In situ data were accessed from NASA’s SeaWiFS Bio-optical Archive and Storage System 115
(SeaBASS, https://seabass.gsfc.nasa.gov/) on January 12, 2018 (Werdell et al. 2003). We focused 116
our collection on data where aph(), ad() and ag() were all measured coincidentally on a benchtop 117
spectrophotometer within 10 m of the surface (Fig. 1). We initially quality controlled each set of 118
absorption spectra by considering if any values were below zero for individual spectra. If the 119
minimum value was more negative than -0.1, the spectra was discarded; if the value was greater 120
than -0.1, an offset for the most negative value was applied to the entire spectrum. In doing so, 121
spectral shape was retained while removing poorly defined absorption values that resulted in 122
negative algorithm solutions. We removed any spectra where Sdg was less than 0.004 nm-1, values 123
unrealistic with historic observations and estimates (e.g., Siegel et al. 2002; Wang et al. 2005). 124
Additionally, spectra that had been sampled at a resolution less than 2 nm were not considered to 125
ensure spectral shape was maintained when downsampling. After removing poor quality spectra, 126
a total of 4,787 spectra remained. These spectra were randomly split into training (n=3,434; Fig. 127
1a) and test datasets (n=1,353; Fig. 1b) so that training spectra accounted for ~75% of total spectra. 128
All absorption spectra were subsampled to 5 nm either through direct sub-sampling or linear 129
interpolation to avoid introducing artificial curvature, with the spectral range from 350-700 nm 130
used (71 data points). Some spectra were not sampled down to exactly 350 nm but were measured 131
at or below 355 nm (e.g., 350.7; n=79); for these spectra, we extrapolated to 350 nm using a 132
discretized partial differential equation with an enhanced plate metaphor (D’Errico 2005). Typical 133
uncertainty estimates for spectrophotometer measurements, assessed as differences among 134
triplicate samples, ranged from ~5-10% relative difference (Mouw, unpublished data). We focused 135
on 5 nm spectral resolution here for an assessment of performance relative to the anticipated 136
resolution of PACE. 137
2.2 DAISEA Algorithm Development 138
Our approach for decomposing at-w() focused on estimating adg() first through derivative 139
analysis, optimizing the fit of adg() through iterative spectral evaluation, then estimating aph() 140
using Gaussian decomposition. Steps described in this section are summarized in a schematic and 141
accompanied by figures illustrating the primary components of each step (Fig. 2). Steps 1-7 142
evaluate at-w() to optimize estimates of adg() and aph() and Step 8 is a Gaussian decomposition 143
of at-w() using estimated adg() and aph() with constraints defined below. For a detailed 144
discussion on general algorithm framework and empirical relationships used, we refer the reader 145
to section 4.1.2. 146
Step 1 147
To first parameterize adg(), the second derivative of at-w() was calculated as: 148 𝑑2𝑎𝑡−𝑤(𝜆) 𝑑𝜆2 ≈ 𝑎𝑡−𝑤(𝜆𝑖) − 2𝑎𝑡−𝑤(𝜆𝑗) + 𝑎𝑡−𝑤(𝜆𝑘) ∆𝜆2 (1) where ∆ indicates the wavelength resolution used to measure at-w() (here, 5 nm as described 149
above) and ∆=k-j=j-i, j=i+1, k=i+2 and i is the current wavelength (Tsai and Philpot 1998). 150
Points where the second derivative equals 0 indicate inflection points of the spectrum (Fig. 2a; Lee 151
et al. 2007). In theory, for at-w(), these are points where individual phytoplankton pigments least 152
impact the underlying exponential signal and thus are considered as the observed signal most likely 153
representative of adg() spectral shape. These points were defined as d0 and were found by
154
identifying where d2at-w() was approximately 0. These points were identified by rounding second 155
derivative values to the median magnitude of the second derivative, which is a function of the 156
magnitude of observed absorption. For example, if the median second derivative was 0.005, a 157
value of 0.0008 at 440 nm would be considered not zero and not included (rounded to 0.001). A 158
value of 0.0004 at 450 nm would be considered zero (rounded to 0.000), 450 nm would be 159
classified as a d0 wavelength and the corresponding absorption would be used in Step 2.
160
Step 2 161
Using wavelengths identified in Step 1, an initial exponential expression was fitted 162
following 163
𝑎𝑡−𝑤(𝜆𝑑0) = 𝑎𝑡−𝑤(𝜆0)𝑒−𝑆(𝜆𝑑0−𝜆0) (2) where 0 is the minimum wavelength in d0 (Fig. 2b). S derived from Eq. 2 was used as the initial
164
estimate of Sdg and adg(0) was estimated at 440 nm by estimating the relative contribution of 165
adg(440) to at-w(440) using a piece-wise exponential relationship derived from the training dataset 166 as follows: 167 % 𝑎𝑝ℎ(440) = 1.038𝑒 −0.9257(𝑎𝑎𝑡−𝑤(555) 𝑡−𝑤(680))where 𝑎𝑡−𝑤(555) 𝑎𝑡−𝑤(680) > 0.685 (3) or 168 % 𝑎𝑝ℎ(440) = 2.088𝑒 −1.946(𝑎𝑎𝑡−𝑤(555) 𝑡−𝑤(680)) where 𝑎𝑡−𝑤(555) 𝑎𝑡−𝑤(680) ≤ 0.685 (4) and 169 % 𝑎𝑑𝑔(440) = 100 − % 𝑎𝑝ℎ𝑦(440) (5) Eq. 3 was developed on the entire dataset, and outliers were determined as residuals outside 1.5 170
times the interquartile range (25th and 75th quantiles). After determining outliers, a moving window 171
of 10% aph(440) contribution to at-w(440) was used to assess for a significant bias in residuals 172
derived from this relationship. Bias was defined as a median residual value an order of magnitude 173
different (positive or negative) than median residual bias between the relationship and all data 174
points. This threshold indicated a bias for aph(440) contributions greater than 60%, corresponding 175
to an at-w(555)/ at-w(680) ratio of 0.685. From this, a new relationship (Eq. 4) was developed to 176
estimate aph(440) percent contribution above 60% without bias. These equations are discussed 177
further in Section 4.1.2 and figures referenced therein. 178
From the previous steps, the spectra for adg() was then estimated (Fig. 2b) as follows 179
𝑎𝑑𝑔(𝜆) = (𝑎𝑡−𝑤(440) ∙ %𝑎𝑑𝑔(440)) 𝑒−𝑆𝑑𝑔(𝜆−440) (6) Step 3
180
To determine if the adg() estimate was acceptable, we compared it to at-w(): 181
𝑎𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙(𝜆) = 𝑎𝑡−𝑤(𝜆) − 𝑎𝑑𝑔(𝜆) (7) If aresidual() was always positive, the previous variables - 0, adg(0), Sdg – were maintained at the
182
current estimated values (e.g., 0=440 nm; Fig. 2c). If aresidual() was negative at any point, the
183
wavelength corresponding to the most negative residual was used as 0, and adg(0)=at-w(0) to re-184
calculate adg() from 185
𝑎𝑑𝑔(𝜆) = 𝑎𝑑𝑔(𝜆0)𝑒−𝑆𝑑𝑔(𝜆−𝜆0) (8) Resulting aresidual() was re-calculated again following Eq. 7 for the new estimated adg(). This
186
step was repeated until all aresidual() values were positive, with Sdg incrementally adjusted by
187
+0.0001 nm-1 to a maximum adjustment of +0.011 nm-1. If a potential solution was not found, Sdg 188
was then incrementally adjusted by -0.0001 nm-1 to a minimum adjustment of -0.004 nm-1. The 189
difference in adjustment and focus on positive adjustment values first is discussed further in 190
Section 4.1.2. If no valid solution was found through this routine, the initial estimate of adg() was 191
used; if a valid solution was found, that was the new adg() estimate (e.g., Fig. 2c). At this step, 192
negative residual values were allowed, and accounted for in Step 5. This occurred in 91 of the 193
1,353 spectra evaluated (6.7% of the time). 194
Step 4 195
Using the new or initial adg() estimate, aph() was estimated (Fig. 2d) following 196
𝑎𝑝ℎ(𝜆) = 𝑎𝑡−𝑤(𝜆) − 𝑎𝑑𝑔(𝜆) (9) Step 5
197
To determine if adg() was estimated reasonably well, we considered the ratio of 198
aph(350):aph(440), where a value greater than 1.5 was used to indicate whether a significant portion 199
of the adg() signal was still present in the residuals. While some waters with a significant pigment 200
contribution below 400 nm (e.g., mycosporine-like amino acids) may have violated this rule, it 201
was generally applicable following discussion in Section 4.1.2. 202
If aph(350):aph(440) was greater than 1.5, a blended estimate of adg() was produced by 203
fitting residuals from 350-400 nm with an exponential model (Fig. 2e) following: 204
𝑎𝑑𝑔_𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙(𝜆) = 𝑎𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙(𝜆0)𝑒−𝑆𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙(𝜆−𝜆0) (10) A new estimate of adg(), denoted as adg2(), was created from:
205
𝑎𝑑𝑔2(𝜆) = 𝑎𝑑𝑔(𝜆) + 𝑎𝑑𝑔_𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙(𝜆) (11) A new Sdg was re-calculated for adg2() and the next iteration of adg() was estimated from: 206
𝑎𝑑𝑔(𝜆) = (𝑎𝑡−𝑤(440) ∙ %𝑎𝑑𝑔(440)) 𝑒−𝑆𝑑𝑔_𝑛𝑒𝑤(𝜆−440) (12) The adg() estimated from Eq. 12 was then iteratively evaluated by adjusting Sdg and assessing 207
whether adg() > at-w() at any wavelength, within each iteration. If adg() > at-w(), an offset was 208
calculated by finding the wavelength where adg() was most overestimated following 209
𝑎𝑑𝑔(𝜆𝑖𝑛𝑑) = 𝑎𝑡−𝑤(𝜆𝑖𝑛𝑑) − [𝑎𝑑𝑔(𝜆𝑖𝑛𝑑) − 𝑎𝑡−𝑤(𝜆𝑖𝑛𝑑)] (13) where ind corresponds to the wavelength where adg() was most overestimated (maximum 210
positive value from adg()-at-w()). Eq. 8 was then used to re-calculate adg(), with 0=ind and 211
adg(0) equivalent to adg(ind) from Eq. 13. The offset corrects for overestimations, but the 212
application of a new 0 with the current Sdg can allow for overestimation at a different . If this 213
step was performed, adg(0) was no longer set to the empirically-derived estimate of adg(440), 214
rather, and adg(0) and Sdg were altered simultaneously to find a solution (i.e., adg(0) set to at-w() 215
minus an offset, and Sdg to the next iterative slope value). These steps were performed in a step-216
wise manner until aph(350):aph(440) was less than 1.5 or until the maximum number of allowable 217
iterations, currently set to 20, was reached (Fig. 2f,g). If 20 iterations were reached without a 218
solution, the final calculated model (from the 20th iteration) was used. This allowed for negative 219
residuals to be included in the subsequent estimate of aph(), following Eq. 9. However, negative 220
values did not impact the spectral analysis used to identify pigments for fitting in Step 7 and 221
negative values were removed through simultaneous fitting of adg() and aph() in Step 8, described 222
below, where a constraint of non-negative values on solutions was imposed. 223
Step 6 224
In Step 6, we identified locations and widths of Gaussian curves in aph() derived from Eq. 225
9 or its equivalent derived from Steps 10-13. For this, we utilized a generic version of Eq. 1 to 226
calculate the second derivative of estimated aph() as spectral features were accentuated in the 227
second derivative relative to aph() (Fig. 2h). The second derivative was smoothed with a linear 228
Savitzky-Golay filter using a 10 nm smoothing window. This smoothing reduced the number of 229
features identified that correspond to signal noise. The smoothed second derivative was inverted 230
to allow for identification of local maxima (equivalent to where the first derivative equals 0) and 231
direct estimation of Gaussian curves on these spectral features – this is a key distinction between 232
our methodology and published bottom-up approaches where Gaussian width and height are 233
constrained as initial conditions or within a fitting window. Identified peaks were then used as an 234
initial estimate of the number of peaks and each peak’s location and width (Fig. 2h,i) with each 235
Gaussian curve modeled following 236
𝑓(𝑥, 𝜑, 𝜇, 𝜎) = 𝜑𝑒−(𝑥−𝜇)
2
2𝜎 (14)
where σ (nm) is the width of the curve, φ (m-1) is the height of the Gaussian curve defined as 𝜑 = 237
1
𝜎√2𝜋, consistent with Gaussian curve height defined as full width at the half maximum, and μ (nm) 238
is the peak center position. Any Gaussian curves with a σ less than 5 nm were removed at this 239
stage, as these features were fit to noise and not pigments when using a 5 nm spectral resolution. 240
At this stage, φ was scaled to the second derivative requiring re-parameterization of Gaussian curve 241
heights relative to aph(). Additionally, noisy data where σ > 5 nm can result in more identified 242
peaks than was realistic. These issues were addressed in Step 7. 243
Step 7 244
For Step 7, Gaussian curves identified in Step 6 were scaled to aph(). This was done by 245
prioritizing peaks based on their relative prominence, identified as the φ determined for each 246
identified peak in Step 6 (scaled to the second derivative). When identified in this manner, 247
pigments that did not overlap, or overlap little, were fitted to aph() first, following the assumption 248
that the majority of the absorption signal in that spectral region belongs to that Gaussian 249
component (e.g., chlorophyll-a peak at 676 nm was typically prioritized for fitting first due to little 250
overlap with other pigments). From this, aph() was iteratively fit with each Gaussian curve, the 251
signal from that curve was removed, and the next Gaussian curve was fit to the remaining aph() 252
signal to get a best approximation of φ for each Gaussian curve following 253 𝑎𝑝ℎ𝑖(𝜆) = 𝑎𝑝ℎ(𝜆) − ∑ 𝜑𝑖𝑒 −(𝑥−𝜇𝑖) 2 2𝜎𝑖2 𝑛 𝑖=1 (15)
where n indicates the number of peaks identified for fitting from Steps 6d and 6e (Fig. 2) and μ 254
and σ identified in Step 6 were used for each peak. Due to the additive nature of fitting Gaussian 255
curves, there was potential for some peaks to have a negative height. After estimating an 256
appropriate φ for each curve, we filtered out peaks with negative heights and we limited the total 257
possible number of peaks to 16, although fewer peaks were typically identified (mean=7.7 peaks, 258
s.d.=2.2 peaks). Most Gaussian decomposition schemes assume the presence of ~12 peaks (e.g., 259
Hoepffner and Sathyendranath 1993; Wang et al. 2016; Chase et al. 2017). These studies have 260
considered similar peak locations with minor differences accounting for a total of 16 unique peak 261
locations in the literature. From this, we assumed if more than 16 peaks were present and all had 262
a positive peak height, some identified peaks were noise or signals not affiliated with 263
phytoplankton pigments that had not been removed in earlier steps. We sorted for likely pigment 264
signals by prominence, using the same method described for peak height previously, and selected 265
the 16 most prominent identified peaks if more than 16 peaks were identified. Next, we used the 266
σ, φ and μ values identified for each Gaussian curve as input into a least squares Gaussian 267
decomposition model that best fit our initial aph() estimate (Eq. 10) with the initial Gaussian curve 268
estimates and fitting constraints described in Step 8 to define an updated set of Gaussian curves 269
𝑎𝑝ℎ(𝜆) = ∑ 𝜑𝑖𝑒− (𝑥−𝜇𝑖)2 2𝜎𝑖2 𝑛 𝑖=1 (16) Step 8 271
Results from Steps 1-7 provided the start point for a combined retrieval of adg() and aph() 272
from at-w(). Using the estimate of adg() from Steps 1-7 and an estimate for each identified 273
Gaussian curve fitted to aph(), a least squares fitting approach was performed using the following 274 expression: 275 𝑎𝑡−𝑤(𝜆) = 𝑎𝑑𝑔(𝜆0)𝑒−𝑆𝑑𝑔(𝜆−𝜆0)+ ∑ 𝜑 𝑖𝑒 −(𝑥−𝜇𝑖) 2 2𝜎𝑖2 𝑛 𝑖=1 (17)
Analogous to methods used for identifying poorly constrained features that deviate from an 276
underlying exponential signal presented elsewhere (e.g., Massicotte and Markager 2016), the 277
model decomposed at-w() by utilizing a baseline exponential (Eq. 8) accompanied by a pre-278
defined number of Gaussian components based on previous steps (Eq. 16). This method differs 279
from other Gaussian decomposition methods applied to particulate absorption (ap), in that those 280
methods typically have a pre-defined number of Gaussian components based on analysis of 281
separate aph() for the respective system (e.g., Chase et al. 2013; Wang et al. 2016). This 282
methodology fits primary pigments with width estimated from spectral features identified in the 283
second derivative of estimated aph(), allowing for a constrained solution to decomposing at-w() 284
while not assuming the presence of any specific types of phytoplankton. Parameters in Eq. 17 were 285
constrained utilizing results from Steps 1-7: adg(0) can vary from 0 m-1 to at-w(0), Sdg can vary by 286
-0.002 nm-1 to +0.003 nm-1 from the input estimate, Gaussian peak width can vary from input 287
width to 3 times the input width, Gaussian peak height can vary by 0.25 times input height to 3 288
times input height and μ is fixed at the identified location due to high confidence in the second 289
derivative analysis. DAISEA output was as follows: adg() was that estimated in Eq. 17, while 290
aph() was the difference between observed at-w() and adg() from Eq. 17 (Fig. 3). Step 8 ensures 291
coherence between the exponential signal and overlying deviations due to aph() as constrained 292
through Steps 1-7 in a flexible manner, while not assuming that aph() can be best parameterized 293
by 6-8 Gaussian curves. Fitting of secondary features was possible but also increases the 294
probability of over-constraining a solution (i.e. less flexibility in adjustments to adg()). 295
2.2.1 Low aph() waters
296
We found that waters dominated by adg() were best decomposed by fitting an initial 297
exponential function and adjusting to a realistic solution following Eq. 8, 9 and 13. These cases 298
were identified after Eq. 3 and 4; waters were considered dominated by adg() where the ratio of 299
at-w(555):at-w(680) > 2.528 (the empirical value indicating aph(440) < 10% of at-w(440)). For these 300
situations, the algorithm opted out of the Gaussian decomposition routine and followed a 301
simplified routine analogous to Steps 2-4, where Sdg was considered equivalent to S calculated for 302
at-w() (Eq. 2), and magnitude was adjusted so that adg() ≤ at-w(). We chose this threshold as 303
forcing Eq. 17 to fit all cases resulted in significantly more error in Sdg estimates when aph(440) 304
contributed < 10% of at-w(440). Above this threshold, using Eq. 17 to fit for aph() improved 305
estimates of adg() and Sdg while also providing an estimate of aph(). The exact value of 10% may 306
not be an ideal threshold for all datasets but worked well as a threshold here and fit within our 307
presentation scheme. Eq. 3 and 4 are empirical and follow band-ratio techniques used for fitting 308
Sdg in current semi-analytical schemes (Lee et al. 2009; Matsuoka et al. 2013). Noise in this 309
relationship was explained by variability in the exact shape of aph() due to varying phytoplankton 310
composition, physiology and pigment packaging effects (Bricaud and Morel 1986; Bricaud et al. 311
1983; Ciotti et al. 2002; Johnsen et al. 1994) as well as variability in the spectral shape and features 312
of ag() and ad() (Grunert et al. 2018). As the algorithm is currently optimized for a global 313
approach, users may find that adjusting the empirical values used to initially estimate adg(440) and 314
adjusting the value of 1.5 for the ratio of aph(350):aph(440) (Step 5) for a value more representative 315
of their study region results in better algorithm performance. 316
2.2.2 Functions 317
To develop DAISEA, we focused on creating a primary, custom Matlab function – daisea, 318
an approach that utilizes derivative analysis and iterative fitting to optimize input spectra used in 319
a least squares Gaussian decomposition scheme fitting an exponential signal and a pre-defined 320
number of constrained Gaussian peaks. DAISEA uses a package of custom sub-functions. This 321
package is freely available via GitHub (https://github.com/bricegrunert/daisea/tree/v1.0.0; DOI: 322
10.5281/zenodo.1306817). Version updates will follow Github conventions. Users are encouraged 323
to use the most recent version for application. 324
2.3 Data Analysis 325
To assess the performance of DAISEA across a variety of water conditions, we present 326
results as eight different categories based on the percent contribution of aph(440) relative to at-327
w(440), with the distribution of spectra within these classes shown in Fig. 1. Classes were defined 328
as the percent contribution aph has to the overall absorption budget at 440 nm (%aph(440)) of 0-10, 329
10-20, 20-30, 30-40, 40-50, 50-60, 60-70, and >70, with n=286, 257, 303, 210, 146, 89, 34 and 28 330
spectra, respectively. This classification scheme emphasizes the relative, not the absolute, 331
contribution of phytoplankton to the overall absorption signal. Thus, waters where aph(440) is the 332
dominant contributor to total absorption are not limited to highly productive waters. In this sense, 333
algorithm performance was not assessed across classic definitions of Case 1 or Case 2 waters 334
(Morel and Prieur 1977). Rather, the only group dominated by coastal and inland waters was 0-10 335
%aph(440). It should be emphasized that these categories are only used to present the data within 336
the context of relative contribution of aph() and adg(). Beyond separating 0-10 %aph(440) spectra 337
for fitting using Eq. 2, the algorithm does not analyze spectra differently based on these categories. 338
To determine whether aph() or adg() was retrievable, we calculated the absolute difference 339
in the opposing metric and compared it to the observed value. For example, if aph_obs() > 340
|adg_obs()-adg_est()|, we consider it retrievable at that wavelength. Within each %aph(440) group, 341
we summed the total number of instances at each wavelength where aph() or adg() was greater 342
than the absolute difference in the opposing metric and divided by the total number of spectra to 343
get a percent retrievable metric for that %aph(440) group. In addition to percent retrievable metrics, 344
we calculated Bayes factors (BF10, unitless) to assess fit significance (Wetzels and Wagenmakers 345
2012). Bayes factors represent the likelihood that the fitted model adequately represents the data 346
relative to an alternative model. Bayes factors can be interpreted literally, so that BF10=2 means 347
the data are twice as likely to be explained by the fitted model than an alternative model. Here, we 348
used a BF10 3 as the threshold for significance (Wetzels and Wagenmakers 2012). We also 349
calculated root mean square difference (RMSD), normalized RMSD (NRMSD), bias, mean 350
absolute difference (MAD) and unbiased absolute percent difference (UAPD) using the following 351 expressions: 352 𝑅𝑀𝑆𝐷 = √∑ [(𝑥𝑖 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑑) − (𝑥 𝑖𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑)] 2 𝑛 𝑖=1 𝑛 (18) 353 𝑁𝑅𝑀𝑆𝐷 (%) = 𝑅𝑀𝑆𝐷 𝑥𝑚𝑎𝑥𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑− 𝑥 𝑚𝑖𝑛𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑 × 100 (19) 354
𝐵𝑖𝑎𝑠 =1 𝑛∑(𝑥𝑖 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑑− 𝑥 𝑖𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑) 𝑛 𝑖=1 (20) 355 𝑀𝐴𝐷 =∑ (|𝑥𝑖 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑑− 𝑥 𝑖𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑|) 𝑛 𝑖=1 𝑛 (21) 356 𝑈𝐴𝑃𝐷 (%) = |𝑥𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑑− 𝑥𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑| 0.5(𝑥𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑑+ 𝑥𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑)× 100 (22) 357 3. Results 358 3.1 DAISEA Performance 359
Here, we present the results of DAISEA performance on the test dataset. Across all groups, 360
adg() was retrievable >80% of the time for wavelengths < 450 nm (Fig. 4a). For waters where 361
adg() contributed greater than 60%, it was retrievable at a rate of >80% for all wavelengths up to 362
650 nm. For aph(), local maxima in retrieval corresponded to chlorophyll-a absorption peaks 363
(~440 and 680 nm within DAISEA), with these wavelengths displaying >80% retrievability for 364
waters with %aph(440) > 10 (Fig. 4b). Relative difference for aph() and adg() was parameterized 365
as NRMSD and displayed excellent performance for both parameters across most wavelengths and 366
environments. For all conditions except %aph(440) > 70, adg() had a mean difference less than 367
20% for wavelengths from 350-650 nm (Fig. 4c). Mean aph() difference was generally less than 368
20% from 350-650 nm when %aph(440) was > 10 (Fig. 4d). As seen in Fig. 4 and 5, aph() was 369
biased to greater than observed values when it was a non-dominant contributor at 440 nm and was 370
biased towards values less than observed when it was a dominant contributor at 440 nm, and vice 371
versa for adg() (Fig. 4e). Mean absolute difference generally decreased as the contribution of 372
aph(440) increased (Fig. 4f). 373
The threshold for estimating adg() with DAISEA appears to be %aph(440) < 70; for these 374
conditions, adg() is estimated with NRMSD < 20% from 350-650 nm. NRMSD for aph() was < 375
20% for the majority of wavelengths between 400-650 nm when %aph(440) was > 10. This was 376
also consistent when considering the retrievability of aph(440) under different conditions and can 377
be considered as the threshold for estimating aph(). Sdg uncertainty increased with increasing 378
contribution of aph(440); however, performance was reasonable across all water conditions and 379
estimates (Table 1). This was also confirmed when considering Bayes factors for fitted models. 380
Overall, BF10 were quite high with 94.0% of collective model retrievals showing a BF10 > 3, our 381
cutoff for significance, and 92.9% with a BF10 > 10, demonstrating strong confidence in our 382
approach. The model retrieved adg() with slightly better success than aph(), with BF10 > 3 for 383
99.5% and 88.5% of the dataset, respectively. Of models that did not fit observed aph() adequately 384
(n=156), the majority of poor model fits occurred in waters where %aph(440) < 20 (n=129). Only 385
6 model fits were not adequate for adg() and distribution across %aph(440) groups was random. 386
Using the 2.528 threshold applied to at-w(440) ratios to separate low %aph(440) did present 387
issues, particularly in the %aph(440) of 10-20 category. This threshold miscategorized spectra from 388
this category as having %aph(440) < 10 in 175 of 257 spectra, resulting in poorly resolved aph() 389
for these. This highlights the primary drawback of utilizing empirical relationships and a weakness 390
in our approach. Due to using the 2.528 threshold to separate %aph(440) contribution, more than 391
half of the aph() estimates in the 10-20 %aph(440) group had negative values at wavelengths 392
greater than 650 nm, outside of the chlorophyll-a (Chl) absorption peak at 676 nm (typically 393
assigned to 680 nm within the algorithm framework; Fig. 5). While we attempted to account for 394
this by using an offset from Eq. 13, moving the location of 0 and maintaining Sdg can result in 395
negative values at a different spectral location. However, the benefit of using this scheme was 396
evident in improved estimates of Sdg for correctly classified spectra (i.e., spectra where %aph(440) 397
was <10). Without a threshold to separate these spectra, attempting to fit aph() when it contributed 398
very little resulted in poor algorithm performance, with more spectra poorly fit than with the 399
current approach including a threshold. 400
One of the primary motivators for developing DAISEA was to accurately retrieve Sdg 401
without any assumptions regarding spectral shape while also independently estimating aph(). Our 402
results suggest that this is possible across a variety of optical conditions with a reasonable to 403
excellent degree of accuracy, depending on the relative contribution of adg(). Across the different 404
groups of varying aph(440) contribution, median difference in Sdg varied from 0.9-17.7%, with 405
third quartile differences ranging from 2.4-39.2% (Fig. 6a; Table 1). Mean Sdg observed across all 406
spectra in the test dataset was 0.0147 nm-1 compared to a mean estimated value of 0.0150 nm-1, 407
while median observed and estimated Sdg was 0.0152 and 0.0153 nm-1, respectively. Across 408
individual groups, we evaluated the differences and present anticipated accuracy for Sdg (Table 1). 409
For most groups, median difference was << 0.001 nm-1 and absolute differences affiliated with the 410
1st and 3rd quantiles ranged up to -0.0037 and 0.0021 nm-1, respectively,but were typically much 411
smaller. We also considered distribution of differences in Sdg across all groups and it followed a 412
predominantly normal distribution (data not shown), without an obvious bias between observed 413
and estimated Sdg regardless of %aph(440) contribution (Fig. 6b). 414
3.2 Consistency in Gaussian features 415
We considered the accuracy of our Gaussian component locations within DAISEA by 416
comparing to Gaussian component locations identified on observed aph() using the same Gaussian 417
decomposition approach (Fig. 7). Overall, peak locations were quite similar, although DAISEA 418
fitted more peaks (total peaks=10,394; 7.7 peaks/spectra) than for observed aph() (total 419
peaks=8,794; 6.5 peaks/spectra). Since aph() estimated from the algorithm was derived from the 420
smoothed residuals of at-w() – adg_est(),the additional noise in the spectra was derived from 421
deviations in ad() and ag() not accounted for by a strictly exponential fit. We discuss potential 422
reasons for an increase in fitted peaks in DAISEA output over observed aph() in Section 4.2, as 423
well as fitting significantly fewer peaks under our approach than other Gaussian decomposition 424
approaches (e.g., Chase et al. 2013). 425 426 4. Discussion 427 4.1 DAISEA 428 4.1.1 Application 429
As evidenced here and elsewhere, hyperspectral ocean color data provides a means for 430
estimating more variables in a less constrained manner (Bracher et al. 2009; Dierssen et al. 2015; 431
Uitz et al. 2015; Vandermeulen et al. 2017; Wang et al. 2017). Global variability in water optical 432
properties is significant yet the non-uniqueness of Rrs() hampers consistent interpretation across 433
both empirical and semi-analytical methods (Werdell et al. 2018 and references therein). Previous 434
concepts for working around this issue, particularly in light of multispectral limitations, have 435
included screening Rrs() to most likely cases based on optical water types, non-linear spectral 436
optimization, linear matrix inversion, bulk inversion, ensemble inversion and spectral 437
deconvolution (Brando et al. 2012; Hieronymi et al. 2017; Mélin and Vantrepotte 2015; Trochta 438
et al. 2015; Werdell et al. 2018). These approaches are broadly defined as bottom-up and top-down 439
strategies (Mouw et al. 2015), where bottom-up strategies simultaneously solve for all parameters 440
while top-down strategies allow for independent retrieval of absorbing and scattering constituents 441
(a and bb). 442
Current hyperspectral approaches capable of estimating aph(), ad() and ag() operate with 443
bounded ranges and relatively defined pigment locations within a bottom-up framework using 444
Rrs(). Notably, Chase et al. (2017) provide a global approach that performs quite well on in situ 445
reflectance data through the use of assumed starting points for IOP’s based on global means, 446
Gaussian decomposition of aph() using constrained Gaussian curves and lower and upper bounds 447
imposed on all IOP’s. Here, we developed a hyperspectral decomposition approach, DAISEA, 448
suitable for a top-down inversion strategy analogous to spectral deconvolution approaches 449
developed for multispectral data (e.g., QAA; Lee et al. 2002; Werdell et al. 2018). Within 450
DAISEA, spectral shapes and features of adg() and aph() are not assumed but identified through 451
derivative analysis and comparing retrieved spectra to anticipated thresholds. This provides a 452
means for estimating the spectral shape of adg() and phytoplankton pigment identification free of 453
explicit assumptions while also limiting retrievals based on a suitable signal-to-noise ratio (i.e., 454
the algorithm only fits primary spectral features after spectral smoothing with a Savitzky-Golay 455
filter). DAISEA is anticipated to pair well with future inversion schemes designed to work with 456
hyperspectral Rrs() and flow-through at-w() datasets (e.g., Twardowski and Tonizzo 2018). 457
Top-down approaches have been used to retrieve IOP’s in an independent manner in a 458
variety of aquatic environments (Mouw et al. 2015). We demonstrated that DAISEA works quite 459
well for in situ absorption datasets. The algorithm does not currently perform well with top-down 460
inversion strategies designed for multispectral data due to relatively high error in estimating at-461
w(350). This is a short-coming of our approach, but future top-down hyperspectral inversion 462
approaches are expected to have minimal error and bias across the full spectral range of PACE 463
(350-800 nm) (Twardowski and Tonizzo 2018). While it is not possible at this time to fully assess 464
the compatibility of DAISEA with these developing approaches, early indications suggest that 465
spectral accuracy of aph() and adg() could be quite reasonable and in-line with error attached to 466
current approaches (Chase et al. 2017; Wang et al. 2018). Additionally, we did not pursue 467
separation of ad() and ag() as current methods for separating within a top-down scheme rely on 468
empirical approaches. Independent approaches for separating these features are currently being 469
considered for future work. Considering the accuracy of estimating Sdg with DAISEA (Table 1) 470
and minimal additional uncertainty in separating adg() into ad() and ag() in future work (5-10%), 471
Sg could feasibly be retrieved with a median difference of 0.001-0.002 nm-1 across most optical 472
conditions. This would provide an adequate resolution for estimating CDOM source, production 473
and degradation processes as characterized in a variety of in situ studies. 474
4.1.2 General framework and empirical relationships 475
The general premise of DAISEA is that adg() can be accurately modeled using an 476
exponential model and that deviations from this exponential model are solely due to aph(). There 477
are alternate explanations for both of these assumptions (e.g., Cael and Boss 2017; Catalá et al. 478
2016); however, there is biogeochemical significance in Sdg, while phytoplankton would 479
presumably produce the largest deviation from an exponential signal as observable from satellite 480
ocean color data. Beyond these basic assumptions, we also considered the relationship between 481
adg() and aph() within a theoretical framework (Fig. 8). Based on this framework, it is important 482
to recognize how varying contributions of each component will inherently lead to specific biases. 483
For example, S350:400 fitted to at-w() where aph() contributes will result in lower slope values due 484
to the higher contribution of aph() at 400 nm relative to 350 nm. This is why we increased 485
estimates of Sdg first, then alternated to decreasing Sdg, as an exponential fit of at-w() will produce 486
lower S values when aph() contributes to the signal. Finding where the second derivative of at-487
w() equals 0 and fitting an exponential at these points minimizes this impact (essentially “cutting 488
through” primary pigment features for a least squares fit); however, there was still a consistent 489
bias towards lower Sdg values as aph(440) contribution increased, as expected. The general 490
framework illustrated in Fig. 8 is also the justification for setting a ratio of 1.5 to aph(350):aph(440); 491
when the residual used to estimate aph() had a ratio higher than this, it was almost always 492
indicative of a significant portion of the adg() signal remaining in the residual used to calculate 493
aph(). 494
In short of independent variables to validate each component of interest, some explicit 495
assumptions are required within any algorithm framework. Here, we chose to limit our solutions 496
by constraining initial adg(440) estimates by the empirical relationship between at-w(555)/at-w(680) 497
and %aph(440) from the training dataset (Fig. 9a) and a theoretical ratio of 1.5 for 498
aresidual(350)/aresidual(440) (Eq. 7) to determine whether the contribution of adg() to at-w() from 499
350-400 nm had been reasonably estimated and removed. These relationships do not explicitly 500
dictate the final product, but guide the algorithm to reasonable estimates, at which point fitting is 501
not constrained by these specific values. They do, however, leave an impact on how results are 502
constrained. As we discussed previously, empirical relationships can often fall short of their 503
intended accuracy. Despite a similar optical and geographical distribution between the training and 504
test datasets (Fig. 1), the piece-wise exponential relationship derived from the training dataset to 505
predict %aph(440) (r2=0.91, RMSD=0.068 for fitted points) did not predict the same relationship 506
nearly as well for the test dataset (r2=0.58, RMSD=0.110; Fig. 9b). We considered sensitivity to 507
this empirical relationship on the test dataset. By using a single exponential expression fitted to 508
the test dataset (data points in Fig. 9b), with a value of 0.779 instead of 1.038 and -0.5834 instead 509
of -0.9257 in Eq. 3, NRMSD fell below 6% for all wavelengths, with higher values in the UV and 510
lower at longer wavelengths (data not shown). However, the number of Gaussian curves fitted 511
within the algorithm were different for 17% of spectra, with nearly all instances fit with one fewer 512
Gaussian curves. This suggests that the algorithm is relatively robust across datasets but does 513
exhibit significant sensitivity to empirical values. 514
We adjusted the theoretical value of 1.5 to lower values as a stricter threshold for removing 515
a residual adg() signal from the estimate of aph() derived from Eq. 9. Algorithm results did not 516
significantly change with values less than 1.5; however, spectra that contain pigments below 400 517
nm (e.g., mycosporine-like amino acids) required a value of 1.5 to adequately identify and fit these 518
pigments. For spectra that did not contain a significant absorption signal below 400 nm, the shape 519
of the spectra here is predominantly exponential. If a Gaussian component is erroneously assigned 520
in this spectral region, as in the example spectra in Figs. 2 and 3, the curve can be minimized in 521
Step 8 by fitting an adjusted exponential to at-w(). This adjustment is allowable within the 522
constraints provided and provides for consistent and stable solutions, since no Gaussian 523
components are dropped. This approach and the empirical values used best fit our global dataset, 524
but adjusting empirical values to a regional value is quite easy within the code available online 525
(Section 2.2.2). 526
4.2 Gaussian Decomposition Approaches 527
Gaussian component location was consistent between observed aph() and DAISEA output 528
(Fig. 7). We did not utilize Gaussian components to estimate the final aph() output, as we found 529
that a smoothed residual of at-w()-adg_est() more accurately represented observed aph(). This is 530
likely due to fitting fewer Gaussian components than needed to accurately model aph(), as 531
DAISEA fits fewer peaks than alternate Gaussian decomposition schemes due to a difference in 532
methodologies (Hoepffner and Sathyendranath 1993; Chase et al. 2013). These algorithms 533
typically identify pigments from first and second derivative analysis of an existing database of 534
phytoplankton spectra then assign windows around these points (typically 12 peaks). Our approach 535
focuses on identifying primary pigment features to best fit observed at-w() without assuming the 536
locations of pigments, resulting in fewer identified peaks (~7 peaks). There is potential to increase 537
the sensitivity of the peak finding step. Our focus was to retrieve adg() and aph() accurately, 538
including spectral shape, rather than individually parameterizing phytoplankton pigments. It is 539
possible to utilize the aph() output in a separate Gaussian decomposition scheme, or other 540
approach that identifies phytoplankton pigments. However, it should be noted that derivative 541
analysis of the final aph() output, even after smoothing, resulted in more identified peaks than the 542
observed aph() using our scheme. This is very likely due to the inclusion of chromophores in ag() 543
and ad() that result in deviations from the typical exponential expression used to model these 544
parameters, features visibly apparent in many of the adg() spectra. While often overlooked, these 545
features have been recognized for some time (Babin et al. 2003; Schwarz et al. 2002) and a recent 546
methodology for fitting these peaks provides a means of both quantifying them and more 547
accurately modeling the underlying exponential signal (Catalá et al. 2016; Massicotte and 548
Markager 2016; Grunert et al. 2018). This approach is useful for in situ data, but not practical for 549
our proposed methodology and likely a non-factor when considering at-w() derived from satellite 550 Rrs(). 551 552 5. Conclusions 553
We show that across most optical conditions considered, DAISEA can accurately estimate 554
adg(), Sdg and aph() magnitude and spectral features for all water types where %aph(440) > 10. 555
We parameterized the percent of adg() and aph() estimates that were retrievable by comparing 556
the signal observed for one IOP to the difference between estimated and observed values obtained 557
for the other IOP. Consistent with the general accuracy of DAISEA, primary features (i.e., 558
chlorophyll-a absorption peaks) of aph() were retrievable for greater than 80% of spectra across 559
environments where %aph(440) > 10; adg() was retrievable for at least 80% of spectra from 350-560
650 nm when %aph(440) < 70. NRMSD metrics suggest strong algorithm performance across most 561
optical variability from 350-650 nm. Algorithm bias shows a tendency to overestimate aph() when 562
%aph(440) < 40 and to underestimate aph() when %aph(440) > 60. 563
Currently, coincident hyperspectral measurements of Rrs(), bbp(), aph(), ad() and ag() 564
observed down to a minimum wavelength of 350 nm, the proposed lower wavelength limit of 565
PACE, are quite uncommon relative to coincident measurements at wavelengths ≥ 400 nm. This, 566
along with limited hyperspectral inversion approaches free of spectral bias, limited our ability to 567
fully assess how well DAISEA will perform in the context of a top-down spectral deconvolution 568
approach. Despite empirical schemes for separation of adg() and Sdg into the component parts 569
(NAP and CDOM; e.g., Dong et al. 2013), we did not pursue separation here. Considering current 570
algorithm performance, we anticipate that a well-performing scheme to separate adg() into its 571
component parts will allow for appropriate resolution in Sg to estimate source and degradation 572
state of CDOM in the surface ocean. 573
Acknowledgements 574
We gratefully acknowledge contributors to the SeaBASS data set (https://seabass.gsfc.nasa.gov/). 575
Thank you to Piotr Kowalczuk, Frédéric Mélin and two anonymous reviewers for comments that 576
greatly improved the final version of this manuscript. Algorithm development was funded by a 577
NASA Earth and Space Science Fellowship awarded to Grunert. 578
579 580
References 581
582
Andrew, A.A., Del Vecchio, R., Subramaniam, A., & Blough, N.V. (2013). Chromophoric 583
dissolved organic matter (CDOM) in the Equatorial Atlantic Ocean: Optical properties 584
and their relation to CDOM structure and source. Marine Chemistry, 148, 33-43 585
Babin, M., Stramski, D., Ferrari, G.M., Claustre, H., Bricaud, A., Obolensky, G., & Hoepffner, 586
N. (2003). Variations in the light absorption coefficients of phytoplankton, nonalgal 587
particles, and dissolved organic matter in coastal waters around Europe. Journal of 588
Geophysical Research, 108 589
Behrenfeld, M.J., Hu, Y., Hostetler, C.A., Dall'Olmo, G., Rodier, S.D., Hair, J.W., & Trepte, 590
C.R. (2013). Space-based lidar measurements of global ocean carbon stocks. Geophysical 591
Research Letters, 40, 4355-4360 592
Behrenfeld, M.J., Hu, Y., O’Malley, R.T., Boss, E.S., Hostetler, C.A., Siegel, D.A., Sarmiento, 593
J.L., Schulien, J., Hair, J.W., Lu, X., Rodier, S., & Scarino, A.J. (2016). Annual boom– 594
bust cycles of polar phytoplankton biomass revealed by space-based lidar. Nature 595
Geoscience, 10, 118-122 596
Bracher, A., Vountas, M., Dinter, T., Burrows, J.P., Rüttgers, R., & Peeken, I. (2009). 597
Quantitative observation of cyanobacteria and diatoms from space using PhytoDOAS on 598
SCIAMACHY data. Biogeosciences, 6, 751-764 599
Brando, V., Dekker, A., Park, Y.J., & Schroeder, T. (2012). Adaptive semianalytical inversion of 600
ocean color radiometry in optically complex waters. Appl Opt, 51 601
Bricaud, A., Babin, M., Morel, A., & Claustre, H. (1995). Variability in the chlorophyll-specific 602
absorption coefficients of natural phytoplankton: Analysis and parameterization. Journal 603
of Geophysical Research, 100, 13,321-313,332 604
Bricaud, A., & Morel, A. (1986). Light attenuation and scattering by phytoplanktonic cells: a 605
theoretical modeling. Appl Opt, 25, 571-580 606
Bricaud, A., Morel, A., & Prieur, L. (1983). Optical efficiency factors of some phytoplankters. 607
Limnology & Oceanography, 28, 816-832 608
Cael, B.B., & Boss, E. (2017). Simplified model of spectral absorption by non-algal particles and 609
dissolved organic materials in aquatic environments. Opt Express, 25, 25486-25491 610
Catalá, T.S., Reche, I., Ramón, C.L., López‐Sanz, À., Álvarez, M., Calvo, E., & Álvarez‐ 611
Salgado, X.A. (2016). Chromophoric signatures of microbial by‐products in the dark 612
ocean. Geophysical Research Letters, 43, 7639-7648 613
Chang, G.C., & Dickey, T.D. (2004). Coastal ocean optical influences on solar transmission and 614
radiant heating rate. Journal of Geophysical Research, 109 615
Chase, A., Boss, E., Zaneveld, R., Bricaud, A., Claustre, H., Ras, J., Dall’Olmo, G., & 616
Westberry, T.K. (2013). Decomposition of in situ particulate absorption spectra. Methods 617
in Oceanography, 7, 110-124 618
Chase, A.P., Boss, E., Cetinić, I., & Slade, W. (2017). Estimation of Phytoplankton Accessory 619
Pigments From Hyperspectral Reflectance Spectra: Toward a Global Algorithm. Journal 620
of Geophysical Research: Oceans, 122, 9725-9743 621
Ciotti, A.M., Lewis, M., & Cullen, J.J. (2002). Assessment of the relationship between dominant 622
cell size in natural phytoplankton communities and the spectral shape of the absorption 623
coefficient. Limnology & Oceanography, 47, 404-417 624
Cory, R.M., & Kling, G.W. (2018). Interactions between sunlight and microorganisms influence 625
dissolved organic matter degradation along the aquatic continuum. Limnology and 626
Oceanography Letters, 3, 102-116 627
Danhiez, F., Vantrepotte, V., Cauvin, A., Lebourg, E., & Loisel, H. (2017). Optical properties of 628
chromophoric dissolved organic matter during a phytoplankton bloom. Implication for 629
DOC estimates from CDOM absorption. Limnology and Oceanography, 62, 1409-1425 630
Dutkiewicz, S., Hickman, A.E., Jahn, O., Gregg, W.W., Mouw, C.B., & Follows, M.J. (2015). 631
Capturing optically important constituents and properties in a marine biogeochemical and 632
ecosystem model. Biogeosciences, 12, 4447-4481 633
D’Errico, J. (2005). Surface Fitting using gridfit. In. 634
(
https://www.mathworks.com/matlabcentral/fileexchange/8998-surface-fitting-using-635
gridfit): MATLAB Central File Exchange 636
Fichot, C.G., & Benner, R. (2011). A novel method to estimate DOC concentrations from 637
CDOM absorption coefficients in coastal waters. Geophysical Research Letters, 38 638
Fichot, C.G., Benner, R., Kaiser, K., Shen, Y., Amon, R.M.W., Ogawa, H., & Lu, C.-J. (2016). 639
Predicting Dissolved Lignin Phenol Concentrations in the Coastal Ocean from 640
Chromophoric Dissolved Organic Matter (CDOM) Absorption Coefficients. Frontiers in 641
Marine Science, 3 642
Fichot, C.G., Kaiser, K., Hooker, S.B., Amon, R.M., Babin, M., Belanger, S., Walker, S.A., & 643
Benner, R. (2013). Pan-Arctic distributions of continental runoff in the Arctic Ocean. Sci 644
Rep, 3, 1053 645
Grunert, B.K., Mouw, C.B., & Ciochetto, A.B. (2018). Characterizing CDOM Spectral 646
Variability Across Diverse Regions and Spectral Ranges. Global Biogeochemical Cycles, 647
32, 57-77 648
Hansell, D.A., Carlson, C.A., Repeta, D.J., & Schlitzer, R. (2009). Dissolved organic matter in 649
the ocean: A controversy stimulates new insights. Oceanography, 22, 202-211 650
Helms, J.R., Stubbins, A., Perdue, E.M., Green, N.W., Chen, H., & Mopper, K. (2013). 651
Photochemical bleaching of oceanic dissolved organic matter and its effect on absorption 652
spectral slope and fluorescence. Marine Chemistry, 155, 81-91 653
Helms, J.R., Stubbins, A., Ritchie, J.D., Minor, E.C., Kieber, D.J., & Mopper, K. (2008). 654
Absorption spectral slopes and slope ratios as indicators of molecular weight, source, and 655
photobleaching of chromophoric dissolved organic matter. Limnology and 656
Oceanography, 53, 955-969 657
Hieronymi, M., Müller, D., & Doerffer, R. (2017). The OLCI Neural Network Swarm (ONNS): 658
A Bio-Geo-Optical Algorithm for Open Ocean and Coastal Waters. Frontiers in Marine 659
Science, 4 660
Johnsen, G., Nelson, N., Jovine, R.V.M., & Prezelin, B. (1994). Chromoprotein- and pigment 661
dependent modeling of spectral light absorption in two dinoflagellates, Prorocentrum 662
minimum and Heterocapsa pygmaea. Marine Ecology Progress Series, 114, 245-258 663
Kim, G.E., Gnanadesikan, A., & Pradal, M.-A. (2016). Increased Surface Ocean Heating by 664
Colored Detrital Matter (CDM) Linked to Greater Northern Hemisphere Ice Formation in 665
the GFDL CM2Mc ESM. Journal of Climate, 29, 9063-9076 666
Lee, Z., Carder, K.L., Arnone, R., & He, M. (2007). Determination of primary spectral bands for 667
remote sensing of aquatic environments. Sensors, 7, 3428-3441 668
Lee, Z., Carder, K.L., & Arnone, R.A. (2002). Deriving inherent optical properties from water 669
color: a multiband quasi-analytical algorithm for optically deep waters. Appl Opt, 41, 670
5755-5772 671
Mannino, A., Novak, M.G., Hooker, S.B., Hyde, K., & Aurin, D. (2014). Algorithm 672
development and validation of CDOM properties for estuarine and continental shelf 673
waters along the northeastern U.S. coast. Remote Sensing of Environment, 152, 576-602 674
Mentges, A., Feenders, C., Seibt, M., Blasius, B., & Dittmar, T. (2017). Functional Molecular 675
Diversity of Marine Dissolved Organic Matter Is Reduced during Degradation. Frontiers 676
in Marine Science, 4, 194 677
Morel, A., & Prieur, L. (1977). Analysis of variations in ocean color. Limnology & 678
Oceanography, 22, 709-722 679
Mouw, C.B., Greb, S., Aurin, D., DiGiacomo, P.M., Lee, Z., Twardowski, M., Binding, C., Hu, 680
C., Ma, R., Moore, T., Moses, W., & Craig, S.E. (2015). Aquatic color radiometry remote 681
sensing of coastal and inland waters: Challenges and recommendations for future satellite 682
missions. Remote Sensing of Environment, 160, 15-30 683
Mélin, F., & Vantrepotte, V. (2015). How optically diverse is the coastal ocean? Remote Sensing 684
of Environment, 160, 235-251 685
Nelson, N.B., Siegel, D.A., Carlson, C.A., & Swan, C.M. (2010). Tracing global biogeochemical 686
cycles and meridional overturning circulation using chromophoric dissolved organic 687
matter. Geophysical Research Letters, 37 688
Ogawa, H., Amagai, Y., Koike, I., Kaiser, K., & Benner, R. (2001). Production of refractory 689
dissolved organic matter by bacteria. Science, 292, 917-920 690
Riedel, T., & Dittmar, T. (2014). A method detection limit for the analysis of natural organic 691
matter via Fourier transform ion cyclotron resonance mass spectrometry. Analytical 692
chemistry, 86, 8376-8382 693
Sadeghi, A., Dinter, T., Vountas, M., Taylor, B., Altenburg-Soppa, M., & Bracher, A. (2012). 694
Remote sensing of coccolithophore blooms in selected oceanic regions using the 695
PhytoDOAS method applied to hyper-spectral satellite data. Biogeosciences, 9, 2127-696
2143 697
Schwarz, J.N., Kowalczuk, P., Kaczmarek, S., Cota, G., Mitchell, B.G., Kahru, M., Chavez, F.P., 698
Cunningham, A., McKee, D., Gege, P., Kishino, M., Phinney, D.A., & Raine, R. (2002). 699
Two models for absorption by coloured dissolved organic matter (CDOM). Oceanologia, 700
44, 209-241 701
Stedmon, C.A., & Markager, S. (2003). Behaviour of the optical properties of coloured dissolved 702
organic matter under conservative mixing. Estuarine, Coastal and Shelf Science, 57, 973-703
979 704
Trochta, J.T., Mouw, C.B., & Moore, T.S. (2015). Remote sensing of physical cycles in Lake 705
Superior using a spatio-temporal analysis of optical water typologies. Remote Sensing of 706
Environment, 171, 149-161 707
Vandermeulen, R.A., Mannino, A., Neeley, A., Werdell, J., & Arnone, R. (2017). Determining 708
the optimal spectral sampling frequency and uncertainty thresholds for hyperspectral 709
remote sensing of ocean color. Opt Express, 25, A785-A797 710
Vantrepotte, V., Danhiez, F.P., Loisel, H., Ouillon, S., Meriaux, X., Cauvin, A., & Dessailly, D. 711
(2015). CDOM-DOC relationship in contrasted coastal waters: implication for DOC 712
retrieval from ocean color remote sensing observation. Opt Express, 23, 33-54 713