First, the HDF5 files are easily loaded into MATLAB with one command, turning the binary data into decimal format automatically.
The CRL data from the Licel counters is recorded as multi-bit words. The photon counting channels are 16-bit words, while the analogue channels are 25-bit words formed from a 16-bit least significant word and a 9-bit most significant word [94]. In each case, the most significant bit is a flag bit which indicates that the counter has either underflowed or overflowed on at least one laser scan: one or more of the 600 scans recorded that minute has either saturated the detector or recorded a value of exactly zero. In both cases, the transient recorder indicates that the value is not to be trusted. For an overflow, this is because only the maximum measurable value is reported; the true value could have been exactly that maximum, or it could have been far in excess of it.
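The analogue word layout described above can be illustrated with a short Python sketch (Python is used here for illustration only; it is not the MATLAB code used for CRL). It assumes that the 9-bit most significant word sits directly above the 16-bit least significant word, so that the flag bit is bit 24 of the combined value. Whether the HDF5 files store the two words separately or already combined is not stated here, and the function names are hypothetical.

```python
LSW_BITS = 16                        # width of the least significant word
MSW_BITS = 9                         # width of the most significant word
FLAG_BIT = LSW_BITS + MSW_BITS - 1   # bit 24: top bit of the 25-bit word


def combine_words(msw, lsw):
    """Join a 9-bit MSW and a 16-bit LSW into one 25-bit integer (assumed layout)."""
    if not 0 <= lsw < 2 ** LSW_BITS:
        raise ValueError("least significant word must fit in 16 bits")
    if not 0 <= msw < 2 ** MSW_BITS:
        raise ValueError("most significant word must fit in 9 bits")
    return (msw << LSW_BITS) | lsw


def flag_is_set(word):
    """True if the most significant (flag) bit of the 25-bit word is 1."""
    return bool((word >> FLAG_BIT) & 1)


# The top bit of the MSW (0x100) is the flag bit of the combined word.
print(combine_words(0x100, 0))                 # flag bit only
print(flag_is_set(combine_words(0x0FF, 5)))    # flag bit clear
```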
A complicating factor in the way MATLAB reads HDF5 files is that it makes no distinction between this flag bit and the bits recording the actual value of photon counts or analogue voltage in the channel. To be useable, the MATLAB values must each be turned back into binary, the most significant bit must be trimmed from the binary word, and the remaining bits must be turned back into a decimal value.
The procedure for dealing with the flagged bits was necessarily different for photon counting and analogue channel data because of the different ways in which MATLAB’s loading function interprets the flags. The effect of the flag bit being “flagged” in PC is that MATLAB interprets the entire value as a negative number. Conversely, the effect of the flag bit being “flagged” in ANA is that MATLAB interprets the entire value as being far larger than the actual stored value. This difference requires different code to trim the flag bit from each kind of data.
5.3.1 Handling Analogue channel overflow flags
For analogue channels, MATLAB interprets the flagged bit to be a regular bit in the number; that is to say, setting this bit to ‘1’ results in a value which is orders of magnitude larger than the same value would be with the ‘0’ of an unflagged bit.
There are two options for dealing with this. The first, and the option used for CRL, is to use the dec2bin command to convert the decimal number into binary. A new variable is then made which takes all binary digits except the most significant bit, and bin2dec turns this new binary value back into a decimal number, which is the measured value according to the transient recorder. To keep track of which data points were flagged in the first place, the difference between the original and new decimal numbers may be examined. If the difference is zero, then the most significant bit was also zero, as removing it has had no effect. If the difference is non-zero, then at least one of the scans going into that value must have overflowed.
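The steps of this first option can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the MATLAB code used for CRL: `format(..., "b")` and `int(..., 2)` stand in for dec2bin and bin2dec, and the function name is hypothetical. The sketch zero-pads the binary string to the full 25-bit width so that its first character is always the flag bit; without padding, the leading character of a short binary string would be an ordinary data bit.

```python
ANALOGUE_BITS = 25  # total word width including the flag bit (from the text)


def trim_flag_bit(raw_value, width=ANALOGUE_BITS):
    """Return (value with the flag bit removed, True if the flag bit was set)."""
    if not 0 <= raw_value < 2 ** width:
        raise ValueError("raw value does not fit in the expected word width")
    bits = format(raw_value, "0{}b".format(width))  # decimal -> padded binary string
    trimmed = int(bits[1:], 2)                      # drop the MSB, binary -> decimal
    flagged = (raw_value - trimmed) != 0            # non-zero difference => flag was set
    return trimmed, flagged


print(trim_flag_bit(12345))            # unflagged value is returned unchanged
print(trim_flag_bit(2 ** 24 + 12345))  # flagged value is reduced to its as-measured part
```

Keeping the boolean alongside the trimmed value preserves the record of which points were flagged, so that any rejection can be deferred to a later stage.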
An alternate method for identifying the flagged bit is simply to determine the maximum decimal value which the Licel is able to record. The analogue channel has a 25-bit value including the flag bit, and thus contains 24 bits of true information, so the maximum decimal value the counter can achieve is $2^{24} - 1 = 16777215$. Any value higher than that must have occurred by setting the most significant (flag) bit to ‘1’, up to a maximum decimal value of $2^{25} - 1 = 33554431$. If the flag bit is set, but otherwise the counts are zero, then the value in binary will be 1000000000000000000000000, that is, a ‘1’ followed by 24 zeros. The decimal representation of this number is $2^{24} = 16777216$. The flagged and unflagged ranges therefore do not overlap: every flagged value is at least 16777216 and every unflagged value is below it, so a single comparison against this threshold identifies the flagged points.
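The threshold approach can be sketched in a few lines of Python. This is an illustration only, with hypothetical function names, and it assumes the raw values are non-negative integers no wider than 25 bits. The threshold is the decimal value of the flag bit alone, 16777216.

```python
FLAG_VALUE = 2 ** 24  # decimal value of the flag bit alone: 16777216


def is_flagged(raw_value):
    """A 25-bit analogue value is flagged exactly when it reaches the flag-bit value."""
    return raw_value >= FLAG_VALUE


def as_measured(raw_value):
    """Remove the flag bit's contribution, leaving the 24-bit measured value."""
    return raw_value - FLAG_VALUE if is_flagged(raw_value) else raw_value


raw = [1200, FLAG_VALUE + 1350, 1275, FLAG_VALUE]
print([is_flagged(v) for v in raw])   # which points carried the flag
print([as_measured(v) for v in raw])  # values reported by the transient recorder
```

Subtracting the flag value gives the same result as trimming the leading binary digit, while avoiding the round trip through a binary string.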
It should be possible to simply remove any analogue data points which have been flagged, but in practice it is instructive to convert them to their “as measured” values, and note that they have in fact been flagged. Any rejection of data based on such flags may be done at a later stage if so required. This is discussed further in Section 5.3.3.
5.3.2 Handling Photon Counting channel overflow flags
For the photon counting channels, the method used for analogue channels does not work. This is because, in the case of photon counting channels, MATLAB interprets flag bits as indicative of a negative number rather than a very large one, and dec2bin only works on nonnegative integer inputs.
Instead, raw photon counting values lower than zero in decimal notation, which indicate overflowed counts, are identified on the basis of their sign and are excluded in that manner from further analysis.
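The sign-based identification can be illustrated with the Python sketch below. It rests on an assumption not stated in the text: that the negative values arise because the 16-bit word is read as a two's-complement signed integer, in which a set most significant bit makes the number negative. The function names are hypothetical, and `None` is used here simply as a placeholder for an excluded point.

```python
import struct


def as_signed_16(word):
    """Reinterpret an unsigned 16-bit word as a two's-complement signed integer."""
    return struct.unpack("<h", struct.pack("<H", word))[0]


def exclude_flagged(counts):
    """Replace negative (flagged) photon counts with None; keep the rest unchanged."""
    return [c if c >= 0 else None for c in counts]


# A word with its most significant bit set reads as a negative number.
print(as_signed_16(0x7FFF))  # largest value with the flag bit clear
print(as_signed_16(0x8000))  # flag bit set, all other bits zero

loaded = [5, as_signed_16(0x8000), 120, as_signed_16(0xFFFF)]
print(exclude_flagged(loaded))
```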
5.3.3 Removal (or not) of the flagged values
Recall that the PC channel is best for low count rates, and saturates easily. The ANA channel is better for high count rates. It does not saturate until much higher count rates than the PC channel, although its uncertainties exceed those for the PC channel in regions where both are valid.
The decision to remove flagged photon counting values is simple. If the PC channel is saturated to the point of flagging the most significant bit, then it has also begun to respond nonlinearly at count rates even below this point. For the depolarization glued channels, anywhere the photon counting has saturated may be filled in using information from the analogue channels during a gluing procedure. As will be shown in the procedures about merging/gluing, the best transition region in terms of count rate from PC to ANA data happens at a count rate much lower than the PC’s saturation threshold, in a region where the PC counts are still responding linearly to input signals.
Decisions for the analogue channels require more consideration. It is clear that one should not just blindly ignore the flagged bits. The consequence of doing this would be a profile with sudden jumps to orders-of-magnitude higher values than those actually measured. At the very least, the flagged bit itself must be removed from the measured value according to the procedure in Section 5.3.1. This leaves a profile with the values the data acquisition system reports as having measured.
It is then up to the scientist to determine whether the values measured by the transient recorder, and reported as flagged, are trustworthy or not. An examination of a general selection of lidar data will reveal whether the lidar’s analogue channel is consistently operating at the top of its dynamic range. If this is the case, and there is an overflow bin flagged, then chances are that it is truly an overflow, and that measurements from many of the 600 laser shots contributing to each minute of data have overflowed. One should not use the data from such a bin because it is, on the whole, underreporting the number of photons incident on the detector. Consistently high numbers of overflowed bins suggest that the analogue range settings should be reconsidered to allow for higher count rates, or perhaps that a neutral density filter should be placed in front of the detector to reduce the incoming signal.
A flagged bit is a different situation entirely if it is found practically in isolation from other flagged bits, which happens if the analogue channel operates comfortably within its dynamic range most of the time. This is the case for CRL; most often, the analogue channels are operated at reasonable signal levels. A flagged bit in these conditions is less likely to indicate an overflow in many of the 600 contributing laser shots. It is more likely to indicate a “fluke” spike from an oriented ice crystal or similar for only one of the 600 shots in a minute of data. Even a sizeable overflow in one of 600 shots is not going to make a large difference once all of the shots are summed and averaged. If the signal were to, in an extreme case, actually double for a single one of the laser shots, this would only be an increase in overall signal for that minute of (1/600) x 100% = 0.1667%.
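The size of this effect can be checked with a short calculation. The Python sketch below assumes, for illustration, that all 600 shots would otherwise carry the same signal, and that one of them is multiplied by a given factor; the function name is hypothetical.

```python
from fractions import Fraction

N_SHOTS = 600  # laser shots contributing to one minute of data (from the text)


def relative_increase(n_shots, spike_factor):
    """Fractional rise in the summed signal when one shot is scaled by spike_factor."""
    nominal = Fraction(n_shots)                                 # n equal shots of unit signal
    with_spike = Fraction(n_shots - 1) + Fraction(spike_factor)
    return (with_spike - nominal) / nominal


doubling = relative_increase(N_SHOTS, 2)
print(doubling)                        # exact fraction
print(round(float(doubling) * 100, 4)) # as a percentage
```

Under the same assumption, even a tenfold spike in a single shot raises the minute's total by only 9/600, or 1.5%.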
The case is further solidified if the values, once the flagged bit has been trimmed off, lie within the realm of neighbouring values. If the value still appears to be an outlier, then potentially it should be excluded. More likely, if the value fits in with its neighbours, the number remains meaningful.
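One possible way to make the comparison with neighbouring values concrete is sketched below in Python. The text does not specify a criterion, so everything here is a hypothetical illustration: the function name, the window of two bins on either side, and the 50% tolerance around the neighbours' median are all assumptions chosen for the example.

```python
from statistics import median


def fits_neighbours(profile, index, half_width=2, tolerance=0.5):
    """True if profile[index] lies within `tolerance` (fractional) of the
    median of up to `half_width` bins on either side of it."""
    lo = max(0, index - half_width)
    hi = min(len(profile), index + half_width + 1)
    neighbours = [profile[i] for i in range(lo, hi) if i != index]
    if not neighbours:
        raise ValueError("profile has no neighbouring bins to compare against")
    reference = median(neighbours)
    return abs(profile[index] - reference) <= tolerance * abs(reference)


smooth = [100, 102, 101, 99, 98]  # bit-trimmed value at index 2 agrees with neighbours
spiked = [100, 102, 900, 99, 98]  # value at index 2 still stands out after trimming
print(fits_neighbours(smooth, 2))
print(fits_neighbours(spiked, 2))
```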
For further encouragement of this practice (remove flagged PC values; keep flagged, bit-trimmed ANA values but note that they are flagged in case it matters later), an examination was done of derived data products for the CRL. If the flagged analogue values were truly meaningless, then odd effects would be expected. These have not been seen in several years’ worth of data, in any of the derived data products produced by the CRL.