This problem involves two steps. First, given an attitude and a set of orbit parameters, the algorithm needs to predict the CCD temperature. Next, it needs to find the attitude that produces the minimum temperature.
8.4.1 Away-from-Sun Attitude
For the worst cold case, BRITE-Toronto thermal modeling uses the orientation shown in Figure 8.2, in which only one face of the satellite faces the sun. This orientation minimizes the total area of the satellite that absorbs solar radiation, and thus reduces the amount of heat absorbed from the sun.
Chapter 8. BRITE-Toronto Temperature Prediction 53
Figure 8.2: BRITE-Toronto coldest orientation in thermal modeling[25]
This attitude is the opposite of the sun's position vector, which can be estimated to a precision of about 36 arcseconds using equations from the Astronomical Almanac [26].
Figure 8.3: Right Ascension and Declination in Equatorial Coordinates[27]
Given the RA of the sun $\alpha_{sun}$ and the declination $\delta_{sun}$ in equatorial coordinates, the satellite attitude that points the telescope in the opposite direction away from the sun can be found with the following equations.
$$\alpha = (\alpha_{sun} + 12^h) \bmod 24^h, \qquad \delta = -\delta_{sun} \tag{8.6}$$
As shown in Figure 8.3, RA is expressed in hours starting from the vernal equinox, with a full circle being $24^h$. Declination measures the angular distance in degrees from the equator, with positive values to the north and negative to the south. Thus, to get the opposite orientation, $12^h$, corresponding to a half circle, is added to the RA and the negative of the declination is taken.
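As a minimal sketch of equation 8.6, the function below converts a sun position into the away-from-sun attitude. The function name and argument names are illustrative and not taken from the BRITE-Toronto software; the sun's RA and declination are assumed to be already available, for example from the Astronomical Almanac equations.

```python
def anti_sun_attitude(ra_sun_hours, dec_sun_deg):
    """Return (RA in hours, declination in degrees) pointing away from the sun.

    Hypothetical helper illustrating equation 8.6.
    """
    ra = (ra_sun_hours + 12.0) % 24.0  # add half a circle, wrap into [0, 24)
    dec = -dec_sun_deg                 # mirror across the celestial equator
    return ra, dec


# Example: sun near RA 18.75 h, declination -23 deg
print(anti_sun_attitude(18.75, -23.0))
```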
Because the CCD is mounted at the back of the telescope and thermally insulated from the back panel closest to it, having the back panel facing the sun and the telescope facing empty space usually produces a reasonable temperature. This method has the advantage that it doesn't require any past data or data processing. The disadvantage is that it usually doesn't produce a low temperature either, with an average of around 22°C. There are four other panels of the satellite that can face the sun instead of the panel at the back of the telescope. Using these alternate attitudes might yield lower temperatures for the CCD and star tracker.
Given its simplicity and reliability, this method is used as a baseline to compare the performance of the other more complex methods.
8.4.2 Linear and Polynomial Regression
Predicting a scalar value given scalar inputs is a regression problem. Linear regression is a method that approximates the output as a linear function of the inputs.
The output is
$$y = \mathbf{w}^T\mathbf{x} + b, \tag{8.7}$$
where $\mathbf{w}$ is the weight vector, $\mathbf{x}$ is the input, and $b$ is the bias. The bias is set to zero.
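One can also combine all the training examples into a design matrix to use efficient linear algebra libraries such as NumPy in Python. The design matrix
$$\mathbf{X} = \begin{pmatrix} \mathbf{x}^{(1)T} \\ \vdots \\ \mathbf{x}^{(N)T} \end{pmatrix}$$
stacks the $N$ training inputs as rows. The weights that minimize the squared-error cost are then
$$\mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{t}. \tag{8.10}$$
Here $\mathbf{t}$ is the vector of temperatures of the training data, also known as the target vector.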
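The design-matrix formulation can be sketched in NumPy as follows. This is an illustration on synthetic data, not the thesis code: the function name `fit_linear` and the random inputs are assumptions, and the weights are obtained from the normal equations with the bias fixed at zero.

```python
import numpy as np


def fit_linear(X, t):
    """Least-squares weights for y = X @ w with zero bias.

    X: design matrix of shape (N, D), one training input per row.
    t: target vector of shape (N,).
    """
    # Solve (X^T X) w = X^T t rather than forming an explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ t)


# Synthetic stand-in for the training set (6 inputs per example).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 6))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0, -1.0])
t = X @ w_true
w = fit_linear(X, t)
```

Using `np.linalg.solve` instead of inverting $\mathbf{X}^T\mathbf{X}$ gives the same weights with better numerical behaviour.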
Because the model is nonlinear in nature, as it involves trigonometric functions, polynomial regression is used. In polynomial regression, feature mapping is used so that the same linear regression algorithm can be applied to polynomial functions. The feature map $\psi(\mathbf{x})$ for a degree-2 polynomial is a column vector consisting of all the terms of degree 1 and 2 formed from the variables in the input vector $\mathbf{x}$:
$$\psi(\mathbf{x}) = \begin{pmatrix} \alpha & \delta & \gamma & i & \Omega & d & \alpha^2 & \alpha\delta & \alpha\gamma & \cdots & \Omega d & d^2 \end{pmatrix}^T. \tag{8.11}$$
The feature map for a degree-3 polynomial is similar, with additional degree-3 terms for every combination of the 6 parameters. The total number of features is 27 for the degree-2 feature map and 83 for the degree-3 feature map. With the polynomial feature maps, equation 8.10 can be used to find the optimal weights that minimize the cost function.
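The feature counts quoted above can be checked with a short sketch. The function `poly_features` is a hypothetical helper, not the thesis implementation; it enumerates every monomial of degree 1 up to the requested degree and omits the constant term, consistent with the bias being set to zero.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_features(x, degree):
    """Map x to all monomials of degree 1..degree (no constant term)."""
    x = np.asarray(x, dtype=float)
    feats = []
    for d in range(1, degree + 1):
        # Each index tuple such as (0, 1) stands for the product x[0] * x[1].
        for idx in combinations_with_replacement(range(len(x)), d):
            feats.append(np.prod(x[list(idx)]))
    return np.array(feats)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # placeholder values for the 6 inputs
print(len(poly_features(x, 2)), len(poly_features(x, 3)))
```

For 6 inputs this gives 6 + 21 = 27 features at degree 2 and 27 + 56 = 83 at degree 3.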
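Regularization is used to keep the weights small. With $L_2$ regularization, the cost becomes
$$J = \frac{1}{2N}\sum_{n=1}^{N}\left(y^{(n)} - t^{(n)}\right)^2 + \frac{\lambda}{2}\sum_{j} w_j^2,$$
where $\lambda$ is a hyper-parameter that could be tuned using a validation set. Equation 8.10 is modified to
$$\mathbf{w} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{t}. \tag{8.12}$$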
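Equation 8.12 translates almost directly into NumPy. The sketch below is illustrative only: `fit_ridge`, the synthetic data, and the two trial values of the regularization strength are assumptions, and in practice the value would be chosen on a validation set as described above.

```python
import numpy as np


def fit_ridge(X, t, lam):
    """Regularized weights from equation 8.12: w = (X^T X + lam I)^-1 X^T t."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ t)


# Synthetic data standing in for the mapped training features.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 6))
t = X @ np.array([4.0, -3.0, 2.0, 0.0, 1.0, -5.0]) + 0.1 * rng.standard_normal(100)

w_plain = fit_ridge(X, t, lam=0.0)    # reduces to unregularized least squares
w_small = fit_ridge(X, t, lam=10.0)   # larger lam pulls the weights toward zero
```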
8.4.3 k-Nearest-Neighbors
k-Nearest-Neighbors is an algorithm that classifies test data by looking at the k training data points closest to the test data. In the case of regression, it predicts the output (temperature) given the input (attitude and orbit parameters) by finding the k points closest to the input in the training data set and using their average as the output temperature.
The k-NN algorithm takes as input $\mathbf{x}^{(n)} = (\alpha, \delta, \gamma, i, \Omega)$, without the day of year, because the day of year is not needed to find the closest points. All five input parameters are angles in degrees.
First, the algorithm finds the k nearest neighbours to a test point $\mathbf{x}^*$ based on the distance defined as
$$\mathbf{d} = \left(180^\circ - \left|\,\left|\mathbf{x}^* - \mathbf{x}^{(n)}\right| - 180^\circ\right|\right)^2, \tag{8.13}$$
where $|x|$ is the absolute value of $x$, and both the absolute value and the square are applied element-wise. The subtraction of 180 degrees and the absolute value functions wrap the angles around and make sure that 0° and 359° have a distance of 1 and not 359. Now $\mathbf{d}$ is a 5 by 1 vector of the squared differences in each angle parameter. To get a single scalar distance, its elements are summed with weights.
Given the weights $\mathbf{z} = [z_\alpha, z_\delta, z_\gamma, z_i, z_\Omega]^T$, the distance is
$$D = \mathbf{d}^T\mathbf{z}. \tag{8.14}$$
Then, the closest k points are found by using the argmin function on $D$. The predicted temperature is then the average of the temperatures of the closest k points.
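The k-NN prediction with the wrapped angular distance can be sketched as below. The function names are hypothetical, the angles are assumed to lie in [0°, 360°), and `np.argpartition` is used in place of repeated argmin calls as one convenient way to pick the k smallest distances; the thesis code may differ.

```python
import numpy as np


def wrapped_sq_diff(x_star, X_train):
    """Equation 8.13: per-angle squared difference with wrap-around at 360 deg."""
    diff = np.abs(x_star - X_train)                # shape (N, 5)
    return (180.0 - np.abs(diff - 180.0)) ** 2


def knn_predict(x_star, X_train, t_train, z, k):
    """Average temperature of the k training points nearest to x_star."""
    D = wrapped_sq_diff(x_star, X_train) @ z       # equation 8.14, one scalar per point
    nearest = np.argpartition(D, k - 1)[:k]        # indices of the k smallest distances
    return t_train[nearest].mean()


# Tiny illustration: 359 deg and 2 deg are both close to 0 deg, 180 deg is not.
X_train = np.array([[359.0, 0, 0, 0, 0],
                    [180.0, 0, 0, 0, 0],
                    [2.0, 0, 0, 0, 0]])
t_train = np.array([10.0, 50.0, 20.0])
z = np.ones(5)
pred = knn_predict(np.zeros(5), X_train, t_train, z, k=2)
```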
8.4.4 Neural Network
A neural network is a form of nonlinear regression. By combining simple neurons that follow nonlinear regression rules into multiple layers, the neural network can perform powerful computations. Instead of equation 8.7, a nonlinear function $\phi$, also known as the activation function, is used between the inputs and output:
$$y = \phi(z) = \phi(\mathbf{w}^T\mathbf{x} + b), \tag{8.15}$$
where $\mathbf{w}$ is the weight vector, $\mathbf{x}$ is the input, and $b$ is the bias.
There are many possible activation functions. The Soft ReLU (Rectified Linear Unit) $y = \log(1 + e^z)$, the logistic function $y = \frac{1}{1 + e^{-z}}$, and the hyperbolic tangent (tanh) function $y = \frac{e^z - e^{-z}}{e^z + e^{-z}}$ are considered. The hyperbolic tangent function is chosen because of its numerical stability.
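For reference, the three candidate activation functions can be written in NumPy as in the sketch below. The function names are illustrative. Evaluating the Soft ReLU through `np.logaddexp` is a choice made here to avoid overflow for large inputs and is not taken from the thesis.

```python
import numpy as np


def soft_relu(z):
    # log(1 + e^z), evaluated as logaddexp(0, z) so that large z does not overflow
    return np.logaddexp(0.0, z)


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def tanh(z):
    # (e^z - e^-z) / (e^z + e^-z); output is bounded in (-1, 1)
    return np.tanh(z)


z = np.linspace(-5.0, 5.0, 11)
values = {"soft_relu": soft_relu(z), "logistic": logistic(z), "tanh": tanh(z)}
```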
Figure 8.4: Neural Net Structure and Node Labeling
Figure 8.5: Fully Connected Neural Net
The input $\mathbf{x}$ to the neural network is of the same form as in linear regression, where $\mathbf{x} = [\alpha, \delta, \gamma, i, \Omega, d]$. As shown in Figure 8.4, three hidden layers $h_1$, $h_2$, $h_3$ are defined, each with dimension 6. All the layers are fully connected, as illustrated in Figure 8.5. As such, there are weight matrices $W_1$, $W_2$, $W_3$ of size (6, 6), $W_4$ of size (6, 1), bias vectors $b_1$, $b_2$, $b_3$ of size (6, 1), and a scalar bias $b_4$.
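A forward pass through a network of this shape could look like the sketch below. It is written under assumptions: the output layer is taken to be linear (the text does not state the output activation), biases are stored as 1-D arrays rather than (6, 1) columns, and `init_params` and `forward` are hypothetical names.

```python
import numpy as np


def init_params(rng, n_in=6, n_hidden=6, scale=0.1):
    """Random weights and zero biases for three hidden layers and a scalar output."""
    shapes = [(n_in, n_hidden), (n_hidden, n_hidden),
              (n_hidden, n_hidden), (n_hidden, 1)]
    return [(scale * rng.standard_normal(s), np.zeros(s[1])) for s in shapes]


def forward(params, X):
    """Predicted temperature for each row of X, shape (N, 6) -> (N,)."""
    h = X
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)           # hidden layers h1, h2, h3
    W4, b4 = params[-1]
    return (h @ W4 + b4).ravel()         # assumed linear output layer


params = init_params(np.random.default_rng(0))
y = forward(params, np.zeros((4, 6)))
n_weights = sum(W.size + b.size for W, b in params)
```

With these sizes the network has 3 × (36 + 6) + 6 + 1 = 133 trainable parameters.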
Because there are 700,000 training data points, stochastic gradient descent is used to train the network. A subset of the training data set is selected in each iteration of gradient descent. Calculating the average loss for the subset is more efficient than calculating the average loss for the entire training data set. To calculate the gradient, the function "grad" from the autograd Python module is used.
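The training loop can be illustrated with the following sketch. To keep the example dependent on NumPy only, the gradient is derived by hand with backpropagation instead of calling autograd's `grad`; this is a substitution made for the example, not the method used in this work. The linear output layer, the half mean squared error loss, the learning rate, the batch size, and all function names are assumptions.

```python
import numpy as np


def init_params(rng, sizes=(6, 6, 6, 6, 1), scale=0.5):
    return [(scale * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def loss_and_grads(params, X, t):
    """Half mean squared error and its gradients for a tanh network."""
    acts = [X]                                   # acts[l] is the input to layer l
    for W, b in params[:-1]:
        acts.append(np.tanh(acts[-1] @ W + b))
    W, b = params[-1]
    err = (acts[-1] @ W + b).ravel() - t
    loss = 0.5 * np.mean(err ** 2)
    delta = (err / len(t))[:, None]              # d loss / d output, shape (B, 1)
    grads = []
    for layer in range(len(params) - 1, -1, -1):
        W, _ = params[layer]
        grads.append((acts[layer].T @ delta, delta.sum(axis=0)))
        if layer > 0:
            # Back through tanh, whose derivative is 1 - h^2.
            delta = (delta @ W.T) * (1.0 - acts[layer] ** 2)
    return loss, grads[::-1]


def sgd(params, X, t, lr=0.05, batch_size=64, n_steps=2000, seed=0):
    """Mini-batch stochastic gradient descent on a random subset per step."""
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        idx = rng.integers(0, len(t), size=batch_size)
        _, grads = loss_and_grads(params, X[idx], t[idx])
        params = [(W - lr * gW, b - lr * gb)
                  for (W, b), (gW, gb) in zip(params, grads)]
    return params
```

Each step touches only `batch_size` rows, so the cost per update does not grow with the size of the training set.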
8.4.5 Finding the Optimal Attitude
Linear regression, k-NN, and neural networks can predict the CCD temperature given a spacecraft attitude and orbit parameters. To find the optimal attitude for a given set of orbit parameters, three methods are used.
A brute-force approach that works with any prediction method is to create a mesh grid of possible attitudes and then predict the temperature corresponding to each attitude using one of the methods mentioned above. The minimum predicted temperature can then be found along with its corresponding attitude. Because there are three attitude parameters (RA, declination, and roll), the mesh grid is three-dimensional. Creating and evaluating a fine mesh grid is computationally expensive, but a coarser grid, such as one point every 5 to 10 degrees in each parameter, is workable. After a minimum temperature is found on the coarse mesh grid, a finer mesh grid can be used around that attitude to further refine the output.
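The coarse-to-fine search can be sketched as follows. Here `predict(attitude, orbit)` is a hypothetical callable standing in for any of the prediction methods, attitudes are assumed to be (RA, declination, roll) in degrees with RA and roll in [0°, 360°) and declination in [-90°, 90°], and the 10° and 1° step sizes are example values.

```python
import numpy as np


def grid_minimum(predict, orbit, ra, dec, roll):
    """Evaluate predict on the mesh ra x dec x roll; return the coldest point."""
    A, D, R = np.meshgrid(ra, dec, roll, indexing="ij")
    attitudes = np.column_stack([A.ravel(), D.ravel(), R.ravel()])
    temps = np.array([predict(att, orbit) for att in attitudes])
    best = int(np.argmin(temps))
    return attitudes[best], temps[best]


def coarse_to_fine(predict, orbit, coarse=10.0, fine=1.0):
    """Search a coarse grid first, then a finer grid around the coarse optimum."""
    ra = np.arange(0.0, 360.0, coarse)
    dec = np.arange(-90.0, 90.0 + coarse, coarse)
    roll = np.arange(0.0, 360.0, coarse)
    att, _ = grid_minimum(predict, orbit, ra, dec, roll)
    # The fine grid spans one coarse cell on either side of the coarse optimum.
    local = np.arange(-coarse, coarse + fine, fine)
    ra_f = (att[0] + local) % 360.0
    dec_f = np.clip(att[1] + local, -90.0, 90.0)
    roll_f = (att[2] + local) % 360.0
    return grid_minimum(predict, orbit, ra_f, dec_f, roll_f)
```

With these steps the coarse pass evaluates 36 × 19 × 36 attitudes and the fine pass 21³, far fewer than a 1° grid over the whole attitude space.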
For linear regression, after the weights are computed, an analytical minimum can be found for the polynomial equation
$$y = \mathbf{w}^T\psi(\mathbf{x}). \tag{8.16}$$
To find the minimum, the gradient of $y$ with respect to the three attitude parameters $x_1 = \alpha$, $x_2 = \delta$, $x_3 = \gamma$ is set to 0, that is, $\partial y / \partial x_k = 0$ for $k = 1, 2, 3$.
To demonstrate for feature map with dimension πΎ = 2, πΏπ¦
For k-NN, for a given orbit inclination and RAAN, a candidate set of the points with the closest orbit inclination and RAAN is found from the training data. From this candidate set, the points with the lowest temperatures are returned. The size of the candidate set is set to between 100 and 3000, and the number of returned points is set to between 3 and 10.
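This selection step can be sketched as below. The column order of `X_train` (RA, declination, roll, inclination, RAAN, all in degrees), the unweighted wrapped distance in inclination and RAAN, and the names `knn_coldest_attitudes`, `n_candidates`, and `n_return` are assumptions made for the illustration.

```python
import numpy as np


def wrap_diff(a, b):
    """Angular difference in degrees with wrap-around at 360."""
    return 180.0 - np.abs(np.abs(a - b) - 180.0)


def knn_coldest_attitudes(inc, raan, X_train, t_train,
                          n_candidates=1000, n_return=5):
    """Coldest attitudes among the training points closest in (inclination, RAAN)."""
    d2 = wrap_diff(X_train[:, 3], inc) ** 2 + wrap_diff(X_train[:, 4], raan) ** 2
    n_candidates = min(n_candidates, len(t_train))
    cand = np.argpartition(d2, n_candidates - 1)[:n_candidates]
    # Sort the candidates by temperature and keep the coldest n_return of them.
    order = cand[np.argsort(t_train[cand])[:n_return]]
    return X_train[order, :3], t_train[order]


# Tiny illustration: the third point is cold but far away in RAAN.
X_train = np.array([[10.0, 0, 0, 98, 100],
                    [20.0, 0, 0, 98, 101],
                    [30.0, 0, 0, 98, 200],
                    [40.0, 0, 0, 98, 99]])
t_train = np.array([5.0, 1.0, -50.0, 3.0])
atts, temps = knn_coldest_attitudes(98.0, 100.0, X_train, t_train,
                                    n_candidates=3, n_return=2)
```

Restricting the search to nearby orbits first keeps a cold measurement from a very different orbit from being returned.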