Building Energy System Identification Model Development

4. CHAPTER 4 On-line Building Energy Forecasting: System Identification Methodology

4.4 Building Energy System Identification Model Development

In this section, the detailed methods for system structure selection, order determination, and parameter identification are introduced.

4.4.1 Building Energy System Identification Model Structure Selection

Generally, model structure selection for data-aided system identification is an underdeveloped field [149]. Model structures fall into two categories: time-domain models and frequency-domain models. A frequency-domain model describes the system with mathematical functions of frequency rather than time. Its main benefit is that it can lower the model order and can even convert a nonlinear model into a linear one, which is much easier to solve. However, there has been little discussion of which model structure is better for a building, nor of the use of frequency-domain methods in building energy forecasting. Therefore, this study explores the use of frequency-domain methods in building energy forecasting and provides model structure selection recommendations for building energy forecasting model development.

To better recommend a model structure for building energy forecasting, this project also plans to compare the performance of different candidate models, such as the frequency response model developed in Chapter 3, the RC model, and different purely data-driven models.

4.4.2 Building Energy System Identification Model Order Determination

Unlike model structure selection, there are many different ways to estimate a system’s order based on data analysis:

1. Testing rank of covariance matrix

Suppose the order of a system is n, and let

$$\varphi_s(t) = [\,-y(t-1) \;\dots\; -y(t-s) \quad u(t-1) \;\dots\; u(t-s)\,]^T$$

where $\varphi_s(t)$ is a test vector containing both the outputs y and the inputs u. The covariance matrix is then:

$$R_s(N) = \frac{1}{N} \sum_{t=1}^{N} \varphi_s(t)\, \varphi_s^T(t)$$

Therefore, $R_s(N)$ will be nonsingular for $s \le n$ and singular for $s > n$ [149]. Hence the system order can be estimated as the largest s for which $R_s(N)$ remains nonsingular.
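The rank test can be sketched numerically as follows. This is a minimal illustration, not the procedure used in this study: the second-order system, its coefficients, the white-noise input, and the tolerance `tol` are all assumptions chosen for the example. With noise-free data the smallest singular value of the covariance matrix collapses to numerical zero once s exceeds the true order; with noisy data the matrix is never exactly singular, so the threshold would have to be tuned.

```python
import numpy as np

def regressor_covariance(y, u, s):
    """R_s(N) = (1/N) * sum_t phi_s(t) phi_s(t)^T for lag depth s."""
    rows = [
        # phi_s(t) = [-y(t-1) ... -y(t-s), u(t-1) ... u(t-s)]
        np.concatenate((-y[t - s:t][::-1], u[t - s:t][::-1]))
        for t in range(s, len(y))
    ]
    phi = np.array(rows)
    return phi.T @ phi / len(rows)

# Hypothetical noise-free second-order system (true order n = 2):
# y(t) = 1.5 y(t-1) - 0.7 y(t-2) + 1.0 u(t-1) + 0.5 u(t-2)
rng = np.random.default_rng(0)
N = 2000
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(2, N):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + u[t - 1] + 0.5 * u[t - 2]

# Smallest singular value of R_s(N) for candidate orders s = 1..4
min_sv = {
    s: np.linalg.svd(regressor_covariance(y, u, s), compute_uv=False)[-1]
    for s in range(1, 5)
}

tol = 1e-8  # assumed singularity threshold for noise-free data
estimated_order = max(s for s, sv in min_sv.items() if sv > tol)
print(min_sv, estimated_order)
```

The estimated order is the largest s whose covariance matrix is still numerically nonsingular.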

2. Testing correlating variables

Order determination amounts to including the right number of variables in the model. This can be done by checking whether a new variable makes any contribution to the output. The contribution can be measured by a correlation test between system outputs and inputs.
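One possible form of such a test is sketched below. It is an assumption about how the check could be carried out, not the study's implementation: the current model is fitted by least squares, and the correlation between its residuals and a candidate lagged input is compared with the approximate 95% bound for zero correlation, 1.96/sqrt(N). The FIR system and its coefficients are hypothetical.

```python
import numpy as np

def residual_correlation(y, regressors, candidate):
    """Correlation between least-squares residuals and a candidate variable."""
    theta, *_ = np.linalg.lstsq(regressors, y, rcond=None)
    residual = y - regressors @ theta
    return np.corrcoef(residual, candidate)[0, 1]

rng = np.random.default_rng(1)
N = 1000
u = rng.standard_normal(N)
t = np.arange(3, N)

# Hypothetical system: the output depends on u(t-1) and u(t-2) only
y = 0.8 * u[t - 1] + 0.5 * u[t - 2] + 0.05 * rng.standard_normal(len(t))

current = u[t - 1].reshape(-1, 1)    # the model currently uses only u(t-1)
threshold = 1.96 / np.sqrt(len(t))   # approximate 95% bound for zero correlation

r_lag2 = residual_correlation(y, current, u[t - 2])  # candidate u(t-2)
r_lag3 = residual_correlation(y, current, u[t - 3])  # candidate u(t-3)
print(abs(r_lag2) > threshold, abs(r_lag3) > threshold)
```

A candidate whose residual correlation clearly exceeds the bound still carries information about the output and is a reasonable addition to the model.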

3. Testing rank of information matrix

The information matrix is a measure of the amount of information that a signal (X) contains about an unknown parameter (θ):

$$I_n(\theta)_{i,j} = -E\left[ \frac{\partial^2 \ln f(X \mid \theta)}{\partial \theta_i \, \partial \theta_j} \right]$$

The rank of the information matrix, 𝐼𝑛, is the order of the system.
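As a simple worked illustration of the definition (a textbook case, not taken from this study), consider N independent scalar observations X_k drawn from a Gaussian distribution with unknown mean θ and known variance σ²:

```latex
\[
\ln f(X \mid \theta) = -\frac{N}{2}\ln(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}\sum_{k=1}^{N}(X_k - \theta)^2,
\qquad
\frac{\partial^2 \ln f(X \mid \theta)}{\partial \theta^2} = -\frac{N}{\sigma^2},
\qquad
I_n(\theta) = \frac{N}{\sigma^2}
\]
```

The information grows with the number of samples and shrinks as the noise variance increases.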

All three methods will be applied and compared under different conditions in a future system order determination study. An overall system order determination strategy will be recommended based on these comparison studies.

The information matrix is also related to system identifiability. System identifiability concerns whether the identification procedure will yield a unique value of the parameter (𝜃) and whether the final model represents the true system. A model structure 𝐹(𝑧, 𝜃) is locally structurally identifiable at 𝜃∗ ∈ Θ if, for all 𝜃1 and 𝜃2 in the neighborhood of 𝜃∗ and for all z [179]:

𝐹(𝑧, 𝜃1) = 𝐹(𝑧, 𝜃2) ⇒ 𝜃1= 𝜃2

It is important to note that even a structurally identifiable model might not be data-dependent identifiable if the input data are of low quality. Data-dependent identifiability requires:

𝐹(𝑧, 𝜃1)𝑢𝑘= 𝐹(𝑧, 𝜃2)𝑢𝑘 ⇒ 𝜃1= 𝜃2

where 𝑢𝑘 is the system input and the model structure 𝐹(𝑧, 𝜃) is data-dependent identifiable [179]. Therefore, in order to generate high-quality system training data, a system excitation plan is developed. Data-dependent system identifiability can be tested with the Fisher matrix, M, whose inverse bounds the covariance of the parameter estimation error according to the Cramér-Rao inequality [149]. Suppose 𝜀𝑘(𝜃) = 𝑦𝑘 − 𝑦̂𝑘 and 𝜀𝑘 ~ 𝑁(0, Σ𝑒); then M is defined as:

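$$M = \sum_{k=1}^{N} \left[ \frac{\partial \varepsilon_k(\theta)}{\partial \theta}\, \Sigma_e^{-1} \left( \frac{\partial \varepsilon_k(\theta)}{\partial \theta} \right)^{T} \right] = \sum_{k=1}^{N} \frac{\partial \hat{y}_k(\theta)}{\partial \theta}\, \Sigma_e^{-1} \left( \frac{\partial \hat{y}_k(\theta)}{\partial \theta} \right)^{T} \qquad \text{(Eq. 4.12)}$$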

However, the Fisher matrix cannot be used to select the system model structure, because its calculation requires the model structure as prior information. The Fisher matrix is usually used to identify the model parameters and to guide the design of experiments for training data collection.
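The link between input quality and data-dependent identifiability can be illustrated with Eq. 4.12 on a toy model. The sketch below is a hedged example under stated assumptions: a hypothetical first-order predictor ŷ_k = a·y_{k-1} + b·u_{k-1} with scalar output, so that Σ_e reduces to an assumed noise variance `sigma_e**2` and the gradient of the prediction is simply [y_{k-1}, u_{k-1}]. A constant input held at steady state yields a singular Fisher matrix, whereas a random binary input yields a well-conditioned one.

```python
import numpy as np

def simulate(u, a=0.9, b=0.5, y0=0.0):
    """Hypothetical first-order system y[k] = a*y[k-1] + b*u[k-1]."""
    y = np.empty(len(u))
    y[0] = y0
    for k in range(1, len(u)):
        y[k] = a * y[k - 1] + b * u[k - 1]
    return y

def fisher_matrix(y, u, sigma_e):
    """M = sum_k psi_k psi_k^T / sigma_e^2, with psi_k = d y_hat_k / d theta
    for y_hat_k = a*y[k-1] + b*u[k-1] and theta = (a, b)."""
    psi = np.column_stack((y[:-1], u[:-1]))
    return psi.T @ psi / sigma_e**2

def eig_ratio(M):
    """Smallest over largest eigenvalue; near zero means M is singular."""
    w = np.linalg.eigvalsh(M)  # ascending order
    return w[0] / w[-1]

rng = np.random.default_rng(0)
N = 500
sigma_e = 0.1  # assumed measurement-noise standard deviation

# Poor excitation: constant input, system already at steady state b/(1-a) = 5
u_const = np.ones(N)
ratio_const = eig_ratio(fisher_matrix(simulate(u_const, y0=5.0), u_const, sigma_e))

# Rich excitation: random binary input switching between -1 and +1
u_rand = rng.choice([-1.0, 1.0], size=N)
ratio_rand = eig_ratio(fisher_matrix(simulate(u_rand), u_rand, sigma_e))

print(ratio_const, ratio_rand)
```

In the constant-input case the two parameters cannot be separated from the data, even though the model structure itself is identifiable.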

4.4.3 Building Energy System Identification Model Training Data Generation

Most building energy systems typically operate within only a very small range of settings, for example, temperature setpoints and internal equipment schedules. Passive model training methods usually use regular building operation data from certain days, and the only way to improve the training data is to lengthen the training period. For this reason, an active training method is developed in this study, which not only controls the length of the training period but also improves the quality of the training data. The active training method is realized by optimizing the excitation signal based on data information entropy and optimal experiment design theories. The excitation signals need to satisfy a key theoretical assumption of reliable statistical identification: they must be persistently exciting [180]. Three basic “plant-friendly” excitation signal constraints were presented by Braun et al. [159]:

1. Keeping minimum deviations in control signal;

2. Implementing a signal of short duration to minimize the amount of off-spec product;

3. Keeping move sizes small to satisfy actuator constraints and minimize “wear and tear” on process equipment.

Pseudo-random binary signals, sums of sinusoids, and multilevel pseudo-random signals are the three most common excitation signals in system identification. For dynamic systems, system excitation experiment design includes the choice of inputs and outputs, test signals, and sampling intervals. A preliminary study on excitation signals for training data generation was introduced in Section 3.3, where a sum of sinusoids is used. The crest factor (Eq. 3.7) is applied to determine the frequency and phase of the signal. The system identification model developed on this training data achieved over 90% accuracy. Therefore, the sum-of-sinusoids function will also be used in future studies, while other functions will be tested if necessary.
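The role of the crest factor in shaping a sum-of-sinusoids signal can be sketched as follows. Eq. 3.7 is not reproduced in this section, so the example assumes the standard definition (peak amplitude divided by RMS value). The number of harmonics, the 24-hour fundamental period, and the use of Schroeder phases are illustrative assumptions, not the design used in Section 3.3. A lower crest factor means more signal energy for the same peak deviation, which is in line with the plant-friendly constraints listed above.

```python
import numpy as np

def multisine(t, freqs, phases, amplitude=1.0):
    """Sum of equal-amplitude cosines evaluated at times t."""
    return amplitude * sum(
        np.cos(2 * np.pi * f * t + p) for f, p in zip(freqs, phases)
    )

def crest_factor(x):
    """Peak absolute value divided by RMS value (assumed definition)."""
    return np.max(np.abs(x)) / np.sqrt(np.mean(x**2))

n = 10                                   # number of harmonics (assumed)
period = 24.0                            # hypothetical fundamental period, hours
k = np.arange(1, n + 1)
freqs = k / period                       # harmonic frequencies, cycles per hour
t = np.linspace(0.0, period, 2000, endpoint=False)

zero_phases = np.zeros(n)                # all components peak together at t = 0
schroeder_phases = -np.pi * k * (k - 1) / n  # Schroeder phases, flat spectrum

cf_zero = crest_factor(multisine(t, freqs, zero_phases))
cf_schroeder = crest_factor(multisine(t, freqs, schroeder_phases))
print(cf_zero, cf_schroeder)
```

With all phases set to zero the components add up at t = 0 and the crest factor reaches its maximum of sqrt(2n); spreading the phases lowers the peak without changing the power spectrum.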

The experiment design for system identification includes determining the input signals, the measurements, the sampling intervals, and how to manipulate the measurements. According to informative excitation theory, the excitation signals should be persistently exciting, and the order of the excitation function should be equal to the number of parameters to be estimated in the system.

The sampling interval is also critical to system identification model performance and identifiability. Sampling too fast causes problems in the high-frequency bands and increases the computational burden. On the other hand, if the sampling interval is longer than the system's natural time constant, the variance of the sampled data increases drastically. A suitable sampling frequency is on the order of ten times the bandwidth of the system [149].
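As a rough numerical illustration, the rule of thumb can be read as a sampling frequency of about ten times the system bandwidth (an assumption about the intended meaning). The sketch below further assumes a first-order system, whose bandwidth is 1/τ rad/s for a time constant τ; the function name and the one-hour time constant are hypothetical.

```python
import math

def suggested_sampling_interval(time_constant_s, ratio=10.0):
    """Sampling interval (s) for a first-order system with the given time
    constant, using a sampling frequency of `ratio` times the bandwidth."""
    bandwidth = 1.0 / time_constant_s      # rad/s, first-order system
    sampling_rate = ratio * bandwidth      # rad/s
    return 2.0 * math.pi / sampling_rate   # seconds per sample

# Hypothetical thermal time constant of one hour
interval_s = suggested_sampling_interval(3600.0)
print(round(interval_s / 60.0, 1), "minutes")
```

Real building systems have several time constants, so in practice the fastest dynamics of interest would set the bandwidth used here.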

4.4.4 Building Description

To evaluate how different building sizes, envelopes, and HVAC systems affect the SID process, two buildings of different sizes and with different HVAC systems are used as the objects of this study. One is a small single-story office building (the same as that used in [181]). It has five conditioned zones and an unconditioned attic. The total floor area is 510 m². The window-to-wall ratio is approximately 21.2%. The overall U-factor of its single-pane windows is 3.4 W/m²K with a solar heat gain factor of 0.36. The overall U-factor of the external envelope is 0.68 W/m²K. The R-value and solar absorptivity value of the roof are 5.1 W/m²K and 0.9, respectively. The HVAC systems used in this building are constant-air-volume (CAV) air handling units (AHUs) with direct expansion (DX) coils.

The other building is a three-story office building in which each floor has five conditioned zones. The total floor area is 4982 m². The window-to-wall ratio is around 33%, and the U-factor of the windows is 3.3 W/m²K with a solar heat gain factor of 0.36. The R-value and solar absorptivity value of the roof are 0.33 W/m²K and 0.7, respectively. This building uses a packaged multi-zone variable-air-volume (VAV) system with electric reheat.

Very detailed physics-based building simulation models built in EnergyPlus [177] are used in this study in lieu of real building systems. The EnergyPlus models were developed by the U.S. DOE and validated against real field data [177]. In the small building model, the coefficient of performance (COP) of the DX coils is modeled using a quadratic equation [177]. In the medium building model, the performance of the VAV system is modeled using second-order and third-order polynomial equations [177]. Both buildings are located in Philadelphia, PA, USA for this study. The Typical Meteorological Year (TMY) weather data file provided by the DOE is used as the weather input. A virtual building system identification emulator [182] is used here to simulate the building operation and apply the introduced system identification schemes.

4.4.5 Training and Testing Condition

The two buildings, as described in Section 4.4.4, are simulated for four time periods under Philadelphia TMY data: 1) from August 1st to August 7th; 2) from August 1st to August 14th; 3) from August 22nd to August 28th; and 4) from July 1st to September 30th. Simulated building operation data during time period 1 is used as training data for the SID model. Simulated data during time period 2, which is longer than time period 1, is used as the training data for the other models in the comparison study (described in Section 4.7). Simulated data during time period 3 is used for model adaptation (described in Section 4.6) and the model uncertainty comparison study (described in Section 4.7.2.3). Simulated data during time period 4 is used as the testing data for all models.
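For bookkeeping, the four periods can be written down as a small configuration. This is only an illustrative sketch: the names `PERIODS` and `n_days` are hypothetical, and the calendar year is an arbitrary placeholder because TMY data is not tied to a real year.

```python
from datetime import date

YEAR = 2001  # arbitrary non-leap placeholder year (assumption)

# (first day, last day), both inclusive
PERIODS = {
    1: (date(YEAR, 8, 1), date(YEAR, 8, 7)),    # training data for the SID model
    2: (date(YEAR, 8, 1), date(YEAR, 8, 14)),   # training data for comparison models
    3: (date(YEAR, 8, 22), date(YEAR, 8, 28)),  # adaptation and uncertainty study
    4: (date(YEAR, 7, 1), date(YEAR, 9, 30)),   # testing data for all models
}

def n_days(period_id):
    """Number of simulated days in a period, counting both end dates."""
    start, end = PERIODS[period_id]
    return (end - start).days + 1

durations = {pid: n_days(pid) for pid in PERIODS}
print(durations)
```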

Two sets of simulated operating data are generated in this study for the model development and comparison. One set of data assumes no measurement noise and is labeled as noise-free data in the following sections. The other set of data includes measurement noise by adding Gaussian distributed random white noise to each measurement. More details about the noise generation process will be introduced in section 4.7.2.
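A minimal sketch of how a noisy data set could be derived from a noise-free one is given below. The noise generation actually used is described in Section 4.7.2 and is not reproduced here, so the noise standard deviation, the synthetic load profile, and the function name are assumptions for illustration only.

```python
import numpy as np

def add_measurement_noise(clean, noise_std, rng):
    """Return the signal plus zero-mean Gaussian white noise."""
    clean = np.asarray(clean, dtype=float)
    return clean + rng.normal(0.0, noise_std, size=clean.shape)

rng = np.random.default_rng(42)

# Hypothetical noise-free measurement, e.g. a smooth daily load profile in kW
clean_power_kw = 20.0 + 5.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, 1000))

# Assumed noise level of 0.5 kW (standard deviation)
noisy_power_kw = add_measurement_noise(clean_power_kw, noise_std=0.5, rng=rng)
```

Using a seeded generator keeps the noisy data set reproducible, so that different models can be compared on identical measurements.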

In document Net-zero Building Cluster Simulations and On-line Energy Forecasting for Adaptive and Real-Time Control and Decisions (Page 116-122)