Proof of Lemma 8 - Identification of Recurrent Neural Networks by Bayesian Interrogation Techniques

One needs to compute the value of the integral

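$$\int dy_{t+1}\, P\big(y_{t+1} \mid \{x\}_1^{t+1}, \{y\}_1^{t}\big)\, E\big[\operatorname{tr} V \mid \{x\}_1^{t+1}, \{y\}_1^{t+1}\big] \Big[\operatorname{tr}\big(K_t + x_{t+1}x_{t+1}^T\big)^{-1} + x_{t+1}^T K_{t+1}^{-1} x_{t+1}\Big].$$

However,

$$\int dy_{t+1}\, P\big(y_{t+1} \mid \{x\}_1^{t+1}, \{y\}_1^{t}\big)\, E\big[\operatorname{tr} V \mid \{x\}_1^{t+1}, \{y\}_1^{t+1}\big] = \int dy_{t+1} \prod_{i=1}^{d} T\Big((y_{t+1})_i;\, (\beta_t)_i,\, 2(\alpha_t)_i,\, (M_t x_{t+1})_i,\, \frac{\gamma_{t+1}}{2}\Big) \sum_{i=1}^{d} \frac{(\beta_{t+1})_i}{(\alpha_{t+1})_i - 1},$$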

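and this quantity depends only on the values of $\alpha_{t+1}$ and $\beta_t$, as a result of (19) and Lemma 4.4; in particular, it is independent of the value of $x_{t+1}$. Because this factor is positive and does not depend on $x_{t+1}$, it suffices to minimize the remaining factor of the integrand, $\operatorname{tr}(K_t + x_{t+1}x_{t+1}^T)^{-1} + x_{t+1}^T K_{t+1}^{-1} x_{t+1}$, where $K_{t+1} = K_t + x_{t+1}x_{t+1}^T$. Expanding $K_{t+1}^{-1}$ with the Sherman-Morrison formula, we arrive at the minimization of the following expression: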

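$$
\begin{aligned}
&\operatorname{tr}\left[\left(K_t^{-1} - \frac{K_t^{-1} x_{t+1} x_{t+1}^T K_t^{-1}}{1 + x_{t+1}^T K_t^{-1} x_{t+1}}\right) + x_{t+1}^T \left(K_t^{-1} - \frac{K_t^{-1} x_{t+1} x_{t+1}^T K_t^{-1}}{1 + x_{t+1}^T K_t^{-1} x_{t+1}}\right) x_{t+1}\right] \\
&\quad= \operatorname{tr}(K_t^{-1}) - \frac{\operatorname{tr}\left(K_t^{-1} x_{t+1} x_{t+1}^T K_t^{-1}\right)}{1 + x_{t+1}^T K_t^{-1} x_{t+1}} + \frac{x_{t+1}^T K_t^{-1} x_{t+1}}{1 + x_{t+1}^T K_t^{-1} x_{t+1}} \\
&\quad= \operatorname{tr}(K_t^{-1}) - \frac{x_{t+1}^T K_t^{-1} K_t^{-1} x_{t+1}}{1 + x_{t+1}^T K_t^{-1} x_{t+1}} + \frac{x_{t+1}^T K_t^{-1} x_{t+1}}{1 + x_{t+1}^T K_t^{-1} x_{t+1}} \\
&\quad= \operatorname{tr}(K_t^{-1}) + 1 - \frac{1 + x_{t+1}^T K_t^{-1} K_t^{-1} x_{t+1}}{1 + x_{t+1}^T K_t^{-1} x_{t+1}}.
\end{aligned}
$$

Since $\operatorname{tr}(K_t^{-1})$ does not depend on $x_{t+1}$, minimizing this expression is equivalent to maximizing $\frac{1 + x_{t+1}^T K_t^{-1} K_t^{-1} x_{t+1}}{1 + x_{t+1}^T K_t^{-1} x_{t+1}}$ over $x_{t+1}$.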
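The final expression of the derivation above can be checked numerically. The following is a minimal NumPy sketch, not code from the paper: it assumes $K_t$ is symmetric positive definite, omits the positive factor $\int dy_{t+1}\, P(\cdot)\, E[\operatorname{tr} V \mid \cdot]$ (which does not depend on $x_{t+1}$), and, for illustration only, assumes that $x_{t+1}$ is picked from a finite set of candidate inputs. The function names `direct_criterion`, `closed_form_criterion`, and `select_next_input` are hypothetical.

```python
import numpy as np


def direct_criterion(K_t: np.ndarray, x: np.ndarray) -> float:
    """tr(K_{t+1}^{-1}) + x^T K_{t+1}^{-1} x with K_{t+1} = K_t + x x^T, by explicit inversion."""
    K_next_inv = np.linalg.inv(K_t + np.outer(x, x))
    return float(np.trace(K_next_inv) + x @ K_next_inv @ x)


def closed_form_criterion(K_t: np.ndarray, x: np.ndarray) -> float:
    """tr(K_t^{-1}) + 1 - (1 + x^T K_t^{-1} K_t^{-1} x) / (1 + x^T K_t^{-1} x)."""
    K_inv = np.linalg.inv(K_t)
    K_inv_x = K_inv @ x  # K_t^{-1} x
    # For symmetric K_t, K_inv_x @ K_inv_x equals x^T K_t^{-1} K_t^{-1} x.
    return float(np.trace(K_inv) + 1.0 - (1.0 + K_inv_x @ K_inv_x) / (1.0 + x @ K_inv_x))


def select_next_input(K_t: np.ndarray, candidates: np.ndarray) -> int:
    """Hypothetical helper: index of the candidate row x that maximizes
    (1 + x^T K_t^{-2} x) / (1 + x^T K_t^{-1} x), i.e. minimizes the criterion."""
    K_inv = np.linalg.inv(K_t)
    rows = candidates @ K_inv  # row i is (K_t^{-1} x_i)^T because K_t^{-1} is symmetric
    numer = 1.0 + np.einsum("ij,ij->i", rows, rows)        # 1 + x_i^T K_t^{-2} x_i
    denom = 1.0 + np.einsum("ij,ij->i", rows, candidates)  # 1 + x_i^T K_t^{-1} x_i
    return int(np.argmax(numer / denom))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    A = rng.standard_normal((d, d))
    K_t = A @ A.T + d * np.eye(d)  # an arbitrary symmetric positive definite K_t
    x = rng.standard_normal(d)
    print(direct_criterion(K_t, x), closed_form_criterion(K_t, x))  # the two values should agree

    candidates = rng.standard_normal((50, d))
    print("chosen candidate:", select_next_input(K_t, candidates))
```

Scoring candidates with the ratio only requires $K_t^{-1}$, computed once, instead of forming $K_{t+1}^{-1}$ separately for every candidate.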

