Deep Learning Model Selection of Suboptimal Complexity

 
PII: S000523100001252-1-1
DOI: 10.31857/S000523100001252-1
Publication type: Article
Status: Published
Authors
Affiliation: Moscow Institute of Physics and Technology
Address: Russian Federation, Moscow
Affiliation: Dorodnicyn Computing Centre, Russian Academy of Sciences
Address: Russian Federation, Moscow
Journal name: Avtomatika i Telemekhanika
Edition: Issue 8
Pages: 129-147
Abstract

We consider the problem of selecting a deep learning model of suboptimal complexity. The complexity of a model is understood as the minimum description length of the combination of the sample and the classification or regression model. Suboptimal complexity is understood as an approximate estimate of the minimum description length obtained with Bayesian inference and variational methods. We introduce probabilistic assumptions about the distribution of the model parameters and, based on Bayesian inference, propose the likelihood function of the model. To estimate the likelihood, we apply variational methods with gradient optimization algorithms. A computational experiment is carried out on several datasets.
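As a rough illustration of the approach outlined in the abstract (not the authors' implementation), the sketch below computes a variational lower bound on the model evidence for a toy Bayesian linear regression with a diagonal Gaussian variational posterior, optimized by plain gradient ascent through the reparameterization trick. The toy data, hyperparameters, and step sizes are assumptions made for the example.

```python
import numpy as np

# Minimal sketch (assumed, illustrative): variational estimate of the evidence
# for Bayesian linear regression, optimized by gradient ascent.
# q(w) = N(mu, diag(sigma^2)), prior p(w) = N(0, alpha^{-1} I),
# likelihood p(y | X, w) = N(X w, beta^{-1} I).

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
beta, alpha = 25.0, 1.0                      # noise and prior precision (assumed)
y = X @ w_true + rng.normal(scale=beta ** -0.5, size=n)

mu = np.zeros(d)                             # variational mean
rho = np.full(d, -3.0)                       # sigma = exp(rho), variational scale
lr, n_steps, n_mc = 1e-4, 5000, 8            # assumed step size and budget

for step in range(n_steps):
    sigma = np.exp(rho)
    grad_mu = np.zeros(d)
    grad_rho = np.zeros(d)
    for _ in range(n_mc):                    # Monte Carlo over the reparameterized posterior
        eps = rng.normal(size=d)
        w = mu + sigma * eps
        dlogp_dw = beta * X.T @ (y - X @ w)  # gradient of the log-likelihood w.r.t. w
        grad_mu += dlogp_dw
        grad_rho += dlogp_dw * eps * sigma   # chain rule through w = mu + exp(rho) * eps
    grad_mu /= n_mc
    grad_rho /= n_mc
    # Closed-form gradients of KL(q || p) for Gaussian q and Gaussian prior.
    grad_mu -= alpha * mu
    grad_rho -= alpha * sigma ** 2 - 1.0
    mu += lr * grad_mu                       # gradient ascent on the evidence lower bound
    rho += lr * grad_rho

# Evidence lower bound at the optimum: an approximate (suboptimal) estimate of
# the negative description length of the sample given the model.
sigma = np.exp(rho)
kl = 0.5 * np.sum(alpha * (sigma ** 2 + mu ** 2) - 1.0 - np.log(alpha) - 2.0 * rho)
resid = y - X @ mu
exp_loglik = (0.5 * n * np.log(beta / (2 * np.pi))
              - 0.5 * beta * (resid @ resid + np.sum((X ** 2) @ sigma ** 2)))
print("ELBO estimate of log evidence:", exp_loglik - kl)
```

For deep learning models the log-likelihood gradient has no such closed form, so it is obtained by backpropagation, but the structure of the bound and the gradient ascent loop stay the same.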

Keywords: classification, regression, deep learning, model selection, Bayesian inference, variational inference, complexity
Received: 30.09.2018
Publication date: 30.09.2018
Number of characters: 703