Deep Learning Model Selection of Suboptimal Complexity

 
PII: S000523100001252-1-1
DOI: 10.31857/S000523100001252-1
Publication type: Article
Status: Published
Authors
Affiliation: Moscow Institute of Physics and Technology
Address: Russian Federation, Moscow
Affiliation: Dorodnicyn Computing Centre, Russian Academy of Sciences
Address: Russian Federation, Moscow
Journal name: Avtomatika i Telemekhanika
Edition: Issue 8
Pages: 129-147
Abstract

We consider the problem of selecting a deep learning model of suboptimal complexity. The complexity of a model is understood as the minimum description length of the combination of the sample and the classification or regression model. Suboptimal complexity is understood as an approximate estimate of the minimum description length obtained with Bayesian inference and variational methods. We introduce probabilistic assumptions about the distribution of the model parameters and, based on Bayesian inference, propose the likelihood function of the model. To estimate the likelihood, we apply variational methods with gradient optimization algorithms. A computational experiment is carried out on several datasets.
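As a rough illustration of the approach outlined in the abstract (not the authors' implementation), the sketch below computes a variational lower bound on the model evidence for a toy Bayesian linear regression with a diagonal Gaussian variational posterior, optimized by plain gradient ascent through the reparameterization trick. The toy data, hyperparameters, and step sizes are assumptions made for the example.

```python
import numpy as np

# Minimal sketch (assumed, illustrative): variational estimate of the evidence
# for Bayesian linear regression, optimized by gradient ascent.
# q(w) = N(mu, diag(sigma^2)), prior p(w) = N(0, alpha^{-1} I),
# likelihood p(y | X, w) = N(X w, beta^{-1} I).

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
beta, alpha = 25.0, 1.0                      # noise and prior precision (assumed)
y = X @ w_true + rng.normal(scale=beta ** -0.5, size=n)

mu = np.zeros(d)                             # variational mean
rho = np.full(d, -3.0)                       # sigma = exp(rho), variational scale
lr, n_steps, n_mc = 1e-4, 5000, 8            # assumed step size and budget

for step in range(n_steps):
    sigma = np.exp(rho)
    grad_mu = np.zeros(d)
    grad_rho = np.zeros(d)
    for _ in range(n_mc):                    # Monte Carlo over the reparameterized posterior
        eps = rng.normal(size=d)
        w = mu + sigma * eps
        dlogp_dw = beta * X.T @ (y - X @ w)  # gradient of the log-likelihood w.r.t. w
        grad_mu += dlogp_dw
        grad_rho += dlogp_dw * eps * sigma   # chain rule through w = mu + exp(rho) * eps
    grad_mu /= n_mc
    grad_rho /= n_mc
    # Closed-form gradients of KL(q || p) for Gaussian q and Gaussian prior.
    grad_mu -= alpha * mu
    grad_rho -= alpha * sigma ** 2 - 1.0
    mu += lr * grad_mu                       # gradient ascent on the evidence lower bound
    rho += lr * grad_rho

# Evidence lower bound at the optimum: an approximate (suboptimal) estimate of
# the negative description length of the sample given the model.
sigma = np.exp(rho)
kl = 0.5 * np.sum(alpha * (sigma ** 2 + mu ** 2) - 1.0 - np.log(alpha) - 2.0 * rho)
resid = y - X @ mu
exp_loglik = (0.5 * n * np.log(beta / (2 * np.pi))
              - 0.5 * beta * (resid @ resid + np.sum((X ** 2) @ sigma ** 2)))
print("ELBO estimate of log evidence:", exp_loglik - kl)
```

For deep learning models the log-likelihood gradient has no such closed form, so it is obtained by backpropagation, but the structure of the bound and the gradient ascent loop stay the same.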

Keywords: classification, regression, deep learning, model selection, Bayesian inference, variational inference, complexity
Received: 30.09.2018
Publication date: 30.09.2018
Number of characters: 703