1. Grunwald P. A Tutorial Introduction to the Minimum Description Length Principle // Advances Minimum Descript. Length: Theory Appl. MIT Press, 2005.
2. Bishop C. Pattern Recognition and Machine Learning (Information Science and Statistics). Secaucus, USA: Springer-Verlag New York, Inc., 2006.
3. Graves A. Practical Variational Inference for Neural Networks // Advances Neural Inform. Proc. Syst. 24 / Ed. by J. Shawe-Taylor, R.S. Zemel, P.L. Bartlett et al. Curran Associat. Inc. 2011. P. 2348–2356.
4. Duvenaud D., Maclaurin D., Adams R. Early Stopping as Nonparametric Variational Inference // Artific. Intelligen. Statist. 2016. P. 1070–1077.
5. Salakhutdinov R., Hinton G. Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure // J. Machine Learning Res. Proc. Track. 2007. V. 2. P. 412–419.
6. Sutskever I., Vinyals O., Le Q. Sequence to Sequence Learning with Neural Networks // Advances Neural Inform. Proc. Syst. 27: Annual Conf. Neural Inform. Proc. Syst. 2014, December 8–13, 2014, Montreal, Quebec, Canada. 2014. P. 3104–3112.
7. MacKay D.J.C. Information Theory, Inference & Learning Algorithms. USA: Cambridge Univer. Press, 2002.
8. Maclaurin D., Duvenaud D., Adams R. Gradient-based Hyperparameter Optimization through Reversible Learning // Proc. 32 Int. Conf. Machine Learning (ICML-15). JMLR Workshop Conf. Proc. 2015.
9. Hernández-Lobato J.M., Adams R.P. Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks // Proc. 32 Int. Conf. Machine Learning. 2015. P. 1861–1869.
10. Kuznetsov M.P., Tokmakova A.A., Strijov V.V. Analytic and stochastic methods of structure parameter estimation // Informatica. 2016. V. 27. No. 3. P. 607–624.
11. Shang Y., Wah B. Global optimization for neural network training // Computer. 1996. Mar. V. 29. No. 3. P. 45–54.
12. Welling M., Teh Y. Bayesian Learning via Stochastic Gradient Langevin Dynamics // Proc. 28 Int. Conf. Machine Learning (ICML-11) / Ed. by Lise Getoor, Tobias Scheffer. ICML ’11. NY, USA: ACM, 2011. June. P. 681–688.
13. Dembo A., Cover T., Thomas J. Information theoretic inequalities // IEEE Transact. Inform. Theory. 1991. V. 37. No. 6. P. 1501–1518.
14. Altieri N., Duvenaud D. Variational Inference with Gradient Flows. URL: http://approximateinference.org/accepted/AltieriDuvenaud2015.pdf. Accessed: 15.03.2017.
15. Sato I., Nakagawa H. Approximation analysis of stochastic gradient Langevin dynamics by using Fokker–Planck equation and Ito process // Proc. 31 Int. Conf. Machine Learning (ICML-14). 2014. P. 982–990.
16. Li C., Chen C., Carlson D., Carin L. Preconditioned Stochastic Gradient Langevin Dynamics for deep neural networks // Proc. Thirtieth AAAI Conf. Artific. Intelligence / AAAI Press. 2016. P. 1788–1794.
17. Lichman M. UCI Machine Learning Repository. URL: http://archive.ics.uci.edu/ml. Accessed: 15.03.2017.
18. LeCun Y., Cortes C. MNIST handwritten digit database. 2010. URL: http://yann.lecun.com/exdb/mnist/.
19. Maclaurin D., Adams R.P. Firefly Monte Carlo: exact MCMC with subsets of data // Proc. 24 Int. Conf. Artific. Intelligence / AAAI Press. 2015. P. 4289–4295.
20. Code of the computational experiment. URL: svn.code.sf.net/p/mlalgorithms/code/Group074/Bakhteev2016Evidence. Accessed: 15.03.2017.
21. Lee J., Simchowitz M., Jordan M., Recht B. Gradient descent converges to minimizers // Univer. California, Berkeley. 2016. V. 1050.