Laplace's rule of succession in information geometry

Author(s): Yann Ollivier
DOI URL: http://dx.doi.org/10.1007/9783319250403_34
Video: http://www.youtube.com/watch?v=hBzvSaA9yRU
Slides: Ollivier_Laplace rule succession.pdf
Presentation: https://www.see.asso.fr/node/14288
Creative Commons Attribution-ShareAlike 4.0 International
Abstract:
When observing data $x_1, \ldots, x_t$ modelled by a probabilistic distribution $p_\theta(x)$, the maximum likelihood (ML) estimator $\theta^{\mathrm{ML}} = \arg\max_\theta \sum_{i=1}^{t} \ln p_\theta(x_i)$ cannot, in general, safely be used to predict $x_{t+1}$. For instance, for a Bernoulli process, if only “tails” have been observed so far, the probability of “heads” is estimated to be 0. (Thus, under the standard log-loss scoring rule, this results in infinite loss the first time “heads” appears.)
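
The Bernoulli failure case can be checked numerically. The following is a minimal sketch, not code from the paper: the function names and the 0/1 encoding of tails/heads are assumptions made for illustration, and the comparison uses the classical form of Laplace's rule of succession named in the title, (heads + 1) / (n + 2), applied to a sequence of five tails.

```python
import math

def ml_estimate(observations):
    """ML estimate of P(heads) for a Bernoulli process (assumed encoding: 1 = heads, 0 = tails)."""
    return sum(observations) / len(observations)

def laplace_estimate(observations):
    """Classical Laplace rule of succession: (heads + 1) / (n + 2)."""
    return (sum(observations) + 1) / (len(observations) + 2)

def log_loss(p_heads, outcome):
    """Log-loss -ln p(outcome); infinite if the outcome was assigned probability 0."""
    p = p_heads if outcome == 1 else 1.0 - p_heads
    return math.inf if p == 0.0 else -math.log(p)

# Only tails observed so far, then heads appears at step t + 1.
tails_only = [0, 0, 0, 0, 0]
ml_loss = log_loss(ml_estimate(tails_only), 1)
laplace_loss = log_loss(laplace_estimate(tails_only), 1)

print("ML estimate of P(heads):     ", ml_estimate(tails_only))
print("Laplace estimate of P(heads):", laplace_estimate(tails_only))
print("Log-loss on heads (ML):      ", ml_loss)
print("Log-loss on heads (Laplace): ", laplace_loss)
```

With the ML estimate, the first “heads” incurs infinite log-loss, whereas the Laplace estimate assigns it probability 1/7 and the loss stays finite (ln 7).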