Information Topology: Statistical Physics of Complex Systems and Data Analysis
Pierre Baudot, Inserm, UNIS U1072, Marseille, France.
Abstract:
The present analysis is based on the information cohomology framework developed in [1]. It relies on theorems that uniquely characterize the usual entropy as the first cohomology class and the multivariate mutual informations (Ik) as coboundaries, using finite (non-asymptotic) methods. Here [2], in a first part that theoretically justifies the functions estimated in the data analysis, we establish the coboundary nature of mutual informations more directly by computing the cohomology in higher degrees. This makes it possible to generalize statistical independence to arbitrary degree k, so that the zeros of the k-mutual information Ik coincide with the k-cocycle condition. In a second part, we develop the computationally tractable subcases of simplicial information cohomology. To represent such a structure, we define entropy (Hk) and information (Ik) landscapes, which form the basis of the data analysis. They consist of an exhaustive computation and representation of all the information elements of the lattice of subsets. We then define the space of paths on the simplicial information structure (the landscape), relating the existence of local minima to information inequalities and to the negativity of conditional information. Relating our analysis to n-body systems, we then propose a general expression of the free energy functional in purely informational terms, which holds for arbitrary empirical data. These minima make it possible to define the maximal-length chains that characterize a minimum free energy complex, and we provide a tractable algorithm to approximate this complex. The last part investigates how the analysis depends on the sample size and on the graining of the probability estimation. The methods and results presented here are partial: first, because for computational reasons we restricted the data analysis to tractable simplicial information and information path subcases; second, because we used a basic procedure to estimate probabilities from the data, which can also mask some of the statistical relationships; third, because of the small sample size used.
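The entropy and information landscapes described above can be illustrated with a minimal sketch. It assumes discrete samples, plug-in (empirical frequency) probability estimates, and the standard alternating-sum expression of Ik in terms of joint entropies. The function names (`joint_entropy`, `mutual_information`, `landscapes`) are hypothetical and are not taken from the INFOTOPO software.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(samples, subset):
    """Plug-in joint entropy, in bits, of the variables indexed by `subset`.

    `samples` is a list of tuples of discrete values, one tuple per observation.
    """
    counts = Counter(tuple(row[i] for i in subset) for row in samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(samples, subset):
    """I_k as the alternating sum of joint entropies over non-empty sub-subsets."""
    total = 0.0
    for size in range(1, len(subset) + 1):
        sign = (-1) ** (size - 1)
        for sub in combinations(subset, size):
            total += sign * joint_entropy(samples, sub)
    return total


def landscapes(samples):
    """Return {subset: (H_k, I_k)} for every non-empty subset of variables."""
    n_vars = len(samples[0])
    return {
        subset: (joint_entropy(samples, subset), mutual_information(samples, subset))
        for k in range(1, n_vars + 1)
        for subset in combinations(range(n_vars), k)
    }


# Toy data (an assumption for illustration): two fair bits and their XOR.
xor_samples = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
xor_landscape = landscapes(xor_samples)
```

On this toy data every pair has zero I2 while the triplet has negative I3, a simple case of the information negativity mentioned above. The enumeration visits all 2^n - 1 non-empty subsets, so the cost grows exponentially with the number of variables, which is why tractable subcases matter in practice.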
We present the multivariate mutual information (Ik) analysis of single-cell reverse-transcription quantitative PCR (sc-RTqPCR) data obtained from midbrain neurons, used to estimate the k-dimensional topology of their gene expression profiles [2]. 41 mRNAs were quantified, and statistical dependences in gene expression levels could be fully described for 21 genes. Ik analysis revealed a complex combinatorial structure, including modules of pairs, triplets and larger tuples (up to 6-tuples) sharing strong positive, negative or zero Ik, corresponding to co-varying, clustering and independent sets of genes, respectively. Ik analysis therefore simultaneously identified the heterogeneity of the cell population under study (negative Ik) and regulatory principles conserved across the population (homogeneity, positive Ik). Moreover, maximum information paths made it possible to determine the size and stability of such transcriptional modules. Ik analysis represents a new topological and statistical method of data analysis.
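Continuous expression levels have to be discretized before Ik can be estimated, and the abstract notes that the results depend on this graining. The sketch below is only an illustration under stated assumptions: equal-width binning (the source does not specify the binning scheme), plug-in entropies, and made-up values for two hypothetical genes. The names `discretize`, `entropy` and `pairwise_information` are illustrative and are not the INFOTOPO API.

```python
from collections import Counter
from math import log2


def discretize(values, n_bins):
    """Map continuous values to bin indices 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum would fall just past the last bin, so clamp it into it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Plug-in entropy, in bits, of a list of hashable labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def pairwise_information(x, y, n_bins):
    """I_2 = H(X) + H(Y) - H(X, Y) after graining both variables."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    return entropy(bx) + entropy(by) - entropy(list(zip(bx, by)))


# Made-up expression levels for two perfectly co-varying hypothetical genes.
gene_a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
gene_b = [2.0 * v + 1.0 for v in gene_a]

coarse = pairwise_information(gene_a, gene_b, n_bins=2)  # 2 levels per gene
fine = pairwise_information(gene_a, gene_b, n_bins=8)    # 8 levels per gene
```

With the same eight samples, the estimated I2 changes with the number of bins, because a finer graining puts each observation in its own bin. This is one simple way in which sample size and graining jointly affect the estimates.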
Joint work with Monica Tapia, Jean-Marc Goaillard, Daniel Bennequin [1], et al.
References:
[1] P. Baudot and D. Bennequin. The homological nature of entropy. Entropy, 17(5):3253–3318, 2015.
[2] M. Tapia, P. Baudot, M. Dufour, C. Formisano-Tréziny, S. Temporal, M. Lasserre, J. Gabert, K. Kobayashi, J.-M. Goaillard. Information topology of gene expression profile in dopaminergic neurons. doi: https://doi.org/10.1101/168740
Data analysis software: INFOTOPO, open source (GNU GPL), available at https://github.com/pierrebaudot/INFOTOPO