Find all the docs and tutorials for version 0.2.3 on the Read the Docs website:
N.B.: This is still an alpha release! Please send me your feedback: over the summer of 2020, I will polish the user interface, implement Hausdorff divergences, add support for meshes, images and volumes, and clean up the documentation.
The GeomLoss library provides efficient GPU implementations for:
Kernel norms (also known as Maximum Mean Discrepancies).
Hausdorff divergences, which are positive definite generalizations of the ICP loss, analogous to log-likelihoods of Gaussian Mixture Models.
Unbiased Sinkhorn divergences, which are cheap yet positive definite approximations of Optimal Transport (Wasserstein) costs.
These loss functions, defined between positive measures, are available through the custom PyTorch layers SamplesLoss, ImagesLoss and VolumesLoss which allow you to work with weighted point clouds (of any dimension), density maps and volumetric segmentation masks. Geometric losses come with three backends each:
A simple tensorized implementation, for small problems (< 5,000 samples).
A reference online implementation, with a linear (instead of quadratic) memory footprint, that can be used for finely sampled measures.
A very fast multiscale code, which uses an octree-like structure for large-scale problems in dimension <= 3.
GeomLoss is a simple interface for cutting-edge Optimal Transport algorithms.
Note, however, that SamplesLoss does not implement the Fast Multipole or Fast Gauss transforms. If you are aware of a well-packaged implementation of these algorithms on the GPU, please contact me!
The divergences implemented here are all symmetric, positive definite and therefore suitable for measure-fitting applications. For positive input measures 𝛼 and 𝛽, our Loss functions are such that
Loss(𝛼,𝛽) = Loss(𝛽,𝛼),
0 = Loss(𝛼,𝛼) ⩽ Loss(𝛼,𝛽),
0 = Loss(𝛼,𝛽) ⟺ 𝛼=𝛽.
GeomLoss can be used in a wide variety of settings, from shape analysis (LDDMM, optimal transport…) to machine learning (kernel methods, GANs…) and image processing. Details and examples are provided below:
GeomLoss is licensed under the MIT license.
Author and Contributors
Feel free to contact us for any bug report or feature request:
Related projects
You may be interested in:
The KeOps library, which provides efficient CUDA routines for point cloud processing, with full PyTorch support.
Rémi Flamary and Nicolas Courty’s Python Optimal Transport library, which provides a reference implementation of OT-related methods for small problems.
Bernhard Schmitzer’s Optimal Transport toolbox, which provides a reference multiscale solver for the OT problem, on the CPU.
OFFICIAL WEBSITE - DOWNLOADS - MANUALS
ACCELERATED DATA SCIENCE
The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs.
Learn more
SCALE OUT ON GPUS
Seamlessly scale from GPU workstations to multi-GPU servers and multi-node clusters with Dask.
Learn more about Dask
PYTHON INTEGRATION
Accelerate your Python data science toolchain with minimal code changes and no new tools to learn.
TOP MODEL ACCURACY
Increase machine learning model accuracy by iterating on models faster and deploying them more frequently.
REDUCED TRAINING TIME
Drastically improve your productivity with more interactive data science.
Learn more about XGBoost
OPEN SOURCE
RAPIDS is an open source project. Supported by NVIDIA, it also relies on Numba, Apache Arrow, and many more open source projects.
Learn more
It provides the following solvers:
Some demonstrations (both in Python and Jupyter Notebook format) are available in the examples folder.
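Underneath, POT's exact solvers address a linear program: minimize ⟨T, M⟩ over couplings T with prescribed marginals. Here is a toy sketch of that same problem using SciPy's generic LP solver (POT's dedicated network-simplex routine, `ot.emd`, is far faster in practice):

```python
import numpy as np
from scipy.optimize import linprog

# Two discrete measures with uniform weights and a squared-distance cost.
x = np.array([0.0, 1.0, 2.0])             # support of alpha
y = np.array([0.5, 1.5, 2.5])             # support of beta
a = np.full(3, 1 / 3)                     # weights of alpha
b = np.full(3, 1 / 3)                     # weights of beta
M = (x[:, None] - y[None, :]) ** 2        # cost matrix

# Minimize <T, M> subject to T @ 1 = a, T.T @ 1 = b, T >= 0.
n, m = M.shape
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0      # row sums equal a
for j in range(m):
    A_eq[n + j, j::m] = 1.0               # column sums equal b
res = linprog(M.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
              bounds=(0, None))
T = res.x.reshape(n, m)                   # optimal transport plan
print(round(res.fun, 6))                  # 0.25: each xi shifts to its yi
```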
Using and citing the toolbox
If you use this toolbox in your research and find it useful, please cite POT using the following bibtex reference:
@misc{flamary2017pot,
title={POT Python Optimal Transport library},
author={Flamary, R{\'e}mi and Courty, Nicolas},
url={https://github.com/rflamary/POT},
year={2017}
}
Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction. The algorithm is founded on three assumptions about the data:
The data is uniformly distributed on a Riemannian manifold;
The Riemannian metric is locally constant (or can be approximated as such);
The manifold is locally connected.
From these assumptions it is possible to model the manifold with a fuzzy topological structure. The embedding is found by searching for a low dimensional projection of the data that has the closest possible equivalent fuzzy topological structure.
The details of the underlying mathematics can be found in our paper on arXiv:
The important thing is that you don't need to worry about that -- you can use UMAP right now for dimension reduction and visualisation, as easily as a drop-in replacement for scikit-learn's t-SNE.
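Because UMAP follows scikit-learn's estimator API, swapping it in for t-SNE is a one-line change. A small sketch on a subset of the scikit-learn digits data (the commented UMAP line assumes the umap-learn package is installed, so the runnable part only requires scikit-learn):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X = X[:100]                               # small subset to keep it fast

# scikit-learn's t-SNE...
emb = TSNE(n_components=2, perplexity=5.0,
           init="random", random_state=0).fit_transform(X)
print(emb.shape)                          # (100, 2)

# ...and the drop-in UMAP swap (same fit_transform contract);
# uncomment if umap-learn is installed:
# import umap
# emb = umap.UMAP(n_neighbors=10, min_dist=0.001).fit_transform(X)
```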
Documentation is available via ReadTheDocs.
Installation, licence and usage information is available on
GITHUB - OFFICIAL WEBSITE (https://github.com/lmcinnes/umap)
Benefits of UMAP
UMAP has a few significant wins in its current incarnation.
Performance and Examples
UMAP is very efficient at embedding large high dimensional datasets. In particular it scales well with both input dimension and embedding dimension. Thus, for a problem such as the 784-dimensional MNIST digits dataset with 70000 data samples, UMAP can complete the embedding in around 2.5 minutes (as compared with around 45 minutes for most t-SNE implementations). Despite this runtime efficiency UMAP still produces high quality embeddings.
The obligatory MNIST digits dataset, embedded in 2 minutes and 22 seconds using a 3.1 GHz Intel Core i7 processor (n_neighbors=10, min_dist=0.001):
UMAP embedding of MNIST digits
The MNIST digits dataset is fairly straightforward, however. A better test is the more recent "Fashion MNIST" dataset of images of fashion items (again 70,000 data samples in 784 dimensions). UMAP produced this embedding in 2 minutes exactly (n_neighbors=5, min_dist=0.1):
UMAP embedding of "Fashion MNIST"
The UCI shuttle dataset (43,500 samples in 8 dimensions) embeds well under correlation distance in 2 minutes and 39 seconds (note the longer time required for correlation distance computations):
UMAP embedding the UCI Shuttle dataset
The Numerical Tours of Data Sciences
The Numerical Tours of Data Sciences, by Gabriel Peyré, gather Matlab, Python, Julia and R experiments to explore modern mathematical data sciences. They cover data sciences in a broad sense, including imaging, machine learning, computer vision and computer graphics, and showcase applications of numerical and mathematical methods such as convex optimization, PDEs, optimal transport, inverse problems, sparsity, etc. The tours are complemented by course slides detailing the theory and the algorithms.
Numerical Tours now in R
Link - 35 R tours available
Posted by Gabriel Peyré on February 26, 2018
Numerical Tours on Machine Learning
Link - 4 new Matlab and Python tours
Posted by Gabriel Peyré on August 11, 2017
Numerical Tours now in Julia
Link - 30 Julia tours available
Posted by Gabriel Peyré on August 5, 2017
Numerical Tours now in Python
Link - 30 Python tours available
Posted by Gabriel Peyré on September 17, 2016
New Python Tours
Optimization by Laurent Condat
Posted by Gabriel Peyré on June 14, 2016
Numerical Tours now in R
OFFICIAL WEBPAGE
These are the R tours, which can be browsed as HTML pages but can also be downloaded as Jupyter notebooks. Please read the installation page for more information about how to run these tours.
Basics
Wavelets
Approximation, Coding and Compression
Denoising
Inverse Problems
Optimization
Machine Learning
Shapes
Audio Processing
Computer Graphics
Mesh Parameterization and Deformation
Geodesic Processing
Optimal Transport
OFFICIAL WEBPAGE
These are the Python tours, which can be browsed as HTML pages but can also be downloaded as Jupyter notebooks. Please read the installation page for more information about how to run these tours.
Basics
Wavelets
Approximation, Coding and Compression
Denoising
Inverse Problems
Optimization
Shapes
Audio Processing
Computer Graphics
Mesh Parameterization and Deformation
Geodesic Processing
Optimal Transport
Machine Learning
pyRiemann is a Python machine learning library based on the scikit-learn API. It provides a high-level interface for the classification and manipulation of multivariate signals through the Riemannian geometry of covariance matrices.
pyRiemann aims to be a generic package for multivariate signal classification, but it has been designed around biosignal (M/EEG, EMG, etc.) classification applications.
For a brief introduction to the ideas behind the package, you can read the introductory notes. More practical information is on the installation page. You may also want to browse the example gallery to get a sense for what you can do with pyRiemann and then check out the tutorial and API reference to find out how.
To see the code or report a bug, please visit the GitHub repository.
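To give a flavour of the underlying geometry, here is a sketch of the affine-invariant Riemannian distance between covariance (SPD) matrices, the kind of metric such classifiers are built on (an illustrative helper, not pyRiemann's API):

```python
import numpy as np
from scipy.linalg import eigvalsh

def airm_distance(A, B):
    # Affine-invariant Riemannian distance between SPD matrices:
    #   d(A, B) = sqrt(sum_i log^2(lambda_i)),
    # where lambda_i are the generalized eigenvalues of (B, A),
    # i.e. the eigenvalues of inv(A) @ B.
    lam = eigvalsh(B, A)              # solves B v = lam * A v
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

A = np.eye(2)
B = np.diag([np.e, np.e])             # eigenvalues e, e relative to A
print(round(airm_distance(A, B), 4))  # sqrt(1 + 1) ≈ 1.4142
```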
Documentation
OFFICIAL WEBPAGE - INFORMATION - DOWNLOADS
Contact us by email at admin [at] ramp.studio if you are interested in using the platform in your classroom, as an internal tool for prototyping in your data science team, or to launch a data challenge. Consider joining our slack team if you would like to be part of the growing community of rampers.
Paris-Saclay Center of Data Science: RAMPs are organized by the Paris Saclay Center for Data Science: A multi-disciplinary initiative to define, structure, and manage the data science ecosystem at the University Paris-Saclay.
What is a RAMP?
A RAMP is a collaborative data challenge. See here for more details.
Bibliography:
Team
Alumni
If you are new to the project, don't forget to add your name there!
OFFICIAL WEBSITE - TUTORIAL - DOWNLOAD
Description
pyMEF is a Python framework for manipulating, learning, simplifying and comparing mixtures of exponential families. It is designed to ease the use of various exponential families in mixture models.
See also jMEF for a Java implementation of the same kind of library and libmef for a faster C implementation.
What are exponential families?
An exponential family is a generic set of probability distributions that admit the following canonical form:
p_F(x; 𝜃) = exp( ⟨ t(x), 𝜃 ⟩ − F(𝜃) + k(x) ),
where t(x) is the sufficient statistic, 𝜃 the natural parameter, F the log-normalizer and k(x) the carrier measure.
Exponential families are characterized by the log normalizer function F, and include the following well-known distributions: Gaussian (generic, isotropic Gaussian, diagonal Gaussian, rectified Gaussian or Wald distributions, lognormal), Poisson, Bernoulli, binomial, multinomial, Laplacian, Gamma (incl. chi-squared), Beta, exponential, Wishart, Dirichlet, Rayleigh, probability simplex, negative binomial distribution, Weibull, von Mises, Pareto distributions, skew logistic, etc.
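As a concrete instance, the Bernoulli distribution written in exponential-family canonical form (a standard textbook example, not pyMEF code): t(x) = x, natural parameter 𝜃 = log(p/(1-p)), and log-normalizer F(𝜃) = log(1 + exp(𝜃)).

```python
import math

def bernoulli_pmf(x, theta):
    # Bernoulli in canonical exponential-family form:
    # p(x; theta) = exp(x * theta - F(theta)), F(theta) = log(1 + e^theta).
    F = math.log1p(math.exp(theta))
    return math.exp(x * theta - F)

p = 0.3
theta = math.log(p / (1 - p))         # natural parameter for p = 0.3
print(bernoulli_pmf(1, theta))        # recovers p = 0.3
print(bernoulli_pmf(0, theta))        # recovers 1 - p = 0.7
```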
Mixtures of exponential families provide a generic framework for handling Gaussian mixture models (GMMs also called MoGs for mixture of Gaussians), mixture of Poisson distributions, and Laplacian mixture models as well.
Tutorials
A generic tutorial on exponential families and the simplification of mixture models was given during the workshop Matrix Information Geometries.
More pyMEF specific tutorials are available here:
Basic manipulation of mixture models
Bibliography
“Intelligence is the faculty of manufacturing artificial objects, especially tools to make tools, and of indefinitely varying the manufacture.” Henri Bergson
GSI forge presents and lists packages and software, usually open source (Python packages with their associated GitHub repositories, R packages with their associated CRAN repositories), that can be useful in the statistical and informational analysis of data with a geometrical or topological approach.
Venus at the Forge of Vulcan, Le Nain Brothers, Musée Saint-Denis, Reims (Vulcan is the god of fire and god of metalworking and the forge, often depicted with a blacksmith’s hammer).
Cartan's father Joseph (1837-1917) was born in the village of Saint Victor de Morestel, which is 13 kilometers from Dolomieu. After he married Anne Cottaz (1841-1927) the family settled in Dolomieu, where Anne had lived. Joseph Cartan was the village blacksmith. Elie Cartan recalled that his childhood had passed under "blows of the anvil, which started every morning from dawn", and that "his mother, during those rare minutes when she was free from taking care of the children and the house, was working with a spinning-wheel".
The village Dolomieu.
The Computational Geometry Algorithms Library
OFFICIAL WEBSITE - MANUAL - TUTORIAL - REFERENCES
CGAL (Computational Geometry Algorithms Library) is a comprehensive library of geometric algorithms. The goal of CGAL is to advance the state of the art of geometric computing and to offer robust and efficient programs for research and industrial applications. The initial development of CGAL was a joint effort of six groups in Europe, partially funded by European projects. The library consists of about 1,000,000 lines of C++ code, with users all over the world. Since November 2003, CGAL has been an Open Source Project. The spin-off Geometry Factory sells CGAL commercial licenses, support for CGAL and customized developments based on CGAL.
The library offers data structures and algorithms like triangulations, Voronoi diagrams, Boolean operations on polygons and polyhedra, point set processing, arrangements of curves, surface and volume mesh generation, geometry processing, alpha shapes, convex hull algorithms, shape analysis, AABB and KD trees...
Learn more about CGAL by browsing through the Package Overview.
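For a quick feel of two of these primitives in Python, SciPy wraps the Qhull library for convex hulls and Delaunay triangulations (an illustration only; CGAL itself is a C++ library with its own, more robust geometric kernels):

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

# Four corners of a square plus one interior point.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.4]])

hull = ConvexHull(pts)                          # Qhull under the hood
print(sorted(int(v) for v in hull.vertices))    # [0, 1, 2, 3]: point 4 is interior

tri = Delaunay(pts)
print(len(tri.simplices))                       # 4 triangles around the interior point
```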
OFFICIAL WEBSITE - DOWNLOAD - RESEARCH - MANUAL
GUDHI – Geometry Understanding in Higher Dimensions
The GUDHI library is a generic open source C++ library, with a Python interface, for Topological Data Analysis (TDA) and Higher Dimensional Geometry Understanding. The library offers state-of-the-art data structures and algorithms to construct simplicial complexes and compute persistent homology.
The library comes with data sets, demos, examples and test suites.
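As a taste of what persistent homology computes, the 0-dimensional persistence of a Vietoris-Rips filtration can be read off a minimum spanning tree: connected components die exactly at the MST edge lengths. A NumPy/SciPy sketch (GUDHI's RipsComplex and SimplexTree handle this, and higher homology dimensions, for real workloads):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

# Two pairs of nearby points, far from each other.
pts = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
D = squareform(pdist(pts))                 # pairwise distance matrix
mst = minimum_spanning_tree(D).toarray()   # MST of the full distance graph
deaths = np.sort(mst[mst > 0])             # H0 death times of the filtration
print(deaths)                              # [1. 1. 5.]: the two clusters merge at 5
```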
The GUDHI library is developed as part of the GUDHI project supported by the European Research Council.
The new release Gudhi 3.1.1 is a bug-fix release. In particular, it fixes the installation of the Python representation module.
INFOTOPO - Topological Information Data Analysis. Deep statistical unsupervised and supervised learning.
The INFOTOPO library is a generic open source suite of Python programs (compatible with Python 3.4.x, on Linux, Windows or Mac) for Information Topological Data Analysis. It is available in a GitHub repository. The library offers state-of-the-art statistical analysis of high-dimensional data structures and algorithms to detect covarying patterns and clusters, as well as multiscale data analysis.
New release (easy to use: scikit-learn compatible format, installable with pip; August 2020):
INFOTOPO version 0.1
You can find the software on GitHub.
InfoTopo is a Machine Learning method based on Information Cohomology, a cohomology of statistical systems [0,1,8,9]. It allows one to estimate higher-order statistical structures, dependences and (refined) independences or generalised (possibly non-linear) correlations, and to uncover their structure as a simplicial complex. It provides estimations of the basic information functions: entropy, joint and conditional entropy, multivariate Mutual Information (MI) and conditional MI, Total Correlations…
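The most basic of these quantities can be computed directly from a joint distribution; for two variables, I(X;Y) = H(X) + H(Y) - H(X,Y). A minimal NumPy illustration (not the InfoTopo estimators, which handle many variables and estimate distributions from data):

```python
import numpy as np

def entropy(p):
    # Shannon entropy in bits of a probability vector.
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(pxy):
    # I(X;Y) = H(X) + H(Y) - H(X,Y) from the joint distribution table.
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy)

mi_indep = mutual_information(np.array([[0.25, 0.25],
                                        [0.25, 0.25]]))   # independent fair bits
mi_corr = mutual_information(np.array([[0.5, 0.0],
                                       [0.0, 0.5]]))      # identical bits
print(mi_indep, mi_corr)                                  # 0.0 1.0
```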
InfoTopo is at the crossroads of Topological Data Analysis, Deep Neural Network learning, statistical physics and complex systems:
It assumes basically:
The details for the underlying mathematics and methods can be found in the papers:
[0] Manin, Y., Marcolli, M., Homotopy Theoretic and Categorical Models of Neural Information Networks, 2020, arXiv:2006.15136, PDF-0
[1] Vigneaux J., Topology of Statistical Systems. A Cohomological Approach to Information Theory. Ph.D. Thesis, Paris 7 Diderot University, Paris, France, June 2019. PDF-1
[2] Baudot P., Tapia M., Bennequin, D. , Goaillard J.M., Topological Information Data Analysis. 2019, Entropy, 21(9), 869 PDF-2
[3] Baudot P., The Poincaré-Shannon Machine: Statistical Physics and Machine Learning aspects of Information Cohomology. 2019, Entropy , 21(9), PDF-3
[4] Baudot P. , Bernardi M., The Poincaré-Boltzmann Machine: passing the information between disciplines, ENAC Toulouse France. 2019 PDF-4
[5] Baudot P. , Bernardi M., Information Cohomology methods for learning the statistical structures of data. DS3 Data Science, Ecole Polytechnique 2019 PDF-5
[6] Tapia M., Baudot P., Dufour M., Formizano-Treziny C., Temporal S., Lasserre M., Kobayashi K., Goaillard J.M.. Neurotransmitter identity and electrophysiological phenotype are genetically coupled in midbrain dopaminergic neurons. Scientific Reports. 2018. PDF-6
[7] Baudot P., Elements of qualitative cognition: an Information Topology Perspective. Physics of Life Reviews. 2019. extended version on Arxiv. PDF-7
[8] Baudot P., Bennequin D., The homological nature of entropy. Entropy, 2015, 17, 1-66; doi:10.3390. PDF-8
[9] Baudot P., Bennequin D., Topological forms of information. AIP conf. Proc., 2015. 1641, 213. PDF-9
The previous version of the software INFOTOPO: the 2013-2017 scripts are available at the GitHub infotopo repository.
The INFOTOPO library is developed as part of the Channelomics project supported by the European Research Council and developed at UNIS Inserm 1072, with thanks for previous support and hosting, since 2007, from the Max Planck Institute for Mathematics in the Sciences (MPI-MIS), the Complex Systems Institute Paris-Île-de-France (ISC-PIF) and the Institut de Mathématiques de Jussieu - Paris Rive Gauche (IMJ-PRG).
OFFICIAL WEBSITE - INSTALLATION - TUTORIALS - MANUAL
A Matlab toolbox for optimization on manifolds
Optimization on manifolds is a powerful paradigm to address nonlinear optimization problems. With Manopt, it is easy to deal with various types of symmetries and constraints which arise naturally in applications, such as orthonormality and low rank.
Manifolds?
Manifolds are mathematical sets with a smooth geometry, such as spheres. If you are facing a nonlinear (and possibly nonconvex) optimization problem with nice-looking constraints, symmetries or invariance properties, Manopt may just be the tool for you. Check out the manifolds library to find out!
Key features
Manopt comes with a large library of manifolds and ready-to-use Riemannian optimization algorithms. It is well documented and includes diagnostics tools to help you get started quickly. It provides flexibility in describing your cost function and incorporates an optional caching system for more efficiency.
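The core loop of a Riemannian solver is easy to sketch. Below, gradient ascent on the unit sphere (one of the simplest manifolds in Manopt's library) recovers a dominant eigenvector; Manopt itself is a Matlab toolbox, so this NumPy sketch only illustrates the project-then-retract pattern its solvers automate:

```python
import numpy as np

# Maximizing x' A x over the unit sphere yields the dominant eigenvector of A.
A = np.diag([3.0, 2.0, 1.0, 0.0, -1.0])   # toy cost matrix; lambda_max = 3
x = np.ones(5) / np.sqrt(5.0)             # initial point on the sphere

for _ in range(1000):
    egrad = 2.0 * A @ x                   # Euclidean gradient of x'Ax
    rgrad = egrad - (x @ egrad) * x       # project onto the tangent space at x
    x = x + 0.05 * rgrad                  # take an ascent step
    x = x / np.linalg.norm(x)             # retract back onto the sphere

print(round(float(x @ A @ x), 6))         # ≈ 3.0, the largest eigenvalue
```

Manopt's actual solvers (trust regions, conjugate gradients, etc.) replace this fixed-step loop with globally convergent line searches and model-based steps.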
It's open source
Check out the license and let us know how you use Manopt. Please cite this paper if you publish work using Manopt (bibtex).
OFFICIAL WEBSITE - DOCUMENTATION - INSTALLATION