• Geo-Sci-Info

    MATHEMATICAL FOUNDATIONS OF MACHINE LEARNING
    Associate Prof. Dr.Sc. Hông Vân Lê

    DOWNLOAD PDF of the flyer of the course
    DOWNLOAD PDF of the LECTURE NOTES of the course

    Machine learning is an interdisciplinary field at the intersection of mathematical statistics and computer science. It studies statistical models and algorithms for deriving predictors or meaningful patterns from empirical data. Machine learning techniques are applied in search engines, speech recognition and natural language processing, image detection, robotics, etc. In our course we address the following questions:
    What is the mathematical model of learning? How can we quantify the difficulty (hardness, complexity) of a learning problem? How should we choose a learning model and a learning algorithm? How do we measure the success of machine learning?
    The syllabus of our course:

    1. Supervised learning, unsupervised learning
    2. Generalization ability of machine learning
    3. Support vector machine, Kernel machine
    4. Neural networks and deep learning
    5. Bayesian machine learning and Bayesian networks.

    Recommended Literature.

    1. S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning:
      From Theory to Algorithms, Cambridge University Press, 2014.
    2. S. Theodoridis, Machine Learning: A Bayesian and Optimization
      Perspective, Elsevier, 2015.
    3. M. Mohri, A. Rostamizadeh, A. Talwalkar, Foundations of Machine
      Learning, MIT Press, 2012.
    4. H. V. Lê, Mathematical foundations of machine learning, lecture notes,
      http://users.math.cas.cz/hvle/MFML.pdf

    During the course we shall discuss topics for a term paper, which
    can count as the exam.

    The first meeting will take place at 10:40 AM on Thursday, October 3, 2019, in the seminar room MU MFF UK (3rd floor). Anybody
    interested in the lecture course is welcome to contact me by email at hvle [at] math.cas.cz
    to arrange a more suitable lecture time.

    Location: Institute of Mathematics of the Czech Academy of Sciences, Zitna 25, 11567 Praha 1, Czech Republic

    CALENDAR OF FUTURE COURSE

    Lecture course (NMAG 469, Fall term 2019-2020)

    • Mathematical foundations of machine learning. First meeting: Thursday, October 3, 10:40-12:10, in the seminar room MU MFF UK (3rd floor).

    CONTENTS

    1. Learning, machine learning and artificial intelligence
      1.1. Learning, inductive learning and machine learning
      1.2. A brief history of machine learning
      1.3. Current tasks and types of machine learning
      1.4. Basic questions in mathematical foundations of machine
      learning
      1.5. Conclusion
    2. Statistical models and frameworks for supervised learning
      2.1. Discriminative model of supervised learning
      2.2. Generative model of supervised learning
      2.3. Empirical Risk Minimization and overfitting
      2.4. Conclusion
    3. Statistical models and frameworks for unsupervised learning and
      reinforcement learning
      3.1. Statistical models and frameworks for density estimation
      3.2. Statistical models and frameworks for clustering
      3.3. Statistical models and frameworks for dimension reduction and
      manifold learning
      3.4. Statistical model and framework for reinforcement learning
      3.5. Conclusion
    4. Fisher metric and maximum likelihood estimator
      4.1. The space of all probability measures and total variation norm
      4.2. Fisher metric on a statistical model
      4.3. The Fisher metric, MSE and Cramér-Rao inequality
      4.4. Efficient estimators and MLE
      4.5. Consistency of MLE
      4.6. Conclusion
    5. Consistency of a learning algorithm
      5.1. Consistent learning algorithm and its sample complexity
      5.2. Uniformly consistent learning and VC-dimension
      5.3. Fundamental theorem of binary classification
      5.4. Conclusions
    6. Generalization ability of a learning machine and model selection
      6.1. Covering number and sample complexity
      6.2. Rademacher complexities and sample complexity
      6.3. Model selection
      6.4. Conclusion
    7. Support vector machines
      7.1. Linear classifier and hard SVM
      7.2. Soft SVM
      7.3. Sample complexities of SVM
      7.4. Conclusion
    8. Kernel based SVMs
      8.1. Kernel trick
      8.2. PSD kernels and reproducing kernel Hilbert spaces
      8.3. Kernel based SVMs and their generalization ability
      8.4. Conclusion
    9. Neural networks
      9.1. Neural networks as computing devices
      9.2. The expressive power of neural networks
      9.3. Sample complexities of neural networks
      9.4. Conclusion
    10. Training neural networks
      10.1. Gradient and subgradient descent
      10.2. Stochastic gradient descent (SGD)
      10.3. Online gradient descent and online learnability
      10.4. Conclusion
    11. Bayesian machine learning
      11.1. Bayesian concept of learning
      11.2. Estimating decisions using posterior distributions
      11.3. Bayesian model selection
      11.4. Conclusion
      Appendix A. Some basic notions in probability theory
      A.1. Dominating measures and the Radon-Nikodym theorem
      A.2. Conditional expectation and regular conditional measure
      A.3. Joint distribution and Bayes’ theorem
      A.4. Transition measure, Markov kernel, and parameterized
      statistical model
      Appendix B. Concentration-of-measure inequalities
      B.1. Markov’s inequality
      B.2. Hoeffding’s inequality
      B.3. Bernstein’s inequality
      B.4. McDiarmid’s inequality
      References

    posted in Mathematical Foundations of Machine Learning - Online Course - Hong Van Le
  • Geo-Sci-Info

    • Introduction to Symplectic Geometry, Jean-Louis Koszul
      (reissue), Springer, 2019
      Offers a unique and unified overview of symplectic geometry; highlights the differential properties of symplectic manifolds; of great interest for the emerging field of "Geometric Science of Information".
      This introductory book offers a unique and unified overview of symplectic geometry, highlighting the differential properties of symplectic manifolds. It consists of six chapters: Some Algebra Basics, Symplectic Manifolds, Cotangent Bundles, Symplectic G-spaces, Poisson Manifolds, and A Graded Case, concluding with a discussion of the differential properties of graded symplectic manifolds of dimensions (0,n). It is a useful reference resource for students and researchers interested in geometry, group theory, analysis and differential equations. This book is also inspiring in the emerging field of Geometric Science of Information, in particular the chapter on Symplectic G-spaces, where Jean-Louis Koszul develops Jean-Marie Souriau's tools related to the non-equivariant case of co-adjoint action on Souriau’s moment map through Souriau’s Cocycle, opening the door to Lie Group Machine Learning with Souriau-Fisher metric.

    posted in Preprints - Books - Archivs - Journal special edition (Entropy...)
  • Geo-Sci-Info

    Special Issue: "Lie Group Machine Learning and Lie Group Structure Preserving Integrators" Entropy MDPI

    OFFICIAL WEBSITE

    Download Flyer

    Machine learning and deep learning are being extended to more abstract spaces such as graphs, differential manifolds, and structured data. Recent fruitful exchanges between geometric science of information and Lie group theory have opened new perspectives for extending machine learning to Lie groups. After the foundation of Lie group theory by Sophus Lie, Felix Klein, and Henri Poincaré, and building on Wilhelm Killing's study of Lie algebras, Élie Cartan achieved the classification of simple real Lie algebras and introduced the affine representation of Lie groups and algebras, which was applied systematically by Jean-Louis Koszul. In parallel, noncommutative harmonic analysis for non-Abelian groups has been addressed with the orbit method (coadjoint representation of a group) by many contributors (Jacques Dixmier, Alexander Kirillov, etc.). In physics, Valentine Bargmann, Jean-Marie Souriau, and Bertram Kostant brought the basic concepts of symplectic geometry to geometric mechanics, such as the KKS symplectic form on coadjoint orbits and the notion of the momentum map associated with the action of a Lie group. Using these tools, Souriau also developed the theory of Lie group thermodynamics based on coadjoint representations. This set of tools could be revisited in the framework of Lie group machine learning to develop new schemes for processing structured data.

    Structure preserving integrators are numerical algorithms that are specifically designed to preserve the geometric properties of the flow of the differential equation, such as invariants, (multi-)symplecticity, volume preservation, as well as the configuration manifold. As a consequence, such algorithms have proven to be highly superior in correctly reproducing the global qualitative behavior of the system. Structure-preserving methods have recently undergone significant development and constitute today a privileged road in building numerical algorithms with high reliability and robustness in various areas of computational mathematics. In particular, the capability for long-term computation makes these methods particularly well adapted to deal with the new opportunities and challenges offered by scientific computations. Among the different ways to construct such numerical integrators, the application of variational principles (such as Hamilton’s variational principle and its generalizations) has appeared to be very powerful, since it is very constructive and because of its wide range of applicability.
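
As a small illustration of the qualitative difference described above, the sketch below compares the explicit Euler method with the symplectic Euler method on a harmonic oscillator with Hamiltonian H(q, p) = (q² + p²)/2. The test problem, step size, and function names are assumptions chosen for illustration; they do not come from the Special Issue itself.

```python
def explicit_euler(q, p, h):
    """One explicit Euler step for dq/dt = p, dp/dt = -q (not symplectic)."""
    return q + h * p, p - h * q


def symplectic_euler(q, p, h):
    """One symplectic Euler step: update p first, then q with the new p."""
    p_new = p - h * q
    q_new = q + h * p_new
    return q_new, p_new


def energy(q, p):
    """Hamiltonian of the unit harmonic oscillator."""
    return 0.5 * (q * q + p * p)


def simulate(step, q0=1.0, p0=0.0, h=0.1, n_steps=1000):
    """Return the energy at every step of the trajectory."""
    q, p = q0, p0
    energies = [energy(q, p)]
    for _ in range(n_steps):
        q, p = step(q, p, h)
        energies.append(energy(q, p))
    return energies


# The exact flow keeps the energy at its initial value 0.5.
drift_explicit = max(simulate(explicit_euler)) - 0.5
drift_symplectic = max(abs(e - 0.5) for e in simulate(symplectic_euler))
print("explicit Euler, largest energy excess:  ", drift_explicit)
print("symplectic Euler, largest energy error: ", drift_symplectic)
```

With explicit Euler the energy is multiplied by (1 + h²) at every step and grows without bound, whereas symplectic Euler exactly conserves the nearby quantity q² + p² − h·q·p, so its energy error stays bounded over arbitrarily long runs.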

    An important specific situation encountered in a wide range of applications going from multibody dynamics to nonlinear beam dynamics and fluid mechanics is the case of ordinary and partial differential equations on Lie groups. In this case, one can additionally take advantage of the rich geometric structure of the Lie group and its Lie algebra for the construction of the integrators. Structure preserving integrators that preserve the Lie group structure have been studied from many points of view and with several extensions to a wide range of situations, including forced, controlled, constrained, nonsmooth, stochastic, or multiscale systems, in both the finite and infinite dimensional Lie group setting. They also naturally find applications in the extension of machine learning and deep learning algorithms to Lie group data.
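
A minimal sketch of the idea of preserving the Lie group structure, assuming the simplest possible setting: a rotation matrix R(t) in SO(3) evolving as dR/dt = R·hat(ω) with constant body angular velocity ω. The Lie-Euler update multiplies by the exponential of a Lie algebra element (computed with the Rodrigues formula), so the iterate stays on the group, while a plain explicit Euler update drifts away from it. All names and the test problem are illustrative assumptions, not a method taken from the call for papers.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def identity():
    return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]


def hat(w):
    """Map a vector in R^3 to the skew-symmetric matrix in so(3)."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def exp_so3(w):
    """Rodrigues formula: exp(hat(w)) = I + a*K + b*K^2 with K = hat(w)."""
    theta = math.sqrt(sum(c * c for c in w))
    K = hat(w)
    K2 = matmul(K, K)
    if theta < 1e-12:
        a, b = 1.0, 0.5  # limits of sin(t)/t and (1 - cos t)/t^2 as t -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    I = identity()
    return [[I[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)]
            for i in range(3)]


def orthogonality_error(R):
    """Largest entry of |R^T R - I|; zero exactly when R is orthogonal."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    RtR = matmul(Rt, R)
    I = identity()
    return max(abs(RtR[i][j] - I[i][j]) for i in range(3) for j in range(3))


def integrate(omega, h, n_steps, lie_group=True):
    """Integrate dR/dt = R hat(omega) from R(0) = I with step size h."""
    hw = [h * c for c in omega]
    if lie_group:
        step = exp_so3(hw)  # Lie-Euler: the update is itself a rotation
    else:
        K, I = hat(hw), identity()
        step = [[I[i][j] + K[i][j] for j in range(3)] for i in range(3)]
    R = identity()
    for _ in range(n_steps):
        R = matmul(R, step)
    return R


omega = (0.3, -0.2, 0.5)
print("Lie-Euler error:     ", orthogonality_error(integrate(omega, 0.05, 200)))
print("explicit Euler error:", orthogonality_error(integrate(omega, 0.05, 200, lie_group=False)))
```

The Lie-Euler iterate remains orthogonal up to rounding error, while the explicit Euler iterate leaves SO(3) at a rate that grows with the number of steps.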

    This Special Issue will collect extended versions of papers presented at the GSI'19 "Geometric Science of Information" conference (www.gsi2019.org), but it is not limited to these authors and is open to the international communities involved in research on Lie group machine learning and Lie group structure-preserving integrators.

    Guest Editors

    • Prof. Frédéric Barbaresco
    • Prof. Elena Celledoni
    • Prof. François Gay-Balmaz
    • Prof. Joël Bensoam

    Manuscript Submission Information

    Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on the website.

    Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

    Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

    Keywords

    • Lie groups machine learning
    • orbits method
    • symplectic geometry
    • geometric integrator
    • symplectic integrator
    • Hamilton’s variational principle

    posted in Preprints - Books - Archivs - Journal special edition (Entropy...)
  • Geo-Sci-Info

    OFFICIAL WEBSITE

    The Conference on Stochastic Geometry will be held at the Euler Mathematical Institute on September 16-20, 2019.

    The Conference is organized and sponsored by:

    • Euler Mathematical Institute
    • Chebyshev Laboratory of St. Petersburg State University

    The goal of the conference is to bring together researchers with experience in stochastic geometry and/or stochastic processes, to exchange ideas, and to stimulate new collaborations.

    Organizing committee:

    • I. Ibragimov
    • Yu. Davydov
    • D. Zaporozhets

    Confirmed Invited Speakers (as of March 1, 2019):

    • Alexander Bufetov
    • Pierre Calka
    • Nicolas Chenavier
    • David Coupier
    • Serguei Dachian
    • Sergey Foss
    • Friedrich Goetze
    • Francesca Greselin
    • Julian Grote
    • Anna Gusakova
    • Raphael Lachieze-Rey
    • Guenter Last
    • Alexander Litvak
    • Julien Randon-Furling
    • Zhan Shi
    • Evgeny Spodarev
    • Joseph Yukich
    • Vladislav Vysotsky
    • Elisabeth Werner
    • Sergei Zuyev

    Local coordinators:
    Nadia Zaleskaya, Tatiana Vinogradova, Natalia Kirshner:
    eimiadm[at]imi.ras.ru

    posted in Stochastic Geometry
  • Geo-Sci-Info

    OFFICIAL WEBSITE

    The Rencontres de Probabilités 2019 in Rouen is a satellite event of the International Congress on Industrial and Applied Mathematics in Valencia. It is also the 2019 edition, extended to a full week, of the Rencontres de Probabilités, which have been held every year in Rouen for nearly 20 years on the themes of statistical mechanics and particle systems.

    The main themes this year are random geometry, analysis of algorithms, and particle systems. The program will include courses and talks in each of the three areas, bringing together international specialists from the different communities and strengthening the interactions between them. Participation by young researchers is particularly encouraged.

    Registration is free but mandatory.

    Dates:

    September 23-27, 2019

    Venue:

    Université de Rouen Normandie, Madrillet campus,

    UFR des Sciences et Techniques, Amphi D

    Download the poster

    Speakers

    Mini-courses

    • Valentin Féray, Universität Zürich
    • Claudio Landim, CNRS/Université de Rouen Normandie & IMPA Rio
    • Dieter Mitsche, Université Jean-Monnet-Saint-Étienne
    • Matthias Reitzner, Universität Osnabrück
    • Cristina Toninelli, CNRS/Université Paris Dauphine

    Plenary talks

    • Imre Bárány, Hungarian Academy of Sciences
    • Peter Bürgisser, Technische Universität Berlin
    • Vincent Cohen-Addad, CNRS/Sorbonne Université
    • Giambattista Giacomin, Université Paris Diderot
    • Patricia Gonçalves, Universidade de Lisboa
    • Jean-Baptiste Gouéré, Université de Tours
    • Thierry Lévy, Sorbonne Université
    • Ralph Neininger, Goethe-Universität Frankfurt
    • Cyril Nicaud, Université Paris-Est Marne-la-Vallée
    • Frank Redig, Technische Universiteit Delft
    • Viet Chi Tran, Université de Lille
    • Dimitrios Tsagkarogiannis, Università dell’Aquila

    Committees

    Scientific committee

    • Thierry Bodineau, CNRS/École Polytechnique
    • Anna de Masi, Università dell’Aquila
    • Jean-François Marckert, CNRS/Université de Bordeaux
    • Brigitte Vallée, CNRS/Université de Caen Normandie
    • Joseph E. Yukich, Lehigh University

    Organizing committee

    • Pierre Calka, Université de Rouen Normandie
    • Nathanaël Enriquez, Université Paris-Sud
    • Xavier Goaoc, Université de Lorraine
    • Mustapha Mourragui, Université de Rouen Normandie
    • Ellen Saada, CNRS/Université Paris Descartes

    Local team

    • Edwige Auvray, CNRS/Université de Rouen Normandie
    • Pierre Calka, Université de Rouen Normandie
    • Nicolas Forcadel, INSA Rouen Normandie
    • Sandrine Halé, Université de Rouen Normandie
    • Mustapha Mourragui, Université de Rouen Normandie
    • Hamed Smail, Université de Rouen Normandie

    posted in Rouen Probability Meeting
  • Geo-Sci-Info

    Two NSF/NIH funded postdoc positions are available in the following fields:

    -Computational or applied topology/geometry/graph/algebra

    -Machine learning/deep learning

    -AI-based drug design and discovery

    -Computational biophysics

    Ideal candidates should have experience in code development, have demonstrated the potential for excellence in research, and hold a recent Ph.D. degree in mathematics, computer science, computational biophysics, computational chemistry, or bioinformatics. The selected candidates will be teamed up with top performers in recent D3R Grand Challenges, a worldwide competition series in computer-aided drug design. Salary depends on experience but will be at least $47.5k. The positions come with standard faculty health benefits. Please send your CV to weig [at] msu.edu.

    posted in Jobs offers - Call for projects
  • Geo-Sci-Info

    2 Ph.D. students in Machine Learning at Lund University (Sweden).

    Position type: Ph.D. scholarship
    Research area: Machine Learning
    Start: September 2019 or later
    Duration: 4 years
    Where: Lund University - CS department
    Application closing date: July 29, 2019
    Supervisor: Professor Luigi Nardi (luigi.nardi [at] cs.lth.se)
    Position description: https://lu.varbi.com/en/what:job/jobID:280143/

    These projects, financed by WASP (Wallenberg AI, Autonomous Systems and Software Program), aim to introduce innovative algorithms and methodologies that overcome the limitations of multi-objective black-box optimization. They are part of a collaboration with Stanford University. The students will be encouraged to apply to the WASP exchange program with Stanford to work closely with collaborators.

    Apply here.
    Topics of interest:

    Black-box optimization
    Derivative-free optimization (DFO)
    Bayesian optimization
    Algorithm configuration and selection
    Active learning
    Automated machine learning (AutoML)
    Neural architecture search (NAS)
    Hyperparameter optimization
    Learning to learn
    Meta learning and transfer learning
    Reinforcement learning (RL)
    Optimization of neural networks
    Evolutionary algorithms (EA)
    Discrete optimization and NP-hard problem solving
    Data-driven analysis of algorithms, hyperparameter importance, etc.

    Some applications of interest:
    Image classification, Natural Language Processing (NLP), Simultaneous localization and mapping (SLAM), Design space exploration (DSE), Optimizing compilers, Hardware design: CPU, GPU, FPGA, CGRA, ASIC.

    posted in Jobs offers - Call for projects
  • Geo-Sci-Info


    Open post-doc position at Géoazur in collaboration with Inria, at Sophia
    Antipolis, France, in the research area: Curvilinear network detection
    on satellite images using AI, stochastic models and deep learning.

    EXTENDED Submission deadline July 31, 2019

    Open Position for a post-doc scientist at Géoazur
    (https://geoazur.oca.eu/fr/acc-geoazur) in collaboration with Inria
    (https://www.inria.fr/en/centre/sophia), at Sophia Antipolis (Nice
    region), France, in the area of Computer Vision, Deep Learning and
    Remote Sensing applied to curvilinear detection on both optical and SAR
    satellite images (project abstract below).
    Both Géoazur and Inria Sophia Antipolis are ideally located in the heart
    of the French Riviera, inside the multicultural silicon valley of
    Europe (i.e. Sophia-Antipolis, see
    https://en.wikipedia.org/wiki/Sophia_Antipolis).

    This position is funded by University Côte d'Azur (UCA, see
    http://univ-cotedazur.fr/en#.XOforoWTpT4).

    Duration: 18 months
    Starting date: between September 1st and December 1st 2019.
    Salary: 3000 EUR gross per month (i.e. approximately 2400 EUR net)

    Please see full announcement
    https://faultsrgems.oca.eu/images/FAULT/News/Post-doc_offer-AI-ManighettiZerubia.pdf,
    or on https://euraxess.ec.europa.eu/jobs/411481

    Candidate profile

    A strong academic background in Stochastic Modeling, Deep Learning,
    Computer Vision, Remote Sensing and Parallel Programming with GPUs
    and/or multicore CPUs. A good knowledge of Earth and telluric features
    (especially faults) will be appreciated.

    To apply, please email a full application to both Isabelle Manighetti
    (manighetti[at]geoazur.unice.fr) and Josiane Zerubia
    (josiane.Zerubia[at]inria.fr), indicating “UCA-AI-post-doc” in the e-mail
    subject.

    The application should contain:

    • a cover letter demonstrating motivation, academic strengths
      and experience related to this position
    • a CV including a publication list
    • at least two major publications in PDF
    • at least two reference letters

    Project abstract

    Curvilinear structure networks are widespread in both natural and
    anthropogenic systems, appearing in angiography, earth and environmental
    sciences, biology and human activities. Recovering the
    existence and architecture of these curvilinear networks is an essential
    and fundamental task in all the related domains. Recently there has
    been an explosion of image data documenting these curvilinear structure
    networks. It is therefore of utmost importance to develop numerical
    approaches that can efficiently and automatically extract
    curvilinear networks from image data.

    In recent years, many methods have been proposed to extract
    curvilinear networks. However, automated, high-quality curvilinear
    network extraction remains a challenging task. This is mainly
    due to the complexity of network shapes, low contrast in images, and the high
    annotation cost of training data. To address the problems arising from
    these difficulties, this project intends to develop a novel,
    minimally supervised curvilinear network extraction method by combining
    deep neural networks with active learning. The deep neural
    networks are employed to automatically learn hierarchical,
    data-driven features of curvilinear networks, and active learning is
    exploited to achieve high-quality extraction using as few annotations as
    possible. Furthermore, composite and hierarchical heuristic rules will
    be designed to constrain the geometry of curvilinear structures and
    guide the growth of the curvilinear graph.

    The proposed approach will be tested and validated on the extraction of
    tectonic fractures and faults from a dense collection of satellite and
    aerial data and “ground truth” available at the Géoazur laboratory in
    the framework of the Faults_R_Gems project co-funded by the University
    Côte d’Azur (UCA) and the French National Research Agency (ANR). We then
    intend to apply the new automatic extraction approaches to other
    scenarios, such as road extraction in remote sensing images of the Nice
    region and blood vessel extraction in available medical image databases.

    posted in Jobs offers - Call for projects
  • Geo-Sci-Info

    JOB Announcement

    MEDIAN WEBSITE

    Since 2002, Median Technologies has been pushing the limits of the identification, interpretation, analysis and reporting of imaging data for the medical world. Our core business is the development of innovative imaging software and platforms for oncology clinical trials, diagnostic support and the follow-up of cancer patients; this software aims to improve the care of patients with cancer by helping to identify pathologies and to develop and select therapies suited to each patient (precision medicine).

    Our business lies at the convergence of several disciplines, including medicine, medical imaging and information technology. Our staff bring scientific (artificial intelligence, data science), technical (software, cloud computing), medical, regulatory and business development expertise, all of which is used in developing and bringing to market our applications and their associated services. In our daily work we are guided by four values, all fundamental to us:
    • Giving meaning to innovation
    • Helping our clients reach their goals
    • Putting quality at the heart of how we behave and what we do
    • Thinking of patients

    Today we are 90 people, mainly based at Median's headquarters in the Sophia-Antipolis technology park (Côte d’Azur), but also in the United States and China, where we have subsidiaries. We work in a particularly attractive and fulfilling international and multicultural environment.

    As part of our research and development in Artificial Intelligence applied to medical imaging, we are looking for a Research Engineer, “Data Science and Machine Learning” (M/F).

    As a member of a multidisciplinary research and development team within the iBIOPSY® project, you are a scientist researching and developing innovative medical imaging solutions using Machine Learning and other AI methods.

    Medical imaging is one of the most dynamic fields in Machine Learning. We are looking for an experienced scientist/engineer who is passionate, dynamic and organized, with strong experience in Machine Learning and excellent communication skills, to thrive at the heart of technological innovation.

    Main activities and tasks of the position
    o Position reporting to the Chief Science and Innovation Officer

    o Responsibilities:

    1. You will work on content-based image retrieval in medical imaging. You will build high-performance search engine services for clinical applications.

    2. You will use your scientific knowledge of artificial intelligence and Machine Learning to develop robust and innovative biomarkers, based on data from medical imaging systems such as MRI and CT scanners and from other data sources.

    3. Your work will involve Agile research and development of innovative Machine Learning algorithms and systems. Being at the heart of our organization's innovation, you will take an active part in exploring, tracking, evaluating and exploiting breakthrough technologies, as well as emerging industrial, academic and technological trends.

    4. You will work together with the software development team and the clinical science team.

    5. In addition, you will pass on your technological knowledge and share ideas and best practices across teams. You will generate intellectual property for the company. You will write scientific articles (peer-reviewed papers) and present results at industrial and scientific conferences.

    6. We expect you to develop breakthrough imaging solutions based on artificial intelligence and relying on cloud computing, and to apply supervised and unsupervised Machine Learning techniques to create value from the image and clinical databases generated by our partners in medical research and the pharmaceutical industry. These AI-based systems and services will go beyond image analysis to transform medical practice and drug development.

    o Responsibilities also include managing and engaging with our strategic technology partners.

    Desired profile
    o Education: Master's degree or PhD in Mathematics, Computer Science or an equivalent field.

    o Main skills and experience required:
    • At least 2 years of relevant experience in (Deep) Machine Learning.
    • Experience in medical imaging, CT/MRI, image signatures, large-scale visual information extraction, and selection techniques.
    • Relevant experience with Python, R, DL frameworks (e.g. PyTorch, Keras, TensorFlow) and standard packages such as scikit-learn, NumPy, SciPy and pandas
    • Models for inferring data structure, clustering, mapping
    • Multimodal extraction.
    • Authorship of related research (publications/conferences).
    • Solid experience with open-source technologies to accelerate innovation

    o Knowledge:
    • In-depth technical knowledge of AI, Deep Learning and computer vision
    • Solid fundamentals in statistical data processing, regression techniques, neural networks, decision trees, classification, pattern recognition, probability theory, stochastic systems, Bayesian inference, statistical techniques and dimensionality reduction.

    o Additional skills:
    • Strong interpersonal, communication and presentation skills, and the ability to work in a team
    • Fluency in spoken and written English

    Contract details
    o Location: Sophia-Antipolis, France
    o Contract type: permanent (CDI)
    o Start date: as soon as possible
    o Salary: negotiable depending on profile

    Benefits offered by the company
    o Meal vouchers
    o Company restaurant
    o Company health insurance
    o Fulfilling work environment

    Why join us?
    o Join an international, multicultural and fast-growing company
    o Be at the heart of innovation

    posted in Jobs offers - Call for projects
  • Geo-Sci-Info

    POSITION - PDF

    MEDIAN WEBSITE

    Since 2002, Median Technologies has been pushing the limits of the identification, interpretation, analysis and reporting of imaging data for the medical world. Our core business is the development of innovative imaging software and platforms for oncology clinical trials, diagnostic support and the follow-up of cancer patients; this software aims to improve the care of patients with cancer by helping to identify pathologies and to develop and select therapies suited to each patient (precision medicine).

    Our business lies at the convergence of several disciplines, including medicine, medical imaging and information technology. Our staff bring scientific (artificial intelligence, data science), technical (software, cloud computing), medical, regulatory and business development expertise, all of which is used in developing and bringing to market our applications and their associated services. In our daily work we are guided by four values, all fundamental to us:
    • Giving meaning to innovation
    • Helping our clients reach their goals
    • Putting quality at the heart of how we behave and what we do
    • Thinking of patients

    Today we are 90 people, mainly based at Median's headquarters in the Sophia-Antipolis technology park (Côte d’Azur), but also in the United States and China, where we have subsidiaries. We work in a particularly attractive and fulfilling international and multicultural environment.

    Dans le cadre de notre recherche et développement en Intelligence Artificielle appliquée à l’imagerie médicale, nous recherchons : Ingénieur de Recherche “Data Structuring and Clustering” H/F

    Intégré dans une équipe de recherche et développement multidisciplinaire au sein du projet iBIOPSY®, vous êtes un scientifique en recherche et développement de solutions innovantes d’imagerie médicale utilisant le Machine Learning et d’autres méthodes d’IA.

    L’imagerie médicale est l’un des domaines les plus dynamiques du Machine Learning. Nous recherchons un Scientifique / ingénieur expérimenté passionné, dynamique, et organisé avec une forte expérience en Machine Learning, d’excellentes compétences en communication pour s’épanouir au cœur de l’innovation technologique.

    Overview of the main activities and tasks of the position
    o Position reporting to the Chief of Science and Innovation Officer

    o Responsibilities:

    1. You will work on scalable data classification techniques and develop your expertise in knowledge acquisition and exploration towards robust biomarkers. You will conduct classification validation studies.

    2. You will work on large-scale content-based image retrieval in medical imaging. You will build high-performance search engine services for clinical applications and help develop robust, groundbreaking biomarkers for personalised medicine.

    3. Your work will involve Agile research and development of innovative Machine Learning algorithms and systems. Being at the heart of our organisation's innovation, you will take an active part in exploring, monitoring, evaluating and exploiting breakthrough technologies, as well as emerging industrial, academic and technological trends.

    4. You will work in collaboration with the software development team and the clinical science team.

    5. In addition, you will pass on your technological knowledge and share ideas and best practices across teams. You will generate intellectual property for the company. You will write scientific articles (peer-reviewed papers) and present results at industry and scientific conferences.

    6. We expect you to develop groundbreaking imaging solutions based on artificial intelligence and relying on cloud computing, and to apply supervised and unsupervised Machine Learning techniques to create value from the image and clinical databases generated by our partners in medical research and the pharmaceutical industry. These AI-based systems and services will go beyond image analysis to transform medical practice and drug development.

    o Responsibilities also include managing and engaging with our strategic technology partners.

    Profile sought
    o Education: Master's degree or PhD in Mathematics, Computer Science or an equivalent field.

    o Main skills and experience required:
    • At least 5 years of relevant experience in (Deep) Machine Learning.
    • Experience in medical imaging, CT/MRI, image signatures, large-scale visual information retrieval, and selection techniques.
    • Relevant experience in Python, R, DL frameworks (e.g. PyTorch, Keras, TensorFlow) and standard packages such as scikit-learn, NumPy, SciPy and pandas
    • Data structure inference models, clustering, mapping
    • Multimodal retrieval.
    • Authorship of related research (publications/conferences).
    • Solid experience with open-source technologies to accelerate innovation

    o Knowledge:
    • In-depth technical knowledge of AI, Deep Learning and computer vision
    • Solid fundamental knowledge of statistical data processing, regression techniques, neural networks, decision trees, classification, pattern recognition, probability theory, stochastic systems, Bayesian inference, statistical techniques and dimensionality reduction.
    o Additional skills:
    • Strong interpersonal, communication and presentation skills, and the ability to work in a team
    • Fluency in spoken and written English

    Contract details
    o Position based in: Sophia-Antipolis, France
    o Contract type: permanent (CDI)
    o Start date: as soon as possible
    o Salary: negotiable depending on profile

    Benefits offered by the company
    o Meal vouchers
    o Company restaurant
    o Company health insurance
    o A fulfilling work environment

    Why join us?
    o Join an international, multicultural and fast-growing company
    o Be at the heart of innovation


    posted in Jobs offers - Call for projects read more
  • Geo-Sci-Info

    Salary: £36261 to £48677 per annum (pro-rata if applicable) depending on skills and experience. Salary progression beyond this scale is subject to performance.

    We are looking for an Assistant Professor in Statistics to deliver high quality teaching and undertake original research in a branch of statistics complementing existing activity within the School. Applications from suitable candidates in any area of statistics will be welcome, but we would be particularly interested to hear from candidates with expertise in one or more of Machine Learning, Bayesian Computation and Uncertainty Quantification.

    We believe that talented and inclusive teams deliver the highest quality research, and we are seeking applications from high quality candidates who enhance the diversity of our existing team. The School is committed to creating opportunities for people traditionally under-represented in Mathematical Sciences.

    You will be able to carry out duties to the highest standard and to evidence how, through your experience and potential, you will:

    • Plan and deliver high quality teaching on undergraduate and MSc statistics modules
    • Have a proven track record of, or potential for, publishing research work of international quality in statistics
    • Contribute to the dissemination of research outputs at national/international conferences, workshops, and meetings

    We are looking for a proactive, organised researcher who can evidence:

    • A PhD, or equivalent, in statistics or a relevant field
    • Excellent communication and organisational skills
    • The ability to work independently and as part of a multidisciplinary and multicultural team
    • Networking, actively engaging with and valuing other areas and diverse groups

    This is a fixed-term post for two years, due to start on 1 September 2019. It is full-time (36.25 hours per week); however, applications are also welcome from candidates wishing to work part-time (minimum 29 hours per week). Please specify in your application if you wish to work part-time and the number of preferred hours. Job share arrangements may be considered for this post.

    Informal enquiries may be addressed to Professor Andrew Wood, tel: (0115) 9514983 or email andrew.wood@nottingham.ac.uk. Please note that applications sent directly to this email address will not be accepted.

    Our University has always been a supportive, inclusive, caring and positive community. We warmly welcome those of different cultures, ethnicities and beliefs. Indeed, this very diversity is vital to our success; it is fundamental to our values and enriches life on campus. We welcome applications from the UK, Europe and from across the globe. For more information on the support we offer our international colleagues, visit: https://www.nottingham.ac.uk/jobs/applyingfromoverseas/index2.aspx

    posted in Jobs offers - Call for projects read more
  • Geo-Sci-Info

    OFFICIAL WEBSITE
    Toulouse August-November 2019

    Download POSTER

    This thematic trimester aims to highlight recent advances and scientific synergies between statistics and geometry. This combination of statistics and geometry is driven by many applications in signal processing (radar, images, …) and massive data (internet databases, monitoring, …). The thematic trimester will undoubtedly be a scientific springboard for the development of a Statistical Geometry-Computational Geometry research axis, whose efflorescence is highly probable in the next decade. The trimester will open at the end of August 2019 with the GSI 2019 conference organized at ENAC. It will then extend, from September to November, around the following three scientific axes:

    • Information Geometry (30 August to 6 September 2019 and 14 to 19 October 2019),
    • Topology for learning and data analysis (30 September to 4 October 2019),
    • Computational algebraic geometry, optimization and statistical applications (6 to 8 November 2019).

    Registration is free but mandatory. To register for a workshop, please go to the workshop pages.

    Financial support applications for students are available. Please go to the workshop pages.

    Links
    Poster of the thematic trimester
    Precise information of the conference and workshops

    Committees

    Practical Information (including how to move and lodge in Toulouse)

    Satellite events
    Preparatory workshop (including geometry, topology, statistics for dummies)
    French excellence school on Geometry and Statistics. Master class. July 2019

    posted in Statistics with geometry and topology - thematic trimester read more
  • Geo-Sci-Info

    "Topology of statistical systems: a cohomological approach to information theory"
    PhD defense of Juan Pablo Vigneaux at IMJ-PRG, under the direction of Daniel Bennequin.

    The defense will take place on Friday 14 June 2019 at 10:30 AM in room 1009 of bâtiment Sophie Germain, 8 place Aurélie Nemours, 75013 Paris, France (IMJ-PRG).

    The PhD manuscript can be downloaded HERE

    PhD jury:

    • Pr. Samson Abramsky, University of Oxford, Reviewer.
    • Pr. Daniel Bennequin, Université Paris Diderot, Thesis advisor.
    • Pr. Stéphane Boucheron, Université Paris Diderot, Examiner.
    • Pr. Antoine Chambert-Loir, Université Paris Diderot, Examiner.
    • Pr. Philippe Elbaz-Vincent, Université Grenoble Alpes, Reviewer.
    • Pr. Mikhail Gromov, Institut des Hautes Études Scientifiques, Examiner.
    • Pr. Kathryn Hess, École Polytechnique Fédérale de Lausanne, Examiner.
    • Pr. Olivier Rioul, Télécom ParisTech, Examiner.

    posted in Topology of statistical systems: a cohomological approach to information theory read more
  • Geo-Sci-Info

    Workshop in memory of František Matúš

    August 19–23, 2019, UTIA, the Institute of Information Theory and Automation of the Academy of Sciences of the Czech Republic, Prague
    OFFICIAL WEBPAGE
    https://www.itsoc.org/news-events/recent-news/workshop-in-memory-of-frantisek-matus-august-2019

    The workshop will be organized as part of the conference Prague Stochastics 2019, to be held on August 19–23, 2019, at UTIA, the Institute of Information Theory and Automation of the Academy of Sciences of the Czech Republic, Prague.
    Workshop in memory of František Matúš (August 2019)

    The workshop will be devoted to František Matúš, who passed away on May 17, 2018. His research interests reached several mathematical fields. He was involved in information theory, probability theory, statistics, geometry, algebra, and matroid theory. The workshop to commemorate him is intended to be multidisciplinary, involving these fields in which František worked, and the areas close to his interests. We particularly welcome contributions devoted to information geometry, entropic regions, information inequalities, cryptography, polymatroids, optimization of convex integral functionals, discrete Markovian random sequences, conditional independence, semi-graphoids, graphical models, exponential families, and algebraic statistics.

    The workshop will take place at his home institution. Presentations at the workshop will include about ten invited talks given by experts in his areas of interest, and contributions from registered participants on closely related topics. A preliminary list of main speakers includes:

    • László Csirmaz (Renyi Institute, Budapest)
    • Imre Csiszár (Renyi Institute, Budapest)
    • Thomas Kahle (OvGU, Magdeburg)
    • Steffen Lauritzen (University of Copenhagen)
    • Carles Padró (Universitat Politecnica de Catalunya)
    • Johannes Rauh (Max Planck Institute)
    • Andrei Romashchenko (Laboratoire d’Informatique, Montpellier)
    • Bernd Sturmfels (Max Planck Institute)
    • Raymond Yeung (Chinese University of Hong Kong)
    • Piotr Zwiernik (Barcelona)

    Shorter contributed talks or posters will be selected from submitted abstracts by the program committee. The option to present open problems within smaller topic-specific sessions, moderated by invited chairs, is also being considered, and will depend on the interest expressed by the preregistered participants. No conference fee is planned.

    If you are interested in participating, please use the pre-registration form

       http://simu0292.utia.cas.cz/pragstoch2019/callFM.php
    

    and provide us with an abstract of a suggested presentation by May 17, 2019.

    Program Committee:

    • Nihat Ay (MPI MIS, Leipzig)
    • László Csirmaz (Renyi Institute, Budapest)
    • Milan Studený (UTIA, Prague)

    posted in Workshop in memory of František Matúš read more
  • Geo-Sci-Info

    The 18th International Conference, Graduate School of Mathematics, Nagoya University
    Information Geometry and Affine Differential Geometry III

    OFFICIAL WEBPAGE

    Period
    March 27–29, 2019

    Place
    Rm. 509, Mathematics Bldg., Nagoya University

    Speakers

    • Shun-ichi Amari (Riken),
    • Frédéric Barbaresco (Thales Land & Air Systems),
    • Michel Nguiffo Boyom (Université de Montpellier),
    • Shinto Eguchi (Institute of Statistical Mathematics),
    • Hitoshi Furuhata (Hokkaido University),
    • Hiroto Inoue (Kyushu University),
    • Hideyuki Ishi (Nagoya University),
    • Amor Keziou (Université de Reims Champagne-Ardenne),
    • Yongdo Lim (Sungkyunkwan University),
    • Hiroshi Matsuzoe (Nagoya Institute of Technology),
    • Atsumi Ohara (Fukui University),
    • Philippe Regnault (Université de Reims Champagne-Ardenne),
    • Tatsuo Suzuki (Shibaura Institute of Technology),
    • Jun Zhang (University of Michigan)

    Organizing Committee

    • Hideyuki Ishi (Nagoya University),
    • Hiroshi Matsuzoe (Nagoya Institute of Technology),
    • Atsumi Ohara (Fukui University),
    • Jun Zhang (University of Michigan)

    Contact
    Hideyuki Ishi (hideyuki (at) math.nagoya-u.ac.jp)

    posted in Information Geometry and Affine Differential Geometry III read more
  • Geo-Sci-Info

    APPLICATION WEBSITE

    What you will do

    As part of the Research team, you will embrace theoretical mathematics, computer science and financial knowledge. Fully involved in advanced topics, you will work closely with researchers, deep learners, and science addicts. Being part of our international team, you will experience how being smartly wrong often leads to a better solution.

    Main missions:

    • improve the learning capabilities of the automated trading systems
    • design solutions to tackle large datasets with a data-driven approach
    • contribute to the research infrastructure aimed at identifying financial biases
    • challenge researchers and common knowledge
    • suggest and engage in team collaborations to meet research goals
    • report and present research findings and developments

    Skills we are looking for

    PhD or MS in Data Science, Science Technology or Mathematics.

    You have:

    • 3-5 years of experience in Machine Learning
    • a powerful intellectual curiosity with strong academic knowledge of probability, statistics and machine learning models
    • experience with Python 3 and libraries such as NumPy, pandas and Plotly
    • experience with Linux and Git environments
    • strong knowledge of object-oriented programming and algorithms
    • a “can-do” attitude and a problem-solving mindset
    • an eagerness to learn and to tackle complex machine learning problems
    • an ability to operate in an agile and fast-paced environment

    posted in Jobs offers - Call for projects read more
  • Geo-Sci-Info

    APPLICATION WEBSITE
    Job Overview

    Advanced Analytics - Sr. Data Scientist will execute advanced computational approaches to aid in evidence-based pharmaceutical product development. He/She will leverage high-dimensional population health data to support R&D, Medical, HEVA, commercial product development, access and business strategy. The Advanced Analytics role will generate analytics required by healthcare decision makers to support patient access and use of Sanofi medicines and he/she will contribute to the insights required by Sanofi internal teams to develop and commercialize the most impactful medicines.

    Job Responsibilities

    Get to apply a broad array of capabilities spanning machine learning, statistics, text-mining/NLP, and modeling to extract insights from structured and unstructured healthcare data sources, pre-clinical, clinical trial and complementary real-world information streams.
    Work on a variety of team-based projects providing expertise in analytical and computational approaches.
    Have the opportunity to identify novel solutions to internal analytics & data challenges including the piloting and/or evaluation of tools for analytics, reporting and data visualization.
    Develop additional skills through training courses, mentoring, and interactions daily with team members and Sanofi stakeholders.
    Provide expertise and execute advanced analytics for solving problems across R&D, Medical Affairs, HEVA and Market Access Strategies and Plans.
    Design and implement data models, perform statistical analysis and create predictive analysis models
    Translate and appropriately champion advanced analytics results and capabilities to non-technical audiences.
    Work with internal and external data scientists to scope and execute Advanced Analytics projects.

    Essential Skills & Experience

    PhD or ScD in quantitative field such as Health Services research, Medical Economics, Medical Informatics, Biostatistics, or Computer Science, computer engineering or related field with a minimum of 3 years of industry or academic experience
    Relevant Masters Degree, with 6 or more years of related industry experience
    Proficiency in at least two technical or analytical languages (R, Python, etc.) and a willingness to embrace new coding approaches.
    Experience with advanced ML techniques (neural networks/deep learning, reinforcement learning, SVM, PCA, etc.).
    Demonstrated ability to interact with a variety of large-scale data structures e.g. HDFS, SQL, noSQL
    Experience working across multiple environments (e.g. AWS, GCP, linux) for optimizing compute and big data handling requirements.
    Experience with any of the following: biomedical data types, population health data, real-world data, novel data streams.
    Strong oral and written communication skills
    A demonstrated ability to work and collaborate in a team environment

    Desirable Skills & Experience

    Ability to prototype analyses and algorithms in high-level languages embracing reproducible and collaborative technology platforms (e.g. github, containers, jupyter notebooks)
    Exposure to NLP technologies and analyses
    Knowledge of some datavis technologies (ggplot2, shiny, plotly, d3, Tableau or Spotfire)

    Sanofi is committed to welcoming and integrating people with disabilities

    At Sanofi, diversity and inclusion is foundational to how we operate and embedded in our Core Values. We recognize that to truly tap into the richness diversity brings, we must lead with inclusion and have a workplace where those differences can thrive and be leveraged to empower the lives of our colleagues, patients and customers. We respect and celebrate the diversity of our people, their backgrounds and experiences and provide equal opportunity for all.

    posted in Jobs offers - Call for projects read more
  • Geo-Sci-Info

    APPLICATION WEBSITE

    THE ROLE

    You will join the Data Science team. It's a cross-functional team using data to support strategic decision-making and build better experiences for our passengers and drivers alike.

    As a Data Scientist focused on Algorithms, you will apply machine learning and optimisation techniques on our rich datasets to solve some of the most interesting mobility challenges, such as dynamic pricing, intelligent allocation and much more!

    WHAT YOU'LL DO

    • Work with Product to identify and prioritise algorithmic needs

    • Team up with Engineering to incorporate machine learning and optimisation algorithms in our product

    • Code simulation modules to replicate driver and passenger behaviours and suggest pricing or dispatch improvements

    • Uncover hidden opportunities for growth and efficiency for Heetch

    • Conduct and present quantitative analysis that results in actionable recommendations

    WHO WE ARE LOOKING FOR

    • You have a degree in Computer Science, Engineering, Economics, Physics, Statistics or another quantitative field (MS and above preferred)

    • You have 2+ years of industry experience in algorithm design and development

    • You are comfortable manipulating large datasets (using SQL, Python, R etc)

    • You can build and fit statistical, machine learning, or optimisation models

    • You can collaborate with Engineers to turn prototypes into scaled-up products

    • You can communicate effectively with colleagues from various backgrounds and technical levels

    • You are fluent in English

    BONUS POINTS IF YOU:

    • Have prior exposure to startup environments

    • Have experience with cloud computing and big data frameworks (incl. geospatial data)

    • Have experience leading machine learning projects and/or building data products end-to-end under limited supervision

    • Are able to model and run simulated and live traffic experiments

    • Are an explorer and enjoy going out!

    posted in Jobs offers - Call for projects read more
  • Geo-Sci-Info

    UMAP - Leland McInnes, John Healy, James Melville

    GITHUB - OFFICIAL WEBSITE

    Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction. The algorithm is founded on three assumptions about the data:

    • The data is uniformly distributed on a Riemannian manifold;
    • The Riemannian metric is locally constant (or can be approximated as such);
    • The manifold is locally connected.

    From these assumptions it is possible to model the manifold with a fuzzy topological structure. The embedding is found by searching for a low dimensional projection of the data that has the closest possible equivalent fuzzy topological structure.

    The details for the underlying mathematics can be found in our paper on ArXiv:

    • McInnes, L, Healy, J, UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, ArXiv e-prints 1802.03426, 2018

    The important thing is that you don't need to worry about that -- you can use UMAP right now for dimension reduction and visualisation as a drop-in replacement for scikit-learn's t-SNE.

    Documentation is available via ReadTheDocs.
    Installation, licence and usage information is available on
    GITHUB - OFFICIAL WEBSITE (https://github.com/lmcinnes/umap)

    Benefits of UMAP

    UMAP has a few significant wins in its current incarnation.

    • First of all UMAP is fast. It can handle large datasets and high dimensional data without too much difficulty, scaling beyond what most t-SNE packages can manage.
    • Second, UMAP scales well in embedding dimension -- it isn't just for visualisation! You can use UMAP as a general purpose dimension reduction technique as a preliminary step to other machine learning tasks. With a little care (documentation on how to be careful is coming) it partners well with the hdbscan clustering library.
    • Third, UMAP often performs better at preserving aspects of global structure of the data than t-SNE. This means that it can often provide a better "big picture" view of your data as well as preserving local neighbor relations.
    • Fourth, UMAP supports a wide variety of distance functions, including non-metric distance functions such as cosine distance and correlation distance. You can finally embed word vectors properly using cosine distance!
    • Fifth, UMAP supports adding new points to an existing embedding via the standard sklearn transform method. This means that UMAP can be used as a preprocessing transformer in sklearn pipelines.
    • Sixth, UMAP supports supervised and semi-supervised dimension reduction. This means that if you have label information that you wish to use as extra information for dimension reduction (even if it is just partial labelling) you can do that -- as simply as providing it as the y parameter in the fit method.
    • Finally UMAP has solid theoretical foundations in manifold learning (see our paper on ArXiv). This both justifies the approach and allows for further extensions that will soon be added to the library (embedding dataframes etc.).
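
The fourth point calls cosine distance "non-metric". A small standard-library check makes that concrete: cosine distance, taken here as one minus cosine similarity, can violate the triangle inequality. The helper name and the three example vectors below are illustrative choices for this sketch and are not part of UMAP.

```python
import math

def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Three vectors in the plane at angles 0, 45 and 90 degrees.
a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)

direct = cosine_distance(a, c)                          # straight from a to c
detour = cosine_distance(a, b) + cosine_distance(b, c)  # going through b

# A true metric would guarantee direct <= detour; here that fails.
violates_triangle = direct > detour
print(direct, detour, violates_triangle)
```

The direct distance exceeds the two-step path through `b`, so cosine distance is not a metric in the strict sense, which is why support for such dissimilarities is listed as a separate benefit.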

    Performance and Examples

    UMAP is very efficient at embedding large high dimensional datasets. In particular it scales well with both input dimension and embedding dimension. Thus, for a problem such as the 784-dimensional MNIST digits dataset with 70000 data samples, UMAP can complete the embedding in around 2.5 minutes (as compared with around 45 minutes for most t-SNE implementations). Despite this runtime efficiency UMAP still produces high quality embeddings.

    The obligatory MNIST digits dataset, embedded in 2 minutes and 22 seconds using a 3.1 GHz Intel Core i7 processor (n_neighbors=10, min_dist=0.001):

    UMAP embedding of MNIST digits


    The MNIST digits dataset is fairly straightforward, however. A better test is the more recent "Fashion MNIST" dataset of images of fashion items (again 70000 data samples in 784 dimensions). UMAP produced this embedding in 2 minutes exactly (n_neighbors=5, min_dist=0.1):

    UMAP embedding of "Fashion MNIST"

    The UCI shuttle dataset (43500 samples in 8 dimensions) embeds well under correlation distance in 2 minutes and 39 seconds (note the longer time required for correlation distance computations):

    UMAP embedding the UCI Shuttle dataset

    posted in GSI FORGE read more