The seminar takes place on Mondays at 11am, normally in room 316 of the Métrologie B building (subject to change this year due to building works). Below you will find the schedule of the Probability-Statistics seminar for the current academic year.
Contacts: jean-jil.duchamps univ-fcomte.fr, ahmed.zaoui univ-fcomte.fr
Past talks:
May 6: Mathilde André
(ENS Paris, University of Vienna and LmB)
Genealogies in frequency-dependent multitype branching processes
Abstract:
Our work delves into the universality class of some very celebrated entities in population genetics: Λ-coalescents. These objects catalog the genealogies of constant-sized, exchangeable population models known as Cannings models (see Pitman and Sagitov, 1999) and serve as baseline models for panmictic, neutral populations in population genetics studies.
We establish a broad class of multitype frequency-dependent models that extends beyond exchangeable and fixed population models, yet for which the scaling limit of the genealogies sampled at a given time is still Kingman’s coalescent. Thus, this work strides towards refining the intuition that neutral, Cannings-like genealogies can arise from complex interactions. We prove convergence in distribution of the genealogies for the Gromov-Weak topology, using a method of moments and the multiple spine decomposition formalism developed by Foutel-Rodier and Schertzer (2023).
The underlying purpose of this work is to formulate a general methodology for deriving the scaling limits of genealogies within regulated populations. This method streamlines computations on forest-valued processes to a fine-grained analysis of the simpler stochastic process driving the type frequencies in the population.
This is joint work with Félix Foutel-Rodier and Emmanuel Schertzer.
April 29: Clément Dombry
(LmB)
Distributional regression: CRPS-error bounds for model fitting, model selection and convex aggregation (part 2)
Abstract:
Distributional regression aims at estimating the conditional distribution of a target variable given explanatory covariates. It is a crucial tool for forecasting when a precise uncertainty quantification is required. A popular methodology consists in fitting a parametric model via empirical risk minimization, where the risk is measured by the Continuous Ranked Probability Score (CRPS). For independent and identically distributed observations, we provide a concentration result for the estimation error and an upper bound for its expectation. Furthermore, we consider model selection performed by minimization of the validation error and provide a concentration bound for the regret. A similar result is proved for convex aggregation of models. Finally, we show that our results may be applied to various models such as EMOS, distributional regression networks, distributional nearest neighbours or distributional random forests, and we illustrate our findings on two data sets (QSAR aquatic toxicity and Airfoil self-noise). Joint work with Ahmed Zaoui.
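For reference, the CRPS used as the risk above is the standard score defined, for a predictive distribution function $F$ and an observation $y$, by
$\mathrm{CRPS}(F, y) = \int_{\mathbb{R}} \big( F(z) - \mathbf{1}\{y \le z\} \big)^2 \, dz$,
and empirical risk minimization fits a parametric family $\{F_\theta(\cdot\mid x)\}$ by minimizing $\frac{1}{n}\sum_{i=1}^n \mathrm{CRPS}\big(F_\theta(\cdot\mid X_i), Y_i\big)$ over $\theta$ (this display is standard background, not notation taken from the talk).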
April 8: Sam Allen
(ETH Zürich)
Tail calibration of probabilistic forecasts
Abstract:
Probabilistic forecasts comprehensively describe the uncertainty in the unknown outcome, making them essential for decision making and risk management. While several methods have been introduced to evaluate probabilistic forecasts, existing techniques are ill-suited to the evaluation of tail properties of such forecasts. However, these tail properties are often of particular interest to forecast users due to the severe impacts caused by extreme outcomes. In this work, we reinforce previous results related to the deficiencies of proper scoring rules when evaluating forecast tails, and instead introduce several notions of tail calibration for probabilistic forecasts, allowing forecasters to assess the reliability of their predictions for extreme events. We study the relationships between these different notions, and propose diagnostic tools to assess tail calibration in practice. The benefit provided by these diagnostic tools is demonstrated in an application to European weather forecasts.
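As background (the classical notion, recalled here for orientation rather than taken from the talk): a forecast $F$ for an outcome $Y$ is said to be probabilistically calibrated when the probability integral transform $F(Y)$ is uniformly distributed on $[0, 1]$; the tail notions introduced in the talk ask, roughly, for analogous reliability statements restricted to extreme outcomes.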
March 18: Internal seminar in two parts
(LmB)
10am [postponed]: Clément Dombry - Distributional regression: CRPS-error bounds for model fitting and model selection (part 2)
Abstract:
Continuation of the March 4 session.
11am: Landy Rabehasaina - Estimation of subcritical Galton-Watson processes with correlated immigration
Abstract:
We consider an observed Galton-Watson process Y_n, n ∈ ℤ, with immigration modelled by a correlated process ε_n, n ∈ ℤ. We present estimation results for the reproduction rate and the expected immigration in two settings. The first is when Cov(ε_0, ε_k) = 0 for k larger than some k_0: we provide an estimator and prove an asymptotic normality result. The second is when ε_n, n ∈ ℤ, has a general correlation structure: under mixing assumptions, we construct an estimator of the reproduction rate and show its convergence in quadratic mean with an explicit rate. When the mixing coefficient decays fast enough, a second-order expansion of this estimator is established. Joint work with Y. Boubacar Maïnassara (UPHF).
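As a toy illustration of the setting (with i.i.d. immigration and a plain conditional least squares fit, rather than the correlated-immigration estimators studied in the talk), a minimal Python sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    m, lam, n = 0.6, 2.0, 5000        # offspring mean (subcritical), immigration mean, horizon

    Y = np.zeros(n + 1, dtype=int)
    for t in range(n):
        offspring = rng.poisson(m, size=Y[t]).sum() if Y[t] > 0 else 0
        Y[t + 1] = offspring + rng.poisson(lam)   # i.i.d. immigration in this toy version

    # Conditional least squares: E[Y_{t+1} | Y_t] = m * Y_t + lam, so regress Y_{t+1} on (Y_t, 1).
    A = np.column_stack([Y[:-1], np.ones(n)])
    m_hat, lam_hat = np.linalg.lstsq(A, Y[1:], rcond=None)[0]
    print(m_hat, lam_hat)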
March 25: Jean-Armel Bra Kouadio
(LmB)
Weak AR(1) models modulated by a hidden Markov chain
Abstract:
We present the asymptotic properties of the moment estimator for autoregressive (AR) models with Markov regime switching in which the errors are uncorrelated but not necessarily independent, under the assumption that the regimes are not directly observable. Relaxing the assumptions of independent errors and of directly observable regimes significantly broadens the applicability of this class of regime-switching AR models. We give conditions under which the consistency and asymptotic normality of the moment estimator can be proved in a particular case of the model under study. Particular attention is paid to the estimation of the asymptotic covariance matrix.
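To fix ideas, a minimal simulation sketch of an AR(1) model whose coefficient is modulated by a hidden two-state Markov chain (i.i.d. Gaussian errors are used here for simplicity, whereas the talk only assumes uncorrelated, possibly dependent errors):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10_000
    phi = np.array([0.3, 0.8])                 # AR coefficient in each hidden regime
    P = np.array([[0.95, 0.05], [0.10, 0.90]]) # transition matrix of the hidden chain

    s = np.zeros(n, dtype=int)                 # hidden regimes (not observed in practice)
    x = np.zeros(n)
    for t in range(1, n):
        s[t] = rng.choice(2, p=P[s[t - 1]])
        x[t] = phi[s[t]] * x[t - 1] + rng.standard_normal()

    # A naive moment check: the lag-1 autocorrelation mixes the two regimes.
    print(np.corrcoef(x[:-1], x[1:])[0, 1])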
March 11: Maxime Egea
(University of Passau)
Multilevel methods for Bayesian posterior mean sampling
Abstract:
In this talk, I will present a few results regarding Multilevel methods and their applications. I will start with an introduction that highlights the statistical motivations and the difficulties associated with high dimensions. First, I will provide an overview of the existing tools and results to address these issues. Special attention will be given to describing Multilevel Monte Carlo methods, including their construction and computational cost. Next, I will introduce new multilevel methods based on pathwise averages in a general framework. The complexity of this algorithm will be computed more precisely for Langevin diffusions that satisfy uniform convexity assumptions. Furthermore, I will explore ways to relax the uniform convexity assumption to meet specific statistical objectives. In this challenging context, I will present a penalization approach and a parametric approach.
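For context, the multilevel Monte Carlo construction rests on the telescoping decomposition $\mathbb{E}[Y_L] = \mathbb{E}[Y_0] + \sum_{\ell=1}^{L} \mathbb{E}[Y_\ell - Y_{\ell-1}]$, where $Y_\ell$ denotes the quantity of interest computed at discretization level $\ell$; each term is estimated by an independent Monte Carlo average, with many cheap samples at coarse levels and few expensive ones at fine levels. Under the usual decay assumptions on the variance of $Y_\ell - Y_{\ell-1}$ and on the cost per level, this yields a lower overall complexity than a single-level estimator (standard background, not the specific pathwise-average estimators of the talk).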
March 4: Clément Dombry
(LmB)
Distributional regression: CRPS-error bounds for model fitting and model selection (part 1)
Abstract:
Distributional regression aims at estimating the conditional distribution of a target variable given explanatory covariates. It is a crucial tool for forecasting the target variable together with uncertainty quantification. A popular method widely used in practice consists in fitting a parametric model via empirical risk minimization, where the risk is measured by the Continuous Ranked Probability Score (CRPS). In a regression framework with independent and identically distributed (i.i.d.) observations, we provide concentration results for the estimation error and an upper bound for its expectation. Furthermore, we consider model selection, which is often performed in practice via minimization of the test error on a validation sample, and provide a concentration bound for the regret in model selection. Our results may be applied to various models such as EMOS, distributional regression networks or distributional random forests. Joint work with Ahmed Zaoui.
February 19: Félix Cheysson
(Univ. Gustave Eiffel)
Spectral estimation for Hawkes processes
Abstract:
Hawkes processes are a family of point processes for which the occurrence of any event increases the probability of further events occurring. Although the linear Hawkes process, for which a representation in the form of a superposition of branching processes exists, is particularly well studied, difficulties remain in estimating the parameters of the process from imperfect data (noisy, missing or aggregated data), since the usual estimation methods based on maximum likelihood or least squares do not necessarily offer theoretical guarantees or are numerically too costly.
In this work, we propose a spectral approach well-adapted to this context, for which we prove consistency and asymptotic normality. In order to derive these properties, we show that Hawkes processes can be studied through the lens of mixing, opening the use of central limit theorems that already exist in the literature. I will then present two applications of this approach: to aggregated data (joint work with Gabriel Lang) and to noisy data (joint work with Anna Bonnet, Miguel Martinez and Maxime Sangnier).
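As a reminder of the object under study (a standard definition, not the talk's spectral machinery): a linear Hawkes process with baseline intensity $\mu > 0$ and reproduction kernel $\phi \ge 0$ has conditional intensity $\lambda(t) = \mu + \sum_{t_i < t} \phi(t - t_i)$, where the sum runs over past event times $t_i$; the branching (cluster) representation mentioned above holds in the subcritical case $\int_0^{\infty} \phi(u)\,du < 1$.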
February 5: NGÔ Thị Bảo Trâm
(Le Mans Univ.)
New Developments on (Non-)Normalized Continuous Associated-Kernel Density Estimators
Abstract:
We consider the general modern notion of so-called associated kernels for smoothing a density function on a given support. Following recent global properties of normalized discrete associated-kernel estimators, we investigate the continuous associated-kernel context in a completely different way. Diverse and numerous in the literature, the standard (non-)normalized density estimators based on non-classical kernels are of great interest, including modified versions for reducing the possible boundary bias. We first show, under specific assumptions such as the asymptotic unimodality of the continuous associated kernel, that the normalizing random variable also converges in mean square to one. We then deduce the consistency of the considered estimator. The comparison, in favour of the standard normalized estimator, is carried out through the mean squared error. We conclude by providing, for the first time, general asymptotic normality results, under some regularity assumptions, for both (un)normalized associated-kernel density estimators. The Gumbel, Weibull, gamma, lognormal and other associated kernels are used to illustrate some of our results theoretically and numerically, with an application to original data on automobile claim amounts from Covéa Affinity.
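Schematically (my own paraphrase of a standard formulation in this literature, not the speaker's notation): given an associated kernel $K_{x,h}$ whose support is adapted to that of the target density, the non-normalized and normalized estimators read $\tilde f_n(x) = \frac{1}{n}\sum_{i=1}^{n} K_{x,h}(X_i)$ and $\hat f_n(x) = \tilde f_n(x) \big/ \int \tilde f_n(u)\,du$, the random normalizing constant $\int \tilde f_n$ being precisely the quantity whose mean-square convergence to one is discussed above.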
January 29: Apolline Louvet
(Munich)
Modelling populations expanding in a spatial continuum
Abstract:
Understanding the emergence of genetic diversity patterns in expanding populations is of longstanding interest in population genetics. In this talk, I will introduce a model that can be used to gain some insight into the evolution of genetic diversity patterns at the front edge of an expanding population. This model, called the ∞-parent spatial Λ-Fleming-Viot process (or ∞-parent SLFV), is characterized by an "event-based" reproduction dynamics that makes it possible to control local reproduction rates and to study populations living in unbounded regions. I will present what is currently known about the growth properties of this process, and the implications of these results in terms of genetic diversity at the front edge.
Based on a joint work with Amandine Véber (MAP5, Univ. Paris Cité) and Matt Roberts (Univ. Bath).
January 22: Arya Akhavan
(CMAP, École Polytechnique de Paris)
Estimating the Minimizer and the Minimum Value of a Regression Function under Passive Design
Abstract:
We propose a new method for estimating the minimizer $x^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$ from the observations contaminated by random noise. Our estimator $z_n$ of the minimizer $x^*$ is based on a version of the projected gradient descent with the gradient estimated by a regularized local polynomial algorithm. Next, we propose a two-stage procedure for estimation of the minimum value $f^*$ of regression function $f$. At the first stage, we construct an accurate enough estimator of $x^*$, which can be, for example, $z_n$. At the second stage, we estimate the function value at the point obtained in the first stage using a rate optimal nonparametric procedure. We derive non-asymptotic upper bounds for the quadratic risk and optimization risk of $z_n$, and for the risk of estimating $f^*$. We establish minimax lower bounds showing that, under a certain choice of parameters, the proposed algorithms achieve the minimax optimal rates of convergence on the class of smooth and strongly convex functions.
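A minimal sketch of the overall template, with a hypothetical grad_estimate function standing in for the regularized local polynomial gradient estimator of the paper (this is an illustration of projected gradient descent, not the authors' exact algorithm):

    import numpy as np

    def project(x, radius=1.0):
        """Euclidean projection onto a ball of the given radius (the constraint set)."""
        norm = np.linalg.norm(x)
        return x if norm <= radius else x * (radius / norm)

    def estimate_minimizer(grad_estimate, x0, n_steps, step_size):
        """Projected gradient descent driven by a (noisy) gradient estimator.

        grad_estimate(x, t) is a placeholder: in the paper it would be the regularized
        local polynomial estimate of the gradient built from passive-design observations.
        """
        x = np.asarray(x0, dtype=float)
        for t in range(1, n_steps + 1):
            x = project(x - step_size(t) * grad_estimate(x, t))
        return x

    # Toy usage: a noisy exact gradient of f(x) = ||x - 0.3||^2 plays the role of the estimator.
    rng = np.random.default_rng(2)
    g = lambda x, t: 2 * (x - 0.3) + 0.1 * rng.standard_normal(x.shape)
    z_n = estimate_minimizer(g, x0=np.zeros(3), n_steps=2000, step_size=lambda t: 1.0 / t)
    print(z_n)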
January 15: Alexandre Legrand
(Univ. Lyon 1)
Time-inhomogeneous branching processes with selection
Abstract:
Branching processes with selection are widely used in mathematical biology, to model the evolution of populations of bounded size; in computer science, to define optimization algorithms on hierarchical data (so-called "beam search"); and also through their link with the F-KPP reaction-diffusion equation. In this talk, we determine the "speed" of branching processes, more precisely of branching Brownian motion (BBM) and of the branching random walk (BRW), assumed to be time-inhomogeneous and subject to a selection procedure (i.e. only N individuals are kept at each time step). We show that this speed undergoes a "phase transition" as a function of the population size N, and we determine, in each regime, the effect of time-inhomogeneity on the speed.
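A minimal simulation sketch of a branching random walk with selection, in the spirit of the N-BRW described above (the time-inhomogeneity studied in the talk could be introduced by letting the displacement law depend on the generation; this is an illustration, not the authors' exact setting):

    import numpy as np

    rng = np.random.default_rng(3)
    N, generations = 1000, 200

    pos = np.zeros(N)                      # current positions of the N surviving particles
    for t in range(generations):
        # Each particle branches into two children with independent Gaussian displacements.
        children = np.repeat(pos, 2) + rng.standard_normal(2 * N)
        # Selection: keep only the N rightmost children.
        pos = np.sort(children)[-N:]

    # Empirical speed of the front (displacement of the cloud per generation).
    print(pos.mean() / generations)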
January 8: Julie Tourniaire
(ISTA, Vienna)
A branching particle system as a model of pushed fronts
Abstract:
We consider a system of particles performing a one-dimensional dyadic branching Brownian motion with space-dependent branching rate $r(x)$, negative drift $-\mu$, and killed upon reaching 0. More precisely, the particles branch at rate $\rho/2$ in $[0, 1]$, for some $\rho\geq 1$, and at rate $1/2$ in $(1, +\infty)$. The drift $\mu = \mu(\rho)$ is chosen in such a way that the system is critical.
This system can be seen as an analytically tractable model for fluctuating fronts, describing the internal mechanisms driving the invasion of a habitat by a cooperating population. Recent studies by Birzu, Hallatschek and Korolev on the noisy FKPP equation with Allee effect (or cooperation) suggest the existence of three classes of fluctuating fronts: pulled, semi-pushed and fully-pushed fronts.
In this talk, we will focus on the pushed regime. We will show that the particle system exhibits the same phase transitions as the noisy FKPP equation. We will then use this system to explain how the internal mechanisms driving an invasion shape the genealogy of the population.
December 11: Noura Dridi
(ENSMM)
Measuring uncertainty in predictions with neural networks
Abstract:
Neural networks (NNs) are a powerful tool for a wide range of applications (prediction, classification, ...) across many domains: medicine, geosciences, mechanics, and more. NNs are less restrictive in terms of assumptions about the model representing the system, since they are based on learning from data. However, every real system involves several sources of uncertainty, such as measurement errors and uncertainty related to the decision algorithm, so it is important to measure and quantify this uncertainty. In particular, with deep learning algorithms it is relevant to provide a decision together with a confidence level. I will present a method that uses the dropout technique to measure the uncertainty associated with the NN. The method is equivalent to a Bayesian approximation of a Gaussian process, i.e. minimizing the cost function of the NN amounts to a variational approximation of the posterior distribution of the function relating inputs and outputs. An advantage is that adding the dropout layer does not increase the computational complexity of the algorithm. An application to predicting the motion of an offshore wind turbine as a function of environmental conditions will be presented in detail.
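A minimal sketch of the Monte Carlo dropout idea on a toy PyTorch network (an illustration of the general technique, not the wind turbine application of the talk): dropout is kept active at prediction time and several stochastic forward passes are averaged, their spread serving as an uncertainty estimate.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(            # toy regression network with a dropout layer
        nn.Linear(4, 64), nn.ReLU(),
        nn.Dropout(p=0.2),
        nn.Linear(64, 1),
    )

    x = torch.randn(8, 4)             # a batch of 8 hypothetical input points

    model.train()                     # keep dropout active at prediction time
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(100)])  # 100 stochastic passes

    mean = samples.mean(dim=0)        # point prediction
    std = samples.std(dim=0)          # dropout-based uncertainty estimate
    print(mean.squeeze(), std.squeeze())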
October 23: Laetitia Colombani
(CMAP, École Polytechnique)
Propagation of chaos in a network of FitzHugh-Nagumo neurons
Abstract:
FitzHugh-Nagumo equations were suggested in 1961 to model neurons. Stochastic versions of these equations have since been developed. The specificity of these SDEs is a cubic term in the drift, which requires particular care. With Pierre Le Bris, we have studied the behavior of a network of N interacting neurons as N tends to infinity. We prove uniform-in-time propagation of chaos in a mean-field framework, with a coupling method suggested by Eberle (2016). During this talk, I will present this model and the idea of the method.
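For concreteness, one common parametrization of a stochastic mean-field FitzHugh-Nagumo network (stated as an illustration rather than the exact system of the talk) takes, for particle $i$ among $N$,
$dX^i_t = \big(X^i_t - \tfrac{(X^i_t)^3}{3} - Y^i_t + \tfrac{K}{N}\sum_{j=1}^{N}(X^j_t - X^i_t)\big)\,dt + \sigma\, dB^i_t$, $\quad dY^i_t = \varepsilon\,(X^i_t + a - b\,Y^i_t)\,dt$,
the cubic term in the drift being the source of the technical difficulty mentioned above; propagation of chaos states that, as $N \to \infty$, the particles become asymptotically independent, each following the McKean-Vlasov limit of this system.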
October 16: Ahmed Zaoui
(LmB)
Variance function estimation in regression model and its applications
Abstract:
This presentation focuses on the estimation of the variance function in regression and its applications in regression with reject option and prediction intervals.
First, we are interested in estimating the variance function through two methods: model selection (MS) and convex aggregation (C). The goal of the MS procedure is to select the best estimator from a set of predictors, while the C procedure aims to choose the best convex combination among the predictors. The selected predictors are then referred to as the MS-estimator and the C-estimator, respectively. The construction of both the MS-estimator and the C-estimator is based on a two-step procedure. In the first step, using the first sample, we construct estimators of the variance function through a residual-based method. In the second step, we aggregate these estimators using a second sample. We establish the consistency of both the MS-estimator and the C-estimator with respect to the L2-risk.
Next, we shift our focus to the regression problem where one is allowed to abstain from predicting. We focus on the case where the rejection rate is fixed and derive the optimal rule, which relies on thresholding the conditional variance function. We provide a semi-supervised estimation procedure for this optimal rule. The resulting predictor with reject option is shown to be almost as good as the optimal predictor with reject option, both in terms of risk and of rejection rate. We additionally apply our methodology with the kNN algorithm and establish rates of convergence for the resulting kNN predictor under mild conditions. Finally, a numerical study is performed to illustrate the benefit of using the proposed procedure.
Finally, we tackle the problem of building a prediction interval in heteroscedastic Gaussian regression. We focus on prediction intervals with constrained expected length in order to guarantee the interpretability of the output. In this framework, we derive a closed-form expression of the optimal prediction interval that allows for the development of a data-driven plug-in prediction interval. The construction of the proposed algorithm is based on two samples, one labeled and the other unlabeled. Under mild conditions, we show that our procedure is asymptotically as good as the optimal prediction interval both in terms of expected length and error rate. We conduct a numerical analysis that exhibits the good performance of our method.
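A minimal sketch of the residual-based first step and of a plug-in Gaussian prediction interval, using kNN regression as in the kNN application mentioned above (an illustration under simplifying assumptions, with a fixed coverage level rather than the paper's expected-length constraint and semi-supervised construction):

    import numpy as np
    from scipy.stats import norm
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(4)
    n = 2000
    X = rng.uniform(-2, 2, size=(n, 1))
    sigma = 0.2 + 0.5 * np.abs(X[:, 0])                   # heteroscedastic noise level
    Y = np.sin(3 * X[:, 0]) + sigma * rng.standard_normal(n)

    # Split the data: one sample to fit the mean, another for the residual-based variance step.
    X1, Y1, X2, Y2 = X[: n // 2], Y[: n // 2], X[n // 2 :], Y[n // 2 :]

    mean_hat = KNeighborsRegressor(n_neighbors=50).fit(X1, Y1)
    resid2 = (Y2 - mean_hat.predict(X2)) ** 2             # squared residuals on the second sample
    var_hat = KNeighborsRegressor(n_neighbors=50).fit(X2, resid2)

    # Plug-in Gaussian prediction interval at level 1 - alpha at a new point x0.
    alpha, x0 = 0.1, np.array([[0.5]])
    m, s = mean_hat.predict(x0)[0], np.sqrt(max(var_hat.predict(x0)[0], 0.0))
    print(m - norm.ppf(1 - alpha / 2) * s, m + norm.ppf(1 - alpha / 2) * s)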
September 25: Sophie Dabo
(Lille)
Multivariate Principal Component Analysis and Binary Functional Linear Models under non-random sampling
Abstract:
Multivariate principal component analysis and a multivariate functional binary choice model are explored in a case-control or choice-based sample design context. In other words, a model is considered in which the response is binary, the explanatory variables are functional, and the sample is stratified with respect to the values of the response variable. A dimension reduction of the space of the explanatory random functions, based on a Karhunen–Loève expansion, is used to define a conditional maximum likelihood estimate of the model. Based on this formulation, several asymptotic properties are given. A simulation study and an application to real data are used to compare the proposed method with the ordinary maximum likelihood method, which ignores the nature of the sampling. The proposed model yields encouraging results. The potential of the functional choice-based sampling model for integrating special non-random features of the sample, which would otherwise have been difficult to detect, is also outlined.
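Schematically (my own summary of the standard construction, not the authors' notation): the functional covariate is expanded on the eigenfunctions $\phi_k$ of its covariance operator and truncated, $X(t) \approx \mu(t) + \sum_{k=1}^{K} \xi_k\, \phi_k(t)$, and the binary functional model then reduces to a finite-dimensional choice model in the scores, $\mathbb{P}(Y = 1 \mid X) = g\big(\beta_0 + \sum_{k=1}^{K} \beta_k\, \xi_k\big)$, to which the case-control (choice-based sampling) conditional likelihood correction is applied.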