1.1. We shall investigate the estimation of general risk measures in the case of extreme losses, relying heavily on extreme-value theory. Both spectral risk measures and distortion risk measures will be considered. This work was initiated during SIMERGE in the framework of heavy-tailed distributions. It should be completed and extended to light-tailed distributions.

1.2. We also aim to propose new estimators of such extreme risk measures that can handle real-valued or functional covariates. To this end, we shall investigate new semiparametric models, which should lead to more efficient estimators of extreme risks than the purely nonparametric ones introduced in SIMERGE.

1.3. We shall develop new models that take into account both the spatial and temporal nature of the data and the fact that the observations are extremes. For instance, we shall extend the linear regression model and the spatio-temporal autoregressive moving average process to this context. We also aim to extend the extremality measures for independent functional data in order to identify extreme spatial observations of a functional nature (for instance, a rain flow curve at a given station over a certain period of time).
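As an illustration of the heavy-tailed setting mentioned in objective 1.1, the sketch below shows two classical tools from extreme-value theory: the Hill estimator of the tail index and the Weissman extrapolation of an extreme quantile (Value-at-Risk). It is only a minimal sketch of standard textbook estimators, not the estimators developed in SIMERGE. The function names, the simulated Pareto sample, and the choice of `k` are assumptions made for the example.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the tail index from the k largest observations."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    threshold = xs[n - k - 1]  # order statistic X_(n-k)
    if threshold <= 0:
        raise ValueError("the threshold X_(n-k) must be positive")
    # Mean log-excess of the k largest points over the threshold.
    return sum(math.log(x / threshold) for x in xs[n - k:]) / k


def weissman_quantile(sample, k, alpha):
    """Weissman estimate of the quantile with exceedance probability alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    xs = sorted(sample)
    n = len(xs)
    gamma = hill_estimator(xs, k)
    # Extrapolate beyond the threshold using the estimated tail index.
    return xs[n - k - 1] * (k / (n * alpha)) ** gamma


def simulate_pareto(n, gamma, seed):
    """Hypothetical test data: Pareto sample with tail index gamma."""
    rng = random.Random(seed)
    # 1 - U lies in (0, 1], so the power below is always well defined.
    return [(1.0 - rng.random()) ** (-gamma) for _ in range(n)]


if __name__ == "__main__":
    data = simulate_pareto(n=5000, gamma=0.5, seed=0)
    print("tail index estimate:", hill_estimator(data, k=200))
    print("extreme quantile estimate:", weissman_quantile(data, k=200, alpha=1e-4))
```

For the simulated Pareto sample, the true quantile with exceedance probability `alpha` is `alpha ** (-gamma)`, which gives a reference value for the extrapolated estimate. In practice the choice of `k` drives a bias-variance trade-off and needs separate care.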

2.1. We shall propose classification methods that are effective on non-standard data (e.g. categorical data) in high dimension and that are able to handle dependencies. In SIMERGE, new tools were proposed to address the curse of dimensionality in the context of verbal autopsy data. Here, we would first like to adapt and/or improve similar approaches in the context of single-nucleotide polymorphism (SNP) data. Second, classification methods based on Functional Data Analysis (FDA) will be applied by transforming very high-dimensional data into functional data. Rather than using sparsity dependency patterns, SNP data can then be analyzed by FDA methods that take advantage of the high dimensionality.

2.2. In the context of genetic data, the number of explanatory variables can be very large (from several hundred thousand to several million). Selecting the relevant variables that affect the outcome is therefore a difficult problem. The role of variable selection is to identify an optimal subset of variables that could explain the phenotype. In the framework of the SIMERG2E project, we aim to develop a variable selection method to pinpoint significant genes implicated in the occurrence of malaria or arboviral diseases in a Senegalese population. To this end, we intend to implement a statistic that takes into account intra- and inter-individual dependencies in order to measure the influence of a subset of variables on the disease phenotype. This statistic will enable us to develop an algorithm that explores the subsets of variables in search of optimal solutions.
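To make the idea of exploring subsets of variables in objective 2.2 concrete, the sketch below performs an exhaustive search over all subsets up to a given size and keeps the one with the highest score. It is only a toy illustration under stated assumptions: `score` is a hypothetical placeholder for the dependency-aware statistic, which the project has yet to define, and exhaustive enumeration is feasible only for a small number of variables, so a real method for SNP data would need a smarter search strategy.

```python
from itertools import combinations


def best_subset(n_variables, score, max_size):
    """Return the best-scoring subset of variable indices and its score.

    `score` is a hypothetical user-supplied function mapping a tuple of
    variable indices to a number (higher is better).
    """
    best, best_score = (), float("-inf")
    for size in range(1, max_size + 1):
        # Enumerate every subset of the current size.
        for subset in combinations(range(n_variables), size):
            value = score(subset)
            if value > best_score:
                best, best_score = subset, value
    return best, best_score


def toy_score(subset):
    """Toy score: reward variables 1 and 3, penalize subset size."""
    return len(set(subset) & {1, 3}) - 0.1 * len(subset)


if __name__ == "__main__":
    print(best_subset(n_variables=6, score=toy_score, max_size=3))
```

The number of subsets grows combinatorially with the number of variables, which is why bounding `max_size` is essential even in this toy setting.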

- LIRIMA

- Statify, Inria Grenoble Rhône-Alpes

- Head: Stéphane Girard

- Co-head: Abdou Kâ Diongue