# Welcome to the SIMERGE webpage

# Statistical Inference for the Management of Extreme Risks and Global Epidemiology

## Project description

SIMERGE is a LIRIMA project-team started in January 2015.
It includes researchers from
Mistis (Inria Grenoble - Rhône-Alpes, France), LERSTAD (Laboratoire d'Etudes et de Recherches en Statistiques et Développement,
Université Gaston Berger, Sénégal),
IRD (Institut de Recherche pour le Développement, Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes, Dakar, Sénégal)
and LEM lab (Lille Economie et Management, Université Lille 1, 2, 3, Modal, Inria Lille Nord-Europe,
France).

## Scientific program

The Associate team is built on two research themes:

### 1. Spatial extremes, application to management of extreme risks

Weather variability in small regions, a few kilometres in size, is important in many hydrological, agricultural and energy contexts.
Therefore, spatio-temporal modelling of environmental data is well studied in the literature. The basic objectives are: (i) to infer the nature of the spatial variation of extreme precipitation and temperatures based on meteorological observations and (ii) to model the pattern of variability of these data components.
Different characterizations of multivariate extreme dependence structures have been proposed in the literature (see, for instance, Coles et al. (2000), Ledford and Tawn (1996, 1997)). These works were the basis of various studies characterizing the extremal dependence of a spatial process (we refer to Ancona-Navarrete and Tawn (2002), Schlather and Tawn (2003)).
Once the modeling step is completed, the inference of the associated risk can be tackled.
One of the most popular risk measures is the Value-at-Risk (VaR), introduced in the 1990s.
In statistical terms, the VaR at level alpha corresponds to the upper alpha-quantile of the loss distribution.
Even though the VaR was introduced to deal with financial risks, it is also of interest in hydrological
applications, where it can be interpreted as a return level.
The Value-at-Risk, however, suffers from several weaknesses. First, it provides only pointwise information:
it does not take into account what the loss will be beyond this quantile.
Second, random loss variables with light-tailed or heavy-tailed distributions may have the same Value-at-Risk (Embrechts et al., 1999). The definition of new risk measures, the study of their properties in the case of extreme events (alpha tends to zero),
and their estimation from data are three major statistical challenges.
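
The first weakness can be illustrated with a minimal sketch. It assumes the simplest empirical estimators (an order statistic for the VaR and the mean of the largest observations as an expected-shortfall-type summary); the function name `empirical_var_es` and the toy samples are hypothetical and are not part of the project. These naive estimators are not extreme-value estimators and break down when alpha is small relative to the sample size.

```python
import math


def empirical_var_es(losses, alpha):
    """Return (VaR, tail mean) at level alpha from a sample of losses.

    VaR is taken as the order statistic exceeded by k = floor(n * alpha)
    observations; the tail mean is the average of those k largest losses.
    Both choices are assumptions of this sketch.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(losses)
    n = len(ordered)
    k = math.floor(n * alpha)
    if k < 1:
        raise ValueError("alpha is too small for this sample size")
    var = ordered[n - k - 1]          # exactly k observations lie above it
    tail_mean = sum(ordered[n - k:]) / k
    return var, tail_mean


# Two toy samples that share the same empirical VaR but differ beyond it.
light = [1, 2, 3, 4, 5, 6, 7, 8]
heavy = [1, 2, 3, 4, 5, 6, 70, 800]

print(empirical_var_es(light, 0.25))  # (6, 7.5)
print(empirical_var_es(heavy, 0.25))  # (6, 435.0)
```

Both samples return the same VaR, while the tail mean separates them, which is the kind of information that a risk measure looking beyond the quantile is meant to capture.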

- Task 1.1. We plan to investigate the estimation of general risk measures in the case of extreme losses, relying heavily on extreme-value theory. We shall investigate both spectral risk measures and distortion risk measures. This framework encompasses, for instance, the particular cases of the value-at-risk and of the expected shortfall.

- Task 1.2. We also aim at proposing estimators of such extreme risk measures able to deal with real-valued or functional covariates. In particular, we shall focus on the case where the covariate is a random field. The case of censored data will also be taken into account.

- Task 1.3. We shall develop models which take into account both the spatial and temporal nature of the data as well as the fact that the observations are extremes.
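
For reference, the risk measures named in Task 1.1 can be written with the usual textbook definitions. The notation is assumed here and does not come from the project text: X denotes the loss, \bar F its survival function, g a distortion function and \phi a spectral weight.

```latex
\begin{align*}
\bar F(x) &= \mathbb{P}(X > x), \\
\mathrm{VaR}(\alpha) &= \inf\{x : \bar F(x) \le \alpha\}, \qquad \alpha \in (0,1), \\
\mathrm{ES}(\alpha) &= \frac{1}{\alpha} \int_0^{\alpha} \mathrm{VaR}(u)\, du, \\
\rho_g(X) &= \int_0^{\infty} g\bigl(\bar F(x)\bigr)\, dx, \qquad X \ge 0, \\
\rho_\phi(X) &= \int_0^{1} \phi(u)\, \mathrm{VaR}(u)\, du .
\end{align*}
```

Here g is nondecreasing on [0,1] with g(0)=0 and g(1)=1, and \phi is nonnegative, nonincreasing and integrates to one. The choice g(u) = 1{u > alpha} recovers the VaR, while g(u) = min(u/alpha, 1), or equivalently \phi(u) = 1{u <= alpha}/alpha, recovers the expected shortfall.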

### 2. Classification, application to global epidemiology

Only 23 of the world's countries have high-quality death registration data, and
75 have no cause-specific mortality data at all (Mathers et al., 2005). Verbal autopsy is thus a key technique
for estimating the cause-of-death
distribution in populations without medical death certification.
Symptoms together with causes of death are collected at a medical facility, and the distribution of the cause of death D is then estimated in the population, where only symptom data S are available.
Since both D and S are usually qualitative measures, the estimation of the probability distribution
of D given S can be seen as a classification problem for qualitative variables (King and Yu, 2008).
In most cases, the number s of symptoms is high (S is a binary random vector of size s=100), leading
to 2^s possible combinations of symptoms. The probabilistic modeling of S by, for instance, a multinomial distribution thus faces the curse of dimensionality: moderate sample sizes do not allow an accurate estimation
of all the 2^s probabilities. The challenge is then to build new probabilistic models sufficiently rich
to handle complex dependences and yet sufficiently simple to be estimated in high dimension.
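
The orders of magnitude involved can be checked with a few lines of arithmetic. The helper names and the sample size of 10,000 below are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def multinomial_cells(s):
    """Number of distinct binary symptom vectors of length s."""
    return 2 ** s


def max_observed_fraction(s, n):
    """Upper bound on the fraction of cells that n samples can visit."""
    cells = multinomial_cells(s)
    return Fraction(min(n, cells), cells)


s = 100
# Free probabilities of an unrestricted multinomial model on 2^s cells.
print(multinomial_cells(s) - 1)
# Even 10,000 records (an illustrative figure) touch a negligible share of cells.
print(float(max_observed_fraction(s, 10_000)))
```

With s=100, an unrestricted multinomial model has 2^100 - 1 free probabilities (about 1.27 x 10^30), so almost every cell remains empty whatever the realistic sample size.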

- Task 2.1. We shall use parsimonious multinomial probability distributions to model each class of the mixture. The simplest way to achieve parsimony is to assume conditional independence of the symptoms given the cause of death. This model will be enriched by considering hierarchical dependences between the symptoms. The possibility of modeling these dependencies via a graph will also be investigated.

- Task 2.2. We shall also adapt the classical Gaussian mixture model to binary data through the introduction of a kernel function. The kernel measures the similarity between two samples in a possibly non-linear way. This approach will be combined with dimension reduction techniques to, once again, overcome the curse of dimensionality. We shall also work on the modeling of spatial dependences between measurements to take into account the spatial information in epidemiology data.
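
The conditional-independence baseline mentioned in Task 2.1 can be sketched as follows. This is a minimal illustration under stated assumptions (fully labelled training data, Laplace smoothing, hypothetical function names and toy data), not the model developed by the project; it needs only s probabilities per cause instead of 2^s - 1.

```python
import math
from collections import Counter


def fit_conditional_independence(symptoms, causes, smoothing=1.0):
    """Estimate P(D = d) and P(S_j = 1 | D = d) from labelled binary vectors."""
    n = len(causes)
    s = len(symptoms[0])
    class_counts = Counter(causes)
    ones = {d: [0] * s for d in class_counts}
    for vec, d in zip(symptoms, causes):
        for j, value in enumerate(vec):
            ones[d][j] += value
    priors = {d: count / n for d, count in class_counts.items()}
    # Laplace smoothing (an assumption of this sketch) avoids probabilities 0 and 1.
    probs = {
        d: [(ones[d][j] + smoothing) / (class_counts[d] + 2 * smoothing)
            for j in range(s)]
        for d in class_counts
    }
    return priors, probs


def posterior(vec, priors, probs):
    """P(D = d | S = vec) by Bayes' rule under conditional independence."""
    log_joint = {}
    for d, prior in priors.items():
        total = math.log(prior)
        for value, p in zip(vec, probs[d]):
            total += math.log(p if value else 1.0 - p)
        log_joint[d] = total
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_joint.values())
    weights = {d: math.exp(lj - top) for d, lj in log_joint.items()}
    norm = sum(weights.values())
    return {d: w / norm for d, w in weights.items()}


# Toy data: three binary symptoms, two hypothetical causes "A" and "B".
symptoms = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 1)]
causes = ["A", "A", "B", "B"]
priors, probs = fit_conditional_independence(symptoms, causes)
print(posterior((1, 0, 0), priors, probs))
```

The hierarchical and graph-based dependences of Task 2.1 would relax the product form used in `posterior`, at the cost of additional parameters.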

## Links

- LIRIMA

- Mistis, Inria Grenoble Rhône-Alpes

- Modal, Inria Lille Nord-Europe

## Contact the members

- Head: Stéphane Girard

- Co-head: Abdou Kâ Diongue