Statistics in Shape Spaces

Statistical methods in shape space

One of the primary motivations for developing shape analysis methods has been, and remains, the need for tools to analyze datasets whose elements are shapes, a common situation with biological or medical data. Because the shape spaces we work with are built as Riemannian manifolds, statistical shape analysis can be regarded as a special case of the general problem of analyzing manifold-valued data, and constitutes one of the most important applications of this theory. The papers below describe generic methods that I co-developed in this context, a field to which many other authors, including Xavier Pennec, Huiling Le, Kanti Mardia, Tom Fletcher, Sarang Joshi and Stéphanie Allassonnière, have contributed.

Probably the most basic statistical problem is computing the average (or some other location estimator) of a dataset. Since Riemannian manifolds are not affine spaces, the standard definition of averages in Euclidean spaces does not apply, but their characterization as optimal centers in terms of the squared distance does. This leads to the definition of Fréchet means (or centers of mass), which, in any metric space, are defined (given a dataset \(x_1, \ldots, x_n\)) as minimizers of the function \[ c \mapsto \frac1n \sum_{k=1}^n \mathrm{dist}(x_k, c)^2. \] Some of the usual features of Euclidean averages do not carry over to Fréchet means: in particular, they are not necessarily unique (except in the special situation of so-called Hadamard spaces), and most of the time they do not have a closed-form expression (see, e.g., I. Chavel's textbook). Gradient descent algorithms can be designed to minimize the sum of squared distances, but computing the gradient of this sum requires the computation of the inverse of the Riemannian exponential (the Riemannian logarithm), which is not always a simple task.
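As an illustration of the gradient descent mentioned above, here is a minimal sketch on the unit sphere, chosen only because its Riemannian exponential and logarithm are available in closed form (in shape spaces the logarithm typically requires solving a registration problem). The function names `sphere_exp`, `sphere_log` and `frechet_mean`, the step size and the stopping rule are assumptions made for this example and are not taken from the papers listed here. The sketch uses the fact that, away from the cut locus, the gradient of \(c \mapsto \frac1n \sum_k \mathrm{dist}(x_k, c)^2\) is \(-\frac2n \sum_k \mathrm{Log}_c(x_k)\).

```python
import numpy as np

def sphere_exp(c, v):
    """Riemannian exponential on the unit sphere: start at c, follow tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return c
    return np.cos(norm_v) * c + np.sin(norm_v) * (v / norm_v)

def sphere_log(c, x):
    """Riemannian logarithm on the unit sphere: tangent vector at c pointing toward x."""
    cos_theta = np.clip(np.dot(c, x), -1.0, 1.0)
    theta = np.arccos(cos_theta)      # geodesic distance between c and x
    w = x - cos_theta * c             # part of x orthogonal to c
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        # x coincides with c (or is antipodal, where the logarithm is not defined)
        return np.zeros_like(c)
    return theta * w / norm_w

def frechet_mean(points, step=1.0, tol=1e-10, max_iter=100):
    """Gradient descent for the mean squared geodesic distance to the points."""
    c = points[0] / np.linalg.norm(points[0])   # initial guess: first data point
    for _ in range(max_iter):
        # Descent direction: the average logarithm (minus half the gradient)
        v = np.mean([sphere_log(c, x) for x in points], axis=0)
        if np.linalg.norm(v) < tol:
            break
        c = sphere_exp(c, step * v)
        c = c / np.linalg.norm(c)               # guard against round-off drift
    return c
```

For data concentrated in a small geodesic ball the iteration usually converges in a few steps; for widely spread data the minimizer may not be unique and the result can depend on the initial guess.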

Using the optimal control formulation of LDDMM, a centroid-based generative statistical model in shape space is proposed in
[1] A Bayesian generative model for surface template estimation, J. Ma, M.I. Miller, L. Younes, Journal of Biomedical Imaging, 16, 2010,
together with an algorithm for the estimation of the centroid from data, which can be considered as a computation of a centroid when the points \(x_1, \ldots, x_n\) are corrupted by noise.
An earlier paper,
[2] Bayesian template estimation in computational anatomy, J. Ma, M.I. Miller, A. Trouvé, L. Younes, NeuroImage 42 (1), 252-261, 2008,
develops a similar method for images.

This approach can be complemented with principal component analysis, as studied in:
[3] Statistics on diffeomorphisms via tangent space representations, M. Vaillant, M.I. Miller, L. Younes, A. Trouvé, NeuroImage 23, S161-S169, 2004
[4] Principal component based diffeomorphic surface mapping, A. Qiu, L. Younes, M.I. Miller, IEEE Transactions on Medical Imaging 31 (2), 302-311, 2012
[5] Robust Diffeomorphic Mapping via Geodesically Controlled Active Shapes, D. Tward, J. Ma, M. Miller, L. Younes, International Journal of Biomedical Imaging, 2013
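
The general idea of principal component analysis through a tangent space representation can be sketched as follows: map each data point to the tangent space at a reference point (typically the centroid) with the Riemannian logarithm, then apply ordinary PCA to the resulting vectors. This is a generic illustration on the unit sphere, not the specific procedure of the papers above; the names `tangent_pca` and `sphere_log` and the choice of the sphere are assumptions made for the example.

```python
import numpy as np

def sphere_log(c, x):
    """Riemannian logarithm on the unit sphere: tangent vector at c pointing toward x."""
    cos_theta = np.clip(np.dot(c, x), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    w = x - cos_theta * c
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        return np.zeros_like(c)
    return theta * w / norm_w

def tangent_pca(points, center, log_map):
    """Ordinary PCA applied to the tangent vectors log_map(center, x)."""
    V = np.array([log_map(center, x) for x in points])   # n x d matrix of tangent vectors
    V = V - V.mean(axis=0)    # recenter (the mean is already ~0 at a Fréchet mean)
    # Rows of Wt are the principal directions; s holds the singular values
    _, s, Wt = np.linalg.svd(V, full_matrices=False)
    variances = s**2 / len(points)
    return Wt, variances

# Hypothetical data: more spread along the x-direction than along y
center = np.array([0.0, 0.0, 1.0])
angles_x = [-0.4, -0.2, 0.2, 0.4]
angles_y = [-0.1, 0.1]
data = np.array(
    [[np.sin(a), 0.0, np.cos(a)] for a in angles_x]
    + [[0.0, np.sin(b), np.cos(b)] for b in angles_y]
)
directions, variances = tangent_pca(data, center, sphere_log)
```

Here the first principal direction is aligned with the x-axis of the tangent plane at `center`, since that is where the hypothetical data vary most.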

Still in connection with probabilistic models and statistics, the following paper explores diffusion models in shape spaces:
[6] Learning shape trends: Parameter estimation in diffusions on shape manifolds, V. Staneva, L. Younes, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.