Cellwise Outliers (arxiv.org)

arXiv:2604.14182v1 Announce Type: cross
Abstract: In statistics and machine learning, the traditional meaning of the terms outlier' and anomaly' is a case in the dataset that behaves differently from the bulk of the data. This raises suspicion that it may belong to a different population. But nowadays increasing attention is being paid to so-called cellwise outliers. These are individual values somewhere in the data matrix (or data tensor). Depending on the dimension, even a relatively small proportion of outlying cells can contaminate over half the cases, which is a problem for existing casewise methods. It turns out that detecting cellwise outliers as well as constructing cellwise robust methods requires techniques that are quite different from the casewise setting. For instance, one has to let go of some intuitive equivariance properties. The problem is difficult, but the past decade has seen substantial progress. For high-dimensional data the cellwise approach is becoming dominant, and typically can deal with missing values as well. We review developments in the estimation of location and covariance matrices as well as regression methods, principal component analysis, methods for tensor data, and various other settings.