A data-driven calibration for a non-asymptotic kernel two-sample test.

P. Lacroix, B. Michel, F. Picard, V. Rivoirard

We observe two populations of multivariate data described by p variables, where p is significantly larger than the population sizes. A two-sample test must decide between the null hypothesis (both populations have the same distribution) and the alternative hypothesis (their distributions differ). To account for the complex structure of the variables and to overcome the curse of dimensionality, the data are embedded in a well-chosen reproducing kernel Hilbert space (RKHS).
In our work, we study a test statistic inspired by Harchaoui et al. (2008) that generalizes Student's t-test to an RKHS, and we propose a non-asymptotic, implementable method to calibrate the test. First, a spectral analysis yields a theoretical upper bound on the test quantile. Second, we implement a data-driven algorithm that controls the type I error and also calibrates the unknown regularization hyperparameter.
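As a rough illustration only, here is a minimal sketch of a regularized kernel Fisher (t-test-like) statistic of the kind described by Harchaoui et al. (2008), assuming the form T = (n1 n2 / n) ||(Sigma_W + gamma I)^(-1/2) (mu2 - mu1)||^2, where mu1, mu2 are the empirical mean embeddings, Sigma_W is the pooled within-sample covariance operator (1/n_i normalization assumed) and gamma is the regularization hyperparameter. The statistic is computed from the pooled Gram matrix via the Woodbury identity. The calibration shown is a standard permutation test, swapped in for illustration; it is NOT the authors' spectral bound or their data-driven algorithm, and here gamma and the Gaussian-kernel bandwidth are fixed by hand rather than calibrated. The function names `gaussian_gram`, `kfda_statistic` and `permutation_pvalue` are hypothetical.

```python
import numpy as np


def gaussian_gram(X, Y, bandwidth):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def kfda_statistic(K, n1, gamma):
    """Regularized kernel Fisher statistic from the pooled Gram matrix K.

    Assumption: rows 0..n1-1 of K are sample 1, the remaining rows sample 2,
    and each within-sample covariance uses the 1/n_i normalization.
    """
    n = K.shape[0]
    n2 = n - n1
    # m expresses mu2 - mu1 as a linear combination of the feature vectors.
    m = np.concatenate([-np.ones(n1) / n1, np.ones(n2) / n2])
    # P = blockdiag(I - J/n1, I - J/n2): within-sample centering projector.
    P = np.zeros((n, n))
    P[:n1, :n1] = np.eye(n1) - 1.0 / n1
    P[n1:, n1:] = np.eye(n2) - 1.0 / n2
    Km = K @ m
    PKm = P @ Km
    # Woodbury identity: an n x n solve replaces the inverse in the RKHS.
    inner = np.linalg.solve(n * gamma * np.eye(n) + P @ K @ P, PKm)
    quad = (m @ Km - PKm @ inner) / gamma
    return n1 * n2 / n * quad


def permutation_pvalue(X, Y, gamma, bandwidth, n_perm=200, seed=0):
    """Permutation p-value (illustrative substitute for the authors' calibration)."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n1 = X.shape[0]
    K = gaussian_gram(Z, Z, bandwidth)
    observed = kfda_statistic(K, n1, gamma)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(Z.shape[0])
        # Relabeling the pooled sample amounts to permuting rows and columns of K.
        if kfda_statistic(K[np.ix_(idx, idx)], n1, gamma) >= observed:
            count += 1
    # The (1 + count) / (1 + n_perm) form keeps the test valid at finite n.
    return (1 + count) / (1 + n_perm)
```

With a linear kernel (K = Z Z^T) the statistic reduces to a finite-dimensional, ridge-regularized Hotelling-type statistic, which gives a quick sanity check of the kernel formula.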

Keywords: statistical tests, kernel methods, non-asymptotic, data-dependent calibration.

Scheduled

FENStatS-SEIO: Statistics and Data Science
June 11, 2025 10:30 AM
Auditorio 1. Ricard Vinyes

Other papers in the same session

A regression model on distributional data in a context of functional data analysis through an appropriate LDQ transformation

R. Verde, A. Balzanella, G. Borrata

Advances in Count Time Series: Addressing Zero Inflation and Structural Breaks in INAR Models

I. Pereira, M. Monteiro

Functional relevance based on the continuous Shapley value

P. Delicado, C. Pachón García

Bounds for the regression parameters in dependently censored survival models

I. Willems, J. Beyhum, I. Van Keilegom
