Title:
Kernel-based hypothesis tests : large-scale approximations and Bayesian perspectives
Contributors:
Sejdinovic, Dino; Filippi, Sarah; Teh, Yee Whye
Publisher Information:
University of Oxford, 2020.
Publication Year:
2020
Collection:
University of Oxford
Document Type:
Dissertation (Electronic Thesis or Dissertation)
Language:
English
Accession Number:
edsble.800069
Database:
British Library EThOS

Further Information

This thesis contributes to the field of nonparametric hypothesis testing (i.e. two-sample and independence testing) by providing a large-scale framework and by developing a Bayesian perspective. We focus on nonparametric measures of homogeneity and dependence defined through Hilbert-space norms of differences between embeddings of probability distributions in a reproducing kernel Hilbert space (RKHS). The rich representation provided by the associated kernel feature map accommodates multivariate and non-Euclidean observations (e.g. strings and graphs) and leads to powerful tests that can solve challenging problems given enough observations. However, computing the kernel matrix costs at least quadratic time in the number of samples, which is prohibitive for modern large-scale datasets.

First, we propose three estimators of the well-known kernel dependence measure, the Hilbert-Schmidt Independence Criterion (HSIC), namely the block-based estimator, the Nyström estimator and the random Fourier feature (RFF) estimator, and establish the corresponding linear-time independence test for each estimator. Secondly, we consider a normalised version of HSIC, the NOrmalised Cross-COvariance (NOCCO) statistic, and propose an RFF-approximated NOCCO; this yields a distribution-free test that is robust to kernel bandwidth misspecification. Thirdly, we propose a two-step conditional independence test that extends the popular two-step approach REgression with Subsequent Independence Test (RESIT) through RKHS-valued regressions. When used as part of the classical PC algorithm for causal inference, the resulting algorithm is more robust to hidden variables that induce non-functional associations. Finally, we utilise the classical Bayes factor formalism for model comparison and propose a Bayesian two-sample test by modelling the witness function of the well-known kernel measure of homogeneity, the Maximum Mean Discrepancy (MMD), with a Gaussian process.
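
The linear-time estimators mentioned above rely on replacing the kernel matrix with explicit finite-dimensional feature maps. As a minimal illustration (not the thesis's own code), the following Python sketch computes an RFF-based HSIC estimate for Gaussian kernels; the function names, bandwidths and feature counts are hypothetical choices made for this example.

    import numpy as np

    def rff_features(X, n_features, bandwidth, rng):
        """Random Fourier features approximating a Gaussian (RBF) kernel."""
        d = X.shape[1]
        W = rng.normal(scale=1.0 / bandwidth, size=(d, n_features))
        b = rng.uniform(0.0, 2 * np.pi, size=n_features)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    def rff_hsic(X, Y, n_features=50, bandwidth_x=1.0, bandwidth_y=1.0, seed=0):
        """Biased HSIC estimate using random Fourier feature maps.

        HSIC is the squared Hilbert-Schmidt norm of the cross-covariance
        operator; with explicit finite-dimensional feature maps this reduces
        to the squared Frobenius norm of the empirical cross-covariance
        matrix, avoiding the quadratic-cost kernel matrices.
        """
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        phi = rff_features(X, n_features, bandwidth_x, rng)
        psi = rff_features(Y, n_features, bandwidth_y, rng)
        phi_c = phi - phi.mean(axis=0)    # centre the feature maps
        psi_c = psi - psi.mean(axis=0)
        C = phi_c.T @ psi_c / n           # empirical cross-covariance matrix
        return np.sum(C ** 2)             # squared Frobenius norm

With n samples and D random features this costs O(nD^2) time rather than the O(n^2) required to form the full kernel matrices.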
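
For the Bayesian two-sample test, the object modelled with a Gaussian process is the MMD witness function. As a hedged sketch of the standard empirical quantities involved (not the thesis's Bayesian construction), the snippet below evaluates the empirical witness function and the biased squared-MMD estimate under an assumed Gaussian kernel; names and bandwidths are again illustrative.

    import numpy as np

    def gaussian_kernel(A, B, bandwidth=1.0):
        """Gaussian (RBF) kernel matrix between the rows of A and B."""
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / (2 * bandwidth**2))

    def witness(X, Y, T, bandwidth=1.0):
        """Empirical MMD witness function at points T:
        f(t) = mean_i k(x_i, t) - mean_j k(y_j, t)."""
        return (gaussian_kernel(T, X, bandwidth).mean(1)
                - gaussian_kernel(T, Y, bandwidth).mean(1))

    def mmd2_biased(X, Y, bandwidth=1.0):
        """Biased estimate of the squared MMD between samples X and Y."""
        Kxx = gaussian_kernel(X, X, bandwidth)
        Kyy = gaussian_kernel(Y, Y, bandwidth)
        Kxy = gaussian_kernel(X, Y, bandwidth)
        return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()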