Practical Semiparametric Inference with Bayesian Nonparametric Ensembles
[Thesis]
Liu, Jeremiah Zhe
Coull, Brent A.
Harvard University
2019
132 p.
Ph.D.
Set in the practical situation where the data-generating process is unknown and multiple imperfect candidate models are available, this thesis studies how to construct an approximation model that optimally captures the relevant aspects of the data for the purpose of conducting sound inference. We consider three types of inference objectives: hypothesis testing (Chapter 2), spatiotemporal prediction (i.e., estimating the conditional mean) (Chapter 3), and uncertainty quantification (i.e., estimating the distribution function) (Chapter 4). We focus on regression models for continuous outcomes. Specifically, we propose the Bayesian Nonparametric Ensemble (BNE), a general modeling approach that combines the a priori information encoded in the candidate models using ensemble methods, and then addresses the systematic bias in the candidate models using Bayesian nonparametric machinery. As a result, BNE specifies a large model space centered around the ensemble of candidate models. Through both theoretical investigation and extensive numerical studies, we show that the proposed approach yields a valid and powerful test for nonlinear effects (Chapter 2), improves predictive performance (Chapter 3), and provides calibrated quantification of model uncertainty whose degree varies over the feature space (Chapter 4). The proposed method is applied to the detection of nutrition-environment interaction effects on early-stage neurodevelopment in Bangladeshi children, and to the integration of multiple spatial prediction models for PM2.5 levels in Eastern Massachusetts, USA.
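The abstract does not give the exact form of BNE. As a rough, hypothetical illustration of its two-step idea (first combine the candidate models into an ensemble, then model their remaining systematic bias nonparametrically), the sketch below uses a least-squares linear ensemble followed by Gaussian-process regression on the ensemble residuals. The function names `rbf_kernel` and `ensemble_plus_gp`, the squared-exponential kernel, the fixed hyperparameters, and the least-squares weights are all assumptions made for illustration; this is not the thesis's implementation. A fully Bayesian treatment would, for example, place priors on the ensemble weights and propagate their uncertainty, which this sketch does not do.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    # Squared-exponential (RBF) covariance between 1-D input arrays a and b.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def ensemble_plus_gp(x, y, base_preds, x_new, base_preds_new,
                     length_scale=1.0, variance=1.0, noise=0.1):
    """Hypothetical sketch: linear ensemble of candidate models + GP bias term.

    x, y           : (n,) training inputs and continuous outcomes
    base_preds     : (n, K) candidate-model predictions at x
    x_new          : (m,) test inputs
    base_preds_new : (m, K) candidate-model predictions at x_new
    noise          : observation-noise variance added to the kernel diagonal
    Returns the corrected predictive mean, the posterior variance of the
    GP bias term at x_new, and the ensemble weights.
    """
    # Step 1 (assumed stand-in for the ensemble stage): least-squares weights.
    w, *_ = np.linalg.lstsq(base_preds, y, rcond=None)
    resid = y - base_preds @ w

    # Step 2 (assumed stand-in for the nonparametric stage): GP on residuals,
    # so that systematic bias shared by the candidate models can be captured.
    K = rbf_kernel(x, x, length_scale, variance) + noise * np.eye(len(x))
    K_s = rbf_kernel(x_new, x, length_scale, variance)
    K_ss = rbf_kernel(x_new, x_new, length_scale, variance)

    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))
    bias_mean = K_s @ alpha                        # GP posterior mean of bias

    v = np.linalg.solve(L, K_s.T)
    bias_var = np.diag(K_ss) - np.sum(v ** 2, axis=0)  # posterior variance

    mean = base_preds_new @ w + bias_mean
    return mean, bias_var, w
```

In a toy run one might pass two deliberately misspecified candidates (say `np.sin(x)` and `x`) for a truth that also contains a term neither candidate captures. The GP term then absorbs much of that shared bias, and its posterior variance grows away from the training inputs. This loosely mirrors the abstract's point that uncertainty should vary over the feature space.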