Title

Large-Scale Interpretable Multi-View Learning for Very High-Dimensional Problems with Application to Multi-Omic Data

Author

Shams Solari, Omid

Subject

Classification

Library

Center and Library of Islamic Studies in European Languages

Location

Province: Qom, City: Qom

Library contact: 025-32910706

NATIONAL BIBLIOGRAPHY NUMBER

Number

TL3w7613s4

LANGUAGE OF THE ITEM

Language of Text, Soundtrack, etc.

English

TITLE AND STATEMENT OF RESPONSIBILITY

Title Proper

Large-Scale Interpretable Multi-View Learning for Very High-Dimensional Problems with Application to Multi-Omic Data

General Material Designation

[Thesis]

First Statement of Responsibility

Shams Solari, Omid

Subsequent Statement of Responsibility

Bickel, Peter J; Brown, James B

PUBLICATION, DISTRIBUTION, ETC.

Name of Publisher, Distributor, etc.

UC Berkeley

Date of Publication, Distribution, etc.

2019

DISSERTATION (THESIS) NOTE

Body granting the degree

UC Berkeley

Text preceding or following the note

2019

SUMMARY OR ABSTRACT

Text of Note

We discuss the sparse Canonical Correlation Analysis (CCA) problem in the context of high-dimensional multi-view problems, where we aim to discover interpretable association structures among multiple random vectors via their respective views, with an emphasis on settings where the number of observations is small compared to the number of covariates. Throughout this text, we use the term view, defined as the observations of a random vector on an ordered set of subjects; the same set of subjects is used for the observations of every other random vector in the analysis. We denote each view by $X_i \in \mathbb{R}^{n \times p_i}$, $i = 1, \ldots, m$, where $m$ is the number of random vectors, or equivalently the number of views.

In the first two chapters we consider linear association structures shared among multiple views, where the objective is to learn sparse linear combinations of multiple sets of covariates such that they are maximally correlated. In the first chapter we introduce a new approach to sparse CCA. In the first stage we learn the sparsity pattern of the canonical directions by casting this problem as two successively shrinking concave minimization programs, which are solved via a first-order algorithm. In the second stage we solve a small CCA problem restricted to the sparsity patterns estimated in the first stage. We demonstrate via simulations that, in comparison to other available methods, our approach has superior convergence properties, better recovers the underlying sparsity patterns and the magnitudes of the non-zero elements of the canonical directions, and has significantly lower computational cost. We then apply our method to a multi-omic environmental genetics study on fruit flies, where we hypothesise about the mechanism of adaptation of this model organism to environmental pesticides.

In the second chapter we tackle a shortcoming shared by sparse PCA and sparse CCA methods: when multiple components or canonical directions are estimated for each view, these directions are not orthogonal to each other, which diminishes interpretability. While all other approaches estimate canonical directions one by one via the contraction scheme, we offer a block scheme where we estimate the first d canonical directions simultaneously. In this setting, we can more easily impose orthogonality and also encourage disjoint sets of non-zero elements across the directions, resulting in more interpretable models. We also extend our model to what we call sparse Directed CCA, where we use an accessory variable, defined in the text, to try to capture variations related to a certain hypothesis rather than the dominant variations, which may prove irrelevant to the main hypothesis. As a validating example, we apply our method to the lung cancer multi-omics data available on The Cancer Genome Atlas, using survival data as our accessory variable. While regular sparse CCA identified only correlation structures dominated by gender and communities separated by gender, our directed sparse CCA correctly identified two underlying communities which were significantly separated by survival.

In the final chapter, we generalize our framework to discover non-linear association structures by proposing a two-stage sparse kernel CCA algorithm. We learn maximally aligned kernels in the first stage via sparse Multiple Kernel Learning (MKL), and then solve a kernel CCA (KCCA) problem in the second stage using the learned kernels. We perform sparse MKL by forming an alignment matrix whose elements are the sample Hilbert-Schmidt Independence Criterion (HSIC) values of base kernels of pairs of views. These base kernels are functions of small sets of covariates of each view; therefore our sparse MKL approach provides interpretable solutions as sparse convex linear combinations of base kernels.

Finally, we provide an Apache Spark implementation of the methods introduced throughout the dissertation, which enables users to run them on very high-dimensional datasets, e.g. observations on millions of Single Nucleotide Polymorphism loci, using distributed computing. We call this package SparKLe. R versions of our algorithms are also available: MuLe, BLOCCS, and SparKLe-R implement the methods presented in Chapters 1, 2, and 3, respectively.
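The alignment matrix described for the final chapter can be illustrated with a small sketch. This is not the thesis's implementation (which is distributed over Apache Spark); it only shows how sample HSIC values between base kernels of two views could be collected into a matrix, using the standard biased estimator trace(KHLH)/(n-1)^2. The RBF kernel, the bandwidth `gamma`, the choice of one covariate per base kernel, and the names `rbf_kernel`, `hsic`, and `alignment_matrix` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(x, gamma=1.0):
    """Gaussian (RBF) Gram matrix for the rows of x, shape (n, d). Assumed base kernel."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def hsic(K, L):
    """Biased sample HSIC between two n x n Gram matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def alignment_matrix(X, Y, groups_x, groups_y, gamma=1.0):
    """A[i, j] = HSIC between base kernel i of view X and base kernel j of view Y.

    Each group is a list of column indices defining one base kernel
    (hypothetical grouping; the thesis defines its own small covariate sets).
    """
    Kx = [rbf_kernel(X[:, g], gamma) for g in groups_x]
    Ky = [rbf_kernel(Y[:, g], gamma) for g in groups_y]
    A = np.empty((len(Kx), len(Ky)))
    for i, K in enumerate(Kx):
        for j, L in enumerate(Ky):
            A[i, j] = hsic(K, L)
    return A

# Toy data: two views on the same 200 subjects; only X[:, 0] and Y[:, 0] are dependent.
rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 4))
Y = rng.standard_normal((n, 3))
Y[:, 0] = X[:, 0] + 0.1 * rng.standard_normal(n)

groups_x = [[k] for k in range(X.shape[1])]  # one covariate per base kernel
groups_y = [[k] for k in range(Y.shape[1])]
A = alignment_matrix(X, Y, groups_x, groups_y)
best = np.unravel_index(np.argmax(A), A.shape)
print(A.shape, best)
```

In this toy setting the largest entry of `A` points at the pair of base kernels built on the dependent covariates, which is the kind of signal a sparse MKL step could use to select a few base kernels per view. Forming dense n x n Gram matrices, as done here, is only practical for small n.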

PERSONAL NAME - PRIMARY RESPONSIBILITY

Shams Solari, Omid

PERSONAL NAME - SECONDARY RESPONSIBILITY

Bickel, Peter J; Brown, James B

CORPORATE BODY NAME - SECONDARY RESPONSIBILITY

UC Berkeley

ELECTRONIC LOCATION AND ACCESS

Electronic name

[Thesis]

276903
