• No results found

Statistical Data Integration

N/A
N/A
Protected

Academic year: 2021

Share "Statistical Data Integration"

Copied!
24
0
0

Loading.... (view fulltext now)

Full text

(1)

Statistical Data Integration

Using Networks to Integrate a Variety of Big

Biomedical Data

Genevera I. Allen

Dobelman Family Junior Chair,

Department of Statistics and Electrical and Computer Engineering, Rice University, Department of Pediatrics-Neurology, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital.

(2)
(3)
(4)
(5)

Statistical Data Integration

Statistical Data Integration

Uses probabilistic models to jointly model multiple types of measurements taken on the same set of subjects.

(6)

Statistical Data Integration

Statistical Data Integration

Uses probabilistic models to jointly model multiple types of measurements taken on the same set of subjects.

Advantages:

Joint inference on patients.

I Harness power across multiple data sources.

Discover relationships between different types of biomedical info.

(7)

Statistical Data Integration: Markov Networks

Why Networks?

Visualize Big Data.

Discover relationships between biomarkers.

Model complex systems. Markov Networks:

Undirected graphical models based on conditional dependencies.

Probabilistic model - proper multivariate distribution.

(8)

Networks for Different Data Types

Existing (Markov) Network Types:

1 Gaussian Graphical Models (Continuous-Valued).

Glioblastoma gene expression network (microarray).

2 Ising Models (Binary-Valued). 3 Gaussian-Ising Models.

(9)

Networks for Different Data Types

Existing (Markov) Network Types:

1 Gaussian Graphical Models (Continuous-Valued). 2 Ising Models (Binary-Valued).

Modules from lung cancer somatic mutation network.

(10)

Networks for Different Data Types

Existing (Markov) Network Types:

1 Gaussian Graphical Models (Continuous-Valued). 2 Ising Models (Binary-Valued).

(11)

Networks for Different Data Types

Existing (Markov) Network Types:

1 Gaussian Graphical Models (Continuous-Valued). 2 Ising Models (Binary-Valued).

3 Gaussian-Ising Models.

What about count-valued data? Others?

(12)

Networks for Different Data Types

Our Framework: Graphical Models via Exponential Families.

Assumption: Conditional distributions are Exponential Families.

I Ex: Gaussian, Bernoulli, Poisson, Exponential, Negative Binomial, etc.

A lot of math . . .

Theorem: Joint network distribution exists and has a closed form!

I Dependencies parameterized by products of sufficient statistics.

I Strong statistical guarantees for network inference.

(13)

Networks for Different Data Types

(14)

Networks for Different Data Types

(15)
(16)

Integrated Network Models

Block-Directed Graphical Models via Exponential Families.

Assumptions:

I Conditional distributions are Exponential Families.

I Variables belong to known groups and the directionality of dependencies between groups is known.

A lot of math . . .

Theorem: Joint integrated network distribution exists and has a closed form!

I Dependencies parameterized by products of sufficient statistics from different distributions.

I Strong statistical guarantees for network inference.

(17)

Integrated Network Models

(18)

Integrated Network Models

(19)

Applications & Implications

Implication

(20)

Applications & Implications

(21)

Applications & Implications

Breast Cancer Integrated Mutation-Gene Expression Network.

Blue nodes: RNA-sequencing Yellow nodes: genomic mutations

(22)
(23)
(24)

Acknowledgments

Allen Group: Manjari Narayan Frederick Campbell Yue Hu John Nagorski Yulia Baker Qijia Jiang Connor Barnhill Collaborators:

Pradeep Ravikumar, UT Austin Eunho Yang, IBM Watson

Zhandong Liu, BCM Matthew Anderson, BCM Ying-Wooi Wan, BCM

References

Related documents

furnishing transportation to the deceased as either a custom or as a privilege incidental to his employment. There was no material variance in the testimony of any of the

3 The data set covers 23,000 institutions in 45 countries across sub-Saharan Africa including commercial, savings and postal banks, regulated MFIs, mostly unregulated

Accordingly, you will submit your paper three times (see dates below). The final draft may be turned in no later than the beginning of the final exam, though you may submit it

“somewhat agree”, “agree” or “strongly agree” as a multiple night guest in a hotel to reuse their towel; 86.31% “somewhat agree”, “agree” or “strongly agree” as

So all that is needed for an extra slot (en expanded lot 31) is the physical connector with all relevant signals as on cartridge slot 1 and 2 with this slot 31 select signal..

Results indicate that without significant cost reductions or major changes to either market conditions or federal/state incentive schemes, Texas Gulf Coast offshore wind

The tax rate on gross incomes of SFr 175,000 is significantly different from zero at least at the 5 percent level in nearly all income groups except the fifth group. It has a

The H7-case of the XPC Barebone SN78SH7 combines the best features of many successful Shuttle Mini-PCs - including silent heatpipe cooling, enlarged power supply cooling fan