Academic year: 2022

(1)

EWAS – from raw data to results

Jon Bohlin, Senior scientist, FHI

Dept of infection control epidemiology and modeling

Centre for Fertility and Health

AMR Centre

(2)

Course outline

• Part 1: Introduction to epigenetics and the Illumina HumanMethylation450 (450k) platform

• Part 2: Overview of methods for analysis of data from the Illumina HumanMethylation450 (450k) platform

(3)

Methylation analyses

• Epigenetics has become popular since it provides insight into how the environment may influence gene regulation.

• One epigenetic mechanism is DNA methylation. There are several types of methylation, but here we focus on only one (5-methylcytosine)

• Somewhat simplified, we might say that methylation reduces gene expression

• Genomic methylation patterns change throughout the life course

• The technology is new but evolves quickly

(4)

Cytosine

5-methylcytosine

(5)
(6)

Illumina 450k nomenclature

• One observation (1 sample) = one array (450k methylation sites)

• 12 observations on 1 slide

• 1 plate holds at most 8 slides (96 arrays)

(7)

Methylation platforms

• Illumina and NimbleGen are based on «microarray» technology

• Oligomers (as in GWAS) and their bisulfite-converted counterparts are attached to their respective beads

• Hybridisation with a methylated base = green light

• No hybridisation gives red light

• Intensities vary for both lights

• Intensities are converted to M (methylated) and U (unmethylated), and subsequently to «beta», with 0 <= beta <= 1

• beta is the ratio of the methylated intensity to the total intensity (methylated + unmethylated)
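The conversion from intensities to beta can be sketched as follows. This is a minimal illustration, not the slides' own code: the function name, the example intensities and the stabilising `offset` of 100 (a commonly used constant that keeps beta defined when both intensities are near zero) are assumptions.

```python
def beta_value(meth, unmeth, offset=100.0):
    """Beta = M / (M + U + offset), a value between 0 and 1.

    `offset` is an assumed stabilising constant, not taken from the slides.
    """
    return meth / (meth + unmeth + offset)


# Hypothetical (M, U) intensity pairs for three CpG sites
intensities = [(9000.0, 1000.0), (500.0, 9500.0), (4000.0, 4000.0)]
betas = [beta_value(m, u) for m, u in intensities]

for (m, u), b in zip(intensities, betas):
    print(f"M={m:.0f} U={u:.0f} beta={b:.3f}")
```

A mostly methylated site gives a beta close to 1, a mostly unmethylated site a beta close to 0.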

(8)

Type I and II probes

• Type I probes: stronger signal due to distinct probes for methylated/unmethylated sites. They assume the same methylation status for adjacent CpG sites, which suits regions of high CpG density. One color per probe.

• Type II probes: weaker signal and only one probe for both methylated and unmethylated states. The probe can contain up to 3 underlying CpGs, which suits regions of lower CpG density. Two colors.

(9)

Illumina beadchip dataset

(10)

Workflow – from start to finish

Quality control
- Removal of bad samples
- Removal of bad probes
- Removal of SNP-based probes
- Removal of inserted control probes
- Removal of samples with gender issues

Normalization
- Correct for technical bias (e.g. between plates/batches)
- Correct for technology-specific features (type I/type II bias)

Analysis
- Identify DMRs
- Correct for known biases in data
- Correct for cell type (cord blood)

(11)

Quality control (QC) – probe selection

• Remove probes with many missing values (e.g. >10%)

• Remove observations with bad probes (detection p-value > 0.01)

• Remove «gender outliers» and duplicates

• Probes close to SNPs are sometimes removed (a SNP less than 10 bp away can influence the measured methylation status)

• Probes on the X/Y chromosomes are also often removed to avoid bias
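The first two filters above can be sketched as follows. This is one possible reading, not the slides' own procedure: a measurement counts as missing when its detection p-value exceeds 0.01, and probes or samples with more than 10% missing measurements are dropped. Applying the 10% limit to samples as well, the function name and the toy data are assumptions.

```python
def qc_filter(detection_p, p_cutoff=0.01, max_missing=0.10):
    """Return indices of probes and samples that pass QC.

    detection_p[i][j] is the detection p-value of probe i in sample j.
    """
    n_probes, n_samples = len(detection_p), len(detection_p[0])
    # A measurement fails when its detection p-value is above the cutoff
    failed = [[p > p_cutoff for p in row] for row in detection_p]
    probe_missing = [sum(row) / n_samples for row in failed]
    sample_missing = [
        sum(failed[i][j] for i in range(n_probes)) / n_probes
        for j in range(n_samples)
    ]
    keep_probes = [i for i, f in enumerate(probe_missing) if f <= max_missing]
    keep_samples = [j for j, f in enumerate(sample_missing) if f <= max_missing]
    return keep_probes, keep_samples


# Toy data: 50 probes x 20 samples, all well detected to begin with
pvals = [[0.001] * 20 for _ in range(50)]
for j in range(5):       # probe 2 fails in 5 of 20 samples
    pvals[2][j] = 0.5
for i in range(10):      # sample 7 fails for 10 of 50 probes
    pvals[i][7] = 0.5

keep_probes, keep_samples = qc_filter(pvals)
print(len(keep_probes), "probes and", len(keep_samples), "samples kept")
```

In a real analysis the same masks would be applied to the beta matrix, and the thresholds would be chosen per dataset.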

(12)

MDS plot to evaluate gender outliers

8 outliers

(13)
(14)

«Hidden» QC issues

• Methylated CpGs close to SNPs

• Non-uniquely mapped probes

• Probe design types (type 1, type 2)

• Bias introduced from different plates/batches

(15)

Normalization

• Examine possible problems relating to «plates» (between- and within-array corrections)

• Batch effects when combining multiple analyses (consortium/meta-analyses)

• Correct for the difference between type I and type II probes
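As an illustration of what a between-array correction does, the sketch below applies plain quantile normalization, which forces every array to share the same value distribution. It is a generic technique chosen for illustration, not necessarily the method used for the plots in these slides; the function name and the toy values are assumptions, and ties are broken arbitrarily.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of arrays (one list of values per sample).

    All samples must cover the same probes in the same order.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by reference quantile
        normalized.append(out)
    return normalized


# Toy intensities: three arrays measured on the same four probes
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(arrays)
for row in normalized:
    print([round(v, 3) for v in row])
```

After normalization each array contains the same set of values, while the ranking of probes within an array is unchanged.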

(16)

Without QC/normalization, all chromosomes (left), and with QC/normalization (right), for two different datasets

(17)
(18)

Betas by Plate

Plate #   N run   N passed QC   % passed
1         96      92            96%
2         96      69            72%
3         96      80            83%
4         96      87            91%
5         96      67            70%
6         96      83            86%
7         96      90            94%
8         96      88            92%
9         96      69            72%
Total     864     725

(19)

Betas by Batch

Image day   Batch   N run   N passed QC   % passed
1/14/2013   1       96      92            96%
1/18/2013   2       60      42            70%
1/19/2013   3       84      72            86%
1/20/2013   4       36      25            69%
1/21/2013   5       36      30            83%
1/22/2013   6       12      10            83%
1/25/2013   7       60      59            98%
1/26/2013   8       132     112           85%
1/27/2013   9       144     126           88%
1/28/2013   10      96      88            92%
2/4/2013    NA      12      0             0%
2/23/2013   11      48      38            79%
2/24/2013   12      48      31            65%

(20)

Dataset (batch) correction

• Necessary when combining 2 or more datasets

• Two plots, colored by dataset (batch)

• PCA of dataset 1 and dataset 2 before «ComBat», all chromosomes

(21)
(22)
(23)

Preprocessing and QC

• QC can be done «manually», working directly on the dataset

• Gender correction can be performed using PCA (or MDS) and plotting (standard functions in R)

• PCA (or MDS) can also be used to assess the need for within- and between-array normalization, not forgetting batch correction
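The idea of using PCA to spot gender outliers can be sketched on simulated data. Everything below is a toy under stated assumptions: the first principal component is found by power iteration using only the standard library (in practice one would use R's standard functions or a numerical library), the simulated «sex-linked» probes and the deliberately mislabeled sample are invented, and all function and variable names are hypothetical.

```python
import math
import random


def first_pc_scores(data, n_iter=100, seed=0):
    """Scores of each sample (row) on the first principal component."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Gram matrix of the centered data (samples x samples)
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered]
            for ri in centered]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):  # power iteration for the leading eigenvector
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(vi * sum(g * x for g, x in zip(row, v))
                     for vi, row in zip(v, gram))
    return [math.sqrt(eigenvalue) * x for x in v]


# Simulated betas: 40 samples x 200 probes; the first 50 probes differ by sex
rng = random.Random(1)
data = [[rng.gauss(0.5, 0.05) for _ in range(200)] for _ in range(40)]
for i in range(20, 40):
    for j in range(50):
        data[i][j] -= 0.3

recorded_sex = [0] * 20 + [1] * 20
recorded_sex[3] = 1  # sample 3 is deliberately mislabeled

scores = first_pc_scores(data)
cluster = [1 if s > 0 else 0 for s in scores]
# The sign of a principal component is arbitrary: align it with the labels
agreement = sum(c == r for c, r in zip(cluster, recorded_sex)) / len(cluster)
if agreement < 0.5:
    cluster = [1 - c for c in cluster]
outliers = [i for i, (c, r) in enumerate(zip(cluster, recorded_sex)) if c != r]
print("Samples whose cluster disagrees with recorded sex:", outliers)
```

On real data the scores would be plotted and inspected rather than thresholded automatically, since the first component need not separate the sexes.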

(24)

Normalization and Preprocessing

• Removal of systematic biases is difficult in EWAS. There is also a danger of introducing new ones

• Probe bi-modality can be challenging to work with in a statistical setting

• Normalization procedures tend to depend on the dataset. Sometimes normalization is not needed at all (this applies only to within- and between-array normalization).

• Type-correction normalization must always be performed; BMIQ and RCP seem to be the preferred methods now. Be careful with SWAN…

(25)

Normalization and Preprocessing

• Normalization, i.e. correction for technical bias (both within and between arrays), can be performed using the Bioconductor packages minfi, wateRmelon and methylumi

• ComBat is a good method for between-batch correction. Both parametric and non-parametric empirical Bayes adjustments are available. Beware of introduced bias, however

• RnBeads is also strongly recommended
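To make the idea of between-batch correction concrete, the sketch below removes per-batch mean shifts, probe by probe. This is deliberately much simpler than ComBat, which also adjusts the scale and shrinks the batch estimates with empirical Bayes; the function name and the toy values are assumptions. It also shows where bias can creep in: if batch is confounded with the variable of interest, the centering removes real signal too.

```python
def remove_batch_means(values, batch):
    """Shift each batch so its per-probe mean equals the overall probe mean.

    values[i][j] is the measurement of probe i in sample j;
    batch[j] is the batch label of sample j.
    """
    adjusted = []
    for row in values:
        grand_mean = sum(row) / len(row)
        batch_mean = {}
        for b in set(batch):
            members = [x for x, lab in zip(row, batch) if lab == b]
            batch_mean[b] = sum(members) / len(members)
        adjusted.append(
            [x - batch_mean[lab] + grand_mean for x, lab in zip(row, batch)]
        )
    return adjusted


# Toy example: one probe, two batches, batch B shifted upwards
values = [[1.0, 2.0, 3.0, 6.0, 7.0, 8.0]]
batch = ["A", "A", "A", "B", "B", "B"]
adjusted = remove_batch_means(values, batch)
print(adjusted[0])
```

After the adjustment both batches have the same mean for every probe, while differences between samples within a batch are preserved.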

(26)

Papers that will get you going with pre-processing and QC

Comprehensive analysis of DNA methylation data with RnBeads: Assenov Y, Müller F, Lutsik P, Walter J, Lengauer T, Bock C.

Preprocessing, normalization and integration of the Illumina HumanMethylationEPIC array with minfi: Fortin JP, Triche TJ Jr, Hansen KD.

Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays: Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, Irizarry RA.

A data-driven approach to preprocessing Illumina 450K methylation array data: Pidsley R, Y Wong CC, Volta M, Lunnon K, Mill J, Schalkwyk LC. (wateRmelon package)

A framework for analyzing DNA methylation data from Illumina Infinium HumanMethylation450 BeadChip: Wang Z, Wu X, Wang Y.

A systematic assessment of normalization approaches for the Infinium 450K methylation platform: Wu MC, Joubert BR, Kuan PF, Håberg SE, Nystad W, Peddada SD, London SJ.

quantro: a data-driven approach to guide the choice of an appropriate normalization method: Hicks SC, Irizarry RA.

(27)

Outline

• Preprocessing (QC, normalisation)

• Transformation (beta-value versus M-value)

• Analysis:
- Site-by-site (1-1) regression (GLM, etc.)
- Shrinkage methods (LASSO/ridge and variants)
- Dantzig selector
- More detailed «time series» analyses of each individual(?)
- Region-based analysis of candidate genes and identified regions (DMR): bump hunting
- Pathways/GO
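The transformation step and the site-by-site regression in the outline can be sketched together. The M-value is the logit of beta in base 2, M = log2(beta / (1 - beta)), and each CpG is then regressed on the exposure separately. This is a minimal sketch with ordinary least squares and one binary exposure; the function names, the clipping constant `eps` and the toy beta values are assumptions, and a real analysis would add covariates and multiple-testing correction.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """M-value: log2(beta / (1 - beta)), with beta clipped away from 0 and 1."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def simple_regression(x, y):
    """OLS fit of y = a + b*x; returns the slope b and its t-statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se


# Hypothetical exposure (0 = unexposed, 1 = exposed) and betas for two CpGs
exposure = [0, 0, 0, 0, 1, 1, 1, 1]
betas = {
    "cpg_a": [0.20, 0.22, 0.19, 0.21, 0.40, 0.42, 0.38, 0.41],
    "cpg_b": [0.50, 0.52, 0.49, 0.51, 0.51, 0.49, 0.50, 0.52],
}

results = {}
for name, values in betas.items():
    m_values = [beta_to_m(b) for b in values]
    results[name] = simple_regression(exposure, m_values)

for name, (slope, t_stat) in results.items():
    print(f"{name}: slope={slope:.3f}, t={t_stat:.2f}")
```

With n samples the t-statistic has n - 2 degrees of freedom; with hundreds of thousands of CpGs the resulting p-values need correction for multiple testing.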
