index_01crop.gif index_02crop.gif
index_456.jpg
spacer.gif


Using Our Discipline to Enhance Human Welfare
August 7 - 11, 2005

Lab 4 - Differential Expression and Linear Modeling using limma

Ben Bolstad.   August 6-7, 2005.

1. Software required for this lab.

You will need R 2.1.0 or R 2.1.1 (http://www.R-Project.org) for this lab. This section lists all of the R packages you need to have installed and also lists some additional R packages which are recommended. Most of the R packages can be installed from the Bioconductor site automatically using:
source("http://www.bioconductor.org/getBioC.R")
getBioC()
However, Bioconductor releases only occur once every six months, whereas the authors of these labs would typically use a more up-to-date version of some packages e.g. limma, which is why it is advisable to check the R package version numbers below and install them from the links provided if your installation is not up-to-date.

1.1 Required R packages

It is highly desirable to have these R packages in a directory in which you have write permission, especially for the estrogen package. You can use .libPaths("C:/Custom/R/library/directory") or .libPaths("C:\\Custom\\R\\library\\directory") before you run install.packages() (or equivalent) to install the package(s) in a customized directory location.

Package WindowsMacOS XSource
affy_1.6.7 affy_1.6.7.zip affy_1.6.7.tgz affy_1.6.7.tar.gz
Biobase_1.5.12 Biobase_1.5.12.zip Biobase_1.5.12.tgz Biobase_1.5.12.tar.gz
estrogen_1.5.0 estrogen_1.5.0.zip estrogen_1.5.0.tar.gz estrogen_1.5.0.tar.gz
hgu95av2_1.8.4 hgu95av2_1.8.4.zip hgu95av2_1.8.4.tar.gz hgu95av2_1.8.4.tar.gz
hgu95av2cdf_1.5.1 hgu95av2cdf_1.5.1.zip hgu95av2cdf_1.5.1.tar.gz hgu95av2cdf_1.5.1.tar.gz
limma_2.0.2 limma_2.0.2.zip limma_2.0.2.tar.gz limma_2.0.2.tar.gz
statmod_1.1.1 or statmod_1.2.0 statmod_1.2.0.zip statmod_1.2.0.tar.gz statmod_1.2.0.tar.gz
xtable_1.2-5 xtable_1.2-5.zip xtable_1.2-5.tgz xtable_1.2-5.tar.gz

1.2 Recommended R packages

Package WindowsMacOS XSource
affylmGUI_1.3.1 affylmGUI_1.3.1.zip affylmGUI_1.3.1.tar.gz affylmGUI_1.3.1.tar.gz
limmaGUI_1.3.9 limmaGUI_1.3.9.zip limmaGUI_1.3.9.tar.gz limmaGUI_1.3.9.tar.gz
R2HTML_1.54 R2HTML_1.54.zip R2HTML_1.54.tar.gz R2HTML_1.54.tar.gz
sma_0.5.14 sma_0.5.14.zip sma_0.5.14.tgz sma_0.5.14.tar.gz
tkrplot_0.0-12 tkrplot_0.0-12.zip tkrplot_0.0-12.tar.gz tkrplot_0.0-12.tar.gz

1.3 Required data

The Estrogen data set is required, but this is available as an R package, so it is listed in Section 1.1. An additional data set, ApoAI is required for this lab. It can be obtained from:
http://bioinf.wehi.edu.au/marray/jsm2005/apoai.zip.

2. Introduction.

This lab introduces linear modelling as a tool for identifying differentially express genes in the conxtext both a cDNA microarray experiment and an Affymetrix GeneChip microarray experiment. The main R library we will use for this purpose is called limma an abbreviation for "Linear Models for Microarray Analysis". Labs 1 and 2 introduced pre-processing for two color arrays. Lab 3 discussed pre-processing for Affymetrix GeneChips. In a typical microarray analysis project these procedures would be applied before beginning the search for differentially expressed genes. For this lab we will do a the minimal amount of pre-processing for brevity.

3. ApoAI data set

In this section we consider a case study where two RNA sources are compared through a common reference RNA. The analysis of the log-ratios involves a two-sample comparison of means for each gene.

In this example we assume that the data is available as an RG list in the data file ApoAI.RData.

Background. The data is from a study of lipid metabolism by Callow et al (2000) ([1]). The apolipoprotein AI (ApoAI) gene is known to play a pivotal role in high density lipoprotein (HDL) metabolism. Mice which have the ApoAI gene knocked out have very low HDL cholesterol levels. The purpose of this experiment is to determine how ApoAI deficiency affects the action of other genes in the liver, with the idea that this will help determine the molecular pathways through which ApoAI operates.

Hybridizations. The experiment compared 8 ApoAI knockout mice with 8 wild type (normal) C57BL/6 ("black six") mice, the control mice. For each of these 16 mice, target mRNA was obtained from liver tissue and labelled using a Cy5 dye. The RNA from each mouse was hybridized to a separate microarray. Common reference RNA was labelled with Cy3 dye and used for all the arrays. The reference RNA was obtained by pooling RNA extracted from the 8 control mice.

Number of arraysRed (Cy5)Green (Cy3)
8Wild Type "black six" mice (WT)Pooled Reference (Ref)
8ApoAI Knockout (KO)Pooled Reference (Ref)

The experimental design could also be described in a diagram, such as:

ApoAIdesign.png

This experiment is slightly old. Today it would be quite unusual to use this many replicate arrays without including any dye-swaps. In two color arrays, dye bias is still a significant problem. This is the main reason why normalization is always necessary.

This is an example of a single comparison experiment using a common reference. The fact that the comparison is made by way of a common reference rather than directly as for the swirl experiment makes this, for each gene, a two-sample rather than a single-sample setup.

3.1 Loading and normalizing the data

 library(limma)
 load("ApoAI.RData") # from: http://bioinf.wehi.edu.au/marray/jsm2005/apoai.zip
 objects()
 names(RG)

 MA <- normalizeWithinArrays(RG) # uses print-tip loess by default

3.2 Defining a design matrix

In order to construct a design matrix, let us remind ourselves of the linear model which we are fitting for each gene:

LinearModel.png

where y.png is the vector of normalized log ratios from the sixteen arrays, Ey.png is the Expected Value of y.png, X.png is the design matrix and alpha.png is the vector of log ratios to estimate, corresponding to the "M" (fold change) column in the final list of differentially expressed genes given by topTable). The estimated log ratios are also known as "coefficients", "parameters" and "log fold changes".

This experiment has three types of RNA: Reference (Ref), Wild Type (WT), and Knockout (KO), so it is sufficient to estimate two log ratios in the linear model for each gene, i.e. we will estimate two parameters, so our design matrix should have two columns. In our case, the two parameters in the alpha.png vector are the log ratios which compare gene expression levels in WT vs Ref and KO vs WT. (There are other possible parameterizations which could have been chosen instead. We are using one which allows us to estimate the contrast of interest (KO vs WT) directly from the linear model fit, rather than estimating it later as a contrast (i.e. a linear combination of parameters estimated from the linear model).

The design matrix we will use is:

ApoAIdesignMatrix.png

where the first column is for the "WT vs Ref" parameter and the second column is for the "KO vs WT" parameter. The first 8 arrays hybridize WT RNA with Ref RNA so it makes sense that they each have a '1' in the WT vs Ref column. The last 8 arrays hybridize KO RNA with Ref RNA which corresponds to the sum of the two parameters, "WT vs Ref" and "KO vs WT" which is clear if you replace "vs" with a minus sign (remembering that everything has been log2 transformed so that subtraction here actually represents a log ratio).

This design matrix can be defined in R as follows:

 design <- matrix(c(rep(1,16),rep(0,8),rep(1,8)),ncol=2)
 colnames(design) <- c("WT-Ref","KO-WT")
 design

3.3 Fitting a linear model

 fit <- lmFit(MA,design=design)
 names(fit)

3.4 Empirical Bayes statistics

 fit <- eBayes(fit)
 names(fit)

3.5 Displaying table(s) of differentially expressed genes

We now use the function topTable to obtain a list the genes with the most evidence of differential expression between the Knockout and Wild-Type RNA samples. The knockout gene (ApoAI) should theoretically have a log fold change of minus infinity, but microarrays cannot measure extremely large fold changes. While the M value of the ApoAI gene in the topTable may not have much biological meaning, the high ranking shows that this gene is consistently down-regulated across the replicate arrays.

 topTable(fit,coef="KO-WT")

 numGenes <- nrow(RG$genes)
 completeTableKOvsWT <- topTable(fit,coef="KO-WT",number=numGenes)
 write.table(completeTableKOvsWT,file="KOvsWTgenes.xls",sep="\t",quote=FALSE,col.names=NA)

The ".xls" extension is used so that the (tab-delimited text) file can be opened in Excel by double-clicking its icon.

The arguments of topTable can be studied in more detail with ?topTable or args(topTable) or just topTable. Currently in limma, topTable is a high-level wrapper function which calls a lower level function "toptable". The default method for ranking genes is the B statistic (log odds of differential expression, Lonnstedt and Speed 2003 [2]), but the moderated t statistic and p-value can also be used. Using the average fold-change (the M column) is not recommended because this ignores the variability between replicate arrays.

3.6 An M A plot using coefficients fitted from the linear model

Using an M A plot, we can see which genes are selected as being differentially expressed by the B statistic (log odds of differential expression), which is the default ranking statistic for the topTable. Of course, the differentially expressed genes selected by the B statistic may not have the most extreme fold changes (M values), because some of the genes with extreme average fold changes may vary significantly between replicate arrays so they will be down-weighted by the empricial Bayes statistics.

 M <- fit$coefficients[,"KO-WT"]
 A <- fit$Amean
 ord <- order(fit$lods[,"KO-WT"],decreasing=TRUE)
 top10 <- ord[1:10]
 plot(A,M,pch=16,cex=0.1)
 text(A[top10],M[top10],labels=substring(fit$genes[top10,"NAME"],1,5),cex=0.8,col="blue")

4. Estrogen data set

For this section of the lab, we will use data from a set of eight Affymetrix chips from a 2x2 factorial-design experiment designed to measure changes in gene expression in a breast-cancer cell line due to the presence (or absence) of estrogen and due to a time effect (10 hours or 48 hours). The experiment was performed by Scholtens et. al [6] who have kindly provided their data for public access on the Bioconductor website: http://www.bioconductor.org/data/experimental.html. The Estrogen data is stored in the "data" subdirectory of an R package called "estrogen" which is listed on this website. It can be installed like any other R package, then the data can be found from within R in the directory:
system.file("data",package="estrogen").

This data set is described in detail in the "Estrogen 2x2 Factorial Design" vignette by Denise Scholtens and Robert Gentleman [7].

The following description of the experiment is taken from that vignette.

Experimental Data

"The investigators in this experiment were interested in the effect of estrogen on the genes in ER+ breast cancer cells over time. After serum starvation of all eight samples, they exposed four samples to estrogen, and then measured mRNA transcript abundance after 10 hours for two samples and 48 hours for the other two. They left the remaining four samples untreated, and measured mRNA transcript abundance at 10 hours for two samples, and 48 hours for the other two. Since there are two factors in this experiment (estrogen and time), each at two levels (present or absent,10 hours or 48 hours), this experiment is said to have a 2x2 factorial design."

The table below describes the experimental conditions for each of the eight arrays. When stored as tab-delimited text, this is known as the RNA Targets file.

NameFileNameTarget
Abs10.1low10-1.celEstAbsent10
Abs10.2low10-2.celEstAbsent10
Pres10.1high10-1.celEstPresent10
Pres10.2high10-2.celEstPresent10
Abs48.1low48-1.celEstAbsent48
Abs48.2low48-2.celEstAbsent48
Pres48.1high48-1.celEstPresent48
Pres48.2high48-2.celEstPresent48

The file (shown as a table above) is known as the "RNA Targets" file in limma or in affylmGUI. It should be stored in tab-delimited text format and it can be created either in a spreadsheet program (such as Excel) or in a text editor. The column headings must appear exactly as above. Each chip should be given a unique name or number (in the Name column). The Affymetrix CEL file name should be listed for each chip in in the FileName column. The Target column tells limma or affylmGUI which chips are replicates. By using the same "Target" name for the first two rows in the table, we are telling limma or affylmGUI that these two CEL files represent replicatechips for the experimental condition (Estrogen Absent, Time 10 hours). Note that we could use a different Targets file for this analysis in which the time effect was ignored if we only wanted to compare Estrogen Absent and Estrogen Present. This would simply require removing the 10's and 48's from the Target column.

There is actually a file called "phenoData.txt" supplied within the estrogen package, which could be used as a Targets file, but in this exercise you should use the format above or download "EstrogenTargets.txt" from http://bioinf.wehi.edu.au/marray/jsm2005/lab4/EstrogenTargets.txt. This way is simpler because we describe the type of RNA target in just one column (Target), rather than in two columns (estrogen and time.h).

4.1 Loading the data and producing an exprSet object

Lab 3 discussed how to pre-process and produce gene expression measures for Affymetrix GeneChip datasets. The first stage for the analysis in this lab is to produce RMA expression values for the data stored in the estrogen package. This package contains raw data in CEL file format. Before starting you should check that the file EstrogenTargets.txt is in your current directory. You can check this using dir(). To load the libraries we will need to use:

 library(limma)
 library(affy)
 library(hgu95av2cdf)

The next stage is to read this data and compute RMA expression measures. Accomplish this using:

 datadir <- system.file("extdata",package="estrogen")
 dir(datadir)

 # Please ensure that either "EstrogenTargets.txt" from 
 # http://bioinf.wehi.edu.au/marray/jsm2005/Data/EstrogenTargets.txt 
 # is in the current working directory as given by getwd(), or that you 
 # provide the read.phenoData function with a full path to 
 # "EstrogenTargets.txt", e.g. "C:/JSM/EstrogenTargets.txt" 
 # or "C:\\JSM\\EstrogenTargets.txt".

 pd <- read.phenoData("EstrogenTargets.txt",header=TRUE,row.names=1,as.is=TRUE)

 abatch <- ReadAffy(filenames=pData(pd)$FileName, celfile.path=datadir)
 abatch@phenoData <- pd
 eset <- rma(abatch)

As discussed in Lab 3 it is usual, and appropriate, to check dataset quality before continuing your analysis. However, due to brevity we will skip over this in this lab. A full set of quality assessment plots can be found at http://www.stat.berkeley.edu/~bolstad/PLMImageGallery/ under the title "estrogen". These plots show no significant quality problems with any arrays in this dataset.

4.2 Defining a design matrix

Defining a design matrix is quite simple for Affymetrix data because each microarray chip has only a single channel, as distinct from the two-color cDNA microarrays slides. Recall that we used an RNA Targets file (stored in tab-delimited text) to describe the Estrogen experiment:

NameFileNameTarget
Abs10.1low10-1.celEstAbsent10
Abs10.2low10-2.celEstAbsent10
Pres10.1high10-1.celEstPresent10
Pres10.2high10-2.celEstPresent10
Abs48.1low48-1.celEstAbsent48
Abs48.2low48-2.celEstAbsent48
Pres48.1high48-1.celEstPresent48
Pres48.2high48-2.celEstPresent48

When we created an object of class exprSet with rma, the phenotype data (i.e. the RNA Targets table) was automatically included in our eset object (of class exprSet), and it can be recovered using either the pData method of the exprSet class.

 targets <- pData(eset)

We have four pairs of replicate arrays (each of which is a single-channel array), so we should estimate four parameters in the linear model. The design matrix should have four columns and the rows should describe the four pairs of replicate arrays.

estrogenDesignMatrix.png

where the four columns of the matrix correspond to EstAbsent10, EstPresent10, EstAbsent48 and EstPresent48.

This matrix can be defined in R as follows:

 design <- model.matrix(~ -1 + factor(targets$Target,levels=unique(targets$Target)))
 colnames(design) <- unique(targets$Target)
 numParameters <- ncol(design)
 parameterNames <- colnames(design)
 design

4.3 Fitting a linear model

Now that we have defined our design matrix, fitting a linear model is as simple as:

 fit <- lmFit(eset,design=design)

fit is an object of class MArrayLM. Its components can be accessed with the $ operator just a standard list in R.

 names(fit)

4.4 Defining a contrasts matrix

Contrasts are linear combinations of parameters from the linear model fit.

Contrasts.png

where beta.png is a vector of contrasts for gene g.png, C.png is the contrasts matrix, and alpha.png is a vector of coefficients (estimated log fold changes), obtained from a linear model fit.

We will estimate three contrasts (so our contrasts matrix will have three columns). The first contrast is an estrogen effect at time 10 hours, the second as an estrogen effect at time 48 hours and the third is a time effect in the absence of estrogen.

estrogenContrastsMatrix.png
 contrastNames <- c(paste(parameterNames[2],parameterNames[1],sep="-"),
                    paste(parameterNames[4],parameterNames[3],sep="-"),
                    paste(parameterNames[3],parameterNames[1],sep="-"))
 contrastsMatrix <- matrix(c(-1,1,0,0,0,0,-1,1,-1,0,1,0),nrow=ncol(design))
 rownames(contrastsMatrix) <- parameterNames
 colnames(contrastsMatrix) <- contrastNames
 contrastsMatrix

4.5 Fitting a linear model for the contrasts

 fit2  <- contrasts.fit(fit,contrasts=contrastsMatrix)
 names(fit2)

4.6 Empirical Bayes statistics

 fit2  <- eBayes(fit2)
 names(fit2)

4.7 Displaying table(s) of differentially expressed genes

We now use the function topTable to obtain a list genes differentially expressed between Estrogen-Present and Estrogen-Absent at time 10 hours, followed by a list of genes differentially expressed between Estrogen-Present and Estrogen-Absent at time 48 hours.

 topTable(fit2,coef="EstPresent10-EstAbsent10")
 topTable(fit2,coef="EstPresent48-EstAbsent48")

 numGenes <- nrow(eset@exprs)

 completeTableEst10 <- topTable(fit2,coef="EstPresent10-EstAbsent10",number=numGenes)
 write.table(completeTableEst10,file="Est10genes.xls",sep="\t",quote=FALSE,col.names=NA)

 completeTableEst48 <- topTable(fit2,coef="EstPresent48-EstAbsent48",number=numGenes)
 write.table(completeTableEst48,file="Est48genes.xls",sep="\t",quote=FALSE,col.names=NA)

4.8 Linking the gene list to annotation information on the Internet

If the genes int the topTable have standard IDs (e.g. UniGene or GenBank), then they can be linked with external annotation information on the Internet. Load the annotation package hgu95av2, which can be obtained from http://www.bioconductor.org/data/metaData.html.

 library(hgu95av2cdf)
 library(hgu95av2)

Now we obtain:

 geneIDs <- ls(hgu95av2cdf)

It is possible to extract the annotation information from the appropriate R environments within the annotation package (hgu95av2), and store it in simple R data structures as follows:

 # geneSymbols <- unlist(as.list(hgu95av2SYMBOL))
 # geneNames <- unlist(as.list(hgu95av2GENENAME))
 # unigene <- unlist(as.list(hgu95av2UNIGENE))

However we must be very careful, because in recent versions of Bioconductor annotation packages, some gene IDs can map to multiple gene names, multiple symbols and/or multiple unigene clusters. The method above, while appearing simple, also does not allow for the possibility that the gene names are stored in a different order in the annotation package from the gene IDs. We will therefore use some rather complicated-looking R code to extract the gene symbols, names and unigene IDs from the hgu95av2 environment. Multiple entries will be collapsed and separated by semicolons, e.g. if geneID 001 corresponds to gene names "Name1" and "Name2", these will be collapsed into "Name1; Name2".

 geneSymbols <- as.character(unlist(lapply(mget(geneIDs,env=hgu95av2SYMBOL),
     function (symbol) { return(paste(symbol,collapse="; ")) } )))
 geneNames <- as.character(unlist(lapply(mget(geneIDs,env=hgu95av2GENENAME),
     function (name) { return(paste(name,collapse="; ")) } )))
 unigene <- as.character(unlist(lapply(mget(geneIDs,env=hgu95av2UNIGENE),
     function (unigeneID) { return(paste(unigeneID,collapse="; ")) } )))

The key functions to note in the code above are mget which extracts multiple annotation strings from the appropriate annotation environmentand lapply which applies our multiple-entry-collapsing function to every element in a list of annotation strings.

Now we abbreviate the gene names to a maximum length of 40 characters, for neat formatting in a table, and we extract the unigene ID from the string which contains "Hs" (for Homo sapiens) as well as the unigene ID.

 geneNames <- substring(geneNames,1,40)
 unigene <- gsub("Hs\\.","",unigene)

 genelist <- data.frame(GeneID=geneIDs,GeneSymbol=geneSymbols,GeneName=geneNames,
   UniGeneHsID=paste("<a href=http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=",
   unigene,">",unigene,"</a>",sep=""))

Now we recreate the toptable for the two contrasts considered earlier, "EstPresent10-EstAbsent10" and "EstPresent48-EstAbsent48" this time providing a hyperlink to the UniGene website for each gene in the toptable.

 unigeneTopTableEst10 <- topTable(coef="EstPresent10-EstAbsent10",n=20,fit=fit2,genelist=genelist)
 unigeneTopTableEst48 <- topTable(coef="EstPresent48-EstAbsent48",n=20,fit=fit2,genelist=genelist)
 library(xtable)
 xtableUnigeneEst10 <- xtable(unigeneTopTableEst10,display=c("d","s","s","s","s","g","g","g","e","g"))
 xtableUnigeneEst48 <- xtable(unigeneTopTableEst48,display=c("d","s","s","s","s","g","g","g","e","g"))

 cat(file="estrogenUniGeneE10.html","<html>\n<body>")
 print.xtable(xtableUnigeneEst10,type="html",file="estrogenUniGeneE10.html",append=TRUE)
 cat(file="estrogenUniGeneE10.html","</body>\n</html>",append=TRUE)

 cat(file="estrogenUniGeneE48.html","<html>\n<body>")
 print.xtable(xtableUnigeneEst48,type="html",file="estrogenUniGeneE48.html",append=TRUE)
 cat(file="estrogenUniGeneE48.html","</body>\n</html>",append=TRUE)

The display argument to the xtable function is used to specify the format of text or numbers displayed in cells of the HTML table:

Format code Meaning
dDecimal (base ten) integer, e.g. 48
sCharacter string, e.g. "Block"
gGeneral real floating-point number, e.g. 8.25
eFloating point number in exponent format, e.g. 1.02E-05

4.9 M A plots using contrasts from the linear model

Now we plot an M vs A plot for each contrast. Recall that M is a log fold change in base 2 and A is a log intensity in base 2 (in this case for each gene, we will use the average of all of the A values). The geneSymbols vector should be available from Section 3.8. If you could not complete Section 3.8 because you don't have the hgu95av2 annotation package installed, then you can just use the geneIDs vector also described in Section 3.8, which does not require the annotation package.

 M.Est10 <- fit2$coefficients[,"EstPresent10-EstAbsent10"]
 A <- fit2$Amean
 ord.Est10 <- order(fit2$lods[,"EstPresent10-EstAbsent10"],decreasing=TRUE)
 top30.Est10 <- ord.Est10[1:30]
 plot(A,M.Est10,pch=16,cex=0.1)
 text(A[top30.Est10],M.Est10[top30.Est10],labels=geneSymbols[top30.Est10],cex=0.8,col="blue")

 M.Est48 <- fit2$coefficients[,"EstPresent48-EstAbsent48"]
 ord.Est48 <- order(fit2$lods[,"EstPresent48-EstAbsent48"],decreasing=TRUE)
 top30.Est48 <- ord.Est48[1:30]
 plot(A,M.Est48,pch=16,cex=0.1)
 text(A[top30.Est48],M.Est48[top30.Est48],labels=geneSymbols[top30.Est48],cex=0.8,col="blue")

 M.EstAbs48vs10 <- fit2$coefficients[,"EstAbsent48-EstAbsent10"]
 ord.EstAbs48vs10 <- order(fit2$lods[,"EstAbsent48-EstAbsent10"],decreasing=TRUE)
 top30.EstAbs48vs10 <- ord.EstAbs48vs10[1:30]
 plot(A,M.EstAbs48vs10,pch=16,cex=0.1)
 text(A[top30.EstAbs48vs10],M.EstAbs48vs10[top30.EstAbs48vs10],labels=geneSymbols[top30.EstAbs48vs10],cex=0.8,col="blue")

4.10 Venn diagrams using contrasts from the linear model

We can now examine which genes respond to Estrogen at either time (10 hours or 48 hours) by examining moderated F-statistics on 2 degrees of freedom and p-values calculated from these F-statistics:

 F.stat <- fit2$F
 p.value <- fit2$F.p.value

What p-value cutoff should be used? One guide is to examine control probe-clusters which are known not to be differentially expressed.

 i <- grep("AFFX",geneNames(eset))
 summary(p.value[i])

We will choose a p-value of 1.0E-5 which is well below the average p.value for the control probe-clusters. Consider those genes with moderated F-statistics with p-values below 1.0E-5, and classify these according to whether they are significantly up or down regulated at the early or late times:

 results <- classifyTestsF(fit2, p.value=1.0E-5)
 vennDiagram(results,names=c("Est10PresAbs","Est48PresAbs","EstAbs48vs10"))

Use the table command (below) to determine how many genes were up-regulated at 10 hours, and how many of those genes were still up-regulated at 48 hours. The same question can be answered for down-regulated genes, using the same table.

 table(E10=results[,1],E48=results[,2])

5. Acknowledgements

Thanks to Scholtens et al. [3] for providing the estrogen data on the Bioconductor site (http://www.bioconductor.org/data/experimental.html). Thanks to Yee Hwa Yang and Sandrine Dudoit for the ApoAI data.

Thanks also to Gordon Smyth for the limma usersguide which this lab was based on.

This lab is heavily based on a Lab produced by James Wettenhall for IBC 2004.

6. References

  1. Callow, M. J., Dudoit, S., Gong, E. L., Speed, T. P., and Rubin, E. M. (2000). Microarray expression profiling identifies genes with altered expression in HDL deficient mice. Genome Research 10, 2022-2029. ( http://www.genome.org/cgi/content/full/10/12/2022)
  2. Lonnstedt I, Speed T. Replicated microarray data. STAT SINICA. Volume 12. Pages 31-46. Jan 2002
  3. Scholtens D, Miron A, Merchant FM, Miller A, Miron PL, Iglehart JD, Gentleman R. Analyzing Factorial Designed Microarray Experiments. Journal of Multivariate Analysis. To appear.
  4. Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3, No. 1, Article 3. (http://www.bepress.com/sagmb/vol3/iss1/art3/)
  5. Limma users guide
  6. Scholtens D, Miron A, Merchant FM, Miller A, Miron PL, Iglehart JD, Gentleman R. Analyzing Factorial Designed Microarray Experiments. Journal of Multivariate Analysis. To appear.
  7. Scholtens D, Gentleman R. Estrogen 2x2 Factorial Design. http://www.bioconductor.org/repository/devel/vignette/factDesign.pdf

7. Glossary

Knockout RNARNA extracted from a biological specimen which has had one gene artificially knocked out (removed) from it in a laboratory.
Wild Type RNARNA extracted from a biological specimen whose genes are in their natural form (as found in the wild).

Valid HTML 4.0!