Analyzing Output Data from a DNA Microarray using SNOMAD

DNA microarray data consist of the measured intensity of each element (spot) of the array. The intensities of two data sets can be compared to determine how strongly they are correlated. A statistically significant correlation between the two intensity data sets may indicate that the genes measured in both sets are co-expressed. The program SNOMAD refines paired microarray data. It normalizes and standardizes the two intensity data sets so that they can be compared precisely.
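The report does not say which correlation statistic was used. As a minimal sketch, the code below assumes a Pearson correlation computed on log10 intensities, together with its t statistic (n - 2 degrees of freedom). The function name and the sample intensities are hypothetical.

```python
import numpy as np

def log_intensity_correlation(i1, i2):
    """Pearson r between log10 intensities of two paired data sets, plus its t statistic.

    Assumption: t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom
    under the null hypothesis of no correlation.
    """
    x = np.log10(np.asarray(i1, dtype=float))
    y = np.log10(np.asarray(i2, dtype=float))
    n = len(x)
    r = float(np.corrcoef(x, y)[0, 1])
    if abs(r) >= 1.0:
        t = float(np.copysign(np.inf, r))  # perfect correlation: t is unbounded
    else:
        t = r * np.sqrt((n - 2) / (1.0 - r**2))
    return r, t

# Hypothetical intensities for five array elements measured in two data sets.
r, t = log_intensity_correlation([1, 10, 100, 1000, 10000], [1, 100, 10, 1000, 10000])
print(round(r, 3), round(t, 2))  # 0.9 3.58
```

A large positive t suggests that the correlation is unlikely to be due to chance. The p-value would come from a t distribution table or a statistics library.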

First plot: uncorrected data

The plot shows the raw data plotted with intensity one on the y-axis and intensity two on the x-axis.

Second plot: global mean normalization data

This plot shows the two intensity data sets after global mean normalization, which puts them on a common scale. It indicates how far each point deviates from the line y = x, where the two normalized intensities are equal. The strongest correlation is seen up to about x = 4 and y = 4, where most of the points lie close to the line. Genes in that region of the array may therefore show a high level of co-expression. Ideally, the points should be evenly distributed on either side of the line.
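Below is a minimal sketch of global mean normalization. It assumes that each element intensity is divided by the mean intensity of its own data set, which is the usual meaning of this step, so that both sets end up with a mean of 1. The function name and the sample values are hypothetical.

```python
import numpy as np

def global_mean_normalize(intensity):
    """Divide every element intensity by the mean intensity of its data set.

    After this step each data set has a mean of 1, so the two sets share a common
    scale and equal normalized intensities fall on the line y = x.
    """
    intensity = np.asarray(intensity, dtype=float)
    return intensity / intensity.mean()

# Hypothetical raw intensities for the same five elements in the two data sets.
i1 = np.array([120.0, 340.0, 80.0, 1500.0, 260.0])
i2 = np.array([100.0, 300.0, 90.0, 1200.0, 210.0])
n1 = global_mean_normalize(i1)  # mean 460 becomes 1
n2 = global_mean_normalize(i2)  # mean 380 becomes 1
above_line = n1 > n2  # elements brighter in set one after normalization
```

Without this step, a set that is brighter overall (for example because of scanner settings) would put most points on one side of y = x, even for genes whose expression does not differ.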

Third plot: log/log transformation

Further mathematical transformation is needed to eliminate artifacts. After the log transformation, the points are spread more evenly on either side of the line as well as along it. Plotting the log of intensity one against the log of intensity two shows a better correlation, because on a log scale expression levels are compared as ratios (fold changes) rather than as absolute differences. Points located above the line correspond to genes whose intensity one is higher than intensity two, i.e. genes more highly expressed in data set one.
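The sketch below shows the log/log step. It assumes base-10 logarithms, because the report does not state the base. It also shows that on a log scale a two-fold difference gives the same distance from the line at any brightness. The function name and the input values are hypothetical.

```python
import numpy as np

def log_log(n1, n2):
    """Base-10 log of both normalized intensity sets (the base is an assumption)."""
    return np.log10(np.asarray(n1, dtype=float)), np.log10(np.asarray(n2, dtype=float))

# Hypothetical globally normalized intensities (each set has mean 1).
n1 = np.array([0.5, 1.0, 2.0, 0.25, 1.25])
n2 = np.array([0.4, 1.0, 2.5, 0.25, 0.85])
log1, log2 = log_log(n1, n2)

# A point lies above the line y = x when log intensity one exceeds log intensity two,
# i.e. when the element is brighter in data set one.
above = log1 > log2
print(above.sum())  # 2

# On the log scale a two-fold difference is the same distance for dim and bright spots.
gap_low = np.log10(0.2) - np.log10(0.1)
gap_high = np.log10(4.0) - np.log10(2.0)
```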

Fourth plot: mean log intensity (x-axis, mean expression level) versus log ratio (y-axis, differential expression)

This plot relates the mean expression level of the two intensity data sets to their ratio. The log ratio of intensity one over intensity two (y-axis) measures differential expression. A value of zero means equal intensity, a positive value means intensity one is brighter, and a negative value means intensity two is brighter. The bulk of the points cluster at log ratios between 0 and about 0.2. Intensity one therefore tends to be somewhat brighter than intensity two (for example, 10^0.2 ≈ 1.58 if base-10 logarithms are used), indicating a higher expression level in data set one. This agrees with the results of the previous plot. Along the x-axis, most points lie between -0.5 and 0, which means that the mean intensity of most elements is low. The points also follow a trend. Above the zero line, where set one has the higher expression level, the mean intensity tends to be lower. Below the line, where set two has the higher expression level, the mean intensity tends to be higher. In other words, the log ratio depends on intensity, and this intensity-dependent trend is what the locally calculated mean in the fifth plot corrects for.
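The two axes of this plot can be computed from the paired intensities as shown below. This is a minimal sketch. It assumes base-10 logs, with A = mean log intensity and M = log ratio log(I1 / I2). The function names and the data are hypothetical.

```python
import numpy as np

def ma_values(i1, i2):
    """Return (A, M) for paired intensities, using base-10 logs (an assumption).

    A = mean log intensity (x-axis), M = log ratio log(I1 / I2) (y-axis).
    """
    log1 = np.log10(np.asarray(i1, dtype=float))
    log2 = np.log10(np.asarray(i2, dtype=float))
    a = (log1 + log2) / 2.0
    m = log1 - log2
    return a, m

def log_ratio_to_fold_change(m, base=10.0):
    """Undo the log: a log ratio m corresponds to the plain ratio I1 / I2 = base ** m."""
    return base ** np.asarray(m, dtype=float)

# Hypothetical normalized intensities for three elements.
a, m = ma_values([0.8, 0.5, 2.0], [0.5, 0.5, 4.0])
# m > 0: brighter in set one; m == 0: equal; m < 0: brighter in set two.
print(np.sign(m).tolist())  # [1.0, 0.0, -1.0]
```

With these assumptions, `log_ratio_to_fold_change(0.2)` gives about 1.58. With `base=2.0` it gives about 1.15. The fold change read from the plot therefore depends on the log base that was used.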

Fifth plot: mean log intensity vs. log ratio along with locally calculated mean element intensity (red):

The red line is a new mean line calculated locally from the points in this plot. Instead of being a single global value, it follows how the log ratio changes with mean log intensity. The points are evenly distributed on either side of the red line, which shows a good correlation between set one and set two and confirms the results found above.
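The report does not give SNOMAD's exact smoothing method or span. To illustrate how a local mean such as the red line can be computed, the sketch below swaps in a plain locally weighted linear regression (lowess without robustness iterations, tricube weights, an assumed span of 40% of the points). The function name and the simulated data are hypothetical.

```python
import numpy as np

def lowess_fit(x, y, frac=0.4):
    """Locally weighted linear regression (lowess without robustness iterations).

    For each x[i], fit a straight line to the frac * n nearest points in x,
    weighted by the tricube kernel, and return the fitted value at x[i].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighbourhood size
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]             # k nearest neighbours of x[i]
        h = dist[idx].max()
        if h == 0.0:                           # all neighbours share the same x
            fitted[i] = y[idx].mean()
            continue
        w = (1.0 - (dist[idx] / h) ** 3) ** 3  # tricube weights, 0 at the edge
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(len(idx)), x[idx]])
        # Weighted least squares: scale rows by sqrt(weight), then solve.
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = coef[0] + coef[1] * x[i]
    return fitted

# Hypothetical M/A data with an intensity-dependent bias in the log ratio.
rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 0.5, 300)
m = 0.1 - 0.15 * a + rng.normal(0.0, 0.05, 300)
local_mean = lowess_fit(a, m)  # the "red line", one value per element
```

A wider span gives a smoother line. A narrower span follows local bends in the data more closely but is noisier.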

Sixth plot: mean log intensity vs. adjusted log ratio (residuals, i.e. vertical distance between points and red line in the plot above):

This plot is the same as the one above, except that each point is now plotted by its residual, i.e. its vertical distance from the red line. The points are similarly distributed on either side of the blue line (zero residual). A strong correlation provides a good basis for reporting co-expression of genes in set one with set two.
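The adjusted log ratio is simply the log ratio minus the local mean at the same intensity. To keep this sketch self-contained, a hypothetical k-nearest-neighbour average stands in for the red line. It is not SNOMAD's smoother, and the names, k = 15 and the data are assumptions.

```python
import numpy as np

def knn_local_mean(a, m, k=15):
    """Hypothetical stand-in for the red line: mean of m over the k points nearest in a."""
    a = np.asarray(a, dtype=float)
    m = np.asarray(m, dtype=float)
    k = min(k, len(a))
    local = np.empty(len(a))
    for i in range(len(a)):
        idx = np.argsort(np.abs(a - a[i]))[:k]
        local[i] = m[idx].mean()
    return local

def adjusted_log_ratio(a, m, k=15):
    """Residual of each point: its log ratio minus the local mean at its intensity."""
    return np.asarray(m, dtype=float) - knn_local_mean(a, m, k)

# Hypothetical data: every log ratio is shifted up by 0.3 (a pure bias).
a = np.linspace(-1.0, 0.5, 40)
m = np.full(40, 0.3)
residuals = adjusted_log_ratio(a, m)  # bias removed: all residuals are ~0
```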

Seventh plot: adjusted log ratio with local standard deviation (green lines)

The distribution of points is compared with standard deviation values calculated from the points above and below the green lines in each half of the plot. The points are again evenly distributed about the green lines, indicating that the standard deviation values are not very high and that the correlation is still good.
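Below is a hypothetical sketch of a local standard deviation like the one the green lines represent. It uses the sample SD of the residuals over the k points nearest in mean log intensity (k = 50 by default, 100 in the example). The report mentions values computed for each half of the plot, but this sketch uses one symmetric SD for both halves, which is an assumption. SNOMAD's actual method may differ.

```python
import numpy as np

def knn_local_sd(a, r, k=50):
    """Hypothetical local spread: sample SD of residuals r over the k points nearest in a."""
    a = np.asarray(a, dtype=float)
    r = np.asarray(r, dtype=float)
    k = min(max(k, 2), len(a))  # need at least 2 points for a sample SD
    sd = np.empty(len(a))
    for i in range(len(a)):
        idx = np.argsort(np.abs(a - a[i]))[:k]
        sd[i] = r[idx].std(ddof=1)
    return sd

# Hypothetical residuals whose spread grows with mean log intensity a.
rng = np.random.default_rng(1)
a = rng.uniform(-1.0, 0.5, 1000)
true_sd = 0.05 + 0.05 * (a + 1.0)
r = rng.normal(0.0, true_sd)
sd = knn_local_sd(a, r, k=100)
upper, lower = sd, -sd             # the "green lines"
inside = np.mean(np.abs(r) <= sd)  # share of points between the lines
```

For roughly normal residuals, about two thirds of the points should fall between the lines if the local SD is estimated well.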

Eighth plot: Final plot

The x-axis is the mean log intensity and the y-axis is the Z-score of the log ratio. The Z-score expresses each adjusted log ratio as its deviation from the norm (blue line), measured in standard deviations. After this full treatment, the data show that genes in set one and in set two have a high probability of being co-expressed (Rougemont, 2003; NCBI, 2006; Colantuoni & Henry, 2000).
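Below is a minimal sketch of the final standardization. It assumes that each Z-score is the adjusted log ratio divided by the local standard deviation at the same intensity, so that the blue line corresponds to z = 0. The function name and the values are hypothetical.

```python
import numpy as np

def z_scores(adjusted_log_ratio, local_sd):
    """Express each adjusted log ratio in units of its local standard deviation."""
    adjusted_log_ratio = np.asarray(adjusted_log_ratio, dtype=float)
    local_sd = np.asarray(local_sd, dtype=float)
    return adjusted_log_ratio / local_sd

# Hypothetical residuals and local SDs for four elements.
z = z_scores([0.20, -0.10, 0.03, 0.0], [0.10, 0.05, 0.06, 0.08])
```

Because each residual is scaled by the spread at its own intensity, a Z-score of 2 means the same thing for a dim spot as for a bright one.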

References

Colantuoni, C., & Henry, G. (2000). SNOMAD (Version 2000) [Computer software]. Baltimore: Laboratory of Jonathan Pevsner.

NCBI. (2006). Yeast Genome DNA Microarray [Data file]. Available from http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12720549

Rougemont, J. (2003). DNA microarray data and contextual analysis of correlation graphs. BMC Bioinformatics, 4(15), 2105.