—  SHORT COURSE #47  —

Molecular Analysis of Tissues Using Gene Expression Arrays and Tissue Microarrays

Section I: Principles and Applications of Gene Expression Arrays

Edward Gabrielson


Overview:
High-throughput gene array technologies offer opportunities to find new markers for disease states and to recognize patterns of gene expression that may fingerprint distinct classes of disease. High-throughput tissue microarrays offer an effective mechanism to test and validate candidate markers (or proposed classes of disease). These two approaches are often used in combination for discovery and validation in molecular pathology.

This short course will provide a broad overview of different gene array technology platforms and the application of these technologies to the analysis of tissue samples. The depth of discussion is limited by time constraints, but the course should provide a general background on this technology and help pathologists who are contemplating the use of arrays in future work. With reference to gene expression arrays, the course will cover three general topics: 1) array technology (manufacturing processes, hybridization methods, and data acquisition), 2) data management and analysis, and 3) specimen requirements.

Gene Array Technology
Comparing different gene array platforms - oligonucleotide arrays and cDNA arrays, fluorescence labeling and radiolabeling:

There are several technology platforms in use today, each with advantages and disadvantages. In particular, the common platforms are oligonucleotide arrays, cDNA arrays spotted on glass slides, and cDNA arrays spotted on nylon membranes.

cDNA arrays were developed as an extension of early human genome project efforts to clone and sequence reverse-transcribed mRNA (expressed sequence tags, or ESTs). These cDNA clones can readily be spotted onto nylon membranes or glass slides in a compact, orderly arrangement.

In general, glass slides are used for hybridization of fluorescent-labeled samples and nylon filters are used for hybridization of radiolabeled samples. Each of these options (nylon filters/radiolabeling or glass slides/fluorescent labeling) has advantages and disadvantages. Fluorescence can be measured at high resolution and therefore array features can be densely packed. By contrast, there is considerable diffusion of radioactivity (P-32 or P-33) and this limits spatial resolution. Large numbers of elements (e.g., 10,000 or more) can be resolved on an area the size of a glass slide by fluorescence, but only about 2,000 elements can be resolved on the same area by P-33 radioactivity. Another important advantage of fluorescence labeling is that multiple samples, each labeled with a different fluor, can be co-hybridized to a single array. A common tactic used currently is to label an internal standard (or control) with one fluor and a test sample with a second fluor, allowing comparisons of multiple different samples by reference to the standard "control". This internal standard can compensate, to some extent, for the array-to-array variability in the quantity of cDNA deposited on the slide.

Radioactive labeling is still used for labeling samples, and this platform continues to have some merits. Importantly, the methods for radiolabeling are relatively easy and can be performed efficiently in most laboratories. In contrast, incorporation of fluors (particularly the Cy3 fluor) can be fickle. The reagents, including the radioisotope, are inexpensive. Paradoxically, the diffusion of radioactivity actually results in spot images that are more uniform than fluorescent images, simplifying the issue of data acquisition.

Radiolabeled samples are generally hybridized to nylon filter arrays using rotating hybridization chambers. This results in an even distribution of labeled sample over the filter. Fluorescent samples are generally hybridized to arrays printed on glass slides because glass has significantly less autofluorescence than nylon and because the cost of the fluorescence reagents necessitates the use of small reaction volumes. The mechanics of hybridizing fluorescence labeled probes – usually under a cover slip – makes this platform prone to uneven distribution of probe over the array.

Formerly, there was a significant difference in sensitivity between the nylon filter/radiolabeling platform and the glass slide/fluorescence platform, although this is becoming less of an issue with advances in labeling technology. For radiolabeling, approximately 1 μg of total RNA is required for each array experiment, in contrast to the 50-100 μg of RNA that is required for direct fluorescent labeling (i.e., incorporation of labeled nucleotide) [2]. However, newer labeling methods with a two-step fluorescent label can obtain strong signals with less than 5 μg of RNA. Two such products are marketed by Clontech and Genisphere (http://www.genisphere.com). The Affymetrix platform uses a proprietary fluorescence labeling method that now requires about 2 μg of RNA. In practical terms, 2 or 5 microdissected frozen section slides will provide approximately 2 μg of total RNA.

A number of laboratories are using amplification methods to increase the amount of material available for hybridization [3]. Kits are now available for RNA amplification (for example MessageAmp aRNA kit from Ambion). In general, there is a relative bias for moderately and highly expressed transcripts using these methods.

It is probably reasonable to say that cDNA arrays are becoming obsolete. The main advantages of using cDNA clones for array probes were that these probes were readily available, relatively inexpensive to prepare, and long (thus providing a robust hybridization signal). The main disadvantages of cDNA clones as probes are that many of the commonly available clones are misidentified and that the long sequences cross-hybridize to closely related sequences.

Oligonucleotide arrays have the important advantage that specific, highly unique sequences can be selected for each gene, thus minimizing cross-hybridization. Oligonucleotide arrays are currently manufactured by several vendors, including Affymetrix (http://www.affymetrix.com), Agilent (http://www.chem.agilent.com), and Clontech (http://www.clontech.com). All of these arrays are printed on a glass surface and hybridized to fluorescent-labeled samples.

Affymetrix synthesizes oligonucleotides in situ on the array chip using photolithography, a technology developed for the computer chip industry [1]. Currently, this method is limited to synthesis of oligonucleotides of about 20 bases in length. A unique feature of this platform is that each gene is represented by multiple features consisting of partially overlapping sequences of the particular gene (figure 1). In addition to sequences that perfectly match the gene's wild-type sequence, the array includes single-base mismatch sequences. These panels of features help to determine whether signals are due to hybridization of the intended gene, or due to cross-hybridization.



It should be readily apparent that the Affymetrix platform gives a number of measurements for each gene, including perfect match and mismatch measurements. Affymetrix has software (Microarray Suite, or MAS) that assimilates this data and converts the multiple measurements into a single value for each gene. Most users of this platform have faith that this software normalization is legitimate, but it should be noted that processing any particular data set with different versions of the MAS software results in different values.

Agilent uses a variation of inkjet technology to synthesize oligonucleotides up to 60 bases long directly on glass array surfaces. This manufacturer claims that sequences of this length provide optimal hybridization specificity, and therefore multiple features are not required for each gene. Clontech and other companies offer arrays of oligonucleotides that are spotted or printed by ink jet.

Array Manufacturing, Sample Preparation, Sample Hybridization, and Data Acquisition

Manufacturing cDNA arrays
Most participants in this course will use arrays that are prepared by a commercial manufacturer or by a core facility, and therefore the manufacturing process will not be discussed in detail in this short course. For the rare individual who intends to manufacture cDNA arrays, detailed protocols are available online that cover all aspects of array production, from amplifying clones to spotting. One useful source is an article published in Biotechniques by a group from TIGR [4]. This article is available online through a (free) subscription to Biotechniques at http://www.biotechniques.com. Protocols from a Cold Spring Harbor Laboratory microarray course (directed by Joe DeRisi) are also available online at http://www.microarrays.org.

One aspect of the manufacturing process that is worth mentioning is that of mechanical spotting. The variability among spotting tips is a particularly common cause of variability among the arrays produced. The tip originally developed by the Pat Brown lab, and used in most commercially manufactured arrayers, has a "pen and quill" configuration. This tip has a slot, which draws and releases DNA-containing solution through capillary action when the tip hits a solid surface. The amount of solution released with each hit varies, depending in part on the total amount remaining in the slot. Furthermore, these tips are subject to wear and distortion by the mechanics of spotting. An alternative type of tip uses a ring and pin configuration, which appears to have better reproducibility. In addition, spotting by non-contact methods is used by some machines, and this technology is being used increasingly by commercial producers of arrays.

Sample Labeling and Hybridization
Most labeling of sample mRNA is performed by incorporating label in a reverse transcriptase reaction. This can usually be done with total RNA as the starting point, eliminating the need to purify mRNA. For radiolabeling, P-33 or P-32 nucleotides (either dCTP or dATP) are used in the reaction, and for fluorescence labeling, Cy3 or Cy5 tagged nucleotides are used. An effective protocol for radiolabeling samples is available on the Research Genetics Web site (http://resgen.com), and effective protocols for labeling with fluorescence are available in the previously cited Biotechniques manuscript or on the Pat Brown lab Web page (http://cmgm.stanford.edu/pbrown/mguide/).

It is important for users of arrays to understand some of the technical aspects of hybridization. A number of factors can affect the hybridization of probe to template on the array, including buffering salts, temperature, and duration of hybridization. The presence of SDS in the hybridization buffer helps to minimize non-specific binding and the use of formamide maintains the denatured status of the probe molecules, even at low temperatures. Typically, hybridizations are carried out at relatively low, non-stringent, temperatures (e.g., 42° C) overnight and then washed several times under more stringent conditions to minimize non-specific binding. If one is particularly concerned about low-level expressed genes, the stringency of the washes becomes more important for differentiating low expression values from background noise. It is also important to remember that even though there is usually an excess of immobilized template on the glass slide or nylon membrane, only a portion of the labeled molecules from the sample actually hybridize to the template. Therefore, it is possible to increase signal by increasing the concentration of template to some extent.

Unfortunately, there is little published data on specificity of hybridizations for specific arrayed sequences. Intuitively, sequences that are rich in C and G will have relatively high levels of non-specific binding and this will become increasingly important for genes expressed at low levels, where non-specific hybridization may exceed specific hybridization.

Array Image Analysis and Data Acquisition
Data acquisition from array images can be quite complex, particularly for fluorescent labeling. There are several manufacturers (see table below) that currently produce scanners capable of detecting the Cy3 and Cy5 labels, and most are developing instruments capable of detecting other fluors as well. There are two primary manufacturers of phosphorimagers used for quantitatively measuring signals from radiolabeled samples, Molecular Dynamics (http://www.mdyn.com) and Fuji (http://www.fujimed.com).

Fluorescent Slide Readers
Axon http://www.axon.com
GSI Lumonics http://www.gsilumonics.com
Genetic Microsystems http://www.affymetrix.com
Genomic Solutions http://www.genomicsolutions.com

Image analysis involves several steps. First, spots must be identified. This task is simplified to some extent by the fact that the robotic systems produce regularly aligned spots on the array. The simplest image analysis software packages use manually aligned grids to direct spot-specific data acquisition, and more sophisticated packages have "spot-finding" features. Irregularities in the array configuration and spurious signals can complicate this task.

Following spot identification, hybridization intensities of each spot are measured. For radioactive decay-generated spots, this is a relatively simple task because the scatter of radioactivity produces spots of relatively symmetrical geometry. In contrast, the high-resolution fluorescent spots have complex geometry, reflecting all irregularities of the original spot geometry. In most cases, background levels are also measured (preferentially specific for each spot) and these values are subtracted from hybridization intensities. The commercially available fluorescent slide readers usually come equipped with software for complete single-experiment analysis (i.e., converting an image to a quantitative data file).
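The background correction described above can be sketched in a few lines. This is only an illustration: the intensity values are hypothetical, and clamping negative results to zero is an assumption made here for convenience, not something the scanner software is known to do.

```python
def background_subtract(spots, floor=0.0):
    """Subtract each spot's local background from its foreground intensity.

    `spots` is a list of (foreground, local_background) pairs. Results below
    `floor` are clamped (an assumption made for this sketch) so that noisy,
    weak spots do not end up with negative intensities.
    """
    return [max(foreground - background, floor) for foreground, background in spots]


# Hypothetical intensities for three spots, in arbitrary scanner units.
spots = [(1200.0, 150.0), (310.0, 140.0), (120.0, 135.0)]
corrected = background_subtract(spots)
print(corrected)
```

The third spot shows why spot-specific background matters: its foreground is below the local background, so it carries no measurable signal.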

Planning Gene Array Experiments
There is a common misperception that gene expression studies are "fishing expeditions" and that these studies therefore do not require an experimental design. In fact, the ultimate efficiency of gene array projects, and sometimes the ability to acquire any meaningful data at all, depends on careful planning [5].

Controls and Internal Standards
In the simplest experimental design, only two samples are compared to one another. This experiment requires two arrays if single channel labeling is used (e.g., radiolabeling or single color fluorescence for Affymetrix arrays), and these two arrays must be virtually identical for a valid comparison. It is tempting to use dual color fluorescence for such an experiment, because only one array is needed for the direct comparison of the two samples.

The number of arrays actually required goes up quickly, however. It is always advisable to perform each analysis in duplicate to eliminate measured differences that could be due to manufacturing variation and thus increase confidence in any gene expression differences being due to biological differences. In the case of a repeat of an experiment using dual color fluorescence, the fluorescent dyes are often "swapped" in the repeat experiment.

When the experimental design involves direct comparisons of samples on shared arrays (using dual color fluorescence), the number of arrays required grows rapidly as the number of samples increases. As shown in figure 1 below, comparing every pair of samples directly requires n(n-1)/2 arrays for n samples (before replicates), so the requirement grows roughly with the square of the number of samples. Thus, while it is tempting to start an experiment with direct comparisons, due to the small number of arrays needed for the first two or three samples, one should always be aware of the limited potential to scale up using this approach.



Figure 1: Experimental design with a series of direct comparisons in an experiment using arrays based on two-color fluorescence labeling. Each arrow represents a comparison, usually performed in duplicate.

The most effective experimental design for comparing multiple different samples uses indirect comparisons. With single channel labeling, comparison of multiple samples is a simple extension of an experiment that compares a small number of samples, assuming that all of the arrays used for the various samples have been manufactured in a highly consistent manner. Unfortunately, for most spotted arrays, there is considerable variation among the arrays across different manufacturing batches and even within a single batch. This, in fact, is an important rationale for using dual-color fluorescence because this platform allows the use of reference standards in an indirect comparison.

Figure 2 (below) diagrams an experiment based on indirect comparisons, using a common reference standard. Note that this experimental design ultimately allows only one sample to be measured per array. When duplicate measurements are performed with this experimental design, dye swapping between the sample and reference standard is commonly performed.
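To make the scaling difference between the two designs concrete, the following sketch counts arrays under two stated assumptions: in the direct design every pair of samples is compared on its own array, and in the reference design every sample is co-hybridized once with the common standard. The function names and the default of two replicates (for example, a dye swap) are illustrative choices.

```python
def arrays_for_direct_design(n_samples, replicates=2):
    """Arrays needed to compare every pair of samples directly."""
    pairs = n_samples * (n_samples - 1) // 2
    return pairs * replicates


def arrays_for_reference_design(n_samples, replicates=2):
    """Arrays needed when each sample is hybridized against a common reference."""
    return n_samples * replicates


for n in (2, 3, 5, 10, 20):
    print(n, arrays_for_direct_design(n), arrays_for_reference_design(n))
```

With two or three samples the direct design needs no more arrays than the reference design, but the gap widens quickly after that, which is the practical argument for planning around a reference standard.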



If a reference standard RNA is used in a series of arrays, this standard should represent most (ideally all) of the various gene transcripts that are expected in the samples. This provides at least a basic level of quality assurance for the array (assuring that the gene is truly represented on the array) and also provides a value to be used for calculating a ratio for the genes in the various samples. There are some commercially prepared reference RNA standards (e.g., Stratagene) that are commonly used for profiling cancer cell gene expression, which contain a mixture of RNA from a series of cultured cancer cell lines. Many different laboratories use this commercial preparation, facilitating the comparison of data among these different projects. Furthermore, the manufacturer has assured a reasonable level of consistency of this reference standard, among different lots and over time. A consistent standard over time will allow each new measurement to contribute to a cumulative experiment, leveraging the value of additional experimental work. However, this commercial preparation is relatively costly and it has been argued that it does not represent a comprehensive spectrum of different types of human cancers.

In summary, experimental design planning should consider the choice between a direct comparison or indirect comparison configuration, the appropriate number of replicates, and the selection of internal standards. Direct comparisons are appropriate for small scale experiments where future expansion is not a concern, whereas indirect comparisons are appropriate for most moderate to large experiments.

Analysis of Gene Array Experiments
Obtaining meaningful conclusions from the analysis of thousands of different genes, usually in a relatively small number of samples, is challenging. Fortunately, software applicable to gene array data analysis and data visualization is being developed at a rapid pace to help with the application of decision-based methods to this task. No software is a substitute for a fundamental understanding of the statistical methods being applied, and this usually requires active involvement of a statistician in the program. However, many individuals using cDNA arrays are becoming familiar with standard approaches to array data analysis and can apply software tools for a preliminary investigation of relationships and visualization of data. Several data analysis software packages are tabulated below.

BioDiscovery http://www.biodiscovery.com/
Silicon Genetics (Genespring) http://www.sigenetics.com/
Spotfire http://www.spotfire.com/
Stanford University (FREE!!) http://rana.Stanford.EDU/software/
(See also the Eisen lab homepage at Lawrence Berkeley National Laboratory, http://rana.lbl.gov)
Partek http://www.partek.com
Rosetta http://rosetta.com

Data Normalization:
The first step in processing image analysis data is almost always normalization. This is necessary to adjust for differences in quantities (and quality) of starting RNA, differences in labeling efficiencies, and differences in detection efficiencies (or, in the case of phosphorimager detection of radiolabeled probes, differences in exposure times). Normalization can be based on a subset of "housekeeping" genes or the entire set of genes represented on the array.

For example, arrays with radionucleotide labeling will typically be imaged individually and the data from multiple arrays will be assembled together in a spreadsheet. Some arrays will have relatively high values compared to other arrays because the sample had more RNA or a better labeling reaction. To make reasonable comparisons of gene expression levels across the different samples, adjustments must be made to normalize the expression for each sample individually.

All of the array data analysis software packages that have been developed in recent years, as well as simple spreadsheet software such as Excel, can perform this function. This initial normalization is generally performed as a linear scaling, with each individual value re-expressed as a percent of the total or a fraction of the mean. The tables below show a simple example of linear scaling.

Sample "Raw" Gene Expression Values

          Sample A  Sample B  Sample C  Sample D  Sample E
Gene 1    8         964       17        491       759
Gene 2    17        69        34        33        79
Gene 3    3         98        5         50        105
Gene 4    542       14        1094      6         15
Gene 5    28        19        57        9         10
Gene 6    26        1         51        1         1
Gene 7    29        53        60        31        66
Gene 8    11        26        22        12        27
Gene 9    480       945       501       996       1021
Gene 10   231       444       244       409       433

Sample Gene Expression Values Scaled to Fraction of the Sample Mean

          Sample A  Sample B  Sample C  Sample D  Sample E
Gene 1    0.058182  3.661223  0.081535  2.409225  3.016693
Gene 2    0.123636  0.262058  0.16307   0.161923  0.31399
Gene 3    0.021818  0.372199  0.023981  0.245339  0.417329
Gene 4    3.941818  0.053171  5.247002  0.029441  0.059618
Gene 5    0.203636  0.072161  0.273381  0.044161  0.039746
Gene 6    0.189091  0.003798  0.244604  0.004907  0.003975
Gene 7    0.210909  0.201291  0.28777   0.15211   0.262321
Gene 8    0.08      0.098747  0.105516  0.058881  0.107313
Gene 9    3.490909  3.589062  2.402878  4.887144  4.058029
Gene 10   1.68      1.686289  1.170264  2.006869  1.720986
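The scaled values in the table above are each raw value divided by the mean of its own sample (for example, 8 / 137.5 = 0.058182 for gene 1 in sample A). A minimal sketch of that calculation, using the raw values from the first table, might look like this; the function and variable names are illustrative.

```python
# Raw values from the first table, one list per sample (genes 1 to 10 in order).
raw = {
    "Sample A": [8, 17, 3, 542, 28, 26, 29, 11, 480, 231],
    "Sample B": [964, 69, 98, 14, 19, 1, 53, 26, 945, 444],
    "Sample C": [17, 34, 5, 1094, 57, 51, 60, 22, 501, 244],
    "Sample D": [491, 33, 50, 6, 9, 1, 31, 12, 996, 409],
    "Sample E": [759, 79, 105, 15, 10, 1, 66, 27, 1021, 433],
}


def scale_to_mean(values):
    """Re-express each value as a fraction of the mean of its sample."""
    mean = sum(values) / len(values)
    return [value / mean for value in values]


scaled = {sample: scale_to_mean(values) for sample, values in raw.items()}
for sample, values in scaled.items():
    print(sample, [round(value, 6) for value in values])
```

After this step every sample has a mean of 1.0, so differences in starting RNA or labeling efficiency no longer dominate comparisons between samples.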

With arrays co-hybridized to two samples with different fluors, it is necessary to adjust for different starting quantities of RNA and labeling efficiencies for each of the two samples on the array. Usually, there is an underlying assumption that the total amounts of RNA labeled with either Cy3 or Cy5 are equal, and thus the overall Cy5 to Cy3 ratio should be adjusted to 1 for either the entire set of arrayed genes or a subset of housekeeping genes. Thus, while relative Cy3 or Cy5 intensities will vary from spot to spot, these variations will average out over the thousands of spots on the array.
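One simple reading of this adjustment is a global scaling of one channel so that the total Cy5 signal equals the total Cy3 signal. The sketch below follows that reading; the intensities are hypothetical, and other schemes (for example, centering on housekeeping genes only) would change which spots enter the sums.

```python
def normalize_two_color(cy5, cy3):
    """Scale Cy5 so total Cy5 equals total Cy3, then return per-spot ratios.

    Assumes both lists hold background-corrected intensities for the same
    spots in the same order.
    """
    factor = sum(cy3) / sum(cy5)
    ratios = [(red * factor) / green for red, green in zip(cy5, cy3)]
    return factor, ratios


# Hypothetical intensities for five spots on one array.
cy5 = [200.0, 1000.0, 50.0, 400.0, 350.0]
cy3 = [100.0, 450.0, 30.0, 220.0, 200.0]
factor, ratios = normalize_two_color(cy5, cy3)
print(factor, [round(ratio, 3) for ratio in ratios])
```

Here the Cy5 channel is twice as bright overall, so every Cy5 value is halved before the ratios are taken; the first spot then shows no difference between the two samples.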

Once expression values have been normalized, expression for individual genes can be compared across a series of samples. However, if we wish to examine several different genes in this manner concurrently, it may be difficult to compare highly expressed genes to low-level expressed genes. Thus, it is often desirable to normalize expression for each gene across all samples. For example, in the sample data shown above, genes 4 and 6 are both expressed at more than 10x higher levels in samples A and C than in samples B and D, although gene 4 is far more abundant than gene 6 in absolute terms. Normalization across the samples, using a log scale with an offset of 1.0, will help visualize these relationships when the data is graphed. Furthermore, when performing quantitative measures of similarity (such as the correlation coefficient) it may be desirable to have all gene expression differences of the same magnitude contribute in a reasonably similar manner to the calculation, regardless of their absolute values.

Gene array data is often visualized in an "intensity plot", where relative expression levels are color-coded (e.g., red = relatively high and green = relatively low). While the intensity plot is an effective means to summarize data, there is a significant amount of information that is lost by visualizing the data with this approach. A table of the fully normalized sample data and an intensity plot of this data are shown below.

Normalized Sample Data Set

          Sample A  Sample B  Sample C  Sample D  Sample E
Gene 1    0.065894  1.793547  0.091329  1.429085  1.620145
Gene 2    0.631117  1.260089  0.817861  0.812524  1.478408
Gene 3    0.116056  1.701378  0.127424  1.179765  1.875377
Gene 4    2.238621  0.072586  2.567     0.040654  0.081137
Gene 5    1.600888  0.60181   2.08741   0.373246  0.336646
Gene 6    2.139939  0.046839  2.703729  0.06048   0.049013
Gene 7    0.95473   0.914947  1.261749  0.706401  1.162172
Gene 8    0.893664  1.093495  1.164812  0.664351  1.183678
Gene 9    0.982493  0.996635  0.801024  1.159568  1.06028
Gene 10   1.016155  1.018571  0.798695  1.134779  1.0318
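The text does not spell out the formula behind the normalized table, but its values are reproduced by taking log(1 + x) of each scaled value and then dividing by the mean of those log values across the five samples for that gene. The sketch below follows that inferred reading for three of the genes; treat the formula as an assumption derived from the numbers rather than a documented procedure.

```python
import math

# Scaled values (fraction of sample mean) for three genes, samples A to E.
scaled = {
    "Gene 1": [0.058182, 3.661223, 0.081535, 2.409225, 3.016693],
    "Gene 4": [3.941818, 0.053171, 5.247002, 0.029441, 0.059618],
    "Gene 9": [3.490909, 3.589062, 2.402878, 4.887144, 4.058029],
}


def normalize_gene(values, offset=1.0):
    """Log-transform with an offset, then divide by the gene's mean log value."""
    logged = [math.log(value + offset) for value in values]
    mean_log = sum(logged) / len(logged)
    return [value / mean_log for value in logged]


normalized = {gene: normalize_gene(values) for gene, values in scaled.items()}
for gene, values in normalized.items():
    print(gene, [round(value, 6) for value in values])
```

Because each gene is divided by its own mean, the base of the logarithm does not affect the result, and every gene ends up with a mean of 1.0 across samples regardless of how abundant it is.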

Intensity Plot of Sample Data


Red indicates high relative expression and green indicates low relative expression.

Class Comparison and Class Discovery in Gene Expression Array Analysis:
The objectives of gene array data analysis can be broadly divided into class comparison or class discovery. Class comparison is actually an extension of the simple comparison of individual samples, with multiple samples chosen to represent each class in the comparison. For example, a class comparison experiment might compare a set of different cancer tissue samples to matching normal tissue samples (either paired or unpaired) with the objective of identifying genes that are differentially expressed in cancer vs. normal. Such an experiment could also involve multiple different samples from each of a number of different types of cancers, with the objective of finding gene expression signatures that distinguish each of the different types of cancer.

A variant of class comparison analysis is the class prediction model. For example, different cancers of one particular histologic type can be stratified according to a clinical feature, such as patient survival or response of the tumor to therapy. Comparing the gene expression patterns of the different pre-defined classes is expected to identify a gene expression signature that can differentiate the different classes (e.g., aggressive vs. non-aggressive cancers).

The objectives of supervised class comparison/ class prediction analysis are not conceptually difficult, but there are several possible statistical approaches to this process. The simplest approach is based on considering predictive strengths of individual genes using standard measures, such as the T-ratio. Other algorithms assign weights related to predictive strengths of individual genes, or use more complex multivariate models.
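As an illustration of the single-gene approach, the sketch below computes a t-ratio for one gene in two classes. It uses the unequal-variance (Welch) form, which is an assumption made here; the expression values are hypothetical.

```python
import math


def t_ratio(group1, group2):
    """Difference in class means divided by its standard error (Welch form)."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = sum(group1) / n1, sum(group2) / n2
    var1 = sum((x - mean1) ** 2 for x in group1) / (n1 - 1)
    var2 = sum((x - mean2) ** 2 for x in group2) / (n2 - 1)
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


# Hypothetical normalized expression of one gene in four tumors and four normals.
tumor = [2.1, 2.4, 1.9, 2.6]
normal = [1.0, 1.2, 0.9, 1.1]
t = t_ratio(tumor, normal)
print(round(t, 3))
```

Genes can then be ranked by the absolute value of this statistic, with the caveats about multiple comparisons and class definition discussed in the paragraphs that follow.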

Regardless of the statistical method used for supervised class comparison analysis, there are several important issues that must be considered. One issue involves consideration of confounding variables. Using the example of comparing cancers with different outcomes (i.e., survival vs. death from disease), the confounding variable of treatment must obviously be considered in the analysis, particularly if treatment is thought to impact survival. Another issue that must be considered is statistical validity of a comparison in such a highly dimensional experiment. Often, the statistical value of each gene in the comparison is considered individually, using a measure such as a T-ratio. When large numbers of genes are analyzed in such a manner, a large number will be expected to meet accepted levels of statistical significance by chance alone, even if no gene is truly differentially expressed.
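The size of this multiple-comparison problem is easy to estimate. The sketch below assumes that every gene is tested independently at the same significance level and that no gene truly differs; the Bonferroni threshold is shown only as one common, conservative adjustment and is not a method prescribed in this course.

```python
def expected_false_positives(n_genes, alpha):
    """Genes expected to pass the threshold by chance when none truly differ."""
    return n_genes * alpha


def bonferroni_threshold(n_genes, alpha):
    """Per-gene threshold that keeps the overall error rate near alpha."""
    return alpha / n_genes


n_genes = 10_000
print(expected_false_positives(n_genes, 0.05))
print(bonferroni_threshold(n_genes, 0.05))
```

For an array of 10,000 genes tested at the 0.05 level, about 500 genes would be expected to appear "significant" purely by chance.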

One of the most important issues for class comparison studies is that of defining the classes. Typically, outcomes such as response to therapy or survival are continuous variables, not discrete variables. Frequently the boundaries between the classes (e.g., poor outcome vs. good outcome) are made arbitrarily for the purposes of the class comparison, and there can be very little difference between many members of different classes. One approach to this problem is to exclude cases with ambiguous outcomes, and compare data for only those cases with significant differences. Any classification made using this approach, however, will again confront the problem of a continuous range of outcomes when it is applied prospectively in a clinical setting.

Finally, statistical approaches to class comparison that are based on means and variance of gene levels will not adequately recognize the predictive value of particular markers that have variant expression in only a subset of one of the pre-defined classes. The issue of possible unique subsets is, in fact, a rationale for using class discovery methods of gene expression analysis.

Several analytical strategies have been devised for supervised classification in array analysis. These include support vector machines (SVM) [6] and artificial neural networks. A proof-of-principle study for the use of artificial neural networks demonstrated the effective classification of small, round blue-cell tumors [7]. These methods emphasize the composite patterns of gene expression rather than exact thresholds for individual markers. A method that emphasizes the significance of individual genes in a class prediction model is SAM (significance analysis of microarrays), which calculates a statistic for each gene to measure the strength of the relationship between gene expression and a response variable [8]. The SAM method uses repeated permutations of the data to determine whether the expression of any particular gene is significantly related to the response. The software for this method is available at no charge to academic institutions and is easy to use.
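The idea behind permutation-based significance can be shown with a generic single-gene test. This sketch is not the SAM software or its statistic: it uses the absolute difference in class means as the test statistic, random shuffling of the class labels, and hypothetical data, all of which are assumptions made for illustration.

```python
import random


def permutation_p_value(values, labels, n_permutations=2000, seed=0):
    """Estimate how often shuffled class labels give a mean difference
    at least as large as the one observed with the true labels."""

    def mean_difference(current_labels):
        class1 = [v for v, lab in zip(values, current_labels) if lab == 1]
        class0 = [v for v, lab in zip(values, current_labels) if lab == 0]
        return abs(sum(class1) / len(class1) - sum(class0) / len(class0))

    rng = random.Random(seed)
    observed = mean_difference(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if mean_difference(shuffled) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical expression of one gene: four poor-outcome (1) and four good-outcome (0) cases.
values = [2.1, 2.4, 1.9, 2.6, 1.0, 1.2, 0.9, 1.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
p_value = permutation_p_value(values, labels)
print(p_value)
```

Because the reference distribution is built from the data themselves, this kind of test does not rely on the expression values following a normal distribution.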

Unsupervised class-discovery analysis methods
Much of the excitement over the microarray technology has come from the ability to consider global patterns of gene expression as a decision tool for defining classes in an unsupervised analysis – or class discovery. One of the most promising applications of unsupervised analysis of gene array data is for the novel classification of cancers (and potentially other diseases) by gene expression profiles. In many situations, our current classification structures cannot distinguish tumors that have vastly different clinical behavior and biological phenotypes. Creating entirely new, clinically meaningful classification systems – or class discovery – represents a far more challenging problem than class distinction, but is clearly an important goal for pathologists.

For class discovery, the distinguishing characteristics of the different classes, and possibly even the number of classes, are not known in advance. In this situation, gene expression data are analyzed to find previously unrecognized subsets of tumors that share gene expression profiles. The gene expression profiles represent objective measures of the cellular phenotype and, if properly analyzed and categorized, can lead to an objective classification structure.

The most commonly used method for unsupervised classification is hierarchical clustering [9], with basic relationships between samples determined by the Pearson correlation coefficient. Going back to our sample data set, we can quickly calculate correlations as follows:

Correlation Coefficients

            Sample A    Sample B    Sample C    Sample D    Sample E
Sample A     1
Sample B    -0.99496     1
Sample C     0.97435    -0.97384     1
Sample D    -0.90669     0.911652   -0.96752     1
Sample E    -0.98274     0.967683   -0.96358     0.869933    1
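For readers who want to see how such a coefficient is obtained, the following is a minimal sketch of the Pearson calculation for a pair of expression profiles. The three profiles are made-up values chosen for illustration; they are not the sample data set used in this handout, and the function name is an assumption.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical expression levels of five genes in three samples.
profile_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
profile_2 = [2.1, 3.9, 6.2, 8.0, 9.8]   # rises with profile_1
profile_3 = [5.0, 4.0, 3.0, 2.0, 1.0]   # mirror image of profile_1

r_12 = pearson(profile_1, profile_2)
r_13 = pearson(profile_1, profile_3)
print("r(1,2) =", round(r_12, 4))
print("r(1,3) =", round(r_13, 4))
```

Note that the coefficient depends only on the shape of the profile, not on its absolute scale, which is one reason it is popular for comparing arrays with different overall signal intensity.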

Casual inspection of these correlation coefficients leads to the recognition that samples A and C are almost identical to one another, as are samples B, D and E to each other. A set of similar samples can be called a "cluster". (Note that, in reality, negative correlations are virtually never seen unless analysis is restricted to a subset of genes that have very different expression between different samples.) This scheme of hierarchical classifications leads to a graphical representation known as a dendrogram, which can be used effectively in identifying and displaying patterns in gene expression data. A dendrogram of the sample data set, with the intensity plot reorganized according to the relationships among the different samples and genes, is shown below.
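The grouping described above can be reproduced directly from the correlation table. The following is a minimal sketch of agglomerative clustering with average linkage, using 1 - r as the distance between samples. The choice of average linkage and of 1 - r as the distance are assumptions made for this illustration (other linkage rules and distance measures are in common use), and the function names are hypothetical.

```python
from itertools import combinations

names = ["A", "B", "C", "D", "E"]

# Pairwise Pearson correlations copied from the table above.
correlations = {
    ("A", "B"): -0.99496,
    ("A", "C"): 0.97435,
    ("B", "C"): -0.97384,
    ("A", "D"): -0.90669,
    ("B", "D"): 0.911652,
    ("C", "D"): -0.96752,
    ("A", "E"): -0.98274,
    ("B", "E"): 0.967683,
    ("C", "E"): -0.96358,
    ("D", "E"): 0.869933,
}

def corr(x, y):
    """Look up a correlation regardless of the order of the two sample names."""
    return correlations[(x, y)] if (x, y) in correlations else correlations[(y, x)]

def mean_distance(pair):
    """Average of (1 - r) over all sample pairs drawn from two clusters."""
    a, b = pair
    return sum(1 - corr(x, y) for x in a for y in b) / (len(a) * len(b))

def average_linkage(sample_names):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(n,) for n in sample_names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=mean_distance)
        merges.append((a, b, mean_distance((a, b))))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges

merges = average_linkage(names)
for left, right, distance in merges:
    print(left, "+", right, "at distance", round(distance, 3))
```

The merge history is the information a dendrogram displays: A joins C and B joins E at very small distances, D then joins the B/E pair, and the two resulting branches are joined only at a distance close to 2, the maximum possible for this measure.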



In this example, there are highly significant differences in the patterns of gene expression between the two major branches of the dendrogram. This does not, in fact, represent differences that are observed in actual profiling studies. Typically, the differences are far less obvious and there is often concern regarding whether or not the classification devised by hierarchical clustering is valid. The figures below demonstrate principles of hierarchical clustering, using two-dimensional distance relationships to assemble a dendrogram. In the left panel of this figure, the cluster dendrogram accurately shows that circles A and B are close to one another, as are C and D. However, the distance between circle E and the other circles is not accurately demonstrated in the dendrogram. When, in the panel on the right, another sample (circle F) is added to the set, circle E becomes reclassified from one branch (with circles C and D) to the other (with circles A and B). This instability of the classification is a result of the underlying lack of robust definition of the classes in this example and serves to demonstrate how hierarchical clustering results in loss, and sometimes misrepresentation, of the original information.



There are several examples of studies that used class discovery analysis of gene expression data to identify subclasses of human cancer. One notable study was the identification of two molecularly distinct forms of diffuse large B-cell lymphoma (DLBCL) by groups from Stanford and the National Heart, Lung, and Blood Institute [10]. In this instance, the two forms of DLBCL were identified on the basis of gene expression patterns indicative of different stages of B-cell differentiation. One type expresses genes characteristic of germinal center B cells (germinal center B-like DLBCL) and the second type expresses genes normally induced during in vitro activation of peripheral blood B cells (activated B-like DLBCL). This molecular classification was reported to have prognostic value independent of stratification by the usual clinical grading, with germinal center B-like DLBCL patients having improved survival compared to the activated B-like DLBCL patients. However, a more recent study that evaluated a second series of patients using the markers for germinal center B-like and activated B-like features failed to find any difference in survival between the two groups [11]. This discrepancy highlights the need to validate the biological meaning of any "discovery" of new classes by gene expression profiling. A logical manner to conduct such validation studies on a large population is through the use of tissue microarrays, discussed below.

Data Reduction
Analysis and screening of gene expression data should be performed with some consideration of data reduction, either to reduce the number of variables by eliminating uninteresting ones or to substitute a more parsimonious representation of the data for the gene expression values. Some screening is almost always appropriate. For example, genes whose expression is constant can carry no discriminating ability. Additional screening can be based on explanatory power, or measures of marginal association. We favor examining the ratio of within-group variation to between-group variation, with groups defined according to pathological criteria. This requires coping with measurement error, or the variation in signals that would arise from replicates. Previous studies using gene expression data to classify cancers (e.g., the DLBCL study) have used such data reduction methods.
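A screening step of this kind might look like the sketch below. It drops genes whose expression is constant and then keeps only genes for which the ratio of within-group to between-group variation falls below a cutoff. The variance formulas are the usual one-way analysis of variance mean squares; the cutoff of 0.5, the gene names, the group labels, and the expression values are hypothetical, and the sketch makes no attempt to model measurement error from replicates.

```python
import statistics

def variance_ratio(values, groups):
    """Within-group mean square divided by between-group mean square for one gene."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    grand_mean = statistics.mean(values)
    n, k = len(values), len(by_group)
    within = sum(
        sum((v - statistics.mean(vs)) ** 2 for v in vs) for vs in by_group.values()
    ) / (n - k)
    between = sum(
        len(vs) * (statistics.mean(vs) - grand_mean) ** 2 for vs in by_group.values()
    ) / (k - 1)
    # A gene with identical group means has no between-group variation at all.
    return within / between if between > 0 else float("inf")

def screen_genes(expression, groups, max_ratio=0.5):
    """Return the genes that vary and that separate the groups reasonably well."""
    kept = []
    for gene, values in expression.items():
        if max(values) == min(values):
            continue  # constant expression carries no discriminating ability
        if variance_ratio(values, groups) <= max_ratio:
            kept.append(gene)
    return kept

# Hypothetical data: three samples of each of two pathologically defined groups.
groups = ["type_1", "type_1", "type_1", "type_2", "type_2", "type_2"]
expression = {
    "gene_const": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    "gene_noisy": [1.0, 3.0, 2.0, 2.0, 1.0, 3.0],
    "gene_split": [1.0, 1.2, 0.8, 4.0, 4.2, 3.8],
}

kept_genes = screen_genes(expression, groups)
print(kept_genes)
```

Only the gene whose levels differ consistently between the two groups survives the screen; the constant gene and the gene that varies equally within both groups are removed.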

Parsimonious representations of the data can sometimes be identified when there is biological knowledge about a pathway; the presence of a pathway (say gene 1 overexpressed; gene 2 underexpressed; gene 3 overexpressed) can then be used to construct new and more highly explanatory variables. Normally such knowledge is not available and investigators probably need to begin by applying discovery techniques, which find "centroids" of gene expression levels and assign states. These methods are in development by biostatisticians and have not yet been widely applied.

Other Statistical Approaches to Cluster Analysis
There is a wide variety of methods that have been applied to decision-based analysis of array data, and statisticians now take great delight in developing new methods. A few of these methods will be briefly discussed.

K-Means Clustering: K-means clustering begins analysis by finding k groups, such that the distances within groups are minimized. Different algorithms and clustering indices account for the different possibilities using this technique. Most of the clustering indices are defined numerically by partitioning the total dispersion in the data into within-cluster and between-cluster components. This technique has intuitive appeal. It may be useful to derive the appropriate number of clusters initially by using hierarchical methods. Predefining the number of classes could be an important limitation of self-organizing maps (SOMs) and k-means clustering. Although different possible numbers of classes can be iteratively tested, the clustering algorithm will force all samples into one of the classes and may thus compromise the distinctiveness of a particular group if assigning its members to other classes provides a better overall solution to the problem.
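The basic iteration can be sketched as follows. This is a bare-bones version of the standard (Lloyd) k-means procedure, in which samples are assigned to the nearest center and each center is then moved to the mean of its members. It is not taken from any particular software package, and the function names, the random starting centers, and the six two-gene "samples" are assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, n_iter=50, seed=0):
    """Alternate between assigning points to centers and recomputing centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # start from k randomly chosen samples
    assignment = None
    for _ in range(n_iter):
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break                     # no sample changed cluster: converged
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:               # leave an empty cluster's center in place
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centers

# Hypothetical samples described by the expression of two genes.
points = [
    (1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
]

assignment, centers = k_means(points, k=2)
print(assignment)
print(centers)
```

Every sample receives one of the k labels whether or not it truly belongs to any group, which is the limitation noted above; running the procedure with k = 3 on the same six samples would still return three clusters.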

Self-organizing maps: The self-organizing map (SOM) algorithm is also finding application in gene expression analysis. As for k-means clustering, SOM requires predefining the number of classes; the algorithm finds a suitable set of cluster centers around which the data appear to aggregate and partitions the sample of tumors according to distance from the centers. SOM classifications lend themselves to interesting visualization techniques such as the Ultsch representation, which would be especially helpful when the number of clusters is moderate or large. The software GENECLUSTER, which implements a version of SOM tailored to gene expression data, is available on the Web (http://www.genome.wi.mit.edu/software.html).

Projection Methods: Projection methods identify interesting linear combinations of the gene expression patterns. These can be used for visualization, dimension reduction, and class discovery by agglomeration of samples around a few interesting linear combinations. One of the oldest and most popular projection techniques is that of principal components, already used successfully in small-dimensional gene expression data problems. Staged approaches also allow us to refine our tools and perform the necessary statistical methods work and validation that would ultimately provide a satisfactory solution to these problems.
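As an illustration of a projection method, the sketch below finds the first principal component of a small samples-by-genes table and projects each sample onto it. To keep the example self-contained it uses simple power iteration on the covariance matrix; this is a convenience chosen for the illustration, and real analyses would normally rely on a numerical library. The function name, the starting vector, and the data are assumptions.

```python
def first_principal_component(rows, n_iter=200):
    """Return the leading principal axis and the score of each sample on it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix between genes (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeated multiplication by the covariance matrix turns
    # the vector toward the direction of greatest variance. The all-ones start
    # is an assumption; it fails only if it is exactly orthogonal to that axis.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical data: five samples (rows) by three genes (columns).
# Genes 1 and 2 rise together; gene 3 is essentially flat.
rows = [
    (1.0, 2.0, 5.0),
    (2.0, 4.1, 5.1),
    (3.0, 6.0, 4.9),
    (4.0, 8.1, 5.0),
    (5.0, 9.9, 5.0),
]

axis, scores = first_principal_component(rows)
print([round(x, 3) for x in axis])
print([round(s, 3) for s in scores])
```

The leading axis is dominated by the two co-varying genes, so the three-gene profile of each sample is summarized by a single score that preserves the ordering of the samples.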

Databases for Management of Microarray Data
Effective management of microarray data can be challenging, particularly when a research group plans to combine data sets and continuously mine data for an extended period of time [14]. In addition, sharing and comparing microarray data from various laboratories can be difficult, if not impossible, unless there are some consistencies among microarray database configurations.

The microarray research community has initiated efforts to standardize the management and storage of microarray data through the Microarray Gene Expression Data (MGED) Society (http://www.mged.org). This organization has spearheaded three related projects: 1) developing a MicroArray Gene Expression Markup Language (MAGE-ML), 2) establishing standards for reporting data known as Minimum Information About a Microarray Experiment (MIAME), and 3) standardizing the vocabulary through the MGED Society Ontology Working Group. These efforts are summarized in a recent review article [15], and all scientists interested in extended use of microarrays are encouraged to be aware of the emerging standards for microarray data management.

In brief, some of the proposed standards are obvious, such as standardized nomenclature for genes. In addition, the standards include recording information concerning sample preparation, data transformations and normalizations, and experimental conditions. Several widely available databases meet these standards, including the Stanford Microarray Database (SMD, http://genome-stanford.edu/microarray); the RNA Abundance Database (RAD, http://www.cbil.upenn.edu/RAD2/); the NCBI database and analysis tool called ArrayDB (www.nhgri.nih.gov/DIR/LCG/15K/HTML/); ArrayExpress (http://www.ebi.ac.uk/microarray/); and GeneTraffic (http://www.iobion.com/).

Specimen Requirements for Microarray Analysis

Microdissection and Purity of Assayed Cell Population
Many of the studies published to date on the use of gene arrays to analyze tissue samples have had minimal input from pathologists. Unfortunately, "grind and bind" assays do not faithfully tell what is happening in the particular cells of interest.

One approach to obtaining pure samples of a particular cell type is microdissection. Laser capture microdissection has been applied to frozen tissue samples, but we have actually found mechanical microdissection to be easy, inexpensive, and reliable. Microdissection is applicable to purifying any type of cell population or tissue structure.

A very useful method to obtain purified samples of common cancers is to make smears from scrapings of cut tumor surfaces. Epithelial cells adhere to one another and typically are scraped off in clusters. The scraped material can be smeared on a slide and stained, allowing visualization and easy microdissection of these clusters.

RNA Quality Issues: Array Analysis using RNA from Paraffin-Embedded Tissues
There are several issues regarding quality of tissue samples for microarray analysis. One, obviously, is the issue of RNA degradation. RNA is far more labile than proteins and DNA, and therefore the protocols that have been used to handle tissues for other types of molecular studies could be inadequate for preservation of RNA. Another, less obvious issue is that of ischemia effects on gene expression. After removal of tissues from the body, cells remain viable for a longer time than commonly expected. During this time of ischemia, cells are stressed and actually express new genes in response to this stress [12]. It is important to be aware of these issues in the analysis of microarray data and to minimize ischemia-dependent variability as much as possible.

Although high-quality RNA is obviously optimal for array analysis, there appears to be some tolerance for RNA degradation. This tolerance appears to be highly variable and dependent on labeling methods as well as array platform. The standard approach to evaluating RNA quality requires electrophoresis for measurement of 18S and 28S ribosomal RNA bands. Traditional electrophoresis requires a significant quantity of RNA (e.g., 2 μg) and can waste a precious resource. The Agilent Bioanalyzer uses nanogram quantities for capillary electrophoresis to accomplish the same aim. Alternatively, a method for measuring degradation by comparing PCR amplification of short and long sequences of a common gene (e.g., actin) has also been developed [13].

References

  1. Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., and Brown, E. L. Expression monitoring by hybridization to high-density oligonucleotide arrays, Nat Biotechnol. 14: 1675-80., 1996.
  2. Bertucci, F., Bernard, K., Loriod, B., Chang, Y. C., Granjeaud, S., Birnbaum, D., Nguyen, C., Peck, K., and Jordan, B. R. Sensitivity issues in DNA array-based expression measurements and performance of nylon microarrays for small samples, Hum Mol Genet. 8: 1715-22., 1999.
  3. Wang, E., Miller, L. D., Ohnmacht, G. A., Liu, E. T., and Marincola, F. M. High-fidelity mRNA amplification for gene profiling, Nat Biotechnol. 18: 457-9., 2000.
  4. Hegde, P., Qi, R., Abernathy, K., Gay, C., Dharap, S., Gaspard, R., Hughes, J. E., Snesrud, E., Lee, N., and Quackenbush, J. A concise guide to cDNA microarray analysis, Biotechniques. 29: 548-50, 552-4, 556 passim., 2000.
  5. Yang, Y. H. and Speed, T. Design issues for cDNA microarray experiments, Nat Rev Genet. 3: 579-88., 2002.
  6. Brown, M. P., Grundy, W. N., Lin, D., Cristianini, N., Sugnet, C. W., Furey, T. S., Ares, M., Jr., and Haussler, D. Knowledge-based analysis of microarray gene expression data by using support vector machines, Proc Natl Acad Sci U S A. 97: 262-7., 2000.
  7. Khan, J., Wei, J. S., Ringner, M., Saal, L. H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C. R., Peterson, C., and Meltzer, P. S. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nat Med. 7: 673-9., 2001.
  8. Tusher, V. G., Tibshirani, R., and Chu, G. Significance analysis of microarrays applied to the ionizing radiation response, Proc Natl Acad Sci U S A. 98: 5116-21., 2001.
  9. Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. Cluster analysis and display of genome-wide expression patterns, Proc Natl Acad Sci U S A. 95: 14863-8., 1998.
  10. Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., Boldrick, J. C., Sabet, H., Tran, T., Yu, X., Powell, J. I., Yang, L., Marti, G. E., Moore, T., Hudson, J., Jr., Lu, L., Lewis, D. B., Tibshirani, R., Sherlock, G., Chan, W. C., Greiner, T. C., Weisenburger, D. D., Armitage, J. O., Warnke, R., Staudt, L. M., et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature. 403: 503-11., 2000.
  11. Shipp, M. A., Ross, K. N., Tamayo, P., Weng, A. P., Kutok, J. L., Aguiar, R. C., Gaasenbeek, M., Angelo, M., Reich, M., Pinkus, G. S., Ray, T. S., Koval, M. A., Last, K. W., Norton, A., Lister, T. A., Mesirov, J., Neuberg, D. S., Lander, E. S., Aster, J. C., and Golub, T. R. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning, Nat Med. 8: 68-74., 2002.
  12. Huang, J., Qi, R., Quackenbush, J., Dauway, E., Lazaridis, E., and Yeatman, T. Effects of ischemia on gene expression, J Surg Res. 99: 222-7., 2001.
  13. Sugita, M., Haney, J. L., Gemmill, R. M., and Franklin, W. A. One-step duplex reverse transcription-polymerase chain reaction for quantitative assessment of RNA degradation, Anal Biochem. 295: 113-6., 2001.
  14. Ermolaeva, O., Rastogi, M., Pruitt, K. D., Schuler, G. D., Bittner, M. L., Chen, Y., Simon, R., Meltzer, P., Trent, J. M., and Boguski, M. S. Data management and analysis for gene expression arrays, Nat Genet. 20: 19-23., 1998.
  15. Stoeckert, C. J., Jr., Causton, H. C., and Ball, C. A. Microarray databases: standards and ontologies, Nat Genet. 32 Suppl: 469-73., 2002.

The number of citations in this handout is intentionally kept to a minimum. There are a number of useful Web sites that can provide additional information on gene array technology and additional references. The technology, and the approaches used to analyze data, are rapidly evolving, and these Web sites can provide up-to-date references.

http://ihome.cuhk.edu.hk/~b400559/ - This is a Web site developed by Y.F. Leung, a scientist in Hong Kong. There are many useful links related to the technology and articles using the technology.

http://biosun01.biostat.jhsph.edu/~gparmigi/688/readings.html - This Web page is an outline for a biostatistics course at Johns Hopkins. There is a reading list for contemporary data analysis methods with many links to pdf files of articles.

http://www.gene-chips.com/ - This is a Web page developed by Leming Shi. This site has hundreds of links to product manufacturers, publicly available array data, articles, etc.