Hereditary profiles of disorderly transcription?
Biology Direct volume 1, Article number: 9 (2006)
Microscopic examination of living cells often reveals that cells from some cell strains appear to be in a permanent state of disarray without obvious reason. In all probability such a disorderly state affects cell functioning.
The aim of this study was to establish whether a disorderly state could occur that adversely affects gene expression profiles and whether such a state might have biomedical consequences. To this end, the expression profiles of the 14 genes of the proteasome derived from the GEO SAGE database were utilized as a model system.
By adopting the overall expression profile as the standard for normal expression, deviation in transcription was frequently observed. Each deviating tissue exhibited its own characteristic profile of over-expressed and under-expressed genes. Moreover such a specific deviating profile appeared to be epigenetic in origin and could be stably transmitted to a clonal derivative e.g. from a precancerous normal tissue to its tumor. A significantly greater degree of deviation was observed in the expression profiles from the tumor tissues.
The changes in the expression of different genes display a network of interdependencies. Therefore our hypothesis is that deviating profiles reflect disorder in the localization of genes within the nucleus.
The underlying cause(s) for these disorderly states remain obscure; it could be noise and/or deterministic chaos. Presence of mutational damage does not appear to be predominantly involved.
As disturbances in expression profiles frequently occur and have biomedical consequences, their determination could prove of value in several fields of biomedical research.
This article was reviewed by Trey Ideker, Itai Yanai and Stephan Beck.
Open peer review
Reviewed by Trey Ideker, Itai Yanai and Stephan Beck. For the full reviews, please go to the Reviewers' comments section.
Within a living cell there will always be "spontaneous" variation in functioning. The origin of this variation could be the presence of mutational damage, random fluctuations (also known as noise) or deterministic chaos. Noise has been shown to affect cell functioning and consequently it can be assumed that some degree of disorder will always be present in a cell. In theory, since extensive disorder could affect the health of the cell and thus ultimately the health of the individual, it would seem prudent to investigate whether a phenomenon like cellular disorder can be demonstrated and analyzed. Databases on gene transcription are now available that facilitate such an investigation.
In this study the word "disorder" is used as an inclusive term to describe "excessive variation in transcription irrespective of cause", thus random variation, deterministic chaos or presence of mutational damage could all be causally involved.
The questions we want to address are: 1) can excessive variation in transcription be demonstrated, 2) does such excessive variation have a degree of permanence and 3) does it play any role in health and disease.
That disorder in gene expression does occur and could be of relevance for understanding carcinogenesis is suggested by the following observations. Firstly, exposure of cells to carcinogens may lead to a state of "delayed, persistent genomic instability" that can affect each aspect of cell structure and function and predisposes the cell to immortalization. Such genomic instability can also be transmitted to neighboring cells via the medium (bystander effect) [3, 4] and thus does not depend on the presence of mutational damage within the cell. Persistent disorder in transcription might be implied from this unstable state.
Secondly, it is generally accepted that although mutations in oncogenes and/or tumor suppressor genes predispose cells to carcinogenesis, epigenetic and non-genetic mechanisms also play a role. These processes ultimately lead to the cell acquiring a transformed phenotype with concomitant alterations in the life span or eventual immortalization. Although telomerase is known to be expressed during this process of cell transformation, the way in which this gene becomes activated is still unknown [7, 8]. Since the process of transformation appears to involve progressive deregulation of cell functioning, increasing disorder in transcription profiles may play a role.
Thirdly, since predisposition to tumorigenesis can be related to only a small change in the expression level of a single gene, even minor fluctuations in gene expression could have major consequences. Moreover it has recently been shown that noise in gene expression is biologically relevant as it is detrimental to organismal fitness.
Thus perturbations in the transcriptome could be present in cells that are in the process of malignant transformation and such disorder in the transcriptome might, in itself, be a driving factor in carcinogenesis.
The availability of the Cancer Genome Anatomy Project's (CGAP) SAGE (Serial Analysis of Gene Expression) database of human gene expression levels in a wide variety of cells has enabled us to check our hypothesis and to establish that excessive variation in transcription can be constitutive and hereditary. These findings could prove to be of importance in various fields of biomedical research.
Results and discussion
The proteasome as a model system to investigate disorder in transcription
Although the ultimate aim of this investigation is to establish the existence of a state of cellular disorder that affects all transcription profiles, the first step is to choose a single expression profile that can serve as a model for all profiles in a complex system, even though one expression profile evidently cannot represent the whole human transcriptome. The transcription profile of the genes that code for a cellular organelle, the 20S proteasome, was chosen to fulfill this role. Since a cellular organelle has a well-defined structure, a prerequisite for its assembly would be that the products of the genes involved be available in, ideally, the correct amounts. Therefore we assume that an optimal expression pattern for the transcription of the genes in question exists although, of course, it might turn out that the expression pattern can be influenced by factors like tissue type or response to stimuli. Sampling errors will affect the number of transcripts of the proteasomal genes found in a library. However, if the degree of variation turns out to be greater than that expected due to sampling, then this could be indicative of the existence of transcriptional disorder.
The 20S proteasome, a structure 15 nm in length with a diameter of 11–12 nm, is organized as four stacked rings with a central channel. This architecture is highly conserved from bacteria to man. Each ring consists of 7 different subunits, each located at a defined position. Alpha-type subunits and beta-type subunits form the two outer and two inner rings respectively. The 14 genes coding for these subunits are all regulated independently of each other and are located on different chromosomes. Thus 14 gene products are needed in equal amounts to build the proteasome. For our calculations we assume the existence of a preferential expression profile for these 14 genes and deviation from this preferential expression profile could indicate "disorder in transcription".
The tags (listed in Table 1) used to establish the degrees of expression of the proteasome genes were derived from the NCBI website. Unique cDNA tags are available for 6 of the 14 genes, whilst the remaining tags also detect expression of genes other than the proteasome genes. Therefore, additional tags specific for the expression of these non-proteasome genes were utilized to check their expression. However, in the available libraries, the expression of these non-proteasome genes appeared so rare that the counts obtained with the proteasome tags of Table 1 are reliable.
In order to obtain data suitable for statistical analysis, only those libraries that have a total tag count of at least 24 for the expression of all the proteasome genes together were used. Additionally, only libraries derived from biopsies were used in order to circumvent any possible effects associated with tissue culture conditions. At the beginning of this study 60 libraries were available that met these criteria; 30 of these were derived from normal tissues and 30 from cancer tissues. A possible disadvantage of using these datasets is that they are derived from very different tissues. However, this was unavoidable due to the limited number of available libraries. Nevertheless, no significant difference between normal and tumor tissues existed for the total tag count per library (Wilcoxon: P = 0.274) or for the counted number of proteasome tags per library (Wilcoxon: P = 0.504). Therefore the groups are homogeneous with respect to these characteristics.
Excessive variation in transcription of proteasomal genes in libraries derived from normal and tumor tissues
Expected frequencies of tags for the 14 genes of a library were obtained with the overall expression profile. This expression profile was derived from the sums of the tag counts in all 60 libraries (Table 2 A). The observed tag counts are shown in Table 3. The expected and observed tag counts were compared by chi-square. Of the 30 normal tissues, 13 deviated (P < 0.05) from the expected values, compared to 23 of the 30 tumor tissues. As the prerequisites for the application of the chi-square test were not met, the probabilities of the chi-squares of the two groups were compared using Wilcoxon's test. The two groups proved to be significantly different from each other (P = 1.73 × 10⁻⁶), with tumor tissues being more disorderly. In conclusion it is apparent that excessive variation in transcription does occur and that, as a group, tumor tissues show a significantly greater degree of variation in transcription than do normal tissues. However, the data also indicate that some normal tissues show excessive variation whilst some of the tumor tissues might still have a normal expression profile.
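The comparison described above can be sketched as follows. This is a minimal illustration, assuming that the expected counts of a library are obtained by scaling the overall profile (the per-gene sums over all libraries) to that library's own proteasome tag total. The function names and the three-gene toy counts are hypothetical and are not data from Table 2 or Table 3.

```python
def overall_profile(libraries):
    """Sum the tag counts per gene over all libraries and return fractions."""
    n_genes = len(libraries[0])
    sums = [sum(lib[g] for lib in libraries) for g in range(n_genes)]
    total = sum(sums)
    return [s / total for s in sums]


def expected_counts(profile, library):
    """Scale the overall profile to the tag total of one library."""
    total = sum(library)
    return [fraction * total for fraction in profile]


def chi_square(observed, expected):
    """Pearson chi-square statistic; degrees of freedom = number of genes - 1."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical toy data: 3 libraries x 3 genes (the study uses 14 genes).
toy_libraries = [[10, 10, 10], [20, 5, 5], [6, 12, 12]]
profile = overall_profile(toy_libraries)
stats = [chi_square(lib, expected_counts(profile, lib)) for lib in toy_libraries]
```

With 14 genes the statistic would have 13 degrees of freedom. Because many expected counts fall below 5, the resulting probabilities are best treated as a ranking score for a rank-based comparison of the two groups, which is how the text uses them.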
At the start of this study the GEO database (GPL4) consisted of 154 libraries, of which 60 met our criteria (derived from tissue biopsies and at least 24 tags). At present 254 libraries are available, of which an additional 80 meet our criteria. These extra libraries have therefore been used to check whether our previous findings are reproducible. However, since only 12 of the additional 80 libraries were derived from normal tissues, a new calculation was performed using all 140 libraries (the 60 original and 80 new libraries), giving totals of 42 derived from normal tissues and 98 derived from tumor tissues. The observed tag counts are shown in Table 3. Expected frequencies for the tag counts of the 14 genes of a library were obtained with the overall expression profile derived from the sums of the tag counts in all 140 libraries (Table 2 C). The expected and observed tag counts were again compared by chi-square. A comparison of the chi-squares of the normal and tumor tissues by Wilcoxon's test revealed a considerably more significant difference between the two groups than previously observed (P = 1.65 × 10⁻⁸ instead of P = 1.73 × 10⁻⁶). This shows that the 80 new libraries display a similar difference between normal and tumor tissues to that observed with the 60 libraries used previously.
Of these 140 libraries, 30 were derived from breast tissue, 11 of these originating from normal breast. These normal and tumor breast tissues were compared separately to deal with the potential disadvantage posed by tissue heterogeneity. Expected frequencies for the tag counts were again obtained with the overall expression profile derived from the sums of the tag counts in these 30 libraries (Table 2 D). The expected and observed tag counts were again compared by chi-square. Comparison of the chi-squares of the normal and tumor breast tissues by Wilcoxon's test (P = 0.0033) revealed a similar difference between normal and tumor to that found for all tissues.
Variation in transcription specific or unspecific?
Comparison of the expression profile of the 37 most orderly libraries with that of the 37 most disorderly libraries shows that these profiles are rather similar (Table 2E, F). This indicates that excessive variation in transcription is multi-directional and does not lead to a specific and systematic change in the expression profiles of all tissues. However for individual tissues this still could be the case.
Heritability of variant profiles
That the observed variation in transcription is not just due to momentary fluctuations in transcription rates follows from an analysis of pairs of libraries present in the database; i.e. pairs from both tumor and normal tissue or from both tumor and metastatic tissue, each pair being derived from the same individual. Of the six available library pairs, four have enough tags for a detailed analysis of deviation in the expression of each individual proteasome gene. By taking the log of the ratio of "observed tags"/"expected tags" for each gene, a profile is obtained that shows the degree of aberration in expression for each proteasome gene. The data presented in Figure 1A reveal a high correlation between the abnormal expression pattern of a normal prostate and its tumor (R = 0.77 and P = 0.00135). This indicates that a deviating expression profile can be extremely stable and can be transmitted to a clonal derivative as a constitutive trait.
Essentially the same picture is seen in Figure 1B, where a high correlation exists between the deviant profile of another prostate tumor and the normal prostate tissue from which the tumor was derived (R = 0.73 and P = 0.00329). The expression profiles of the two prostates clearly differ from each other, which suggests that a specific expression profile for the prostate does not exist and that these two profiles deviate from the mean expression profile in dissimilar ways.
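A minimal sketch of this profile comparison is given below. It assumes base-10 logarithms (inferred from the statement elsewhere in the text that a log obs/exp of 0.5 corresponds to a roughly 3.2-fold change) and applies the Methods rule that an observed count of 0 is treated as 1 tag. The function names and the four-gene toy counts are hypothetical, not values from Figure 1.

```python
import math


def log_ratio_profile(observed, expected):
    """log10(observed/expected) per gene; a zero count is treated as 1 tag."""
    return [math.log10(max(o, 1) / e) for o, e in zip(observed, expected)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data for 4 genes (the study uses 14).
expected = [4, 4, 4, 4]
normal_tissue = [8, 2, 4, 1]
tumor_tissue = [16, 1, 4, 0]

r = pearson_r(log_ratio_profile(normal_tissue, expected),
              log_ratio_profile(tumor_tissue, expected))
```

A high positive value of `r` for a tissue and its clonal derivative would correspond to the kind of profile similarity reported for the library pairs.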
The similarity in deviating expression profiles of a normal tissue and its tumor suggests the possibility of a disorderly condition in the normal tissue being a predisposing factor in the eventual appearance of the tumor.
The same persistence of a deviating expression profile is observed for the correlation of a breast tumor and its metastasis as shown in Figure 1C (R = 0.84 and P = 0.00016).
Although heritability of deviating profiles is indicated for these three instances, a direct proof for disorder as cause for these deviations is absent.
Both progression and regression in degree of variation occur
Inspection of the three profiles of the clonal derivatives that correlated significantly with the three profiles of the tissues of origin (Figure 1A, B and 1C) indicates that 11 genes progressed to further deviation and 3 to less variation (only those genes were counted that had a log obs/exp larger than 0.5 or smaller than -0.5, which represents a 3.2-fold over-expression or under-expression respectively). As this difference is significant (P = 0.032), it suggests that the deviation in expression is progressing in clonal derivatives. The figure further indicates that progression of a deviating expression profile generally occurs in small steps, but that larger jumps may also occur (PSMA4 in Figure 1A).
That progression does not always occur is suggested by Figure 1D in which no significant correlation is observed between a tumor and its metastasis (R = 0.04 and P = 0.887) and in which a decrease in deviation is apparent in the metastasis. This figure therefore indicates that the deviating profile does not always persist in a clonal derivative. The metastasis actually exhibits an expression profile that is almost indistinguishable from normal, thereby suggesting that the changes in expression were non-genetic in origin.
Apparently, spontaneous epigenetic modifications that interfere with normal gene expression patterns can occur. At present, the cause of these modifications remains obscure. Changes in transcription factors, DNA methylation patterns, unusual DNA structures, alterations in nuclear organization, interference by noncoding RNAs or changes in the macromolecular transcriptional apparatus might be involved. Possibly all factors known to influence transcription rates could be involved. Among these factors alteration in nuclear organization is very attractive as it easily explains the simultaneous changes in expression in a number of genes and as numerous nuclear constituents could be involved. If so these findings would be of importance to the rapidly developing field of "spatial nuclear organization as a structural component in gene expression" [19, 20].
Under-expression and over-expression in deviating libraries
In the first set of 60 libraries, 15 variant libraries were identified that deviated strongly (P < 0.01) from the overall profile. The changes in the degree of transcription of the individual genes in these libraries were expressed as the log of the ratio of observed and expected tags (Table 4 A). In the second set of 80 libraries, 22 variant libraries were identified (Table 4 B).
If one considers the data in Table 4 as a whole, 331 under-expressions are observed against 187 over-expressions. Thus under-expression predominates (P = 2.5 × 10⁻¹⁰) in deviating libraries. There is no evidence that differences exist between the genes in their frequencies of under- and over-expression, since the chi-square for heterogeneity is not significant (P = 0.539). However, differences between the genes in the degree of under- and over-expression do exist. The 14 standard deviations of the log obs/exp values, calculated for each individual gene from the 15 most abnormal libraries as shown in Table 4 A, were found to be heterogeneous when compared by the test of Bartlett (P = 0.002). The standard deviations of the 22 most abnormal libraries in Table 4 B correlate with those of Table 4 A (P = 0.022), thus showing a similar pattern of differences between genes. For the whole set of 37 deviating libraries the 3 most variable genes are PSMA7, PSMB4 and PSMA1, while the 3 least variable genes are PSMA2, PSMB6 and PSMA3.
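The text does not name the test behind the predominance of under-expression. As one plausible reading, the sketch below assumes a chi-square test of the two counts against an equal split (1 degree of freedom), which gives a probability of the same order as the one reported. The function name is hypothetical.

```python
import math


def equal_split_test(n_first, n_second):
    """Chi-square test (1 df) of two counts against a 1:1 expectation."""
    expected = (n_first + n_second) / 2
    chi2 = ((n_first - expected) ** 2 + (n_second - expected) ** 2) / expected
    # For 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts reported in the text: 331 under-expressions, 187 over-expressions.
chi2, p_value = equal_split_test(331, 187)
```

Under the same assumption, the split of 11 progressing against 3 regressing genes reported for the clonal derivatives gives a probability of about 0.03, in line with the value quoted there.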
Epigenetic origin of excessive variation in transcription
At first sight it does not seem surprising that cancer tissues show greater variation in transcript abundances, as tumor cells are usually aneuploid. However excessive variation in expression profiles is observed in normal tissues as well. This holds not only for the two normal prostates mentioned previously but also for other normal tissues e.g. cortex (GSM 786), normal retina (574), normal breast (760), normal colon (728), normal brain (676) and normal lung (762). Moreover tumors with a normal expression profile are not rare although tumors, as a rule, are known to be aneuploid and carry mutational damage in oncogenes. In addition the similarity in deviating expression profiles of pairs of normal and tumor prostate tissue (supposedly the first being diploid and the second aneuploid) also indicates a non-genetic origin of the deviating profiles. Loss of deviation in a metastasis is, similarly, also suggestive of a non-genetic origin. Therefore it would appear that aneuploidy, as such, or mutated oncogenes do not play predominant roles in the emergence of excessive variation in transcription and thus that there is an epigenetic origin.
An index for the degree of deviation in transcription
The degree of deviation can be quantified in a deviation index by taking the standard deviation of the log obs/exp values from the 14 genes in a library (log ratio deviation index) or the standard deviation of the z-scores (z-score deviation index). The log ratio deviation index will be suitable to reflect fold-changes in expression, while the z-score deviation index will be more suitable to reflect percentage changes in expression.
When the log ratio deviation index and the z-score deviation index were calculated for all 140 libraries (shown in Table 3) and normal and tumor libraries were compared with the test of Wilcoxon, tumor libraries were again found to be more deviating than normal libraries. For the log ratio deviation index the significance is considerable (P = 3.6 × 10⁻⁶) while for the z-score deviation index the significance is much less (P = 0.0177), indicating that the changes in expression reflect fold-changes rather than percentage changes.
This deviation index provides a means by which to test whether differences in degree of deviation exist between tumors derived from different tissues. To this end a table was prepared with the log ratio deviation index of 5 tumors (astrocytoma, breast cancer, ependymoma, gastric cancer and medulloblastoma), each tumor represented by 9 libraries (Table 5). When compared with ANOVA (Analysis Of VAriance), the degree of deviation was not influenced by tumor type (P = 0.387). Therefore tumors do not appear to differ systematically in degree of deviation.
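The ANOVA step can be illustrated with a one-way F statistic. The sketch below is a generic textbook computation, not necessarily the exact procedure used for Table 5; the function name and the toy values are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical deviation indices for 3 groups of 3 libraries.
f_stat, df_between, df_within = one_way_anova_f(
    [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
)
```

For 5 tumor types with 9 libraries each, the degrees of freedom would be 4 and 40. Converting F to a probability requires the F distribution, for example from a statistics package.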
Deviation of individual genes is often not independent
Correlations between the values of log obs/exp in the 140 libraries were calculated for each pair of genes to determine whether the deviations of the individual genes are independent of each other. Of the 91 correlations, 18 significant correlations (P < 0.01) were observed. These significant correlations are shown in Figure 2. This figure shows a rather simple network of links in deviation of individual genes. Both positive and negative correlations were seen. The existence of these interactions suggests that there will be patterns in the emergence of deviant profiles.
The 3 most unstable genes are involved in significant correlations on 17 occasions as compared to 5 occasions for the 3 most stable genes. This indicates that instability in expression and involvement in correlation are somewhat related (P = 0.0105). The degree of expression of the individual genes (see Table 2) does not seem to be involved in this pattern of interactions. Changes in the spatial architecture of the nucleus could well be responsible for the observed dependence, as any change in architecture will affect many genes.
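The figure of 91 correlations is the number of unordered pairs among 14 genes. A minimal sketch of the pairwise calculation follows; the gene labels and values are hypothetical placeholders, and the significance filtering (P < 0.01) is left out.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(log_ratios_by_gene):
    """Map each unordered gene pair to the correlation of their log obs/exp values."""
    genes = sorted(log_ratios_by_gene)
    return {(a, b): pearson_r(log_ratios_by_gene[a], log_ratios_by_gene[b])
            for a, b in combinations(genes, 2)}


n_pairs_for_14_genes = math.comb(14, 2)  # 14 * 13 / 2

# Hypothetical log obs/exp values, one entry per library.
toy = {
    "geneA": [0.1, 0.2, 0.3, 0.4],
    "geneB": [0.2, 0.4, 0.6, 0.8],
    "geneC": [0.4, 0.3, 0.2, 0.1],
}
correlations = pairwise_correlations(toy)
```

Since 91 tests are carried out at once, about one correlation would be expected to pass P < 0.01 by chance alone, which puts the 18 observed in context.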
Possible significance of the observed variation in transcription
As a whole, tumor tissues demonstrate a much larger variation in transcription profiles than normal tissues. This suggests, therefore, that the observed excessive variation in expression of proteasome genes is due to disorder and is not a consequence of an orderly regulation. However one cannot exclude alternative hypotheses. If the variation is due to disorder, it is at present still too early to characterize the sources of the disorder observed. Although mutational damage could, in theory, be involved, this is not supported by the data as discussed above. Consequently, noise and/or chaotic processes might be causally involved but it is still too early to make any decisive statement in relation to this.
The observed deviation appears to be due to stable structural epigenetic changes. If our assumption is correct, namely that all tissues initially have an approximately similar expression profile (which is supported by the grosso modo similar expression profile of orderly and disorderly libraries, see Table 2 E and F), then the findings suggest that this expression profile can be altered in a progressive and unpredictable way resulting in widely different expression profiles. In addition, the degree of progression of this deviation seems to have some unpredictability, as it was observed to exert a small effect on a number of genes simultaneously or a major effect on just one gene in particular (Figure 1). Therefore a deterministic epigenetic process could well be the cause of the observed deviations. Only time will tell.
Although the deviations in transcription, as described in this paper, relate only to the transcription of the proteasomal genes, it is not illogical to suppose that similar deviations will exist in transcription profiles of genes involved in other functions of the cell and that the corresponding deviation indices could provide information on the degree of order in transcription.
Our present working hypothesis for the structural aspect of the observed variation is that the patterns of over- and under-expression are a reflection of the localization of the genes within the nucleus. If this hypothesis were correct, then one would expect that future research would reveal correlations between genes that are completely unrelated in function in terms of their degree of over- or under-expression. Consequently, the next desirable step will be to investigate whether similar deviating profiles can be found in other organelles and/or pathways and whether a deviation in one profile corresponds with deviation in another profile.
This phenomenon of deviation in transcription could provide a new method to study genetic dysfunction. Apart from the field of gene expression, research into this phenomenon could turn out to be of value in other fields:
1. Genomic instability
Any decrease in the ability of a cell to carry out normal cellular functions could lead to a less efficient DNA replication and to increased production of free radicals, resulting in a greater degree of spontaneous DNA damage. As DNA repair could also be less efficient a higher spontaneous and induced mutation rate might result. Therefore this new phenomenon of variation might well underlie the hitherto unexplained phenomenon of "persistent delayed genomic instability" and might also provide an explanation for the trans-generational effects of parental irradiation.
2. Carcinogenesis
Mutations in oncogenes predispose a cell to developing a transformed phenotype. The switching on of the telomerase gene and other genes involved in immortalization is another prerequisite step in the process of carcinogenesis. In fact the acquisition of an immortal phenotype is the rate-limiting step in carcinogenesis [6, 22]. The course of these epigenetic events, ultimately resulting in malignant transformation, is still not understood. Progressive disorder in transcription in pre-cancerous lesions could be involved in the rare switching on of genes involved in immortalization.
3. Cellular aging
The current database does not contain enough information to determine whether disorder plays any role in cellular aging. However as progressive epigenetic changes could well be at the core of cellular aging it is not inconceivable that the aging process will be reflected in the degree of deviation in expression profiles. Whether this will be seen as an increase or a decrease in variation is a fascinating question. As cellular aging is still largely a black box, investigation of the role of disorder during aging might contribute to an improved understanding of this process.
4. Cell dynamics
The cell is a complex system. It is a mystery as to how all of the cellular subsystems of the cell interact and function collectively as a complex whole. In complex systems "spontaneous" processes like pattern formation, oscillation, bifurcation and chaotization occur. These processes might depend on very simple rules. The observed deviations in transcription might reflect chaotic processes. So far however there is no direct evidence that the observed deviations have anything to do with chaos; only time will tell. Nevertheless, whether chaotic in nature or not, the study of variation in the regulation of gene expression might contribute to a better insight into the cell as a complex system, especially if it reflects changes in the spatial organization of the nucleus.
5. Practical implications
One direct consequence arising from this study is that determination of the degree of deviation can serve as a control for the quality of libraries, which are to be used for the identification of genes involved in cellular processes. Deviating libraries will be less suitable for the identification of the genes involved.
A deviation index might further prove to be of prognostic value in predicting the probability of progression of neoplastic and possibly of pre-neoplastic lesions and likewise could be used as an indicator of health.
In addition, since deviation in gene expression can either increase or decrease, it would be useful to determine the effects of medication, promoters and anti-promoters on the degree of deviation.
In many respects, the data presented in this paper should be considered as very provisional. Ideally instead of only the 14 genes studied here, one would like to see comparative data for a few hundred genes as well as the use of greater numbers of large libraries from both healthy and unhealthy donors. Although such information is not yet available, this could soon be the case. Nevertheless the data obtained so far does indicate that the study of variation in transcription in the cell could provide new clues in biology and biomedicine.
Methods
The tags used to determine the degree of expression of the proteasome genes in SAGE libraries were derived from the GEO database. The proteasome tags in SAGE libraries were found by importing both the tags of Table 1 and the library tag count file into Microsoft Access. A query that joins both tag fields results in a table showing the abundance of tags for each proteasome gene.
The SAGE libraries were obtained from the "Gene Expression Omnibus" (GEO).
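The join described above was done in Microsoft Access; the same idea can be sketched in a few lines of code. The example below is an illustrative equivalent, not the original query. The tag sequences and gene labels are hypothetical placeholders and are not the tags of Table 1. Unlike a plain inner join, this sketch reports a count of 0 for a gene whose tag is absent from the library.

```python
def proteasome_tag_counts(gene_by_tag, library_counts):
    """Join a tag-to-gene table with a library's tag counts on the tag field."""
    counts = {gene: 0 for gene in gene_by_tag.values()}
    for tag, gene in gene_by_tag.items():
        # Tags missing from the library contribute a count of 0.
        counts[gene] += library_counts.get(tag, 0)
    return counts


# Hypothetical placeholder tags and genes.
gene_by_tag = {"AAAAAAAAAA": "geneA", "CCCCCCCCCC": "geneB", "GGGGGGGGGG": "geneC"}
library_counts = {"AAAAAAAAAA": 7, "GGGGGGGGGG": 2, "TTTTTTTTTT": 40}

counts = proteasome_tag_counts(gene_by_tag, library_counts)
total_proteasome_tags = sum(counts.values())
```

The total can then be checked against the inclusion criterion of at least 24 proteasome tags per library.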
The 30 libraries from normal tissues are: GSM number 572, 573, 574, 676, 677, 685, 688, 691, 692, 695, 708, 713, 719, 728, 729, 738, 739, 760, 761, 762, 763, 780, 781, 785, 786, 819, 824, 1499, 2386, 3242. The 30 libraries from cancer tissues are: 670, 671, 672, 673, 686, 687, 689, 690, 693, 696, 697, 698, 699, 727, 731, 735, 736, 737, 740, 745, 755, 756, 765, 792, 793, 1497, 1516, 2443, 2451, 2578 (Table 3). Only one library (GSM 709, leukocytes) was excluded since libraries derived from blood and blood-forming tissues might express the immunoproteasome that might then interfere with expression of the 20S proteasome. The 80 additional libraries are: 743, 744, 757, 758, 1498, 1730, 1731, 1732, 1733, 1734, 1735, 2382, 2383, 2384, 2385, 2389, 2408, 7498, 7800, 8505, 8867, 9103, 9104, 14731, 14732, 14733, 14734, 14737, 14739, 14740, 14741, 14742, 14743, 14745, 14746, 14747, 14748, 14749, 14750, 14753, 14754, 14756, 14757, 14760, 14761, 14762, 14763, 14765, 14766, 14767, 14768, 14769, 14771, 14772, 14773, 14774, 14775, 14776, 14779, 14780, 14781, 14782, 14783, 14786, 14787, 14788, 14790, 14791, 14792, 14793, 14794, 14795, 14796, 14797, 14798, 14799, 14800, 14801, 14806 and 14807.
The observed frequencies of the expression profiles were compared with their expected frequencies by chi-square. As the lower limit of the proteasomal tag count per library was 24 and the number of genes was 14, the expected number of tags was often less than 5 which is the lower limit of reliability for the application of chi-square. Therefore the outcome of the chi-square test was only used in a parameter free test (Wilcoxon) to compare the group of normal tissues with the group of cancer tissues.
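The text says only "Wilcoxon"; for two independent groups this is taken here to mean the rank-sum test. The sketch below is a minimal version under that assumption, using the normal approximation without tie or continuity correction. The function name and toy values are hypothetical.

```python
import math


def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum test (two-sided, normal approximation)."""
    pooled = sorted((value, label)
                    for label, group in enumerate((group_a, group_b))
                    for value in group)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_a, n_b = len(group_a), len(group_b)
    w = sum(rank for rank, (_, label) in zip(ranks, pooled) if label == 0)
    mean_w = n_a * (n_a + n_b + 1) / 2
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w, z, p_value


# Hypothetical chi-square probabilities for two small groups of libraries.
w, z, p_value = rank_sum_test([0.01, 0.02, 0.03], [0.40, 0.50, 0.60])
```

Because only the ranks of the chi-square probabilities enter the comparison, the unreliability of the individual chi-square values at low expected counts matters less than it would if the probabilities were interpreted directly.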
For the calculation of the deviation index the standard deviation of the log of observed tags/expected tags was used. In those cases where the observed number of tags was 0, it was assumed that there was 1 tag. This index has the disadvantage that it is only symmetric if the changes in expression occur as fold-changes. If the variation in expression occurs in percentages this index is not symmetric. Therefore a second index was calculated based on z-scores (z-score = (obs - exp)/√exp). Both deviation indices, noted as log ratio deviation index and z-score deviation index, have been applied.
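Both indices can be sketched directly from these definitions. The example assumes base-10 logarithms and the sample standard deviation (n - 1 denominator); neither choice is stated explicitly in the text. The function names and toy counts are hypothetical.

```python
import math
import statistics


def log_ratio_deviation_index(observed, expected):
    """Standard deviation of log10(obs/exp); a zero count is treated as 1 tag."""
    logs = [math.log10(max(o, 1) / e) for o, e in zip(observed, expected)]
    return statistics.stdev(logs)


def z_score_deviation_index(observed, expected):
    """Standard deviation of the z-scores (obs - exp) / sqrt(exp)."""
    z_scores = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]
    return statistics.stdev(z_scores)


# Hypothetical toy library: two genes doubled, two genes halved.
expected = [4, 4, 4, 4]
observed = [8, 2, 8, 2]

log_index = log_ratio_deviation_index(observed, expected)
z_index = z_score_deviation_index(observed, expected)
```

In this toy case the log ratios are symmetric around zero (plus and minus log10 of 2), while the z-scores are +2 and -1. This illustrates why the log ratio index suits fold-changes and the z-score index suits changes of equal absolute size.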
The deviation index of 5 types of tumors was compared with ANOVA. In order to have 9 tumors per group some libraries had to be omitted from the analysis. This was achieved by leaving out the libraries with the lowest number of counts.
Reviewer's report 1
Trey Ideker, University of California San Diego, La Jolla, California, United States
This manuscript by JWIM Simons examines the mRNA levels of proteasomal proteins across publicly-available SAGE data from both cancer and normal tissues. It reports that proteasomal mRNAs show more variance away from their average levels when looking in cancer tissues versus in normal tissues. It also presents a corollary finding, that normal and cancer tissues taken from the same patient tend to have proteasomal expression levels that are very similar. Finally, the manuscript makes speculative remarks about the possible interpretation and impact of these findings. In this regard, the main claim is that high variance in proteasomal RNA levels is indicative that the cell is in a "disorderly state" and that this disorderly state is likely a cause, not an effect, of cancer.
The basic finding, that proteasome mRNA levels as measured by SAGE have higher variance from the mean when looking in cancer cells, is interesting and, as far as this reviewer can tell, arrived at through reasonable use of statistical methods. The corollary finding, that mRNA levels from the same patient are correlated in cancer versus normal cells, is also interesting and is nicely controlled by comparing the correlation within versus between patients.
On the other hand, framing these findings within an argument that cellular transcription can be "ordered" or "disordered" is speculative at best and much less compelling. In order to support the "disorder" argument (which is not a small suggestion in the discussion, but also appears in the manuscript title, abstract, introduction, and results) a larger body of evidence would need to be examined and presented. For instance, perhaps a more likely null hypothesis is simply that cancer cells are proliferating and thus have more protein turnover. And there are other equally plausible ideas that do not relate to a global order vs. disorder phenomenon. Without examining such alternate hypotheses and addressing them, the article reads much more like a "commentary" or "opinion" article than a primary research paper.
Author's response: So far there is indeed only an indication, and not yet proof, that cellular transcription can be disordered. At present there is also no overview of all possible alternative hypotheses, let alone an examination of them. To make clear that disorderly transcription is only one possible explanation, a sentence has been added stating that alternative explanations are possible.
There is also a semantic problem with the use of the term "disorder". Order versus disorder has a concrete (and very different) meaning in the field of information theory, which attempts to measure it through quantities such as entropy. In fact, entropy might have been a much more natural metric to use for the proposed disorder index.
Author's response: To separate findings from speculation more clearly, the term disorder has been replaced by a more descriptive term (e.g. excessive deviation) wherever findings are concerned. This, I hope, dissolves the "semantic problem" and improves the distinction between fact and speculative interpretation. The changes have been made throughout the text. Your remark that "entropy might have been a much more natural metric to use for the proposed disorder index" has greatly raised my interest, but I have no idea how such a thing could be accomplished.
A key assumption of the paper is that for organelles such as the proteasome, "the products of the genes involved [are] available in the correct amounts" and "therefore we can assume an optimal expression pattern of the transcription of the genes in question exists" (Results and discussion section, first paragraph). This need not be the case. For instance, for the ribosome it has been shown that, while RNA levels can fluctuate in response to stimuli, the overall ribosomal protein levels are buffered from change. Depending on the function/component, such buffering can be due to differential RNA degradation, protein translation, and so on.
Author's response: There are probably many mechanisms within the cell that can buffer undesirable fluctuations. This does not undermine the assumption that optimal functioning depends on optimal conditions, and to some extent this should also hold for an expression profile.
Moreover, given the assumption that correct amounts of each proteasomal subunit are needed, then why are the average or "expected" amounts of each subunit so different from one another in the SAGE data? Since the main result is measuring deviations from these expectations in individual patients, this point is particularly important. I would be interested to see how the results are impacted if the expected amounts are equal to each other across all subunits.
Author's response: You rightly point to the remarkable differences in tag counts between genes in the mean expression profile, even though all subunits are equally important for building the proteasome and similar frequencies would therefore be expected. There is not yet a definite answer to this. It is notable that when another tag is used (for some genes an additional tag is available), the absolute frequency can be consistently much lower. Possible explanations are the length of the cDNA or the efficiency of the cutting enzyme, and perhaps also splicing or RNA degradation. To examine this point I looked at microarray data and found that the mean profile is very different from the SAGE profile; the two profiles did not even correlate. Apparently the technique used to obtain expression data affects the outcome, and it is not clear to me whether this heterogeneity is biological. Taking the mean of the microarray and SAGE expression profiles produces a profile that is much closer to the "equal amounts" concept. This, too, is an issue for the future.
In conclusion, my recommendation for this article would be to (1) remove many of the speculative remarks, or at least leave them for the discussion, including any interpretation of the results as "disorderly"; and to also (2) perform and present a more comprehensive body of findings which support the points in the discussion section that remain. For this second point, at minimum it would be nice to see a survey of all organelles/functions in the cell and whether they show greater variance in cancer than normal. Otherwise there is no evidence that the specific anecdote of the proteasome can be abstracted to some general principle of the cell.
Author's response: These recommendations have been met by the clearer distinction between findings and possible interpretations described above. We certainly agree that the findings with the proteasome cannot yet be abstracted to a general principle. As for your recommendation to make a similar analysis for all organelles/functions in the cell, this would obviously be our wish, and it was stated as such in the manuscript. However, it is physically impossible: my lifespan would not be long enough. The only thing I can do is point to this new phenomenon and contribute further to its interpretation, in the hope that other scientists will also study it.
Reviewer's report 2
Itai Yanai, Harvard University, Cambridge, Massachusetts, United States
In this paper, Simons uses public SAGE data to quantify changes in gene expression of the set of 14 genes that compose the proteasome. First, the overall relative frequencies of these genes are calculated. A SAGE library is then described as disorderly if the standard deviation of the genes' observed to overall differences is high. It is noteworthy that the frequencies of the most disorderly libraries are remarkably similar to those of the most orderly libraries, suggesting there is no characteristic state of disorder but instead that each mess is unique.
Simons then shows that the overall tumor libraries are significantly more disorderly, as evidenced by Wilcoxon's test, as a group than the normal libraries. Since this is an important point, I believe it would be helpful to visualize this difference with a principal components type analysis. The 14 dimensions (genes) can be reduced to 2 or 3 and plotted for both the normal and tumor samples. This method is further called for since the author shows in Figure 2 that the genes' expression levels are correlated.
Author's response: No doubt this could be worthwhile. However, the analysis is unfamiliar to me (I had never even heard of the method). In a first trial with PCA, the first three factors accounted for only 41.46% of the variability, and thus dominant principal components do not appear to be present. In the future, when I am more familiar with this method, I will certainly try to perform a PCA.
An issue is next raised about whether disorderly profiles represent "momentary fluctuations in transcription rates" or heritable states. The author shows that deviations of the proteasome genes' expression in a tumor correlates with those in non-tumor from the same patient. Based upon this evidence, the author states that "a disorderly expression profile can be extremely stable and can be transmitted to a clonal derivative as a constitutive trait." However, since this result was observed in three of only four instances, it would be prudent not to draw too strong of a conclusion about the stability of gene disorder based upon this dataset alone. Furthermore, one could argue that difficulties associated with exclusively dissecting tumor vs. non-tumor samples from a given tissue, compromise our ability to meaningfully compare them; i.e. the two may be similar simply because of impure sample isolations.
Author's response: Heritability of deviating expression profiles was observed in three of the four cases. To me it seems unlikely that this is due to admixture of normal tissue in three tumor samples. Of course, so far there are only three cases. To stress that this heritability is not necessarily connected with disorder in transcription, a sentence and a question mark have been added.
The author presents an index for disorderliness: standard deviation of the log obs/exp values. I would advise against this formulation because it is not symmetric, biasing in favor of reduced expression. For example, an increase of expression by 10% would be log(1.1/1) = 0.0953, while a decrease by 10% results in log(0.9/1) = -0.1054. Since the author makes the point that disorderliness tends to occur in terms of under-expression, the lack of symmetry in the index is a clear confounding effect. This can be easily fixed by taking the log of the absolute difference, and adding a negative sign if obs is less than exp. However, it may be best to convert to Z-scores, to explicitly take into account the variation of each gene's expression.
Author's response: Your remark on the possible absence of symmetry in the log ratio deviation index did initially worry me. Following your suggestion, z-score values have been determined and compared with the log ratio. It turns out that the z-scores discriminate less between normal and tumor tissues, indicating that the changes in expression reflect fold-changes rather than percentage changes. This has been added to the text.
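The symmetry point raised in this exchange can be checked numerically. The sketch below assumes natural logarithms, which matches the magnitudes quoted by the reviewer. It shows that equal percentage changes give log ratios of unequal magnitude, whereas equal fold-changes give log ratios of equal magnitude.

```python
import math

# Percentage changes: +10% and -10% relative to the expected value.
up_percent = math.log(1.10)
down_percent = math.log(0.90)

# Fold-changes: a 2-fold increase and a 2-fold decrease.
up_fold = math.log(2.0)
down_fold = math.log(0.5)

# The percentage pair is asymmetric; the fold-change pair is symmetric.
print(abs(up_percent) < abs(down_percent))
print(math.isclose(abs(up_fold), abs(down_fold)))
```

Under these assumptions, a log-based index is symmetric exactly when expression changes are multiplicative, which is the situation the response describes as fold-changes.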
Reviewer's report 3
Stephan Beck, The Wellcome Trust Sanger Institute, Hinxton, United Kingdom
I accepted to review this manuscript on the premise that experts in SAGE expression analysis and statistics will be secured as additional reviewers to assess the methodologies and tests carried out in this study.
The manuscript by JWIM Simons aims to address several fundamental questions, listed on page 2:
Can disorder in transcription be demonstrated?
Does such disorder have a degree of permanence?
Does it play any role in health and disease?
While I commend the author for tackling such complex questions, I do not agree with many of the conclusions and believe the study is compromised by inadequate assumptions and data selection. My main concerns are:
1) The selection of only 14 genes (20S proteasome complex) is not nearly enough to represent the human transcriptome. In addition, I do not agree with the rationale for some of the additional stratifications of the libraries/data given in the 'Results and Discussion' (first section), and in the 'Methods'. For example, leukocyte libraries were excluded from the analysis on the basis that "these cells might express the immunoproteasome PSMB8 and PSMB9 genes that might then interfere with expression of the 20S proteasome". According to the GNF gene expression database http://expression.gnf.org/, PSMB9 for instance is not only expressed in blood, but also in lung, thymus, spleen and heart. Therefore, why were these tissues not excluded as well if this is the right thing to do in the first place?
Author's response: I agree completely with your remark that a sample of 14 genes is not nearly enough to represent the human transcriptome. This has also been stated in the paper. This sample is only a first step to investigate whether such an approach is possible and could be useful. With respect to selection criteria, a selection has to be made, as the immunoproteasome is a different organelle from the proteasome. The point therefore is where to draw the line. Whether the immunoproteasome could normally be present in some tissues that are not involved in blood formation, I really do not know. As these expressions, if present, are very low, the best policy in my view is to draw the line between blood-forming tissues and others. Therefore, in the selection of the libraries, all libraries from blood-forming tissues would be excluded, thus also spleen, thymus and tonsils.
2) I could not work out how the author defines disorderly expression profiles, except for the definition on page 2 where 'disorder' is defined as "excessive variation in transcription irrespective of cause". However, the author does not seem to take into account that natural variation in gene expression can be quite high (up to 14.13%) in unrelated individuals as compared to e.g. monozygotic twins (up to 1.76%) (see e.g. Sharma et al. Physiol. Genomics 2005 21:117-23). If the above ~10-fold difference falls within the definition used here for 'excessive', then perfectly 'normal' expression profiles would be classified as 'disorderly'.
Author's response: I also agree that at present it is not possible to conclude with certainty what to classify as disorderly. Therefore, to separate findings from speculation more clearly, I have replaced the term disorder by more descriptive terms (e.g. excessive deviation) wherever findings are concerned. The changes have been made throughout the text and a question mark has been added to the title. Your argument about natural variation in unrelated individuals is, strictly speaking, not fully valid, as in that case the quantitative expression of genes was compared and no use was made of an expression profile, which shows the relative expression.
3) I do not agree with the 'one-fits-all' assumption made in the 'Results and Discussion' (first paragraph), that "to establish the existence of a state of cellular disorder that affects all transcription profiles, the first step requires the choice of only one expression profile that could serve as a model for all profiles in a complex system."
Author's response: It has not been my intention to suggest that the use of only one profile would be sufficient to establish the existence of a state of disorder in a tissue (the "one-fits-all" assumption). This was also discussed in the text. To avoid such a misinterpretation, the remark referred to here has been improved.
4) There are numerous statements throughout the manuscript which are unsubstantiated and which I cannot agree with. For instance, the statement in the 'Results and Discussion' (section: Heritability of variant profiles): "The expression profiles of the two prostates clearly differ from each other, which suggests that a specific expression profile for the prostate does not exist and that these two profiles are disorderly. The similarity in disorderly expression profiles of a normal tissue and its tumor suggests the possibility of the disorderly condition in the normal tissue being a predisposing factor in the eventual appearance of the tumor."
Author's response: With the better separation of findings and speculation, I assume it is now evident that the unsubstantiated statements throughout the manuscript are of a speculative nature.
5) On several occasions the author suggests epigenetic changes to be responsible for 'disorderly' profiles. Yet, no supporting evidence is provided.
Author's response: Regarding "epigenetic changes": the subheading "Disorder in transcription can occur in euploid cells" has been changed to "Epigenetic origin of excessive variation in transcription". In this section three arguments were already given as supporting evidence for an epigenetic origin of the changes in expression profiles.
6) The final 5-point conclusion/outlook in 'Results and Discussion' is pure speculation.
Author's response: The 5-point reflection should indeed be read as speculation. The subheading of this section has been altered accordingly.
7) Details of the additional 80 libraries mentioned in 'Methods' (section: Gene expression) should be included in Table 3.
Author's response: Details of the additional 80 libraries have been included in Table 3.
For the reasons outlined above, I find the manuscript not acceptable as Research or Review article and even questionable as Commentary/Hypothesis article. I declare that I have no competing interests.
Author's response: For the reasons outlined above, and after the improvements made, we trust that the manuscript is a valuable contribution. In our view this is clearly a research article.
The many critical remarks made by Dr. M.G. Layton were greatly appreciated. The continuing support by Mrs M.T.F.J. Simons van den Eerenbeemt for my work deserves more than this "thank you".
About this article
Cite this article
Simons, J.W. Hereditary profiles of disorderly transcription?. Biol Direct 1, 9 (2006). https://0-doi-org.brum.beds.ac.uk/10.1186/1745-6150-1-9