genome sequencing

Should I pay or should I go? Or what do we really know about mutations and disease liability?

Current genome sequencing technologies already allow for cost-effective and comprehensive disease prediction. However, does it really make sense for the average consumer to pay $5,000 for a whole-genome analysis at this time? And what can they gain from an analysis of this sort? After all, genome sequencing will identify three to five million polymorphic variants, the vast majority of which are not informative from a health perspective.

Let’s see what awaits you in this process:

First, your genome will be compared with an “average” human genome, which is a bit troublesome because these analyses do not capture structural variations, i.e. large-scale genome rearrangements and the majority of duplications or deletions (Wong, Suchard et al. 2008). The second phase of genome analysis is the identification of single nucleotide polymorphisms, only a small fraction of which can be called “mutations”, that is, potentially disease-causing changes in the genome. The third phase of analysis involves sorting these variants into three loosely defined categories: non-pathogenic, medically relevant or potentially pathogenic, and variants of unknown significance. Because such categorization is imperfect, disease predictions require periodic re-analysis with more up-to-date genome annotation, so if you have your genome analyzed, you will know much more about yourself in the future.
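The three-way sorting, and the reason periodic re-analysis pays off, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Variant` record, the gene names, the variant labels and both annotation tables are made up, and a real pipeline would consult curated databases rather than a dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three loosely defined categories from the text.
NON_PATHOGENIC = "non-pathogenic"
POTENTIALLY_PATHOGENIC = "medically relevant or potentially pathogenic"
UNKNOWN = "unknown significance"


@dataclass(frozen=True)
class Variant:
    gene: str    # hypothetical gene name
    change: str  # hypothetical variant label


def categorize(variants, annotation):
    """Sort variants into the three categories using an annotation table.

    Any variant missing from the table is a variant of unknown significance.
    """
    bins = defaultdict(list)
    for variant in variants:
        bins[annotation.get(variant, UNKNOWN)].append(variant)
    return dict(bins)


# Made-up data, for illustration only.
v1 = Variant("GENE_A", "variant_1")
v2 = Variant("GENE_B", "variant_2")
v3 = Variant("GENE_C", "variant_3")
genome = [v1, v2, v3]

annotation_today = {v1: NON_PATHOGENIC, v2: POTENTIALLY_PATHOGENIC}
# A later annotation release that has learned something about v3.
annotation_later = {**annotation_today, v3: NON_PATHOGENIC}

today = categorize(genome, annotation_today)
later = categorize(genome, annotation_later)
```

With today's table, `v3` lands in the unknown-significance bin; with the later table, the same genome data yields a definite answer without any new sequencing.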

However, in the short term there are two main areas of concern. The first area of concern is the uncertainty of disease risk estimation, which arises when a novel variant is identified in a known disease-causing gene. For example, severe mutations in the BRCA1 gene are known to increase the risk of breast cancer. Some mutations, particularly those that cause premature protein truncation (nonsense or frameshift mutations), are clearly pathogenic and do not require functional validation or prior knowledge of disease associations. However, for missense mutations, which substitute one amino acid for another in the protein sequence, the functional consequences are less clear-cut. Some missense mutations are disease-causing because they affect amino acids essential for protein function or correct folding, while others are simply non-pathogenic, functionally neutral changes.
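To make the mutation types concrete, here is a minimal Python sketch that labels a single-codon substitution using the standard genetic code, and labels an insertion or deletion by whether it shifts the reading frame. The function names are my own, and the sketch assumes the codons are already in frame and on the coding strand; it says nothing about whether a given missense change is actually harmful, which is exactly the hard part.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon, alt_codon):
    """Label a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"  # DNA changes, protein sequence does not
    if alt_aa == "*":
        return "nonsense"    # premature stop codon, truncated protein
    if ref_aa == "*":
        return "stop-loss"   # not discussed in the text; included for completeness
    return "missense"        # one amino acid replaced by another


def classify_indel(ref_allele, alt_allele):
    """Label an insertion or deletion by whether it shifts the reading frame."""
    shift = abs(len(ref_allele) - len(alt_allele))
    if shift == 0:
        raise ValueError("alleles have equal length; this is not an indel")
    return "frameshift" if shift % 3 else "in-frame indel"
```

For example, `classify_substitution("CAG", "TAG")` reports a nonsense change, while `classify_substitution("GAG", "GTG")` reports a missense change whose consequences cannot be read off the sequence alone.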

For such missense mutations, clear co-segregation of the mutation with the disease has to be observed, i.e. the mutation should be transmitted from an affected parent to affected offspring more frequently than to unaffected siblings. Alternatively, functional validation in experimental models can provide insights into the pathogenicity of missense mutations. However, such functional validation is feasible only if an appropriate experimental model exists, and it is typically very costly and slow. Unfortunately, in many diseases the majority of mutations are novel missense mutations that are observed in only one family (these are sometimes called “private mutations”). It is therefore quite challenging to confirm the pathogenicity of such mutations without extensive family and disease segregation analyses.

In some cases researchers try to model the potential disease liability of such mutations computationally. All of these models share a key underlying assumption: the more evolutionarily conserved an amino acid is, the more important it should be. This assumption works quite well for highly conserved proteins whose basic function has remained unchanged throughout vertebrate evolution. However, we know that humans differ from monkeys partly because of rapid evolution in multiple proteins, so for a substantial fraction of proteins this assumption is not valid (Dorfman, Nalpathamkalam et al.). Geneticists and computer scientists therefore have to be much more careful in applying in silico prediction algorithms (such as PANTHER, PolyPhen, SIFT and many others), and have to consider whether the analyzed gene is itself evolutionarily conserved before venturing into the analyses. Ideally such algorithms should include a weighting factor for conservation of the entire gene, but such weighting has not been implemented yet.
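As a toy illustration of what a gene-level weighting factor could look like, the Python sketch below scales a per-site conservation score by a conservation score for the whole gene. To be clear, this is a hypothetical formula of my own, not how PANTHER, PolyPhen or SIFT work; the function names, the simple "fraction of matching species" measure and the multiplicative weighting are all assumptions.

```python
def site_conservation(alignment_column, reference_aa):
    """Fraction of aligned species that carry the reference amino acid.

    `alignment_column` holds one amino-acid letter per species at this
    position, e.g. "LLIL" for four species.
    """
    if not alignment_column:
        raise ValueError("alignment column is empty")
    matches = sum(1 for aa in alignment_column if aa == reference_aa)
    return matches / len(alignment_column)


def weighted_score(alignment_column, reference_aa, gene_conservation):
    """Scale the site score by how conserved the whole gene is (0.0 to 1.0).

    A conserved site in a rapidly evolving gene then counts for less than
    the same site pattern in a gene that is conserved overall.
    """
    if not 0.0 <= gene_conservation <= 1.0:
        raise ValueError("gene_conservation must be between 0.0 and 1.0")
    return site_conservation(alignment_column, reference_aa) * gene_conservation
```

With this weighting, a position that is identical in three of four species scores 0.75 on its own but only 0.375 when the gene as a whole is given a conservation of 0.5.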

Ironically, because of these basic assumptions, researchers discount the potential role of synonymous mutations, which change the DNA sequence but not the protein sequence. Genetic association studies identify synonymous mutations in multiple diseases almost as commonly as missense mutations (Chen, Davydov et al. 2010), but researchers do not have the means to functionally validate these variants. They are therefore typically excluded from further analysis on the (incorrect) assumption that such mutations are less likely to be pathogenic (Sauna and Kimchi-Sarfaty 2011). After all, scientists fall for the same biases as everybody else and prefer to look for lost keys in illuminated areas rather than in total darkness. Yet in rare cases synonymous variants have been found to be disease-causing via effects on RNA translation or splicing.

The second and much bigger concern is that we do not know much about the function of most genes! And how can we combine the additive effects of multiple risk factors? These questions remain to be answered… So the third step of genome analysis is the weakest element and still requires a lot of optimization; this is a tremendously competitive area of genetic research, which will yield more accurate disease predictions in the future.
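How best to combine risk factors is an open question, but one common simplification is to assume that each risk allele contributes independently and additively on the log-odds scale. The Python sketch below shows only that simplification; the variant names and odds ratios are invented, and real risk models have to deal with interactions, correlated variants and uncertain effect sizes.

```python
import math


def additive_log_odds(risk_allele_counts, odds_ratios):
    """Sum per-allele log odds ratios, weighted by allele count (0, 1 or 2).

    Assumes independent, additive effects; both arguments are dictionaries
    keyed by a (hypothetical) variant name.
    """
    return sum(
        count * math.log(odds_ratios[variant])
        for variant, count in risk_allele_counts.items()
    )


def combined_odds_ratio(risk_allele_counts, odds_ratios):
    """Convert the additive log-odds score back to an odds ratio."""
    return math.exp(additive_log_odds(risk_allele_counts, odds_ratios))


# Invented numbers, for illustration only.
example_counts = {"variant_1": 2, "variant_2": 0}
example_odds_ratios = {"variant_1": 1.5, "variant_2": 2.0}
example_result = combined_odds_ratio(example_counts, example_odds_ratios)
```

Here two copies of a risk allele with an odds ratio of 1.5 combine to 1.5 × 1.5 = 2.25, while the variant with zero risk alleles contributes nothing.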

So should I pay (for genetic testing) or should I go (without it)?

I would argue that it is up to you, but here are a few points for your consideration. Firstly, one can use exome sequencing, which analyzes only 5-8% of the genome, but that portion includes the protein-coding sequences, which we can interpret better right now. Again, this is a “search under the lamppost” problem, but until we get a better flashlight or the sunrise, this is a reasonable and cost-effective strategy: exome sequencing currently costs about $1,500, but provides over 90% of the information you could get from a whole-genome analysis. Secondly, most human diseases are very complex and involve multiple proteins functioning in multiple pathways, so one disease can have, and often does have, multiple causes, each requiring a distinct treatment. Therefore, the drug that controlled hypertension effectively for your neighbor may not work for you. Although we do not yet know much about many diseases, there are quite a few disease-specific associations that at least hint at disease pathways. So despite fragmented knowledge and the poor accuracy of current disease risk models, we can still glean a lot of practically useful information, such as specific drug responses. Thirdly, the disease-specific information that you will get for rare, highly robust disease associations is tremendously important for your health and your family! (For details, please read the next post, “To know, or not to know: that is the question”.)


Chen, R., E. V. Davydov, et al. (2010). “Non-synonymous and synonymous coding SNPs show similar likelihood and effect size of human disease association.” PLoS One 5(10): e13574.
Dorfman, R., T. Nalpathamkalam, et al. “Do common in silico tools predict the clinical consequences of amino-acid substitutions in the CFTR gene?” Clin Genet.
Sauna, Z. E. and C. Kimchi-Sarfaty (2011). “Understanding the contribution of synonymous mutations to human disease.” Nat Rev Genet 12(10): 683-691.
Wong, K. M., M. A. Suchard, et al. (2008). “Alignment uncertainty and genomic analysis.” Science 319(5862): 473-476.