Universal Newborn Genome Sequencing and Generation Alpha

Imagine the day that genome sequencing of all newborns begins. Instantly two cohorts of people will form: the expanding youngest, with a tremendous amount of personal information stored on a cloud, and the shrinking rest of us, with little knowledge of our genomes.

A century from now, possibly everyone will have access to her or his genome data. But until then, how can we prepare to handle the avalanche of information about what I’d call, if I were a science fiction writer, “generation Alpha”?

My idea of the Alphas is inspired by the 1992 dystopian novel “The Children of Men,” by P.D. James. In the story, all human sperm suddenly die in 1994, and 1995 becomes Year Omega. After that, populations plummet and age in the face of global infertility, with the last remaining people, the Omegas, struggling towards inevitable extinction.



What will happen in our world, in our society, as the Alphas age?


Mining sequenced genomes today has the very best of intentions: ending the “diagnostic odysseys” that patients, typically children with rare or one-of-a-kind diseases, endure. But just as opening a magazine can reveal much more than the article one is looking for, a genome sequence provides hundreds of thousands of gene variants that might mean something about a person’s health, perhaps things totally unexpected.

For now, to restrict dissemination of information to the meaningful, the American College of Medical Genetics and Genomics lists 56 “actionable” secondary findings, a minimal menu of genetic conditions that doctors can prevent or treat and that turn up when a genome is searched for something that could explain symptoms. The list will grow as more genes and their variants are identified, and these conditions have already outgrown their initial designation as “incidental.” They’re important.

Thousands of newborns have already had their genomes sequenced, as part of a handful of projects at major research centers. The actual deciphering can take under a day – a lot better than the decade required to sequence the first human genomes. But our understanding of how genotype becomes phenotype lags far behind the ability to decipher the sequence. The value of an “annotated” genome compared to “raw sequence” is like comparing the plot of To Kill a Mockingbird to a pile of word-size pieces cut from a copy of the book. When it comes to genomes, meaning and context are everything.

BEYOND THE USUAL SUSPECTS

The era of looking for what we already know, the “round up the usual suspects” approach to gene identification and disease diagnosis, will gradually end as more human genome sequences and their interpretations are stored in clouds. Our algorithms will ultimately identify all possible gene variants and their interactions – and what they mean at the whole-body level, the phenotype.

My concern is not those “usual suspects,” the well-studied mutations that lie behind single-gene disorders: cystic fibrosis, sickle cell disease, Huntington disease. I fear the fuzzier genetic information. Genome-wide association studies, for example, identify suites of gene variants that signal a good chance that an illness will happen, but not with the power of a clinical diagnosis based on symptoms and biomarkers. The media often trumpet such findings with a false sense of certainty.

(Note on terminology: “gene variant” is a broader, more politically correct term without the negative connotation of “mutation,” which classically means “change in a gene” from the most common form [“wild type”] in a particular population.)

What I fear most isn’t the use of genome information in predicting or diagnosing disease, but in identifying the harder-to-follow, multifactorial traits that are molded by genes and the environment and therefore much more difficult to trace or quantify: intelligence, personality, temperament, talents. Each gene contributes a small amount and to a differing degree to characteristics that aren’t as neatly predictable as the single-gene, Mendelian disorders like the hemophilias.

Will the idea of genetic determinism – that we are our genes – strengthen as the stockpile of genomic information swells through the population, beginning with the youngest? Will the practice become the ultimate example of paternalism, because newborns didn’t provide permission? As they age, can they choose not to know? Will that even be imaginable, as today it’s difficult to envision or remember a time without the Internet?

Choosing not to know will be especially difficult if others have access to genome information. And who should those others be?


Annotated genome sequences could guide pediatricians in troubleshooting problems, providing a powerful new tool in preventive medicine. At the first birthday, a microbiome analysis might identify children with tendencies towards certain conditions, or with insufficiently challenged immune systems.



Beyond infancy, will availability of genome information fuel stratification as DNA data better predict who is most likely to benefit from a scarce medical resource, and only the young have that information? Years from now, will I be denied a treatment unless I have my genome sequenced to show that I’m just as likely to benefit as a 16-year-old whose genome has been in the electronic medical record since birth?

In a few years, will posh preschools scan applicants’ genome information to select pupils? Will teachers use it to create compatible study groups, or to identify a tendency to bully and treat such a child the way future criminals were punished in the dystopian world of the Tom Cruise film Minority Report?

Will standardized test scores be compared to DNA data to deduce whether students are working up to their potential? Will employers look for genomic red flags, the way they stalk Facebook now for evidence of stupidity? This blog has already discussed DNA and dating.


I’m not sure where all this is heading, but it is coming. Widespread newborn genome sequencing could happen within a decade, experts tell me.

Francis Collins wrote in the Wall Street Journal on July 7, 2014: “Over the course of the next few decades, the availability of cheap, efficient DNA sequencing technology will lead to a medical landscape in which each baby’s genome is sequenced, and that information is used to shape a lifetime of personalized strategies for disease prevention, detection and treatment.”

Is Dr. Collins’ view too narrow? Genome information can be used for purposes other than healthcare. After all, genetic genealogy is based on using landmarks in genomes to identify individuals.



Some may say that genome data will be secure, and that we can control access and limit how much an individual can know about her or his DNA. But did the top executives of Sony Pictures Entertainment ever imagine last fall that all of the company’s e-mails, as well as their personal ones, would rain down on the media from the great iCloud in the sky, in eight humungous and mortifying data dumps?

At least it can be argued that Jennifer Lawrence’s naked photos wouldn’t have gone everywhere if she hadn’t sent them to a supposedly safe cloud in the first place. But what about the 11 million customers of Premera Blue Cross, whose clinical records, bank account information, and Social Security numbers may have been released in a cyberattack in May 2014, reported in the media just two days ago?

Privacy breaches have already hampered DNA research. In 2013, Yaniv Erlich of the Whitehead Institute and his astute student Melissa Gymrek demonstrated their ability to identify people who’d anonymously donated their DNA to the 1000 Genomes Project. They cataloged the short tandem repeats on Y chromosomes that are used in genetic genealogy and matched them to surnames and public information found on Google, such as state and year of birth. Cross-referencing to DNA sequences of cells at the Coriell Cell Repositories and more sleuthing led to women DNA donors. It’s in Science 339:321, unfortunately behind a paywall. And I’ve heard at genetics meetings about children identified by cross-referencing databases that name their rare diseases and their hometowns.

Is the cat out of the bag for genomes already sequenced?

As with Jennifer Lawrence’s revealing images, DNA sequences will be out there, along with a lot of other identifying information. What can we do to ensure that a Sony situation, health insurance leak, or clever use of public databases doesn’t reveal DNA information on a large scale? Late last year Google took on inexpensive genome sequence storage, although raw data may initially be of limited value.

Can we adequately de-identify people and protect the very DNA data that will lay the groundwork for precision medicine? Will the Alphas be the guinea pigs for genome control? Maybe precision medicine should stick to storing only clinically relevant DNA information. For now.

At a conference to be held April 8-10 at Children’s Mercy, Kansas City, several research groups sequencing newborn genomes as part of an NIH-funded program will meet to discuss results so far and how the information will be used and protected. That’s a great start to what will certainly be an intriguing and important conversation.

(A version of this post appeared on March 16 at the Biopolitical Times blog at the Center for Genetics and Society.)