Our mission is to make a wide spectrum of data about humans accessible to increase biological literacy and improve human health. PersonalGenomes.org is a nonprofit organization working to generate, aggregate and interpret human biological and trait data on an unprecedented scale using open-source, open-access and open-consent frameworks. Our efforts are informed by values encouraging greater transparency and collaboration between researchers and participants.
We believe obtaining a personal genome sequence is an activity that soon will be shared by millions of individuals around the world. To improve our understanding of how human traits are formed through the interactions between genomes and their environments, a much more holistic picture of the human experience is needed. The ideal scientific resource would be a collection of many human genomes that remain connected to their owners, who contribute additional information over their lifetimes, such as longitudinal health status, medical and social history, environmental exposures, nutrition, lifestyle, physical measurements, blood chemistry, presence or absence of microbes and viruses, and many other kinds of data.
Even if a person’s name, home address or facial photograph is specifically excluded, a dataset like the one we are building is far from anonymous. It is simply too easy for someone to connect the dots and reveal a person’s identity. Moreover, data breaches are not uncommon even in the most highly regulated arenas, such as national intelligence, where secrets are heavily guarded with extensive security clearance protocols and background checks. We think it is very important to be honest about how difficult it is to simultaneously share and protect data.
Sharing data is critical for enabling discovery. Assembling under one roof a research team with the requisite expertise to generate, aggregate and interpret this dream dataset is unrealistic. Expertise is too diffuse. Major contributions might come from unconventional actors residing in far-flung corners of the globe. Einstein started off as a patent clerk, after all. Citizen scientists, hobbyists, amateurs and the participants themselves will undoubtedly make significant contributions.
We feel the most ethical and practical solution to this dilemma is to turn the privacy problem on its head and collaborate with individuals who are willing to share their data publicly with the understanding that re-identification is possible. We also will reduce potential misunderstandings by requiring prospective participants to demonstrate that they comprehend the public, non-anonymous nature of this endeavor as part of our unique consent process.
We will grow this exceptional public resource by seeding a cohort of well-consented individuals with extensive genomic data and then inviting a network of researchers to recruit from this cohort for additional phenotyping and molecular profiling, under the condition that they return computable datasets to the research participants. These participants, in turn, may donate their data to the public domain for others to use, thereby reinforcing the virtuous circle of sharing.
We will encourage widespread use of this public data resource as a platform for scientific research, education, improvement of the public health, public- and private-sector innovation, benchmarking and standardization, and personal exploration.