• Open data is a critical component of the scientific method, but genomes are both identifiable and predictive. As a result, most studies choose to withhold data from participants and restrict access to researchers, hampering the connections and sample sizes that precision medicine needs to work (see our previous blog covering this). The Personal Genome Project (PGP) has pioneered detailed and portable “open consent” procedures to move beyond these restrictions for the greater good, relying on volunteers willing to donate diverse personal information to become a public resource. Founded by George Church of Harvard Medical School, and spawning a network of regional offshoots such as PGP UK (founded by our editorial board member Stephan Beck), PGP aims to produce a unique resource for humans, providing open access to genes, environments and traits. Starting with George as its first personal genomics volunteer (see #PGP1’s profile here), the project now has over 5,000 registered users who have passed the “open consent” procedure and have started updating diverse phenotypic information.

    Complete Genomics (a subsidiary of our co-publishers BGI) and the PGP today publish a Data Note releasing and describing over 100 individual whole genome sequences with experimental haplotype phasing. This set of personal genomics data was generated using Complete Genomics’ Long Fragment Read (LFR) technology and represents the largest set of high-coverage whole human genome assemblies with comprehensive, experimentally determined haplotypes.

    “The vast majority of genomic data that has been generated to date is without experimentally derived haplotypes,” explained Dr. Brock Peters, Senior Director of Research and project leader for Complete Genomics. “This represents a very unique set of data that is freely available for anyone to use through open access data publication.” A total of 184 individuals, recruited by the PGP, took part in the project. Each individual consented to have their identity, their genome, and their phenotype data made freely and publicly available. Blood samples were collected by the PGP team and sent to Complete Genomics for DNA isolation, LFR library generation, and whole genome sequencing. Currently 114 genome assemblies are available, with the remaining 70 expected to be released in the next few months once the donors have signed off.

    “In 2011, we made freely available a set of 69 whole human genome sequence assemblies which quickly became a highly utilized resource and benchmark for the genetics community,” stated Dr. Radoje Drmanac, CSO of Complete Genomics. “We are proud to continue the tradition by releasing this set of experimentally haplotyped whole human genome sequence assemblies. This represents the largest and most accurate set of human haplotypes currently available.” The terabytes of sequencing data and detailed phenotypic information are available from dbGaP (phs000905.v1.p1), the PGP website and the GigaScience GigaDB repository (doi:10.5524/100242).

    “Combining Complete Genomics’ advanced WGS with the PGP’s informed consent policy which allows for unrestricted access and GigaScience’s open access data publication method enables the full release of a large data set with exceptional scientific value. We expect it will be used by many researchers around the world,” explained Dr. Church.

    The technology used to generate this dataset, LFR, was previously described by Complete Genomics in a 2012 Nature publication. In our new publication, LFR was again shown to be highly accurate and complete. Each sample was sequenced to 100X coverage, allowing most variants to be detected with high confidence. In turn, over 98% of heterozygous variants could be placed into long contigs approaching 1 Mb in length. On average, over 85% of haplotypes contained no errors, with the majority of the remaining 15% having only a single phasing error.
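To make the two headline figures above more concrete, here is a minimal Python sketch of how the share of phased heterozygous variants and a typical contig length could be summarised for one sample. It is only an illustration: the `PhaseBlock` record, the function names, the choice of N50 as the length summary, and the toy numbers are all assumptions made for this example, not the file format or the method used in the Data Note.

```python
from dataclasses import dataclass


@dataclass
class PhaseBlock:
    """One phased contig (hypothetical record, not a Complete Genomics format)."""
    start: int        # first phased position on the chromosome (bp)
    end: int          # last phased position on the chromosome (bp)
    phased_hets: int  # heterozygous variants assigned to a haplotype in this block

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def fraction_phased(blocks, total_hets):
    """Share of all heterozygous variants that were placed into some phase block."""
    if total_hets <= 0:
        raise ValueError("total_hets must be positive")
    return sum(b.phased_hets for b in blocks) / total_hets


def block_n50(blocks):
    """Length L such that blocks of length >= L hold at least half of the phased bases."""
    lengths = sorted((b.length for b in blocks), reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # no blocks supplied


# Toy numbers, invented purely for illustration.
blocks = [
    PhaseBlock(1, 1_000_000, 900),
    PhaseBlock(1_200_001, 1_700_000, 450),
    PhaseBlock(2_000_001, 2_250_000, 130),
]
print(f"{fraction_phased(blocks, total_hets=1_500):.1%} of heterozygous variants phased")
print(f"block N50: {block_n50(blocks):,} bp")
```

N50 is used here only because it is a common way to summarise contig lengths; the publication may report contig length differently.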

    We encourage use of this data by the academic community and beyond, as George Church says in the above video, empowering the credential-less, out-of-the-box thinkers who usually would not get access to this type of data. On top of the high quality of this phased data, the large number and the politics-free, open nature of these datasets will make them a priceless reference in enabling genome-driven precision medicine to succeed.

