Yaron Turpaz, Ph.D., MBA, CIO, Human Longevity, Inc.
Human Longevity, Inc. was launched in March 2014 with a mission to transform healthcare and accelerate the practice of personalized, preventive healthcare through detailed, comprehensive genomic analysis and risk assessment, integrated with high-quality phenotypic and medical information. We have built the largest human genome sequencing facility in the world, with 24 Illumina HiSeq X machines and two Pacific Biosciences instruments. With an unprecedented capacity to produce more than 30,000 whole human genomes per year at 30X genome coverage, we have built an integrated Knowledgebase™ and developed cloud-based solutions to process, analyze and visualize such complex, multi-dimensional data and provide time-sensitive, meaningful scientific and personalized health insights. To date, we have processed more than 4 PB of genomics data from more than 20,000 integrated genomes and health records, and we are on track to have more than 1M integrated health records in our Knowledgebase™ by 2020. This requires us to build solutions that support very large unstructured data sets with real-time analysis of complex queries.
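To give a rough sense of the scale these figures imply, the following back-of-the-envelope sketch works through the arithmetic. It is only an illustration: it assumes decimal units (1 PB = 10^15 bytes), treats the "more than" figures as exact, and assumes data volume is spread evenly across genomes, none of which is stated above. The function and variable names are hypothetical.

```python
# Back-of-the-envelope storage estimate from the figures quoted in the text.
# Assumptions (not from the article): decimal units, "more than" figures
# treated as exact, and data volume spread evenly across genomes.

PB = 10**15  # bytes in a petabyte (decimal)
GB = 10**9   # bytes in a gigabyte (decimal)


def bytes_per_genome(total_bytes, genomes):
    """Average data volume per genome, in bytes."""
    return total_bytes / genomes


def projected_bytes(per_genome, genomes):
    """Total data volume for a given number of genomes at a fixed average."""
    return per_genome * genomes


per_genome = bytes_per_genome(4 * PB, 20_000)    # 4 PB over 20,000 genomes
per_year = projected_bytes(per_genome, 30_000)   # stated yearly capacity
at_1m = projected_bytes(per_genome, 1_000_000)   # 1M-record target

print(f"Average per genome: {per_genome / GB:.0f} GB")
print(f"At 30,000 genomes per year: {per_year / PB:.0f} PB per year")
print(f"At 1M integrated records: {at_1m / PB:.0f} PB")
```

Under these assumptions the average works out to roughly 200 GB per genome, about 6 PB of new data per year at full sequencing capacity, and on the order of 200 PB at one million records. Actual volumes would depend on how much intermediate and derived data is retained.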
"We have built an integrated knowledgebase and developed cloud based solutions to process, analyze and visualize such complex multi-dimensional data that procures time-sensitive, meaningful, scientific and personalized health insights"
We use the EMC Isilon platform as hot storage for the genome sequencing data generated by each of the sequencing machines in the lab, and then transfer the encrypted, packaged data to the Amazon cloud (AWS) for all of our production and downstream data analysis. The cloud environment allows us to operate at scale with fast ramp-up of storage (S3) and compute (EC2 and Lambda). It also allows us to securely share data with our customers, such as pharmaceutical companies, hospitals, research institutes and health insurance companies. We use a wide range of AWS services and in-house solutions to manage, track and optimize our usage of the cloud. We place special emphasis on real-time tagging of EC2 instances and S3 data to track usage, performance and costs across the organization.
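The value of tagging for cost tracking can be illustrated with a minimal sketch. This is not our production tooling and does not call any AWS API: the record layout, the `team` tag key, the resource identifiers and the cost figures are all hypothetical, chosen only to show how tagged resources roll up into per-group totals and how untagged resources become visible.

```python
from collections import defaultdict


def cost_by_tag(records, tag_key):
    """Sum cost per value of `tag_key`; resources lacking the tag are
    grouped under 'untagged' so that gaps in tagging stand out."""
    totals = defaultdict(float)
    for rec in records:
        group = rec["tags"].get(tag_key, "untagged")
        totals[group] += rec["cost_usd"]
    return dict(totals)


# Hypothetical usage records; in practice these would come from a billing
# or inventory export rather than being written by hand.
records = [
    {"resource": "ec2:hypothetical-instance-1", "tags": {"team": "pipeline"}, "cost_usd": 12.5},
    {"resource": "s3:hypothetical-bucket/run-1", "tags": {"team": "pipeline"}, "cost_usd": 3.0},
    {"resource": "ec2:hypothetical-instance-2", "tags": {"team": "research"}, "cost_usd": 7.25},
    {"resource": "ec2:hypothetical-instance-3", "tags": {}, "cost_usd": 1.0},
]

totals = cost_by_tag(records, "team")
for group, cost in sorted(totals.items()):
    print(f"{group}: ${cost:.2f}")
```

Grouping untagged resources explicitly, rather than dropping them, is a deliberate choice in this sketch: spend that cannot be attributed is often the first thing a tagging policy needs to surface.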
Some of our genomics analysis is extremely time-sensitive, such as whole genome tumor analysis of cancer patients and whole genome germline analysis of newborns with rare diseases. The cloud allows us to dedicate the required compute resources on short notice and run our full production analysis pipeline in record time, which accelerates communication of results to physicians and ultimately to patients.
In October 2015 we launched the first HLI Health Nucleus™ in San Diego, where healthy individuals can receive a comprehensive health screening, including full body MRI, 4D echocardiogram, DEXA bone density and body mass analysis, whole genome sequencing, metagenomic analysis of the gut microbiome, and metabolomics, among other medical and phenotypic screening tests. We have built a cloud-based solution to store, analyze and report this rich personalized data for each client, and developed a 3D personalized medical avatar that facilitates user-friendly navigation of dynamic medical knowledge. Here again, cloud computing was essential to the constant optimization of speed and the integrated analysis of exponentially growing health and life science Big Data. I believe that pharmaceutical companies, health insurance companies, hospitals, and the life science industry will fully transition to cloud computing in the next decade, as a key enabler to store, process, analyze and visualize personalized medical, genomics and health data.