Developing a data coordination platform for the Human Cell Atlas
The Chan Zuckerberg Initiative (CZI) has announced financial support for the Human Cell Atlas, which is using sequencing technology to redefine every cell in the body. Funding and engineering support from CZI will enable the European Bioinformatics Institute (EMBL-EBI), the Broad Institute and the University of California Santa Cruz Genomics Institute (UCSC) to set up an open, cloud-based Data Coordination Platform to check, share and analyse the vast amounts of diverse information generated.
The New Anatomy
Molecular biology has advanced so far and so fast in the past two decades that scientists believe it's time to rethink human anatomy, starting from the smallest unit: the cell. The Human Cell Atlas, led by the Broad Institute and Wellcome Trust Sanger Institute, aims to do just that by creating a new, open, accessible reference map of the healthy human body.
"Anatomy textbooks as they are now were designed by assigning meaning according to how things look and function. Now, we're using molecular tools to characterise what's going on in organs and tissues, and to get a deeper view of anatomy. That's the Human Cell Atlas," explains Dr John Marioni, Research Group Leader at EMBL-EBI, and EMBL-EBI lead on the Data Coordination Platform steering group.
This international collaboration is using RNA sequencing technology to define cells in a whole new way. Such a highly specific, sequencing-based reference of healthy human function will be transformative for biomedical research. One thing is certain: it will mean a lot of data.
Transparency and transformation
"The scale of the Atlas will be in the tens of millions of datasets," says Dr Sarah Teichmann, Head of Cellular Genetics at the Wellcome Trust Sanger Institute and joint leader of the Human Cell Atlas. "Interoperability and transparency are essential for keeping so many moving parts working - we know this from our long experience collaborating with one another. We've designed the data architecture as open-source and modular from the get-go. That will make it easier for others to use and add to the Atlas in the future."
The new cloud-based pipeline will allow Human Cell Atlas partners to upload their datasets, analyse them jointly, and compare healthy and diseased tissues meaningfully. It will shift life-science collaboration toward cloud technologies including OpenStack, Google and Amazon Web Services.
"The size and scope of this new data platform will require large-scale collaborations between informatics and genomics experts across academia and industry," said Cori Bargmann, president of science at the Chan Zuckerberg Initiative. "That is why we are thrilled to bring together three of the world's leading institutions in genomics, informatics, and data sharing to build this important new resource - and our own software engineers will help develop the tools and facilitate the collaboration. It is a great example of how we can help accelerate science by supporting collaborations across institutions and by bringing scientists and engineers together in new ways."
What is the Human Cell Atlas?
Cells are the most fundamental unit of life, yet we know surprisingly little about them. They vary enormously within the body, and express different sets of genes. Without maps of different cell types and where they are located in the body, we cannot describe all their functions and understand the biological networks that direct their activities.
A complete Human Cell Atlas will give us a unique ID card for each cell type, a three-dimensional map of how cell types work together to form tissues, knowledge of how all body systems are connected, and insights into how changes in the map underlie health and disease. It will allow us to identify which disease-associated genes are active in our bodies and where, and to analyse the regulatory mechanisms that govern the production of different cell types.
The Human Cell Atlas will be freely available to scientists all over the world, transforming the fundamental understanding of human development and the progression of diseases. EMBL-EBI, the Broad Institute and the University of California Santa Cruz are building the Data Coordination Platform for scientists to check, share and analyse the vast amounts of diverse information generated.
Why does it matter?
"Imagine if, like me, you had somewhat impaired vision and you spent your life seeing things in a blur. Then one day, someone gives you glasses and you are suddenly able to see the world in its detailed beauty: letters on street signs, leaves on the trees. The HCA is like that pair of glasses. It will allow scientists to see biology much more clearly, in high resolution. Until we can see the processes that drive health and disease with that level of clarity, we won't ever fully understand them. And we need to understand them in order to develop medicines that act on them effectively." - Dr Aviv Regev, Chair of the Faculty at the Broad Institute, a professor of biology at MIT and co-chair of the organizing committee for the Human Cell Atlas.
Science is global
The raw data produced by Human Cell Atlas researchers will be stored at EMBL-EBI, passed to platform partners in the US for cloud-based analysis and annotation, then returned to EMBL-EBI to be stored and shared through the public archives, making it available to the wider world.
"Science is truly international, and that is clear in the way the Human Cell Atlas partners work across continents," says Ewan Birney, Director of EMBL-EBI and Chair of the Global Alliance for Genomics and Health. "Each partner brings substantial experience building essential data services for the life sciences. CZI is not just funding the project - they're a hands-on partner. So we know the Atlas will be built with the best engineering possible."
"This contribution is for all the world's biomedical scientists, because the Human Cell Atlas will be shared with everyone," says Dr Aviv Regev, a professor of biology at MIT and co-chair of the organizing committee for the Human Cell Atlas. "CZI's support will help us start to build a data platform for scientists around the world to see and analyse each other's data, and to share the results of their work widely and openly. This will inspire others to ask new questions, and empower them to find the answers."
Building the platform is just the start of a colossal undertaking that will take many years to complete, during which technologies will inevitably change. The next step for the Data Coordination Platform is to plan for emerging technologies such as bioimaging, and for sustaining the public resource over the long term.
Source: EMBL Heidelberg