The CyVerse Node newsletter picked up the story about how University of Arizona researchers, along with the university's high-performance computing (HPC) and CyVerse computational resources, contributed to the world's largest plant genome sequencing project.
In the culmination of a nine-year research project, an international consortium of nearly 200 plant scientists has released gene sequences for more than 1,100 plant species. Scientists and a supercomputer at the University of Arizona played a key role in the effort.
The One Thousand Plant Transcriptomes Initiative, or 1KP, is a global collaboration to examine the diversification of plant species, genes and genomes across the more than 1-billion-year history of green plants dating back to the ancestors of flowering plants and green algae.
The project is by far the biggest effort to decipher genomes across the plant kingdom, according to Mike Barker, an associate professor in the University of Arizona Department of Ecology and Evolutionary Biology and one of the lead authors of the study.
Until now, plant scientists had generated reference genomes for only a handful of plant representatives, including Arabidopsis (the "fruit fly" of plant genetics), rice, a fern and a moss. But those were mere dots of light in a largely dark tree that comprises about 400,000 known species of land plants alone, Barker said.
"You could say we have turned on the lights in the dark corridors of the plant tree of life where we haven't been able to look before," he said. "We went from a few light bulbs lighting up isolated rooms to 1,500."
"In the tree of life, everything is interrelated," said Gane Ka-Shu Wong, lead investigator and professor in the University of Alberta Department of Biological Sciences. "And if we want to understand how the tree of life works, we need to examine the relationships between species. That's where genetic sequencing comes in."
The paper, "One Thousand Plant Transcriptomes and Phylogenomics of Green Plants," was published Oct. 23 in Nature. The findings reveal the timing of whole genome duplications, as well as the origins, expansions and contractions of gene families that contributed to fundamental genetic innovations in the evolution of green algae, mosses, ferns, conifer trees, flowering plants and all other green plant lineages. The history of how and when plants acquired the ability to grow tall and to make seeds, flowers and fruits provides a framework for understanding plant diversity around the planet, from annual crops to long-lived forest tree species.
By sequencing and analyzing genes from a broad sampling of plant species, researchers are better able to reconstruct gene content in the ancestors of all crops and model plant species, and gain a more complete picture of the gene and genome duplications that enabled evolutionary innovations.
Nearly a decade ago, Wong organized private funding through the Somekh Family Foundation as well as support from the Government of Alberta and a sequencing commitment from BGI in Shenzhen, China, to launch 1KP. Once the project was operational, additional resources came from other ongoing projects, including iPlant (now CyVerse), a national project providing computational infrastructure and data science training for life sciences research funded by the US National Science Foundation and housed at the University of Arizona.
The massive scope of the project demanded the development and refinement of new computational tools for sequence assembly and phylogenetic analysis. The research team behind the decade-long project used supercomputers to process the genetic sequences from plant samples and map the data onto more than half a million "family trees" showing the relationships among gene families.
Barker said the University of Arizona stood out within the collaboration because several undergraduate students were at the core of the project, most notably Thomas Kidder, Sally Galuska and Chris Reardon, all of whom graduated with degrees in bioinformatics. They worked closely with Zheng Li, a doctoral student in Barker's lab, to analyze hundreds of thousands of gene trees.
"It would be difficult to do these analyses anywhere else," Barker said. "The high-performance computer facilities at the University of Arizona and CyVerse made analyses of this scale possible in the first place."
Read the full article at CyVerse.