Artificial intelligence research laboratory DeepMind has created the most comprehensive map of human proteins to date using artificial intelligence. The company, a subsidiary of Google’s parent company Alphabet, is releasing the data for free, and some researchers compare the potential impact of the work to that of the Human Genome Project, an international effort to map every human gene.
Proteins are long, complex molecules that perform numerous functions in the body, from building tissue to fighting disease. Their purpose is dictated by their structure, which folds like origami into complex and irregular shapes. Understanding how a protein folds helps explain its function, which in turn helps scientists with a wide range of tasks – from basic research on how the body works to the design of new drugs and treatments.
Previously, determining the structure of a protein relied on expensive and time-consuming experiments. But last year, DeepMind showed it could produce accurate predictions of protein structure using AI software called AlphaFold. Now the company is releasing hundreds of thousands of predictions made by the program to the public.
“I see this as the culmination of DeepMind’s entire lifespan of more than 10 years,” Demis Hassabis, the company’s CEO and co-founder, told The Verge. “From the beginning, this has been our job: to make breakthroughs in artificial intelligence, to test them in games like Go and Atari, [and] to apply them to real-world problems so we can see if we can accelerate scientific breakthroughs and use them for the benefit of humanity.”
Currently, approximately 180,000 protein structures are publicly available, each produced by experimental methods and accessible through the Protein Data Bank. DeepMind is releasing predictions of the structure of about 350,000 proteins across 20 different organisms, including animals such as mice and fruit flies, and bacteria such as E. coli. (There is some overlap between DeepMind’s data and existing protein structures, but how much is difficult to quantify due to the nature of the models.) Most notably, the release includes predictions for 98 percent of all human proteins, approximately 20,000 different structures known collectively as the human proteome. It is not the first public dataset of human proteins, but it is the most comprehensive and accurate.
If they wish, scientists can download the entire human proteome for themselves, says John Jumper, AlphaFold’s technical lead. “There is effectively a HumanProteome.zip; I think it’s about 50 gigabytes in size,” Jumper tells The Verge. “You can put it on a flash drive if you want, even if it wouldn’t do you much good without a computer for analysis!”
Following the launch of this first tranche of data, DeepMind intends to keep expanding its store of protein structures, which will be maintained by Europe’s flagship life sciences laboratory, the European Molecular Biology Laboratory (EMBL). By the end of the year, DeepMind hopes to release predictions for one hundred million protein structures, a dataset that will “shape our understanding of how life works,” says EMBL director general Edith Heard.
The data will be free in perpetuity for both scientific and commercial researchers, Hassabis says. “Anyone can use it for anything,” DeepMind’s CEO said at a news conference. “They just have to credit the people involved in the citation.”
Understanding the structure of a protein is useful to researchers in several fields. The data can help design new drugs, synthesize new enzymes that break down waste materials, and create plants that are resistant to viruses or extreme weather conditions. DeepMind’s protein predictions are already being used in medical research, including studying the workings of SARS-CoV-2, the virus that causes COVID-19.
The new data will accelerate these efforts, but researchers note that translating it into real-world results will still take a long time. “I don’t think it’s going to be something that will change the way patients are treated within a year, but it will certainly have a huge impact on the scientific community,” Marcelo C. Sousa, a professor in the Department of Biochemistry at the University of Colorado, told The Verge.
Researchers will need to get used to having such information at their disposal, says Kathryn Tunyasuvunakool, a senior researcher at DeepMind. “As a biologist, I can confirm that we don’t have a playbook for looking at even 20,000 structures, so this [amount of data] is very unexpected,” Tunyasuvunakool told The Verge. “To be analyzing hundreds of thousands of structures – it’s crazy.”
Notably, DeepMind’s software produces predictions of protein structures rather than experimentally determined models, which means that in some cases more work is needed to verify a structure. DeepMind says it has spent a lot of time building accuracy metrics into its AlphaFold software, which rates how confident it is in each prediction.
Even so, predictions of protein structures are very useful. Determining the structure of a protein by experimental methods is expensive, time-consuming, and relies on a lot of trial and error. This means that even a low-confidence prediction can save researchers years of work by pointing them in the right direction.
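One way to picture how a per-prediction confidence rating might guide follow-up work is the sketch below. It assumes a hypothetical confidence score on a 0 to 100 scale and an arbitrary cutoff of 70; the `Prediction` class, the `needs_verification` function, the protein names, and the scores are all invented for illustration and are not taken from AlphaFold or its database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """A predicted structure with a hypothetical 0-100 confidence score."""
    protein: str
    confidence: float


def needs_verification(predictions, cutoff=70.0):
    """Return predictions scoring below the cutoff, least confident first."""
    flagged = [p for p in predictions if p.confidence < cutoff]
    return sorted(flagged, key=lambda p: p.confidence)


# Made-up example values, not real AlphaFold output.
examples = [
    Prediction("protein_A", 92.5),
    Prediction("protein_B", 48.0),
    Prediction("protein_C", 66.1),
]

for p in needs_verification(examples):
    print(f"{p.protein}: confidence {p.confidence}, check experimentally")
```

Under these assumptions, a lab could spend its experimental budget on the flagged structures first and treat the rest as a starting point for further research.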
Helen Walden, professor of structural biology at the University of Glasgow, tells The Verge that DeepMind’s data will “significantly alleviate” research bottlenecks, but that “doing biochemistry and biological assessment of, for example, drug functions is a laborious, resource-consuming job” that will remain.
Sousa, who has previously used data from AlphaFold in his work, says that for researchers the effect will be felt immediately. “In our collaboration with DeepMind, we had a dataset from a protein sample that we’d had for 10 years, and we had never gotten to the point of developing a model that fit,” he says. “DeepMind agreed to provide us with a structure, and they were able to solve the problem in 15 minutes after we had been sitting on it for 10 years.”
Why protein folding is so difficult
Proteins are made up of chains of amino acids, which come in 20 different varieties in the human body. Because any single protein can consist of hundreds of individual amino acids, each of which can fold and rotate in different directions, the final structure of the molecule has an incredibly large number of possible configurations. One estimate is that a typical protein can be folded in 10^300 ways – that is a 1 followed by 300 zeros.
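The scale of that estimate can be checked with simple arithmetic. The sketch below assumes, purely for illustration, a chain of 300 amino acids in which each one can take about 10 distinct orientations independently of the others. Both figures, the function name `count_configurations`, and the brute-force search rate are assumptions chosen to reproduce the 10^300 estimate, not values from DeepMind.

```python
def count_configurations(chain_length: int, states_per_residue: int) -> int:
    """Count combinations when each amino acid independently picks one orientation."""
    return states_per_residue ** chain_length


# Illustrative assumptions, not measured values.
CHAIN_LENGTH = 300          # amino acids in the chain
STATES_PER_RESIDUE = 10     # orientations available to each amino acid

total = count_configurations(CHAIN_LENGTH, STATES_PER_RESIDUE)
zeros = len(str(total)) - 1  # number of digits after the leading 1

# Hypothetical brute-force search checking a trillion configurations per second.
CHECKS_PER_SECOND = 10 ** 12
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
years_needed = total // (CHECKS_PER_SECOND * SECONDS_PER_YEAR)
years_exponent = len(str(years_needed)) - 1  # order of magnitude in years

print(f"Configurations: 1 followed by {zeros} zeros")
print(f"Years to try them all: on the order of 10^{years_exponent}")
```

Even at that hypothetical rate, checking every configuration would take vastly longer than any practical timescale, which is why structures cannot simply be found by exhaustive search.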
Because proteins are too small to be examined under microscopes, researchers have had to determine their structure indirectly using expensive and complex methods such as nuclear magnetic resonance and X-ray crystallography. The idea of determining the structure of a protein simply by reading a list of the amino acids it contains has long been theorized, but it has proved difficult to achieve, which has led many to describe it as a “grand challenge” of biology.
In recent years, however, computational methods – especially those using artificial intelligence – have suggested that such an analysis is possible. With these techniques, AI systems are trained on datasets of known protein structures and use this knowledge to create their own predictions.
Many groups have been working on this problem for years, but DeepMind’s deep bench of AI talent and access to computing resources allowed it to accelerate progress dramatically. Last year, the company competed in an international protein-folding competition known as CASP and blew away the competition. Its results were so accurate that computational biologist John Moult, one of the founders of CASP, said that “in some sense the problem [of protein folding] is solved.”
DeepMind’s AlphaFold program has been updated since last year’s CASP competition and is now 16 times faster. “We can fold the average protein in minutes, in most cases in seconds,” Hassabis says. The company also released the underlying code for AlphaFold last week as open source so others can build on its work in the future.
Liam McGuffin, a professor at Reading University who developed some of the UK’s leading protein folding software, praised AlphaFold’s technical brilliance, but also noted that the program’s success was based on decades of previous research and public knowledge. “DeepMind has tremendous resources to keep this database up to date, and they have a better chance of doing this than any single academic group,” McGuffin told The Verge. “I think the researchers would have gotten there eventually, but it would have been slower because we don’t have that many resources.”
Why does DeepMind care?
Many scientists The Verge spoke to noted DeepMind’s generosity in releasing this data for free. After all, the lab is owned by Google’s parent company Alphabet, which has poured huge amounts of resources into commercial healthcare projects. DeepMind itself loses a lot of money every year, and there have been several reports of tensions between the company and its parent over issues such as research autonomy and commercial viability.
However, Hassabis tells The Verge that the company always intended to make this information freely available and that doing so fulfills DeepMind’s founding ethos. He emphasizes that DeepMind’s work is used in many places at Google – “almost anything you use, there’s some of our technology that’s part of that under the hood” – but that the company’s primary goal has always been basic research.
“The agreement when we were acquired is that we’re here primarily to advance the state of AGI and artificial intelligence technologies and then use them to accelerate scientific breakthroughs,” Hassabis says. “[Alphabet] has a lot of divisions focused on making money,” he adds, pointing out that DeepMind’s focus on research “brings all sorts of benefits in terms of prestige and goodwill for the scientific community. There are many ways value can be attained.”
Hassabis predicts that AlphaFold is a sign of things to come – a project that shows the enormous potential of artificial intelligence to deal with messy problems such as human biology.
“I think we’re in a really exciting moment,” he says. “Over the next decade, we and other AI players hope to produce amazing breakthroughs that will really accelerate solutions to the really big problems we have here on earth.”