Scientists use artificial intelligence to dream up revolutionary new proteins

AI tools are helping scientists create proteins that are unlike anything in nature. Credit: Ian C. Haydon/UW Institute for Protein Design

In June, regulators in South Korea authorized the first-ever medicine, a COVID-19 vaccine, to be made from a novel protein designed by humans. The vaccine is based on a globular protein ‘nanoparticle’ that was created by researchers nearly a decade ago through a labor-intensive trial-and-error process1.

Now, thanks to massive advances in artificial intelligence (AI), a team led by David Baker, a biochemist at the University of Washington (UW) in Seattle, has reported in Science2,3 that it can design such molecules in seconds instead of months.

These efforts are part of a sea change in science, as life scientists embrace AI tools such as AlphaFold, DeepMind’s protein-structure-prediction program. In July, DeepMind revealed that the latest version of AlphaFold had predicted structures for every protein known to science. Recent months have seen huge growth in AI tools, some of them based on AlphaFold, that can quickly dream up completely new proteins. Previously, this was a painstaking endeavor with high failure rates.

“Since AlphaFold, there has been a shift in the way we work with protein design,” says Noelia Ferruz, a computational biologist at the University of Girona, Spain. “We are seeing very exciting times.”

Most efforts focus on tools that can help make original proteins, unlike anything in nature, without focusing so much on what these molecules can do. But researchers — and a growing number of companies applying artificial intelligence to protein design — want to design proteins that can do useful things, from cleaning up toxic waste to treating disease. Among the companies working towards this goal are DeepMind in London and Meta (formerly Facebook) in Menlo Park, California.

“The methods are really effective. They will become more powerful,” Baker says. “The question is what problems you are going to solve with them.”

From scratch

Baker’s lab has spent the past three decades making new proteins. A program called Rosetta, which his lab began developing in the 1990s, breaks the process down into steps. At first, the researchers envisioned a shape for a new protein, often by piecing together bits of other proteins, and the software deduced a sequence of amino acids that matched that shape.

But these “first draft” proteins rarely folded into the desired shape when made in the lab, and instead ended up stuck in various conformations. So another step was needed to modify the protein sequence so that it folded into only one desired structure. This step, which involved simulating all the ways in which the different sequences could fold, was computationally expensive, says Sergey Ovchinnikov, an evolutionary biologist at Harvard University in Cambridge, Massachusetts, who used to work in Baker’s lab. “You would literally have, like, 10,000 computers running for weeks to do that.”

Thanks to tweaks to AlphaFold and other AI software, this time-consuming step has become instantaneous, says Ovchinnikov. In one approach developed by Baker’s team, called hallucination, researchers feed random amino acid sequences into a structure-prediction network; the sequence is then altered so that its predicted structure becomes more protein-like, as judged by the network’s predictions. In a 2021 paper, Baker’s team created more than 100 small “hallucinated” proteins in the lab and found signs that roughly a fifth resembled the predicted shape4.
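The hallucination loop described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_confidence` is a hypothetical stand-in for a structure-prediction network's confidence score and has no biological validity, and the greedy accept-if-not-worse rule is a simple substitute for whatever optimization scheme Baker's team actually used.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
HYDROPHOBIC = set("AVILMFWC")


def toy_confidence(seq: str) -> float:
    """Hypothetical stand-in for a predictor's confidence score (0 to 1).

    It simply rewards alternation between hydrophobic and polar residues.
    A real pipeline would call a structure-prediction network here.
    """
    alternations = sum(
        (a in HYDROPHOBIC) != (b in HYDROPHOBIC) for a, b in zip(seq, seq[1:])
    )
    return alternations / (len(seq) - 1)


def hallucinate(length=30, steps=500, seed=0, score_fn=toy_confidence):
    """Start from a random sequence and keep mutations that do not lower the score."""
    rng = random.Random(seed)
    seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    start = seq
    best = score_fn(seq)
    for _ in range(steps):
        pos = rng.randrange(length)
        # Propose a single-residue mutation at a random position.
        candidate = seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]
        score = score_fn(candidate)
        if score >= best:  # greedy acceptance: never accept a worse sequence
            seq, best = candidate, score
    return start, seq, best


if __name__ == "__main__":
    start, final, score = hallucinate()
    print("start:", start, round(toy_confidence(start), 2))
    print("final:", final, round(score, 2))
```

The point of the sketch is the shape of the loop: propose a change to the sequence, ask the scoring function how confident it is, and keep the change only if confidence does not drop.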

AlphaFold and a similar tool developed by the Baker lab, called RoseTTAFold, were trained to predict the structure of individual protein chains. But researchers soon discovered that such networks could also model assemblies of multiple interacting proteins. On this basis, Baker and his team were confident that they could hallucinate proteins that would self-assemble into nanoparticles of various shapes and sizes. These would consist of many copies of a single protein and would be similar to those on which the COVID-19 vaccine is based.

How to design a protein: a graphic showing four techniques for designing new protein structures or sequences using artificial intelligence.

Nik Spencer/Nature; Source: adapted from N. Ferruz et al. Preprint at bioRxiv (2022); and C. Wang et al. Science 377, 387–394 (2022).

But when they instructed microorganisms to make their creations in the lab, none of the 150 designs worked. “They didn’t fold at all: They were just sloppy stuff at the bottom of the test tube,” Baker says.

At about the same time, another researcher in the lab, machine learning scientist Justas Dauparas, was developing a deep learning tool to tackle what’s known as the inverse folding problem: determining a protein sequence that corresponds to a given protein’s overall shape3. The network, called ProteinMPNN, can act as a “spellcheck” for designed proteins created with AlphaFold and other tools, Ovchinnikov says, by modifying the sequences while preserving the overall shape of the molecules.

When Baker and his team applied this second network to their hallucinated protein nanoparticles, they had much more success making the molecules experimentally. The researchers determined the structures of 30 new proteins using cryo-electron microscopy and other experimental techniques, and 27 of them matched the AI-led designs2. The team’s creations included giant rings with complex symmetries, unlike anything found in nature. In theory, this approach could be used to design nanoparticles that conform to nearly any symmetric shape, says Lukas Milles, a biophysicist who co-led the effort. “It’s exciting to see what these networks can do.”

The deep learning revolution

Arne Elofsson, a computational biologist at Stockholm University, says deep learning tools like ProteinMPNN have been a game changer for protein design. “You draw your protein, you hit a button, and you get something that works one time in ten.” Higher success rates can be achieved by combining multiple neural networks to handle different parts of the design process, as Baker’s team did in designing the nanoparticles. “Now we have complete control over the shape of the protein,” says Ovchinnikov.

Baker’s isn’t the only lab applying artificial intelligence to protein design. In a review paper posted on bioRxiv this month, Ferruz and her colleagues counted more than 40 AI protein design tools developed in recent years, using various methods5 (see “How to design a protein”).

Many of these tools, including ProteinMPNN, address the inverse folding problem: they identify a sequence corresponding to a given structure, often using methods borrowed from image-recognition tools. Others rely on an architecture similar to that of language neural networks such as GPT-3, which produces human-like text; instead of text, however, these tools produce new protein sequences. “These networks are able to ‘talk’ about proteins,” says Ferruz, who co-developed one of these networks6.

With so many protein design tools available, it’s not always clear how best to compare them, says Chloe Hsu, a machine learning researcher at the University of California, Berkeley, who developed an inverse folding network with researchers from Meta7.

Animation of four protein structures predicted by the AlphaFold AI system

Four examples of protein “hallucinations”. In each case, AlphaFold is presented with a random sequence of amino acids and predicts its structure; the sequence is then changed until the program confidently predicts that it will fold into a protein with a well-defined 3D shape. Colors show confidence in the prediction (from red for very low confidence, through yellow and light blue, to dark blue for very high confidence). Initial frames have been slowed down for clarity. Credit: Sergey Ovchinnikov

Many teams measure their network’s ability to accurately identify the sequence of an existing protein from its structure. But this does not apply to all methods, and it is not clear how this metric, known as the recovery rate, applies to the design of new proteins, the scientists say. Ferruz would like to see a protein design competition, similar to the biennial Critical Assessment of Protein Structure Prediction (CASP), in which AlphaFold first demonstrated its superiority over other networks. “It’s a dream. Something like CASP would really push the field forward,” she says.
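As a rough illustration of the recovery rate mentioned above, the sketch below assumes the simplest reading of the metric: the fraction of positions at which a designed sequence has the same amino acid as the native one. The function name and the example sequences are hypothetical, and published benchmarks may compute or average the metric differently.

```python
def recovery_rate(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native residue.

    Assumes both sequences describe the same backbone, so they must be
    the same (non-zero) length and are compared position by position.
    """
    if not native or len(native) != len(designed):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


if __name__ == "__main__":
    native = "MKTAYIAKQR"    # made-up native sequence
    designed = "MKSAYLAKQR"  # made-up design with two substitutions
    print(recovery_rate(native, designed))  # 8 of 10 positions match -> 0.8
```

A brand-new protein has no native sequence to compare against, which is one reason the scientists quoted above question how well this metric carries over to designing new proteins.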

To the wet lab

Baker and his colleagues insist that making a new protein in the lab is the ultimate test of their methods. Their initial failure to make hallucinated protein assemblies shows this. “AlphaFold thought they were great proteins, but they obviously didn’t work in the wet lab,” says Basile Wicky, a biophysicist in Baker’s lab who co-led the effort, along with Baker, Milles and biochemist Alexis Courbet at the University of Washington.

But not all scientists who develop AI tools for protein design have easy access to experimental setups, notes Jinbo Xu, a computational biologist at the Toyota Technological Institute at Chicago in Illinois. Finding a lab to collaborate with can take some time, so Xu is setting up his own lab to put his team’s creations to the test.

Experiments will also be necessary when it comes to designing proteins with specific tasks in mind, Baker says. In July, his team described two artificial intelligence methods that allow researchers to embed a specific sequence or structure into a new protein8. They used these methods to design enzymes that catalyze certain reactions; proteins capable of binding to other molecules; and a protein that can be used in a vaccine against a respiratory virus that is a major cause of hospitalization for children.

Last year, DeepMind launched a subsidiary in London called Isomorphic Labs, which intends to apply artificial intelligence tools like AlphaFold to drug discovery. DeepMind CEO Demis Hassabis says he sees protein design as a clear and promising application of deep learning technology, and of AlphaFold in particular. “We do a lot of protein design. It’s very early days.”