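AI has delivered a once-in-a-generation advance in biology: AI can now predict the tertiary structure of proteins.
Protein Folding (the tertiary structure of proteins): AI Is Making a ‘Once in a Generation’ Advance in Biology
Thanks to AI, we just got stunningly powerful tools to decode life (an effort to understand how life works by working out the tertiary structures of proteins).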
SingularityHub / By Shelly Fan -Jul 20, 2021
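On this note blog, my main mission is to quote English-language articles on the latest issues in science, economics, society, and more, annotate the English so that it is easier to read, and offer the result both as an English study tool and as a way to keep up with current information. Of course, I also add my own opinions from time to time.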
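Traditionally, determining a protein's three-dimensional structure meant first purifying the target protein to high purity and then analyzing it with X-rays in a "frozen" state, a process that took anywhere from months to years. The "frozen" state may also differ from the structure the protein naturally adopts, so structure determination came with major difficulties. AI has now broken through those difficulties.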
In two back-to-back (in quick succession) papers last week, scientists at DeepMind and the University of Washington described deep learning-based methods to solve protein folding (the folded structure of a protein, that is, its three-dimensional structure), the last step of executing the programming in our DNA (building the protein, the final stage, from the DNA blueprint or base sequence), and a “once in a generation advance.”
Proteins are the minions (underlings, henchmen/mínjən) of life. They form our bodies, fuel our metabolism, and are the target of most of today’s medicine. They start out as a simple ribbon, translated from DNA, and subsequently fold into intricate (complex/íntrikət) three-dimensional architectures. Similar to Transformers (the shape-shifting robots of the toy and film franchise, not electrical transformers/trænsfɔ́rmər), many protein units further assemble into massive, moving complexes [that change their structure] depending on their functional needs at the moment.
Misfolded proteins can be devastating, causing health problems from sickle cell anemia to cancer and Alzheimer’s disease. One of biology’s grandest challenges for the past 50 years has been deciphering how [a simple one-dimensional ribbon-like structure] turns into [3D shapes, equipped with canyons, ridges, valleys, and caves]. It’s as if an alien is reading the coordinates of hundreds of locations on a map of the Grand Canyon on a notebook, and reconstructing it into a 3D hologram of [the actual thing]—without ever laying eyes on [it] or knowing what [it] should look like.
Yeah. It’s hard. “Lots of people have broken their head on it,” said Dr. John Moult at the University of Maryland.
It’s not just an academic exercise. Solving the human genome paved the way for gene therapy, CAR-T cancer breakthroughs (CAR-T cell therapy was developed to treat refractory cancers that normal immune function alone struggles to eliminate. T cells are taken from the patient and modified with gene therapy techniques so that they can produce a special protein called a CAR (chimeric antigen receptor). CARs are designed to recognize specific antigens expressed on the surface of cancer cells and similar targets and to attack them, and T cells that can produce CARs are called CAR-T cells. In CAR-T therapy, these CAR-T cells are given to the patient to treat the refractory cancer.), and the infamous CRISPR gene editing tool. Deciphering protein folding is bound to illuminate an entire new landscape of biology we haven’t been able to study or manipulate. The fast and furious development of Covid-19 vaccines relied on scientists parsing (analyzing/pɑ́rs) multiple protein targets on the virus, including the spike proteins that vaccines target. Many proteins that lead to cancer have so far been out of the reach of drugs because their structure is hard to pin down (determine precisely).
With these new AI tools, scientists could solve haunting medical mysteries while preparing to tackle those yet unknown. It sets the stage for better understanding our biology, informing new medicines, and even inspiring synthetic biology down the line.
“What the DeepMind team has managed to achieve is fantastic and will change the future of structural biology and protein research,” said Dr. Janet Thornton, director emeritus of the European Bioinformatics Institute.
“I never thought I’d see this in my lifetime,” added Moult.
Birth of a Protein
Picture life as a video game. If DNA is the background base code, then proteins are its execution: the actual game that you play. Any bugs in DNA could trigger a crash in the program, but they could also be benign (harmless, safe/bináin) and allow the game to run as usual. In other words, most modern medicine, like gamers, cares only about the final gameplay (the proteins) rather than the source code that leads to it, unless something goes wrong. From diabetes medication to anti-depressants and potentially life-extending senolytics (an anti-aging therapy that uses drugs to clear aged, toxic cells from the body), these drugs all work by grabbing onto proteins rather than DNA.
It’s why deciphering protein structure is so important: like a key to a lock, a drug can only dock onto a protein at specific spots. Similarly, proteins often tag-team by binding together into a complex to run your body’s functions—say, forming a memory or triggering an immune attack against a virus.
Proteins are made of building blocks called amino acids, which are in turn programmed by DNA. Similar to the Rosetta Stone, our cells can easily translate DNA code into protein building blocks inside a clam-shell-like structure (the ribosome), which spits out a one-dimensional string of amino acids. These ribbons are then shuffled through a whole cellular infrastructure that allows the protein to fold into its final structure.
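As a rough illustration of the translation step described above, the sketch below turns a DNA coding sequence into a string of one-letter amino acid codes using the standard genetic code. It is a simplification and not how the cell actually does it: it reads the coding strand directly (skipping the RNA intermediate), and the function name `translate` and the sample sequence are made up for this example.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reading starts at the first base and stops at the first stop codon
    or when fewer than three bases remain.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the ribbon is finished
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical 4-codon sequence: Met, Ala, Phe, stop.
    print(translate("ATGGCCTTTTAA"))
```

The output is exactly the kind of one-dimensional ribbon the article talks about: a plain string of letters that carries no explicit information about the final 3D shape.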
Back in the 1970s, the Nobel Prize winner Dr. Christian Anfinsen famously asserted that the one-dimensional sequence itself can computationally predict a protein’s 3D structure. The problem is time and power: like trying to hack a password with hundreds of characters suspended in 3D space, the potential solutions are astronomical.
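To get a feel for why the number of potential solutions is astronomical, here is a back-of-the-envelope sketch. The numbers are illustrative assumptions, not figures from the article: 3 possible conformations per amino acid, a 100-residue protein, and a hypothetical machine that tests 10^13 conformations per second.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformations(residues, states_per_residue=3):
    """Number of shapes if every residue independently picks one state."""
    return states_per_residue ** residues


def brute_force_years(residues, states_per_residue=3, tests_per_second=1e13):
    """Years needed to try every conformation at the assumed test rate."""
    total = conformations(residues, states_per_residue)
    return total / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    n = 100  # assumed protein length, for illustration only
    print(f"{conformations(n):.2e} possible conformations")
    print(f"{brute_force_years(n):.2e} years to try them all")
```

Even under these generous assumptions, exhaustive search is hopeless, which is why a pattern-finding shortcut matters.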
But we now have a tool that beats humans at finding patterns: machine learning.
Enter AI
In 2020, DeepMind shocked the entire field with its entry into a legacy biennial (held every other year/baiéniəl) competition. Dubbed CASP (Critical Assessment of Protein Structure Prediction), the decades-long test uses traditional lab methods for determining protein structure as its baseline to judge prediction algorithms.
The baseline’s hard to get. It relies on laborious experimental techniques that can take months or even years. These methods often “freeze” a protein and map its internal structure down to the atomic level using X-rays. Many proteins can’t be treated this way without losing their natural structure, but the method is the best we currently have. Predictions are then compared to this gold standard to judge the underlying algorithm.
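One simple way to compare a predicted structure against an experimental one is the root-mean-square deviation (RMSD) between matching atoms. The sketch below is only a stand-in for the idea of scoring against a gold standard; it is not CASP's actual scoring procedure, and it assumes the two structures have already been superimposed and list the same atoms in the same order. The coordinates are invented for the example.

```python
import math


def rmsd(predicted, reference):
    """Root-mean-square deviation between two equally long coordinate lists.

    Each structure is a list of (x, y, z) tuples, assumed to be already
    superimposed and in the same atom order.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("structures must be non-empty and the same length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    # Hypothetical coordinates in angstroms, three atoms each.
    experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    prediction = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
    print(f"RMSD: {rmsd(prediction, experimental):.2f}")
```

A lower value means the prediction sits closer to the experimental structure; a perfect match gives zero.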
Last year DeepMind stunned (astonished/stʌ́n) everyone with their AI, blowing the competition out of the water. At the time, they were a tease (someone who tantalizes by holding back/tiz), revealing little detail about their “incredibly exciting” method that matched experimental results in accuracy. But the 30-minute presentation inspired Dr. Minkyung Baek at the University of Washington to develop her own approach.
Baek used a similar deep learning strategy, outlined in a paper in Science this week. The tool, RoseTTAFold, simultaneously considers three levels of patterns. The first looks at the amino acid building blocks of a protein and compares them to all the other sequences in a protein database.
The tool next examines how one of the protein’s amino acids interacts with another within the same protein, for example, by examining the distance between two distant building blocks. It’s like looking at your hands and feet fully stretched out versus in a backbend (bending backward), and measuring the distance between those extremities as you “fold” into a yoga pose.
Finally, the third track looks at the 3D coordinates (positions in space/kouɔ́ːrdənit) of each atom that makes up a protein building block, kind of like mapping the studs on a Lego block, to compile the final 3D structure. The network then bounces back and forth (goes to and fro repeatedly) between these tracks, so that one output can update another track.
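The link between the second and third tracks, distances between building blocks on one side and 3D coordinates on the other, can be made concrete with a small sketch. This is not RoseTTAFold's code; it only shows how a table of pairwise distances follows from a set of coordinates. The function names, the 8-angstrom contact cutoff, the minimum sequence separation, and the toy hairpin coordinates are all assumptions made for illustration.

```python
import math


def distance_map(coords):
    """Matrix of distances between every pair of building blocks."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def long_range_contacts(coords, max_distance=8.0, min_separation=3):
    """Pairs far apart along the chain but close together in space."""
    dmap = distance_map(coords)
    n = len(coords)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if dmap[i][j] <= max_distance
    ]


if __name__ == "__main__":
    # Toy chain of six building blocks folded back on itself like a hairpin.
    hairpin = [
        (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
        (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
    ]
    print(long_range_contacts(hairpin))
```

In the yoga analogy, the first and last blocks of the hairpin are the hands and feet: far apart when the chain is stretched out, close together once it is folded.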
The end results came close to those of DeepMind’s tool, AlphaFold2, which matched the gold standard of structures obtained from experiments. Although RoseTTAFold wasn’t as accurate as AlphaFold2, it seemingly required much less time and energy. For a simple protein, the algorithm was able to solve the structure using a gaming computer in about 10 minutes.
RoseTTAFold was also able to tackle the “protein assembly” problem (the problem of predicting the structure of protein complexes), in that (in the sense that) it could predict the structure of proteins made up of multiple units by looking at the amino acid sequence alone. For example, the team was able to predict how the structure of an immune molecule locks onto its target. Many biological functions rely on these handshakes between proteins. Being able to predict them using an algorithm opens the door to manipulating biological processes (immune system, stroke, cancer, brain function) that we previously couldn’t access.
Hacking the Body
Since RoseTTAFold’s public release in July, it’s been downloaded hundreds of times, allowing other researchers to answer their baffling (puzzling, perplexing/bǽfliŋ) protein sequence questions, potentially saving years of work while collectively improving on the algorithm.
“When there’s a breakthrough like this, two years later, everyone is doing it as well if not better than before,” said Moult.
Meanwhile, DeepMind is also releasing their AlphaFold2 code—the one that inspired Baek.
In a new paper in Nature, the DeepMind team described their approach to the 50-year mystery. The crux (the most important point, the hard part/krʌ́ks) was to integrate multiple sources of information, namely [the evolution of a protein and its physical and geometric constraints (restrictions, limits/kənstréint)], to build a two-step system that maps out a given protein with stunningly high accuracy.
The method was first presented at the CASP meeting, and Dr. Demis Hassabis, founder and CEO of DeepMind, is now ready to share the code with the world. “We pledged to share our methods and provide broad, free access to the scientific community. Today we take the first step towards delivering on that commitment by sharing AlphaFold’s open-source code and publishing the system’s full methodology,” he wrote, adding that “we’re excited to see [what other new avenues of research] this will enable for the community.”
With the two studies (AlphaFold and RoseTTAFold), we’re entering a new world of predicting—and subsequently engineering or changing—the building blocks of life. Dr. Andrei Lupas, an evolutionary biologist at the Max Planck Institute for Developmental Biology, and a CASP judge, agrees: “This will change medicine. It will change research,” he said. “It will change bioengineering. It will change everything.”