AI solves the protein folding problem
A tough challenge
One of the longest-standing problems in the life sciences has been protein folding.
For starters, folding is the process by which a protein goes from a linear chain of amino acids to a three-dimensional working unit, able to carry out all sorts of tasks in living cells.
This is a key problem not only because proteins are the fundamental building blocks of life, but also because misfolding (i.e. errors that occur while building the 3D structure) is at the heart of many human and animal diseases and abnormalities. Moreover, proteins are the target of many drugs (antibiotics, for example), and they are also used in a wide variety of industrial fields (most laundry detergents, for instance, contain enzymes that remove dirt and stains).
In this sense, obtaining the 3D structure of a protein with a fast, reliable and scalable in silico method, such as a predictive algorithm, is much easier than relying on slow and expensive laboratory procedures, such as crystallography (which is, nevertheless, still considered the gold standard for protein structure determination).
For 50 years, 3D structure prediction was a real struggle for bioinformaticians, because no algorithm seemed to produce accurate results. Then Google's DeepMind team came along, and the rules of the game changed.
AlphaFold, a game changer
In 2020, AlphaFold 2 won the 14th Critical Assessment of Structure Prediction (CASP14), a protein structure prediction competition, with unprecedented results.
In this competition, participants are evaluated according to the Global Distance Test (GDT): the score ranges from 0 to 100, with 100 meaning that all amino acids in the predicted 3D structure of the protein are within a threshold distance of where they are supposed to be. Informally, a GDT of around 90 is considered competitive with laboratory procedures: in 2020, AlphaFold 2 got a median score of 92.4 overall, and a median of 87.0 even on the hardest targets (the free-modelling category). Considering that no team's median score on those hardest targets had surpassed 60 in the previous editions of CASP, not even that of the original AlphaFold, that was a strikingly impressive result.
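To make the score a bit more concrete, here is a minimal Python sketch of a GDT-style calculation. It rests on a few assumptions that are not in the text above: it uses the common GDT_TS variant, which averages the fraction of residues falling within 1, 2, 4 and 8 angstroms of their reference position, and it assumes the two structures are already superimposed and given as lists of C-alpha coordinates. The real CASP evaluation also searches over many superpositions to find the best one, which this sketch skips. The function name gdt_ts and the toy coordinates are made up for illustration.

```python
from math import dist

# Distance cutoffs in angstroms for the GDT_TS variant (assumption: the
# article only mentions "a threshold distance" without naming the cutoffs).
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted, reference, cutoffs=CUTOFFS):
    """Return a GDT_TS-style score (0-100) for two pre-aligned C-alpha traces.

    Simplification: no search over superpositions is performed, so both
    coordinate lists must already be in the same frame of reference.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Per-residue deviation between predicted and experimental positions.
    deviations = [dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff, then averaged over the cutoffs.
    fractions = [
        sum(d <= cutoff for d in deviations) / len(deviations)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Toy example: four residues, displaced by 0.5, 1.5, 3.0 and 10.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

score = gdt_ts(predicted, reference)
print(score)  # 56.25
```

In the toy example, one residue out of four is within 1 angstrom, two are within 2, and three are within both 4 and 8, so the score is the average of 25, 50, 75 and 75. A perfect prediction would score 100 at every cutoff.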
Now AlphaFold 2 can be used as a research tool by everyone: you can either run it in Colab or search the AlphaFold database, which contains more than 200 million predicted structures. If you browse it long enough, you may even come across a protein structure that no one has ever seen!
So, now comes the big question: should we trust AI and rely on it for a task as delicate and critical as protein structure prediction? Share your thoughts in the comments below!