Data-driven De Novo Protein Design with Neural Networks and Deep Learning
Speaker

Haiyan Liu (刘海燕)
University of Science and Technology of China

Time
2022-08-11 14:00 ~ 15:00
Venue
Online
Tencent Meeting
  • https://meeting.tencent.com/dm/4av55aVyhMQ9
  • Conference ID: 778-432-278
Abstract
Computational protein design holds great promise for applications ranging from the development of novel therapeutics to the invention of new biocatalysts. Recently, data-driven computational approaches to protein design have emerged as more robust and more efficient than conventional physics-based methods. I will present two data-driven models: SCUBA, which designs protein backbones using neural network energy functions, and ABACUS-R, which designs amino acid sequences for given backbones using deep learning.

The SCUBA energy function is composed of neural networks (NNs) that have been trained to faithfully capture the complex, high-order correlations in the high-dimensional space of backbone conformations. Backbone structures of high designability (meaning that a substantial number of amino acid sequences autonomously fold into these structures) can be obtained, through sampling or optimization, as low-lying minima on the SCUBA energy landscape. We solved several crystal structures of SCUBA-designed de novo proteins. Some of these proteins have overall architectures not yet observed in nature, which demonstrates that SCUBA can facilitate far-reaching exploration of the space of designable backbones.

In the ABACUS-R method, an encoder-decoder network trained with a multi-task learning strategy predicts the side-chain type of a central residue from its 3D local environment. Iteratively applying this encoder-decoder to different central residues of a designable target backbone yields self-consistent overall sequences. In wet-lab experiments examining de novo sequences designed on several natural backbones, ABACUS-R surpassed state-of-the-art energy function-based methods in both success rate and design precision.
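Bio
Haiyan Liu is a professor in the Division of Life Sciences and Medicine at the University of Science and Technology of China (USTC). Liu graduated from USTC, receiving a bachelor's degree in 1990 and a doctorate in 1996, studied at the Laboratory of Physical Chemistry at ETH Zurich in Switzerland, and carried out postdoctoral research at Duke University and the University of North Carolina at Chapel Hill in the United States. Liu has held the current position since 2001. Liu's main research interests are protein design methods and their applications, and computational simulation methods for protein three-dimensional structure and dynamics and their applications. Liu established and experimentally validated ABACUS, a statistical energy method for de novo protein sequence design, and the deep learning method ABACUS-R, as well as SCUBA, a neural network model-based method for de novo protein structure design. Liu also developed simulation techniques including free energy surface calculation for enzyme reactions, enhanced sampling of collective degrees of freedom, and single-reference-state free energy calculation. Liu has published more than one hundred research papers, including papers in Nature and its sister journals, JACS, and PRL. Honors include the Chinese Academy of Sciences Young Scientist Award and the National Natural Science Foundation of China "Distinguished Young Scholars" fund.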