
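SJTU teams led by Xiang Xiao and Liang Hong use high-throughput protein structure prediction to reveal the metabolic features of the common ancestor of deep-sea hydrothermal archaea and bacteria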

December 2022

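Recently, Prof. Xiang Xiao's group (School of Life Sciences and Biotechnology / State Key Laboratory of Microbial Metabolism, Shanghai Jiao Tong University) and Prof. Liang Hong's group (Institute of Natural Sciences) jointly studied a hyperthermophilic archaeon and a thermophilic bacterium isolated from deep-sea hydrothermal vents, both of which occupy ancient evolutionary positions and share distinctive metabolic traits. Using an in-house improved version of the deep learning tool AlphaFold2 together with neutron scattering experiments, the teams applied a high-throughput structural proteome to the study of early-life metabolism for the first time and revealed the metabolic features of the common ancestor of deep-sea hydrothermal archaea and bacteria. The work, entitled "Proteome-wide 3D structure prediction provides insights into the ancestral metabolism of ancient archaea and bacteria", was published in Nature Communications. Assistant Researcher Weishu Zhao and undergraduate student Bozitao Zhong, both of the School of Life Sciences and Biotechnology, are co-first authors; Prof. Xiang Xiao (School of Life Sciences and Biotechnology) and Prof. Liang Hong (Institute of Natural Sciences) are co-corresponding authors.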

On Dec. 22nd, 2022, Nature Communications published a joint research article entitled “Proteome-wide 3D structure prediction provides insights into the ancestral metabolism of ancient archaea and bacteria” from Prof. Xiang Xiao’s group (State Key Laboratory of Microbial Metabolism/School of Life Sciences and Biotechnology) and Prof. Liang Hong’s group (Institute of Natural Sciences). The co-first authors are Dr. Weishu Zhao (Assistant Professor, School of Life Sciences and Biotechnology) and Bozitao Zhong (undergraduate student, Institute of Natural Sciences). The corresponding authors are Prof. Xiang Xiao and Prof. Liang Hong.

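Tracing back from extant life to reconstruct evolutionary history and early life forms is an important route to studying the origin and evolution of life. Researchers usually infer the metabolic features of early life by comparing genomes, but highly variable genomes often fail to reveal evolutionary paths and patterns clearly. This study took a different route: using high-throughput AlphaFold2-based deep learning, it predicted the 3D structures of nearly 10,000 proteins and introduced the structural proteome. By comparing sequence, structure and function in representative strains of an ancient hyperthermophilic archaeon and a thermophilic bacterium from deep-sea hydrothermal vents, which have similar metabolic capabilities, the study reconstructed the metabolic features of the archaeal-bacterial common ancestor. Surprisingly, the metabolic modules whose structures are conserved between archaea and bacteria closely match experimentally verified prebiotic chemical processes, offering a new perspective and a new method for studying the origin of life. In addition, neutron scattering experiments quantified the flexibility coefficients of the species studied, establishing a relationship between macroscopic flexibility and heat tolerance in archaea and bacteria with different growth temperature ranges.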

Tracing common ancestors back from modern cells is an indispensable approach for investigating the origin and evolution of early life on Earth. During the past two decades, many studies have proposed ways to reconstruct ancestral cells in silico. With the development of culture-independent high-throughput sequencing and bioinformatics, sequence-based comparative genomic and phylogenomic analyses have become the main avenue for studying the origin and ancestry of life. The main challenge of phylogenomic analysis is determining where the phylogenetic tree should be rooted, and even whether a root exists at all. The latest phylogenetic approaches have rooted archaeal and bacterial trees independently and reconstructed the potential lifestyles of the last archaeal common ancestor (LACA) and the last bacterial common ancestor (LBCA), respectively, furnishing valuable information on early life. However, the lack of evidence beyond sequence-based conclusions makes ancestral reconstruction controversial, especially for ancestral physiological and metabolic characteristics.

We performed a proteome-wide comparison of 3D protein structures between two representatives of ancient archaeal and bacterial species, using a modified high-throughput AlphaFold2 pipeline developed in-house, to obtain information beyond sequences for exploring the potential ancestor of early archaea and bacteria. One is a hyperthermophilic archaeon, Thermococcus eurythermalis A501 (denoted A501), which belongs to a representative ancient group of archaea; the other is a newly discovered and isolated thermophilic bacterium, Zhurongbacter thermophilus 3DAC (denoted 3DAC), which belongs to a novel phylum-level lineage and provides new insights into an early-diverging bacterial cluster (superfamily Zhurongbacteria) with genomic and physiological features never reported in any other bacteria. Both were isolated, at different locations, from deep-sea hydrothermal vents, which are believed to be one of the cradles of life. Apart from temperature adaptation, the two strains share similar habitats, physiological and metabolic characteristics, and thermal resilience. This avoids a general obstacle to comparing archaea and bacteria, namely that they usually occupy very different ecological niches. The present work decoded nearly 10,000 protein structures and analyzed the structural, sequence and functional differences of these proteins between the two species. Neutron scattering experiments were further used to examine the thermal stability of the two species and showed that both have high structural stability that allows them to survive at high temperatures. Based on whether the structures of a series of enzymes are conserved or variable between A501 and 3DAC, metabolic pathways could be naturally partitioned into distinct modules, which were further extended to and confirmed in 24 other representative archaea and bacteria occupying diverse positions across the phylogenetic tree.
Interestingly, we found that the conserved metabolic modules in central carbon metabolism, dominated by protein structures that are highly conserved between archaea and bacteria, were exactly consistent with protometabolic pathways experimentally confirmed in prebiotic processes. This provides a new perspective for reconstructing the ancestral metabolism that may have been present in the archaeal-bacterial common ancestor (ABCA) and for understanding the origin of metabolism.

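This study selected the hyperthermophilic archaeon Thermococcus eurythermalis A501 and the representative thermophilic bacterium Zhurongbacter thermophilus 3DAC and compared their sequences, structures and functions. Both strains come from deep-sea hydrothermal vents, which are considered one of the cradles of life. Taking both sequence and structural similarity into account, functionally similar protein pairs from the two strains can be divided into three groups: (i) orthologous sequences with similar structures and similar functions; (ii) non-orthologous sequences with similar structures and functions; and (iii) non-orthologous sequences with different structures but similar functions. Some protein pairs have similar structures despite highly divergent gene sequences, further supporting the view that protein structure is more closely linked to function than sequence is. By mapping the protein pairs of each group onto metabolic pathways, we found that protein evolution in the two strains does not occur at the level of individual proteins but in units of metabolic modules. For example, the proteins of the middle half of glycolysis (from DHAP to acetyl-CoA) in both strains all belong to group (i), whereas the proteins on the pathway linking glycolysis to lipid biosynthesis (DHAP to G13P2) all belong to group (ii). Metabolic modules defined in this way can reveal the different origins and evolutionary histories of proteins, something never found in previous sequence-based analyses.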
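To make the three-way grouping concrete, here is a minimal Python sketch. It assumes each functionally matched A501/3DAC pair already carries an orthology call and a structural similarity score between the two predicted models, such as a TM-score. The 0.5 cut-off, the `ProteinPair` and `PairGroup` types and the protein IDs are hypothetical illustrations, not the study's actual criteria or data.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical cut-off: TM-scores above ~0.5 are commonly read as "same fold".
TM_CUTOFF = 0.5


class PairGroup(Enum):
    GROUP_I = "orthologous, similar structure, similar function"
    GROUP_II = "non-orthologous, similar structure, similar function"
    GROUP_III = "non-orthologous, different structure, similar function"
    OTHER = "outside the three groups"


@dataclass(frozen=True)
class ProteinPair:
    archaeal_id: str      # protein from A501 (hypothetical ID format)
    bacterial_id: str     # protein from 3DAC (hypothetical ID format)
    same_function: bool   # e.g. a shared functional annotation
    orthologous: bool     # result of an upstream ortholog search (assumed)
    tm_score: float       # structural similarity of the two predicted models


def classify(pair: ProteinPair, cutoff: float = TM_CUTOFF) -> PairGroup:
    """Assign a functionally matched protein pair to group (i), (ii) or (iii)."""
    if not pair.same_function:
        return PairGroup.OTHER
    similar_structure = pair.tm_score >= cutoff
    if pair.orthologous and similar_structure:
        return PairGroup.GROUP_I
    if not pair.orthologous and similar_structure:
        return PairGroup.GROUP_II
    if not pair.orthologous:
        return PairGroup.GROUP_III
    # Orthologous but structurally divergent pairs are not covered by the three groups.
    return PairGroup.OTHER


# Hypothetical example pairs; the scores are invented for illustration.
pairs = [
    ProteinPair("A501_0001", "3DAC_0001", True, True, 0.82),
    ProteinPair("A501_0002", "3DAC_0002", True, False, 0.71),
    ProteinPair("A501_0003", "3DAC_0003", True, False, 0.28),
]
for p in pairs:
    print(p.archaeal_id, p.bacterial_id, classify(p).name)
```

Group (ii) is the case the text highlights: pairs whose structures match even though a sequence-only orthology search would not link them.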

Powerful deep learning tools such as AlphaFold2 make it possible to derive proteome-wide structures from genome sequences with high accuracy, which provides crucial information beyond genomics alone and bridges the gap between sequence and function. In this study, we found that functionally similar protein pairs between one representative archaeon (A501) and one representative bacterium (3DAC) could be classified into distinct groups according to their similarities and differences in sequence and structure. Interestingly, we identified some protein pairs in the two strains that share similar structures and functions despite low sequence similarity. This finding further supports the view that structure is more closely associated with function than sequence is. By mapping the protein pairs of the different groups onto metabolic pathways, we found that protein evolution in these two strains does not occur at the level of individual proteins but takes place with the metabolic module as its basic unit. Metabolic modules can therefore be used to explore the distinct sources and evolutionary histories of proteins, which previous sequence-based analyses had not revealed.
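The module-level view can be sketched by walking along a pathway and merging consecutive enzymatic steps that fall into the same protein-pair group. In this hypothetical Python sketch, the pathway is simplified to a linear list of (substrate, product, group) steps, several reactions are lumped into single steps, and the group labels for the upstream and downstream segments are assumed from the statement that those segments differ in structure. None of the labels are taken from the paper's data.

```python
from itertools import groupby
from typing import List, NamedTuple


class Step(NamedTuple):
    substrate: str
    product: str
    group: str  # protein-pair group label: "i", "ii" or "iii"


class Module(NamedTuple):
    group: str
    start: str
    end: str
    n_steps: int


def split_into_modules(pathway: List[Step]) -> List[Module]:
    """Merge consecutive steps that share a group label into one module."""
    modules = []
    # groupby only merges *adjacent* items, so each module is a contiguous
    # stretch of the pathway; the same label later on starts a new module.
    for label, run in groupby(pathway, key=lambda step: step.group):
        steps = list(run)
        modules.append(Module(label, steps[0].substrate, steps[-1].product, len(steps)))
    return modules


# Toy glycolysis-like pathway: reactions lumped per step, labels illustrative only.
toy_pathway = [
    Step("glucose", "DHAP", "iii"),       # upstream sugar utilization (assumed label)
    Step("DHAP", "pyruvate", "i"),        # middle half of glycolysis
    Step("pyruvate", "acetyl-CoA", "i"),  # middle half of glycolysis
    Step("acetyl-CoA", "acetate", "iii"), # downstream acetate production (assumed label)
]

for module in split_into_modules(toy_pathway):
    print(module)
```

The toy pathway yields three modules, with the two structurally conserved steps merged into a single DHAP-to-acetyl-CoA module.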

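Figure: Workflow of this study and basic characteristics of A501 (archaeon, red) and 3DAC (bacterium, blue). A high-throughput structure prediction pipeline was established and the structural proteome introduced to compare the sequences, structures and functions of deep-sea hydrothermal archaea and bacteria.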

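Further analysis of conserved protein structures revealed the conserved metabolic modules of the common ancestor of bacteria and archaea (ABCA). These include the middle half of glycolysis (from DHAP to acetyl-CoA), purine and pyrimidine biosynthesis, metabolism of some essential amino acids (i.e., Asp, Glu, Ser, Gly and Thr), biosynthesis of some essential cofactors (i.e., NAD(P)+ and CoA), energy respiration with MBH and MBS, most aminoacyl-tRNA ligases, and some ribosomal proteins. These modules are inferred to have already existed in the ABCA and to have been inherited later by bacteria and archaea.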

These structure-based findings provide new insights into the metabolic organization of ancient archaea, bacteria and their potential common ancestor (ABCA). Metabolic modules in which a series of enzymes have conserved structures shed light on the conserved functions of the ABCA, indicating modules that likely existed in the ABCA: the middle half of glycolysis (from DHAP to acetyl-CoA), purine and pyrimidine biosynthesis, metabolism of some essential amino acids (i.e., Asp, Glu, Ser, Gly and Thr), biosynthesis of some essential cofactors (i.e., NAD(P)+ and CoA), the energetic respiratory system with MBH and MBS, as well as most aminoacyl-tRNA ligases and some ribosomal proteins. In particular, for respiratory complexes, which are usually hard to identify at the sequence level, our results revealed at the structural level that ancient MBH and MBS systems can function in both archaea and bacteria. In contrast, other metabolic modules with different structures, such as the processes upstream (glucose to GAP, for sugar utilization) and downstream (acetyl-CoA to acetate, for acetate production) of glycolysis/gluconeogenesis and the biosynthesis of lipids, were likely absent from the metabolism of the ABCA. This modular assembly of metabolism in modern cells suggests that early life may have involved a hybrid of life and nonlife processes, and that the connections between metabolic modules may have been abiotic.

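Interestingly, this study found that structurally conserved metabolic modules closely overlap with prebiotic chemical processes; for example, the conserved modules of central carbon metabolism agree well with experimentally demonstrated prebiotic synthesis routes. The backward inference from extant life toward the origin of life thus meets the forward path from chemical reactions toward life, which also indirectly indicates that the conserved metabolic modules obtained from predicted protein structures are no coincidence.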

Notably, the metabolic modules of central carbon metabolism whose protein structures are conserved between A501 and 3DAC were surprisingly consistent with protometabolic pathways experimentally confirmed under ‘prebiotically plausible’ conditions. The high overlap between our results and prebiotic chemistry implies that this phenomenon may not be coincidental but rather a combination of necessity and chance, mediated by protein 3D structures under the control of physical and chemical laws. The 3D structures of metabolic enzymes form a bridge between the sequences in modern cells and prebiotic chemical reactions, offering new perspectives on the origin of metabolism.
Figure: Pairwise comparison of protein structures reveals conserved metabolic modules in the archaeal-bacterial common ancestor, which closely overlap with prebiotic processes.

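This study is the first to apply high-throughput protein structure prediction to early-life metabolism. It established the structural proteome as a new approach; found that proteins originate and evolve not in isolation but in units of metabolic modules; used conserved protein structures to reveal the conserved metabolic modules of the common ancestor; and showed for the first time that structurally conserved metabolic modules closely overlap with prebiotic chemical processes. The work offers a new line of research for reconstructing ancestral metabolism and studying the origin of life, and partly demonstrates the considerable potential of high-throughput protein structure prediction for research on the origin and evolution of life.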

Our results demonstrate the importance of characterizing proteome-wide 3D protein structures for understanding evolutionary mechanisms and the origin of life. We found that protein structure serves as an important bridge between genomes and metabolic pathways, since metabolic pathways differ little at the functional level but greatly at the sequence level. The comparison of seven key glycolytic enzymes across 24 archaeal and bacterial strains is a good example: although their sequence similarity is low and irregular, their structures are strongly conserved, pointing to essential functions preserved from ancient life to modern cells. For this reason, we call for extensive use of 3D protein structures, especially proteome-wide structure predictions, as an extension of sequence-based approaches in future research on the origin and evolution of metabolism.

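This study was supported by the National Natural Science Foundation of China (Innovative Research Groups Program, 41921006; Young Scientists Fund, 42106087; General Program, 11974239 and 31630002), the Shanghai Artificial Intelligence Laboratory, the Science and Technology Commission of Shanghai Municipality, a major project of the Shanghai Municipal Education Commission, the Shanghai Pujiang Program (22PJ1406900), the Shanghai Jiao Tong University Interdisciplinary Research Fund (YG 2016QN13), and the "Deep Blue Program" (Oceanic Interdisciplinary Program) of Shanghai Jiao Tong University (SL2021PT103).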

This study was financially supported by the following funding: the National Natural Science Foundation of China (grant numbers 41921006, 42106087, 11974239 and 31630002), the Shanghai Artificial Intelligence Laboratory, the Innovation Program of Shanghai Municipal Education Commission, the Shanghai Jiao Tong University Multidisciplinary Research Fund of Medicine and Engineering (project number YG 2016QN13), the Oceanic Interdisciplinary Program of Shanghai Jiao Tong University (project number SL2021PT103), and the Shanghai Pujiang Program (grant number 22PJ1406900).