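Recently, Prof. Xiang Xiao's group (School of Life Sciences and Biotechnology / State Key Laboratory of Microbial Metabolism, Shanghai Jiao Tong University) and Prof. Liang Hong's group (Institute of Natural Sciences) collaborated on a study that takes isolated and cultured deep-sea hydrothermal hyperthermophilic archaea and thermophilic bacteria as its starting point. These organisms hold ancient evolutionary positions and share distinctive metabolic features. Using a self-modified version of the deep learning tool AlphaFold2 together with neutron scattering experiments, the team applied the high-throughput protein structurome to the study of early-life metabolism for the first time and revealed the metabolic characteristics of the common ancestor of deep-sea hydrothermal archaea and bacteria. The results were published in the leading international journal Nature Communications under the title “Proteome-wide 3D structure prediction provides insights into the ancestral metabolism of ancient archaea and bacteria”. Weishu Zhao, assistant researcher at the School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, and Bozitao Zhong, an undergraduate student at the School of Life Sciences and Biotechnology, are the co-first authors. Prof. Xiang Xiao (School of Life Sciences and Biotechnology) and Prof. Liang Hong (Institute of Natural Sciences) are the co-corresponding authors.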
On Dec. 22, 2022, Nature Communications published a joint research article entitled “Proteome-wide 3D structure prediction provides insights into the ancestral metabolism of ancient archaea and bacteria” by Prof. Xiang Xiao’s group (State Key Laboratory of Microbial Metabolism/School of Life Sciences and Biotechnology) and Prof. Liang Hong’s group (Institute of Natural Sciences). The co-first authors are Dr. Weishu Zhao (Assistant Professor, School of Life Sciences and Biotechnology) and Bozitao Zhong (undergraduate student, Institute of Natural Sciences). The corresponding authors are Prof. Xiang Xiao and Prof. Liang Hong.
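Tracing back from extant life to reconstruct evolutionary processes and early life forms is an important approach to studying the origin and evolution of life. Researchers usually infer the metabolic characteristics of early life from genome comparisons, but highly variable genomes often fail to reveal evolutionary lineages and patterns clearly. This study takes a different route: using high-throughput AlphaFold2 deep learning, it predicted the 3D structures of nearly 10,000 proteins and introduced the protein structurome into the analysis. For representative strains of ancient hyperthermophilic archaea and thermophilic bacteria from deep-sea hydrothermal vents with similar metabolic functions, it compared sequence, structure, and function and reconstructed the metabolic characteristics of the archaeal-bacterial common ancestor. Strikingly, the metabolic modules whose structures are conserved across archaea and bacteria overlap closely with experimentally verified prebiotic chemical processes, a finding that offers a new perspective and a new method for research on the origin of life. In addition, neutron scattering experiments were used to quantify the flexibility coefficients of the different strains in this study, establishing a relationship between macroscopic flexibility and heat tolerance in archaea and bacteria with different growth temperature ranges.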
Tracing the common ancestors from modern cells appears to be an indispensable approach for investigating the origin and evolution of early life on Earth. During the past two decades, many studies have proposed various ideas to reconstruct ancestral cells in silico. With the development of culture-independent high-throughput sequencing technology and bioinformatics approaches, sequence-based comparative genomic and phylogenomic analyses are the major avenue for studying the origin and ancestry of life. The main challenge of phylogenomic analysis is to identify where the phylogenetic tree should be rooted and even whether the root should exist at all. The latest phylogenetic approaches have rooted archaeal and bacterial trees independently and reconstructed the potential lifestyle of the last archaeal common ancestor (LACA) and the last bacterial common ancestor (LBCA), respectively, furnishing valuable and important information on early life. The lack of additional evidence beyond sequence-based conclusions makes ancestral reconstruction controversial, especially for ancestral physiological and metabolic characteristics.
We performed a proteome-wide comparison of 3D protein structures in two representative ancient archaeal and bacterial species using a self-modified, high-throughput AlphaFold2, which supplies information beyond sequences for exploring the potential ancestor of early archaea and bacteria. One is a hyperthermophilic archaeon, Thermococcus eurythermalis A501 (denoted A501), which belongs to a representative ancient group of archaea. The other is a newly discovered and isolated thermophilic bacterium, Zhurongbacter thermophilus 3DAC (denoted 3DAC), which belongs to a novel phylum-level lineage and provides new insights into an early-diverging bacterial cluster (superfamily Zhurongbacteria) with genomic and physiological features that have never been reported in any other bacteria. Both were isolated, at different locations, from deep-sea hydrothermal vents, which are believed to be one of the cradles of life. Apart from temperature adaptation, the two strains share similar habitats, physiological and metabolic characteristics, and thermal resilience, which avoids the usual problem of low comparability between archaea and bacteria that occupy very different ecological niches. The present work decoded nearly 10,000 protein structures and analyzed the structural, sequence and functional differences of these proteins between the two species. Neutron scattering experiments were then used to examine the thermal stability of the two species and showed that both have significantly stronger structural stability, which allows them to survive at high temperatures. Based on whether the protein structures of a series of enzymes were conserved or variable between A501 and 3DAC, metabolic pathways fell naturally into distinct modules, which were further extended and confirmed in 24 other representative archaea and bacteria with diverse positions across the phylogenetic tree.
Interestingly, we found that the conserved metabolic modules in central carbon metabolism, which are dominated by protein structures highly conserved among archaea and bacteria, were consistent with the protometabolic pathways experimentally confirmed through prebiotic processes. This provides a new perspective for reconstructing the ancestral metabolism that may have been present in the archaeal-bacterial common ancestor (ABCA) and for understanding the origin of metabolism.
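This study selected the hyperthermophilic archaeon Thermococcus eurythermalis A501 and the representative thermophilic bacterium Zhurongbacter thermophilus 3DAC and compared the similarities and differences in their sequences, structures, and functions. Both strains come from deep-sea hydrothermal vents, and deep-sea hydrothermal fields are considered one of the cradles of life. Taking into account both the sequence and the structural similarities and differences of functionally similar protein pairs in the two strains, the pairs can be divided into three groups: (i) orthologous sequences with similar structures and similar functions; (ii) non-orthologous sequences with similar structures and functions; (iii) non-orthologous sequences with different structures but similar functions. Some protein pairs have similar structures but highly divergent gene sequences, which further demonstrates that protein structure is more closely linked to function than sequence is. By mapping the protein pairs of the different groups onto metabolic pathways, we found that protein evolution in the two strains does not occur at the level of individual proteins but takes metabolic modules as its unit. For example, the proteins involved in the middle portion of glycolysis in both strains (from DHAP to acetyl-CoA) all belong to group (i), whereas the proteins on the pathway connecting glycolysis and lipid biosynthesis (DHAP to G13P2) all belong to group (ii). Metabolic modules delineated in this way can be used to reveal the different origins and different evolutionary histories of proteins. This had never been found in previous sequence-based analyses.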
Powerful deep learning tools such as AlphaFold2 make it possible to derive proteome-wide structures from genome sequences with high accuracy, which provides crucial information beyond what genomics alone can offer and bridges the gap between sequence and function. In this study, we found that protein pairs with similar functions in one representative archaeon (A501) and one representative bacterium (3DAC) could be classified into different groups according to their similarities and differences in sequence and structure. Interestingly, we identified some protein pairs that have similar structures and functions in the two strains but low sequence similarity. This finding further confirms that structure is more closely associated with function than sequence is. By mapping the protein pairs of the different groups onto metabolic pathways, we found that the evolution of proteins in these two strains does not occur at the level of individual proteins but rather takes the metabolic module as its basic unit. Metabolic modules can therefore be used to explore the distinct sources and different evolutionary histories of proteins. These findings have not been revealed in previous sequence-based analyses.
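
The grouping of protein pairs described above can be illustrated with a minimal Python sketch. It is not the study's pipeline: the names `ProteinPair`, `classify_pair` and `groups_by_pathway` are hypothetical, the orthology call and the structural similarity score are assumed to come from upstream tools, and the TM-score cutoff of 0.5 (a value commonly read as "same fold") is an assumption, since the article does not state the criterion the authors used. The sketch assigns each functionally similar pair to group (i) orthologous with similar structure, (ii) non-orthologous with similar structure, or (iii) non-orthologous with different structure, and then collects the groups seen in each pathway segment.

```python
from dataclasses import dataclass

# Assumed cutoff: a TM-score of about 0.5 or above is commonly taken to mean
# "same fold". The criterion actually used in the study is not given here.
TM_SCORE_CUTOFF = 0.5


@dataclass(frozen=True)
class ProteinPair:
    """A functionally similar protein pair, one protein from each strain."""
    name: str          # hypothetical label for the pair
    is_ortholog: bool  # orthology call from a sequence-based tool (assumed input)
    tm_score: float    # structural similarity of the two predicted structures


def classify_pair(pair: ProteinPair, cutoff: float = TM_SCORE_CUTOFF) -> str:
    """Return 'i', 'ii' or 'iii' for the three groups, else 'unassigned'."""
    similar_structure = pair.tm_score >= cutoff
    if pair.is_ortholog and similar_structure:
        return "i"    # orthologous sequences, similar structure
    if not pair.is_ortholog and similar_structure:
        return "ii"   # non-orthologous sequences, similar structure
    if not pair.is_ortholog:
        return "iii"  # non-orthologous sequences, different structure
    return "unassigned"  # orthologous but structurally different: not one of the three groups


def groups_by_pathway(assignments):
    """Collect the set of groups observed in each pathway segment.

    `assignments` is an iterable of (pathway_segment, ProteinPair) tuples.
    A segment whose set holds a single group behaves like one module.
    """
    result = {}
    for segment, pair in assignments:
        result.setdefault(segment, set()).add(classify_pair(pair))
    return result


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    demo = [
        ("segment A", ProteinPair("pair-1", True, 0.82)),
        ("segment A", ProteinPair("pair-2", True, 0.77)),
        ("segment B", ProteinPair("pair-3", False, 0.64)),
    ]
    for segment, groups in groups_by_pathway(demo).items():
        print(segment, sorted(groups))
```

Keeping the cutoff as a parameter makes it easy to check how sensitive the module boundaries are to the structural similarity criterion.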