Scientists develop a convolutional neural network to predict 3D genome folding
2020-10-15   Source: Nature

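Katherine S. Pollard's group at the Gladstone Institute of Data Science and Biotechnology has developed Akita, a model that predicts 3D genome folding from DNA sequence. The paper was published in Nature Methods on October 12, 2020.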

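The team presents Akita, a convolutional neural network that accurately predicts genome folding from DNA sequence alone. The representations Akita learns underscore the importance of an orientation-specific grammar for CTCF binding sites. Akita also learns predictive nucleotide-level features of genome folding, revealing effects of nucleotides beyond the core CTCF motif. Once trained, Akita makes rapid in silico predictions. The team shows how it can be used to perform in silico saturation mutagenesis, interpret eQTLs, make predictions for structural variants, and probe species-specific genome folding. Taken together, these results make it possible to decode genome function from sequence through structure.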
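To make the idea of in silico saturation mutagenesis concrete, here is a minimal Python sketch. It is not Akita's code: `toy_score` is a hypothetical stand-in for a trained model (it only counts a short motif and returns one number, whereas a real model would return a predicted contact map), and the function names and example sequence are assumptions made for illustration. The loop substitutes every alternative base at every position and records how much the prediction changes relative to the unmutated sequence.

```python
from typing import Callable, Dict, Tuple

BASES = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model (NOT Akita).

    Counts occurrences of a short motif so the example has something
    to perturb; a real model would predict a contact map instead.
    """
    motif = "CCCTC"
    windows = range(len(seq) - len(motif) + 1)
    return float(sum(seq[i:i + len(motif)] == motif for i in windows))


def saturation_mutagenesis(
    seq: str, predict: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Score every single-base substitution against the reference."""
    reference = predict(seq)
    effects: Dict[Tuple[int, str], float] = {}
    for pos, ref_base in enumerate(seq):
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the unmutated base
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = predict(mutant) - reference
    return effects


# Illustrative sequence containing one copy of the motif.
seq = "AACCCTCGG"
effects = saturation_mutagenesis(seq, toy_score)
```

Each key of `effects` is a `(position, alternative base)` pair. With this toy scorer, substitutions inside the motif lower the score and substitutions outside it leave it unchanged, which mirrors in miniature how such a scan highlights the nucleotides a model depends on.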

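In interphase, the human genome sequence folds in three dimensions into a rich variety of locus-specific contact patterns. Cohesin and CTCF (CCCTC-binding factor) are key regulators: perturbing the levels of either greatly disrupts genome-wide folding, as measured by chromosome conformation capture methods. How a given DNA sequence encodes a particular locus-specific folding pattern remains unknown.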

Appendix: original English abstract

Title: Predicting 3D genome folding from DNA sequence with Akita

Author: Geoff Fudenberg, David R. Kelley, Katherine S. Pollard

Issue&Volume: 2020-10-12

Abstract: In interphase, the human genome sequence folds in three dimensions into a rich variety of locus-specific contact patterns. Cohesin and CTCF (CCCTC-binding factor) are key regulators; perturbing the levels of either greatly disrupts genome-wide folding as assayed by chromosome conformation capture methods. Still, how a given DNA sequence encodes a particular locus-specific folding pattern remains unknown. Here we present a convolutional neural network, Akita, that accurately predicts genome folding from DNA sequence alone. Representations learned by Akita underscore the importance of an orientation-specific grammar for CTCF binding sites. Akita learns predictive nucleotide-level features of genome folding, revealing effects of nucleotides beyond the core CTCF motif. Once trained, Akita enables rapid in silico predictions. Accounting for this, we demonstrate how Akita can be used to perform in silico saturation mutagenesis, interpret eQTLs, make predictions for structural variants and probe species-specific genome folding. Collectively, these results enable decoding genome function from sequence through structure. 

DOI: 10.1038/s41592-020-0958-x

Editor: Xiaoke Robot (小柯机器人)
