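Summary of pipeline predictions. a RNA-MuTect-WMN pipeline overview: in the training set (n = 100, green arrows), RNA-MuTect is applied to tumor RNA and matched-normal DNA to yield a list of variants labeled as either somatic or germline. A random forest classifier is then trained, in a 5-fold cross-validation manner, on a set of features collected for each variant. In the test set (orange arrows), three steps are performed: (1) MuTect is applied to tumor RNA without a matched-normal sample to yield a mixed list of somatic and germline variants. (2) The 5 trained models are applied to this variant set and classify each variant as either somatic or germline by majority vote. (3) Finally, the predicted set of variants is further filtered by the RNA-MuTect filtering steps. b Distribution of precision and recall values computed for each sample on the validation (left) and test (right) sets. Box plots show median, 25th, and 75th percentiles. The whiskers extend to the most extreme data points not considered outliers, and the outliers are represented as dots. c Precision as a function of the number of true somatic mutations per sample. d Correlation between the number of predicted somatic mutations and the number of somatic mutations as determined by DNA with a matched-normal DNA sample. e Correlation between the number of predicted somatic mutations and the number of somatic mutations as determined by RNA with a matched-normal DNA sample. f Distribution of precision and recall values on validation (left) and test (right) sets computed for each sample in the lung dataset. Box plots show median, 25th, and 75th percentiles. The whiskers extend to the most extreme data points not considered outliers, and the outliers are represented as dots. g Distribution of precision and recall values on validation (left) and test (right) sets computed for each sample in the colon dataset. Box plots show median, 25th, and 75th percentiles. The whiskers extend to the most extreme data points not considered outliers, and the outliers are represented as dots. Source data are provided as a Source Data file. Credit: Nature Communications (2022). DOI: 10.1038/s41467-022-30753-2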
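The majority-vote step (2) and the per-sample precision and recall shown in panels b, f, and g can be illustrated with a small sketch. This is not the authors' implementation: the function names `majority_vote` and `precision_recall`, the string labels, and the toy predictions are all hypothetical, and the five lists simply stand in for the outputs of the five cross-validation models.

```python
from collections import Counter


def majority_vote(model_predictions):
    """Combine per-model labels for each variant by majority vote.

    model_predictions holds one list per trained model; each inner list
    gives that model's label ("somatic" or "germline") for every variant.
    With an odd number of models and two labels, ties cannot occur.
    """
    n_variants = len(model_predictions[0])
    if any(len(p) != n_variants for p in model_predictions):
        raise ValueError("every model must label the same variants")
    # zip(*...) regroups the labels so each tuple holds all votes for one variant.
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*model_predictions)]


def precision_recall(predicted, truth, positive="somatic"):
    """Precision and recall for the positive class within one sample."""
    pairs = list(zip(predicted, truth))
    tp = sum(p == positive and t == positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example (made-up labels): five models, four variants in one sample.
S, G = "somatic", "germline"
predictions = [
    [S, G, S, G],
    [S, G, G, G],
    [S, S, S, G],
    [G, G, S, S],
    [S, G, G, G],
]
consensus = majority_vote(predictions)
truth = [S, G, G, S]  # hypothetical labels from a matched-normal comparison
precision, recall = precision_recall(consensus, truth)
```

In this toy sample the consensus calls are somatic, germline, somatic, germline, which gives one true positive, one false positive, and one false negative against the assumed truth labels, so precision and recall are both 0.5.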