Genomic sequencing analysis in light of Nubeam: method and applications

Date: 2021/11/30 - 2021/11/30

Academic Seminar: Genomic sequencing analysis in light of Nubeam: method and applications

Speaker: Dr. Yongtao Guan

Time: 9:00 a.m.-10:00 a.m., Nov 30th, 2021 (Beijing Time)

Location: via Feishu


We present Nubeam (nucleotide be a matrix), a novel reference-free approach to analyzing short sequencing reads. Nubeam represents nucleotides by matrices, transforms a read into a product of matrices, and assigns a number to each read based on the product matrix. Nubeam capitalizes on the non-commutative property of matrix multiplication, so that different reads are assigned different numbers and similar reads are assigned similar numbers. A sample, which is a collection of reads, thus becomes a collection of numbers that forms an empirical distribution. We demonstrate that the genetic difference between samples can be quantified by the distance between their empirical distributions. Nubeam includes the k-mer method as a special case, but unlike the k-mer method, Nubeam can conveniently account for GC bias and nucleotide quality. As a reference-free approach, Nubeam avoids reference bias and mapping bias and can work with organisms that lack reference genomes. Nubeam is therefore ideal for analyzing datasets from metagenomic whole-genome shotgun (WGS) sequencing, where the number of unmapped reads is substantial. We also briefly discuss other applications of Nubeam, including genetic association, cancer screening, and indel calling.
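The abstract describes the read-to-number idea only at a high level, so the Python sketch below illustrates it under explicit assumptions; it is not Nubeam's actual method. The 2x2 matrices are hypothetical: L and R are the Stern-Brocot generators, each base is encoded as a fixed two-letter word over {L, R}, and the product [[a, b], [c, d]] is summarized by the fraction (a + b) / (c + d). Because these matrices do not commute, "AC" and "CA" receive different numbers, and because every Stern-Brocot path ends at a distinct rational, distinct reads receive distinct numbers. The sample-to-sample distance used here, the two-sample Kolmogorov-Smirnov statistic, is likewise a swapped-in choice; the abstract does not say which distance Nubeam uses.

```python
from bisect import bisect_right
from fractions import Fraction

# Hypothetical 2x2 matrices (NOT Nubeam's published ones). L and R are the
# Stern-Brocot generators; they do not commute, so base order matters.
L = ((1, 0), (1, 1))
R = ((1, 1), (0, 1))


def matmul(x, y):
    """Product of two 2x2 integer matrices stored as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


# Hypothetical encoding: each base is a fixed two-letter word over {L, R}.
BASE = {"A": matmul(L, L), "C": matmul(L, R), "G": matmul(R, L), "T": matmul(R, R)}


def read_number(read):
    """Multiply the base matrices in read order, then map the product
    [[a, b], [c, d]] to the exact fraction (a + b) / (c + d)."""
    m = ((1, 0), (0, 1))  # identity matrix
    for base in read:
        m = matmul(m, BASE[base])
    (a, b), (c, d) = m
    return Fraction(a + b, c + d)


def ks_distance(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of two samples of read numbers."""
    xs, ys = sorted(xs), sorted(ys)
    return max(
        abs(Fraction(bisect_right(xs, p), len(xs))
            - Fraction(bisect_right(ys, p), len(ys)))
        for p in xs + ys
    )


sample1 = [read_number(r) for r in ["ACGT", "ACGA", "TTGC"]]
sample2 = [read_number(r) for r in ["ACGT", "GGCA", "CCTA"]]
print(read_number("AC"), read_number("CA"))  # 2/7 4/7
print(ks_distance(sample1, sample2))
```

Exact fractions keep the example easy to check, but their numerators and denominators grow with read length, so this sketch is meant only to show the order sensitivity of matrix products and the comparison of samples as empirical distributions, not to be an efficient implementation.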


Dr. Guan is a statistical geneticist specializing in developing statistical models and computational methods to analyze genetic and genomic data. He studied probability and stochastic processes, as well as classical population genetics theory, with Professors Steve Krone and Paul Joyce at the University of Idaho (2002-2006). The highlight of his Ph.D. work was the invention of the small-world Markov chain Monte Carlo algorithm and the theoretical study of its polynomial convergence.

Dr. Guan did extended postdoctoral training with Professor Matthew Stephens at the University of Chicago (2006-2010), during which he worked on Bayesian imputation-based association mapping and Bayesian variable selection regression. He moved to Baylor College of Medicine to start his own lab in August 2010. Highlights of his time there include p-values of Bayes factors, strong selection at MHC in Mexicans (via local ancestry analysis), and his collaboration with a diagnostic lab in Beijing, China, to improve the efficacy of noninvasive prenatal screening (NIPS).

In July 2018, his lab moved to the Department of Biostatistics and Bioinformatics at Duke University School of Medicine as part of a tenure promotion. However, his tenure was delayed by an NIH investigation into his collaboration with China, conducted under the directive of the Trump Department of Justice’s “China Initiative”. Although Dr. Guan disclosed his collaboration with China to Duke on multiple occasions, NIH excluded him from all NIH-funded research for unspecified “incomplete disclosure”, and Duke subsequently denied his tenure. Before he closed down his lab, his active research included genetic determinants of health disparities, cancer screening and monitoring, and Nubeam genomic sequencing analysis.