A novel high-accuracy genome assembly method utilizing a high-throughput workflow

Zeng, Qingdong ; Cao, Wenjin ; Xing, Liping ; Qin, Guowei ; Wu, Jianhui ; Nagle, Michael F. ; Xiong, Qin ; Chen, Jinhui ; Yang, Liming ; Bajaj, Prasad ; Chitikineni, Annapurna ; Zhou, Yan ; Yu, Yunxin ; Xu, Jiang ; Nie, Xiaojun ; Huang, Lin ; Liu, Shengjie ; Šafář, Jan ; Šimková, Hana ; Song, Weining ; Guo, Baozhu ; Chen, Shilin ; Doležel, Jaroslav ; Hao, Zhaodong ; Cheng, Qiang ; Liang, Jianguo ; Tang, Jiansong ; Cao, Aizhong ; Wang, Qiang ; Lu, Xiangqian ; Yang, Shouping ; Ma, Hongxiang ; Liu, Jiajie ; Wang, Xiaoting ; Zhang, Hong ; Wang, Zhonghua ; Ji, Wanquan ; Wang, Changfa ; Yuan, Fengping ; Shi, Jisen ; Varshney, Rajeev K. ; Kang, Zhensheng ; Han, Dejun ; Xu, Haibin (2020) A novel high-accuracy genome assembly method utilizing a high-throughput workflow. bioRxiv.

Full text not available from this repository.

Official URL: http://doi.org/10.1101/2020.11.26.400507

Related URL: http://dx.doi.org/10.1101/2020.11.26.400507

Abstract

Across domains of biological research that use genome sequence data, high-quality reference genome sequences are essential for characterizing genetic variation and understanding the genetic basis of phenotypes. However, the construction of genome assemblies for many species is hampered by the complexity of genome organization, especially repetitive and complex sequences, which leads to mis-assemblies and missing regions. Here, we describe a high-throughput, gold-standard genome assembly workflow that uses a large-scale bacterial artificial chromosome (BAC) library with a refined two-step pooling strategy and the Lamp assembler algorithm. This strategy minimizes the laborious processes of physical map construction and clone-by-clone sequencing, enabling inexpensive sequencing of several thousand BAC clones. Applying this strategy to a minimum tiling path BAC clone library for the short arm of chromosome 2D (2DS) of bread wheat, a species with a highly complex and repetitive genome, we correctly assembled 98% of BAC sequences, covering 92.7% of 2DS. We also identified and corrected 48 large mis-assemblies in the reference wheat genome assembly (IWGSC RefSeq v1.0) and filled 92.2% of the gaps in RefSeq v1.0. Our 2DS assembly represents a new benchmark for assembling complex genomes with both high accuracy and efficiency.

Item Type: Article
Source: Copyright of this article belongs to author(s).
ID Code: 124677
Deposited On: 29 Nov 2021 10:33
Last Modified: 29 Nov 2021 10:33
