H3ABionet Seminars' Coordinators team

H3ABioNet Seminars' series

The H3ABioNet Seminar co-ordinating team, on behalf of the H3ABioNet Research Working Group, cordially invites you to join us for the February 2017 H3ABioNet seminar under the theme “Variants calling from NGS datasets”.

Seminar Format:
Two seminar talks will be given by three H3ABioNet students on findings from their recent work on variant calling and analysis pipelines for NGS datasets. Each talk will last 20 to 25 minutes and will be followed by 10 to 15 minutes of discussion and Q&A.

Seminar Date: Thursday 16th March 2017
Seminar time: 1pm UTC / 2pm WAT / 3pm CAT / 4pm EAT
URL to join the seminar: https://mconf.sanren.ac.za/webconf/h3abionet_seminars

Variant calling by assembly: complex variants and repeats in populations of complex genomes

Agriculturally important plants have complex genomes. Ploidy levels range from 2 in maize and 4 in soybean up to 12 in sugarcane. In addition to high ploidy, many crop genomes are highly repetitive and contain blocks of duplicated genes and many transposons. To improve the yield and performance of crops through efficient plant breeding and genetic engineering, it is essential to discover the key genetic variants unique to certain genotypes. Standard variant discovery pipelines rely on the construction of a high-quality reference genome for the species of interest, to which short sequencing reads from different samples can be aligned. Genome complexity hinders the construction of a quality reference, which in turn makes detection of large complex variants difficult, because the reference assembly may miss regions containing agriculturally important variants.
    We developed a reference-free variant calling workflow for plant genetics that relies on variant calling by assembly. Reference-free variant calling algorithms have so far been implemented in variant calling pipelines for bacteria and diploid animals, but they are not as well established as reference-based algorithms. They are also more challenging to use, and have not been validated for use in plant genetics. By building a workflow around reference-free variant calling algorithms in plant genetics, we aim to identify key variants that would not have been found otherwise. Specifically, we apply the Cortex-var software to Glycine max.
    Glycine max, commonly known as soybean, is an ideal organism for this study: there are many published variants with which to validate the software, its genome is highly repetitive and duplicated, and it has a ploidy level of 4. In other words, the genome of Glycine max is not so complex as to make our task impossible, yet not so simple as to obviate it. Cortex-var takes short sequencing reads from a number of samples and assembles them simultaneously into de Bruijn graphs, which are then compared to look for divergences along the traversal path. These divergences are potential single nucleotide polymorphisms (SNPs), complex variants, or genomic repeats, and are classified as such by the software. This can be done reference-free. However, if a reference is available, it can be used to see where the sample(s) diverge(s) and to roughly place the variant in a region of the genome. We developed a workflow using the Soybean NAM dataset, and explored a number of ways to call variants using Cortex-var. Using the two different algorithms provided by the software, we have been able to validate known SNPs in soybean and identify potential novel variants.
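
To make the idea of "divergences along the traversal path" concrete, the following is a minimal Python sketch of how a SNP shows up as a "bubble" in a de Bruijn graph built from several samples at once. It is a toy illustration only, not the Cortex-var implementation: the function names (build_graph, walk, find_bubbles), the sample names, and the two short sequences are all hypothetical and chosen so that no (k-1)-mer repeats.

```python
from collections import defaultdict


def build_graph(samples, k):
    """Build one de Bruijn graph from all samples.

    Nodes are (k-1)-mers; each edge records which samples contain it.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for name, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                graph[kmer[:-1]][kmer[1:]].add(name)
    return graph


def walk(graph, node, max_steps):
    """Follow unbranched edges from node and return the nodes visited."""
    path = [node]
    for _ in range(max_steps):
        successors = graph.get(node)
        if not successors or len(successors) != 1:
            break
        node = next(iter(successors))
        path.append(node)
    return path


def find_bubbles(graph, max_steps=50):
    """Report nodes where the path splits in two and later rejoins."""
    bubbles = []
    for start, successors in graph.items():
        if len(successors) != 2:
            continue
        first, second = sorted(successors)
        path_a = walk(graph, first, max_steps)
        path_b = walk(graph, second, max_steps)
        nodes_b = set(path_b)
        shared = [node for node in path_a if node in nodes_b]
        if not shared:
            continue  # the two branches never rejoin within max_steps
        end = shared[0]
        branch_a = path_a[:path_a.index(end)]
        branch_b = path_b[:path_b.index(end)]
        bubbles.append({
            "start": start,
            "end": end,
            "branches": [
                (branch_a, sorted(graph[start][first])),
                (branch_b, sorted(graph[start][second])),
            ],
        })
    return bubbles


# Hypothetical toy input: two samples differing by a single base (A vs C).
toy_samples = {"s1": ["GATTACAGGC"], "s2": ["GATTCCAGGC"]}
for bubble in find_bubbles(build_graph(toy_samples, k=4)):
    print(bubble)
```

In this sketch each branch of a bubble is labelled with the samples that carry it, which is the intuition behind comparing samples within a shared graph. Real data would also require handling of sequencing errors, repeats, reverse complements, and higher ploidy, none of which is attempted here.
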

Variants Calling Optimization

Calling variants in large cohorts can cause variants observed at the level of the individual to be lost during joint genotyping. Besides experimental errors from library preparation, the called variants are subject to biases arising from the choice of software, the configuration of the analysis pipeline, and the individual parameters of each tool used. This report provides insights, from a mathematical point of view and with experimental results, into the effect of parameter configurations in a variant calling pipeline that follows GATK best practices. Computational challenges related to running the pipeline are also highlighted.

Matthew Kendzior is a graduate student at the University of Illinois at Urbana-Champaign (UIUC) working towards his Master of Science in Bioinformatics. Matt had a strong interest in biology from his early school years, and did his undergraduate degree in the Department of Crop Sciences at UIUC. His concentration was in Plant Biotechnology, and it was during this time that he focused his interests further on plant genetics. Matt became passionate about increasing crop productivity in order to feed a growing global population. His current research focuses on computational genomics applied to Glycine max, and he works in the laboratory of Dr. Matthew Hudson.

Junyu Li is an undergraduate student in her senior year at the University of Illinois at Urbana-Champaign, studying molecular biology with a minor in computer science. She joined this project in June 2016 through the SPIN program at the National Center for Supercomputing Applications. Junyu loves to study genomes, brains, algorithms, Haruki Murakami and cats, or just play with them.

Azza Ahmed is a multi-cultural, multi-disciplinary academic: born in Sudan, raised in the UAE, and then taking graduate studies in the UK. She majored in control and instrumentation (electrical engineering) at the University of Khartoum, and then built on that with control theory as applied to humans in her MSc in Cybernetics from the University of Reading. Passionate about learning, she progressed from student to teaching assistant and then lecturer, developing soft and technical skills through voluntary and academic work, projects and specialized courses, both in person and online. She has started her PhD journey with the H3ABioNet node of Sudan, targeting the areas of statistical and mathematical methods and data analysis pipelines in Genome Wide Association Studies (GWAS) and Next-Generation Sequencing (NGS).
 
Copyright © 2017 H3ABioNet, All rights reserved.

