How to Approach Genetic Sequencing Databases: A Primer
Lily Wang, Ph.D.
Vanderbilt Kennedy Center
Statistics and Methodology Core
Over the past decade, advances in rapid sequencing technology and the completion of genome sequencing for model organisms such as E. coli, mouse, fruit fly and human have led to an enormous accumulation of DNA and protein sequence data in genetic databases. Given this massive amount of sequence data, important goals of modern molecular biology are to understand the structure, function and evolutionary relationships of genes and proteins. However, experimental methods for achieving these goals are time-consuming and expensive, and therefore will not keep pace with the rapid growth of sequence databanks. Regions of protein sequences that change less than the rest of the sequence during evolution (conserved regions) often indicate similarities in structure and function and shared phylogenetic history. Sequence comparison methods, which rest on the principle that similar sequences imply similar structures, have been effective tools for understanding the properties of biological molecules.
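The idea that similar sequences suggest similar structure is often first quantified as percent identity. The short Python sketch below computes it for two sequences that have already been aligned, with gaps written as `-`. It assumes one common convention (identical residues counted over columns where neither sequence has a gap); tools differ in the denominator they use, so treat this as an illustration rather than a standard definition. The function name `percent_identity` and the example alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical pre-aligned pair of protein fragments.
print(percent_identity("HEAGAWGHE-E", "--P-AW-HEAE"))
```

In this example, six columns are gap-free and five of them are identical, giving about 83.3%.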
This talk aims to provide researchers with an overview of the major genetic (nucleotide and protein) sequence databanks and major software tools for sequence alignment and database searching. In addition, I will give an introduction to the statistical and computational issues involved in sequence comparison methods.
1. An overview of major sequence databases
2. Why sequence comparison methods are effective tools for understanding properties of proteins.
3. Methods for alignment of sequences and statistical distributions of alignment scores.
4. Commonly used software tools and general guidelines for conducting sequence analysis.
5. Further research in this area.
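Alignment methods of the kind listed in topic 3 are typically built on dynamic programming. A minimal sketch of the Smith-Waterman local alignment score follows. It assumes a simple scoring scheme (match +2, mismatch -1, linear gap penalty -2) chosen purely for illustration; practical protein searches instead use substitution matrices such as BLOSUM or PAM together with affine gap penalties. The sketch returns only the best score, without the traceback that recovers the alignment itself.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b under a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1] and b[j-1];
    # row 0 and column 0 stay 0, so an alignment may start anywhere.
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap    # a[i-1] aligned to a gap
            left = h[i][j - 1] + gap  # b[j-1] aligned to a gap
            # The 0 floor is what makes the alignment local: a poor prefix is dropped.
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

With these parameters the two example sequences share only the local segment `ACGT`, so the score is 4 matches times 2, or 8; the unrelated flanking residues are excluded rather than penalized.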
1. To give an overview of major sequence databases and software tools for database searching.
2. To introduce the biological background of sequence analysis and to identify the statistical and computational issues involved.
3. To introduce algorithms for sequence alignment and to show how statistical distributions are used to assess the significance of alignment scores.
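For assessing significance, a widely used result for ungapped local alignments is the Karlin-Altschul formula: the expected number of chance alignments scoring at least S between a query of length m and a database of total length n is E = K m n e^(-λS), and, treating the number of such hits as Poisson, the probability of at least one is P = 1 - e^(-E). BLAST applies the same form to gapped alignments using empirically estimated parameters. The sketch below assumes λ and K are already known for the scoring system; the values used are placeholders, not parameters for any real substitution matrix, and the edge-effect corrections that real tools apply to m and n are omitted. The function names are hypothetical.

```python
import math


def karlin_altschul_evalue(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance local alignments scoring at least `score`.

    m: query length, n: total database length (no edge-effect correction).
    lam, k: Karlin-Altschul parameters for the scoring system (assumed known).
    """
    return k * m * n * math.exp(-lam * score)


def evalue_to_pvalue(e: float) -> float:
    """P(at least one chance hit), treating the hit count as Poisson with mean e."""
    return 1.0 - math.exp(-e)


def bit_score(score: float, lam: float, k: float) -> float:
    """Normalized score S' such that E = m * n * 2**(-S')."""
    return (lam * score - math.log(k)) / math.log(2)


# Placeholder parameters for illustration only (not real BLOSUM/PAM values).
LAM, K = 0.3, 0.1
m, n = 250, 1_000_000
for s in (40, 60, 80):
    e = karlin_altschul_evalue(s, m, n, LAM, K)
    print(f"S={s}: E={e:.3g}, P={evalue_to_pvalue(e):.3g}, bits={bit_score(s, LAM, K):.1f}")
```

Because E decays exponentially in S, each increase of 20 in the raw score divides E by e^(20λ), about 403 for the placeholder λ = 0.3. This is why a modest gain in score can turn an unremarkable hit into a significant one.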
Researchers who are currently using genetic sequencing databases.
Researchers who might be interested in exploring the relationship between their current research areas and genetics, in particular biological sequence analysis.
Lily Wang, PhD, Assistant Professor in Biostatistics, is a statistical advisor to the Quantitative Core. Wang has a doctorate in Biostatistics from the