Topics in Computational
Computer Science 88/188
Our goal is to look at some algorithmic problems related to
three-dimensional structures in chemistry and molecular biology,
emphasizing the perspective of geometric algorithms. We hope to
consider a variety of topics (guided by the interests of the
participants), and to make the seminar interesting to people with as
wide a range of backgrounds as possible. Some of the topics we may
cover include: Protein and RNA Folding, Distance Geometry and
Assignment for Protein NMR, DNA arrays, the Phase Problem in X-ray
Crystallography, Rational Drug Design, Molecular Docking, identifying
structural domains and motifs in proteins, and conformational search.
"Strictly speaking, molecular biology is not a new discipline, but
rather a new way of looking at organisms as reservoirs and
transmitters of information. This new vision opened up possibilities
of action and intervention that were revealed during the growth of
- Michel Morange,
"A History of Molecular Biology," Harvard
University Press (1998).
The CS-Bio seminar is open to graduate students and to advanced
undergraduates with a background in both algorithms and systems (at
least CS 25 and CS 23). A background in biology is useful but not
required. Students should be interested in doing some outside reading
in biochemistry and biophysics. Students will be required to present
papers in the seminar, and to do a project. Non-CS students (e.g., in
biology and chemistry) with an interest in computational issues are
invited as well; please speak with me about your
background first though.
If you took my previous CS-Bio seminar in
1998, I estimate that the papers we will read will have about 20%
overlap. I plan for us to read a largely different corpus; for
example, we may read several papers on structural genomics, and
papers on mass spectrometry for functional genomics.
How to Give a Good Talk
If you are scheduled to give a talk, I've prepared a set of hints
for giving a good talk
that I encourage you to look over.
Students will be required to do a project. Pick something in the
general area of computational molecular biology or algorithms for
structural molecular biology you are interested in, and (a) implement
it, (b) analyze it, (c) improve it, (d) extend it, or (e) apply it.
A one-page written project proposal is due on May 9.
Final projects are due on the last day of class. You must (1) hand
in a written report, and (2) make a web page about your project.
- (1) and (2) can be the same document.
- Email me the webpage for your report. E.g.,
- Your final report can be in html or PostScript from LaTeX. If you
want to use another format, ask me first.
- I suggest that your final report contain many
illustrative pictures and figures.
- If you wrote code, I would like to see it. Please include the
code with your writeup, and link to it from your webpage.
Here is a list of recommended textbooks.
*Papers that are not available online (below) have been handed out
*RECOMB papers (Proceedings of the Nth Annual
International Conference on Computational Molecular Biology)
are available online via the
ACM Digital Library.
- March 28
Introduction to Computational Biology and Chemistry.
Do these tasks:
- Read about and download
RasMol on your
machine to be able to view and manipulate biopolymers.
- To read the papers, you will need to be able to download and
print PostScript files
and Adobe PDF files from the WWW. Please familiarize yourself with how
to do this.
"Force field construction is really so arbitrary that if one feels
that Boltzmann-based statistics are inadequate, one can simply add
in any term that appears useful."
Andrew E. Torda,
"Perspectives in Protein-Fold Recognition",
Curr. Opinion in Struct. Bio., 1997, 7:200-205.
- March 30
Presenting: Chris Bailey-Kellogg.
NMR Assignment and Structure
- "NOESY Jigsaw: Automated Protein Secondary Structure and Main-Chain
Assignment from Sparse, Unassigned NMR Data," (C. Bailey-Kellogg,
A. Widge, J. Kelley, M.
Berardi, J. Bushweller, and B. R. Donald), The Fourth Annual
International Conference on Computational Molecular Biology
(RECOMB'2000), Tokyo, Japan, April 8-11, 2000 pp. 33-45.
- "Automated Analysis of NMR assignments and structures for
proteins," H. Moseley and G. Montelione.
- Background Reading:
- Cavanagh et al, chapter 8.
- Reference: Protein NMR Spectroscopy: Principles and Practice by John
Cavanagh, Arthur G. Palmer III, Wayne Fairbrother (Contributor), Nick Skelton
(Contributor) Hardcover - 587 pages (April 1996) Academic Pr; ISBN:
- Refer to Wüthrich as needed for reference
- Reference: NMR of Proteins and Nucleic Acids by Kurt Wüthrich Hardcover - 320
pages (September 1986) John Wiley & Sons; ISBN: 0471828939
Online Tutorials, Notes, and References on
- April 4
Presenting: Bruce Donald.
Reading: De Novo Protein Design: Fully Automated
Sequence Selection [PDF]
Science (1997) October 3; 278 (5335):82 B. I. Dahiyat and
S. L. Mayo.
- April 6 and April 11
Bruce and Chris away at RECOMB'2000.
- April 13
Presenting: Tim Danford.
- Clustering Gene Expression Patterns,
A. Ben-Dor, Z. Yakhini (RECOMB'99)
- Algorithms for Choosing Differential Gene Expression Experiments,
R. M. Karp, R. Stoughton, K. Y. Yeung (RECOMB'99)
- An Algorithm for Clustering cDNAs for Gene Expression Analysis,
E. Hartuv, A. Schmitt, J. Lange, S. Meier-Ewert, H. Lehrach, R. Shamir
- April 18
Presenting: Cliff Stein.
- "Global Optimum Protein Threading with Gapped Alignment and
Empirical Pair Potentials," Lathrop, R.H. and Smith, T.F.,
J. Mol. Biol. (cover article), 255:641-665, Feb., 1996.
- Cover Picture
- April 20
Presenting: Zack Berke.
Gene chips, Regulatory networks, and sequencing by hybridization
[more slides] (Powerpoint)
- Identifying Gene Regulatory Networks from Experimental Data,
T. Chen, V. Filkov, S. S. Skiena (RECOMB'99)
- On the Power of Universal Bases in Sequencing by Hybridization,
F. P. Preparata, A. M. Frieze, E. Upfal (RECOMB'99)
- April 25
Presenting: Michael Johnson.
- April 27
Presenting: David Wagner.
Distance Geometry (Continued)
- B. Berger, J. Kleinberg, F.T. Leighton. Reconstructing a
Three-Dimensional Model with Arbitrary Errors. Proc. 28th ACM
Symposium on Theory of Computing, 1996. [Postscript].
- May 2
Presenting: Chris Langmead.
Time-Frequency Analysis of Chemical Shift Dynamics in Protein
- Reading: "Time-Frequency Analysis of Chemical Shift Dynamics in Protein
NMR Data" by C. J. Langmead and B. R. Donald.
- May 4
Presenting: Tony Yan.
Threading for NMR Structure Determination by Homology
Y. Xu, D. Xu, O. H. Crawford, J. R. Einstein, E. Serpersu :
Protein Structure Determination using Protein Threading and Sparse
NMR Data (RECOMB'2000).
- May 9
A one-page written project proposal is due today.
Presenting: Hany Farid.
- Book chapter: Projections and least-squares approximations.
(from Gilbert Strang's Linear Algebra and its Applications).
- Book chapter: Minimization or Maximization of functions.
(from Numerical Recipes in C.)
- A new approach to the design of uniquely folded thermally stable
proteins, by X. Jiang, Hany Farid, E. Pistor, and Ramy Farid
(Protein Science, 9:403-416, 2000).
- Background reading:
Prediction and Evaluation of Side-chain Conformations for Protein Backbone Structures,
P.S. Shenkin, H. Farid and J.S. Fetrow
in Proteins: Structure, Function and Genetics, 26:323-352, 1996
- May 11
Presenting: Jack Kelley.
Introduction to Experimental and Computational Issues in Mass
Spectrometry for Structural and Molecular Biology
- Siuzdak G. The emergence of mass spectrometry in biochemical research. Proc
Natl Acad Sci U S A. 1994 Nov 22;91(24):11290-7. [PDF]
- May 16
Presenting: Andrew Ko.
Mass Spec (continued).
- May 18
Presenting: David Wagner.
Mass Spec (continued)
- P. Pevzner, V. Dancik, C. L. Tang:
Mutation-Tolerant Protein Identification by Mass-Spectrometry
- "A Universal Algorithm for Fast and Automated Charge State
Deconvolution of Electrospray Mass-to-Charge Ratio Spectra", by Zhongqi
Zhang and Alan G. Marshall (Am. Soc. for Mass Spec., Elsevier) 1997.
- May 23
Presenting: Eunnok Sohn.
Algorithms for Mass Spec
Reading: Course notes on
MS. (Gzipped PostScript). Don't forget to read the
- May 25
Presenting: Tim Danford.
RNA Structure Prediction
- R. B. Lyngso, C. N. S. Pedersen :
Pseudoknots in RNA Secondary Structures
- E. Rivas and S. Eddy, "A dynamic programming algorithm for RNA
structure prediction including pseudoknots," Journal of Molecular
Biology, 285:2053-2068 (1999). [PDF]
- "Structure, Stability and Function of RNA Pseudoknots
Involved in Stimulating Ribosomal Frameshifting,"
by David P. Giedroc, Carla A. Theimer, and Paul L. Nixon,
J. Mol. Biol. (2000) 298:167-185. [PDF]
- Tuesday, May 30
Last day of class
Final projects are due today.
Presenting: Elisheva Werner-Reiss (email@example.com).
Topic: Generalized Convolution for Biopolymers
- Greg Chirikjian, "Conformational statistics of macromolecules
using generalized convolution".
- Greg Chirikjian and Y. Wang, "Conformational statistics of stiff
macromolecules as solutions to PDEs on the rotation and motion groups".
Some other papers we
- Whitepaper on Advanced Computational
Structural Genomics (read the long version, not the "lite" version).
- Fast detection of
common substructure in proteins, P. Chew, K. Kedem, J. Kleinberg,
and D. Huttenlocher (RECOMB'99).
- Rick Lathrop Lab
- For Wolfson-Nussinov work on geometric
hashing for protein complexes, take a look at
AMMP is a modern full-featured molecular mechanics, dynamics and
modeling program. It can manipulate both small molecules and
macromolecules including proteins, nucleic acids and other
polymers. In addition to standard features, like numerically stable
molecular dynamics, fast multipole method for including all atoms in
the calculation of long range potentials and robust structural
optimizers, it has a flexible choice of potentials and a simple yet
powerful ability to manipulate molecules and analyze individual energy
terms. One major advantage over many other programs is that it is easy
to introduce non-standard polymer linkages, unusual ligands or
non-standard residues. Adding missing hydrogen atoms and completing
partial structures, which are difficult for many programs, are
straightforward in AMMP.
Read the white paper on Advanced Computational
Computational biology research at Dartmouth.
Donald Lab Papers at
Systems in Molecular Biology (ISMB) (all meetings).
Dartmouth M.D.-Ph.D. Program
Web sites of interest
to structural biologists.
large resource page on computational biology at George Mason University.
large resource page on bioinformatics at the Institut Pasteur.
list of protein folding groups on the web.
WWW Virtual Library page on biomolecules.
The Journal of Computer-Aided Molecular Design
resources and descriptions of problems in Computational Biology.
Related Resources on the World Wide Web
Muscle-Specific Regulation of Transcription: A Catalog of Regulatory
Elements by Laura L. López
and James W. Fickett presents a summary of published information on
muscle-specific transcriptional regulation.
Pedro's BioMolecular Research Tools
is a collection of
WWW links to information and services useful to molecular biologists. It
provides links to molecular biology search and analysis tools;
bibliographic, text, and Web search services; guides and tutorials; and
biological and biochemical journals and newsletters.
The World Wide Web Virtual Library: Biosciences
points to virtual library pages for
Biochemistry and Molecular Biology. Each of these pages
presents a long list of Web resources. The World Wide Web Virtual Library:
Biomolecules covers molecular sequence and structure databases,
metabolic pathway databases, and other lists of Web resources. The World
Wide Web Virtual Library: Biochemistry and Molecular Biology is a list of
resources listed by provider.
Cell & Molecular Biology Online is a
well-organized list of Web resources for cell and molecular biologists. For
each resource, a brief description is provided.
CSUBIOWEB, the California State
University Biological Sciences Web server, provides links to other Web sites
on cell biology and molecular biology.
The Dictionary of Cell Biology (London: Academic Press, 1995) defines transcription, leucine zipper, and
other terms used in this research commentary.
Biotech Life Science Dictionary is a free resource
that defines terms in biochemistry, biotechnology, botany, cell biology, and
genetics, including terms used in this research commentary.
Protein Synthesis is a tutorial on the processes involved in Protein Synthesis, starting from
the genetic information in DNA, through transcription to produce messenger
RNA, and translation of mRNA to a polypeptide. This tutorial is a section
of Principles of Protein Structure Using the Internet, a Birkbeck College
(University of London) accredited Advanced Certificate course.
Reading the Messages in Genes describes
transcription and provides a diagram. This page is a unit of Access
Excellence, a national educational
program sponsored by Genentech that provides high school biology
teachers access to their colleagues, scientists, and critical sources of new
scientific information via the Web.
The MIT Biology Hypertextbook is a Web-based textbook developed for introductory biology courses at MIT. Central
Dogma provides an
illustrated description of the process of transcription.
DNA binding proteins, enhancers, and the control of gene expression describes
transcription and transcription factors. This page was developed by Ronald
R. D. Croy as a component of Course Notes for Molecular Genetics I Lectures.
Control of Gene Expression in Eukaryotes
by Phillip McClean is a tutorial on gene regulation. The Transcription
Complex provides a brief discussion of transcription factors.
The Mechanisms of Gene Regulation are outlined in Microbial Genetics Lecture Notes, developed by L.
S. Pierson III and C. Kennedy for a class at the University of Arizona.
The Wolberger Lab lists
publications of Cynthia Wolberger and her co-workers.
Introduction to the Metazoa
describes the metazoan phyla. This introduction is a chapter of The
Phylogeny of Life, an
online exhibit developed by the University of California Museum of
Protein Zippers describes the leucine zipper and provides an illustration.
Barbara Graves' research is described and selected publications are
listed on the Huntsman Cancer Institute Web page at
the University of Utah.
Some Useful References for the Course
- Introduction to Protein Structure, Branden, C. and Tooze, J. (1991) Garland
Publishing, New York
- Proteins, Creighton, T.E. (1993) 2nd edition, W.H. Freeman & Co., New York
- Principles of Protein Structure, Schulz, G.E. and Schirmer, R.H. (1979) Springer-Verlag, New York
- Protein Structure - New Approaches to Disease and Therapy, Perutz, M. (1992) W.H. Freeman & Co., New York
- Enzyme Structure and Mechanism, Fersht, A.R. (1976) 2nd ed., W.H. Freeman & Co., New York
- Biochemistry, Stryer, L., (1995) 4th edition, W.H. Freeman & Co., New York
- Biochemistry, Voet, D. and Voet, J.G. (1995) 2nd edition, John Wiley & Sons, New York
- Principles of Biochemistry, Zubay, G.L., Parson, W.W. and Vance, D.E. (1995) Wm. C. Brown, Dubuque, Iowa
- Molecular Cell Biology, Darnell, J., Lodish, H. and Baltimore, D. (1995) 3rd edition,
W.H. Freeman & Co., New York
- Molecular Biology of The Cell, Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K. and Watson, J.D.
(1994) 3rd edition, Garland Publishing, New York
BioComputing, for the VSNS-Biocomputing Division Course
Biology, developed by Shane Crotty, MIT
Course/Tutorial on Cell Biology, Mark Dalton, Cray Research
Principles of Biochemistry, Horton, Moran, Ochs, Rawn, Scrimgeour
Help with PDF files
Are you getting "There was a problem processing a page" or
"This file contains information not understood by the viewer"
errors that keep you from viewing a PDF file?
You must upgrade to version 3 of the Adobe Acrobat Reader software.
Version 2 is no longer compatible with the PDFs for this course.