markboguski.net

Home   Education   Employment   Publications   Presentations   Boards   Teaching   Awards

 

Bioinformatics (It’s not clear who first coined this term or when, but it first appeared in the literature in 1994; see Boguski, 1994 and Murray-Rust, 1994.)

GenBank and NCBI

  • The National Center for Biotechnology Information (1990) |PubMed|PDF|

  • GenBank (1994) |PubMed|PDF|
  • GenBank (1996) |PubMed|PDF|
  • GenBank (1997) |PubMed|PDF|
  • GenBank (1998) |PubMed|PDF|
  • GenBank (1999) |PubMed|PDF|
  • Database Divisions and Homology Search Files: A Guide for the Perplexed (1997) |PubMed|PDF|

Algorithms, Databases and Homology Searching

  • Molecular Sequence Databases and Their Uses (1992)
  • dbEST--Database For "Expressed Sequence Tags" (1993) |PubMed|PDF| The brevity of this correspondence stands in inverse proportion to the breadth and depth of its impact on molecular cell biology and genetics.  Few remember today, but in the late 1980s and early 1990s most biologists were either skeptical of or outright opposed to the Human Genome Project for any or all of three reasons: the sequence would be an uninterpretable white elephant; genomics would be mindless, boring science; and the project would divert funding away from conventional “R01 labs” doing the mindful, interesting research (McIntosh and West, 1995).  In this milieu, my colleagues and I at NCBI started the database of Expressed Sequence Tags (dbEST) in 1993 and, by 1996, in the plenary address at the annual Cold Spring Harbor Genome Meeting, Shirley Tilghman described its impact as part of a sea change in biomedical research: “If a conservative is a liberal who has just been mugged…a genome enthusiast is a genome critic who just got a hit in the EST database.”
EST data created a number of unique informatics challenges, both quantitative and qualitative.  At its origin, dbEST contained only 22,537 partial cDNA sequences.  Within two years, it had grown to 60% of the sequence entries in GenBank (Boguski, 1995), more data than GenBank had accumulated during its first 13 years of operation (1982-1995).  Today (GenBank release 163), dbEST contains 48 million sequences, a roughly 2,000-fold increase in size since our paper appeared in 1993.  Although size matters, ESTs also differed qualitatively from traditional GenBank entries (representing functionally cloned genes) in that the data were by nature incomplete, inaccurate and subject to several types of artifacts.
Addressing these problems in a way that made the data useful to biologists required the integration of numerous cutting-edge sequence analysis algorithms and methods (Altschul, Boguski et al., 1995) and resulted in a number of bioinformatics “firsts”, including i) periodic, automated re-annotation of sequence records; ii) sequence filtering and masking to deal with artifacts and natural but confounding sequence features; iii) similarity searching using conceptual, six-frame translations of EST sequences; and iv) creation of non-redundant search databases.  These advances enabled, for the general biomedical community: a) gene discovery (the database “hits” to which Tilghman referred); b) gene expression profiling via transcript counting, years before the invention of microarray methods; and c) comparative “genomics” of transcribed sequences by analysis of EST collections from different organisms.  dbEST also accelerated cloning because it was the first sequence database to systematically provide links to physical DNA clone resources, such as ATCC.
This was only the beginning: dbEST also enabled important applications in genome research itself.  The clustering of highly redundant EST collections into “UniGenes” (Boguski and Schuler, 1995) led directly to the first large-scale gene map of the human genome (Schuler, Boguski et al., 1996).  These UniGenes were also critical to the development of functional genomics and were used to design the first comprehensive human cDNA microarray (Iyer et al., 1998).  Conversely, the inherent redundancy of EST sequences enabled rapid and efficient computational methods for the discovery of single nucleotide polymorphisms, or SNPs (Marth et al., 1999).  dbEST also led to the development of heuristic methods that greatly improved the accuracy of computational gene predictions (e.g. Xu et al., 1997) and, consequently, annotation of the human genome.
  • Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment (1993) |PubMed|PDF|
  • Gene Discovery in dbEST (1994) |PubMed|PDF|
  • Issues in Searching Molecular Sequence Databases (1994) |PubMed|PDF|
  • Constructing Aligned Sequence Blocks (1994) |PubMed|PDF|
  • A Note About Computing All Local Alignments (1994) |PubMed|PDF|
  • Hunting for Genes in Computer Data Bases (1995) |PubMed|PDF|
  • Sequence Similarity Searching Using the BLAST Family of Programs (1995)
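
As a small illustration of the "conceptual, six-frame translation" mentioned in the dbEST commentary above, the Python sketch below translates a nucleotide sequence in all three reading frames on both strands. It is a minimal sketch, not the NCBI implementation: the function names are hypothetical, it assumes the standard genetic code, and it reports any codon containing an ambiguous base as X.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


# Hypothetical toy sequence, used only to show the call.
for frame, protein in six_frame_translations("ATGGCCTAA").items():
    print(frame, protein)
```

Because a single-pass EST read can come from either strand and may contain frame-shifting sequencing errors, comparing all six translations against a protein database can recover similarities that a single-frame translation would miss.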

Review Articles and Tutorials

  • On Computer-Assisted Analysis of Biological Sequences: Proline Punctuation, Consensus Sequences, and Apolipoprotein Repeats (1986) |PubMed|PDF|
  • Rat Apolipoprotein A-IV: Application of Computational Methods for Studying the Structure, Function, and Evolution of a Protein (1986) |PubMed|PDF|
  • Homology and Similarity (1991)
  • Computational Sequence Analysis Revisited: New Databases, Software Tools, and the Research Opportunities They Engender (1992) |PubMed|PDF|
  • Bioinformatics (1994) |PubMed|PDF|
  • How to Make Discoveries in Molecular Sequence Databases (1995)
  • Internet Basics for Biologists (1995)
  • Sequence Similarity Searching Using the BLAST Family of Programs (1995)
  • Computational Analysis of DNA and Protein Sequences (1997) |PubMed|PDF|
  • Bioinformatics - a New Era (1998) |PDF|
  • The Bioinformatics Bookshelf: Teach Yourself Bioinformatics (1999) |PDF|
  • Biomedical Informatics for Proteomics (2003) |PubMed|PDF|
  • Genome Informatics: Current Status and Future Prospects (2003) |PubMed|PDF|

The first Bioinformatics textbook, 1991.

Oxana Pickeral and I reviewed six bioinformatics books for Cell in 1999.  Table II in this review contains an interesting list of topics and activities that were within the purview of bioinformatics at that time.

© Dr. Mark S. Boguski, All rights reserved