On a cold November morning in 1962, a woman with an undergraduate degree in mathematics and graduate studies in chemistry was punching cards in front of a huge IBM 7090 computer (see Featured Image). The machine looked like something out of the 1960 sci-fi classic ‘The Time Machine’: a monstrous collection of metal cabinets with spinning tape reels, whose hum reverberated through the Ledley lab of the National Biomedical Research Foundation. Margaret Oakley Dayhoff sat there analysing the protein sequence data in front of her, with Robert Ledley looking on. She was trying to deduce the correct sequence of the protein.
Understanding proteins and their basic units was nothing new for Margaret, as a large body of work had already been done. Eleven years earlier, in 1951, Pauling and Corey had proposed the structures of the alpha-helix and the beta-sheet, two years before Watson and Crick (1953) proposed the double-helix model of DNA based on X-ray data obtained by Rosalind Franklin and Maurice Wilkins. This knowledge of proteins and nucleic acids (which Linus Pauling and Emile Zuckerkandl later grouped together as ‘semantides’) opened up a whole new understanding in molecular biology. But the real breakthrough came after a decade of work (1945-1955) by Frederick Sanger of Cambridge University, who determined the complete 51-amino-acid sequence of bovine insulin. The field gained momentum, and the 1950s and early 1960s saw new sequencing techniques such as Edman degradation (Pehr Edman) and ion-exchange chromatography by Stanford Moore and William Stein. Using semi-automated ion-exchange columns, Moore and Stein's group determined the 124-amino-acid sequence of ribonuclease much faster than Sanger's method allowed.
Dayhoff was busy crunching her FORTRAN code on the IBM 7090. Several early IBM computers were available to academic researchers, and with her mathematics background, writing an algorithm was not a difficult task for her. She wrote a series of FORTRAN programs to determine the amino-acid sequences of protein molecules. Taking the overlapping peptide fragments from the partial digestion of a protein, the programs deduced all of the possible sequences that were consistent with the data. Conceptually similar to the puzzle-solving approach of the early sequencing investigations of insulin and ribonuclease, Dayhoff's programs arrived at the correct sequence for a small protein (ribonuclease) within a few minutes. Dayhoff and Ledley realized what the computer had achieved: the same work would have taken a team of humans many months. They published it as ‘COMPROTEIN, a Computer Program to Aid Primary Protein Structure Determination’. Margaret Dayhoff had just “applied information technology to a biological system”; she had laid the foundation of a new field!
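To give a feel for the "puzzle-solving" idea, here is a minimal Python sketch. It is not COMPROTEIN's actual algorithm; it assumes a simplified, hypothetical setup in which one digest cuts the protein into non-overlapping fragments (`digest_a`) of unknown order, and a second digest produces fragments (`digest_b`) that span the cut points. The function name and the toy sequences are illustrative only, written in one-letter amino-acid codes, and are not real ribonuclease data.

```python
from itertools import permutations


def consistent_sequences(digest_a, digest_b):
    """Hypothetical sketch: try every ordering of the non-overlapping
    digest_a fragments and keep each full sequence that contains every
    overlapping digest_b fragment as a contiguous stretch."""
    candidates = set()
    for order in permutations(digest_a):
        seq = "".join(order)
        if all(frag in seq for frag in digest_b):
            candidates.add(seq)
    # Return all sequences consistent with the data, not just one guess
    return sorted(candidates)


# Toy data: the overlapping fragments pin down a single answer
print(consistent_sequences(["ACD", "EFG", "HIK"], ["CDEF", "GHI"]))
# With less overlap information, more than one sequence remains possible
print(consistent_sequences(["ACD", "EFG", "HIK"], ["CDEF"]))
```

The brute-force search grows factorially with the number of fragments, so it only suits toy inputs; it is meant to show why listing *every* consistent sequence, rather than a single guess, matters when the overlaps leave ambiguity.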
After her initial success, Dayhoff expanded her work to nucleotides and later established the Atlas of Protein Sequence and Structure, an annual publication that attempted to catalogue all known amino-acid sequences. Although rudimentary by today’s standards, the Atlas served as the first database for molecular biology. Little did she know that her work would be amplified by the personal-computer revolution driven by figures such as Steve Jobs (Apple I, 1976) and Bill Gates (Microsoft, founded 1975). Desktop computers became an essential fixture of every institute, and the Atlas became an indispensable resource for early computational research.
By 1970, Ben Hesper and Paulien Hogeweg had coined the term bioinformatics, defining it as “the study of informatic processes in biotic systems”. The word came into wide use in 1978, and by 2002 it had been included in the Oxford English Dictionary (a competing claim credits Dr. Hwa Lim with coining the term in 1988). Bioinformatics paved the way for vast areas of study, including phylogenetic analysis, omics, algorithm development, sequence analysis and pattern recognition. It has played a tremendous role in the path-breaking Human Genome Project (2003) and Human Proteome Project (2014), and it continues to do so in the ongoing Human Microbiome and Human Metabolome Projects.
There is much debate over whether Dayhoff or Hogeweg should be considered the founder of bioinformatics. Either way, 54 years on (as of 2016), it is clear that Dayhoff opened up a series of quests for future scientists to work on! Nobel laureate Paul Nurse wrote in the science supplement of the Guardian newspaper that the next “quantum leap” in biology will come through studying information processing in biological systems. In conclusion, I would like to add that bioinformatics is the Severus Snape (the Harry Potter character) of biology. You don’t like it at first, but once you understand its importance, you start respecting it on a whole new level!