Computer Applications in the Biosciences 1996, 12, 227-229.


Image Library of Biological Macromolecules

Jürgen Sühnel

Institut für Molekulare Biotechnologie, Postfach 100813, D-07708 Jena / Germany

(E-Mail: jsuehnel@imb-jena.de)


Abstract

An Image Library of Biological Macromolecules is described, which contains image and text files related to structures of biological macromolecules. Currently, the Image Library has about 3000 image files of about 300 structures of biological macromolecules whose coordinates are available in the Protein Data Bank and in the Nucleic Acid Database. The entries include all RNA structures, about 70 DNA structures, about 150 proteins and a few carbohydrates. The Library further contains images of amino acids, of standard and modified nucleotides and of nucleic acid model structures. Each entry consists of an annotation file with bibliographic and sequence information and possibly comments, of a color-coded distance plot and of structure images. Almost all of the images are available in both a mono and a stereo representation. Standard procedures for generating these images were deliberately avoided; instead, mixed rendering, coloring and labeling techniques were used extensively. Since May 1995 the Library has had a growing division of images in the new VRML format (VRML - Virtual Reality Modeling Language). The Image Library of Biological Macromolecules (http://www.imb-jena.de/IMAGE.html) can be accessed via the World-Wide Web. A large number of structures determined by experimental and/or modeling techniques are, for various reasons, not intended for inclusion in the Protein Data Bank or the Nucleic Acid Database. The Image Library could be a repository for these structures and for images of these and other structures of biological macromolecules, including structures which are not known in atomic detail. Authors who are willing to make images or coordinates available to the scientific community via the Image Library of Biological Macromolecules are requested to contact the author.

Introduction

Structural information on biological macromolecules is an essential requirement for our understanding of biological function and for a deliberate variation of this function by rational or evolutionary approaches. Progress in recombinant DNA technology and RNA synthesis, X-ray and NMR instrumentation and computer and software technology has led to an increasing rate of accumulation of new structures.

Structural data of biological macromolecules in terms of coordinate files are available from the Protein Data Bank (PDB) at Brookhaven National Laboratory (Bernstein et al., 1977) and from the Nucleic Acid Database (NDB) at Rutgers University (Berman et al., 1992). Visualization of these structures plays a central role in identifying structural motifs and even in understanding biological function. Often the structure images lead directly to new biological insights or to new hypotheses. Visual information is obviously very appropriate for 'understanding' the complex structures of biological macromolecules (Hall, 1995).

The usual way to visualize the structures of biological macromolecules is to retrieve the coordinate files from PDB or NDB over the Internet and then to use one of the molecular graphics packages for displaying and possibly manipulating the structure. This approach is the method of choice if one is interested in a particular structure in more detail. On the other hand, a rather common situation is that one would prefer to have the image of a structure directly available, without having to spend time generating the image or even without access to special molecular graphics software.

Recently, network information systems have reached even public consciousness as a result of the dramatic growth in the use of the Internet (Schatz and Hardin, 1994). The World-Wide Web (WWW) has probably already changed the way science is done. One of its most appealing features is that, at least in principle, only one user interface is necessary for accessing almost all types of information. Using the hypermedia protocols of the WWW it is now very easy to transfer images or videos over the Internet. In addition, the first VRML viewers became available in 1995. The Virtual Reality Modeling Language (VRML) is a developing standard for describing three-dimensional scenes delivered across the Internet. It enables one to rotate, translate or zoom 3D objects such as biological macromolecules. Of course, this can be done much better using molecular graphics packages. However, one should realize that VRML viewers will be an integral part of the next-generation web browsers. This approach is therefore very appropriate for making visual structure information on biological macromolecules available to a broader community within and even outside science. Moreover, dynamic behaviour will soon be included in the VRML specification, which makes this format appropriate for collaborative work.

We have started to set up an Image Library, which combines the visualization of biological macromolecules with the new network tools.

The Image Library of Biological Macromolecules

The Image Library provides images of biological macromolecules and, in addition, relevant elementary information such as images of amino acids and of standard and modified nucleotides, images illustrating the definition of strand direction and of the torsion angles in nucleic acids, and DNA model conformations. It is subdivided into DNA, RNA, protein and carbohydrate divisions, where nucleic acid-protein complexes can be found in the nucleic acid parts and DNA-RNA hybrids in the RNA division. Each entry consists of a text file, of a variety of molecular structure images, and of color-coded distance plots. The text file contains general information on the structure, such as the sequence and the citation of the structure report, and a listing of all image files available. The images of molecular structures are intended to provide as much information as possible. Therefore, mixed rendering, coloring and labeling techniques are used extensively. All molecular images are available in a mono and in a stereo representation.

In addition to the molecular images, color-coded distance plots are included. Distance plots relate the distances between representative atoms of amino acids and/or nucleotides in the 3D structure to the sequence (Godzik et al., 1993). They thus yield very useful structural information in addition to the 3D images. Difference distance plots can be used to display subtle differences between two similar structures. The distance plots are also very useful for protein-nucleic acid complexes. One advantage is that they provide a comprehensive overview of all residues involved in the interaction region in relation to their sequence position.

The filenames are created according to the following rules. The first part of the name is the PDB and/or the NDB code or, for model structures, some other name. The second part of the name indicates the image type: the name of the graphics software used denotes the usual 3D images of molecular structures, distc and dist3d indicate contour and 3D distance plots, and sec indicates images of secondary structures. Moreover, an additional s indicates a stereo image. For example, the file 1ecl_midas_3_s.gif is a stereo representation of the structure of the 67K N-terminal fragment of E. coli DNA topoisomerase with the PDB code 1ecl (Lima et al., 1994), generated using the graphics software MIDASPLUS (Ferrin et al., 1988). The names of the files with bibliographic information consist simply of the PDB and/or NDB code and the extension txt. All images are stored in GIF format except for the distance plots, which are available both as GIF and PostScript files.

The following molecular graphics software packages were used: INSIGHTII (Biosym Technologies, Inc.), MIDASPLUS (Ferrin et al., 1988), PROEXPLORE (Oxford Molecular, Ltd.), SETOR (Evans, 1993), VRCHEM, VMD (Theoretical Biophysics Group, Beckman Institute for Advanced Science and Technology, University of Illinois) and SYBYL (Tripos, Inc.). The distance matrices were calculated using our own code, and the color-coded distance plots (3D or contour) were generated using Stanford Graphics (3D Visions, 2780 Skypark Drive, Torrance, CA 90505, USA), Origin (MicroCal Software Inc., 22 Industrial Dr. E., Northampton, MA 01060, USA) and Spyglass Transform (Spyglass, Inc., 1800 Woodfield Drive, Savoy, IL 61874, USA).
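The distance and difference distance matrices that underlie the plots described above can be illustrated with a minimal Python sketch. It is not the code used for the Image Library; the function names and the example coordinates are hypothetical, and it assumes that one representative atom per residue (for instance the C-alpha atom of each amino acid) has already been extracted from the coordinate file.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords: list of (x, y, z) tuples, one representative atom per residue
    (which atom is "representative" is an assumption left to the caller).
    """
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


def difference_distance_matrix(coords_a, coords_b):
    """Element-wise difference of the distance matrices of two structures.

    Both structures must have the same number of residues in the same order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of residues")
    da = distance_matrix(coords_a)
    db = distance_matrix(coords_b)
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(da, db)]


# Hypothetical three-residue example (coordinates in Angstrom).
structure_a = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
# Same structure with the last residue displaced along z.
structure_b = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 5.0)]

dist_a = distance_matrix(structure_a)
diff_ab = difference_distance_matrix(structure_a, structure_b)
```

Each matrix element (i, j) is indexed by sequence position, so a color-coded or contour plot of the matrix relates 3D distances directly to the sequence. Only the elements involving the displaced residue are non-zero in the difference matrix, which is why such plots highlight local changes between similar structures.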

Since May 1995 VRML viewers such as WebSpace, i3D or VRweb have become generally available. More details on these viewers can be found in the VRML division of the Image Library. A more comprehensive overview of VRML is available via the VRML repository at the San Diego Supercomputer Center (URL: http://www.sdsc.edu/vrml/). VRML is an open, platform-independent 3D file format which is based on the Open Inventor format of Silicon Graphics, Inc. The Image Library now contains an increasing number of VRML images (more than 150). They were generated using either the Inventor interface of MIDASPLUS or the explorer tool developed by Silicon Graphics, Inc. and now supported by the Numerical Algorithms Group, Ltd., with EyeChem modules written by Omer Casher, Department of Chemistry, Imperial College, London. In both cases files in Inventor format are generated, which are then converted to the VRML format. The VRML division of the Image Library of Biological Macromolecules was one of the first applications of Virtual Reality Modeling in biology. To the best of our knowledge it is the very first application which is not aimed at demonstration purposes alone. As already noted, one can interact with VRML images without having the coordinates of the underlying structure or molecular graphics software available. This is a brand-new development which still suffers from various problems. One problem is that complex structures may yield very large datasets. Fortunately the compression rate is rather high, in many cases 90%. On the other hand, decompression takes time. It may therefore happen that less powerful computers are currently not able to handle larger VRML files. One should realize, however, that the performance of viewers such as WebSpace has already increased dramatically since May 1995.
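To give an impression of why VRML files of complex structures become large and why they compress well, the following Python sketch writes a VRML 1.0 scene with one sphere per atom and compresses it. It does not reproduce the MIDASPLUS or explorer/EyeChem pipeline described above; the function name and the atom list are hypothetical, and gzip is only assumed here as a typical compression method.

```python
import gzip


def atoms_to_vrml(atoms):
    """Build a VRML 1.0 scene with one coloured sphere per atom.

    atoms: iterable of (x, y, z, radius, (r, g, b)) tuples.
    """
    lines = ["#VRML V1.0 ascii", "Separator {"]
    for x, y, z, radius, (r, g, b) in atoms:
        # Each atom gets its own Separator so that its translation
        # does not accumulate onto the following atoms.
        lines += [
            "  Separator {",
            f"    Material {{ diffuseColor {r} {g} {b} }}",
            f"    Translation {{ translation {x} {y} {z} }}",
            f"    Sphere {{ radius {radius} }}",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical "structure": 200 red spheres on a regular grid.
atoms = [
    ((i % 10) * 1.5, (i // 10) * 1.5, 0.0, 0.5, (1.0, 0.0, 0.0))
    for i in range(200)
]

scene = atoms_to_vrml(atoms)
packed = gzip.compress(scene.encode("ascii"))

# Fraction of the original size saved by compression.
saving = 1.0 - len(packed) / len(scene)
print(f"{len(scene)} bytes raw, {len(packed)} bytes compressed, saving {saving:.0%}")
```

Because every atom repeats the same node names and field keywords, the text is highly redundant, which is consistent with the high compression rates mentioned above. The actual saving depends on the structure and on the rendering style.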

Currently (November 1995), the Image Library has about 3000 image files of about 300 biomolecular structures. The entries include all RNA structures stored in the Protein Data Bank (PDB) and in the Nucleic Acid Database (NDB), about 150 proteins, approximately 70 DNA structures and a few carbohydrates.

The Internet address of the Image Library of Biological Macromolecules is: http://www.imb-jena.de/IMAGE.html. The entries can be accessed either directly or by a text search of filenames and text files. Due to the file name rules used it is very simple to search for special file types. On the other hand, the search option can also be used to find author names, for example, or even all structures with a particular modified nucleotide.
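How the file name rules make it simple to search for special file types can be sketched in Python as follows. This is an illustration only, not the search facility of the Image Library. It assumes that the parts of a name are separated by underscores, as in the example 1ecl_midas_3_s.gif, and that a numeric part is a running image number; the function names are hypothetical, and all file names in the list other than 1ecl_midas_3_s.gif are invented for the example.

```python
from pathlib import PurePosixPath


def parse_name(filename):
    """Split a file name into its components according to the naming rules."""
    path = PurePosixPath(filename)
    parts = path.stem.split("_")
    info = {
        "code": parts[0],        # PDB/NDB code or model name
        "kind": None,            # graphics software, distc, dist3d or sec
        "index": None,           # running image number (assumed meaning)
        "stereo": False,         # trailing "s" marks a stereo image
        "ext": path.suffix.lstrip("."),
    }
    rest = parts[1:]
    if rest and rest[-1] == "s":
        info["stereo"] = True
        rest = rest[:-1]
    if rest and rest[-1].isdigit():
        info["index"] = int(rest[-1])
        rest = rest[:-1]
    if rest:
        info["kind"] = "_".join(rest)
    return info


def find_files(filenames, kind=None, stereo=None):
    """Return the file names matching the given image type and/or stereo flag."""
    hits = []
    for name in filenames:
        info = parse_name(name)
        if kind is not None and info["kind"] != kind:
            continue
        if stereo is not None and info["stereo"] != stereo:
            continue
        hits.append(name)
    return hits


# Hypothetical listing for one entry.
names = [
    "1ecl.txt",
    "1ecl_midas_3.gif",
    "1ecl_midas_3_s.gif",
    "1ecl_distc.gif",
    "1ecl_distc.ps",
    "1ecl_dist3d.gif",
]

contour_plots = find_files(names, kind="distc")
stereo_images = find_files(names, stereo=True)
```

Since the image type and the stereo flag are encoded in the name itself, a plain text search for a substring such as distc or _s. would give a similar result without any parsing.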

Readers are asked to check out the Image Library on their own. Therefore, example images are not included in this report.

A great many structures of biological macromolecules are, for various reasons, not available via the Protein Data Bank or the Nucleic Acid Database. Authors are therefore encouraged to deposit their structures in these databases. On the other hand, the Image Library could represent a useful addition to them. We are willing to include, in a relatively informal way, coordinates of published structures which are not intended to be made available via PDB or NDB. This also applies to structures obtained by modeling procedures. Further, we would like to encourage authors to make their own images of structures available to the scientific community via the Image Library, whether or not the coordinates have been deposited in PDB/NDB. This could be interesting for authors of structure reports because most journals restrict the number of color plates.

Recently, electron microscopists have made important progress in reconstructing 3D images of such complex biological objects as the ribosome at a resolution of 25 Å (Moore, 1995). We expect in the near future a fruitful interplay between structure images of this type and images of building blocks whose structure is already known at atomic resolution. Therefore, it would be useful if images of biological objects not known at atomic resolution could be included in the Image Library, too.

Anybody interested in contributing to the Image Library of Biological Macromolecules should contact the author (e-mail: jsuehnel@imb-jena.de).

When starting this project in December 1993 we were not aware of similar attempts. We now know that the Protein Data Bank started to include images almost simultaneously and that the Swiss-3D-Image collection at the University of Geneva provided images of biological macromolecules even earlier. Currently, it is the rule rather than the exception that research reports on the World-Wide Web include images. Nevertheless, there are currently only four large image archives of biological macromolecules: the Protein Data Bank at Brookhaven National Laboratory, Molecules R US at the National Institutes of Health, Swiss-3D-Image at the University of Geneva (Peitsch et al., 1995) and the Image Library of Biological Macromolecules at IMB Jena. The Nucleic Acid Database provides only a few images so far and is therefore not classified as a large image archive. There is one basic difference between PDB and Molecules R US on the one side and Swiss-3D-Image and our Image Library on the other side. The first two archives provide automatically generated images of all structures available. The disadvantage of this approach is that the images generated have a relatively low information content. Swiss-3D-Image and the Image Library, on the other hand, provide very instructive images, but only of a relatively small number of known structures. There is almost no overlap between Swiss-3D-Image and the Image Library. Swiss-3D-Image has almost no nucleic acid structures, and for proteins the two archives are also complementary. Together, both archives thus already provide a substantial number of high-quality images of biological macromolecules.

Acknowledgements

I am grateful to F. Haubensak for setting up the IMB Jena World-Wide Web server and to K. Mehliß for writing the program DIST, which generates distance matrices and difference distance matrices.

References

Bernstein,F.C., Koetzle,T.F., Williams,G., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535-542.

Berman,H.M., Olson,W.K., Beveridge,D.L., Westbrook,J., Gelbin,A., Demeny,T., Hsieh,S.-H., Srinivasan,A.R. and Schneider,B. (1992) The Nucleic Acid Database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 63, 751-759.

Evans,S.V. (1993) SETOR: Hardware lighted three-dimensional solid model representations of macromolecules. J. Mol. Graphics 11, 134-138.

Ferrin,T.E., Huang,C.C., Jarvis,L.E. and Langridge,R. (1988) The MIDAS display system. J. Mol. Graphics 6, 13-27, 36-37.

Godzik,A., Skolnick,J. and Kolinski,A. (1993) Regularities in interaction patterns of globular proteins. Prot. Eng. 6, 801-810.

Hall,S.S. (1995) Protein images update natural history. Science 267, 620-624.

Lima,C.D., Wang,J.C. and Mondragon,A. (1994) Three-dimensional structure of the 67K N-terminal fragment of E. coli DNA topoisomerase. Nature 367, 138-146.

Moore,P.B. (1995) Ribosomes seen through a glass less darkly. Structure 3, 851-852.

Peitsch,M.C., Stampf,D.R., Wells,T.N.C. and Sussman,J.L. (1995) The Swiss-3D Image collection and PDB-browser on the world-wide web. Trends Biochem. Sci. 20, 82-84.