Scientific Cybernauts
by Dale Graham, Ph.D., DCRT (e-mail: degraham@helix.nih.gov)
Among the reasons that the research community has so enthusiastically embraced the Internet is the access it provides to vast repositories of scientific information and to a wealth of databases for scientific analysis. That's all fine and good, but how, in the Net's ocean of information, can an individual scientist quickly locate those sites that will be of greatest use in his or her own research?
One way to find fruitful sites is to wander around the Internet, simply using your mouse or keyboard to roam through "tunnels" on Gopher servers or to surf through the "links" between sites on the World Wide Web (WWW). Although serendipitous searching may uncover some wonderful resources, most scientists prefer a more efficient mode of exploration. "Search engines," computer resources that can be accessed free of charge through any WWW browsing program [see box, page 9], are what you need if you really want to soup up your research performance. These engines enable you to search for any word or combination of words in the text of a wide range of Internet sites. After you enter the words selected for a search, a list of matching sites will appear. Some picking and choosing might be necessary at this point. As in any computer search, if the terms are too broad, you may get a huge, and thus probably useless, list of sites. Alternatively, if you make your terms too specific, you might wind up with a "list" with nothing on it. Try to use only a couple of relatively distinctive, but not too arcane, terms to design a search of appropriate scope.
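The effect of search scope can be seen in a toy example. The sketch below does not show how InfoSeek, Lycos, or WebCrawler work internally. It assumes a tiny, made-up collection of page texts (the `PAGES` dictionary and the `search` function are hypothetical) and simply returns the pages that contain every search term.

```python
# Toy illustration only: PAGES and search() are hypothetical and do not
# reflect the internals of any real search engine.
PAGES = {
    "page-a": "long pcr protocol for amplification of large dna templates",
    "page-b": "pcr kits and primer sets catalog",
    "page-c": "dna sequencing protocol",
    "page-d": "gopher servers and ftp archives",
}


def search(terms, pages=PAGES):
    """Return the names of pages whose text contains every term."""
    wanted = [term.lower() for term in terms]
    hits = []
    for name, text in pages.items():
        words = set(text.lower().split())
        if all(term in words for term in wanted):
            hits.append(name)
    return sorted(hits)


broad = search(["dna"])                             # one common term
focused = search(["pcr", "protocol"])               # two distinctive terms
too_narrow = search(["pcr", "protocol", "gopher"])  # over-specified

print(broad, focused, too_narrow)
```

In this made-up collection, the single common term matches two pages, the pair of distinctive terms narrows the result to one page, and the over-specified query matches nothing.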
You may also find one engine superior to another for your purposes. For example, two major search engines, InfoSeek and Lycos, return some information about each site in addition to its name, while another, WebCrawler, returns only a list of names. The criteria used to rank sites may also vary from search engine to search engine. Finally, some Internet sites may be part of one engine's index and not another's.
As a recent "Hot Methods Clinic" helps to illustrate, the ability to smoothly navigate the World Wide Web is among the most useful computing skills that a scientist can have [see March-April issue, page 12]. To provide an idea of how search results vary depending upon the engine chosen, I conducted a simple "experiment." Using the term "PCR," I performed a search on each of three of the most-used search engines: InfoSeek, Lycos, and WebCrawler. The results follow. Note the difference in the amount of detail each engine provides about each site, as well as the fact that, although they were all given the same search word, the engines ranked some of the sites in a different order. In addition, some engines returned more "hits" than others, reflecting both the incidence of such sites in the engine's index and the method used to determine what information is present at a particular site.
InfoSeek: A list of 10 sites was returned, and the top five are listed below.
Lycos: The first 10 of 1,523 documents that contained the word "PCR" were printed, and the first three of those 10 are listed below.
last fetched: 02-Jul-95
bytes: 11933
links: 10
title: PanVera Catalog, PCR Kits and Primer Sets
outline: PCR Kits and Primer Sets LA PCR Kit Version 1*, 50 reactions Product Number: TAK RR011 PCR in vitro Single Site Amplification and Cloning (SSAC) Kit*, 20 reactions Product Number: TAK R015
excerpt: PanVera Catalog, PCR Kits and Primer Sets PCR Kits and Primer Sets LA PCR Kit Version 1*, 50 reactions Product Number: TAK RR011 Application Amplification of large DNA templates (up to 40 kb) Amplification of cloned inserts and genomic DNA Description PCR technology has been widely used in molecular genetics research, especially for genome analysis and sequencing studies. However, efficient amplification of DNA fragments greater than 5 kb has been problematic. The Takara LA PCR Kit is designed to overcome this limitation. The LA PCR Kit includes all the reagents necessary for amplification of large DNA templates; routine extension to 20 kb, with ...
2) http://twod.med.harvard.edu/labgc/estep/longPCR_protocol.html
last fetched: 19-Jul-95
file date: 02-Jun-95
bytes: 6270
links: 5
title: Long PCR Protocol
outline: Long PCR Reagents and Guidelines General Guidelines for Long PCR Conditions and Enzyme Mixtures Efficient Long PCR results from the use of two polymerases: a non-proofreading polymerase is the main polymerase.
excerpt: Long PCR Protocol Long PCR Reagents and Guidelines (Modified from Cheng et al. (1) ) General Guidelines for Long PCR Conditions and Enzyme Mixtures Efficient Long PCR results from the use of two polym...
last fetched: 31-Jul-95
bytes: 1567
links: 7
The query "pcr" found 200 documents and returned 25. The first 12 are shown below. Uniform Resource Locators (URLs), which normally are not included in WebCrawler results, are included here. When used on-line, WebCrawler returns a list with the site name as a live link that enables you to access the site simply by clicking on highlighted text.
1) BioGuide,
2) PanVera Catalog,
TaKaRa PCR Products and Molecular Biology Kits, http://www.panvera.com/catalog/pcrmb.html
3) MGD: PCR Primers Query Form,
4) Long PCR Protocol,
http://twod.med.harvard.edu/labgc/estep/longPCR_protocol.html
5) RegForm: PCR,
6) College Nobel Laureate Lecture,
http://bio-stockroom1.tamu.edu/catalog/enzym.txt
8) PanVera Catalog Product Index,
9) Cookie,
10) MGD Home Page,
11) Implications for Molecular Biology in Hypertension Research,
12) List of Journals from CSHL Press,
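The difference in ranking can be checked by comparing the positions of sites that appear in more than one listing. The following is a minimal sketch using only URLs that survive in the listings above. The data are partial, the function name `compare_ranks` is hypothetical, and the PanVera URL is assumed to belong to WebCrawler's entry 2.

```python
# Partial, illustrative data: only URLs that appear in the listings above
# are included; ranks are the positions shown in each listing.
LONG_PCR = "http://twod.med.harvard.edu/labgc/estep/longPCR_protocol.html"

lycos = {LONG_PCR: 2}
webcrawler = {
    LONG_PCR: 4,
    # Assumption: this URL belongs to entry 2 of the WebCrawler list.
    "http://www.panvera.com/catalog/pcrmb.html": 2,
}


def compare_ranks(first, second):
    """Map each URL found in both listings to its (first, second) ranks."""
    shared = first.keys() & second.keys()
    return {url: (first[url], second[url]) for url in sorted(shared)}


for url, (rank_a, rank_b) in compare_ranks(lycos, webcrawler).items():
    print(f"{url}: rank {rank_a} vs rank {rank_b}")
```

With these data, the Long PCR Protocol page is the only URL present in both listings. It is second in one and fourth in the other.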
The information in this article deals only with searching WWW, or Hypertext Transfer Protocol (HTTP), sites and not with other useful Internet sites such as Gopher or File Transfer Protocol (FTP) servers. For information on locating search engines for other kinds of Internet sites, use your WWW browser to access DCRT's Information Sheet on Internet Resources. The address, or URL, for the Information Sheet is http://www.nih.gov/dcrt/expo/infos/resources.html
To reach a search engine program, fire up a WWW browser program such as Netscape or Mosaic. If you're using Netscape, clicking on the Net Search button will take you to a page with search engine sites. Another option is to select the Open Location command in Netscape, or the Open URL command in Mosaic and other browsers, and then type in the Uniform Resource Locator (URL) of the search engine you want to use. Bear in mind that URLs never contain returns, tabs, or spaces. Also, remember that capital and lower case letters usually must be copied exactly.
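If a URL copied from a printed page or an e-mail message has picked up stray spaces or line breaks, they can be stripped out before the address is typed or pasted into a browser. The short sketch below shows one way to do that. The function name `clean_url` is hypothetical, and letter case is deliberately left alone.

```python
def clean_url(copied):
    """Remove returns, tabs, and spaces from a copied URL.

    Letter case is left untouched, because capital and lower case
    letters usually must be copied exactly.
    """
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes every space, tab, and line break.
    return "".join(copied.split())


# A URL that picked up a stray space and a line break when copied.
broken = "http://twod.med.harvard.edu/ labgc/estep/\nlongPCR_protocol.html"
print(clean_url(broken))
```

A URL that is already clean passes through unchanged.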
InfoSeek Search,
The Lycos Home Page: Hunting WWW Information
WebCrawler Searching,
W3 Search Engines
This site is provided through the University of Geneva. The search engine sites found here range from broadly useful to helpful only for searches of niche items, such as fonts.
CUSI (Configurable Unified Search Interface)
This site is maintained by Nexor UK. By filling out a single form, you can search several WWW engines.
Experimental Meta-Index
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Demo/metaindex.html
This site not only provides access to some WWW search engines, but enables you to search Gopher servers, Wide Area Information Servers (WAIS), and other useful sites.