The Invisible Web
One of the biggest myths about the web is that it is easily searchable. With the speed and accuracy of search engines like Google, people assume everything on the web is easily accessible and at their fingertips. If they haven't found what they're looking for, they figure they're doing something wrong in their search strategy. That is often the case, so check out JournalismNet's Search Tips to be sure you are using the latest, best and most relevant techniques and tricks.
But even if you're a whiz with Google and other tools, you have to know their limits. Even the best search engines tap into a small portion of the web. According to Google, the first Google index in 1998 had 26 million pages, and by 2000 Google was searching more than one billion web pages. By 2008, its engineers boasted their tool was "spidering" more than 1 trillion unique URLs.
Sounds impressive, right? It is. That's like hunting through a stack of paper more than 5,000 miles tall, a stack that would stretch from Chicago to Paris. And the web keeps growing at an ever faster pace, with millions of blogs, YouTube videos, websites and other postings going up every hour.
And there is a bigger problem. All the search engines combined, Google and all its competitors, only scratch the surface of what's out there on the Internet. The rest remains largely unexplored, like buried treasure. That unexplored part is called the "Invisible Web" or the "Deep Web."
How big is the invisible web? According to one search company, BrightPlanet, the invisible web could be as much as 500 times bigger than the searchable web!
What's buried in the invisible web? First, there are sites that intentionally keep their pages away from search engine spiders: private networks, such as the intranets companies use internally, and secure databases run by banks and governments. Then there are public sites, such as those of universities, journals or other institutions, that might allow access only with a password.
Finally, there are what are called dynamic searchable databases. Think about what happens when you visit a government web site that contains public information on companies and shareholders. You make a request for a listing of directors of Company ABC and up pops a list of their names and affiliations. But that web page did not exist until you made that request. The same happens when you ask a university library site to find all the books about "blue whales." The information is stored in databases, retrieved when you make a request and displayed on a web page that will disappear after you finish.
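In most cases, Google and other search engines can't spider those kinds of pages and won't list them in their search results. Their spiders generally work by following links from one page to the next, and no link leads to a page that only comes into being when someone types in a query.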
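To make the point about spiders concrete, here is a toy Python model. It is a simplified sketch and not how Google or any real search engine actually works. It assumes a hypothetical site with a few linked pages plus a search box whose results page is built only when someone submits a query. A spider that only follows links indexes the linked pages but never sees a results page like the list of directors of Company ABC, because no link points to it. All page paths, names and data below are invented for illustration.

```python
from collections import deque

# Hypothetical site: each static page maps to the pages it links to.
STATIC_PAGES = {
    "/": ["/about", "/search-form"],
    "/about": ["/"],
    "/search-form": ["/"],  # the form page is linked, but no results page is
}

# Hypothetical database behind the search form (invented placeholder data).
DIRECTORS = {
    "Company ABC": ["Director One", "Director Two"],
}


def results_page(company: str) -> str:
    """Build a results page on demand, only when a visitor submits a query."""
    names = DIRECTORS.get(company, [])
    return f"Directors of {company}: " + ", ".join(names)


def crawl(start: str = "/") -> set[str]:
    """Breadth-first spider that only follows links between static pages."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_PAGES.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


if __name__ == "__main__":
    print("Spider indexed:", sorted(crawl()))
    # A person typing a query gets a page the spider never reached:
    print(results_page("Company ABC"))
```

Real search engines are far more elaborate. This sketch only illustrates the basic gap: a page that exists only after a query has no link for a spider to follow.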
All is not lost. You can't get into password-protected sites or private networks (though they sometimes get hacked or cracked!), but there are ways to hunt through the many databases hidden in the invisible web.
One easy trick is simply to add the word "database" or "databases" to your search keywords. For example, try a search for:
"toxic chemicals" water Arizona databases
This will get you quite different results than the same search without that word, including links to some potentially very useful databases.
There are also several specialty search engines that offer ways to dig through the invisible web. One of my favorites is Infomine, put together by librarians from the University of California and other educational centers. It contains a vast number of databases, electronic journals, electronic books, bulletin boards, mailing lists, online library card catalogs, articles and directories of researchers, all searchable by keyword or by topic.
Librarians in general can help you navigate the invisible web, and you'll find a listing of helpful library sites at JournalismNet's Library Help page. In particular, Gary Price, a librarian and information research consultant, maintains an excellent resource at http://gwu.edu/~gprice/direct.htm.
The University of Idaho also maintains a listing of over 5,000 websites describing the holdings of manuscripts, archives, rare books, historical photographs and other primary sources for the research scholar, at http://uidaho.edu/special-collections/Other.Repositories.html.
Another valuable tool is the Directory of Open Access Journals, which covers free, full-text, quality-controlled scientific and scholarly journals.
Finally, a site that aims to catalogue all the world's libraries is WorldCat. It searches many libraries for books, research articles and even music and videos you can check out.
You'll find similar tools listed on JNet's Academic Search Page at http://www.peoplesearchpro.com/journalism/search/academic.htm.
So don't get discouraged. Much of the web may remain invisible, but there are ways to uncover some hidden gems.