Saturday, January 28, 2017

Introduction to Search Engines

So you were assigned to create a PowerPoint on the effectiveness of logos for companies. You hit up a search engine, and of course you’re going to find your answer in no time! You type in “logos” and you get hundreds of hits from companies promising you the best prices on creating your own company logo. Wait, what? No!!! That’s not what I need! You look at the classmate sitting next to you, and within ten minutes she has already printed tons of information and is basically ready to go. Panic sets in and you come to terms with the idea that you will fail this marketing class and never graduate from college! Maybe your boss at your part-time job at the Tiki Post will consider you for a full-time position. Eeeek!






Yes, if you don’t know how to navigate the web in 2017 you might feel a bit, perhaps, inadequate? The question is why? Fear no more! There is no shame in not knowing the basics, and you know what? I guarantee you that you’re not alone! Another interesting fact that you might find comforting is that many web savvy individuals might also benefit from this, your guide for Searching the Web for Dummies! And you’re no dummy at all! Soon, you will be an expert and will no longer have to feign disinterest when people talk about new things out there because you will be well versed in the subject matter. So sit back, relax, pay attention, and get ready to learn!


DISCLAIMER: I too, before taking this tutorial, was not 100% sure of my skills. This is a breakdown of what I learned in a simplified (I HOPE!) version. However, some of the information, because it is practically impossible to simplify, is as it appears on the website. 



Search Engines: a Definition

First Things First…What are search engines?


Of course, the first thing you must understand, even if you think you already understand it, is what a search engine is. A search engine is a compilation of databases of web page files that are automatically put together by a machine. But what exactly does this mean?

That somebody else is already doing half the work for you. 

These databases hold the information you are looking for and have been assembled in such a way that when you search for something, they look for key words and are able to provide you with what they believe to be the closest option to what you’re searching for. 

Huh? 

Key words are indexed and filed so that they are easily recognized within each search. These engines try to keep updated so that the results you get are the most recent. If you look closely, after you search, you get your results and they are divided by tabs. One includes everything that is found, another tab holds images, another videos. When it does this, it allows you to distinguish between what is new and what has been there for a while. 

Easy, huh?
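
If you like to see ideas in code, here is a minimal sketch of what "key words are indexed and filed" could look like. It is only an illustration, not how any real search engine is built: the three pages and the function names build_index and search are made up for this example.

```python
from collections import defaultdict

# Three made-up "web pages" standing in for the engine's database.
pages = {
    "page1": "company logo design prices",
    "page2": "how logos affect brand recognition",
    "page3": "history of the company logo",
}

def build_index(pages):
    """File every keyword together with the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, keyword):
    """Return the pages filed under a keyword, in alphabetical order."""
    return sorted(index.get(keyword.lower(), set()))

index = build_index(pages)
print(search(index, "logo"))   # ['page1', 'page3']
print(search(index, "brand"))  # ['page2']
```

Notice that "logo" does not find the page that only says "logos". The engine in this sketch matches keywords exactly, which is one reason your choice of words matters so much.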


Undoubtedly search engines are great because they, well, search a large portion of the web, and they’re obviously the best method that exists today to search it, but they also have their disadvantages. Bummer! Because of the vast amount of information that is out there, you have to be extremely meticulous about how you key in your searches, because these engines are looking for keywords and you can easily get hundreds of hits. This is when we run into trying to determine which hit is the best one and, well, that can be rather time consuming.



Another thing that you must know is that there are two different types of search engines: the individual engines, which compile their own searchable databases on the web (explanation will follow), and the metasearchers, which do not compile databases but instead search the databases of multiple individual engines simultaneously (will follow).

Metasearchers: a Definition

Note: If as you read through these sections, you feel a bit overwhelmed, don't worry. That is perfectly normal. If all this information doesn't make sense to you yet, it will as you continue reading. We promise!

Okay, here it is: 

As the prefix “meta” suggests, a metasearch engine goes beyond what an ordinary search engine does, and it goes “beyond” by searching multiple individual search engines simultaneously. Metasearch engines also let you know which engines are retrieving the best results for your search.


Huh? I know! Here’s a breakdown:

These engines give you the results in either a single list or multiple lists. Most give you the single list which, like its name suggests, is just one list with no duplicate entries. The multiple-list version gives you separate lists of results, one for each search engine.
The main thing to understand here is the pros and cons.


The biggest pro is that these metasearch engines are super-fast, which is what we all want... Awesome! Right? Well, this is too good to be true, because the major con that comes with their speed is that they can give back hundreds and hundreds of hits, and so you’re stuck trying to separate the good sources from the bad. No bueno!



Examples of these engines are Dogpile and Mamma.
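
To picture the "single list with no duplicate entries" idea, here is a rough sketch. The engine names and addresses are invented, and merge_results is a hypothetical helper written for this post, not something Dogpile or Mamma actually exposes.

```python
def merge_results(results_by_engine):
    """Combine several engines' hit lists into one list with no duplicates.

    Each address is kept once, in the order it was first seen, together
    with the names of the engines that returned it.
    """
    merged = {}
    for engine, urls in results_by_engine.items():
        for url in urls:
            merged.setdefault(url, []).append(engine)
    return list(merged.items())

# Made-up hits from two made-up engines.
results = {
    "engine_a": ["example.com/logos", "example.org/branding"],
    "engine_b": ["example.org/branding", "example.net/marketing"],
}

for url, engines in merge_results(results):
    print(url, "found by", ", ".join(engines))
```

Keeping track of which engine found each hit is also how a metasearcher could tell you which engines are doing the best job for your search.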

Subject Directories: a Definition



Feeling Overwhelmed? Again, we will tell you the same thing, don’t be! We are still breaking all of this down for you so that you can process it slowly.

Let’s talk about subject directories. These directories are controlled by humans. Editors review and select sites to include in their directories according to specific selection criteria. When you search a directory, your keywords are matched against the written descriptions provided by those editors. This process is time consuming for them, but it makes searching more efficient for you. These editors usually just index the home page or the top levels of a site. Subject directories come in all kinds: general, academic, commercial, portals, and vertical portals (vortals). Portals link popular subject categories and offer other services like email, current news, stock quotes, travel information, and maps. You will learn about vortals in the following chapter.

A definite pro of subject directories is that they return fewer hits; therefore, the results are more likely to be accurate. Less time consuming, right? Unfortunately, a most definite con is that they very often return dead links.



Library Gateways and Specialized Databases: a Definition

In this chapter you will learn about the sites that you want to be familiar with if you are researching topics for assignments such as research papers or informational presentations. Why? Wait and you will see...

There are two kinds of databases: library gateways and subject-specific databases, also known as vortals (mentioned in the previous chapter).


Library gateways are collections that contain databases and informational sites that have been arranged by subject, assembled, reviewed and recommended by specialists, usually librarians. The results you get are academically oriented pages on the web--like we said before, these are the ones you want to target when the information you need needs to be extra reliable. 

 If you ask us, Library Gateways are similar to our old fashioned, good ole' books. If you are older than thirty-five, you might remember the agony of doing your research paper during your senior year in high school and having to predominantly use books for sources because your high school library was barely getting Internet. Yup! That might have been us! 




Now what are these vortals we speak of? They are subject specific databases.

Subject-specific databases, or vortals (i.e., "vertical portals") are databases that are a bit more specific since they are devoted to a single subject and are created by subject experts such as professors, researchers, governmental agencies, business interests, and other subject specialists and/or individuals who have a deep interest in the subject and professional knowledge of a particular field.

If you ask us, these are the databases you should be hitting up since they are the more reliable sources.

Now don’t get confused, because we are about to uncover a top secret little something we learned about the web: THE INVISIBLE WEB. Wait! What? Do you have an app on your smartphone that hides specific data, whether it is passwords or confidential files? Well, the invisible web is somewhat similar to this.

About 60 to 80 percent of existing web material is made up of thousands of documents that are archived or hidden behind password-protected sites and firewalls. People erroneously assume that such documents are irretrievable. They are not visible to search engine spiders, although today’s search engines are learning to find and index the contents of these “Invisible Web” pages. To find them you must point your browser directly at them, and that’s what the library gateways and subject-specific databases do.

So really, all we need to navigate through these sites is the right gateway.
If you’re wondering when it is best to use which, wonder no more… 

Library Gateways are best when you are looking for high-quality information. Subject-Specific Databases are best when used to get, can you guess it? You’re right! Specific information on a subject!


Here are some examples of subject-specific databases:
Educator's Reference Desk (educational information)
Expedia (travel)
Jumbo Software (computer software)
Kelley Blue Book (car values)
Monster Board (jobs)
Motley Fool (personal investment)
MySimon (comparison shopping)
Roller Coaster Database (roller coasters)
Voice of the Shuttle (humanities research)
WebMD (health information)

Evaluating Web Pages



So maybe the first four chapters might have been a bit overwhelming because let’s be honest, although great, this information is a bit intimidating. Intimidating in the sense that we might not have been familiar with the concepts and even terminology so naturally that makes it a bit scary. Now this chapter gets a little bit more real and a whole lot easier to understand. Why? Because we will tell you how to determine if the information you’re gathering is being taken from reliable sources, because after all, you don’t want to get your description of dinosaurs from a website created by an aficionado who is NOT an expert in dinosaurs. Or do you?

The first step to success is to learn to read the website address. Every aspect of an address is crucial in determining how reliable the site is.

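 Let’s start with the basics. Look at this URL, put together from the parts listed below:

http://www.sc.edu/beaufort/library/pages/bones/bones.shtml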

This is what everything means:
  • "http" means hypertext transfer protocol and refers to the format used to transfer and deal with information
  • "www" stands for World Wide Web and is the general name for the host server that supports text, graphics, sound files, etc. (It is not an essential part of the address, and some sites choose not to use it)
  • "sc" is the second-level domain name and usually designates the server's location, in this case, the University of South Carolina
  • "edu" is the top-level domain name (see below)
  • "beaufort" is the directory name
  • "library" is the sub-directory name
  • "pages" and "bones" are folder and sub-folder names
  • the second "bones" is the file name
  • "shtml" is the file type extension. It marks a page written in hypertext mark-up language (that's the language the browser reads) that the server parses before sending. The "s" indicates that the server will scan the page for commands that require additional insertion before the page is sent to the user.
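
If you would rather let the computer take an address apart for you, here is a small sketch using Python's standard library. The address is an assumption on our part, pieced together from the parts named in the list above, and the labels printed are ours, not official terms.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Address rebuilt from the parts described in the list above.
url = "http://www.sc.edu/beaufort/library/pages/bones/bones.shtml"

parts = urlsplit(url)
host_labels = parts.hostname.split(".")   # the pieces of the host name
path = PurePosixPath(parts.path)          # everything after the host name

print("protocol:         ", parts.scheme)      # http
print("host server:      ", host_labels[0])    # www
print("second-level:     ", host_labels[-2])   # sc
print("top-level domain: ", host_labels[-1])   # edu
print("folders:          ", path.parts[1:-1])  # ('beaufort', 'library', 'pages', 'bones')
print("file name:        ", path.stem)         # bones
print("file extension:   ", path.suffix)       # .shtml
```
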
These are the domains that are currently recognized:
  • .edu -- educational site (usually a university or college)
  • .com -- commercial business site
  • .gov -- U.S. governmental/non-military site
  • .mil -- U.S. military sites and agencies
  • .net -- networks, internet service providers, organizations
  • .org -- U.S. non-profit organizations and others
These are the newer domains that are either starting to come into effect or will soon:
  • .aero -- restricted use by air transportation industry
  • .biz -- general use by businesses
  • .coop -- restricted use by cooperatives
  • .info -- general use by both commercial and non-commercial sites
  • .museum -- restricted use by museums
  • .name -- general use by individuals
  • .pro -- restricted use by certified professionals and professional entities


DETERMINING PAGE AUTHORSHIP

You obviously need to know where the information is coming from and, most importantly, who is putting it out there. So first, you have to learn about the author/publisher, because you need to know what their views, opinions, and purpose are founded on. So here is what you have to ask yourself:

1.      Who is responsible for the page you are accessing? Is it a governmental agency or other official source? A university? A business, corporation or other commercial interest? An individual?

It is generally safe to trust pages on the GOV and EDU hostnames to present accurate information.

The NET, ORG, MIL, and COM domains are more likely to host pages with their own personal or organizational agendas and might require additional verification.
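
Here is that rule of thumb as a tiny sketch. It only looks at the ending of the host name, so treat its answer as a first impression and nothing more. The function name first_impression and the example addresses are made up for illustration.

```python
from urllib.parse import urlsplit

# Rule of thumb from the text above; it is not a guarantee of accuracy.
USUALLY_RELIABLE = {"gov", "edu"}
NEEDS_CHECKING = {"net", "org", "mil", "com"}

def first_impression(url):
    """Give a rough first impression of a page based on its top-level domain."""
    hostname = urlsplit(url).hostname or ""
    top_level = hostname.rsplit(".", 1)[-1].lower()
    if top_level in USUALLY_RELIABLE:
        return "usually reliable"
    if top_level in NEEDS_CHECKING:
        return "verify with another source"
    return "unfamiliar domain, look closer"

print(first_impression("http://www.sc.edu/beaufort/"))        # usually reliable
print(first_impression("http://www.example.com/dinosaurs"))   # verify with another source
```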

CHECKING THE VITAL INFORMATION
A trustworthy Web page will more than likely provide you with this information:
  • Last date page updated
  • Mail-to link for questions, comments
  • Name, address, telephone number, and email address of page owner
Now ask yourself this: If the page owner is not readily recognizable, do they provide you with credentials or some information on their sources or authority?

CHECKING THE CONTENT

If it’s on the Internet, then it must be true! Right? Wrong! You have to be careful in evaluating the information you’re getting. It is safe to assume that scholarly books and journal articles are reviewed, but who reviews the websites or checks for biases? Can the information you are finding be verified? Also, it is important to consider how often the information is updated. What may have been posted yesterday may be changed tomorrow. Check!

Creating a Search Strategy

Now that you have an idea of what Web Pages are reliable and which ones are not, you need to create a plan. And obviously in order to do this, you must know what your purpose is. What are you looking for?

Do you want to:
  1. Browse?
  2. Locate a specific piece of information?
  3. Retrieve everything you can on the subject?
Your answer will determine how you conduct your search and what tools you will use, and also, how you word your searches. 
  1. If you're browsing and trying to determine what's available in your subject area, start out by selecting a subject directory like Yahoo! Then, enter your search keyword(s) into one of the metasearch engines, such as Vivisimo, just to see what's out there.
  2. If you're looking for a specific piece of information, go to a major search engine such as Google, or to a specialized database such as Bureau of the Census (for statistics).
  3. If you want to retrieve everything you can on a subject, try the same search on several search engines. Also, don't forget to check resources off the Web, such as books, newspapers, journals and other print reference sources.
Now here is the tricky stuff…
If you are not specific, these engines can add the words “and” or “or” to link your words together. For obvious reasons, this alters your results and you might not get what you are looking for. Sometimes words are ignored altogether, or the engine searches for each word separately, and the results are irrelevant and ineffective. There is a list of words known as “stop” words that are usually cut out to cut down response time. These include “a, about, an, and, are, as, at, be, by, from, how, I, in, is, it, of, on, that, the, this, we, what, when, where, which, with,” and so on. If the phrase you are looking for has to contain one of these stop words, consider putting quotation marks around the whole phrase.
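
To see why quotation marks rescue your stop words, here is a rough sketch of what an engine might do with your query. This is our own guess at the behavior for illustration only; the function name keywords is made up, and real engines each have their own rules.

```python
import re

# The stop words listed above.
STOP_WORDS = {
    "a", "about", "an", "and", "are", "as", "at", "be", "by", "from",
    "how", "i", "in", "is", "it", "of", "on", "that", "the", "this",
    "we", "what", "when", "where", "which", "with",
}

def keywords(query):
    """Keep quoted phrases whole and drop stop words from everything else."""
    terms = []
    # Each match is either a "quoted phrase" (first group) or a bare word (second group).
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            terms.append(phrase)
        elif word.lower() not in STOP_WORDS:
            terms.append(word)
    return terms

print(keywords("the history of the logo"))       # ['history', 'logo']
print(keywords('"to be or not to be" hamlet'))   # ['to be or not to be', 'hamlet']
```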



CREATING A SEARCH STATEMENT
The following tips make for effective search statements:
  • Be specific
        EXAMPLE:    Hurricane Hugo
     
  • Whenever possible, use nouns and objects as keywords
        EXAMPLE:    fiesta dinnerware plates cups saucers
     
  • Put most important terms first in your keyword list; to ensure that they will be searched, put a +sign in front of each one
        EXAMPLE:    +hybrid +electric +gas +vehicles
     
  • Use at least three keywords in your query
        EXAMPLE:    interaction vitamins drugs
     
  • Combine keywords, whenever possible, into phrases
        EXAMPLE:    "search engine tutorial"
     
  • Avoid common words, e.g., water, unless they're part of a phrase
        EXAMPLE:    "bottled water"
     
  • Think about words you'd expect to find in the body of the page, and use them as keywords
        EXAMPLE:    anorexia bulimia eating disorder
     
  • Write down your search statement and revise it before you type it into a search engine query box
        EXAMPLE:   +"south carolina" +"financial aid" +applications  +grants
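
If you want to practice writing search statements, this little sketch builds one for you from a list of terms, most important first. The helper name search_statement is made up, and the rule it follows (quote anything with a space, put a + in front of everything) is just our reading of the tips above.

```python
def search_statement(terms):
    """Build a query: phrases get double quotes, every term gets a + sign."""
    parts = []
    for term in terms:
        term = term.strip().lower()
        if " " in term:
            term = f'"{term}"'   # more than one word, so treat it as a phrase
        parts.append("+" + term)
    return " ".join(parts)

print(search_statement(["South Carolina", "financial aid", "applications", "grants"]))
# +"south carolina" +"financial aid" +applications +grants
```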

Basic Search Tips

Still feeling nervous about your searching “skills”? Don’t be silly! It only gets easier from now on… Remember your earlier search for LOGOS? Well, with these tips you will very likely eliminate hits from companies that are trying to sell you a logo for your company, a company that you don't own... at least not yet!
  • If you want to make sure a word is either included in or excluded from the search, you must put a plus (+) or minus (-) sign in front of the word to force its inclusion or exclusion.
    EXAMPLE:   +cookies  -chocolate
    (NO space between the sign and the keyword)
 

  • If you want to make sure your search phrase is found exactly as is, you must use quotation marks around the entire phrase.
    EXAMPLE:   "I fell in love with a beautiful stranger"
    (Do NOT put quotation marks around a single word.)
     
  • Put your most important keywords first in the string.
    EXAMPLE:   dog breed family pet choose
     
  • Do not type your search in capital letters, since capitalized keywords return only exact matches. Type keywords and phrases in lower case to find both lower and upper case versions.
    EXAMPLE:   governor retrieves both governor and Governor
     
  • Use truncation (or stemming) and wildcards (e.g., *) to look for variations in spelling and word form.
    EXAMPLE:    librar* returns library, libraries, librarian, etc.
    EXAMPLE:    colo*r returns color (American spelling) and colour (British spelling)
     
  • Combine phrases with keywords, using the double quotes and the plus (+) and/or minus (-) signs.
    EXAMPLE:  +cowboys +"wild west" -football -dallas
    (In this case, if you use a keyword with a +sign, you must put the +sign in front of the phrase as well. When searching for a phrase alone, the +sign is not necessary.)
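
Here is a rough sketch of how those signs, quotation marks, and wildcards could be checked against a piece of text. It is a toy, written under our own assumptions: terms without a sign are treated as optional, everything is compared in lower case, and the function name matches is made up. Real engines differ in the details.

```python
import re
from fnmatch import fnmatchcase

def matches(query, text):
    """Check text against +required, -excluded, "phrases" and wild*cards."""
    text = text.lower()
    words = re.findall(r"[a-z0-9']+", text)
    # Each token is an optional sign followed by a "quoted phrase" or a bare word.
    tokens = re.findall(r'([+-]?)(?:"([^"]+)"|(\S+))', query.lower())
    for sign, phrase, word in tokens:
        if phrase:
            found = phrase in text                              # exact phrase, in order
        else:
            found = any(fnmatchcase(w, word) for w in words)    # * matches any letters
        if sign == "+" and not found:
            return False   # a required term is missing
        if sign == "-" and found:
            return False   # an excluded term is present
    return True

query = '+cowboys +"wild west" -football -dallas'
print(matches(query, "Cowboys and outlaws of the Wild West"))   # True
print(matches(query, "Dallas Cowboys football schedule"))       # False
print(matches("+librar*", "Ask a librarian for help"))          # True
print(matches("+colo*r", "Pick a colour you like"))             # True
```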

Searching with Boolean Logic and Proximity Operators



Okay, now it might sound as though it is getting complicated again, but it’s not! Trust us! Just stay with us and you will see for yourself!


WHAT'S A "BOOLEAN"?
“Boolean logic takes its name from British mathematician George Boole (1815-1864), who wrote about a system of logic designed to produce better search results by formulating precise queries. He called it the "calculus of thought." From his writings, we have derived Boolean logic and its operators: AND, OR, and NOT, which we use to link words and phrases for more precise queries.”

Okay, so here is the simplified version. Basically, Boolean logic is a system that came out of this British mathematician's work, and today we use it to make searching the web easier and more precise. How? By using the following key words:

"AND"

Using AND in your search actually narrows your search by retrieving only documents that contain every single one of the keywords you enter. The more terms you enter, the narrower your search becomes. Voila! Really? Yes, really!

     EXAMPLE:   truth AND justice
     EXAMPLE:   truth AND justice AND ethics AND congress

"OR"

Using OR expands your search by returning documents in which either or both keywords appear. Since the OR operator is usually used for keywords that are similar or synonymous, the more keywords you enter, the more documents you will retrieve.

     EXAMPLE:   college OR university
     EXAMPLE:   college OR university OR institution OR campus

"NOT" / "AND NOT"
NOT or AND NOT (sometimes typed as ANDNOT) limits your search by returning only documents that contain your first keyword but not the second. A document that contains both words is left out, even though the first word appears in it.

     EXAMPLE:   saturn AND NOT car
     EXAMPLE:   pepsi AND NOT coke

NESTING
Nesting, i.e., using parentheses, is an effective way to combine several search statements into one search statement. Use parentheses to separate keywords when you are using more than one operator and three or more keywords.

     EXAMPLE:  (hybrid OR electric) AND (Toyota OR Honda)
     (For best results, always enclose OR statements in parentheses.)
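
If you think of each keyword as a pile of pages that contain it, the Boolean operators are just ways of combining piles. Here is a small sketch with a made-up index (the page numbers mean nothing; they are only there so you can follow along).

```python
# Made-up index: each keyword points to the set of pages that contain it.
index = {
    "hybrid":   {1, 2, 5},
    "electric": {2, 3},
    "toyota":   {1, 3, 4},
    "honda":    {5, 6},
    "truck":    {4, 5},
}

# AND keeps only pages found in both piles (narrower).
hybrid_and_electric = index["hybrid"] & index["electric"]

# OR keeps pages found in either pile (broader).
hybrid_or_electric = index["hybrid"] | index["electric"]

# AND NOT keeps pages from the first pile that are not in the second.
toyota_not_truck = index["toyota"] - index["truck"]

# Nesting: (hybrid OR electric) AND (toyota OR honda)
nested = (index["hybrid"] | index["electric"]) & (index["toyota"] | index["honda"])

print(sorted(hybrid_and_electric))  # [2]
print(sorted(hybrid_or_electric))   # [1, 2, 3, 5]
print(sorted(toyota_not_truck))     # [1, 3]
print(sorted(nested))               # [1, 3, 5]
```

The parentheses matter. Without them, the engine has to decide for itself which operator to apply first, and it may not choose the way you intended.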

BOOLEAN LOGIC REDUX
This is where it gets a bit complicated since not all search engines read Boolean logic. So what does this mean? Basically, you just have to give it a try and find out. Not cool, huh? Well, let’s look at it on the bright side. It’s a start! And it’s a tip that is worth trying.

IMPLIED BOOLEAN OPERATORS
You can also use the plus (+) or minus (-) signs, which have the same effect as using the words AND or NOT.

     EXAMPLE:  +dementia -alzheimers 
Similarly, putting double quotation marks (" ") around two or more words will force them to be searched as a phrase in that exact order.
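
To show that the signs and the words really say the same thing, here is a sketch that rewrites a +/- query as a Boolean statement. The function name to_boolean is made up, and it assumes your query has at least one term without a minus sign.

```python
import re

def to_boolean(query):
    """Rewrite a +/- query using the words AND and AND NOT."""
    # A term is an optional sign followed by a "quoted phrase" or a bare word.
    terms = re.findall(r'[+-]?(?:"[^"]+"|\S+)', query)
    required = [t.lstrip("+") for t in terms if not t.startswith("-")]
    excluded = [t[1:] for t in terms if t.startswith("-")]
    statement = " AND ".join(required)
    for term in excluded:
        statement += " AND NOT " + term
    return statement

print(to_boolean("+dementia -alzheimers"))
# dementia AND NOT alzheimers
print(to_boolean('+cowboys +"wild west" -football -dallas'))
# cowboys AND "wild west" AND NOT football AND NOT dallas
```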

Field Searching

So what if you are looking for something very specific, something you already know is out there? How do you search for that? Lucky for you, there is such a thing as field searching and this is what it does: 

Field searching allows you to limit your search to a particular part of a Web document, such as its title, domain, host, URL, links, or images.

What does this mean? What it means is that you are able to search even more specifically because you will be targeting a desired result.

TITLE SEARCHING

If you know the title of the article you are looking for, you can simply type it in quotation marks and BOOM! You are done!

     EXAMPLE:    title: "web search tutorial"

If titles have only one word entries, this might not work very well.

DOMAIN SEARCHING

If you are seeking information from a particular kind of site, you may choose to limit your field search to one of the current top-level domains (listed below and discussed earlier):
  • edu -- educational site
  • com -- commercial business site
  • gov -- U.S. governmental/non-military site
  • mil -- U.S. military sites and agencies
  • net -- networks, internet service providers, organizations
  • org -- U.S. non-profit organizations and others
     EXAMPLE:  domain:edu AND "On the Origin of Species" AND Darwin AND paleontology 
limits your search to educational sites dealing with Charles Darwin and his theory of evolution.
Some search engines have an advanced search option and will allow you to limit your search to a specific domain by the use of drop-down menus. One, SearchEdu, does it for you by limiting its basic search option to the .edu domain exclusively.
If you are seeking information from a particular international domain, you may choose to search the domain geographically using the two-letter country code.

     EXAMPLE:     domain:UK AND "Edward de Vere 17th Earl of Oxford"

This limits your search to sites in the United Kingdom dealing with the Shakespearean authorship question.


NOTE: Because the Internet was created in the United States, US was not originally assigned as a country letter code to U.S. domain names; however, it is used to designate state and local government hosts, including many public schools and some community colleges. Other countries have their own two-letter codes as the final part of their hostnames, e.g., UK for United Kingdom, CA for Canada, FR for France, etc.
For a list of Internet Country Codes, go to: ISO's List of Country Codes

HOST (OR SITE) SEARCHING

If you are seeking information that resides on a specific computer or server, you can narrow your search with a "host" or "site" query.

     EXAMPLE:   host:www.sc.edu

returns pages hosted at the University of South Carolina.

URL SEARCHING

If you are seeking a specific file, and that file's name is part of the host site's URL, you may find it more quickly by choosing a URL search.

     EXAMPLE:   url:bck2skol

returns sites in which the filename, bck2skol, (my old course for Internet "newbies") is incorporated into the URL.
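
Here is a rough sketch of what the domain, host, and URL fields are checking when an engine filters its results. The function name field_match is made up, and so are all of the addresses in the list except the first one, which is rebuilt from the earlier chapter on reading web addresses.

```python
from urllib.parse import urlsplit

def field_match(field, value, url):
    """Check one address against a domain:, host: or url: style filter."""
    host = (urlsplit(url).hostname or "").lower()
    value = value.lower()
    if field == "domain":
        return host.endswith("." + value)   # e.g. edu, or a country code like uk
    if field == "host":
        return host == value                # the whole server name must match
    if field == "url":
        return value in url.lower()         # the text appears anywhere in the address
    raise ValueError(f"unknown field: {field}")

urls = [
    "http://www.sc.edu/beaufort/library/pages/bones/bones.shtml",
    "http://www.example.edu/bck2skol/lesson1.html",   # made-up address
    "http://www.example.co.uk/shakespeare",           # made-up address
    "http://www.example.com/logos",                   # made-up address
]

print([u for u in urls if field_match("domain", "edu", u)])
print([u for u in urls if field_match("host", "www.sc.edu", u)])
print([u for u in urls if field_match("url", "bck2skol", u)])
```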

LINK SEARCHING

If you have a web page and would like to know who is linking to it, or if you would like to see who is linking to a particular page of interest, you may choose a LINK search.

     EXAMPLE:   link:www.sc.edu/beaufort/

returns pages with links to my campus of the University of South Carolina.

IMAGE SEARCHING


If you want to find a particular image on the web, you may choose an IMAGE search. You will need to specify the image by name, which works well if the name is part of the image file name. If not, you may miss that particular image altogether.

     EXAMPLE:   IMAGE:bones.gif

(Actually, I found the "dancing bones" logo that I use for this tutorial with a Boolean search as follows:  "free gifs" AND bones)



Troubleshooting


Ideally, in a perfect world, you will run into absolutely no issues while searching the web and your results will always be favorable, but yeah, that's not going to happen. There will be times when even if you word your searches as meticulously as you possibly can, you will encounter an issue here and there. But don't worry! That is perfectly normal. You are not doing anything wrong. It could be that you still need a little bit more practice so just be patient and you will conquer. The following are a few of the issues that you may come across. 

TOO MANY: Your search gives you a ridiculous amount of results and it is practically impossible to look through them all! You probably used a single term that is too common, so to fix this, add more specific keywords or search for an exact phrase to narrow things down.

TOO FEW: You're on the wrong site or your search is too narrow. To fix this, try to omit some search terms. Try your search on another engine: metasearcher, directory, people search, or specialty resource. Ask for help.

"404 -- FILE NOT FOUND" MESSAGE: This message tells you that the file you are looking for has been moved, removed, or renamed. To fix this, go back to the search engine and do a phrase search or a field search on the title. You can also try shortening the URL to see if the file might still be on the same server.

"SERVER DOES NOT HAVE A DNS ENTRY" MESSAGE: This means that your browser can't locate the server (i.e., the computer that hosts the Web page). It could mean that the network is busy or that the server has been removed or taken down for maintenance. For better results, check your spelling and try again later. Be patient!

"SERVER ERROR" OR "SERVER IS BUSY" MESSAGE: Be patient! The server you are attempting to contact may be offline, may have crashed, or may be very busy. Try again later. Again, be patient!

HOME PAGE IS NOWHERE TO BE FOUND: Change it up! Try to guess the address: experiment with different top-level domain names combined with the organization's name, brief name, or acronym.
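
Two of the fixes above, shortening the URL after a 404 and guessing a home page, are easy to sketch in code. Both function names and the organization names in the second example are made up, and the guessed addresses are only candidates to try, with no promise that any of them exist.

```python
from urllib.parse import urlsplit, urlunsplit

def shorter_urls(url):
    """List the address's parent folders, from the nearest one up to the home page."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    while segments:
        segments.pop()                          # drop the last piece of the path
        path = "/" + "/".join(segments)
        if segments:
            path += "/"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates

def home_page_guesses(names, top_levels=("com", "org", "net", "edu", "gov")):
    """Combine each name (full, brief, or acronym) with each top-level domain."""
    return [f"http://www.{name}.{tld}/" for name in names for tld in top_levels]

for candidate in shorter_urls("http://www.sc.edu/beaufort/library/pages/bones/bones.shtml"):
    print(candidate)

# Made-up names: a full name, a brief name, and an acronym.
for guess in home_page_guesses(["tikipost", "tiki", "tp"], top_levels=("com", "org")):
    print(guess)
```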