Message Number: YG3671 | New FHL Archives Search
From: Pam Sessoms
Date: 2001-05-17 09:29:00 UTC
Subject: Searching the Archives

All about the Yahoo! Groups Archive Search Engine

It is often suggested that folks search the FHL archives to help find
answers to questions. This is a *wonderful* suggestion, and searching for
info prior to posting a question should be a standard procedure whenever
possible. Often, questions have already been answered, and you can go on
your way with no waiting for replies, and attention can be directed to
tough or perplexing cases. Even if your question is not answered, you
will very likely find something to partially answer your question or let
you fill in more needed info ahead of time.

However, I am here in my capacity as a reference librarian to tell you
that searching skills are not to be taken for granted. Not everyone knows
the best way to search, and search engines of different types work
differently. The best search engines are well-documented and have lots of
special commands that allow users to home in on exactly what they want.
I'm afraid I must report that after some tinkering around, unless I have
completely missed the boat, the Yahoo! Groups search engine is neither
well-documented nor terribly flexible. Someone please fill me in if I've
missed a good online help section all about this. Here is what I have
learned by experimenting. If you test-drive these examples, in all cases,
don't put the quote marks into the search engine - they are only here to
make the message easier to read.


SUBSTRINGS
The search engine searches for substrings, that is, parts of words. If
you search for the word "bowel," you will retrieve articles including the
word "bowels". This can be a blessing or a curse. It can be a blessing
when searching for multiple forms of the same word. For example, when
searching for lymphoma information, you can just put in "lympho" and you
will get messages containing the word "lymphoma," "lymphosarcoma" or just
"lympho" as many of us say. You will also get messages containing
"lymphocyte" or "lymphocytes."

This can be a curse as well, especially when searching for short
words. Take, for example, ECE. If you search for "ECE", you will
retrieve *every single message ever posted to the FHL*. I am not making
this up. Why? You might think it's because the letters e-c-e are common
in longer words. That is true, but the search engine is searching *all*
of the message text, including the mail headers, any attachments
(including the mess of letters that make up a GIF or JPEG image encoding),
etc. Each message has a "Received:" element to the header, which includes
the fateful letters ECE. I tried putting a "space" character on either
side of "ece", but that does not help, because the search engine seems to
strip out spaces.

In most cases, you can work the substrings thing to your advantage. But
do keep it in mind at all times. I was getting very weird results at one
point, picking up lots of messages with attachments that did not seem to
contain my search word ("pam", chosen so I could test against my own
messages, since I know what I've posted). Then I did a "view source" and
saw that the encoded attachments were being searched, and "pam" just
randomly shows up in there quite often, as it's only three letters.
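
For the programmers on the list, the substring behavior described above
can be mimicked in a few lines of Python. This is only a toy model built
from the observations in this message, not the actual Yahoo! Groups code.
The function name, the sample message, and the case-insensitive matching
are all assumptions (the last one inferred from "ECE" matching
"Received:").

```python
# Toy model of the observed behavior; NOT the real search engine.
def matches(term, raw_message):
    """True if term occurs anywhere in the raw message text.

    Assumptions: surrounding spaces are stripped from the term, and
    matching ignores upper/lower case.
    """
    return term.strip().lower() in raw_message.lower()


# Hypothetical raw message: headers and body are searched as one text.
raw = (
    "Received: from mail.example.com\n"
    "Subject: bowels question\n"
    "\n"
    "My ferret has lymphosarcoma.\n"
)

print(matches("bowel", raw))   # True: "bowel" is part of "bowels"
print(matches("lympho", raw))  # True: part of "lymphosarcoma"
print(matches(" ece ", raw))   # True: "Received" contains "ece"
print(matches("pam", raw))     # False: no "pam" in this sample
```

Because the header is part of the searched text, a three-letter term like
"ece" matches this message even though the body never mentions it.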


IMPLICIT BOOLEAN "AND"
If you put more than one word into the search box, any retrieved messages
must contain ALL of the words. 'nuff said, that's fine. :-) Just be aware
of it, and if you don't get enough results with a multi-word search, try
taking something out.


NO PHRASE SEARCHING
I have not found a way to make it search for a group of words together as
a phrase. It just "ANDs" things together. Quote marks don't work; it'll
just try to search for the quote marks and not find anything.


NO WAY TO BOOLEAN "OR"
Many search engines will let you put the word "or" between words to let
you retrieve messages that contain EITHER Word-A OR Word-B. Alternatively,
a search engine may give you a funny little symbol to use in place of the
word "or" that will do the same thing. For
example, it would be lovely to put "carafate OR sucralfate" into the
archive and have it retrieve messages about either brand-name Carafate or
the generic name sucralfate. But it doesn't work this way. It actually
searches for the word "or" like any other word (and "or" is so common it's
probably in every message). So the search "carafate OR sucralfate"
retrieves the same messages as "carafate sucralfate." If you
need to do an "or" type thing, try to work out a substring if possible, or
just do the search multiple times with the different words.
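
Here is a rough Python sketch of the implicit "AND" behavior and of the
"search multiple times" workaround. It is a toy model of what I observed,
not the real engine; the search() function and the message numbers and
texts are made up for illustration.

```python
# Toy model: every query word must occur (as a substring) in the message.
def search(query, messages):
    """Return the set of message numbers containing ALL query words."""
    words = query.lower().split()
    return {
        num
        for num, text in messages.items()
        if all(word in text.lower() for word in words)
    }


# Made-up archive: message number -> message text.
messages = {
    101: "Carafate helped for the ulcer.",
    102: "The vet prescribed sucralfate or something similar.",
    103: "Carafate (sucralfate) twice a day before food.",
}

# "or" is treated as just another required word, so these two agree.
print(sorted(search("carafate OR sucralfate", messages)))  # [103]
print(sorted(search("carafate sucralfate", messages)))     # [103]

# Workaround: run each search separately and combine the results yourself.
either = search("carafate", messages) | search("sucralfate", messages)
print(sorted(either))                                      # [101, 102, 103]
```

Note that "or" matches inside "for" and "before" here, which is the same
substring effect that makes it turn up in practically every message.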


HOW TO BOOLEAN "NOT"
I haven't played much with this, and in actual practice, it's not done
very often because it is VERY easy to eliminate relevant material with
this technique, but to be complete, the search engine *does* seem to allow
you to *exclude* messages with a particular word from your search. For
example, to retrieve messages containing the word "carafate" but NOT also
containing the word "sucralfate", you can search for "carafate
-sucralfate" (with the little minus sign in front of the word you want to
exclude). Bad example, but I'm having a hard time thinking of a good
one. For a giggle, search for "ferret -ece" and get NO hits: every message
contains "ece" in its headers (see SUBSTRINGS above), so all are excluded.
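
The minus-sign exclusion can be sketched in Python along the same lines.
Again, this is a toy model of the behavior I observed, not the engine's
real logic; the search() function and the two sample messages (with their
header lines) are invented for the example.

```python
# Toy model: plain words are required, words with a leading "-" are excluded.
def search(query, messages):
    """Return numbers of messages with all required and no excluded words."""
    required, excluded = [], []
    for word in query.lower().split():
        if word.startswith("-") and len(word) > 1:
            excluded.append(word[1:])
        else:
            required.append(word)
    return {
        num
        for num, raw in messages.items()
        if all(w in raw.lower() for w in required)
        and not any(w in raw.lower() for w in excluded)
    }


# Made-up raw messages; the header line is searched along with the body.
messages = {
    201: "Received: from a.example\n\nMy ferret takes carafate.",
    202: "Received: from b.example\n\nFerret on carafate, generic name sucralfate.",
}

print(sorted(search("carafate -sucralfate", messages)))  # [201]
print(sorted(search("ferret -ece", messages)))           # []
```

The second search comes back empty because "ece" sits inside "Received"
in every message, so excluding it throws everything away.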


AS THE GROUP GROWS...
This one isn't such a huge issue right now, but it will probably become
one the more messages we accumulate. Notice the little heading on top of
any search results - it says what messages were searched. If you get only
a small number of hits, at this time, it'll search everything and list all
your results on one page. If you get lots of hits, it'll break them down
to multiple pages and you'll have to click on "next" to see the next
page. Now, I am also on a group that is much larger than the FHL, and
searching its archives is kind of painful, because it searches the
archives in "chunks" and stops after each one, even if you get *no hits*
in that chunk. So a typical result is:

Searched Messages 32577-30164 of 32577
(with no matching messages, or hits)

Then you must click "next" to search the next couple of thousand messages,
etc, to see if anything in that chunk matches, and so on, until you get to
the beginning of the archive for that list. The archives are still
invaluable, but it takes a bit of time and patience to get through one
search.
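
To get a feel for how many clicks that takes, here is a small Python
sketch. It assumes the engine always works backwards from the newest
message in equal-sized chunks. The chunk size of 2414 is read off the one
example above (32577 down to 30164); whether it is really constant is a
guess on my part, and the function name is made up.

```python
# Toy model of chunked searching, newest messages first.
def chunk_ranges(total, chunk_size):
    """Yield (newest, oldest) message-number pairs, one per results page."""
    high = total
    while high >= 1:
        low = max(1, high - chunk_size + 1)
        yield high, low
        high = low - 1


ranges = list(chunk_ranges(32577, 2414))
print(ranges[0])    # (32577, 30164), matching the example above
print(ranges[-1])   # (1195, 1), the oldest chunk
print(len(ranges))  # 14 pages to cover the whole archive
```

Under those assumptions, one complete search of that archive means 14
pages, or 13 clicks on "next", whether or not anything matches.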


A SUGGESTION TO THOSE ANSWERING QUESTIONS
If you direct someone to the archives AND you have already done the search
yourself to make sure that relevant stuff will be in there for the person
asking the question, it helps to do one of two things. (1) Tell them
*exactly* what to type in for their search or (2) Jot down a few relevant
message numbers and relay those in your reply - that way they can put
those numbers into the "Msg # ___" dialog box on the message index or
within each individual message and zip directly to what you wanted them to
see. Of course, it may not always be possible to test drive the search
like that, but if you do it, it can help to do one of these two things to
be sure the person is able to find the same messages you did.


I should probably note that because I haven't found any official
documentation on the search engine, this is all just what I have figured
out by trial and error. It is entirely possible that the search engine
really is much more powerful than I have found, and if you know the
secrets, please please please post them!!!

Hmmm, I think that's pretty much what I wanted to say... I REALLY did not
mean this to turn into such a novel or, heaven forbid, a lecture, but here
it is. I hope it helps someone!

-Pam S., reference librarian at large. :-)