Stop Word Considerations
Stop words are words that the search program ignores.
The following points describe issues to consider when using stop words:
- In a search box which implicitly joins words with boolean AND but
which does not recognize the word 'and' as a boolean operator, the
search 'moon and misbegotten' would get hits only on records which
contained all three terms: 'moon', 'and', 'misbegotten'. The user
might, therefore, get fewer hits than expected (e.g. 'A Moon for the
Misbegotten' would not be retrieved). Treating 'and' as a stop word
would drop it from the query and so increase the number of hits.
- In a search box which implicitly joins words with boolean OR, the
search 'the journal of stop words' would retrieve every record which
contained the word 'the'! In this case, a list of stop words would
reduce the unexpectedly numerous hits. That being said, it is
unusual for a search box to implicitly OR words together.
- In addition to affecting retrieval, stop words may be used to
reduce index size. Whether or not index size is an issue will depend
on how the indexing is programmed and on the quantity of data being
indexed. In many cases it is not a problem and so does not need to be
a consideration. The programmer will determine whether it is an issue.
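The retrieval effects described above can be illustrated with a minimal
sketch in Python. It is not taken from any particular search program:
the sample records, the stop word set, and the tokenize and search
functions are all assumptions made up for this illustration.

```python
import re

# Hypothetical sample records (titles only), invented for illustration.
RECORDS = [
    "A Moon for the Misbegotten",
    "The Moon and Sixpence",
    "The Journal of Stop Words",
    "The Old Man and the Sea",
]

# Assumed stop word list; a real list depends on the data being searched.
STOP_WORDS = frozenset({"the", "a", "an", "and", "of"})


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def search(query, records, mode="AND", stop_words=frozenset()):
    """Return the records matching the query terms joined by AND or OR.

    Query terms found in stop_words are dropped before matching.
    """
    terms = [t for t in tokenize(query) if t not in stop_words]
    if not terms:
        return []
    combine = all if mode == "AND" else any
    hits = []
    for record in records:
        words = set(tokenize(record))
        if combine(term in words for term in terms):
            hits.append(record)
    return hits


# Implicit AND: 'and' is treated as an ordinary term unless it is a stop word.
and_without = search("moon and misbegotten", RECORDS, "AND")
and_with = search("moon and misbegotten", RECORDS, "AND", STOP_WORDS)

# Implicit OR: 'the' matches almost everything unless it is a stop word.
or_without = search("the journal of stop words", RECORDS, "OR")
or_with = search("the journal of stop words", RECORDS, "OR", STOP_WORDS)
```

With these sample records, the AND search finds nothing until 'and' is
treated as a stop word, while the OR search matches every record until
'the' and 'of' are treated as stop words.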
If the type of data and the functionality of the search boxes call
for stop words, the words which should be defined as stop words may
vary according to the data being searched (full text vs. metadata,
free text vs. controlled vocabulary, languages represented, etc.).
Some common stop words include:
- the
- a
- an
- and
- of
- equivalents of the above in other languages
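
As a rough illustration of how a stop word list affects index size, the
following Python sketch builds a simple inverted index with and without
stop words. The per-language lists, the sample records, and the function
names are assumptions chosen for this example; the French list is only
a small illustrative set of equivalents, not a complete one.

```python
import re
from collections import defaultdict

# Assumed stop word lists, keyed by language code (illustrative only).
STOP_WORDS_BY_LANGUAGE = {
    "en": {"the", "a", "an", "and", "of"},
    "fr": {"le", "la", "les", "un", "une", "et", "de"},
}

# Hypothetical records in two languages.
DOCS = [
    "The Journal of Stop Words",
    "La Lune et les Étoiles",
]


def build_index(records, stop_words=frozenset()):
    """Map each word to the set of record numbers that contain it."""
    index = defaultdict(set)
    for record_id, text in enumerate(records):
        # [^\W\d_]+ matches runs of letters, including accented ones.
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            if word not in stop_words:
                index[word].add(record_id)
    return index


def posting_count(index):
    """Count the (word, record) pairs stored in the index."""
    return sum(len(ids) for ids in index.values())


# Combine the lists for every language represented in the data.
all_stop_words = set().union(*STOP_WORDS_BY_LANGUAGE.values())

full_index = build_index(DOCS)
reduced_index = build_index(DOCS, all_stop_words)

print(posting_count(full_index), posting_count(reduced_index))
```

On real data the saving depends on how often the stop words occur, so
the sketch is only a way to measure the difference, not a prediction of
it.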