Simple Search Mode Reference Guide

This section describes each Verity operator in detail. Where appropriate, each description includes an example of simple syntax and explicit syntax. Operators are listed alphabetically.

<accrue>

Selects documents that include at least one of the search elements you specify. Valid search elements are two or more words or phrases. Retrieved documents are relevance-ranked.

The <accrue> operator scores retrieved documents according to the presence of each search element in the document, using a "the more, the better" approach: the more search elements found in the document, the better the document's score. Following are examples of search syntax.

To select documents containing stemmed variations of the words "computers" and "laptops," you can enter any of the following:

computers <accrue> laptops
computers, laptops
<accrue> (computers, laptops)
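The "the more, the better" idea can be pictured with a short Python sketch. The function name accrue_score and the scoring rule (the fraction of search elements present) are assumptions made for illustration only; the actual Verity scoring formula is not given in this guide, and stemming is ignored here.

```python
def accrue_score(document: str, elements: list[str]) -> float:
    """Hypothetical score: fraction of search elements found in the document."""
    words = set(document.lower().split())
    found = sum(1 for element in elements if element.lower() in words)
    return found / len(elements)


docs = {
    "both": "laptops and desktop computers on sale",
    "one": "new laptops announced",
    "none": "weather report for tuesday",
}
elements = ["computers", "laptops"]

# Rank every document, best score first.
ranked = sorted(
    ((name, accrue_score(text, elements)) for name, text in docs.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
# Only documents containing at least one search element are selected.
selected = [(name, score) for name, score in ranked if score > 0]
print(selected)
```

In this sketch the document containing both words ranks above the document containing only one, and the document containing neither is not selected.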

<and>

Selects documents that contain all of the search elements you specify. Documents retrieved using the and operator are relevance-ranked. Following are examples of search syntax.

To select documents which contain stemmed variations of the phrase "pharmaceutical companies" and stemmed variations of the word "stock," you can enter the following:

pharmaceutical companies and stock

Only those documents that contain both search elements, or stemmed variations of them (for example, "pharmaceutical company," "stocks," etc.), are retrieved and ranked according to their scores.

<contains>

Selects documents by matching the word or phrase you specify with values stored in a specific document field. Documents are selected only if the search elements specified appear in the same sequential and contiguous order in the field value. When you use the <contains> operator, you specify the field name to search, and the word or phrase to search for.

With the <contains> operator, the words stored in a document field are interpreted as individual, sequential units. You can specify one or more of these units as search criteria. To specify multiple words, each word must be sequential and contiguous, and must be separated by a blank space.

For example, the following title contains eight sequential words:

American Version of 'Orient Express' Offers Opulent Ride

American
Version
of
Orient
Express
Offers
Opulent
Ride

The following examples demonstrate how you can use the <contains> operator with sequential, contiguous words to match the document title listed above, assuming it is stored in a title field:

title <contains> American Version

title <contains> Express Offers

The following examples show how you can use a question mark (?) to represent individual variable characters of a word, and an asterisk (*) to match multiple variable characters of a word:

title <contains> Amer* Version

title <contains> Version of Or????

Question marks and asterisks cannot be used to represent white space that appears between words.

The <contains> operator does not recognize nonalphanumeric characters. The <contains> operator interprets nonalphanumeric characters as spaces and treats the separated values as individual units.

For example, if you have defined a dash (-) as a valid character, and you enter search criteria that include this character, as in on-line, the value is defined as two individual units, as follows:

title <contains> on line
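The unit-by-unit matching described above can be sketched in Python. This is an illustration of the described behavior, not the engine's implementation: the helper names tokenize and contains are hypothetical, matching is assumed to be case-insensitive, and the standard fnmatch module stands in for the ? and * wildcards.

```python
import re
from fnmatch import fnmatchcase


def tokenize(value: str) -> list[str]:
    """Split a field value into units, treating nonalphanumerics as spaces."""
    return [unit.lower() for unit in re.split(r"[^0-9A-Za-z]+", value) if unit]


def contains(field_value: str, criteria: str) -> bool:
    """True if the criteria units match a sequential, contiguous run of units."""
    units = tokenize(field_value)
    # Split the criteria on blanks only, so ? and * survive inside each unit.
    patterns = [pattern.lower() for pattern in criteria.split()]
    for start in range(len(units) - len(patterns) + 1):
        window = units[start:start + len(patterns)]
        if all(fnmatchcase(unit, pattern) for unit, pattern in zip(window, patterns)):
            return True
    return False


title = "American Version of 'Orient Express' Offers Opulent Ride"
print(tokenize(title))
print(contains(title, "Express Offers"))
print(contains(title, "Version of Or????"))
print(contains(title, "Version Orient"))
```

Because wildcards are applied to one unit at a time, they can never stand in for the white space between words, which mirrors the restriction stated above.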

<ends>

Selects documents by matching the character string you specify with the ending characters of the values stored in a specific document field. For example, assume a document field named author has been defined. To select documents written by Milner, Wagner, and Faulkner, you can enter the following:

author <ends> ner

= (equals)

Selects documents whose document field values are exactly the same as the search string you specify. For example, assume a document field named date has been defined. To select only those documents dated October 24, 1992, you can enter the following:

date = 10-24-92

> (greater than)

Selects documents whose document field values are greater than the search string you specify. For example, assume a document field named date has been defined. To select only those documents dated after October 24, 1992, you can enter the following:

date > 10-24-92

>= (greater than or equal to)

Selects documents whose document field values are greater than or equal to the search string you specify. For example, assume a document field named date has been defined. To select only those documents dated on or after October 24, 1992, you can enter the following:

date >= 10-24-92

< (less than)

Selects documents whose document field values are less than the search string you specify. For example, assume a document field named date has been defined. To select only those documents dated before February 14, 1991, you can enter the following:

date < 02-14-91

<= (less than or equal to)

Selects documents whose document field values are less than or equal to the search string you specify. For example, assume a document field named date has been defined. To select only those documents dated prior to and including February 14, 1991, you can enter the following:

date <= 02-14-91
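The five relational operators above can be illustrated with a small Python sketch. It assumes, as the examples imply, that the date field holds MM-DD-YY values that are compared chronologically; the helper names parse_date and select are hypothetical and do not correspond to any Verity interface.

```python
from datetime import date, datetime


def parse_date(text: str) -> date:
    """Parse the MM-DD-YY form used in the examples above."""
    return datetime.strptime(text, "%m-%d-%y").date()


def select(field_values: list[str], op: str, search_value: str) -> list[str]:
    """Return the field values that satisfy `value <op> search_value`."""
    target = parse_date(search_value)
    tests = {
        "=": lambda d: d == target,
        ">": lambda d: d > target,
        ">=": lambda d: d >= target,
        "<": lambda d: d < target,
        "<=": lambda d: d <= target,
    }
    return [value for value in field_values if tests[op](parse_date(value))]


dates = ["02-14-91", "10-23-92", "10-24-92", "01-05-93"]
print(select(dates, ">", "10-24-92"))
print(select(dates, "<=", "02-14-91"))
```

The sketch parses the values before comparing them because a plain character comparison would order "01-05-93" before "10-24-92", which is not the chronological order.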

<in>

Selects documents that contain specified values in one or more document zones. A document zone represents a region of a document, such as the document's summary, date, or body text. The <in> operator works only if document zones have been defined in your collections. If you use the <in> operator to search collections in which zones are not defined, no documents will be selected. In addition, the zone name you specify must match the zone names defined in your collections. Consult your collection administrator to determine which zones have been defined for specific collections.

The <in> operator can be qualified with the <when> operator to search for a term only within zones upon which certain conditions have been placed. The <when> operator is described later in this section.

The following query expression searches document zones named summary for the word safety.

"safety" <in> summary

To search with multiple words, phrases, or topics, enclose them in parentheses. The following query expression searches document zones named summary for the word safety and stemmed variations of the word warning.

("safety", warning) <in> summary

To search multiple zones, separate them with commas and enclose them in parentheses. The following query expression searches both the summary zone and the title zone for the word safety and stemmed variations of the word warning.

("safety", warning) <in> (summary, title)

You must enclose query expressions containing commas in parentheses. The following example searches the summary zone for the word safety and stemmed variations of the phrase environmental regulation.

("safety", environmental regulation) <in> summary

The following query expression searches both the summary zone and the title zone for the word safety and stemmed variations of the phrase environmental regulation.

("safety", environmental regulation) <in> (summary, title)

<matches>

Selects documents by matching the character string you specify with values stored in a specific document field. Documents are selected only if the search elements specified match the field value exactly. If a partial match is found, a document is not selected. When you use the <matches> operator, you specify the field name to search, and the word, phrase, or number to search for.

Unlike the <contains> operator, the search criteria you specify with a <matches> operator must match the field value exactly for a document to be selected. With the <matches> operator, any occurrence of a search string that appears as a portion of a value is not selected; only values matching the entire search string are selected.

You can use question marks (?) to represent individual variable characters within a string, and asterisks (*) to match multiple characters within a string.

For example, assume a document field named source includes the following values:

computer

computerworld

computer currents

pc computing

To locate documents whose source is computer, the <matches> operator is used as follows:

source <matches> computer

Here, the <matches> operator matches computer, but not computerworld, computer currents, or pc computing.

To locate documents whose source is computerworld, the <matches> operator is used as follows:

source <matches> computer?????

Now, the <matches> operator matches computerworld, since each question mark (?) represents specific character positions within the string. computer and computer currents are not matched, because their character strings do not match the length represented by the specific character positions.

To locate documents whose sources are computer, computerworld, and computer currents, the <matches> operator is used as follows:

source <matches> computer*

Here, the <matches> operator matches computer, computerworld, and computer currents, since the asterisk (*) represents zero or more variable characters at the end of the string.

To locate documents whose sources include computer, computerworld, computer currents, and pc computing, the <matches> operator can be used as follows:

source <matches> *comput*

Now, the <matches> operator matches all four occurrences, since the asterisk (*) represents a string of characters of any length.
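The four cases above can be reproduced with a short Python sketch. It uses the standard fnmatch module, whose ? and * happen to behave like the wildcards described here (whole-value matching, one character per question mark, zero or more characters per asterisk). The helper name matches is hypothetical, and this is only a model of the described behavior.

```python
from fnmatch import fnmatchcase

sources = ["computer", "computerworld", "computer currents", "pc computing"]


def matches(pattern: str) -> list[str]:
    """Return the field values that match the entire pattern."""
    return [value for value in sources if fnmatchcase(value, pattern)]


for pattern in ["computer", "computer?????", "computer*", "*comput*"]:
    print(pattern, "->", matches(pattern))
```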

<near>

Selects documents containing specified search terms within close proximity to each other. Document scores are calculated based on the relative number of words between search terms. For example, if the search expression includes two words, and those words occur next to each other in a document (so that the region size is two words long), then the score assigned to that document is 1.0. Thus, the document with the smallest possible region containing all search terms always receives the highest score. Documents whose search terms are not within 1000 words of each other are not selected, since the search terms are probably too far apart to be meaningful within the context of the document.

The near operator is similar to the other proximity operators in the sense that the search words you enter must be found within close proximity of one another. However, unlike other proximity operators, the near operator calculates relative proximity and assigns scores based on its calculations.

To retrieve relevance-ranked documents that contain stemmed variations of the words "war" and "peace" within close proximity to each other, you can enter the following:

war <near> peace

<near/n>

Selects documents containing two or more words within N number of words of each other, where N is an integer. Document scores are calculated based on the relative distance of the specified words when they are separated by N words or less.

For example, if the search expression near/5 is used to find two words within five words of each other, a document that has the specified words within three words of each other is scored higher than a document that has the specified words within five words of each other.

The N variable can be an integer between 1 and 1,024, where near/1 searches for two words that are next to each other. If N is 1,000 or above, you must specify its value without commas, as in near/1000. You can specify multiple search terms using multiple instances of near/N, as long as the value of N is the same.

For example, to retrieve relevance-ranked documents that contain stemmed variations of the words "commute," "bicycle," "train," and "bus" within 10 words of each other, you can enter the following:

commute <near/10> bicycle <near/10> train <near/10> bus
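A two-term proximity test can be sketched in Python as follows. The helper names min_distance and near_n are hypothetical, distance is assumed to be the difference between word positions (so adjacent words are 1 apart, as with near/1), and chaining of three or more terms and stemming are not modeled. The engine's actual score values are not reproduced; the sketch only shows that closer pairs rank first.

```python
from typing import Optional


def min_distance(text: str, first: str, second: str) -> Optional[int]:
    """Smallest gap, in word positions, between any pair of the two terms."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    distances = [abs(i - j) for i in first_positions for j in second_positions]
    return min(distances) if distances else None


def near_n(text: str, first: str, second: str, n: int) -> bool:
    """True if the two terms occur within n words of each other."""
    distance = min_distance(text, first, second)
    return distance is not None and distance <= n


docs = [
    "war then a fragile peace",   # 4 words apart
    "war and peace",              # 2 words apart
    "a history of war",           # second term missing
]
selected = [d for d in docs if near_n(d, "war", "peace", 5)]
# Closer terms rank higher, so sort by ascending distance.
ranked = sorted(selected, key=lambda d: min_distance(d, "war", "peace"))
print(ranked)
```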

You can use the near/N operator with the order modifier to perform ordered proximity searches. For more information about the order modifier, see "order" in this appendix.

<or>

Selects documents that show evidence of at least one of your search elements. Documents selected using the or operator are relevance-ranked.

To select documents that contain stemmed variations of the word "election" or the phrases "national elections" or "senatorial race", you can enter the following:

election or national elections or senatorial race

Only those documents that contain at least one of the search elements, or a stemmed variation of at least one of them, are retrieved and ranked according to their scores.

<paragraph>

Selects documents that include all of the search elements you specify within a paragraph. Valid search elements are two or more words or phrases. You can specify search elements in a sequential or a random order. Documents are retrieved as long as search elements appear in the same paragraph.

To retrieve relevance-ranked documents that contain stemmed variations of the word "drug" and the phrase "cancer treating" in the same paragraph, you can enter the following:

drug <paragraph> cancer treating

To search for three or more words or phrases, you must use the paragraph operator between each word or phrase.

You can use the paragraph operator with the order modifier to perform ordered proximity searches. For more information about the order modifier, see "order" in this appendix.

<phrase>

Selects documents that include a phrase you specify. A phrase is a grouping of two or more words that occur next to each other in a specific order.

By default, two or more words separated by a space are considered to be a phrase in simple syntax. In addition, two or more words enclosed in double quotes are considered to be a phrase. To retrieve relevance-ranked documents that contain the phrase "mission oak," you can enter any of the following:

mission oak
"mission oak"
mission <phrase> oak
<phrase> (mission, oak)

<sentence>

Selects documents that include all of the words you specify within a sentence. You can specify search elements in a sequential or a random order. Documents are retrieved as long as search elements appear in the same sentence.

To retrieve relevance-ranked documents that contain stemmed variations of the words "American" and "innovation" within the same sentence, you can enter either of the following:

american <sentence> innovation
<sentence> (american, innovation)

You can use the sentence operator with the order modifier to perform ordered proximity searches. For more information about the order modifier, see "order" in this appendix.

<starts>

Selects documents by matching the character string you specify with the starting characters of the values stored in a specific document field. For example, assume a document field named reporter has been defined. To retrieve documents written by Jack, Jackson, and Jacks, you can enter the following:

reporter <starts> jack

<stem>

Selects documents that include one or more variations of the search word you specify. For example, to retrieve documents containing a variation of the word "film," you can enter the following:

<stem> film

The documents retrieved will include words such as "films," "filmed," and "filming." Documents are not relevance-ranked unless the many modifier is used, as in:

<many><stem> film

<substring>

Selects documents by matching the character string you specify with a portion of the strings of the values stored in a specific document field. The characters that comprise the string can occur at the beginning of a field value, within a field value, or at the end of a field value.

For example, assume a document field named title has been defined. To retrieve documents whose titles contain words such as "solution," "resolution," "solve," and "resolve," you can enter the following:

title <substring> sol
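The <starts>, <ends>, and <substring> field operators differ only in where the string must appear in the field value. A minimal Python sketch of that difference follows; the function names are hypothetical, the sample field values beyond those named in this guide are invented for illustration, and case-insensitive comparison is an assumption.

```python
def starts(value: str, text: str) -> bool:
    """The field value begins with the string."""
    return value.lower().startswith(text.lower())


def ends(value: str, text: str) -> bool:
    """The field value finishes with the string."""
    return value.lower().endswith(text.lower())


def substring(value: str, text: str) -> bool:
    """The string occurs anywhere in the field value."""
    return text.lower() in value.lower()


authors = ["Milner", "Wagner", "Faulkner", "Hemingway"]
reporters = ["Jack", "Jackson", "Jacks", "Jill"]
titles = ["A Simple Solution", "Conflict Resolution", "How to Solve It", "Annual Report"]

print([a for a in authors if ends(a, "ner")])
print([r for r in reporters if starts(r, "jack")])
print([t for t in titles if substring(t, "sol")])
```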

<thesaurus>

Selects documents that contain one or more synonyms of the word you specify. For example, to retrieve documents containing synonyms of the word "altitude," you can enter the following:

<thesaurus> altitude

The documents retrieved will include words such as "height" or "elevation." Documents are not relevance-ranked unless the many modifier is used, as in:

<many><thesaurus> altitude

<typo/n>

Selects documents that contain the word you specify plus words that are similar to the query term. The typo/N operator performs "approximate pattern matching" to identify similar words. This makes it ideal for use in an environment where documents have been scanned using an Optical Character Reader (OCR).

The optional N variable in the operator name expresses the maximum number of errors between the query term and a matched term, a value called the error distance. If N is not specified, an error distance of 2 is used.

The error distance between two words is based on the calculation of errors, where an error is defined to be a character insertion, deletion, or transposition. For example, for these sets of words, the second word matches the first within an error distance of 1:

mouse, house (m → h)
agreed, greed (a is deleted)
cat, coat (o is inserted)

For the query below, documents with the words "sweeping" and "swimming" will match, since there are 3 transpositions in the word (e → i, e → m, p → m).

<typo/3> sweeping

Both of the queries below will return the same results, because the default error distance is 2. Documents containing the words "swept" and "kept" will match, since the word "kept" differs from "swept" by 1 transposition and 1 deletion.

<typo/2> swept
<typo> swept
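The error distances quoted in these examples can be checked with a short Python sketch. It assumes that the error distance behaves like the classic edit distance, with what this guide calls a transposition treated as replacing one character with another; the function name error_distance is hypothetical, and the engine's own algorithm may differ.

```python
def error_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and replacements from a to b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # replace char_a with char_b
            ))
        previous = current
    return previous[-1]


pairs = [
    ("mouse", "house"),
    ("agreed", "greed"),
    ("cat", "coat"),
    ("sweeping", "swimming"),
    ("swept", "kept"),
]
for query_term, candidate in pairs:
    print(query_term, candidate, error_distance(query_term, candidate))
```

Under this assumption, a candidate word matches a typo/N query when its computed distance from the query term is N or less.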

The typo/N operator must scan the collection's word list in order to find candidate matching words. This makes it impractical for use in large collections (greater than 100,000 documents unless a current spanning word list is available) or in performance-sensitive environments. Performance can be improved by generating a spanning word list for the collections to be used.

Note: A query term specified with typo/N can have a maximum length of 32 characters. Also, typo/N is not supported with multi-byte character sets.

<when>

Selects documents that contain specified values in one or more document zones upon which certain conditions have been placed. The following examples illustrate searching for terms within a zone upon which certain conditions have been placed.

Say you want to search for the word "here" in a zone named "A," whose href attribute contains the string "verity," and the text looks like this:

Our site is <A href = "www.verity.com">here</A>.

To search for the word "here" in the zone "A" when the href contains the string "verity," you can write this query:

"here" <in> A <when> (href <contains> "verity")

A query condition for the when operator must be enclosed in parentheses, as shown above. A query condition can include one or more Verity operators, and it takes the form:

"attribute_name" <attribute_test_operator> "test_value"

where attribute_test_operator is one of <starts>, <ends>, <contains>, =, or <matches>. Except for =, all must be surrounded by angle brackets.

Attribute test operators can be combined with the combination operators <and> or <or>. For example, you can search for the string "ibm" in a zone named "Company," when the attribute named "reference" is equal to either "major" or "significant," using the following query:

"ibm" <in> "Company" <when> ("reference" = "major" <or> "reference" = "significant")
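The way a <when> condition narrows an <in> search can be pictured with a small Python sketch. Everything here is illustrative: zones are modeled as plain dictionaries with hypothetical keys (zone, attrs, text), the search helper is invented for this example, and the sample zones other than the two described above are made up.

```python
from typing import Callable

zones = [
    {"zone": "A", "attrs": {"href": "www.verity.com"}, "text": "here"},
    {"zone": "A", "attrs": {"href": "www.example.org"}, "text": "here"},
    {"zone": "Company", "attrs": {"reference": "major"}, "text": "IBM"},
    {"zone": "Company", "attrs": {"reference": "minor"}, "text": "IBM"},
]


def search(word: str, zone: str, condition: Callable[[dict], bool]) -> list[dict]:
    """Zones with the given name that contain the word and pass the condition."""
    return [
        z for z in zones
        if z["zone"] == zone
        and word.lower() in z["text"].lower().split()
        and condition(z["attrs"])
    ]


# "here" <in> A <when> (href <contains> "verity")
links = search("here", "A", lambda attrs: "verity" in attrs.get("href", ""))

# "ibm" <in> "Company" <when> ("reference" = "major" <or> "reference" = "significant")
companies = search(
    "ibm", "Company",
    lambda attrs: attrs.get("reference") in ("major", "significant"),
)
print(links)
print(companies)
```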

<wildcard>

Selects documents that contain matches to a wildcard character string. The wildcard operator lets you define a wildcard string, which can be used to locate related word matches in documents. A wildcard string consists of special characters. For example, to retrieve documents that contain words such as "pharmaceutical," "pharmacology," and "pharmacodynamics," you can enter the following:

pharmac*

Documents are not relevance-ranked unless the many modifier is used, as in:

<many> pharmac*

The wildcard characters "*" and "?" automatically enable wildcard searching. To use other constructs, use the wildcard operator explicitly with any of the characters below.

Character Function

? Specifies one of any alphanumeric character, as in ?an, which locates "ran," "pan," "can," and "ban." It is not necessary to specify the wildcard operator when you use the question mark. The question mark is ignored in a set ([ ]) or in an alternative pattern ({ }).

* Specifies zero or more of any alphanumeric character, as in corp*, which locates "corporate," "corporation," "corporal," and "corpulent." It is not necessary to specify the wildcard operator when you use the asterisk; you should not use the asterisk to specify the first character of a wildcard string. The asterisk is ignored in a set ([ ]) or in an alternative pattern ({ }).

[ ] Specifies one of any character in a set, as in <wildcard> `c[auo]t`, which locates "cat," "cut," and "cot." You must enclose the word that includes a set in backquotes (`), and there can be no spaces in a set.

{ } Specifies one of each pattern separated by a comma, as in <wildcard> `bank{s,er,ing}`, which locates "banks," "banker," and "banking." You must enclose the word that includes a pattern in backquotes (`), and there can be no spaces in a pattern.

^ Specifies one of any character not in the set, as in <wildcard> `st[^oa]ck`, which excludes "stock" and "stack" but locates "stick" and "stuck." The caret (^) must be the first character after the left bracket ([) that introduces a set.

- Specifies a range of characters in a set, as in <wildcard> `c[a-r]t`, which locates every three-letter word from "cat" to "crt."
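The wildcard characters in the table above map closely onto regular expressions, which the following Python sketch uses to show what each construct locates. The function name wildcard_to_regex is hypothetical, backquotes and escaping are not handled, and a ? or * inside a set or pattern is simply taken literally, which is an assumption about what "ignored" means.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate the wildcard characters in the table above into a regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "?":
            out.append("[0-9A-Za-z]")        # exactly one alphanumeric
        elif ch == "*":
            out.append("[0-9A-Za-z]*")       # zero or more alphanumerics
        elif ch == "[":
            end = pattern.index("]", i)
            body = pattern[i + 1:end]
            negate = body.startswith("^")
            members = body[1:] if negate else body
            # Escape the members but keep "-" as the range marker.
            members = "-".join(re.escape(part) for part in members.split("-"))
            out.append("[" + ("^" if negate else "") + members + "]")
            i = end
        elif ch == "{":
            end = pattern.index("}", i)
            options = pattern[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.IGNORECASE)


def locate(pattern: str, words: list[str]) -> list[str]:
    """Return the words that match the whole wildcard string."""
    regex = wildcard_to_regex(pattern)
    return [word for word in words if regex.fullmatch(word)]


print(locate("c[auo]t", ["cat", "cut", "cot", "cit"]))
print(locate("bank{s,er,ing}", ["bank", "banks", "banker", "banking"]))
print(locate("st[^oa]ck", ["stock", "stack", "stick", "stuck"]))
print(locate("c[a-r]t", ["cat", "crt", "cut"]))
```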

Searching for Nonalphanumeric Characters

Remember that you can search for nonalphanumeric characters only if the style.lex file used to create the collections you are searching is set up to recognize the characters you want to search for. Consult your collection administrator for more information.

Searching for Wildcard Characters as Literals

Provided the style.lex file is set up for the collections to be searched, you can search for a word containing a wildcard character such as "/" or "*" by preceding the wildcard character with a backslash. For example, if you enter the following search string:

abc\*d

the engine finds five-character words matching the "abc*d" string.

When you want to match a literal backslash, you must enter two backslashes.

Searching for Special Characters as Literals

The following nonalphanumeric characters perform special, internal search engine functions, and by default are not treated as literals in a wildcard string:

comma ,
left and right parentheses ( )
double quotation mark "
backslash \
at sign @
left curly brace {
left bracket [
less than sign <
backquote `

To interpret special characters as literals, you must surround the whole wildcard string in backquotes (`). For example, to search for the wildcard string "a{b", you surround the string with backquotes, as follows:

<wildcard> `a{b`

To search for a wildcard string that includes the literal backquote character (`), you must use two backquotes together and surround the whole wildcard string in backquotes (`), as follows:

<wildcard> `*n``t`

You can search on backquotes only if the style.lex file used to create the collections you are searching is set up to recognize the backquote character. Consult your collection administrator for information.

<word>

Selects documents that include one or more instances of a word you specify. For example, to search for documents that contain the word "rhetoric," without also considering the words "rhetorical" and "rhetorician," you can enter the following:

<word> rhetoric

Documents are not relevance-ranked unless the many modifier is used, as in:

<many><word> rhetoric

Modifier Reference

Modifiers are used in conjunction with operators. When specified, a modifier changes the standard behavior of an operator in some way. For example, you can use the case modifier with an operator to specify that the case of the search word you enter be considered a search element as well. Modifiers include case, many, not, and order, each of which is described below.

<case>

Use the case modifier with the word or wildcard operator to perform a case-sensitive search, based on the case of the word or phrase specified.

To use the case modifier, you simply enter the search word or phrase as you wish it to appear in retrieved documents - in all uppercase letters, in mixed uppercase and lowercase letters, or in all lowercase letters.

For example, to retrieve documents that contain the word "Apple" in mixed uppercase and lowercase letters, you can enter the following:

<case> <word> Apple

Only those documents that contain the word "Apple" will be selected. Occurrences of "apple," "apples," or "APPLE" will not be selected.

When mixed uppercase and lowercase characters are included in a query, the search engine finds case-sensitive matches.

<many>

Counts the density of words, stemmed variations, or phrases in a document, and produces a relevance-ranked score for retrieved documents. The more occurrences of a word, stem, or phrase proportional to the amount of document text, the higher the score of that document when retrieved. Because the many modifier considers density in proportion to document text, a longer document that contains more occurrences of a word can score lower than a shorter document that contains fewer occurrences. You can use the many modifier with these operators: <word>, <wildcard>, <stem>, <phrase>, <sentence>, and <paragraph>.

For example, to select documents based on the density of stemmed variations of the word "apple," you can enter the following:

<many> <stem> apple

To select documents based on the density of the phrase "mission oak," you can enter the following:

<many> mission oak

The many modifier cannot be used with and, or, accrue, or relational operators.
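The effect of density on ranking can be illustrated with a short Python sketch. The density function below (occurrences divided by total words) is an assumed stand-in, since the engine's actual formula is not given in this guide, and the two sample documents are invented for the example.

```python
def density(document: str, word: str) -> float:
    """Hypothetical density: occurrences of the word per word of text."""
    words = document.lower().split()
    return words.count(word.lower()) / len(words) if words else 0.0


short_doc = "apple pie with apple sauce"      # 2 occurrences in 5 words
long_doc = "apple " * 3 + "filler " * 27      # 3 occurrences in 30 words

print(density(short_doc, "apple"))
print(density(long_doc, "apple"))
```

Although the longer document contains more occurrences of the word, its density is lower, so under this assumption the shorter document would score higher.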

<not>

Use the not modifier with a word or phrase to exclude documents that show evidence of that word or phrase. For example, to select only documents that contain the words "cat" and "mouse" but not the word "dog," you can enter the following:

cat <and> mouse <and> <not> dog

You can use the not modifier only with the <and> and <or> operators.

<order>

Use the order modifier to specify that search elements must occur in the same order in which they were specified in the query. If search values do not occur in the specified order in a document, the document is not selected. You can use the order modifier with these operators: paragraph, sentence, and near/N.

Always place the order modifier just before the operator. The following syntax examples show how you can use either simple syntax or explicit syntax to retrieve documents containing the word "president" followed by the word "washington" in the same paragraph:

Simple syntax: president <order><paragraph> washington

Explicit syntax: <order><paragraph> ("president", "washington")

To search for documents containing the words "diver," "kills," and "shark" in that order within 20 words of each other, use one of the following queries:

diver <order><near/20> kills <order><near/20> shark
<order><near/20> (diver, kills, shark)

You can use the near/N operator with the order modifier to duplicate the behavior of the phrase operator. For example, to search for documents containing the phrase "world wide web," you can use the following syntax:

world <order><near/1> wide <order><near/1> web
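An ordered proximity search can be sketched in Python to show why an ordered near/1 chain behaves like a phrase. The function name ordered_near is hypothetical, and the sketch assumes that each term must follow the previous one within n words; stemming and scoring are not modeled.

```python
def ordered_near(text: str, terms: list[str], n: int) -> bool:
    """True if the terms occur in order, each within n words of the previous."""
    words = text.lower().split()
    wanted = [term.lower() for term in terms]

    def search(term_index: int, previous_pos: int) -> bool:
        if term_index == len(wanted):
            return True
        last = min(previous_pos + n, len(words) - 1)
        for pos in range(previous_pos + 1, last + 1):
            if words[pos] == wanted[term_index] and search(term_index + 1, pos):
                return True
        return False

    return any(
        word == wanted[0] and search(1, index)
        for index, word in enumerate(words)
    )


phrase = ["world", "wide", "web"]
print(ordered_near("the world wide web", phrase, 1))
print(ordered_near("web wide world", phrase, 1))
print(ordered_near("a diver reportedly kills a large shark", ["diver", "kills", "shark"], 20))
```

With n set to 1, every term must immediately follow the previous one, which is the same requirement the phrase operator imposes.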