In the beginning, there were keywords, and nothing but keywords, and no query syntax was supported, and Sphinx just matched all keywords, and that was good. But even in that innocent antediluvian age, diverse people were asking for various querying patterns, and ranking methods, and we heard them, and thus and so matching modes were cast upon Sphinx. And they were four, and accessible via SphinxAPI and its younger brother, SphinxSE they were.
Nowadays, matching modes are just a legacy. Even the very concept of a “matching mode” is already deprecated internally. But we still have to quickly cover them, as two out of three searching APIs (SphinxAPI and SphinxSE) support them and default to a certain legacy mode for compatibility reasons.
Legacy modes were a predefined combination of (very simple) query parsing rules, query-to-document matching rules, and a specific ranking method (called a ranker).
There are four legacy matching modes: ALL, ANY, PHRASE, and BOOLEAN. You could switch between modes using the SetMatchMode() call in SphinxAPI. For instance, the following call in PHP sets the PHRASE mode:
$client->SetMatchMode ( SPH_MATCH_PHRASE );
In ALL, ANY, and PHRASE modes, queries were interpreted as “bags of keywords” and then matched and ranked as specified by the mode. BOOLEAN, in addition, supported the basic Boolean operators (AND, OR, NOT, and parentheses).
ALL
Documents that match all of the keywords are returned. Documents are ranked in the order reflecting how closely the matched words resemble the query (phrase proximity to the query).
ANY
Documents that match any of the keywords are returned. Documents are ranked based on the degree of the phrase proximity to the query, and the number of unique matching keywords in every field.
PHRASE
Documents that match the query as an exact phrase are returned. Documents are ranked based on the fields in which the phrase occurs, and their respective user weights.
BOOLEAN
Documents that match a Boolean expression built from keywords, parentheses, and the AND, OR, and NOT operators are returned. Documents are not ranked. It was expected that you would sort them based on a criterion other than relevance.
In addition, there’s one nonlegacy matching mode:
EXTENDED
Documents that match an expression in Sphinx query syntax are returned. (Query syntax supports keywords, parentheses, Boolean operators, field limits, grouping keywords into phrases, proximity operators, and many more things that we will discuss in detail shortly.) Documents are ranked according to one of the available ranking functions that you can choose on the fly.
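To make the matching rules of the ALL, ANY, and PHRASE modes concrete, here is a minimal Python sketch. It is not how Sphinx is implemented. It assumes a naive lowercase tokenizer, treats the whole document as a single field, and ignores ranking entirely. The names tokenize() and matches() are hypothetical helpers introduced only for this illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords (naive assumption)."""
    return re.findall(r"\w+", text.lower())

def matches(mode, query, document):
    """Return True if `document` matches `query` under a legacy mode.

    Only the matching rule is modeled; ranking and fields are ignored.
    """
    q, d = tokenize(query), tokenize(document)
    if not q:
        return False
    if mode == "ALL":
        # Every query keyword must occur somewhere in the document.
        return all(word in d for word in q)
    if mode == "ANY":
        # A single matching keyword is enough.
        return any(word in d for word in q)
    if mode == "PHRASE":
        # The keywords must occur adjacently and in query order.
        n = len(q)
        return any(d[i:i + n] == q for i in range(len(d) - n + 1))
    raise ValueError("unsupported mode: " + mode)

doc = "Barack Obama was a senator from Illinois"
print(matches("ALL", "senator obama", doc))    # True
print(matches("ANY", "governor obama", doc))   # True
print(matches("PHRASE", "obama barack", doc))  # False
print(matches("PHRASE", "barack obama", doc))  # True
```

The bag-of-keywords nature of ALL and ANY shows in the first two calls: keyword order in the query does not matter, whereas PHRASE is sensitive to it.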
There were several problems with the legacy matching modes.
First, they were very limited. There was no way to do anything even slightly fancy, like, say, matching “Barack Obama” as an exact phrase and “senator” and “Illinois” as plain keywords at the same time.
Second, they essentially tightly coupled query syntax and a ranking function. So, for instance, when using the ALL mode, you could not ask Sphinx to just apply lightweight ranking and skip keyword positions for speed. In that mode, Sphinx always computes a rather expensive proximity rank. Or the other way around, if you liked the ranking that ANY yielded, you couldn’t get it while matching all words or matching a phrase, on the grounds that the ANY ranking function was nailed onto its matching mode with nine-inch titanium nails.
Third, once we introduced query syntax support, all the matching modes became just limited, particular subcases of that generic, all-encompassing syntax. That’s the course of progress and redundancy in the modern world. The milkman’s lot isn’t as sought after as it once was...
Last but not least, Sphinx used to have a different code path internally for every matching mode, and that was of little help when maintaining and improving it.
The EXTENDED mode fixes all of this. It decouples query syntax from ranking; you can choose a ranking function separately (using either the SetRankingMode() API call or the OPTION ranker=XXX SphinxQL clause). And adding new full-text querying features does not involve a new “matching mode” anymore; you just change your queries.
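As a rough illustration of that decoupling, the Python sketch below builds one query string (the “Barack Obama” phrase plus two plain keywords) and pairs it with several rankers through the OPTION ranker clause. The helper names, the index name myindex, and the list of ranker names are assumptions made for this example; check the ranker names against the Sphinx version you run.

```python
# Ranker names assumed to be accepted by OPTION ranker=...; verify for your version.
RANKERS = {"proximity_bm25", "bm25", "none", "wordcount",
           "proximity", "matchany", "fieldmask"}

def extended_query(phrases, keywords):
    """Join exact phrases (double-quoted) and plain keywords with spaces."""
    parts = ['"%s"' % p for p in phrases] + list(keywords)
    return " ".join(parts)

def sphinxql_select(index, query, ranker):
    """Compose a SphinxQL statement; the ranker is chosen independently of the query."""
    if ranker not in RANKERS:
        raise ValueError("unknown ranker: " + ranker)
    # Escape backslashes and single quotes for the SQL string literal.
    literal = query.replace("\\", "\\\\").replace("'", "\\'")
    return "SELECT * FROM %s WHERE MATCH('%s') OPTION ranker=%s" % (
        index, literal, ranker)

query = extended_query(["Barack Obama"], ["senator", "Illinois"])
for ranker in ("proximity_bm25", "bm25", "none"):
    # The query text stays the same; only the ranker changes.
    print(sphinxql_select("myindex", query, ranker))
```

The query text is identical in all three statements, which is exactly what the legacy modes could not offer.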
So, in version 0.9.9, we internally switched everything to use a unified matching engine, formerly exposed only under the EXTENDED matching mode. When you use one of the legacy modes, Sphinx internally converts the query to the appropriate new syntax and chooses the appropriate ranker. For instance, the query one two three will be internally rewritten as follows:
ALL
Query: one two three
Ranker: PROXIMITY
ANY
Query: "one two three"/1
Ranker: PROXIMITY
PHRASE
Query: "one two three"
Ranker: PROXIMITY
BOOLEAN
Query: one two three
Ranker: NONE
Special characters such as quotes and slashes that are reserved in query syntax will also be escaped in rewritten queries.
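The rewriting rules above can be sketched in a few lines of Python. This is an illustration of the mapping, not Sphinx's actual code. The set of reserved characters is an assumption, loosely based on what SphinxAPI's EscapeString() call escapes, and may differ between versions. As a simplification, BOOLEAN queries are passed through untouched so that their operators keep working. The names escape() and rewrite_legacy() are hypothetical.

```python
# Characters treated as reserved in query syntax (an assumed set, see above).
RESERVED = '\\()|-!@~"&/^$='

def escape(query):
    """Prefix every reserved character with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in query)

def rewrite_legacy(mode, query):
    """Return (extended_query, ranker) for a legacy matching mode."""
    if mode == "BOOLEAN":
        # Simplification: Boolean operators are left as-is, not escaped.
        return query, "NONE"
    q = escape(query)
    if mode == "ALL":
        return q, "PROXIMITY"
    if mode == "ANY":
        # Quorum operator with threshold 1: any one keyword suffices.
        return '"%s"/1' % q, "PROXIMITY"
    if mode == "PHRASE":
        return '"%s"' % q, "PROXIMITY"
    raise ValueError("unsupported mode: " + mode)

for mode in ("ALL", "ANY", "PHRASE", "BOOLEAN"):
    print(mode, rewrite_legacy(mode, "one two three"))
```

Escaping matters because a legacy query such as AC/DC would otherwise be read as query syntax once it is wrapped in quotes and handed to the unified engine.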
For compatibility reasons, SphinxAPI and SphinxSE default to the ALL matching mode, so to use query syntax or fancier new ranking functions, you have to explicitly switch to EXTENDED mode:
$client->SetMatchMode ( SPH_MATCH_EXTENDED );
The MATCH() operator in SphinxQL always uses EXTENDED mode, so you don’t have to do anything there to get query syntax.