Fulltext search support
SMW 2.5 adds experimental support (not enabled by default) for accessing the full-text capabilities of the SQL back-end. Support was added for MySQL[1] and SQLite[2], while Postgres is currently not supported[3][4].
The search is only supported by the SQLStore (the SPARQLStore requires native support of full-text search capabilities by the triple-store).
Features and limitations
- The FT_SEARCH table aggregates search content, including URI values, against which the index search is executed
- Supported operations rely on the backend database (MySQL, MariaDB)
- For MySQL and MariaDB, IN BOOLEAN MODE is used as the default search mode
- Relevance and scores are not used for any sorting purpose (e.g. best match first)
- onoi/tesa provides some text manipulation support as well as optional language detection, if enabled
- Custom stopwords are only applied by onoi/tesa when language detection is enabled, but MySQL/MariaDB provide their own standard stopword lists[5], which are enabled by default
- General CJK support is a challenging endeavour because text has to be broken into tokens that are not separated by spaces
- onoi/tesa provides some simple tokenizers (which do not require language detection) that try to provide rudimentary CJK search out of the box (expects ICU 54+)
- Mroonga is a MySQL storage engine that is said to provide CJK-ready full-text search and a column store
- MySQL comes with an optional ngram Full-Text Parser and MeCab Full-Text Parser Plugin.
- According to https://jira.mariadb.org/browse/MDEV-10267, MariaDB is missing those parser plug-ins
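To make the CJK tokenization problem more concrete, the following minimal sketch splits a string into overlapping two-character tokens (bigrams), which is the general idea behind n-gram parsers such as MySQL's ngram parser with its default token size of 2. The function name cjkBigrams is hypothetical and is not part of SMW or onoi/tesa, whose tokenizers rely on ICU and may work quite differently.

```php
<?php
// Hypothetical helper for illustration only; not part of SMW or onoi/tesa.
// Splits a UTF-8 string into overlapping two-character tokens (bigrams).
function cjkBigrams( $text ) {
	// The /u modifier makes PCRE treat the input as UTF-8, so each array
	// element is one Unicode character rather than one byte.
	$chars = preg_split( '//u', $text, -1, PREG_SPLIT_NO_EMPTY );

	// preg_split returns false on invalid UTF-8 input.
	if ( $chars === false ) {
		return [];
	}

	$count = count( $chars );

	// Nothing to pair up: return the single character (or an empty list).
	if ( $count < 2 ) {
		return $chars;
	}

	$tokens = [];

	for ( $i = 0; $i < $count - 1; $i++ ) {
		$tokens[] = $chars[$i] . $chars[$i + 1];
	}

	return $tokens;
}

// Tokens produced: 全文, 文検, 検索
print_r( cjkBigrams( '全文検索' ) );
```

Because every adjacent pair of characters becomes a token, a search for 検索 can match inside 全文検索 even though the text contains no spaces. The sketch does not treat whitespace or punctuation specially, which a real tokenizer would need to handle.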
Settings and configurations
- $smwgEnabledFulltextSearch to enable the feature
- $smwgFulltextSearchTableOptions for DB back-end related options
- $smwgFulltextSearchPropertyExemptionList lists properties that should not be indexed and are therefore exempted from full-text search
- $smwgFulltextSearchMinTokenSize is set to "3" by default and is expected to correspond to either innodb_ft_min_token_size or ft_min_word_len (this lets SMW switch back to LIKE when a search term falls below the minimum token threshold)
- $smwgFulltextLanguageDetection is an experimental setting (see notes)
Changes to any of the above settings require re-running the index rebuild (see Indexing below).
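As a rough illustration, a LocalSettings.php fragment using the settings listed above might look like the following. This is a sketch under assumptions: it presumes Semantic MediaWiki is already enabled earlier in the file so that its default settings exist, that exemption list entries are property names, and the property 'Has internal note' is a made-up example. $smwgFulltextSearchTableOptions and $smwgFulltextLanguageDetection are deliberately left at their defaults.

```php
<?php
// LocalSettings.php fragment (sketch; place after Semantic MediaWiki is enabled)

// Turn on the experimental full-text search support (disabled by default).
$smwgEnabledFulltextSearch = true;

// Should match innodb_ft_min_token_size or ft_min_word_len on the database
// server; 3 is the documented default.
$smwgFulltextSearchMinTokenSize = 3;

// Assumption: entries are property names. 'Has internal note' is a
// hypothetical property that should not be indexed for full-text search.
// Appending keeps whatever default exemptions are already defined.
$smwgFulltextSearchPropertyExemptionList[] = 'Has internal note';
```

Appending to the exemption list rather than assigning a new array is a cautious choice here, since overwriting it could drop exemptions that ship as defaults.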
Manuals and instructions
This feature is not enabled by default. Please read the following manuals and instructions for details:
- Indexing describes some methods on how to manually create and update the index table
- Searching contains some examples and descriptions about the available search syntax
- Technical notes has some notes about the technical implementation, fine-tuning, and performance
1. Semantic MediaWiki: GitHub pull request #1481
2. Semantic MediaWiki: GitHub pull request #1801
3. Semantic MediaWiki: GitHub pull request #1956 notes that "... any interested developer who is eager to help with implementing a Postgres solution ..."
4. Postgres is not supported due to a different index schema (e.g. to_tsquery), but users interested in making it available are encouraged to look at the MySQLValueMatchConditionBuilder to see how to create a Postgres-specific implementation.
5. https://dev.mysql.com/doc/refman/5.6/en/fulltext-stopwords.html and https://mariadb.com/kb/en/mariadb/stopwords/