Help:Full-text search

From semantic-mediawiki.org
Important note: This page is a work in progress.

Semantic MediaWiki 2.5.0 adds experimental support (not enabled by default) for accessing the full-text capabilities of the SQL back-end. Support was added for MySQL [1] and SQLite [2], while Postgres is currently not supported [3][4].

The search is only supported by the SQLStore (the SPARQLStore requires the native support of full-text search capabilities by the triple-store).

Requirements

  • Semantic MediaWiki 2.5+
  • MySQL 5.5+ / MariaDB 10.0.5+ [1]
  • SQLite 3.8+ [2]
  • PHP 5.5+
  • SQLStore

Features and limitations

  • The FT_SEARCH table aggregates the search content for BLOB and URI values; the indexed search is executed against this table
  • Supported operations rely on the backend database (MySQL, MariaDB)
  • For MySQL and MariaDB, IN BOOLEAN MODE is used as the default search mode
  • Relevance and scores are not used for sorting (e.g. as in best match)
  • TextSanitizer relies on onoi/tesa for some text manipulation support and, if enabled, for language detection
  • Custom stopwords are only applied by onoi/tesa when language detection is enabled; MySQL and MariaDB provide their own standard lists [5], which are enabled by default
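
To illustrate the boolean search mode mentioned above, the following is a minimal sketch of how a boolean-mode search string and a matching statement could be assembled. It is not Semantic MediaWiki's implementation: the table and column names (ft_search, o_text, s_id) and both helper functions are hypothetical and chosen only for this example. The `+` and `-` operators and the `MATCH ... AGAINST (... IN BOOLEAN MODE)` syntax are standard MySQL/MariaDB.

```python
# Sketch only: ft_search, o_text and s_id are hypothetical names and are
# not taken from Semantic MediaWiki's actual schema.

def to_boolean_mode(terms, excluded=()):
    """Build a MySQL/MariaDB boolean-mode search string.

    '+' marks a word that must be present, '-' a word that must be absent.
    Terms are used as given; no escaping of operator characters is done.
    """
    required = ["+" + term for term in terms]
    forbidden = ["-" + term for term in excluded]
    return " ".join(required + forbidden)


def build_match_query(table="ft_search", column="o_text", id_column="s_id"):
    """Return a parameterized SELECT that matches in boolean mode.

    No ORDER BY on a relevance score is added, mirroring the note that
    scores are not used for sorting.
    """
    return (
        f"SELECT {id_column} FROM {table} "
        f"WHERE MATCH({column}) AGAINST (%s IN BOOLEAN MODE)"
    )


if __name__ == "__main__":
    print(to_boolean_mode(["semantic", "wiki"], excluded=["draft"]))
    print(build_match_query())
```

The `%s` placeholder follows the parameter style of common Python MySQL drivers; the search string would be passed as a bound parameter rather than concatenated into the statement.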

CJK support

  • General CJK support is challenging because text has to be broken into tokens that are not separated by spaces
  • onoi/tesa provides some simple tokenizers (which do not require language detection) that try to provide rudimentary CJK search out of the box (expects ICU 54+)
  • Mroonga is a MySQL storage engine that is said to provide CJK-ready full-text search and a column store
  • MySQL comes with an optional ngram Full-Text Parser and a MeCab Full-Text Parser Plugin
  • According to https://jira.mariadb.org/browse/MDEV-10267, MariaDB is missing those parser plug-ins
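
The tokenization problem described above can be sketched with overlapping n-grams, the general idea behind n-gram based full-text parsers. The code below is an illustration only: it is neither the onoi/tesa tokenizer nor MySQL's ngram parser, the function names are hypothetical, and the character ranges cover only a simplified subset of CJK scripts.

```python
# Sketch only: a simplified n-gram tokenizer for text without word spacing.

def is_cjk(ch):
    """Return True for characters in a few common CJK ranges (simplified)."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= code <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= code <= 0xD7AF   # Hangul syllables
    )


def ngrams(run, n=2):
    """Split an unspaced run of characters into overlapping n-grams."""
    if len(run) <= n:
        return [run]
    return [run[i:i + n] for i in range(len(run) - n + 1)]


def tokenize(text, n=2):
    """Tokenize on whitespace, and break CJK runs into n-grams."""
    tokens = []
    for word in text.split():
        cjk_run = ""    # consecutive CJK characters collected so far
        other_run = ""  # consecutive non-CJK characters collected so far
        for ch in word:
            if is_cjk(ch):
                if other_run:
                    tokens.append(other_run)
                    other_run = ""
                cjk_run += ch
            else:
                if cjk_run:
                    tokens.extend(ngrams(cjk_run, n))
                    cjk_run = ""
                other_run += ch
        if cjk_run:
            tokens.extend(ngrams(cjk_run, n))
        if other_run:
            tokens.append(other_run)
    return tokens


if __name__ == "__main__":
    print(tokenize("全文検索 search"))
```

With bigrams (n=2), a query for a two-character term can match any position inside a longer unspaced run, at the cost of a larger index and some false positives.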

Settings and configurations

Changes to any of the above settings require re-running the rebuildFulltextSearchTable.php script.

Manuals and instructions

This feature is not enabled by default. The following manuals and instructions provide further details:

  • Indexing describes some methods on how to manually create and update the index table
  • Searching contains some examples and descriptions about the available search syntax
  • Technical notes has some notes about the technical implementation, fine-tuning, and performance

References

  1. Semantic MediaWiki: GitHub pull request #1481
  2. Semantic MediaWiki: GitHub pull request #1801
  3. Semantic MediaWiki: GitHub pull request #1956 notes that "... any interested developer who is eager to help with implementing a Postgres solution ..."
  4. Postgres is not supported due to a different index schema (e.g. to_tsvector, to_tsquery), but users interested in making it available are encouraged to look at the MySQLValueMatchConditionBuilder to see how a Postgres-specific implementation could be created.
  5. https://dev.mysql.com/doc/refman/5.6/en/fulltext-stopwords.html and https://mariadb.com/kb/en/mariadb/stopwords/