Full-text search

Full-text search support for properties whose datatypes store their values as character strings or text in the database tables, e.g. datatype "Page" (holds names of wiki pages and displays them as a link), datatype "Text" (holds text of arbitrary length), datatype "Code" (holds technical, pre-formatted texts, similar to datatype Text) or datatype "URL" (holds URIs, URNs and URLs).

Keywords
queries · fulltext · full-text · full-text search

Semantic MediaWiki 2.5.0 (released on 14 March 2017 and compatible with MW 1.23.0 - 1.29.x) adds experimental support for accessing the full-text capabilities of the relational database (SQL back-end) for properties whose datatypes store their values as character strings or text in the database tables, e.g. datatype "Page" (holds names of wiki pages and displays them as a link), datatype "Text" (holds text of arbitrary length), datatype "Code" (holds technical, pre-formatted texts, similar to datatype Text) or datatype "URL" (holds URIs, URNs and URLs).

Features

General notes

The FT_SEARCH table aggregates search content for datatypes that store their data as BLOB and URI values, e.g. datatype "Page" (holds names of wiki pages and displays them as a link), datatype "Text" (holds text of arbitrary length), datatype "Code" (holds technical, pre-formatted texts, similar to datatype Text) or datatype "URL" (holds URIs, URNs and URLs).
These datatypes use either CHAR, VARCHAR, or TEXT to store their data in the database tables.
Supported operations rely on the relational backend database (MySQL, MariaDB and SQLite).
For MySQL and MariaDB databases, IN BOOLEAN MODE is used as the default search mode. This allows the software to use a number of special operators.
Relevance scores are not used for sorting, e.g. results are not ordered by best match.
TextSanitizer relies on the "onoi/tesa" library [1] to sanitize text or string elements, to provide some text manipulation support and, if enabled, to use language detection. This library is pre-installed for use by Semantic MediaWiki.
Custom stopwords are only applied by the "onoi/tesa" library [1] if language detection is enabled. Independently of this, MySQL and MariaDB provide their own standard stopword lists [2], which are enabled by default.
Starting with Semantic MediaWiki 3.0.0 (released on 11 October 2018 and compatible with MW 1.27.0 - 1.31.x):
- If the SMW_FIELDT_CHAR_NOCASE option of configuration parameter $smwgFieldTypeFeatures (sets relational database specific field type features) is enabled, the full-text search only comes into effect for selections using the comparators ~ and !~. [3]
- The API module "smwtask" (allows internal Semantic MediaWiki tasks to be invoked and executed) is used instead of a socket connection via a special page to invoke extra "work" after an update has been completed as part of an independent transaction. [4] See also configuration parameter $smwgPostEditUpdate (sets how many jobs should be executed as part of a post-edit event).
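
To illustrate the note on IN BOOLEAN MODE, here is a minimal Python sketch of how a boolean-mode search string could be assembled and wrapped in a MySQL/MariaDB MATCH ... AGAINST clause. The table name `ft_search_example`, the column name `o_text` and both helper functions are hypothetical and only serve the illustration; they do not show how Semantic MediaWiki builds its queries internally. The `%s` placeholder assumes a Python database driver that uses the "format" parameter style.

```python
from typing import Iterable, Tuple

# Hypothetical table and column names, used for illustration only.
TABLE = "ft_search_example"
COLUMN = "o_text"


def boolean_expression(required: Iterable[str] = (),
                       excluded: Iterable[str] = (),
                       prefixes: Iterable[str] = ()) -> str:
    """Combine terms using MySQL/MariaDB boolean-mode operators.

    '+term' must be present, '-term' must be absent, and 'term*'
    matches any word that starts with 'term'.
    """
    parts = [f"+{t}" for t in required]
    parts += [f"-{t}" for t in excluded]
    parts += [f"{t}*" for t in prefixes]
    return " ".join(parts)


def build_query(expression: str) -> Tuple[str, Tuple[str, ...]]:
    """Return a parameterized SQL statement and its bound values."""
    sql = (f"SELECT * FROM {TABLE} "
           f"WHERE MATCH({COLUMN}) AGAINST (%s IN BOOLEAN MODE)")
    return sql, (expression,)


expr = boolean_expression(required=["semantic"],
                          excluded=["draft"],
                          prefixes=["wiki"])
sql, params = build_query(expr)
print(sql)
print(params)
```

The sketch deliberately has no ORDER BY clause on the match score, in line with the note that relevance is not used for sorting.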

Notes on language support for Chinese, Japanese, and Korean (CJK)

General CJK support is challenging because text has to be broken into tokens that are not separated by spaces.
The "onoi/tesa" library [1] provides some simple tokenizers which do not require language detection and try to provide rudimentary CJK search out of the box. This requires ICU 54+.
Mroonga is a MySQL storage engine described as a CJK-ready full-text search engine and column store.
MySQL comes with an optional ngram full-text parser and a MeCab full-text parser plugin.
According to this issue, MariaDB is missing those parser plugins. As of 2023, support was still missing.
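
To show why tokenization matters for text without spaces, here is a minimal Python sketch of overlapping character n-grams (bigrams by default), the general idea behind ngram-style parsers. It is a generic illustration under simplifying assumptions. It is not the tokenizer shipped with the "onoi/tesa" library, and it does not reproduce the exact behaviour of the MySQL ngram parser. The function name `ngrams` is hypothetical.

```python
from typing import List


def ngrams(text: str, n: int = 2) -> List[str]:
    """Split text into overlapping character n-grams.

    Whitespace separates chunks; each chunk is tokenized on its own.
    Chunks shorter than n are kept whole (a choice made for this sketch).
    """
    tokens: List[str] = []
    for chunk in text.split():
        if len(chunk) < n:
            tokens.append(chunk)
        else:
            tokens.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return tokens


print(ngrams("全文検索"))  # ['全文', '文検', '検索']
```

Because every pair of adjacent characters becomes a token, a search for "検索" can match inside "全文検索" even though no spaces mark the word boundary.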

Instructions

For users

Searching contains some examples and descriptions of the available search syntax

For system administrators

How to enable and configure full-text search on your wiki
Indexing describes some methods for manually creating and updating the index table

For developers

Technical notes provides some information on the technical implementation, fine-tuning, and performance

References

[1] "onoi/tesa": a small library to help with the sanitization of text or string elements.
[2] https://dev.mysql.com/doc/refman/5.6/en/fulltext-stopwords.html and https://mariadb.com/kb/en/mariadb/stopwords/
[3] Semantic MediaWiki: GitHub issue comment gh:smw:2499:307624826
[4] Semantic MediaWiki: GitHub pull request gh:smw:3318

Status:	effective
Progress:	100%
Version:	3.0.0+