Semantic MediaWiki 2.5.0 (released on 14 March 2017 and compatible with MW 1.23.0 - 1.29.x) adds experimental support for accessing the full-text capabilities of the relational database (SQL back-end) for properties whose datatypes store their values as strings or text in the database tables, e.g. datatype "Page" (holds names of wiki pages and displays them as a link), datatype "Text" (holds text of arbitrary length), datatype "Code" (holds technical, pre-formatted texts, similar to datatype "Text") or datatype "URL" (holds URIs, URNs and URLs).
- The FT_SEARCH table aggregates search content for datatypes storing their data as URI values, e.g. datatype "Page", datatype "Text", datatype "Code" or datatype "URL".
- These datatypes use column types such as TEXT to store their data in the database tables.
- Supported operations rely on the relational backend database (MySQL, MariaDB and SQLite).
- For MySQL and MariaDB databases, IN BOOLEAN MODE is used as the default search mode. This allows the software to use a number of special operators.
- Relevance scores are not used for sorting, i.e. results are not ordered by best match.
- TextSanitizer relies on the "onoi/tesa" library [1] to sanitize text or string elements. The library provides some text manipulation support and can use language detection if enabled. It is pre-installed for use by Semantic MediaWiki.
- Custom stopwords are only applied by the "onoi/tesa" library [1] if language detection is enabled. MySQL and MariaDB provide their own standard stopword list [2], which is enabled by default.
- Starting with Semantic MediaWiki 3.0.0 (released on 11 October 2018 and compatible with MW 1.27.0 - 1.31.x):
  - If the SMW_FIELDT_CHAR_NOCASE option of configuration parameter $smwgFieldTypeFeatures (sets relational database specific field type features) is enabled, the full-text search only comes into effect for selections using the comparators
  - API-module "smwtask" (allows internal Semantic MediaWiki tasks to be invoked and executed) is used instead of a socket connection via a special page to invoke extra "work" after an update has been completed as part of an independent transaction. [4] See also configuration parameter $smwgPostEditUpdate (sets how many jobs should be executed as part of a post-edit event).
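The boolean search mode mentioned above can be illustrated with a minimal Python sketch. It is not Semantic MediaWiki's implementation: the table name `ft_search`, the columns `o_text` and `s_id`, and both function names are hypothetical and chosen only for illustration. The operators shown (`+`, `-`, a trailing `*`, and double-quoted phrases) are standard MySQL/MariaDB boolean-mode operators, and the `%s` placeholder follows the convention of common Python MySQL drivers. In line with the note on relevance, the query has no ORDER BY clause.

```python
# Hypothetical table and column names, used for illustration only.
FT_TABLE = "ft_search"
FT_COLUMN = "o_text"


def to_boolean_expression(required=(), excluded=(), prefixes=(), phrases=()):
    """Build a MySQL/MariaDB boolean-mode search expression.

    required -> +term   (term must be present)
    excluded -> -term   (term must not be present)
    prefixes -> term*   (matches words starting with term)
    phrases  -> "a b"   (words must appear as a literal phrase)
    """
    parts = []
    parts += ["+" + term for term in required]
    parts += ["-" + term for term in excluded]
    parts += [term + "*" for term in prefixes]
    # Replace embedded double quotes so a phrase cannot end early.
    parts += ['"' + phrase.replace('"', " ") + '"' for phrase in phrases]
    return " ".join(parts)


def build_query(expression):
    """Return (sql, params) for a parameterized boolean-mode lookup.

    No ORDER BY is added: matches are not sorted by relevance.
    """
    sql = (
        f"SELECT s_id FROM {FT_TABLE} "
        f"WHERE MATCH({FT_COLUMN}) AGAINST (%s IN BOOLEAN MODE)"
    )
    return sql, (expression,)
```

For example, `to_boolean_expression(required=["semantic"], excluded=["draft"], prefixes=["wiki"])` yields the expression `+semantic -draft wiki*`, which `build_query` passes to the database as a bound parameter rather than concatenating it into the SQL string.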
Notes on language support for Chinese, Japanese, and Korean (CJK)
- General CJK support is challenging because text must be broken into tokens that are not separated by spaces.
- The "onoi/tesa" library [1] provides some simple tokenizers that do not require language detection and try to provide rudimentary CJK search out of the box. This requires ICU 54+.
- Mroonga is a MySQL storage engine that is said to provide CJK-ready full-text search and a column store.
- MySQL comes with an optional ngram Full-Text Parser and MeCab Full-Text Parser Plugin.
- According to this issue, MariaDB is missing those parser plug-ins. Support was still missing as of 2023.
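To make the tokenization problem concrete, the following Python sketch splits runs of CJK characters into overlapping bigrams, which is the general idea behind n-gram based parsers (MySQL's ngram parser uses a token size of 2 by default). It is an illustration only and does not reproduce the "onoi/tesa" tokenizers or any database parser. The function names and the character ranges in `is_cjk` are assumptions made for this example, and the ranges are deliberately incomplete.

```python
from itertools import groupby


def is_cjk(ch):
    """Rough check covering a few common CJK blocks (not exhaustive)."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= code <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= code <= 0xD7AF   # Hangul Syllables
    )


def ngrams(run, n):
    """Return overlapping n-grams of a string, or the string itself if short."""
    if len(run) <= n:
        return [run]
    return [run[i:i + n] for i in range(len(run) - n + 1)]


def tokenize(text, n=2):
    """Split on whitespace, then break CJK runs into overlapping n-grams.

    Non-CJK runs (e.g. Latin words) are kept as whole tokens.
    """
    tokens = []
    for word in text.split():
        for cjk, chars in groupby(word, key=is_cjk):
            run = "".join(chars)
            tokens.extend(ngrams(run, n) if cjk else [run])
    return tokens
```

With these definitions, `tokenize("全文検索 search")` returns `["全文", "文検", "検索", "search"]`: the CJK run is indexed as overlapping pairs, so a search for "検索" can match even though the original text contains no spaces.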
- For users
  - "Searching" contains some examples and descriptions of the available search syntax
- For system administrators
- For developers
  - "Technical notes" provides some information on the technical implementation, fine-tuning, and performance