How to do indexing with the full-text search
The index used by the full-text search is only updated when a change is propagated. The initial index content therefore has to be created separately, either by running the "rebuildFulltextSearchTable.php" maintenance script (recommended) or by triggering the process via the special page "SemanticMediaWiki" [1].
Indexing with the maintenance script
The simplest and recommended approach is to run the script from your command line:
php maintenance/rebuildFulltextSearchTable.php --quick
This outputs something similar to the example below, in which the script reports the configuration used, including which datatypes are enabled and which properties are excluded. It also reports the version of the "onoi/tesa" library [2], which provides some text sanitization functions.
The script can also be run with the --optimize option, which in the case of MySQL [3] will ensure that it "... rebuilds the table to update index statistics and free unused space in the clustered index ...":
php maintenance/rebuildFulltextSearchTable.php --quick --optimize
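The two invocations above can be wrapped in a small helper, for example for use from a scheduled task. This is only a sketch: the function name `rebuild_fulltext_index`, the `DRY_RUN` variable and the argument layout are hypothetical, and the base directory from which `maintenance/rebuildFulltextSearchTable.php` resolves is an assumption, since the text does not state it. Only the script path and the `--quick` and `--optimize` flags are taken from the commands above.

```shell
#!/bin/sh
# Sketch of a wrapper around the rebuild command shown above.
# rebuild_fulltext_index and DRY_RUN are hypothetical names, not part of
# Semantic MediaWiki.
rebuild_fulltext_index() {
    base_dir=$1           # directory from which the script path resolves (assumed)
    optimize=${2:-no}     # pass "optimize" to append --optimize

    # Build the command as positional parameters so no re-quoting is needed.
    set -- php maintenance/rebuildFulltextSearchTable.php --quick
    if [ "$optimize" = "optimize" ]; then
        set -- "$@" --optimize
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "$*"
        return 0
    fi

    # Run in a subshell so the caller's working directory stays unchanged.
    ( cd "$base_dir" && "$@" )
}
```

Calling the function with `DRY_RUN=1` set prints the command that would be run, which makes it possible to check the flags before touching the index.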
- Example output
    The script rebuilds the search index from property tables that support a fulltext search. Any change of the index rules (altered stopwords, new stemmer etc.) and/or a newly added or altered table requires to run this script again to ensure that the index complies with the rules set forth by the DB or Sanitizer.

    ## Configuration

    - ICU (Intl) PHP-extension 54.1
    - Tesa::Sanitizer 0.2
    - Tesa::Transliterator 0.2
    - Tesa::LanguageDetector (Disabled)
    - DataTypes (Indexable) BLOB, URI, WIKIPAGE

    The following properties are exempted from the indexing process.

    - _ASKFO, _ASKST, _IMPO, _LCODE, _UNIT, _CONV, _TYPE, _ERRT, _INST
    - _ASK, _SOBJ, ___EUSER, ___CUSER, ___SUBP, ___EXIFDATA, __sci_cite
    - __sil_iwl_lang, __sil_ill_lang

    ## Indexing

    The entire index table is going to be purged first and it may take a moment before the rebuild is completed due to dependencies on table content including varying options.

    The index table was purged.

    Rebuilding the text index from (rows finished/expected):

    - smw_di_blob 100% (2387/2387)
    - smw_di_uri 100% (1990/1990)
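When the rebuild runs unattended, its captured output can be checked for completion. The following is a minimal sketch, assuming the progress lines look like the ones in the example output above (a dash, a table name starting with `smw_`, a percentage, and the row counts). The function name `check_rebuild_output` is hypothetical, and the real output format may differ between versions.

```shell
#!/bin/sh
# Sketch: succeed only if at least one "- smw_..." progress line is present
# and every such line reports 100%. check_rebuild_output is a hypothetical
# helper; the line format is assumed from the example output.
check_rebuild_output() {
    awk '
        # Fields: $1 = "-", $2 = table name, $3 = percentage, $4 = (done/expected)
        $1 == "-" && $2 ~ /^smw_/ {
            seen++
            if ($3 != "100%") bad++
        }
        END { exit ((seen == 0 || bad > 0) ? 1 : 0) }
    '
}
```

The function reads the output on standard input, so it can be used as `php maintenance/rebuildFulltextSearchTable.php --quick | check_rebuild_output`, with the exit status indicating whether every listed table was fully indexed.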
Indexing via the special page
It is also possible to do the indexing via the special page "SemanticMediaWiki": in the "Data repair and update" section of that page, trigger the special task "Full-text search rebuild" by clicking the button labeled "Schedule full-text rebuild". The indexing is then done via the job queue. Note that the respective user must have the permission to do so.
[1] Semantic MediaWiki: GitHub pull request gh:smw:2142
[2] "onoi/tesa": a small library to help with the sanitization of text or string elements.
[3] "Optimizing InnoDB Full-Text Indexes" notes: "Running OPTIMIZE TABLE on a table with a full-text index rebuilds the full-text index, removing deleted Document IDs and consolidating multiple entries for the same word, where possible."