How to do indexing with the full-text search

From semantic-mediawiki.org

The full-text search index is only updated when a change is propagated, so the index has to be initialized once to prepare and create its initial content. This can be done with the "rebuildFulltextSearchTable.php" maintenance script (recommended) or by triggering the process via the special page "SemanticMediaWiki".[1]
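To illustrate why an initial rebuild is needed, the following Python sketch models an index that learns about content only through change events. It is a hypothetical, simplified model: the class `FulltextIndex`, its methods, and the sample pages are illustrative assumptions, not Semantic MediaWiki's implementation or schema. Data that existed before the index was set up stays invisible to search until a full rebuild re-reads the stored values.

```python
class FulltextIndex:
    """Hypothetical model of an index fed only by change propagation."""

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}

    def on_change(self, page: str, text: str) -> None:
        # Called only when a change for `page` is propagated.
        self.entries[page] = text.lower()

    def rebuild(self, store: dict[str, str]) -> None:
        # Purge first, then re-read every stored value, loosely mirroring
        # what the maintenance script does against the real tables.
        self.entries.clear()
        for page, text in store.items():
            self.entries[page] = text.lower()

    def search(self, term: str) -> list[str]:
        # Case-insensitive substring match, returned in sorted order.
        term = term.lower()
        return sorted(p for p, t in self.entries.items() if term in t)


# Hypothetical pre-existing data, stored before the index existed.
store = {"Berlin": "Capital of Germany", "Paris": "Capital of France"}
index = FulltextIndex()
print(index.search("capital"))  # [] (existing data was never propagated)

store["Rome"] = "Capital of Italy"
index.on_change("Rome", store["Rome"])
print(index.search("capital"))  # ['Rome']

index.rebuild(store)
print(index.search("capital"))  # ['Berlin', 'Paris', 'Rome']
```

Only the page changed after the index existed is found until `rebuild` runs, which is the situation the initial indexing step addresses.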

Indexing with the maintenance script

The simplest, and recommended, approach is to run the following from the command line:

php maintenance/rebuildFulltextSearchTable.php --quick

This produces output similar to the example below, in which the script reports the configuration in use, including which datatypes are enabled and which properties are excluded. It also reports the version of the "onoi/tesa" library,[2] which provides text sanitization functions.

Note: You may also use the --optimize option. In the case of MySQL,[3] this "... builds the table to update index statistics and free unused space in the clustered index ...":
php maintenance/rebuildFulltextSearchTable.php --quick --optimize
Example output
The script rebuilds the search index from property tables that
support a fulltext search. Any change of the index rules (altered
stopwords, new stemmer etc.) and/or a newly added or altered table
requires to run this script again to ensure that the index complies
with the rules set forth by the DB or Sanitizer.

## Configuration

- ICU (Intl) PHP-extension         54.1
- Tesa::Sanitizer                  0.2
- Tesa::Transliterator             0.2
- Tesa::LanguageDetector           (Disabled)
- DataTypes (Indexable)            BLOB, URI, WIKIPAGE

The following properties are exempted from the indexing process.

- _ASKFO, _ASKST, _IMPO, _LCODE, _UNIT, _CONV, _TYPE, _ERRT, _INST
- _ASK, _SOBJ, ___EUSER, ___CUSER, ___SUBP, ___EXIFDATA, __sci_cite
- __sil_iwl_lang, __sil_ill_lang

## Indexing

The entire index table is going to be purged first and
it may take a moment before the rebuild is completed due to
dependencies on table content including varying options.

The index table was purged.

Rebuilding the text index from (rows finished/expected):

- smw_di_blob                       100% (2387/2387)
- smw_di_uri                        100% (1990/1990)
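The "Indexing" part of the output describes two steps: the index table is purged, then rebuilt from each supporting property table while exempted properties are skipped. A minimal Python sketch of that flow follows. The in-memory tables, the `rebuild` function, the (property, value) row layout, and the shortened exemption list are hypothetical stand-ins chosen for illustration; how the real script counts and reports progress may differ.

```python
# Hypothetical excerpt of the exemption list shown under "Configuration".
EXEMPTED = {"_ASK", "_SOBJ", "_TYPE", "_ERRT", "_INST"}


def rebuild(
    tables: dict[str, list[tuple[str, str]]],
    index: list[tuple[str, str]],
) -> list[str]:
    """Purge `index`, refill it from `tables`, and return progress lines.

    `tables` maps a source table name to (property, value) rows; this
    layout is an assumption for the sketch, not SMW's real schema.
    """
    index.clear()  # corresponds to "The index table was purged."
    report = []
    for table, rows in tables.items():
        expected = len(rows)
        finished = 0
        for prop, value in rows:
            if prop not in EXEMPTED:  # exempted properties are never indexed
                index.append((prop, value))
            finished += 1  # a row counts as processed even if skipped
        percent = 100 if expected == 0 else finished * 100 // expected
        report.append(f"- {table:<33} {percent}% ({finished}/{expected})")
    return report


# Start with a stale entry to show that the purge removes it.
index: list[tuple[str, str]] = [("stale", "old entry")]
tables = {
    "smw_di_blob": [("Has description", "A sample text"), ("_ASK", "query")],
    "smw_di_uri": [("Has homepage", "https://example.org")],
}
for line in rebuild(tables, index):
    print(line)
print(index)
# [('Has description', 'A sample text'), ('Has homepage', 'https://example.org')]
```

Purging before refilling is what makes the rebuild pick up changed index rules: nothing built under the old rules survives.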

Indexing via the special page

It is also possible to do the indexing via the special page "SemanticMediaWiki": in the "Data repair and update" section of that page, click the button labeled "Schedule full-text rebuild" to trigger the special task "Full-text search rebuild". The indexing is then carried out via the job queue. Note that the user must have the necessary permission to do so.

References

  1. Semantic MediaWiki: GitHub pull request gh:smw:2142
  2. "onoi/tesa": a small library to help with the sanitization of text or string elements.
  3. MySQL documentation, "Optimizing InnoDB Full-Text Indexes": "Running OPTIMIZE TABLE on a table with a full-text index rebuilds the full-text index, removing deleted Document IDs and consolidating multiple entries for the same word, where possible."