$smwgFulltextLanguageDetection



Configuration parameter details:
Name $smwgFulltextLanguageDetection
Description Sets which languages to detect for the full-text search from an indexable text
Default setting []
Software Semantic MediaWiki
Since version 2.5.0
Until version still available
Configuration Full-text search · Experimental
Keyword full-text search · data store · relational database · sql store · sql database · experimental


$smwgFulltextLanguageDetection is a configuration parameter that sets which languages to detect in indexable text for the full-text search. It was introduced in Semantic MediaWiki 2.5.0 (released on 14 March 2017 and compatible with MW 1.23.0 - 1.29.x).

  • The feature controlled by this configuration parameter is experimental.
  • This setting only takes effect if the full-text search feature is enabled.
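
As a minimal sketch of the prerequisite noted above, the following "LocalSettings.php" fragment enables the full-text search before configuring language detection. The parameter name $smwgEnabledFulltextSearch does not appear on this page and is an assumption based on the Semantic MediaWiki full-text search configuration; verify it against your installed version.

```php
// LocalSettings.php (fragment), placed after the enableSemantics() call.

// Assumption: $smwgEnabledFulltextSearch is the switch for the full-text
// search feature; it is not named on this page.
$smwgEnabledFulltextSearch = true;

// Language detection only takes effect once the feature above is enabled.
$smwgFulltextLanguageDetection = [
	'TextCatLanguageDetector' => [ 'en', 'de' ]
];
```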

Default setting

$smwgFulltextLanguageDetection = [];

This means that by default language detection is disabled.

Available language detectors

  • TextCatLanguageDetector: Allows for "N-Gram-Based Text Categorization" via TextCat and relies on the "wikimedia-textcat" utility.
  • CdbNGramLanguageDetector: Allows for "N-Gram-Based Text Categorization" via the "constant database".

Changing the default setting

Changing the content of this configuration parameter requires running the maintenance script "rebuildFulltextSearchTable.php", which rebuilds the full-text search data table.

To change the value of this configuration parameter, add one of the following snippets to your "LocalSettings.php" file after the enableSemantics() call:

Allow major Western European languages to be detected
$smwgFulltextLanguageDetection = [
	'TextCatLanguageDetector' => [
		'de',
		'en',
		'es',
		'fr',
		'pt'
	]
];
Allow major East Asian languages to be detected (MySQL 5.7+)
$smwgFulltextLanguageDetection = [
	'TextCatLanguageDetector' => [
		'ja',
		'zh',
		'ko'
	]
];
  • A long list of languages degrades performance when detecting the language of free text, so languages should only be added with caution.
  • This configuration parameter should only hold one language detector at a time.
  • Stopwords are only applied after language detection has been enabled.
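
The notes above could be applied as in the following sketch, which keeps a single detector and a short language list. Using CdbNGramLanguageDetector with the same array shape as the TextCatLanguageDetector examples is an assumption; this page only shows TextCatLanguageDetector configurations.

```php
// LocalSettings.php (fragment), placed after the enableSemantics() call.

// Exactly one detector key, listing only the languages the wiki actually uses.
// Assumption: CdbNGramLanguageDetector accepts a list of language codes in the
// same form as TextCatLanguageDetector.
$smwgFulltextLanguageDetection = [
	'CdbNGramLanguageDetector' => [
		'en',
		'de'
	]
];
```

As with any change to this parameter, the "rebuildFulltextSearchTable.php" maintenance script needs to be run afterwards.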

See also