Help:Configuration parameter "$smwgFulltextLanguageDetection"



Title: $smwgFulltextLanguageDetection
Description: Sets which languages to detect for the full-text search from an indexable text
Default setting: array();
Software: Semantic MediaWiki
First version supported: 2.5.0
Last version supported: still available
Configuration: Full-text search
Keyword: Full-text search

$smwgFulltextLanguageDetection is a configuration parameter that sets which languages are detected in indexable text for the full-text search. It was introduced in Semantic MediaWiki 2.5.0.

Important note: The feature connected to this configuration parameter is experimental!
Note: This setting only takes effect if the full-text search feature is enabled.

Default setting

$smwgFulltextLanguageDetection = array( );

This means that by default language detection is disabled.

Available language detectors

  • TextCatLanguageDetector: Allows for "N-Gram-Based Text Categorization" via TextCat and relies on the "wikimedia-textcat" utility.
  • CdbNGramLanguageDetector: Allows for "N-Gram-Based Text Categorization" via the "constant database".

Changing the default setting

Important note: After changing the content of this configuration parameter, the "rebuildFulltextSearchTable.php" maintenance script must be run.

To modify this configuration setting, add one of the following lines to your "LocalSettings.php" file after the enableSemantics() call:

Allow major Western European languages to be detected
$smwgFulltextLanguageDetection = array(
	'TextCatLanguageDetector' => array( 'en', 'de', 'fr', 'es' )
);
Allow major East Asian languages to be detected (MySQL 5.7+)
$smwgFulltextLanguageDetection = array(
	'TextCatLanguageDetector' => array( 'ja', 'zh', 'ko' )
);
Note:
  • A long list of languages slows down language detection on free text. Languages should therefore be added sparingly.
  • This configuration parameter should only hold one language detector at a time.
  • Stopwords are only applied after language detection has been enabled.
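
The notes above can be combined into a minimal "LocalSettings.php" sketch. It is an illustration, not a verified configuration: the namespace 'example.org' is a placeholder, and the use of $smwgEnabledFulltextSearch to enable the full-text search feature is an assumption that should be checked against the full-text search documentation for the installed version.

```php
<?php
// LocalSettings.php (excerpt): a sketch under the assumptions stated above.

// Semantic MediaWiki must be loaded before the setting is changed.
// 'example.org' is a placeholder, not a recommended value.
enableSemantics( 'example.org' );

// Assumption: this is the switch that enables the full-text search feature.
// Language detection has no effect while full-text search is disabled.
$smwgEnabledFulltextSearch = true;

// Exactly one language detector, with a deliberately short language list
// to keep the detection overhead low.
$smwgFulltextLanguageDetection = array(
	'TextCatLanguageDetector' => array( 'en', 'de' )
);
```

Because the content of the parameter changes here, the "rebuildFulltextSearchTable.php" maintenance script has to be run afterwards, as described at the start of this section.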

See also