ElasticStore

From semantic-mediawiki.org
ElasticStore is the component that establishes a connection between Semantic MediaWiki and an Elasticsearch cluster.

The ElasticStore was introduced as part of Semantic MediaWiki 3.0 [1] to provide a powerful and scalable QueryEngine that can serve enterprise and wiki-farm users better by moving query-heavy computation to an external entity (that is, one separated from the main DB master/replica) known as Elasticsearch [2].


The ElasticStore provides a framework to replicate Semantic MediaWiki-related data to an Elasticsearch cluster and enables its QueryEngine to send #ask requests to, and retrieve information from, Elasticsearch (aka ES) instead of the default SQLStore.

The objective is to provide an interface to Elasticsearch to:

  • improve structured (and allow unstructured) content searches
  • extend and improve full-text query support (including sorting of results by relevancy)
  • provide means for a scalability strategy by relying on the ES infrastructure

Requirements

  • Elasticsearch: Recommended 6.1+, Tested with 5.6.6
  • Semantic MediaWiki: 3.0+
  • elasticsearch/elasticsearch PHP client: ~6.0 with PHP ^7.0, or ~5.3 with PHP ^5.6.6

The ElasticStore relies on the elasticsearch php-api to communicate with Elasticsearch and is therefore independent of any other vendor or MediaWiki extension that may use ES as a search backend (e.g. CirrusSearch).

It is recommended to use:

  • ES 6+ due to improvements to its sparse field handling
  • ES hardware sized as noted in the Elasticsearch guide: "... machine with 64 GB of RAM is the ideal sweet spot, but 32 GB and 16 GB machines are also common ..."

Features

  • Property type changes are handled without rebuilding the entire index, provided that all ChangePropagation jobs have been processed
  • Inverse queries are supported (e.g. [[-Foo::Bar]])
  • Property chain and path queries are supported (e.g. [[Foo.Bar::Foobar]])
  • Category and property hierarchies are supported

ES is not expected to be used as the sole data store replacement, and therefore it is not assumed that ES returns all _source fields for a request.

The ElasticStore provides a customized serialization format to transform and transfer data, and an interpreter (see domain language) that allows #ask queries to be answered by an ES instance.

Setup

To use Elasticsearch as a drop-in replacement for the existing SQLStore-based query answering, the following settings and actions are necessary:

  • Set $GLOBALS['smwgDefaultStore'] = 'SMWElasticStore';
  • Set $GLOBALS['smwgElasticsearchEndpoints'] = [ ... ];
  • Run php setupStore.php or php update.php
  • Rebuild the index using php rebuildElasticIndex.php
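
As a rough illustration of the two settings above, a LocalSettings.php fragment might look like the following. This is only a sketch: the host, port and scheme values are assumptions for a single local Elasticsearch node and need to be replaced with the details of the actual cluster.

```php
// LocalSettings.php fragment (sketch; assumes Semantic MediaWiki is already
// installed and enabled further up in this file).

// Switch the default store from the SQLStore to the ElasticStore.
$GLOBALS['smwgDefaultStore'] = 'SMWElasticStore';

// Assumption: one Elasticsearch node reachable on localhost at port 9200.
// Replace these values with the endpoints of your own cluster.
$GLOBALS['smwgElasticsearchEndpoints'] = [
	[ 'host' => 'localhost', 'port' => 9200, 'scheme' => 'http' ]
];
```

With these settings saved, the maintenance scripts listed above (setupStore.php or update.php, followed by rebuildElasticIndex.php) would be run from the command line.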

For a more detailed introduction, see the usage and settings sections.

References

  1. Semantic MediaWiki: GitHub pull request gh:smw:3054
  2. Elasticsearch is a highly scalable open-source full-text search and analytics engine