Concept caching

From semantic-mediawiki.org
Explains the mechanism for pre-computing query results, so-called concept caching.

In order to speed up semantic query answering for queries that use concepts, Semantic MediaWiki offers a special mechanism for pre-computing query results, called concept caching. This feature is especially useful for large wikis that still want to make use of complex queries in a controlled way. In particular, there are various options to specify which queries should be computed "live" and which queries should be answered only if a cache is available.

The performance increase gained by caching a concept is comparable to the effect of replacing every use of this concept by a MediaWiki category. It affects all semantic queries that use this concept as well as the display of the concept page itself. On the other hand, pre-computed results for a concept query might become out of date, so the displayed results might no longer agree with the contents of the wiki. It is possible to specify the maximum age of a cached result before SMW attempts to recompute it.

Creating and managing concept caches

Under the default configuration, SMW will use a concept cache whenever it is available and no older than one day.

rebuildConceptCache.php

Use the maintenance script rebuildConceptCache.php (named SMW_conceptCache.php in SMW 1.9.1 and earlier) to create a cache for a concept.

Like any other SMW script, it is executed by changing into the directory [SMWpath]/maintenance and running php rebuildConceptCache.php. Running it without parameters displays documentation on how to use the script. If this fails, then your wiki probably uses a non-standard directory structure; read [SMWpath]/maintenance/README to find out how to run scripts in your case.

Parameters

For all concepts

Use the following parameters to set the basic mode of operation for the script. Each of these actions refers to all concepts.

  • php rebuildConceptCache.php --status shows the status of all concept caches on your site, including concepts that have no cache. This is also a convenient way to list the concept pages that exist on your wiki.
  • php rebuildConceptCache.php --create creates new caches for all concepts, or updates them if they are already there.
  • php rebuildConceptCache.php --delete deletes the caches of all concepts.

Only for certain concepts

There are a number of parameters to restrict the operation to only certain concepts:

  • --concept "Concept name" Process only the concept with the given name. The name should not include a namespace prefix, and it must be enclosed in double quotes if it contains spaces.
  • --hard Process only concepts that are not allowed to be computed online according to the current wiki settings. See below for further details.
  • --old <min> Process only concepts with caches older than <min> minutes or with no caches at all.
  • --update Process only concepts that already have a cache, i.e. do not create any new caches. For the opposite (only concepts without caches), use the parameter --old <min> with a value for <min> so high that it does not apply to any of the existing caches.
  • -s <startid> Process only concepts with page id of at least <startid>.
  • -e <endid> Process only concepts with page id of at most <endid>.

Combining options

These options can be combined to restrict the selection of concepts further.

For example, the call php rebuildConceptCache.php --create --update --old 30 will rebuild the caches of all concepts that already have a cache, provided that cache is older than 30 minutes. Concepts without a cache are skipped.

Configuring SMW to use caches

SMW has basically three options for handling concepts in queries and on the concept page:

  • Compute the elements of the concepts when needed, using the current wiki data.
  • Retrieve the elements of the concepts from a cache.
  • Reject the use of the concept completely, treating it like a query with no results.

The default behaviour of SMW is to use available caches if those are not older than one day, and to otherwise compute concept elements on the fly as long as the concept would be allowed as a (non-concept) inline query according to the current wiki settings. It is possible to create concepts that are not allowed as inline queries, since the restrictions on the size and complexity of concepts are less strict than for inline queries. Results for such concepts will by default only be returned from cache (no matter how old it is), and otherwise not be supplied. The following sections explain the relevant parameters to configure that behaviour.

Which queries are allowed as inline?

Three parameters are used to determine how "complex" a query is: size, depth, and the types of query features it uses. The size is essentially the overall number of query conditions found in the query. The depth is the maximal number of chained property statements. For example [[Some property::value]] has depth 1, and [[property1.property2::value]] [[Category:Something]] has depth 2. The query features are the types of query conditions that are used. You can use #ask with format=debug to see the size and depth of a query.

SMW has configuration options to set a maximal size and depth, and to restrict the available features for queries that are used inline or on special pages. These parameters and their default values are:

By default, all available features are allowed, but this can be restricted on large wikis. For example, to allow only category and concept queries, one would set:

$smwgQFeatures = SMW_CATEGORY_QUERY | SMW_CONCEPT_QUERY;

in LocalSettings.php. To also allow the conjunction (intersection) of these, one would use

$smwgQFeatures = SMW_CATEGORY_QUERY | SMW_CONCEPT_QUERY | SMW_CONJUNCTION_QUERY;

If a query is not allowed according to these options, then it will be cut down to a simpler query. Concepts may use queries that do not meet these requirements. If this happens, the concept is simply not computed "live" at all, and results are only shown when a cache has been created for that concept. Concepts that are affected by this can be selected with the option --hard of php rebuildConceptCache.php. In that way, users can create (propose) concepts, and administrators may supply caches for them if feasible.
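As a rough illustration of how these restrictions might look together in LocalSettings.php, the fragment below tightens the inline query limits. It assumes the size and depth limits are controlled by the settings $smwgQMaxSize and $smwgQMaxDepth, which are not named in the text above. The numbers are placeholders chosen for illustration, not the shipped defaults, so check the default settings of your SMW version before copying them.

```php
// Fragment for LocalSettings.php, placed after Semantic MediaWiki is loaded.
// Assumption: $smwgQMaxSize and $smwgQMaxDepth are the size and depth limits
// for inline queries; the values are illustrative only.
$smwgQMaxSize  = 8;  // at most 8 query conditions per inline query
$smwgQMaxDepth = 2;  // at most 2 chained property statements

// Allow only category and concept conditions, and their conjunction.
$smwgQFeatures = SMW_CATEGORY_QUERY | SMW_CONCEPT_QUERY | SMW_CONJUNCTION_QUERY;
```

With limits like these, a concept whose query exceeds them would count as "hard" under the default configuration and would only show results once a cache has been built for it.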

Which queries are allowed in concepts?

For concepts, SMW supports settings similar to those explained above, but with slightly different defaults:

As before, these can be changed in LocalSettings.php. The defaults show that some queries will be allowed in concepts while not being allowed in #ask. As discussed before, such concepts will not be enabled until they have some cache.
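A sketch of the corresponding concept settings might look as follows. It assumes the concept-specific settings are named $smwgQConceptMaxSize, $smwgQConceptMaxDepth and $smwgQConceptFeatures, which are not spelled out in the text above. The values are placeholders for illustration rather than the actual defaults.

```php
// Fragment for LocalSettings.php, placed after Semantic MediaWiki is loaded.
// Assumption: these three settings mirror the inline-query settings but apply
// to queries stored in concepts; the values are illustrative only.
$smwgQConceptMaxSize  = 20;  // concepts may contain more conditions than #ask
$smwgQConceptMaxDepth = 6;   // and deeper property chains

// Restrict concept queries to the feature types named earlier on this page.
$smwgQConceptFeatures = SMW_CATEGORY_QUERY | SMW_CONCEPT_QUERY | SMW_CONJUNCTION_QUERY;
```

Keeping the concept limits looser than the inline limits is what lets users propose concepts that are too expensive for #ask but can still be served from a cache.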

Hard and simple concepts

The above sections explained that some concepts in SMW are considered "hard". By default, this is the case if they represent queries that would not be allowed as inline queries. This can be changed by setting the value of the option $smwgQConceptCaching. The possible settings are:

  • $smwgQConceptCaching = CONCEPT_CACHE_HARD; The default setting as explained above.
  • $smwgQConceptCaching = CONCEPT_CACHE_ALL; All concepts are considered to be "hard", i.e. concepts will never be computed online and always rely on caches.
  • $smwgQConceptCaching = CONCEPT_CACHE_NONE; No concepts will be considered hard. Concepts may still use caches if available (see next section), but they do not depend on them in any case. This can be useful if the concept namespace is write-restricted to a certain trusted user group who will be the only ones who can create new concept queries.

The default setting in SMW is $smwgQConceptCaching = CONCEPT_CACHE_HARD;

When will available caches be used?

Hard concepts will always try to use a cache, since they simply cannot display anything without a cache. For simple concepts, re-computing the results online is possible, and SMW will use the cache age to decide what to do. The configuration parameter $smwgQConceptCacheLifetime specifies how old (in minutes) a cache is allowed to be. Older caches are not used if the configuration of SMW allows their elements to be computed online.

$smwgQConceptCacheLifetime = 24*60; (default setting)
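The rules above can be summarised in a small decision function. This is only a simplified model of the behaviour described on this page, not SMW's actual implementation. The function name, its parameters and the returned strings are all hypothetical.

```php
<?php
// Hypothetical model of the cache decision described above; not SMW code.
// $isHard:          whether the concept counts as "hard" under $smwgQConceptCaching
// $cacheAgeMinutes: age of the existing cache in minutes, or null if there is none
// $lifetimeMinutes: the value of $smwgQConceptCacheLifetime
function conceptResultSource(bool $isHard, ?int $cacheAgeMinutes, int $lifetimeMinutes): string {
    if ($isHard) {
        // Hard concepts are never computed online: any cache is used,
        // however old, and without a cache no results are returned.
        return $cacheAgeMinutes !== null ? 'cache' : 'none';
    }
    // Simple concepts use a cache only while it is within its lifetime.
    if ($cacheAgeMinutes !== null && $cacheAgeMinutes <= $lifetimeMinutes) {
        return 'cache';
    }
    // Otherwise the elements are computed from the current wiki data.
    return 'live';
}
```

Under this model, with the default lifetime of 24*60 minutes, a simple concept whose cache is 2000 minutes old would be recomputed live, while a hard concept would still be answered from that same cache.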

How many items will be cached?

The $smwgQMaxLimit setting defines how many items per concept are cached. If a concept has more items than the default limit allows, you should evaluate whether and how far the setting needs to be adapted.

$smwgQMaxLimit = 10000; (default setting)

See also

Help page on caching.