Help:Embedded query update

From semantic-mediawiki.org

The embedded query update feature provided by "QueryDependencyLinksStore" was added with Semantic MediaWiki 2.3.0 (released on 29 October 2015 and compatible with MW 1.19.0 - 1.25.x) as part of the query management to track and store dependencies of embedded queries (a.k.a. inline queries). [1]

To enable this feature it is required to:

  1. Set configuration parameter $smwgEnabledQueryDependencyLinksStore (which sets whether tracking and storing of dependencies of embedded queries may be used) to "true"
  2. Run the "update.php" maintenance script followed by maintenance script "rebuildData.php" (which rebuilds all the semantic data for a selected data backend/store)
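
The two steps above might look as follows in practice. This is only a minimal sketch: it assumes the setting is placed in "LocalSettings.php" after Semantic MediaWiki has been enabled, and the script paths shown in the comments assume a default installation layout, which may differ on a given wiki.

```php
// LocalSettings.php (fragment; assumed to come after Semantic MediaWiki is enabled)

// Step 1: allow tracking and storing of embedded query dependencies
$smwgEnabledQueryDependencyLinksStore = true;

// Step 2: run from the MediaWiki root directory (paths are assumptions
// based on a default installation layout):
//   php maintenance/update.php
//   php extensions/SemanticMediaWiki/maintenance/rebuildData.php
```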

Features and limitations

  • Dependencies are resolved for properties, categories, concepts (non cached) and hierarchies
  • Namespace queries, e.g. [[Help:+]], are not tracked as this would significantly impact update performance when a single namespace dependency is altered
  • Queries with arbitrary conditions, e.g. [[~Issue/*]], cannot be tracked as they are not distinguishable in terms of an object description
  • Queries that set the "limit" option to "0" (|limit=0) are not tracked as they return an empty result list and only represent a simple link to special page "Ask" (an interface that assists users with creating and executing semantic queries)
  • Queries via special page "Ask" are not tracked since those are not embedded on pages
  • An invalidation (done by "ParserCachePurgeJob") is only triggered by entities that have been altered or added
  • Configuration parameter $smwgQueryDependencyPropertyExemptionList (which sets special properties that should be exempted from embedded query updates) contains property keys that are excluded from detection; by default these are special property "Modification date" (the date of the last modification of each page), special property "Has subobject" (the subobjects set on a page) and special property "Query duration" (the duration a query took to execute)
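
As an illustration of the exemption list, the fragment below restates what are assumed to be the keys of the three default special properties named above and adds one more. The key strings ("_MDAT", "_SOBJ", "_ASKDU", "_CDAT") are assumptions based on the keys Semantic MediaWiki commonly uses for these special properties; they should be verified against the help page of the configuration parameter before use.

```php
// LocalSettings.php (fragment)
// Assumed keys: _MDAT = "Modification date", _SOBJ = "Has subobject",
// _ASKDU = "Query duration" (the defaults named above);
// _CDAT = "Creation date" is added here purely as an example.
$smwgQueryDependencyPropertyExemptionList = array( '_MDAT', '_SOBJ', '_ASKDU', '_CDAT' );
```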

Notes

The main obstacle for queries to display updated results is the "ParserCache", by which MediaWiki determines whether to serve a page from cache or to re-parse its content (and with it the embedded parser functions such as parser function #ask).

The "QueryDependencyLinksStore" does not update the query result itself but manages the invalidation of the "ParserCache" for selected articles that have been identified as requiring an update, using the "ParserCachePurgeJob". Once the "ParserCache" is outdated, MediaWiki will start a re-parse on the next article view, during which new results for embedded queries are requested from the "QueryEngine".

Computing dependencies

Computing dependencies and managing them in a performant manner is crucial to avoid a bottleneck, which is why dependencies (which are fetched from a "QueryResult" object) are only resolved after a change to an object has occurred.

Keeping track of each query and its dependencies is important, but identifying when to initiate a "ParserCachePurgeJob" is even more so. To avoid any lag in the update process, "DeferredRequestDispatchManager" starts a background process to inform "ParserCachePurgeJob" about the entities (properties, categories, and objects) of a subject that have changed, and ONLY those, using a "diff" taken from the "CompositePropertyTableDiffIterator".

As updates solely happen on a diff, one reason to run maintenance script "rebuildData.php" shortly after the "QueryDependencyLinksStore" was enabled is to build a baseline of dependencies; otherwise only newly added queries are tracked and added to the "QUERY_LINKS" database table.

Embedded queries that are no longer used (which includes deleted or changed queries) are removed from the table together with all their associated dependencies.

A list of selected properties can be blacklisted and exempted from tracking because they have an impact on performance or are unlikely to be used directly. See configuration parameter $smwgQueryDependencyPropertyExemptionList for further information.

Invalidation, updates, and the job queue

The actual invalidation is handled by "ParserCachePurgeJob". Despite its name, the execution is done as part of a deferred update process; only in case of a denied or failed request ("DeferredRequestDispatchManager" returns with something like "wasCompleted":false,"connectionFailure":2," as part of the logging) is it pushed into the job queue as a safety measure.

Depending on the number of subjects that need invalidation, the "ParserCachePurgeJob" works in batches and may require some processing time before it is finished.

The decision not to rely on the job queue as the primary execution handler was made so that the "QueryDependencyLinksStore" can act independently of a possibly delayed job schedule and allow queries to be updated almost instantly (depending on the update size) by the next page view. Nevertheless, any leftover "ParserCachePurgeJob" jobs that weren't processed in time and were pushed into the job queue should be executed promptly or scheduled with the help of the "runJobs.php" maintenance script.

Only an "HTTP/1.0 202 Accepted" response acknowledged by the "DeferredRequestDispatchManager" allows for an immediate update; any other response code will push the updates into the job queue.

For an altered value that is at the same time part of a query within the same page (where the update was made), the query update may not be immediately visible because of the deferred update process MediaWiki runs on an article save (depending on the MW version, "ParserCachePurgeJob" runs too early, before MW has finished, or, depending on the DB lag, the update information is not yet available for the "DeferredRequestDispatchManager" to act upon).

If for some reason the "DeferredRequestDispatchManager" is unable to complete the request and waiting on the job scheduler is not an option [2], then setting configuration parameter $smwgEnabledHttpDeferredJobRequest (which sets whether selected jobs can be executed asynchronously to the initial transaction triggering the job) [3] to "SMW_HTTP_DEFERRED_SYNC_JOB" may be an option, but this can slightly influence the update time since transaction updates are then done synchronously.

Example
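
A minimal sketch of the synchronous fallback described above, assuming the setting is placed in "LocalSettings.php" after Semantic MediaWiki has been enabled so that the "SMW_HTTP_DEFERRED_SYNC_JOB" constant is defined:

```php
// LocalSettings.php (fragment; assumed to come after Semantic MediaWiki is enabled)
// Run selected jobs synchronously instead of dispatching them asynchronously;
// this may slightly increase the time an update takes.
$smwgEnabledHttpDeferredJobRequest = SMW_HTTP_DEFERRED_SYNC_JOB;
```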

See also

Technical background information
  • Issue #1117 contains the implementation details of the "QueryDependencyLinksStore"
  • Issue #1261 contains a discussion about the "HTTP/1.1 301 Moved Permanently" return