Speeding up Semantic MediaWiki

On very large or high-traffic sites, it may be desirable to restrict some SMW features for performance reasons. Besides setting up the usual external caches and distributed database servers, you can modify some SMW options to increase speed, at the cost of switching off features. Try some or all of the following in LocalSettings.php:
 * Set to  .  Drastic improvements for searches with several subqueries. This will not affect search results unless an annotation's object points to a redirected page.
 * Set to  .  Most radical option: no more semantic queries, only browsing and display features.
 * Set to  .  Disable subcategory reasoning in queries.
 * Set to  .  Allow only queries of 5 or fewer conditions (default 12).
 * Set to  .  Allow only queries of depth 2 or smaller (default 4).
 * Set to  .  Never return more than 100 results to a query (default 10000).
 * Set to  .  Only return 10 query results by default.
 * Set to  .  Allow duplicate query segments from computing query results.
 * Set to  .  Allow query result caching.
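
As a hedged illustration, a LocalSettings.php fragment applying several of the restrictions above might look like the following. The list itself does not name the settings; the names used here are assumptions based on the SMW configuration documentation, so check each one against the documentation for your SMW version. The fragment is meant to be placed after the line that loads Semantic MediaWiki.

```php
// Sketch only: setting names are assumed from the SMW configuration
// documentation and should be verified for your SMW version.

$smwgQEqualitySupport = SMW_EQ_NONE; // no redirect/equality resolution in queries
$smwgQSubcategoryDepth = 0;          // disable subcategory reasoning in queries
$smwgQMaxSize = 5;                   // allow at most 5 conditions per query
$smwgQMaxDepth = 2;                  // allow at most query depth 2
$smwgQMaxLimit = 100;                // never return more than 100 results
$smwgQDefaultLimit = 10;             // return 10 results by default

// Most radical option, left commented out on purpose:
// $smwgQEnabled = false;            // no semantic queries at all
```

Starting with the limits and leaving the most radical option disabled follows the advice to try conservative settings first.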

These settings have different effects, and their effectiveness depends very much on the usage and content structure of your wiki. You may wish to try out conservative settings first and relax these step by step when things work reliably. If you have continued performance issues on your *large* wiki, please do not hesitate to contact the SMW developers for support.

Slow query formatting
Many users think that the main performance problem is query answering (finding the matching results), but in practice the far bigger problem is often query formatting (putting all that data into the layout you want). Queries can produce very long pages with a large number of template calls. It is very convenient to use template-based result formats to achieve a nice custom display of a result list. However, this can make rendering very slow and put the wiki under significant load. The problem of overusing templates in query result formatting becomes even worse if you start to use inline queries in some of your formatting templates. If, moreover, you use such template-embedded queries to pull in more results (and more templates), this can bring any site down. There have been public wikis where the creation of a single page caused several thousand templates and ask queries to be processed (many of them many times, by different templates).

Example: Imagine you have a wiki that manages publications. Each publication has a title, a publisher, a date of publication, and up to 12 authors. You need to know each author, and their order is important too. So you create 12 properties "author1" ... "author12". You want to create a query that builds the list of publications of 2014. Obviously, the results should look like normal publications (not like a table with 15 columns!). So you have a query that asks for 15 property values and you use a template that assembles this data into a nice-looking publication string. Your template uses parser functions to create a comma-separated list of authors (in particular, if there are only three authors, you want the list to have only two commas). This requires 11 nested parser functions in your template. Thus, listing 100 publications on one page leads to 1100 parser function calls for constructing this page. This will slow down page construction and put load on your server.

There are a number of possible solutions. Let's explain them with the example:
 * Prefabricated results. Every publication has a page, and the page already contains all of the data you need for displaying the publication. So instead of constructing a nice display dynamically when showing query results, you store the string that should be displayed in an SMW property of type string. It is possible to store MediaWiki markup in string values. If you use a template for publications, you can just refer to the template variables again on the same page to build that string. If you use a more complicated setup (e.g., many templates in Semantic Forms style), then the Variables extension will help you to collect all the data you need without running any queries. In any case, you get a string that is exactly the string you want to use for displaying publications, and you can now change all your publication queries to display this one string value instead of fetching 15 values to construct it anew. There is no new update problem either: if the publication page changes, the string will be updated too. Since you would probably display publications in the same way in many queries, you can reuse this single string to speed up most of your queries.
 * PHP-based formatting. It is not very hard to program your own custom result format in PHP. In fact, if you can work with MediaWiki templates, this will seem trivial to you. Thus, you could create a result format that produces the publication string programmatically. The steps/checks are the same as in templates, but using conditionals in PHP will be much more efficient than using parser functions in MW pages. This method also gives you additional power and allows you to do things that can be really hard in MediaWiki templates. You don't need to create a universal result format that can be fully controlled by users of your wiki. For example, PHP result formats can fetch any data from the store in PHP whether or not there is a corresponding printout in the query. So your code could just fetch the data for authors etc. on its own. To get started with PHP result format programming, look at some of the short PHP result formats that are included with SMW (the table format might be a good starting point, although it is already a bit more complex than what you will need).
 * Lua-based templates. MediaWiki supports Lua (through the Scribunto extension) as a scripting language that can be used to replace complex template code. Lua is known to be much faster than parser functions, so this can be a good way to improve performance while keeping everything in the wiki (no PHP code). See also the "Semantic Scribunto" extension.
 * No universal templates. Some users have been observed to build templates that are supposed to format arbitrary data, using (many!) inline queries to find out what to do. This practice is best avoided. If you can distinguish different cases that require different processing of results, then don't try to make it all happen in a single template. Create one template for each use case and make sure it only checks the things that are relevant in this case. If you are a programmer, you may find that it is better to split long functions into many smaller functions that can be used universally. This reasoning should not be applied to templates, since the cost of calling a (sub)template is huge compared to the cost of calling a function in programming. Flat code, while less readable, might therefore be preferable here.

These solutions can also be combined. For example, it can be enough to store a partial prefabricated result (e.g., the list of all authors) and then do some remaining formatting in Lua or templates at query time. This can give you more flexibility if you need to support many slightly different query formats and you don't want to store prefabricated results for each of them.
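
To make the PHP-based formatting option more concrete, here is a minimal sketch of the author-list logic from the publication example, written in plain PHP. It is not an SMW result format and does not use SMW's result printer API. The function names `formatAuthorList` and `formatPublication` and the output layout are hypothetical, chosen only for illustration. The point is that a filter and a join replace the 11 nested parser functions.

```php
<?php
// Hypothetical helpers for illustration; not part of SMW or MediaWiki.

/**
 * Join the non-empty author names with commas.
 *
 * @param string[] $authors Values of author1 ... author12; unused slots may be empty strings.
 */
function formatAuthorList(array $authors): string
{
    // Trim every value, then drop the slots that are empty.
    $names = array_filter(
        array_map('trim', $authors),
        static function (string $name): bool {
            return $name !== '';
        }
    );

    return implode(', ', $names);
}

/**
 * Assemble one publication string (the layout is an assumption).
 */
function formatPublication(array $authors, string $title, string $publisher, string $year): string
{
    return sprintf('%s: %s. %s, %s.', formatAuthorList($authors), $title, $publisher, $year);
}

echo formatPublication(['A. Author', 'B. Writer', 'C. Scribe', ''], 'Some Title', 'Some Press', '2014'), "\n";
// prints "A. Author, B. Writer, C. Scribe: Some Title. Some Press, 2014."
```

Three authors yield exactly two commas without any conditional nesting. The same function could also be used to build the prefabricated author string that is stored once per publication page.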

Finally, whatever you do, make sure that no page ever queries for data that is entered on that same page. Yes, this has been seen in practice. Instead, if you need the value of some property elsewhere on the same page, use the Variables extension to store it in a local variable during parsing.

Concepts
Using concepts (see concept caches) in a way similar to categories can reduce the load of database queries and free up resources for repetitive #ask queries. In contrast to the formatting problem discussed above, this addresses performance problems caused by particularly complex query conditions that take a long time to process even if you use the simplest formats (and no printouts).

Semantic Drilldown
Older versions of Semantic Drilldown could harm performance significantly; this was improved considerably in version 1.3. If you are using an older version, you're strongly encouraged to upgrade to the latest version.

Page Forms
In Page Forms (formerly Semantic Forms), pointing red links to forms can slow down performance, since every red link needs to be checked to see if it has a matching form. The following setting can help:

This checks for matching forms for each red-linked page only on the page on which the link exists, not across the wiki, which can significantly reduce the number of database queries required.

Speeding up MediaWiki
Often, performance problems on Semantic MediaWiki-based wikis do not arise from Semantic MediaWiki at all, but from normal MediaWiki operation. These can frequently be fixed by PHP caching, database caching and the like. See here for one list of tips.

Of the caching tools mentioned on that page, one that has consistently proven very helpful for SMW administrators (and probably for MediaWiki administrators as a whole) is APC.
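
As a minimal sketch, object caching through APC (or its successor APCu on current PHP versions) is typically switched on in LocalSettings.php as shown below. `$wgMainCacheType` and `CACHE_ACCEL` are standard MediaWiki configuration names. Whether this choice suits your wiki is an assumption that depends on your setup: a wiki running on several web servers, for example, would normally need a shared cache instead.

```php
// Sketch only: use the PHP accelerator's in-memory cache (APC/APCu) as
// MediaWiki's main object cache. Requires the matching PHP extension.
$wgMainCacheType = CACHE_ACCEL;
```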

Documents

 * Markus Krötzsch: Saving CO2: Top SMW Performance Issues and How to Address Them, SMWCon Fall 2011, September 21, 2011