Repairing data and data structures

From semantic-mediawiki.org
Describes how to repair and rebuild data.

All data that Semantic MediaWiki (SMW) uses is stored in wiki pages. If the data should ever get out of date or contain any errors, it is always possible to completely rebuild the data from the wiki. No data is ever lost. Refreshing data is also needed after some software upgrades and after the first installation (since the refresh also gathers some existing metadata).

This page describes ways to repair or initialise the data of any SMW installation. The data of a single page can be refreshed by simply editing and saving it. If there are many pages, it is more convenient to use a feature of Special:SMWAdmin to do this automatically. There is also a maintenance script for doing this from the command line: "rebuildData.php" ("SMW_refreshData.php" ≤ SMW 1.9.1).

To make sure that all wiki pages display the new data after the repair, you can run

touch LocalSettings.php

(or, if there is no command line access, edit it in some trivial way). This will invalidate any MediaWiki page caches that may otherwise make you see old versions of wiki pages.

Using a maintenance script

This is the recommended way if the wiki operator has shell access to the server running the wiki. The maintenance script that rebuilds, and thus repairs, the semantic data added to the wiki is called "rebuildData.php". It directly refreshes selected portions of the wiki without any prior access to it. The basic operation of "rebuildData.php" ("SMW_refreshData.php" ≤ SMW 1.9.1) is to go through all pages of the wiki and to re-store the semantic data for each. Normally, the script can be run by changing to the directory [path to SMW]/maintenance of your SMW installation, and then executing

php rebuildData.php -v -d 50

where PHP must be available on the command line. If this does not work on your site (e.g. due to an unusual directory structure), read the file [path to SMW]/maintenance/README.

The above script goes through all pages in the order they are stored in your wiki database, and refreshes their data. The parameter -v makes sure that the script's progress is printed. The parameter -d 50 adds a delay of 50 ms between every data item. The script can be aborted by pressing "Ctrl-C" as usual. The index numbers shown by the script refer not only to page indices as used in MediaWiki, but also to indices SMW uses in its semantic data. For this reason, the script may process indices that are higher than the maximal page index in the wiki.

If you have a large number of pages then the script may consume a lot of memory during its execution, and it is better to stop after, say, 2000 pages. This is due to a PHP memory leak. As a workaround, the script can be run for only part of the pages at a time: use the parameters -s and -e to give a first and last page id to be refreshed, e.g.

php rebuildData.php -v -s 1000 -e 2999
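The range-based workaround can be scripted so that each range runs in a fresh PHP process. The following is only a minimal sketch: the function names, the DRY_RUN switch, the batch size of 2000 and the upper bound of 5000 are assumptions made for this example, and the upper bound has to be replaced by the highest index of your own wiki. With DRY_RUN left at its default of 1, the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: refresh a wiki in fixed-size id ranges, one PHP process per range.
# batch_ranges, rebuild_in_batches and DRY_RUN are hypothetical names.

# Print one "start end" pair per batch, covering first..last inclusive.
batch_ranges() {
    first=$1; last=$2; size=$3
    start=$first
    while [ "$start" -le "$last" ]; do
        end=$((start + size - 1))
        if [ "$end" -gt "$last" ]; then
            end=$last
        fi
        echo "$start $end"
        start=$((end + 1))
    done
}

# Print (DRY_RUN=1, the default) or run rebuildData.php for each range.
# When actually running, the loop stops at the first failing batch.
rebuild_in_batches() {
    batch_ranges "$1" "$2" "$3" | while read -r s e; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "php rebuildData.php -v -s $s -e $e"
        else
            php rebuildData.php -v -s "$s" -e "$e" || exit 1
        fi
    done
}

# Example: ids 1 to 5000 in batches of 2000 (placeholder numbers).
rebuild_in_batches 1 5000 2000
```

To perform the refresh for real, set DRY_RUN=0 and run the script from the [path to SMW]/maintenance directory.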

Multiple runs of this script might be needed, e.g. because data for a property can only be stored once the datatype of that property has been stored. You can run the script with the parameter -p to refresh only property pages first, so that these are already available during the second refresh. Overall, more than two refreshes should not be required in normal cases.
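The two-pass order described above could look as follows. This is a sketch only: the run helper, the two_pass_refresh function and the DRY_RUN switch are names invented for this example, while -v and -p are the parameters described on this page. With DRY_RUN left at its default of 1, the commands are only printed.

```shell
#!/bin/sh
# Sketch: refresh property pages first, then all pages.
# run, two_pass_refresh and DRY_RUN are hypothetical names.

# Print the command (DRY_RUN=1, the default) or execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

two_pass_refresh() {
    # Pass 1: property pages only, so that datatypes are stored first.
    run php rebuildData.php -v -p || return 1
    # Pass 2: all pages.
    run php rebuildData.php -v
}

two_pass_refresh
```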

To make sure that all wiki pages display the new data after the refresh, you can run

touch LocalSettings.php

This will invalidate any MediaWiki page caches that may otherwise make you see old versions of wiki pages.

Using a special page

The administration special page "SemanticMediaWiki" (SMW ≥ 2.5.0) or "SMWAdmin" (SMW 0.3 - 2.4.x) offers a feature for rebuilding all data. By default all users who are administrators (user group "sysop") or Semantic MediaWiki administrators (user group "smwadministrator") may use this feature. This feature can be disabled by removing the SMW_ADM_REFRESH constant from configuration parameter $smwgAdminFeatures. See the linked help page for information on how this is done.

To initiate the rebuild of the semantic data, click the button labelled "Start updating data" in the "Rebuild data" section of the "SemanticMediaWiki" special page. The system message "An update is already in progress." then appears, together with a bar and a percentage showing the approximate progress of the data rebuild. A button labelled "Stop this update" appears at the same time, together with a checkbox requesting the confirmation "Yes, I am sure". This allows you to cancel the process if it was initiated by mistake or if it has halted for some technical reason.

Once initiated, the update takes an amount of time that varies from wiki to wiki. The more data is stored on the wiki, the longer this process will take. Moreover, since the update progresses during each page view, it will be faster if many people view your wiki. Also note that it is normal for the update to progress faster until it reaches 50%, since only property pages are refreshed during that part of the process. The actual update of all wiki pages starts at 50%.

You can speed up the process by using one of the following options:

  • If you have shell access, you can use the MediaWiki maintenance script "runJobs.php". Please consider specifying a parameter --maxjobs 1000 or similar so that each run of the script is bounded in duration. Otherwise the script tends to occupy increasing amounts of memory.
  • If you do not have shell access, you may use the MediaWiki setting $wgJobRunRate in your "LocalSettings.php" file to increase the number of jobs performed per request. This increases the speed of the update process. Please note that it also increases the load on the system, which may have a negative effect on the performance of your wiki.
  • You can also run a script to automatically hit the web site a certain number of times, so that you don't have to either wait for the site to be hit or keep reloading it in the browser. You can find an example of such a script here: "hitURL.php". This can be done in conjunction with increasing the value of $wgJobRunRate.
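For the first option, the bounded runs of "runJobs.php" can be repeated from a small wrapper, so that each PHP process ends after a limited number of jobs. The following is a sketch under stated assumptions: the function names, the DRY_RUN switch and the number of rounds are invented for this example, and the script is assumed to be started from the directory that contains MediaWiki's "runJobs.php". With DRY_RUN left at its default of 1, the commands are only printed.

```shell
#!/bin/sh
# Sketch: process the job queue in several bounded runs.
# run, run_jobs_in_rounds and DRY_RUN are hypothetical names.

# Print the command (DRY_RUN=1, the default) or execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Call runJobs.php a fixed number of times with a job limit per call.
# Stops early if one of the calls fails.
run_jobs_in_rounds() {
    rounds=$1; maxjobs=$2
    i=1
    while [ "$i" -le "$rounds" ]; do
        run php runJobs.php --maxjobs "$maxjobs" || return 1
        i=$((i + 1))
    done
}

# Example: three rounds of at most 1000 jobs each (placeholder numbers).
run_jobs_in_rounds 3 1000
```

A fixed number of rounds keeps the sketch simple; how many rounds are needed depends on the size of the wiki.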

Rebuilding everything

The above methods should be able to fix data records in SMW in most cases. However, it is conceivable that some erroneous content of the SMW storage still persists for some reason. In this case, it makes sense to completely delete and reinstall the database structures of SMW before refreshing all data.

To completely delete all SMW data, the setup script setupStore.php ("SMW_setup.php" ≤ SMW 1.9.2) is used with parameter --delete:

php setupStore.php --delete

After this, proceed as if installing SMW anew by first running

php setupStore.php

again, and then triggering the repair of all data using one of the above methods.

The script "rebuildData.php" can also be used with the parameter -f to delete and recreate all data in one step. In this case, it is suggested to first rebuild the records for all properties and types, and to process the remaining data afterwards. So one would run:

php rebuildData.php -fpv
php rebuildData.php -v

Note that of course only the first run uses -f. On large wikis, the parameters -s and -e can again be used as explained in the previous section.

Automatic repair features

Some changes on wiki pages require that the data of other pages is updated as well. For example, if a template that contains semantic annotations is changed, then the data for all pages using this template might also require an update. Likewise, if the datatype of some property is changed, all pages using this property should be refreshed. SMW usually takes care of such updates automatically. As in MediaWiki, it may take some time until all required updates are completed. There is no convenient way to review the progress.

Caveats

When new namespaces are added, or existing namespaces are activated by setting them to true in the configuration parameter $smwgNamespacesWithSemanticLinks, the default special property "Modification date" is not created by repairing or initialising data as described on this page. Instead, this property is created whenever a page in the newly activated namespace is modified.