Archive:Repairing SMW's data 1.3

From semantic-mediawiki.org


This page contains outdated information and is thus OBSOLETE!
This documentation page applies to all SMW versions from 1.0 to 1.3.



All data that Semantic MediaWiki uses is stored in wiki pages. So if the data should ever get out of date or contain errors, it is always possible to completely rebuild the data from the wiki. No data is ever lost.

This page describes ways to repair or initialise the data of basically any SMW installation. The data of a single page can be refreshed by simply editing and saving it. If there are many pages, it is more convenient to use a script that does this automatically: SMW_refreshData.php


Refreshing the data of all pages[edit]

The basic operation of SMW_refreshData.php is to go through all pages of the wiki and to re-store the semantic data for each. Normally, the script can be run by changing to the directory [SMW_path]/maintenance of your SMW installation and then executing

php SMW_refreshData.php -v

where of course PHP needs to be available on the command line. If this does not work on your site (e.g. due to an unusual directory structure), read the file [SMW_path]/maintenance/README.

The above script goes through all pages in the order in which they are stored in your wiki database and refreshes their data. The parameter -v makes the script print its progress. The script can be aborted with CTRL-C as usual.

If you have a large number of pages, the script may consume a lot of memory during its execution, and it is better to stop it after, say, 2000 pages. This is due to a PHP memory leak. As a workaround, the script can be run for only part of the pages at a time: use the parameters -s and -e to give the first and last page id to be refreshed, e.g.

php SMW_refreshData.php -v -s 1000 -e 1999
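Running such chunks by hand is tedious. The following sketch (a hypothetical helper, not part of SMW; the function name refresh_chunks and the maximum page id are assumptions you must adapt) prints the sequence of invocations needed to cover a range of page ids in fixed-size chunks. Remove the leading echo to actually execute the commands:

```shell
#!/bin/sh
# Dry-run sketch: print one SMW_refreshData.php call per chunk of page ids.
# Arguments: first id, last id, chunk size.
refresh_chunks() {
  start=$1; max=$2; chunk=$3
  while [ "$start" -le "$max" ]; do
    end=$((start + chunk - 1))
    # Remove "echo" to run the refresh instead of just printing it.
    echo php SMW_refreshData.php -v -s "$start" -e "$end"
    start=$((start + chunk))
  done
}

# Cover page ids 0..9999 in chunks of 2000 (adjust to your wiki's highest id):
refresh_chunks 0 9999 2000
```

Look up a suitable upper bound for the page ids in your wiki (e.g. in the database table page) before running this for real.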

Multiple runs of this script might be needed, e.g. since data for a property can only be stored properly after the datatype of that property has been stored. You can run the script with the parameters -tp to refresh only type and property pages at first, so that these are already available when doing the second refresh. Overall, more than two refreshes should not be required in normal cases.
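Put together, a typical double refresh would then look like this (a sketch; run it from the maintenance directory as above):

```shell
php SMW_refreshData.php -tpv   # first pass: type and property pages only
php SMW_refreshData.php -v     # second pass: all pages
```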

To make sure that all wiki pages display the new data after the refresh, you can run touch LocalSettings.php. This will invalidate any MediaWiki page caches that may otherwise make you see old versions of wiki pages.

Refreshing all data[edit]

The above refreshes all pages' data and thus fixes the data record of each page. But not all data that SMW keeps internally is associated with a wiki page, and it can also happen that SMW contains wrong data belonging to a page that no longer exists. Such data would not be repaired by going through all pages, so other measures might be needed.

To completely delete and recreate all SMW data, the refresh script can be used with parameter -f. In this case, it is suggested to first rebuild the records for all properties and types, and to process the remaining data afterwards. So one would run:

php SMW_refreshData.php -ftpv
php SMW_refreshData.php -v

Note that of course only the first run uses -f. On large wikis, the parameters -s and -e can again be used as explained in the previous section.

To make sure that all wiki pages display the new data after the refresh, you can run touch LocalSettings.php. This will invalidate any MediaWiki page caches that may otherwise make you see old versions of wiki pages.

Repairing a wiki without shell access[edit]

Some servers may not offer shell logins to wiki administrators, so that running command-line scripts is not possible. There is currently no easy method for repairing SMW in this case, but there are several feasible options.

  • Refresh job. Since Semantic MediaWiki 1.3.0, there is an experimental feature for triggering the step-by-step refreshing of all pages through the web. To use it, set the option $smwgAdminRefreshStore = true; and go to Special:SMWAdmin. At the end of the URL, add ?action=refreshstore and press enter. This creates jobs that update all data, step by step. Whenever a page in your wiki is viewed, some of these updates are processed, until all data has been updated. You can see the number of pending jobs at Special:Statistics (typically, the number will go down and up again, since some jobs add further jobs; there is no overall status report available right now). There is no way to stop the process other than by disabling jobs completely or deleting all SMW jobs from the database table job. You can also run all jobs at once using the MediaWiki script runJobs.php, though it is advisable to use the parameter --maxjobs 1000 to avoid very long runs. The setting $smwgAdminRefreshStore = true; can be removed again after creating the jobs without stopping them. Calling the URL again will only restart the process from the beginning. Please report any problems encountered with this feature.
  • Use a web-shell. There are web-based shell tools for PHP that emulate command-line access through web interfaces. These are subject to timeouts and may not be able to execute long-running scripts, but with the parameters -s and -e the repair task can always be split into chunks.
  • Using semantic forms and templates. Since version 1.2, SMW automatically refreshes the data of pages that use a template after that template has changed. So if most of your semantic pages use some template (e.g. for a semantic form), then editing this template will trigger a lot of repair activity in your wiki. Note that the updates are not necessarily completed immediately but are processed over some time, using the job system of MediaWiki. See Special:Statistics to check the length of your job queue.
  • Edit pages. To refresh small numbers of pages, it can be feasible to simply edit and save them directly. This can also be automated with a web robot such as pywikibot.
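If some form of shell access does become available (for instance through the web-shell mentioned above), the job queue created by the refresh job can be worked off in bounded batches rather than one very long run. A sketch, assuming a standard MediaWiki maintenance directory (the path is hypothetical; showJobs.php prints the number of pending jobs):

```shell
cd /path/to/wiki/maintenance   # adjust to your installation (hypothetical path)
# Run pending jobs in batches of 1000 until the queue is empty.
while [ "$(php showJobs.php)" -gt 0 ]; do
  php runJobs.php --maxjobs 1000
done
```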

A better solution is being worked on.

When refreshing pages selectively, note that SMW also stores data about properties and categories. Even category information will therefore not be available in SMW until a page that uses the category has been saved with SMW installed.