Help:Repairing SMW's data
All data that Semantic MediaWiki (SMW) uses is stored in wiki pages. If the data ever gets out of date or contains errors, it can always be rebuilt completely from the wiki, so no data is ever lost. Refreshing data is also needed after some software upgrades and after the first installation (since it also gathers some existing metadata).
This page describes ways to repair or initialise the data of any SMW installation. The data of a single page can be refreshed by simply editing and saving it. If there are many pages, it is more convenient to use a feature of Special:SMWAdmin that does this automatically. There is also a maintenance script for doing this from the command line: "Help:rebuildData.php" ("SMW_refreshData.php" ≤ SMW 1.9.1). To make sure that all wiki pages display the new data after the repair, you can run
The administration special page "Special:SMWAdmin" offers a feature for repairing all data. This page is only available to wiki users with administrator status. Moreover, the update process can only be started or stopped online if the configuration parameter
$smwgAdminRefreshStore is set to true.
Once initiated, the update takes time. The progress can be viewed on Special:SMWAdmin. Even if the option
$smwgAdminRefreshStore is disabled after starting the update, the ongoing process will continue and can be tracked online. Stopping the process is only possible if
$smwgAdminRefreshStore is enabled.
The time the update will take varies from wiki to wiki. The update progresses during each page view. If many people view your wiki, then the update progresses more quickly. If there are a large number of pages, then the update will take longer. It is normal that the update progresses faster until it reaches 50%, since only property and type pages are refreshed during that part. The actual update of all wiki pages starts at 50%.
You can speed up the process by using one of the following options:
- If you have shell access, you can use the MediaWiki maintenance script "runJobs.php". Please consider specifying a parameter
--maxjobs 1000 or similar so that each run of the script is bounded in duration. Otherwise the script tends to occupy increasing amounts of memory.
- If you do not have shell access you may use the MediaWiki setting
$wgJobRunRate in your "LocalSettings.php" file to increase the number of jobs performed per request. This speeds up the update process. Please note that it also increases the load on the system, which may have a negative effect on the performance of your wiki.
- You can also run a script to automatically hit the web site a certain number of times, so that you don't have to either wait for the site to be hit or keep reloading it in the browser. You can find an example of such a script here: "hitURL.php". This can be done in conjunction with increasing the value of
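If shell access is available, the first option can be wrapped in a small loop so that each invocation of "runJobs.php" stays bounded. The following is only a sketch: the function name run_job_batches, the variables MW_ROOT and PHP_BIN, and the default installation path are assumptions made for illustration and are not part of MediaWiki or SMW.

```shell
#!/bin/sh
# Sketch: run runJobs.php in several bounded batches instead of one long run.
# run_job_batches, MW_ROOT and PHP_BIN are hypothetical names for this example.

run_job_batches() {
    batches="$1"                               # number of runJobs.php calls
    maxjobs="${2:-1000}"                       # upper bound on jobs per call
    mw_root="${MW_ROOT:-/var/www/mediawiki}"   # assumed installation path

    i=0
    while [ "$i" -lt "$batches" ]; do
        # Every call starts a fresh PHP process, so memory is released
        # between batches.
        "${PHP_BIN:-php}" "$mw_root/maintenance/runJobs.php" --maxjobs "$maxjobs" || return 1
        i=$((i + 1))
    done
}

# Example call (commented out): five batches of at most 1000 jobs each.
# run_job_batches 5 1000
```

The loop stops at the first failing call, so an error in one batch does not go unnoticed while later batches keep running.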
Using the SMW maintenance script
While the above method works without a maintenance script, there is also a script "rebuildData.php" that directly refreshes selected portions of the wiki without any prior access to it. The basic operation of "rebuildData.php" ("SMW_refreshData.php" ≤ SMW 1.9.1)
is to go through all pages of the wiki and to re-store the semantic data for each. Normally, the script can be run by changing to the directory
[path to SMW]/maintenance of your SMW installation, and then executing
php rebuildData.php -v -d 50
where of course PHP needs to be installed on the command line. If this does not work on your site (e.g. due to unusual directory structures), read the file
[path to SMW]/maintenance/README in that directory.
The above script goes through all pages in the order they are stored in your wiki database, and refreshes their data. The parameter
-v makes sure that the script's progress is printed. The parameter
-d 50 adds a delay of 50 ms between every data item. The script can be aborted by pressing "Ctrl-C" as usual. The index numbers shown by the script refer not only to page indices as used in MediaWiki, but also to indices SMW uses in its semantic data. For this reason, the script may process indices that are higher than the maximal page index in the wiki.
If you have a large number of pages, the script may consume a lot of memory during its execution, and it is better to stop after, say, 2000 pages. This is due to a PHP memory leak. As a workaround, the script can be run for only part of the pages at a time: use the parameters
-s and -e to give a first and last page id to be refreshed, e.g.
php rebuildData.php -v -s 1000 -e 2999
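Running the script range by range can be automated with a short loop. This is a minimal sketch under stated assumptions: rebuild_in_chunks, SMW_MAINT and PHP_BIN are hypothetical names introduced here, and the upper bound has to be chosen by the administrator, keeping in mind that the script may use indices above the highest page id.

```shell
#!/bin/sh
# Sketch: refresh the wiki in fixed-size ID ranges using -s (first id) and
# -e (last id). rebuild_in_chunks, SMW_MAINT and PHP_BIN are hypothetical.

rebuild_in_chunks() {
    first="$1"                  # first id to refresh
    last="$2"                   # last id to refresh
    size="${3:-2000}"           # ids per run
    script="${SMW_MAINT:-.}/rebuildData.php"   # assumed script location

    start="$first"
    while [ "$start" -le "$last" ]; do
        end=$((start + size - 1))
        if [ "$end" -gt "$last" ]; then
            end="$last"
        fi
        # One PHP process per range keeps the memory use of each run bounded.
        "${PHP_BIN:-php}" "$script" -v -s "$start" -e "$end" || return 1
        start=$((end + 1))
    done
}

# Example call (commented out): ids 1 to 10000 in runs of 2000 ids each.
# rebuild_in_chunks 1 10000 2000
```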
Multiple runs of this script might be needed, e.g. since data for a property can only be stored once the datatype of that property has been stored. You can run the script with parameters
-tp to refresh only type and property pages at first, so that these are already available when doing the second refresh. Overall, more than two refreshes should not be required in normal cases.
The above methods should be able to fix data records in SMW in most cases. However, it is conceivable that some erroneous content of the SMW storage still persists for some reason. In this case, it makes sense to completely delete and reinstall the database structures of SMW before refreshing all data.
To completely delete all SMW data, the setup script Help:setupStore.php ("SMW_setup.php" ≤ SMW 1.9.2) is used with the parameter --delete:
php setupStore.php --delete
The script "rebuildData.php" can also be used with parameter
-f to delete and recreate all data in one step. In this case, it is suggested to first rebuild the records for all properties and types, and to process the remaining data afterwards. So one would run:
php rebuildData.php -ftpv
php rebuildData.php -v
Note that of course only the first run uses
-f. On large wikis, the parameters
-s and -e can again be used as explained in the previous section.
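Because the second pass only makes sense once the first has finished successfully, the two commands can be chained so that a failure stops the sequence. The sketch below assumes hypothetical names (full_rebuild, SMW_MAINT, PHP_BIN) and is an illustration, not a script shipped with SMW.

```shell
#!/bin/sh
# Sketch: delete-and-rebuild in two passes, stopping if the first pass fails.
# full_rebuild, SMW_MAINT and PHP_BIN are hypothetical names for this example.

full_rebuild() {
    script="${SMW_MAINT:-.}/rebuildData.php"   # assumed script location

    # Pass 1: -f deletes and recreates the data, -tp limits the run to
    # type and property pages, -v prints progress.
    "${PHP_BIN:-php}" "$script" -ftpv || return 1

    # Pass 2: refresh the remaining pages; -f must not be repeated here.
    "${PHP_BIN:-php}" "$script" -v
}

# Example call (commented out):
# full_rebuild
```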
Automatic repair features
Some changes on wiki pages require that the data of other pages is updated as well. For example, if a template that contains semantic annotations is changed, then the data for all pages using this template might also require an update. Likewise, if the datatype of some property is changed, all pages using this property should be refreshed. SMW usually takes care of such updates automatically. As in MediaWiki, it may take some time until all required updates are completed. There is no convenient way to review the progress.
When adding new namespaces or activating existing namespaces with the setting
true for the configuration parameter
$smwgNamespacesWithSemanticLinks, the default special property "Modification date" is not created by repairing or initialising data as described on this page. Instead, this property is created each time a page in the new namespace is modified after the namespace was added to the wiki.