Help:Maintenance script rebuildData.php

The "rebuildData.php" maintenance script recreates all semantic data in the database by cycling through every page that might have semantic data and calling the functions that re-save the semantic data for each one, i.e. it performs a full re-parse.

This script is a command-line tool, whereas data rebuilding (repair) via special page "SemanticMediaWiki" (formerly known as special page "SMWAdmin") uses the job queue to process all pages. If possible, use this maintenance script for data rebuilding. See also the help page on using special page "SemanticMediaWiki" for data rebuilding.

brought an improved client output to this maintenance script.CiteRef::gh:smw:4517

Parameters
Maintenance scripts provide generic maintenance parameters, script-dependent parameters and, depending on the maintenance script, script-specific parameters, which are described on this page if provided.


 * Script specific parameters

Progress display
The progress (starting with ) that is displayed during a rebuild process is self-adjusting, based on the number of expected IDs versus the number of IDs actually being processed.CiteRef::gh:smw:1042 Because each entity (i.e. subobject, property, and subject) is assigned an ID, an ID does not necessarily correspond to a MediaWiki page ID: the various types of subobjects embedded in a page are assigned IDs as well.

Especially during a full rebuild, the displayed progress is skewed, because the starting amount is lower than the final ID count (which is predicted from the MediaWiki article count).

Quick and slow progress
IDs assigned to a "real" page are parsed using MediaWiki's Parser to ensure that all data, and all extensions influencing the state of the data, are accounted for. This explains the extensive memory and time required to complete a full parse of a page, including all #subobject, #ask, and any other embedded parser function calls.CiteRef::gh:smw:1698

IDs that represent data items such as subobjects or value objects can be processed using Semantic MediaWiki's internal functions, hence the comparatively quick update progress.

Verbose output
The verbose output was extendedCiteRef::gh:smw:1433 in  to display additional information about the entity being processed. The marker  identifies a regular MediaWiki page with the ID corresponding to the   table entry, while non-marked IDs are matched to an entry in the   database table.

Marked for deletion entries
Starting with, entities marked as deleted  are removed on each "rebuildData.php" run to free the tables of outdated entities.CiteRef::gh:smw:1106

Since a dedicated flag is available:CiteRef::gh:smw:3284 Starting with the dedicated  is available.CiteRef::gh:smw:4484
 * The following command quietly removes just the outdated entitiesCiteRef::gh:smw:1754:236913464
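
The command itself is not shown above, so the following is only a minimal sketch of what such a call might look like. It assumes that the dedicated flag is `--dispose-outdated`, that `--quiet` suppresses output, and that the script lives under `extensions/SemanticMediaWiki/maintenance/`. None of these are confirmed by this page, and the helper name `dispose_outdated_cmd` is hypothetical. The helper only prints the command line (a dry run) so that it can be inspected before being executed.

```shell
#!/bin/sh
# Assumption: script location relative to the MediaWiki root; adjust as needed.
SCRIPT="extensions/SemanticMediaWiki/maintenance/rebuildData.php"

# Hypothetical helper: prints the command line instead of executing it.
# The flags --dispose-outdated and --quiet are assumptions, not taken from this page.
dispose_outdated_cmd() {
    echo "php $SCRIPT --dispose-outdated --quiet"
}

dispose_outdated_cmd
```

To actually run the command, copy the printed line into a shell at the MediaWiki root after verifying the flags against the script's own help output.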

Dispose of outdated object ID references
Starting with outdated object ID references are disposed of when running "rebuildData.php".CiteRef::gh:smw:1216 When the data type of a property is changed, a property is removed, or other object values are deleted, chances are that some IDs remain in the   database table.CiteRef::gh:smw:498 To avoid garbage references piling up in this database table, the "rebuildData.php" run checks whether these IDs can safely be removed. This is best and frequently done using the  option.CiteRef::gh:smw:1754:236913464


 * The following command removes outdated object ID references

Examples

 * The following command refreshes existing semantic data items with a delay of 50 ms between every data item without printing progress information.


 * The following command verbosely rebuilds semantic data after deleting existing items with a delay of 100 ms between every data item.


 * The following command verbosely rebuilds semantic data of pages in a given category.


 * The following command verbosely rebuilds semantic data with a delay of 75 ms between every data item and provides memory usage information after it has been completed.
 * Example output:
 * a) memory used after execution and b) memory used before the execution


 * The following command refreshes the wiki pages "Page 1" and "Page 2" without printing progress information.


 * The following command rebuilds semantic data with a delay of 50 ms between every data item, ignores errors which may arise during execution and writes them to a file in the directory provided.
 * Exceptions are written, for example, to the "mywiki.logrebuilddata-exceptions-2016-08-14.log" file if the wiki ID is "mywiki" and the script is run on August 14, 2016.


 * The following command removes outdated object ID references and adds a maintenance log entry to special page "Log" (Semantic MediaWiki log)CiteRef::gh:smw:1361
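
The command lines for the examples above are not shown on this page, so the following sketch illustrates how some of them might be written. All flags used here (`-d`, `-v`, `-f`, `--quiet`, `--report-runtime`, `--ignore-exceptions`, `--exception-log`) and the script path are assumptions that should be checked against the script's parameter list. The helper `rebuild_cmd` is hypothetical and only prints each command line (a dry run) rather than executing it.

```shell
#!/bin/sh
# Assumption: script location relative to the MediaWiki root; adjust as needed.
SCRIPT="extensions/SemanticMediaWiki/maintenance/rebuildData.php"

# Hypothetical helper: prints the command line instead of executing it,
# so each example can be reviewed before it is run for real.
rebuild_cmd() {
    echo "php $SCRIPT $*"
}

# Refresh with a 50 ms delay per data item, without progress output (assumed flags).
rebuild_cmd -d 50 --quiet

# Verbose rebuild after deleting existing items, 100 ms delay (assumed flags).
rebuild_cmd -d 100 -f -v

# Verbose rebuild with a 75 ms delay and a memory usage report at the end (assumed flags).
rebuild_cmd -d 75 -v --report-runtime

# 50 ms delay, ignore exceptions and log them to the given directory (assumed flags).
rebuild_cmd -d 50 --ignore-exceptions --exception-log=/tmp/
```

Printing rather than executing keeps the sketch safe to try on a production wiki. Once the flags are confirmed, the printed lines can be run from the MediaWiki root.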

Note
There was some discussion on the mailing list about the occasions on which this maintenance script needs to be run.CiteRef::mail:user:smw:35804435