Help:Maintenance script "rebuildData.php"

Maintenance script name: rebuildData.php
Description: Rebuilds all the semantic data for a selected data backend/store
Further Information
Provided by: Semantic MediaWiki
Added: 1.9.2
Removed: still in use
Location (path): ./extensions/SemanticMediaWiki/maintenance/
Class docu: RebuildData

The rebuildData.php maintenance script recreates all the semantic data in the database, by cycling through all the pages that might have semantic data, and calling functions that re-save semantic data for each one.

rebuildData.php is a command line tool, whereas data rebuilding (repair) via special page "SMWAdmin" uses the job queue to process all pages. If possible, use this maintenance script for data rebuilding. See also the help page on using special page "SMWAdmin" for data rebuilding.

Important Notes

  • This maintenance script deprecated the former "SMW_refreshData.php" script starting with Semantic MediaWiki 1.9.2; options and usage at that time remained the same.
Note: You are strongly encouraged to switch to this new script to take advantage of the features added since then, and because the old script will be removed with the release of Semantic MediaWiki 3.0 in early 2017.
  • Starting with Semantic MediaWiki 2.2.0, the properties are always rebuilt first every time this maintenance script is run, no matter which options are chosen (unless --skip-properties is given, see below).[gh:smw:877]
  • If SMW is not installed in its standard path, the "MW_INSTALL_PATH" environment variable must be set. See the README in the maintenance directory.

Usage

php rebuildData.php [-d|-s|-e|-n|--startidfile|-b|-v|-c|-p|-t|--page|--redirects|--query|-f|--no-cache|--report-runtime|--debug|--skip-properties|--shallow-update|--ignore-exceptions|--exception-log|--with-maintenance-log]

Note: This only shows the script-specific parameters.

Parameters

Generic parameters

--help (-h)
Display this help message
--quiet (-q)
Whether to suppress non-error output
--conf
Location of "LocalSettings.php", if not default
--wiki
For specifying the wiki ID
--globals
Output globals at the end of processing for debugging
--memory-limit
Set a specific memory limit for the script, "max" for no limit or "default" to avoid changing it
--server
The protocol and server name to use in URLs, e.g. https://www.semantic-mediawiki.org. This is sometimes necessary because server name detection may fail in command line scripts.

Script-dependent parameters

--dbuser
The DB user to use for this script
--dbpass
The password to use for this script

Script-specific parameters

-d <delay>
Wait for this many milliseconds after processing an article. Useful for limiting server load.
-s <startid>
Start refreshing at given article ID. Useful for partial refreshing.
-e <endid>
Stop refreshing at given article ID. Useful for partial refreshing.
-n <numids>
Stop refreshing after processing a given number of IDs. Useful for partial refreshing.
--startidfile <startidfile>
Read <startid> from a file instead of the arguments and write the next ID to the file when finished. Useful for continual partial refreshing from cron.
-b <backend>
Execute the operation for the storage backend of the given name (default is to use the current data backend/store)
-v
Be verbose about the progress.
-c or --categories
Will refresh only category pages (and other explicitly named namespaces).
Note: The --categories option is only available starting with Semantic MediaWiki 2.4.0.[gh:smw:1433]
-p
Will refresh only property pages (and other explicitly named namespaces).
-t
Will refresh only type pages (and other explicitly named namespaces).
Note: This option was removed with Semantic MediaWiki 2.3.0 since namespace "Type" is no longer used.[gh:smw:1127]
--page=<pagelist>
Will refresh only the pages of the given names, with | used as a separator.
Note: The options -s, -e, -n, --startidfile, -c, -p, -t are ignored if --page is given.
--redirects
Will refresh only the pages which are redirecting to another page. Available since Semantic MediaWiki 2.4.0.[gh:smw:1433]
--query
Will refresh only pages returned by a given query. Available since Semantic MediaWiki 1.9.2.[gh:smw:243]
Note: The options -s, -e, -n, --startidfile, -c, -p, -t are ignored if --query is given.
-f
Fully delete all content instead of just refreshing relevant entries. This will also rebuild the whole storage structure. May leave the wiki temporarily incomplete.
--no-cache
Sets $wgMainCacheType to none while running the script. Available since Semantic MediaWiki 2.2.0.[gh:smw:749]
--report-runtime
Reports the memory usage and runtime of the script execution. Available since Semantic MediaWiki 2.1.0 as --runtime.[gh:smw:643]
Note: In Semantic MediaWiki 2.2.0 this parameter was renamed to --report-runtime.[gh:smw:68e8bc9]
--debug
Sets global variables to support debug output while running. Available since Semantic MediaWiki 2.2.0.[gh:smw:766]
--skip-properties
Skips the default properties rebuild (only recommended when successive build steps are used). Available since Semantic MediaWiki 2.3.0.[gh:smw:1106]
--shallow-update
Parses only those entities whose last modified timestamp differs from that of their last revision. It should only be used to run a quick update on deleted entities, redirects, and other out-of-sync entities. Available since Semantic MediaWiki 2.3.0.[gh:smw:1127]
--ignore-exceptions
Ignores encountered exceptions, i.e. the script does not stop as soon as an exception (error) occurs. Available since Semantic MediaWiki 2.4.0.[gh:smw:1433]
Note: This option is best used together with the --exception-log option.
--exception-log="/path/to/smw/logs/directory/"
Writes encountered exceptions (errors) to a log file for later debugging. Available since Semantic MediaWiki 2.4.0.[gh:smw:1361]
Note: The file name is created automatically and contains the string "logrebuilddata-exceptions" and the date (ISO format), e.g. "logrebuilddata-exceptions-2016-12-05.log". If an unambiguous name is needed, add an identifier to the option, e.g. --exception-log="/path/to/smw/logs/directory/mywiki-". It is prepended to the file name, e.g. "mywiki-logrebuilddata-exceptions-2016-12-05.log".
--with-maintenance-log
Adds a log entry to "Special:Log" on the wiki and reports the script's runtime. Available since Semantic MediaWiki 2.4.0.[gh:smw:1361]
Note: If you use this parameter, make sure that MediaWiki's configuration parameter $wgMaxNameChars is set to a value not lower than "17".[gh:smw:1983] Otherwise an exception is thrown that states the minimum value for this setting ("32" or higher is recommended).[gh:smw:1985]
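
The --startidfile and -n options can be combined into a small wrapper for continual partial refreshing from cron. The following is a minimal sketch and not part of Semantic MediaWiki: the function name run_partial_rebuild, the PHP_BIN override, the file paths, and the batch size are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical cron wrapper: refresh a limited number of IDs per run and let
# rebuildData.php remember where to continue via --startidfile.
# PHP_BIN can be overridden (e.g. with "echo") to preview the command.

run_partial_rebuild() {
    script=$1        # path to rebuildData.php (assumed location)
    startidfile=$2   # file holding the next start ID between runs
    numids=$3        # number of IDs to process in this run (-n)
    delay=$4         # milliseconds to wait after each article (-d)

    "${PHP_BIN:-php}" "$script" \
        --startidfile "$startidfile" \
        -n "$numids" \
        -d "$delay" \
        -q
}

# Example call from a script that cron runs periodically
# (both paths are placeholders):
#   run_partial_rebuild \
#       /var/www/wiki/extensions/SemanticMediaWiki/maintenance/rebuildData.php \
#       /var/tmp/smw-startid.txt 1000 50
```

Setting PHP_BIN=echo before calling the function prints the arguments instead of starting a rebuild, which is a cheap way to check the command line before installing the cron job.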

Progress display

The progress display (starting with Semantic MediaWiki 2.3.0) shown during a rebuild is self-adjusting, based on the number of expected IDs versus the number of IDs actually processed.[gh:smw:1042] Because each entity (i.e. subobject, property, and subject) is assigned an ID, an ID does not necessarily correspond to a MediaWiki page ID: the various types of subobjects embedded in a page are assigned IDs as well.

Especially during a full rebuild (-f), the displayed percentage can fluctuate or even drop, because the starting count is lower than the final ID count (which is predicted from the MediaWiki article count).

............................................................ 10%
............................................................ 9%
............................................................ 9%
............................................................ 8%
............................................................ 9%
............................................................ 10%

Quick and slow progress

IDs assigned to a "real" page are parsed using MediaWiki's Parser to ensure that all data and extensions influencing the state of the data are accounted for. This explains the considerable memory and time needed to complete a full parse of a page, including all #subobject, #ask, and any other embedded parser function calls.[gh:smw:1698]

IDs that represent data items such as subobjects or value objects can be processed using Semantic MediaWiki's internal functions, hence the comparatively quick update progress.

Verbose output

The verbose output (-v) was extended[gh:smw:1433] in Semantic MediaWiki 2.4.0 to display additional information about the entity being processed. The marker * identifies a regular MediaWiki page whose ID corresponds to the page table entry, while unmarked IDs are matched to an entry in the smw_object_ids table.

(4/796)         Finished processing ID 5* (Berlin)
(5/796)         Finished processing ID 6* (London)
(6/796)         Finished processing ID 7* (Lorem_ipsum)
(7/796)         Finished processing ID 8* (John_Doe)
(7/796)         Finished processing ID 8 (Property:Display_units)
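
To see how a run splits between regular pages and other entities, the * marker can be counted in the verbose output. This is a rough sketch that assumes the line format shown in the sample above; the function name count_marked is hypothetical, and the patterns may need adjusting if your SMW version prints the lines differently.

```shell
#!/bin/sh
# Count verbose-output lines read from stdin:
#   pages   = IDs marked with * (regular MediaWiki pages)
#   objects = unmarked IDs (entries matched in smw_object_ids)
count_marked() {
    awk '
        /Finished processing ID [0-9]+\*/ { pages++; next }
        /Finished processing ID [0-9]+ /  { objects++ }
        END { printf "pages=%d objects=%d\n", pages, objects }
    '
}
```

It could be used as `php rebuildData.php -v | count_marked`. For the five sample lines above it prints pages=4 objects=1.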

Marked for deletion entries

Starting with Semantic MediaWiki 2.3.0, entities marked as deleted (:smw-delete) are removed on each "rebuildData.php" run to free the tables of outdated entries.[gh:smw:1106]

Removing marked for deletion entries.
..
2 IDs removed.

Dispose of outdated object ID references

Starting with Semantic MediaWiki 2.4.0, outdated object ID references are disposed of when running "rebuildData.php".[gh:smw:1216] When the data type of a property is changed, a property is removed, or other object values are deleted, chances are that some IDs remain in the ID_TABLE of the database.[gh:smw:498] To keep garbage references from piling up in this database table, the "rebuildData.php" run checks whether these IDs can safely be removed. This is best done frequently using the --shallow-update option.[gh:smw:1754:236913464]

Examples

The following command refreshes existing semantic data items with a delay of 50 ms between every data item, without displaying progress information.
php rebuildData.php -d 50 -q
The following command verbosely rebuilds semantic data after deleting existing items with a delay of 100 ms between every data item.
php rebuildData.php -f -d 100 -v
The following command verbosely rebuilds semantic data of pages in a given category.
php rebuildData.php --query='[[Category:SomeCategory]]' -v
The following command rebuilds semantic data with a delay of 75 ms between every data item and reports memory usage and runtime after it has completed.
php rebuildData.php -d 75 --report-runtime
Example output:
Memory used: 25543928 (b: 11429464, a: 36973392) with a runtime of 81.62 sec (1.36 min)
a: memory used after the execution; b: memory used before the execution
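
The figures in the example report are consistent with reading "Memory used" as a minus b, and the minutes value as the seconds divided by 60. The sketch below simply redoes that arithmetic; the helper names memory_delta and seconds_to_minutes are made up for this illustration, and the interpretation is inferred from the numbers rather than taken from the script's source.

```shell
#!/bin/sh
# Recompute the figures from the example report above.

# Bytes used by the run: memory after (a) minus memory before (b).
memory_delta() {
    before=$1
    after=$2
    echo $((after - before))
}

# Convert seconds to minutes, rounded to two decimals.
seconds_to_minutes() {
    awk -v s="$1" 'BEGIN { printf "%.2f\n", s / 60 }'
}

memory_delta 11429464 36973392   # prints 25543928
seconds_to_minutes 81.62         # prints 1.36
```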
The following command refreshes the wiki pages "Page 1" and "Page 2" without displaying progress information.
php rebuildData.php --page="Page 1|Page 2"
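
When the list of pages is longer, the |-separated value for --page can be assembled from a file with one title per line. A minimal sketch, assuming a hypothetical file titles.txt and a helper named join_pages; titles that themselves contain | cannot be expressed this way, and blank lines in the file would produce empty entries.

```shell
#!/bin/sh
# Join page titles (one per line on stdin) into a single line separated by "|",
# the separator expected by the --page option.
join_pages() {
    paste -s -d '|' -
}

# Example (titles.txt is a placeholder file name):
#   php rebuildData.php --page="$(join_pages < titles.txt)"
```

The double quotes around the command substitution keep titles with spaces, such as "Page 1", inside a single argument.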
The following command rebuilds semantic data with a delay of 50 ms between every data item, ignores errors which may arise during execution and writes them to a file in the directory provided.
php rebuildData.php -d 50 --ignore-exceptions --exception-log="/var/log/mediawiki/"
Exceptions are written, for example, to the "mywiki.logrebuilddata-exceptions-2016-08-14.log" file if the wiki ID is "mywiki" and the script is run on August 14, 2016.
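
A monitoring or cleanup job may want to predict the name of the log file for a given day. The sketch below follows the naming described for --exception-log (an optional identifier, the string "logrebuilddata-exceptions", and an ISO date). The function name exception_log_name is hypothetical, and the pattern should be checked against the files your SMW version actually writes.

```shell
#!/bin/sh
# Predict the exception log file name for a prefix and a date.
# Assumed pattern: <prefix>logrebuilddata-exceptions-<YYYY-MM-DD>.log
exception_log_name() {
    prefix=$1                        # identifier such as "mywiki-" (may be empty)
    day=${2:-$(date +%Y-%m-%d)}      # defaults to today's date
    printf '%slogrebuilddata-exceptions-%s.log\n' "$prefix" "$day"
}

exception_log_name "mywiki-" 2016-12-05
# prints mywiki-logrebuilddata-exceptions-2016-12-05.log
```

Calling it with the prefix "mywiki." and the date 2016-08-14 yields the file name quoted in the example above.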
The following command quietly removes just the outdated entities.[gh:smw:1754:236913464]
php rebuildData.php --skip-properties -s 1 -e 1 --quiet

References

[gh:smw:243] Semantic MediaWiki: GitHub pull request #243
[gh:smw:498] Semantic MediaWiki: GitHub issue #498
[gh:smw:643] Semantic MediaWiki: GitHub issue #643
[gh:smw:749] Semantic MediaWiki: GitHub issue #749
[gh:smw:766] Semantic MediaWiki: GitHub issue #766
[gh:smw:877] Semantic MediaWiki: GitHub issue #877
[gh:smw:1042] Semantic MediaWiki: GitHub pull request #1042
[gh:smw:1106] Semantic MediaWiki: GitHub pull request #1106
[gh:smw:1127] Semantic MediaWiki: GitHub pull request #1127
[gh:smw:1216] Semantic MediaWiki: GitHub pull request #1216
[gh:smw:1433] Semantic MediaWiki: GitHub pull request #1433
[gh:smw:1698] Semantic MediaWiki: GitHub issue #1698: Can we make `rebuildData` to operate faster?
[gh:smw:68e8bc9] Semantic MediaWiki: GitHub commit 68e8bc9
[gh:smw:1754:236913464] Semantic MediaWiki: GitHub issue #1754 comment



This documentation page applies to all SMW versions from 1.9.2 to the most current version.