Help:Ontology mapped data import with a script

Usually users do not want to upload complete ontologies, nor does the wiki support the full expressivity of the ontology language. Furthermore, an ontology import is not a routine task that has to happen often: it is basically used to prime the wiki with a pre-existing data source -- something that will usually be done only once per data source. And different data sources need to be treated differently.

Therefore the most promising way to import ontologies -- and many other external data sources -- into a wiki is to write a script that reads the external data source, creates MediaWiki markup based on it, and uploads that markup to the appropriate page in the wiki. If the page already exists, the script author has to decide how to handle that case.

Let's take a look at an example. Assume we have a file with the following content:

 Hydrogen, H, 1
 Helium, He, 2
 ...

That is, a comma-separated list of all elements with their name, chemical symbol, and atomic number, one element per line. A script could parse the data line by line, create wiki text like "Hydrogen (Chemical symbol: H) is element number 1 in the element table." and upload that text to the page Hydrogen, assuming it does not exist yet.
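The parsing step just described can be sketched in Python. The file name elements.txt and the exact wording of the generated sentence are assumptions for illustration:

```python
# Sketch: turn the comma-separated element list into wiki text.
# The file name "elements.txt" is an assumption for illustration.

def element_to_wikitext(line):
    """Build the wiki text for one 'Name, Symbol, Number' line."""
    name, symbol, number = [field.strip() for field in line.split(",")]
    return "%s (Chemical symbol: %s) is element number %s in the element table." % (
        name, symbol, number)

def parse_elements(path):
    """Yield (page title, wiki text) pairs, one per non-empty line."""
    for line in open(path):
        line = line.strip()
        if line:
            title = line.split(",")[0].strip()
            yield title, element_to_wikitext(line)
```

For the line "Hydrogen, H, 1" this produces the page title Hydrogen and the sentence shown above.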

For that we need a library that allows simple read and write operations on the wiki. Since Semantic MediaWiki is based on MediaWiki, we can reuse libraries created for MediaWiki for this task -- in PHP, for example, the integrated API (not finished yet), or in Python the pyWikipediaBot framework (stable and maintained).

The data source does not need to be an ontology, but ontologies often have certain advantages for data exchange: first, there are a number of libraries available for parsing ontologies in standard formats (so there is no need to write a parser); second, it is easy to map and merge data from different sources and then extract exactly the data we need.

Here is an example with an ontology, which can also be used for testing this feature. In this example we will use Python as the scripting language, but this is not a requirement. Let's assume the file is called elements.rdf.
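The ontology file itself is not reproduced here; a minimal elements.rdf might look like the following sketch. The namespace and the property names (name, symbol, number) are assumptions for illustration, not part of any standard elements ontology:

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:elem="http://example.org/elements#">
  <elem:Element rdf:about="http://example.org/elements#Hydrogen">
    <elem:name>Hydrogen</elem:name>
    <elem:symbol>H</elem:symbol>
    <elem:number>1</elem:number>
  </elem:Element>
</rdf:RDF>
```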

It is easy to imagine a much bigger data file. Assuming an installed and configured pyWikipediaBot framework, the following script would upload the data:
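Such a script might look like the following sketch. It is built on the old pyWikipediaBot framework (the wikipedia module with getSite(), Page(), exists() and put()); the data file name, the edit summary, and the printed messages are illustrative assumptions, and for brevity the sketch reads the simple comma-separated file from the first example rather than parsing RDF:

```python
# Sketch of an upload script for the pyWikipediaBot framework.
# Assumes the framework is installed and configured; file name and
# message strings are illustrative assumptions.

def element_to_wikitext(line):
    """Build the wiki text for one 'Name, Symbol, Number' line."""
    name, symbol, number = [field.strip() for field in line.split(",")]
    return "%s (Chemical symbol: %s) is element number %s in the element table." % (
        name, symbol, number)

def upload_elements(path):
    import wikipedia  # the pyWikipediaBot core module

    site = wikipedia.getSite()
    for line in open(path):
        line = line.strip()
        if not line:
            continue
        title = line.split(",")[0].strip()
        page = wikipedia.Page(site, title)
        if page.exists():
            # Leave existing pages alone, as described below.
            print("%s exists already, I did not change it." % title)
        else:
            page.put(element_to_wikitext(line), "Imported from elements data file")
            print("Added page %s" % title)
    print("Script ended.")

# Usage (requires a configured pyWikipediaBot installation):
# upload_elements("elements.txt")
```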

Running the script on an empty wiki with the above data file should lead to the following output on the command line:

 Added page Hydrogen
 Script ended.

In the wiki, a page "Hydrogen" should have been created, and its content would be:

 Hydrogen (Chemical symbol: H) is element number 1 in the element table.

Running it a second time should send the following output to the command line:

 Hydrogen exists already, I did not change it.
 Script ended.

No changes should have happened in the wiki.

Based on this starting point, scripts and data sources can be much more complex and do cleverer things: for example, instead of leaving an existing page untouched, the script could analyze the existing content and try to add or update data from the data source.
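One such refinement -- updating rather than skipping existing pages -- could be sketched as follows. The merge strategy here (append the fact only if it is not already present in the page text) is just one possible, deliberately simple choice:

```python
def merge_fact(existing_text, fact):
    """Return the page text with the fact appended, unless the fact
    is already present -- a deliberately simple update strategy."""
    if fact in existing_text:
        return existing_text
    if existing_text and not existing_text.endswith("\n"):
        existing_text += "\n"
    return existing_text + fact + "\n"
```

In the upload script, the page.put() call for an existing page would then receive merge_fact(page.get(), new_fact) instead of leaving the page untouched.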

A more general script based on the script described above can be found here.