Batch import of large RDF datasets using RDFIO or the new rdf2smw tool
|SMWCon Fall 2016|
|Batch import of large RDF datasets using RDFIO or the new rdf2smw tool|
|Description:||The talk will present the status of the RDFIO extension, as well as recent work on an alternative approach for RDF import with the rdf2smw tool.|
|Type:||Talk, Technical talk, Demo|
|Event start:||2016/09/30 12:20:00|
|Event finish:||2016/09/30 12:40:00|
|Keywords:||RDF, semantic web, data import, big data, automation, virtual machine|
The RDFIO extension was developed to enable importing datasets consisting of plain RDF triples, whether for collaborative editing and subsequent re-export, or simply as a means to bootstrap a wiki structure from an existing dataset.
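The core idea of bootstrapping wiki structure from triples can be sketched as mapping each RDF subject to a page title and each predicate/object pair to a Semantic MediaWiki inline annotation (`[[property::value]]`). The snippet below is a minimal illustration of that mapping, not RDFIO's actual code; the URIs and the naive URI-shortening rule are assumptions for the example.

```python
# Illustrative sketch (not RDFIO's implementation): turn one RDF triple
# into a wiki page title plus an SMW inline fact in wiki text syntax.

def local_name(uri):
    """Derive a readable wiki name from a URI by taking the part after
    the last '#' or '/' (a naive shortening rule, for illustration only)."""
    return uri.rstrip("/").rsplit("#", 1)[-1].rsplit("/", 1)[-1]

def triple_to_wiki(subject, predicate, obj):
    """Map one triple to (page title, wiki text carrying an SMW fact)."""
    title = local_name(subject)
    prop = local_name(predicate)
    value = local_name(obj)
    # [[property::value]] is SMW's inline annotation syntax.
    return title, f"[[{prop}::{value}]]"

# Hypothetical example triple:
title, text = triple_to_wiki(
    "http://example.org/Aspirin",
    "http://example.org/hasTarget",
    "http://example.org/COX-1",
)
print(title, text)
```

In practice a real importer must also handle literal values, datatypes, and namespace-to-property mappings, which is where most of the configurable behavior lives.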
In our research at the Pharmaceutical Bioinformatics group at Uppsala University, we are interested both in semantic approaches and in the challenges of big data produced by new high-throughput laboratory techniques in the life sciences.
We have thus experimented with importing datasets of somewhat larger size (so far up to ~0.5 million triples) into Semantic MediaWiki. This has put the RDFIO extension to the test, and has highlighted the need for, among other things, a command-line interface for the import function.
To be able to iterate quickly on different settings for converting RDF to wiki page structures without running a full import each time, we have also developed an alternative approach: a standalone tool, rdf2smw, that converts RDF into SMW pages and facts in MediaWiki XML format. The resulting wiki content can thus be manually verified (and the settings adjusted) before the full, time-consuming import, which is then performed via MediaWiki's standard XML import function.
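To make the intermediate step concrete, the sketch below generates a minimal MediaWiki XML export fragment from a set of pages with SMW facts. This is a simplified illustration of the format's shape only; rdf2smw's actual output carries additional metadata required by MediaWiki's XML importer, and the page content here is a hypothetical example.

```python
# Simplified sketch of the intermediate format: wiki pages (with SMW
# inline facts) wrapped in MediaWiki-style XML export structure.
# Real export XML includes namespaces, revision metadata, etc.
from xml.sax.saxutils import escape

def pages_to_mediawiki_xml(pages):
    """Render {page title: wiki text} as a minimal MediaWiki-style XML dump."""
    parts = ["<mediawiki>"]
    for title, text in pages.items():
        parts.append(
            "  <page>\n"
            f"    <title>{escape(title)}</title>\n"
            "    <revision>\n"
            f"      <text>{escape(text)}</text>\n"
            "    </revision>\n"
            "  </page>"
        )
    parts.append("</mediawiki>")
    return "\n".join(parts)

xml = pages_to_mediawiki_xml({"Aspirin": "[[hasTarget::COX-1]]"})
print(xml)
```

Because the XML is a plain file, it can be inspected and diffed between runs with different conversion settings, which is exactly what makes the two-step approach convenient for experimentation.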
Finally, to support development and experimentation, we have also created an automated Vagrant box installation of MediaWiki, SMW and RDFIO, which we will briefly present.