Using SPARQL and RDF stores

From semantic-mediawiki.org
Provides information about using SPARQL and the related repository connectors.

By default, Semantic MediaWiki (SMW) stores all data in the same relational database (usually, a MySQL database) that is used by MediaWiki. This ensures a simple setup, but a relational database is not an ideal type of storage for semantic data. A more natural data model for SMW data is RDF, a data format that organizes information in graphs rather than in fixed database tables. Fortunately, it is possible to use RDF-based systems, in conjunction with the standard SQL database, to manage and query SMW's data. This page explains the details.

Pros and cons of using an RDF database

Whether or not to use an RDF store in a specific wiki depends on a number of factors, including the specific RDF database being used. Nonetheless, we can reasonably hope for the following advantages:

  • Better query performance
    RDF stores are designed to answer queries in the SPARQL query language. SMW queries can be expressed in this language much more naturally than in the SQL query language of relational databases. In this sense, SMW queries are a fairly typical use case for RDF database systems, while they are a rather exotic use case for relational database systems. Moreover, many important optimization methods for relational database queries are useless or misleading for SMW queries. RDF stores can therefore be expected to provide superior query performance.
  • Additional interfaces
    RDF stores that support the SPARQL standard also allow other applications to run SPARQL queries against their data without going through the SMW web frontend. This allows efficient use of wiki data in other applications. Some SPARQL-capable databases further support (parts of) the OWL 2 ontology language and provide corresponding interfaces to query the stored data (e.g. via the OWLlink protocol). Semantic Web applications also use a number of common programming libraries (such as librdf or the OWL API) that can be useful for integrating them with other tools on a lower level.
  • Reasoning features and ontology-based data access
    Semantic Web languages such as RDF Schema and OWL provide additional expressive features for modeling, for example by allowing the declaration of derived classes or the declaration of further property characteristics (e.g. transitivity of properties). Some SPARQL-capable databases can evaluate these features for query answering, e.g. for ontology-based data access (OBDA), the method of creating "virtual views" on data by means of semantic modeling constructs.
  • Data integration and ontology re-use
    It is possible to store additional data in the RDF database that is updated by SMW. In this way, the RDF store can act as a platform for data integration and ontology re-use.
  • Physical separation of computing resources
    Using a database backend that is separate from the one used by MediaWiki provides an easy way to distribute tasks across multiple servers. In particular, complex queries can thus be prevented from affecting the basic operation of the wiki, even if they unexpectedly consume a prohibitive amount of computing power, e.g. if they bring down the server that hosts the RDF database.
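
The "additional interfaces" point above can be illustrated with a minimal sketch of an external PHP script that queries the RDF store directly, bypassing the wiki. It assumes that the store implements the SPARQL 1.1 Protocol (query via HTTP GET) and can return results in the SPARQL JSON results format. The endpoint URL is simply the default value of $smwgSparqlQueryEndpoint and will differ per installation. The query is deliberately generic and does not rely on any SMW-specific vocabulary.

```php
<?php
// Assumption: the query endpoint of the RDF store; this is the default of
// $smwgSparqlQueryEndpoint and must be adapted to the actual installation.
$endpoint = 'http://localhost:8080/sparql/';

// Generic query that lists ten arbitrary triples from the store.
$query = 'SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10';

// SPARQL 1.1 Protocol: the query is passed in the "query" URL parameter.
$url = $endpoint . '?' . http_build_query( [ 'query' => $query ] );

// Ask for the SPARQL JSON results format and avoid hanging indefinitely.
$context = stream_context_create( [
    'http' => [
        'method'  => 'GET',
        'header'  => "Accept: application/sparql-results+json\r\n",
        'timeout' => 10,
    ],
] );

$response = file_get_contents( $url, false, $context );
if ( $response === false ) {
    fwrite( STDERR, "SPARQL request failed\n" );
    exit( 1 );
}

// Each binding maps a variable name to an array with "type" and "value".
$data = json_decode( $response, true );
foreach ( $data['results']['bindings'] ?? [] as $row ) {
    echo $row['s']['value'], ' ', $row['p']['value'], ' ', $row['o']['value'], PHP_EOL;
}
```

Because the script talks to the RDF store only, a slow or failing query of this kind does not load the wiki's SQL database.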

Nevertheless, there are a number of possible drawbacks as well:

  • Higher storage requirements
    The data is only mirrored in RDF databases, not removed from SQL. Hence additional storage space is required.
  • Additional maintenance effort
    The setup of RDF backends in SMW is easy, but there is still some effort in running an additional database-management system.
  • Questions regarding performance and stability
    There are a number of industry-strength RDF databases available today, some of them free/open source. Yet, the experience of using these systems with SMW is still limited, so some testing is helpful before deciding on a particular backend for a large-scale SMW application.

Luckily, it is possible to switch back and forth between SQL-based and RDF-based storage backends without major effort, so that the decision can be revisited after trying it for a while.

Deciding on an RDF database

In principle, SMW supports any database that supports the SPARQL query language and SPARUL (SPARQL/Update) as introduced in SPARQL 1.1. In Semantic MediaWiki 1.7.0 (released on 1 January 2012 and compatible with MW 1.16.x - 1.19.x), stores are required to accept updates and queries that do not specify a graph, but it is planned to remove this limitation in the future. A list of RDF stores is maintained at w3.org. See also the list of SPARQL triple-store vendors maintained by the SMW Project.

RDF stores are sometimes called "triple stores" even though many modern stores are actually "quad stores" that also assign a named graph to each RDF triple.

Available repository connectors

Help page | Connector | Description | Version
Help:SPARQLStore (custom) | Custom | Custom access point to the SPARQLStore | 2.0.0
Help:SPARQLStore (default) | Default | Default access point to the SPARQLStore | 1.6.0
Help:SPARQLStore and Virtuoso | Virtuoso | Virtuoso access point to the SPARQLStore | 1.7.1
Help:SPARQLStore and 4store | 4store | 4store access point to the SPARQLStore | 2.0.0
Help:SPARQLStore and Blazegraph | Blazegraph | Blazegraph access point to the SPARQLStore | 2.3.0
Help:SPARQLStore and Fuseki | Fuseki | Jena Fuseki access point to the SPARQLStore | 2.0.0
Help:SPARQLStore and Sesame | Sesame | Sesame (RDF4J) access point to the SPARQLStore | 2.1.0

Available configuration parameters

Parameter | Description | Default | Version
$smwgSparqlRepositoryConnectorForcedHttpVersion | Sets whether CURLOPT_HTTP_VERSION should explicitly be forced for the endpoint communication | false | 2.3.1+
$smwgDefaultStore | Sets the storage backend to be used for the semantic data | SMW\SQLStore\SQLStore | 0.7+
$smwgExportResourcesAsIri | Sets whether resources should be exported as IRIs (Internationalized Resource Identifiers) | true | 2.5.0+
$smwgSparqlCustomConnector | Defines the SPARQL custom database connectors | custom | 1.6.0+
$smwgSparqlDataEndpoint | Sets the endpoint for data on the SPARQL database | http://localhost:8080/data/ | 1.6.0+
$smwgSparqlDefaultGraph | Sets the identifier (graph) of the SPARQL database | '' | 1.7.0+
$smwgSparqlQFeatures | Sets the SPARQL query features that are expected to be supported by the repository of the identifier (graph) of the SPARQL database | see documentation | 2.3.0+
$smwgSparqlQueryEndpoint | Sets the endpoint for querying the SPARQL database | http://localhost:8080/sparql/ | 1.6.0+
$smwgSparqlReplicationPropertyExemptionList | | see documentation |
$smwgSparqlRepositoryConnector | Identifies a database connector that ought to be used together with the semantic data store. | default | 2.0.0+
$smwgSparqlUpdateEndpoint | Sets the endpoint for updating the SPARQL database | http://localhost:8080/update/ | 1.6.0+
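
As a rough sketch of how these parameters fit together, the following LocalSettings.php fragment enables an RDF store alongside the SQL database. Two values are assumptions that do not appear in the table above and should be checked against the installed SMW version and the connector's help page: the store class name 'SMWSparqlStore' and the connector identifier 'fuseki'. The endpoint URLs are just the documented defaults; the real paths depend on the RDF store in use.

```php
// LocalSettings.php (fragment): place after Semantic MediaWiki has been enabled.

// Assumption: 'SMWSparqlStore' is the store class that mirrors SMW data
// to the RDF store; verify the name for the installed SMW version.
$smwgDefaultStore = 'SMWSparqlStore';

// Assumption: 'fuseki' as connector identifier, chosen for illustration;
// 'default' is the documented default value.
$smwgSparqlRepositoryConnector = 'fuseki';

// Endpoints of the RDF store (documented defaults, adapt as needed).
$smwgSparqlQueryEndpoint  = 'http://localhost:8080/sparql/';
$smwgSparqlUpdateEndpoint = 'http://localhost:8080/update/';
$smwgSparqlDataEndpoint   = 'http://localhost:8080/data/';

// Graph identifier; the documented default is the empty string.
$smwgSparqlDefaultGraph = '';

// To switch back to SQL-only storage, restore the documented default:
// $smwgDefaultStore = 'SMW\SQLStore\SQLStore';
```

Keeping the switch-back line as a comment makes it easy to revert to the SQL backend after a trial period.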

Moving data to the new database

After the configuration has been changed, the RDF database does not yet contain any data. To fill it with the current content of the wiki, it is necessary to refresh all data. See the help page on repairing SMW's data for details. Any method that refreshes the data will work. Once the RDF store is configured, all SMW queries (inline or semantic search) are executed against the RDF database, so their results will only be correct when all data has been refreshed.

Known limitations

There are still a few features that are not supported when using query answering via an RDF database:

  • Concept queries: There is no support for concepts in RDF stores yet.

See also