GSoC 2013

This page collects ideas for projects for the potential involvement of Semantic MediaWiki (or, more specifically, the Open Semantic Data Association) as a mentoring organization in the 2013 Google Summer of Code.

SMW query management and smart updates
Component: SMW (core)

Expected results: New capabilities added to SMW

Short explanation: Query management is a proposed addition to Semantic MediaWiki that would allow queries to be updated automatically and query statistics to be gathered. It would work by storing query metadata as semantic properties, which can then themselves be queried. Because query dependencies are stored as part of this metadata, query results can be updated automatically when their source data is modified. This keeps query results up to date everywhere without resorting to more resource-intensive solutions such as disabling the cache or rebuilding all pages via a cron job. Query management would also let you query various aspects of query usage, such as where queries are located, how many dependencies they have, how long or expensive they are, and when they were last updated. With this information you can get a better overview of how queries are used across your wiki and pinpoint inefficient usage.

Prerequisites: prior programming experience, working knowledge of PHP, decent database knowledge is a plus
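
The dependency tracking described above could look roughly like the following. This is only a minimal in-memory sketch in Python (SMW itself is written in PHP); the class name QueryRegistry, its methods, and the example pages and properties are all hypothetical, and a real implementation would store this metadata as semantic properties rather than in dictionaries.

```python
from collections import defaultdict


class QueryRegistry:
    """Toy in-memory stand-in for query metadata stored as semantic properties."""

    def __init__(self):
        # property name -> ids of queries whose results depend on it
        self._dependents = defaultdict(set)
        # query id -> page that embeds the query
        self._location = {}

    def register(self, query_id, page, properties):
        """Record where a query lives and which properties it reads."""
        self._location[query_id] = page
        for prop in properties:
            self._dependents[prop].add(query_id)

    def pages_to_refresh(self, changed_properties):
        """Return the pages whose embedded queries read any changed property."""
        stale = set()
        for prop in changed_properties:
            stale.update(self._dependents.get(prop, ()))
        return sorted({self._location[q] for q in stale})


registry = QueryRegistry()
registry.register("q1", "City list", ["Population", "Located in"])
registry.register("q2", "Country overview", ["Located in"])
registry.register("q3", "Event calendar", ["Has date"])

# An edit changes "Located in" somewhere: only two of the three pages are stale.
print(registry.pages_to_refresh(["Located in"]))
```

Only the pages that actually depend on the changed data are refreshed, which is what makes this cheaper than disabling the cache or rebuilding every page.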

Adding unit tests to SMW
Component: SMW core, possibly extensions

Expected results: Create unit tests (mainly PHPUnit, possibly also some QUnit) so we notice when something breaks.

Short explanation: SMW currently has no unit tests and could really use some, since subtle behavior changes that are undocumented and often unintended get introduced over time. A good place to start is the DataValue classes, whose parsing and formatting methods would benefit from tests. More details on SMW unit testing can be found here.

Prerequisites: PHP
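
To give an idea of the kind of parsing and formatting tests meant here, the following is a rough sketch in Python. The actual tests would be written in PHP with PHPUnit against SMW's real DataValue classes; the NumberValue class below is a hypothetical stand-in and does not mirror SMW's API.

```python
import unittest


class NumberValue:
    """Hypothetical stand-in for a DataValue class; not SMW's actual API."""

    def __init__(self, number):
        self.number = number

    @classmethod
    def parse(cls, text):
        # Accept user input with thousands separators and surrounding spaces.
        return cls(float(text.strip().replace(",", "")))

    def format(self):
        # Render with thousands separators and at most two decimals,
        # trimming trailing zeros and a dangling decimal point.
        return f"{self.number:,.2f}".rstrip("0").rstrip(".")


class NumberValueTest(unittest.TestCase):
    def test_parse_strips_separators(self):
        self.assertEqual(NumberValue.parse(" 1,234.5 ").number, 1234.5)

    def test_format_adds_separators(self):
        self.assertEqual(NumberValue(1234.5).format(), "1,234.5")

    def test_round_trip_is_stable(self):
        # Parsing and re-formatting canonical input must not change it.
        for text in ["0", "42", "1,000", "1,234.5"]:
            self.assertEqual(NumberValue.parse(text).format(), text)

    def test_invalid_input_is_rejected(self):
        with self.assertRaises(ValueError):
            NumberValue.parse("not a number")


# Run the suite explicitly so the sketch also works when pasted into a script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NumberValueTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Round-trip tests like the third one are a cheap way to catch the subtle, unintended behavior changes mentioned above.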

Extend and improve RDBMS support of SMW
Component: SMW

Expected results: Additions to the relational database store support of SMW, and improvements to the existing implementation.

Short explanation: SMW currently supports MySQL and, to some degree, PostgreSQL. This is done via a single "SQL store", which varies its behaviour slightly depending on the type of database used. Having two separate stores would likely be better. The store also lacks support for special data types such as geographical entities: querying them would require wrapping field names or values in SQL functions within SQL statements, which is currently not possible. Adding support for the other RDBMSes (partially) supported by MediaWiki core, mainly Oracle and MSSQL, would also be nice.

Prerequisites: PHP, SQL, the RDBMS you want to work with
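
One possible way to let a data type put SQL functions around field names or values is to have each comparison carry its own SQL template. The Python sketch below only illustrates that idea and is not how the SMW SQL store is built. The names ConditionTemplate and build_where and the columns o_name and o_coords are hypothetical, and the availability and exact behaviour of spatial functions such as ST_Contains and ST_GeomFromText differ between RDBMSes and versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionTemplate:
    """SQL for one comparison: {field} is the column, %s the bound value."""
    sql: str


EQUALS = ConditionTemplate("{field} = %s")
# Function wrapped around both the column and the value.
CASE_INSENSITIVE = ConditionTemplate("LOWER({field}) = LOWER(%s)")
# Hypothetical geographic comparison: the value arrives as WKT text and the
# database converts it to a geometry before testing containment.
WITHIN_AREA = ConditionTemplate("ST_Contains(ST_GeomFromText(%s), {field})")


def build_where(conditions):
    """Combine (template, field, value) triples into a WHERE clause and params."""
    fragments, params = [], []
    for template, field, value in conditions:
        # Column names are interpolated, so only plain identifiers are allowed;
        # values always stay bound parameters.
        if not field.isidentifier():
            raise ValueError(f"unsafe column name: {field!r}")
        fragments.append(template.sql.format(field=field))
        params.append(value)
    return " AND ".join(fragments), params


where, params = build_where([
    (CASE_INSENSITIVE, "o_name", "Berlin"),
    (WITHIN_AREA, "o_coords", "POLYGON((13 52, 14 52, 14 53, 13 53, 13 52))"),
])
print(where)
print(params)
```

Keeping the templates per database type would also fit the suggestion of having separate stores for MySQL and PostgreSQL.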

Improving the interplay between Spark and SMW
Component: Semantic Result Formats and Spark

Expected results: Being able to use Spark with Semantic MediaWiki as the backend store easily, and using Spark within SMW with data from an external source

Short explanation: Spark is a JavaScript library that takes SPARQL query results and visualizes them within any HTML5 site. It is basically like inline queries in SMW, but against any SPARQL endpoint and with no required backend. The idea would be to extend Spark so that it can be used against SMW data and not only against SPARQL endpoints, explore whether the #ask syntax makes sense for this, and add a Semantic Result Format that integrates Spark into Semantic MediaWiki.

Prerequisites: JavaScript, PHP (a little)

Semantic Drilldown improvements
Component: Semantic Drilldown

Expected results: Various improvements to Semantic Drilldown.

Short explanation: Semantic Drilldown is an extension that lets users drill down on pages via semantic properties. It is a popular extension (and one of only a handful of SMW extensions enabled on Wikia), but it has a number of important weaknesses:


 * Compound data defined within pages, using either subobjects or Semantic Internal Objects, cannot be filtered on.
 * "Concepts" cannot be filtered on.
 * Results can't be shown in multiple formats at the same time (like a map and a list).
 * The display of results in columns can be awkward (see here, for example).
 * When drilling through subcategories, the full path of subcategories isn't shown.
 * The interface currently doesn't let users choose between an "AND" and an "OR" (or a "NOT", for that matter) of different values.
 * It may be possible to improve the extension's performance, using some sort of caching system.
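
As an illustration of the "AND"/"OR"/"NOT" item in the list above, the following Python sketch shows the filtering semantics such an interface could expose. It is not Semantic Drilldown code; the function names, parameters and example pages are all made up for illustration.

```python
def matches(page_values, include_all=(), include_any=(), exclude=()):
    """Check one page's filter values against AND / OR / NOT criteria."""
    values = set(page_values)
    return (
        set(include_all) <= values                                  # AND
        and (not include_any or bool(values & set(include_any)))    # OR
        and not (values & set(exclude))                             # NOT
    )


def drill_down(pages, **criteria):
    """Return the sorted names of all pages that satisfy the criteria."""
    return sorted(
        name for name, values in pages.items() if matches(values, **criteria)
    )


pages = {
    "Berlin": {"Capital", "City", "Europe"},
    "Hamburg": {"City", "Europe", "Port"},
    "Tokyo": {"Capital", "City", "Asia"},
}

# Cities in Europe OR Asia that are NOT ports.
print(drill_down(pages, include_all=["City"],
                 include_any=["Europe", "Asia"], exclude=["Port"]))
```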

"Section" tag for Semantic Forms
Component: Semantic Forms

Expected results: A new "section" tag in the Semantic Forms form definition syntax.

Short explanation: Semantic Forms lets users create and edit pages using custom-made forms; it is a very popular extension. Currently, it only allows for creating form fields based on parameters/fields within template calls. It would be very useful to allow fields based on page sections, i.e., portions of text that start with a header. This would impose even more structure on pages and enable page structures, such as a page with a template call in the middle, that SF currently can't support.
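
To make "page sections" concrete, the Python sketch below splits wikitext into the portions that start with a header, which is roughly the information a section-based form field would need to read and write. It is only an illustration and not Semantic Forms code: the function split_sections and the example page are hypothetical, and real wikitext parsing has many more edge cases (nested header levels, headers inside nowiki or comments, and so on).

```python
import re

# A header line such as "==History==" or "=== Early years ===".
HEADER = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def split_sections(wikitext):
    """Return (lead_text, {header: body}) for a page's wikitext."""
    found = list(HEADER.finditer(wikitext))
    lead = wikitext[: found[0].start()] if found else wikitext
    sections = {}
    for i, match in enumerate(found):
        # A section's body runs until the next header or the end of the page.
        end = found[i + 1].start() if i + 1 < len(found) else len(wikitext)
        sections[match.group(2)] = wikitext[match.end():end].strip()
    return lead.strip(), sections


page = """{{City|population=3500000}}
Intro text.

==History==
Founded long ago.

== Economy ==
Mostly services.
"""

lead, sections = split_sections(page)
print(lead)
print(sections)
```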

Improvements and additions to (Semantic) Maps
Component: Maps and Semantic Maps

Expected results: Redesign of the JS, including unit tests, and addition of new features.

Short explanation: Maps and Semantic Maps both have a JavaScript code quality problem. Features have been added over the years by various people, many of whom were not all that familiar with JS. This is a maintenance hassle and greatly hinders the addition of new features. Your task would be to redesign the JS functionality from the ground up and then add any of the many exciting features people have been asking for.

Prerequisites: Strong JavaScript skills

noSQLStore4 (from Green to Web Scale SMW)
Component: SMW

Expected results: Use of MongoDB as an SMW store instead of an RDBMS.

Short explanation: MongoDB is web-scale: sharding and replication can be done relatively easily and organically. Possibly, each concept could correspond to a MongoDB collection, with a system collection holding the metadata about all the "concept" collections. This would even allow a simple SMW API, as the concepts are native MongoDB objects. MongoDB also has native geospatial capabilities that could be leveraged to add extended geospatial querying (e.g. near, radius and bounded queries, distance calculations) and new geo-datatypes (e.g. polygons/shapes for zipcodes), and to benefit Semantic Maps. MongoDB's aggregation and map-reduce frameworks might even be leveraged for Query Management/Scheduling. Its GridFS capabilities could also be leveraged to create a file/blob SMW datatype.

Prerequisites: MongoDB, PHP
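
As a rough idea of what a "near" query could look like, the Python sketch below builds a MongoDB filter document using the $near operator with a GeoJSON point. The helper near_query and the field name "location" are hypothetical, and no driver or server is involved here. It also assumes the collection has a 2dsphere index on that field, which MongoDB requires for this form of $near.

```python
import json


def near_query(field, longitude, latitude, max_distance_m):
    """Build a MongoDB filter for documents within a radius of a point."""
    if not (-180 <= longitude <= 180 and -90 <= latitude <= 90):
        raise ValueError("coordinates out of range")
    return {
        field: {
            "$near": {
                # GeoJSON order is longitude first, then latitude.
                "$geometry": {
                    "type": "Point",
                    "coordinates": [longitude, latitude],
                },
                # Radius in meters for GeoJSON points.
                "$maxDistance": max_distance_m,
            }
        }
    }


# Hypothetical query: pages with a "location" within 5 km of central Berlin.
query = near_query("location", 13.405, 52.52, 5000)
print(json.dumps(query, indent=2))
```

With a driver, a filter like this would typically be passed to the collection's find operation; in the design sketched above, that would be the collection corresponding to a concept.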

Further ideas

 * Fix the many issues with the Semantic Signup extension.