Help:DataValue

From semantic-mediawiki.org
Jump to: navigation, search
Note: THIS NEEDS AN ACTUAL REVISION

A data value represents the user perspective on data which includes input and output specific rules.

This class and its subclasses is the basic building block of all SMW elements. Its purpose is to provide a unified interface for all semantic entities that SMW deals with, e.g., numbers, dates, geo coordinates, wiki pages, and properties. It might be surprising that not only values but also objects and properties are represented by the SMWDataValue class. This makes sense since wiki pages can be both objects and values, and since properties have many similarities with wiki pages (in particular they have associated articles).

The various subclasses of SMWDataValue roughly correspond to the datatypes that a value can have, and they implement the specific behaviour of particular values. In the current architecture, SMWDataValue subclasses implement all functions that are specific to a particular datatype:

  • Interpreting user input (e.g. parsing a string into a calendar date)
  • Encoding values in a format that is suitable for storage (e.g. computing a standardized, language-independent string for representing a date and a floating point number that can be used for sorting dates)
  • Generating readable output (e.g. converting the internal representation of a date back into a text that is readable in the current language)
  • Generating serializations in the RDF export format (e.g. representing dates with the XML Schema type "date" if they are in the range of this datatype, and using some fallback encoding otherwise)

Each data value thus has many forms of representation: the text a user writes on a wiki page (this can have many forms that lead to the same value), the various display versions (e.g. augmented with links or tooltips), a unique internal representation (the value as the software sees it), an RDF encoding. Some subclasses of SMWDataValue have additional representations, e.g. SMWTimeValue provides an output that is formatted according to ISO 8601. The class SMWDataValue has different get methods for obtaining these representation, and programmers should read the software documentation of that class to understand when to use which method.

A general challenge here is that the datatypes are very diverse. Many values have an internal structure, e.g. dates have a year, month, and day component, whereas wiki page titles have a namespace identifier, title text, and interwiki component. This diversity makes it hard to treat values as single values of some primitive datatype (e.g. representing wiki pages as strings would not be accurate, since it would be impossible to filter such values by namespace). The internal representation of a data value therefore is a list of values, obtained with the method getDBkeys(). How long this list is, and which type each of its entries has is determined by the method getSignature(). There is also a method getHash() that returns a string that can be used to compare two datavalues without looking at the details of their format.

Another challenge is the diversity of desirable output formats. Users typically want a large number of formatting options that are very specific to certain datatypes, so that it is hard to provide them via a unified interface. Moreover, output is used in MediaWiki both within HTML and within Wikitext contexts, requiring different formatting and treatment of special characters. The output methods of SMWDataValue reflect some of this diversity, and an additional facility of "output formats" (see SMWDataValue::setOutputFormat()) is provided for more fine-grained control. But obviously there must be limits of what can be achieved without cluttering the architecture, and users are advised to subclass their own datatype implementations for special formatting.

DataValue objects can be created directly via their own methods, but it is usually advisable to use the SMWDataValueFactory instead. See below for an explanation of SMW's datatype system.