Archive:Semantic search 1.0

Semantic MediaWiki includes an easy-to-use query language which enables users to access the wiki's knowledge. The syntax of this query language is similar to the syntax of annotations in Semantic MediaWiki. This query language can be used on the special page Special:Ask or in inline queries.

Naturally, answering queries requires additional resources, and the administrators of some sites can decide to switch off or restrict most of the features given below in order to ensure that even high-traffic sites can handle the additional load.

Introduction
Semantic queries specify two things:
 * 1) Which pages to select
 * 2) What information to display about those pages

All queries must state some conditions that describe what is asked for. You can select pages by name, namespace, category, and most importantly by property values. For example, the query:

[[Located in::Germany]]

is a query for all pages with the "Located in" property with a value of "Germany". If you enter this in Special:Ask and click "Find results", SMW executes the query and displays results as a simple table of all matching page titles. If there are many results, they can be browsed via the navigation links at the top and bottom of the query results, for example a query for all persons on ontoworld.org.

The second point matters when you want to display more information. In the example above, one might be interested in the population of the things located in Germany. To display this as well as the page titles, change the query to: [[Located in::Germany]] [[population::*]]

and SMW displays the same page titles and the values of the Population property on those pages, if any.

Both points are explained in more detail in the sections below.

By category or property value
In the example above, we gave the single condition [[Located in::Germany]] to describe which pages we were interested in. The markup text is exactly what you would otherwise write to assert that some page has this property and value. Putting it in a semantic query makes SMW return all such pages (actually some more; but read on). This is a general scheme: the syntax for asking for pages that satisfy some condition is exactly the syntax for explicitly asserting that this condition holds.

The following queries show what this means:
 * 1) [[Category:Actor]] gives all pages directly or indirectly (through a sub-, subsub-, etc. category) in the category "Actor".
 * 2) [[born in::Boston]] gives all pages annotated as being about someone born in Boston.
 * 3) [[height::180cm]] gives all pages annotated as being about someone having a height of 180cm.

By using other categories, relations, or attributes than above, we can already ask for pages which have certain annotations. Next let us combine those requirements:

[[Category:Actor]] [[born in::Boston]] [[height::180cm]]

asks for everybody who is an actor and was born in Boston and is 180cm tall. In other words: when many conditions are written into one query, the result is narrowed down to those pages that meet all the requirements. Thus we have a logical AND. By the way: queries can also include line breaks in order to make them more readable. So we could as well write:

[[Category:Actor]]
[[born in::Boston]]
[[height::180cm]]

to get the same result as above. Note that queries only return the articles that are positively known to satisfy the required properties: if there is no property for the height of some actor, that actor will not be selected.
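The conjunction described above can be pictured with a small Python sketch. It is only a model of the behaviour, not SMW's implementation; the page data and the helper name `select` are made up for illustration.

```python
# Hypothetical annotation data: page title -> {property: value}.
pages = {
    "Alice": {"born in": "Boston", "height": "180cm"},
    "Bob": {"born in": "Boston"},  # no height annotation
    "Carol": {"born in": "Chicago", "height": "180cm"},
}

def select(pages, conditions):
    """Return titles of pages that are positively known to meet every condition."""
    return sorted(
        title
        for title, props in pages.items()
        if all(props.get(prop) == value for prop, value in conditions.items())
    )

result = select(pages, {"born in": "Boston", "height": "180cm"})
print(result)  # ['Alice']
```

"Bob" is left out because no height is recorded for that page, even though the page does not contradict the condition.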

Wildcards and disjunctions
In the examples above, we gave very concrete conditions, using "Actor", "Boston", and "180cm" as the value for the category or property. It is possible to weaken these conditions in several ways.

Wildcards are written as "+" and allow any value for a given condition. For example, born in::+ returns all pages that have annotations for the property "born in". For categories, this feature makes little sense: [[Category:+]] just returns everything that has some category.

Disjunctions are written as "||" and allow queries to require (at least) one out of several possible fillers. For example, [[Category:Musical actor||Category:Theatre actor]] retrieves everything that is a musical actor or a theatre actor. This also includes everything that is both, i.e. we really have a logical OR here. We can also specify multiple values for a property, e.g. [[born in::Boston||New York]].

Subqueries
Enumerating multiple pages for a property is cumbersome and hard to maintain. For instance, to select all actors that are born in an Italian city you could write a condition such as [[born in::Milan||Turin||Florence||...]]. To generate a list of all these Italian cities you could run another query, [[located in::Italy]], and copy and paste the results into the first query. What you would like to do is use the city query within the actor query to generate the complex set of pages. This is called a subquery. Instead of a fixed list of page names for the property's value, you enter a new query enclosed in <q> and </q> within the property condition. In this example, you combine the two queries and write:

[[born in::<q>[[located in::Italy]]</q>]]

Arbitrary levels of nesting are possible, though nesting might be restricted for a particular site to improve performance.

For another example, to select all cities of the European Union you could write:

[[located in::<q>[[member of::European Union]]</q>]]
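A subquery can be thought of as running the inner query first and using its result set as the list of allowed values for the outer condition. The following Python sketch illustrates that idea under made-up data; it is not how SMW evaluates queries internally, and the helper `pages_with` is hypothetical.

```python
# Hypothetical annotation data; none of these pages come from a real wiki.
located_in = {"Rome": "Italy", "Milan": "Italy", "Boston": "USA"}
born_in = {"Anna": "Rome", "Marco": "Milan", "John": "Boston"}

def pages_with(prop, allowed_values):
    """Titles whose value for the property is one of the allowed values."""
    return {title for title, value in prop.items() if value in allowed_values}

# Inner query: all pages located in Italy.
italian_cities = pages_with(located_in, {"Italy"})
# Outer query: everyone born in one of the pages returned by the inner query.
italian_born = pages_with(born_in, italian_cities)
print(sorted(italian_born))  # ['Anna', 'Marco']
```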


Subcategories and subproperties
Conditions with categories are generally simple, but they are more powerful than they might at first appear:

When selecting pages by category, the result also includes all pages that are in subcategories of this category.

For example, assume that we have a category "Theatre actor" which is a subcategory of "Actor". Then the query [[Category:Actor]] will also return the specialized actors that are in the category "Theatre actor" only. This makes sense in many situations, but you can still view the pages that were directly put into the category "Actor" by just going to the page of that category.
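As a rough illustration of selection through subcategories, the Python sketch below walks up a category's parent chain. The data is invented, and the sketch assumes each category has at most one parent and that the hierarchy contains no cycles; real wikis allow several parents.

```python
# Hypothetical category data: child category -> parent category.
parent_of = {"Theatre actor": "Actor", "Musical actor": "Theatre actor"}
# Hypothetical direct category assignments: page -> category.
page_category = {
    "Alice": "Actor",
    "Bob": "Theatre actor",
    "Carol": "Musical actor",
    "Dave": "Director",
}

def in_category(category):
    """Pages assigned to the category directly or through any subcategory."""
    def descends(cat):
        # Walk up the parent chain until the target or a root is reached.
        while cat is not None:
            if cat == category:
                return True
            cat = parent_of.get(cat)
        return False
    return sorted(p for p, c in page_category.items() if descends(c))

print(in_category("Actor"))          # ['Alice', 'Bob', 'Carol']
print(in_category("Theatre actor"))  # ['Bob', 'Carol']
```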

Semantic MediaWiki has a related feature, subproperties. You can declare one property to be a subproperty of another by annotating it with the special property "Subproperty of" (see Property:Subproperty of).

When selecting pages by some constraint on a property, the result also includes all pages that have subproperties of this property.

For example, assume that in a movie wiki we have properties "directed" and "wrote screenplay of" that are annotated as subproperties of the more general property "worked on". Then the query [[worked on::Titanic]] will also return people with the specialized relationships to this movie. This applies only to selection, not to display: if you request the property to be displayed in search results, you will not see values for its subproperties. Continuing this example, if you display "worked on", you will not see values for its subproperty "directed" unless you request display of both properties.
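The difference between selecting and displaying can be sketched in Python as follows. The pages "Ann", "Ben" and "Cal" and all helper names are hypothetical, only direct subproperties are modelled, and the code is a model of the behaviour described above rather than SMW's actual logic.

```python
# Hypothetical subproperty declarations: subproperty -> parent property.
subproperty_of = {"directed": "worked on", "wrote screenplay of": "worked on"}
# Hypothetical annotations: page -> list of (property, value) pairs.
annotations = {
    "Ann": [("directed", "Titanic")],
    "Ben": [("worked on", "Titanic")],
    "Cal": [("directed", "Alien")],
}

def matches(prop, queried):
    """True if prop is the queried property or a direct subproperty of it."""
    return prop == queried or subproperty_of.get(prop) == queried

def select(queried, value):
    """Selection takes subproperties into account."""
    return sorted(
        page for page, pairs in annotations.items()
        if any(matches(p, queried) and v == value for p, v in pairs)
    )

def display(page, prop):
    """A printout shows only values of the property itself, not of subproperties."""
    return [v for p, v in annotations[page] if p == prop]

print(select("worked on", "Titanic"))  # ['Ann', 'Ben']
print(display("Ann", "worked on"))     # []
print(display("Ann", "directed"))      # ['Titanic']
```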

Equivalent names and redirects
In MediaWiki, if an article has multiple names, or if it has alternative punctuation, capitalization or spellings, you can make pages for the alternative names that are redirects to one main article. Semantic MediaWiki takes redirects into account when searching for property values:

When selecting pages by constraining a property to some page name, the result also includes all pages that have the property value set to page names that are equivalent according to MediaWiki redirects.

For example, if The Golden State is a redirect page to California, then querying for Located in::California will also find pages annotated with "Located in::The Golden State", and vice-versa.
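One way to picture this is that both the queried value and the stored values are resolved to their redirect targets before they are compared. The Python sketch below uses invented data and helper names and resolves only a single level of redirects; it is not taken from SMW's code.

```python
# Hypothetical redirect table: redirect page -> target page.
redirects = {"The Golden State": "California"}
# Hypothetical "Located in" annotations: page -> value as written in the page.
located_in = {
    "San Francisco": "California",
    "Los Angeles": "The Golden State",
    "Berlin": "Germany",
}

def canonical(title):
    """Resolve a single redirect; other titles are returned unchanged."""
    return redirects.get(title, title)

def select_located_in(value):
    """Pages whose stored value is equivalent to the queried value."""
    target = canonical(value)
    return sorted(p for p, v in located_in.items() if canonical(v) == target)

print(select_located_in("California"))        # ['Los Angeles', 'San Francisco']
print(select_located_in("The Golden State"))  # ['Los Angeles', 'San Francisco']
```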

Comparators
The text following the :: is the value that the property must match; in other words, it is a test for equality. For each datatype SMW attempts to normalize the value before making the comparison:
 * for strings it trims leading and trailing whitespace
 * for wiki pages it handles capitalizing and underscores for spaces
 * for numbers it normalizes thousand separators, decimal point, and scientific notation
 * for numbers with units it converts the number into the primary unit (which will lead to rounding in the calculation and display of the converted value)

By adding other symbols after the :: before the value, you can use a different comparator than equality.

 * > and < (greater-than and less-than signs): greater than or equal, and less than or equal
 * ! (exclamation mark): not equal
 * ~ (tilde): "like" comparison for strings

In SMW 1.0, these comparators do not work for properties of datatype Page or for conditions on categories.

A wiki installation can limit which comparators are available. By default, ~ for "like" is not enabled; to enable it, an administrator must modify $smwgQComparators in the file SMW_Settings.php.

Greater than or equal, less than or equal comparators
With numeric values, you often want to select pages with property values within a certain range. For example

[[Category:Actor]] [[height::>6 ft]] [[height::<7 ft]]

asks for all actors that are between 6 feet and 7 feet tall. Note that this takes advantage of the automatic unit conversion: even if the height of the actor was set with [[height::195cm]] it would be recognized as a correct answer (provided that the datatype for height understands both units, see Help:custom units).

Note that > and < already mean "greater than or equal" and "less than or equal". Do not add = to them.
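The combination of inclusive bounds and unit conversion can be sketched in Python. This is a simplified model that assumes feet as the unit of comparison and handles only "ft" and "cm"; the function names are hypothetical and the real conversion is defined by the property's datatype.

```python
CM_PER_FOOT = 30.48  # exact: 1 ft = 12 in = 12 * 2.54 cm

def to_feet(value, unit):
    """Convert a height to feet; only 'ft' and 'cm' are handled in this sketch."""
    if unit == "ft":
        return value
    if unit == "cm":
        return value / CM_PER_FOOT
    raise ValueError(f"unsupported unit: {unit}")

def in_range(value, unit, low_ft=6.0, high_ft=7.0):
    """Model of height::>6 ft height::<7 ft: both bounds are inclusive."""
    return low_ft <= to_feet(value, unit) <= high_ft

print(in_range(195, "cm"))  # True
print(in_range(6, "ft"))    # True
print(in_range(180, "cm"))  # False
```

A height of exactly 6 ft is selected because the lower bound is inclusive.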

Such range conditions on properties are mostly relevant if their values can be ordered in a natural way. For example, it makes sense to ask [[start date::>May 6 2006]] but it is not really helpful to say [[homepage URL::>http://www.somewhere.org]].

If a datatype has no natural linear ordering, Semantic MediaWiki will just apply the alphabetical order to the normalized datavalues as they are used in the RDF export. You can thus use greater than and less than to select alphabetic ranges of a string property. For example, you could ask surname::>Do surname::<G to select surnames between "Do" and up to "G".

Not equal comparator
You can select pages whose property is not equal to some value. For example, Area code::!415 will select pages whose area code is not "415". As with the (default) equality comparator, rounding in numeric conversions can lead to unexpected results; for example, height::!6.00 ft may still select someone whose height displays as "6.00 feet" in alternate units.

Like comparator
This only works for properties of datatype String (see Help:Type String).

In a like condition you use ' * ' wildcards to match any sequence of characters and ' ? ' to match any single character. For example, you could ask Address::~*Park Place* to select addresses containing "Park Place", or Honorific::~M?. to select both "Mr." and "Ms.".
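The wildcard rules can be illustrated by translating a like pattern into a regular expression. The Python sketch below is an approximation for illustration only: it matches case-sensitively and requires the whole value to match, and SMW's actual matching (for example its case sensitivity) depends on the wiki's database. The names `like_to_regex` and `like` are hypothetical.

```python
import re

def like_to_regex(pattern):
    """Translate a like pattern: '*' -> any sequence, '?' -> any single character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """True if the whole value matches the like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

print(like("12 Park Place, Springfield", "*Park Place*"))  # True
print(like("Mr.", "M?."))   # True
print(like("Ms.", "M?."))   # True
print(like("Mrs.", "M?."))  # False
```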

Direct conditions on pages
So far, all conditions depended on some annotation given within a page. But there are also conditions that directly select some pages, or pages from a given namespace.

Directly giving some page title (possibly including a namespace prefix), or a list of such page titles separated by ||, selects the pages with those names. For example,

[[France||User:John Doe]]

Note that the result does not display any namespace prefixes; see the hover box or status bar of the browser, or follow the links to determine the namespace. By additionally restricting the set with a property value, one could ask, e.g., "Who of Bill Murray, Dan Aykroyd, Harold Ramis and Ernie Hudson is taller than 6ft?". But direct selection of articles is most useful if further properties of those articles are asked for, as is described below.

To select a category, you must put a ":" before the category name; this avoids confusing [[Category:Actor]] (return all actors) and [[:Category:Actor]] (return the category "Actor").

Restricting to a namespace
A less strict way of selecting given pages is via namespaces. The default is to return pages in every namespace. To return pages in a particular namespace, specify the namespace with a wildcard, e.g. write [[Help:+]] to return every page in the "Help" namespace. Since the main namespace usually has no prefix, write [[:+]] to select pages in the main namespace. For example, to return pages in either the main or "User" namespace, write [[:+||User:+]]. To return pages in the "Category" namespace, again you need a ":" in front of the namespace label to prevent confusion.

Information Display
Queries return a list of pages, and the default result is to simply display the pages' titles. You can specify additional properties of the pages to display and also display the pages' categories. The way you do this is different in the Special:Ask page and the two forms of inline queries.

in Special:Ask
In the "Additional printouts" form field on the web page, just list each additional property you want to display prepended with ? (question mark), one per line, e.g. ?Population ?Has capital

Use ?Category to display all the categories to which the page is directly assigned.

in the #ask function
List each additional property you want to display prepended with ? (question mark), separated by | (pipe symbol). For example: {{#ask: [[Population::+]] | ?Population | ?Has capital | ?Category }}

in the <ask> tag
You add statements such as [[population::*]] to show the value of the population property (if any) of the selected pages. Using * as "filler" indicates that this statement does not specify a condition for the selection of pages, but only specifies what should be displayed about the selected pages. For example, a query that selects pages with a population and displays their population, capital and categories can be written as <ask>[[Population::+]] [[Population::*]] [[Has capital::*]] [[Category:*]]</ask>

Note how this also works for categories.

What is displayed
Display statements do not restrict the selection: if a selected page has no value for a displayed property such as "height", the page still appears in the result, with an empty field. Likewise, if some article has been assigned many different values for one property, all of them will be displayed.

Thus a common idiom when you want to display all values of some property is to select only pages that have a value for that property, using the '+' wildcard constraint, for example using the function syntax: {{#ask: [[height::+]] | ?height }}. Otherwise you'll display every page in the wiki, most of which will have no value for "height".

Display format
For attributes that support units, queries can also determine which unit should be used for the output. Just mention one of the supported units after the property name. The syntax details are different for the different forms of queries.


 * In Special:Ask and the #ask function: add the unit after the property to display, separated by '#'. For example, ?Area#km² displays the values of the property Area in km².
 * In the <ask> tag: add the unit after the '*' following the property to display. For example, [[height::*cm]] displays the values of the property height in cm.

Header label
The default is to display the name of the property as the header for the column of its values. You can override this to display a different header; again the syntax details are different for the different forms of queries.


 * In Special:Ask and the #ask function: add the header label after the property to display (and after any display format), separated by '='. For example, ?Population=Count of people living there or ?Area#km²=Coverage.
 * In the <ask> tag: add the header label after the property to display (and after any display format), separated by '|'. For example, [[Population::*|Count of people living there]] or [[height::*cm|How tall]].

Display does not include subproperties
As explained in "Subcategories and subproperties", querying for a property also queries against its subproperties. However, displaying the property will only show values for the property itself, not values of its subproperties. You must request display of the subproperties explicitly. In the same example as above, querying for [[worked on::Titanic]] will return the person with the subproperty directed::Titanic, but if you display the "worked on" property it will show a blank for that page.

Sorting results
Normally results are sorted by page title. You can sort results by some property, and in SMW version 1.1 you can sort by multiple properties.


 * in Special:Ask: choose the column to sort on and Descending or Ascending; in SMW version 1.1 you can add additional sorting conditions.
 * in the #ask function: specify the column to be sorted as the parameter sort=property name, separated from other parameters by the | (pipe symbol). In SMW 1.1 you can sort on multiple properties by separating them with commas.
 * in the <ask> tag: specify the column to be sorted by inside the ask tag: <ask sort="population">

Ascending or descending order can be chosen by specifying order="ascending" (or "asc", "descending", or "desc") in the same way. If you sort on multiple properties, you can specify the order for each one, separated by commas. If you specify one more sort order than there are sort properties, the additional order applies to an alphabetical sort by the main result column (the page title); for example, an #ask function with sort=date of birth | order=asc,desc will sort by date of birth, oldest first, and then by page name in reverse alphabetical order.
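The effect of sorting by date of birth ascending and then by page title descending can be reproduced with a short Python sketch. The rows are invented, and the two-pass stable sort is simply one way to model the resulting order; it says nothing about how SMW sorts internally.

```python
# Hypothetical result rows: (page title, date of birth as an ISO date string).
rows = [
    ("Alice", "1970-05-01"),
    ("Bob", "1965-03-12"),
    ("Carol", "1970-05-01"),
]

# Sort keys are applied from least to most significant. Python's sort is
# stable, so the earlier ordering survives among rows with equal dates.
rows.sort(key=lambda row: row[0], reverse=True)  # page title, descending
rows.sort(key=lambda row: row[1])                # date of birth, ascending

print([title for title, _ in rows])  # ['Bob', 'Carol', 'Alice']
```

"Carol" precedes "Alice" because the two share a date of birth and the tie is broken by page title in reverse alphabetical order.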

You can also click the little sorter-icons in the header of a results table, but note that this JavaScript sort in the UI only sorts the results visible on the current page, not all results. Also, it has some "smarts" to sort numerically or alphabetically but its sorting and collating will not be identical to a semantic search's sort order.

Using templates and variables
You can use arbitrary templates and variables in a query. One particularly useful variable for inline queries is {{FULLPAGENAME}} for the current page with namespace, which allows you to reuse a generic query on many pages. For an example of this, see Property:Population. Read about inline queries for more information.

Another example is a selection criterion that displays all future events based on the current date: [[end date::>{{CURRENTYEAR}}-{{CURRENTMONTHNAME}}-{{CURRENTDAY}}]]

Linking to Semantic Search Results
The easiest way to do this is to create a page with the semantic search in an inline query (see next help section). If you want to link to the results of a query in Special:Ask, you need to handle the ?, [, and ] characters in its URL. To hide the ? introducing the query parameters, use a template like Wikipedia's Querylink. To escape the brackets, use &#91; and &#93; to represent them.
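If such links are generated by a script, the bracket escaping is a plain string replacement. The following Python helper is a minimal sketch of that step only; the function name `escape_brackets` is hypothetical, and building the rest of the Special:Ask URL is not covered here.

```python
def escape_brackets(query):
    """Replace [ and ] with the character references &#91; and &#93;."""
    return query.replace("[", "&#91;").replace("]", "&#93;")

print(escape_brackets("[[Located in::Germany]]"))
# &#91;&#91;Located in::Germany&#93;&#93;
```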

Limited "Reasoning"

 * When querying for a category, SMW also queries its subcategories (a subcategory is identified by assigning the parent category in the child category's article).
 * When querying for a property, SMW also queries its subproperties (a subproperty is identified by Property:Subproperty of in the child property's article).

These queries are computationally expensive, so administrators may disable them on a particular wiki.

Although you can create and use properties to annotate many other features of a property (that it is transitive, the inverse of another, broader/narrower than another, etc.), and you can even link these to well-known properties in ontologies such as OWL, RDFS, SKOS, etc., SMW does not use these annotations to perform smarter queries. You must craft queries and subqueries that query for these. The sample pages Germany and California show examples of queries for inverse relationships; the sample page Germany shows an example of a subquery for a transitive relationship.

Redirected Properties not queried
If one property is a redirect to another, searching for that property won't find articles using the other, and requesting display of one won't display the other.

You could use Property:Subproperty of to make one a subproperty of the other, which will result in it showing up in the other's queries.

No Subqueries for Properties
You can't use a subquery to get a list of properties to query against. You can query to find a list of properties and copy and paste them into a query.

No queries of Special Properties
You can't query for the values of SMW's built-in Special properties such as Allows value or Equivalent URI.