Archive:Semantic search 1.0
This page contains outdated information and is thus OBSOLETE!
Semantic MediaWiki includes an easy-to-use query language which enables users to access the wiki's knowledge. The syntax of this query language is similar to the syntax of annotations in Semantic MediaWiki. This query language can be used on the special page Special:Ask or in inline queries.
Naturally, answering queries requires additional resources, and the administrators of some sites can decide to switch off or restrict most of the features given below in order to ensure that even high-traffic sites can handle the additional load.
Introduction
Semantic queries specify two things:
- Which pages to select
- What information to display about those pages
All queries must state some conditions that describe what is asked for. You can select pages by name, namespace, category, and most importantly by property values. For example, the query:
[[Located in::Germany]]
is a query for all pages with the "Located in" property with a value of "Germany". If you enter this in Special:Ask and click "Find results", SMW executes the query and displays results as a simple table of all matching page titles. If there are many results, they can be browsed via the navigation links at the top and bottom of the query results (for example, a query for all persons on ontoworld.org).
The second point, what information to display, matters when you want more than page titles. In the example above, one might be interested in the population of the things located in Germany. To display this as well as the page titles, change the query to:
[[Located in::Germany]] [[population::*]]
and SMW displays the same page titles and the values of the Population property on those pages, if any.
Both points are explained in more detail in the sections below.
Page selection
By category or property value
In the example above, we gave the single condition [[Located in::Germany]] to describe which pages we were interested in. The markup text is exactly what you would otherwise write to assert that some page has this property and value. Putting it in a semantic query makes SMW return all such pages (actually some more; but read on). This is a general scheme: The syntax for asking for pages that satisfy some condition is exactly the syntax for explicitly asserting that this condition holds.
The following queries show what this means:
- [[Category:Actor]] gives all pages directly or indirectly (through a sub-, subsub-, etc. category) in the category.
- [[born in::Boston]] gives all pages annotated as being about someone born in Boston.
- [[height::180cm]] gives all pages annotated as being about someone having a height of 180cm.
By using other categories, relations, or attributes than above, we can already ask for pages which have certain annotations. Next let us combine those requirements:
[[Category:Actor]] [[born in::Boston]] [[height::180cm]]
asks for everybody who is an actor and was born in Boston and is 180cm tall. In other words: when many conditions are written into one query, the result is narrowed down to those pages that meet all the requirements. Thus we have a logical AND. By the way: queries can also include line breaks in order to make them more readable. So we could as well write:
[[Category:Actor]]
[[born in::Boston]]
[[height::180cm]]
to get the same result as above. Note that queries only return the articles that are positively known to satisfy the required properties: if there is no property for the height of some actor, that actor will not be selected.
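The AND behaviour described above can be sketched with a small Python model. This is only an illustration of the semantics, not how SMW is implemented; the page names and the PAGES dictionary are made up for the example.

```python
# Toy model: each page maps property names to the set of values annotated on it.
# Page names and values are hypothetical.
PAGES = {
    "Alice": {"Category": {"Actor"}, "born in": {"Boston"}, "height": {"180cm"}},
    "Bob": {"Category": {"Actor"}, "born in": {"Boston"}},  # no height annotation
    "Carol": {"Category": {"Actor"}, "born in": {"Chicago"}, "height": {"180cm"}},
}

def select(pages, conditions):
    """Return the names of pages that satisfy every (property, value) condition."""
    return sorted(
        name
        for name, annotations in pages.items()
        if all(value in annotations.get(prop, set()) for prop, value in conditions)
    )

# Mirrors [[Category:Actor]] [[born in::Boston]] [[height::180cm]]
result = select(PAGES, [("Category", "Actor"), ("born in", "Boston"), ("height", "180cm")])
print(result)  # ['Alice']
```

"Bob" is left out because no height is known for him, which matches the rule that only pages positively known to satisfy all conditions are returned.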
Wildcards and disjunctions
In the examples above, we gave very concrete conditions, using "Actor", "Boston", and "180cm" as the value for the category or property. It is possible to weaken these conditions in several ways.
Wildcards are written as "+" and allow any value for a given condition. For example, [[born in::+]] returns all pages that have annotations for the property "born in". For categories, this feature makes little sense: [[Category:+]] just returns everything that has some category.
Disjunctions are written as "||" and allow queries to require (at least) one out of several possible fillers. For example, [[Category:Musical actor||Theatre actor]] retrieves everything that is a musical actor or a theatre actor. This also includes everything that is both, i.e. we really have a logical OR here. We can also specify multiple values for a property, e.g. [[born in::Boston||New York]].
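As a rough illustration of these two ways of weakening a condition, the following Python sketch treats "+" as "any value present" and "||" as "at least one of the listed values". The data and the helper names (ANY, select) are assumptions made for the example, not part of SMW.

```python
ANY = object()  # stands in for the "+" wildcard

# Hypothetical annotations: page name -> property -> set of values.
PAGES = {
    "Alice": {"born in": {"Boston"}},
    "Bob": {"born in": {"New York"}},
    "Carol": {"born in": {"Chicago"}},
    "Dave": {},  # no "born in" annotation at all
}

def select(prop, allowed):
    """Pages whose property has any value (ANY) or at least one of the allowed values."""
    selected = []
    for name, annotations in PAGES.items():
        values = annotations.get(prop, set())
        if (allowed is ANY and values) or (allowed is not ANY and values & set(allowed)):
            selected.append(name)
    return sorted(selected)

either_city = select("born in", ["Boston", "New York"])  # [[born in::Boston||New York]]
any_city = select("born in", ANY)                        # [[born in::+]]
print(either_city)  # ['Alice', 'Bob']
print(any_city)     # ['Alice', 'Bob', 'Carol']
```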
Subqueries
Enumerating multiple pages for a property is cumbersome and hard to maintain. For instance, to select all actors that were born in an Italian city you could write:
[[Category:Actor]] [[born in::Rome||Milan||Turin||Florence||...]]
To generate a list of all these Italian cities you could run another query:
[[Category:City]] [[located in::Italy]]
and copy and paste the results into the first query. What you would like to do is use the city query within the actor query to generate the complex set of pages. This is called a subquery. Instead of a fixed list of page names for the property's value, you enter a new query enclosed in <q> and </q> within the property condition. In this example, you combine and write:
[[Category:Actor]] [[born in::<q>[[Category:City]] [[located in::Italy]]</q>]]
Arbitrary levels of nesting are possible, though nesting might be restricted for a particular site to improve performance.
For another example, to select all cities of the European Union you could write:
[[Category:Cities]] [[located in::<q>[[Category:Country]] [[member of::European Union]]</q>]]
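Conceptually, a subquery is evaluated first and its result set is then used as the list of accepted values in the outer condition. The Python sketch below shows that idea with made-up pages; it is a simplified model under that assumption and says nothing about how SMW actually executes nested queries.

```python
# Hypothetical annotations: page name -> property -> set of values.
PAGES = {
    "Rome": {"Category": {"City"}, "located in": {"Italy"}},
    "Milan": {"Category": {"City"}, "located in": {"Italy"}},
    "Paris": {"Category": {"City"}, "located in": {"France"}},
    "Anna": {"Category": {"Actor"}, "born in": {"Rome"}},
    "Ben": {"Category": {"Actor"}, "born in": {"Paris"}},
}

def select(conditions):
    """Pages meeting all conditions; each condition is (property, set of accepted values)."""
    return {
        name
        for name, annotations in PAGES.items()
        if all(annotations.get(prop, set()) & accepted for prop, accepted in conditions)
    }

# Inner query: [[Category:City]] [[located in::Italy]]
italian_cities = select([("Category", {"City"}), ("located in", {"Italy"})])
# Outer query: [[Category:Actor]] [[born in::<q>...</q>]]
actors = select([("Category", {"Actor"}), ("born in", italian_cities)])
print(sorted(actors))  # ['Anna']
```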
Subcategories and subproperties
Conditions with categories are generally simple, but they are more powerful than they might at first appear:
When selecting pages by category, the result also includes all pages that are in subcategories of this category.
For example, assume that we have a category "Theatre actor" which is a subcategory of "Actor". Then the query [[Category:Actor]] will also return the specialized actors that are in the category "Theatre actor" only. This makes sense in many situations, but you can still view the pages that were directly put into the category "Actor" by just going to the page of that category.
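The subcategory rule can be pictured as following "is a subcategory of" links upwards from each page's categories. The Python sketch below uses a hypothetical hierarchy ("Musical actor" under "Theatre actor" under "Actor") and invented page names; it only illustrates the selection rule.

```python
# Hypothetical hierarchy: child category -> its parent categories.
PARENTS = {"Theatre actor": {"Actor"}, "Musical actor": {"Theatre actor"}}

# Hypothetical pages and the categories they are directly assigned to.
PAGE_CATEGORIES = {
    "Anna": {"Theatre actor"},
    "Ben": {"Actor"},
    "Cleo": {"Musical actor"},
    "Rome": {"City"},
}

def ancestors(category):
    """Return the category itself plus all direct and indirect parent categories."""
    seen, stack = set(), [category]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen

def in_category(category):
    """Pages assigned to the category directly or through any subcategory."""
    return sorted(
        page
        for page, categories in PAGE_CATEGORIES.items()
        if any(category in ancestors(c) for c in categories)
    )

print(in_category("Actor"))          # ['Anna', 'Ben', 'Cleo']
print(in_category("Theatre actor"))  # ['Anna', 'Cleo']
```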
Semantic MediaWiki has a related feature, subproperties. You can declare one property to be a subproperty of another by annotating it with Property:Subproperty of.
When selecting pages by some constraint on a property, the result also includes all pages that have subproperties of this property.
For example, assume that in a movie wiki we have properties "directed" and "wrote screenplay of" that are annotated as subproperties of the more general property "worked on". Then the query [[worked on::Titanic]] will also return people with the specialized relationships to this movie. This is only for selection, not for display: if you request the property to be displayed in search results, you will not see values for its subproperties. Continuing this example, if you display "worked on", you will not see values for its subproperty "directed" unless you request display of both properties.
Equivalent names and redirects
In MediaWiki, if an article has multiple names, or if it has alternative punctuation, capitalization or spellings, you can make pages for the alternative names that are redirects to one main article. Semantic MediaWiki takes redirects into account when searching for property values:
When selecting pages by constraining a property to some page name, the result also includes all pages that have the property value set to page names that are equivalent according to MediaWiki redirects.
For example, if the page "The Golden State" is a redirect to "California", then querying for [[Located in::California]] will also find pages annotated with "Located in::The Golden State", and vice-versa.
Comparators
The thing following the :: is the value for the property that must match, in other words it's a test for equality. For each datatype SMW attempts to regularize the property before making the comparison:
- for strings it trims leading and trailing whitespace
- for wiki pages it handles capitalizing and underscores for spaces
- for numbers it normalizes thousand separators, decimal point, and scientific notation
- for numbers with units it converts the number into the primary unit (which will lead to rounding in the calculation and display of the converted value)
By adding other symbols after the :: before the value, you can use a different comparator than equality.
- > and < (greater than and less than signs): greater than or equal, and less than or equal
- ! (exclamation mark): not equal
- ~ (tilde): "like" comparison for strings
In SMW 1.0, these comparators do not work for properties of datatype Page or for conditions on categories.
A wiki installation can limit which comparators are available. By default, ~ for "like" is not enabled; to enable it, an administrator must modify $smwgQComparators in the file SMW_Settings.php.
Greater than or equal, less than or equal comparators
With numeric values, you often want to select pages with property values within a certain range. For example
[[Category:Actor]] [[height::>6 ft]] [[height::<7 ft]]
asks for all actors that are between 6 feet and 7 feet tall. Note that this takes advantage of the automatic unit conversion: even if the height of the actor was set with [[height::195cm]] it would be recognized as a correct answer (provided that the datatype for height understands both units, see Help:custom units).
Note that > and < already mean "greater than or equal" and "less than or equal". Do not add = to them.
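The following Python sketch models the height range above, including the inclusive bounds and the unit conversion. It is a simplified assumption-laden model: only "cm" and "ft" are handled, and the helper names to_cm and in_range are invented for the example.

```python
CM_PER_FT = 30.48  # exact definition of the international foot in centimetres

def to_cm(text):
    """Convert '195cm' or '6 ft' to centimetres (only these two units are handled)."""
    text = text.strip()
    if text.endswith("cm"):
        return float(text[:-2])
    if text.endswith("ft"):
        return float(text[:-2]) * CM_PER_FT
    raise ValueError(f"unsupported unit in {text!r}")

def in_range(value, lower, upper):
    """Mirror [[height::>lower]] [[height::<upper]]; both bounds are inclusive."""
    return to_cm(lower) <= to_cm(value) <= to_cm(upper)

print(in_range("195cm", "6 ft", "7 ft"))  # True
print(in_range("6 ft", "6 ft", "7 ft"))   # True, the lower bound itself is included
print(in_range("170cm", "6 ft", "7 ft"))  # False
```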
Such range conditions on a property are mostly relevant if its values can be ordered in a natural way. For example, it makes sense to ask [[start date::>May 6 2006]] but it is not really helpful to say [[homepage URL::>http://www.somewhere.org]].
If a datatype has no natural linear ordering, Semantic MediaWiki will just apply the alphabetical order to the normalized datavalues as they are used in the RDF export. You can thus use greater than and less than to select alphabetic ranges of a string property. For example, you could ask [[surname::>Do]] [[surname::<G]] to select surnames from "Do" up to "G".
Not equal comparator
You can select pages whose property is not equal to some value. For example, [[Area code::!415]] will select pages whose area code is not "415". As with the (default) equality comparator, rounding in numeric conversions can lead to unexpected results; for example, [[height::!6.00 ft]] may still select someone whose height displays as "6.00 feet" in alternate units.
Like comparator
This comparator only works for properties of datatype "String" (see Help:Datatype).
In a like condition you use ' * ' wildcards to match any sequence of characters and ' ? ' to match any single character. For example, you could ask [[Address::~*Park Place*]] to select addresses containing "Park Place", or [[Honorific::~M?.]] to select both "Mr." and "Ms.".
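To make the wildcard rules concrete, here is a small Python sketch that translates a like pattern into a regular expression. The function names are hypothetical, and the sketch assumes case-sensitive matching against the whole value; how SMW actually compares (for example with respect to case) may depend on the installation.

```python
import re

def like_to_regex(pattern):
    """Translate a like pattern: '*' matches any sequence, '?' any single character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """True if the whole value matches the like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

print(like("12 Park Place, Springfield", "*Park Place*"))  # True
print(like("Mr.", "M?."))   # True
print(like("Ms.", "M?."))   # True
print(like("Mrs.", "M?."))  # False, '?' stands for exactly one character
```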
Direct conditions on pages
So far, all conditions depended on some annotation given within a page. But there are also conditions that directly select some pages, or pages from a given namespace.
Directly giving some page title (possibly including a namespace prefix), or a list of such page titles separated by ||, selects the pages with those names. For example,
[[Brazil||France||User:John Doe]]
selects exactly these three pages. Note that the result does not display any namespace prefixes; to determine the namespace, see the hover box or status bar of the browser, or follow the links. Direct selection can be combined with a property condition to restrict the set, e.g. to ask "Who of Bill Murray, Dan Aykroyd, Harold Ramis and Ernie Hudson is taller than 6ft?". But direct selection of pages is most useful if further properties of those pages are asked for, as is described below.
To select a category, you must put a ":" before the category name; this avoids confusing [[Category:Actor]] (return all actors) and [[:Category:Actor]] (return the category "Actor").
Restricting to a namespace
A less strict way of selecting given pages is via namespaces. The default is to return pages in every namespace. To return pages in a particular namespace, specify the namespace with a wildcard, e.g. write
[[Help:+]]
to return every page in the "Help" namespace. Since the main namespace usually has no prefix, write [[:+]] to select pages in the main namespace. For example, to return pages in either the main or "User" namespace, write [[:+||User:+]]. To return pages in the "Category" namespace, again you need a ":" in front of the namespace label to prevent confusion.
Information Display
Queries return a list of pages, and the default result is to simply display the pages' titles. You can specify additional properties of the pages to display and also display the pages' categories. The way you do this is different in the Special:Ask page and the two forms of inline queries.
in Special:Ask
In the "Additional printouts" form field on the web page, just list each additional property you want to display prepended with ? (question mark), one per line, e.g.
?Population
?Has capital
Use ?Category to display all the categories to which the page is directly assigned.
in the {{#ask:}} function
List each additional property you want to display prepended with ? (question mark), separated by | (pipe symbol). For example
{{#ask: [[Population::+]] | ?Population | ?Has capital | ?Category }}
in the <ask> tag
You add statements such as [[population::*]] to show the value of the population property (if any) of the selected pages. Using * as "filler" indicates that this statement does not specify a condition for the selection of pages, but only specifies what should be displayed about the selected pages. For example the query above can also be written as
<ask> [[Population::+]] [[Population::*]] [[Has capital::*]] [[Category:*]] </ask>
Note how this also works for categories.
What is displayed
Display statements do not restrict the selection. If you request display of the "height" property and a selected page has no value for it, the page is still in the selection, so the result will display an empty field. Likewise, if some article has been assigned many different values for one property, all of them will be displayed.
Thus a common idiom when you want to display all values of some property is to select only pages that have a value for that property, using the '+' wildcard constraint, for example using the {{#ask:}} function syntax:
{{#ask: [[height::+]] | ?height}}
Otherwise you'll display every page in the wiki, most of which will have no value for "height".
Display format
For attributes that support units, queries can also determine which unit should be used for the output. Just mention one of the supported units after the property name. The syntax details are different for the different forms of queries.
- In Special:Ask and the {{#ask:}} function
- Add the unit after the property to display, separated by ' # '. For example ?Area#km² displays the values of the property Area in km².
- In the <ask> tag
- Add the unit after the '*' following the property to display. For example [[height::*cm]] displays the values of the property height in cm.
Header label
The default is to display the name of the property as the header for the column of its values. You can override this to display a different header; again the syntax details are different for the different forms of queries.
- In Special:Ask and the {{#ask:}} function
- Add the header label after the property to display (and after any display format), separated by ' = '. For example ?Population=Count of people living there or ?Area#km²=Coverage.
- In the <ask> tag
- Add the header label after the property to display (and after any display format), separated by ' | '. For example [[Population::*|Count of people living there]] or [[height::*cm|How tall]].
Display does not include subproperties
As explained in #Subcategories and subproperties, querying for a property also queries against its subproperties. However, displaying the property will only show values for the property itself, not values of its subproperties. You must request display of the subproperties. In the same example above, querying for [[worked on::Titanic]] will return the person with the subproperty [[directed::Titanic]], but if you display the "worked on" property it will show a blank for that page.
Sorting results
Normally results are sorted by page title. You can sort results by some property, and in SMW version 1.1 you can sort by multiple properties.
- in Special:Ask
- choose the column to sort on and Descending or Ascending; in SMW version 1.1 you can add additional sorting conditions.
- in the {{#ask:}} function
- specify the column to be sorted as the parameter sort=property name , separated from other parameters by the | (pipe symbol). In SMW 1.1 you can sort on multiple properties by separating them with commas.
- in the <ask> tag
- specify the column to be sorted by inside the ask-tag: <ask sort="population">
Ascending or descending order can be chosen by specifying order="ascending" (or "asc", "descending", or "desc") in the same way. If you sort on multiple properties, you can specify the order for each one separated by commas. If you specify an additional sort order, then that leads to an alphabetical sort by the main result column (the page title); for example, an {{#ask:}} function with sort=date of birth | order=asc,desc will sort by date of birth, oldest first, and then by page name in reverse alphabetical order.
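The effect of sort=date of birth | order=asc,desc can be imitated in Python with two stable sorts, applying the secondary key first. The people listed are invented, and the sketch only illustrates the resulting order, not SMW's own sorting or collation.

```python
# Hypothetical result rows; ISO dates sort chronologically as plain strings.
people = [
    {"page": "Ann", "date of birth": "1970-05-01"},
    {"page": "Zoe", "date of birth": "1970-05-01"},
    {"page": "Bob", "date of birth": "1965-02-11"},
]

# Python's sort is stable, so sort by the secondary key first (page name, descending),
# then by the primary key (date of birth, ascending).
by_page_desc = sorted(people, key=lambda p: p["page"], reverse=True)
result = sorted(by_page_desc, key=lambda p: p["date of birth"])

print([p["page"] for p in result])  # ['Bob', 'Zoe', 'Ann']
```

"Bob" comes first because he is the oldest; "Zoe" and "Ann" share a date of birth and therefore appear in reverse alphabetical order.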
You can also click the little sorter-icons in the header of a results table, but note that this JavaScript sort in the UI only sorts the results visible on the current page, not all results. Also, it has some "smarts" to sort numerically or alphabetically but its sorting and collating will not be identical to a semantic search's sort order.
Using templates and variables
You can use arbitrary templates and variables in a query.
One particularly useful variable for inline queries is {{FULLPAGENAME}} for the current page with namespace, which allows you to reuse a generic query on many pages.
For an example of this, see Property:Population.
Read about inline queries for more information.
Another example is a selection criterion that displays all future events based on the current date:
[[Category:Event]] [[end date::>{{CURRENTYEAR}}-{{CURRENTMONTH}}-{{CURRENTDAY}}]]
Linking to Semantic Search Results
The easiest way to do this is to create a page with the semantic search in an inline query (see next help section). If you want to link to the results of a query in Special:Ask, you need to handle the ?, [, and ] characters in its URL. To hide the ? introducing the query parameters, use a template like Wikipedia's Querylink. To escape the brackets, represent [ and ] by their percent-encoded forms %5B and %5D.
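If you build such links outside the wiki, the bracket escaping can be done with standard URL percent-encoding. The Python sketch below only encodes the query text itself; the URL parameters that Special:Ask expects are not shown here because they are not described on this page.

```python
from urllib.parse import quote, unquote

query = "[[Category:Actor]] [[born in::Boston]]"

# Percent-encode everything that is not an unreserved URL character:
# "[" becomes %5B, "]" becomes %5D, ":" becomes %3A and a space becomes %20.
encoded = quote(query)
print(encoded)
# %5B%5BCategory%3AActor%5D%5D%20%5B%5Bborn%20in%3A%3ABoston%5D%5D

# Decoding gives back the original query text.
print(unquote(encoded) == query)  # True
```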
Limitations
Limited "Reasoning"
- When querying for a category, SMW also matches pages in its subcategories (identified by assigning the parent category in the child category's article).
- When querying for a property, SMW also matches pages using its subproperties (identified by Property:Subproperty of in the child property's article).
These queries are computationally expensive, so administrators may disable them on a particular wiki.
Although you can create and use properties to annotate many other features of a property (that it is transitive, the inverse of another, broader/narrower than another etc.), and you can even link these to well-known properties in ontologies such as OWL, RDFS, SKOS, etc., SMW does not use these annotations to perform smarter queries. You must craft queries and subqueries that query for these. The sample pages Germany and Demo:California show examples of queries for inverse relationships; the sample page Germany shows an example of a subquery for a transitive relationship.
Redirected Properties not queried
If one property is a redirect to another, searching for that property won't find articles using the other, and requesting display of one won't display the other.
You could use Property:Subproperty of to make one a subproperty of the other, which will result in it showing up in the other's queries.
No Subqueries for Properties
You can't use a subquery to get a list of properties to query against. You can query to find a list of properties and copy and paste them into a query.
No queries of Special Properties
You can't query for the values of SMW's built-in Special properties such as Allows value or Equivalent URI.