Semantic MediaWiki's unconstrained schema approach lets users create and define properties freely. With that freedom, conceptually identical or near-duplicate properties (similar properties) can arise and be used for value annotations without being detected by an agent engaged in a data curation[1] task.

Several methods can help prevent syntactic similarity issues in the first place, such as:

  • Use of templates to formalize user input
  • Use of #REDIRECT to build a pool of synonyms around a canonical property and allow them to be merged[2] into a coherent extension of that property's semantics.

Syntactically similar properties should be cleaned up and removed during the task called semantic gardening if they do not in fact differ from each other. See an example of this on the sandbox wiki.[3] Semantic MediaWiki 2.5.0 introduced syntactic property similarity evaluation together with the special page "PropertyLabelSimilarity", which displays syntactically similar properties and assists with semantic gardening.[4]


Configuration parameter $smwgSimilarityLookupExemptionProperty sets the property used to exclude a property from being evaluated during similarity checks. Annotating a property page with it declares an exemption condition, so that the property is left out of the syntactic similarity evaluation. By default this property is called "owl:differentFrom".

For example, on the property page "Governance level" one may annotate [[owl:differentFrom::Governance level of]], which suppresses the similarity lookup for the properties "Governance level" and "Governance level of" when they are compared to each other. The annotation makes clear that these two properties are syntactically similar but conceptually different, so they will not be shown on the special page "PropertyLabelSimilarity". See the respective example on the sandbox wiki.[5]
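
The effect of such an exemption can be illustrated with a small sketch. It is not Semantic MediaWiki's implementation: the function name `similar_label_pairs`, the 0.8 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration. The sketch only shows the idea that a declared "different from" pair is skipped symmetrically before any similarity score is considered.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_label_pairs(labels, exempt_pairs, threshold=0.8):
    """Return label pairs whose similarity ratio reaches the threshold.

    `exempt_pairs` plays the role of owl:differentFrom annotations
    (hypothetical modelling): a pair listed there is never reported,
    regardless of which of the two properties carries the annotation.
    """
    # frozenset makes the exemption symmetric: (a, b) equals (b, a)
    exempt = {frozenset(pair) for pair in exempt_pairs}
    result = []
    for a, b in combinations(labels, 2):
        if frozenset((a, b)) in exempt:
            continue  # declared conceptually different, skip the lookup
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            result.append((a, b))
    return result


labels = ["Governance level", "Governance level of", "Population", "Populaton"]
exemptions = [("Governance level", "Governance level of")]

for pair in similar_label_pairs(labels, exemptions):
    print(pair)
```

With the exemption in place, only the "Population" / "Populaton" pair is reported. Without it, the two "Governance level" labels would be listed as well.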

Syntactic vs. semantic similarity

Syntactic similarity is understood as a function that "analyzes the syntactic similarity of a pair of tags" using the "Levenshtein Distance, the Cosine Similarity, the Jaccard Similarity, the Jaro Distance"[6]:100, while semantic similarity analyzes the "semantic relations defined between tags as well as their frequency"[6]:101.
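
To make two of the syntactic measures named in the quotation concrete, the following is a minimal sketch of the Levenshtein distance and of a word-based Jaccard similarity, applied to property labels. The function names and the choice to compare lowercase word sets are assumptions for illustration. The sketch does not claim to reflect how Semantic MediaWiki computes similarity internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two labels."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty labels are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


print(levenshtein("Governance level", "Governance level of"))
print(jaccard("Governance level", "Governance level of"))
```

Both measures look only at the characters or words of the labels. Neither can tell that "Governance level" and "Governance level of" mean different things, which is why an explicit exemption annotation is needed for such pairs.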


