Property similarity

Semantic MediaWiki's schema-free approach lets users define properties freely. With that freedom, conceptually identical or near-duplicate properties can emerge and be used for value annotations without being noticed by an agent engaged in a data curation [1] task.

Syntactic similarity is understood as a function that "analyzes the syntactic similarity of a pair of tags" using the "Levenshtein Distance, the Cosine Similarity, the Jaccard Similarity, the Jaro Distance" [2]:100, while semantic similarity analyzes the "semantic relations defined between tags as well as their frequency" [2]:101.
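To make the syntactic measures more concrete, the following is a minimal Python sketch of two of them, the Levenshtein distance and the Jaccard similarity, applied to property labels. It is an illustration only and not how Semantic MediaWiki itself computes similarity. The function names, the `max_distance` threshold, and the sample labels ("Has author", "Has authors", "Has date") are assumptions made for this example.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lower-cased word sets of two labels."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty labels are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def similar_pairs(labels, max_distance=2):
    """Return (label, label, distance) for every pair of labels whose
    case-insensitive edit distance is at most max_distance (an assumed
    threshold, not a value prescribed by the source)."""
    result = []
    for first, second in combinations(labels, 2):
        distance = levenshtein(first.lower(), second.lower())
        if distance <= max_distance:
            result.append((first, second, distance))
    return result


# Hypothetical property labels, used only for illustration.
candidates = similar_pairs(["Has author", "Has authors", "Has date"])
```

With these sample labels, only the pair "Has author" / "Has authors" falls within the threshold. A low edit distance only flags a candidate for review; whether two properties really mean the same thing remains a curation decision.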

Several methods can help mitigate label similarity issues, such as:

  • Use of templates to formalize user input
  • Use of #REDIRECT to build a pool of synonyms around a canonical property and allow them to be merged [3] into a coherent extension of the property's semantics.
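The effect of the synonym pool described above can be modelled conceptually in a few lines of Python. This is a simplified sketch, not Semantic MediaWiki's implementation: the redirect table is an ordinary dictionary, the lookup is single-hop, and all property names, page names, and values are hypothetical.

```python
# Hypothetical redirect table: synonym label -> canonical property label.
REDIRECTS = {
    "Has writer": "Has author",
    "Author": "Has author",
}


def canonical(label, redirects=REDIRECTS):
    """Return the canonical property for a label, or the label itself
    when no redirect is defined for it (single-hop lookup)."""
    return redirects.get(label, label)


# Hypothetical annotations as (page, property, value) triples.
annotations = [
    ("Page A", "Has writer", "Jane Doe"),
    ("Page B", "Author", "John Roe"),
    ("Page C", "Has author", "Ann Poe"),
]

# After resolving synonyms, every value is grouped under one property.
merged = [(page, canonical(prop), value) for page, prop, value in annotations]
```

In this model, a query against the canonical property would see all three values, regardless of which synonym the annotation originally used.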

