From Triples to Text: LLM(RAG)-Based Approach to Querying Wikibase

MediaWiki Users and Developers Conference Fall 2025
Talk details
Description: Search wiki content in natural language using Retrieval-Augmented Generation (RAG). An implementation built on LlamaIndex and connected to Wikibase content and MediaWiki articles.
Speaker(s): Kolja Bailly
Type: Talk
Audience: Developers, Community, Business people, Academics, Admins
Event start: 2025/10/29 15:00:00
Event finish: 2025/10/29 15:30:00
Length: 30 minutes
Video: click here
Keywords: AI, LLM, RAG, Wikibase

The Open Science Lab (OSL) at TIB Hannover develops open-source solutions for managing research data with Wikibase, an extension of the MediaWiki software suite. This presentation shows how AI-based approaches can be integrated into MediaWiki using Retrieval-Augmented Generation (RAG), a methodology that allows Large Language Models (LLMs) to interact with custom data sources. The llama_index_mediawiki-service is a containerized solution, based on the LlamaIndex framework, that runs an LLM to enhance the usability and accessibility of data hosted on MediaWiki instances. Computational resources can be drawn from remote services such as the Hugging Face API or GWDG SAIA, or the service can run locally, which preserves user privacy by keeping all data on the local machine. The service returns context-aware responses to natural-language user queries or supports the user in creating SPARQL queries. OSL has updated the service to index data stored in several structured formats, including MediaWiki pages and Wikibase statements. With LlamaIndex, a vector index can be created that stores data from the wiki instance in a form that allows comparison by semantic similarity. A demo instance of the service has been applied to a wiki instance containing data about historic manor houses in the Baltic Sea Region, a joint project between OSL and the University of Greifswald. While still in development, this demo is a promising step towards easy-to-use, free-of-charge, open-source LLM integration in MediaWiki.
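The retrieval step described above can be illustrated with a minimal, self-contained Python sketch. It is not the code of the llama_index_mediawiki-service: the statements, the labels, and the function names `verbalize`, `embed`, `retrieve`, and `build_prompt` are hypothetical, and a toy bag-of-words vector stands in for the embedding model and vector index that LlamaIndex would provide in a real deployment. The sketch only shows the general idea: turn statements into text, rank them by similarity to the question, and hand the best matches to an LLM as context.

```python
import math
import re
from collections import Counter

# Hypothetical Wikibase-style statements (subject, property, value).
# The labels are invented for illustration and are not taken from the demo wiki.
TRIPLES = [
    ("Manor A", "located in", "Estonia"),
    ("Manor A", "architectural style", "Baroque"),
    ("Manor B", "located in", "Latvia"),
    ("Manor B", "architectural style", "Neoclassical"),
]


def verbalize(triple):
    """Turn one (subject, property, value) statement into a plain sentence."""
    subject, prop, value = triple
    return f"{subject} {prop} {value}."


def embed(text):
    """Toy embedding: bag-of-words counts (assumption: stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(triples):
    """Pair every verbalized statement with its vector (a tiny 'vector index')."""
    sentences = [verbalize(t) for t in triples]
    return [(sentence, embed(sentence)) for sentence in sentences]


def retrieve(question, index, top_k=2):
    """Return the top_k sentences most similar to the question."""
    query_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [sentence for sentence, _ in ranked[:top_k]]


def build_prompt(question, passages):
    """Assemble the context-augmented prompt that would be sent to an LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    index = build_index(TRIPLES)
    question = "Which architectural style does Manor B have?"
    print(build_prompt(question, retrieve(question, index)))
```

Swapping `embed` for a learned embedding model would let the ranking capture meaning rather than word overlap, which is what comparison by semantic similarity refers to; the rest of the flow would stay the same.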


GitLab repository: https://gitlab.com/nfdi4culture/wikibase4research/wikibase-RAG