Hey Wiki!

From semantic-mediawiki.org
SMWCon Fall 2023
Talk details
Description: Connecting SMW with a local AI in a ChatGPT-like way.
Speaker(s): Alexander Gesinn
Type: Talk
Event start: 2023/12/11 16:30:00
Length: 20 minutes
Video: click here

In this talk, we show how to run a container-virtualised service that hosts a local Large Language Model (LLM) to assist wiki users. The LLM is intended to act as a chatbot that is aware of the content hosted on the wiki. For privacy reasons, that content is not sent to a third-party service provider.

The service answers user questions such as "Where did the last 10 Semantic MediaWiki conferences take place?" or "Provide a top 5 list of event series with most events."

Building Blocks

  • MediaWiki - An open source wiki
  • Wiki Database - A data store for the wiki, for example MySQL
  • LlamaIndex - A data framework for connecting custom data sources to large language models (LLMs)
  • A publicly available LLM
  • A Vector Index created by LlamaIndex
Figure: SMWCon 2023 - Hey Wiki Architecture


LlamaIndex lets you connect to your own data sources and supplement what an LLM already knows from its training with that data. This is often referred to as Retrieval-Augmented Generation (RAG). RAG allows you to use LLMs to query, transform and generate new insights from your data.

In the context of RAG, the retrieval component retrieves relevant information from a knowledge base based on the original prompt. The retrieved information is then incorporated into the prompt before generating a final response. This augmentation allows the model to leverage external knowledge to enhance the quality, relevance, and contextuality of the generated content. Combining retrieval and generation in this way enables a more nuanced and informed response than a generative model that relies on its training data alone.
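
The retrieve-then-augment flow described above can be illustrated with a minimal sketch. It is not the code from the talk and does not use LlamaIndex: the helpers `tokenize`, `retrieve` and `build_prompt`, the word-overlap scoring, the prompt wording and the sample wiki snippets are all assumptions made for illustration. A real setup would pass the resulting prompt to the local LLM.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, knowledge_base: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k passages that share the most words with the question."""
    q_tokens = tokenize(question)
    scored = [(len(q_tokens & tokenize(p)), p) for p in knowledge_base]
    # Drop passages with no overlap; the stable sort keeps input order on ties.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [passage for _, passage in ranked[:top_k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Incorporate the retrieved passages into the prompt ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Hypothetical wiki snippets, invented for illustration only.
KNOWLEDGE_BASE = [
    "Page Alpha describes an event held in City A.",
    "Page Beta is a help page about templates.",
    "Page Gamma describes an event held in City B.",
]

question = "Where was the event held?"
passages = retrieve(question, KNOWLEDGE_BASE)
prompt = build_prompt(question, passages)
print(prompt)
```

Word overlap is used here only because it needs no model. In practice the retrieval step typically compares vector embeddings instead.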

Vector Index

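In LlamaIndex terms, a Vector Index is a data structure built from document objects that enables querying by an LLM. Each piece of content is stored together with a vector embedding, so that the content most similar to a question can be looked up and handed to the LLM.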
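
To make the idea of a vector index concrete, here is a toy sketch in plain Python. It is not the LlamaIndex implementation: `Document`, `ToyVectorIndex`, `embed` and `cosine` are hypothetical names, and word counts stand in for the embeddings that an embedding model would normally produce. The sample pages are invented.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """A piece of wiki content plus a title to identify it."""
    title: str
    text: str


def embed(text: str) -> Counter:
    """Toy embedding (assumption): a sparse vector of word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ToyVectorIndex:
    """Stores one (vector, document) pair per document."""
    entries: list[tuple[Counter, Document]] = field(default_factory=list)

    @classmethod
    def from_documents(cls, documents: list[Document]) -> "ToyVectorIndex":
        return cls([(embed(d.text), d) for d in documents])

    def query(self, question: str, top_k: int = 1) -> list[Document]:
        """Return the top_k documents whose vectors are closest to the question."""
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [doc for _, doc in ranked[:top_k]]


# Hypothetical pages, invented for illustration only.
docs = [
    Document("Page Alpha", "conference schedule and venue"),
    Document("Page Beta", "template syntax help"),
]
index = ToyVectorIndex.from_documents(docs)
best = index.query("which venue hosts the conference")
print(best[0].title)
```

The documents returned by `query` are what a RAG pipeline would place into the prompt before asking the LLM for an answer.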