WikiApiary

WikiApiary gathers statistics and usage information about 5,500 MediaWiki installations with more being added each day. Tracking 1,935,719,662 edits and counting...

What is WikiApiary?
WikiApiary collects, graphs and analyzes information about MediaWiki websites. Once a website is registered with WikiApiary, which any registered user can do, a collection of bots will start collecting information about the versions of software being used, the amount of editing activity on the site, and the use of Semantic MediaWiki. WikiApiary intends to help people with MediaWiki websites gain more visibility into and awareness of activity, as well as to collect and aggregate information about the MediaWiki environment throughout the web.

Data
WikiApiary collects information about each site as well as aggregating and analyzing that data across all sites that are monitored. One of the design objectives of WikiApiary was to build it with MediaWiki, making the entire system work with standard MediaWiki and Semantic MediaWiki extensions. The only data stored outside of Semantic MediaWiki is the time-series edit data for each website, which is held in a separate MariaDB database and accessed with separate PHP methods. All other information for each wiki is held in three subpages for each site: General, Extensions and Skins. These pages are managed by bots and are transcluded into the main page for each wiki.
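The split described above can be sketched in a few lines. This is a minimal illustration only: WikiApiary's time-series store is MariaDB accessed from PHP, so sqlite3 stands in here to keep the example self-contained, and the table and column names are invented for the sketch, not WikiApiary's actual schema.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical stand-in for the separate time-series database. Only the
# polled edit counters live here; everything else is SMW page data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE edit_stats (
        site_id   INTEGER NOT NULL,
        captured  TEXT    NOT NULL,   -- UTC timestamp of the data point
        edits     INTEGER NOT NULL,   -- cumulative edit count reported by the API
        PRIMARY KEY (site_id, captured)
    )
""")

def record_datapoint(site_id, edits, when=None):
    """Store one polled data point for a site."""
    when = when or datetime.now(timezone.utc).isoformat()
    conn.execute("INSERT INTO edit_stats VALUES (?, ?, ?)",
                 (site_id, when, edits))

def edit_series(site_id):
    """Return the (timestamp, edits) series for one site, oldest first."""
    cur = conn.execute(
        "SELECT captured, edits FROM edit_stats WHERE site_id = ? ORDER BY captured",
        (site_id,))
    return cur.fetchall()

record_datapoint(1, 1000, "2013-01-01T00:00:00")
record_datapoint(1, 1250, "2013-01-01T04:00:00")
```

Keeping the high-volume polling data out of Semantic MediaWiki is what lets the rest of the site run on stock wiki pages and properties.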

By default WikiApiary collects usage information from each wiki every 4 hours, and version data every day (the interval is adjustable for each site, but cannot be more than 8 hours). Graphs are rendered with the dygraphs library and embedded in the wiki using the Widgets extension, which allows raw HTML to be placed in a page.
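The usage poll itself is a standard MediaWiki API call: `action=query&meta=siteinfo&siprop=statistics` returns the counters that the graphs are built from. The sketch below shows that request shape and the parsing step; the endpoint URL and the sample response values are illustrative only.

```python
import json
from urllib.parse import urlencode

def statistics_url(api_endpoint):
    """Build the API request URL that polls a wiki's usage statistics."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return api_endpoint + "?" + urlencode(params)

def parse_statistics(raw_json):
    """Pull out the counters graphed above (pages, articles, edits, jobs)."""
    stats = json.loads(raw_json)["query"]["statistics"]
    return {key: stats[key] for key in ("pages", "articles", "edits", "jobs")}

# A trimmed sample of what the API returns (numbers are made up):
sample = ('{"query": {"statistics": {"pages": 5000, "articles": 1200, '
          '"edits": 90000, "users": 300, "jobs": 2}}}')
```

Each poll yields one data point per counter, which is why the same request also covers administrator-oriented series like the job queue size.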

One of the most basic graphs WikiApiary provides includes the count of pages and articles over time.



WikiApiary also grabs information for administrators like the size of the job queue.



Administrators will also find it useful to see how quickly the API is responding.



WikiApiary monitors itself (very meta!) as well, and does so every 15 minutes. This gives an incredible amount of detail for each data point. WikiApiary could monitor websites as frequently as every minute. Here is a detail graph showing the results of some performance tuning.



For Semantic MediaWiki sites WikiApiary also supports more detailed monitoring if the administrator adds a page to the project namespace that outputs a JSON data block. Here is a graph showing the trend of semantic query sizes being used on WikiApiary.
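As a rough illustration of that monitoring, the sketch below parses a JSON block of SMW counters and derives the average query size graphed above. The exact block format WikiApiary expects is not documented here, so this payload shape (`smwinfo` with `propcount`, `querycount`, `querysize`) and its values are assumptions for the example.

```python
import json

# Hypothetical JSON data block a project-namespace page might output.
sample_block = """
{
    "smwinfo": {
        "propcount": 1500000,
        "querycount": 75000,
        "querysize": 250000
    }
}
"""

def average_query_size(block):
    """Derive the average size per semantic query from the raw counters."""
    info = json.loads(block)["smwinfo"]
    return info["querysize"] / info["querycount"]
```

Because the page is plain JSON, the bot can fetch and chart it with the same machinery it already uses for the core statistics.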



Last but not least, these graphs can be popped into their own windows. Soon these will automatically update themselves and allow you to create your own dashboard of various wikis graphs.



How is collection done?
You might be wondering how this information is collected for each site. A key part of the process is segmenting all of the websites that WikiApiary monitors. The site has a configurable setting for the number of bot segments, which is used to divide the monitored sites among them. Each minute, Bumble Bee (a Python robot) is launched from cron with a specific segment number. Right now there are 15 segments, so the robot cycles through all of the websites every 15 minutes. The bot segments page is transcluded in all relevant places, so a single change immediately updates all segment assignments.



In addition to bot segments, WikiApiary also has hour and day segments that are used for other types of scheduling, and in some cases the two are combined into a day-by-hour segment.
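The scheduling described above can be sketched as follows. In WikiApiary the segment assignments live on a wiki page; hashing the site name here is a stand-in that just gives each site a stable segment, and the site names are examples.

```python
import zlib

SEGMENT_COUNT = 15  # matches the 15 bot segments described above

def bot_segment(site_name, segments=SEGMENT_COUNT):
    """Assign a site to a bot segment. A stable hash of the name is used
    here purely so the example is deterministic and self-contained."""
    return zlib.crc32(site_name.encode("utf-8")) % segments

def sites_for_minute(sites, minute):
    """Each minute cron launches the robot with one segment number; the
    robot then polls only the sites assigned to that segment."""
    segment = minute % SEGMENT_COUNT
    return [s for s in sites if bot_segment(s) == segment]
```

Over any 15 consecutive minutes, every site falls into exactly one launch, which is what lets the robot cover the whole roster in one cycle.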

MediaWiki Platform
In addition to the detailed information about each website, WikiApiary gathers statistics about which extensions and extension versions are in use, which skins are active, and which versions of MediaWiki, MySQL and PHP are running. Some highlights:

Examples
WikiApiary has documented some great events in its first 6 months:


 * The Wikidata project has been monitored since its inception, and WikiApiary shows the growth of the site to over 9,000 active editors and over 12,000,000 pages!
 * WikiApiary monitors WikiVoyage and captured the huge jump in activity after the site was launched in mid-January 2013.

Semantic Data
WikiApiary is a large Semantic MediaWiki site in its own right. In fact, it is the 12th largest Semantic MediaWiki site tracked by WikiApiary, with over 1.5 million values stored (the largest SMW site has over 17 million properties!). The site has over 75,000 semantic queries, 40,000 of which are count queries.

Semantic Wishing
WikiApiary wouldn't exist without Semantic MediaWiki. It is an amazing platform that enables so much; however, there are some gaps, and filling them would make building something like WikiApiary much easier.


 * The addition of a distinct query format would be amazingly useful. Right now you have to collect values into arrays and have the Arrays extension deal with duplicates. This is fine for small datasets, but computing distinct values over thousands of entries is a performance burden.

Why Apiary?
An apiary is a bee yard. While considering the name for the site I kept envisioning the MediaWiki logo, a sunflower. My friend Garrick and I were brainstorming the name and imagining WikiApiary as a collection of bees (bots) flying off to thousands of sunflowers (MediaWiki instances) and bringing back information. Hence, apiary.

Future Plans
The to do list for WikiApiary is long and rich. The site has attracted a good amount of attention and usage and there is so much more that can be done. Some things you will see in the coming months:


 * I've always planned to have the site send email notifications and reports to people about their sites. For example, if you are running an outdated extension you could elect to get an email notifying you of the update, or if your user growth took an unusual turn, you could get an email.
 * Right now all graphs show the raw data collected every 4 hours. Soon this will be aggregated into daily and weekly data, which will also enable some new charts that show rate of change instead of raw numbers. Edit count isn't a great graph, but edits per day is.
 * There is an experimental feature that will back up a wiki remotely. I would like to improve it, and it will likely become a paid service at some point.
 * I want to be able to display trend graphs of the software versions in use. This will involve snapshotting statistics nightly to show the rates of adoption for various upgrades and the trend in usage by extension.
 * Spiders and dedicated bots that pull in new wikis.
 * A WikiApiary extension that brings some of the features into a wiki, and extends functionality for WikiApiary working remotely.

The list goes on and on, but that is a good sample.

Want to get involved?
Is your wiki being monitored by WikiApiary? If not, you should add it. There is a simple 5-step process outlined on the main page. If you are a wiki maven and routinely discover new wikis, add the bookmarklet to your browser and with a couple of clicks you can add a new site (it will even check for duplicates before you add it!).

If you run a wiki farm with a lot of wikis, it would be great to write a bot that makes sure all of your wikis are in WikiApiary. If you are a Python, PHP or JavaScript developer and want to help out, take a look at the GitHub repo.

Special Thanks
Obviously a big thanks is due to all of the developers of MediaWiki and the MediaWiki API. Also, the awesome developers behind Semantic MediaWiki, Semantic Forms and Semantic Result Formats. Additionally, I would like to give a special thanks to [[kgh]], who signed up very early and has helped in many, many ways with the site. I also want to thank Nemo, who added a link from the extension pages on MediaWiki.org to their respective WikiApiary pages. This is the way that most people have discovered the site. Huge thanks to both!