An automated Wikibase deployment pipeline for research data management

By Paul Duchesne, Open Science Lab, TIB

Intentions and methods for deploying Wikibase test and production instances at the Open Science Lab, TIB – Leibniz Information Centre for Science and Technology


As part of our work in the context of NFDI4Culture, a project aimed at setting out standards, guidelines and practical implementations of infrastructure for research data management, the Open Science Lab have been engaging with the construction of a reliable and configurable deployment process for Wikibase. The requirements themselves have grown organically based on research needs in the context of facilitating LOD management and the annotation of complex media objects, and includes significant modification from the example Wikibase installation configuration provided by WMDE. Our method of deployment has also matured from a collection of terminal commands to a fully-automated Ansible playbook.

Simple diagram of the deployment pipeline

Deployment

Our entire deployment process is now orchestrated by Ansible. This is a tool intended to allow developers to deploy systems in a highly predictable manner, and is attractive as it requires that nothing is pre-installed on target servers. It is also extremely scalable: if a playbook can write a deployment to a single instance, it can write to a hundred.

The time-saving aspect of Ansible has also been crucial, as the deployment playbook now contains a number of discrete stages, from installing prerequisite tools, building custom containers and performing post-installation configuration. We have included full NGINX configuration, including automatically requesting SSL certificates as, although we no longer use this role in our own instances, this has been retained here as likely useful for other users who wish to deploy a proper HTTPS setup.

Within the playbook, it is worth breaking down what each of these roles are doing:

Production and testing architecture

As with other TIB projects, each Wikibase deployment comprises discrete instances for production, testing and development work, written to distinct Debian servers. The advantage of deploying via the same Ansible playbook to each of these servers is that testing and development work can take place in environments identical to the production instance. Concerning data synchronisation, this is performed using a SQL database snapshot process and is enacted periodically from production to test instances.

Maintenance

Maintenance of the Wikibase deployment is controlled by a separate playbook, which manages the automated snapshotting process to ensure that data will be retained in the case of catastrophic server failure, or to facilitate regular database overwriting operations. A planned extension of this playbook will be the incorporation of system monitoring tasks to ensure that the services are responding as expected. This playbook will also be handed over to a scheduling system like Apache Airflow to manage regular operation without human interaction.

The use of Ansible is also useful beyond deployment, as system upgrades to Mediawiki, Wikibase and other containerised components can be consistently directed to all relevant servers. In this way, maintenance remains a sustainable task even when the number of running instances is scaled.

Outlook

The process described above has allowed us to deploy instances of Wikibase which are specially customised to meet the needs of research data management use-cases with minimal effort and with confidence that the resulting services will be stable and ready to use. Future efforts will focus on additional strategies for managing multiple deployments, adding improved system monitoring and the incorporation of further tools and configuration changes as driven by project requirements.


Page last modified: 2022-07-05.