Corpus Management Platform

The Corpus Management Platform is part of the Flemish contribution to DARIAH Belgium. The goal is to create a platform for the collaborative management and discovery of digitised textual collections that allows digital humanities researchers to prepare their corpora (consisting of, for example, digitised newspapers and books) for textual analysis. The platform will enable researchers to browse and search the digitised collections that they themselves have compiled, cleaned, enriched and managed. Once the relevant research sub-corpus has been compiled, data export tools using standardised open formats will enable researchers to export it for analysis with existing digital text analysis tools, such as MALLET for topic modelling, Voyant Tools for data visualisation or AntConc for concordance and textual analysis. The platform will build on the International Image Interoperability Framework (IIIF). Because IIIF has been adopted by many of the world's leading libraries and cultural heritage institutions, building on it provides access to very large collections of digital objects.

The platform has been conceived as part of a larger, modular Virtual Research Environment Service Infrastructure (VRE-SI). In a first phase, Islandora (a digital asset management system based on Fedora Commons and Drupal), MediaWiki and Omeka were tested as possible frameworks and content management systems. A wider range of systems and existing solutions is currently being evaluated and compared.

Involved DARIAH-BE partners