Seer-Dock: A General-Purpose Dockerized Scholarly Document Collection and Management Framework
The harvesting, management, and analysis of thematic document collections is a major challenge in a wide variety of applications. While the criteria for compiling such collections vary from application to application, the overall process is largely standardized. Building a new system from scratch for each collection is therefore inefficient. In this work, we introduce Seer-Dock, a novel, easy-to-deploy, general-purpose dockerized framework for building scholarly document harvesting and management systems. It is based on CiteSeerX, one of the most widely used scholarly search engines. Seer-Dock packages all of its components as Docker containers, which lets users rapidly deploy a full-fledged document collection and management system on any platform that supports Docker and tailor it to the needs of a specific application domain. The containerized design also makes the system easy to scale, orchestrate, maintain, and recover. In this resource paper, we describe the architecture of Seer-Dock and its components. Like its core, CiteSeerX, Seer-Dock is released under the Apache License 2.0.