Title
Ontologies in CLARIAH: Towards Interoperability in History, Language and Media
Authors
Abstract
One of the most important goals of digital humanities is to provide researchers with data and tools for new research questions, whether by increasing the scale of scholarly studies, linking existing databases, or improving the accessibility of data. Here, the FAIR principles provide a useful framework, as they state that data needs to be: Findable, since data are often scattered among various sources; Accessible, since some might be offline or behind paywalls; Interoperable, by using standard knowledge representation formats and shared vocabularies; and Reusable, through adequate licensing and permissions. Integrating data from diverse humanities domains is not trivial: scholars' research questions (such as "was economic wealth equally distributed in the 18th century?" or "what narratives are constructed around disruptive media events?") and their preparation phases (e.g. data collection, knowledge organisation, cleaning) need to be taken into account. In this chapter, we describe the ontologies and tools developed and integrated in the Dutch national project CLARIAH to address these issues across datasets from three fundamental domains or "pillars" of the humanities (linguistics, social and economic history, and media studies) that have paradigmatic data representations (textual corpora, structured data, and multimedia). We summarise the lessons learnt from using such ontologies and tools in these domains from a generalisation and reusability perspective.