The machine learning models currently grouped under the heading ‘artificial intelligence’ (AI) have been trained almost exclusively on data from the 21st century. They are therefore poorly suited to historical contexts or to contexts that differ culturally from the Western world. Simply providing data from the cultural heritage sector does not remedy this. Such datasets are usually of high quality and are characterised by their historical depth, cultural richness and diversity. However, they often contain problematic content that stems from the worldviews of bygone times. They therefore require comprehensive documentation if they are to make machine learning models more precise, powerful and suitable for use in different cultural contexts, and to enable their use for the common good.
This guide focuses on the ethical, social and legal aspects of documenting cultural heritage data used for training machine learning models. In particular, it examines how these models may perpetuate historical or statistical biases. The analysis follows the successive phases of the machine learning workflow and identifies a number of critical points where biases can arise. It also emphasises the role of cultural heritage institutions. These institutions have both extensive experience in establishing documentation procedures and valuable datasets, and they are therefore particularly well placed to publish datasets accompanied by exemplary documentation. Providing cultural heritage data with ethical considerations taken into account can help to prepare critical content for society in a way that stimulates the development of AI applications and avoids socially detrimental effects. The guide concludes with a plea for an interdisciplinary approach to the issues identified and emphasises the need for cultural heritage institutions to take proactive measures to document existing stereotypes and biases in the data, and so make a positive contribution to AI ethics. In doing so, the guide opens up the possibility not only of contributing to the development of small-scale models that are well suited to specific tasks in the cultural heritage sector and offer a favourable cost-benefit ratio, but also of making the existing large-scale multipurpose models more robust, efficient, context-sensitive, accurate and sustainable. Publishing high-quality cultural heritage datasets together with their documentation sharpens the profile of the cultural heritage institution, makes it attractive as a research partner and thus opens up the possibility of participating in the acquisition of research funding.
| Field | Value |
|---|---|
| Persistent URI | https://commons.nfdi4objects.net/resources/6127f5d3-cf3c-4b58-87d8-2e5ab12faf8a/ |
| DOI | 10.5281/zenodo.16418346 |
| License | Creative Commons Attribution 4.0 (CC BY 4.0) |
| Categories | Recommendation |
| Tags | Data analysis, CARE |