Computer Vision for Digital Humanists

Sarah A. Lang; Sarah-May Lang; Sean Winslow; Daniel Luger; Germaine Götzelmann; Anguelos Nicolaou; Nicolas Renet; Suzana Sagadin; Niklas Tscherne

Winter School

The foundational skills in Distant Viewing and computer vision are increasingly relevant in the Digital Humanities, yet educational resources are often aimed at those with a background in computer science and statistics. For example, machine learning and digital image processing are fundamental to the Computational Humanities, but many scholars in the Digital Humanities lack accessible training in these areas.

The goal of this project was to create a focused course enabling students to acquire essential skills in computer vision, specifically tailored for Digital Humanists. Upon completing this course, students will possess a foundational understanding of digital image representation, computer vision methodologies, and machine learning techniques, all contextualized within a Digital Humanities framework.

This class is part of the project “Computer Vision for Digital Humanists” and is licensed under Creative Commons BY-NC-SA. The project (2023) was funded by CLARIAH-AT with the support of BMBWF. It was made possible by major contributions from the ERC DiDip project (From Digital to Distant Diplomatics). The video was produced by Moving Stills.

The goal of the project “Computer Vision for Digital Humanists” was the creation of educational self-learning resources on Computer Vision specifically for Digital Humanities, consisting of slide decks, Jupyter Notebooks with practical exercises in Python, and teaching videos (see the YouTube playlist). They cover a range of topics from the basics of computer vision and machine learning to training custom deep learning models for one’s own historical data.

Session 1: Introduction
Session 2: Theoretical Foundations of Computer Vision
Session 3: Tools for Computer Vision in the Digital Humanities
Session 4: Practical: Hands-On Computer Vision for Digital Humanists
Session 5: The Distant Viewing Paradigm in the Digital Humanities

Introduction

This Winter School module introduces digital humanists to computer vision, combining theoretical background with practical skills. It is tailored for those new to computer vision and offers a broad overview, from basic concepts to Digital Humanities applications.

The first module presents a basic introduction to machine learning, offering the necessary theoretical background for understanding computer vision. This is complemented by a historical review of Computer Vision, outlining its evolution and key developments.

The video school places a strong emphasis on practical skills, such as creating one’s own custom deep-learning models for computer vision using Python. Participants will engage in several sessions focused on data preparation, including selection, preprocessing, and organization, as well as model construction and optimization with tools like TensorBoard. The school also includes a session on no-code computer vision tools, providing alternatives for participants who prefer not to code their own models from scratch. Basic knowledge of Python is expected as a prerequisite to the practicals but not necessary for the tools session.

An important component of the school is exploring computer vision applications in Digital Humanities, including a literature review and the current state of the field. A discussion on a collaborative project between a computer scientist and a humanities scholar illustrates the challenges and benefits of interdisciplinary work in the Digital Humanities.

Overall, this school aims to equip digital humanists with the knowledge and skills to incorporate computer vision techniques into their research, balancing theoretical background with hands-on practice.

Theoretical Foundations of Computer Vision

This module provides a foundational understanding of computer vision, specifically designed for beginners. It covers machine learning and computer vision concepts, techniques, and evaluation methods.

The session starts with a glossary-based introduction, explaining key machine learning concepts in simple, understandable terms. Attendees are encouraged to view this segment multiple times, taking breaks as needed to fully grasp the material, as these concepts are foundational for subsequent modules.

The second video presents a historical overview of computer vision, tracking its development up to the present. It places current state-of-the-art techniques within their historical trajectory and explains the significance of the advancements that shape contemporary practice.

Anguelos Nicolaou leads the rest of this module, beginning with the session ‘Taxonomy of Methods’. Computer vision is not a one-size-fits-all analysis: it comprises multiple distinct methods or subtasks, such as classification and image segmentation, and knowing them helps you name your problem and choose the methods best suited to tackle it. Nicolaou then discusses ‘Performance Evaluation and Epistemology’. This segment focuses on how to evaluate the effectiveness of computer vision algorithms. It also delves into the nuances of algorithmic learning, highlighting potential pitfalls such as algorithms finding spurious patterns or taking unintended shortcuts. It stresses the importance of using evaluation techniques to ensure the validity of computer vision models, fostering critical engagement with the outputs of computer vision algorithms.
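To make the point about evaluation more concrete, here is a minimal sketch in plain Python. It is not taken from the course materials; the toy labels, the "lazy" model, and the helper names `confusion_counts` and `evaluate` are assumptions made for illustration. It shows how a classifier that always predicts the majority class can reach high accuracy while never finding the class of interest, which is one reason to look at more than one metric.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one class of interest."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive):
    """Return accuracy, precision and recall for the chosen positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy data: nine text pages and one illustrated page.
y_true = ["text"] * 9 + ["illustration"]
# A "lazy" model that takes a shortcut and always answers "text".
y_pred = ["text"] * 10

scores = evaluate(y_true, y_pred, positive="illustration")
print(scores)
```

Here the accuracy is high because most pages are text, yet the recall for illustrations is zero: the model has not learned the task the researcher cares about.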

Overall, this session equips participants with a deeper understanding of the theoretical background of computer vision, preparing them not just to use these techniques in their research but also to critically assess their performance.

Tools for Computer Vision in the Digital Humanities

Niklas Tscherne begins the module by introducing Tropy, a software designed for managing research photographs. Tropy is particularly useful for organizing large collections of visual data and supplementing them with their associated metadata.

Following Tscherne, Germaine Götzelmann presents a variety of out-of-the-box computer vision tools suitable for Humanities research. She covers topics such as mathematical morphology and introduces tools like LAREX and the Oxford Visual Geometry Group's VISE. Her examples include detecting images or illustrations in early modern books. Her presentation demonstrates that a number of computer vision tasks in the Humanities can be done without relying on deep learning, although deep learning is clearly the dominant paradigm nowadays and will probably become even more dominant in the future.
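As an illustration of what mathematical morphology does, the following is a minimal sketch in plain Python. It is not code from the presentation or from any of the tools mentioned. It assumes a binary image stored as a list of lists (1 for ink, 0 for background) and a 3x3 square structuring element, and the helper names are hypothetical. An "opening" (erosion followed by dilation) removes small specks while keeping larger regions, with no training data involved.

```python
def _neighbourhood(img, y, x):
    """Yield the 3x3 neighbourhood around (y, x); outside the image counts as 0."""
    h, w = len(img), len(img[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            yield img[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0


def erode(img):
    """Keep a pixel only if it and all eight neighbours are 1."""
    return [[int(all(_neighbourhood(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img):
    """Set a pixel to 1 if it or any of its eight neighbours is 1."""
    return [[int(any(_neighbourhood(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def opening(img):
    """Erosion followed by dilation: removes regions smaller than the 3x3 element."""
    return dilate(erode(img))


# Hypothetical toy "page": a 3x3 block (an illustration) and one noise speck.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

cleaned = opening(page)
for row in cleaned:
    print(row)
```

In this toy case the isolated pixel disappears and the 3x3 block survives unchanged. Real tools work on much larger images and offer other structuring elements, but the principle is the same.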

Practical: Hands-On Computer Vision for Digital Humanists

This practical session encompasses three videos that guide participants through data preparation and building computer vision models, ranging from basic to large-scale analysis.

The first video, “Get the Data,” led by Suzana Sagadin, teaches how to create a corpus using the Python Pandas library. Accompanying this session is a blog post by Sagadin and Sarah Lang that discusses forming humanities research questions suitable for computer vision and selecting a dataset that can be used to train a computer vision algorithm to address them. The example of classifying photos as dating from the 19th or the 20th century illustrates the complexity hidden in seemingly simple research questions in the context of Humanities data.

Suzana Sagadin & Sarah Lang, How to create a ground truth data set for computer vision using Humanities data, in LaTeX Ninja Blog, 04.07.2023.
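A minimal sketch of what building such a corpus table with Pandas might look like is given below. It is not the notebook from the video: the file names, years, and column names are invented for illustration, and the labelling rule assumes the strict convention that the 19th century runs from 1801 to 1900. Even this toy case surfaces two of the hidden decisions: what to do with undated items, and which side of the boundary the year 1900 belongs to.

```python
import pandas as pd

# Hypothetical metadata records; in practice these might come from a
# collection's catalogue export. File names and years are made up.
records = [
    {"filename": "photo_001.jpg", "year": 1865},
    {"filename": "photo_002.jpg", "year": 1899},
    {"filename": "photo_003.jpg", "year": 1901},
    {"filename": "photo_004.jpg", "year": 1950},
    {"filename": "photo_005.jpg", "year": None},  # undated item
    {"filename": "photo_006.jpg", "year": 1900},
]

corpus = pd.DataFrame(records)

# Items without a date cannot serve as ground truth here, so drop them.
corpus = corpus.dropna(subset=["year"]).copy()
corpus["year"] = corpus["year"].astype(int)

# Assumption: the 19th century ends with the year 1900, so 1900 is "19th".
corpus["label"] = corpus["year"].apply(lambda y: "19th" if y <= 1900 else "20th")

# Check how balanced the two classes are before any training.
print(corpus["label"].value_counts())
```

A different, equally defensible rule (for example, counting 1900 as 20th century) would change the labels, which is why such decisions are worth documenting alongside the dataset.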

The second video, “The Mini Model,” shows how to prepare data for a “Hello World!” deep learning model in Python. This includes organizing data, preprocessing, and model construction. The video fills in details that standard tutorials often omit, ensuring the thorough understanding necessary for training custom algorithms using one’s own data.
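The kind of preprocessing that tutorials tend to skip can be sketched in a few lines of plain Python. This is an illustrative assumption, not the code used in the video: it treats a grayscale image as a grid of 8-bit values, rescales them to the range 0 to 1, and turns text labels into integer class indices, two steps that many deep learning workflows expect before training. The helper names `scale_pixels` and `encode_labels` are hypothetical.

```python
def scale_pixels(image):
    """Rescale 8-bit grayscale values (0-255) to floats between 0.0 and 1.0."""
    return [[value / 255 for value in row] for row in image]


def encode_labels(labels):
    """Map each distinct label to an integer index, using sorted label order."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


# Hypothetical toy data: one tiny 2x3 grayscale image and three labels.
image = [
    [0, 128, 255],
    [64, 192, 32],
]
labels = ["20th", "19th", "19th"]

scaled = scale_pixels(image)
encoded, classes = encode_labels(labels)

print(scaled[0])
print(encoded, classes)
```

Keeping the `classes` list is what later allows a predicted index to be translated back into a human-readable label.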

The third video, “The Large Model,” extends the concepts from the earlier sessions to a more complex model, suitable for real-world applications. It also shows how to use TensorBoard for monitoring and evaluating machine learning progress.

These sessions are conducted in Google Colab using Jupyter notebooks, a user-friendly format ideal for those with limited technical skills or Python experience. Participants have the option to use Google Colab or work with provided code and datasets locally on their machine. The code notebooks have been archived on Zenodo where they will remain accessible.

Code notebooks on Zenodo

The Distant Viewing Paradigm in the Digital Humanities

The final session of the school provides an overview of computer vision applications in the digital humanities in the context of the Distant Viewing paradigm.

The first video, by Sarah Lang and Suzana Sagadin, offers an extensive literature review on the application of computer vision in digital humanities. It revisits essential concepts from the initial machine learning introduction, connecting these technical ideas to their practical use in (Digital) Humanities research. It also addresses ethical considerations when working with computational methods in the context of cultural heritage data.

The second video follows an interdisciplinary discussion between a humanities scholar and a computer scientist from the ERC DiDip (From Digital to Distant Diplomatics) project (https://didip.hypotheses.org/). This conversation sheds light on the challenges and learning opportunities in interdisciplinary collaboration, demonstrating how computational methods can complement humanities research and showing what such collaboration can look like in practice.

ERC DiDip (From Digital to Distant Diplomatics) project
