Making the 18th century accessible. Digital sources for 18th century and confessional history
As part of one of the larger research projects in LUMEN, Lutheranism and societal development in Denmark, we have now taken a major step in the digitization of 18th-century sources and made it possible to use digital methods on our 18th-century archival material. This enables us to ask new research questions and reach firmer conclusions.
A research infrastructure project funded by the Carlsberg Foundation has made it possible for us to train a general model in Transkribus that is able to machine-read 18th-century handwritten documents.
Transkribus is a platform developed for automated recognition, transcription and searching of historical documents. Users share models on the platform; however, a model for 18th-century Danish handwriting has been missing. Our model is now available to other users of Transkribus, to the benefit of both researchers and students. In this blog post, I want to share some of my experiences with developing this model, and some of the perspectives I see in it.
The collection
The project was kicked off in August 2020. Funding from Aarhus University for a pilot project made it possible to start the project in collaboration with the City Archive in Aarhus. Together with six student assistants and a colleague from the archive, I spent one week marking up pages in a collection I had created on Transkribus. The collection consists of letters concerning the administration of Møn’s ‘tugt’ workhouse, including descriptions of the reasons the inmates were sent there, and in some cases their verdicts. It comprises approximately 7000 pages, covering the period from 1737 to 1812. I uploaded each year as a PDF to Transkribus, and during the first week we segmented the entire collection, marking up pages and lines.
The pilot project
I had already worked with some of the cases when we moved the collection to Transkribus, and I therefore had 200-300 pages transcribed. The first step was to upload them to Transkribus, proofread them and train the first model. This was almost completed during the autumn of 2020 by me and four student assistants. Depending on how messy the pages are and how experienced you are at segmenting, it is possible to mark up between 500 and 1000 pages during one week of full-time work. With funding from Carlsberg, we were able to continue this work during the spring. During the lockdown in winter and spring 2021, we worked together on Zoom in two two-hour sessions each week, proofreading text that had been transcribed by the model on Transkribus and slowly improving our model.
Remarkable accuracy
We now have a model that reads a variety of 18th-century handwriting with an accuracy of around 95%, and we have started the next phase of the project: training the model on 18th-century court records from the city of Aarhus. The results here are equally good, and the model has been made public on the Transkribus platform under the name: 18C Danish Administrative Writing. This makes it possible for other researchers in Denmark and Norway to train this general model further on more specific hands and reach an even higher degree of accuracy.
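To make the accuracy figure more concrete: the post does not say how the 95% was measured, but a common measure for handwritten text recognition is the character error rate (CER), the number of character edits needed to turn the machine reading into the proofread text, divided by the length of the proofread text. The sketch below assumes that definition and treats accuracy as 1 minus CER. The function names are my own for illustration and are not part of Transkribus.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn
    `hypothesis` into `reference`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing in hypothesis
                current[j - 1] + 1,      # extra character in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - CER (an assumed definition, see above).

    `reference` is the proofread transcription, `hypothesis` the
    machine-read text for the same line or page.
    """
    if not reference:
        raise ValueError("reference transcription must not be empty")
    cer = edit_distance(reference, hypothesis) / len(reference)
    return 1.0 - cer
```

Under this definition, an accuracy of 95% corresponds to roughly one wrong character in every twenty, which is why machine-read text can often be searched without further proofreading.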
Huge potential for research
So, what is the potential of all this work? There are several aspects. The project has allowed us to digitize and transcribe a large number of 18th-century handwritten documents. The second part of the project, funded by Carlsberg, was to build a digital platform in which these documents are searchable and easy to access. Here text can be annotated and proofread. Now that the Transkribus model is as good as it is, text can be machine-read and moved to the platform, often without further proofreading. This opens new possibilities for my current as well as future research projects.
Digital methods can be used on 18th-century handwritten documents on a large scale. With my collection from Møn’s ‘tugt’ workhouse, I will be able to answer new research questions about the maintenance of a Christian society in practice and about how deeply social imaginaries from Lutheran theology penetrated Danish society. I can make large-scale analyses of how un-Christian behaviour was understood through the words and deeds connected to the concept, and of how this developed over time. Or I can put ordinary people’s understanding of sin into context and systematize my analysis of how sin and authority were connected.
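As a small illustration of what such a large-scale analysis could look like once the pages are transcribed, the sketch below counts how often a set of search terms occurs per 1000 words in each year. It is only a minimal sketch under my own assumptions: the transcriptions are assumed to be available as plain text grouped by year, the function name and data layout are hypothetical, and exact word matching will miss the spelling variants that are common in 18th-century Danish.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lower-cased word tokens (Unicode-aware, so æ, ø, å work)."""
    return re.findall(r"\w+", text.lower())


def term_rate_by_year(pages_by_year: dict[int, list[str]],
                      terms: list[str]) -> dict[int, float]:
    """Return occurrences of `terms` per 1000 words for each year.

    `pages_by_year` maps a year to the transcribed text of its pages
    (a hypothetical layout, not an export format of Transkribus).
    """
    wanted = {term.lower() for term in terms}
    rates = {}
    for year in sorted(pages_by_year):
        counts = Counter()
        for page in pages_by_year[year]:
            counts.update(tokenize(page))
        total = sum(counts.values())
        hits = sum(counts[term] for term in wanted)
        # Years without any transcribed words get a rate of 0.0.
        rates[year] = 1000 * hits / total if total else 0.0
    return rates
```

Calling term_rate_by_year with the pages from 1737 to 1812 and a list of terms connected to a concept such as sin would give one figure per year, which can then be plotted to follow the development over time.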
Student inclusion as a happy side effect
An important side effect of this project has been the possibility to include students in the research. It has been fun working together, and proofreading all the material has given my student assistants an insight into the cases that they can now use in their master's theses. This provides an opportunity that is rare in the humanities: to have master's students working independently within the framework of a collaborative research project, with the same sources as yourself.
Digital methods in teaching
The pilot project was funded in the first place because this model also offers great possibilities for using digital methods in teaching. Few students today actually have the time and opportunity during their studies to learn to read 18th-century handwriting. With this model, the process of learning it is easier. You still need to learn to read the handwriting, but working first with documents in Transkribus and having a machine-read suggestion makes the beginning a bit more manageable. This spring I tested it in one of my classes, and it is clear that the model makes it possible to include more original documents in teaching than would otherwise have been possible. This way, the 18th century remains accessible for students and for research, because being introduced to the original documents of older periods as a student is a prerequisite for investing more time and energy in the sources later in one's studies.