Liberation of data: how to liberate data from publications and how to reuse it - DEST (Distributed European School of Taxonomy)

Description

This training session will offer participants a deep dive into the processes of annotating and sharing biodiversity data. The session will begin with an online theoretical introduction, providing an overview of the current state of biodiversity literature, the move towards the liberation of taxonomic data and the new opportunities this offers. This will be followed by two days of hands-on, in-person training focused on practical workflows for structuring, annotating, and circulating biodiversity data from scientific publications. Participants will explore how to annotate biodiversity datasets, ensuring they are properly formatted for reuse, and how to facilitate the sharing and circulation of these datasets within the scientific community. Trainees will also learn how to access and reuse data for their own research needs.

By the end of the session, attendees will be equipped with the skills to enhance the accessibility and reusability of biodiversity data, contributing to more effective data exchange.

Online session

15 October 2025

14.00-16.00 Introduction: The state of biodiversity literature and the move towards data liberation

Welcome: getting to know each other and trainees’ expectations
General introduction to the course
Technical requirements

Face to face

10 November 2025

Day 1: Structuring and annotating biodiversity data

09.00-10.00 Module 1: Concepts

Principles of semantics
The elements structuring a biodiversity publication
Annotations, attributes and linking: FAIR data
Workflows: template, individual extraction, born digital versus scanned documents

10.00-10.30 – Coffee break

10.30-13.00 Module 2: XML-first workflow

Introduction to the XML-first workflow
Structuring a taxonomy/biodiversity paper
XML conversion and enrichment of the metadata
Annotating the data in JATS-TaxPub

13.00-14.00 – Lunch

14.00-15.30 Module 3: Golden Gate

The PDF workflow to enable re-use of data
Conversion and annotation of legacy literature
Learning and becoming a certified contributor

15.30-16.00 – Coffee break

16.00-17.00 Module 3: Continuation

11 November 2025

Day 2: Accessing and re-using biodiversity data

09.00-10.30 Module 4: Curation and re-use of data from publications

Curation and quality control of data already available in online repositories
TreatmentBank
Biodiversity Literature Repository
Synospecies
Ocellus

10.30-11.00 – Coffee break

11.00-12.30 Module 4: Continuation

12.30-13.30 – Lunch

13.30-15.00 Module 5: Complementary re-use of data from publications

Biodiversity PMC: from specific annotation to question answering
- Biodiversity front end user interfaces
- SIBiLS Collections
- SIBiLS back-end services
- SIBiLS API & data access channels
- Curation-support tools
- Advanced triage systems

15.00-15.30 – Coffee break

15.30-17.00 Module 5: Continuation

GBIF: occurrences to taxonomic names
- GBIF – hosted portals
ChecklistBank: integrating names in CoL

Trainers

	Laurence Bénichou (https://orcid.org/0000-0002-0713-0751) is a French publisher. She is the Head of the Paris Museum Science Press (MNHN). In 2011 she founded the European Journal of Taxonomy with a board of European colleagues, and she now serves as the Liaison Officer for the journal. Since 2018, she has led the E-Publishing working group of CETAF. An expert in the field of scientific publishing for the French Ministry of Higher Education and Research, she specialized in Linnaean taxonomy publishing, communication design and media. Her research focuses on diamond open access, digital publishing and data mining.
	Chris Le Coquet (https://orcid.org/0009-0006-7416-8983) is a French publisher. He is a desk editor for the European Journal of Taxonomy, and is also in charge of digital projects focused on the FAIRisation of biodiversity data at the Paris Museum Science Press (MNHN).
	Donat Agosti (https://orcid.org/0000-0001-9286-1200) is a Swiss biologist with over 30 years of experience, focused on making biodiversity data openly accessible. He co-founded Plazi in 2008, a Swiss NGO that develops workflows to convert scientific literature into FAIR data. Through partnerships with institutions like GBIF, NIH, Zenodo, and Biodiversity PMC, Plazi’s work enables the reuse of published data in global research infrastructures, with Plazi being the largest data contributor to Zenodo, GBIF and COL. He is also a widely published researcher in taxonomy and biodiversity informatics.
	Julia Giora (https://orcid.org/0009-0006-7416-8983) is a Brazilian biologist with PhD and postdoctoral training in Animal Biology and over 20 years of experience in research, higher education, and biodiversity data. She currently leads the Learning & Engagement team at Plazi, working internationally with FAIR scientific data and training. She is also active as a content developer for universities and Brazilian NGOs focused on biodiversity conservation, and is the author of more than 20 scientific publications, including peer-reviewed articles, books, and book chapters.
	Emilie Pasche (https://orcid.org/0000-0002-9118-5762) is a research associate at HES-SO Geneva and SIB, and is involved in Biodiversity PMC.
	Markus Döring (https://orcid.org/0000-0001-7757-1889) is a German botanist and biodiversity informatician currently working for the Global Biodiversity Information Facility (GBIF) in Copenhagen, Denmark. Originally trained as a botanist, Markus bridges the gap between taxonomy and data infrastructure. He has played a key role in developing the GBIF Backbone Taxonomy, ChecklistBank and the Integrated Publishing Toolkit (IPT). He is the lead developer for the Catalogue of Life and has been engaged in Biodiversity Information Standards (TDWG) since 2001, with contributions to Darwin Core and TCS.

Dates of Training period

One theoretical online session, plus two working days (October to November 2025) divided as follows:

Online session

Wednesday 15th of October 2025

Face to face practical experience

Monday 10th and Tuesday 11th of November 2025

Location

Villa Engler – Freie Universität Berlin, Altensteinstr. 2, 14195 Berlin

Course language

English

Target audience

Editors / Publishers / Librarians / Researchers / Students

Fee

Enrolment in the school is free. All other costs are borne by participants.

Registration deadline

Registration closed on 30 September 2025

Mode of trainees’ assessment

A short quiz will be provided at the end of the theoretical introduction.
Short exercises will be provided at the end of each practical module.

Participant quota (min and max number of trainees)

10–20 (1–2 groups)

Types of training / Implementation method

Theoretical modules / Online and in-person lectures
Practical experience / Face to face
- Structuring and annotating biodiversity data / Hands-on exercises using the XML-first workflow and Golden Gate
- Exploring and reusing biodiversity data / Hands-on exercises using TreatmentBank, BLR, GBIF, Biodiversity PMC, etc.

Training Course learning outcomes

This course covers topics ranging from the purely theoretical to the development of practical skills in the field of biodiversity data. The main expected outcomes are:

Learn about the state of biodiversity literature and the move to liberate biodiversity data from publications
Become familiar with the standards and workflows for biodiversity data
Practise structuring a biodiversity paper
Practise annotating biodiversity data
Learn how to efficiently access and reuse biodiversity data
Practise using data infrastructures such as TreatmentBank, GBIF and Biodiversity PMC
Take the first step towards becoming a certified contributor to TreatmentBank

Certifications provided

Certificate of Attendance by CETAF DEST with 5 ECVET Units (European Credit system for Vocational Education and Training)
Certificate by CETAF DEST according to Europass Certificate Supplement (certifying analytically the knowledge, skills and competences gained)

What trainees need to bring

Laptop (up-to-date OS)
Microsoft Word
Java ver. 23
LibreOffice ver. 7.6.5.2 (provided)
XMLmind ver. 9.5.1 (provided)
Golden Gate Imagine (provided)

Registration

Registration has closed

Evaluation form

More details: dest@cetaf.org