Description
This training session will offer participants a deep dive into the processes of annotating and sharing biodiversity data. The session will begin with an online theoretical introduction, providing an overview of the current state of biodiversity literature, the move towards the liberation of taxonomic data and the new opportunities this offers. This will be followed by two days of hands-on, in-person training focused on practical workflows for structuring, annotating, and circulating biodiversity data from scientific publications. Participants will explore how to annotate biodiversity datasets, ensuring they are properly formatted for reuse, and how to facilitate the sharing and circulation of these datasets within the scientific community. Trainees will also learn how to access and reuse data for their own research needs.
By the end of the session, attendees will be equipped with the skills to enhance the accessibility and reusability of biodiversity data, contributing to more effective data exchange.
Online session
15 October 2025
14.00-16.00 Introduction: The state of biodiversity literature and the move towards data liberation
- Welcome: getting to know each other and trainees’ expectations
- General introduction to the course
- Technical requirements
Face to face
10 November 2025
Day 1: Structuring and annotating biodiversity data
09.00-10.00 Module 1: Concepts
- Principles of semantics
- The elements structuring a biodiversity publication
- Annotations, attributes and linking: FAIR data
- Workflows: template, individual extraction, born digital versus scanned documents
10.00-10.30 – Coffee break
10.30-13.00 Module 2: XML-first workflow
- Introduction to the XML-first workflow
- Structuring a taxonomy/biodiversity paper
- XML conversion and enrichment of the metadata
- Annotating the data in JATS-Taxpub
13.00-14.00 – Lunch
14.00-15.30 Module 3: Golden Gate
- The PDF workflow to enable re-use of data
- Conversion and annotation of legacy literature
- Learning and becoming a certified contributor
15.30-16.00 – Coffee break
16.00-17.00 Module 3: Continuation
11 November 2025
Day 2: Accessing and re-using biodiversity data
09.00-10.30 Module 4: Curation and re-use of data from publications
- Curation and quality control of data already available in online repositories
- TreatmentBank
- Biodiversity Literature Repository
- Synospecies
- Ocellus
10.30-11.00 – Coffee break
11.00-12.30 Module 4: Continuation
12.30-13.30 – Lunch
13.30-15.00 Module 5: Complementary re-use of data from publications
- Biodiversity PMC: from specific annotation to question answering
- Biodiversity front end user interfaces
- SIBiLS Collections
- SIBiLS back-end services
- SIBiLS API & data access channels
- Curation-support tools
- Advanced triage systems
15.00-15.30 – Coffee break
15.30-17.00 Module 5: Continuation
- GBIF: occurrences to taxonomic names
- GBIF – hosted portals
- ChecklistBank: integrating names in CoL
Trainers
Laurence Bénichou (https://orcid.org/0000-0002-0713-0751) is a French publisher. She is the Head of the Paris Museum Science Press (MNHN). In 2011 she founded the European Journal of Taxonomy with a board of European colleagues, and she now serves as the journal’s Liaison Officer.
Since 2018, she has led the E-Publishing working group of CETAF. An expert in scientific publishing for the French Ministry of Higher Education and Research, she specialised in Linnaean taxonomy publishing, communication design and media. Her research focuses on diamond open access, digital publishing and data mining.
Chris Le Coquet (https://orcid.org/0009-0006-7416-8983) is a French publisher. He is a desk editor for the European Journal of Taxonomy, and is also in charge of digital projects focused on the FAIRisation of biodiversity data at the Paris Museum Science Press (MNHN).
Donat Agosti (https://orcid.org/0000-0001-9286-1200) is a Swiss biologist with over 30 years of experience, focused on making biodiversity data openly accessible. He co-founded Plazi in 2008, a Swiss NGO that develops workflows to convert scientific literature into FAIR data. Through partnerships with institutions like GBIF, NIH, Zenodo, and Biodiversity PMC, Plazi’s work enables the reuse of published data in global research infrastructures, with Plazi being the largest data contributor to Zenodo, GBIF and CoL. He is also a widely published researcher in taxonomy and biodiversity informatics.
Julia Giora (https://orcid.org/0009-0006-7416-8983) is a Brazilian biologist with PhD and postdoctoral training in Animal Biology and over 20 years of experience in research, higher education, and biodiversity data. She currently leads the Learning & Engagement team at Plazi, working internationally with FAIR scientific data and training. She is also active as a content developer for universities and Brazilian NGOs focused on biodiversity conservation, and is the author of more than 20 scientific publications, including peer-reviewed articles, books, and book chapters.
Emilie Pasche (https://orcid.org/0000-0002-9118-5762) is a research associate at HES-SO Geneva and SIB, and is involved in Biodiversity PMC.
Markus Döring (https://orcid.org/0000-0001-7757-1889) is a German botanist and biodiversity informatician currently working for the Global Biodiversity Information Facility (GBIF) in Copenhagen, Denmark. Originally trained as a botanist, Markus bridges the gap between taxonomy and data infrastructure. He has played a key role in developing the GBIF Backbone Taxonomy, ChecklistBank and the Integrated Publishing Toolkit (IPT). He is the lead developer for the Catalogue of Life and has been engaged in Biodiversity Information Standards (TDWG) since 2001, with contributions to Darwin Core and TCS.
Dates of Training period
One theoretical online session, plus two working days (October to November 2025) divided as follows:
Online session
Wednesday 15th of October 2025
Face to face practical experience
Monday 10th and Tuesday 11th of November 2025
Location
Villa Engler – Freie Universität Berlin, Altensteinstr. 2, 14195 Berlin
Course language
English
Target audience
Editors / Publishers / Librarians / Researchers / Students
Fee
Enrolment is free. All other costs are borne by participants.
Registration deadline
30 September 2025
Mode of trainees’ assessment
- A short quiz will be provided at the end of the theoretical introduction;
- Short exercises will be provided at the end of each practical module.
Participant quota (min and max number of trainees)
10–20 (1–2 groups)
Types of training / Implementation method
- Theoretical modules/Online and in-person lectures
- Practical experience / face to face
- Structuring and annotating biodiversity data / Hands-on exercises using XML-first workflow and Golden Gate
- Exploring and reusing biodiversity data / Hands-on exercises using TreatmentBank, BLR, GBIF, Biodiversity PMC, etc.
Training Course learning outcomes
The course covers a range of topics, from theoretical foundations to practical skills in handling biodiversity data. The main expected outcomes are:
- Learn about the state of biodiversity literature and the move to liberate biodiversity data from publications
- Become familiar with the standards and workflows used for biodiversity data
- Practise structuring a biodiversity paper
- Practise annotating biodiversity data
- Learn how to efficiently access and reuse biodiversity data
- Practise using data infrastructures such as TreatmentBank, GBIF and Biodiversity PMC
- Take the first step towards becoming a certified contributor to TreatmentBank
Certifications provided
- Certificate of Attendance by CETAF DEST with 5 ECVET Units (European Credit system for Vocational Education and Training)
- Certificate by CETAF DEST according to the Europass Certificate Supplement (certifying in detail the knowledge, skills and competences gained)
What trainees need to bring
- Laptop (up-to-date OS)
- Microsoft Word
- Java ver. 23
- LibreOffice ver. 7.6.5.2 (provided)
- XMLmind ver. 9.5.1 (provided)
- Golden Gate Imagine (provided)
More details: dest@cetaf.org