Liberation of data: how to liberate data from publications and how to reuse it

Description

This training session will offer participants a deep dive into the processes of annotating and sharing biodiversity data. The session will begin with an online theoretical introduction, providing an overview of the current state of biodiversity literature, the move towards the liberation of taxonomic data and the new opportunities this offers. This will be followed by two days of hands-on, in-person training focused on practical workflows for structuring, annotating, and circulating biodiversity data from scientific publications. Participants will explore how to annotate biodiversity datasets, ensuring they are properly formatted for reuse, and how to facilitate the sharing and circulation of these datasets within the scientific community. Trainees will also learn how to access and reuse data for their own research needs.

By the end of the session, attendees will be equipped with the skills to enhance the accessibility and reusability of biodiversity data, contributing to more effective data exchange.

 

Online session

15 October 2025

14.00-16.00 Introduction: The state of biodiversity literature and the move towards data liberation 

  • Welcome: getting to know each other and trainees’ expectations
  • General introduction to the course
  • Technical requirements

 

Face-to-face

10 November 2025

Day 1: Structuring and annotating biodiversity data

09.00-10.00 Module 1: Concepts

  • Principles of semantics
  • The elements structuring a biodiversity publication
  • Annotations, attributes and linking: FAIR data
  • Workflows: template, individual extraction, born-digital versus scanned documents

10.00-10.30 – Coffee break

10.30-13.00 Module 2: XML-first workflow

  • Introduction to the XML-first workflow
  • Structuring a taxonomy/biodiversity paper
  • XML conversion and enrichment of the metadata
  • Annotating the data in JATS-TaxPub

13.00-14.00 – Lunch

14.00-15.30 Module 3: GoldenGATE

  • The PDF workflow to enable re-use of data
  • Conversion and annotation of legacy literature
  • Learning and becoming a certified contributor

15.30-16.00 – Coffee break

16.00-17.00 Module 3: Continuation

 

11 November 2025

Day 2: Accessing and re-using biodiversity data

09.00-10.30 Module 4: Curation and re-use of data from publications

  • Curation and quality control of data already available in online repositories
  • TreatmentBank
  • Biodiversity Literature Repository
  • Synospecies
  • Ocellus

10.30-11.00 – Coffee break

11.00-12.30 Module 4: Continuation

 

12.30-13.30 – Lunch

13.30-15.00 Module 5: Complementary re-use of data from publications

  • Biodiversity PMC: specific annotation for question answering
    • Biodiversity front-end user interfaces
    • SIBiLS Collections
    • SIBiLS back-end services
    • SIBiLS API and data access channels
    • Curation-support tools
    • Advanced triage systems

15.00-15.30 – Coffee break

15.30-17.00 Module 5: Continuation

  • GBIF: occurrences to taxonomic names
    • GBIF-hosted portals
  • ChecklistBank: integrating names into the Catalogue of Life (COL)

Trainers

Laurence Bénichou (https://orcid.org/0000-0002-0713-0751) is a French publisher. She is Head of the Paris Museum Science Press (MNHN). In 2011 she founded the European Journal of Taxonomy with a board of European colleagues, and she now serves as the journal's Liaison Officer.

Since 2018, she has led the E-Publishing working group of CETAF. An expert in scientific publishing for the French Ministry of Higher Education and Research, she specialises in Linnaean taxonomy publishing, communication design and media. Her research focuses on diamond open access, digital publishing and data mining.

Chris Le Coquet (https://orcid.org/0009-0006-7416-8983) is a French publisher. He is a desk editor for the European Journal of Taxonomy, and is also in charge of digital projects focused on the FAIRisation of biodiversity data at the Paris Museum Science Press (MNHN).

Donat Agosti (https://orcid.org/0000-0001-9286-1200) is a Swiss biologist with over 30 years of experience in making biodiversity data openly accessible. In 2008 he co-founded Plazi, a Swiss NGO that develops workflows to convert scientific literature into FAIR data. Through partnerships with institutions such as GBIF, NIH, Zenodo and Biodiversity PMC, Plazi enables the reuse of published data in global research infrastructures, and it is the largest data contributor to Zenodo, GBIF and COL. He is also a widely published researcher in taxonomy and biodiversity informatics.

Julia Giora (https://orcid.org/0009-0006-7416-8983) is a Brazilian biologist with PhD and postdoctoral training in animal biology and over 20 years of experience in research, higher education and biodiversity data. She currently leads the Learning & Engagement team at Plazi, working internationally on FAIR scientific data and training. She is also active as a content developer for universities and Brazilian NGOs focused on biodiversity conservation, and is the author of more than 20 scientific publications, including peer-reviewed articles, books and book chapters.

Emilie Pasche (https://orcid.org/0000-0002-9118-5762) is a research associate at HES-SO Geneva and SIB, and is involved in Biodiversity PMC.

Markus Döring (https://orcid.org/0000-0001-7757-1889) is a German biodiversity informatician currently working for the Global Biodiversity Information Facility (GBIF) in Copenhagen, Denmark. Trained originally as a botanist, he bridges the gap between taxonomy and data infrastructure. He has played a key role in developing the GBIF Backbone Taxonomy, ChecklistBank and the Integrated Publishing Toolkit (IPT). He is the lead developer for the Catalogue of Life and has been engaged in Biodiversity Information Standards (TDWG) since 2001, with contributions to Darwin Core and the Taxonomic Concept Transfer Schema (TCS).

 

 

Dates of training period

One theoretical online session, plus two working days (October to November 2025) divided as follows:

Online session

Wednesday 15th of October 2025

Face-to-face practical experience

Monday 10th and Tuesday 11th of November 2025

Location

Villa Engler – Freie Universität Berlin, Altensteinstr. 2, 14195 Berlin

Course language

English

Target audience

Editors / Publishers / Librarians / Researchers / Students

Fee

Enrolment is free. All other costs are borne by participants.

Registration deadline

30 September 2025

Mode of trainees’ assessment

  • A short quiz will be provided at the end of the theoretical introduction;
  • Short exercises will be provided at the end of each practical module.

Participant quota (min and max number of trainees)

10–20 (1–2 groups)

Types of training / Implementation method

  1. Theoretical modules / Online and in-person lectures
  2. Practical experience / Face-to-face
    • Structuring and annotating biodiversity data / Hands-on exercises using the XML-first workflow and GoldenGATE
    • Exploring and reusing biodiversity data / Hands-on exercises using TreatmentBank, BLR, GBIF, Biodiversity PMC, etc.

Training course learning outcomes

This course covers a range of topics, from theoretical foundations to the development of practical skills in working with biodiversity data. The main expected outcomes are:

  • Learn about the state of biodiversity literature and the move to liberate biodiversity data from publications
  • Become familiar with the standards and workflows used for biodiversity data
  • Practise structuring biodiversity papers
  • Practise annotating biodiversity data
  • Learn how to efficiently access and reuse biodiversity data
  • Gain hands-on experience with data infrastructures such as TreatmentBank, GBIF and Biodiversity PMC
  • Take the first step towards becoming a certified contributor to TreatmentBank

Certifications provided

  1. Certificate of Attendance by CETAF DEST with 5 ECVET Units (European Credit system for Vocational Education and Training)
  2. Certificate by CETAF DEST according to Europass Certificate Supplement (certifying analytically the knowledge, skills and competences gained)

What trainees need to bring

  1. Laptop (up-to-date OS)
  2. Microsoft Word
  3. Java ver. 23
  4. LibreOffice ver. 7.6.5.2 (provided)
  5. XMLmind XML Editor ver. 9.5.1 (provided)
  6. GoldenGATE Imagine (provided)

 

Registration form

 

More details: dest@cetaf.org