Digitisation of Natural History collections: From data quality to data cleaning and data publishing – A hands-on experience

 

DESCRIPTION

The CETAF-DEST training course “Digitisation of Natural History collections: From data quality to data cleaning and data publishing – A hands-on experience” covers the basic and most important steps and standards in digitising the biological, geological, palaeontological and mineralogical data of natural history collections. These are:

  • Data quality: ensuring the highest possible quality when digitising taxonomic, geographical, collection and descriptive data, such as taxonomic data, specimens and materials, literature data, fieldwork notes, occurrence data and remote sensing data.
  • Data cleaning: improving the quality of data and making them “fit-for-use” by defining and determining error types, searching for and identifying error instances, correcting the errors, documenting error instances and error types, and modifying data entry procedures to reduce future errors.
  • Data visualisation: applying visualisation techniques to cleaned data.
  • Biodiversity Data Standards, such as Darwin Core, which are documented agreements on representation, format, definition, structuring, tagging, transmission, manipulation, use, and management of data, in order to ensure that data can be easily verified, analysed and reused by the wider scientific community.
  • The ABCDEFG Standards (Access to Biological Collection Databases Extended for Geosciences).
  • Publishing of data using the Integrated Publishing Toolkit (IPT) for biological data and the GeoCASe platform for geological, palaeontological and mineralogical data.
  • The types of data sets suitable for publication on the GeoCASe platform.
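
The data cleaning steps listed above (define error types, identify error instances, correct them and document them) can be illustrated with a minimal Python sketch. It is not part of the course material: the sample records, the error type names and the `clean` function are hypothetical, and only the dictionary keys follow Darwin Core term names.

```python
from dataclasses import dataclass

# Hypothetical sample records; the keys are Darwin Core term names.
records = [
    {"occurrenceID": "NHM-001", "scientificName": "Buthus  occitanus ",
     "decimalLatitude": "35.3", "decimalLongitude": "25.1"},
    {"occurrenceID": "NHM-002", "scientificName": "Annona glabra",
     "decimalLatitude": "135.0", "decimalLongitude": "25.1"},
    {"occurrenceID": "NHM-003", "scientificName": "",
     "decimalLatitude": "35.2", "decimalLongitude": "24.9"},
]


@dataclass
class Issue:
    """One documented error instance."""
    occurrence_id: str
    error_type: str   # one of the error types defined for this sketch
    field: str
    original: str
    action: str       # "corrected" or "flagged"


def clean(records):
    """Return cleaned copies of the records plus a log of every issue found."""
    cleaned, log = [], []
    for rec in records:
        rec = dict(rec)  # work on a copy so the source data stay untouched
        oid = rec["occurrenceID"]

        # Error types 1 and 2: missing name, or stray whitespace in the name.
        name = rec["scientificName"]
        normalised = " ".join(name.split())
        if not normalised:
            log.append(Issue(oid, "missing_value", "scientificName", name, "flagged"))
        elif normalised != name:
            rec["scientificName"] = normalised
            log.append(Issue(oid, "extra_whitespace", "scientificName", name, "corrected"))

        # Error type 3: coordinates that are not numbers or are out of range.
        for field, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
            raw = rec[field]
            try:
                ok = abs(float(raw)) <= limit
            except ValueError:
                ok = False
            if not ok:
                log.append(Issue(oid, "coordinate_out_of_range", field, raw, "flagged"))

        cleaned.append(rec)
    return cleaned, log


cleaned, log = clean(records)
for issue in log:
    print(issue)
```

In this sketch only whitespace problems are corrected automatically; missing names and implausible coordinates are flagged for a curator to review, and every change is kept in the log so that error types and instances remain documented.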

The course is a 5-day hands-on experience: according to the schedule, most of the time is dedicated to exercises on data quality and cleaning, while group work will encourage interaction between trainees and trainers and foster lively discussion. Moreover, trainees will be able to bring sample(s) of their institutional biodiversity and geodiversity digital databases and work on them during the exercises.

The course has been evaluated as successful after being run four times and standardised within the framework of the EU COST Action CA17106 “Mobilising Data, Experts and Policies in Scientific Collections” (MOBILISE).

TARGET GROUPS

The course is addressed to everyone who works with biological and geological collections and their data, such as curators and collection managers, directors and senior managers, collection digitisation managers and officers, scientists in bio- or geoinformatics, students (graduate, postgraduate, MSc, PhD) and collection technicians.

TRAINERS’ SHORT CVs

Dimitri Brosens, MSc, has been the Biodiversity Data Liaison Manager of the Belgian Biodiversity Platform at the Flemish Research Institute for Nature and Forest since early 2009. Since 2021 he has also acted as the GBIF node manager for Belgium. Dimitri’s ultimate goal is to put Belgium on the world map of biodiversity data. He has a particular interest in the publication, standardisation and interoperability of biodiversity datasets.
Dr Piotr Tykarski works at the University of Warsaw, teaching entomology, invertebrate zoology, ecology and GIS. His scientific interests focus on saproxylic beetles and on factors and patterns in biodiversity. He has been involved in biodiversity informatics activities and GBIF in Poland since 2003, and currently serves both as node manager and as Head of Delegation (HoD) of GBIF for Poland.
Dr Heimo Rainer is a curator at the Viennese herbaria of the NHM (W) and the University (WU), where he coordinates digitisation. The two herbaria have established the virtual herbaria platform JACQ, a consortium of 40+ institutions and individuals. His research focuses on the taxonomy of the tropical plant genus Annona.
Dr Larissa Smirnova is an assistant at the Royal Museum for Central Africa, where she currently coordinates the Synthesys+ project. Since 2009 she has been involved in various European and national projects connected to biodiversity informatics and digitisation. She is also part of the GBIF mentoring and training programme and has participated as an invited coach in a number of training courses on data publishing.
Dr Catherina Voreadou is the head of the Hydrobiology Lab of the Natural History Museum of the University of Crete (NHMC). In parallel, she has been head of Education since 1998 and, since 2012, has also led the NHMC Centre of Environmental Training, which is responsible for lifelong learning. She is a member of the Greek National Accreditation Centre for Continuing Vocational Training (EKEPIS) as a trainer for adults. She has coordinated or participated in 19 research, educational and training projects funded by national or European sources, and is the author and editor of many educational publications.
Dr Iasmi Stathi is a scorpiologist at the Natural History Museum of the University of Crete and a member of the Greek National Accreditation Centre for Continuing Vocational Training (EKEPIS) as a trainer for adults. She has sixteen years of experience in environmental education programmes, training for lifelong learning, and education research and development at the Dept. of Education, NHMC-UOC. She has also participated in 18 research and educational projects funded by national or European sources.
Athanasia Margetousaki is a pedagogist with a postgraduate degree in Teaching Science. She works as Special Technical Scientific Staff (EIB) at the Natural History Museum of Crete of the University of Crete. She has many years of experience in adult education and training programmes, has participated in many research projects funded by European and national sources, and has presented at national and international conferences.

REGISTRATION FEE

500€. The fee includes coffee breaks and lunches. Special prices have been arranged for accommodation and dinners.

PAYMENT OF FEES

Registration is free.
Only trainees who are selected to participate in the course will be asked to pay the fee. Payment will be requested after the registration deadline.

REGISTRATION DEADLINE

Registration closed on 10 July 2023

 

MODE OF TRAINEES’ ASSESSMENT

Through the hands-on exercises in the daily program

 

PARTICIPANT QUOTA (min and max number of trainees)

Min 15 – Max 20 trainees

 

DAILY PROGRAM 

Important info

All hands-on exercises in the daily program will be implemented in parallel groups:

  • Trainees managing Biological data
  • Trainees managing Geological, Palaeontological, Mineralogical data

 

Day 1: 13/11/2023

09.00 – 09.30 WELCOME, GAME for GETTING TO KNOW EACH OTHER


SESSION A

09.30 – 10.00 PRESENTATION OF THE TRAINING COURSE, TRAINEES’ EXPECTATIONS
Catherina Voreadou, Iasmi Stathi, Athanasia Margetousaki

10.00 – 10.30 RECOVERY OF PREVIOUS KNOWLEDGE
Catherina Voreadou, Iasmi Stathi, Athanasia Margetousaki

10.30 – 11.00 Coffee  

 

SESSION B

11.00 – 11.30 INTRODUCTION TO BIODIVERSITY DATA STANDARDS, DISCUSSION
Larissa Smirnova

11.30 – 12.00 GBIF, IPT (INTEGRATED PUBLISHING TOOLKIT), TYPES OF DATA SETS SUITABLE FOR PUBLICATION IN GBIF, DISCUSSION
Piotr Tykarski, Dimitri Brosens

12.00 – 12.30 GeoCASe – ABCDEFG STANDARDS (ACCESS TO BIOLOGICAL COLLECTION DATABASES EXTENDED FOR GEOSCIENCES), TYPES OF DATA SETS SUITABLE FOR PUBLICATION IN GeoCASe, DISCUSSION
Heimo Rainer

12.30 – 13.00 OVERALL DISCUSSION

13.00 – 14.00 Lunch

 

SESSION C

14.00 – 15.30 HANDS ON EXERCISE in parallel groups

  • DATA QUALITY & CLEANING, PREPARATION OF DATA SETS
  • Software: OpenRefine/QGIS/RStudio/Other tools
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova

15.30 – 16.00 Coffee

 

SESSION D

16.00 – 17.00 HANDS ON EXERCISE in parallel groups

  • DATA QUALITY & CLEANING, PREPARATION OF DATA SETS
  • Software: OpenRefine/QGIS/RStudio/Other tools
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova

 

Day 2: 14/11/2023

SESSION E

09.00 – 11.00 HANDS ON EXERCISE in parallel groups

  • DATA QUALITY & CLEANING, PREPARATION OF DATA SETS
  • Software: OpenRefine/QGIS/RStudio/Other tools
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova

11.00 – 11.30 Coffee

 

SESSION F

11.30 – 13.00 HANDS ON EXERCISE in parallel groups

  • DATA QUALITY & CLEANING, PREPARATION OF DATA SETS
  • Software: OpenRefine/QGIS/RStudio/Other tools
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova

13.00 – 14.00 Lunch

 

SESSION G

14.00 – 15.30 HANDS ON EXERCISE in parallel groups

  • DATA QUALITY & CLEANING, PREPARATION OF DATA SETS
  • Software: OpenRefine/QGIS/RStudio/Other tools
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova

15.30 – 16.00 Coffee

 

SESSION H

16.00 – 17.00 HANDS ON EXERCISE in parallel groups

  • DATA QUALITY & CLEANING, PREPARATION OF DATA SETS
  • Software: OpenRefine/QGIS/RStudio/Other tools
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova


Day 3: 15/11/2023

09.00 – 17.00 Excursion in Crete. Lunch included


Day 4: 16/11/2023

SESSION I 09.00 – 11.00

HANDS ON EXERCISE in parallel groups

  • PUBLISHING OF DATA & METADATA IN GBIF & GeoCASe
  • IPT (INTEGRATED PUBLISHING TOOLKIT), ABCDEFG STANDARDS (ACCESS TO BIOLOGICAL COLLECTION DATABASES EXTENDED FOR GEOSCIENCES)
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova

11.00 – 11.30 Coffee

 

SESSION J 11.30 – 13.00

HANDS ON EXERCISE in parallel groups

  • PUBLISHING OF DATA & METADATA IN GBIF & GeoCASe
  • IPT (INTEGRATED PUBLISHING TOOLKIT), ABCDEFG STANDARDS (ACCESS TO BIOLOGICAL COLLECTION DATABASES EXTENDED FOR GEOSCIENCES)
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova

13.00 – 14.00 Lunch

 

SESSION K 14.00 – 15.30

HANDS ON EXERCISE in parallel groups

  • PUBLISHING OF DATA & METADATA IN GBIF & GeoCASe
  • IPT (INTEGRATED PUBLISHING TOOLKIT), ABCDEFG STANDARDS (ACCESS TO BIOLOGICAL COLLECTION DATABASES EXTENDED FOR GEOSCIENCES)
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova

15.30 – 16.00 Coffee

 

SESSION L 16.00 – 17.00

HANDS ON EXERCISE in parallel groups

  • PUBLISHING OF DATA & METADATA IN GBIF & GeoCASe
  • IPT (INTEGRATED PUBLISHING TOOLKIT), ABCDEFG STANDARDS (ACCESS TO BIOLOGICAL COLLECTION DATABASES EXTENDED FOR GEOSCIENCES)
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova

 

Day 5: 17/11/2023

SESSION K 09.00 – 11.00

HANDS ON EXERCISE in parallel groups

  • PUBLISHING OF DATA & METADATA IN GBIF & GeoCASe
  • IPT (INTEGRATED PUBLISHING TOOLKIT), ABCDEFG STANDARDS (ACCESS TO BIOLOGICAL COLLECTION DATABASES EXTENDED FOR GEOSCIENCES)
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova

11.00 – 11.30 Coffee

 

SESSION M 11.30 – 13.00

HANDS ON EXERCISE in parallel groups

  • PUBLISHING OF DATA & METADATA IN GBIF & GeoCASe
  • IPT (INTEGRATED PUBLISHING TOOLKIT), ABCDEFG STANDARDS (ACCESS TO BIOLOGICAL COLLECTION DATABASES EXTENDED FOR GEOSCIENCES)
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova

13.00 – 14.00 Lunch

 

SESSION N 14.00 – 15.30

HANDS ON EXERCISE in parallel groups

  • PUBLISHING OF DATA & METADATA IN GBIF & GeoCASe
  • IPT (INTEGRATED PUBLISHING TOOLKIT), ABCDEFG STANDARDS (ACCESS TO BIOLOGICAL COLLECTION DATABASES EXTENDED FOR GEOSCIENCES)
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova

15.30 – 16.00 Coffee

 

SESSION O 16.00 – 17.00

HANDS ON EXERCISE in parallel groups

  • PUBLISHING OF DATA & METADATA IN GBIF & GeoCASe
  • IPT (INTEGRATED PUBLISHING TOOLKIT), ABCDEFG STANDARDS (ACCESS TO BIOLOGICAL COLLECTION DATABASES EXTENDED FOR GEOSCIENCES)
    Dimitri Brosens, Piotr Tykarski, Heimo Rainer, Larissa Smirnova

 

LEARNING OUTCOMES

By the end of this training course, trainees will have the knowledge, skills and competences to:

  • Ensure the maximum quality when digitising taxonomic, geographical, collection and descriptive data
  • Improve the quality of data
  • Define and determine error types
  • Search and identify error instances
  • Correct the errors
  • Document error instances and error types
  • Modify data entry procedures to reduce future errors
  • Visualise data entries
  • Define the Biodiversity Data Standards and the ABCDEFG Standards for Geosciences
  • Work in practice with the Integrated Publishing Toolkit (IPT) and the GeoCASe platform
  • Identify the types of data sets suitable for publication in GeoCASe

 

LANGUAGE REQUIREMENTS

To follow the course, trainees need English at CEFR (Common European Framework of Reference for Languages) level B1, as the working language of the course is English.

 

CERTIFICATIONS PROVIDED

  1. Certificate of Attendance by CETAF DEST with ECVET Units (European Credit system for Vocational Education and Training)
  2. Certificate by CETAF DEST according to the Europass Certificate Supplement (certifying in detail the knowledge, skills and competences gained)

 

TECHNOLOGY AND OTHER REQUIREMENTS
Trainees are required to bring their own laptops. They are also encouraged to bring sample(s) of their institutional biological, geological, palaeontological and mineralogical digital databases to work on during the exercises.

 

VENUE
Natural History Museum of Crete-University of Crete

More details: dest@cetaf.org

Application form

Registration closed on 10 July 2023

 

Evaluation of the course

Before the course questionnaire

After the course questionnaire 

Evaluation form