Introduction to Metadata


Reproducible Research Data and Project Management in R

Dr Anna Krystalli

R-RSE


https://acce-rrresearch.netlify.app/

You got data. Is it enough?

@tomjwebb I see tons of spreadsheets that i don't understand anything (or the stduent), making it really hard to share.

— Erika Berenguer (@Erika_Berenguer) January 16, 2015

@tomjwebb @ScientificData "Document. Everything." Data without documentation has no value.

— Sven Kochmann (@indianalytics) January 16, 2015

@tomjwebb Annotate, annotate, annotate!

— CanJFishAquaticSci (@cjfas) January 16, 2015

Document all the metadata (including protocols).@tomjwebb

— Ward Appeltans (@WrdAppltns) January 16, 2015

You download a zip file of #OpenData. Apart from your data file(s), what else should it contain?

— Leigh Dodds (@ldodds) February 6, 2017

#otherpeoplesdata dream match!

Thought experiment:

Imagine a dream open data set, how would you locate it?
  • what details would you need to know to determine relevance?
  • what information would you need to know to use it?

metadata = data about data

Metadata

“Information that describes, explains, locates, or in some way makes it easier to find, access, and use a resource (in this case, data).”

Data Reuse Checklist

http://mozillascience.github.io/checklist/

Metadata

Backbone of digital curation

Without it, a digital resource may be irretrievable, unidentifiable or unusable

Metadata Types

Descriptive

  • enables identification, location and retrieval of data, and often includes the use of controlled vocabularies for classification and indexing.

Technical

  • describes the technical processes used to produce, or required to use, a digital data object.

Administrative

  • used to manage administrative aspects of the digital object e.g. intellectual property rights and acquisition.

Elements of metadata

  • Structured data files:

    • readable by machines and humans, accessible through the web
  • Controlled vocabularies, e.g. the NERC Vocabulary Server

    • allows for connectivity of data


KEY TO SEARCH FUNCTION

  • By structuring & adhering to controlled vocabularies, data can be combined, accessed and searched!

  • Different communities develop different standards which define both the structure and content of metadata
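
As a minimal sketch of why controlled vocabularies make data searchable and combinable, the example below maps free-text variable labels from two datasets onto shared preferred terms. The vocabulary, its synonyms and the dataset labels are all invented for illustration; a real project would take its terms from a community vocabulary service rather than a hand-written dictionary.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted synonyms.
# These terms are illustrative only and are not taken from a real vocabulary server.
VOCABULARY = {
    "sea_water_temperature": {"sst", "water temp", "sea temperature"},
    "sea_water_salinity": {"salinity", "sal"},
}


def normalise(label):
    """Return the preferred term for a label, or None if it is not in the vocabulary."""
    key = label.strip().lower()
    for term, synonyms in VOCABULARY.items():
        if key == term or key in synonyms:
            return term
    return None


# Two datasets that label the same variable differently.
dataset_a = ["SST", "Salinity"]
dataset_b = ["water temp", "chlorophyll"]

mapped_a = [normalise(label) for label in dataset_a]
mapped_b = [normalise(label) for label in dataset_b]

# Variables the two datasets share once labels are mapped to preferred terms.
shared = (set(mapped_a) & set(mapped_b)) - {None}

# Labels with no match in the vocabulary need attention before the data are shared.
unmatched = [label for label in dataset_a + dataset_b if normalise(label) is None]
```

Here "SST" and "water temp" only become recognisably the same variable after mapping, while "chlorophyll" is flagged as outside the vocabulary.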

Metadata in research

Identifying the right metadata standard

  • General: Dublin Core Metadata Initiative Specification

  • NERC Data Centers: Check with individual data centers for their metadata specification.

  • Re3data.org: Registry of Research Data Repositories.
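
To make the general-purpose option concrete, here is a small sketch that checks a draft record against the 15 element names of the Dublin Core Metadata Element Set. The record contents and the `check_record` helper are hypothetical. All Dublin Core elements are optional, so "missing" here only means "not yet filled in", not "invalid".

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical draft record for a dataset (values invented for illustration).
draft = {
    "title": "Stream invertebrate counts",
    "creator": "A. Researcher",
    "date": "2015-06-01",
    "rights": "CC-BY-4.0",
    "licence": "CC-BY-4.0",  # not a Dublin Core element name
}


def check_record(record):
    """Return (unknown keys, Dublin Core elements not yet filled in)."""
    unknown = sorted(key for key in record if key not in DUBLIN_CORE_ELEMENTS)
    missing = [name for name in DUBLIN_CORE_ELEMENTS if not record.get(name)]
    return unknown, missing


unknown, missing = check_record(draft)
```

A check like this is only a prompt for what to document next; the chosen repository or data centre decides which fields are actually required.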

Seek help from support teams

Most university libraries have assistants dedicated to Research Data Management:

@tomjwebb @ScientificData Talk to their librarian for data management strategies #datainfolit

— Yasmeen Shorish (@yasmeen_azadi) January 16, 2015

Key metadata

The bare minimum

Document data coverage information

  • taxonomic coverage: a table containing taxonomic information on the species in the data.
    • also record authority / source
  • temporal coverage: temporal range and resolution details
  • spatial coverage:
    • a human readable geographic description of the study area
    • spatial range and resolution details
    • include depth (marine/freshwater) or altitudinal (terrestrial) information

Make sure to record units!
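
The temporal and spatial coverage described above can often be derived from the data themselves. The sketch below assumes a hypothetical set of field records with a date, latitude, longitude and elevation, and summarises their ranges with units recorded alongside each value. The simple min/max bounding box is an assumption that breaks for study areas crossing the 180° meridian.

```python
from datetime import date

# Hypothetical field records: (sampling date, latitude, longitude, elevation in metres).
records = [
    (date(2015, 6, 1), 53.38, -1.47, 120.0),
    (date(2015, 6, 15), 53.41, -1.52, 310.0),
    (date(2016, 7, 3), 53.35, -1.49, 95.0),
]


def coverage(rows):
    """Summarise temporal, spatial and altitudinal coverage, including units."""
    dates = [row[0] for row in rows]
    lats = [row[1] for row in rows]
    lons = [row[2] for row in rows]
    elevations = [row[3] for row in rows]
    return {
        "temporal": {
            "start": min(dates).isoformat(),
            "end": max(dates).isoformat(),
        },
        "spatial": {
            "south": min(lats),
            "north": max(lats),
            "west": min(lons),
            "east": max(lons),
            "units": "decimal degrees",
        },
        "elevation": {"min": min(elevations), "max": max(elevations), "units": "m"},
    }


summary = coverage(records)
```

The computed ranges complement, but do not replace, the human-readable description of the study area.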

Methods metadata

Document protocols in a methods document

Keep a dynamic document used to plan, record and write up methods.

@tomjwebb record every detail about how/where/why it is collected

— Sal Keith (@Sal_Keith) January 16, 2015

Is there any additional information other users would need to combine your data with theirs? Record it.

Practical metadata

ACCE DTP RDM course


Teaching this course has always felt challenging in terms of practical exercises

  • Defining Metadata & explaining importance: ✅
  • Advising on domain specific Controlled Vocabularies & structure ❌

  • How can we practice creating metadata?

rOpenSci Unconf 18

May 21 - 22, 2018. Seattle

rOpenSci Unconf mission

bringing together scientists, developers, and open data enthusiasts from academia, industry, government, and non-profits to get together for a few days and hack on various projects.


Ideas for projects submitted through GitHub issues in the runconf18 repo

issue #72 🙋‍♀️

Metadata team!

Luckily, a whole bunch of other awesome folks were also thinking about these topics and interested in working on them! 🤩

(in alphabetical order):

  • Carl Boettiger
  • Scott Chamberlain
  • Auriel Fournier: #41
  • Kelly Hondula
  • Anna Krystalli
  • Bryce Mecum
  • Maëlle Salmon
  • Kate Webbink: #52
  • Kara Woo: #68

pkg dataspice

Package dataspice makes it easier for researchers to create basic, lightweight and concise metadata files for their datasets.


  • Metadata collected in CSV files
  • Metadata fields are based on schema.org
    • which underlies the Google Dataset Search metadata specification
  • Helper functions and Shiny apps to extract and edit metadata files.
  • Ability to produce:
    • a structured JSON-LD metadata file.
    • a helpful dataset README webpage.
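
To illustrate the general idea of turning tabular metadata into a structured JSON-LD file, here is a rough Python sketch. It is not dataspice's implementation (dataspice is an R package). The CSV column names, the example rows and the `build_dataset_jsonld` helper are assumptions made for illustration. Only the schema.org terms (`Dataset`, `variableMeasured`, `PropertyValue`, `unitText`) come from the schema.org vocabulary.

```python
import csv
import io
import json

# Hypothetical variable-level metadata, one row per variable.
# Column names are assumed for this sketch.
ATTRIBUTES_CSV = """fileName,variableName,description,unitText
counts.csv,site_id,Unique identifier of the sampling site,
counts.csv,water_temp,Water temperature at time of sampling,degrees Celsius
"""


def build_dataset_jsonld(name, description, attributes_csv):
    """Build a schema.org Dataset description from a CSV table of variables."""
    variables = []
    for row in csv.DictReader(io.StringIO(attributes_csv)):
        variable = {
            "@type": "PropertyValue",
            "name": row["variableName"],
            "description": row["description"],
        }
        # Only record a unit when one was supplied in the table.
        if row["unitText"]:
            variable["unitText"] = row["unitText"]
        variables.append(variable)
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "variableMeasured": variables,
    }


record = build_dataset_jsonld(
    "Stream sampling counts",
    "Example dataset used to illustrate structured metadata.",
    ATTRIBUTES_CSV,
)

# Serialise to JSON-LD text, ready to be written to a file or embedded in a web page.
jsonld_text = json.dumps(record, indent=2)
```

Keeping the metadata in plain CSV files means researchers can edit it with familiar tools, while the generated JSON-LD is what machines and search engines read.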

Google unveils search engine for open data

The tool, called Google Dataset Search, should help researchers to find the data they need more easily.

Nature NEWS - 05 SEPTEMBER 2018


https://toolbox.google.com/datasetsearch

dataspice tutorial


The goal of this section is to provide a practical exercise in creating metadata for an example field-collected data product using the dataspice package.

  • Understand basic metadata and why it is important
  • Understand where and how to store them
  • Understand how they can feed into more complex metadata objects.

dataspice workflow

Practical

time for some live coding 😱

head to the dataspice tutorial
