Introduction & Welcome!


Reproducible Research Data and Project Management in R

Dr Anna Krystalli

R-RSE


https://acce-rrresearch.netlify.app/

👋 Hello

me: Dr Anna Krystalli

  • Research Software Engineer, R-RSE

    • mastodon annakrystalli@fosstodon.org
    • github @annakrystalli
    • email r.rse.eu[at]gmail.com
  • Background in Marine Macroecology

  • Core Team: ReproHack

Ice Breaker

Split into break out rooms

Introduce yourselves

Q: Why did you decide to join this course?

Why are we here?

The paper is the advertisement

“An article about a computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.”

Jon Claerbout, paraphrased in Buckheit and Donoho (1995)

The Scientific Paper Is Obsolete

Here’s what’s next

APR 5, 2018, The Atlantic

Lessons from the Reproducibility/Replicability crisis

  • Many issues are statistical and a result of broken academic incentive systems.

  • Much can be tackled by transparency and better computational literacy.

Reproducible Research in Computational Science

ROGER D. PENG, SCIENCE 02 DEC 2011 : 1226-1227

Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.

Reinventing discovery by open sourcing science

Nielsen, Michael. Reinventing Discovery: The New Era of Networked Science. Princeton University Press, 2012. JSTOR, www.jstor.org/stable/j.ctt7s4vx.

  • Sharing resources

  • Collective intelligence

  • Mass collaboration

The internet was built for open science

Key to next generation networked science

The grand vision

Hans Rosling on open data (and data science) back in 2006

So how far have we come?

gapminder.org: today

Fighting global misconceptions with data

gapminder at our fingertips

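library(ggplot2)

# Life expectancy vs GDP per capita; `frame = year` lets ggplotly() animate by year
p <- ggplot(gapminder::gapminder, 
            aes(gdpPercap, lifeExp, size = pop, 
                color = continent, frame = year)) +
  geom_point() + scale_x_log10() + theme_bw()
# Convert the static ggplot into an interactive plotly widget
plotly::ggplotly(p)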

How do we get there?

Research meta-responsibilities

We need better digital curation of the workhorses of modern science: code & data

Aim to create secure materials that are FAIR: findable, accessible, interoperable, reusable.

Research meta-responsibilities

  • Think about traceability and provenance.

  • Follow community conventions.

  • Prepare it to share it.
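One small, concrete step towards traceability is to record the software environment alongside the results. The following is a minimal sketch in base R, assuming a plain-text log is enough for your project; the function name `record_provenance()` and the file name `session-info.txt` are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: write a timestamp, the R version and the full
# session information (attached packages, platform) to a text file.
record_provenance <- function(path) {
  info <- c(
    paste("Recorded:", format(Sys.time(), "%Y-%m-%d %H:%M:%S %Z")),
    paste("R version:", R.version.string),
    utils::capture.output(utils::sessionInfo())
  )
  writeLines(info, path)
  invisible(path)
}

# Example: store the log in a temporary directory (an assumption;
# in a real project it would sit next to the outputs it describes).
log_file <- record_provenance(file.path(tempdir(), "session-info.txt"))
```

Saving a file like this with each set of outputs makes it easier to trace later which R and package versions produced them.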

We all need to do our bit!

Drivers of better digital management

  • Funders: value for money, impact, reputation

  • Publishers: many now require code and data.

  • PIs, Supervisors and immediate research group

  • Your wider scientific community

  • The public

Yourselves!

Be your own best friend:

Ultimately it’s about getting a handle on our research materials

“Agree on a community convention…then follow it”

The concept of a Research Compendium

“ …We introduce the concept of a compendium as both a container for the different elements that make up the document and its computations (i.e. text, code, data, …), and as a means for distributing, managing and updating the collection.”

Gentleman and Temple Lang, 2004
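To make the idea of a container concrete, here is a minimal sketch in base R that creates a skeleton compendium. The folder and file names (`data-raw`, `data`, `R`, `analysis`, `paper.Rmd`) and the function `make_compendium()` are assumptions chosen for illustration, not a layout prescribed by Gentleman and Temple Lang.

```r
# Hypothetical helper: create an empty compendium skeleton under `root`.
# The layout below is illustrative only, not a fixed standard.
make_compendium <- function(root) {
  # Separate folders for raw data, processed data, functions and analysis
  dirs <- file.path(root, c("data-raw", "data", "R", "analysis"))
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

  # Placeholder files for documentation, metadata and the document itself
  files <- file.path(root, c("README.md", "DESCRIPTION", "analysis/paper.Rmd"))
  file.create(files[!file.exists(files)])

  # Return the relative paths of everything in the compendium
  sort(list.files(root, recursive = TRUE, include.dirs = TRUE))
}

# Example: build the skeleton in a temporary directory and list its contents
compendium <- make_compendium(file.path(tempdir(), "my-compendium"))
print(compendium)
```

Keeping text, code and data together in one predictable structure is what allows the whole collection to be distributed, managed and updated as a unit.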

Why Research Compendia?

Karthik Ram: rstudio::conf 2019 talk

Research Compendium Principles

Karthik Ram: rstudio::conf 2019 talk

R + RStudio

Next generation data science powerhouse

Backed by a diverse and active community of learners, users and developers

Back to “Why are we here?”

  • To show you how to use R + RStudio to perform reproducible data analyses.
  • To help you make the most of the real workhorses of your work, YOUR CODE & DATA!
  • To help you be empowered by modern tools & technologies rather than be overwhelmed by them.
  • To help you lead the culture change rather than be burdened by increased requirements.
  • Ultimately, to change how science works for the better, for everyone!

  • We’ll do this by introducing you to useful data and software tools and best practices.

Course Outline

Day 1

  • Basics

  • Project Management

  • Data Munging

Day 2

  • Metadata

  • Analysing & presenting data

Day 3

  • Version Control

  • Packaging Code

  • Research Compendia

We’ll take regular breaks and aim to break for lunch for an hour, between 13:00 and 14:00.

Before we dive in

  • We’ll be exploring best practice in data and workflow management. I’ve tried to focus on concepts and tools that I wish I had known when I started.
  • We’ll explore individual tools and concepts and show how they work nicely together.
  • We’ll be coding together and working in Posit Cloud.
  • Feedback: After each day, let me know on the notepad:
    • 📗 : something you liked
    • 🔴 : something that could be improved
  • Please feel free to ask questions if I use jargon you don’t understand or need some clarification. Questions are helpful for everyone!

Working on Zoom

  • Have your mic on mute by default.
  • Please try to have your camera on as much as possible.
  • You can use the chat for questions, but it’s better to interrupt me.
  • Please try to help each other!
  • If you get lost, the materials and appendices are your friend!
  • Use status reaction emojis to communicate how it’s going.

A note about Posit Cloud Projects

  • Please do not share the link to the course shared space on Posit Cloud.
  • Posit Cloud projects will be deleted a week after the course ends. Please download any work you want to keep.

Let’s go!
