Version Control with Git & GitHub
Version Control
Hands up - who has heard of version control software?
What do you think it does?
What is Version control? 🤔
The management of changes to documents, computer programs, large web sites, and other collections of information.
Examples:
- Numbering of book editions
- Wikipedia’s Page history
Where did it come from?
The need for a logical way to organize and control revisions has existed for almost as long as writing has existed, but revision control became much more important, and complicated when the era of computing began
Elements of a Version Control system
Changes are usually identified by a number or letter code, termed the “revision number”
Each revision is associated with a timestamp and the person making the change.
Only changes to a file are recorded rather than saving a whole new copy.
Revisions can be compared, restored, and with some types of files, merged.
What is git? 🤔
Open source (free to use) Version control software. Usually accessed via the command line, or a client program.
Where did it come from?
Git development began in 2006 after many developers of the Linux kernel gave up access to BitKeeper (at the time the best but proprietary)
Why use it in research?
Exhibit A
Exhibit B
What is GitHub 🤔
A website that allows you to store your Git repositories online and makes it easy to collaborate with others. They also provide other services like issue (bug) tracking and wikis. Similar services are GitLab and BitBucket.
Why use it in research:
To enable collaboration and track contributions
images: Mozilla Science Lab CC-BY 4.0
- Acts as a remote back-up
- Facilitates transparency
- Facilitates project management
- Facilitates sharing and collaboration
- Super-charges innovation by Open Sourcing Science
Anatomy of GitHub Repo
Readme files. Create a
README.md
file to explain what your project is, and how to install and use it.README.md
is the file that is automatically displayed when you open a GitHub repo.-
License. Without some sort of licence, the contents of the repository are technically closed. Some allow users of the code to do anything they like with their code - these are known as permissive licences. Examples are the MIT Licence or Apache.
- https://choosealicense.com/ - does what it says on the tin and helps you choose a licence. Here are some resources to help you choose:
- https://tldrlegal.com/ - plain english explanations of licences in bullet form.
Contributing guide - make a file called
CONTRIBUTING.md
and guidelines for contributors so they know what they should do if they want to help you out.Code of Conduct - good projects have codes of conduct to make sure that people are treated well. Github has an Code of Conduct wizard to make it easy to add one.
Issues - use GitHub issues to record and discuss tasks.
Git, Github & Rstudio
Before: git only through the terminal
Rstudio & usethis
to the rescue!
-
Initialise Rstudio project with Git by just checking a box!
- Forgot to? use
usethis::use_git()
- Forgot to? use
visual panel to easily see the status of all your files
interactive navigation through file version history
Version Control Demo
How does Git
work?
When a local directory becomes initialised with git, a hidden .git
folder is added to it.
it’s now called a repository
New copies of files you tell git to track will be added to that
.git
folder.After adding, git will track any modifications to those files
first commit - whole file added
Any file unknown to git will have a yellow ? box next to it.
The first time you commit a file you are adding it to .git
, effectively telling it to start tracking the file
second commit - only difference highlighted
The first time you commit a file, only the changes are shown and any file that has uncommited modifications is shown with a blue M
When all changes have been committed, the git panel is clear.
Enough theory, how about in practice!
Configure git & GitHub
Configure git
First, git
needs to know who you are so your commits can be attributed to you. usethis
to the rescue again!
Check your configuration
usethis::git_sitrep()
── Git global (user)
• Name: <unset>
• Email: <unset>
âś– Git user's name and email are not set. Configure using `use_git_config(user.name = "<your name>", user.email = "<your email>")`.
• Global (user-level) gitignore file: <unset>
• Vaccinated: FALSE
â„ą See `?git_vaccinate` to learn more
â„ą Defaulting to 'https' Git protocol
• Default Git protocol: 'https'
• Default initial branch name: <unset>
── GitHub user
• Default GitHub host: 'https://github.com'
• Personal access token for 'https://github.com': <unset>
• To create a personal access token, call `create_github_token()`
• To store a token for current and future use, call `credentials::set_github_pat()`
â„ą Read more in the 'Managing Git(Hub) Credentials' article:
https://usethis.r-lib.org/articles/articles/git-credentials.html
── Active usethis project: '/cloud/project' ──
â„ą Active project is not a Git repo
Set your configuration
Use your github username and and the email you used to sign-up on GitHub
usethis::use_git_config(
user.name = "Jane",
user.email = "jane@example.org")
Create a GITHUB PAT
To authenticate with GitHub, you’ll also need a Personal Authorisation Token (PAT). Password-based authentication for Git is deprecated, i.e. you really should not be sending your username and password every time you push or pull. Here, I’m referring to the username and password you would use to login to GitHub in the browser.
What should you do instead?
Get a personal access token (PAT) and use that as your credential for HTTPS operations. (The PAT will actually be sent as the password and the username is somewhat artificial, consulted only for credential lookup.)
How to get a PAT?
GitHub offers instructions for creating a personal access token.
The usethis package has a helper function that takes you to the web form to create a PAT, with the added benefit that it pre-selects the recommended scopes:
usethis::create_github_token()
â—Ź Call `credentials::set_github_pat()` to register this token in the local Git credential store
It is also a great idea to store this token in any password-management software that you use
âś” Opening URL 'https://github.com/settings/tokens/new?scopes=repo,user,gist,workflow&description=DESCRIBE THE TOKEN\'S USE CASE'
Will open up the GitHub panel to generate your PAT.
Give your PAT a descriptive name. I’m calling mine POSIT_CLOUD
as that’s the only place I’m going to use it and, for security reasons, I’m setting an expiration date of 7 days, just long enough to complete the course.
Once you are happy with the selected scopes, scroll to the bottom and click “Generate token”.
You will be taken to a page that shows your token. Make sure you copy it to your clipboard!.
As the page says, you must store this token somewhere, because you’ll never be able to see it again, once you leave that page or close the window.
Do not ever hard-wire your PAT into your code! A PAT should always be retrieved implicitly, for example, from the Git credential store or from an environment variable.
Store your credential
Below, we will add your PAT to our the Posit Cloud Git cached credential store as a semi-persistent convenience, sort of like “remember me” on a website.
YOU MUST ALSO STORE A SECURE COPY SOMEWHERE, Either a password manager or locally on your computer for now, WE WILL NEED IT IN FUTURE SESSIONS
Add to the file and save.
Store with credentials
package
As of March 2024, the recommended R package for accessing the Git credential store is:
If you don’t have credentials installed, install via install.packages("credentials")
.
Then call credentials::set_github_pat()
:
credentials::set_github_pat()
credentials::set_github_pat()
is a very handy function, since it populates the GITHUB_PAT environment variable using the git_credential
manager, which git
itself uses for storing passwords. The credential manager returns stored credentials if available, and securely prompt the user for credentials when needed.
Respond to the prompt with your personal access token (PAT).
You can check that you’ve stored a credential with credentials::set_github_pat()
:
credentials::set_github_pat()
If prompted for GitHub credentials, enter your PAT in the password field
Using GITHUB_PAT from Anna Krystalli (credential helper: cache)
TODO store your PAT with a password manager
Treat this PAT like a password! Currently, we’ve only store a copyable form of the PAT in the Posit CLoud credentials store cache. This cache will be cleared regularly so if we don’t store it somewhere else securely too, it will be lost!!
If you use a password management app, such as 1Password or LastPass (which you should), it is highly recommended to store this PAT in it as a password. This way, you can easily retrieve it when needed but also manage it securely.
If for some reason you ever generate a PAT but fail to capture it on your system, you’ll have to generate another one.
This is not the end of the world, but you should delete the “lost” PAT on GitHub. If you aren’t disciplined about labelling PATs and deleting lost PATs, you will find yourself in an unsettling situation where you can’t be sure which PAT(s) are in use.
For security reasons, Posit Cloud only caches the following configurations temporarily. While more secure, it does mean we’ll need to perfom these steps in new projects and every 12 hours in older projects.
On your local system however, you will only need to do this once.
To allow your local Rstudio application to communicate seamlessly with GitHub when pusshing and pulling through the GUI, I also recommend you go a step further and setup an ssh key. This will allow you to authenticate with GitHub without having to enter your username and password every time.
See Set up keys for SSH in Happy Git with R as well the credentials
package vignette for more information.
Version Controlling projects
Turn our project into a repository
If you didn’t initialise git at the beginning of your project, you can do so now with usethis::use_git()
:
This will try to commit everything in the repo so far in one go!
Override that behaviour by SELECTING A NEGATIVE RESPONSE WHEN ASKED.
usethis::use_git()
âś” Initialising Git repo'.Rhistory', '.RData' to '.gitignore'
âś” Adding 9 uncommitted files:
There are * '.gitignore'
* '.Rbuildignore'
* 'analysis.R'
* 'data-raw/'
* 'data/'
* 'report.html'
* 'report.Rmd'
* 'R/'
* 'project.Rproj'
Is it ok to commit them?
1: Negative
2: Nope
3: I agree
: Selection
Next allow Rstudio to restart when asked:
â—Ź A restart of RStudio is required to activate the Git pane
Restart now?
1: Absolutely
2: Absolutely not
3: Negative
The git panel should now be visible in the top right of your Rstudio window.
Let’s now check our git setup again:
usethis::git_sitrep()
── Git global (user)
• Name: 'Anna Krystalli'
• Email: 'annakrystalli@googlemail.com'
• Global (user-level) gitignore file: <unset>
• Vaccinated: FALSE
â„ą See `?git_vaccinate` to learn more
â„ą Defaulting to 'https' Git protocol
• Default Git protocol: 'https'
• Default initial branch name: <unset>
── GitHub user
• Default GitHub host: 'https://github.com'
• Personal access token for 'https://github.com': <unset>
• To create a personal access token, call `create_github_token()`
• To store a token for current and future use, call `credentials::set_github_pat()`
â„ą Read more in the 'Managing Git(Hub) Credentials' article:
https://usethis.r-lib.org/articles/articles/git-credentials.html
── Active usethis project: '/cloud/project' ──
── Git local (project)
• Name: 'Anna Krystalli'
• Email: 'annakrystalli@googlemail.com'
âś– Default branch cannot be determined.
• Current local branch -> remote tracking branch:
'<unset>' -> '<unset>'
── GitHub project
â„ą Project does not use GitHub
Review git status
In our project, let’s have a look at the Rstudio Git tab. It shows all the files currently in the root of our project that git is aware of. The yellow ? indicates none of the files have been added to git yet, i.e. git is aware of them but none are being formally tracked for changes.
Ignoring files through .gitignore
There may be files that we don’t want to commit to git, e.g.
data files that are too large
documents with sensitive information (eg authorisation tokens etc)
intermediate files that we don’t need to save copies of.
We can tell git to ignore them by adding them to the .gitignore
file. By doing this upfront, we minimise the danger of committing them to our project’s history by accident.
Open .gitignore
When we open .gitgnore
we see there are a number of standard files already added.
.Rproj.user
.Rhistory
.Rdata
.httr-oauth
.DS_Store
.quarto
These are standard files it’s best practice to never commit to version control, especially if we are going to share our repository online.
Let’s go ahead and add the rest of the files we want to ignore.
Add files and folders to .gitignore
.gitignore
regex
You can use regex
(regular expressions) in .gitignore
files to ignore files according to a pattern.
directoryname/*
will ignore all files in a directory.*.html
will ignore any file ending in.html
prefix “!” which negates the pattern
So let’s use regex to: - ignore all files in attic/
- ignore all files in data-raw/wood-survey-data-master/
apart from the NEON logo (neon-logo.png
) and references.bib
which are required to successfully render our report!
To do so, add the following to the bottom of .gitignore
attic/*
data-raw/wood-survey-data-master/*
!data-raw/wood-survey-data-master/neon-logo.png
!data-raw/wood-survey-data-master/references.bib
You should notice that once the .gitgnore file is saved, the files and directories we’ve added to it are no longer visible in the git panel as options to add to a commit.
Commit project files
Add files
The first step of a commit is to add the files we want to be part of the commit to the staging area..
To do so, just tick the box next to any files and/or directories in the git pane.
Given we want to commit everything, as git is ignoring files we are not interested in, it is safe to add all files to the staging area!
Note how when the data-raw
folder is selected, only 3 files are added. That’s .gitignore
in action!
Commit changes
Write an appropriate commit message (e.g. “Initial commit”) and click the commit button:
Once the commit os complete, the git panel should be clear:
Create a README
Our repository also needs a README
. We only need a simple plain markdown (.md
) file for our README.
We can create a template using usethis::use_readme_md()
usethis::use_readme_md()
Edit README
Adapt the template, adding a short description about your project, including local links to your report and dataspice information page.
I’ve added the following:
# wood-survey
<!-- badges: start -->
<!-- badges: end -->
The goal of wood-survey is to analyse the relationship between stem diameter and height in NEON vegetation survey data.
[here](report.html)
A report of our analysis can be found
[here](data/index.html) Details about the data used can be found
Add and commit your new README
Create repository on GitHub
So far we’ve only been working with a local version (i.e. only available to us in Posit Cloud) of our project. Now we want to share it with the world through GitHub!
Because have set up a GITHUP_PAT
, we can use function usethis::use_github()
to create a GitHub repository for our project:
usethis::use_github(protocol = "https")
'annakrystalli/project'
âś” Creating GitHub repository 'origin' to 'https://github.com/annakrystalli/project.git'
âś” Setting remote 'master' branch to GitHub and setting 'origin/master' as upstream branch
âś” Pushing 'https://github.com/annakrystalli/project' âś” Opening URL
Once the repo is created and any commmited files pushed, the repo is launched in the browser:
Host HTML content on GitHub
Let’s head to the repo and have a look at what we’ve shared. To host our html content on GitHub, we need to enable gh-pages
in our repository.
Enable gh-pages
Go to repo Settings and navigate to the Pages panel on the left hand side and change your settings to the following configuration:
- Source: Deploy from a branch.
- Branch: master
- Folder: / (root)
- Ensure the Enforce HTTPS option is selected.
and Save
Check Deployed Site
Once the site has been successfully deployed, you will see a message like this (you might need to refresh the page):
Click on the link displayed and go check out your work!
Note that my URL looks a bit different to yours. That’s because I’m redirecting all my GitHub pages to my personal domain. Yours will be something like https://<username>.github.io/<repo-name>
Landing page
The landing page of your site will be the README.md
file (although you could override this if the root of your project contained a index.md
or index.html
file).
From there you can click through to the rendered report page (or you can add /report.html
to the end of the URL to go directly to the report) and the data documentation page.
Report
Scroll through and check out the interactive features like the table of contents, code folding and table, figure and citation previews when hovering over their mentions!
Add URL to repo About panel
A useful feature of GitHub is the ability to include a link to any hosted content in the About panel of your repository. This is a great way to direct visitors to the rendered content of your project.
To add the URL to the About panel, click on the button next to About and check the Use your GitHub Pages website box:
Once added, visitors can easily navigate to the hosted content from the repo landing page:
Git
tips
- commit early, commit often
- commit logical bits of work together
- write meaninful messages