GitHub is well known as a platform where software developers host their code and collaborate with their teams on a project. In this blog post, we'll show you how you can use the same GitHub model to collaborate seamlessly on your research papers.
This blog post is co-authored with Dmitry Soshkinov, because we believe that GitHub is a great technology and tool to be used beyond pure software development.
Git, GitHub, and how it all works
If you've never worked with GitHub before, check out the Microsoft Learn module for a step-by-step introduction.
The first thing you'll want to do is set up Git. Git is the version control system that runs behind the scenes of any GitHub project: it's what allows you to collaborate with others, go back to previous versions of your project, and view changes made by different members of your team. You can use Git from the command line, but in the beginning, it might be easier to use the GitHub Desktop client.
Projects on GitHub are organized in repositories. You'll create a new repository for your research paper, and choose who you want to have access. All your files, whether you're using Markdown, LaTeX, or another typesetting or markup language (more on that later!), will live in this repository. You'll want to clone the repository to your local machine, so that you have a local copy of your files.
The source of truth for your paper will live on the main branch of your repository; this branch is initialized when you create your repository. You can create separate branches for different sections of your paper, edit them, and merge them into your main branch when you're finished. A commit is a snapshot of your repository at a given moment, and it records the set of changes that you've made on a specific branch.
This is just a short introduction to all the features you can take advantage of when you use GitHub to collaborate on your research papers. Keep reading for more information, and a sample workflow that you can use to get started.
What should and should not be stored in Git
It is important to understand that GitHub is not a replacement for general file storage, nor a convenient place to store binary files. It was originally designed to be used as a source code repository, and thus it is built to track changes between text documents. If you are planning on collaborating on Word documents, setting up a shared OneDrive location is a much better choice. For this reason, many people don't consider GitHub to be a convenient collaboration platform for editing documents. However, scientists often write their papers in a plain-text format, most often TeX or LaTeX, which makes GitHub a very convenient collaboration platform for them. This is one of the reasons we believe that GitHub is so beneficial for scientists.
Why GitHub?
Using Git will give you many advantages:
- Tracking changes between different editions of a document. Text documents can be easily compared to each other using the GitHub interface. This is useful even when you are working on a paper alone, because all changes are tracked, and you can always roll back to any previous state.
- Working on different branches of the document and merging branches together. There are a few different styles of using Git for collaboration, so-called Git workflows. With branches, you and your collaborators can all work on specific parts of your project without conflicts, for prolonged periods of time.
- Accepting contributions to your paper or code from outside. GitHub has a convenient mechanism called pull requests: suggestions from other users that you can review, approve, and merge into the main content. For example, the Web Development for Beginners course was originally developed and hosted on GitHub by a group of around 10 people, and now it has more than 50 contributors, including people who are translating the course into different languages.
- If you are very advanced (or have some friends who are into DevOps), you can set up GitHub Actions to automatically create a new PDF version of your paper every time changes are made to the repository.
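To make the first advantage in the list above more concrete, here is a minimal Python sketch of the kind of line-by-line comparison you see for text files on GitHub. It uses the standard-library difflib module rather than Git itself, and the two draft versions and the file name paper.md are made up purely for illustration.

```python
import difflib

# Two hypothetical versions of the same paragraph of a paper (illustrative text only).
draft_v1 = [
    "We collected data from 10 participants.",
    "Results are shown in Table 1.",
]
draft_v2 = [
    "We collected data from 12 participants.",
    "Results are shown in Table 1.",
    "Limitations are discussed in Section 5.",
]


def show_changes(old_lines, new_lines):
    """Return a unified diff (the '+'/'-' line format) between two versions."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="paper.md (previous version)",
            tofile="paper.md (current version)",
            lineterm="",
        )
    )


changes = show_changes(draft_v1, draft_v2)
print("\n".join(changes))
```

Lines starting with "-" were removed and lines starting with "+" were added. This only works well because the paper is plain text, which is why text-based formats are such a good fit for Git.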
LaTeX or Markdown?
Most scientists write their papers in LaTeX, mostly because it fits well with many academic workflows, such as ready-made paper templates. There are also some good collaboration platforms specific to TeX, for example, Overleaf. However, these platforms won't give you the same full control over versioning and collaboration that Git does.
Writing in LaTeX also involves quite a bit of overhead: many layout features are quite verbose. For example:
\subsection{Section 1}
\begin{itemize}
\item Item 1
\item Item 2
\end{itemize}
In the world of software development, there is a perfect format for writing formatted text documents: Markdown. Markdown looks just like a plain text document. For example, the text above would be written like this:
## Section 1
* Item 1
* Item 2
This document is much easier to read as plain text, but Markdown processors can also render it as a nice-looking formatted document. There are also ways to include TeX formulae in Markdown using specific syntax.
In fact, I've been writing all of my blog posts and most text content in Markdown for a few years, including posts with formulae. For scientific writing, a great Markdown processor (and live editing environment) integrated with TeX is Madoko, and I highly recommend you check it out. You can use it from the web interface (which has GitHub integration), and there's also an open-source command-line tool to convert your Markdown writing into either LaTeX or directly to PDF.
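To give a feel for what a Markdown-to-LaTeX conversion involves, here is a toy Python sketch that translates only the two constructs used in the example above ("## " headings and "* " list items) into the LaTeX shown earlier. The function name markdown_to_latex is hypothetical, and this is not how any real tool is implemented: real Markdown processors handle far more syntax, including escaping of special characters.

```python
def markdown_to_latex(markdown_text):
    """Translate '## ' headings and '* ' bullets into LaTeX (toy converter).

    Other non-empty lines are passed through unchanged, blank lines are
    dropped, and no LaTeX escaping is performed.
    """
    output = []
    in_list = False
    for line in markdown_text.splitlines():
        if line.startswith("* "):
            if not in_list:
                # First bullet of a run: open the itemize environment.
                output.append(r"\begin{itemize}")
                in_list = True
            output.append(r"\item " + line[2:])
            continue
        if in_list:
            # A non-bullet line ends the current list.
            output.append(r"\end{itemize}")
            in_list = False
        if line.startswith("## "):
            output.append(r"\subsection{" + line[3:] + "}")
        elif line.strip():
            output.append(line)
    if in_list:
        output.append(r"\end{itemize}")
    return "\n".join(output)


sample = "## Section 1\n* Item 1\n* Item 2"
print(markdown_to_latex(sample))
```

Running this on the three-line Markdown example reproduces the five-line LaTeX snippet, which shows how much of the LaTeX markup is structural overhead.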
While you may continue using LaTeX with Git, I encourage you to look into Markdown-based writing options. By the way, if you have some writing in other formats, such as Microsoft Word documents, it can be converted to Markdown using a tool called Pandoc.
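As a rough sketch of that conversion step, the Python snippet below builds and (optionally) runs a Pandoc command that turns a Word document into Markdown. It assumes Pandoc is installed and on your PATH. The file names draft.docx and draft.md are placeholders, and the helper names pandoc_command and convert are hypothetical.

```python
import shutil
import subprocess


def pandoc_command(source, target):
    """Build a Pandoc command line that converts a Word file to Markdown."""
    # -f: input format, -t: output format, -o: output file
    return ["pandoc", source, "-f", "docx", "-t", "markdown", "-o", target]


def convert(source, target):
    """Run Pandoc if it is installed; return True if the conversion ran."""
    if shutil.which("pandoc") is None:
        print("pandoc not found on PATH; install it first")
        return False
    subprocess.run(pandoc_command(source, target), check=True)
    return True


# Only print the command here; call convert(...) once a real file exists.
print(" ".join(pandoc_command("draft.docx", "draft.md")))
```

The resulting Markdown usually needs some manual cleanup, so it is worth reviewing the converted file before committing it to your repository.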
Sample workflow
The main thing that Git does is allow you to structure your work (whether it is code or a scientific paper) into chunks called commits. Your files are tracked in a local repository that lives on your computer, and once you have made some changes, you commit them to save a snapshot. You can then synchronize your commits with others through a shared remote repository, often referred to as the upstream.
Sound complicated? When you use GitHub Desktop, most of these tasks are automated for you. Below, we describe the simplest way you can collaborate on a paper with your colleagues.
- Create a new repository on GitHub. I set the visibility to Private so I can decide which collaborators I’d like to invite to contribute later.
- Select "Set up in Desktop" to quickly set up your repository in GitHub Desktop.
- Next, you'll need to create a local clone of the repository on your machine. You may be prompted to reauthenticate to GitHub during this step.
- I already have a couple of Markdown files that I've started working on saved to my computer. I can select "View the files of your repository in Finder" to open the folder where my local copy of the repository is stored, and drag in the files for my Table of Contents, Section 1, and Bibliography.
- Now, when I go back to GitHub Desktop, I can see that those files have been added to my repository. I want to commit those files to the main branch. I can then publish my branch to push those changes to GitHub and make them accessible to the people I'll collaborate with.
- Next, I'm going to create a new branch so I can go off and work on Section 2 of my paper. I'll automatically end up on that branch after it has been created. There are several options for making and sharing changes in this branch:
  - You can create a pull request from your current branch. If I wanted my colleague to be able to review the changes I've made in this branch, I'd use this option and send them the pull request for review.
  - You can also open the repository in your external editor. I use VS Code to edit my files, so I can add Section 2 of my paper there, and then commit it to my section2 branch.
  - If I already have Section 2 of my paper saved somewhere on my computer, or if my colleague has sent me something they've worked on, I can follow the same workflow as above: open the repository folder on my machine, and add or remove files that way.
  - If I just need to make a small change, I'd open my repository in the browser and edit from there.
- I can open my repository in GitHub to check out all of the files and information. This is the link I’d send to a colleague if I wanted them to be able to clone the repository onto their local machine and help me out with some sections. Since I’ve made my repository private, I’ll need to add collaborators in the Settings pane.
- Once I’m happy with Section 2 of my paper, I can go ahead and merge it into the main branch of my repository. I switch over to the main branch, then choose a branch to merge into main, and pick section2. Then, I’ll want to push my changes back up to GitHub so that the main branch is updated with the newest changes for any future collaborators.
This is one example of a Git workflow you can use in conjunction with GitHub Desktop to collaborate on a research paper with your colleagues. There are several other ways that may serve your needs better: you may want to use the command line with VS Code, or edit your files on GitHub in the browser. Whatever method works for you is the best method, as long as you’re able to accomplish your goals.
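For readers who prefer the command line, the GitHub Desktop steps above map roughly onto the Git commands listed by the Python sketch below. It only prints the commands rather than running them. The repository URL, the commit messages, and the function name workflow_commands are placeholders, and the sketch assumes the default branch is called main and the remote is called origin (the name Git gives the remote when you clone).

```python
import shlex


def workflow_commands(repo_url, branch="section2", message="Add Section 2"):
    """Return Git commands that roughly mirror the GitHub Desktop steps.

    After the clone, the remaining commands are meant to be run from
    inside the cloned folder.
    """
    return [
        ["git", "clone", repo_url],               # create a local clone
        ["git", "add", "."],                      # stage the files added to the folder
        ["git", "commit", "-m", "Add initial files"],
        ["git", "push", "-u", "origin", "main"],  # publish main to GitHub
        ["git", "checkout", "-b", branch],        # create and switch to a new branch
        ["git", "add", "."],                      # stage the new section
        ["git", "commit", "-m", message],
        ["git", "checkout", "main"],              # switch back to main
        ["git", "merge", branch],                 # merge the section branch into main
        ["git", "push"],                          # send the merged result to GitHub
    ]


# Placeholder URL: replace with the address of your own repository.
commands = workflow_commands("https://github.com/your-name/your-paper.git")
for command in commands:
    print(" ".join(shlex.quote(part) for part in command))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere. Once you are comfortable with what each step does, you can type the same commands in a terminal.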
Further reading and useful links
- GitHub Desktop documentation
- GitHub.com documentation
- Edit your files with Visual Studio Code