Why Version Control?
You’re working on a team project and need to make edits to reports and code. You wait for your team member to make a change and then email you back another copy. There has to be a better way…
“Version control is the lab notebook of the digital world: it’s what professionals use to keep track of what they’ve done and to collaborate with other people. Every large software development project relies on it, and most programmers use it for their small jobs as well. And it isn’t just for software: books, papers, small data sets, and anything that changes over time or needs to be shared can and should be stored in a version control system.” – Version Control with Git
What better way to understand git than to check out git itself? This might take a while…
We’ll be working inside the git/ directory, with our working state set to v2.23.0.
Git’s Object Model: Content-Addressable Data Store.
- Every object has a SHA-1 hash: 40 hex characters.
- Given 40 hex characters, we can find the unique object with that hash.
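A minimal sketch of content addressing, assuming git is installed and uses its default SHA-1 object format. The sample text is arbitrary. `git hash-object` computes the ID git would assign to some content:

```shell
# Hash the same content twice: the object ID depends only on the content.
id1="$(printf 'hello\n' | git hash-object --stdin)"
id2="$(printf 'hello\n' | git hash-object --stdin)"

# Slightly different content gets a completely different ID.
id3="$(printf 'hello!\n' | git hash-object --stdin)"

echo "$id1"
echo "$id3"
echo "${#id1} hex characters"
```

Because the ID is derived from the content, two files with identical contents are stored as a single object.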
Let’s examine a single commit.
Object Types: Blobs, Trees, Commits
We will use the git cat-file command to help us search for objects inside the store.
If we provide git with a partial hash, it will attempt to find a unique match, and if it is unable to, it will provide a list of those that did match.
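As a rough illustration, the lookup can be tried in a throwaway repository rather than the git source tree. The file name, commit message, and identity below are placeholders:

```shell
# Scratch repository with a single commit (all names are placeholders).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Add README"

full="$(git rev-parse HEAD)"            # full 40-character commit hash
short="$(git rev-parse --short HEAD)"   # abbreviated prefix that is still unique

git cat-file -t "$full"     # object type: commit
git cat-file -t "$short"    # the prefix resolves to the same object
git cat-file -s "$short"    # object size in bytes
```

The `-t` flag prints an object's type, `-s` its size, and `-p` pretty-prints its contents.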
Let’s examine a blob object. A blob contains file contents.
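A small sketch of inspecting a blob, using a scratch repository with a hypothetical README.md instead of a file from the git source tree:

```shell
# Scratch repository with one tracked file (names are placeholders).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello Git" > README.md
git add README.md
git commit -q -m "Add README"

# <commit>:<path> names the blob stored for that path in that commit.
blob="$(git rev-parse HEAD:README.md)"

git cat-file -t "$blob"   # blob
git cat-file -p "$blob"   # Hello Git
```

Note that the blob holds only the file's bytes. The file name lives in the tree that references the blob.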
Let’s examine a tree object. A tree contains folder contents.
Example representation of folder contents contained by a tree:
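One way to produce such a listing yourself, sketched with a scratch repository (the file and folder names are hypothetical):

```shell
# Scratch repository with a file and a subfolder (names are placeholders).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
mkdir src
echo "hello" > README.md
echo "int main(void) { return 0; }" > src/main.c
git add .
git commit -q -m "Add README and src"

# HEAD^{tree} names the root tree of the latest commit.
# Each entry has the form: <mode> <type> <hash> <name>
git cat-file -p 'HEAD^{tree}'
```

The file shows up as a `blob` entry and the folder as a nested `tree` entry, which can be printed the same way using its hash.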
Perhaps one of the most important types of object inside the object model is the commit. A commit contains many things:
- A root tree
- A list of parent commits
- A commit message
- An author name, email, time.
- A committer name, email, time.
Let’s examine an example commit.
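A sketch of what a raw commit object looks like, using a scratch repository with two placeholder commits so that a parent line appears:

```shell
# Scratch repository with two commits (names and identity are placeholders).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
echo "two" >> file.txt
git commit -q -am "Second commit"

# Pretty-print the commit object itself: the tree, parent, author, and
# committer lines come first, then a blank line and the commit message.
git cat-file -p HEAD
```

A root commit has no parent line, and a merge commit has more than one.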
We can examine the commit graph (but only the first part!).
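A minimal sketch of viewing only the newest part of a history. The three commits here are placeholders created in a scratch repository:

```shell
# Scratch repository with three commits (names are placeholders).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
  echo "change $n" >> notes.txt
  git add notes.txt
  git commit -q -m "Commit $n"
done

# One line per commit, drawn as a graph, limited to the two newest commits.
git log --oneline --graph -n 2
```

On a repository as large as git itself, limiting the output with `-n` (or piping through a pager) keeps the listing manageable.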
Diffs are not part of the object model!
Commits are NOT diffs
Instead, diffs are calculated on demand by comparing the trees of commits in the object store. For example, even object attributes such as file renames are not represented inside the datastore and must be calculated dynamically.
Let’s examine a diff.
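The following sketch shows a rename being inferred at diff time. It assumes a scratch repository with hypothetical file names, and compares the same two commits with rename detection turned off and on:

```shell
# Scratch repository where the second commit renames a file (names are placeholders).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'line one\nline two\n' > old.txt
git add old.txt
git commit -q -m "Add old.txt"
git mv old.txt new.txt
git commit -q -m "Rename old.txt to new.txt"

# Without rename detection: one path deleted (D) and one path added (A).
git diff --no-renames --name-status HEAD~1 HEAD

# With rename detection (-M): git notices both paths have the same content
# and reports a rename (R) with a similarity score.
git diff -M --name-status HEAD~1 HEAD
```

Both commands read the same two commits. Only the calculation performed on their trees differs.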
To enable efficient representation and fast computation of git operations, the objects form a Merkle tree: commits reference trees, and trees reference subtrees and blobs, all by hash.
Branches are simply pointers to commits. Tags are pointers to anything (commits, trees, blobs).
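A short sketch of both claims, assuming a scratch repository. The branch and tag names are made up for illustration:

```shell
# Scratch repository with one commit (names are placeholders).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Add README"

git branch topic                     # a new branch is just another name for the current commit
git rev-parse HEAD topic             # both lines show the same commit hash

git tag v0.1                         # lightweight tag pointing at a commit
git tag readme-blob HEAD:README.md   # a tag can also point directly at a blob

git cat-file -t v0.1                 # commit
git cat-file -t readme-blob          # blob
```

Creating a branch copies no files, which is why branching in git is cheap.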
Move between branches with git switch
git switch is a new feature in v2.23.0 of git. It essentially replaces git checkout for changing branches, and does less work. Primarily, git switch will:
- Change HEAD to point to a new branch.
- Update the working directory to match the commit’s tree.
We can switch our branch to the maintenance branch.
We can return to the main branch.
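A sketch of switching back and forth, assuming git 2.23 or newer and a scratch repository. The branch name `maint` is a stand-in for whatever the maintenance branch is called in your repository:

```shell
# Scratch repository with one commit (names are placeholders).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Add README"

start="$(git symbolic-ref --short HEAD)"     # name of the branch we start on
git branch maint                             # hypothetical maintenance branch

git switch maint                             # HEAD now points at maint
on_maint="$(git symbolic-ref --short HEAD)"
echo "now on: $on_maint"

git switch -                                 # "-" means the previously checked-out branch
echo "back on: $(git symbolic-ref --short HEAD)"
```

In a real repository you would name the branch explicitly when returning, as in `git switch <branch>`.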
Practice: Creating a Repo
Let’s try the basics. Let’s create a new local git repository.
Create a new directory (Basics) and file (README.md).
We are going to create a new git repository, but maybe not the way you’ve done it before.
In the next set of commands, we will be working inside the
This will create a new .git directory to store commits and other objects.
We can quickly inspect the contents of git’s directory and object store.
Before adding a file to the repository, it must first be staged.
We will commit our staged changes into the repository.
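The steps in this section can be sketched as one sequence. This is one plausible set of commands, not necessarily the exact ones used in the workshop. The scratch location, identity, and commit message are assumptions:

```shell
# Work in a scratch location; any empty directory would do.
work="$(mktemp -d)" && cd "$work"

mkdir Basics
cd Basics
echo "# Basics" > README.md

git init -q                  # creates the .git directory
ls .git                      # HEAD, config, objects, refs, ...

git add README.md            # stage the file: its contents are written as a blob
git config user.name "Example User"            # placeholder identity, local to this repo
git config user.email "user@example.com"
git commit -q -m "Add README"                  # record the staged snapshot

git log --oneline            # one commit
find .git/objects -type f    # the stored object files
```

After the commit, the object store holds a blob for README.md, a tree for the folder, and the commit itself.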
Stage, unstage, and discard changes
Changes flow from our working tree, to the staging index, and into the repository.
Exercise: Use the following sets of steps and execute them in any order you wish. Observe what happens to the working tree and index by running the git status step.
Update the README.md and stage our change.
View the current state of our working tree and index.
Unstage file (remove from index), but keep changes in working tree.
Discard changes in worktree (we will lose our work!). This will restore both the index and the working tree to the latest version in the repo.
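A sketch of these steps using `git restore`, assuming git 2.23 or newer and a scratch repository. The workshop's exact commands may differ:

```shell
# Scratch repository with a committed README.md (names are placeholders).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello" > README.md
git add README.md
git commit -q -m "Add README"

# Update the README.md and stage the change.
echo "More text" >> README.md
git add README.md
git status --short                   # "M  README.md": the change is staged

# Unstage the file, but keep the edit in the working tree.
git restore --staged README.md
git status --short                   # " M README.md": modified, not staged

# Stage again, then discard: reset both index and working tree to the last commit.
git add README.md
git restore --staged --worktree README.md
git status --short                   # no output: nothing left to commit
cat README.md                        # Hello
```

In the short status format, the first column describes the index and the second describes the working tree.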
While having a local git repository is cool, we should connect it to a remote repository. Right now, we have no place to git push to…
- Get new data: git fetch <remote> [branch]
- Upload your data: git push <remote> <branch>
- Get new data and merge into working tree: git pull <remote> <refspec>
Hot Take: Avoid git pull on large repositories! You may want to handle merges into your target branch yourself instead of having git mess with your working tree.
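A sketch of the fetch-then-merge alternative. To stay self-contained, it assumes a local bare repository as a stand-in for a hosted remote and a second clone playing the role of a teammate. All paths and names are placeholders:

```shell
base="$(mktemp -d)"
git init -q --bare "$base/remote.git"        # local stand-in for a hosted remote

# Our repository: one commit, pushed to the stand-in remote.
git init -q "$base/local" && cd "$base/local"
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello" > README.md
git add README.md
git commit -q -m "Add README"
branch="$(git symbolic-ref --short HEAD)"
git remote add origin "$base/remote.git"
git push -q -u origin "$branch"

# A teammate's change arrives on the remote through a second clone.
git clone -q "$base/remote.git" "$base/teammate"
(
  cd "$base/teammate"
  git config user.name "Teammate"
  git config user.email "teammate@example.com"
  echo "Hello GitHub!" > README.md
  git commit -q -am "Update README"
  git push -q origin "$branch"
)

# Instead of git pull: download first, look, then integrate deliberately.
git fetch -q origin                          # working tree is not touched
git log --oneline "HEAD..origin/$branch"     # commits we do not have yet
git merge -q --ff-only "origin/$branch"      # fast-forward only; refuses anything else
cat README.md
```

The `--ff-only` flag makes the merge fail rather than create a merge commit, which leaves the decision about diverged history to you.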
Exercise: Let’s open a terminal and perform the following steps.
Create a repo on GitHub (If you are an NCSU student, use GitHub Enterprise: https://github.ncsu.edu).
Follow the instructions on GitHub to add a remote url to an existing git repository. Basically, you need to run something like:
git remote add origin https://github.com/<user>/<repo>.git
Push your changes to GitHub. Verify you can see your updated README.md!
On GitHub, edit the README.md to say “Hello GitHub!”. Commit the changes on GitHub. Now you have changes in your remote (origin) that are missing on your local copy.
Run git pull and verify you now have the updated changes.
Git Branching Playground
Manipulating the commit graph can get quite complicated! This interactive visualization is very useful for getting a deeper understanding of how operations such as branches, merges, cherry-picking, and more work!
We will solve the “Introduction Sequence” levels in:
Git Configuration and Security
If you want to make sure your commits are properly linked to your GitHub account, make sure you have configured your computer to have your name and email filled out.
$ git config --global user.name "FirstName LastName"
$ git config --global user.email email@example.com
You might also consider an authentication strategy. If you’re being asked to log in every time you pull/push to your remote repository, you might want to enable caching of your credentials. For example, you could use:
git config --global credential.helper store
However, this stores your credentials in plain text on your computer. There are other platform-specific credential.helpers that you can use to more securely store your credentials. It is also possible to generate personal access tokens that you can use to authenticate instead of a password.
An alternative approach is to use SSH keys. In this case, you have a public/private keypair, with the public key stored on GitHub. You then use a different URL pattern for your commands such as git clone. Instead of the https:// prefix, you instead use
If you are interested in exploring this option, see these guides on GitHub: