GitPython is a Python code library for programmatically reading from and writing to Git source control repositories.
Let's learn how to use GitPython by quickly installing it and reading from a local cloned Git repository.
This tutorial should work with either Python 2.7 or 3, but Python 3, especially 3.6+, is strongly recommended for all new applications. I used Python 3.6.3 to write this post. In addition to Python, throughout this tutorial we will also use Git and GitPython version 2.1.7.
Take a look at this guide for setting up Python 3 and Flask on Ubuntu 16.04 LTS if you need specific instructions to get a base Python development environment set up.
All code in this blog post is available open source under the MIT license on GitHub under the first-steps-gitpython directory of the blog-code-examples repository. Use and abuse the source code as you like for your own applications.
Start by creating a new virtual environment for your project. My virtualenv is named gitpy but you can name yours whatever matches the project you are creating.
python3 -m venv gitpy
Activate the newly-created virtualenv.
source gitpy/bin/activate
The virtualenv's name will be prepended to the command prompt after activation.
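If you would rather confirm from inside Python that the virtualenv is active, one rough check is to compare sys.prefix with sys.base_prefix. This is only a sketch: the in_virtualenv helper is a hypothetical name, and it assumes an environment created with python3 -m venv as shown above.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter appears to run inside a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


print("virtualenv active:", in_virtualenv())
```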
Now that the virtualenv is activated we can use the pip command to install GitPython.
pip install gitpython==2.1.7
Run the pip command and after everything is installed you should see output similar to the following "Successfully installed" message.
(gitpy) $ pip install gitpython==2.1.7
Collecting gitpython==2.1.7
Downloading GitPython-2.1.7-py2.py3-none-any.whl (446kB)
100% |████████████████████████████████| 450kB 651kB/s
Collecting gitdb2>=2.0.0 (from gitpython==2.1.7)
Downloading gitdb2-2.0.3-py2.py3-none-any.whl (63kB)
100% |████████████████████████████████| 71kB 947kB/s
Collecting smmap2>=2.0.0 (from gitdb2>=2.0.0->gitpython==2.1.7)
Downloading smmap2-2.0.3-py2.py3-none-any.whl
Installing collected packages: smmap2, gitdb2, gitpython
Successfully installed gitdb2-2.0.3 gitpython-2.1.7 smmap2-2.0.3
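To double-check the install from Python itself, you can ask the import system whether the git package that GitPython provides is visible. This is a minimal sketch using only the standard library; gitpython_available is a hypothetical helper name, and the check only tells you that a top-level module named git can be found, not which version it is.

```python
import importlib.util


def gitpython_available():
    """Return True if a top-level 'git' module (GitPython) can be found."""
    # find_spec returns None when no module named 'git' is importable.
    return importlib.util.find_spec("git") is not None


if gitpython_available():
    print("GitPython is importable from this environment.")
else:
    print("GitPython not found; is the virtualenv activated?")
```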
Next we can start programmatically interacting with Git repositories in our Python applications now that GitPython is installed.
GitPython can work with remote repositories but for simplicity in this tutorial we'll use a cloned repository on our local system.
Clone a repository you want to work with to your local system. If you don't have a specific one in mind use the open source Full Stack Python Git repository that is hosted on GitHub.
git clone git@github.com:mattmakai/fullstackpython.com fsp
Take note of the location where you cloned the repository because we need the path to tell GitPython what repository to handle. Change into the directory for the new Git repository with cd then run the pwd (present working directory) command to get the full path.
cd fsp
pwd
You will see some output like /Users/matt/devel/py/fsp. This path is your absolute path to the base of the Git repository.
Use the export command to set an environment variable for the absolute path to the Git repository.
export GIT_REPO_PATH='/Users/matt/devel/py/fsp' # make sure this is your own path
Our Git repository and path environment variable are all set so let's write the Python code that uses GitPython.
Create a new Python file named read_repo.py and open it so we can start to code up a simple script.
Start with a couple of imports and a constant:
import os
from git import Repo
COMMITS_TO_PRINT = 5
The os module makes it easy to read environment variables, such as the GIT_REPO_PATH variable we set earlier. from git import Repo gives our application access to the GitPython library when we create the Repo object.
COMMITS_TO_PRINT is a constant that limits the number of lines of output based on the number of commits we want our script to print information on. Full Stack Python has over 2,250 commits so there'd be a whole lot of output if we printed every commit.
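One way to cap output at COMMITS_TO_PRINT items without first building a list of every commit is itertools.islice, which stops pulling from an iterator after a fixed number of items. The sketch below uses a plain generator as a stand-in for a commit iterator, so it runs without a repository; first_n and fake_history are hypothetical names.

```python
from itertools import islice

COMMITS_TO_PRINT = 5


def first_n(iterable, n):
    """Return at most n items without consuming the rest of the iterable."""
    return list(islice(iterable, n))


# Stand-in for a lazy commit iterator: nothing is produced until requested.
fake_history = ("commit-{}".format(i) for i in range(2250))

for item in first_n(fake_history, COMMITS_TO_PRINT):
    print(item)
```

With a long history this avoids creating thousands of objects just to throw most of them away.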
Next within our read_repo.py file create a function to print individual commit information:
def print_commit(commit):
    print('----')
    print(str(commit.hexsha))
    print("\"{}\" by {} ({})".format(commit.summary,
                                     commit.author.name,
                                     commit.author.email))
    print(str(commit.authored_datetime))
    print(str("count: {} and size: {}".format(commit.count(),
                                              commit.size)))
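The print_commit function takes in a GitPython commit object and prints the 40-character SHA-1 hash for the commit followed by the commit summary, the author's name and email address, the date and time the commit was authored, and the commit's count and size.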
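If you want to try the formatting without a real repository, one option is to build the text in a separate function and feed it a stand-in object. The sketch below is an assumption-laden illustration, not part of the original script: format_commit and fake_commit are hypothetical, the stand-in only mimics the attribute names used above, and count and size are left out because they need a real commit.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def format_commit(commit):
    """Build the text print_commit shows, minus count and size."""
    return "\n".join([
        "----",
        str(commit.hexsha),
        '"{}" by {} ({})'.format(commit.summary,
                                 commit.author.name,
                                 commit.author.email),
        str(commit.authored_datetime),
    ])


# Hypothetical stand-in with the same attribute names as a commit object.
fake_commit = SimpleNamespace(
    hexsha="0" * 40,
    summary="example summary",
    author=SimpleNamespace(name="Example Author", email="author@example.com"),
    authored_datetime=datetime(2017, 11, 30, 17, 15, 14, tzinfo=timezone.utc),
)

print(format_commit(fake_commit))
```

Returning a string instead of printing directly also makes the output easy to check in a unit test.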
Below the print_commit function, create another function named print_repository to print details of the Repo object:
def print_repository(repo):
    print('Repo description: {}'.format(repo.description))
    print('Repo active branch is {}'.format(repo.active_branch))
    for remote in repo.remotes:
        print('Remote named "{}" with URL "{}"'.format(remote, remote.url))
    print('Last commit for repo is {}.'.format(str(repo.head.commit.hexsha)))
print_repository is similar to print_commit but instead prints the repository description, active branch, all remote Git URLs configured for this repository and the latest commit.
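One edge case worth knowing about: if the clone is checked out at a specific commit rather than a branch (a detached HEAD), there is no active branch to report. The sketch below assumes that reading active_branch raises TypeError in that situation, which is my understanding of GitPython's behavior but is worth verifying against the version you have installed. describe_head and FakeDetachedRepo are hypothetical names, and the fake class exists only so the example runs without a repository.

```python
from types import SimpleNamespace


def describe_head(repo):
    """Return the active branch name, or a short hash if HEAD is detached."""
    try:
        return str(repo.active_branch)
    except TypeError:
        # Assumption: a detached HEAD surfaces as TypeError here.
        return "detached at {}".format(repo.head.commit.hexsha[:7])


class FakeDetachedRepo(object):
    """Hypothetical stand-in imitating a repository with a detached HEAD."""

    head = SimpleNamespace(
        commit=SimpleNamespace(
            hexsha="1fa2de70aeb2ea64315f69991ccada51afac1ced"))

    @property
    def active_branch(self):
        raise TypeError("HEAD is detached")


print(describe_head(FakeDetachedRepo()))
```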
Finally, we need a "main" function for when we invoke the script from the terminal using the python command. Round out our read_repo.py script with the following code:
if __name__ == "__main__":
    repo_path = os.getenv('GIT_REPO_PATH')
    # Repo object used to programmatically interact with Git repositories
    repo = Repo(repo_path)
    # check that the repository loaded correctly
    if not repo.bare:
        print('Repo at {} successfully loaded.'.format(repo_path))
        print_repository(repo)
        # create list of commits then print some of them to stdout
        commits = list(repo.iter_commits('master'))[:COMMITS_TO_PRINT]
        for commit in commits:
            print_commit(commit)
    else:
        print('Could not load repository at {} :('.format(repo_path))
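The main function grabs the GIT_REPO_PATH environment variable and creates a Repo object based on the path if possible.
If the repository is not bare, which the script treats as a sign that it loaded correctly, then the print_repository and print_commit functions are called to show the repository data. Otherwise an error message is printed. Note that a path that does not exist or is not a Git repository causes GitPython to raise an exception when the Repo object is created, before the bare check runs.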
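Because os.getenv returns None when a variable is unset, it can help to validate the path before handing it to GitPython. The following is a minimal sketch using only the standard library; resolve_repo_path is a hypothetical helper, and it assumes a regular (non-bare) clone where the .git entry sits at the repository root.

```python
import os
import tempfile


def resolve_repo_path(environ=None):
    """Return the repository path from GIT_REPO_PATH or raise ValueError."""
    environ = os.environ if environ is None else environ
    repo_path = environ.get("GIT_REPO_PATH")
    if not repo_path:
        raise ValueError("GIT_REPO_PATH is not set")
    if not os.path.isdir(repo_path):
        raise ValueError("{} is not a directory".format(repo_path))
    # Assumption: a regular clone keeps its metadata in a .git entry.
    if not os.path.exists(os.path.join(repo_path, ".git")):
        raise ValueError("{} has no .git entry".format(repo_path))
    return repo_path


# Demonstrate with a throwaway directory that mimics a clone's layout.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, ".git"))
    print(resolve_repo_path({"GIT_REPO_PATH": tmp}) == tmp)
```

In the script you could call resolve_repo_path() with no arguments and pass the result to Repo, printing the ValueError message if one is raised.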
If you want to copy and paste all of the code found above at once, take a look at the read_repo.py file on GitHub.
Time to test our GitPython-using script. Invoke the read_repo.py file using the following command.
(gitpy) $ python read_repo.py
If the virtualenv is activated and the GIT_REPO_PATH environment variable is set properly, we should see output similar to the following.
Repo at ~/devel/py/fsp/ successfully loaded.
Repo description: Unnamed repository; edit this file 'description' to name the repository.
Repo active branch is master
Remote named "origin" with URL "git@github.com:mattmakai/fullstackpython.com"
Last commit for repo is 1fa2de70aeb2ea64315f69991ccada51afac1ced.
----
1fa2de70aeb2ea64315f69991ccada51afac1ced
"update latest blog post with code" by Matt Makai (matthew.makai@gmail.com)
2017-11-30 17:15:14-05:00
count: 2256 and size: 254
----
1b026e4268d3ee1bd55f1979e9c397ca99bb5864
"new blog post, just needs completed code section" by Matt Makai (matthew.makai@gmail.com)
2017-11-30 09:00:06-05:00
count: 2255 and size: 269
----
2136d845de6f332505c3df38efcfd4c7d84a45e2
"change previous email newsletters list style" by Matt Makai (matthew.makai@gmail.com)
2017-11-20 11:44:13-05:00
count: 2254 and size: 265
----
9df077a50027d9314edba7e4cbff6bb05c433257
"ensure picture sizes are reasonable" by Matt Makai (matthew.makai@gmail.com)
2017-11-14 13:29:39-05:00
count: 2253 and size: 256
----
3f6458c80b15f58a6e6c85a46d06ade72242c572
"add databases logos to relational databases pagem" by Matt Makai (matthew.makai@gmail.com)
2017-11-14 13:28:02-05:00
count: 2252 and size: 270
The specific commits you see will vary based on the last 5 commits I've pushed to the GitHub repository, but if you see something like the output above that is a good sign everything worked as expected.
We just cloned a Git repository and used the GitPython library to read a slew of data about the repository and all of its commits.
GitPython can do more than just read data though - it can also create and write to Git repositories! Take a look at the modifying references documentation page in the official GitPython tutorial or check back here in the future when I get a chance to write up a more advanced GitPython walkthrough.
Questions? Let me know via a GitHub issue ticket on the Full Stack Python repository, on Twitter @fullstackpython or @mattmakai.
See something wrong in this blog post? Fork this page's source on GitHub and submit a pull request.