Git With Python HowTo GitPython Tutorial And PyGit2 Tutorial

Interacting with Git using Python is a very common use case in the DevOps field: it is often necessary to check out applications or scripts along with their configuration, or even just versioned configurations. Although less common, it is sometimes also necessary to update the checked-out contents and push the committed changes back to the "origin" remote repository. In the "Git With Python HowTo GitPython Tutorial And PyGit2 Tutorial" post we play with the two most commonly used Python libraries for interacting with Git: GitPython and pygit2.

As a companion to this post, you may consider reading:

since both posts provide a lot of useful information on Git.

GitPython

Probably the most widely used library, and one of the earliest, GitPython is a wrapper around the "git" command line utility that provides an object model representing Git concepts and offers error handling. For this reason GitPython needs the "git" executable to be installed on the system and available either in your "PATH" or at the location set in the "GIT_PYTHON_GIT_EXECUTABLE" environment variable.

CVE-2024-22190: although this is a Linux blog, Python code can also run on Windows, so it is worth mentioning this CVE, which exploits untrusted search paths on Windows.


GitPython is not suited for long-running processes (like daemons), as it tends to leak system resources: it was written at a time when destructors (as implemented in the __del__ method) still ran deterministically.

This project is in maintenance mode, which means that:

there will be no feature development, unless these are contributed
there will be no bug fixes, unless they are relevant to the safety of users, or contributed
issues will be responded to with waiting times of up to a month

Of course I'm not going to describe everything about GitPython: doing that would mean rewriting its documentation, which would be pointless. I just want to provide some handy examples for the most common use cases.

The GitPython official documentation is available at https://gitpython.readthedocs.io/en/stable/.

Installing

GitPython can be easily installed using PIP as follows:

python3 -m pip install gitpython

Handling Exceptions

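Most of the errors raised while running Git commands through GitPython can be handled by catching the "git.exc.GitCommandError" exception, as shown by the following snippet:

import sys
import git

try:
    # any GitPython call that runs a git command
    repo = git.Repo.clone_from("https://git0.p1.carcano.corp/mcarcano/myrepo.git", "myrepo")
except git.exc.GitCommandError as e:
    print(e, file=sys.stderr)
    sys.exit(1)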

Cloning a Remote Repository

If the remote repository is public, cloning it requires just a single statement like the following one:

import git
repo = git.Repo.clone_from("https://git0.p1.carcano.corp/mcarcano/myrepo.git", "myrepo")

The function returns a Python object referring to the cloned repository.

Cloning a Remote Private Repository

If the remote repository is private, things differ depending on whether it uses the "HTTP" or the "SSH" transport.
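Since the procedure depends on the transport, a script accepting arbitrary remote URLs has to tell the two apart first. The following is a minimal sketch of a hypothetical transport_of() helper (it is not part of GitPython or pygit2), assuming only the URL forms used in this post: "http(s)://", "ssh://" and the scp-like "user@host:path" syntax.

```python
def transport_of(url: str) -> str:
    """Return "http" or "ssh" for a Git remote URL (hypothetical helper)."""
    if "://" in url:
        scheme = url.split("://", 1)[0].lower()
        if scheme in ("http", "https"):
            return "http"
        if scheme in ("ssh", "git+ssh", "ssh+git"):
            return "ssh"
        raise ValueError("unsupported transport: {}".format(scheme))
    # scp-like syntax ([user@]host:path): a colon before the first slash
    if ":" in url.split("/", 1)[0]:
        return "ssh"
    raise ValueError("not a remote URL: {}".format(url))


print(transport_of("https://git0.p1.carcano.corp/mcarcano/myrepo.git"))     # http
print(transport_of("git@git0.p1.carcano.corp:mcarcano/myprivaterepo.git"))  # ssh
```

Local paths and other schemes (such as "file://") are rejected on purpose, so that a typo fails loudly instead of silently picking the wrong authentication method.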

HTTP Transport

When dealing with the HTTP transport, authentication is password-based: when running non-interactively, the "git" command line utility needs the path to an external helper program able to supply the authentication credentials.

Since GitPython relies on the "git" command line utility, it inherits the same requirement: to work with private Git repositories requiring password authentication you must create the "askpass.py" file with the following contents:

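#!/usr/bin/python3

import sys
from os import environ

# git calls the helper with a prompt such as "Username for 'https://...': "
# or "Password for 'https://user@...': ", so match the start of the prompt only
prompt = sys.argv[1].lower() if len(sys.argv) > 1 else ""

if prompt.startswith("username"):
    print(environ["GIT_USERNAME"])
    sys.exit(0)

if prompt.startswith("password"):
    print(environ["GIT_PASSWORD"])
    sys.exit(0)

sys.exit(1)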

Technically "askpass.py" can be put wherever you prefer, but to keep both running and deployment simple, my suggestion is to put it in the same directory as the script using GitPython.

Once done, make it executable:

chmod 755 askpass.py

As for the Python code to put in your script for cloning the repository, right before the clone statement you must export the following environment variables:

GIT_ASKPASS - the path to the "askpass.py" helper script
GIT_USERNAME - the username expected by the "askpass.py" helper script
GIT_PASSWORD - the password expected by the "askpass.py" helper script

For example:

import os
import git

# askpass.py lives in the same directory as this script
working_dir = os.path.dirname(os.path.realpath(__file__))

os.environ["GIT_ASKPASS"] = os.path.join(working_dir, "askpass.py")
os.environ["GIT_USERNAME"] = "myusername"
os.environ["GIT_PASSWORD"] = "thepassword"

repo = git.Repo.clone_from("https://git0.p1.carcano.corp/mcarcano/myprivaterepo.git", "myprivaterepo")
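Setting os.environ affects the whole Python process and every program it spawns afterwards. A possible alternative, sketched below under the assumption that "askpass.py" sits next to the script, is to pass the same variables through the env parameter of clone_from, as is done for the SSH transport; askpass_env() and clone_private_http() are hypothetical helper names.

```python
import os


def askpass_env(helper_dir, username, password):
    """Build the environment overrides read by git and by askpass.py."""
    return {
        "GIT_ASKPASS": os.path.join(helper_dir, "askpass.py"),
        "GIT_USERNAME": username,
        "GIT_PASSWORD": password,
    }


def clone_private_http(url, dest_dir, username, password):
    import git  # GitPython, imported here so askpass_env() works without it

    helper_dir = os.path.dirname(os.path.realpath(__file__))
    # env= hands the variables to the git commands GitPython runs for
    # this clone, without modifying os.environ for the rest of the script
    return git.Repo.clone_from(url, dest_dir, env=askpass_env(helper_dir, username, password))
```

If a later command run outside the clone needs the same variables, GitPython's repo.git.custom_environment(...) context manager can supply them for just that block.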

SSH Transport

The SSH transport is a different matter, since it relies on authorized SSH public keys: in this case the "GIT_SSH_COMMAND" environment variable must be set to an ssh command line whose "-i" option points to the SSH private key to use while connecting.

For example, to use the private key from the "sshkey" file in the same directory as the script itself:

# absolute path: a bare "sshkey" would be resolved against the current working directory
ssh_key = os.path.join(os.path.dirname(os.path.realpath(__file__)), "sshkey")
ssh_cmd = "ssh -i {}".format(ssh_key)
repo = git.Repo.clone_from("git@git0.p1.carcano.corp:mcarcano/myprivaterepo.git", "myprivaterepo", env={"GIT_SSH_COMMAND": ssh_cmd})

If the SSH private key is password protected, you can use the SSH agent as described in my post "OpenSSH Tutorial - The Ultimate SSH Guide To Understand It".
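Git hands the "GIT_SSH_COMMAND" value to the shell, so a key path containing spaces must be quoted. The following minimal sketch of a hypothetical ssh_command() helper quotes the path with shlex.quote and also adds the standard OpenSSH option "IdentitiesOnly=yes", so that ssh authenticates only with the configured key instead of also offering the other keys held by ssh-agent; whether you want that option is a design choice.

```python
import os
import shlex


def ssh_command(key_path):
    """Build a GIT_SSH_COMMAND value that authenticates with key_path only."""
    key = os.path.abspath(key_path)
    # shlex.quote protects paths containing spaces or shell metacharacters
    return "ssh -i {} -o IdentitiesOnly=yes".format(shlex.quote(key))


print(ssh_command("/opt/fooapp/conf/sshkey"))
```

The result is passed exactly like before: env={"GIT_SSH_COMMAND": ssh_command("/opt/fooapp/conf/sshkey")}.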

Configuring User Attributes

Properly committing changes to a repository requires setting a few user attributes (at least the author's name and email address), so that the Git log makes it easy to track who made each change.

Once you have cloned the repository (or opened it from the local filesystem) and obtained the repository Python object, you can easily set these attributes as follows:

repo.config_writer().set_value("user", "name", "Marco Carcano").release()
repo.config_writer().set_value("user", "email", "marco.carcano@carcano.corp").release()

Creating A New Branch

If the cloned branch is protected, you are not allowed to commit changes to it and push it back to the remote repository. In such a scenario you must check out a new branch from the cloned one and make the changes in that branch.

This can be achieved as follows:

new_branch = repo.create_head("my-new-branch")

With protected branches, changes are merged into the protected branch by means of a Merge Request (aka Pull Request): when creating the Merge Request you must specify the current branch (the freshly created one) as the source, and the protected one (the one you cloned) as the target. For this reason it is handy to get the name of the current branch before creating the new one, as follows:

initial_branch_name=repo.active_branch.name

If you need to start the new branch from a branch other than the one you cloned, you can use the following syntax:

new_branch = repo.create_head("my-new-branch", repo.remotes.origin.refs.master)

replacing "master" with the name of the remote branch you want to start from.

Once the new branch has been created, you probably want to check it out as follows:

new_branch.checkout()

Committing Changes

Once you have cloned the repository (or opened it from the local filesystem) and obtained the repository Python object, you can easily add contents to the staging area:

repo.git.add("changedfile.txt")

The above statement adds just a single changed item; you may instead prefer to add every change as a whole (the equivalent of "git add --all"):

repo.git.add(all=True)

Once you have added everything you need to the staging area, just commit the changes as follows:

repo.index.commit("My commit message")

Creating Tags

Git supports two types of tags: lightweight and annotated, and GitPython supports both.

If you need to create a lightweight tag:

lightweight_tag = repo.create_tag("my_lw_tag")

If instead you need to create an annotated tag:

annotated_tag = repo.create_tag("my_tag", message="the descriptive message")

Pushing Changes

A very common use case is pushing the committed changes to the remote Git repository.

Branches

If the changes have been committed directly into the branch that was previously cloned, you can just define the remote repository as a Python object and push as follows:

remote_name="origin"
repo.remote(remote_name).push()

If instead the branch to push is a new, locally created one (quite a common use case when dealing with protected branches), it must be passed as an argument to push, for example:

remote_name = "origin"
new_local_branch = "mybranch"
repo.remote(remote_name).push(new_local_branch)

Even easier, if you just want to push the current branch, no matter whether it is a new or an existing one:

remote_name="origin"
repo.remote(remote_name).push(repo.active_branch.name)

PyGit2

pygit2 is a set of Python bindings to libgit2, a linkable C library: libgit2 is a dependency-free implementation of Git, with a focus on having a nice API for use within other programs. Besides pygit2, bindings are available for other programming languages too, such as C#, Objective-C, Go and so on.

Among the features supported by libgit2 there is also SSH: unlike HTTPS, which relies on a TLS library, the SSH feature relies on libssh2. Because of some security issues with libssh2, some distributions, such as Red Hat, decided not to ship it anymore (it was available up to Red Hat Enterprise Linux 7; since then it has been intentionally left out).

Of course I'm not going to describe everything about pygit2: doing that would mean rewriting its documentation, which would be pointless. I just want to provide some handy examples for the most common use cases.

The pygit2 official documentation is available at https://www.pygit2.org/index.html.

Checking For SSH transport support

Before using pygit2, it is best to check that:

pygit2 has SSH support enabled
libgit2 has been built with SSH support enabled

Checking whether pygit2 has been built with the SSH transport is very easy: just launch Python and run the following statements:

import pygit2 as pg
# bitwise AND: test whether the SSH bit is set in the features bitmask
bool(pg.features & pg.GIT_FEATURE_SSH)

if the output is "True", then pygit2 has been built with SSH transport enabled.
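The check must use the bitwise "&" operator and not the logical "and": the difference is easy to see with plain integers standing in for the features bitmask. The flag values below are made up for illustration and are not read from pygit2.

```python
# Hypothetical flag values, one bit per feature
FEATURE_THREADS = 1
FEATURE_HTTPS = 2
FEATURE_SSH = 4

features = FEATURE_THREADS | FEATURE_HTTPS  # a build WITHOUT SSH support

# "and" only tests that both operands are non-zero, so it is True here
print(bool(features and FEATURE_SSH))  # True (wrong answer)
# "&" tests whether the SSH bit is actually set in the mask
print(bool(features & FEATURE_SSH))    # False (correct answer)
```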

Checking if "libgit2" has been built with SSH support enabled requires instead building a small C program that tries to instantiate the SSH transport.

Create the "/tmp/libgit2-SSH-support-test.c" file with the following contents:

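#include <git2.h>
#include <git2/sys/transport.h>
#include <stdio.h>

int main(void)
{
  git_transport *transport = NULL;

  /* libgit2 must be initialized before any other call */
  git_libgit2_init();

  /* 0 means an SSH transport could be instantiated for this URL */
  int err = git_transport_new(&transport, NULL, "ssh://github.com/libgit2/pygit2.git");
  printf("%d\n", err);

  git_libgit2_shutdown();
  return 0;
}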

then, build and run it as follows:

gcc /tmp/libgit2-SSH-support-test.c -lgit2 && ./a.out

If the output is "0", then SSH support has been enabled in the "libgit2" library; otherwise it has not.

Installing

On Red Hat family Linux distributions, pygit2 can be easily installed using DNF as follows:

sudo dnf install -y python3-pygit2

if instead you prefer to use PIP, type:

python3 -m pip install pygit2

Handling Exceptions

Most pygit2 exceptions can be handled by catching the "GitError" exception, as shown by the following snippet:

import sys
import pygit2
from pygit2 import GitError

try:
    # any pygit2 call
    repo = pygit2.clone_repository("https://git0.p1.carcano.corp/mcarcano/myrepo.git", "myrepo")
except GitError as e:
    print(e, file=sys.stderr)
    sys.exit(1)

Please note how we explicitly imported the GitError name using the "from" directive.

Opening An Existing Local Repository

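Opening an existing local repository is trivial: the "discover_repository" function looks for a repository starting from the given path and walking up through its parent directories, and returns the path of its ".git" directory (or None if no repository is found), which is then passed to the "Repository" class: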

repository_path = pygit2.discover_repository("path/to/the/repository/directory")
repo = pygit2.Repository(repository_path)

The outcome is a Python object describing the repository.

Cloning a Remote Repository

If the remote repository is public, cloning it requires just a single statement like the following one:

repo = pygit2.clone_repository("https://git0.p1.carcano.corp/mcarcano/myrepo.git", "myrepo")

the function returns a Python Object referring to the cloned repository.

Cloning a Remote Private Repository

If the remote repository is private, we must instruct pygit2 about how to handle the authentication process. This can be easily achieved by subclassing the "RemoteCallbacks" class:

import pygit2
from pygit2 import GitError


class GitCallbacks(pygit2.RemoteCallbacks):
    def __init__(self, user=None, token=None, pub_key=None, priv_key=None, passphrase=None):
        # let RemoteCallbacks initialise its own internal state first
        super().__init__()
        self.user = user
        self.token = token
        self.pub_key = pub_key
        self.priv_key = priv_key
        self.passphrase = passphrase

    def credentials(self, url, username_from_url, allowed_types):
        if allowed_types & pygit2.enums.CredentialType.USERNAME:
            return pygit2.Username(self.user)
        elif allowed_types & pygit2.enums.CredentialType.USERPASS_PLAINTEXT:
            return pygit2.UserPass(self.user, self.token)
        elif allowed_types & pygit2.enums.CredentialType.SSH_KEY:
            # the SSH user normally comes from the URL (e.g. "git" in ssh://git@host/...)
            return pygit2.Keypair(username_from_url or self.user, self.pub_key, self.priv_key, self.passphrase)
        else:
            return None

    def push_update_reference(self, refname, message):
        # a message means the remote rejected the update of refname
        if message is not None:
            raise GitError("Push of {} failed - error message is: {}".format(refname, message))

    def certificate_check(self, certificate, valid, host):
        # accepts ANY certificate: for testing only, never in production
        return True

    def transfer_progress(self, stats):
        print("Retrieved objects: {}/{}".format(stats.indexed_objects, stats.total_objects), end="\r")

In the above snippet:

the "__init__" method calls the parent constructor and makes every parameter optional
the "credentials" method selects and returns the right kind of credentials (USERNAME, USERPASS_PLAINTEXT, SSH_KEY) along with their values
the "push_update_reference" method raises a GitError exception if an (error) message is returned while pushing a reference
the "certificate_check" method overrides the check of the server's certificate: since it always returns True, the certificate is never actually checked. This is just a sample: don't do this in production
the "transfer_progress" method prints information about the progress of the clone

HTTP Transport

When dealing with HTTP transport, authentication is password-based: in this case we initialize the callback by passing the username and the password:

repo = pygit2.clone_repository("https://git0.p1.carcano.corp/mcarcano/myprivaterepo.git", "myprivaterepo", 
                callbacks=GitCallbacks(user="myusername", token="thepassword"))

SSH Transport

The SSH transport is a different matter, since it relies on authorized SSH public keys: in this case we initialize the callbacks by passing the path to the private key file, the path to the public key file and the passphrase to decrypt the private key (mind the "git@" user in the URL, which is the username handed to the callbacks):

repo = pygit2.clone_repository("ssh://git@git0.p1.carcano.corp/mcarcano/myprivaterepo.git", "myprivaterepo",
           callbacks=GitCallbacks(priv_key="sshkey", pub_key="sshkey.pub", passphrase="thepassphrase"))

Don't forget to check that SSH is actually enabled in both pygit2 and libgit2 as described before: this will save you a lot of time debugging obscure oddities, only to find out in the end that SSH support was simply not enabled.

Getting Configurations From An Opened Repository

Once a repository is opened (with the "Repository" class) or cloned (with "clone_repository"), a repository object is returned: among its nested objects there's the "config" object, which holds the settings that apply to that repository (including the ones stored in the ".git/config" file inside the repository directory).

These settings can be accessed by key, using the subscript operator (that is, the "__getitem__" method).

For example, if they have already been set (for instance in ".git/config"), we can easily get the username and email address as follows:

repo = pygit2.Repository("path/to/the/repository/directory")
user_name = repo.config["user.name"]
user_email = repo.config["user.email"]
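pygit2's Config raises a KeyError when a key is not set, so reading optional settings can get verbose. The following is a minimal sketch with a hypothetical config_value() helper: since it relies only on the subscript operator, it works both with repo.config and with a plain dict, which is what the usage lines below use.

```python
def config_value(config, key, default=None):
    """Return config[key], or default when the key is not set.

    Works with a pygit2 Config object (KeyError for missing keys)
    as well as with a plain dict.
    """
    try:
        return config[key]
    except KeyError:
        return default


# with pygit2: name = config_value(repo.config, "user.name", "Unknown")
settings = {"user.name": "Marco Carcano"}
print(config_value(settings, "user.name"))                        # Marco Carcano
print(config_value(settings, "user.email", "nobody@localhost"))   # nobody@localhost
```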

Creating A New Branch

This can be achieved as follows (the name of the current branch is saved first, since it is needed as the target when opening the Merge Request):

initial_branch_name = repo.head.shorthand
last_commit = repo.revparse_single("HEAD")
new_branch = repo.branches.local.create("my-new-branch", last_commit)
repo.checkout(new_branch)

Committing Changes

Once you have cloned the repository (or opened it from the local filesystem) and obtained the repository Python object, you can add contents to the staging area and commit them.

First, create the Signature objects for the author of the changes and for their committer:

author = pygit2.Signature("Marco Carcano", "marco.carcano@carcano.corp")
committer = pygit2.Signature("Marco Carcano", "marco.carcano@carcano.corp")

Although most of the time the author and the committer are the same person, this is not always true: think, for example, of a pull request, where the committer is somebody committing changes on behalf of someone else.

Mind that if the repository configuration already contains the user's information, you can read both the author and committer information from the repository's "config" object as follows:

author = pygit2.Signature(repo.config["user.name"], repo.config["user.email"])
committer = pygit2.Signature(repo.config["user.name"], repo.config["user.email"])

If you instead need to access the global settings (per user account, typically "~/.gitconfig"), you must use the "get_global_config()" method.

For example:

global_config = pygit2.Config.get_global_config()

Then get the index (the staging area), add contents to it, write it to disk and create the tree object:

index = repo.index
index.add("changedfile.txt")
index.write()
tree = index.write_tree()

The "add" statement above adds just a single changed item; you may instead prefer to add every change as a whole, by calling the following in place of "index.add(...)" (before "index.write()"):

index.add_all()

If the commit is not the initial commit of a new Git repository, we must also set:

the current head's name
the list of parents

ref = repo.head.name
parents = [repo.head.target]

For the initial commit, use "HEAD" as the reference name and an empty list of parents instead.

Once all of the above is defined, you can perform the actual commit as follows:

message = "My commit message"
commit = repo.create_commit(ref, author, committer, message, tree, parents)

Creating Tags

Git supports two types of tags: lightweight and annotated. Unlike GitPython, pygit2 does not have a specific method for creating lightweight tags: you must create the reference yourself, as follows:

from pygit2._pygit2 import GIT_OBJ_COMMIT

tag_name = "my_lw_tag"
last_commit= repository.revparse_single("HEAD")
if last_commit.type == GIT_OBJ_COMMIT:
    repository.create_reference("refs/tags/{}".format(tag_name), (last_commit.id))

it instead provides the "create_tag" method for creating annotated tags. For example:

from pygit2._pygit2 import GIT_OBJ_COMMIT

last_commit= repository.revparse_single("HEAD")
if last_commit.type == GIT_OBJ_COMMIT:
    committer = pygit2.Signature("Marco Carcano", "marco.carcano@carcano.corp")
    oid = repository.create_tag("my_tag", str(last_commit.id), GIT_OBJ_COMMIT, committer, "the descriptive message")

Pushing Changes

A very common use case is pushing the committed changes to the remote Git repository.

Branches

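The statement for pushing branches changes only slightly depending on the underlying transport: what differs is the kind of credentials passed to the callbacks. Mind that the leading "+" in the refspec forces the update, just like "git push --force" does.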

The following snippet is an example for the HTTP transport:

branch="mybranch"
remote_name="origin"
repository.remotes[remote_name].push(["+refs/heads/{}".format(branch)], callbacks=GitCallbacks(user="username", token="password"))

if instead you are dealing with the SSH transport, the above code changes as follows:

branch="mybranch"
remote_name="origin"
repository.remotes[remote_name].push(["+refs/heads/{}".format(branch)], callbacks=GitCallbacks(priv_key="sshkey", pub_key="sshkey.pub", passphrase="password"))

Example Scripts

My affectionate readers know I always show things in action: theory is important, but it is worth little without some good practical examples.

So I wrote two sample scripts, one using GitPython and the other using pygit2: you can use them as a starting point to write your own, improving and adapting them as needed.

Create a directory for this project and change into it:

mkdir -m 755 git-with-python
cd git-with-python

Both scripts read their configuration settings from the "conf" directory and are stored in the "bin" directory, so create these directories as follows:

mkdir -m 755 conf bin

Several parts of both scripts use the same logic, for example for reading configuration files or initialising Python's logging facility. For this reason, no matter which script you want to use, you must also create the following files.

Configuration Files

Let's start with the configuration files: create the "conf/secrets.ini" file with the following contents:

[git]
username=fooapp
password=A6.Ur30-Ne
user=Foo Application
email=foo@carcano.corp

it contains:

the username and password to connect to the remote git repository
the full name and email address to be used for authoring and committing changes
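The scripts read this file with the standard configparser module. The following minimal sketch is a variation on the approach they use, assuming the file layout above: the fallback= argument of ConfigParser.get returns a default instead of raising NoSectionError or NoOptionError, and interpolation=None makes sure a password containing "%" is read literally. The sample text is inlined with read_string only to keep the example self-contained.

```python
import configparser

SAMPLE = """
[git]
username=fooapp
password=A6.Ur30-Ne
user=Foo Application
email=foo@carcano.corp
"""

# interpolation=None: a "%" in a password is not treated as a directive
credentials = configparser.ConfigParser(interpolation=None)
credentials.read_string(SAMPLE)  # in the scripts: credentials.read("conf/secrets.ini")

# fallback= returns a default instead of raising NoSectionError/NoOptionError
username = credentials.get("git", "username", fallback=None)
token = credentials.get("git", "password", fallback=None)
print(username)                                             # fooapp
print(credentials.get("git", "passphrase", fallback=None))  # None
```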

Since (at least with GitPython) we are also going to test the SSH transport, create the "conf/sshkey" file containing the private key used for connecting: its contents are the usual private key text block, starting with a "-----BEGIN" line and ending with an "-----END" line.

Also create the "conf/sshkey.pub" file with the related public key, and remember to authorise it on the remote Git repository, or of course it won't be valid for connecting.

If you need guidance on how to generate these keys, please read the post "OpenSSH Tutorial - The Ultimate SSH Guide To Understand It" - besides showing you how to deal with these keys, it will also give you a very good understanding of OpenSSH.

Then set up the "conf/logging.conf" file with all the settings to be used by Python's logging facility:

[loggers]
keys=root,foo

[handlers]
keys=console,syslog,file

[formatters]
keys=jsonFormatter

[logger_root]
level=DEBUG
handlers=file
propagate=0

[logger_foo]
level=DEBUG
handlers=file
qualname=__main__
propagate=0

[handler_null]
class = logging.NullHandler
formatter = default
args = ()

[handler_console]
class=StreamHandler
level=DEBUG
formatter=jsonFormatter
args=(sys.stderr,)

[handler_syslog]
class = handlers.SysLogHandler
args = ('/dev/log', handlers.SysLogHandler.LOG_USER)
level=DEBUG
formatter=jsonFormatter

[handler_rotatingfile]
class=handlers.RotatingFileHandler
level=DEBUG
formatter=jsonFormatter
args=('foo.log','a',1024,50)

[handler_file]
class=FileHandler
level=DEBUG
formatter=jsonFormatter
args=('foo.log','a')

[formatter_jsonFormatter]
format={"time": "%(asctime)s", "logger": "%(name)s", "level": "%(levelname)s", "pid": "%(process)d", "src": "%(pathname)s", "line": "%(lineno)d", "msg": "%(message)s"}

GitPython Sample Script

The following script implements the use case using GitPython: create the "bin/bump-version-with-gitpython.py" file with the following contents and make it executable:

#!/usr/bin/python3

import errno, os, sys, argparse, configparser, logging, logging.config
import re
import git

credentials_file = "secrets.ini"
log_config_file = "logging.conf"
# repo_url="git@git0.p1.carcano.corp:fooapp/fooapp.git"
repo_url = "https://git0.p1.carcano.corp:3000/fooapp/fooapp.git"
branch = "master"
repo_dir = "fooapp"

# https://docs.python.org/3/library/logging.html#levels
logging_loglevels = {"NOTSET": 0, "DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def abort(logger_reason, exit_reason=None):
    """

    Exit printing a message on stderr and logging the exception using the logging facility

    Args:
        logger_reason: string or exception object with the reason for exiting to be logged using the logging facility
        exit_reason:   string or exception object with the reason for exiting to print on stderr - if omitted it is
                       automatically set with the value of the logger_reason parameter

    Returns: nothing
    """
    if exit_reason is None:
        exit_reason = logger_reason
    logger.error(str(logger_reason).replace("'", "\'").replace('"', "'").replace("\n", "   ").replace("\r", ""))
    logger.error("Script aborted")
    print(exit_reason, file=sys.stderr)
    sys.exit(1)


def load_credentials(file):
    """

    Load the credentials from a file

    Args:
        file: the path to the credentials' file

    Returns:
        an object with the contents of the credentials file
    """
    try:
        contents = configparser.ConfigParser()
        if not os.path.isfile(file):
            raise FileNotFoundError(
                errno.ENOENT, os.strerror(errno.ENOENT), file)
        if not os.access(file, os.R_OK):
            raise FileNotFoundError(
                errno.EPERM, "Unable to open file for reading", file)
        contents.read(file)
        return contents
    except FileNotFoundError as e:
        abort(e)
    except configparser.Error:
        abort("{} is not an INI file".format(file))


def init_logging():
    """
    Initialize the Python's standard logging facility

    Formatters are automatically set by this function, verbosity can be set by exporting the LOGLEVEL
    environment variable as defined by https://docs.python.org/3/library/logging.html#levels.
    If necessary, it is possible to alter these formats by creating the logging.conf file in the configuration
    directory - the format is the one of the Python's standard logging facility
    https://docs.python.org/3/library/logging.config.html#configuration-file-format

    Returns: nothing

    """
    if os.path.isfile("{}/{}".format(config_dir, log_config_file)):
        logging.config.fileConfig(fname="{}/{}".format(config_dir, log_config_file), disable_existing_loggers=False)
        log = logging.getLogger(__name__)
        print("logging using '{}' file".format("{}/{}".format(config_dir, log_config_file)))
    else:
        log = logging.getLogger(__name__)
        handler = logging.StreamHandler()
        formatter = logging.Formatter("%(asctime)s %(name)-12s %(levelname)-8s %(message)s")
        handler.setFormatter(formatter)
        log.addHandler(handler)
        log.setLevel(logging.INFO)
    log.setLevel(logging_loglevels[os.getenv("LOGLEVEL", "INFO")])
    return log


def git_clone(url, repo_branch="master", dest_dir=None, single_branch=True, ssl_verify=True):
    """
    
    Clone a Git repository
    
    Args:
        url:             Git repository's URL - both HTTP and SSH transports are supported
        repo_branch:     name of the branch to check out while cloning the repository
        dest_dir:        directory on the filesystem where to store the cloned repository
        single_branch:   if True, it clones only the specified repository branch
        ssl_verify:      if True, enable the checkings of the TLS certificate - if any

    Returns:
       an object representing the cloned repository
    """
    repo_name = re.sub(r"\.git$", "", re.sub("/$", "", url).rsplit("/", 1)[-1])
    logger.info("Cloning '{}', branch '{}'".format(url, repo_branch))
    if dest_dir is None:
        dest_dir = os.path.join("/tmp", repo_name)
    working_dir = os.path.dirname(os.path.realpath(__file__))
    if url.startswith("http"):
        try:
            if credentials.get("git", "username") and credentials.get("git", "password"):
                logger.debug("Logging in to git as '{}'".format(credentials["git"]["username"]))
                os.environ["GIT_ASKPASS"] = os.path.join(working_dir, "askpass.py")
                os.environ["GIT_USERNAME"] = credentials["git"]["username"]
                os.environ["GIT_PASSWORD"] = credentials["git"]["password"]
        except (configparser.NoSectionError, configparser.NoOptionError):
            logger.debug("Logging in to git anonymously")
            pass
        try:
            logger.debug("Cloning using HTTP transport")
            repo = git.Repo.clone_from(url, dest_dir, branch=repo_branch, single_branch=single_branch, depth=1,
                config="http.sslVerify={}".format(str(ssl_verify)), allow_unsafe_options=True)
        except FileNotFoundError as e:
            abort(e)
        except git.exc.GitCommandError as e:
            abort("Error while checking out '{}' git repository: {}".format(repo_name, str(e)))
    else:
        logger.debug("Cloning using SSH transport")
        try:
            if os.path.isfile("{}/sshkey".format(config_dir)):
                logger.debug("Logging in to git using SSH key '{}/sshkey'".format(config_dir))
                if not os.access("{}/sshkey".format(config_dir), os.R_OK):
                    raise FileNotFoundError(
                        errno.EPERM, "Unable to open file for reading", "{}/sshkey".format(config_dir))
                ssh_cmd = "ssh -i {}/sshkey".format(config_dir)
                repo = git.Repo.clone_from(url, dest_dir, branch=repo_branch, single_branch=single_branch, depth=1,
                    env={"GIT_SSH_COMMAND": ssh_cmd})
            else:
                logger.debug("Logging in to git anonymously")
                repo = git.Repo.clone_from(url, dest_dir, branch=repo_branch, single_branch=single_branch, depth=1)
        except FileNotFoundError as e:
            abort(e)
        except git.exc.GitCommandError as e:
            abort("Error while checking out '{}' git repository: {}".format(repo_name, str(e)))
    try:
        if credentials.get("git", "user"):
            repo.config_writer().set_value("user", "name", credentials["git"]["user"]).release()
    except (configparser.NoSectionError, configparser.NoOptionError):
        pass
    try:
        if credentials.get("git", "email"):
            repo.config_writer().set_value("user", "email", credentials["git"]["email"]).release()
    except (configparser.NoSectionError, configparser.NoOptionError):
        pass
    return repo


def update_metadata(changelog_file):
    """
    
    Update the metadata file, bumping the release version
    
    Args:
        changelog_file: path to the metadata file within the cloned Git repository

    Returns: Nothing

    """
    try:
        with open(os.path.join(repo_dir, changelog_file), "r") as file:
            contents = file.read()
        match = re.search(r"release:\s+(\d+\.\d+\.\d+)", contents)
        if not match:
            abort("Unable to guess the project's current version")
        project_version = match.group(1)
        major, minor, patchlevel = map(int, project_version.split("."))
        logger.debug("Project version tokens: - MAJOR={}, MINOR={}, PATCH={})".format(major, minor, patchlevel))
        new_version = "{}.{}.{}".format(major, minor, str(patchlevel + 1))
        logger.info("Bumping from version '{}' to version '{}'".format(project_version, new_version))
        with open(os.path.join(repo_dir, changelog_file), "r+") as file:
            contents = file.readlines()
            file.seek(0)
            file.truncate()
            for line in contents:
                match = re.sub(r"^release:.*$", "release: {}".format(new_version), line)
                if match != line:
                    line = match
                file.write(line)
        logger.debug("File '{}' successfully updated".format(changelog_file))
    except FileNotFoundError as e:
        abort(e)
    try:
        repository.git.add(changelog_file)
        repository.index.commit("Bumping to version '{}'".format(new_version))
        logger.debug("Changes successfully committed")
    except git.exc.GitCommandError as e:
        abort(e)


def git_push(branch_or_tag_name="", remote="origin"):
    """

    Push a branch or tag to the specified remote

    Args:
        branch_or_tag_name: name of the branch or tag to push
        remote: name of the remote to push

    Returns:

    """
    try:
        logger.info("Pushing changes to git")
        if branch_or_tag_name == "":
            branch_or_tag_name = repository.active_branch.name
        repository.remote(remote).push(branch_or_tag_name)
    # active_branch raises TypeError when HEAD is detached
    except (git.exc.GitCommandError, TypeError, ValueError) as e:
        abort(e)


default_config_dir = "{}/conf".format(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))

parser = argparse.ArgumentParser(description="Example script showing how to use git-python")
parser.add_argument("-c", "--config-dir", dest="config_dir", default=default_config_dir,
    help="path to the configuration directory")
args = parser.parse_args()

config_dir = args.config_dir

logger = init_logging()
logger.info("Script started")
credentials = load_credentials("{}/{}".format(config_dir, credentials_file))

repository = git_clone(repo_url, branch, repo_dir, True, False)

new_branch = repository.create_head('bump')
new_branch.checkout()

update_metadata("meta.yml")

git_push()

logger.info("Script ended")
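The version-bumping logic inside update_metadata mixes file I/O, regular expression parsing and Git operations, which makes it hard to test on its own. The following is a minimal sketch of the same logic as a pure function; the name bump_patch_version is hypothetical and is not part of either script. It assumes, as the regular expressions in the scripts do, that the metadata file contains a single line such as "release: 1.4.9", and it checks for a missing match before calling group().

```python
import re

# Matches a "release: MAJOR.MINOR.PATCH" line; [ \t] is used instead of \s so
# that the pattern can never swallow the newline that ends the line
RELEASE_RE = re.compile(r"^release:[ \t]+(\d+)\.(\d+)\.(\d+)[ \t]*$", re.MULTILINE)


def bump_patch_version(contents):
    """Return (old_version, new_version, new_contents), bumping the patch level.

    Raises ValueError if no "release: X.Y.Z" line is found.
    """
    match = RELEASE_RE.search(contents)
    if match is None:
        raise ValueError("Unable to guess the project's current version")
    major, minor, patch = (int(token) for token in match.groups())
    old_version = "{}.{}.{}".format(major, minor, patch)
    new_version = "{}.{}.{}".format(major, minor, patch + 1)
    # rewrite only the first release line, leaving the rest of the file untouched
    new_contents = RELEASE_RE.sub("release: {}".format(new_version), contents, count=1)
    return old_version, new_version, new_contents


if __name__ == "__main__":
    sample = "name: fooapp\nrelease: 1.4.9\nowner: devops\n"
    print(bump_patch_version(sample))
    # ('1.4.9', '1.4.10', 'name: fooapp\nrelease: 1.4.10\nowner: devops\n')
```

Keeping the parsing separate means update_metadata only has to read the file, call the function and write the result back before staging and committing it.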

PyGit2 Sample Script

The following script implements the same use case using pygit2: create the "bin/bump-version-with-pygit2.py" file with the following contents and make it executable:


#!/usr/bin/python3

import errno, os, sys, argparse, configparser, logging, logging.config
import re
import pygit2
from pygit2 import GitError

credentials_file = "secrets.ini"
log_config_file = "logging.conf"
# repo_url="git@git0.p1.carcano.corp:fooapp/fooapp.git"
repo_url = "https://git0.p1.carcano.corp:3000/fooapp/fooapp.git"
branch = "master"
repo_dir = "fooapp"

# https://docs.python.org/3/library/logging.html#levels
logging_loglevels = {"NOTSET": 0, "DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


class GitCallbacks(pygit2.RemoteCallbacks):
    def __init__(self, user=None, token=None, pub_key=None, priv_key=None, passphrase=None):
        # let pygit2.RemoteCallbacks initialize its own internal state first
        super().__init__()
        self.user = user
        self.token = token
        self.pub_key = pub_key
        self.priv_key = priv_key
        self.passphrase = passphrase

    def credentials(self, url, username_from_url, allowed_types):
        if allowed_types & pygit2.enums.CredentialType.USERNAME:
            return pygit2.Username(self.user)
        elif allowed_types & pygit2.enums.CredentialType.USERPASS_PLAINTEXT:
            return pygit2.UserPass(self.user, self.token)
        elif allowed_types & pygit2.enums.CredentialType.SSH_KEY:
            return pygit2.Keypair(username_from_url, self.pub_key, self.priv_key, self.passphrase)
        else:
            return None

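    def certificate_check(self, certificate, valid, host):
        # WARNING: accepts any TLS certificate (e.g. self-signed ones), so the server identity is not verified
        return True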

    def transfer_progress(self, stats):
        print("Retrieved objects: {}/{}".format(stats.indexed_objects, stats.total_objects), end="\r")


def abort(logger_reason, exit_reason=None):
    """

    Exit printing a message on stderr and logging the exception using the logging facility

    Args:
        logger_reason: string or exception object with the reason for exiting to be logged using the logging facility
        exit_reason:   string or exception object with the reason for exiting to print on stderr - if omitted it is
                       automatically set with the value of the logger_reason parameter

    Returns: nothing
    """
    if exit_reason is None:
        exit_reason = logger_reason
    logger.error(str(logger_reason).replace("'", "\'").replace('"', "'").replace("\n", "   ").replace("\r", ""))
    logger.error("Script aborted")
    print(exit_reason, file=sys.stderr)
    sys.exit(1)


def load_credentials(file):
    """

    Load the credentials from a file

    Args:
        file: the path to the credentials' file

    Returns:
        an object with the contents of the credentials file
    """
    try:
        contents = configparser.ConfigParser()
        if not os.path.isfile(file):
            raise FileNotFoundError(
                errno.ENOENT, os.strerror(errno.ENOENT), file)
        if not os.access(file, os.R_OK):
            raise PermissionError(
                errno.EACCES, "Unable to open file for reading", file)
        contents.read(file)
        return contents
    except (FileNotFoundError, PermissionError) as e:
        abort(e)
    except configparser.Error:
        abort("{} is not an INI file".format(file))


def init_logging():
    """
    Initialize the Python's standard logging facility

    Formatters are automatically set by this function, verbosity can be set by exporting the LOGLEVEL
    environment variable as defined by https://docs.python.org/3/library/logging.html#levels.
    If necessary, it is possible to alter these formats by creating the logging.conf file in the configuration
    directory - the format is the one used by Python's standard logging facility
    https://docs.python.org/3/library/logging.config.html#configuration-file-format

    Returns: nothing

    """
    if os.path.isfile("{}/{}".format(config_dir, log_config_file)):
        logging.config.fileConfig(fname="{}/{}".format(config_dir, log_config_file), disable_existing_loggers=False)
        log = logging.getLogger(__name__)
        print("logging using '{}' file".format("{}/{}".format(config_dir, log_config_file)))
    else:
        log = logging.getLogger(__name__)
        handler = logging.StreamHandler()
        formatter = logging.Formatter("%(asctime)s %(name)-12s %(levelname)-8s %(message)s")
        handler.setFormatter(formatter)
        log.addHandler(handler)
        log.setLevel(logging.INFO)
    log.setLevel(logging_loglevels[os.getenv("LOGLEVEL", "INFO")])
    return log


def git_clone(url, repo_branch="master", dest_dir=None):
    """
    
    Clone a Git repository
    
    Args:
        url:             Git repository's URL - both HTTP and SSH transports are supported
        repo_branch:     name of the branch to check out while cloning the repository
        dest_dir:        directory on the filesystem where to store the cloned repository

    Returns:
       an object representing the cloned repository
    """
    repo_name = re.sub(r"\.git$", "", re.sub("/$", "", url).rsplit("/", 1)[-1])
    logger.info("Cloning '{}', branch '{}'".format(url, repo_branch))
    if dest_dir is None:
        dest_dir = os.path.join("/tmp", repo_name)
    try:
        if url.startswith("http"):
            logger.debug("Cloning using HTTP transport")
            repo = pygit2.clone_repository(url, dest_dir, depth=1, checkout_branch=repo_branch,
                callbacks=GitCallbacks(user=credentials["git"]["username"], token=credentials["git"]["password"]))
        else:
            logger.debug("Cloning using SSH transport")
            repo = pygit2.clone_repository(url, dest_dir, depth=1, checkout_branch=repo_branch,
                callbacks=GitCallbacks(priv_key="sshkey", pub_key="sshkey.pub",
                passphrase=credentials["git"]["password"]))
    except (GitError, ValueError) as e:
        abort(e)
    return repo


def update_metadata(changelog_file):
    """
    
    Update the metadata file, bumping the release version
    
    Args:
        changelog_file: path to the metadata file within the cloned Git repository

    Returns: the Oid of the newly created commit

    """
    try:
        with open(os.path.join(repo_dir, changelog_file), "r") as file:
            contents = file.read()
        match = re.search(r"release:\s+(\d+\.\d+\.\d+)", contents)
        if not match:
            abort("Unable to guess the project's current version")
        project_version = match.group(1)
        major, minor, patchlevel = map(int, project_version.split("."))
        logger.debug("Project version tokens: MAJOR={}, MINOR={}, PATCH={}".format(major, minor, patchlevel))
        new_version = "{}.{}.{}".format(major, minor, str(patchlevel + 1))
        logger.info("Bumping from version '{}' to version '{}'".format(project_version, new_version))
        with open(os.path.join(repo_dir, changelog_file), "r+") as file:
            contents = file.readlines()
            file.seek(0)
            file.truncate()
            for line in contents:
                match = re.sub(r"^release:.*$", "release: {}".format(new_version), line)
                if match != line:
                    line = match
                file.write(line)
        logger.debug("File '{}' successfully updated".format(changelog_file))
    except FileNotFoundError as e:
        abort(e)
    try:
        index = repository.index
        index.add(changelog_file)
        index.write()
        author = pygit2.Signature(credentials["git"]["user"], credentials["git"]["email"])
        committer = pygit2.Signature(credentials["git"]["user"], credentials["git"]["email"])
        message = "Bumping to version '{}'".format(new_version)
        tree = index.write_tree()
        ref = repository.head.name
        parents = [repository.head.target]
        return repository.create_commit(ref, author, committer, message, tree, parents)
    except (GitError, OSError) as e:
        abort(e)


def git_push(branch="", tag="", remote="origin"):
    """

    Push a branch or a tag to the specified remote - if neither is given, the current HEAD is pushed

    Args:
        branch: name of the branch to push
        tag: name of the tag to push
        remote: name of the remote to push to

    Returns: nothing

    """
    try:
        if branch == "" and tag == "":
            what = repository.head.name
        elif branch != "":
            what = "refs/heads/{}".format(branch)
        else:
            what = "refs/tags/{}".format(tag)
        logger.info("Pushing changes to git")
        git_remote = repository.remotes[remote]
        # the leading "+" allows a non-fast-forward update of the ref, as "git push --force" does
        # the transport is picked from the remote URL, consistently with git_clone
        if git_remote.url.startswith("http"):
            git_remote.push(['+' + what],
                callbacks=GitCallbacks(user=credentials["git"]["username"], token=credentials["git"]["password"]))
        else:
            git_remote.push(['+' + what], callbacks=GitCallbacks(priv_key="sshkey",
                pub_key="sshkey.pub", passphrase=credentials["git"]["password"]))
    except (GitError, KeyError, ValueError) as e:
        abort(e)


default_config_dir = "{}/conf".format(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))

parser = argparse.ArgumentParser(description="Example script showing how to use pygit2")
parser.add_argument("-c", "--config-dir", dest="config_dir", default=default_config_dir,
    help="path to the configuration directory")
args = parser.parse_args()

config_dir = args.config_dir

logger = init_logging()
logger.info("Script started")
credentials = load_credentials("{}/{}".format(config_dir, credentials_file))

repository = git_clone(repo_url, branch, repo_dir)

last_commit = repository.revparse_single("HEAD")
new_branch = repository.branches.local.create("bump", last_commit)
repository.checkout(repository.branches.local["bump"])

cmit = update_metadata("meta.yml")
git_push()

logger.info("Script ended")
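The git_push function of the pygit2 script always prefixes the refspec with "+", which in Git refspec syntax lets the remote accept a non-fast-forward update of that ref, so an existing remote branch with the same name would be overwritten. For a freshly created branch such as "bump" the force is not needed. The following is a minimal sketch of the refspec selection as a standalone function with the force made optional; build_push_refspec is a hypothetical helper, and unlike the original it refuses a branch and a tag given together instead of silently preferring the branch.

```python
def build_push_refspec(head_ref, branch="", tag="", force=True):
    """Return the refspec to push: a branch, a tag or, if neither is given, head_ref.

    head_ref is the full name of the checked out reference, e.g. "refs/heads/bump".
    """
    if branch and tag:
        # failing loudly is safer than silently pushing only one of the two
        raise ValueError("specify either a branch or a tag, not both")
    if branch:
        ref = "refs/heads/{}".format(branch)
    elif tag:
        ref = "refs/tags/{}".format(tag)
    else:
        ref = head_ref
    # a leading "+" lets the remote accept a non-fast-forward update of this ref
    return "+" + ref if force else ref


if __name__ == "__main__":
    print(build_push_refspec("refs/heads/bump"))                              # +refs/heads/bump
    print(build_push_refspec("refs/heads/bump", tag="v1.4.10", force=False))  # refs/tags/v1.4.10
```

Within git_push, the list passed to repository.remotes[remote].push() could then be built as [build_push_refspec(repository.head.name, branch, tag, force=False)], so that the remote rejects a non-fast-forward update instead of having its history rewritten.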

Footnotes

Here ends our tutorial on Git with Python - I hope you enjoyed it. As we saw, GitPython and PyGit2 are the most commonly used libraries: the former wraps the git command line tool, abstracting its data as Python objects, while the latter is a set of Python bindings to the libgit2 C library, with an inherited dependency on libssh2 that has the drawback of leaving the SSH transport disabled on recent Linux distributions.
Last but not least, we saw the syntax for the most common use cases, along with a full-featured example script for each library.
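Since each library depends on different system components, it can be handy to check them before running the scripts. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the GitPython check only approximates GitPython's own lookup (the GIT_PYTHON_GIT_EXECUTABLE environment variable, otherwise "git" in PATH), and the pygit2 check reads the pygit2.features bitmask of the libgit2 build, using pygit2.enums.Feature when available and the older GIT_FEATURE_SSH constant otherwise.

```python
import os
import shutil


def gitpython_git_executable():
    """Return the path of the git executable GitPython would likely use, or None."""
    # shutil.which also validates an explicit path, returning None if it is not executable
    return shutil.which(os.environ.get("GIT_PYTHON_GIT_EXECUTABLE", "git"))


def pygit2_has_ssh():
    """Return True/False depending on SSH support, or None if pygit2 is not installed."""
    try:
        import pygit2
    except ImportError:
        return None
    enums = getattr(pygit2, "enums", None)
    if enums is not None and hasattr(enums, "Feature"):
        ssh_flag = enums.Feature.SSH  # recent pygit2 releases
    else:
        ssh_flag = pygit2.GIT_FEATURE_SSH  # older pygit2 releases
    # pygit2.features is the bitmask of the features libgit2 was built with
    return bool(pygit2.features & ssh_flag)


if __name__ == "__main__":
    print("git executable for GitPython:", gitpython_git_executable())
    print("pygit2 SSH transport support:", pygit2_has_ssh())
```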

If you appreciate this effort and like this post or any of the others, please share them on LinkedIn: sharing and comments are an inexpensive way to push me to keep writing, and this blog makes sense only if it gets visited.

