Interacting with Git using Python is a very common use case in the DevOps field: very often it is necessary to check out applications or scripts along with their configuration, or even just versioned configurations. Although it is rarer, it is sometimes necessary to update the checked-out contents and push the committed version back to the “origin” remote repository. In the "Git With Python HowTo GitPython Tutorial And PyGit2 Tutorial" post we play with the two most commonly used Python libraries for interacting with Git: GitPython and pygit2.

As a companion to this post, you may consider reading:

since both posts provide a lot of useful information on Git.

GitPython

GitPython is probably the most widely used library, and one of the earliest. It is a wrapper around the "git" command line utility, providing an object model that represents Git concepts and offering error handling. For this reason, GitPython needs the "git" executable to be installed on the system. It must also be reachable through your "PATH" environment variable, or its path must be set in the "GIT_PYTHON_GIT_EXECUTABLE" environment variable.
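If you want your script to fail early with a clear message when git is missing, a minimal sketch like the following may help. The find_git_executable helper is hypothetical and is not part of GitPython. It only mirrors the two lookup locations described above, checking "GIT_PYTHON_GIT_EXECUTABLE" first and then "PATH".

```python
import os
import shutil
from typing import Optional


def find_git_executable() -> Optional[str]:
    """Return the git executable GitPython is expected to use, or None.

    Hypothetical helper: it checks GIT_PYTHON_GIT_EXECUTABLE first and then
    falls back to looking up "git" in the directories listed in PATH.
    """
    candidate = os.environ.get("GIT_PYTHON_GIT_EXECUTABLE") or "git"
    # shutil.which() checks a path as-is when it contains a directory,
    # otherwise it searches PATH; it returns None if nothing executable is found
    return shutil.which(candidate)
```

For example, you can call it at the top of your script, before importing GitPython, and exit with an error message when it returns None.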

CVE-2024-22190: although this is a Linux blog, Python code can also run on Windows, so this CVE is worth mentioning. It exploits untrusted search paths.


GitPython is not suited for long-running processes (like daemons), as it tends to leak system resources. It was written at a time when destructors (as implemented in the __del__ method) still ran deterministically.

This project is in maintenance mode, which means that:

  • there will be no feature development, unless these are contributed
  • there will be no bug fixes, unless they are relevant to the safety of users, or contributed
  • issues will be responded to with waiting times of up to a month

Of course I'm not going to describe everything about GitPython: that would mean rewriting its documentation, which would be pointless. I just want to provide some handy examples for the most common use cases.

The GitPython official documentation is available at https://gitpython.readthedocs.io/en/stable/.

Installing

GitPython can be easily installed using PIP as follows:

python3 -m pip install gitpython

Handling Exceptions

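Most of the errors raised while running Git commands can be handled by catching the "git.exc.GitCommandError" exception, as shown by the following snippet:

import sys
import git

try:
    # replace with any GitPython call, for example cloning a repository
    repo = git.Repo.clone_from("https://git0.p1.carcano.corp/mcarcano/myrepo.git", "myrepo")
except git.exc.GitCommandError as e:
    print(e, file=sys.stderr)
    sys.exit(1)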

Cloning a Remote Repository

If the remote repository is a public one, cloning it requires just a very trivial statement like the following one:

import git
repo = git.Repo.clone_from("https://git0.p1.carcano.corp/mcarcano/myrepo.git", "myrepo")

The function returns a git.Repo object referring to the cloned repository.

Cloning a Remote Private Repository

If the remote repository is a private one, things differ depending on whether it uses the "HTTP" or the "SSH" transport.

HTTP Transport

When dealing with the HTTP transport, authentication is password-based. In this case, the "git" command line utility needs the path to an external helper script that supplies the authentication credentials, set through the "GIT_ASKPASS" environment variable.

Since GitPython relies on the "git" command line utility, it inherits the same requirement. To work with private Git repositories requiring password authentication, you must create the "askpass.py" file with the following contents:

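#!/usr/bin/python3

import sys
from os import environ

# git runs the helper passing the prompt as the first argument, for example
# "Username for 'https://host': " or "Password for 'https://user@host': "
if len(sys.argv) < 2:
    sys.exit(1)
prompt = sys.argv[1].lower()

# check only the start of the prompt: the password prompt can contain the username
if prompt.startswith('username'):
    print(environ['GIT_USERNAME'])
    sys.exit()

if prompt.startswith('password'):
    print(environ['GIT_PASSWORD'])
    sys.exit()

sys.exit(1)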

Technically, "askpass.py" can be placed wherever you prefer. To keep both running and deployment simple, though, my suggestion is to put it in the same directory as the script using GitPython.

Once done, make it executable:

chmod 755 askpass.py

In the Python code of your script that clones the repository, you must export the following environment variables right before the clone statement:

  • GIT_ASKPASS - the path to the "askpass.py" helper script
  • GIT_USERNAME - the username expected by the "askpass.py" helper script
  • GIT_PASSWORD - the password expected by the "askpass.py" helper script

For example:

import os
import git

working_dir = os.path.dirname(os.path.realpath(__file__))

os.environ["GIT_ASKPASS"] = os.path.join(working_dir, "askpass.py")
os.environ["GIT_USERNAME"] = "myusername"
os.environ["GIT_PASSWORD"] = "thepassword"

repo = git.Repo.clone_from("https://git0.p1.carcano.corp/mcarcano/myprivaterepo.git", "myprivaterepo")
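As an alternative, the same three variables can be passed only to the git processes through the "env" argument of git.Repo.clone_from(), the same argument used for the SSH transport. This avoids storing them in os.environ for the whole script. The sketch below assumes GitPython is installed, and the askpass_env helper name is hypothetical.

```python
import os


def askpass_env(helper_path, username, password):
    """Return the variables read by git (GIT_ASKPASS) and by askpass.py.

    Hypothetical helper: the returned dict is meant for the "env" argument of
    git.Repo.clone_from(), so the credentials reach only the git processes
    instead of being stored in os.environ for the whole script.
    """
    return {
        "GIT_ASKPASS": os.path.abspath(helper_path),
        "GIT_USERNAME": username,
        "GIT_PASSWORD": password,
    }


# Usage (requires GitPython and network access to the repository):
#
#   import git
#   working_dir = os.path.dirname(os.path.realpath(__file__))
#   env = askpass_env(os.path.join(working_dir, "askpass.py"), "myusername", "thepassword")
#   repo = git.Repo.clone_from("https://git0.p1.carcano.corp/mcarcano/myprivaterepo.git",
#                              "myprivaterepo", env=env)
#   # later commands run through repo.git (for example push) need the same variables
#   repo.git.update_environment(**env)
```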

SSH Transport

The SSH transport is a different matter, since it relies on authorized SSH public keys. In this case, you must set the "GIT_SSH_COMMAND" environment variable to an ssh command line whose "-i" option points to the SSH private key to use while connecting.

For example, to use the private key stored in the "sshkey" file in the same directory as the script itself:

import os
import git
ssh_cmd = "ssh -i {}".format(os.path.join(os.path.dirname(os.path.realpath(__file__)), "sshkey"))
repo = git.Repo.clone_from("git@git0.p1.carcano.corp:mcarcano/myprivaterepo.git", "myprivaterepo", env={"GIT_SSH_COMMAND": ssh_cmd})

If the SSH private key is password protected, you can use the SSH agent as described in my post "OpenSSH Tutorial - The Ultimate SSH Guide To Understand It".
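Building the "GIT_SSH_COMMAND" string by hand gets fragile when the key path contains spaces. The following minimal sketch quotes every token with shlex.quote, since git runs this variable through a shell. It also adds the "IdentitiesOnly=yes" ssh option so that ssh does not try other keys first. The ssh_command helper name is hypothetical.

```python
import os
import shlex


def ssh_command(key_path, strict_host_key_checking=True):
    """Build a GIT_SSH_COMMAND value that authenticates with key_path only.

    Hypothetical helper. git runs GIT_SSH_COMMAND through a shell, so every
    token is quoted to survive spaces or special characters in the path.
    """
    parts = [
        "ssh",
        "-i", os.path.abspath(key_path),
        # do not try other keys offered by ssh-agent or found in ~/.ssh
        "-o", "IdentitiesOnly=yes",
    ]
    if not strict_host_key_checking:
        # accept unknown host keys: handy in throwaway labs, unsafe elsewhere
        parts += ["-o", "StrictHostKeyChecking=no"]
    return " ".join(shlex.quote(part) for part in parts)


# Usage (requires GitPython and a key authorised on the remote):
#
#   import git
#   key = os.path.join(os.path.dirname(os.path.realpath(__file__)), "sshkey")
#   repo = git.Repo.clone_from("git@git0.p1.carcano.corp:mcarcano/myprivaterepo.git",
#                              "myprivaterepo", env={"GIT_SSH_COMMAND": ssh_command(key)})
```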

Configuring User Attributes

Properly committing changes to a repository requires setting a few user attributes: at least the author's name and email address. This way, the Git log clearly shows who made each change.

Once you have cloned the repository (or opened it from the local filesystem) and obtained the repository object, you can easily set these attributes as follows:

repo.config_writer().set_value("user", "name", "Marco Carcano").release()
repo.config_writer().set_value("user", "email", "marco.carcano@carcano.corp").release()

Creating A New Branch

If the cloned branch is protected, you are not allowed to commit changes to it and push it back to the remote repository. In such a scenario, you must check out a new branch from the cloned one and make your changes on that branch.

This can be achieved as follows:

new_branch = repo.create_head("my-new-branch")

When dealing with protected branches, changes are merged into the protected branch by means of a Merge Request (aka Pull Request). When creating the Merge Request, you must specify the current branch (the freshly created one) as the source and the protected one (the one you cloned) as the target. For this reason, it is handy to get the name of the current branch before creating the new one, as follows:

initial_branch_name=repo.active_branch.name

If you need to start the new branch from a branch other than the one you cloned, you can use the following syntax:

new_branch = repo.create_head("my-new-branch", repo.remotes.origin.refs.master)

replacing "master" with the name of the remote branch you want to start from.

Once the new branch is created, you will probably want to check it out as follows:

new_branch.checkout()

Committing Changes

Once you have cloned the repository (or opened it from the local filesystem) and obtained the repository object, you can easily add contents to the staging area:

repo.git.add("changedfile.txt") 

The above statement, of course, adds just a single changed item. You may instead prefer to add every change as a whole, the equivalent of "git add --all":

repo.git.add(all=True)

Once you have added everything you need to the staging area, just commit the changes as follows:

repo.index.commit("My commit message")

Creating Tags

Git supports two types of tags: lightweight and annotated. GitPython, of course, supports both kinds of tags.

If you need to create a lightweight tag:

lightweight_tag = repo.create_tag("my_lw_tag")

If you need to create an annotated tag instead:

annotated_tag = repo.create_tag("my_tag", message="the descriptive message")

Pushing Changes

A very common use case is pushing the committed changes to the remote Git repository.

Branches

If the changes have been committed directly into the branch that was previously cloned, you can just define the remote repository as a Python object and push as follows:

remote_name="origin"
repo.remote(remote_name).push()

If instead the branch to push is a new, locally created one (quite a common use case when dealing with protected branches), you must pass it as an argument of the push. For example:

remote_name="origin"
new_local_branch="mybranch"
repo.remote(remote_name).push(new_local_branch)

Even easier, if you just want to push the current branch, whether it is a new or an existing one:

remote_name="origin"
repo.remote(remote_name).push(repo.active_branch.name)

Tags

As with the git command line tool, tags are not pushed by default: you must explicitly pass them, as follows:

remote_name="origin"
repo.remote(remote_name).push("my_lw_tag")

or:

remote_name="origin"
repo.remote(remote_name).push("my_tag")

You can get the same behaviour as the "--follow-tags" git command line option by adding "push.followTags = true" to the repository configuration, as follows:

repo.config_writer().set_value("push", "followTags", "true").release()

After doing that, annotated tags pointing to the commits being pushed are pushed as well, even when calling the push() method with no arguments:

remote_name="origin"
repo.remote(remote_name).push()

PyGit2

pygit2 is a set of Python bindings to libgit2, a linkable C library implementing Git. libgit2 is a dependency-free implementation of Git, with a focus on having a nice API for use within other programs. Besides pygit2, bindings are also available for other programming languages, such as C#, Objective-C, Go and so on.

SSH is among the features supported by libgit2. Unlike HTTPS, which relies on an SSL library, SSH support relies on libssh2. Because of some security issues with libssh2, some distributions, such as Red Hat, decided to stop providing it. It was available up to Red Hat Enterprise Linux 7 and is now intentionally missing.

Of course I'm not going to describe everything about pygit2: that would mean rewriting its documentation, which would be pointless. I just want to provide some handy examples for the most common use cases.

The pygit2 official documentation is available at https://www.pygit2.org/index.html.

Checking For SSH transport support

Before using pygit2, it is best to check that:

  • pygit2 has been built with SSH support enabled
  • libgit2 has been built with SSH support enabled

Checking whether pygit2 has been built with SSH transport support is very easy: just launch Python and run the following statements:

import pygit2 as pg
bool(pg.features & pg.GIT_FEATURE_SSH)

If the output is "True", then pygit2 has been built with SSH transport enabled.

Checking whether "libgit2" has been built with SSH support instead requires building a small C program that tries to instantiate the SSH transport.

Create the "/tmp/libgit2-SSH-support-test.c" file with the following contents:

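#include <git2.h>
#include <git2/sys/transport.h>
#include <stdio.h>

int main(void)
{
  git_transport *transport;

  /* libgit2 must be initialized before calling any other function */
  git_libgit2_init();
  int err = git_transport_new(&transport, NULL, "ssh://github.com/libgit2/pygit2.git");
  printf("%d\n", err);
  git_libgit2_shutdown();
  return 0;
}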

Then build and run it as follows:

gcc /tmp/libgit2-SSH-support-test.c -lgit2 && ./a.out

If the output is "0", then SSH support has been enabled in the "libgit2" library; otherwise it has not.

Installing

On Red Hat family Linux distributions, pygit2 can be easily installed using DNF as follows:

sudo dnf install -y python3-pygit2

If you prefer to use pip instead, type:

python3 -m pip install pygit2

Handling Exceptions

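Most of the errors coming from libgit2 can be handled by catching the "GitError" exception, as shown by the following snippet:

import sys
import pygit2
from pygit2 import GitError

try:
    # replace with any pygit2 call, for example cloning a repository
    repo = pygit2.clone_repository("https://git0.p1.carcano.corp/mcarcano/myrepo.git", "myrepo")
except GitError as e:
    print(e, file=sys.stderr)
    sys.exit(1)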

Please note how we explicitly imported the GitError name using the "from" directive.

Cloning a Remote Repository

If the remote repository is a public one, cloning it requires just a very trivial statement like the following one:

import pygit2
repo = pygit2.clone_repository("https://git0.p1.carcano.corp/mcarcano/myrepo.git", "myrepo")

The function returns a pygit2.Repository object referring to the cloned repository.

Cloning a Remote Private Repository

If the remote repository is a private one, we must tell pygit2 how to handle the authentication process. This can be easily achieved by subclassing the "RemoteCallbacks" class:

class GitCallbacks(pygit2.RemoteCallbacks):
    def __init__(self, user=None, token=None, pub_key=None, priv_key=None, passphrase=None):
        # initialize the RemoteCallbacks internal state before setting our attributes
        super().__init__()
        self.user = user
        self.token = token
        self.pub_key = pub_key
        self.priv_key = priv_key
        self.passphrase = passphrase

    def credentials(self, url, username_from_url, allowed_types):
        if allowed_types & pygit2.enums.CredentialType.USERNAME:
            return pygit2.Username(self.user)
        elif allowed_types & pygit2.enums.CredentialType.USERPASS_PLAINTEXT:
            return pygit2.UserPass(self.user, self.token)
        elif allowed_types & pygit2.enums.CredentialType.SSH_KEY:
            return pygit2.Keypair(username_from_url, self.pub_key, self.priv_key, self.passphrase)
        else:
            return None

    def certificate_check(self, certificate, valid, host):
        return True

    def transfer_progress(self, stats):
        print("Retrieved objects: {}/{}".format(stats.indexed_objects, stats.total_objects), end="\r")

In the above snippet:

  • the "__init__" method has every parameter as optional
  • the "credentials" method selects and returns the correct kind of credentials (USERNAME, USERPASS_PLAINTEXT, SSH_KEY) along with their values
  • the "certificate_check" method overrides the checking of the server's certificate: it always returns True, so the certificate is never checked. This is just a sample: don't implement it in production
  • the "transfer_progress" method prints information about the progress of the clone

HTTP Transport

When dealing with HTTP transport, authentication is password-based: in this case we initialize the callback by passing the username and the password:

repo = pygit2.clone_repository("https://git0.p1.carcano.corp/mcarcano/myprivaterepo.git", "myprivaterepo", 
                callbacks=GitCallbacks(user="myusername", token="thepassword"))

SSH Transport

SSH transport is a different matter, since it relies on authorized SSH public keys: in this case we initialize the callback by passing the path to the private key file, the path to the public key file and the passphrase to decode the private key:

repo = pygit2.clone_repository("ssh://git0.p1.carcano.corp/mcarcano/myprivaterepo.git", "myprivaterepo",
           callbacks=GitCallbacks(priv_key="sshkey", pub_key="sshkey.pub", passphrase="thepassphrase"))

Don't forget to check that SSH is actually enabled in both pygit2 and libgit2, as described earlier. It will save you a lot of time debugging obscure failures that turn out to be caused by SSH support simply not being enabled.

Creating A New Branch

If the cloned branch is protected, you are not allowed to commit changes to it and push it back to the remote repository. In such a scenario, you must check out a new branch from the cloned one and make your changes on that branch.

This can be achieved as follows:

last_commit = repo.revparse_single("HEAD")
new_branch = repo.branches.local.create("my-new-branch", last_commit)
repo.checkout(repo.branches.local["my-new-branch"])

When dealing with protected branches, changes are merged into the protected branch by means of a Merge Request (aka Pull Request). When creating the Merge Request, you must specify the current branch (the freshly created one) as the source and the protected one (the one you cloned) as the target. For this reason, it is handy to get the name of the current branch before creating the new one, as follows:

initial_branch_name=repo.head.shorthand

Committing Changes

Once you have cloned the repository (or opened it from the local filesystem) and obtained the repository object, you can add contents to the staging area and commit them.

First, create the Signature objects for the author of the changes and for their committer:

author = pygit2.Signature("Marco Carcano", "marco.carcano@carcano.corp")
committer = pygit2.Signature("Marco Carcano", "marco.carcano@carcano.corp")

Although the author and the committer are the same person most of the time, that is not always the case. Think, for example, of a pull request: there, the committer is somebody committing changes on behalf of someone else.

Then get the index of the staging area, add contents to it, write it to disk and create the tree object:

index = repo.index
index.add("changedfile.txt")
index.write()
tree = index.write_tree()

The above statement, of course, adds just a single changed item. You may instead prefer to add every change as a whole, replacing the "index.add(...)" call with:

index.add_all()

If the commit is not the initial commit of a new Git repository, we must also get:

  • the name of the current head
  • the list of parents

ref = repo.head.name
parents = [repo.head.target]

Once all of the above is defined, you can perform the actual commit as follows:

message = "My commit message"
commit = repo.create_commit(ref, author, committer, message, tree, parents)

Creating Tags

Git supports two types of tags: lightweight and annotated. Unlike GitPython, pygit2 does not have a specific method for creating lightweight tags, so you must create the reference yourself, as follows:

from pygit2._pygit2 import GIT_OBJ_COMMIT

tag_name = "my_lw_tag"
last_commit = repo.revparse_single("HEAD")
if last_commit.type == GIT_OBJ_COMMIT:
    repo.create_reference("refs/tags/{}".format(tag_name), last_commit.id)

It does, however, provide the "create_tag" method for creating annotated tags. For example:

from pygit2._pygit2 import GIT_OBJ_COMMIT

last_commit = repo.revparse_single("HEAD")
if last_commit.type == GIT_OBJ_COMMIT:
    committer = pygit2.Signature("Marco Carcano", "marco.carcano@carcano.corp")
    oid = repo.create_tag("my_tag", str(last_commit.id), GIT_OBJ_COMMIT, committer, "the descriptive message")

Pushing Changes

A very common use case is pushing the committed changes to the remote Git repository.

Branches

The statement for pushing branches changes slightly depending on the underlying transport, since different credentials must be passed to the callbacks. Note that the leading "+" in the refspec forces the update, just like "git push --force" does for that reference.

The following snippet is an example for the HTTP transport:

branch = "mybranch"
remote_name = "origin"
repo.remotes[remote_name].push(["+refs/heads/{}".format(branch)], callbacks=GitCallbacks(user="username", token="password"))

If you are dealing with the SSH transport instead, the above code changes as follows:

branch = "mybranch"
remote_name = "origin"
repo.remotes[remote_name].push(["+refs/heads/{}".format(branch)], callbacks=GitCallbacks(priv_key="sshkey", pub_key="sshkey.pub", passphrase="password"))

Tags

Pushing tags is pretty similar to pushing branches. The snippet below pushes a tag using the HTTP transport:

tag = "mytag"
remote_name = "origin"
repo.remotes[remote_name].push(["+refs/tags/{}".format(tag)], callbacks=GitCallbacks(user="username", token="password"))

whereas the one below pushes a tag using the SSH transport:

tag = "mytag"
remote_name = "origin"
repo.remotes[remote_name].push(["+refs/tags/{}".format(tag)], callbacks=GitCallbacks(priv_key="sshkey", pub_key="sshkey.pub", passphrase="password"))
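The snippets above build the refspec by hand and always prefix it with "+", which forces the update. A small helper, with the hypothetical name push_refspec, makes both the reference namespace and the forcing explicit. It returns the list of refspecs that the pygit2 Remote.push() method expects.

```python
def push_refspec(name, kind="branch", force=False):
    """Build the refspec list passed to pygit2's Remote.push()."""
    prefixes = {"branch": "refs/heads/", "tag": "refs/tags/"}
    if kind not in prefixes:
        raise ValueError("kind must be 'branch' or 'tag', not {!r}".format(kind))
    refspec = prefixes[kind] + name
    # a leading "+" forces the update of that reference, like "git push --force"
    return ["+" + refspec if force else refspec]


# Usage (requires pygit2 and the GitCallbacks class shown earlier):
#   repo.remotes["origin"].push(push_refspec("mytag", kind="tag"),
#                               callbacks=GitCallbacks(user="username", token="password"))
```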

Example Scripts

My affectionate readers know I always show things in action: theory is important, but it is worth little without some good practical examples.

So I wrote two sample scripts, one using gitpython and the other using pygit2: you can use them as a starting point to write your own, improving and adapting them as needed.

Create a directory for this project and change into it:

mkdir -m 755 git-with-python
cd git-with-python

Both scripts read their configuration settings from the "conf" directory and are stored in the "bin" directory, so create these directories as follows:

mkdir -m 755 conf bin

Both scripts share some logic, such as reading the configuration files and initialising Python's logging facility. For this reason, whichever script you want to use, you must also create the following files.

Configuration Files

Let's start by setting up the configuration files: create the "conf/secrets.ini" file with the following contents:

[git]
username=fooapp
password=A6.Ur30-Ne
user=Foo Application
email=foo@carcano.corp

It contains:

  • the username and password to connect to the remote git repository
  • the name and email address to be used when authoring and committing changes

Since we are also going to test the SSH transport, at least with GitPython, create the "conf/sshkey" file containing the private key used for connecting. Its contents should look like the following snippet:

-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAABlwAAAAdzc2gtcn
...
dAZ0qapX/6Ypa/AAAAJnZhZ3JhbnRAZ2l0LWNhLXVwMWEwMDEucDEuY2FyY2Fuby5jb3Jw
AQIDBA==
-----END OPENSSH PRIVATE KEY-----

and the "conf/sshkey.pub" file with the related public key. Remember to authorise it on the remote Git repository, or of course it won't be valid for connecting.

If you need guidance on how to generate these keys, please read the post "OpenSSH Tutorial - The Ultimate SSH Guide To Understand It" - besides showing you how to deal with these keys, it will also give you a very good understanding of OpenSSH.

Then set up the "conf/logging.conf" file with all the settings used by Python's logging facility:

[loggers]
keys=root,foo

[handlers]
keys=console,syslog,file

[formatters]
keys=jsonFormatter

[logger_root]
level=DEBUG
handlers=file
propagate=0

[logger_foo]
level=DEBUG
handlers=file
qualname=__main__
propagate=0

[handler_null]
class = logging.NullHandler
formatter = default
args = ()

[handler_console]
class=StreamHandler
level=DEBUG
formatter=jsonFormatter
args=(sys.stderr,)

[handler_syslog]
class = handlers.SysLogHandler
args = ('/dev/log', handlers.SysLogHandler.LOG_USER)
level=DEBUG
formatter=jsonFormatter

[handler_rotatingfile]
class=handlers.RotatingFileHandler
level=DEBUG
formatter=jsonFormatter
args=('foo.log','a',1024,50)

[handler_file]
class=FileHandler
level=DEBUG
formatter=jsonFormatter
args=('foo.log','a')

[formatter_jsonFormatter]
format={"time": "%(asctime)s", "logger": "%(name)s", "level": "%(levelname)s", "pid": "%(process)d", "src": "%(pathname)s", "line": "%(lineno)d", "msg": "%(message)s"}

GitPython Sample Script

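The following script implements the use case using GitPython. It clones the repository, creates the "bump" branch, increments the patch level of the "release" entry in the "meta.yml" file, commits the change and pushes the branch. Create the "bin/bump-version-with-gitpython.py" file with the following contents and make it executable. When using the HTTP transport, the "askpass.py" helper must also be in the "bin" directory, since the script looks for it in its own directory: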

#!/usr/bin/python3

import errno, os, sys, argparse, configparser, logging, logging.config
import re
import git

credentials_file = "secrets.ini"
log_config_file = "logging.conf"
# repo_url="git@git0.p1.carcano.corp:fooapp/fooapp.git"
repo_url = "https://git0.p1.carcano.corp:3000/fooapp/fooapp.git"
branch = "master"
repo_dir = "fooapp"

# https://docs.python.org/3/library/logging.html#levels
logging_loglevels = {"NOTSET": 0, "DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def abort(logger_reason, exit_reason=None):
    """

    Exit printing a message on stderr and logging the exception using the logging facility

    Args:
        logger_reason: string or exception object with the reason for exiting to be logged using the logging facility
        exit_reason:   string or exception object with the reason for exiting to print on stderr - if omitted it is
                       automatically set with the value of the logger_reason parameter

    Returns: nothing
    """
    if exit_reason is None:
        exit_reason = logger_reason
    logger.error(str(logger_reason).replace("'", "\'").replace('"', "'").replace("\n", "   ").replace("\r", ""))
    logger.error("Script aborted")
    print(exit_reason, file=sys.stderr)
    exit(1)


def load_credentials(file):
    """

    Load the credentials from a file

    Args:
        file: the path to the credentials' file

    Returns:
        an object with the contents of the credentials file
    """
    try:
        contents = configparser.ConfigParser()
        if not os.path.isfile(file):
            raise FileNotFoundError(
                errno.ENOENT, os.strerror(errno.ENOENT), file)
        if not os.access(file, os.R_OK):
            raise FileNotFoundError(
                errno.EPERM, "Unable to open file for reading", file)
        contents.read(file)
        return contents
    except FileNotFoundError as e:
        abort(e)
    except:
        abort("{} is not an INI file".format(file))


def init_logging():
    """
    Initialize the Python's standard logging facility

    Formatters are automatically set by this function, verbosity can be set by exporting the LOGLEVEL
    environment variable as defined by https://docs.python.org/3/library/logging.html#levels.
    If necessary, it is possible to alter these formats by creating the logging.conf file in the configuration
    directory - the format is the one of Python's standard logging facility
    https://docs.python.org/3/library/logging.config.html#configuration-file-format

    Returns: nothing

    """
    if os.path.isfile("{}/{}".format(config_dir, log_config_file)):
        logging.config.fileConfig(fname="{}/{}".format(config_dir, log_config_file), disable_existing_loggers=False)
        log = logging.getLogger(__name__)
        print("logging using '{}' file".format("{}/{}".format(config_dir, log_config_file)))
    else:
        log = logging.getLogger(__name__)
        handler = logging.StreamHandler()
        formatter = logging.Formatter("%(asctime)s %(name)-12s %(levelname)-8s %(message)s")
        handler.setFormatter(formatter)
        log.addHandler(handler)
        log.setLevel(logging.INFO)
    log.setLevel(logging_loglevels[os.getenv("LOGLEVEL", "INFO")])
    return log


def git_clone(url, repo_branch="master", dest_dir=None, single_branch=True, ssl_verify=True):
    """
    
    Clone a Git repository
    
    Args:
        url:             Git repository's URL - both HTTP and SSH transports are supported
        repo_branch:     name of the branch to check out while cloning the repository
        dest_dir:        directory on the filesystem where to store the cloned repository
        single_branch:   if True, it clones only the specified repository branch
        ssl_verify:      if True, enables the verification of the TLS certificate - if any

    Returns:
       an object representing the cloned repository
    """
    repo_name = re.sub(r"\.git$", "", re.sub("/$", "", url).rsplit("/", 1)[-1])
    logger.info("Cloning '{}', branch '{}'".format(url, repo_branch))
    if dest_dir is None:
        dest_dir = os.path.join("/tmp", repo_name)
    working_dir = os.path.dirname(os.path.realpath(__file__))
    if url.startswith("http"):
        try:
            if credentials.get("git", "username") and credentials.get("git", "password"):
                logger.debug("Logging in to git as '{}'".format(credentials["git"]["username"]))
                os.environ["GIT_ASKPASS"] = os.path.join(working_dir, "askpass.py")
                os.environ["GIT_USERNAME"] = credentials["git"]["username"]
                os.environ["GIT_PASSWORD"] = credentials["git"]["password"]
        except (configparser.NoSectionError, configparser.NoOptionError):
            logger.debug("Logging in to git anonymously")
            pass
        try:
            logger.debug("Cloning using HTTP transport")
            repo = git.Repo.clone_from(url, dest_dir, branch=repo_branch, single_branch=single_branch, depth=1,
                config="http.sslVerify={}".format(str(ssl_verify)), allow_unsafe_options=True)
        except FileNotFoundError as e:
            abort(e)
        except git.exc.GitCommandError as e:
            abort("Error while checking out '{}' git repository: {}".format(repo_name, str(e)))
    else:
        logger.debug("Cloning using SSH transport")
        try:
            if os.path.isfile("{}/sshkey".format(config_dir)):
                logger.debug("Logging in to git using SSH key '{}/sshkey'".format(config_dir))
                if not os.access("{}/sshkey".format(config_dir), os.R_OK):
                    raise FileNotFoundError(
                        errno.EPERM, "Unable to open file for reading", "{}/sshkey".format(config_dir))
                ssh_cmd = "ssh -i {}/sshkey".format(config_dir)
                repo = git.Repo.clone_from(url, dest_dir, branch=repo_branch, single_branch=single_branch, depth=1,
                    env={"GIT_SSH_COMMAND": ssh_cmd})
            else:
                logger.debug("Logging in to git anonymously")
                repo = git.Repo.clone_from(url, dest_dir, branch=repo_branch, single_branch=single_branch, depth=1)
        except FileNotFoundError as e:
            abort(e)
        except git.exc.GitCommandError as e:
            abort("Error while checking out '{}' git repository: {}".format(repo_name, str(e)))
    try:
        if credentials.get("git", "user"):
            repo.config_writer().set_value("user", "name", credentials["git"]["user"]).release()
    except (configparser.NoSectionError, configparser.NoOptionError):
        pass
    try:
        if credentials.get("git", "email"):
            repo.config_writer().set_value("user", "email", credentials["git"]["email"]).release()
    except (configparser.NoSectionError, configparser.NoOptionError):
        pass
    return repo


def update_metadata(changelog_file):
    """
    
    Update the metadata file, bumping the release version
    
    Args:
        changelog_file: path to the metadata file within the cloned Git repository

    Returns: Nothing

    """
    try:
        with open(os.path.join(repo_dir, changelog_file), "r") as file:
            contents = file.read()
        match = re.search(r"release:\s+(\d+\.\d+\.\d+)", contents)
        if not match:
            abort("Unable to guess the project's current version")
        project_version = match.group(1)
        major, minor, patchlevel = map(int, project_version.split("."))
        logger.debug("Project version tokens: MAJOR={}, MINOR={}, PATCH={}".format(major, minor, patchlevel))
        new_version = "{}.{}.{}".format(major, minor, str(patchlevel + 1))
        logger.info("Bumping from version '{}' to version '{}'".format(project_version, new_version))
        with open(os.path.join(repo_dir, changelog_file), "r+") as file:
            contents = file.readlines()
            file.seek(0)
            file.truncate()
            for line in contents:
                match = re.sub(r"^release:.*$", "release: {}".format(new_version), line)
                if match != line:
                    line = match
                file.write(line)
        logger.debug("File '{}' successfully updated".format(changelog_file))
    except FileNotFoundError as e:
        abort(e)
    try:
        repository.git.add(changelog_file)
        repository.index.commit("Bumping to version '{}'".format(new_version))
        logger.debug("Changes successfully committed")
    except git.exc.GitCommandError as e:
        abort(e)


def git_push(branch_or_tag_name="", remote="origin"):
    """

    Push a branch or tag to the specified remote

    Args:
        branch_or_tag_name: name of the branch or tag to push
        remote: name of the remote to push

    Returns:

    """
    try:
        logger.info("Pushing changes to git")
        if branch_or_tag_name == "":
            branch_or_tag_name = repository.active_branch.name
        repository.remote(remote).push(branch_or_tag_name)
    except (git.exc.GitCommandError, ValueError) as e:
        abort(e)


default_config_dir = "{}/conf".format(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))

parser = argparse.ArgumentParser(description="Example script showing how to use git-python")
parser.add_argument("-c", "--config-dir", dest="config_dir", default=default_config_dir,
    help="path to the configuration directory")
args = parser.parse_args()

config_dir = args.config_dir

logger = init_logging()
logger.info("Script started")
credentials = load_credentials("{}/{}".format(config_dir, credentials_file))

repository = git_clone(repo_url, branch, repo_dir, True, False)

new_branch = repository.create_head('bump')
new_branch.checkout()

update_metadata("meta.yml")

git_push()

logger.info("Script ended")
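The version bump performed by update_metadata is plain text processing, so it can be pulled out of the Git and filesystem plumbing and unit-tested on its own. What follows is a minimal sketch, not code from either script: the helper name bump_patch_release is hypothetical, and it assumes that the metadata file stores the version on a line such as "release: 1.2.3". Unlike the loop in the script, it rewrites only the first matching line.

```python
import re

# Hypothetical helper, not part of the scripts in this post: it assumes the
# version is stored on a line like "release: 1.2.3" ([ \t] avoids crossing lines).
RELEASE_RE = re.compile(r"^release:[ \t]+(\d+)\.(\d+)\.(\d+)[ \t]*$", re.MULTILINE)


def bump_patch_release(contents):
    """Return (old_version, new_version, new_contents) with PATCH increased by one."""
    match = RELEASE_RE.search(contents)
    if match is None:
        raise ValueError("Unable to guess the project's current version")
    major, minor, patch = (int(token) for token in match.groups())
    old_version = "{}.{}.{}".format(major, minor, patch)
    new_version = "{}.{}.{}".format(major, minor, patch + 1)
    # Replace only the matched span, so every other line stays byte-for-byte identical.
    new_contents = contents[:match.start()] + "release: " + new_version + contents[match.end():]
    return old_version, new_version, new_contents
```

In update_metadata you would read the file, call the helper, call abort() on ValueError, and write new_contents back before staging the file.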

PyGit2 Sample Script

The following script implements the same use case using pygit2 - create the "bin/bump-version-with-pygit2.py" file with the following contents and give it execute permission:

#!/usr/bin/python3

import errno, os, sys, argparse, configparser, logging, logging.config
import re
import pygit2
from pygit2 import GitError

credentials_file = "secrets.ini"
log_config_file = "logging.conf"
# repo_url="git@git0.p1.carcano.corp:fooapp/fooapp.git"
repo_url = "https://git0.p1.carcano.corp:3000/fooapp/fooapp.git"
branch = "master"
repo_dir = "fooapp"

# https://docs.python.org/3/library/logging.html#levels
logging_loglevels = {"NOTSET": 0, "DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


class GitCallbacks(pygit2.RemoteCallbacks):
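    def __init__(self, user=None, token=None, pub_key=None, priv_key=None, passphrase=None):
        # initialize the base class so pygit2 can track errors raised inside the callbacks
        super().__init__()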
        self.user = user
        self.token = token
        self.pub_key = pub_key
        self.priv_key = priv_key
        self.passphrase = passphrase

    def credentials(self, url, username_from_url, allowed_types):
        if allowed_types & pygit2.enums.CredentialType.USERNAME:
            return pygit2.Username(self.user)
        elif allowed_types & pygit2.enums.CredentialType.USERPASS_PLAINTEXT:
            return pygit2.UserPass(self.user, self.token)
        elif allowed_types & pygit2.enums.CredentialType.SSH_KEY:
            return pygit2.Keypair(username_from_url, self.pub_key, self.priv_key, self.passphrase)
        else:
            return None

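    def certificate_check(self, certificate, valid, host):
        # accept any TLS certificate (e.g. a self-signed one): this disables certificate
        # validation, so do not keep it when talking to untrusted hosts
        return True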

    def transfer_progress(self, stats):
        print("Retrieved objects: {}/{}".format(stats.indexed_objects, stats.total_objects), end="\r")


def abort(logger_reason, exit_reason=None):
    """

    Exit printing a message on stderr and logging the exception using the logging facility

    Args:
        logger_reason: string or exception object with the reason for exiting to be logged using the logging facility
        exit_reason:   string or exception object with the reason for exiting to print on stderr - if omitted it is
                       automatically set with the value of the logger_reason parameter

    Returns: nothing
    """
    if exit_reason is None:
        exit_reason = logger_reason
    logger.error(str(logger_reason).replace("'", "\'").replace('"', "'").replace("\n", "   ").replace("\r", ""))
    logger.error("Script aborted")
    print(exit_reason, file=sys.stderr)
    sys.exit(1)


def load_credentials(file):
    """

    Load the credentials from a file

    Args:
        file: the path to the credentials' file

    Returns:
        an object with the contents of the credentials file
    """
    try:
        contents = configparser.ConfigParser()
        if not os.path.isfile(file):
            raise FileNotFoundError(
                errno.ENOENT, os.strerror(errno.ENOENT), file)
        if not os.access(file, os.R_OK):
            raise PermissionError(
                errno.EACCES, "Unable to open file for reading", file)
        contents.read(file)
        return contents
    except (FileNotFoundError, PermissionError) as e:
        abort(e)
    except configparser.Error:
        abort("{} is not an INI file".format(file))


def init_logging():
    """
    Initialize the Python's standard logging facility

    Formatters are automatically set by this function, verbosity can be set by exporting the LOGLEVEL
    environment variable as defined by https://docs.python.org/3/library/logging.html#levels.
    If necessary, it is possible to alter these formats by creating the logging.conf file in the configuration
    directory - the format is the one of the Python's standard logging facility
    https://docs.python.org/3/library/logging.config.html#configuration-file-format

    Returns: nothing

    """
    if os.path.isfile("{}/{}".format(config_dir, log_config_file)):
        logging.config.fileConfig(fname="{}/{}".format(config_dir, log_config_file), disable_existing_loggers=False)
        log = logging.getLogger(__name__)
        print("logging using '{}' file".format("{}/{}".format(config_dir, log_config_file)))
    else:
        log = logging.getLogger(__name__)
        handler = logging.StreamHandler()
        formatter = logging.Formatter("%(asctime)s %(name)-12s %(levelname)-8s %(message)s")
        handler.setFormatter(formatter)
        log.addHandler(handler)
        log.setLevel(logging.INFO)
    log.setLevel(logging_loglevels[os.getenv("LOGLEVEL", "INFO")])
    return log


def git_clone(url, repo_branch="master", dest_dir=None):
    """
    
    Clone a Git repository
    
    Args:
        url:             Git repository's URL - both HTTP and SSH transports are supported
        repo_branch:     name of the branch to check out while cloning the repository
        dest_dir:        directory on the filesystem where to store the cloned repository

    Returns:
       an object representing the cloned repository
    """
    repo_name = re.sub(r"\.git$", "", re.sub(r"/$", "", url).rsplit("/", 1)[-1])
    logger.info("Cloning '{}', branch '{}'".format(url, repo_branch))
    if dest_dir is None:
        dest_dir = os.path.join("/tmp", repo_name)
    try:
        if url.startswith("http"):
            logger.debug("Cloning using HTTP transport")
            repo = pygit2.clone_repository(url, dest_dir, depth=1, checkout_branch=repo_branch,
                callbacks=GitCallbacks(user=credentials["git"]["username"], token=credentials["git"]["password"]))
        else:
            logger.debug("Cloning using SSH transport")
            repo = pygit2.clone_repository(url, dest_dir, depth=1, checkout_branch=repo_branch,
                callbacks=GitCallbacks(priv_key="sshkey", pub_key="sshkey.pub",
                passphrase=credentials["git"]["password"]))
    except (GitError, ValueError) as e:
        abort(e)
    return repo


def update_metadata(changelog_file):
    """
    
    Update the metadata file, bumping the release version
    
    Args:
        changelog_file: path to the metadata file within the cloned Git repository

    Returns: Nothing

    """
    try:
        with open(os.path.join(repo_dir, changelog_file), "r") as file:
            contents = file.read()
        match = re.search(r"release:\s+(\d+\.\d+\.\d+)", contents)
        if match is None:
            abort("Unable to guess the project's current version")
        project_version = match.group(1)
        major, minor, patchlevel = map(int, project_version.split("."))
        logger.debug("Project version tokens: MAJOR={}, MINOR={}, PATCH={}".format(major, minor, patchlevel))
        new_version = "{}.{}.{}".format(major, minor, str(patchlevel + 1))
        logger.info("Bumping from version '{}' to version '{}'".format(project_version, new_version))
        with open(os.path.join(repo_dir, changelog_file), "r+") as file:
            contents = file.readlines()
            file.seek(0)
            file.truncate()
            for line in contents:
                # rewrite the "release:" line, copy every other line unchanged
                file.write(re.sub(r"^release:.*$", "release: {}".format(new_version), line))
        logger.debug("File '{}' successfully updated".format(changelog_file))
    except FileNotFoundError as e:
        abort(e)
    try:
        index = repository.index
        index.add(changelog_file)
        index.write()
        author = pygit2.Signature(credentials["git"]["user"], credentials["git"]["email"])
        committer = pygit2.Signature(credentials["git"]["user"], credentials["git"]["email"])
        message = "Bumping to version '{}'".format(new_version)
        tree = index.write_tree()
        ref = repository.head.name
        parents = [repository.head.target]
        return repository.create_commit(ref, author, committer, message, tree, parents)
    except (GitError, OSError) as e:
        abort(e)


def git_push(branch="", tag="", remote="origin"):
    """

    Push a branch or tag (by default the current HEAD) to the specified remote

    Args:
        branch: name of the branch to push
        tag: name of the tag to push
        remote: name of the remote to push to

    Returns: nothing

    """
    try:
        if branch == "" and tag == "":
            what = repository.head.name
        elif branch != "":
            what = "refs/heads/{}".format(branch)
        elif tag != "":
            what = "refs/tags/{}".format(tag)
        logger.info("Pushing changes to git")
        if credentials["git"]["username"] and credentials["git"]["password"]:
            repository.remotes[remote].push(['+' + what],
                callbacks=GitCallbacks(user=credentials["git"]["username"], token=credentials["git"]["password"]))
        else:
            repository.remotes[remote].push(['+' + what], callbacks=GitCallbacks(priv_key="sshkey",
                pub_key="sshkey.pub", passphrase=credentials["git"]["password"]))
    except (GitError, ValueError) as e:
        abort(e)


default_config_dir = "{}/conf".format(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))

parser = argparse.ArgumentParser(description="Example script showing how to use pygit2")
parser.add_argument("-c", "--config-dir", dest="config_dir", default=default_config_dir,
    help="path to the configuration directory")
args = parser.parse_args()

config_dir = args.config_dir

logger = init_logging()
logger.info("Script started")
credentials = load_credentials("{}/{}".format(config_dir, credentials_file))

repository = git_clone(repo_url, branch, repo_dir)

last_commit = repository.revparse_single("HEAD")
new_branch = repository.branches.local.create("bump", last_commit)
repository.checkout(repository.branches.local["bump"])

cmit = update_metadata("meta.yml")
git_push()

logger.info("Script ended")
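The refspec that git_push hands to Remote.push() is chosen from the branch, the tag or the current HEAD, and the leading "+" asks the remote to accept a forced (non fast-forward) update. The selection logic can be sketched as a pure function, which also makes forcing optional. This is only an illustration: build_refspec is a hypothetical name, and head_name is assumed to be the full reference name returned by repository.head.name (e.g. "refs/heads/bump").

```python
def build_refspec(head_name, branch="", tag="", force=True):
    """Build the push refspec following the same precedence used by git_push.

    head_name: full name of the current HEAD reference, e.g. "refs/heads/bump"
    """
    if branch:
        ref = "refs/heads/{}".format(branch)
    elif tag:
        ref = "refs/tags/{}".format(tag)
    else:
        # neither a branch nor a tag was given: push whatever HEAD points to
        ref = head_name
    # a leading "+" lets the remote ref be updated even if it is not a fast-forward
    return ("+" if force else "") + ref
```

With it, the push call would become something like repository.remotes[remote].push([build_refspec(repository.head.name)], callbacks=...), and passing force=False makes the remote reject non fast-forward updates.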

Footnotes

This ends our tutorial on Git with Python - I hope you enjoyed it. As we saw, GitPython and PyGit2 are the most commonly used libraries: the former drives the git command line tool, exposing its data as Python objects, while the latter is a set of Python bindings to the libgit2 C library, with the downside of inheriting libgit2's dependency on libssh2, which leaves the SSH transport disabled on recent Linux distributions.
Last but not least, we saw the syntax for the most common use cases, along with a full-featured example script for each library.
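If you are unsure whether the libgit2 build your pygit2 is linked against ships with SSH support, you can check it at runtime through the pygit2.features bitmask. What follows is a minimal sketch under stated assumptions: the function name libgit2_has_ssh is hypothetical, and since the name of the SSH flag has changed across pygit2 releases, it tries the module-level GIT_FEATURE_SSH constant first and falls back to the pygit2.enums.Feature enum.

```python
def libgit2_has_ssh():
    """Return True/False depending on libgit2 SSH support, or None if pygit2 is missing."""
    try:
        import pygit2
    except ImportError:
        return None
    # Older pygit2 releases expose the flag as a module constant, newer ones
    # also provide it through the pygit2.enums.Feature flag enum.
    ssh_flag = getattr(pygit2, "GIT_FEATURE_SSH", None)
    if ssh_flag is None:
        ssh_flag = pygit2.enums.Feature.SSH
    return bool(pygit2.features & ssh_flag)
```

A script could call it before cloning and switch to an HTTPS URL when it returns False.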

If you appreciate this effort and you like this post or any other ones, just share them on LinkedIn - sharing and comments are an inexpensive way to push me to keep on writing, and this blog makes sense only if it gets visited.

I hate blogs with pop-ups, ads and all the (even worse) other stuff that distracts from the topics you're reading and violates your privacy. I want to offer my readers the best experience possible for free, ... but please be aware that for me it's not really free: on top of the raw costs of running the blog, I usually spend on average 50-60 hours writing each post. I offer all this for free because I think it's nice to help people, but if you think something in this blog has helped you professionally and you want to give concrete support, your contribution is very much appreciated: you can just use the above button.
