GitHub as a database

Git is, in effect, a database. GitHub is a remote database powered by Git.

I needed a way to save information about specific important events in my code for later analysis. What could be better than committing them to a VCS? Every record comes with a timestamp, a commit description, and so on.

I used a local Git repository first and then switched to GitHub, which exposes its functionality through an API.
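For a sense of what that API looks like without a client library, here is a minimal sketch of the request that creates or updates a file through the REST "repository contents" endpoint. The function name build_put_contents_request and its parameters are made up for this illustration; the endpoint expects the file body base64-encoded and, for updates only, the blob SHA of the existing file. The sketch only builds the request and does not send it.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_put_contents_request(
    repo: str,                  # "owner/name"
    path: str,                  # file path inside the repository
    body: str,                  # new file contents
    message: str,               # commit message
    token: str,                 # personal access token
    sha: Optional[str] = None,  # blob SHA of the existing file (updates only)
    branch: str = "main",
) -> urllib.request.Request:
    """Build (but do not send) a PUT /repos/{repo}/contents/{path} request."""
    payload = {
        "message": message,
        # The contents endpoint expects the file body base64-encoded.
        "content": base64.b64encode(body.encode("utf-8")).decode("ascii"),
        "branch": branch,
    }
    if sha is not None:
        payload["sha"] = sha
    url = f"https://api.github.com/repos/{repo}/contents/{urllib.parse.quote(path)}"
    return urllib.request.Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. PyGithub wraps this kind of call, which is why the rest of this post uses it instead.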

The short script below demonstrates how this approach works.

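It needs two things to be set: the GITHUB_TOKEN environment variable, holding a personal access token that you can generate in your GitHub account settings (it needs write access to the repository), and the repo variable, holding the repository name in the form <YOUR_GITHUB_NAME>/<REPO_NAME>.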

It first upserts a file that does not exist yet, which creates it. Then it upserts the same file again, which modifies it. Finally, it deletes the file.

The repository log nicely keeps all these actions in the commit history.
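Since the point is later analysis, it helps to be able to read that history back. The following is a minimal sketch, not part of the original script: file_history is a name invented here, and it assumes the repo argument is a PyGithub Repository (or anything with a compatible get_commits method).

```python
from typing import Any, Dict, List


def file_history(repo: Any, path: str, branch: str = "main") -> List[Dict[str, Any]]:
    """Return one record per commit on `branch` that touched `path`.

    Records come back in the order the API lists them (newest first).
    """
    history = []
    for c in repo.get_commits(sha=branch, path=path):
        history.append(
            {
                "sha": c.sha,
                "date": c.commit.author.date,  # author timestamp of the commit
                "message": c.commit.message,
            }
        )
    return history
```

With the get_repo helper from the script in this post, file_history(get_repo(repo), "README.md") should return one entry for each create, update, and delete commit made against that file.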

Note: The PyGithub package needs to be installed first:

pip install PyGithub

Code:

import os
from typing import Optional, Union

import github
from github.Repository import Repository


def get_repo(repo: str) -> Repository:
    # Raise instead of assert: asserts are stripped when Python runs with -O.
    if not repo:
        raise ValueError('repository name is missing')
    g = github.Github(os.environ['GITHUB_TOKEN'])
    return g.get_repo(repo)


def upsert_file(
    name: str,
    body: str,
    message: Optional[str] = None,
    *,
    repo: Optional[Union[Repository, str]] = None,
    branch: str = "main",
    verbose: bool = False,
):
    r = repo if isinstance(repo, Repository) else get_repo(repo)
    try:
        # Look up the existing file; its blob SHA is required for an update.
        current = r.get_contents(name, ref=branch)
    except github.UnknownObjectException:
        # 404: the file is not on this branch yet, so create it.
        # Catching only this exception keeps real update errors visible.
        result = r.create_file(name, message or f'Create {name}', body, branch=branch)
    else:
        result = r.update_file(
            current.path,
            message or f'Update {name}',
            body,
            current.sha,
            branch=branch,
        )
    if verbose:
        print(result)


def delete_file(
    name: str,
    message: Optional[str] = None,
    *,
    repo: Optional[Union[Repository, str]] = None,
    branch: str = "main",
    verbose: bool = False,
):
    r = repo if isinstance(repo, Repository) else get_repo(repo)
    message = message or f'Delete {name}'
    current = r.get_contents(name, ref=branch)
    deleted = r.delete_file(
        current.path,
        message,
        current.sha,
        branch=branch,
    )
    if verbose:
        print(deleted)


assert os.getenv('GITHUB_TOKEN'), 'Set GITHUB_TOKEN variable'

repo = "<YOUR_GITHUB_NAME>/<REPO_NAME>"

upsert_file("README.md", "NEW BODY", repo=repo, verbose=True)
upsert_file("README.md", "UPDATED BODY", repo=repo, verbose=True)

delete_file("README.md", repo=repo, verbose=True)

Save the code as main.py and run it:

python main.py

It prints something like:

{'content': ContentFile(path="README.md"), 'commit': Commit(sha="a6c540fec9b1b02e21acbb0ddd790efb6b7cb33f")}
{'commit': Commit(sha="2436e7ff2692a9af398dabd9eb9d1eee0f821954"), 'content': ContentFile(path="README.md")}
{'commit': Commit(sha="31fefb51e3510071777e4f4c8a0971de0a184f78"), 'content': NotSet}

Go to your GitHub repository and check the commits.
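To get from this demo back to the original goal of saving events, one option is to write each event to its own timestamped JSON file. This is only a sketch under assumptions of mine: the events/<kind>/<timestamp>.json layout, the record fields, and the names event_to_file and record_event are all invented for illustration. The upsert argument is expected to be a callable with the same shape as upsert_file above.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional, Tuple


def event_to_file(
    kind: str,
    data: Dict[str, Any],
    when: Optional[datetime] = None,
) -> Tuple[str, str]:
    """Turn an event into a (path, body) pair ready to be committed."""
    # Normalize to UTC so file names sort chronologically.
    # A naive datetime is interpreted as local time by astimezone().
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    path = f"events/{kind}/{stamp}.json"  # hypothetical layout
    record = {"kind": kind, "at": when.isoformat(), "data": data}
    return path, json.dumps(record, indent=2, sort_keys=True)


def record_event(
    kind: str,
    data: Dict[str, Any],
    upsert: Callable[..., Any],
    **kwargs: Any,
) -> str:
    """Commit one event through `upsert` and return the path it was saved to."""
    path, body = event_to_file(kind, data)
    upsert(path, body, f"Record {kind} event", **kwargs)
    return path
```

A call such as record_event("deploy", {"version": "1.2.3"}, upsert_file, repo=repo) would then produce one commit per event. Two events of the same kind within the same second would share a file name, so the second would overwrite the first as an update.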