I wrote a bit ago about making commits via the GitHub API. That post outlined making changes in two simplified situations: making changes to a single file and making updates to two existing files at the root of the repository. Here I show a more general solution that allows arbitrary changes anywhere in the repo.
I want to be able to specify a repo and branch and say "here are the contents of files that have changed or been created and here are the names of files that have been deleted, please take all that and this message and make a new commit for me." Because the GitHub API is so rudimentary when it comes to making commits, that ends up being a many-stepped process, but it's mostly the same steps repeated many times, so it's not a nightmare to code up. At a high level the process goes like this: read the branch's current tree, post a blob for every new or changed file, post a new tree for every directory that changed, create a commit pointing at the new root tree, and finally move the branch to that commit.
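As a rough map of that process onto the REST API, here is a small sketch that lists the requests involved, in order. The endpoint paths are GitHub's Git Database endpoints as I understand them; the function name and its arguments are made up for illustration, and nothing is actually sent.

```python
# Sketch only: builds a list of requests, it does not talk to GitHub.
API = "https://api.github.com"


def commit_plan(owner, repo, branch, n_blobs, n_trees):
    """List the (method, URL) pairs a multi-file commit needs, in order."""
    base = f"{API}/repos/{owner}/{repo}"
    plan = [
        # find the branch's head commit and its root tree
        ("GET", f"{base}/branches/{branch}"),
        # fetch the whole layout; {tree_sha} is left as a placeholder
        ("GET", f"{base}/git/trees/{{tree_sha}}?recursive=1"),
    ]
    plan += [("POST", f"{base}/git/blobs")] * n_blobs   # one per new/changed file
    plan += [("POST", f"{base}/git/trees")] * n_trees   # one per changed directory
    plan.append(("POST", f"{base}/git/commits"))        # commit the new root tree
    plan.append(("PATCH", f"{base}/git/refs/heads/{branch}"))  # move the branch
    return plan


for method, url in commit_plan("jiffyclub", "demodemo", "master", 2, 3):
    print(method, url)
```

The blob and tree requests are the "same steps repeated many times" part; everything else happens once per commit.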
I'll start off with the preliminaries that allow me to pull down the current repo state. I use the github3.py library for abstracting the GitHub API requests.
import os.path
from github3 import login
Basic information required for connecting to GitHub and which repo and branch to work on:
username = 'jiffyclub'
token = 'zzz'
repo_name = 'demodemo'
branch_name = 'master'
A Repository instance will be the main interface to the repo.
gh = login(username=username, token=token)
repo = gh.repository(username, repo_name)
To actually see repo contents we have to pick a specific branch. "Recursing" on the tree is how I get one long list of all the things in the repo.
# get the current repo layout
branch = repo.branch(branch_name)
tree = branch.commit.commit.tree.recurse()
h = tree.tree[0]
h.path, h.mode, h.sha, h.type
('README.md', '100644', 'c385d5f2330a39aca84f2f7999346244bbf0a997', 'blob')
By looping over the tree I can print out the whole repo structure:
for h in tree.tree:
    print(h.path)
README.md
dir1
dir1/dir1-1.txt
dir1/dir1-2.txt
dir1/dir2
dir1/dir2/dir2-1.txt
dir1/dir2/dir2-2.txt
dir1/dir2/dir3
dir1/dir2/dir3/dir3-1.txt
dir1/dir2/dir3/dir3-2.txt
dir4
dir4/dir4-1.txt
dir5
dir5/dir5-1.txt
dir5/dir5-2.txt
dir8
dir8/dir8-1.txt
dir8/dir8-2.txt
root1.txt
root2.txt
setup.fish
Git tracks repository state using two kinds of objects: blobs, which contain file contents, and trees, which contain file and directory names pointing to blobs and other trees.
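The sha values seen in the tree listing are content hashes: Git prefixes an object's bytes with a short header and takes the SHA-1 of the result. A minimal sketch of that calculation (the function name is mine, not part of Git or github3.py):

```python
import hashlib


def git_object_sha(kind, data):
    """Return the SHA-1 Git assigns to an object of the given kind."""
    # the header is "<kind> <size in bytes>" followed by a NUL byte
    header = "{} {}".format(kind, len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


print(git_object_sha("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_sha("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Because the hash depends only on content, an unchanged file keeps its sha, and that is what lets the code below skip re-posting anything that hasn't changed.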
My plan is to represent the current repository state locally, modify that local state, and finally add the changes to GitHub via the API.
def split_one(path):
    """
    Utility function for splitting off the very first part of a path.

    Parameters
    ----------
    path : str

    Returns
    -------
    head, tail : str

    Examples
    --------
    >>> split_one('a/b/c')
    ('a', 'b/c')
    >>> split_one('d')
    ('', 'd')

    """
    s = path.split('/', 1)
    if len(s) == 1:
        return '', s[0]
    else:
        return tuple(s)
split_one('dir1/dir2/dir3')
('dir1', 'dir2/dir3')
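The classes below split paths in two different ways: directories are peeled off from the front, as split_one does, while file names are taken off the end with os.path.split. A small side-by-side sketch, using posixpath only to make the '/' separator explicit:

```python
import posixpath

path = 'dir1/dir2/dir3/dir3-1.txt'

# first component vs. everything after it (the split_one direction)
first, rest = path.split('/', 1)

# everything before vs. the last component (the os.path.split direction)
parent, name = posixpath.split(path)

print(first, rest)
print(parent, name)
```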
To match Git's blobs and trees the core of my code will be two classes: File and Directory. File will be quite simple; it will know how to post a new blob to GitHub and not much else:
class File(object):
    """
    Represents a file/blob in the repo.

    Parameters
    ----------
    name : str
        Name of this file. Should contain no path components.
    mode : str
        '100644' for regular files,
        '100755' for executable files.
    sha : str
        Git sha for an existing file,
        omitted or None for a new/changed file.
    content : str
        File's contents as text.
        Omitted or None for an existing file,
        must be given for a changed or new file.

    """
    def __init__(self, name, mode, sha=None, content=None):
        self.name = name
        self.mode = mode
        self.sha = sha
        self.content = content

    def create_blob(self, repo):
        """
        Post this file to GitHub as a new blob.
        If this file is unchanged nothing will be done.

        Parameters
        ----------
        repo : github3.repos.repo.Repository
            Authorized github3.py repository instance.

        Returns
        -------
        dict
            Dictionary of info about the blob:

            path: blob's name
            type: 'blob'
            mode: blob's mode
            sha: blob's up-to-date sha
            changed: True if a new blob was created

        """
        if self.sha:
            # already up to date
            print('Blob unchanged for {}'.format(self.name))
            changed = False
        else:
            assert self.content is not None
            print('Making blob for {}'.format(self.name))
            self.sha = repo.create_blob(self.content, encoding='utf-8')
            changed = True

        return {'path': self.name,
                'type': 'blob',
                'mode': self.mode,
                'sha': self.sha,
                'changed': changed}
The Directory, with its listing of files and other directories, ties everything together. With the root directory we can find anything else in the repo. In fact, the hash of the root tree of a repo is what Git keeps a record of when you make a commit. Everything else is referenced off that tree and any trees it contains.
class Directory(object):
    """
    Represents a directory/tree in the repo.

    Parameters
    ----------
    name : str
        Name of directory. Should not contain any path components.
    sha : str
        Hash for an existing tree, omitted or None for a new tree.

    """
    def __init__(self, name, sha=None):
        self.name = name
        self.sha = sha
        self.files = {}
        self.directories = {}
        self.changed = False

    def add_directory(self, name, sha=None):
        """
        Add a new subdirectory or return an existing one.

        Parameters
        ----------
        name : str
            If this contains any path components new directories
            will be made to a depth necessary to construct the full path.
        sha : str
            Hash for an existing directory, omitted or None for a new directory.

        Returns
        -------
        `Directory`
            Reference to the named directory.
            If `name` contained multiple path components only the
            reference to the last directory referenced is returned.

        """
        head, tail = split_one(name)
        if head and head not in self.directories:
            self.directories[head] = Directory(head)
        elif not head:
            # the input directory is a child of the current directory
            if name not in self.directories:
                self.directories[name] = Directory(name, sha)
            return self.directories[name]
        return self.directories[head].add_directory(tail, sha)

    def add_file(self, name, mode, sha=None, content=None):
        """
        Add a new file. An existing file with the same name
        will be replaced.

        Parameters
        ----------
        name : str
            Name of file. If it contains path components new
            directories will be made as necessary until the
            file can be made in the appropriate location.
        mode : str
            '100644' for regular files,
            '100755' for executable files.
        sha : str
            Git hash for file. Required for existing files,
            omitted or None for new files.
        content : str
            Content of a new or changed file. Omit for existing files.

        Returns
        -------
        `File`

        """
        head, tail = os.path.split(name)
        if not head:
            # this file belongs in this directory
            if mode is None:
                if tail in self.files:
                    # we're getting an update to an existing file
                    assert content is not None
                    mode = self.files[tail].mode
                    assert mode
                else:
                    raise ValueError('Adding a new file with no mode.')
            self.files[tail] = File(name, mode, sha, content)
            # return the new File, as the docstring promises
            return self.files[tail]
        else:
            return self.add_directory(head).add_file(tail, mode, sha, content)

    def delete_file(self, name):
        """
        Delete a named file.

        Parameters
        ----------
        name : str
            Name of file to delete. May contain path components.

        """
        head, tail = os.path.split(name)
        if not head:
            # should be in this directory
            del self.files[tail]
            self.changed = True
        else:
            self.add_directory(head).delete_file(tail)

    def create_tree(self, repo):
        """
        Post a new tree to GitHub.
        If this directory and everything in/below it
        are unchanged nothing will be done.

        Parameters
        ----------
        repo : github3.repos.repo.Repository
            Authorized github3.py repository instance.

        Returns
        -------
        tree_info : dict
            'path': directory's name
            'mode': '040000'
            'sha': directory's up-to-date hash
            'type': 'tree'
            'changed': True if a new tree was posted to GitHub

        """
        results = [f.create_blob(repo) for f in self.files.values()]
        results += [d.create_tree(repo) for d in self.directories.values()]
        tree = [t for t in results if t is not None]

        if not tree:
            # nothing left in this directory, it should be discarded
            return None

        # have any subdirectories or files changed (or been deleted)?
        # a subdirectory that was emptied returns None above, and dropping
        # it from this tree is a change even if nothing else was touched
        changed = (self.changed
                   or len(tree) != len(results)
                   or any(t['changed'] for t in tree))

        if changed:
            print('Creating tree for {}'.format(self.name))
            tree = [{k: v for k, v in t.items() if k != 'changed'} for t in tree]
            self.sha = repo.create_tree(tree).sha
        else:
            print('Tree unchanged for {}'.format(self.name))
            assert self.sha

        return {'path': self.name,
                'mode': '040000',
                'sha': self.sha,
                'type': 'tree',
                'changed': changed}
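Since File and Directory only ever call create_blob and create_tree on the repo they are given, they can be exercised offline with a stand-in object. The FakeRepo below is hypothetical, not part of github3.py: it records what would have been posted, computes real Git hashes for text blobs, and returns a placeholder hash for trees.

```python
import hashlib


class FakeTree(object):
    """Stand-in for the object returned by create_tree; only has .sha."""
    def __init__(self, sha):
        self.sha = sha


class FakeRepo(object):
    """Hypothetical offline stand-in that records what would be posted."""
    def __init__(self):
        self.blobs = []
        self.trees = []

    def create_blob(self, content, encoding):
        # only text encodings such as 'utf-8' are handled in this sketch
        data = content.encode(encoding)
        self.blobs.append(content)
        # real Git blob hash: sha1 of "blob <size>\0" followed by the data
        header = 'blob {}'.format(len(data)).encode('ascii') + b'\0'
        return hashlib.sha1(header + data).hexdigest()

    def create_tree(self, tree):
        self.trees.append(tree)
        # placeholder hash, NOT the one Git would compute for this tree
        return FakeTree(hashlib.sha1(repr(tree).encode('utf-8')).hexdigest())
```

Passing a FakeRepo() to create_tree on a Directory should print the same progress messages as a real run while leaving GitHub untouched, which makes it easier to check which blobs and trees a set of changes would produce.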
With the File and Directory classes defined I can construct the current repo state. Everything starts with the unnamed root directory. I filter out the blobs and trees so I can add the directories first, though this isn't strictly necessary.
trees = [h for h in tree.tree if h.type == 'tree']
blobs = [h for h in tree.tree if h.type == 'blob']
root = Directory('', branch.commit.commit.tree.sha)
for h in trees:
    root.add_directory(h.path, h.sha)
for h in blobs:
    root.add_file(h.path, h.mode, h.sha)
With the repo state reconstructed locally I'll configure some changes. There are changes to existing files, new files, and file deletions.
# 'mode': None indicates it's an existing file and the mode should be kept as is
# New files must give a valid 'mode' parameter
updates = [{'path': 'README.md',
            'content': 'a',
            'mode': None},
           {'path': 'dir1/dir1-1.txt',
            'content': 'b',
            'mode': None},
           {'path': 'dir1/dir2/dir3/dir3-2.txt',
            'content': 'c',
            'mode': None},
           {'path': 'dir1/dir2/dir3/dir3-3.txt',
            'content': 'e',
            'mode': '100644'},
           {'path': 'root3.txt',
            'content': 'f',
            'mode': '100644'},
           {'path': 'dir6/dir7/dir7-1.txt',
            'content': 'g',
            'mode': '100644'}]
# paths to deleted files
deleted = ['root1.txt',
           'dir1/dir2/dir2-1.txt',
           'dir1/dir2/dir2-2.txt',
           'dir4/dir4-1.txt',
           'dir8/dir8-1.txt']
The next step is to update the local repo representation:
# make our local repo reflect how we want it to look
# after changing/adding/deleting files
for thing in updates:
    root.add_file(thing['path'], thing['mode'], content=thing['content'])
for d in deleted:
    root.delete_file(d)
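A typo in one of those paths surfaces as a ValueError or KeyError partway through the loops, after some changes have already been applied locally. One option is to check the requested changes against the existing blob paths first. This is a sketch with a made-up helper name, assuming the same dictionary layout as the updates list:

```python
def check_changes(existing_paths, updates, deleted):
    """Return a list of problems found in the requested changes."""
    existing = set(existing_paths)
    problems = []
    for u in updates:
        # mode None means "keep the existing mode", so the file must exist
        if u['mode'] is None and u['path'] not in existing:
            problems.append('no mode for new file: ' + u['path'])
    for path in deleted:
        if path not in existing:
            problems.append('cannot delete missing file: ' + path)
    # a path in both lists is almost certainly a mistake
    touched = {u['path'] for u in updates} & set(deleted)
    for path in sorted(touched):
        problems.append('both updated and deleted: ' + path)
    return problems
```

With the variables from this post it could be called as check_changes([h.path for h in blobs], updates, deleted); an empty list means the loops should run without raising.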
The local repo representation now has the same structure I want the repo on GitHub to have. To get all the updates sent to GitHub I call the .create_tree method on the root directory. That method in turn calls the .create_tree and .create_blob methods on all the directories and files below, which in turn do the same. One by one each changed file and directory will have its data sent to GitHub and finally I'll have the hash of the new root tree that I can use in a commit.
root_info = root.create_tree(repo)
Making blob for root3.txt
Blob unchanged for root2.txt
Making blob for README.md
Blob unchanged for setup.fish
Blob unchanged for dir1-2.txt
Making blob for dir1-1.txt
Blob unchanged for dir3-1.txt
Making blob for dir3-3.txt
Making blob for dir3-2.txt
Creating tree for dir3
Creating tree for dir2
Creating tree for dir1
Making blob for dir7-1.txt
Creating tree for dir7
Creating tree for dir6
Blob unchanged for dir8-2.txt
Creating tree for dir8
Blob unchanged for dir5-1.txt
Blob unchanged for dir5-2.txt
Tree unchanged for dir5
Creating tree for
root_info
{'sha': '3f1e781ebde83629df62de0a869169d29c10e435', 'path': '', 'type': 'tree', 'mode': '040000', 'changed': True}
At this point GitHub has all of my new data but there's nothing in the history of my repo pointing at this new state. That requires making a new commit. The ingredients for a new commit are a message, the sha hash of a tree (from which the entire repo state can be derived), and one or more parent commits that link the new commit to the rest of the project history.
new_commit = repo.create_commit(
    'Making a whole bunch of changes all over via the GitHub API.',
    tree=root_info['sha'],
    parents=[branch.commit.sha])
new_commit
<Commit [Matt Davis:753e75b9891afac88ecc9fae86ec0bc11fa009c6]>
new_commit.html_url
'https://github.com/jiffyclub/demodemo/commit/753e75b9891afac88ecc9fae86ec0bc11fa009c6'
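Those three ingredients are essentially the whole request body. As a sketch of what I'd expect a create-commit call to send, with field names taken from GitHub's Git Database API and a helper name that is my own:

```python
import json


def commit_payload(message, tree_sha, parent_shas):
    """Build the JSON body for creating a commit object."""
    return json.dumps({
        'message': message,
        'tree': tree_sha,
        # one parent for an ordinary commit, two or more for a merge
        'parents': list(parent_shas),
    }, sort_keys=True)


print(commit_payload('Update files',
                     '3f1e781ebde83629df62de0a869169d29c10e435',
                     ['<sha of the current head commit>']))
```

Nothing in the payload names a branch, which is why the new commit exists on GitHub without any branch pointing at it yet.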
The commit is now part of my project's history, but my working branch has not been updated to point at the new commit. This happens implicitly when you work with Git at the command line, but when working via the API it has to be done manually.
The procedure for this is to get a Reference instance for the working branch and use its .update method to point it at the new commit.
ref = repo.ref('heads/{}'.format(branch_name))
ref.update(new_commit.sha)
True
A return value of True indicates success.
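Building and posting all those blobs and trees takes a while, and someone else may push to the branch in the meantime. A cautious variant, sketched here with a hypothetical helper that uses only the github3.py calls already shown above, re-reads the branch head and refuses to move the branch if it no longer matches the parent the new commit was built on:

```python
def update_branch(repo, branch_name, new_sha, expected_parent_sha):
    """Move the branch to new_sha only if it still points at the parent."""
    current = repo.branch(branch_name).commit.sha
    if current != expected_parent_sha:
        # the branch moved while we were working; the new commit is stale
        return False
    ref = repo.ref('heads/{}'.format(branch_name))
    return ref.update(new_sha)
```

This narrows the window for a conflict but does not close it, since the branch can still move between the check and the update. My understanding is that GitHub rejects a non-forced update that is not a fast-forward, so the check mainly gives an earlier and clearer failure.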
I haven't made any attempt here to test symlinks or binary content like images. Those could require some special handling, but I think it'll be manageable.
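For what it's worth, here is an untested sketch of where I'd start. As far as I know the blob endpoint accepts 'base64' as well as 'utf-8' for the encoding, and Git marks a symlink with mode '120000' on a blob whose content is the link target. The helper and the MODES table are my own names, and I haven't run this against GitHub.

```python
import base64


def encode_for_blob(data):
    """Return (content, encoding) suitable for a blob-creation request."""
    try:
        # text files can be sent as-is
        return data.decode('utf-8'), 'utf-8'
    except UnicodeDecodeError:
        # anything else is sent base64-encoded
        return base64.b64encode(data).decode('ascii'), 'base64'


# Git file modes; '120000' marks a blob whose content is a symlink target
MODES = {'file': '100644', 'executable': '100755', 'symlink': '120000'}
```

File.create_blob would then need to pass the returned encoding through instead of always using 'utf-8'.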