Keep track of the commit at which a repo was forked
gousiosg committed Jun 11, 2016
1 parent ad16ed2 commit 97464cd
Showing 5 changed files with 93 additions and 7 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
-Copyright (c) 2012, Georgios Gousios
+Copyright (c) 2012 - onwards Georgios Gousios <[email protected]>
All rights reserved.

Redistribution and use in source and binary forms, with or without
6 changes: 3 additions & 3 deletions README.md
@@ -9,7 +9,7 @@ GHTorrent can be used for a variety of purposes, such as:

* Mirror the Github API event stream and follow links from events to actual data
to gradually build a [Github index](http://ghtorrent.org/)
-* Create a queriable metadata index for a specific repository
+* Create a queriable metadata database for a specific repository
* Construct a data source for [extracting process analytics](http://www.gousios.gr/blog/ghtorrent-project-statistics/) (see for example [those](http://ghtorrent.org/pullreq-perf/)) for one or more repositories

## Components
@@ -31,10 +31,10 @@ the retriever in order to update an SQL database (see [schema](http://ghtorrent.

The Persister and GHTorrent components have configurable back ends:

-* **Persister:** Either uses MongoDB > 2.0 (`mongo` driver) or no persister (`noop` driver)
+* **Persister:** Either uses MongoDB > 3.0 (`mongo` driver) or no persister (`noop` driver)
* **GHTorrent:** GHTorrent is tested mainly with MySQL and SQLite, but can theoretically be used with any SQL database compatible with [Sequel](http://sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html). Your mileage may vary.

-For distributed mirroring you also need RabbitMQ >= 3
+For distributed mirroring you also need RabbitMQ >= 3.3

## Installation

69 changes: 67 additions & 2 deletions lib/ghtorrent/ghtorrent.rb
@@ -85,8 +85,10 @@ def ensure_commit(repo, sha, user, comments = true)
# starts from what the project considers as master branch.
# [return_retrieved] Should retrieved commits be returned? If not, memory is
# saved while processing them.
-def ensure_commits(user, repo, sha = nil, return_retrieved = false)
+# [num_commits] Number of commits to retrieve. A non-positive number means retrieve all
+def ensure_commits(user, repo, sha = nil, return_retrieved = false, num_commits = -1)

+num_retrieved = 0
commits = ['foo'] # Dummy entry for simplifying the loop below
commit_acc = []
until commits.empty?
@@ -107,9 +109,14 @@ def ensure_commits(user, repo, sha = nil, return_retrieved = false)
commit_acc = commit_acc << retrieved
end

+num_retrieved += retrieved.size
+if num_commits > 0 and num_retrieved >= num_commits
+  break
+end
+
end

-commit_acc.select{|x| !x.nil?}
+commit_acc.flatten.select{|x| !x.nil?}
end

##
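
The `num_commits` cap added in the hunk above can be sketched on its own. This is a minimal illustration under stated assumptions, not the GHTorrent implementation: `fetch_page`, `collect_commits`, `PAGE_SIZE`, and the in-memory `HISTORY` array are hypothetical stand-ins for the paginated API retrieval.

```ruby
# Hypothetical in-memory history, newest commit first (an assumption for
# illustration; the real code retrieves pages from the GitHub API).
PAGE_SIZE = 3
HISTORY = %w[c9 c8 c7 c6 c5 c4 c3 c2 c1].freeze

# Hypothetical stand-in for one page request: returns up to PAGE_SIZE
# entries starting at offset, or [] once the history is exhausted.
def fetch_page(offset)
  HISTORY[offset, PAGE_SIZE] || []
end

# Collect entries page by page. A non-positive num_commits means
# "retrieve everything"; otherwise stop after the first page that
# brings the running total to at least num_commits.
def collect_commits(num_commits = -1)
  acc = []
  offset = 0
  loop do
    page = fetch_page(offset)
    break if page.empty?
    acc.concat(page)
    offset += page.size
    break if num_commits > 0 && acc.size >= num_commits
  end
  acc
end

puts collect_commits(4).inspect
puts collect_commits.size
```

Because the cap is checked once per page, as in the hunk above where the check runs after each retrieved batch, the result can hold more entries than requested: asking for 4 with a page size of 3 yields 6.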
@@ -561,6 +568,12 @@ def ensure_repo(user, repo, recursive = false)
else
repos.filter(:owner_id => curuser[:id], :name => repo).update(:forked_from => parent[:id])
info "Repo #{user}/#{repo} is a fork of #{parent_owner}/#{parent_repo}"
+
+forked_commit = ensure_forked_commit(user, repo)
+unless forked_commit.nil?
+  repos.filter(:owner_id => curuser[:id], :name => repo).update(:forked_commit_id => forked_commit[:id])
+  info "Repo #{user}/#{repo} was forked at #{parent_owner}/#{parent_repo}:#{forked_commit[:sha]}"
+end
end
end

@@ -715,6 +728,58 @@ def ensure_fork_commits(owner, repo, parent_owner, parent_repo)
end
end

+# Retrieve and return the commit at which the provided fork was forked
+def ensure_forked_commit(owner, repo)
+
+  fork = ensure_repo(owner, repo, false)
+
+  if fork[:forked_from].nil?
+    warn "Repo #{owner}/#{repo} is not a fork"
+    return nil
+  end
+
+  # Return commit if already specified
+  unless fork[:forked_commit_id].nil?
+    commit = db[:commits].where(:id => fork[:forked_commit_id]).first
+    return commit unless commit.nil?
+  end
+
+  parent = db.from(:projects, :users).\
+    where(:projects__owner_id => :users__id).\
+    where(:projects__id => fork[:forked_from]).\
+    select(:users__login, :projects__name).first
+
+  if parent.nil?
+    warn "Unknown parent for repo #{owner}/#{repo}"
+    return nil
+  end
+
+  times = 1
+  found = false
+  forked_sha = nil
+  while not found and times <= 100
+    forked_commits = ensure_commits(owner, repo, 'master',
+                                    return_retrieved = true,
+                                    commits = (10 * times))
+
+    parent_commits = ensure_commits(parent[:login], parent[:name], 'master',
+                                    return_retrieved = true, commits = 10 * times)
+
+    forked_commits.each do |c|
+      common_commits = parent_commits.select { |pc| pc[:sha] == c[:sha] }
+      unless common_commits.empty?
+        forked_sha = common_commits.first[:sha]
+        found = true
+        break
+      end
+    end
+    times += 1
+  end
+
+  db[:project_commits].where(:project_commits__project_id => fork[:id]).delete
+  db[:commits].where(:sha => forked_sha).first
+end
+
##
# Make sure that a project has all the registered members defined
def ensure_project_members(user, repo, refresh = false)
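
The search performed by `ensure_forked_commit` can be sketched in isolation: compare the newest commits of the fork and of its parent, and widen the window until a shared SHA turns up. This is a sketch under stated assumptions rather than the code in the commit. The helper names `first_common_sha` and `find_fork_point`, the `step:` and `max_rounds:` parameters, and the plain SHA arrays are hypothetical, whereas the commit pages through `ensure_commits`. A `Set` lookup is swapped in for the nested `select`.

```ruby
require 'set'

# Return the first SHA of fork_shas (newest first) that also occurs in
# parent_shas, or nil when the two lists share nothing.
def first_common_sha(fork_shas, parent_shas)
  known = parent_shas.to_set
  fork_shas.find { |sha| known.include?(sha) }
end

# Hypothetical driver: inspect the newest step * round commits of both
# histories, widening the window each round, for at most max_rounds rounds.
def find_fork_point(fork_history, parent_history, step: 10, max_rounds: 100)
  (1..max_rounds).each do |round|
    window = step * round
    sha = first_common_sha(fork_history.first(window),
                           parent_history.first(window))
    return sha unless sha.nil?
    # Both histories are fully covered, so widening further cannot help.
    break if window >= fork_history.size && window >= parent_history.size
  end
  nil
end

fork_history   = %w[f3 f2 f1 b2 b1]    # newest first
parent_history = %w[p4 p3 p2 p1 b2 b1] # newest first
puts find_fork_point(fork_history, parent_history, step: 2).inspect
```

The sketch returns `nil` when no shared commit exists, and it stops early once both histories are exhausted instead of running every round.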
2 changes: 1 addition & 1 deletion lib/ghtorrent/migrations/025_add_updated_at_projects.rb
@@ -10,7 +10,7 @@

if defined?(Sequel::SQLite)
add_column :projects, :updated_at, DateTime,
-:null => false, :default => 0
+:null => false, :default => 56400
else
add_column :projects, :updated_at, DateTime,
:null => false, :default => Sequel::CURRENT_TIMESTAMP
21 changes: 21 additions & 0 deletions lib/ghtorrent/migrations/028_add_forked_commit.rb
@@ -0,0 +1,21 @@
+require 'sequel'
+require 'ghtorrent/migrations/mysql_defaults'
+
+Sequel.migration do
+
+  up do
+    puts 'Adding column fork_commit to projects'
+
+    alter_table(:projects) do
+      add_foreign_key :forked_commit_id, :commits
+    end
+
+  end
+
+  down do
+    puts 'Dropping column fork_commit from projects'
+    alter_table(:projects) do
+      drop_foreign_key :forked_commit_id, :commits
+    end
+  end
+end
