Feature : add LOV Migration Script #24
…er for the csv generator
I restructured the code into modules: DumpParser, CSVGenerator, and an object for each of the models to migrate: Vocabulary, Agent and Distribution.
I optimized the n3 parsing code so that it no longer keeps the whole graph in memory. Instead, it builds a Hash while streaming each line/statement of the graph.
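To illustrate the idea of streaming statements into a Hash, here is a minimal sketch. It is not the code from the commits: the `build_subject_hash` name, the `TRIPLE_LINE` pattern, and the sample URIs are assumptions, and the regex only handles one simple triple per line, whereas a real n3 dump needs a proper parser.

```ruby
require 'stringio'

# Hypothetical, simplified line pattern: <subject> <predicate> object .
# Real N3/Turtle syntax is richer; this handles one triple per line only.
TRIPLE_LINE = /\A<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.\s*\z/

# Streams the input line by line and builds
# { subject => { predicate => [objects] } } without keeping a graph object.
def build_subject_hash(io)
  subjects = Hash.new { |h, s| h[s] = Hash.new { |p, k| p[k] = [] } }
  io.each_line do |line|
    match = TRIPLE_LINE.match(line.strip)
    next unless match # skip blank lines, comments, and unsupported syntax

    subject, predicate, object = match.captures
    # Strip the angle brackets from URI objects; literals keep their quotes.
    subjects[subject][predicate] << object.delete_prefix('<').delete_suffix('>')
  end
  subjects
end

# Made-up sample data standing in for the real dump.
sample = StringIO.new(<<~NT)
  <http://example.org/vocab/a> <http://purl.org/dc/terms/title> "Vocabulary A" .
  <http://example.org/vocab/a> <http://purl.org/dc/terms/creator> <http://example.org/agent/1> .
  <http://example.org/agent/1> <http://xmlns.com/foaf/0.1/name> "Jane Doe" .
NT

subjects = build_subject_hash(sample)
puts "#{subjects.size} subjects found"
```

Because only the Hash is retained, each parsed line can be discarded as soon as its statement has been recorded.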
Before optimization:
➜ ncbo_cron git:(lov_migrator) ✗ bin/lov_migrator --all
File downloaded successfully as lov.n3.gz.tmp
The local file remains unchanged.
Parsing the n3 file in memory
Parsing the n3 file in memory ended in 150.80259921000106 seconds
Find vocabularies start
Find vocabularies ended in 0.02299319504527375 seconds
Found 861 vocabularies:
After optimization:
➜ ncbo_cron git:(lov_migrator) ✗ bin/lov_migrator --all
File downloaded successfully as lov.n3.gz.tmp
The local file remains unchanged.
Parsing the n3 file in memory
Parsing the n3 file in memory ended in 11.830599968961906 seconds with 861 subjects found
Find vocabularies ended in 5.6899734772741795e-06 seconds
Check my commits to see the details and understand the process.
Also, always add a description to your PRs.
bin/lov_migrator (outdated)

initialize_csv(CSV_FILENAME)

graph = parse_n3_file('test.n3')
This needs to be a script argument, not a hard-coded 'test.n3'.
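One possible way to do this is with Ruby's standard `optparse` library. The `--input` flag and the `parse_options` helper below are illustrative assumptions, not the script's actual interface.

```ruby
require 'optparse'

# Parses command-line arguments into an options Hash.
# The flag names are assumptions made for this sketch.
def parse_options(argv)
  options = { input: nil }
  OptionParser.new do |opts|
    opts.banner = 'Usage: lov_migrator --input FILE'
    opts.on('-i', '--input FILE', 'Path to the n3 dump to parse') do |file|
      options[:input] = file
    end
  end.parse!(argv)
  raise OptionParser::MissingArgument, '--input FILE is required' if options[:input].nil?

  options
end

# In the script this would be parse_options(ARGV).
options = parse_options(%w[--input lov.n3])
puts options[:input]
```

The parsed path can then be passed on, for example as `parse_n3_file(options[:input])`.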
bin/lov_migrator (outdated)

graph = parse_n3_file('test.n3')

vocab_uris = find_vocabularies(graph)
agent_uris = find_agents(graph)
You can disable this for the moment.
bin/lov_migrator (outdated)

end

# Start of the script
'''
Why is this not used?
class Distribution

  def extract_distribution_info(graph, distribution_uri)
This needs to be updated to work with the graph as a Hash.
class Agent

  def extract_agent_info(graph, agent_uri)
This needs to be updated to work with the graph as a Hash.
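As a rough sketch of what reading from a Hash could look like, assuming the parsed dump has the shape `{ subject => { predicate => [objects] } }`: the predicate constants, the returned keys, and the sample data are assumptions, and the method is shown at top level rather than inside the Agent class for brevity. The same lookup pattern would apply to `extract_distribution_info`.

```ruby
# Assumed predicate URIs, chosen for illustration only.
FOAF_NAME = 'http://xmlns.com/foaf/0.1/name'
RDF_TYPE  = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'

# Reads an agent's attributes from a Hash shaped like
# { subject => { predicate => [objects] } } instead of querying a graph.
def extract_agent_info(graph, agent_uri)
  predicates = graph.fetch(agent_uri, {})
  {
    uri: agent_uri,
    name: predicates.fetch(FOAF_NAME, []).first, # nil when no name is present
    types: predicates.fetch(RDF_TYPE, [])
  }
end

# Made-up sample data standing in for the parsed dump.
graph = {
  'http://example.org/agent/1' => {
    FOAF_NAME => ['Jane Doe'],
    RDF_TYPE => ['http://xmlns.com/foaf/0.1/Person']
  }
}

info = extract_agent_info(graph, 'http://example.org/agent/1')
puts info[:name]
```

Using `fetch` with a default keeps the lookup from raising when a subject or predicate is absent from the dump.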
end
end

def print_vocabulary_info(info)
This should be removed; it is not needed in the final version.
CSV_MAIN_ATTRS = [ :prefix, :title, :description, :keyword, :creator, :uri, :lastModifiedInLOVAt ]
CSV_ADDED_ATTRS = [ :destination, :who, :comment ]
CSV_DISPATCH_FILENAME = 'LOV_vocabularies_dispatch.csv'
This needs to be an argument of the script.
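For example, the CSV writer could receive the file name from its caller instead of reading a constant. This is only a sketch: `write_dispatch_csv` is a hypothetical helper, the two attribute lists are copied from the diff above, and the vocabulary row is made-up sample data.

```ruby
require 'csv'
require 'tmpdir'

CSV_MAIN_ATTRS  = %i[prefix title description keyword creator uri lastModifiedInLOVAt].freeze
CSV_ADDED_ATTRS = %i[destination who comment].freeze

# Writes a header row plus one row per vocabulary.
# `filename` is supplied by the caller (e.g. from a script argument).
def write_dispatch_csv(filename, vocabularies)
  headers = CSV_MAIN_ATTRS + CSV_ADDED_ATTRS
  CSV.open(filename, 'w') do |csv|
    csv << headers.map(&:to_s)
    # Attributes missing from a vocabulary become empty cells.
    vocabularies.each { |vocab| csv << headers.map { |attr| vocab[attr] } }
  end
end

# Write to a temporary directory so the sketch leaves no file behind in the repo.
path = File.join(Dir.mktmpdir, 'dispatch.csv')
write_dispatch_csv(path, [{ prefix: 'ex', title: 'Example', uri: 'http://example.org/' }])
puts File.read(path)
```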
The time gain from using a Ruby Hash instead of loading the graph in memory is pretty impressive. I will update the script accordingly, thanks for the review.
This PR is closed; it was replaced by PR #26.
No description provided.