
Feature: add LOV Migration Script #24

Closed

Conversation

muhammedBkf
No description provided.

@syphax-bouazzouni left a comment


I updated the code to be more structured into modules: DumpParser, CSVGenerator, and an object for each of the models to migrate: Vocabulary, Agent and Distribution.

I optimized the n3 parsing code so that it does not keep the whole graph in memory, but instead builds a Hash while streaming each line/statement of the graph.

Before optimization:

➜  ncbo_cron git:(lov_migrator) ✗ bin/lov_migrator --all 
File downloaded successfully as lov.n3.gz.tmp
The local file remains unchanged.
Parsing the n3 file in memory
Parsing the n3 file in memory ended in 150.80259921000106 seconds
Find vocabularies start
Find vocabularies ended in 0.02299319504527375 seconds
Found 861 vocabularies:

After optimization:

➜  ncbo_cron git:(lov_migrator) ✗ bin/lov_migrator --all 
File downloaded successfully as lov.n3.gz.tmp
The local file remains unchanged.
Parsing the n3 file in memory
Parsing the n3 file in memory ended in 11.830599968961906 seconds with 861 subjects found
Find vocabularies ended in 5.6899734772741795e-06 seconds 
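The streaming approach described above can be sketched roughly like this (a minimal illustration, not the actual implementation: it assumes one `<s> <p> o .` statement per line, whereas real n3 syntax is richer):

```ruby
def parse_n3_file(path)
  # Nested autovivifying hash: subject_uri => { predicate_uri => [objects] },
  # built line by line instead of materializing a full RDF graph in memory.
  graph = Hash.new { |h, k| h[k] = Hash.new { |h2, k2| h2[k2] = [] } }
  File.foreach(path) do |line|
    # Naive match of a single "<s> <p> o ." statement per line.
    next unless line =~ /\A<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.\s*\z/
    subject, predicate, object = Regexp.last_match(1), Regexp.last_match(2), Regexp.last_match(3)
    graph[subject][predicate] << object
  end
  graph
end
```

`File.foreach` reads the file one line at a time, so memory usage stays proportional to the number of distinct subjects rather than the full statement count, which is consistent with the speedup reported above.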

Check my commits to see the details and understand the process.

And always add a description to your PRs.

bin/lov_migrator Outdated

initialize_csv(CSV_FILENAME)

graph = parse_n3_file('test.n3')


this needs to be a script argument, not a hard-coded 'test.n3'

bin/lov_migrator Outdated
graph = parse_n3_file('test.n3')

vocab_uris = find_vocabularies(graph)
agent_uris = find_agents(graph)


you can disable this for the moment

bin/lov_migrator Outdated
end

# Start of the script
'''


why is this not used?


class Distribution

def extract_distribution_info(graph, distribution_uri)


to update to work with the graph as a hash
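A hedged sketch of what that update could look like, assuming the hash-based graph has the shape `{ subject_uri => { predicate_uri => [objects] } }`; the predicate URIs below are illustrative placeholders, not necessarily the ones the script uses:

```ruby
# Illustrative predicate URIs; the real ones come from the LOV dump.
DCT_TITLE  = 'http://purl.org/dc/terms/title'
DCT_ISSUED = 'http://purl.org/dc/terms/issued'

def extract_distribution_info(graph, distribution_uri)
  # With the graph as a nested Hash, extraction is a plain lookup:
  # fetch the subject's statements, then take the first object per predicate.
  statements = graph[distribution_uri] || {}
  {
    uri:    distribution_uri,
    title:  statements[DCT_TITLE]&.first,
    issued: statements[DCT_ISSUED]&.first
  }
end
```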


class Agent

def extract_agent_info(graph, agent_uri)


to update to work with the graph as a hash

end
end

def print_vocabulary_info(info)


to remove, not needed in the final version


CSV_MAIN_ATTRS = [ :prefix, :title, :description, :keyword, :creator, :uri, :lastModifiedInLOVAt ]
CSV_ADDED_ATTRS = [ :destination, :who, :comment ]
CSV_DISPATCH_FILENAME = 'LOV_vocabularies_dispatch.csv'


this needs to be an argument of the script
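One possible way to surface the CSV path as a script argument is Ruby's standard `OptionParser`; the flag name and default below are illustrative assumptions, not the script's actual interface:

```ruby
require 'optparse'

# Sketch: expose the output CSV path as a CLI flag instead of the
# hard-coded CSV_DISPATCH_FILENAME constant.
def parse_options(argv)
  options = { csv_file: 'LOV_vocabularies_dispatch.csv' }
  OptionParser.new do |opts|
    opts.banner = 'Usage: lov_migrator [options]'
    opts.on('--csv-file FILE', 'Path of the generated dispatch CSV') do |file|
      options[:csv_file] = file
    end
  end.parse!(argv)
  options
end
```

The same pattern would cover the other review point about the hard-coded input file (a `--dump-file` flag, for instance).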

@muhammedBkf (Author) commented Oct 23, 2024


The time gain from using a Ruby Hash instead of loading the graph in memory is pretty impressive. I will update the script accordingly; thanks for the review.

@muhammedBkf (Author)

It's closed, replaced by PR #26.

@muhammedBkf muhammedBkf closed this Nov 8, 2024