
Feature: add LOV Migration Script #24

Closed

Conversation

muhammedBkf
No description provided.

@syphax-bouazzouni left a comment


I updated the code to be more structured into modules: DumpParser, CSVGenerator, and an object for each of the models to migrate: Vocabulary, Agent and Distribution.

I optimized the n3 parsing code so that it does not keep the whole graph in memory, but instead builds a Hash while streaming each line/statement of the graph.

Before optimization:

➜  ncbo_cron git:(lov_migrator) ✗ bin/lov_migrator --all 
File downloaded successfully as lov.n3.gz.tmp
The local file remains unchanged.
Parsing the n3 file in memory
Parsing the n3 file in memory ended in 150.80259921000106 seconds
Find vocabularies start
Find vocabularies ended in 0.02299319504527375 seconds
Found 861 vocabularies:

After optimization:

➜  ncbo_cron git:(lov_migrator) ✗ bin/lov_migrator --all 
File downloaded successfully as lov.n3.gz.tmp
The local file remains unchanged.
Parsing the n3 file in memory
Parsing the n3 file in memory ended in 11.830599968961906 seconds with 861 subjects found
Find vocabularies ended in 5.6899734772741795e-06 seconds 
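The streaming approach described above can be sketched roughly like this (a minimal illustration, not the actual implementation: it assumes one `<s> <p> o .` statement per line, whereas real n3 syntax is richer):

```ruby
def parse_n3_file(path)
  # Nested autovivifying hash: subject_uri => { predicate_uri => [objects] },
  # built line by line instead of materializing a full RDF graph in memory.
  graph = Hash.new { |h, k| h[k] = Hash.new { |h2, k2| h2[k2] = [] } }
  File.foreach(path) do |line|
    # Naive match of a single "<s> <p> o ." statement per line.
    next unless line =~ /\A<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.\s*\z/
    subject, predicate, object = Regexp.last_match(1), Regexp.last_match(2), Regexp.last_match(3)
    graph[subject][predicate] << object
  end
  graph
end
```

`File.foreach` reads the file one line at a time, so memory usage stays proportional to the number of distinct subjects rather than the full statement count, which is consistent with the speedup reported above.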

Check my commits to see the details and understand the process.

And always add a description to your PRs.

bin/lov_migrator Outdated

initialize_csv(CSV_FILENAME)

graph = parse_n3_file('test.n3')


this needs to be a script argument, not a hard-coded 'test.n3'

bin/lov_migrator Outdated
graph = parse_n3_file('test.n3')

vocab_uris = find_vocabularies(graph)
agent_uris = find_agents(graph)


you can disable this for the moment

bin/lov_migrator Outdated
end

# Start of the script
'''


why is this not used?


class Distribution

def extract_distribution_info(graph, distribution_uri)


to update to work with the graph as a hash
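A hedged sketch of what that update could look like, assuming the hash-based graph has the shape `{ subject_uri => { predicate_uri => [objects] } }`; the predicate URIs below are illustrative placeholders, not necessarily the ones the script uses:

```ruby
# Illustrative predicate URIs; the real ones come from the LOV dump.
DCT_TITLE  = 'http://purl.org/dc/terms/title'
DCT_ISSUED = 'http://purl.org/dc/terms/issued'

def extract_distribution_info(graph, distribution_uri)
  # With the graph as a nested Hash, extraction is a plain lookup:
  # fetch the subject's statements, then take the first object per predicate.
  statements = graph[distribution_uri] || {}
  {
    uri:    distribution_uri,
    title:  statements[DCT_TITLE]&.first,
    issued: statements[DCT_ISSUED]&.first
  }
end
```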


class Agent

def extract_agent_info(graph, agent_uri)


to update to work with the graph as a hash

end
end

def print_vocabulary_info(info)


to remove, not needed in the final version


CSV_MAIN_ATTRS = [ :prefix, :title, :description, :keyword, :creator, :uri, :lastModifiedInLOVAt ]
CSV_ADDED_ATTRS = [ :destination, :who, :comment ]
CSV_DISPATCH_FILENAME = 'LOV_vocabularies_dispatch.csv'


this needs to be an argument of the script
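One possible way to surface the CSV path as a script argument is Ruby's standard `OptionParser`; the flag name and default below are illustrative assumptions, not the script's actual interface:

```ruby
require 'optparse'

# Sketch: expose the output CSV path as a CLI flag instead of the
# hard-coded CSV_DISPATCH_FILENAME constant.
def parse_options(argv)
  options = { csv_file: 'LOV_vocabularies_dispatch.csv' }
  OptionParser.new do |opts|
    opts.banner = 'Usage: lov_migrator [options]'
    opts.on('--csv-file FILE', 'Path of the generated dispatch CSV') do |file|
      options[:csv_file] = file
    end
  end.parse!(argv)
  options
end
```

The same pattern would cover the other review point about the hard-coded input file (a `--dump-file` flag, for instance).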

@muhammedBkf (Author) commented Oct 23, 2024


The time gain from using a Ruby Hash instead of loading the graph in memory is pretty impressive. I will update the script accordingly; thanks for the review.

@muhammedBkf (Author)

It's closed, replaced by PR #26.

@muhammedBkf muhammedBkf closed this Nov 8, 2024