Feature: Fetch and detect latest modified Vocabularies using LOV SPARQL Endpoint #26
bin/lov_migrator
Outdated
attr_accessor :last_processed_date
def initialize(endpoint = LOV_ENDPOINT)
  @lov_endpoint = endpoint
  @last_processed_date_file = ".last_processed_date.txt"
How do you make sure that this file does not disappear, or that the script is not run from a different folder?
- We can't be sure that the file won't disappear. I'll keep it temporarily; once we have a running instance of LovPortal, I'll get the last processed date from there.
- I'll use a full path so that the file is always stored in the same place.
If the file disappears, the script will still work and update the CSV.
Why not pass it as an argument to the script? Then the user knows which date the import starts from.
TODO
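A minimal sketch of how both suggestions in this thread could fit together, under assumptions: the state file is anchored to the script's own directory with `File.expand_path(..., __dir__)` so the working directory no longer matters, and an explicit date given on the command line takes precedence over the file. The `--since` flag and the `parse_since` and `resolve_start_date` helpers are hypothetical names used for illustration; they are not part of this PR.

```ruby
require 'optparse'
require 'date'

# Assumption: the state file sits next to the script, whatever the caller's cwd is.
STATE_FILE = File.expand_path('.last_processed_date.txt', __dir__)

# Hypothetical helper: read an optional --since DATE flag from argv.
def parse_since(argv)
  options = {}
  OptionParser.new do |opts|
    opts.on('--since DATE', 'Import vocabs modified after DATE (YYYY-MM-DD)') do |value|
      options[:since] = Date.iso8601(value)
    end
  end.parse!(argv)
  options[:since]
end

# Hypothetical helper: an explicit date wins, then the state file, else nil.
def resolve_start_date(since, state_file = STATE_FILE)
  return since if since
  return nil unless File.exist?(state_file)

  Date.iso8601(File.read(state_file).strip)
end
```

With this shape, `resolve_start_date(parse_since(ARGV))` returns `nil` when there is neither a flag nor a state file, which the caller can treat as "process everything".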
else
  if parser.last_processed_date
    puts "The following vocabs were changed since #{parser.last_processed_date}"
    puts parser.fetch_latest_changed_vocabs
You don't use this information anywhere?
For now we just print it, to see the changed vocabs. We may use this later to make automatic updates on LovPortal.
If you always fetch all the vocabs anyway, there is no need for a second SPARQL query; just filter the fetched vocabs by date.
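A rough sketch of that local filter, assuming each fetched vocab is available as a hash with a `:modified` value in YYYY-MM-DD form. The row shape and the `changed_since` name are assumptions for illustration; the actual rows returned by the script may look different.

```ruby
require 'date'

# Hypothetical row shape: { uri: String, modified: 'YYYY-MM-DD' }.
# Returns the vocabs modified strictly after last_processed_date,
# or all of them when no date has been recorded yet.
def changed_since(vocabs, last_processed_date)
  return vocabs if last_processed_date.nil?

  vocabs.select { |vocab| Date.iso8601(vocab[:modified]) > last_processed_date }
end
```

This keeps a single round trip to the endpoint and makes the "what changed" step a pure function that is easy to test.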
# Exit cleanly from an early interrupt
Signal.trap("INT") { exit 1 }

require 'optparse'
Put all the require statements together at the very top of the file, like this:
require 'optparse'
require 'open-uri'
require 'net/http'
require 'json'
require 'date'
require 'benchmark'
require 'csv'
require 'date'
require 'benchmark'
require 'csv'
LOV_ENDPOINT = "https://lov.linkeddata.es/dataset/lov"
Put the constants at the top as well.
  end
end.parse!

raise OptionParser::MissingArgument if options[:vocabs].nil?
Add a message, like this:
raise OptionParser::MissingArgument, "Specify vocabularies with -a or -v" if options[:vocabs].nil?
end

def remote_changes?
  return true unless @last_processed_date
You could replace these two lines with:
!@last_processed_date || @last_processed_date < latest_remote_modification_date
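To show the one-liner in context, here is a small self-contained sketch. The `ChangeDetector` class and its constructor are hypothetical, and `latest_remote_modification_date` is stubbed with a fixed value instead of the SPARQL lookup the script would perform.

```ruby
require 'date'

# Hypothetical stand-in for the parser class in bin/lov_migrator.
class ChangeDetector
  def initialize(last_processed_date, latest_remote_date)
    @last_processed_date = last_processed_date
    @latest_remote_date = latest_remote_date
  end

  # True when nothing was processed yet, or the remote side is newer.
  def remote_changes?
    !@last_processed_date || @last_processed_date < latest_remote_modification_date
  end

  private

  # Assumption: the real method would query the LOV endpoint; stubbed here.
  def latest_remote_modification_date
    @latest_remote_date
  end
end
```

Because `||` short-circuits, the remote date is never looked up when no date has been recorded, which matches the behavior of the original early `return true`.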
def sparql_query(query, accept_format = 'application/sparql-results+json')
  uri = URI.parse("#{@lov_endpoint}/sparql")

  http = Net::HTTP.new(uri.host, uri.port)
It's better to keep the request params grouped to make the code more readable, something like this:
response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
  request = Net::HTTP::Post.new(uri)
  request.set_form_data('query' => query)
  request['Accept'] = accept_format
  http.request(request)
end
  response = sparql_query(query, 'text/csv')
end
def update_latest_modification_date
  File.open(@last_processed_date_file, "w") do |file|
In Ruby, I think you can do this directly:
File.write(@last_processed_date_file, Date.today)
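As a sketch of the full round trip, assuming the file holds a single YYYY-MM-DD date: the write side uses `File.write` as suggested, and the read side returns `nil` when the file is missing or unparsable. The `save_processed_date` and `load_processed_date` names are hypothetical helpers, not methods from this PR.

```ruby
require 'date'

# Hypothetical helper: store the date as YYYY-MM-DD (ISO 8601).
def save_processed_date(path, date = Date.today)
  File.write(path, date.iso8601)
end

# Hypothetical helper: read the date back, or nil if the file
# is missing or does not contain a valid date.
def load_processed_date(path)
  Date.iso8601(File.read(path).strip)
rescue Errno::ENOENT, ArgumentError
  nil
end
```

Writing `date.iso8601` explicitly, rather than relying on the implicit string conversion, makes the on-disk format visible at the call site.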
@muhammedBkf You did a great job here 👏,
This script detects and fetches the most recently modified vocabs in LOV and saves them to a CSV file. This is how it works:
.last_processed_date.txt
in the following format: YYYY-MM-DD