Feature: Fetch and detect latest modified Vocabularies using LOV SPARQL Endpoint #26

Status: Open. Wants to merge 17 commits into the base branch development.


@muhammedBkf commented Nov 4, 2024

This script fetches the most recently modified vocabularies from LOV, detects which ones changed, and saves them to a CSV file. It works as follows:

  1. Check whether anything changed since the last run.
    • The date of the last run is stored in a local file, .last_processed_date.txt, in the format YYYY-MM-DD.
  2. Print the vocabs that changed. If nothing changed, exit.
  3. Update the local CSV file.
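Step 1 could be implemented as one aggregate SPARQL query whose answer is read from the standard SPARQL JSON results format (`application/sparql-results+json`, the PR's default `accept_format`). This is only a sketch: it assumes LOV types vocabularies as `voaf:Vocabulary` and records their last change with `dcterms:modified`, and `LATEST_MODIFICATION_QUERY` and `parse_latest_date` are hypothetical names, not part of the PR.

```ruby
require 'json'
require 'date'

# Assumption: LOV describes vocabularies as voaf:Vocabulary with a
# dcterms:modified date. Adjust the predicates if the LOV schema differs.
LATEST_MODIFICATION_QUERY = <<~SPARQL
  PREFIX voaf: <http://purl.org/vocommons/voaf#>
  PREFIX dcterms: <http://purl.org/dc/terms/>
  SELECT (MAX(?modified) AS ?latest) WHERE {
    ?vocab a voaf:Vocabulary ;
           dcterms:modified ?modified .
  }
SPARQL

# Hypothetical helper: reads the ?latest binding from a SPARQL JSON results
# document and returns it as a Date, or nil when nothing is bound.
def parse_latest_date(json_body)
  bindings = JSON.parse(json_body).dig("results", "bindings") || []
  value = bindings.first&.dig("latest", "value")
  return nil if value.nil?

  # Keep only the YYYY-MM-DD prefix so xsd:date and xsd:dateTime both work.
  Date.iso8601(value[0, 10])
end
```

The resulting Date can then be compared with the date stored in .last_processed_date.txt to decide whether to continue or exit.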

[Screenshot from 2024-11-04 07-44-49]

Review comment on bin/lov_migrator (outdated):
attr_accessor :last_processed_date

def initialize(endpoint = LOV_ENDPOINT)
  @lov_endpoint = endpoint
  @last_processed_date_file = ".last_processed_date.txt"


How do you make sure that this file does not disappear, or that the script is not run from a different folder?

Author:

  • We can't be sure that the file won't disappear. I'll keep it temporarily; once we have a running instance of LovPortal, I'll get the last processed date from there.
  • I'll use a full path so that the file is always stored in the same place.
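A minimal sketch of the full-path idea, assuming the date file should live next to the script itself (the PR does not say where). `File.expand_path` with `__dir__` resolves the same absolute location no matter which working directory the script is launched from; `last_processed_date_path` is a hypothetical helper name.

```ruby
# Hypothetical helper: resolves the state file against a fixed directory
# (by default, the directory containing this script) instead of the
# current working directory, so launching the script from another folder
# still reads and writes the same file.
def last_processed_date_path(base_dir = __dir__)
  File.expand_path(".last_processed_date.txt", base_dir)
end
```

In `initialize`, this would replace the relative string: `@last_processed_date_file = last_processed_date_path`.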

Author:

If the file disappears, the script will still work and update the CSV.
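That fallback can be sketched as a reader that returns `nil` when the file is missing or does not contain a valid YYYY-MM-DD date, which the caller then treats as "no previous run, process everything". The method name `read_last_processed_date` is hypothetical.

```ruby
require 'date'

# Hypothetical reader for .last_processed_date.txt. A missing or malformed
# file yields nil, so the script falls back to a full CSV update.
def read_last_processed_date(path)
  return nil unless File.exist?(path)

  Date.iso8601(File.read(path).strip)
rescue ArgumentError # Date::Error is a subclass of ArgumentError
  nil
end
```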


Why not pass it as an argument to the script? That way the user knows which date the import starts from.
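One way to follow this suggestion is an explicit start-date option. This is a sketch: the `--since` flag and the `parse_since_option` helper are hypothetical and not part of the PR.

```ruby
require 'optparse'
require 'date'

# Hypothetical --since option: lets the user choose the import start date
# explicitly instead of relying only on .last_processed_date.txt.
def parse_since_option(argv)
  options = {}
  OptionParser.new do |opts|
    opts.on("--since DATE", "Import vocabs modified after DATE (YYYY-MM-DD)") do |value|
      options[:since] = Date.iso8601(value)
    rescue ArgumentError
      # Report a bad date the same way OptionParser reports other bad values.
      raise OptionParser::InvalidArgument, value
    end
  end.parse!(argv)
  options
end
```

When `--since` is absent, `options[:since]` is `nil` and the script could fall back to the stored date.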


TODO

else
  if parser.last_processed_date
    puts "The following vocabs were changed since #{parser.last_processed_date}"
    puts parser.fetch_latest_changed_vocabs


You don't use this information anywhere?

Author:

For now we just print it, to see which vocabs changed. We may use it later to make automatic updates on LovPortal.


So if you always fetch all the vocabs, there's no need for a second SPARQL query; just filter all the vocabs by date.
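The reviewer's suggestion (one query for all vocabs, then a local date filter) might look like this when the endpoint is asked for `text/csv`, as the PR's `sparql_query(query, 'text/csv')` call does. Assumptions: the CSV has `prefix` and `modified` columns holding ISO dates, and `changed_since` is a hypothetical helper name.

```ruby
require 'csv'
require 'date'

# Hypothetical filter over the CSV body of a single "all vocabs" query:
# keeps only rows modified strictly after the last processed date.
# Assumes columns named "prefix" and "modified" (ISO 8601 dates).
def changed_since(csv_body, last_processed_date)
  rows = CSV.parse(csv_body, headers: true)
  return rows.map(&:to_h) if last_processed_date.nil? # first run: keep all

  rows.select { |row| Date.iso8601(row["modified"][0, 10]) > last_processed_date }
      .map(&:to_h)
end
```

The same response body can still be written to the local CSV file, so the change report costs no extra request.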

# Exit cleanly from an early interrupt
Signal.trap("INT") { exit 1 }

require 'optparse'
Collaborator:

Put all the require statements together at the very top of the file, like this:

require 'optparse'
require 'open-uri'
require 'net/http'
require 'json'
require 'date'
require 'benchmark'
require 'csv'

require 'date'
require 'benchmark'
require 'csv'
LOV_ENDPOINT = "https://lov.linkeddata.es/dataset/lov"
Collaborator:

Put the constants at the top as well.

Review comment on bin/lov_migrator (resolved):
end
end.parse!

raise OptionParser::MissingArgument if options[:vocabs].nil?
Collaborator:

Add a message, like this:
raise OptionParser::MissingArgument, "Specify vocabularies with -a or -v" if options[:vocabs].nil?
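For context, a self-contained sketch of where that message surfaces. The meaning of `-a` (all vocabularies) and `-v` (a comma-separated list of prefixes) is assumed from the error text, and `parse_vocab_options` is a hypothetical name.

```ruby
require 'optparse'

# Hypothetical flag semantics: -a selects all vocabularies, -v takes a
# comma-separated list of prefixes. The real bin/lov_migrator may differ.
def parse_vocab_options(argv)
  options = {}
  OptionParser.new do |opts|
    opts.banner = "Usage: lov_migrator [options]"
    opts.on("-a", "--all", "Process all vocabularies") { options[:vocabs] = :all }
    opts.on("-v", "--vocabs PREFIXES", Array, "Comma-separated vocabulary prefixes") do |list|
      options[:vocabs] = list
    end
  end.parse!(argv)

  # Explain which flags are expected instead of raising a bare error.
  raise OptionParser::MissingArgument, "Specify vocabularies with -a or -v" if options[:vocabs].nil?

  options
end
```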

end

def remote_changes?
  return true unless @last_processed_date
Collaborator:

You could replace these two lines with:
!@last_processed_date || @last_processed_date < latest_remote_modification_date
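The one-line version can be exercised in isolation as below. `ChangeChecker` is a hypothetical stand-in for the PR's class, and `latest_remote_modification_date` is stubbed with a fixed date because the real method queries LOV.

```ruby
require 'date'

# Hypothetical stand-in for the PR's class, with the remote lookup stubbed.
class ChangeChecker
  attr_accessor :last_processed_date

  def initialize(last_processed_date, remote_date)
    @last_processed_date = last_processed_date
    @remote_date = remote_date
  end

  # In the real script this would run a SPARQL query against LOV.
  def latest_remote_modification_date
    @remote_date
  end

  # No stored date means "never ran", so everything counts as changed.
  def remote_changes?
    !@last_processed_date || @last_processed_date < latest_remote_modification_date
  end
end
```

Because `||` short-circuits, no remote query is sent when no date is stored, which matches the behavior of the original early `return true`.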

def sparql_query(query, accept_format = 'application/sparql-results+json')
  uri = URI.parse("#{@lov_endpoint}/sparql")

  http = Net::HTTP.new(uri.host, uri.port)
Collaborator:

It's better to keep the request parameters grouped to make the code more readable, something like this:

      response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
        request = Net::HTTP::Post.new(uri)
        request.set_form_data('query' => query)
        request['Accept'] = accept_format
        http.request(request)
      end
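Put together, a complete method along those lines might look like the sketch below. Unlike the PR's version it takes the endpoint as a parameter so it runs standalone; building the request in a separate helper (`build_sparql_request`, a hypothetical name) keeps the parameters grouped and lets the request be inspected without network access. Raising on non-2xx responses is an added assumption, not something the PR specifies.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds the POST request with all parameters grouped.
def build_sparql_request(uri, query, accept_format)
  request = Net::HTTP::Post.new(uri)
  request.set_form_data('query' => query)
  request['Accept'] = accept_format
  request
end

# Sends the query to "<endpoint>/sparql" and returns the response body.
def sparql_query(endpoint, query, accept_format = 'application/sparql-results+json')
  uri = URI.parse("#{endpoint}/sparql")
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_sparql_request(uri, query, accept_format))
  end
  # Assumption: fail loudly instead of parsing an error page as results.
  raise "SPARQL query failed: #{response.code} #{response.message}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end
```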

  response = sparql_query(query, 'text/csv')
end

def update_latest_modification_date
  File.open(@last_processed_date_file, "w") do |file|
Collaborator:

In Ruby, I think you can do it directly:
File.write(@last_processed_date_file, Date.today)
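This works because `File.write` converts a non-String argument with `to_s`, and `Date#to_s` returns the ISO 8601 form `YYYY-MM-DD`, exactly the format the PR description specifies. A quick round-trip sketch; here `update_latest_modification_date` takes the path and date as parameters (an assumption for testability, unlike the PR's version).

```ruby
require 'date'

# Records the run date; Date#to_s produces "YYYY-MM-DD".
def update_latest_modification_date(path, date = Date.today)
  File.write(path, date)
end
```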

@Bilelkihal (Collaborator)

@muhammedBkf You did a great job here 👏
I added some comments on the code suggesting improvements to make it more readable.

3 participants