Skip to content

Proposal for Semweb library redesign

Wouter Beek edited this page Oct 7, 2015 · 18 revisions

Proposal for Semweb library redesign

1 Literals

1.1 Internal literal representations

Observations:

  • There are currently 3 compound term representations for literals:
    1. literal(type(D, Lex))
    2. literal(lang(Lang, Lex))
    3. literal(Lex)
  • Simple literals (variant 3) were removed in RDF 1.1. They are now (implicitly typed) XSD strings.
  • RDF 1.1 specifies the values of language-tagged strings as pairs of lexical expression and language tag in that order.

Requirement:

  • rdf(S,P,O) <=> rdf(S,P,X), X=O (steadfastness)

Proposal:

  • Two representations for literals:
    1. TextOrValue^^Type (value for known types)
    2. Text@Lang
  • Output arg binds to above form.
  • Input for query can only be one of the above.

Assert converts the literal value:

- Prolog integer         --> xsd:integer
- Prolog float           --> xsd:double
- Prolog true/false      --> xsd:boolean
- String^^KnownType      --> Value^^KnownType
- String^^xsd:langString --> error
- 0/1^^xsd:boolean       --> true/false^^xsd:boolean
- String^^UnknownType    --> String^^UnknownType
- Atom^^KnownType        --> Value^^KnownType
- Atom^^UnknownType      --> String^^UnknownType
- String		     --> String^^xsd:string
- String@Lang	     --> String@Lang
- Atom@Lang		     --> String@Lang
  • Issue: time and duration representation. Two places: internal value space and Prolog representation. Options:

    • Value space
      • Canonical strings
      • Time-stamp
    • Prolog representation
      • Canonical strings
      • Time-stamp
      • Compound term or dict. These are expensive to create and parse, so most likely not a good idea.

New and obsolete predicates

  • Add rdf_canonical_literal(+Input, -Canonical).

  • Add rdf_match(S,P,Pattern,O)

  • Add rdf_match(S,P,Pattern,O,G) Pattern Pattern is first arg of current literal(Pattern, Literal) or lang_matches(Lang).

  • Remove option convert_typed_literal(:).

1.4 Literal sorting

Observation:

  • Literals are sorted, but the sorting has little relation to the SPARQL defined ordering. This implies we cannot optimize SPARQL queries that use comparison operators.

2 Simplify the basic API

2.1 API for enumerating existing terms

Observations:

  • rdf_resource/1 does not enumerate all resource-denoting terms. It enumerates the terms that are blank nodes or IRIs and that appear in the subject or object position of some triple.
  • Enumeration based on positional occurrence (rdf_subject/1 and rdf_current_predicate/1) is missing object and node support.
  • Enumeration based on kind (rdf_current_literal/1) is missing support for blank nodes, IRIs and names.

Proposal:

  • Remove rdf_resource/1.
  • Enumerate terms:
    • Add rdf_term/1, uniquely enumerating the current RDF terms.
  • Enumerate terms based on their positional occurrence:
    • Keep rdf_subject/1 as-is.
    • Rename rdf_current_predicate/1rdf_predicate/1.
    • Add rdf_object/1 to complement rdf_subject/1 and rdf_predicate/1.
    • Add rdf_node/1, uniquely enumerating terms that occur in either the subject or object position.
  • Enumerate terms based on their kind:
    • Rename rdf_current_literal/1rdf_literal/1.
    • Add rdf_bnode/1, uniquely enumerating terms that are blank nodes.
    • Add rdf_iri/1, uniquely enumerating terms that are neither blank node nor literal.
    • Add rdf_name/1, uniquely enumerating terms that are either IRI or literal.

2.2 API for creating new terms

Observations:

  • Blank nodes and nodes are different concepts in RDF 1.1.

Proposal:

  • Rename rdf_bnode/1rdf_create_bnode/1.
  • Replace rdf_node/1 by the predicate described in section 2.1.

2.3 API for checking a term's kind

Observations:

  • Kind checking (rdf_is_bnode/1 and rdf_is_literal/1) is missing support for IRIs.
  • rdf_is_resource/1 does not succeed for all resource-denoting terms.

Proposal:

  • Keep rdf_is_bnode/1 as-is.
  • Keep rdf_is_literal/1 as-is.
  • Rename rdf_is_resource/1rdf_is_iri/1.
  • Change rdf_is_resource/1/rdf_is_iri/1 to not succeed for non-IRIs.

3 Load/save RDF serializations

3.1 Uniform API for parsers/writers

Observations:

  • Reading/writing RDF files is currently spread over multiple modules.

Proposals:

  • Core predicates:
    • rdf_load(+Input, +Options)
    • rdf_save(+Input, +Options)
  • Wrapper predicates:
    • rdf_load(+Input)
    • rdf_save(+Input)
  • Properties or rdf_load/[1,2] and rdf_save/[1,2]:
    • Using open_any/5 and close_any/1, Input can a (1) file, (2) stream, or (3) HTTP(S) IRI.
    • Using archive_data_stream/3, all archive streams of Input are processed in sequence.
  • Properties of rdf_load/[1,2]:
    • If format(-) then use rdf_guess_format/3 to determine the serialization format.
    • If Input is an HTTP(S) IRI then option base_iri is automatically set.
    • Option format(?) can either be used to set a serialization format or it can be used to return the serialization format that was heuristically found.

Format guessing should probably be part of a new library, keeping the plugin based infrastructure for rdf_db.pl.

3.2 Loading named graphs

Observations:

  • rdf_load(+Input, +Options) with option graph(+Graph) does not make clear where quadruples are stored?
  • rdf_save(+Input, +Options) with option graph(+Graph) does not allow a collection graphs to be saved to a quadruples file.

Proposal:

  • For rdf_load/2 option dataset(-Default:atom, -Named:list(atom)) returns the graphs that data has be loaded into (i.e., Named=[] for triples).
  • For rdf_save/2 option dataset(+Default:atom, +Named:list(atom)) allows a subset of the currently loaded graphs to be saved to file.
  • Option graph(?) is a shorthand for option dataset(?,[]).

3.3 Serialization format-specific parsers/writers

Proposal:

  • Rename predicates used for loading/saving XML/RDF:
    • rdf_load/[1,2]rdf_xml_load/[1,2]
    • rdf_save/[1,2]rdf_xml_save/[1,2]
    • rdf_save_footer/1rdf_xml_save_footer/1
    • rdf_save_header/2rdf_xml_save_header/2
    • rdf_save_subject/3rdf_xml_save_subject/3

3.4 Missing parsers/generators

The following formats are not supported:

  • JSON-LD
  • N3

4 Collections

Observations:

  • Collections are part of RDF not RDFS.

Proposals:

  • Rename arguments:
    • DBGraph
    • SetCollection
  • Rename predicates:
    • rdfs_member/2rdf_member/2
    • rdfs_list_to_prolog_list/2rdf_list/2
    • rdfs_assert_list/[2,3]rdf_assert_list/[2,3]
  • Support recursive lists for rdf_list/2 and rdf_assert_list/[2,3].

5 Improve terminology

5.1 API

Option names:

  • base_uribase_iri.
  • register_namespacesregister_prefixes.
  • namespacesprefixes.

5.2 Documentation

Proposal:

  • Section "Enumerating objects" → "Enumerating RDF terms", since non only object terms are enumerated.

Terminology:

Term Intended meaning
RDF statement RDF triple or RDF quadruple

Variable names:

Variable name Intended expansion
C Class
D Datatype
G Graph
I Instance
Lex Lexical expression
O Object term
P Predicate term
Property
S Subject term
T RDF term
V Value

6 Graphs

6.1 Internal quadruple representation

Observations:

  • Named graphs (according to the data) are currently conflated with the ability to load data into different graphs (according to the application).
  • Originally, Line was the line in the file from which the triple was loaded. library(rdf_persistency) uses Line as time stamp to correlate transactions on different graphs. Caching applications might use it for last access, etc. Possibly we should separate that.

Proposal:

  • What about saying that a triple is identified by and have a predicate rdf_triple_property(S,P,O,G,Property) to query additional properties (last access, line, etc.)?
  • In the internal storage format for statements the 4th argument must be the graph name (either an IRI for named graphs or the default graph). We need a new rdf/5 (probably with a different name to query).

6.2 Default graph name

The default graph is currently called user.

6.3 Graph statistics

RDF statistics does not show the correct number of graphs in case a graph is unloaded:

?- use_module(library(semweb/rdf_db)).
true.
?- rdf_create_graph(test).
true.
?- rdf_unload_graph(test).
true.
?- rdf_statistics(graphs(N)).
N = 1.
?- rdf_graph(G).
false.

7 Backward chaining

Proposals:

  • Rename predicates:
    • rdfs_individual_of/2rdfs_instance_of/2
Clone this wiki locally