Proposal for Semweb library redesign

1 Literals

1.1 Internal literal representations

Observations:

There are currently 3 compound term representations for literals:
1. literal(type(D, Lex))
2. literal(lang(Lang, Lex))
3. literal(Lex)
Simple literals (variant 3) were removed in RDF 1.1. They are now (implicitly typed) XSD strings.
RDF 1.1 specifies the values of language-tagged strings as pairs of lexical expression and language tag in that order.

Requirement:

rdf(S,P,O) <=> rdf(S,P,X), X=O (steadfastness)

Proposal:

Two representations for literals:
1. TextOrValue^^Type (value for known types)
2. Text@Lang
Output arg binds to above form.
Input for query can only be one of the above.

Assert converts the literal value:

- Prolog integer         --> xsd:integer
- Prolog float           --> xsd:double
- Prolog true/false      --> xsd:boolean
- String^^KnownType      --> Value^^KnownType
- String^^xsd:langString --> error
- 0/1^^xsd:boolean       --> true/false^^xsd:boolean
- String^^UnknownType    --> String^^UnknownType
- Atom^^KnownType        --> Value^^KnownType
- Atom^^UnknownType      --> String^^UnknownType
- String		     --> String^^xsd:string
- String@Lang	     --> String@Lang
- Atom@Lang		     --> String@Lang

Issue: time and duration representation. Two places: internal value space and Prolog representation. Options:
- Value space
  - Canonical strings
  - Time-stamp
- Prolog representation
  - Canonical strings
  - Time-stamp
  - Compound term or dict. These are expensive to create and parse, so most likely not a good idea.

New and obsolete predicates

Add rdf_canonical_literal(+Input, -Canonical).
Add rdf_match(S,P,Pattern,O)
Add rdf_match(S,P,Pattern,O,G) Pattern Pattern is first arg of current literal(Pattern, Literal) or lang_matches(Lang).
Remove option convert_typed_literal(:).

1.4 Literal sorting

Observation:

Literals are sorted, but the sorting has little relation to the SPARQL defined ordering. This implies we cannot optimize SPARQL queries that use comparison operators.

2 Simplify the basic API

2.1 API for enumerating existing terms

Observations:

rdf_resource/1 does not enumerate all resource-denoting terms. It enumerates the terms that are blank nodes or IRIs and that appear in the subject or object position of some triple.
Enumeration based on positional occurrence (rdf_subject/1 and rdf_current_predicate/1) is missing object and node support.
Enumeration based on kind (rdf_current_literal/1) is missing support for blank nodes, IRIs and names.

Proposal:

Remove rdf_resource/1.
Enumerate terms:
- Add rdf_term/1, uniquely enumerating the current RDF terms.
Enumerate terms based on their positional occurrence:
- Keep rdf_subject/1 as-is.
- Rename rdf_current_predicate/1 → rdf_predicate/1.
- Add rdf_object/1 to complement rdf_subject/1 and rdf_predicate/1.
- Add rdf_node/1, uniquely enumerating terms that occur in either the subject or object position.
Enumerate terms based on their kind:
- Rename rdf_current_literal/1 → rdf_literal/1.
- Add rdf_bnode/1, uniquely enumerating terms that are blank nodes.
- Add rdf_iri/1, uniquely enumerating terms that are neither blank node nor literal.
- Add rdf_name/1, uniquely enumerating terms that are either IRI or literal.

2.2 API for creating new terms

Observations:

Blank nodes and nodes are different concepts in RDF 1.1.

Proposal:

Rename rdf_bnode/1 → rdf_create_bnode/1.
Replace rdf_node/1 by the predicate described in section 2.1.

2.3 API for checking a term's kind

Observations:

Kind checking (rdf_is_bnode/1 and rdf_is_literal/1) is missing support for IRIs.
rdf_is_resource/1 does not succeed for all resource-denoting terms.

Proposal:

Keep rdf_is_bnode/1 as-is.
Keep rdf_is_literal/1 as-is.
Rename rdf_is_resource/1 → rdf_is_iri/1.
Change rdf_is_resource/1/rdf_is_iri/1 to not succeed for non-IRIs.

3 Load/save RDF serializations

3.1 Uniform API for parsers/writers

Observations:

Reading/writing RDF files is currently spread over multiple modules.

Proposals:

Core predicates:
- rdf_load(+Input, +Options)
- rdf_save(+Input, +Options)
Wrapper predicates:
- rdf_load(+Input)
- rdf_save(+Input)
Properties or rdf_load/[1,2] and rdf_save/[1,2]:
- Using open_any/5 and close_any/1, Input can a (1) file, (2) stream, or (3) HTTP(S) IRI.
- Using archive_data_stream/3, all archive streams of Input are processed in sequence.
Properties of rdf_load/[1,2]:
- If format(-) then use rdf_guess_format/3 to determine the serialization format.
- If Input is an HTTP(S) IRI then option base_iri is automatically set.
- Option format(?) can either be used to set a serialization format or it can be used to return the serialization format that was heuristically found.

Format guessing should probably be part of a new library, keeping the plugin based infrastructure for rdf_db.pl.

3.2 Loading named graphs

Observations:

rdf_load(+Input, +Options) with option graph(+Graph) does not make clear where quadruples are stored?
rdf_save(+Input, +Options) with option graph(+Graph) does not allow a collection graphs to be saved to a quadruples file.

Proposal:

For rdf_load/2 option dataset(-Default:atom, -Named:list(atom)) returns the graphs that data has be loaded into (i.e., Named=[] for triples).
For rdf_save/2 option dataset(+Default:atom, +Named:list(atom)) allows a subset of the currently loaded graphs to be saved to file.
Option graph(?) is a shorthand for option dataset(?,[]).

3.3 Serialization format-specific parsers/writers

Proposal:

Rename predicates used for loading/saving XML/RDF:
- rdf_load/[1,2] → rdf_xml_load/[1,2]
- rdf_save/[1,2] → rdf_xml_save/[1,2]
- rdf_save_footer/1 → rdf_xml_save_footer/1
- rdf_save_header/2 → rdf_xml_save_header/2
- rdf_save_subject/3 → rdf_xml_save_subject/3

3.4 Missing parsers/generators

The following formats are not supported:

JSON-LD
N3

4 Collections

Observations:

Collections are part of RDF not RDFS.

Proposals:

Rename arguments:
- DB → Graph
- Set → Collection
Rename predicates:
- rdfs_member/2 → rdf_member/2
- rdfs_list_to_prolog_list/2 → rdf_list/2
- rdfs_assert_list/[2,3] → rdf_assert_list/[2,3]
Support recursive lists for rdf_list/2 and rdf_assert_list/[2,3].

5 Improve terminology

5.1 API

Option names:

base_uri → base_iri.
register_namespaces → register_prefixes.
namespaces → prefixes.

5.2 Documentation

Proposal:

Section "Enumerating objects" → "Enumerating RDF terms", since non only object terms are enumerated.

Terminology:

Term	Intended meaning
RDF statement	RDF triple or RDF quadruple

Variable names:

Variable name	Intended expansion
`C`	Class
`D`	Datatype
`G`	Graph
`I`	Instance
`Lex`	Lexical expression
`O`	Object term
`P`	Predicate term
	Property
`S`	Subject term
`T`	RDF term
`V`	Value

6 Graphs

6.1 Internal quadruple representation

Observations:

Named graphs (according to the data) are currently conflated with the ability to load data into different graphs (according to the application).
Originally, Line was the line in the file from which the triple was loaded. library(rdf_persistency) uses Line as time stamp to correlate transactions on different graphs. Caching applications might use it for last access, etc. Possibly we should separate that.

Proposal:

What about saying that a triple is identified by and have a predicate rdf_triple_property(S,P,O,G,Property) to query additional properties (last access, line, etc.)?
In the internal storage format for statements the 4th argument must be the graph name (either an IRI for named graphs or the default graph). We need a new rdf/5 (probably with a different name to query).

6.2 Default graph name

The default graph is currently called user.

6.3 Graph statistics

RDF statistics does not show the correct number of graphs in case a graph is unloaded:

?- use_module(library(semweb/rdf_db)).
true.
?- rdf_create_graph(test).
true.
?- rdf_unload_graph(test).
true.
?- rdf_statistics(graphs(N)).
N = 1.
?- rdf_graph(G).
false.

7 Backward chaining

Proposals:

Rename predicates:
- rdfs_individual_of/2 → rdfs_instance_of/2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal for Semweb library redesign

Proposal for Semweb library redesign

1 Literals

1.1 Internal literal representations

New and obsolete predicates

1.4 Literal sorting

2 Simplify the basic API

2.1 API for enumerating existing terms

2.2 API for creating new terms

2.3 API for checking a term's kind

3 Load/save RDF serializations

3.1 Uniform API for parsers/writers

3.2 Loading named graphs

3.3 Serialization format-specific parsers/writers

3.4 Missing parsers/generators

4 Collections

5 Improve terminology

5.1 API

5.2 Documentation

6 Graphs

6.1 Internal quadruple representation

6.2 Default graph name

6.3 Graph statistics

7 Backward chaining

Clone this wiki locally