-
Notifications
You must be signed in to change notification settings - Fork 15
Proposal for Semweb library redesign
Observations:
- There are currently 3 compound term representations for literals:
literal(type(D, Lex))
literal(lang(Lang, Lex))
literal(Lex)
- Simple literals (variant 3) were removed in RDF 1.1. They are now (implicitly typed) XSD strings.
- RDF 1.1 specifies the values of language-tagged strings as pairs of lexical expression and language tag in that order.
Proposal:
- One representation for literals:
lexical(D, Lex)
lexical(rdf:langString, Lex-Lang)
lexical(xsd:string, Lex)
-
rdf_assert(S,P,O), rdf(S,P,O)
is always true. -
rdf_assert(S,P,O), rdf(S,P,X), X = O
is always true. - It is expensive to translate to canonical lexical expressions.
Observation:
- Storing literal values i.o. lexical expressions would speed up
rdf/[3,4]
considerably.
Proposal:
- One representation for interpreted literals:
type(D, V)
type(rdf:langString, Lex-Lang)
type(xsd:string, V)
- Add
rdf_load/2
optioninterpret_literals(+bool)
to either assertlexical/1
(default) ortype/1
. - If a datatype IRI is not recognized and
interpret_literals(true)
then throw a warning and assert the literal usinglexical/1
. - Remove option
convert_typed_literal(:)
.
Observation:
- If we can presume that all literals have canonical lexical expressions then this makes it easier to search for triples.
Proposal:
- All assertion predicates that assert lexical expressions do so canonically.
Observation:
- Turtle is an often-used format for manual RDF data entry.
Proposal:
- Use Turtle-like syntactic sugar for literals:
-
Text@Lang
(Turtle:"Lex"@Lang
) -
Value^^Type
(Turtle:"Lex"^^Datatype
)
-
Observation:
- Literals are sorted, but the sorting has little relation to the SPARQL defined ordering. This implies we cannot optimize SPARQL queries that use comparison operators.
Not all predicates are currently matching language tags in language-tagged strings according to BCP 47, e.g., [2] currently does not match [1].
[1] ex:a ex:b "c"@en-US .
[2] ?- rdfs_label(rdf:s, en, Label).
There should be a difference between (1) predicates that enumerate existing RDF terms, (2) predicates that create new RDF terms and (3) predicates that check whether something is an RDF term.
Observations:
-
rdf_resource/1
does not enumerate all resource-denoting terms. It enumerates the terms that are blank nodes or IRIs and that appear in the subject or object position of some triple. - Enumeration based on positional occurrence (
rdf_subject/1
andrdf_current_predicate/1
) is missing object and node support. - Enumeration based on kind (
rdf_current_literal/1
) is missing support for blank nodes, IRIs and names.
Proposal:
- Remove
rdf_resource/1
. - Enumerate terms:
- Add
rdf_term/1
, uniquely enumerating the current RDF terms.
- Add
- Enumerate terms based on their positional occurrence:
- Keep
rdf_subject/1
as-is. - Rename
rdf_current_predicate/1
→rdf_predicate/1
. - Add
rdf_object/1
to complementrdf_subject/1
andrdf_predicate/1
. - Add
rdf_node/1
, uniquely enumerating terms that occur in either the subject or object position.
- Keep
- Enumerate terms based on their kind:
- Rename
rdf_current_literal/1
→rdf_literal/1
. - Add
rdf_bnode/1
, uniquely enumerating terms that are blank nodes. - Add
rdf_iri/1
, uniquely enumerating terms that are neither blank node nor literal. - Add
rdf_name/1
, uniquely enumerating terms that are either IRI or literal.
- Rename
Observations:
- Blank nodes and nodes are different concepts in RDF 1.1.
Proposal:
- Rename
rdf_bnode/1
→rdf_create_bnode/1
. - Replace
rdf_node/1
by the predicate described in section 2.1.
Observations:
- Kind checking (
rdf_is_bnode/1
andrdf_is_literal/1
) is missing support for IRIs. -
rdf_is_resource/1
does not succeed for all resource-denoting terms.
Proposal:
- Keep
rdf_is_bnode/1
as-is. - Keep
rdf_is_literal/1
as-is. - Rename
rdf_is_resource/1
→rdf_is_iri/1
. - Change
rdf_is_resource/1
/rdf_is_iri/1
to not succeed for non-IRIs.
Observations:
- Reading/writing RDF files is currently spread over multiple modules.
Proposals:
- Core predicates:
rdf_load(+Input, +Options)
rdf_save(+Input, +Options)
- Wrapper predicates:
rdf_load(+Input)
rdf_save(+Input)
- Properties or
rdf_load/[1,2]
andrdf_save/[1,2]
:- Using
open_any/5
andclose_any/1
,Input
can a (1) file, (2) stream, or (3) HTTP(S) IRI. - Using
archive_data_stream/3
, all archive streams ofInput
are processed in sequence.
- Using
- Properties of
rdf_load/[1,2]
:- If
format(-)
then userdf_guess_format/3
to determine the serialization format. - If
Input
is an HTTP(S) IRI then optionbase_iri
is automatically set. - Option
format(?)
can either be used to set a serialization format or it can be used to return the serialization format that was heuristically found.
- If
Observations:
-
rdf_load(+Input, +Options)
with optiongraph(+Graph)
does not make clear where quadruples are stored? -
rdf_save(+Input, +Options)
with optiongraph(+Graph)
does not allow a collection graphs to be saved to a quadruples file.
Proposal:
- For
rdf_load/2
optiondataset(-Default:atom, -Named:list(atom))
returns the graphs that data has be loaded into (i.e.,Named=[]
for triples). - For
rdf_save/2
optiondataset(+Default:atom, +Named:list(atom))
allows a subset of the currently loaded graphs to be saved to file. - Option
graph(?)
is a shorthand for optiondataset(?,[])
.
Proposal:
- Rename predicates used for loading/saving XML/RDF:
-
rdf_load/[1,2]
→rdf_xml_load/[1,2]
-
rdf_save/[1,2]
→rdf_xml_save/[1,2]
-
rdf_save_footer/1
→rdf_xml_save_footer/1
-
rdf_save_header/2
→rdf_xml_save_header/2
-
rdf_save_subject/3
→rdf_xml_save_subject/3
-
The following formats are not supported:
- JSON-LD
- N3
Observations:
- Collections are part of RDF not RDFS.
Proposals:
- Rename arguments:
-
DB
→Graph
-
Set
→Collection
-
- Rename predicates:
-
rdfs_member/2
→rdf_member/2
-
rdfs_list_to_prolog_list/2
→rdf_list/2
-
rdfs_assert_list/[2,3]
→rdf_assert_list/[2,3]
-
- Support recursive lists for
rdf_list/2
andrdf_assert_list/[2,3]
.
Option names:
-
base_uri
→base_iri
. -
register_namespaces
→register_prefixes
. -
namespaces
→prefixes
.
Proposal:
- Section "Enumerating objects" → "Enumerating RDF terms", since non only object terms are enumerated.
Terminology:
Term | Intended meaning |
---|---|
RDF statement | RDF triple or RDF quadruple |
Variable names:
Variable name | Intended expansion |
---|---|
C |
Class |
D |
Datatype |
G |
Graph |
I |
Instance |
Lex |
Lexical expression |
O |
Object term |
P |
Predicate term |
Property | |
S |
Subject term |
T |
RDF term |
V |
Value |
Observations:
- Named graphs (according to the data) are currently conflated with the ability to load data into different graphs (according to the application).
- Originally,
Line
was the line in the file from which the triple was loaded.library(rdf_persistency)
usesLine
as time stamp to correlate transactions on different graphs. Caching applications might use it for last access, etc. Possibly we should separate that.
Proposal:
- What about saying that a triple is identified by and have a predicate
rdf_triple_property(S,P,O,G,Property)
to query additional properties (last access, line, etc.)? - In the internal storage format for statements the 4th argument must be the graph name (either an IRI for named graphs or the default graph). We need a new rdf/5 (probably with a different name to query).
The default graph is currently called user
.
RDF statistics does not show the correct number of graphs in case a graph is unloaded:
?- use_module(library(semweb/rdf_db)).
true.
?- rdf_create_graph(test).
true.
?- rdf_unload_graph(test).
true.
?- rdf_statistics(graphs(N)).
N = 1.
?- rdf_graph(G).
false.
Proposals:
- Rename predicates:
-
rdfs_individual_of/2
→rdfs_instance_of/2
-