-
Notifications
You must be signed in to change notification settings - Fork 15
Proposal for Semweb library redesign
Observations:
- There are currently 3 compound term representations for literals:
literal(type(D, Lex))
literal(lang(Lang, Lex))
literal(Lex)
- Simple literals (variant 3) were removed in RDF 1.1. They are now (implicitly typed) XSD strings.
- RDF 1.1 specifies the values of language-tagged strings as pairs of lexical expression and language tag in that order.
Requirement:
-
rdf(S,P,O)
<=>rdf(S,P,X), X=O
(steadfastness)
Proposal:
- Two representations for literals:
-
TextOrValue^^Type
(value for known types) Text@Lang
-
- Output arg binds to above form.
- Input for query can only be one of the above.
Assert converts the literal value:
- Prolog integer --> xsd:integer
- Prolog float --> xsd:double
- Prolog true/false --> xsd:boolean
- String^^KnownType --> Value^^KnownType
- String^^xsd:langString --> error
- 0/1^^xsd:boolean --> true/false^^xsd:boolean
- String^^UnknownType --> String^^UnknownType
- Atom^^KnownType --> Value^^KnownType
- Atom^^UnknownType --> String^^UnknownType
- String --> String^^xsd:string
- String@Lang --> String@Lang
- Atom@Lang --> String@Lang
-
Issue: time and duration representation. Two places: internal value space and Prolog representation. Options:
- Value space
- Canonical strings
- Time-stamp
- Prolog representation
- Canonical strings
- Time-stamp
- Compound term or dict. These are expensive to create and parse, so most likely not a good idea.
- Value space
-
Add rdf_canonical_literal(+Input, -Canonical).
-
Add rdf_match(S,P,Pattern,O)
-
Add rdf_match(S,P,Pattern,O,G) Pattern Pattern is first arg of current literal(Pattern, Literal) or lang_matches(Lang).
-
Remove option
convert_typed_literal(:)
.
Observation:
- Literals are sorted, but the sorting has little relation to the SPARQL defined ordering. This implies we cannot optimize SPARQL queries that use comparison operators.
Observations:
-
rdf_resource/1
does not enumerate all resource-denoting terms. It enumerates the terms that are blank nodes or IRIs and that appear in the subject or object position of some triple. - Enumeration based on positional occurrence (
rdf_subject/1
andrdf_current_predicate/1
) is missing object and node support. - Enumeration based on kind (
rdf_current_literal/1
) is missing support for blank nodes, IRIs and names.
Proposal:
- Remove
rdf_resource/1
. - Enumerate terms:
- Add
rdf_term/1
, uniquely enumerating the current RDF terms.
- Add
- Enumerate terms based on their positional occurrence:
- Keep
rdf_subject/1
as-is. - Rename
rdf_current_predicate/1
→rdf_predicate/1
. - Add
rdf_object/1
to complementrdf_subject/1
andrdf_predicate/1
. - Add
rdf_node/1
, uniquely enumerating terms that occur in either the subject or object position.
- Keep
- Enumerate terms based on their kind:
- Rename
rdf_current_literal/1
→rdf_literal/1
. - Add
rdf_bnode/1
, uniquely enumerating terms that are blank nodes. - Add
rdf_iri/1
, uniquely enumerating terms that are neither blank node nor literal. - Add
rdf_name/1
, uniquely enumerating terms that are either IRI or literal.
- Rename
Observations:
- Blank nodes and nodes are different concepts in RDF 1.1.
Proposal:
- Rename
rdf_bnode/1
→rdf_create_bnode/1
. - Replace
rdf_node/1
by the predicate described in section 2.1.
Observations:
- Kind checking (
rdf_is_bnode/1
andrdf_is_literal/1
) is missing support for IRIs. -
rdf_is_resource/1
does not succeed for all resource-denoting terms.
Proposal:
- Keep
rdf_is_bnode/1
as-is. - Keep
rdf_is_literal/1
as-is. - Rename
rdf_is_resource/1
→rdf_is_iri/1
. - Change
rdf_is_resource/1
/rdf_is_iri/1
to not succeed for non-IRIs.
Observations:
- Reading/writing RDF files is currently spread over multiple modules.
Proposals:
- Core predicates:
rdf_load(+Input, +Options)
rdf_save(+Input, +Options)
- Wrapper predicates:
rdf_load(+Input)
rdf_save(+Input)
- Properties or
rdf_load/[1,2]
andrdf_save/[1,2]
:- Using
open_any/5
andclose_any/1
,Input
can a (1) file, (2) stream, or (3) HTTP(S) IRI. - Using
archive_data_stream/3
, all archive streams ofInput
are processed in sequence.
- Using
- Properties of
rdf_load/[1,2]
:- If
format(-)
then userdf_guess_format/3
to determine the serialization format. - If
Input
is an HTTP(S) IRI then optionbase_iri
is automatically set. - Option
format(?)
can either be used to set a serialization format or it can be used to return the serialization format that was heuristically found.
- If
Format guessing should probably be part of a new library, keeping the plugin based infrastructure for rdf_db.pl.
Observations:
-
rdf_load(+Input, +Options)
with optiongraph(+Graph)
does not make clear where quadruples are stored? -
rdf_save(+Input, +Options)
with optiongraph(+Graph)
does not allow a collection graphs to be saved to a quadruples file.
Proposal:
- For
rdf_load/2
optiondataset(-Default:atom, -Named:list(atom))
returns the graphs that data has be loaded into (i.e.,Named=[]
for triples). - For
rdf_save/2
optiondataset(+Default:atom, +Named:list(atom))
allows a subset of the currently loaded graphs to be saved to file. - Option
graph(?)
is a shorthand for optiondataset(?,[])
.
Proposal:
- Rename predicates used for loading/saving XML/RDF:
-
rdf_load/[1,2]
→rdf_xml_load/[1,2]
-
rdf_save/[1,2]
→rdf_xml_save/[1,2]
-
rdf_save_footer/1
→rdf_xml_save_footer/1
-
rdf_save_header/2
→rdf_xml_save_header/2
-
rdf_save_subject/3
→rdf_xml_save_subject/3
-
The following formats are not supported:
- JSON-LD
- N3
Observations:
- Collections are part of RDF not RDFS.
Proposals:
- Rename arguments:
-
DB
→Graph
-
Set
→Collection
-
- Rename predicates:
-
rdfs_member/2
→rdf_member/2
-
rdfs_list_to_prolog_list/2
→rdf_list/2
-
rdfs_assert_list/[2,3]
→rdf_assert_list/[2,3]
-
- Support recursive lists for
rdf_list/2
andrdf_assert_list/[2,3]
.
Option names:
-
base_uri
→base_iri
. -
register_namespaces
→register_prefixes
. -
namespaces
→prefixes
.
Proposal:
- Section "Enumerating objects" → "Enumerating RDF terms", since non only object terms are enumerated.
Terminology:
Term | Intended meaning |
---|---|
RDF statement | RDF triple or RDF quadruple |
Variable names:
Variable name | Intended expansion |
---|---|
C |
Class |
D |
Datatype |
G |
Graph |
I |
Instance |
Lex |
Lexical expression |
O |
Object term |
P |
Predicate term |
Property | |
S |
Subject term |
T |
RDF term |
V |
Value |
Observations:
- Named graphs (according to the data) are currently conflated with the ability to load data into different graphs (according to the application).
- Originally,
Line
was the line in the file from which the triple was loaded.library(rdf_persistency)
usesLine
as time stamp to correlate transactions on different graphs. Caching applications might use it for last access, etc. Possibly we should separate that.
Proposal:
- What about saying that a triple is identified by and have a predicate
rdf_triple_property(S,P,O,G,Property)
to query additional properties (last access, line, etc.)? - In the internal storage format for statements the 4th argument must be the graph name (either an IRI for named graphs or the default graph). We need a new rdf/5 (probably with a different name to query).
The default graph is currently called user
.
RDF statistics does not show the correct number of graphs in case a graph is unloaded:
?- use_module(library(semweb/rdf_db)).
true.
?- rdf_create_graph(test).
true.
?- rdf_unload_graph(test).
true.
?- rdf_statistics(graphs(N)).
N = 1.
?- rdf_graph(G).
false.
Proposals:
- Rename predicates:
-
rdfs_individual_of/2
→rdfs_instance_of/2
-