Split mediaType into two properties; one for Link, one for Object? (contentType?) #638
Comments
big +1 for this, i think it makes a lot of sense to split these out.
An RDF Literal is either language/direction-tagged, or it is coerced to a
different type, but not both
Is this limitation tracked anywhere? Is "coerced to a different type" the
right semantics we're looking for? A request we've had in the past is that
it would be very helpful to specify the media types of summary, source, and
other natural language values, and it's annoying to have to define new
properties for each one.
On Sat, Feb 8, 2025 at 10:06 PM, a wrote:
https://www.w3.org/TR/activitystreams-vocabulary/#dfn-mediatype
When used on a Link
<https://www.w3.org/TR/activitystreams-vocabulary/#dfn-link>, identifies
the MIME media type of the referenced resource.
When used on an Object
<https://www.w3.org/TR/activitystreams-vocabulary/#dfn-object>,
identifies the MIME media type of the value of the content
<https://www.w3.org/TR/activitystreams-vocabulary/#dfn-content> property.
If not specified, the content
<https://www.w3.org/TR/activitystreams-vocabulary/#dfn-content> property
is assumed to contain text/html content.
This kind of "multiple applicability" is generally bad semantic design,
since you are using the same term/symbol for different concepts. When used
differently, there should be different terms.
I would say that in a next version, we consider doing something like:
mediaType
: Domain: Link
: Range: MIME media type
: Functional: True
: Comment: Identifies the MIME media type of the referenced resource (href
of a Link).
contentType
: Domain: Object
: Range: MIME media type
: Functional: True
: Comment: Identifies the MIME media type of the value of the content
property. If not specified, the content property is assumed to contain
text/html content.
Motivationally, mediaType has incredibly limited applicability in general
when applied to Object.content, with binary representations not really
making sense for values of content. But it makes sense to use all kinds
of different values when applied to Link.href, as the referenced resource
can be basically anything (text or binary).
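A minimal sketch of how a consumer might resolve media types under the split proposed above, assuming compacted AS2 documents loaded as plain Python dicts. contentType is the proposed property, not part of the current vocabulary, and both helper names are hypothetical.

```python
from typing import Optional

# Default stated in the vocabulary text quoted above for the content property.
DEFAULT_CONTENT_TYPE = "text/html"


def link_media_type(link: dict) -> Optional[str]:
    """Media type of the resource referenced by a Link's href, if declared."""
    return link.get("mediaType")


def content_media_type(obj: dict) -> str:
    """Media type of an Object's content value, using the proposed contentType."""
    return obj.get("contentType", DEFAULT_CONTENT_TYPE)


# Illustrative documents; the URL and values are made up for this example.
link = {"type": "Link", "href": "https://example.com/video.mp4", "mediaType": "video/mp4"}
note = {"type": "Note", "content": "<p>hello world</p>"}
markdown_note = {"type": "Note", "content": "*hello*", "contentType": "text/markdown"}
```

With separate terms, each lookup has one meaning: link_media_type never falls back to text/html, and content_media_type never describes a remote resource.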
------------------------------
Tangentially, one other thing that might work and might align better with
JSON-LD / RDF is to look into "typed values", so basically something like
this in expanded JSON-LD form:
{
"https://www.w3.org/ns/activitystreams#content": {
"@value": "<p lang='en'>hello world</p>",
"@type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"
}
}
Which is equivalent to the following Turtle/N-Triples:
_:b0 <https://www.w3.org/ns/activitystreams#content> "<p lang='en'>hello world</p>"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML> .
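The correspondence between an expanded JSON-LD value object and its literal in N-Triples can be sketched roughly as below. This is an illustration only, not how any JSON-LD processor is implemented: to_ntriples_literal is a hypothetical helper, and it borrows json.dumps for string escaping on the assumption that JSON escapes are acceptable here.

```python
import json


def to_ntriples_literal(value_obj: dict) -> str:
    """Render an expanded JSON-LD value object as an N-Triples literal."""
    # json.dumps yields a double-quoted string with quotes, backslashes and
    # control characters escaped; non-ASCII characters are left as-is.
    quoted = json.dumps(value_obj["@value"], ensure_ascii=False)
    if "@language" in value_obj:
        return f'{quoted}@{value_obj["@language"]}'
    if "@type" in value_obj:
        return f'{quoted}^^<{value_obj["@type"]}>'
    return quoted  # plain string, implicitly xsd:string


html_value = {
    "@value": "<p lang='en'>hello world</p>",
    "@type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML",
}
print(to_ntriples_literal(html_value))
```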
Not sure how much sense this makes, though... What we currently have is
that we use the language tag @language, which cannot be used at the same
time as @type. An RDF Literal is either language/direction-tagged, or it
is coerced to a different type, but not both. So it might make sense to
leave it as a sort of contentType property while the value of content
remains a language-tagged literal string.
is there any particular justification to restrict language tagging to only
a single data type? that feels like the most parsimonious restriction to
loosen
On Sun, Feb 9, 2025, 9:42 PM, a wrote:
I think we can't fully avoid the "different new properties", and to an
extent this mirrors how things like HTTP have a Content-Type header which
functions identically to how as:mediaType applies to Object.content. The
difference is that an AS2 document can have more than just content. But
still, we're probably dealing with at most 2 or 3 properties here --
contentType and maybe summaryType if something like #620
gets considered, plus
something for links (mediaType could be reused, but we could also define
hrefType if we *really* wanted to...). We've already declared name MUST
be plain-text, and that requirement makes sense. Are there other properties
that need a variable media type for their literal values?
------------------------------
Is this limitation tracked anywhere?
RDF Literals
(https://w3c.github.io/rdf-concepts/spec/#section-Graph-Literal) are
composed of the following:
- Lexical form. This is a simple string representation of the value.
- Datatype IRI. This is what the lexical form gets "coerced to".
In concrete syntaxes like JSON-LD or Turtle, there is often syntactic
sugar for "simple literals", which don't have an explicitly stated
datatype, but instead the datatype is inferred based on the syntax. For
example, "foo" in Turtle is equivalent to
"foo"^^<http://www.w3.org/2001/XMLSchema#string> by default. 1 in Turtle is
equivalent to "1"^^<http://www.w3.org/2001/XMLSchema#integer>. For
JSON-LD, similar syntactic sugar is used to convert a JSON string value
into an xsd:string Literal, JSON numbers into either xsd:integer or
xsd:double Literal, and JSON boolean into xsd:boolean Literal.
- IFF the datatype IRI is
http://www.w3.org/1999/02/22-rdf-syntax-ns#langString, then a third
component is the BCP47 language (JSON-LD @language, Turtle "foo"@en).
- IFF the datatype IRI is
http://www.w3.org/1999/02/22-rdf-syntax-ns#dirLangString, then there
is a third component for the language as above, as well as a fourth
component for the direction (ltr or rtl, expressed with JSON-LD
@direction or with Turtle "foo"@en--ltr)
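The composition rules listed above can be modelled as a small data structure. This is a sketch of the abstract constraints only, assuming the four components named in the list; the Literal class and its field names are hypothetical and do not correspond to any particular RDF library.

```python
from dataclasses import dataclass
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: str = XSD_STRING  # what a simple literal like "foo" gets by default
    language: Optional[str] = None
    direction: Optional[str] = None

    def __post_init__(self) -> None:
        tagged = self.datatype in (RDF + "langString", RDF + "dirLangString")
        # A language is present if and only if the datatype is (dir)langString.
        if (self.language is not None) != tagged:
            raise ValueError("language requires rdf:langString or rdf:dirLangString")
        # A direction is present if and only if the datatype is dirLangString.
        if (self.direction is not None) != (self.datatype == RDF + "dirLangString"):
            raise ValueError("direction requires rdf:dirLangString")
        if self.direction not in (None, "ltr", "rtl"):
            raise ValueError("direction must be ltr or rtl")


plain = Literal("foo")
english = Literal("<p>Hello world</p>", RDF + "langString", language="en")
html = Literal("<p lang='en'>Hello world</p>", RDF + "HTML")
```

Because there is exactly one datatype slot, a literal that is both rdf:HTML and language-tagged cannot be constructed, which is the limitation asked about.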
If you try to use both features on the JSON-LD playground this becomes a
bit more apparent:
{"https://www.w3.org/ns/activitystreams#content": {
"@value": "<p>Hello world</p>",
"@language": "en",
"@type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"
}}
jsonld.SyntaxError: Invalid JSON-LD syntax; an element containing "@value"
may not contain both "@type" and either "@language" or "@direction".
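The same check is easy to apply before handing a document to a processor. The function below is a hypothetical pre-flight check that mirrors the rule in the error message; it is not the jsonld.js implementation and covers only this one rule.

```python
def check_value_object(value_obj: dict) -> None:
    """Reject a value object that combines @type with @language or @direction."""
    if "@value" not in value_obj:
        return  # not a value object, nothing to check here
    if "@type" in value_obj and ("@language" in value_obj or "@direction" in value_obj):
        raise ValueError(
            'an element containing "@value" may not contain both "@type" '
            'and either "@language" or "@direction"'
        )


rejected = {
    "@value": "<p>Hello world</p>",
    "@language": "en",
    "@type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML",
}
```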
RDF 1.2 also warns about this when defining rdf:HTML like so:
https://w3c.github.io/rdf-concepts/spec/#section-html
Any language annotation (lang="…"), text directionality annotation
(dir="…"), or XML namespaces (xmlns) desired in the HTML content must be
included explicitly in the HTML literal. [...]
So in expanded JSON-LD form, these are fine:
{"https://www.w3.org/ns/activitystreams#content": {
"@value": "<p>Hello world</p>",
"@language": "en"
}}
{"https://www.w3.org/ns/activitystreams#content": {
"@value": "<p lang='en'>Hello world</p>",
"@type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"
}}
The former is a language-tagged string (rdf:langString) in the English
language (en), equivalent to "<p>Hello world</p>"@en, but applications
know out-of-band that they can parse it as HTML (based on contentType or
its default value).
The latter is an HTML Literal, which is not the same as a language-tagged
string. Literals can only have one datatype, unlike Resources which can
have multiple types/classes via rdf:type. As an HTML Literal, you know to
parse it as HTML, but any language or direction information needs to be
encoded in-band within the Literal's value.
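The practical difference between the two forms can be sketched as follows, assuming expanded JSON-LD value objects loaded as Python dicts. describe_content is a hypothetical helper, and its content_type argument stands in for the proposed contentType property or its text/html default.

```python
from typing import Optional, Tuple

RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"


def describe_content(value_obj: dict, content_type: str = "text/html") -> Tuple[str, Optional[str]]:
    """Return (media type, language known outside the markup) for a content value."""
    if value_obj.get("@type") == RDF_HTML:
        # HTML Literal: the datatype says how to parse it, but any language
        # must be read from lang attributes inside the markup itself.
        return ("text/html", None)
    # Language-tagged or plain string: the media type comes from a separate
    # property, and the language (if any) from the @language tag.
    return (content_type, value_obj.get("@language"))


tagged = {"@value": "<p>Hello world</p>", "@language": "en"}
typed = {"@value": "<p lang='en'>Hello world</p>", "@type": RDF_HTML}
```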
Based on my current understanding, we essentially have to choose between:
- Define datatype IRIs for every MIME type we are interested in using,
a la rdf:HTML (so maybe something like example:Markdown as a datatype
IRI for Markdown literals?)
- Use MIME types in a separate property (so something like contentType:
"text/html" informs how to process the value of content)
- If we wanted to go further in a not-so-backwards-compatible way,
we could bundle these together into an object node? ({"value":
"foo", "mediaType": "text/plain"}) -- but this would probably not
be worth it, since it's not directly a natural language property anymore...
In the interest of taking the least destructive path, probably a simple
single contentType will work fine here. If there was justification for
exploding it or unpacking it further, then maybe that could be done, but I
don't particularly see that justification right now. If anything, it would
make more sense to define a content property that was always HTML (typed
value rdf:HTML Literal) and then encode the language and direction within
that HTML literal, but this is perhaps similarly unjustifiable at this
point.
If that restriction is to be loosened, it needs to be loosened all the way down the stack at the RDF abstract level (and then at the JSON-LD concrete level). I'm not sure how we would proceed there...