Split `mediaType` into two properties; one for Link, one for Object? (`contentType`?) #638

trwnh · 2025-02-09T06:06:18Z

https://www.w3.org/TR/activitystreams-vocabulary/#dfn-mediatype

When used on a Link, identifies the MIME media type of the referenced resource.

When used on an Object, identifies the MIME media type of the value of the content property. If not specified, the content property is assumed to contain text/html content.

This kind of "multiple applicability" is generally bad semantic design, since you are using the same term/symbol for different concepts. When used differently, there should be different terms.

I would say that in a next version, we consider doing something like:

mediaType
: Domain: Link
: Range: MIME media type
: Functional: True
: Comment: Identifies the MIME media type of the referenced resource (href of a Link).

contentType
: Domain: Object
: Range: MIME media type
: Functional: True
: Comment: Identifies the MIME media type of the value of the content property. If not specified, the content property is assumed to contain text/html content.
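
A minimal sketch of how a consumer might read the two properties after such a split. `contentType` is only the name floated in this issue, not a term in the current vocabulary, and the helper names and the fallback to `mediaType` are assumptions made for illustration.

```python
from typing import Any, Mapping, Optional

# Default stated in the definition above when no type is given for content.
DEFAULT_CONTENT_TYPE = "text/html"


def link_media_type(link: Mapping[str, Any]) -> Optional[str]:
    """Media type of the resource a Link's href points to, if declared."""
    return link.get("mediaType")


def object_content_type(obj: Mapping[str, Any]) -> str:
    """Media type of an Object's content value.

    Reads the proposed contentType first, then falls back to today's
    mediaType (an assumed migration path), then to text/html.
    """
    return obj.get("contentType") or obj.get("mediaType") or DEFAULT_CONTENT_TYPE


note = {"type": "Note", "content": "<p>hello world</p>"}
markdown_note = {"type": "Note", "content": "*hello*", "contentType": "text/markdown"}
image_link = {"type": "Link", "href": "https://example.com/a.png", "mediaType": "image/png"}

print(object_content_type(note))           # text/html
print(object_content_type(markdown_note))  # text/markdown
print(link_media_type(image_link))         # image/png
```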

Motivationally, mediaType has incredibly limited applicability in general when applied to Object.content, with binary representations not really making sense for values of content. But it makes sense to use all kinds of different values when applied to Link.href, as the referenced resource can be basically anything (text or binary).

Tangentially, one other thing that might work and might align better with JSON-LD / RDF is to look into "typed values", so basically something like this in expanded JSON-LD form:

{
  "https://www.w3.org/ns/activitystreams#content": {
    "@value": "<p lang='en'>hello world</p>",
    "@type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"
  }
}

Which is equivalent to the following Turtle/N-Triples:

_:b0 <https://www.w3.org/ns/activitystreams#content> "<p lang='en'>hello world</p>"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML> .
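
To make the correspondence concrete, here is a rough sketch that turns a JSON-LD value object into an N-Triples statement. It is not a JSON-LD processor: it only handles string values, ignores `@direction`, and the function names are hypothetical.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
AS_CONTENT = "https://www.w3.org/ns/activitystreams#content"
RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"


def escape_lexical(text: str) -> str:
    """Escape a lexical form for use inside an N-Triples string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def value_object_to_literal(value_object: dict) -> str:
    """Serialize a value object with a string @value as an N-Triples literal."""
    lexical = escape_lexical(value_object["@value"])
    if "@language" in value_object:
        # Language-tagged string: the datatype is implied by the tag.
        return f'"{lexical}"@{value_object["@language"]}'
    # Typed literal; a bare string defaults to xsd:string.
    datatype = value_object.get("@type", XSD_STRING)
    return f'"{lexical}"^^<{datatype}>'


def to_ntriple(subject: str, predicate: str, value_object: dict) -> str:
    return f"{subject} <{predicate}> {value_object_to_literal(value_object)} ."


html_value = {"@value": "<p lang='en'>hello world</p>", "@type": RDF_HTML}
print(to_ntriple("_:b0", AS_CONTENT, html_value))
```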

Not sure how much sense this makes, though... What we currently have is that we use the language tag @language, which cannot be used at the same time as @type. An RDF Literal is either language/direction-tagged, or it is coerced to a different type, but not both. So it might make sense to leave it as a sort of contentType property while the value of content remains a language-tagged literal string.


nightpool · 2025-02-10T04:35:04Z

big +1 for this, i think it makes a lot of sense to split these out.

An RDF Literal is either language/direction-tagged, or it is coerced to a different type, but not both.

Is this limitation tracked anywhere? Is "coerced to a different type" the right semantics we're looking for? A request we've had in the past is that it would be very helpful to specify the media types of summary, source, and other natural language values, and it's annoying to have to define new properties for each one.


trwnh · 2025-02-10T05:42:09Z

I think we can't fully avoid the "different new properties", and to an extent this mirrors how things like HTTP have a Content-Type header which functions identically to how as:mediaType applies to Object.content. The difference is that an AS2 document can have more than just content. But still, we're probably dealing with at most 2 or 3 properties here -- contentType and maybe summaryType if something like #620 gets considered, plus something for links (mediaType could be reused, but we could also define hrefType if we really wanted to...). We've already declared name MUST be plain-text, and that requirement makes sense. Are there other properties that need a variable media type for their literal values?

Is this limitation tracked anywhere?

RDF Literals (https://w3c.github.io/rdf-concepts/spec/#section-Graph-Literal) are composed of the following:

- Lexical form. This is a simple string representation of the value.
- Datatype IRI. This is what the lexical form gets "coerced to".

  In concrete syntaxes like JSON-LD or Turtle, there is often syntactic sugar for "simple literals", which don't have an explicitly stated datatype, but instead the datatype is inferred based on the syntax. For example, "foo" in Turtle is equivalent to "foo"^^<http://www.w3.org/2001/XMLSchema#string> by default. 1 in Turtle is equivalent to "1"^^<http://www.w3.org/2001/XMLSchema#integer>. For JSON-LD, similar syntactic sugar is used to convert a JSON string value into an xsd:string Literal, JSON numbers into either xsd:integer or xsd:double Literal, and JSON boolean into xsd:boolean Literal.

- IFF the datatype IRI is http://www.w3.org/1999/02/22-rdf-syntax-ns#langString, then a third component is the BCP47 language (JSON-LD @language, Turtle "hello"@en).
- IFF the datatype IRI is http://www.w3.org/1999/02/22-rdf-syntax-ns#dirLangString, then there is a third component for the language as above, as well as a fourth component for the direction (ltr or rtl, expressed with JSON-LD @direction or with Turtle "hello"@en--ltr).
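
The component rules above can be sketched as a small data model. This is an illustration only, not an RDF library: the `Literal` class and `from_json_value` helper are hypothetical, and the number handling is a simplification of what JSON-LD actually specifies.

```python
from dataclasses import dataclass
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: str = XSD + "string"
    language: Optional[str] = None   # only with rdf:langString / rdf:dirLangString
    direction: Optional[str] = None  # only with rdf:dirLangString

    def __post_init__(self) -> None:
        if self.direction not in (None, "ltr", "rtl"):
            raise ValueError("direction must be 'ltr' or 'rtl'")
        has_lang = self.language is not None
        has_dir = self.direction is not None
        if self.datatype == RDF + "dirLangString":
            if not (has_lang and has_dir):
                raise ValueError("rdf:dirLangString needs language and direction")
        elif self.datatype == RDF + "langString":
            if not has_lang or has_dir:
                raise ValueError("rdf:langString needs a language and no direction")
        elif has_lang or has_dir:
            # Any other datatype (xsd:string, rdf:HTML, ...) carries no tag.
            raise ValueError("language/direction not allowed with this datatype")


def from_json_value(value) -> Literal:
    """Rough version of the sugar for native JSON values (simplified)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return Literal("true" if value else "false", XSD + "boolean")
    if isinstance(value, int):
        return Literal(str(value), XSD + "integer")
    if isinstance(value, float):
        # Simplification: repr() is not the canonical xsd:double lexical form.
        return Literal(repr(value), XSD + "double")
    if isinstance(value, str):
        return Literal(value)
    raise TypeError(f"unsupported JSON value: {value!r}")
```

With this model, `Literal("hello", RDF + "langString", language="en")` is accepted, while `Literal("<p>hi</p>", RDF + "HTML", language="en")` raises `ValueError`, which is the restriction being discussed.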

If you try to use both features on the JSON-LD playground this becomes a bit more apparent:

{"https://www.w3.org/ns/activitystreams#content": {
  "@value": "<p>Hello world</p>",
  "@language": "en",
  "@type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"
}}

jsonld.SyntaxError: Invalid JSON-LD syntax; an element containing "@value" may not contain both "@type" and either "@language" or "@direction".
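
The rule behind that error is simple enough to restate as a standalone check. The sketch below is not the jsonld library's code; `check_value_object` is a hypothetical helper that mirrors only this one constraint.

```python
def check_value_object(value_object: dict) -> None:
    """Reject a value object that combines @type with @language or @direction."""
    if "@value" not in value_object:
        return  # not a value object, nothing to check here
    tags = value_object.keys() & {"@language", "@direction"}
    if "@type" in value_object and tags:
        raise ValueError(
            'an element containing "@value" may not contain both "@type" '
            'and either "@language" or "@direction"'
        )


RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"

# Language-tagged only: accepted.
check_value_object({"@value": "<p>Hello world</p>", "@language": "en"})

# Typed only: accepted.
check_value_object({"@value": "<p lang='en'>Hello world</p>", "@type": RDF_HTML})

# Both at once: rejected.
try:
    check_value_object(
        {"@value": "<p>Hello world</p>", "@language": "en", "@type": RDF_HTML}
    )
except ValueError as error:
    print("rejected:", error)
```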

RDF 1.2 also warns about this when defining rdf:HTML like so: https://w3c.github.io/rdf-concepts/spec/#section-html

Any language annotation (lang="…"), text directionality annotation (dir="…"), or XML namespaces (xmlns) desired in the HTML content must be included explicitly in the HTML literal. [...]

So in expanded JSON-LD form, these are fine:

{"https://www.w3.org/ns/activitystreams#content": {
  "@value": "<p>Hello world</p>",
  "@language": "en"
}}

{"https://www.w3.org/ns/activitystreams#content": {
  "@value": "<p lang='en'>Hello world</p>",
  "@type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"
}}

The former is a language-tagged string (rdf:langString) in the English language (en), equivalent to "<p>Hello world</p>"@en in Turtle. Nothing in the literal itself says it is HTML, but applications know out-of-band that they can parse it as HTML (based on contentType or its default value).

The latter is an HTML Literal, which is not the same as a language-tagged string. Literals can only have one datatype, unlike Resources which can have multiple types/classes via rdf:type. As an HTML Literal, you know to parse it as HTML, but any language or direction information needs to be encoded in-band within the Literal's value.
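
For the rdf:HTML case, a consumer has to recover language and direction from the markup itself. A minimal sketch using Python's standard `html.parser`, assuming the annotation sits on the first element of the fragment (real content may nest or mix languages); the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class RootLangParser(HTMLParser):
    """Records lang and dir from the first start tag of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.seen_root = False
        self.lang: Optional[str] = None
        self.dir: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.seen_root:
            return  # only the first element is inspected in this sketch
        self.seen_root = True
        attributes = dict(attrs)
        self.lang = attributes.get("lang")
        self.dir = attributes.get("dir")


def in_band_language(html_literal: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (lang, dir) declared on the first element, or None for each."""
    parser = RootLangParser()
    parser.feed(html_literal)
    parser.close()
    return parser.lang, parser.dir


print(in_band_language("<p lang='en'>Hello world</p>"))  # ('en', None)
print(in_band_language("<p>Hello world</p>"))            # (None, None)
```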

Based on my current understanding, we essentially have to choose between:

- Define datatype IRIs for every MIME type we are interested in using, a la rdf:HTML (so maybe something like example:Markdown as a datatype IRI for Markdown literals?)
- Use MIME types in a separate property (so something like contentType: "text/html" informs how to process the value of content)
  - If we wanted to go further in a not-so-backwards-compatible way, we could bundle these together into an object node? ({"value": "foo", "mediaType": "text/plain"}) -- but this would probably not be worth it, since it's not directly a natural language property anymore...
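
A rough side-by-side of the first two options, written as builders for expanded-form JSON-LD fragments. Everything here is illustrative: the Markdown datatype IRI is a made-up placeholder, `as:contentType` does not exist in the vocabulary today, and the function names are hypothetical.

```python
from typing import Optional

AS = "https://www.w3.org/ns/activitystreams#"
RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"

# Option 1 needs one datatype IRI per MIME type. Only rdf:HTML exists;
# the Markdown IRI is a placeholder invented for this sketch.
DATATYPE_FOR_MIME = {
    "text/html": RDF_HTML,
    "text/markdown": "https://example.org/ns#Markdown",
}


def as_typed_literal(content: str, mime: str) -> dict:
    """Option 1: the media type becomes the literal's datatype.

    No @language is possible, so language must live inside the content.
    """
    return {AS + "content": {"@value": content, "@type": DATATYPE_FOR_MIME[mime]}}


def as_separate_property(content: str, mime: str, language: Optional[str] = None) -> dict:
    """Option 2: content stays a (language-tagged) string, type goes alongside."""
    value = {"@value": content}
    if language is not None:
        value["@language"] = language
    return {
        AS + "content": value,
        AS + "contentType": {"@value": mime},  # hypothetical property IRI
    }


print(as_typed_literal("<p lang='en'>Hello world</p>", "text/html"))
print(as_separate_property("<p>Hello world</p>", "text/html", language="en"))
```

The second form keeps the language tag on the literal, which is why it is the less disruptive of the two.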

In the interest of taking the least destructive path, probably a simple single contentType will work fine here. If there was justification for exploding it or unpacking it further, then maybe that could be done, but I don't particularly see that justification right now. If anything, it would make more sense to define a content property that was always HTML (typed value rdf:HTML Literal) and then encode the language and direction within that HTML literal, but this is perhaps similarly unjustifiable at this point.

nightpool · 2025-02-10T06:35:51Z

is there any particular justification to restrict language tagging to only a single data type? that feels like the most parsimonious restriction to loosen


trwnh · 2025-02-10T06:41:14Z

If that restriction is to be loosened, it needs to be loosened all the way down the stack at the RDF abstract level (and then at the JSON-LD concrete level). I'm not sure how we would proceed there...
