The notes below report many discussion points and decisions made together with the ELIXIR Plant Community. This document is an updated version of a discussion document that emerged from the ELIXIR Biohackathons 2021 and 2022.
The real use case we have considered to test our modelling approach is based on:
Optimizing experimental procedures for quantitative evaluation of crop plant performance in high throughput phenotyping systems.
The experiment contains several measurement types, each having specific result types and data files.
Here, we consider only the factors reported in the ISA-Tab, "soil cover" and "plant movement".
IMPORTANT: see detailed explanations in the next section.
Most of the MIAPPE mapping work has been done within the ELIXIR Plant Community. Here, we describe some of the choices made.
- `schema:Dataset` should be used to represent `MIAPPE:Investigation`, since this is the type that is most compatible with schema.org/bioschemas.
  - It's recommended to add a qualifier like `schema:additionalType ppeo:investigation`.
- Use `bioschema:Study` to represent a MIAPPE study. Possibly use `schema:additionalType ppeo:study`, as above.
- Study's location: use `schema:studyLocation` for this. Hopefully, this issue with it will be fixed at some point.
- Study's contact: use `schema:contactPoint` for this. Hopefully, this issue with it will be fixed at some point.
  - If you don't know whether a study contributor is also a contact point, use the more generic predicate `schema:contributor`.
- Use something like `bioschema:studyDomain <http://edamontology.org/topic_3810>` (agricultural science) to be compatible with the bioschema profile.
- Use something like `schema:studySubject <http://purl.obolibrary.org/obo/NCBITaxon_3702>` to be compatible with the bioschema profile (also, see notes below about subjectOf).
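The investigation and study mappings above can be sketched in Turtle as follows. This is only an illustration: the prefix IRIs, the `ex:` resources and the names are assumptions made for the example, not identifiers taken from the use case.

```turtle
# Prefix IRIs below are assumptions for this sketch; use the ones your project declares.
@prefix schema:    <https://schema.org/> .
@prefix bioschema: <https://bioschemas.org/> .
@prefix ppeo:      <http://purl.org/ppeo/PPEO.owl#> .
@prefix ex:        <http://example.org/> .

# MIAPPE Investigation, represented as a schema.org Dataset
ex:investigation1 a schema:Dataset ;
  schema:additionalType ppeo:investigation ;
  schema:name "Hypothetical investigation" .

# MIAPPE Study
ex:study1 a bioschema:Study ;
  schema:additionalType ppeo:study ;
  schema:name "Hypothetical study" ;
  bioschema:studyDomain <http://edamontology.org/topic_3810> ;          # agricultural science
  schema:studySubject <http://purl.obolibrary.org/obo/NCBITaxon_3702> ;
  schema:studyLocation ex:greenhouse1 ;
  # Generic predicate, used when it is unknown whether the person is a contact point
  schema:contributor ex:person1 .

ex:greenhouse1 a schema:Place ;
  schema:name "Hypothetical greenhouse" .

ex:person1 a schema:Person ;
  schema:name "Hypothetical contributor" .
```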
- As you can see in the diagram above, we have defined subclasses of `schema:BioChemEntity`, which reflect the MIAPPE terminology:
  - `agri:FieldTrialBioMaterial`
  - `agri:FieldTrialMaterialSource`
  - `agri:FieldTrialObservationUnit`
  - `bioschema:BioSample` to represent a MIAPPE sample.
- `schema:subjectOf` (or `schema:about`) is to be used to link `BioChemEntity` to `bioschema:Study` (and consequently, the subclasses above). `schema:studySubject` would be a more precise alternative and it's mentioned by the bioschemas study profile. However, its current domain/range (MedicalStudy) isn't suitable for plant biology, and this property seems to be specifically about the organism, plant, patient, etc that has been studied, while here, we are linking all the biochem entities that have been used for the purpose of studying one of them (eg, a plant variety).
- Another alternative might be `schema:partOf`, but currently it doesn't have the right domain/range for this, and it seems the schema.org community has a peculiar view of these generic mereological relationships (example).
- Use `bioschema:taxonomicRange` to describe the species of any of these BioChemEntity(s). This is as per Bioschemas specifications, see also here.
  - `dc:type` can redundantly be used for the organism, for the sake of interoperability.
- Use `bioschema:isPartOfBioChemEntity` (or `hasBioChemEntityPart`) to relate `agri:FieldTrialBioMaterial` from (or to) `agri:FieldTrialMaterialSource` (as per bioschemas profile).
- The same relation is to be used to relate `bioschema:BioSample` to `agri:FieldTrialObservationUnit`, and this to `agri:FieldTrialBioMaterial`.
- We don't see the need to introduce specific properties for this (like `agri:hasBioMaterial`, `agri:hasMaterialSource`).
- Use `dc:type` in `agri:FieldTrialObservationUnit` for MIAPPE:"Observation Unit Type".
- We considered a specific super-class to group the ones introduced above, but we gave up on this, since this class wouldn't add much to `schema:BioChemEntity`.
  - See details here about other alternatives we have considered, before reaching consensus on the modelling described above.
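A minimal Turtle sketch of the material chain described above follows. The prefix IRIs, the `ex:` resources, the "plot" unit type and the direction chosen for `bioschema:isPartOfBioChemEntity` (sample within observation unit, unit within biomaterial, biomaterial within material source) are assumptions made for illustration; the notes allow the inverse `hasBioChemEntityPart` as well.

```turtle
# Prefix IRIs below are assumptions for this sketch; use the ones your project declares.
@prefix schema:    <https://schema.org/> .
@prefix bioschema: <https://bioschemas.org/> .
@prefix agri:      <http://agrischemas.org/> .      # assumed namespace
@prefix dc:        <http://purl.org/dc/elements/1.1/> .
@prefix ex:        <http://example.org/> .

ex:source1 a agri:FieldTrialMaterialSource ;
  schema:subjectOf ex:study1 .

ex:biomaterial1 a agri:FieldTrialBioMaterial ;
  bioschema:isPartOfBioChemEntity ex:source1 ;
  bioschema:taxonomicRange <http://purl.obolibrary.org/obo/NCBITaxon_3702> ;
  # Redundant organism annotation, kept for interoperability
  dc:type <http://purl.obolibrary.org/obo/NCBITaxon_3702> ;
  schema:subjectOf ex:study1 .

ex:obsUnit1 a agri:FieldTrialObservationUnit ;
  bioschema:isPartOfBioChemEntity ex:biomaterial1 ;
  dc:type "plot" ;                                  # MIAPPE "Observation Unit Type" (hypothetical value)
  schema:subjectOf ex:study1 .

ex:sample1 a bioschema:BioSample ;
  bioschema:isPartOfBioChemEntity ex:obsUnit1 ;
  schema:subjectOf ex:study1 .

ex:study1 a bioschema:Study .
```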
- Use `dc:type` to qualify the value of a name/value pair. For instance, attach `dc:type <http://purl.obolibrary.org/obo/NCBITaxon_3702>` to a `schema:PropertyValue` instance with name "organism" and value "arabidopsis".
- In general, don't use `schema:additionalType` for informal qualifiers, since this property is much more committing, as explained here.
- When you want to qualify the type of a PV, prefer `schema:propertyID`. For instance: name: "p-value", `propertyID: sio:SIO_000765`, value: 0.01. This is the purpose propertyID is designed for.
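The two qualifiers above can be written out as follows. This is a sketch only: the prefix IRIs and the `ex:` resource names are assumptions made for the example.

```turtle
# Prefix IRIs below are assumptions for this sketch; use the ones your project declares.
@prefix schema: <https://schema.org/> .
@prefix dc:     <http://purl.org/dc/elements/1.1/> .
@prefix sio:    <http://semanticscience.org/resource/> .
@prefix ex:     <http://example.org/> .

# dc:type qualifies the value of the name/value pair
ex:organismProp a schema:PropertyValue ;
  schema:name "organism" ;
  schema:value "arabidopsis" ;
  dc:type <http://purl.obolibrary.org/obo/NCBITaxon_3702> .

# schema:propertyID qualifies the type of the property itself
ex:pvalueProp a schema:PropertyValue ;
  schema:name "p-value" ;
  schema:propertyID sio:SIO_000765 ;
  schema:value 0.01 .
```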
To clarify the type qualifier further, consider this:

    ex:sample_prop a schema:PropertyValue;
      schema:name "Phenotype";
      schema:value "14 day post anthesis, aleurone layer";
      # These can come from eg, manual curation or auto-annotation
      # (of course multiple terms are possible)
      dc:type
        <http://purl.obolibrary.org/obo/PO_0005360>, # aleurone layer
        <http://www.cropontology.org/rdf/CO_321:0000434>; # day
    .

As said above, this property is not exactly an instance of day, and linking the ontology annotations via something that implies `rdf:type` would also lead to the property being in the intersection of 'day' and 'aleurone layer', which might be disjoint classes, and hence the fragment above would lead to an inconsistency.
Similarly, `schema:propertyID <aleurone layer>` wouldn't be correct, because we are not giving a value for the aleurone layer property; rather, we're defining something that is loosely a subclass of that (note that `skos:narrowerInstantive` might be more precise than `dc:type`, but we prefer the latter for the sake of simplicity).
- In certain cases, we define specific value/type properties, such as `agri:hasExperimentalFactorType` (see below). We do this when we want to be explicit about the kind of instantive relation we are representing, mostly to reflect the terminology used in the application domain.
- When we introduce these new properties, they're defined as subproperties of `dc:type`.
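As an illustration of the subproperty approach, a definition of this kind could look like the sketch below. The prefix IRIs and the domain/range statements are assumptions added for the example; the notes only state that the property is a subproperty of `dc:type`.

```turtle
# Prefix IRIs below are assumptions for this sketch; use the ones your project declares.
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix agri: <http://agrischemas.org/> .           # assumed namespace

agri:hasExperimentalFactorType a rdf:Property ;
  rdfs:subPropertyOf dc:type ;
  # Domain and range are assumptions, inferred from how the property is used
  rdfs:domain agri:ExperimentalFactorValue ;
  rdfs:range agri:ExperimentalFactorType .
```

With RDFS inference enabled, any `agri:hasExperimentalFactorType` link also counts as a `dc:type` link, so queries written against `dc:type` keep working.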
PV extensions that have a value/type relationship (eg, ExperimentalFactorValue/ExperimentalFactorType )
- The following sections apply to cases like factor value vs factor type, experiment design (as type only), protocol parameter value vs parameter name/type, observed value vs observed variable. They're similar cases in the structural sense: essentially, they're all value/type pairs, and they all have similar properties, such as name, specific value, ontology term annotations. This is similar to other formats, such as ISA-Tab, MAGE-Tab, etc.
- As said above, in a number of cases (see below), we want/need to introduce new types, for a number of reasons, eg, they're part of the common terminology or they're needed as qualifiers.
- As outlined above, we chose to relate them via `dc:type`, eg:

      ex:nitrogenHigh a agri:ExperimentalFactorValue;
        schema:name "Nitrogen, High Concentration";
        schema:value "High Concentration";
        dc:type ex:fertilizer;
      .

      ex:fertilizer a agri:ExperimentalFactorType;
        schema:name "Experimental Factor Type";
        schema:value "Fertilizer Type";
      .

  - both `agri:ExperimentalFactorValue` and `agri:ExperimentalFactorType` are defined as subclasses of `schema:PropertyValue`
- until new terms like ExperimentalFactorValue are accepted by schema.org (and parsed by tools like the Google validator), it is advisable to define these type instances by adding `schema:PropertyValue` as their class (eg, `ex:nitrogenHigh a agri:ExperimentalFactorValue, schema:PropertyValue`). This is redundant, but it can help with interoperability.
- Note that, according to the above, `dc:type` can be used both for generic links to the URIs of external ontology terms (see the previous section) and for linking local PV types, as in the case presented here. It's easy (in SPARQL or alike) to distinguish between the two cases (`?pval dc:type ?ptype. ?ptype a agri:ExperimentalFactorType`). The case of a generic ontology term can possibly be made clearer by using `schema:DefinedTerm`, eg:

      ex:sampleProp a agri:ExperimentalFactorValue, schema:PropertyValue;
        schema:name "aleurone layer";
        dc:type <http://purl.obolibrary.org/obo/PO_0005360>, ex:plantPart.

      <http://purl.obolibrary.org/obo/PO_0005360> a schema:DefinedTerm.
For the property values used as types (eg, factor type), use and/or extend `schema:PropertyValue`, as said above. Then:
- Use `schema:name` to describe the category/type (eg, "experimental factor type", "protocol parameter type").
  - When you are using a specific extension of `schema:PropertyValue`, eg, `ExperimentalFactorType`, this is redundant (the new type says the same thing the name says), yet it might be useful for visualisation/UI purposes and to ensure applications like Google get the data.
- Use `schema:value` for the specific type (eg, "Treatment", "Watering")

For the PVs used as values (of a type PV), use or extend `schema:PropertyValue` too, as above, and then:
- Use `schema:name` with the same value used for its type, or a similar one (eg, "Treatment", "Treatment Type", "Factor Value[Treatment]").
  - Again, this is redundant when you're linking the type via `dc:type` as explained above, but might be useful.
- Use `schema:value` for the value (eg, "Nitrogen fertiliser", "Watered 2 times a day", "untreated")

For details, see below (eg, ExperimentalFactorValue, ExperimentalFactorType, ExperimentalDesign).
The following applies what was said above about value/type pairs.
- ExperimentalDesign extends `schema:PropertyValue`
- `schema:name` is "study design", and it's optional
- `schema:value` is a short textual description of the experimental design (summary or title)
- `schema:description` maps MIAPPE's `study design description`
  - Optionally, the description could be duplicated into `schema:value` (to support common practice with schema.org)
- `schema:propertyID` should have `ppeo:experimental_design` among its values, ie, to represent the concept that the property is about the design type. Again, this is redundant, yet useful
  - TODO: an alternative is that we propose ExperimentalDesign as a new term and that it subclasses `ppeo:experimental_design` (and PV)
- possibly, `dc:type` for linking the ontology term of which a particular value is an instance, eg, `schema:value "factorial design with 2 factors"`, `dc:type <http://purl.obolibrary.org/obo/OBI_0500015>`
- links from study via new property `studyExperimentalDesign`?
  - this was proposed during the hackathon, but there is little point: search engines will recognise `schema:additionalProperty` only, not this. We have used `additionalProperty` in our use case.
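Putting the points above together, an experimental design could be described roughly as in this sketch. The prefix IRIs, the `ex:` resources and the description text are assumptions made for the example; the design is typed as a plain `schema:PropertyValue`, since a dedicated class is still listed as a TODO.

```turtle
# Prefix IRIs below are assumptions for this sketch; use the ones your project declares.
@prefix schema:    <https://schema.org/> .
@prefix bioschema: <https://bioschemas.org/> .
@prefix ppeo:      <http://purl.org/ppeo/PPEO.owl#> .
@prefix dc:        <http://purl.org/dc/elements/1.1/> .
@prefix ex:        <http://example.org/> .

ex:design1 a schema:PropertyValue ;
  schema:name "study design" ;                              # optional
  schema:value "factorial design with 2 factors" ;          # short summary or title
  schema:description "Hypothetical longer description of the design." ;
  schema:propertyID ppeo:experimental_design ;              # redundant, yet useful
  dc:type <http://purl.obolibrary.org/obo/OBI_0500015> .    # ontology term for the design

# The study links to the design via additionalProperty
ex:study1 a bioschema:Study ;
  schema:additionalProperty ex:design1 .
```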
- As above, ExperimentalFactorValue is an extension of PropertyValue
- ontology terms added as above, using dc:type
- we introduce `agri:hasExperimentalFactorType` to link to the type.
- we also introduce the inverse property `agri:hasExperimentalFactorValue`, so that there is an easy way to list all the possible factor values for a type.
  - These two properties are only to link value/type. To link from other entities (eg, study -> factor types or values), use `schema:additionalProperty`
  - Moreover, you might want to keep things simple, by using `dc:type` in place of these specific new predicates. It's always possible to know the factor type for a factor value, by including the classes in the queries.
- ExperimentalFactorType is optional, since ExperimentalFactorValue can already accommodate a type qualifier in its name. However, we expect it to be used in most cases (as per common practice)
- it extends `PropertyValue`, as explained above
- Study/Type link is established via additionalProperty (see the discussion about the experimental design above)
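A factor type and one of its values, linked in both directions and attached to a study, might look like the following sketch. The prefix IRIs and the `ex:` resources are assumptions; the "Watering" labels reuse examples given earlier in these notes.

```turtle
# Prefix IRIs below are assumptions for this sketch; use the ones your project declares.
@prefix schema:    <https://schema.org/> .
@prefix bioschema: <https://bioschemas.org/> .
@prefix agri:      <http://agrischemas.org/> .      # assumed namespace
@prefix ex:        <http://example.org/> .

ex:wateringType a agri:ExperimentalFactorType, schema:PropertyValue ;
  schema:name "Experimental Factor Type" ;
  schema:value "Watering" ;
  agri:hasExperimentalFactorValue ex:wateredTwice .

ex:wateredTwice a agri:ExperimentalFactorValue, schema:PropertyValue ;
  schema:name "Watering" ;
  schema:value "Watered 2 times a day" ;
  agri:hasExperimentalFactorType ex:wateringType .

# Links from the study use the generic schema.org property
ex:study1 a bioschema:Study ;
  schema:additionalProperty ex:wateringType, ex:wateredTwice .
```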
ISA-Tab uses protocols for Events and growth protocol parameters for Environment. This all sounds rather odd: rainfall is fundamentally different from watering. In one case, something happens in an uncontrolled and unplanned way; in the other, there is an intended action.
So, our proposal is as follows.
- We propose the new `agri:StudyEvent` as a subclass of `schema:Action`. This maps `MIAPPE:Event` generically, OR, possibly, we have more specific subclasses:
  - `bioschema:LabProcess` (which is already being proposed as an `Action`) is made a subclass of `bioschema:StudyEvent`.
    - According to documentation, `bioschema:bioSample` can link generic `bioschema:BioChemEntity`, not necessarily biosamples. So, this could be used to link the input material of a lab protocol
    - `bioschema:bioSample` is made a subproperty of `schema:object`, in order to make it coherent with the upper model given for the `Action` class.
  - `bioschema:LabProtocol` is used to model a protocol as a plan, which can be applied in a place and time, ie, it's linked to `bioschema:LabProcess`.
    - Using Action allows for using `schema:result`, with all subclasses in the hierarchy, including `LabProcess`
    - Domain/range of object/result of a study event include observation unit, sample, data download
  - We propose the new `bioschema:StudyExternalEvent`, as a subclass of `bioschema:StudyEvent` and distinct from `LabProcess`
- Use `dc:type` to qualify the event type with MIAPPE/PPEO terms, eg, "growing protocol", "rainfall", "watering". This applies to both protocols and external events.
- As elsewhere, `schema:subjectOf` links `StudyEvent` to `Study` (and hence, all the mentioned subclasses)
  - As said above, prefer this to `schema:partOf`. `schema:studySubject` has a very different meaning.
- ==> We still see issues with the way Bioschemas is proposing to model `LabProtocol`/`LabProcess`, see here
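The distinction between an intended action and an external event can be sketched as below. The prefix IRIs and `ex:` resources are assumptions, the class names are used exactly as they appear in the proposal above, and the event types are given as plain literals only for brevity (ontology terms would normally be used).

```turtle
# Prefix IRIs below are assumptions for this sketch; use the ones your project declares.
@prefix schema:    <https://schema.org/> .
@prefix bioschema: <https://bioschemas.org/> .
@prefix agri:      <http://agrischemas.org/> .      # assumed namespace
@prefix dc:        <http://purl.org/dc/elements/1.1/> .
@prefix ex:        <http://example.org/> .

# An intended action: watering applied to an observation unit
ex:watering1 a bioschema:LabProcess ;
  dc:type "watering" ;
  bioschema:bioSample ex:obsUnit1 ;     # input material (proposed subproperty of schema:object)
  schema:result ex:sample1 ;            # output of the action
  schema:subjectOf ex:study1 .

# An uncontrolled, unplanned event
ex:rain1 a bioschema:StudyExternalEvent ;
  dc:type "rainfall" ;
  schema:subjectOf ex:study1 .

ex:obsUnit1 a agri:FieldTrialObservationUnit .
ex:sample1 a bioschema:BioSample .
ex:study1 a bioschema:Study .
```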
- We propose the new `agri:LabProtocolParameterValue` as a subclass of `schema:PropertyValue`. Similarly, introduce `agri:LabProtocolParameterType` (both modelled as per the discussion above on value/type pairs).
- Introduce `agri:hasLabProtocolParameterValue` as a subproperty of `dc:type`, to link a param type to its values. Introduce its inverse, `agri:hasLabProtocolParameterType`.
  - As in other cases, these two properties are only to link value/type. To link from other entities (eg, event -> parameters), use `schema:additionalProperty`
  - Moreover, you might want to keep things simple, by using `dc:type` in place of these specific new predicates (see the factor value case).
- We use `bioschema:parameterValue` to link a `bioschema:LabProcess` to `agri:LabProtocolParameterValue`. Coherently, we introduce `agri:parameter`, to link from `LabProtocol` to `agri:LabProtocolParameterType`.
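A protocol parameter declared on a protocol, with a concrete value attached to a process, could be written as in this sketch. The prefix IRIs, the `ex:` resources and the watering-volume figures are assumptions invented for the example.

```turtle
# Prefix IRIs below are assumptions for this sketch; use the ones your project declares.
@prefix schema:    <https://schema.org/> .
@prefix bioschema: <https://bioschemas.org/> .
@prefix agri:      <http://agrischemas.org/> .      # assumed namespace
@prefix ex:        <http://example.org/> .

ex:wateringVolumeType a agri:LabProtocolParameterType, schema:PropertyValue ;
  schema:name "Protocol Parameter Type" ;
  schema:value "Watering volume" .

ex:wateringVolume1 a agri:LabProtocolParameterValue, schema:PropertyValue ;
  schema:name "Watering volume" ;
  schema:value 200 ;                    # hypothetical figure
  schema:unitText "ml" ;
  agri:hasLabProtocolParameterType ex:wateringVolumeType .

# The protocol (the plan) declares the parameter type
ex:wateringProtocol a bioschema:LabProtocol ;
  agri:parameter ex:wateringVolumeType .

# The process (the application of the plan) carries the actual value
ex:watering1 a bioschema:LabProcess ;
  bioschema:parameterValue ex:wateringVolume1 .
```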
- StudyObservedVariable is modelled in a way similar to `ExperimentalFactorType` (and, again, like a value/type pair)
- So, it extends `PropertyValue`, ontology term annotations are added as above
- Possibly the link from study is established via a new property like `agri:studyObservedVariable`
  - currently, we use `schema:additionalProperty`
- StudyObservedValue is another value/type case.
- It extends `PropertyValue`, ontology terms added as above
- Links StudyObservedVariable via `agri:hasObservedVariable`
  - a subproperty of `dc:type`, as above
  - As in other cases, this property is only to link value/type. To link from other entities (eg, sample -> obs value), use `schema:additionalProperty`
  - Moreover, you might want to keep things simple, by using `dc:type` in place of this specific new predicate (see the factor value case).
- Use `schema:additionalProperty` to link to this from `schema:DataDownload`, `bioschema:BioSample`, `agri:FieldTrialObservationUnit`.
  - We haven't decided whether to add something like `agri:hasObservedValue` for this.
- TODO: an alternative to link from DataDownload is `schema:variableMeasured`, but this would require a domain addition (to be asked to schema.org).
- for MIAPPE/scale, use `schema:unitText` and/or `schema:unitCode`
- for MIAPPE/trait, new property `hasObservedValueTrait` as subproperty of `additionalProperty` (see above), use PropertyValue to describe the trait
- for method, use `bioschema:measurementTechnique`, with `PropertyValue` or `LabProtocol` as possible targets
  - requires bioschema range extension
- for time scale, `schema:additionalProperty` with `PropertyValue.name = "time scale"`
- We have considered `schema:Observation` for this StudyObservedValue, but currently, it has very different assumptions and objectives
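An observed value linked to its variable and to an observation unit might be expressed as in the sketch below. The prefix IRIs, the `ex:` resources and the plant-height figures are assumptions made for the example. Both nodes are typed as plain `schema:PropertyValue`, since the notes give no prefixed class names for the observed variable and value.

```turtle
# Prefix IRIs below are assumptions for this sketch; use the ones your project declares.
@prefix schema: <https://schema.org/> .
@prefix agri:   <http://agrischemas.org/> .         # assumed namespace
@prefix ex:     <http://example.org/> .

# Observed variable (the "type" side of the pair)
ex:plantHeightVar a schema:PropertyValue ;
  schema:name "Observed Variable" ;
  schema:value "Plant height" .

# Observed value (the "value" side of the pair)
ex:plantHeight1 a schema:PropertyValue ;
  schema:name "Plant height" ;
  schema:value 35.2 ;                               # hypothetical measurement
  schema:unitText "cm" ;                            # MIAPPE scale
  agri:hasObservedVariable ex:plantHeightVar ;
  schema:additionalProperty [
    a schema:PropertyValue ;
    schema:name "time scale" ;
    schema:value "days after sowing"                # hypothetical time scale
  ] .

ex:obsUnit1 a agri:FieldTrialObservationUnit ;
  schema:additionalProperty ex:plantHeight1 .
```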
- To be used for anything that is computed from data, in particular, from observed values
- Examples:
  - average yield
  - p-value from a Student's t-test over mean yield difference
  - p-value from some ANOVA analysis of gene expression levels.
- This would be a simple and very uniform way to represent experiment results, including those that were derived/computed
- See MIAPPE use case, TODO: gene expression use case to be aligned to this
- Considering it a form of observed value (ie, a subclass of it) might be controversial
- As mentioned above, we might want to consider `schema:Observation`
- Just as above, it is the counterpart of computed value
- This is to represent `MIAPPE:DataFile` (use this qualifier with additionalType)
- It is used to report data files associated with samples or observation units, or generically with the whole study
- Use `agri:evidence` to link data to entities like samples or studies
  - An alternative for this would be `schema:subjectOf` (or its inverse `schema:about`). But the new evidence property is more explicit. Possibly, add these schema annotations too, for interoperability.
- Proposal: use an instance of `agri:LabProtocolParameterValue` and `agri:LabProtocolParameterType`
  - linked to Study, via `additionalProperty` (optional)
- one of the `LabProtocolParameter`(s) that can be associated to growth protocol
- an `agri:LabProtocol` of type "Growth Protocol"
- Growth facility is a `LabProtocolParameterValue`, linked to this protocol
- with
  - `schema:propertyID ppeo:growth_facility`
  - `schema:name "growth facility"`
  - `schema:value` for the short name/title, if any
  - `schema:description` for "description of growth facility" (similarly to ExperimentalDesign above)
- as for Growth facility or ExperimentalDesign
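The growth facility modelling described above can be sketched as follows. The prefix IRIs, the `ex:` resources and the facility texts are assumptions. The notes do not name the property that links the facility to the growth protocol, so `schema:additionalProperty` is used here purely as a placeholder.

```turtle
# Prefix IRIs below are assumptions for this sketch; use the ones your project declares.
@prefix schema: <https://schema.org/> .
@prefix agri:   <http://agrischemas.org/> .         # assumed namespace
@prefix ppeo:   <http://purl.org/ppeo/PPEO.owl#> .
@prefix dc:     <http://purl.org/dc/elements/1.1/> .
@prefix ex:     <http://example.org/> .

ex:growthProtocol a agri:LabProtocol ;
  dc:type "Growth Protocol" ;
  # Placeholder link: the linking property is an assumption, not part of the proposal
  schema:additionalProperty ex:growthFacility1 .

ex:growthFacility1 a agri:LabProtocolParameterValue, schema:PropertyValue ;
  schema:propertyID ppeo:growth_facility ;
  schema:name "growth facility" ;
  schema:value "Hypothetical greenhouse" ;                          # short name/title, if any
  schema:description "Hypothetical description of the growth facility." .
```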