Commit

release 1.2
xrotwang committed Nov 6, 2024
1 parent 1087aa3 commit 2d1c7db
Showing 20 changed files with 132,459 additions and 81,308 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -128,4 +128,5 @@ dmypy.json
# Pyre type checker
.pyre/
.idea/

raw/cldf/*.json
raw/cldf/*/
11 changes: 11 additions & 0 deletions CHANGES.md
@@ -0,0 +1,11 @@
# Changelog

Changes between releases of the IMTVault dataset.


## [v1.2] - 2024-11-06

- Compiled against Glottolog 5.1.
- Added IGT examples aggregated from CLDF datasets.
- Added interlinear morpheme translations for new Glossa and Language Science Press publications.

2 changes: 1 addition & 1 deletion README.md
@@ -17,7 +17,7 @@ well as the [paper introducing IMTVault](http://www.lrec-conf.org/proceedings/lr

Distribution of examples in IMTVault across the languages of the world:

![](map.jpg?pacific-centered&language-properties=Examples_Count_Log&language-properties-colormaps=viridis&width=20&height=10&padding-left=5&padding-right=5&padding-top=5&padding-bottom=5&format=jpg&markersize=12#cldfviz.map)
![](map.jpg?pacific-centered&language-properties=Examples_Count_Log&language-properties-colormaps=viridis&width=20&height=10&padding-left=5&padding-right=5&padding-top=5&padding-bottom=5&format=jpg&markersize=12&with-ocean#cldfviz.map)


## How to use
5 changes: 3 additions & 2 deletions RELEASING.md
@@ -10,9 +10,10 @@ preferably in a separate virtual environment.
```shell
cldfbench download cldfbench_imtvault.py
```
- Recreate the CLDF running
- Recreate the CLDF running (~10mins)
```shell
cldfbench makecldf --with-cldfreadme --with-zenodo cldfbench_imtvault.py --glottolog-version v4.7
cldfbench makecldf --with-cldfreadme --with-zenodo cldfbench_imtvault.py --glottolog-version v5.1
cldf validate cldf
```
- Recreate the README running
```shell
33 changes: 22 additions & 11 deletions cldf/Generic-metadata.json
@@ -17,32 +17,32 @@
{
"rdf:about": "https://github.com/cldf-datasets/imtvault",
"rdf:type": "prov:Entity",
"dc:created": "v1.0-2-g0fea706",
"dc:created": "v1.1",
"dc:title": "Repository"
},
{
"rdf:about": "https://github.com/glottolog/glottolog",
"rdf:type": "prov:Entity",
"dc:created": "v4.8",
"dc:created": "v5.1",
"dc:title": "Glottolog"
},
{
"rdf:about": "https://github.com/xrotwang/glossa_xml",
"rdf:about": "https://github.com/langsci/raw_texfiles",
"rdf:type": "prov:Entity",
"dc:created": "50441cf",
"dc:created": "2dcdd57",
"dc:title": "Repository"
},
{
"rdf:about": "https://github.com/langsci/raw_texfiles",
"rdf:about": "https://github.com/xrotwang/glossa_xml",
"rdf:type": "prov:Entity",
"dc:created": "56dd5e7",
"dc:created": "e66218c",
"dc:title": "Repository"
}
],
"prov:wasGeneratedBy": [
{
"dc:title": "python",
"dc:description": "3.10.12"
"dc:description": "3.12.3"
},
{
"dc:title": "python-packages",
@@ -55,7 +55,7 @@
{
"dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#ContributionTable",
"dc:description": "Source publications from which IGT examples are extracted are listed as Contributions.",
"dc:extent": 1025,
"dc:extent": 1128,
"tableSchema": {
"columns": [
{
@@ -104,7 +104,7 @@
},
{
"dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#LanguageTable",
"dc:extent": 1132,
"dc:extent": 1611,
"rdfs:comment": "We add a pseudo-language with ID `undefined` to be able to add examples with unknown object language.",
"tableSchema": {
"columns": [
@@ -185,7 +185,7 @@
},
{
"dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#ExampleTable",
"dc:extent": 79522,
"dc:extent": 121596,
"tableSchema": {
"columns": [
{
@@ -246,6 +246,17 @@
"required": false,
"name": "Meta_Language_ID"
},
{
"dc:description": "The level of conformance of the example with the Leipzig Glossing Rules",
"dc:extent": "singlevalued",
"datatype": {
"base": "string",
"format": "WORD_ALIGNED|MORPHEME_ALIGNED"
},
"propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#lgrConformance",
"required": false,
"name": "LGR_Conformance"
},
{
"datatype": "string",
"propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#comment",
@@ -255,7 +266,7 @@
{
"datatype": {
"base": "string",
"format": "LGRConformance\\.MORPHEME_ALIGNED|LGRConformance\\.WORD_ALIGNED|LGRConformance\\.UNALIGNED"
"format": "2|1|0"
},
"name": "LGR_Conformance_Level"
},
27 changes: 14 additions & 13 deletions cldf/README.md
@@ -15,8 +15,8 @@ property | value
[dc:identifier](http://purl.org/dc/terms/identifier) | https://imtvault.org
[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/cldf-datasets/imtvault
[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/cldf-datasets/imtvault/tree/0fea706">cldf-datasets/imtvault v1.0-2-g0fea706</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v4.8">Glottolog v4.8</a></li><li><a href="https://github.com/xrotwang/glossa_xml/tree/50441cf">xrotwang/glossa_xml 50441cf</a></li><li><a href="https://github.com/langsci/raw_texfiles/tree/56dd5e7">langsci/raw_texfiles 56dd5e7</a></li></ol>
[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>python</strong>: 3.10.12</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/cldf-datasets/imtvault/tree/v1.1">cldf-datasets/imtvault v1.1</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v5.1">Glottolog v5.1</a></li><li><a href="https://github.com/langsci/raw_texfiles/tree/2dcdd57">langsci/raw_texfiles 2dcdd57</a></li><li><a href="https://github.com/xrotwang/glossa_xml/tree/e66218c">xrotwang/glossa_xml e66218c</a></li></ol>
[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>python</strong>: 3.12.3</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | imtvault
[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution

@@ -28,14 +28,14 @@ Source publications from which IGT examples are extracted are listed as Contribu
property | value
--- | ---
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF ContributionTable](http://cldf.clld.org/v1.0/terms.rdf#ContributionTable)
[dc:extent](http://purl.org/dc/terms/extent) | 1025
[dc:extent](http://purl.org/dc/terms/extent) | 1128


### Columns

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string`<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` |
[Description](http://cldf.clld.org/v1.0/terms.rdf#description) | `string` |
[Contributor](http://cldf.clld.org/v1.0/terms.rdf#contributor) | `string` |
@@ -47,21 +47,21 @@ Name/Property | Datatype | Description
property | value
--- | ---
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF LanguageTable](http://cldf.clld.org/v1.0/terms.rdf#LanguageTable)
[dc:extent](http://purl.org/dc/terms/extent) | 1132
[dc:extent](http://purl.org/dc/terms/extent) | 1611
[rdfs:comment](http://www.w3.org/2000/01/rdf-schema#comment) | We add a pseudo-language with ID `undefined` to be able to add examples with unknown object language.


### Columns

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string`<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` |
[Macroarea](http://cldf.clld.org/v1.0/terms.rdf#macroarea) | `string` |
[Latitude](http://cldf.clld.org/v1.0/terms.rdf#latitude) | `decimal` |
[Longitude](http://cldf.clld.org/v1.0/terms.rdf#longitude) | `decimal` |
[Glottocode](http://cldf.clld.org/v1.0/terms.rdf#glottocode) | `string` |
[ISO639P3code](http://cldf.clld.org/v1.0/terms.rdf#iso639P3code) | `string` |
[Latitude](http://cldf.clld.org/v1.0/terms.rdf#latitude) | `decimal`<br>&ge; -90<br>&le; 90 |
[Longitude](http://cldf.clld.org/v1.0/terms.rdf#longitude) | `decimal`<br>&ge; -180<br>&le; 180 |
[Glottocode](http://cldf.clld.org/v1.0/terms.rdf#glottocode) | `string`<br>Regex: `[a-z0-9]{4}[1-9][0-9]{3}` |
[ISO639P3code](http://cldf.clld.org/v1.0/terms.rdf#iso639P3code) | `string`<br>Regex: `[a-z]{3}` |
`Examples_Count` | `integer` |
`Examples_Count_Log` | `number` |

@@ -70,22 +70,23 @@ Name/Property | Datatype | Description
property | value
--- | ---
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF ExampleTable](http://cldf.clld.org/v1.0/terms.rdf#ExampleTable)
[dc:extent](http://purl.org/dc/terms/extent) | 79522
[dc:extent](http://purl.org/dc/terms/extent) | 121596


### Columns

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string`<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key
[Language_ID](http://cldf.clld.org/v1.0/terms.rdf#languageReference) | `string` | References [languages.csv::ID](#table-languagescsv)
[Primary_Text](http://cldf.clld.org/v1.0/terms.rdf#primaryText) | `string` | The example text in the source language.
[Analyzed_Word](http://cldf.clld.org/v1.0/terms.rdf#analyzedWord) | list of `string` (separated by ` `) | The sequence of words of the primary text to be aligned with glosses
[Gloss](http://cldf.clld.org/v1.0/terms.rdf#gloss) | list of `string` (separated by ` `) | The sequence of glosses aligned with the words of the primary text
[Translated_Text](http://cldf.clld.org/v1.0/terms.rdf#translatedText) | `string` | The translation of the example text in a meta language
[Meta_Language_ID](http://cldf.clld.org/v1.0/terms.rdf#metaLanguageReference) | `string` | References the language of the translated text<br>References [languages.csv::ID](#table-languagescsv)
[LGR_Conformance](http://cldf.clld.org/v1.0/terms.rdf#lgrConformance) | `string`<br>Valid choices:<br> `WORD_ALIGNED` `MORPHEME_ALIGNED` | The level of conformance of the example with the Leipzig Glossing Rules
[Comment](http://cldf.clld.org/v1.0/terms.rdf#comment) | `string` |
`LGR_Conformance_Level` | `string` |
`LGR_Conformance_Level` | `string`<br>Valid choices:<br> `2` `1` `0` |
`Language_Name` | `string` | Name of the object language as used in the source publication.
`Abbreviations` | `json` | Mapping of gloss abbreviations used in the examples to descriptions of their meaning.
`Corpus_Reference` | `string` | Identifies the location of the example in the underlying corpus
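
The column constraints introduced in this release (the `ID`, `Glottocode` and `ISO639P3code` regexes and the valid choices for `LGR_Conformance` and `LGR_Conformance_Level`) can be checked against a single row with a few lines of Python. This is a minimal sketch, not the validation that `cldf validate` performs: the patterns are copied from the column descriptions above, while the helper name `invalid_fields`, the dict-based row representation, and the assumption that a pattern must match the whole value are illustrative choices.

```python
import re

# Patterns copied from the column descriptions above.
# Assumption: each pattern must match the entire cell value.
PATTERNS = {
    "ID": r"[a-zA-Z0-9_\-]+",
    "Glottocode": r"[a-z0-9]{4}[1-9][0-9]{3}",
    "ISO639P3code": r"[a-z]{3}",
    "LGR_Conformance": r"WORD_ALIGNED|MORPHEME_ALIGNED",
    "LGR_Conformance_Level": r"2|1|0",
}


def invalid_fields(row):
    """Return the names of columns whose non-empty value fails its pattern.

    `row` is a hypothetical dict mapping column names to string values,
    e.g. one record from csv.DictReader. Missing or empty values are
    skipped, since the constrained columns shown here are not required.
    """
    bad = []
    for name, pattern in PATTERNS.items():
        value = row.get(name)
        if value and not re.fullmatch(pattern, value):
            bad.append(name)
    return bad


# Illustrative row (made up for this sketch, not taken from the dataset).
row = {"ID": "ex 1", "Glottocode": "stan1295", "LGR_Conformance": "UNALIGNED"}
print(invalid_fields(row))  # ['ID', 'LGR_Conformance']
```

The `ID` fails because of the space, and `UNALIGNED` is not among the valid choices listed for `LGR_Conformance`.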