StructureDataset CLDF dataset with data and supplements for Barlow “Loss of colexification of ‘hand’ and ‘five’ in Austronesian languages”

CLDF Metadata: StructureDataset-metadata.json

Sources: sources.bib

property | value
 --- | ---
dc:conformsTo | CLDF StructureDataset
dc:license | https://creativecommons.org/licenses/by/4.0/
dcat:accessURL | https://github.com/cldf-datasets/barlowhandandfive
prov:wasDerivedFrom | 1. cldf-datasets/barlowhandandfive v1.1-1-g948aa74; 2. Glottolog v5.0
prov:wasGeneratedBy | 1. python: 3.12.3; 2. python-packages: requirements.txt
rdf:ID | barlowhandandfive
rdf:type | http://www.w3.org/ns/dcat#Distribution
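The tables documented below are declared in StructureDataset-metadata.json. The following is a minimal Python sketch for finding which file holds which CLDF component. It assumes the metadata follows the usual CSVW layout: a top-level "tables" list whose entries carry a "url" and, for standard components, a "dc:conformsTo" term URI. The function name table_files is hypothetical. Tables without a CLDF component, such as the replacement-events table, are keyed by their own URL.

```python
from typing import Any


def table_files(metadata: dict[str, Any]) -> dict[str, str]:
    """Map CLDF component names (e.g. 'ValueTable') to table file names.

    Tables without a dc:conformsTo term are keyed by their own URL.
    """
    files = {}
    for table in metadata.get("tables", []):
        url = table["url"]
        conforms = table.get("dc:conformsTo")
        # Component URIs end in a fragment such as '#ValueTable'.
        key = conforms.rsplit("#", 1)[-1] if conforms else url
        files[key] = url
    return files
```

To use it, pass in the metadata file after parsing it with json.load. The result maps component names to file names, so those names do not have to be hard-coded.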
Table values.csv

property | value
 --- | ---
dc:conformsTo | CLDF ValueTable
dc:extent | 6063

Columns

Name/Property | Datatype | Description
 --- | --- | ---
ID | string (regex: [a-zA-Z0-9_\-]+) | Primary key
Language_ID | string | References languages.csv::ID
Parameter_ID | string | References parameters.csv::ID
Value | string |
Code_ID | string | References codes.csv::ID
Comment | string |
Source | list of string (separated by ;) | References sources.bib::BibTeX-key
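The ID pattern and the ';'-separated Source column shown above can be handled with the standard csv module. The sketch below is a minimal example that assumes the CSV header uses the column names listed in the table. The function name read_values is hypothetical.

```python
import csv
import re
from typing import Iterable

# Pattern copied from the ID column description above.
ID_PATTERN = re.compile(r"[a-zA-Z0-9_\-]+")


def read_values(lines: Iterable[str]) -> list[dict]:
    """Read ValueTable rows, checking IDs and splitting Source on ';'."""
    rows = []
    for row in csv.DictReader(lines):
        if not ID_PATTERN.fullmatch(row["ID"]):
            raise ValueError(f"invalid ID: {row['ID']!r}")
        # An empty Source cell becomes [] rather than [''].
        raw = row.get("Source") or ""
        row["Source"] = [key.strip() for key in raw.split(";") if key.strip()]
        rows.append(row)
    return rows
```

The function accepts any iterable of lines, so it works with an open file object (opened with newline="" and encoding="utf-8") as well as with in-memory test data.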

Table forms.csv

property | value
 --- | ---
dc:conformsTo | CLDF FormTable
dc:extent | 2023

Columns

Name/Property | Datatype | Description
 --- | --- | ---
ID | string (regex: [a-zA-Z0-9_\-]+) | Primary key
Language_ID | string | A reference to a language (or variety) the form belongs to. References languages.csv::ID
Parameter_ID | string | A reference to the meaning denoted by the form. References parameters.csv::ID
Form | string | The written expression of the form. If possible, the transcription system used for the written form should be described in CLDF metadata (e.g. by adding a common property dc:conformsTo to the column description, using concept URLs of the GOLD Ontology (such as phonemicRep or phoneticRep) as values).
Segments | list of string (separated by spaces) |
Comment | string |
Source | list of string (separated by ;) | References sources.bib::BibTeX-key
Contribution_ID | string | Key of the lexical dataset from which the form was taken. References contributions.csv::ID
Glottocode_in_dataset | string | Glottocode assigned to the variety in the source dataset from which the form was selected
Language_name_in_dataset | string | Name of the variety in the source dataset from which the form was selected
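FormTable has two list-valued columns: Segments and Source. The sketch below parses both and then indexes forms by language and concept, which makes the ‘hand’ and ‘five’ forms of each language easy to compare. It assumes that Segments is whitespace-separated, which is the CLDF default. Both function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Iterable


def read_forms(lines: Iterable[str]) -> list[dict]:
    """Read FormTable rows, splitting the two list-valued columns."""
    forms = []
    for row in csv.DictReader(lines):
        # Segments: split on whitespace (assumed space-separated).
        row["Segments"] = (row.get("Segments") or "").split()
        # Source: split on ';', dropping empty entries.
        row["Source"] = [k for k in (row.get("Source") or "").split(";") if k]
        forms.append(row)
    return forms


def forms_by_language(forms: list[dict]) -> dict[str, dict[str, list[str]]]:
    """Index Form strings by Language_ID and then by Parameter_ID."""
    index: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for form in forms:
        index[form["Language_ID"]][form["Parameter_ID"]].append(form["Form"])
    return index
```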

Table languages.csv

This table lists each language-level languoid in Glottolog 5.0 classified as Austronesian. Languages are sorted roughly by genealogy and then by geography, approximately following the spread of Austronesian languages from Taiwan to Polynesia. The values in the “Number” column encode this order.

property | value
 --- | ---
dc:conformsTo | CLDF LanguageTable
dc:extent | 1274

Columns

Name/Property | Datatype | Description
 --- | --- | ---
ID | string (regex: [a-zA-Z0-9_\-]+) | Primary key
Name | string |
Macroarea | string |
Latitude | decimal (≥ -90, ≤ 90) |
Longitude | decimal (≥ -180, ≤ 180) |
Glottocode | string (regex: [a-z0-9]{4}[1-9][0-9]{3}) |
ISO639P3code | string (regex: [a-z]{3}) |
Number | integer |
Melanesia | string (valid choices: yes, no) | Languages are classified as being in Melanesia if they are primarily spoken in Papua New Guinea (PG), Solomon Islands (SB), Vanuatu (VU), New Caledonia (NC), or the Western New Guinea provinces of Indonesia (ID).
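The constraints listed for LanguageTable can be checked row by row. The sketch below is a minimal example that treats empty cells as missing values. The country-code rule for Melanesia is taken from the description above. However, the dataset has no country or province column, so the inputs to melanesia_flag are hypothetical. Both function names are also hypothetical.

```python
import re

# Patterns and bounds copied from the column descriptions above.
GLOTTOCODE = re.compile(r"[a-z0-9]{4}[1-9][0-9]{3}")
ISO639P3 = re.compile(r"[a-z]{3}")
# Country codes named in the Melanesia column description.
MELANESIAN_COUNTRIES = {"PG", "SB", "VU", "NC"}


def language_problems(row: dict) -> list[str]:
    """Return the names of columns whose non-empty values break the constraints."""
    problems = []
    if row.get("Glottocode") and not GLOTTOCODE.fullmatch(row["Glottocode"]):
        problems.append("Glottocode")
    if row.get("ISO639P3code") and not ISO639P3.fullmatch(row["ISO639P3code"]):
        problems.append("ISO639P3code")
    if row.get("Latitude") and not -90 <= float(row["Latitude"]) <= 90:
        problems.append("Latitude")
    if row.get("Longitude") and not -180 <= float(row["Longitude"]) <= 180:
        problems.append("Longitude")
    if row.get("Melanesia") and row["Melanesia"] not in {"yes", "no"}:
        problems.append("Melanesia")
    return problems


def melanesia_flag(country: str, in_western_new_guinea: bool = False) -> str:
    """Hypothetical helper: derive the Melanesia value from a primary country code."""
    if country in MELANESIAN_COUNTRIES or (country == "ID" and in_western_new_guinea):
        return "yes"
    return "no"
```

To restore the genealogical and geographical order, sort the rows with sorted(rows, key=lambda r: int(r["Number"])).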

Table contributions.csv

Forms for this study (i.e., words for the concepts ‘five’ and ‘hand’ in Austronesian languages) were taken from the four datasets listed in this table.

property | value
 --- | ---
dc:conformsTo | CLDF ContributionTable
dc:extent | 4

Columns

Name/Property | Datatype | Description
 --- | --- | ---
ID | string (regex: [a-zA-Z0-9_\-]+) | Primary key
Name | string |
Description | string |
Contributor | string |
Citation | string |
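Each form's Contribution_ID references contributions.csv::ID. This makes it possible to confirm that every form points to one of the four source datasets and to count how many forms each dataset supplied. The sketch below is a minimal example that works on rows which have already been parsed into dicts, for example with csv.DictReader. The function name is hypothetical.

```python
from collections import Counter


def forms_per_contribution(forms: list[dict], contribution_ids: set[str]) -> Counter:
    """Count forms per Contribution_ID, rejecting dangling references."""
    counts: Counter = Counter()
    for form in forms:
        cid = form["Contribution_ID"]
        if cid not in contribution_ids:
            raise KeyError(f"form {form['ID']} references unknown contribution {cid!r}")
        counts[cid] += 1
    return counts
```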

Table parameters.csv

This dataset provides three kinds of parameters:
1. the two concepts ‘hand’ and ‘five’, with the corresponding forms listed in FormTable;
2. six parameters analyzing the colexification status of these two concepts in Austronesian languages, with values listed in ValueTable;
3. one parameter replicating coding decisions about types of numeral systems, derived from Barlow (2023) but updated to reflect changes in classification between Glottolog versions 4.6 and 5.0, with values also listed in ValueTable.
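This paragraph does not describe the coding criteria used for the six colexification parameters. The sketch below is therefore only a naive illustration of the underlying idea, not the dataset's method. It checks whether a language has an identical form for both concepts, after trimming and case-folding only. The Parameter_IDs "hand" and "five" are assumptions, and so is the function name. A real analysis would need to consider cognacy and form variation rather than string identity.

```python
from typing import Optional


def naive_colexification(
    forms_by_parameter: dict[str, list[str]],
    hand: str = "hand",  # assumed Parameter_ID for 'hand'
    five: str = "five",  # assumed Parameter_ID for 'five'
) -> Optional[bool]:
    """Return True if any form is shared by both concepts, False if none is,
    and None if either concept has no forms (missing data)."""
    # Crude normalisation: trimming and case-folding only.
    hand_forms = {f.strip().casefold() for f in forms_by_parameter.get(hand, [])}
    five_forms = {f.strip().casefold() for f in forms_by_parameter.get(five, [])}
    if not hand_forms or not five_forms:
        return None
    return bool(hand_forms & five_forms)
```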

property | value
 --- | ---
dc:conformsTo | CLDF ParameterTable
dc:extent | 9

Columns

Name/Property | Datatype | Description
 --- | --- | ---
ID | string (regex: [a-zA-Z0-9_\-]+) | Primary key
Name | string |
Description | string |
ColumnSpec | json |
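In CLDF, ColumnSpec holds a JSON description of how a parameter's values should be typed. This README does not say which of the nine parameters use the column. The sketch below assumes that non-empty cells contain a CSVW column description. In CSVW, "datatype" is either a bare type name or an object with a "base" key, and the base defaults to string. The function names are hypothetical.

```python
import json
from typing import Any, Optional


def parse_column_spec(cell: str) -> Optional[dict[str, Any]]:
    """Parse a ColumnSpec cell; an empty cell means no spec."""
    return json.loads(cell) if cell.strip() else None


def base_datatype(spec: Optional[dict[str, Any]]) -> str:
    """Return the CSVW base datatype named in a spec, defaulting to 'string'."""
    if not spec or "datatype" not in spec:
        return "string"
    datatype = spec["datatype"]
    # CSVW allows either a bare name or an object with a 'base' key.
    return datatype if isinstance(datatype, str) else datatype.get("base", "string")
```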

Table codes.csv

property | value
 --- | ---
dc:conformsTo | CLDF CodeTable
dc:extent | 31

Columns

Name/Property | Datatype | Description
 --- | --- | ---
ID | string (regex: [a-zA-Z0-9_\-]+) | Primary key
Parameter_ID | string | The parameter or variable the code belongs to. References parameters.csv::ID
Name | string |
Description | string |
color | string |
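Values and codes both carry a Parameter_ID. A coded value should therefore point to a code that belongs to the same parameter. The sketch below is a minimal consistency check that works on already-parsed rows. It assumes that an empty Code_ID is allowed, as it would be for uncoded values. The function name is hypothetical.

```python
def mismatched_codes(values: list[dict], codes: list[dict]) -> list[str]:
    """Return IDs of values whose Code_ID is unknown or belongs to another parameter."""
    code_parameter = {code["ID"]: code["Parameter_ID"] for code in codes}
    bad = []
    for value in values:
        code_id = value.get("Code_ID")
        if not code_id:
            continue  # empty Code_ID is skipped (assumed allowed)
        if code_parameter.get(code_id) != value["Parameter_ID"]:
            bad.append(value["ID"])
    return bad
```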

This table lists coding decisions about “replacement events” affecting the words for ‘hand’ or ‘five’ in subgroups or single languages of the Austronesian family. For the concept ‘hand’, a row represents a probable loss of the inherited Proto-Austronesian form *qalima ‘hand’, either in the individual history of a single language or in a protolanguage ancestral to multiple languages. For the concept ‘five’, a row represents a probable loss of the inherited Proto-Austronesian form *lima ‘five’.

Replacement events are identified using a relatively conservative approach: a replacement event is reconstructed to a protolanguage only if there is strong evidence for it and there are no apparent exceptions (such as a reflex of *qalima ‘hand’ found in one or more member languages of the group).
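Only the “no apparent exceptions” half of this rule can be expressed as a set test. Whether the evidence is strong enough remains a scholarly judgement. The sketch below assumes three inputs, all hypothetical: a group's member languages, the members for which data exist, and the languages known to retain a reflex of the inherited form. The function name is also hypothetical.

```python
def conservative_event_ok(
    members: set[str], with_data: set[str], retains_reflex: set[str]
) -> bool:
    """Hypothetical check for the 'no apparent exceptions' condition."""
    attested = members & with_data
    # Require some data, and no attested member that keeps the inherited form.
    return bool(attested) and attested.isdisjoint(retains_reflex)
```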

property | value
 --- | ---
dc:extent | 189

Columns

Name/Property | Datatype | Description
 --- | --- | ---
ID | string | Primary key
Replacement_Group | string | Replacement events can also be identified using a more liberal approach: in some cases, they can be reconstructed to higher-order protolanguages or to multiple protolanguages in an area. This applies when the apparent exceptions may be due to subsequent borrowing, or when the “replacement event” could be viewed as a single areal spread across multiple languages or language groups. The “conservative” replacement events listed here are grouped into “liberal” events via matching values in the Replacement_Group column. If the conservative and liberal approaches do not differ for an event, that event forms a replacement group of its own.
Subgroup | string |
Comment | string |
Source | string |
Concept | string | References parameters.csv::ID
Language_IDs | list of string (separated by spaces) | References languages.csv::ID
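The Replacement_Group column describes how conservative events are merged into liberal ones. Grouping rows on that column reproduces the merge. The sketch below collects the event IDs and the union of affected languages for each group. It assumes that Language_IDs is space-separated, which matches the separator apparently lost in the original rendering. The function name is hypothetical.

```python
from collections import defaultdict


def liberal_groups(events: list[dict]) -> dict[str, dict]:
    """Merge conservative replacement events that share a Replacement_Group."""
    groups: dict[str, dict] = defaultdict(lambda: {"events": [], "languages": set()})
    for event in events:
        group = groups[event["Replacement_Group"]]
        group["events"].append(event["ID"])
        # Language_IDs is assumed to be a space-separated list.
        group["languages"].update((event.get("Language_IDs") or "").split())
    return dict(groups)
```

With this grouping, the number of liberal events is the number of keys in the result, and the number of conservative events is the number of rows.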