EGA File object Schema

https://raw.githubusercontent.com/EbiEga/ega-metadata-schema/main/schemas/EGA.experiment.json#/properties/experimentTypeSpecifications/properties/arrayExperiment/properties/adfFiles/items

Object containing the base metadata attributes of a file object in the EGA. These can be inherited elsewhere, with or without extending them.

| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Can be instantiated | No | Unknown status | No | Forbidden | Forbidden | none | EGA.experiment.json* |

items Type

object (EGA File object)

all of

items Properties

| Property | Type | Required | Nullable | Defined by |
| --- | --- | --- | --- | --- |
| filename | string | Required | cannot be null | EGA common metadata definitions |
| fileContent | array | Optional | cannot be null | EGA common metadata definitions |
| filetype | string | Required | cannot be null | EGA common metadata definitions |
| checksumMethod | string | Required | cannot be null | EGA common metadata definitions |
| unencryptedChecksum | Merged | Required | cannot be null | EGA common metadata definitions |
| encryptedChecksum | Merged | Required | cannot be null | EGA common metadata definitions |
| sequenceQualityDetails | object | Optional | cannot be null | EGA common metadata definitions |
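
As a rough illustration of the properties listed above, a file object can be written as a Python dictionary and checked for its required properties. This is only a sketch: `missing_required` and `example_file` are hypothetical names, the values are borrowed from the examples elsewhere in this document (with "BAM" and "MD5" picked from the enums), and a real submission should be validated against the JSON Schema itself.

```python
# Required property names, taken from the properties listing above.
REQUIRED = (
    "filename",
    "filetype",
    "checksumMethod",
    "unencryptedChecksum",
    "encryptedChecksum",
)


def missing_required(file_object: dict) -> list[str]:
    """Return the required property names that are absent or null."""
    return [name for name in REQUIRED if file_object.get(name) is None]


# Hypothetical file object assembled from this document's own example values.
example_file = {
    "filename": "my-bam-file.bam.gpg",
    "filetype": "BAM",
    "checksumMethod": "MD5",
    "unencryptedChecksum": "46798b5cfca45c46a84b7419f8b74735",
    "encryptedChecksum": "bc527343c7ffc103111f3a694b004e2f",
}

print(missing_required(example_file))
```

The optional properties (fileContent and sequenceQualityDetails) are simply left out here; when present, they cannot be null either.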

filename

The full name of a file, including all of its file extensions (e.g. .gpg, .md5...), that identifies the file (e.g. 'my-bam-file.bam.gpg').

filename

filename Type

string (Filename)

filename Constraints

pattern: the string must match the following regular expression:

^[^<>:;,?"*|]+$

filename Examples

"my-bam-file.bam.gpg"
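
The pattern constraint can be tried locally with a few lines of Python. This is a minimal sketch: `is_valid_filename` is a hypothetical helper, and Python's `re` module is assumed to behave like the schema's regular expression dialect for this simple character class.

```python
import re

# The pattern from the filename constraint: one or more characters,
# none of which may be < > : ; , ? " * or |
FILENAME_PATTERN = re.compile(r'^[^<>:;,?"*|]+$')


def is_valid_filename(name: str) -> bool:
    """Return True when the name matches the filename pattern."""
    return FILENAME_PATTERN.search(name) is not None


print(is_valid_filename("my-bam-file.bam.gpg"))  # accepted
print(is_valid_filename("bad:name.bam"))         # rejected because of ':'
```

Note that the empty string is rejected as well, since the pattern requires at least one character.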

fileContent

Array of file content items. This array exists to clarify what the purpose of a file, regardless of its format, may be. For example, a TXT formatted file could contain multiple types of data, from gene annotations to READMEs. Therefore, select the items from the ontology in use that best describe the content of your file.

fileContent

fileContent Type

object[] (File content item)

fileContent Constraints

minimum number of items: the minimum number of items for this array is: 1

unique items: all items in this array must be unique. Duplicates are not allowed.
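
The two array constraints can be sketched in Python as follows. `check_file_content` is a hypothetical helper, and comparing items through their key-sorted JSON serialisation is only an approximation of JSON Schema equality (for instance, it treats `1` and `1.0` as different). The structure of a file content item is not described in this section, so the function works on arbitrary JSON-like objects.

```python
import json


def check_file_content(items: list) -> list[str]:
    """Return a list of constraint violations (empty when none are found)."""
    problems = []
    # minItems: the array needs at least one item.
    if len(items) < 1:
        problems.append("fileContent must contain at least 1 item")
    # uniqueItems: serialise with sorted keys so that key order
    # does not make two equal objects look different.
    canonical = [json.dumps(item, sort_keys=True) for item in items]
    if len(set(canonical)) != len(canonical):
        problems.append("fileContent items must be unique")
    return problems
```

Two objects with the same keys and values in a different order count as duplicates under this check, which matches how object equality is usually understood for JSON.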

filetype

The main format in which data is structured and represented in an electronic file. It is normally indicated by the file extension (e.g. FASTQ for a '.fastq' file). The string corresponds to the ID or name (e.g. FASTA, TSV...) of the given filetype, chosen from a controlled vocabulary (CV) list. If you cannot find your term in the CV list, please create an issue at our metadata GitHub repository proposing its addition.

filetype

filetype Type

string (Filetype)

filetype Constraints

enum: the value of this property must be equal to one of the following values:

| Value | Explanation |
| --- | --- |
| "CEL" | [format:1638] |
| "TSV" | [format:3475] |
| "FASTQ" | [format:1930] |
| "FASTA" | [format:1929] |
| "VCF" | [format:3016] |
| "SRA" | [format:3698] |
| "SRF" | [format:3698] |
| "SFF" | [format:3284] |
| "BAM" | [format:2572] |
| "CRAM" | [format:3462] |
| "XLSX" | [format:3620] |
| "CSV" | [format:3752] |
| "BED" | [format:3003] |
| "IDAT" | [format:3578] |
| "MAP" | [format:3285] |
| "PED" | [format:3286] |
| "BIM" | [] |
| "FAM" | [] |
| "TXT" | [format:2330] |
| "EXP" | [format:1631] |
| "GPR" | [format:3829] |
| "PY" | [format:3996] |
| "SH" | [] |
| "ADF" | [NCIT:C172213] |
| "SDRF" | [NCIT:C172211] |
| "IDF" | [NCIT:C172212] |
| "MD5" | [data:2190] |
| "HAP" | [] |
| "CSFASTA" | [] |
| "LOC" | [] |
| "HTML" | [format:2331] |
| "HIC" | [] |
| "MD" | [] |
| "MATLAB" | [format:4007] |
| "PERL" | [format:3998] |
| "TIF" | [] |
| "R" | [format:3999] |
| "SNP" | [] |
| "XML" | [format:2332] |
| "SVG" | [format:3604] |
| "PNG" | [format:3603] |
| "JPG" | [format:3579] |
| "GTC" | []: An Illumina-specific file containing called genotypes in AA/AB/BB format |
| "HDF5" | [format:3590] |
| "FAST5" | [] |
| "PAIR" | [] |
| "TXT" | [format:2330] |
| "BGI" | []: Index file of a BGEN file |
| "BGEN" | []: Binary version of a GEN file |
| "GEN" | [format:3812] |
| "PXF" | []: A phenopacket. An open standard for sharing disease and phenotype information represented as PXF (Phenotype Exchange Format) files, which may be encoded in JSON or YAML. |
| "LOOM" | [format:3913] |
| "BAX.H5" | [] |
| "BAS.H5" | [] |
| "ASM" | []: The files in the ASM directory describe and annotate the genome assembly with respect to the reference genome. |
| "CSI" | [] |
| "TBI" | [format:3700] |
| "BCF" | [format:3020] |
| "qual454" | [format:3611] |
| "qualsolid" | [format:3610] |
| "FASTQ-illumina" | [format:1931] |
| "FASTQ-helicos" | [] |
| "FASTQ-sanger" | [format:1932] |
| "FASTQ-solexa" | [format:1933] |
| "SAM" | [format:2573] |
| "CRAI" | []: CRAM indexing format |
| "BAI" | [format:3327] |
| "MTX" | [format:3916] |
| "MEX " | []: Market Exchange Format (MEX) for sparse matrices. It contains a matrix (MTX) file, and also gzipped TSV files with feature and barcode sequences corresponding to row and column indices respectively. Feature-barcode matrix |
| "GMX" | [] |
| "GMT" | [] |
| "GRP" | [] |
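
Since the filetype is normally indicated by the file extension, a submitter might derive a candidate value from the filename and then confirm it against the enum. The sketch below is one possible approach, not part of the schema: `guess_filetype` is a hypothetical helper, `KNOWN_FILETYPES` is only a subset of the enum above, and treating `.gpg` and `.gz` as wrapper suffixes to skip is an assumption.

```python
from typing import Optional

# A subset of the filetype enum above, kept short for readability.
KNOWN_FILETYPES = {"BAM", "CRAM", "FASTQ", "FASTA", "VCF", "TSV", "CSV", "TXT"}

# Assumption: these suffixes wrap the data file (encryption, compression)
# rather than describe its format, so they are skipped.
WRAPPER_SUFFIXES = {".gpg", ".gz"}


def guess_filetype(filename: str) -> Optional[str]:
    """Guess a filetype value from the last non-wrapper extension."""
    parts = filename.split(".")
    # parts[0] is the base name; walk the extensions from right to left.
    for ext in reversed(parts[1:]):
        if "." + ext.lower() in WRAPPER_SUFFIXES:
            continue
        candidate = ext.upper()
        return candidate if candidate in KNOWN_FILETYPES else None
    return None


print(guess_filetype("my-bam-file.bam.gpg"))
```

A guess of this kind is only a starting point; the value that is finally submitted has to be one of the enum values exactly as written.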

checksumMethod

ID (MD5 or SHA-256) of the method that yields the checksum from a data input for the purpose of detecting errors. The term is chosen from a controlled vocabulary (CV) list. If you cannot find your term in the CV list, please create an issue at our metadata GitHub repository proposing its addition.

checksumMethod

checksumMethod Type

string (Checksum method ID)

checksumMethod Constraints

enum: the value of this property must be equal to one of the following values:

| Value | Explanation |
| --- | --- |
| "MD5" | [NCIT:C171276] |
| "SHA-256" | [NCIT:C80226] |

unencryptedChecksum

A computed value which depends on the contents of a block of data and which is transmitted or stored along with the data in order to detect corruption of the data, computed from the unencrypted files.

unencryptedChecksum

unencryptedChecksum Type

string (Checksum [NCIT:C43522] of the unencrypted file)

one (and only one) of

unencryptedChecksum Examples

"46798b5cfca45c46a84b7419f8b74735"

encryptedChecksum

A computed value which depends on the contents of a block of data and which is transmitted or stored along with the data in order to detect corruption of the data, computed from the encrypted files.

encryptedChecksum

encryptedChecksum Type

string (Checksum [NCIT:C43522] of the encrypted file)

one (and only one) of

encryptedChecksum Examples

"bc527343c7ffc103111f3a694b004e2f"
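
Both checksum values can be produced with Python's standard `hashlib` module. This is a minimal sketch under stated assumptions: `file_checksum` and `HASHLIB_NAMES` are hypothetical names, and the mapping from the checksumMethod values to `hashlib` algorithm names is ours, not part of the schema.

```python
import hashlib

# Assumed mapping from checksumMethod values to hashlib algorithm names.
HASHLIB_NAMES = {"MD5": "md5", "SHA-256": "sha256"}


def file_checksum(path: str, checksum_method: str = "MD5",
                  chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal checksum of a file, read in chunks."""
    digest = hashlib.new(HASHLIB_NAMES[checksum_method])
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running the function on the original file would give the unencryptedChecksum, and running it on the encrypted file (for example the `.gpg` file) would give the encryptedChecksum, with the same checksumMethod used for both.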

sequenceQualityDetails

Sequencing quality scores measure the probability that a base is called (i.e. sequenced) incorrectly. New sequencing technologies assign a quality score to each of the bases in the sequence.

sequenceQualityDetails

sequenceQualityDetails Type

object (Sequence quality details)
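
The description above does not name a particular scale. Assuming the widely used Phred scale, where the quality score Q and the error probability P are related by Q = -10 · log10(P), the conversion can be sketched as follows (both function names are hypothetical):

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred score for a base-call error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


# Under this assumption, a score of 30 means roughly a 1 in 1000
# chance that the base was called incorrectly.
print(phred_to_error_probability(30))
```

Higher scores therefore mean more reliable base calls: every 10 points lowers the error probability by a factor of ten.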