tlg0031.tlg002.INTF-grc1.xml #2819

Open
gcelano opened this issue Dec 18, 2024 · 4 comments


gcelano commented Dec 18, 2024

Unlike the other files, this one wraps each word in a <w> element. Is there a particular reason for that? The file also contains no punctuation. Is that correct?
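
Both observations can be checked mechanically. The following is a minimal sketch, assuming the file is well-formed XML; the function name `summarize_tokens` and the sample string are hypothetical and are not taken from the file under discussion.

```python
import unicodedata
import xml.etree.ElementTree as ET


def summarize_tokens(xml_text: str) -> dict:
    """Count <w> elements and punctuation characters in an XML document."""
    root = ET.fromstring(xml_text)
    w_count = 0
    for elem in root.iter():
        # Drop any namespace prefix such as "{http://www.tei-c.org/ns/1.0}w".
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "w":
            w_count += 1
    # Unicode general categories starting with "P" are punctuation.
    text = "".join(root.itertext())
    punct_count = sum(1 for ch in text if unicodedata.category(ch).startswith("P"))
    return {"w_elements": w_count, "punctuation_chars": punct_count}


# Hypothetical sample for illustration only.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><ab>'
    "<w>Ἀρχὴ</w> <w>τοῦ</w> <w>εὐαγγελίου</w>"
    "</ab></body></text></TEI>"
)
print(summarize_tokens(SAMPLE))
```

Note that this counts punctuation across the whole document, header included, so for a real file it would make sense to restrict the count to the text body.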

@lcerrato
Collaborator

This file is the responsibility of the contributing organization, the INTF. I believe it is a manuscript transcription, but it predates my involvement with the OGL.
The file lacks bibliographic information and credits, so I cannot offer any information on the source or the markup.

@AlisonBabeu
Collaborator

Hi @gcelano and @lcerrato, as far as I recall I never created metadata for this file because there wasn't enough info in the header. I believe this file and the related cop-1 file were uploaded by Matt as part of a special project.

@lcerrato
Collaborator

It appears to be only chapter 1 of Mark, not the entire book. The encoding is the responsibility of the organization or the editor.
If it is not useful in this format, we should reconnect with the INTF about cleaning it up, but I do not see a version on their site that we can point to for the missing info.

@lcerrato lcerrato added help wanted question Special case requires special handling labels Dec 18, 2024
Contributor

mgbilby commented Dec 23, 2024

If I can vote from afar, I'd recommend deleting the INTF version for the time being and piping in an EpiDoc-compliant transformation of the SBLGNT, bypassing the corresponding print work altogether. The SBLGNT is available under a CC-BY 4.0 license for all 27 books of the canonical NT.

Each work comes in two formats, txt (each verse on a separate row) and xml (each verse, word, and punctuation tokenized on a separate row, as in the snippet below).

I'm happy to give this a try and might see if Rick Brannan (formerly of Logos Bible Software) would like to lend a hand. ...

<verse-number id="Mark 1:1">1:1</verse-number>
    <w>Ἀρχὴ</w>
    <suffix></suffix>
    <w>τοῦ</w>
    <suffix></suffix>
    <w>εὐαγγελίου</w>
    <suffix></suffix>
    <w>Ἰησοῦ</w>
    <suffix></suffix>
    <prefix> ⸀</prefix>
    <w>χριστοῦ</w>
    <suffix>. </suffix>
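
To make the proposed transformation concrete, here is a rough sketch of how tokens like those in the snippet above could be mapped to TEI-style `<w>` and `<pc>` elements. Everything about the target shape is an assumption: wrapping the tokens in a `<p>` so the input parses, grouping each verse in an `<ab n="...">`, and skipping `<prefix>` (which carries text-critical sigla such as ⸀) are illustrative choices, and the output has not been validated against the EpiDoc schema. The function name `to_tei_verses` is hypothetical.

```python
import xml.etree.ElementTree as ET


def to_tei_verses(paragraph_xml: str) -> ET.Element:
    """Map SBLGNT-style tokens to <ab>/<w>/<pc> elements (illustrative only)."""
    src = ET.fromstring(paragraph_xml)
    out = ET.Element("div")
    verse = None
    for child in src:
        if child.tag == "verse-number":
            # Start a new verse container, keeping the "1:1" style label.
            verse = ET.SubElement(out, "ab", {"n": (child.text or "").strip()})
        elif verse is None:
            continue  # ignore tokens that appear before any verse marker
        elif child.tag == "w":
            ET.SubElement(verse, "w").text = child.text
        elif child.tag == "suffix":
            # Keep only real punctuation; whitespace-only suffixes are dropped.
            mark = (child.text or "").strip()
            if mark:
                ET.SubElement(verse, "pc").text = mark
        # <prefix> is skipped here; how to encode the sigla is left open.
    return out


# The snippet above, wrapped in <p> (an assumption) so that it parses.
SAMPLE = """<p>
<verse-number id="Mark 1:1">1:1</verse-number>
<w>Ἀρχὴ</w><suffix></suffix>
<w>τοῦ</w><suffix></suffix>
<w>εὐαγγελίου</w><suffix></suffix>
<w>Ἰησοῦ</w><suffix></suffix>
<prefix> ⸀</prefix><w>χριστοῦ</w><suffix>. </suffix>
</p>"""

print(ET.tostring(to_tei_verses(SAMPLE), encoding="unicode"))
```

A real conversion would also need a TEI header, the TEI namespace, and whatever citation structure the repository expects, none of which is attempted here.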
