Replies: 70 comments
-
From @dustymc: The "full" dump is computationally expensive and takes a lot of disk; I'm not sure we could support running that for everyone at any reasonable interval with current resources. Loans + accessions + projects have all those problems too, but there aren't very many of them and they don't change very often - we could probably pull that off without much problem.

FLAT is cheap and easy to query (that's why it exists!) but is missing a lot of information - eg, it contains only one locality per specimen. The DWC files contain full (except 'unaccepted') locality data, are in a standard exchange format, and we could probably share more data than we do. I suspect that's our best bet for a "lightweight backup," but I'd need to know more about the purpose of the backup to make that call.

The Oracle backups contain everything, we're already paying the cost to make them, you can read them with free software, and I think TACC has essentially unlimited bandwidth. Scattering them across more disks in more locations under the control of more organizations would definitely make me sleep better. I think all I need is an address and write credentials to make that happen.
-
The purpose of the backup would be to allow collections to maintain local flat file copies of their most critical data, sufficient to recover the majority of it if they decide to switch to a different platform or in case of catastrophic failure or downtime. Having this option would provide significant peace of mind to collections staff and admin, and would increase Arctos usability and marketability.
-
I would need table/column detail to proceed; I can't know what anyone considers critical. (I know what I would consider critical: an Oracle backup file, which contains the rules and structure in addition to the data.) I think a real-world use case would be very useful. DWC data are here: http://ipt.vertnet.org:8080/ipt/resource?r=msb_mamm
-
Yes, I can speak to the desire for this as well as the key fields needed. Are you looking for particular columns that are missing? Can you direct me to a single DWC extract sheet (where all the code table values are part of one spreadsheet)? I've been creating a column-matching sheet for migrating our data: from a MySQL extraction of the data on our local Specify-derived server, to how the data is originally entered (so we know we're extracting all necessary fields whilst using a DWC extract schema in Specify), then mapping that to Arctos fields. It will serve as a guide for the IT expert assisting in the migration. I can share that. This may require a phone call to be most effective if I'm missing some information or not addressing what you're asking. If there is a DWC schema (the column headings) that I can look at, I can tell you what key elements are missing, for our data purposes anyway.
-
Yes.
I don't think such a thing exists; no specimen will have eg, all Attributes, and many specimens are spread across multiple DWC:Occurrences.
The standard is at https://www.tdwg.org/standards/dwc/, but DynamicProperties makes it somewhat like Arctos in that it's not limited to a spreadsheet-like structure.
-
Is there a way, as part of an 'export all' function, to code "export all Attributes"? Here is the Column Matching sheet I mentioned. It starts by matching all fields that we have for mammal records extracted from the server database to what we enter in "flat sheet" data entry spreadsheets. Then those fields are mapped to how they must be entered into Arctos.
-
I'm not sure I'm understanding, but....

The Arctos specimen bulkloader is a greatly simplified view of the most common things shared among incoming specimens. You can see all current Attribute types at http://arctos.database.museum/info/ctDocumentation.cfm?table=CTATTRIBUTE_TYPE.

I can certainly export attributes; the question is how. As rows in an attachment, no problem. As structured data in a cell, MAYBE they'll fit now, but that won't necessarily last - attributes can hold about 8K, any specimen can have any number of them, and various tools have various length constraints. There are many other such data - eg, any specimen can have any number of parts, and any part can have any number of part attributes.

Lacking better ideas, those would likely include eg, "determiner: John Smith." Determining whether that's John Smith the expert or John Smith the dyslexic prankster needs a link back to Agents. In Arctos, an agent's old phone number (publications, relations to other entities, etc.) is very much a real part of "the attribute record." (Or in general, everything is a part of everything else.) In an export, if you want any of those "ancillary" data I'll need to know about it explicitly.

The simplest model we've found that's capable of carrying the complexity of the data is the one we use. The only backup I'm aware of from which that complexity could be recovered is the native Oracle backup.
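To make the rows-versus-cell tradeoff above concrete, here's a minimal SQL sketch. The ATTRIBUTES table and its columns are hypothetical stand-ins, not the real Arctos schema:

```sql
-- Shape 1: one row per attribute. No length ceiling, and the determiner
-- stays attached to the determination it belongs to.
SELECT guid, attribute_type, attribute_value, determiner, determined_date
FROM attributes
ORDER BY guid, attribute_type;

-- Shape 2: everything munged into one cell. This hits the ~4K/8K limits
-- as soon as a specimen accumulates enough determinations.
SELECT guid,
       LISTAGG(attribute_type || '=' || attribute_value, '; ')
         WITHIN GROUP (ORDER BY attribute_type) AS attributes_flat
FROM attributes
GROUP BY guid;
```

(Oracle's LISTAGG itself errors out past 4000 bytes, which is the same wall described above.)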
-
Dusty,

I think in this case what we need is as follows, in order from top to bottom of what may be feasible:

1) A large flat file exactly like what we upload with the specimen bulkloader or get via the specimen results download. We can't download this ourselves because of browser timeout issues; otherwise we would. This would include all of the possible specimen bulkloader data fields = all fields added from add/remove data fields in specimen results. (Ideally, these fields would download only if there are data to populate them.) Attributes would have a Determined by Agent and Determined date field, etc. Parts would include either a JSON string or, even better, be parsed out into columns as they would go into the bulkloader (including the barcode field). The event downloaded would be by default the most recent accepted event. The ID downloaded would be the most recent accepted ID (see the sketch after this list). Agents would be preferred name; obviously, they could not contain any other info from the agents table in this format. Accessions would be an included column. Citations would be an included column. Can we get a column for loans added as a general concatenated field and also embedded into the parts JSON? Is this possible?

2) Download data on multiple specimen events and ID history - how do we do this? As concatenated fields like the OTHER IDs? JSON? Multiple columns?

3) Download accessions and loans as lists.

4) Figure out a way to download a separate flattened file of part locations for all items in the collection - this could obviously be monstrous, but would be immensely helpful to have as a periodic backup/archive.

5) Figure out a way to download the full part location tree in print format - archivable?

6) Download the agents table into something that can be archived on local servers?

Anything I'm missing?
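A hedged sketch of item 1's "most recent accepted ID" rule, assuming a hypothetical IDENTIFICATIONS table with guid, scientific_name, accepted_fg, and made_date columns (the real Arctos names differ):

```sql
-- One row per specimen: the newest identification flagged as accepted.
SELECT guid, scientific_name
FROM (
  SELECT guid, scientific_name,
         ROW_NUMBER() OVER (PARTITION BY guid ORDER BY made_date DESC) AS rn
  FROM identifications
  WHERE accepted_fg = 1
)
WHERE rn = 1;
```

The same ROW_NUMBER-over-PARTITION pattern would cover the "most recent accepted event" rule in item 1 as well.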
-
> exactly like what we upload with the specimen bulkloader or get via the specimen results download

Those are wildly different things. Eg, one will deal with 10 (or whatever the number is) parts with "core" part-components broken out; the other will deal with any number of parts (attributes, identifications, etc.) but with the complexity concatenated in various ways. I can certainly flatten various stuff into various formats (and much of that exists as FLAT), but I need specifics that address the reality of the data.

> Attributes would have a Determined by Agent and Determined date field etc.

See above for the problems with merging them into structured data. I'm not sure how eg, 30 sex determinations might be munged into a spreadsheet - I suppose we could parse them out to sex_1 and sex_determiner_1 and such, but that would lead to a variable and indefinite number of columns.

> Parts would include either JSON string

Not all fit within the current limitations of Oracle. That'll get better soon, but it's just a bump from 4KB to 32KB - some may still not fit.

> even better, be parsed out into columns as they would go into the bulkloader

The specimen bulkloader can currently handle a ~dozen parts and no part attributes. The data can be many more parts, each with any number of attributes.

> The ID downloaded would be the most recent accepted ID.

That one we can do! (As long as you don't care about taxon-stuff that won't fit into FLAT.)

> column for loans added as a general concatenated field and also embedded into the parts JSON script? Is this possible?

That depends on what precisely you mean by "loans." If it's just a list of loan numbers or similar, probably. If you want more (loan data, results, involved parts, something for data loans, ....) then it likely won't easily fit.

> Citations would be an included column.

That's available, but it links to Arctos so isn't very suitable for many of your reasons. I can find a way around Oracle's datatype limitations (eg, write to files or CLOBs), but that would be computationally expensive (we can PROBABLY afford it), require a lot of disk, and I'm not sure what software would be capable of processing the results.

> The purpose of the backup would be to allow collections to maintain local flat file copies of their most critical data...

This approach does not seem useful for that to me. I don't think it's possible to flatten 'critical data' without significant loss (or perhaps significant liberties in defining "flat"!). If I were going to migrate Arctos data to any other platform, I would want to start with an Oracle backup file. Absolute worst case, I could pay a consultant for a few days to get what I want from it, whatever that might be. In the case of catastrophic failure, recovering from a fresh copy of the backups stored somewhere that wasn't affected by the fire/meteor/aliens/Texan Revolution (plus the stuff on GitHub) would be trivial. Recovering from anything else would be torturous. In the case of significant downtime, pulling Arctos up (eg, on some cloud service or at another .edu) from backups (plus GitHub) would be technically trivial, and mostly impossible from anything else that I can imagine.
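As an illustration of the sex_1/sex_determiner_1 idea and why it breaks down, here's a sketch over the same hypothetical ATTRIBUTES table as before; note the column list must be enumerated in advance, which is exactly the "variable and indefinite number of columns" problem:

```sql
-- Pivot the first two sex determinations into fixed columns. A specimen
-- with a 3rd determination silently loses it unless more columns are added.
SELECT guid,
       MAX(CASE WHEN rn = 1 THEN attribute_value END) AS sex_1,
       MAX(CASE WHEN rn = 1 THEN determiner END)      AS sex_determiner_1,
       MAX(CASE WHEN rn = 2 THEN attribute_value END) AS sex_2,
       MAX(CASE WHEN rn = 2 THEN determiner END)      AS sex_determiner_2
FROM (
  SELECT guid, attribute_value, determiner,
         ROW_NUMBER() OVER (PARTITION BY guid ORDER BY determined_date) AS rn
  FROM attributes
  WHERE attribute_type = 'sex'
)
GROUP BY guid;
```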
-
I guess the question would be: how did you extract the Cornell data to repatriate to them to go back into Specify? That would be the easiest scenario, because they didn't do anything with their data. Or maybe we didn't give them anything back after all that work?

This request is largely an assurance to potential users that if they hate us, they can go back to something non-Oracle, something like what they came in with, which is largely a collection of csv files. Think of it as part of marketing - it may not be structurally necessary in your view, but it is necessary psychologically and sociologically to get people to feel comfortable in our environment. It also allows users to maintain local backups just in case. I think we all want the latter, for the old "Dusty gets hit by a bus" catastrophe scenario. Don't do that, by the way, at least not until we get more funding :)

In my original request, this - bulkload file format with "core" part-components broken out - would be what I ideally would want, without the part and attribute limits. If this is computationally not possible, then concatenation in a format that would allow it to be parsed out later into a csv file with "core components broken out" would be acceptable. So, the specimen results download with various types of concatenation would be an OK replacement, although not ideal (I hate having to try to parse JSON into csv - but maybe I just don't know how).

Loans - a concatenated list of loan numbers is OK in flat, plus a separate download of loan list info from the transactions menu. Again, it would be ideal to have the parts download show loan relationships as well in some way - back to JSON?

Citations - OK even without the external links.

We accept that there will be loss of data in this format. But recovering some data is better than losing all of it, which is what happens when Arctos, or even our local internet, goes down.

I would be happy to help go through field by field to decide on data concatenation etc. if that is what it takes. Google spreadsheet?
-
The "full" dump (=tables)
From an Oracle dump:
From anything else:
I don't think a flatfile "export" is a bad idea, but I do think it should come with some sort of explicit explanation of where it came from and what its limitations are.
Are you buying a bus?!
Yes, I think that's what it's going to take. Here's a sample of what's easiest to get to.
|
-
Maybe an Oracle dump is good too, but how much expertise is required to clean up the output enough that a student could comprehend it or use it for subsequent data entry?

I like the idea of the flat file, at least as a complementary approach. @dkrejsa@angelo.edu let's look over this flatbits file as a start.
-
Nothing that hasn't been on stackoverflow a million times anyway, and the container describes the data. The front-end is on github - it's not too hard to build a clone of Arctos from an Oracle dump and a git pull either. I think it would just be a different type of expertise required to interpret a flatfile, assuming it contains what's needed to do whatever you'd be doing. Here's another precompiled flat view of some data - this one will be much better at locality data, but doesn't contain any encumbered data. I'm not sure which one (if either) might be more useful.
-
temp_flatbits_missing values.xlsx I've looked over both flatbits files. The second tab on the attached has the column headings transposed next to each other ("Comparison"). I looked at them with our data in mind and made additions in red at the bottom of the columns for what I think they're lacking or could benefit from. There are some fields that sound like they'd contain similar information - INFORMATIONWITHHELD and ENCUMBRANCES, for example. Is there a way to query all the available fields but export only the ones with values? Or would there be too many redundancies and too much processing time?
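One way to approach "export only fields with values" is to count the non-null values per column first and drop the empty ones from the extract. A hedged sketch using just the two columns named above; the full FLAT column list would have to be enumerated (or pulled from the data dictionary):

```sql
-- COUNT(col) counts only non-null values, so a zero means the column
-- is safe to omit from this collection's export.
SELECT COUNT(informationwithheld) AS informationwithheld_n,
       COUNT(encumbrances)        AS encumbrances_n
FROM flat
WHERE guid LIKE 'ASNHC:Mamm:%';
```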
-
So if I understand this correctly, all the attribute information would be concatenated into a single field, without dates or determiners, correct? So ATTRIBUTE could be: "ATTRIBUTE: sex = male, age class = adult, reproductive info = scrotal, t = 4 x 2". Is that correct?

And would preparator number, collector number, etc. be in a similar concatenated field of OTHER IDs?

I agree with Dianna that it would be great if we could have a download of only those fields that are populated with data. I would also want all the fields that can currently be downloaded in the specimen results view using add/remove data fields to be options for download, either as individual columns or as concatenated fields.
-
That's a question for those who want flat extracts. I see no way in which they can avoid being lossy, so they can be of ~no value to me.

I keep hearing things like "loans." That could mean "partial dump of table LOAN," or it could need to include the 3rd phone number of the 9th preparator of specimens related to specimens from which parts were loaned - data which is in, and easily understandable and recoverable from, a DB export. I doubt you want that specific chunk of data, but there's an infinite amount of data which could be critical to certain tasks (understanding what was used for or intended by a citation in a publication, for example) that's an equal distance from table LOAN.

A request for flat files is essentially a request to discard information; I need to know precisely what you don't want to toss, and how you want it arranged.
-
From the very narrow perspective of what I want, for a start I want just what goes in the specimen bulkloader, plus preparators/prep num/other IDs and parts. It would be very close to what is available in the fields of the data tools download; just a few more fields would need to be added to those available. How would you like the information presented to you/the group for further editing? Columns à la the bulkloader (more long-form) or à la the download data tool? Then we can add a "wish list" of aspects others may want (e.g., loans) and figure out what parts of those wish list items can be added?
-
I would need to know what to do with the 11th collector, 13th attribute, 2nd specimen-event, etc. (And implicit agreement that strings are sufficient for your purposes - eg, the only thing you care about regarding agents is preferred_name; all other agent data can be discarded for this.)

Those are covered by "what goes in the specimen bulkloader."

I need specifics; there can be any number of otherIDs, and parts have an additional dimension for part attributes.

I don't know, perhaps because I'm having difficulty understanding the purpose. Maybe manually munge whatever you want of a record into a CSV file as an example? That seems a fairly painful way to approach this, but it would let me request adding another record when something doesn't fit - maybe it would provide an effective means of communication.
-
Alright, attached is a stab at a beginning template. I started with the data within a bulkloader file for ASNHC:Mamm:20000. I added columns for other common data, as well as example fields that Mariel and Dusty exported before (something I had saved off as temp_flatbits_missingvalues; not sure that name would ring any bells for what y'all did to export those fields in the past). The third row includes fields that might be sunk within the column above them. At the end of the series of columns I added "Loans?" simply because I imagine someone will want that data exportable in some capacity.
-
Excellent, thanks! I pulled that into https://docs.google.com/spreadsheets/d/1caZi8YvjKtMIklVSnlnfG3BdD1WQ3rQMUQgNGzlZqbA/edit#gid=1443094724 and anyone can edit. I made some preliminary comments. Essentially I'd need more detail: what precisely do you mean by "sex" (for example), and if there are 13972 determinations then how would you like them handled?
-
Cool! I wrote some responses to these, but other folks should take a look since I don't necessarily have a stake in every field (or know the full usage someone might require of them). When going through, one thought I had was making it a multi-page export process: data managers select which database they manage and want exported; then a locality page where they check which aspects of locality they want for those records; then an attributes page with all options from the attributes code tables, where they check which ones they want to export data from; and so on. Kind of like the download data tool, but with more options?
-
OK - I am going to say what I think I have been saying all along: a complete export should be more than one file. Here is what you need (stuff in parens are the columns for each file, not comprehensive at this point...)

What have I forgotten? This is going to give you "your" data in a way that could be related back together, so that you could re-create stuff in Arctos with bulkloader tools. It will not be usable as Arctos, but that isn't what we are after here, is it? Each file is going to include one row of data for each "thing," so if you have an object with multiple identifications, you are going to have more than one row using that GUID in column one. This is what you are going to need if you want to import the data into something else. If object tracking is used, a file for BARCODE (BARCODE, PARENT_BARCODE) would be needed as well, and maybe something else I am missing.
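A sketch of what those per-file extracts might look like - one query per output file, each keyed by GUID so the rows can be re-related after export. All table and column names here are hypothetical placeholders, not the real schema:

```sql
-- catalog.csv: one row per cataloged item.
SELECT guid, catalog_number, collection FROM catalog_records;

-- identifications.csv: one row per identification, so GUIDs repeat.
SELECT guid, scientific_name, identified_by, made_date FROM identifications;

-- parts.csv: one row per part, with its barcode if object tracking is used.
SELECT guid, part_name, disposition, barcode FROM parts;

-- barcodes.csv: the container tree as child -> parent pairs.
SELECT barcode, parent_barcode FROM containers;
```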
-
Probably something - "complete export" still seems a very wrong description - but I think that's closer to achievable, and more useful, than trying to pretend that Arctos is a giant spreadsheet can be. This is getting closer to a DB dump, which includes everything you've mentioned plus whatever you've forgotten, and includes assembly instructions in a language that both computers and people can understand.
-
Yes, absolutely, this is what I have been trying to request. We also need a file for BARCODE (BARCODE, PARENT_BARCODE), or better yet: Part Location Path.

A DB is fine as long as it includes files that can be opened in spreadsheets.

I have a server... ready to move MSB data there now.
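The Part Location Path could, in principle, be derived from the BARCODE/PARENT_BARCODE file itself; a Postgres-flavored sketch over a hypothetical CONTAINERS table:

```sql
-- Walk each container down from the roots, concatenating barcodes
-- into a readable path.
WITH RECURSIVE located (barcode, path) AS (
  SELECT barcode, barcode::text
  FROM containers
  WHERE parent_barcode IS NULL
  UNION ALL
  SELECT c.barcode, l.path || ' / ' || c.barcode
  FROM containers c
  JOIN located l ON c.parent_barcode = l.barcode
)
SELECT barcode, path FROM located;
```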
-
Given that @mkoo asked this of a potential incoming collection yesterday: "If you can get your data out of Specify..." We really need to think about how this would work when a collection eventually decides to leave Arctos.
-
Agree.
-
In the past I've provided their parts of tables as CSV. Happy to discuss more, but I don't think this is going to go anywhere without some actionable specification - eg, "LOAN fields" could literally be almost anything; I would need specifics to act.
-
Wondering how https://github.com/ArctosDB/internal/issues/168 would make this "easier"?
-
Just putting this here. Symbiota allows for download of certain tables along with their basic "occurrence record". Maybe we could do something like this? Basic catalog record, plus parts table, identifiers table, identifications table, attributes table, events table - all zipped up. FWIW, I downloaded the DwC for all UTEP:Herb records and it took a while, but it didn't time out.
-
Symbiota is built on an exchange standard, there are no useful analogies between it and Arctos.
https://github.com/ArctosDB/internal/issues/260 would make that a lot more reliable (and allow you to make the request to vn's hardware).
Timeouts exist to protect the system (and aren't very effective at this when something like this is involved - pg's copy function can overload the VM faster than it can produce the error meant to save itself). There's an issue somewhere; the capabilities are purposeful, so I'm not quite complaining, but you can absolutely kill Arctos that way.

I've pgified my 'export a collection' scripts since this was started (for @jebrad), but I'm very hesitant to try to automate them for the reasons above. I'm also struggling with this on #6018.

All of this of course still suffers from the limitations above - some tables, in whatever format, don't include the language necessary to really understand the data. If we need backups as a button, then I probably need a dedicated VM for it.
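For reference, the pg-side export mentioned above looks something like the sketch below. COPY is Postgres's real bulk-export command; the table filter and output path are placeholders:

```sql
-- Server-side bulk export of one collection's FLAT rows to CSV.
-- This is exactly the kind of statement that can outrun the timeouts,
-- which is why it's risky to expose as a self-serve button.
COPY (SELECT * FROM flat WHERE guid LIKE 'UTEP:Herb:%')
  TO '/backups/utep_herb_flat.csv' WITH (FORMAT csv, HEADER);
```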
-
Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html
Is your feature request related to a problem? Please describe.
We have received repeated inquiries from potential new collections and existing collections as to whether collections data can be exported from Arctos for backup or migration to a different platform.
Describe the solution you'd like
We currently allow export of flat file data as specimen search results through Arctos and DWC fields through external aggregators. Perhaps provide the option of a regular, automated export of these data, ftp'd to a particular server?
Additionally, we could add options for separate, linked downloads of transactions, projects, and citations (by collection?), plus object tracking (show all objects in this container; flatten?).
Also explore the option of local Oracle backups, by collection or for all of Arctos?