This repository has been archived by the owner on Jun 28, 2021. It is now read-only.
Is your feature request related to a problem? Please describe.
When the column headers differ between documents and the header row sits on a different line in each one (each document has a different "intro"), I have to parse once to find the header and then parse again to work from that line.
The distinction between "from" and "from_line" also becomes confusing when the header is not the first row.
Describe the solution you'd like
I think that, besides a static row number, there should be an option to pass a function that returns true (or an array of column names) when it detects the header row; parsing would then continue from that row on, with the object fields named after the headers.
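A minimal sketch of the wished-for behaviour, in plain JavaScript rather than csv-parse itself (the function and the `isHeader` predicate are made up for illustration):

```javascript
// Illustration only, not csv-parse API: scan already-split rows, call a
// user-supplied predicate until it identifies the header row, then emit
// every later row as an object keyed by that header.
function parseWithHeaderDetection(rows, isHeader) {
  const result = [];
  let columns = null;
  for (const row of rows) {
    if (columns === null) {
      const detected = isHeader(row);
      if (detected === true) columns = row;                 // row is the header
      else if (Array.isArray(detected)) columns = detected; // explicit names
      continue;                                             // skip intro + header
    }
    const record = {};
    columns.forEach((name, i) => { record[name] = row[i]; });
    result.push(record);
  }
  return result;
}

// Usage: treat the first multi-cell, all-non-numeric row as the header.
const rows = [
  ['line to skip 1'],
  ['a', 'b', 'c'],
  ['1', '2', '3'],
  ['5', '6', '7'],
];
const records = parseWithHeaderDetection(
  rows,
  (row) => row.length > 1 && row.every((cell) => isNaN(Number(cell)))
);
// records[0] → { a: '1', b: '2', c: '3' }
```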
Describe alternatives you've considered
I've tried a lot of workarounds. One used on_record with a global isHeader boolean plus columns to convert rows to objects manually, but that felt like reinventing what the library already does well (when the header is in a fixed position).
The second workaround was to parse once to locate the header and then start over parsing from that line. That works, but the code is much more complicated and I can no longer stream the file in a single pass.
Let me know how much of a problem header detection would be to implement, and, if you are accepting PRs, what conditions/requirements would get one accepted without much back and forth - maybe I could contribute. Whatever works :-)
Aside from that - great job and awesome library, guys! It has helped me a lot with a LOT of huge and nasty CSV files. :-)
If I understood correctly, the columns option can already be defined as an array, see those 3 tests. It is also documented. Does this answer your need, or did I read your request too fast? Please provide a small sample of what you wish for to ease my understanding.
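For context, what a columns-as-array option amounts to (re-implemented here in plain JavaScript for illustration, not the library's code) is mapping each data row onto a fixed, caller-supplied list of names - which assumes the names and their position are known up front:

```javascript
// Illustration only: apply a fixed column list to data rows, the way a
// columns: ['a', 'b', 'c'] option would. Note this does not skip intro
// rows or detect where the header is, which is the gap this issue describes.
function applyColumns(dataRows, columns) {
  return dataRows.map((row) =>
    Object.fromEntries(columns.map((name, i) => [name, row[i]]))
  );
}

const out = applyColumns([['1', '2', '3'], ['5', '6', '7']], ['a', 'b', 'c']);
// out[0] → { a: '1', b: '2', c: '3' }
```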
As @ev45ive described, we need to skip from 1 to 8 rows before reaching the header position.
In addition, the column order may differ from one file to another: it can be a,b,c; a,b,c,d; or a,c,b.
Currently the only solution we could find was the same as described above:
parse the file once to find the header position
and parse it again starting from the header position found previously.
Because we are managing large files, we must ensure that only a small section of the file is computed during the first operation (with from_line, to_line).
Here's a small illustration of the files we may have to deal with for the same parser:
```
line to skip 1
a,b,c
1,2,3
5,6,7
```

```
line to skip 1
line to skip 2
a,c,b
1,3,2
5,7,6
```

```
line to skip 1
line to skip 2
a,b,c,d
1,2,3,4
5,6,7,8
```
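The two-pass workaround described above could be sketched like this in plain JavaScript (csv-parse itself is not used; `findHeader` and `parseFrom` are hypothetical helpers, and the point is the control flow, with the first pass bounded the way from_line/to_line would bound it):

```javascript
// Pass 1: look at no more than maxScan lines to locate the header and its
// column names. For these sample files the header is the first line whose
// first cell is the known column "a".
function findHeader(lines, maxScan) {
  for (let i = 0; i < Math.min(maxScan, lines.length); i++) {
    const cells = lines[i].split(',');
    if (cells[0] === 'a') return { line: i, columns: cells };
  }
  throw new Error('no header found in the first ' + maxScan + ' lines');
}

// Pass 2: re-read from the line after the header and build records. The
// a,b,c / a,c,b / a,b,c,d orderings all work because the names come from
// the detected header, not from a fixed list.
function parseFrom(lines, header) {
  return lines.slice(header.line + 1).map((line) => {
    const cells = line.split(',');
    return Object.fromEntries(
      header.columns.map((name, i) => [name, cells[i]])
    );
  });
}

const file = ['line to skip 1', 'line to skip 2', 'a,c,b', '1,3,2', '5,7,6'];
const header = findHeader(file, 8); // 8 = worst-case intro length
const records = parseFrom(file, header);
// records[0] → { a: '1', c: '3', b: '2' }
```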