Classified fields in Perceval docs should correspond to the fields actually removed #611

Open
valeriocos opened this issue Feb 6, 2020 · 4 comments


@valeriocos
Member

When the github backend is executed with the option --filter-classified, the full list of classified fields is reported in every JSON document (pointer). As a result, it isn't possible to tell which fields were actually removed from a given document when that document never contained some of the classified fields. It would be useful to adapt the code so that classified_fields_filtered includes only the fields that have been removed.

@sduenas
Member

sduenas commented Feb 6, 2020

The idea of these fields is all or nothing. You don't decide which fields you want to remove and which you don't want.

Is there anything I'm missing?

@valeriocos
Member Author

Based on the classified fields declared at https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/github.py#L102, for each document only the classified fields actually removed from it should appear in the classified_fields_filtered attribute.

Looking at the code, what I'm saying is that when a KeyError is raised, that classified field shouldn't be added to the classified_fields_filtered attribute. The reason is that, given the resulting document, it isn't possible to know whether that classified field was removed or never existed.
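
To make the proposal concrete, here is a minimal sketch of the behaviour described above. It is not the actual Perceval implementation: the function name `filter_classified`, the layout of the item (a `data` dict plus a top-level `classified_fields_filtered` list), and the dotted-path representation are assumptions for illustration only.

```python
def filter_classified(item, classified_fields):
    """Drop classified fields from item["data"], recording only those removed.

    Hypothetical helper, not Perceval's API. `classified_fields` is assumed
    to be a list of key paths, e.g. [["user_data"], ["comments", "author"]].
    """
    removed = []
    for path in classified_fields:
        node = item["data"]
        try:
            # Walk down to the parent of the field to remove.
            for key in path[:-1]:
                node = node[key]
            del node[path[-1]]
        except (KeyError, TypeError):
            # Field not present in this document (or parent is not a dict):
            # nothing was removed, so it is not reported.
            continue
        removed.append(".".join(path))
    item["classified_fields_filtered"] = removed
    return item
```

With this approach, a document that never contained `merged_by_data` would not list it in `classified_fields_filtered`, so the attribute would reflect what was actually stripped from that document.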

@sduenas
Member

sduenas commented Feb 6, 2020

Is that really necessary? What would be the difference? In the end, the data is not going to be there, which is what we really want with that option.

@valeriocos
Member Author

We can live with it, so feel free to close this issue.

The point is that the classified fields at https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/github.py#L102 include a mix of attributes present in issues and pull requests. It would be better to have classified fields per category; then we can decide whether the classified_fields_filtered of each document should include only the fields actually removed.
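
As a rough illustration of the per-category idea, the single list could become a mapping keyed by category. This is only a sketch: the mapping name, the helper `classified_fields_for`, and the field names assigned to each category are placeholders, not the contents of the real backend.

```python
# Hypothetical per-category declaration; field names are illustrative only.
CLASSIFIED_FIELDS_BY_CATEGORY = {
    "issue": [["user_data"], ["assignee_data"]],
    "pull_request": [["user_data"], ["merged_by_data"]],
}


def classified_fields_for(category):
    """Return the classified field paths declared for a category.

    Unknown categories yield an empty list, meaning nothing is filtered.
    """
    return CLASSIFIED_FIELDS_BY_CATEGORY.get(category, [])
```

Each document would then only ever report fields that can exist in its own category, which already removes most of the ambiguity between "removed" and "never present".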


2 participants