Column Names and Translation Dictionaries #68
Comments
I think this would be a very good addition to PVAnalytics! It seems almost indispensable for any kind of automated analysis. I like the |
@bt- is the scope to host translation tools and also a library of known translation dicts? |
@cwhanse, I am thinking primarily of hosting tools to create "translation dictionaries", where the translation dictionary is the mapping from measurement category id to groups of column names. But I do think there should also be a library of dictionaries to facilitate renaming columns. As an example, this would be helpful for renaming data from AlsoEnergy where they seem to be consistent in using @wfvining, when I review the library overview the intuitive location to me is under |
I was thinking about this in terms of quality control on the column names, not so much about identifying which sensors/equipment exist. I could see it going in |
Maybe an |
When I read |
I'm having trouble thinking of a good name that encompasses the renaming and the grouping functionalities without falling back to a name like What about one of these options: Or, I could see
I envision having renamed data and the translation dictionary exported being helpful if you wanted to use them in Pecos in one workflow/notebook and then use the same translation dictionary again in pvcaptest or other workflow. Maybe: |
We already have a |
@cwhanse and @wfvining, I'm considering whether to pull some of the functionality of pvcaptest out into a separate package, and I'd like your feedback on whether pvanalytics might be a good place for it or whether I should create a new package.
There are two closely related features that I'm considering pulling out:
A substantial amount of pvcaptest functionality depends on having a translation dictionary (CapData.column_groups). This approach was originally inspired by the Pecos package. Pecos supports the translation dictionary concept but doesn't generate the dictionaries.
The more performance engineering work I do, especially on tests with longer time frames, the more I think it would be valuable to use Pecos. To facilitate this, I think it would make sense to move automatic translation dictionary generation out of pvcaptest into a more general-purpose package (pvanalytics?) that can output a translation dictionary usable in both pvcaptest and Pecos.
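To make the idea concrete, here is a minimal sketch of a translation dictionary and of passing it between workflows as JSON. The category ids and column names are made up for illustration, and nothing here is the actual pvcaptest or Pecos API.

```python
import json

# Hypothetical translation dictionary: measurement category id -> column names.
# Both the ids and the column names are illustrative assumptions.
translation = {
    "irr_poa": ["POA Irradiance 1", "POA Irradiance 2"],
    "temp_amb": ["Ambient Temp"],
    "real_pwr_mtr": ["Meter Power kW"],
}

# Serialize once in the workflow that generates the dictionary...
serialized = json.dumps(translation, indent=2, sort_keys=True)

# ...and load it again in another workflow or notebook.
restored = json.loads(serialized)

# Look up every column that belongs to one measurement category.
poa_columns = restored["irr_poa"]
```

Because the structure is a plain mapping of strings to lists of strings, any package that accepts a Python dict could consume it after loading.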
The pvcaptest code that generates translation dictionaries is contained in the translation dictionaries, the group_columns function, and the __series_type function. This algorithm works surprisingly well given how rudimentary it is, but it could be improved substantially.
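As a rough illustration of the kind of keyword matching I mean, here is a simplified sketch. It is not the pvcaptest implementation; the function name, keyword table, and category ids are all hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword table: category id -> substrings that suggest it.
TYPE_KEYWORDS = {
    "irr": ["irradiance", "poa", "ghi"],
    "temp": ["temp"],
    "wind": ["wind"],
    "real_pwr": ["power", "kw"],
}


def group_columns_sketch(column_names, type_keywords=TYPE_KEYWORDS):
    """Map each measurement category id to the column names that match it."""
    groups = defaultdict(list)
    for name in column_names:
        lowered = name.lower()
        # The first category with a matching keyword wins;
        # columns that match nothing are collected under "unknown".
        category = next(
            (cat for cat, words in type_keywords.items()
             if any(word in lowered for word in words)),
            "unknown",
        )
        groups[category].append(name)
    return dict(groups)


columns = [
    "POA Irradiance 1",
    "POA Irradiance 2",
    "Ambient Temp",
    "Meter Power kW",
    "Inverter 1 Status",
]
translation = group_columns_sketch(columns)
```

Plain substring matching like this is easy to fool (for example, "kw" also matches "kWh"), which is part of why consistent column names matter so much.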
I started on the tools to rename columns because of how much variety there is in column names across a wide range of DAS/SCADA vendors and projects. I think renaming has to be the first step toward getting reliable results from the algorithm that automatically generates the translation dictionary.
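A minimal sketch of the renaming step, assuming a per-vendor rename dictionary. The function name and the vendor column names are invented for illustration; with a pandas DataFrame, `df.rename(columns=vendor_map)` would apply the same mapping.

```python
def rename_columns_sketch(column_names, rename_map):
    """Return new column names, leaving names absent from rename_map unchanged."""
    renamed = [rename_map.get(name, name) for name in column_names]
    # Guard against two source columns collapsing onto the same new name.
    duplicates = {name for name in renamed if renamed.count(name) > 1}
    if duplicates:
        raise ValueError(
            f"rename would create duplicate columns: {sorted(duplicates)}"
        )
    return renamed


# Hypothetical vendor export names mapped to one consistent convention.
vendor_map = {
    "Pyranometer (POA) W/m2": "irr_poa_1",
    "AmbTmp degC": "temp_amb_1",
}
raw = ["Pyranometer (POA) W/m2", "AmbTmp degC", "Meter Power kW"]
clean = rename_columns_sketch(raw, vendor_map)
```

A library of such dictionaries, one per DAS/SCADA vendor, is what I have in mind for the renaming side.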
Look forward to hearing your thoughts!