Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comments & questions on Luquillo CZO database used by ODM2-Admin #15

Open
emiliom opened this issue Jul 22, 2016 · 22 comments
Open

Comments & questions on Luquillo CZO database used by ODM2-Admin #15

emiliom opened this issue Jul 22, 2016 · 22 comments

Comments

@emiliom
Copy link
Member

emiliom commented Jul 22, 2016

@miguelcleon, I'm dumping a long list of my comments and questions in this single issue. I had already typed this in a local markdown document, so it's easiest for me to just dump it all in one place. We can split it off later. I'm pinging @lsetiawan (Don), so he's in the discussion.

When trying to restore the database backup file (using PgAdmin) on my laptop (Ubuntu 14.04, Postgresql 9.3, PostGIS 2.1), the database was installed with a couple of severe gaps. The problem boiled down to PostGIS failing to install, which in turn prevented the samplingfeatures table from installing.

PostGIS installation

  • To get around the PostGIS installation failure I first created a blank database using a template database that already has my standard PostGIS installation. Then I restored (using PgAdmin) from your backup file into that database. The CREATE EXTENSION PostGIS command in your backup probably checks if the extension exists already, and if it does, it doesn't try to install it.
  • Don and Miguel were able to install the dabase from Miguel's DB backup. In Don's case, PostGIS 2.2 was installed; in Miguel's case, 2.1.x
  • Unnecessary, postgis-related extensions are installed: topology and tiger extensions are in the backup database. They are not needed at all. I think they pollute the database instance and add to the complexity of installations.
  • ME: think about incorporating my PostGIS / samplingfeatures "additional" commands (index, dim constrain, etc)

database structure

  • A custom schema has been created, odm2extra. It holds 3 custom tables: Measurementresultvaluefile, featureactionsNames and processdataloggerfile. Note that the names of two of these are defined to be case-sensitive, which is inconsistent with general ODM2 PostgreSQL conventions
  • The custom table Measurementresultvaluefile is present in two schemas, odm2 and odm2extra. That doesn't seem right.
  • A custom function was created in the odm2 schema: MeasurementResultValsToResultsCountvalue
  • Django user and authorization management tables
    • A number of tables have been added for this purpose. Per Don, they're probably all created by Django. The tables are: auth_group, auth_group_permissions, auth_permission, auth_user, auth_user_groups, auth_user_user_permissions, django_admin_log, django_content_type, django_migrations, django_session
    • Ideally they should be placed in a separate schema, not under the default odm2 schema. I think it'd be best to have a dedicate schema such as odm2admin_auth or django_auth. But they could also go under odm2extra

table usage (records, implicit conventions, issues, etc)

samplingfeatures and related Sampling Features entities

  • Problems with the geospatial encoding (featuregeometry) for many of the records:
    • No lat-lon: 136 records have (0,0) for the lat-lon encoding
    • (Update: this has been fixed) Swapped lat-lon: 2 records have lat-lon values that are obviously wrong and simply swapped
    • query to select these records:
      sql select st_astext(featuregeometry), * from odm2.samplingfeatures where st_x(featuregeometry) > -1 order by samplingfeatureid
  • Some sampling features seem to be generic classifiers rather than actual features with geospatial information; eg, "Tabonuco forest"
  • Many feature relationships are used (relatedfeatures). It'll be helpful to learn more about what these are. An interesting example is from the relationship of samplingfeatureid 768, 769, 656.
  • sites and specimens are not used (no records)

result types

  • based on record counts in <resulttype>results tables, only two result types are being used: measurement and profile. But a count of results.resulttypecv entries yields two different result types: "Measurement" (12,192) and "Time series coverage" (199).
  • <resulttype>results table record counts: measurementresult (205) and profileresult (12,113). No records in timeseriesresult
  • These inconsistencies should create problems, in principle

actions

  • actions are probably "coarse" and aggregated, b/c there are only 29 actions records. Here's their distribution by actiontypecv:
actiontypecv count
Data retrieval 1
Equipment maintenance 1
Estimation 1
Instrument calibration 1
Observation 12
Specimen analysis 13
  • relatedactions is not used

citations

  • In citations table (has 159 records), the vast majority of entries include the authors and year in the title column; that column is intended to be limited to the title per se. But note that authorlists is also used extensively (has 278 records), so the author information may already be properly captured, and authors in titles could be removed.
@lsetiawan
Copy link
Member

@emiliom One thing I noticed about the topology and tiger extensions is that they are part of the blank ODM2 PostgreSQL

@emiliom
Copy link
Member Author

emiliom commented Jul 22, 2016

One thing I noticed about the topology and tiger extensions is that they are part of the blank ODM2 PostgreSQL

That's an important observation, @lsetiawan. I need to look into that and work on eliminating it there.

@miguelcleon
Copy link
Member

yup, I have my time series data in the measurement results and measurement results values tables instead of the time series results and times series result values tables. That is a problem I'll need to address and is systematic issue with ODM2 Admin.

@miguelcleon
Copy link
Member

miguelcleon commented Jul 22, 2016

sampling features 767,768, 769, 648,649,650 are 'part of' 656. They are all separate soil pits dug at a single larger site named TABOX-12.

Sounds like maybe your suggesting using the sites table for something like this?

@miguelcleon
Copy link
Member

I've fixed the two records with the swapped lat-long, thank you.

@emiliom
Copy link
Member Author

emiliom commented Jul 22, 2016

sampling features 767,768, 769, 648,649,650 are 'part of' 656. They are all separate soil pits dug at a single larger site named TABOX-12.

Thanks for that clarification. After exploring that particular relationship, I have one observation: relatedfeatures has many duplicated records. For example, the "Is part of" relationship between sampling features 767 and relatedfeatureid 656 occurs 6 times. The records are identical, except of course for the sequential, unique primary key, relationid. I'm guessing that's a messiness that was introduced inadvertently and should be cleaned up.

@emiliom
Copy link
Member Author

emiliom commented Jul 22, 2016

This isn't a comment or question. I'm putting here this summary of samplingfeaturetypecv usage in the samplingfeatures table for future reference.

samplingfeaturetypecv count
Excavation 324
Field area 13
Landscape classification 8
Observation well 3
Site 119
Stream gage 2
Transect 5

The table was generated using this query:

select samplingfeaturetypecv, count(*)
from odm2.samplingfeatures
group by samplingfeaturetypecv order by samplingfeaturetypecv

Update: I guess I do have one comment. I've compared these samplingfeaturetypecv occurrences to the corresponding ODM2 controlled vocabulary entries. The only that doesn't match exactly is "Landscape classification"; the official ODM2 entry is "Ecological land classification". I can see why your choice seems more natural. And it's possible (I don't know) that the ODM2 CV actually was edited, and the "Ecological land classification" entry was previously called "Landscape classification". But in general it's helpful to stick to strict compliance with ODM CV's, unless you have strong reasons not to. Also, the use of this type of Sampling Features could probably be enhanced, and we (ODM2 gang) could benefit from a discussion about a common implementation need like yours.

For reference, these are the Landscape Classification entries in use in the database: Colorado Forest, Palm Forest, Tabonuco Forest, Volcaniclastic, Quartz-diorite, Ridge -Topo Postion, Slope -Topo Postion, Valley -Topo Postion.

@emiliom
Copy link
Member Author

emiliom commented Jul 22, 2016

Another comment about this:

sampling features 767,768, 769, 648,649,650 are 'part of' 656. They are all separate soil pits dug at a single larger site named TABOX-12.

All those sampling features are classified as "Excavation". But TABOX-12 (656) is in fact something conceptually different. It's not an Excavation per se, but rather a "Site" or "Field area" where all those individual excavations (soil pits) are found. At least that's what I'm assuming.

@miguelcleon
Copy link
Member

miguelcleon commented Jul 22, 2016

Yeah, I haven't stayed consistent with classifying the sampling features and I'll need to double check them. Thanks for all your great feedback Emilio, this is really helpful! I'll plan to update you with a new db dump in probably about 2 weeks time does that sound good?

I'll also work on some application updates also particularly for the needed measurement result to time series result conversion that is needed.

@emiliom
Copy link
Member Author

emiliom commented Jul 22, 2016

Thanks for all your great feedback Emilio, this is really helpful!

I'm very glad it's helpful to you.

I'll plan to update you with a new db dump in probably about 2 weeks time does that sound good?

That time frame sounds good. We still have tons to learn, both about your ODM2 implementation and ODM2-Admin per se.

@miguelcleon
Copy link
Member

miguelcleon commented Jul 22, 2016

I thought I'd address one more of the issues you brought up for now. the table odm2.Measurementresultvaluefile can be dropped I had initially added these extra tables in odm2 schema and knowing that wasn't the best way to do things, I since moved them into odm2extra schema. I'm only using the odm2extra.Measurementresultvaluefile table now.

This should also probably be done for the django generated tables you mentioned, this may or may not be doable though as they are core aspects of the functioning of django and using multiple schema's the way I have with odm2extra doesn't seem well supported, so I have been hesitant to try to change those tables.

I should also move the custom function MeasurementResultValsToResultsCountvalue.

@miguelcleon
Copy link
Member

miguelcleon commented Jul 22, 2016

All those sampling features are classified as "Excavation". But TABOX-12 (656) is in fact something conceptually different. It's not an Excavation per se, but rather a "Site" or "Field area" where all those individual excavations (soil pits) are found. At least that's what I'm assuming.

I changed sampling features with samplingfeatureid's 654 to 677 sampling feature type to 'sites'

@emiliom
Copy link
Member Author

emiliom commented Jul 22, 2016

Thanks for the info on odm2.Measurementresultvaluefile vs odm2extra. I'm glad it was just a leftover table, no longer used.

Good to know about the django tables, too. If you'd like, maybe in a couple of weeks (when it's a good time for you) we can bring in Jeff Horsburgh and his team to see if they can help with Django insight. They're using Django heavily in web applications for ODM2, and they probably use credentials too. Maybe they've dealt with this schema customization issue.

@lsetiawan
Copy link
Member

@miguelcleon One question about the tables in odm2extra schema. Are they created by django after defining them in models? I tried doing syncdb with django=1.6.5 and the tables get created, but then using migrate with django=1.9.x doesn't create those tables, have you run into those issues? or did you create the tables separately with SQL statements? Thanks.

@miguelcleon
Copy link
Member

Yeah that is the iffy support for multiple schemas right there. I just used some extra side SQL to create the tables for django 1.9

@lsetiawan
Copy link
Member

Thanks @miguelcleon. While learning django. I've gained more understanding about the models and settings for ODM2-Admin. I've tweaked the model.py to hard coded odm2 schema for each db_table variables in class Meta similar to what you did for the odm2extra.

I then created another schema called admin. I also set admin schema as the default for django. When I ran python manage.py migrate it seems like the django core tables are created under the admin schema, leaving the odm2 and odm2extra schemas alone.

So from this, I think that in order to integrate multiple schemas, one have to designate the schema in db_table variables. I tried running django 1.9.x and it seems to work, though, this method doesn't seem to work with Django 1.6.5. Thanks.

@miguelcleon
Copy link
Member

Cool! Can you do a pull request into the Django1.9.7Support branch?

@lsetiawan
Copy link
Member

@miguelcleon let me clean up my repo git stuff. I didn't notice the second branch. Please ignore the current pull request. Thanks.

@emiliom
Copy link
Member Author

emiliom commented Jul 27, 2016

Regarding the schema issues:Thanks, @lsetiawan! That's great.

@miguelcleon, on your README page it says "support tested for django 1.6.5 and 1.9.x". I have no direct experience with django development. But given that so far you're the only user of ODM2-Admin, wouldn't it be easier if you limited backwards compatibility and supported only Django 1.9.x? It sounds like that would minimize development headaches. Just my 1 cent.

@miguelcleon
Copy link
Member

Yeah I'm hoping to drop support for 1.6.x soon, can't upgrade a server but we are transitioning to a different one which will be on django 1.9.x

emiliom referenced this issue Aug 5, 2016
…s branch, seems to be acting a little strangely, do you want to take a look and see what you think?
@miguelcleon
Copy link
Member

miguelcleon commented Sep 26, 2016

Hi @emiliom the last commit I pushed today should address the issue

result types

based on record counts in results tables, only two result types are being used: >measurement and profile. But a count of results.resulttypecv entries yields two different result types: >"Measurement" (12,192) and "Time series coverage" (199).
results table record counts: measurementresult (205) and profileresult (12,113). No >records in timeseriesresult
These inconsistencies should create problems, in principle

I converted my measurement result, and measurement result values which were really time series into time series result and time series result values. I did this for both the live LCZO and TRACE database. I'm creating back ups now and then will apply these changes on the CUAHSI instances of ODM2 Admin.

@emiliom
Copy link
Member Author

emiliom commented Sep 27, 2016

Thanks for the heads-up, @miguelcleon. I'm still trying to digest the email exchange about time-series-like measurements, and think about the pros and cons of measurements vs time series result types in this context. I'll try to have some input tomorrow.

Regardless, consistency is good, and your change does bring about consistency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants