Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: #444 Generate publication dates between two specific dates #447

Merged
merged 4 commits into from
Sep 7, 2023

Conversation

MRichards99
Copy link
Collaborator

This PR will close #444

Description

This PR changes how dates are generated for DataPublication.publicationDate and DataPublicationDate.date. They are now set to a random date between 2008 and 2023 - the previous range was 15 years so this is the same. DataPublicationDate.date used to be a different date to the associated DataPublication.publicationDate but they are now set as the same date. I did this because of how I read the issue (and my lack of knowledge in how these fields are actually used) but I'm not sure if they should be or not? Let me know if it should be changed back, it's only a one line change.

Testing Instructions

Check that data generated by this branch works for the e2e tests for DataGateway. I wasn't sure if you were having difficulty with existing e2e tests or new ones so I haven't done this myself yet, but happy to create a branch on DG pointing to this branch and run the CI if that helps (just let me know).

I haven't tested whether the same dates are generated when the data is generated on different days. Here's some output from the database, I'll re-generate the data in two days time and see if it's still the same:

MariaDB [icatdb]> SELECT ID, PUBLICATIONDATE FROM DATAPUBLICATION LIMIT 15;
+----+---------------------+
| ID | PUBLICATIONDATE     |
+----+---------------------+
|  1 | 2021-03-23 00:00:00 |
|  2 | 2017-10-02 00:00:00 |
|  3 | 2009-02-02 00:00:00 |
|  4 | 2022-02-06 00:00:00 |
|  5 | 2022-12-14 00:00:00 |
|  6 | 2014-11-02 00:00:00 |
|  7 | 2010-03-18 00:00:00 |
|  8 | 2010-02-24 00:00:00 |
|  9 | 2018-12-22 00:00:00 |
| 10 | 2017-03-12 00:00:00 |
| 11 | 2010-12-24 00:00:00 |
| 12 | 2019-06-14 00:00:00 |
| 13 | 2011-05-06 00:00:00 |
| 14 | 2018-02-16 00:00:00 |
| 15 | 2021-12-25 00:00:00 |
+----+---------------------+
15 rows in set (0.00 sec)

MariaDB [icatdb]> SELECT ID, DATE_ FROM DATAPUBLICATIONDATE LIMIT 15;
+----+------------+
| ID | DATE_      |
+----+------------+
|  1 | 2021-03-23 |
|  2 | 2017-10-02 |
|  3 | 2009-02-02 |
|  4 | 2022-02-06 |
|  5 | 2022-12-14 |
|  6 | 2014-11-02 |
|  7 | 2010-03-18 |
|  8 | 2010-02-24 |
|  9 | 2018-12-22 |
| 10 | 2017-03-12 |
| 11 | 2010-12-24 |
| 12 | 2019-06-14 |
| 13 | 2011-05-06 |
| 14 | 2018-02-16 |
| 15 | 2021-12-25 |
+----+------------+
15 rows in set (0.00 sec)

Note that one is a datetime and the other is just a date (well, string in ISO 8601 format) - this is how the data types were so I've kept them the same. If they need to be changed, let me know and I can do that.

  • Review code
  • Check GitHub Actions build
  • If icatdb Generator Script Consistency Test CI job fails, is this because of a deliberate change made to the script to change generated data (which isn't actually a problem) or is here an underlying issue with the changes made?
  • Review changes to test coverage
  • Does this change mean a new patch, minor or major version should be made? If so, does one of the commit messages feature fix:, feat: or BREAKING CHANGE: so a release is automatically made via GitHub Actions upon merge?

Agile Board Tracking

Connect to #444

- This should stop relative generation causing different data to be generated on different days
@MRichards99 MRichards99 force-pushed the bugfix/data-publication-dates-#444 branch from 2cfbaf2 to 7908c10 Compare August 30, 2023 12:45
@codecov
Copy link

codecov bot commented Aug 30, 2023

Codecov Report

Patch coverage has no change and project coverage change: +1.88% 🎉

Comparison is base (60c539c) 94.78% compared to head (04c3eaf) 96.66%.
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #447      +/-   ##
==========================================
+ Coverage   94.78%   96.66%   +1.88%     
==========================================
  Files          40       39       -1     
  Lines        3375     3242     -133     
  Branches      317      317              
==========================================
- Hits         3199     3134      -65     
+ Misses        135       80      -55     
+ Partials       41       28      -13     

see 18 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MRichards99
Copy link
Collaborator Author

The failure on the generator script CI job is fine as we're expecting the data to change (I've checked the only differences are the dates). The failures from safety and 3.10 tests aren't expected so I'll investigate those

@MRichards99 MRichards99 mentioned this pull request Aug 30, 2023
6 tasks
@MRichards99
Copy link
Collaborator Author

Unexpected CI failures are fixed in #449

@MRichards99
Copy link
Collaborator Author

Data generated this morning contains the same dates as the other day!

MariaDB [icatdb]> SELECT ID, PUBLICATIONDATE FROM DATAPUBLICATION LIMIT 15;
+----+---------------------+
| ID | PUBLICATIONDATE     |
+----+---------------------+
|  1 | 2021-03-23 00:00:00 |
|  2 | 2017-10-02 00:00:00 |
|  3 | 2009-02-02 00:00:00 |
|  4 | 2022-02-06 00:00:00 |
|  5 | 2022-12-14 00:00:00 |
|  6 | 2014-11-02 00:00:00 |
|  7 | 2010-03-18 00:00:00 |
|  8 | 2010-02-24 00:00:00 |
|  9 | 2018-12-22 00:00:00 |
| 10 | 2017-03-12 00:00:00 |
| 11 | 2010-12-24 00:00:00 |
| 12 | 2019-06-14 00:00:00 |
| 13 | 2011-05-06 00:00:00 |
| 14 | 2018-02-16 00:00:00 |
| 15 | 2021-12-25 00:00:00 |
+----+---------------------+
15 rows in set (0.00 sec)

MariaDB [icatdb]> SELECT ID, DATE_ FROM DATAPUBLICATIONDATE LIMIT 15;
+----+------------+
| ID | DATE_      |
+----+------------+
|  1 | 2021-03-23 |
|  2 | 2017-10-02 |
|  3 | 2009-02-02 |
|  4 | 2022-02-06 |
|  5 | 2022-12-14 |
|  6 | 2014-11-02 |
|  7 | 2010-03-18 |
|  8 | 2010-02-24 |
|  9 | 2018-12-22 |
| 10 | 2017-03-12 |
| 11 | 2010-12-24 |
| 12 | 2019-06-14 |
| 13 | 2011-05-06 |
| 14 | 2018-02-16 |
| 15 | 2021-12-25 |
+----+------------+
15 rows in set (0.00 sec)

@louise-davies
Copy link
Member

@MRichards99 the e2e tests on develop in DG are the current ones. You can see the commit here where I work around the issue by setting the static date when the data was generated in SG-preprod, and working out the different between that date and the current date to adjust the dates to test for in the CI (where the data is generated on the fly in the job): ral-facilities/datagateway@ac1125f

Once this fix goes in, we'll have to regenerate the data in SG-preprod, and then we can rewrite that test that will pass on both CI and locally against SG-preprod with static dates, rather than the weird calculated work around that I made in that commit.

@MRichards99
Copy link
Collaborator Author

@louise-davies did you want me to rewrite that test in DG? Luckily I've got notes for regenerating the data (from when I did it with Reilly) so I haven't got to remember how to do it 🎉

Also, are you happy with the following?

DataPublicationDate.date used to be a different date to the associated DataPublication.publicationDate but they are now set as the same date.

@louise-davies
Copy link
Member

@MRichards99 up to you - if you're happy to regenerate the data on SG-preprod then that's the bit that I've done less recently. I'm also happy to have Kacper or me rewriting the test if you're busy but if you wanna do it then go ahead.

re: DataPublication.publicationDate vs DataPublication.date - I don't know what the latter is - in the ICAT schema there's only DataPublication.publicationDate and DataPublication.dates - I presume this is the latter? If so, we're not using it right now, but it may be useful for it to be different just so we know it's not the same field as DataPublication.publicationDate when testing

@MRichards99
Copy link
Collaborator Author

I've just made the two attributes different and have cleaned up the test on DG (I'll create a PR for this soon)

- As ever, we're having to ignore these vulnerabilities as patched versions don't support Python 3.6
Copy link
Member

@louise-davies louise-davies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MRichards99 MRichards99 merged commit b21002e into main Sep 7, 2023
@MRichards99 MRichards99 deleted the bugfix/data-publication-dates-#444 branch September 7, 2023 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data generator script doesn't return stable DataPublication dates
2 participants