Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✂️ Add script to purge inactive users from decommissioned deployments #1010

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

JGreenlee
Copy link
Contributor

The following conditions must be met before this script should be run on a program:

  • The program’s collection period (per MOU) has passed, and we have confirmed with program admins that it is ok to shut down
  • Data has already been archived in TSDC

I tested this on my local dump of nrel-commute:

(emission) jgreenle@jgreenle-34794s e-mission-server % PROD_LIST=nrel-commute DB_HOST=mongodb://localhost:27017/openpath_prod_nrel_commute PYTHONPATH=. ./e-mission-py.bash bin/historical/migrations/purge_inactive_users.py
Config file not found, returning a copy of the environment variables instead...
Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_nrel_commute', 'DB_RESULT_LIMIT': None}
Connecting to database URL mongodb://localhost:27017/openpath_prod_nrel_commute
PROD_LIST: ['nrel-commute']
About to run purge_inactive_users on 1 deployments. Proceed? [y/n]
y
Running purge_inactive_users for nrel-commute on DB nrel_commute
Config file not found, returning a copy of the environment variables instead...
Retrieved config: {'DB_HOST': 'mongodb://localhost:27017/openpath_prod_nrel_commute', 'DB_RESULT_LIMIT': None}
Connecting to database URL mongodb://localhost:27017/openpath_prod_nrel_commute
Total users: 51
Finding inactive users...
Checking activity for user (uuid)
for user <uuid>, last call was 1734401813.8308148
(...continued)
Of 51 users, found 46 inactive users:
(list of UUIDs)
Purging inactive users...
Purging user (uuid)
(...continued)

I also ran it a second time to ensure nothing bad happened:

Of 5 users, found 0 inactive users:
[]
Purging inactive users...

I inspected the DB and confirmed that 5 users remain.

Then, I referenced the UUIDs table on the admin dashboard and verified that those 5 users have a recent last_call_ts, while the other 46 users do not.

This gives us a chance to double-check the list of deployments before a script will be run on all of them.
Otherwise, a typo in the PROD_LIST environment variable could be potentially catastrophic

In practice, the console looks something like this:
```
PROD_LIST: ['wyoming', 'mm-masscec', 'uue', 'ccebikes', 'unc-ebike', 'uw-prs', 'uprm-civic', 'sm-ebike', 'dfc-fermata', '4core-ebike', 'godcgo', 'e-bikes-for-essentials', 'doee-electricbike-proj', 'ride2own', 'smart-commute-ebike', 'ebikethere-garfield-county', 'usaid-laos-ev', 'caeb-co', 'denver-casr', 'r2ohillsboro', 'nrel-commute', 'stm-community', 'r2omilwaukie', 'open-access', 'ca-ebike', 'r2oparkrose', 'uprm-nicr', 'durham', 'nc-transit-equity-study', 'ebikegj', 'cortezebikes', 'dcebike', 'washingtoncommons', 'choose-your-ride', 'cosa-ebike-project', 'fortmorgan']
About to run purge_inactive_users on 36 deployments. Proceed? [y/n]
```
This script identifies users in the DB who have no location data nor API calls in the last 90 days, and purges all of their data.
This should ONLY be run on inactive deployments that we are decommissioning, and that have already been archived in the TSDC.
Comment on lines +13 to +18
'''
Users are inactive if:
- no API calls in the last 90 days
AND
- no locations in the last 90 days
'''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JGreenlee is this consistent with the definition of "active user" in the admin dashboard and the public dashboard? I think we should ensure that our definition of "active user" is consistent throughout the platform.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the public dashboard, active users are the number of users that were active on a specific day

https://github.com/e-mission/em-public-dashboard/blob/00b70c96495f2bcc07837c4d89ff073bc30fdcc1/viz_scripts/generic_timeseries.ipynb#L186

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the admin dashboard, active users are users that have had activity in the last day

https://github.com/e-mission/op-admin-dashboard/blob/33d41d16ce9cd175922aa5cc6cf4264aee247911/pages/home.py#L237

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the admin dashboard also only uses the timestamp of API calls and no location timestamps

Copy link
Contributor

@shankari shankari Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use both API calls and locations. We can use this definition for now, but then encode this into https://github.com/JGreenlee/e-mission-common and change the others to use it.

Comment on lines +63 to +64
if __name__ == '__main__':
run_on_all_deployments(purge_inactive_users)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is useful, but I think that a first step is just a script to identify the number of inactive users in each program so that we can also identify inactive programs.

Such inactive programs are:

  • candidates for experimenting with the new architecture, AND
  • proactively reaching out to ask if we can shut down, even if the MOU is still active

Since the new proposed script would also use find_inactive_uuids, you can have it be a variant of this (controlled via a command line argument that you get using argparse) or a separate script that imports find_inactive_uuids

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this essentially be the same script but without the part where it actually removes the users?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

I would make the default be to print, and actually remove the users only when the appropriate command line arg is passed in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants