Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes an import error when using S3/Minio with no redis #1224

Merged
merged 7 commits into from
Oct 7, 2024

Conversation

Spoffy
Copy link
Contributor

@Spoffy Spoffy commented Sep 24, 2024

Context

Error is caused due to these steps:

  • File is uploaded to Home server and attempts to import
  • Import ends up in claimDocument in HostedStorageManager
  • Tries to read doc metadata from DocWorkerMap, gets 'unknown' as md5 hash
  • Thinks local doc is out of date and erases it.
  • Downloads a non-existent file from S3, so import fails as it has no data.

Proposed solution

This fixes it by checking for DummyDocWorker's special 'unknown' MD5, forcing an S3 check.

Related issues

https://community.getgrist.com/t/no-metadata-for-imported-grist-document/6029/32

Has this been tested?

Yes locally, but we don't have a CI/CD setup for S3 with no redis currently.

@paulfitz paulfitz self-requested a review September 25, 2024 12:10
@paulfitz
Copy link
Member

Yes locally, but we don't have a CI/CD setup for S3 with no redis currently.

Hmm, I think it may be possible to repeat the HostedStorageManager tests with REDIS_URL or TEST_REDIS_URL turned off, in which case S3/minio would still be used but Redis would be unavailable. But maybe it wouldn't even have caught this problem...

@paulfitz
Copy link
Member

Yes locally, but we don't have a CI/CD setup for S3 with no redis currently.

Hmm, I think it may be possible to repeat the HostedStorageManager tests with REDIS_URL or TEST_REDIS_URL turned off, in which case S3/minio would still be used but Redis would be unavailable. But maybe it wouldn't even have caught this problem...

This does seem doable, but the tests in the file don't exercise importing so they pass without this fix.

Hmm test/nbrowser/Importer.ts fails quickly if run again S3 and not Redis, then succeeds if your fix is applied.

It is maybe a bit excessive to run a whole new copy of all browser tests under this condition, but @Spoffy do you think it would be worth adding a clause in .github/workflows/main.yml to add test/nbrowser/Importer.ts to one of the runs, running it with S3 and with/without Redis? Or maybe there is a unit-like test that needs adding to HostedStorageManager, and have that do a run (say of cache or minio) without redis.

Error is caused due to these steps:
- File is uploaded to Home server and attempts to import
- Import ends up in `claimDocument` in `HostedStorageManager`
- Tries to read doc metadata from DocWorkerMap, gets 'unknown' as md5 hash
- Thinks local doc is out of date and erases it.
- Downloads a non-existent file from S3, so import fails as it has no data.

This fixes it by checking for DummyDocWorker's special 'unknown' MD5, forcing an S3 check.
@Spoffy Spoffy force-pushed the spoffy/fix-import-error-s3-no-redis branch from 5680d1f to 189aea5 Compare September 28, 2024 12:15
@Spoffy
Copy link
Contributor Author

Spoffy commented Sep 30, 2024

Added some tests that cover this change.

I also altered the constructors for HostedMetadataManager and HostedStorageManager to simplify mocking when testing.

Copy link
Member

@paulfitz paulfitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the test! Some small comments.

@@ -638,6 +638,7 @@ export class DocManager extends EventEmitter {
throw new Error('Grist docs must be uploaded individually');
}
const first = uploadInfo.files[0].origName;
log.debug(`DocManager._doImportDoc: Received doc with name ${first}`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any context that can be added here? For example, what user this is on behalf of, the team site, anything like that. Otherwise this isn't very helpful in prod, though I can see how it could be useful to you while you work on this change. Admittedly your log messages are consistent with the style in other parts of DocManager, but for new messages it would be great to get closer to what new messages in ActiveDoc do, see uses of this.getLogMeta(docSession) there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my main goal for leaving this in, was to know when a file was traveling along the import route.

I'll see if I can workspace / org / userId context? Up to you if this is worth having in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is without the context these kinds of logs are mostly useless in prod because they are mixed in with tons of parallel activity. Some kind of context can help you filter, or piece a narrative together. Still, not going to insist on it.

import log from 'app/server/lib/log';

export type SetDocsMetadataFunc = (metadata: { [p: string]: DocumentMetadata }) => Promise<any>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment on what this type is for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment, and slightly renamed the type for a bit more clarity.

@@ -52,6 +52,11 @@ function checkValidDocId(docId: string): void {
}
}

export interface HostedStorageCallbacks {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment on what this interface is for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comments to the individual functions, as I was struggling to come up with a comment for this that wasn't just a paraphrased explanation of dependency injection. Open to suggestions on this!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add the comment though.

I don't understand why dbManager is not enough. What other things can change that behavior (with links).

@@ -33,6 +33,7 @@ import {createTmpDir, getGlobalPluginManager} from 'test/server/docTools';
import {EnvironmentSnapshot, setTmpLogLevel, useFixtureDoc} from 'test/server/testUtils';
import {waitForIt} from 'test/server/wait';
import uuidv4 from "uuid/v4";
import {IDocWorkerMap} from "app/server/lib/DocWorkerMap";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As much as possible, try to order imports alphabetically, and use the local quote style (I realize it is already inconsistent in this file)

tmpDir = await createTmpDir();
oldEnv = new EnvironmentSnapshot();
// Disable Redis
delete process.env.REDIS_URL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a TEST_REDIS_URL I see mentioned in code, might that need wiping too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be fine - TEST_REDIS_URL seems to be used explicitly by other tests to configure where to connect to.

All we care about here is making sure we get DummyDocWorker out of getDocWorkerMap, which removing process.env.REDIS_URL is sufficient to do. 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid it gets set in test/testrun.sh in SaaS tests :( but yes it would be ok to deal with that if tests arrive there and are failing.


const creator = create?.getStorageOptions?.('minio')?.create;
if (!creator) {
return this.skip();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this won't skip if minio is not configured. What do you think about something like:

      const storage = create?.getStorageOptions?.('minio');
      const creator = storage?.create;
      if (!creator || !storage?.check?.()) {

@Spoffy Spoffy requested a review from paulfitz October 1, 2024 18:23
Copy link
Member

@paulfitz paulfitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Spoffy

@Spoffy Spoffy merged commit a5c9a49 into main Oct 7, 2024
11 checks passed
@Spoffy Spoffy deleted the spoffy/fix-import-error-s3-no-redis branch October 7, 2024 04:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants