Adds initial support for external attachments using the Filesystem or MinIO (#1320)

When attachments are uploaded to Grist, they're stored in an SQLite table within the document. 

The result is that large files (multi-GB) substantially increase the size of the SQLite file, and cause performance problems (for example, with snapshots).

The current design and full scope of changes can be seen [in this Google document](https://docs.google.com/document/d/1ST_DuR22llDyx4PVAMdBlHDz22L5QIJ4qIyLMOUOtZ8/edit#heading=h.d487w4iic6q).

This commit contains a minimal proof of concept for external attachments, with the following features:
- External attachment storage using either MinIO or the local filesystem.
- A "default attachment store" setting for documents, which sets the destination for newly uploaded attachments.
- Defaulting to existing behaviour. Without additional configuration, Grist's attachment storage behaviour is unchanged.
- MinIO attachment storage is automatically made available if it's configured for document storage, but not automatically used by documents.

No UI exists for configuring anything related to attachment storage at this time. To use a store, the API or an SQLite client must be used to set the `idOfDefaultAttachmentStore` document setting to the ID of the desired attachment store. Attachment store IDs are logged to the console.
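As a rough illustration of what that manual configuration amounts to, the sketch below rewrites a serialized document-settings value so that it names a default store. It assumes the setting is the `attachmentStoreId` field that `DocumentSettings.ts` declares in this commit (the message above calls it `idOfDefaultAttachmentStore`). The helper `withAttachmentStore` and the store ID `example-store-id` are hypothetical and are not part of Grist.

```typescript
// Hypothetical, trimmed-down copy of the DocumentSettings shape from this commit.
interface DocumentSettings {
  locale: string;
  currency?: string;
  attachmentStoreId?: string;
}

// Returns the serialized settings with the default attachment store set, or
// cleared when storeId is undefined (attachments then stay inside the document).
function withAttachmentStore(settingsJson: string, storeId: string | undefined): string {
  const settings = JSON.parse(settingsJson) as DocumentSettings;
  if (storeId === undefined) {
    delete settings.attachmentStoreId;
  } else {
    settings.attachmentStoreId = storeId;
  }
  return JSON.stringify(settings);
}

// Example: point new uploads at a store whose ID was read from the server log.
const updatedSettings = withAttachmentStore('{"locale":"en-US"}', 'example-store-id');
```

Only newly uploaded attachments are affected by this value; attachments that are already stored keep their existing location.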

Currently only a single store of each backend type (minio, filesystem) can exist, as they're automatically configured using the existing setting. 

Three significant components are added in this commit, with the rest of the commit being generally glue around them:
- `AttachmentFileManager` (in "AttachmentFileManager.ts") handles attachment fetching / saving logic on behalf of `ActiveDoc`.
- `IAttachmentStore` is an instantiated store that handles saving / loading attachments from a specific provider.
- `AttachmentStoreProvider` is used to resolve store IDs to stores, and is the main way of providing other components (e.g. `AttachmentFileManager`) with access to store instances.
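The relationship between a store and a provider can be sketched as follows. This is a simplified illustration, not the code from this commit: the interface is modeled on the calls that `AttachmentFileManager` makes (`id`, `exists`, `upload`, `download`, `getStore`), while `InMemoryAttachmentStore`, `SimpleStoreProvider`, and `demo` are hypothetical names invented for the example.

```typescript
import {Readable, Writable} from 'node:stream';
import {pipeline} from 'node:stream/promises';

// Simplified store interface, modeled on the calls visible in AttachmentFileManager.
interface IAttachmentStore {
  readonly id: string;
  exists(docPoolId: string, fileId: string): Promise<boolean>;
  upload(docPoolId: string, fileId: string, fileData: Readable): Promise<void>;
  download(docPoolId: string, fileId: string, output: Writable): Promise<void>;
}

// Hypothetical in-memory backend, for illustration only.
class InMemoryAttachmentStore implements IAttachmentStore {
  private _files = new Map<string, Buffer>();
  constructor(public readonly id: string) {}

  public async exists(docPoolId: string, fileId: string): Promise<boolean> {
    return this._files.has(this._key(docPoolId, fileId));
  }

  public async upload(docPoolId: string, fileId: string, fileData: Readable): Promise<void> {
    const chunks: Buffer[] = [];
    for await (const chunk of fileData) { chunks.push(Buffer.from(chunk)); }
    this._files.set(this._key(docPoolId, fileId), Buffer.concat(chunks));
  }

  public async download(docPoolId: string, fileId: string, output: Writable): Promise<void> {
    const data = this._files.get(this._key(docPoolId, fileId));
    if (!data) { throw new Error(`File '${fileId}' not found`); }
    await pipeline(Readable.from([data]), output);
  }

  // Files are namespaced by pool, so documents in different pools cannot see each other's files.
  private _key(docPoolId: string, fileId: string): string { return `${docPoolId}/${fileId}`; }
}

// Hypothetical provider: resolves a store ID to a store, or null if it is unknown.
class SimpleStoreProvider {
  private _stores = new Map<string, IAttachmentStore>();
  constructor(stores: IAttachmentStore[]) {
    for (const store of stores) { this._stores.set(store.id, store); }
  }
  public async getStore(id: string): Promise<IAttachmentStore | null> {
    return this._stores.get(id) ?? null;
  }
}

// Usage: resolve a store by ID, upload a file, then stream it back out.
async function demo(): Promise<string> {
  const provider = new SimpleStoreProvider([new InMemoryAttachmentStore('memory-1')]);
  const store = await provider.getStore('memory-1');
  if (!store) { throw new Error('store not found'); }
  await store.upload('pool-a', 'abc.txt', Readable.from([Buffer.from('hello')]));
  const received: Buffer[] = [];
  const sink = new Writable({
    write(chunk, _encoding, callback) { received.push(chunk); callback(); },
  });
  await store.download('pool-a', 'abc.txt', sink);
  return Buffer.concat(received).toString();
}
```

Returning `null` for an unknown ID, rather than throwing, leaves the caller free to decide how to report an unavailable store.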

The available types of attachment store are defined in `coreCreator.ts`, under the name "attachmentStoreBackends". 

A few existing classes have been extended with streaming methods, to simplify interaction with the existing attachment code and to support large files in the future without keeping full copies in memory.
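To illustrate why streaming helps here, the sketch below computes a checksum chunk by chunk and derives a file identifier from it, in the same "checksum plus extension" form that `AttachmentFileManager` uses. It is an assumption-laden example rather than Grist's `checksumFileStream`: the function names and the choice of SHA-256 are hypothetical.

```typescript
import {createHash} from 'node:crypto';
import {Readable} from 'node:stream';

// Hashes a stream chunk by chunk, so the whole file never has to be held in memory at once.
// The default algorithm is an assumption made for this sketch.
async function checksumStream(stream: Readable, algorithm: string = 'sha256'): Promise<string> {
  const hash = createHash(algorithm);
  for await (const chunk of stream) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}

// Builds an identifier as the checksum followed by the file extension.
async function makeFileIdent(stream: Readable, extension: string): Promise<string> {
  return `${await checksumStream(stream)}${extension}`;
}
```

Because the identifier depends only on the content and the extension, uploading the same file twice yields the same identifier, which is what allows duplicate uploads to be detected.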
Spoffy authored Jan 8, 2025
1 parent 4479ba4 commit ffc4855
Showing 29 changed files with 1,218 additions and 142 deletions.
8 changes: 8 additions & 0 deletions app/common/DocumentSettings.ts
@@ -2,6 +2,14 @@ export interface DocumentSettings {
locale: string;
currency?: string;
engine?: EngineCode;
// Grist attachments can be stored within the document (embedded in the SQLite file), or held
// externally. The attachmentStoreId expresses a preference for which store should be used for
// attachments in this doc. This store will be used for new attachments when they're added, and a
// process can be triggered to start transferring all attachments that aren't already in this
// store over to it. A full id is stored, rather than something more convenient like a boolean or
// the string "external", after thinking carefully about how downloads/uploads and transferring
// files to other installations could work.
attachmentStoreId?: string;
}

/**
3 changes: 2 additions & 1 deletion app/server/generateInitialDocSql.ts
@@ -1,4 +1,5 @@
import { ActiveDoc } from 'app/server/lib/ActiveDoc';
import { AttachmentStoreProvider } from 'app/server/lib/AttachmentStoreProvider';
import { create } from 'app/server/lib/create';
import { DocManager } from 'app/server/lib/DocManager';
import { makeExceptionalDocSession } from 'app/server/lib/DocSession';
@@ -33,7 +34,7 @@ export async function main(baseName: string) {
if (await fse.pathExists(fname)) {
await fse.remove(fname);
}
-const docManager = new DocManager(storageManager, pluginManager, null as any, {
+const docManager = new DocManager(storageManager, pluginManager, null as any, new AttachmentStoreProvider([], ""), {
create,
getAuditLogger() { return createNullAuditLogger(); },
getTelemetry() { return createDummyTelemetry(); },
55 changes: 39 additions & 16 deletions app/server/lib/ActiveDoc.ts
@@ -13,7 +13,7 @@ import {
UserActionBundle
} from 'app/common/ActionBundle';
import {ActionGroup, MinimalActionGroup} from 'app/common/ActionGroup';
-import {ActionSummary} from "app/common/ActionSummary";
+import {ActionSummary} from 'app/common/ActionSummary';
import {
AclResources,
AclTableDescription,
@@ -94,7 +94,6 @@ import {AssistanceSchemaPromptV1Context} from 'app/server/lib/Assistance';
import {AssistanceContext} from 'app/common/AssistancePrompts';
import {AuditEventAction} from 'app/server/lib/AuditEvent';
import {Authorizer, RequestWithLogin} from 'app/server/lib/Authorizer';
-import {checksumFile} from 'app/server/lib/checksumFile';
import {Client} from 'app/server/lib/Client';
import {getMetaTables} from 'app/server/lib/DocApi';
import {DEFAULT_CACHE_TTL, DocManager} from 'app/server/lib/DocManager';
@@ -105,7 +104,7 @@ import {makeForkIds} from 'app/server/lib/idUtils';
import {GRIST_DOC_SQL, GRIST_DOC_WITH_TABLE1_SQL} from 'app/server/lib/initialDocSql';
import {ISandbox} from 'app/server/lib/ISandbox';
import log from 'app/server/lib/log';
-import {LogMethods} from "app/server/lib/LogMethods";
+import {LogMethods} from 'app/server/lib/LogMethods';
import {ISandboxOptions} from 'app/server/lib/NSandbox';
import {NullSandbox, UnavailableSandboxMethodError} from 'app/server/lib/NullSandbox';
import {DocRequests} from 'app/server/lib/Requests';
@@ -121,12 +120,13 @@ import {
} from 'app/server/lib/sessionUtils';
import {shortDesc} from 'app/server/lib/shortDesc';
import {TableMetadataLoader} from 'app/server/lib/TableMetadataLoader';
-import {DocTriggers} from "app/server/lib/Triggers";
+import {DocTriggers} from 'app/server/lib/Triggers';
import {fetchURL, FileUploadInfo, globalUploadSet, UploadInfo} from 'app/server/lib/uploads';
import assert from 'assert';
import {Mutex} from 'async-mutex';
import * as bluebird from 'bluebird';
import {EventEmitter} from 'events';
import {readFile} from 'fs-extra';
import {IMessage, MsgType} from 'grain-rpc';
import imageSize from 'image-size';
import * as moment from 'moment-timezone';
@@ -137,6 +137,8 @@ import tmp from 'tmp';
import {ActionHistory} from './ActionHistory';
import {ActionHistoryImpl} from './ActionHistoryImpl';
import {ActiveDocImport, FileImportOptions} from './ActiveDocImport';
import {AttachmentFileManager} from './AttachmentFileManager';
import {IAttachmentStoreProvider} from './AttachmentStoreProvider';
import {DocClients} from './DocClients';
import {DocPluginManager} from './DocPluginManager';
import {DocSession, makeExceptionalDocSession, OptDocSession} from './DocSession';
@@ -265,6 +267,7 @@ export class ActiveDoc extends EventEmitter {
private _onlyAllowMetaDataActionsOnDb: boolean = false;
// Cache of which columns are attachment columns.
private _attachmentColumns?: AttachmentColumns;
private _attachmentFileManager: AttachmentFileManager;

// Client watching for 'product changed' event published by Billing to update usage
private _redisSubscriber?: RedisClient;
@@ -283,6 +286,7 @@ export class ActiveDoc extends EventEmitter {
constructor(
private readonly _docManager: DocManager,
private _docName: string,
externalAttachmentStoreProvider?: IAttachmentStoreProvider,
private _options?: ICreateActiveDocOptions
) {
super();
@@ -388,6 +392,14 @@ export class ActiveDoc extends EventEmitter {
loadTable: this._rawPyCall.bind(this, 'load_table'),
});

// This will throw errors if _options?.doc or externalAttachmentStoreProvider aren't provided,
// and ActiveDoc tries to use an external attachment store.
this._attachmentFileManager = new AttachmentFileManager(
this.docStorage,
externalAttachmentStoreProvider,
_options?.doc,
);

// Our DataEngine is a separate sandboxed process (one sandbox per open document,
// corresponding to one process for pynbox, more for gvisor).
// The data engine runs user-defined python code including formula calculations.
@@ -925,7 +937,7 @@ export class ActiveDoc extends EventEmitter {
}
}
}
-const data = await this.docStorage.getFileData(fileIdent);
+const data = await this._attachmentFileManager.getFileData(fileIdent);
if (!data) { throw new ApiError("Invalid attachment identifier", 404); }
this._log.info(docSession, "getAttachment: %s -> %s bytes", fileIdent, data.length);
return data;
@@ -2344,13 +2356,16 @@ export class ActiveDoc extends EventEmitter {
dimensions.height = 0;
dimensions.width = 0;
}
-const checksum = await checksumFile(fileData.absPath);
-const fileIdent = checksum + fileData.ext;
-const ret: boolean = await this.docStorage.findOrAttachFile(fileData.absPath, fileIdent);
-this._log.info(docSession, "addAttachment: file %s (image %sx%s) %s", fileIdent,
-dimensions.width, dimensions.height, ret ? "attached" : "already exists");
+const attachmentStoreId = (await this._getDocumentSettings()).attachmentStoreId;
+const addFileResult = await this._attachmentFileManager
+.addFile(attachmentStoreId, fileData.ext, await readFile(fileData.absPath));
+this._log.info(
+docSession, "addAttachment: store: '%s', file: '%s' (image %sx%s) %s",
+attachmentStoreId ?? 'local document', addFileResult.fileIdent, dimensions.width, dimensions.height,
+addFileResult.isNewFile ? "attached" : "already exists"
+);
return ['AddRecord', '_grist_Attachments', null, {
-fileIdent,
+fileIdent: addFileResult.fileIdent,
fileName: fileData.origName,
// We used to set fileType, but it's not easily available for native types. Since it's
// also entirely unused, we just skip it until it becomes relevant.
@@ -2822,17 +2837,25 @@ export class ActiveDoc extends EventEmitter {
return this._dataEngine;
}

private async _getDocumentSettings(): Promise<DocumentSettings> {
const docInfo = await this.docStorage.get('SELECT documentSettings FROM _grist_DocInfo');
const docSettingsString = docInfo?.documentSettings;
const docSettings = docSettingsString ? safeJsonParse(docSettingsString, undefined) : undefined;
if (!docSettings) {
throw new Error("No document settings found");
}
return docSettings;
}

private async _makeEngine(): Promise<ISandbox> {
// Figure out what kind of engine we need for this document.
let preferredPythonVersion: '2' | '3' = process.env.PYTHON_VERSION === '2' ? '2' : '3';

// Careful, migrations may not have run on this document and it may not have a
// documentSettings column. Failures are treated as lack of an engine preference.
-const docInfo = await this.docStorage.get('SELECT documentSettings FROM _grist_DocInfo').catch(e => undefined);
-const docSettingsString = docInfo?.documentSettings;
-if (docSettingsString) {
-const docSettings: DocumentSettings|undefined = safeJsonParse(docSettingsString, undefined);
-const engine = docSettings?.engine;
+const docSettings = await this._getDocumentSettings().catch(e => undefined);
+if (docSettings) {
+const engine = docSettings.engine;
if (engine) {
if (engine === 'python2') {
preferredPythonVersion = '2';
220 changes: 220 additions & 0 deletions app/server/lib/AttachmentFileManager.ts
@@ -0,0 +1,220 @@
import {
AttachmentStoreDocInfo,
DocPoolId,
getDocPoolIdFromDocInfo,
IAttachmentStore
} from 'app/server/lib/AttachmentStore';
import {AttachmentStoreId, IAttachmentStoreProvider} from 'app/server/lib/AttachmentStoreProvider';
import {checksumFileStream} from 'app/server/lib/checksumFile';
import {DocStorage} from 'app/server/lib/DocStorage';
import log from 'app/server/lib/log';
import {LogMethods} from 'app/server/lib/LogMethods';
import {MemoryWritableStream} from 'app/server/utils/MemoryWritableStream';
import {Readable} from 'node:stream';

export interface IAttachmentFileManager {
addFile(storeId: AttachmentStoreId, fileExtension: string, fileData: Buffer): Promise<AddFileResult>;
getFileData(fileIdent: string): Promise<Buffer | null>;
}

export interface AddFileResult {
fileIdent: string;
isNewFile: boolean;
}

export class StoresNotConfiguredError extends Error {
constructor() {
super('Attempted to access a file store, but AttachmentFileManager was initialized without store access');
}
}

export class StoreNotAvailableError extends Error {
public readonly storeId: AttachmentStoreId;

constructor(storeId: AttachmentStoreId) {
super(`Store '${storeId}' is not a valid and available store`);
this.storeId = storeId;
}
}

export class MissingAttachmentError extends Error {
public readonly fileIdent: string;

constructor(fileIdent: string) {
super(`Attachment file '${fileIdent}' could not be found in this document`);
this.fileIdent = fileIdent;
}
}

export class AttachmentRetrievalError extends Error {
public readonly storeId: AttachmentStoreId;
public readonly fileId: string;

constructor(storeId: AttachmentStoreId, fileId: string, cause?: any) {
const causeError = cause instanceof Error ? cause : undefined;
const causeDescriptor = causeError ? `: ${cause.message}` : '';
super(`Unable to retrieve '${fileId}' from '${storeId}'${causeDescriptor}`);
this.storeId = storeId;
this.fileId = fileId;
this.cause = causeError;
}
}


interface AttachmentFileManagerLogInfo {
fileIdent?: string;
storeId?: string | null;
}

/**
* Instantiated on a per-document basis to provide a document with access to its attachments.
* Handles attachment uploading / fetching, as well as trying to ensure consistency with the local
* document database, which tracks attachments and where they're stored.
*
* This class should prevent the document code from having to worry about accessing the underlying
* stores.
*/
export class AttachmentFileManager implements IAttachmentFileManager {
// _docPoolId is a critical point for security. Documents with a common pool id can access each others' attachments.
private readonly _docPoolId: DocPoolId | null;
private readonly _docName: string;
private _log = new LogMethods(
"AttachmentFileManager ",
(logInfo: AttachmentFileManagerLogInfo) => this._getLogMeta(logInfo)
);

/**
* @param _docStorage - Storage of this manager's document.
* @param _storeProvider - Allows instantiating of stores. Should be provided except in test
* scenarios.
* @param _docInfo - The document this manager is for. Should be provided except in test
* scenarios.
*/
constructor(
private _docStorage: DocStorage,
private _storeProvider: IAttachmentStoreProvider | undefined,
_docInfo: AttachmentStoreDocInfo | undefined,
) {
this._docName = _docStorage.docName;
this._docPoolId = _docInfo ? getDocPoolIdFromDocInfo(_docInfo) : null;
}

public async addFile(
storeId: AttachmentStoreId | undefined,
fileExtension: string,
fileData: Buffer
): Promise<AddFileResult> {
const fileIdent = await this._getFileIdentifier(fileExtension, Readable.from(fileData));
return this._addFile(storeId, fileIdent, fileData);
}

public async _addFile(
storeId: AttachmentStoreId | undefined,
fileIdent: string,
fileData: Buffer
): Promise<AddFileResult> {
this._log.info({ fileIdent, storeId }, `adding file to ${storeId ? "external" : "document"} storage`);
if (storeId === undefined) {
return this._addFileToLocalStorage(fileIdent, fileData);
}
const store = await this._getStore(storeId);
if (!store) {
this._log.info({ fileIdent, storeId }, "tried to fetch attachment from an unavailable store");
throw new StoreNotAvailableError(storeId);
}
return this._addFileToAttachmentStore(store, fileIdent, fileData);
}

public async getFileData(fileIdent: string): Promise<Buffer> {
const fileInfo = await this._docStorage.getFileInfo(fileIdent);
if (!fileInfo) {
this._log.error({ fileIdent }, "cannot find file metadata in document");
throw new MissingAttachmentError(fileIdent);
}
this._log.debug(
{ fileIdent, storeId: fileInfo.storageId },
`fetching attachment from ${fileInfo.storageId ? "external" : "document "} storage`
);
if (!fileInfo.storageId) {
return fileInfo.data;
}
const store = await this._getStore(fileInfo.storageId);
if (!store) {
this._log.warn({ fileIdent, storeId: fileInfo.storageId }, `unable to retrieve file, store is unavailable`);
throw new StoreNotAvailableError(fileInfo.storageId);
}
return this._getFileDataFromAttachmentStore(store, fileIdent);
}

private async _addFileToLocalStorage(fileIdent: string, fileData: Buffer): Promise<AddFileResult> {
const isNewFile = await this._docStorage.findOrAttachFile(fileIdent, fileData);

return {
fileIdent,
isNewFile,
};
}

private async _getStore(storeId: AttachmentStoreId): Promise<IAttachmentStore | null> {
if (!this._storeProvider) {
throw new StoresNotConfiguredError();
}
return this._storeProvider.getStore(storeId);
}

private _getDocPoolId(): string {
if (!this._docPoolId) {
throw new StoresNotConfiguredError();
}
return this._docPoolId;
}

private async _getFileIdentifier(fileExtension: string, fileData: Readable): Promise<string> {
const checksum = await checksumFileStream(fileData);
return `${checksum}${fileExtension}`;
}

private async _addFileToAttachmentStore(
store: IAttachmentStore, fileIdent: string, fileData: Buffer
): Promise<AddFileResult> {
const isNewFile = await this._docStorage.findOrAttachFile(fileIdent, undefined, store.id);

// Verify the file exists in the store. This allows for a second attempt to correct a failed upload.
const existsInRemoteStorage = !isNewFile && await store.exists(this._getDocPoolId(), fileIdent);

if (!isNewFile && existsInRemoteStorage) {
return {
fileIdent,
isNewFile: false,
};
}

// Possible issue if this upload fails - we have the file tracked in the document, but not available in the store.
// TODO - Decide if we keep an entry in SQLite after an upload error or not. Probably not?
await store.upload(this._getDocPoolId(), fileIdent, Readable.from(fileData));

// TODO - Confirm in doc storage that it's successfully uploaded? Need to decide how to handle a failed upload.
return {
fileIdent,
isNewFile,
};
}

private async _getFileDataFromAttachmentStore(store: IAttachmentStore, fileIdent: string): Promise<Buffer> {
try {
const outputStream = new MemoryWritableStream();
await store.download(this._getDocPoolId(), fileIdent, outputStream);
return outputStream.getBuffer();
} catch(e) {
throw new AttachmentRetrievalError(store.id, fileIdent, e);
}
}

private _getLogMeta(logInfo?: AttachmentFileManagerLogInfo): log.ILogMeta {
return {
docName: this._docName,
docPoolId: this._docPoolId,
...logInfo,
};
}
}
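The retry behaviour of `_addFileToAttachmentStore` can be easier to follow in isolation. The sketch below reproduces its decision logic with hypothetical in-memory stand-ins (`FakeDocStorage`, `FlakyStore`, `addFileToStore`), made synchronous for brevity. It shows that when an upload fails after the document has already recorded the file, a second attempt finds the file tracked but missing from the store and uploads it again.

```typescript
interface AddFileResult {
  fileIdent: string;
  isNewFile: boolean;
}

// Hypothetical stand-in for DocStorage: only tracks which file identifiers are known.
class FakeDocStorage {
  private _tracked = new Set<string>();
  // Returns true if the file was not previously tracked by the document.
  public findOrAttachFile(fileIdent: string): boolean {
    if (this._tracked.has(fileIdent)) { return false; }
    this._tracked.add(fileIdent);
    return true;
  }
}

// Hypothetical external store that can be told to fail its next upload.
class FlakyStore {
  public failNextUpload = false;
  public uploadCount = 0;
  private _files = new Map<string, Uint8Array>();
  public exists(fileIdent: string): boolean { return this._files.has(fileIdent); }
  public upload(fileIdent: string, data: Uint8Array): void {
    if (this.failNextUpload) {
      this.failNextUpload = false;
      throw new Error('simulated upload failure');
    }
    this.uploadCount++;
    this._files.set(fileIdent, data);
  }
}

// Same decision logic as _addFileToAttachmentStore: skip the upload only when the
// document already tracks the file AND the store really has it.
function addFileToStore(
  doc: FakeDocStorage, store: FlakyStore, fileIdent: string, data: Uint8Array
): AddFileResult {
  const isNewFile = doc.findOrAttachFile(fileIdent);
  if (!isNewFile && store.exists(fileIdent)) {
    return {fileIdent, isNewFile: false};
  }
  store.upload(fileIdent, data);
  return {fileIdent, isNewFile};
}
```

One consequence visible in this sketch is that a retried upload reports `isNewFile: false`, because the document recorded the file during the failed first attempt. The TODO comments in the diff note that handling of failed uploads is still undecided.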