feat: add finalize & invalidation implementations and mongo persistence #129
Conversation
📝 Walkthrough
This pull request introduces comprehensive changes across multiple packages in the Apibara ecosystem, focusing on state management, persistence, and testing. The modifications primarily target the indexer, MongoDB, and SQLite plugins, adding new functionality for handling state invalidation, finalization, and improved configuration management. The changes include updates to testing frameworks, persistence mechanisms, and JSON configurations, with a particular emphasis on providing more robust and flexible state-tracking capabilities.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Indexer
    participant StoragePlugin
    participant Database
    Indexer->>StoragePlugin: Initialize State
    StoragePlugin->>Database: Create/Check Collections
    Indexer->>StoragePlugin: Process Messages
    StoragePlugin->>Database: Persist State
    alt Invalidation Needed
        Indexer->>StoragePlugin: Trigger Invalidate
        StoragePlugin->>Database: Remove Invalid Entries
    end
    alt Finalization Needed
        Indexer->>StoragePlugin: Trigger Finalize
        StoragePlugin->>Database: Clean Up Old Entries
    end
```
Compare: cd5474a to 358c7ec
```typescript
const mongoPersistence = new MongoPersistence<TFilter>(
  db,
  session,
  indexerName,
);

await mongoPersistence.put({ cursor: endCursor, filter });
```
Can we avoid creating the `MongoPersistence` object and just implement the method inline? Add helper functions to avoid duplicate code, but only if needed.
Looks good so far! Only a minor comment. Let's keep the same structure for the drizzle plugin.
Actionable comments posted: 1
🧹 Nitpick comments (9)
packages/plugin-sqlite/src/persistence.ts (3)
Line range hint 43-72: Validate cursor creation logic
Upon retrieving the stored checkpoint, the code rehydrates `orderKey` as a `BigInt`. Ensure that the database integer column can handle large values and that there are no silent overflows when writing or reading the data.
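One way to rule out silent overflow is to avoid the numeric column entirely. The standalone sketch below (function names are hypothetical, not the plugin's API) round-trips `orderKey` through a TEXT column, which stays exact at any magnitude:

```typescript
// Hypothetical helpers: round-trip a bigint orderKey through a TEXT column,
// sidestepping the 2^53 - 1 limit of a JavaScript number.
function serializeOrderKey(orderKey: bigint): string {
  if (orderKey < 0n) throw new RangeError("orderKey must be non-negative");
  return orderKey.toString(10);
}

function deserializeOrderKey(stored: string): bigint {
  return BigInt(stored); // exact: no doubles involved
}

const big = 2n ** 60n + 1n; // not exactly representable as a double
console.log(Number(big) === Number(big + 1n)); // true: double rounding collapses neighbors
console.log(deserializeOrderKey(serializeOrderKey(big)) === big); // true: text round-trip is exact
```

The trade-off is that SQL numeric comparisons on the column no longer work without padding or casting, so it matters only if values near or above 2^53 are actually possible.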
75-86: No return value from finalizeState
The function deletes entries in `filters` for blocks less than the cursor's `orderKey`, which is fine for finalization. Confirm whether any diagnostic or logging is needed for debugging.
150-159: Statements object organization
The statements for finalization and invalidation are concise and well-named. Consider grouping them (e.g., under a `stateManagement` key) to keep them logically organized.
packages/plugin-mongo/src/persistence.ts (3)
10-15: FilterSchema clarity
Defining `filter` as `Record<string, unknown>` is flexible. Ensure type safety by restricting keys or specifying a domain-specific filter type if needed.
17-18: Collection naming
Using constant names for collections helps avoid typos. You might further unify naming by suffixing variable names with "Collection" for clarity, e.g., `CHECKPOINTS_COLLECTION_NAME`.
86-123: getState result
Returning both cursor and filter is correct. The approach of parsing `orderKey` as `BigInt` is good for large integers. Double-check whether your actual data can exceed safe integer limits on insert.
packages/plugin-mongo/src/index.ts (1)
6-12: Consolidated imports
Importing persistence methods in one block clarifies how state is managed. This is well-organized.
packages/plugin-sqlite/src/index.ts (1)
63-63: Docstring alignment
The doc now references `indexerName`; consider elaborating on its usage in multi-indexer scenarios if needed.
packages/plugin-mongo/tests/storage.test.ts (1)
13-22: Use narrower imports to optimize clarity and build times.
Imports look consistent and straightforward; however, consider importing only what is necessary from `@apibara/protocol` and `../src/persistence` for clarity. This helps keep the code more explicit about dependencies.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
- change/@apibara-indexer-ec5dcd3e-a927-4bb0-9f79-bd5767b54d96.json (1 hunks)
- change/@apibara-plugin-mongo-fb2e2064-d1b5-4381-a424-e0de14b4a018.json (1 hunks)
- change/@apibara-plugin-sqlite-40fbfead-cd0c-4773-a282-baf3f8eed92c.json (1 hunks)
- packages/indexer/src/internal/testing.ts (1 hunks)
- packages/plugin-mongo/src/index.ts (3 hunks)
- packages/plugin-mongo/src/persistence.ts (1 hunks)
- packages/plugin-mongo/tests/storage.test.ts (8 hunks)
- packages/plugin-mongo/tsconfig.json (1 hunks)
- packages/plugin-sqlite/src/index.ts (6 hunks)
- packages/plugin-sqlite/src/kv.ts (2 hunks)
- packages/plugin-sqlite/src/persistence.ts (3 hunks)
- packages/plugin-sqlite/tests/kv.test.ts (2 hunks)
- packages/plugin-sqlite/tests/persistence.test.ts (6 hunks)
- packages/plugin-sqlite/tsconfig.json (1 hunks)
✅ Files skipped from review due to trivial changes (3)
- change/@apibara-indexer-ec5dcd3e-a927-4bb0-9f79-bd5767b54d96.json
- change/@apibara-plugin-mongo-fb2e2064-d1b5-4381-a424-e0de14b4a018.json
- change/@apibara-plugin-sqlite-40fbfead-cd0c-4773-a282-baf3f8eed92c.json
🔇 Additional comments (60)
packages/plugin-mongo/tsconfig.json (1)
9-9: Verify TypeScript file organization with broader inclusion pattern
The change to include all TypeScript files (`**/*.ts`) provides more flexibility but requires careful file organization. Please ensure:
- Test files are properly segregated
- No temporary/development TypeScript files are accidentally included in the build
Run this script to analyze the TypeScript file organization:
✅ Verification successful
Let me generate additional scripts to better understand the test file organization and build configuration.
TypeScript file organization is well-structured and safe
The broader inclusion pattern (`**/*.ts`) is safe in this context because:
- Test files are properly segregated in a dedicated `tests/` directory
- Source files are organized under `src/`
- No temporary or development TypeScript files exist in unexpected locations
- The only TypeScript file in the root is the standard `build.config.ts`

The file organization follows a clean separation of concerns, making the inclusive pattern `**/*.ts` appropriate for this setup.
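If test segregation ever does become a concern, an explicit `exclude` keeps the broad pattern safe. A hypothetical sketch (the `extends` path and excluded directories are illustrative, not the repository's actual config):

```json
{
  "extends": "../../tsconfig.base.json",
  "include": ["**/*.ts"],
  "exclude": ["node_modules", "dist", "tests/**/*.ts"]
}
```

Excluding `tests/` from the main tsconfig trades editor coverage of test files for a cleaner build; a separate `tsconfig.build.json` passed to the bundler is a common alternative.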
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify TypeScript file organization and potential inclusion issues

# List all TypeScript files that would be included
echo "TypeScript files that will be included in the build:"
fd -e ts

# Check for potential test files that might be accidentally included
echo -e "\nPotential test files that might be included:"
fd -e ts -e spec.ts -e test.ts
```
Length of output: 11925
Script:
```bash
#!/bin/bash
# Check for test file organization patterns and build configuration

# Check if there's a separate test configuration in tsconfig
echo "Checking for test-specific TypeScript configurations:"
fd tsconfig.test.json

# Look for test-related patterns in tsconfig.json
echo -e "\nChecking test patterns in tsconfig.json:"
rg "test|spec" packages/plugin-mongo/tsconfig.json

# Check if there are any TypeScript files directly in the root of plugin-mongo
echo -e "\nChecking TypeScript files in plugin-mongo root:"
fd -e ts -d 1 . packages/plugin-mongo/

# Analyze the structure of test files in plugin-mongo
echo -e "\nAnalyzing test file structure in plugin-mongo:"
tree packages/plugin-mongo/tests
```
Length of output: 693
packages/plugin-sqlite/tsconfig.json (1)
9-9: LGTM! Maintains consistency across plugins
The configuration changes mirror those in the MongoDB plugin, maintaining a consistent build configuration across plugins. This unified approach will help with maintainability.
packages/plugin-sqlite/src/persistence.ts (3)
14-19: Use a single object to simplify arguments
The refactored signature that takes a single `props` object is clear and flexible; it also allows for optional properties like `indexerName`.
88-103: Consistent invalidation logic
This function cleanly deletes filters for blocks above the invalidation point and resets any partial ranges. The approach is consistent with the rest of the file.
20-42: Potential integer overflow risk
You call `Number(endCursor.orderKey)` to store a `BigInt` in the database. If `orderKey` exceeds JavaScript's max safe integer (2^53 - 1), this could lead to incorrect truncation.
As a follow-up, consider verifying usage scenarios for extremely large `orderKey` values, or switch to text-based storage to avoid integer overflow.
✅ Verification successful
Let me analyze the usage of `orderKey` in the codebase to determine if there's a real risk of integer overflow.
Let me try a different approach to check the type definitions and usage patterns.
Let me check one more thing to understand how orderKey is handled in the protocol layer.
Integer overflow is properly handled - no risk
The protocol layer validates that orderKey values fit within uint64 bounds and uses BigInt throughout the codebase. The conversion to Number in the SQLite plugin is safe since the actual values used in the codebase (e.g. 800_000n, 10_000_000n) are well within JavaScript's safe integer range.
Key findings that support this:
- The protocol layer enforces uint64 bounds: `BigInt.asUintN(64, message.orderKey) !== message.orderKey` throws if exceeded
- All `orderKey` values in the codebase use `BigInt` (`n` suffix) for storage and manipulation
- The largest orderKey values seen in production code are around 10_000_000n, far below JavaScript's Number.MAX_SAFE_INTEGER
- The conversion to Number only happens at database storage time, after the protocol layer has already validated the bounds
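The safe-integer boundary is easy to demonstrate in isolation. The sketch below is standalone and illustrative (the helper names are not from the plugin code); it shows where `Number(orderKey)` starts losing precision and the shape of the uint64 bound check quoted above:

```typescript
// Illustrative only: helper names below are hypothetical, not the plugin's API.
const MAX_SAFE = BigInt(Number.MAX_SAFE_INTEGER); // 2^53 - 1 = 9007199254740991n

// Convert a bigint orderKey to a number, refusing values that would
// silently lose precision when stored as a double.
function toStorageNumber(orderKey: bigint): number {
  if (orderKey > MAX_SAFE) {
    throw new RangeError(`orderKey ${orderKey} exceeds Number.MAX_SAFE_INTEGER`);
  }
  return Number(orderKey);
}

// Same shape as the protocol-layer bound check quoted above.
function fitsUint64(orderKey: bigint): boolean {
  return BigInt.asUintN(64, orderKey) === orderKey;
}

console.log(toStorageNumber(10_000_000n)); // 10000000: well within the safe range
console.log(fitsUint64(2n ** 64n)); // false: outside uint64
console.log(Number(2n ** 53n + 1n) === Number(2n ** 53n)); // true: precision already lost past 2^53
```

The gap between the uint64 bound (2^64 - 1) and the double-precision bound (2^53 - 1) is exactly where a protocol-valid value could still be corrupted by the `Number()` conversion, which is why the production values cited above matter.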
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Looks for code that might handle extremely large orderKey to confirm usage.
rg -A 5 'orderKey'
```
Length of output: 93702
Script:
```bash
#!/bin/bash
# Look for schema definitions and type declarations for orderKey
ast-grep --pattern 'type $_ = { $$$orderKey$$$}'
```
Length of output: 50
Script:
```bash
#!/bin/bash
# Look for type definitions and declarations related to orderKey
rg -A 5 "type.*orderKey|interface.*orderKey"

# Look for actual values being used for orderKey in the codebase
rg "orderKey:.*[0-9]"
```
Length of output: 18716
Script:
```bash
#!/bin/bash
# Look for BigInt handling and validation in the protocol layer
rg -A 5 "BigInt.*orderKey"
```
Length of output: 1780
packages/plugin-mongo/src/persistence.ts (4)
4-8: CheckpointSchema looks good
The structure for checkpoint documents is clear and straightforward.
20-34: initializePersistentState
You create both collections and indexes in one function. This is good for atomic setup. No major issues.
124-147: invalidateState approach
Deleting filters with `fromBlock` above the invalidation cursor, then patching existing ones, is consistent with how SQLite does it. Good alignment.
148-163: finalizeState approach
Removing filter entries whose `toBlock` is below the finalization cursor is straightforward. The logic is consistent with the rest of the module.
packages/plugin-mongo/src/index.ts (9)
38-38: Optional indexerName in MongoStorageOptions
Providing an optional `indexerName` is consistent with the approach in SQLite. Good for multi-indexer scenarios.
57-57: Name parameter defaults
Ensure the default name `'default'` in this file matches usage in `persistence.ts` to avoid confusion.
60-67: run:before
Initialization in a transaction is a safe approach for setting up the schema. Good synergy with the new state methods.
69-88: connect:before
Retrieving state from Mongo ensures the indexer restarts from the correct checkpoint. This is essential for robust indexing.
92-106: connect:after
Performing invalidation here is crucial for removing data from partially processed blocks. Nicely handled with `invalidateState`.
110-126: connect:factory
Persisting the newly computed filter and checkpoint ensures the indexer's updates are tracked. Good usage of the new `persistState`.
128-143: message:finalize
Calls to `finalize` for both local data and the persisted state keep everything in sync. The explicit check for `cursor` is important.
145-158: message:invalidate
Similar to finalize, the code ensures that local data and persistent state remain consistent during invalidations.
178-178: Persist after middleware
Ensuring a final persist after the handler logic helps avoid losing state updates from the message handling phase.
packages/plugin-sqlite/src/index.ts (9)
7-11: Key-Value store utilities
Importing `finalizeKV` and `invalidateKV` here fosters consistency with the new finalize/invalidate approach.
48-48: Optional indexerName
Storing an indexer-specific ID in SQLite is consistent with the multi-indexer approach used in other plugins.
71-71: Consolidated usage
Accepting `indexerName` directly in your plugin options is consistent with the docs in the rest of the codebase.
92-92: Retrieve with indexerName
Fetching the correct row in checkpoints ensures that multiple indexers can share the same database.
104-121: connect:after
Invalidating partial blocks is critical for consistency. The inline approach for `invalidateState` and `invalidateKV` is straightforward.
123-140: connect:factory
Calls `persistState` only if `endCursor` is present. This check prevents accidental writes when no new blocks have been processed.
142-158: message:finalize
Calling `finalizeState` and `finalizeKV` together ensures the persisted filter range and key-value store remain in sync.
160-176: message:invalidate
Similarly, calling `invalidateState` and `invalidateKV` keeps both the filter data and kv-store consistent when rolling back.
204-204: Middleware persistence
Persisting state in the handler middleware commits final changes after message processing completes.
packages/plugin-sqlite/tests/persistence.test.ts (9)
14-14: Import of Finalize/Invalidate
Good usage of protocol-specific types for clarity in finalize/invalidate tests.
17-17: Indexer name constant
Using a descriptive name (`"test-indexer"`) ensures tests don't conflict with the default name.
28-28: Plugin usage
Specifying `indexerName` at test time is a good practice to ensure collisions don't occur in the same DB.
47-221: Factory mode filter logic
These test steps are thorough, ensuring multiple transitions of the filter. The inlined assertions confirm the correct block and filter transitions.
232-232: Throwing an error
Verifies that partial writes are rolled back if an error occurs during indexing. This test is critical for guaranteeing atomic transactions.
248-248: Inline snapshot
Auto-generated snapshot is consistent with the behavior of partial indexing.
265-267: Disabling persistence
Ensures no checkpoint table is created or updated, preserving ephemeral indexing.
277-441: Invalidate state in factory mode
Tests a partial rollback scenario with changing filters. Good coverage ensures correctness with invalidation logic.
443-608: Finalize state in factory mode
Covers a finalization scenario with dynamic filter changes. Thoroughly validates correct table updates.
packages/plugin-mongo/tests/storage.test.ts (15)
23-32: Schema definition is clear and concise.
Defining a dedicated `TestSchema` type makes the Mongo collection usage explicit and reduces type errors. Good approach.
34-34: Test suite naming.
Descriptive suite name provides clarity for the MongoDB-specific tests. The new name aligns with the tested functionality.
43-47: Use typed collections for better type safety.
You correctly obtain the typed collection and insert new documents. Proper usage of `collection<TestSchema>("test")` ensures type checking. The logic for retrieving existing rows and inserting a new one also appears correct.
106-108: Collection usage in the second test.
Again, referencing the typed collection is consistent. Helps maintain a uniform data shape across these tests.
Line range hint 109-126: Insertion vs. update approach.
The code does an update first, and if no document is modified, inserts a new record. This pattern is clear and works well for an "upsert"-like flow.
141-146: Sorting query results in tests.
Sorting by `blockNumber` ensures the final documents are in ascending order for test verification. This is an effective way to produce deterministic test results.
156-156: Inline snapshot checking.
Line 156 modifies the snapshot data. This ensures the returned document fields match the updated data. Looks correct.
166-168: Snapshot shape consistency.
Ensuring the key and data appear in the snapshot is correct. This helps confirm the fields match the initial insertion.
191-202: Delete vs. insert logic.
The "delete if found, else insert" approach confirms that the code toggles the existence of a single key in each iteration. That test is a good coverage scenario.
242-287: Selective data handling test.
The test logic for insertion and conditional deletion at a specific block number, plus verifying final data, is well-structured and robust.
288-366: Invalidation behavior.
Tests confirm that invalidation rewinds the state effectively. The usage of `MockClient` with custom `invalidate` triggers ensures coverage of edge cases.
367-435: Partial commits on errors.
Throwing an error at a specific block number confirms that the transaction is rolled back if transform fails halfway. Good negative test.
437-628: Filter persistence check.
Thorough test for verifying that filters and their applied blocks are stored in the relevant collections. This ensures correctness of multi-filter factory logic.
630-812: Invalidation in factory mode.
Similar to the single-filter invalidation test, but with the added complexity of factory-based incremental filters. The final data set is accurately validated.
814-996: Finalize scenario in factory mode.
This test ensures finalization steps apply correctly at the specified block, removing or pruning earlier states. Good coverage for finalize events.
packages/plugin-sqlite/src/kv.ts (3)
53-60: Transaction-based finalize function.
`assertInTransaction(db);` confirms that finalization occurs in a transaction, which is crucial for consistency. The deletion statement is straightforward, removing items whose `to_block < cursor.orderKey`.
61-73: Invalidate function correctness.
Splitting invalidation into two statements, one for deletion and one for clearing the `to_block`, is logical and avoids partial data states. Good approach.
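The two operations can be modeled in memory to make their semantics concrete. This is a standalone sketch of the behavior described above (row shape and names are illustrative, not the plugin's actual schema):

```typescript
// In-memory model of the kv-store rows: each value is live from fromBlock
// until toBlock (null = still live).
type Row = { key: string; fromBlock: bigint; toBlock: bigint | null };

function finalize(rows: Row[], cursor: bigint): Row[] {
  // Drop history that ended before the finalized block; it can never be rolled back to.
  return rows.filter((r) => r.toBlock === null || r.toBlock >= cursor);
}

function invalidate(rows: Row[], cursor: bigint): Row[] {
  return rows
    // Statement 1: delete values written after the invalidation point.
    .filter((r) => r.fromBlock <= cursor)
    // Statement 2: re-open ranges that were closed after the invalidation point.
    .map((r) =>
      r.toBlock !== null && r.toBlock > cursor ? { ...r, toBlock: null } : r,
    );
}

const rows: Row[] = [
  { key: "k", fromBlock: 10n, toBlock: 20n },
  { key: "k", fromBlock: 20n, toBlock: null },
];
console.log(invalidate(rows, 15n)); // first row re-opened, second row deleted
console.log(finalize(rows, 25n));   // first row dropped, open row kept
```

The key invariant: after `invalidate(rows, c)` no row references a block above `c`, and after `finalize(rows, c)` only rows still visible at or after `c` remain.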
106-115: SQL statements readability.
The newly added SQL statements (`finalize`, `invalidateDelete`, `invalidateUpdate`) are well structured and match the logic in the exported functions.
packages/indexer/src/internal/testing.ts (2)
13-19: New optional invalidation options.
Defining `MockMessagesOptions` with an `invalidate` field is a clear way to track when and how the invalidate message is triggered in test scenarios.
20-51: Conditional invalidation message generation.
The logic for pushing an `invalidate` at `invalidateTriggerIndex` is concise; the rest remain normal `data` messages. This approach effectively simulates partial reverts or rewinds.
packages/plugin-sqlite/tests/kv.test.ts (4)
14-14: Adding protocol types.
Importing `Finalize` and `Invalidate` clarifies the typed nature of your finalize/invalidate logic.
107-108: Properly awaiting throw check.
Using `await expect(...).rejects.toThrow()` is the correct pattern for verifying that an async function throws.
110-277: Invalidate flow test.
The test simulates multi-step indexing and verifies that data after invalidation is correct. This thoroughly exercises rollback logic.
279-455: Finalize flow test.
Again, the multi-step indexing scenario culminating in a `finalize` event is validated. The final state is consistent with removing stale data.
```typescript
export async function persistState<TFilter>(props: {
  db: Db;
  session: ClientSession;
  endCursor: Cursor;
  filter?: TFilter;
  indexerName?: string;
}) {
  const { db, session, endCursor, filter, indexerName = "default" } = props;

  if (endCursor) {
    await db.collection<CheckpointSchema>(checkpointCollectionName).updateOne(
      { id: indexerName },
      {
        $set: {
          orderKey: Number(endCursor.orderKey),
          uniqueKey: endCursor.uniqueKey,
        },
      },
      { upsert: true, session },
    );

    if (filter) {
      // Update existing filter's to_block
      await db
        .collection<FilterSchema>(filterCollectionName)
        .updateMany(
          { id: indexerName, toBlock: null },
          { $set: { toBlock: Number(endCursor.orderKey) } },
          { session },
        );

      // Insert new filter
      await db.collection<FilterSchema>(filterCollectionName).updateOne(
        {
          id: indexerName,
          fromBlock: Number(endCursor.orderKey),
        },
        {
          $set: {
            filter: filter as Record<string, unknown>,
            fromBlock: Number(endCursor.orderKey),
            toBlock: null,
          },
        },
        { upsert: true, session },
      );
    }
  }
}
```
💡 Codebase verification
Based on the search results, I can now provide a definitive response about the precision concern with `Number(endCursor.orderKey)`:
`orderKey` is a `bigint` - use an appropriate numeric type for storage
The codebase shows that `orderKey` is defined as `bigint` in the protocol and is commonly used with large numeric values (e.g., `800_000n`, `10_000_000n`). Converting it to `Number` using `Number()` could indeed lead to precision loss for very large values. Consider using a numeric type in MongoDB that can handle large integers without precision loss.
- Use MongoDB's `Long` type or store as string to preserve precision for large block numbers
- The schema shows `orderKey` is stored as `number` in `CheckpointSchema`, which should be updated to handle large values
🔗 Analysis chain
persistState: handle large orderKey properly
Similar to the SQLite version, calling `Number(endCursor.orderKey)` can risk losing precision for very large `orderKey` values. Review if needed.
Let me analyze the search results to verify the concern about potential precision loss when using `Number(endCursor.orderKey)`.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Searching for references to endCursor usage for potential large values.
rg -A 3 'endCursor'
```
Length of output: 65726
Script:
```bash
#!/bin/bash
# Let's search for the type definition of orderKey to understand its nature
ast-grep --pattern 'type Cursor = {
  $$$
  orderKey: $_
  $$$
}'
```
Length of output: 67
Script:
```bash
#!/bin/bash
# Let's try a different pattern to find the type definition
rg -A 5 "interface Cursor" || rg -A 5 "type Cursor"
```
Length of output: 1008
Script:
```bash
#!/bin/bash
# Let's find the actual type definition of orderKey
rg -A 2 "orderKey:" || rg -A 2 "_Cursor = Schema"
```
Length of output: 39792
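If the string-storage route suggested above is taken, fixed-width zero-padding keeps lexicographic string comparison consistent with numeric order, so MongoDB indexes and range queries on the field still behave. A hypothetical standalone sketch (names are illustrative, not the plugin's API; width 20 covers the full uint64 range):

```typescript
// Hypothetical encoding: zero-pad the decimal representation so that
// string order equals numeric order for all uint64 values.
const UINT64_WIDTH = 20; // length of "18446744073709551615" (uint64 max)

function encodeOrderKey(orderKey: bigint): string {
  if (orderKey < 0n || BigInt.asUintN(64, orderKey) !== orderKey) {
    throw new RangeError("orderKey out of uint64 range");
  }
  return orderKey.toString(10).padStart(UINT64_WIDTH, "0");
}

function decodeOrderKey(encoded: string): bigint {
  return BigInt(encoded); // leading zeros are accepted by BigInt()
}

const a = encodeOrderKey(800_000n);
const b = encodeOrderKey(10_000_000n);
console.log(a < b); // true: string order matches numeric order
console.log(decodeOrderKey(b) === 10_000_000n); // true: exact round-trip
```

MongoDB's `Long` (int64) type is the other option mentioned above; it keeps the field numeric at the cost of driver-specific conversion when reading values back into `bigint`.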
No description provided.