HADOOP-19232: [ABFS][FNSOverBlob] Implementing Ingress Support with various Fallback Handling #7272
Added some thoughts around production code.
Will do another round of review for test code.
Some suggestions for test code improvement.
Pending Test Code Review
+1
Thanks for taking all the comments.
LGTM. Please resolve the checkstyle errors as part of next iteration.
🎊 +1 overall
This message was automatically generated.
+1 LGTM.
- Please resolve all the comments before merging.
- Please create a system test Jira to upload the e2e test scenarios that you have verified and the future plans for exploratory testing, as this feature made bulk changes in the ABFS code path.
+LGTM
Description of PR:
This PR is related to the series of work done under the parent Jira HADOOP-19179 (https://issues.apache.org/jira/browse/HADOOP-19179).
Jira for this patch: https://issues.apache.org/jira/browse/HADOOP-19232
The scope of this task is to refactor the AbfsOutputStream class to handle ingress for both the DFS and Blob endpoints effectively.
Production code changes:
The AbfsOutputStream class is crucial for handling the data being written to Azure Storage. Its primary responsibilities include:
New Additions
The new additions introduce a more modular and flexible approach to managing data ingress (data being written to storage), catering to both Azure Data Lake Storage (ADLS) and Azure Blob Storage.
AzureIngressHandler
The AzureIngressHandler is a new parent class designed to encapsulate common logic for data ingress operations. It simplifies the process of writing data to Azure Storage by providing a unified interface. This class has two specialized child classes:
- AzureDfsIngressHandler:
- AzureBlobIngressHandler: ... blockId.
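To make the parent/child split concrete, here is a minimal, self-contained Java sketch. It is not the code from this PR: the class names carry a Sketch suffix, and the buffer, send, and remoteAppend methods, the "block-N" ids, and the returned strings are all hypothetical and chosen only for illustration. The only points taken from the description are that a parent class holds the common ingress logic and that the Blob variant works with a blockId.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical parent class: logic shared by both endpoints lives here. */
abstract class IngressHandlerSketch {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    /** Common logic: collect bytes locally until they are sent. */
    final void buffer(byte[] data) {
        buffer.write(data, 0, data.length);
    }

    /** Common logic: pass the buffered bytes to the endpoint-specific append, then reset. */
    final String send(long position) {
        byte[] payload = buffer.toByteArray();
        buffer.reset();
        return remoteAppend(position, payload);
    }

    /** Endpoint-specific step; returns a description instead of making a network call. */
    protected abstract String remoteAppend(long position, byte[] payload);
}

/** Hypothetical DFS child: addresses data by its position in the file. */
final class DfsIngressHandlerSketch extends IngressHandlerSketch {
    @Override
    protected String remoteAppend(long position, byte[] payload) {
        return "DFS append of " + payload.length + " bytes at position " + position;
    }
}

/** Hypothetical Blob child: addresses data by a per-block id (placeholder format). */
final class BlobIngressHandlerSketch extends IngressHandlerSketch {
    private int nextBlock = 0;

    @Override
    protected String remoteAppend(long position, byte[] payload) {
        String blockId = "block-" + (nextBlock++);
        return "Blob block " + blockId + " of " + payload.length + " bytes";
    }
}

class IngressHandlerDemo {
    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        IngressHandlerSketch[] handlers = {
            new DfsIngressHandlerSketch(), new BlobIngressHandlerSketch()
        };
        // The caller uses the same interface regardless of the endpoint.
        for (IngressHandlerSketch handler : handlers) {
            handler.buffer(data);
            System.out.println(handler.send(0L));
        }
    }
}
```

The caller only sees the parent type, which is the "unified interface" idea from the description; how the real handlers divide work between parent and children may differ.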
AbfsBlock and AbfsBlobBlock
Data is managed in discrete blocks to improve efficiency and manageability.
- AbfsBlock:
- AbfsBlobBlock: ... AbfsBlock tailored for Blob Storage. ... blockId for each block, which is necessary for the Blob Storage API.
Block Managers
To manage these data blocks, new manager classes have been introduced. These classes handle the lifecycle of blocks, including creation, appending, and flushing.
- AzureBlockManager:
- AzureDFSBlockManager: Manages AbfsBlock instances for DFS.
- AzureBlobBlockManager: Manages AbfsBlobBlock instances for Blob Storage. ... blockId.
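The relationship between blocks and block managers can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: every class and method name here is hypothetical, the Blob block is assumed to extend the plain block, and the id scheme (Base64 of a zero-padded index) is just one simple way to satisfy the Blob Storage requirement that block ids be Base64 strings of equal length within a blob.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

/** Hypothetical plain block: a chunk of data and its offset in the file. */
class BlockSketch {
    final long offset;
    final byte[] data;

    BlockSketch(long offset, byte[] data) {
        this.offset = offset;
        this.data = data.clone();
    }
}

/** Hypothetical Blob block: the same chunk plus the id used to address it. */
final class BlobBlockSketch extends BlockSketch {
    final String blockId;

    BlobBlockSketch(long offset, byte[] data, String blockId) {
        super(offset, data);
        this.blockId = blockId;
    }
}

/** Hypothetical parent manager: tracks created blocks and the next offset. */
abstract class BlockManagerSketch<B extends BlockSketch> {
    private final List<B> blocks = new ArrayList<>();
    private long nextOffset = 0;

    final B createBlock(byte[] data) {
        B block = newBlock(blocks.size(), nextOffset, data);
        blocks.add(block);
        nextOffset += data.length;
        return block;
    }

    final List<B> blocks() {
        return blocks;
    }

    protected abstract B newBlock(int index, long offset, byte[] data);
}

final class DfsBlockManagerSketch extends BlockManagerSketch<BlockSketch> {
    @Override
    protected BlockSketch newBlock(int index, long offset, byte[] data) {
        return new BlockSketch(offset, data);
    }
}

final class BlobBlockManagerSketch extends BlockManagerSketch<BlobBlockSketch> {
    @Override
    protected BlobBlockSketch newBlock(int index, long offset, byte[] data) {
        // Assumed id scheme: zero-padded index keeps every id the same length.
        String raw = String.format("%06d", index);
        String blockId = Base64.getEncoder()
            .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
        return new BlobBlockSketch(offset, data, blockId);
    }
}

class BlockManagerDemo {
    public static void main(String[] args) {
        BlobBlockManagerSketch manager = new BlobBlockManagerSketch();
        manager.createBlock(new byte[4]);
        manager.createBlock(new byte[2]);
        for (BlobBlockSketch block : manager.blocks()) {
            System.out.println(block.blockId + " at offset " + block.offset);
        }
    }
}
```

Keeping the offset bookkeeping in the parent and the id generation in the Blob subclass mirrors the description's split between common lifecycle handling and Blob-specific blockId management.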
Integration with AbfsOutputStream
The AbfsOutputStream class has been updated to incorporate the new ingress flow logic, enhancing its ability to handle data writes to both DFS and Blob Storage. Here’s how it integrates:
- Configuration Selection: AbfsOutputStream reads the configuration parameter fs.azure.ingress.service.type to determine whether the user has configured the system to use BLOB or DFS for data ingress.
- Handler Initialization: AbfsOutputStream initializes the appropriate handler (AzureBlobIngressHandler or AzureDfsIngressHandler).
- Buffering Data: As data is written to AbfsOutputStream, it is buffered into blocks (AbfsBlock for DFS or AbfsBlobBlock for Blob Storage).
- Managing Blocks: The block manager (AzureDFSBlockManager or AzureBlobBlockManager) manages the lifecycle of these blocks, ensuring that data is correctly created, appended, and flushed.
- Block Id Management (Blob Specific): AzureBlobBlockManager ensures that each block has a unique blockId, adhering to the requirements of the Blob Storage API.
Detailed Flow
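As a rough illustration of the overall flow (read the configured service type, buffer writes into blocks, append each full block, then flush), the sketch below simulates it in plain Java. Only the key fs.azure.ingress.service.type and the values BLOB and DFS come from the description. Everything else is an assumption made for the example: a Map stands in for the Hadoop configuration, OutputStreamSketch and its methods are hypothetical, no default service type is assumed, and the log strings merely describe what a real client would send.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

enum IngressServiceType { DFS, BLOB }

/** Hypothetical stand-in for the output stream; records actions instead of calling Azure. */
final class OutputStreamSketch {
    static final String INGRESS_KEY = "fs.azure.ingress.service.type";

    private final IngressServiceType type;
    private final int blockSize;
    private final ByteArrayOutputStream current = new ByteArrayOutputStream();
    private final List<String> log = new ArrayList<>();
    private int blocksSent = 0;
    private long position = 0;

    OutputStreamSketch(Map<String, String> conf, int blockSize) {
        String value = conf.get(INGRESS_KEY);
        if (value == null) {
            // This sketch does not guess a default; the real default is not stated here.
            throw new IllegalArgumentException(INGRESS_KEY + " is not set");
        }
        // valueOf throws IllegalArgumentException for anything other than DFS or BLOB.
        this.type = IngressServiceType.valueOf(value.trim().toUpperCase(Locale.ROOT));
        this.blockSize = blockSize;
    }

    /** Buffers bytes and appends a block each time the buffer reaches blockSize. */
    void write(byte[] data) {
        for (byte b : data) {
            current.write(b);
            if (current.size() == blockSize) {
                appendCurrentBlock();
            }
        }
    }

    /** Appends any partial block, then records the endpoint-specific finalization. */
    void flush() {
        if (current.size() > 0) {
            appendCurrentBlock();
        }
        log.add(type == IngressServiceType.BLOB
            ? "commit " + blocksSent + " blocks"
            : "flush at position " + position);
    }

    private void appendCurrentBlock() {
        int length = current.size();
        log.add(type == IngressServiceType.BLOB
            ? "put block #" + blocksSent + " (" + length + " bytes)"
            : "append " + length + " bytes at position " + position);
        position += length;
        blocksSent++;
        current.reset();
    }

    List<String> log() {
        return log;
    }
}

class IngressFlowDemo {
    public static void main(String[] args) {
        for (String service : new String[] {"DFS", "BLOB"}) {
            OutputStreamSketch out =
                new OutputStreamSketch(Map.of(OutputStreamSketch.INGRESS_KEY, service), 4);
            out.write(new byte[10]);
            out.flush();
            System.out.println(service + ": " + out.log());
        }
    }
}
```

The single write path with endpoint-specific append and flush steps is the point of the sketch; retries, error handling, and any fallback between endpoints are deliberately left out.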
- Creating Data Blocks: As data is written to AbfsOutputStream, it is divided into blocks (AbfsBlock for DFS or AbfsBlobBlock for Blob Storage).
- Appending Data: Blocks are appended through the appropriate handler (AzureBlobIngressHandler or AzureDfsIngressHandler).
- Flushing Data:
- Lifecycle Management: The block managers (AzureDFSBlockManager and AzureBlobBlockManager) oversee the lifecycle of blocks, handling retries and errors and ensuring data integrity.
Test Code Changes: