Add support for HDFS federation #227
These changes now enforce proper HDFS configurations. Specifically:

* Highly available (HA) clusters require
  1. a nameservice in dfs.nameservices
  2. namenode ids for that nameservice in dfs.ha.namenodes.NAMESERVICE
  3. rpc addresses for those namenode ids in dfs.namenode.rpc-address.NAMESERVICE.NNID
* Non-HA but federated clusters require
  1. a nameservice in dfs.nameservices
  2. an rpc address for the namenode in dfs.namenode.rpc-address.NAMESERVICE
* Non-HA and non-federated clusters require
  1. an rpc address for the namenode in dfs.namenode.rpc-address

HA and federated configuration takes precedence such that if a property like dfs.nameservices is present, default clients will not use a sole namenode rpc address defined by dfs.namenode.rpc-address alone.
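The resolution rules above can be sketched in Go roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `Conf`, `resolve`, `namenodesFor`, `splitList`, and `errUnresolved` are hypothetical names, and treating a missing address as an error is an assumption about how "require" should be enforced.

```go
package main

import (
	"errors"
	"strings"
)

// Conf is a hypothetical stand-in for the merged Hadoop properties
// (it is not the library's own configuration type).
type Conf map[string]string

// errUnresolved is a hypothetical error for an incomplete configuration.
var errUnresolved = errors.New("no namenode address could be resolved")

// splitList splits a comma-separated property value, dropping empty entries.
func splitList(v string) []string {
	var out []string
	for _, s := range strings.Split(v, ",") {
		if s = strings.TrimSpace(s); s != "" {
			out = append(out, s)
		}
	}
	return out
}

// namenodesFor resolves the rpc addresses of a single nameservice.
func namenodesFor(c Conf, ns string) ([]string, error) {
	ids := splitList(c["dfs.ha.namenodes."+ns])
	if len(ids) == 0 {
		// Federated but non-HA: one address keyed by the nameservice.
		if addr := c["dfs.namenode.rpc-address."+ns]; addr != "" {
			return []string{addr}, nil
		}
		return nil, errUnresolved
	}
	// HA: one address per listed namenode id. Ids that are not listed in
	// dfs.ha.namenodes.NAMESERVICE are never looked up.
	addrs := make([]string, 0, len(ids))
	for _, id := range ids {
		addr := c["dfs.namenode.rpc-address."+ns+"."+id]
		if addr == "" {
			return nil, errUnresolved // assumption: a missing address is an error
		}
		addrs = append(addrs, addr)
	}
	return addrs, nil
}

// resolve maps each nameservice to its namenode addresses. When
// dfs.nameservices is set, the bare dfs.namenode.rpc-address is ignored.
// A non-federated cluster is returned under the empty nameservice key.
func resolve(c Conf) (map[string][]string, error) {
	services := splitList(c["dfs.nameservices"])
	if len(services) == 0 {
		if addr := c["dfs.namenode.rpc-address"]; addr != "" {
			return map[string][]string{"": {addr}}, nil
		}
		return nil, errUnresolved
	}
	out := make(map[string][]string, len(services))
	for _, ns := range services {
		addrs, err := namenodesFor(c, ns)
		if err != nil {
			return nil, err
		}
		out[ns] = addrs
	}
	return out, nil
}
```

Calling `resolve` on a configuration that sets `dfs.nameservices` but only defines the bare `dfs.namenode.rpc-address` returns an error in this sketch, which mirrors the precedence rule described above.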
Hi @j4ns8i, sorry for the radio silence. Unfortunately, breaking changes to the API here would require a major version bump. Is the problem here that the different nameservices aren't available through
No worries at all 🙂 The problem I was facing was related to the different nameservices not working as I'd hoped from the
While this is purely anecdotal, my application is currently running successfully using this PR's branch with the ability to distinguish between nameservices. I appreciate the response and especially the effort that was put into developing and publishing this package. Thank you!
Ok, then would you be willing to rejigger this PR a bit?
There are currently test fixtures in
I'll try to get to this at some point, I've just been tied up a bit recently.
@j4ns8i should we wait for you? :)
I'm sorry I haven't written any additional tests; I haven't had the time to set up an image on which to run a minicluster. If you have an idea for easily setting up a testing environment I'd be happy to try it. I think the tests (or testing environment) might actually need some rework to match upstream hadoop client behavior.
I tried not to be too opinionated, but some of these changes did break previous behavior. This was done to match the behavior I observed from the `hadoop fs` command, which I used as a generalization for upstream hadoop client behavior. For example, a namenode specified through `dfs.namenode.rpc-address.NAMESERVICE.NNID` will no longer be returned from `Namenodes` or `DefaultNamenodes` unless that nameservice and namenode id are present in `dfs.nameservices` and `dfs.ha.namenodes.NAMESERVICE`, respectively. This should close #225, however.

Let me know what you think.