Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-10028 client: Add Go bindings for libdaos (Pool) #15659

Merged
merged 3 commits into from
Feb 7, 2025

Conversation

mjmac
Copy link
Contributor

@mjmac mjmac commented Dec 22, 2024

Start the work of converting the raw cgo in the daos
tool into proper Go bindings for libdaos. This patch
covers pool functionality and adds some new infrastructure
common to both pools and containers.

Features: daos_cmd pool
Required-githooks: true
Signed-off-by: Michael MacDonald [email protected]

Copy link

Ticket title is 'Update golang binding for the DAOS API'
Status is 'In Progress'
Labels: 'SODACODE2022'
https://daosio.atlassian.net/browse/DAOS-10028

@mjmac mjmac force-pushed the mjmac/DAOS-10028-pool branch from 20e923c to 68a15e1 Compare December 22, 2024 01:09
@daosbuild1
Copy link
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15659/2/execution/node/345/log

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15659/2/execution/node/346/log

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15659/2/execution/node/337/log

@daosbuild1
Copy link
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15659/2/execution/node/340/log

@daosbuild1
Copy link
Collaborator

@daosbuild1
Copy link
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15659/2/execution/node/508/log

@mjmac mjmac force-pushed the mjmac/DAOS-10028-pool branch 2 times, most recently from 785f5c7 to d94cc38 Compare December 22, 2024 16:02
@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15659/4/execution/node/1483/log

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15659/5/execution/node/544/log

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15659/5/execution/node/560/log

@mjmac mjmac force-pushed the mjmac/DAOS-10028-pool branch 2 times, most recently from 5a76de4 to 628f3ce Compare December 28, 2024 00:03
@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15659/7/testReport/

@mjmac mjmac force-pushed the mjmac/DAOS-10028-pool branch 2 times, most recently from 510d64b to 90ef6d7 Compare December 28, 2024 18:58
@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15659/9/execution/node/1405/log

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15659/9/execution/node/1415/log

@mjmac mjmac force-pushed the mjmac/DAOS-10028-pool branch from 90ef6d7 to ad55ebd Compare December 31, 2024 12:32
@mjmac mjmac force-pushed the mjmac/DAOS-10028-pool branch 7 times, most recently from 6d9b688 to 5f28018 Compare January 13, 2025 23:04
@mjmac mjmac force-pushed the mjmac/DAOS-10028-pool branch 2 times, most recently from 7a1ccde to aae584f Compare January 17, 2025 21:24
@mjmac mjmac force-pushed the mjmac/DAOS-10028-pool branch from aae584f to cf8ef8a Compare January 17, 2025 22:59
Start the work of converting the raw cgo in the daos
tool into proper Go bindings for libdaos. This patch
covers pool functionality and adds some new infrastructure
common to both pools and containers.

Features: daos_cmd
Required-githooks: true
Signed-off-by: Michael MacDonald <[email protected]>
@mjmac mjmac force-pushed the mjmac/DAOS-10028-pool branch from cf8ef8a to 9b0d824 Compare January 17, 2025 23:01
@mjmac mjmac marked this pull request as ready for review January 18, 2025 16:36
@mjmac mjmac requested review from a team as code owners January 18, 2025 16:36
@mjmac mjmac self-assigned this Jan 18, 2025
@mjmac
Copy link
Contributor Author

mjmac commented Jan 18, 2025

Note to reviewers... Yeah, this is big. It's the first of several patches, though, and contains a lot of the infrastructure needed to start the change over to using the client API instead of raw cgo in the daos tool. There is no new functionality here, and my focus will be on maintaining parity with the existing implementation for now. Most of the "new" code is just the existing cgo moved over into the API, along with extensive unit tests. Imagine that, unit tests!

For context, this is the first follow-on PR (still a WIP): #15721

Features: daos_cmd
Required-githooks: true
Signed-off-by: Michael MacDonald <[email protected]>
@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15659/23/testReport/

@mjmac mjmac requested review from tanabarr, kjacque and knard38 January 29, 2025 12:53
@mjmac
Copy link
Contributor Author

mjmac commented Jan 29, 2025

If it's helpful, it may be easiest to focus on the changes to the daos command specifically, e.g.

Starting with those changes should give reviewers a sense of how the new API works, e.g.

	-poolInfo, err := queryPool(cmd.cPoolHandle, queryMask)
	+poolInfo, err := cmd.pool.Query(cmd.MustLogCtx(), queryMask)

The code under lib/daos/api is mostly not new, just moved from the tool and made unit-testable via a mocking framework.

Copy link
Contributor

@kjacque kjacque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nice cleanup overall. I do find myself worrying about the rapidly-expanding stubs, though. I'm also curious how people are supposed to mock the API without the stubs, given that they are functions and not interfaces.

src/control/lib/daos/api/attribute.go Show resolved Hide resolved
src/control/lib/daos/api/attribute.go Show resolved Hide resolved
src/control/lib/daos/api/attribute.go Show resolved Hide resolved
src/control/lib/daos/api/attribute.go Show resolved Hide resolved
src/control/lib/daos/api/handle.go Show resolved Hide resolved
Comment on lines +20 to +21
// Long-term, this should be phased out as API users should mock
// the API instead of relying on the stubs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we probably need a task to change any existing code at the higher level so people don't imitate it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point. I see these as temporary scaffolding for the construction of the API. I'll update the comment to make that more explicit.

Comment on lines +92 to +99
/*static inline uint32_t
get_rebuild_state(struct daos_rebuild_status *drs)
{
if (drs == NULL)
return 0;

return drs->rs_state;
}*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think that's cruft that I missed. I'll remove it if I need to make changes to this PR, or in the next one.

@mjmac
Copy link
Contributor Author

mjmac commented Feb 1, 2025

A nice cleanup overall. I do find myself worrying about the rapidly-expanding stubs, though.

I really don't see a better way to make the code testable, but I'm open to suggestions. Personally, I think it's fine for the long term because all of the stub stuff should eventually be hidden behind the API, so API users and external tests don't need to deal with them.

I'm also curious how people are supposed to mock the API without the stubs, given that they are functions and not interfaces.

I was thinking about this the other day. I started playing around with the api.Provider thing when I added the health check stuff. One option would be to expand that to include methods for all of the API functions. At the moment, though, I think that would be a maintenance headache.

I think a better way to do this is to delegate mocking to users of the API. So, e.g. out in the daos tool we can define interfaces for the parts of the API that we want to stub, and just create wrappers to implement those interfaces. This seems like a more maintainable and cleaner solution. I think it will be clearer as more of the tool code is moved into the API, and the code that remains under cmd/daos is mostly presentation layer, which means that the tests and stubs should be pretty simple.

There's already some precedent for this kind of thing in the http.HandlerFunc pattern. Basically you can wrap a function in a struct with a method to implement an interface. We use it in the server: https://github.com/daos-stack/daos/blob/master/src/control/server/server_utils.go#L692

Copy link
Contributor

@tanabarr tanabarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to do as in-depth a review as I would have liked given the sheer volume of changes so whilst I haven't seen any obvious problems it's possible I might have missed something subtle. No blocking issues but I really dislike the "shoehorning" of target-indexes into RankSetFlag just because it's a numerical list.

Rank uint32 `long:"rank" required:"1" description:"Engine rank of the targets to be queried"`
Targets string `long:"target-idx" description:"Comma-separated list of target idx(s) to be queried"`
Rank uint32 `long:"rank" required:"1" description:"Engine rank of the target(s) to be queried"`
Targets ui.RankSetFlag `long:"target-idx" description:"Comma-separated list of target index(es) to be queried (default: all)"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like this, it's confusing. Either rename RankSetFlag or explicitly use ParseNumberList. Shoehorning targets into RankSetFlag just because it's a numerical list isn't pretty IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree that it would probably be better to create a new flag type. The nice thing about RankSetFlag is that it handles individual numbers, comma-separated numbers, or ranges (with or without enclosing brackets). There are also helpers already for converting rank slices to uint32 slices and vice-versa. Probably better to do that cleanup in a standalone PR, though.

src/control/cmd/dmg/pool.go Show resolved Hide resolved
src/control/cmd/dmg/pool.go Show resolved Hide resolved
@kjacque
Copy link
Contributor

kjacque commented Feb 3, 2025

I really don't see a better way to make the code testable, but I'm open to suggestions. Personally, I think it's fine for the long term because all of the stub stuff should eventually be hidden behind the API, so API users and external tests don't need to deal with them.

I think that's the piece that's missing for me. The stub usage climbs up out of the layer where it should be isolated.

To be honest I'd prefer the stubbing to be done at runtime in the test cases themselves, similar to what we do elsewhere in the code, by having a thin wrapper type that can have interfaces written around it, passing around functions as parameters, setting them as members of structs, etc. Any of those methods seem better to me than having build-time stubs similar to what we are forced to do in the C code.

For the time being I think it's fine, any coverage is better than nothing. But philosophically, I do think tests are more readable if you have most of the information in the test itself, regardless of the method used.

I think a better way to do this is to delegate mocking to users of the API. So, e.g. out in the daos tool we can define interfaces for the parts of the API that we want to stub, and just create wrappers to implement those interfaces. This seems like a more maintainable and cleaner solution.

I'm inclined toward the API package defining the wrapper struct itself, but otherwise I agree. One big interface seems guaranteed to be a maintenance nightmare (and leads to more bloated mock structures that try to be everything for every test, and become hard to understand). Makes sense to have the callers define their own interfaces, and mock those small interfaces in their tests.

All that is to say I think this is fine to land and continue to iterate. We can debate (or not) about approaches, but it's all an improvement over the current untestable tool. Nice work.

Copy link
Contributor

@kjacque kjacque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any remaining issues can be improved upon in the coming follow-on patches. Nice work.

Copy link
Contributor

@knard38 knard38 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly OK for me for what I understand.
If my two comments are relevant, I have no issues that they will be fix in a followup PR.

get_rebuild_state(struct daos_rebuild_status *drs)
{
if (drs == NULL)
return 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning DRS_IN_PROGRESS when drs is NULL sounds strange to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this was copied from the tool code: https://github.com/daos-stack/daos/blame/master/src/control/cmd/daos/util.h#L119

I agree, though... Seems like it would be better return DRS_NOT_STARTED? I'll fix it in the next PR.

}
pLabel := C.CString(cmd.pool.Label)
defer freeString(pLabel)
C.strncpy(&ap.pool_str[0], pLabel, C.DAOS_PROP_LABEL_MAX_LEN)
Copy link
Contributor

@knard38 knard38 Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but could be useful to add '\0' at the end of ap.pool_str as strncpy() is not safe.

Suggested change
C.strncpy(&ap.pool_str[0], pLabel, C.DAOS_PROP_LABEL_MAX_LEN)
C.strncpy(&ap.pool_str[0], pLabel, C.DAOS_PROP_LABEL_MAX_LEN)
ap.pool_str[C.DAOS_PROP_LABEL_MAX_LEN] = '\0'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll add that in the next PR.

@mjmac mjmac merged commit 7d02111 into master Feb 7, 2025
57 checks passed
@mjmac mjmac deleted the mjmac/DAOS-10028-pool branch February 7, 2025 04:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants