-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace most strtok(3) calls by strsep(3) #1093
Conversation
b87f79c
to
75903c3
Compare
62c5276
to
eb9e625
Compare
eb9e625
to
7f42b74
Compare
This is a little bit of a bottleneck to me (so many local patches that depend on this are waiting for publication). Would you mind prioritizing review of this over my other patch sets? :-) |
BTW, please be very careful/skeptic. If you know how to test some of these changes, it would be good to test them. The change from strtok(3) to strsep(3) has small implications, and we should be especially careful. |
7f42b74
to
40a468a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor improvements inline
{ | ||
GETGROUPS_T *grouplist; | ||
size_t i; | ||
int ngroups; | ||
bool added; | ||
char *token; | ||
char *g, *p; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does g and p mean? Can't we use token and subtoken or some other names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the function name is add_groups
, g
is of course a group. :)
Also, it's used in a very short context:
$ grepc add_groups lib/addgrps.c | grep '\<g\>' -C1
bool added;
char *g, *p;
char buf[1024];
--
p = buf;
while (NULL != (g = strsep(&p, ",:"))) {
struct group *grp;
grp = getgrnam(g); /* local, no need for xgetgrnam */
if (NULL == grp) {
fprintf(shadow_logfd, _("Warning: unknown group %s\n"), g);
continue;
The getgrnam(3) function accepts a group name as a string (const char *
), so that confirms the intuitive idea that g is for group in a function that adds groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p
is just a short-lived pointer:
$ grepc add_groups lib/addgrps.c | grep '\<p\>' -C1
bool added;
char *g, *p;
char buf[1024];
--
added = false;
p = buf;
while (NULL != (g = strsep(&p, ",:"))) {
struct group *grp;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
token would be incorrect, IMO. strsep(3) doesn't give tokens, but rather delimiter-separated values. I consider token to be what strtok(3) gives, since it collapses contiguous delimiters (such as white-space in a C parser), and tokens must be non-empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have plans to make the function shorter after this patch, so that will hopefully add clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ptr
is much more palatable than pointer
. But for such a short-lived variable, I still prefer p
. I think p
is unambiguous; almost every use of p
in any C projects means pointer.
I agree with you on the final purpose, which is improving maintainability (and thus readability). However, I disagree in the clarity difference of p vs ptr vs pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, regarding common practice (at least in this project):
alx@devuan:~/src/shadow/shadow/master$ grep -r '\<p\>' src/ lib* contrib/ | wc -l
216
alx@devuan:~/src/shadow/shadow/master$ grep -r '\<ptr\>' src/ lib* contrib/ | wc -l
81
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hallyn Would you like to untie?
This is of course subjective, but in general I would say a single letter variable is preferable if and only if its scope is small, say, < 10 lines.
This case is on the edge, let's leave it as is. I think the proper fix would be to take the middle chunk of the function, the part using p and dup, and make it its own function :) But that also is subjective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the proper fix would be to take the middle chunk of the function, the part using p and dup, and make it its own function :) But that also is subjective.
You actually gave me an idea here. And I'm doing quite a lot of progress there.
Here's a preview of what will come soon:
alx@devuan:~/src/shadow/shadow/master$ grepc -tfd -h -B2 strsep2arr .
// strsep(3) a string into an array of strings.
// Return the number of fields in the string, or -1 on error.
inline ssize_t strsep2arr(char *s, const char *restrict delim,
size_t n, char *a[restrict n])
{
size_t i;
for (i = 0; i < n && s != NULL; i++)
l[i] = strsep(&s, delim);
if (s != NULL) {
errno = E2BIG;
return -1;
}
return i;
}
alx@devuan:~/src/shadow/shadow/master$ grepc -tm -h STRSEP2ARR .
#define STRSEP2ARR(s, delim, a) \
( \
strsep2arr(s, delim, NITEMS(a), a) == NITEMS(a) ? 0 : -1 \
)
alx@devuan:~/src/shadow/shadow/master$ grepc -tfd -h -B1 strsep2ls .
// Like strsep2arr(), but add a NULL terminator.
inline ssize_t strsep2ls(char *s, const char *restrict delim,
size_t n, char *ls[restrict n])
{
size_t i;
i = strsep2arr(s, delim, n, ls);
if (i >= n) {
errno = E2BIG;
return -1;
}
ls[i] = NULL;
return i;
}
alx@devuan:~/src/shadow/shadow/master$ grepc -tm -h STRSEP2LS .
#define STRSEP2LS(s, delim, ls) strsep2ls(s, delim, NITEMS(ls), ls)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just started using it in a few places, and I've already found a bug, and collapsed many lines of code:
$ git log -3 --oneline --stat
4745ef39 (HEAD -> astrsep2ls) lib/: Use STRSEP2ARR() instead of its pattern
lib/gshadow.c | 16 ++--------------
lib/sgetpwent.c | 20 +++++---------------
lib/subordinateio.c | 18 ++++--------------
3 files changed, 11 insertions(+), 43 deletions(-)
4f033a2e lib/port.c: getportent(): Use STRSEP2ARR() instead of its pattern
lib/port.c | 22 +++++++---------------
1 file changed, 7 insertions(+), 15 deletions(-)
38d0b90c lib/: Use STRSEP2LS() instead of its pattern
lib/port.c | 17 +++--------------
1 file changed, 3 insertions(+), 14 deletions(-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one comment, other than that it all looks good, thanks.
e1bad69
to
508d614
Compare
strsep(3) is stateless, and so is easier to reason about. It also has a slight difference: strtok(3) jumps over empty fields, while strsep(3) respects them as empty fields. In most of the cases where we were using strtok(3), it makes more sense to respect empty fields, and this commit probably silently fixes a few bugs. In other cases (most notably filesystem paths), contiguous delimiters ("//") should be collapsed, so strtok(3) still makes more sense there. This commit doesn't replace such strtok(3) calls. While at this, remove some useless variables used by these calls, and reduce the scope of others. Signed-off-by: Alejandro Colomar <[email protected]>
508d614
to
8e4a0e7
Compare
Thanks! |
This is a subset of #1048 for easier review.
Revisions:
v1b
v1c
v1d
v1e
v2
v2b
v2c
v2d
v3
v4