Replace most strtok(3) calls by strsep(3) #1093

Merged (1 commit) on Dec 5, 2024

Collaborator

@alejandro-colomar alejandro-colomar commented Oct 14, 2024

This is a subset of #1048 for easier review.


Revisions:

v1b
  • Rebase
$ git range-diff gh/strtcpy..gh/strsep strtcpy..strsep 
1:  1ada0038 = 1:  b806041d lib/, src/: Use strsep(3) instead of strtok(3)
v1c
  • Rebase
$ git range-diff gh/strtcpy..gh/strsep master..strsep 
1:  b806041d = 1:  b87f79c6 lib/, src/: Use strsep(3) instead of strtok(3)
v1d
  • Expand commit message
$ git range-diff master gh/strsep strsep 
1:  b87f79c6 ! 1:  75903c36 lib/, src/: Use strsep(3) instead of strtok(3)
    @@ Metadata
      ## Commit message ##
         lib/, src/: Use strsep(3) instead of strtok(3)
     
    +    strsep(3) is stateless, and so is easier to reason about.
    +
    +    It also has a slight difference: strtok(3) jumps over empty fields,
    +    while strsep(3) respects them as empty fields.  In most of the cases
    +    where we were using strtok(3), it makes more sense to respect empty
    +    fields, and this commit probably silently fixes a few bugs.
    +
    +    In other cases (most notably filesystem paths), contiguous delimiters
    +    ("//") should be collapsed, so strtok(3) still makes more sense there.
    +    This commit doesn't replace such strtok(3) calls.
    +
         Signed-off-by: Alejandro Colomar <[email protected]>
     
      ## lib/addgrps.c ##
v1e
  • Rebase
$ git range-diff master..gh/strsep shadow/master..strsep 
1:  75903c36 = 1:  fa952752 lib/, src/: Use strsep(3) instead of strtok(3)
v2
  • Several rebases
$ git range-diff fa952752^..fa952752 shadow/master..gh/strsep 
1:  fa952752 = 1:  eb9e6257 lib/, src/: Use strsep(3) instead of strtok(3)
v2b
  • Rebase
$ git range-diff alx/master..gh/strsep shadow/master..strsep 
1:  eb9e6257 = 1:  7f42b741 lib/, src/: Use strsep(3) instead of strtok(3)
v2c
  • Rebase
$ git range-diff gh/master..gh/strsep shadow/master..strsep 
1:  7f42b741 ! 1:  40a468ab lib/, src/: Use strsep(3) instead of strtok(3)
    @@ src/login_nopam.c: static bool list_match (char *list, const char *item, bool (*
     +          while (   (NULL != (tok = strsep(&list, ", \t")))
                       && (strcasecmp (tok, "EXCEPT") != 0))
                        /* VOID */ ;
    -           if (tok == 0 || !list_match (NULL, item, match_fn)) {
    +           if (tok == NULL || !list_match(NULL, item, match_fn)) {
     
      ## src/suauth.c ##
     @@ src/suauth.c: static int isgrp (const char *, const char *);
v2d
  • Rebase
$ git range-diff master..gh/strsep shadow/master..strsep 
1:  40a468ab ! 1:  e1bad69d lib/, src/: Use strsep(3) instead of strtok(3)
    @@ lib/console.c: static bool is_listed (const char *cfgin, const char *tty, bool d
     -          while ((s = strtok (pbuf, ":")) != NULL) {
     +          pbuf = buf;
     +          while (NULL != (s = strsep(&pbuf, ":"))) {
    -                   if (strcmp (s, tty) == 0) {
    +                   if (streq(s, tty)) {
                                return true;
                        }
     -
    @@ src/suauth.c: int check_su_auth (const char *actual_id,
     -       tok = strtok (NULL, split)) {
     +  while (NULL != (tok = strsep(&list, ", "))) {
      
    -           if (!strcmp (tok, "ALL")) {
    +           if (streq(tok, "ALL")) {
                        if (state != 0) {
v3
  • Reduce the scope of a variable instead of removing it. [@hallyn]
$ git range-diff master gh/strsep strsep 
1:  e1bad69d ! 1:  508d6143 lib/, src/: Use strsep(3) instead of strtok(3)
    @@ Commit message
         ("//") should be collapsed, so strtok(3) still makes more sense there.
         This commit doesn't replace such strtok(3) calls.
     
    +    While at this, remove some useless variables used by these calls, and
    +    reduce the scope of others.
    +
         Signed-off-by: Alejandro Colomar <[email protected]>
     
      ## lib/addgrps.c ##
    @@ src/login_nopam.c: int login_access (const char *user, const char *from)
     +static bool
     +list_match(char *list, const char *item, bool (*match_fn)(const char *, const char*))
      {
    ++  static const char  sep[] = ", \t";
    ++
        char *tok;
        bool match = false;
    + 
     @@ src/login_nopam.c: static bool list_match (char *list, const char *item, bool (*match_fn) (const ch
         * a match, look for an "EXCEPT" list and recurse to determine whether
         * the match is affected by any exceptions.
         */
     -  for (tok = strtok (list, sep); tok != NULL; tok = strtok (NULL, sep)) {
    -+  while (NULL != (tok = strsep(&list, ", \t"))) {
    ++  while (NULL != (tok = strsep(&list, sep))) {
                if (strcasecmp (tok, "EXCEPT") == 0) {  /* EXCEPT: give up */
                        break;
                }
    @@ src/login_nopam.c: static bool list_match (char *list, const char *item, bool (*
        /* Process exceptions to matches. */
        if (match) {
     -          while (   ((tok = strtok (NULL, sep)) != NULL)
    -+          while (   (NULL != (tok = strsep(&list, ", \t")))
    ++          while (   (NULL != (tok = strsep(&list, sep)))
                       && (strcasecmp (tok, "EXCEPT") != 0))
                        /* VOID */ ;
                if (tok == NULL || !list_match(NULL, item, match_fn)) {
v4
  • Rebase
$ git range-diff master..gh/strsep shadow/master..strsep 
1:  508d6143 ! 1:  8e4a0e70 lib/, src/: Use strsep(3) instead of strtok(3)
    @@ src/groupadd.c: static void grp_update (void)
                                exit (E_GRP_UPDATE);
                        }
     -                  grp.gr_mem = add_list(grp.gr_mem, token);
    --                  token = strtok(NULL, ",");
    ++
     +                  grp.gr_mem = add_list(grp.gr_mem, u);
    + #ifdef  SHADOWGRP
    +                   if (is_shadow_grp)
    +-                          sgrp.sg_mem = add_list(sgrp.sg_mem, token);
    ++                          sgrp.sg_mem = add_list(sgrp.sg_mem, u);
    + #endif
    +-                  token = strtok(NULL, ",");
                }
        }
      
    @@ src/groupmod.c: static void grp_update (void)
                if (!aflg) {
                        // requested to replace the existing groups
     @@ src/groupmod.c: static void grp_update (void)
    -                           grp.gr_mem = dup_list (grp.gr_mem);
                }
    + #endif                            /* SHADOWGRP */
      
     -          token = strtok(user_list, ",");
     -          while (token) {
    @@ src/groupmod.c: static void grp_update (void)
                                exit (E_GRP_UPDATE);
                        }
     -                  grp.gr_mem = add_list(grp.gr_mem, token);
    --                  token = strtok(NULL, ",");
    ++
     +                  grp.gr_mem = add_list(grp.gr_mem, u);
    + #ifdef    SHADOWGRP
    +                   if (NULL != osgrp)
    +-                          sgrp.sg_mem = add_list(sgrp.sg_mem, token);
    ++                          sgrp.sg_mem = add_list(sgrp.sg_mem, u);
    + #endif                            /* SHADOWGRP */
    +-                  token = strtok(NULL, ",");
                }
        }

@alejandro-colomar alejandro-colomar force-pushed the strsep branch 3 times, most recently from b87f79c to 75903c3 Compare October 15, 2024 10:15
@alejandro-colomar alejandro-colomar marked this pull request as ready for review October 15, 2024 10:17
@alejandro-colomar alejandro-colomar force-pushed the strsep branch 4 times, most recently from 62c5276 to eb9e625 Compare November 2, 2024 10:10
@alejandro-colomar
Collaborator Author

@hallyn

This is a bit of a bottleneck for me (many local patches that depend on it are waiting for publication). Would you mind prioritizing review of this over my other patch sets? :-)

@alejandro-colomar
Collaborator Author

alejandro-colomar commented Nov 11, 2024

BTW, please be very careful/skeptical. If you know how to test some of these changes, it would be good to test them. The change from strtok(3) to strsep(3) has subtle implications (empty fields are no longer skipped), and we should be especially careful.

Collaborator

@ikerexxe ikerexxe left a comment

Some minor improvements inline

{
GETGROUPS_T *grouplist;
size_t i;
int ngroups;
bool added;
char *token;
char *g, *p;
Collaborator

What do g and p mean? Can't we use token and subtoken or some other names?

Collaborator Author

Since the function name is add_groups, g is of course a group. :)

Also, it's used in a very short context:

$ grepc add_groups lib/addgrps.c | grep '\<g\>' -C1
	bool added;
	char *g, *p;
	char buf[1024];
--
	p = buf;
	while (NULL != (g = strsep(&p, ",:"))) {
		struct group *grp;

		grp = getgrnam(g); /* local, no need for xgetgrnam */
		if (NULL == grp) {
			fprintf(shadow_logfd, _("Warning: unknown group %s\n"), g);
			continue;

The getgrnam(3) function accepts a group name as a string (const char *), so that confirms the intuitive idea that g is for group in a function that adds groups.

Collaborator Author

@alejandro-colomar alejandro-colomar Nov 14, 2024

p is just a short-lived pointer:

$ grepc add_groups lib/addgrps.c | grep '\<p\>' -C1
	bool added;
	char *g, *p;
	char buf[1024];
--
	added = false;
	p = buf;
	while (NULL != (g = strsep(&p, ",:"))) {
		struct group *grp;

Collaborator Author

@alejandro-colomar alejandro-colomar Nov 14, 2024

token would be incorrect, IMO. strsep(3) doesn't give tokens, but rather delimiter-separated values. I consider token to be what strtok(3) gives, since it collapses contiguous delimiters (such as white-space in a C parser), and tokens must be non-empty.
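
For illustration, a minimal sketch of that distinction. The helper names (count_tokens, count_fields, check_difference) are made up for this example and are not part of shadow; the only library calls used are strtok(3) and strsep(3).

```c
#define _DEFAULT_SOURCE  /* expose strsep(3) on glibc under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the pieces strtok(3) yields; runs of delimiters are collapsed. */
static size_t
count_tokens(char *s, const char *delim)
{
	size_t  n = 0;

	for (char *t = strtok(s, delim); t != NULL; t = strtok(NULL, delim))
		n++;

	return n;
}

/* Count the fields strsep(3) yields; an empty field is returned as "". */
static size_t
count_fields(char *s, const char *delim)
{
	size_t  n = 0;

	while (strsep(&s, delim) != NULL)
		n++;

	return n;
}

/* Both calls modify their input, so each one gets its own copy. */
static void
check_difference(void)
{
	char  a[] = "root,,daemon";
	char  b[] = "root,,daemon";

	assert(count_tokens(a, ",") == 2);  /* "root", "daemon" */
	assert(count_fields(b, ",") == 3);  /* "root", "", "daemon" */
}
```

With strsep(3), the caller sees the empty field and has to decide what to do with it, which is the behavior change reviewers should keep in mind for each converted call site.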

Collaborator Author

I have plans to make the function shorter after this patch, so that will hopefully add clarity.

Collaborator Author

@alejandro-colomar alejandro-colomar Nov 19, 2024

Yes, ptr is much more palatable than pointer. But for such a short-lived variable, I still prefer p. I think p is unambiguous; almost every use of p in any C project means pointer.

I agree with you on the final purpose, which is improving maintainability (and thus readability). However, I disagree about the difference in clarity between p, ptr, and pointer.

Collaborator Author

@alejandro-colomar alejandro-colomar Nov 19, 2024

BTW, regarding common practice (at least in this project):

alx@devuan:~/src/shadow/shadow/master$ grep -r '\<p\>' src/ lib* contrib/ | wc -l
216
alx@devuan:~/src/shadow/shadow/master$ grep -r '\<ptr\>' src/ lib* contrib/ | wc -l
81

Member

> @hallyn Would you like to break the tie?

This is of course subjective, but in general I would say a single letter variable is preferable if and only if its scope is small, say, < 10 lines.

This case is on the edge, so let's leave it as is. I think the proper fix would be to take the middle chunk of the function, the part using p and dup, and make it its own function :) But that is also subjective.

Collaborator Author

> I think the proper fix would be to take the middle chunk of the function, the part using p and dup, and make it its own function :) But that also is subjective.

You actually gave me an idea here, and I'm making quite a lot of progress on it.

Here's a preview of what will come soon:

alx@devuan:~/src/shadow/shadow/master$ grepc -tfd -h -B2 strsep2arr .
// strsep(3) a string into an array of strings.
// Return the number of fields in the string, or -1 on error.
inline ssize_t strsep2arr(char *s, const char *restrict delim,
    size_t n, char *a[restrict n])
{
	size_t  i;

	for (i = 0; i < n && s != NULL; i++)
		a[i] = strsep(&s, delim);

	if (s != NULL) {
		errno = E2BIG;
		return -1;
	}

	return i;
}
alx@devuan:~/src/shadow/shadow/master$ grepc -tm -h STRSEP2ARR .
#define STRSEP2ARR(s, delim, a)                                       \
(                                                                     \
	strsep2arr(s, delim, NITEMS(a), a) == NITEMS(a) ? 0 : -1      \
)
alx@devuan:~/src/shadow/shadow/master$ grepc -tfd -h -B1 strsep2ls .
// Like strsep2arr(), but add a NULL terminator.
inline ssize_t strsep2ls(char *s, const char *restrict delim,
    size_t n, char *ls[restrict n])
{
	size_t  i;

	i = strsep2arr(s, delim, n, ls);
	if (i >= n) {
		errno = E2BIG;
		return -1;
	}

	ls[i] = NULL;

	return i;
}
alx@devuan:~/src/shadow/shadow/master$ grepc -tm -h STRSEP2LS .
#define STRSEP2LS(s, delim, ls)  strsep2ls(s, delim, NITEMS(ls), ls)
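
As a rough sketch of how such a helper might be used: the block below carries a local copy modeled on the strsep2arr() preview (renamed strsep2arr_sketch so it is not mistaken for the project's code, which may differ), plus a hypothetical caller, parse_record, that splits a four-field, colon-separated record in place. The record layout and both function names are assumptions for illustration only.

```c
#define _DEFAULT_SOURCE  /* expose strsep(3) on glibc under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Local copy modeled on the preview; returns the field count or -1. */
static ssize_t
strsep2arr_sketch(char *s, const char *delim, size_t n, char *a[])
{
	size_t  i;

	for (i = 0; i < n && s != NULL; i++)
		a[i] = strsep(&s, delim);

	if (s != NULL) {  /* more than n fields in the input */
		errno = E2BIG;
		return -1;
	}

	return i;
}

/* Hypothetical caller: accept a record only if it has exactly 4 fields. */
static int
parse_record(char *line, char *fields[4])
{
	if (strsep2arr_sketch(line, ":", 4, fields) != 4)
		return -1;

	return 0;
}

static void
check_parse_record(void)
{
	char  ok[] = "alice:x::1000";
	char  few[] = "a:b";
	char  many[] = "a:b:c:d:e";
	char  *f[4];

	assert(parse_record(ok, f) == 0);
	assert(strcmp(f[0], "alice") == 0);
	assert(f[2][0] == '\0');           /* the empty field is preserved */
	assert(strcmp(f[3], "1000") == 0);

	assert(parse_record(few, f) == -1);   /* too few fields */
	assert(parse_record(many, f) == -1);  /* too many fields: E2BIG */
}
```

Checking the returned count against the array size is what lets one call reject both short and long records, which seems to be the pattern the STRSEP2ARR() macro wraps.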

Collaborator Author

I've just started using it in a few places, and I've already found a bug, and collapsed many lines of code:

$ git log -3 --oneline --stat
4745ef39 (HEAD -> astrsep2ls) lib/: Use STRSEP2ARR() instead of its pattern
 lib/gshadow.c       | 16 ++--------------
 lib/sgetpwent.c     | 20 +++++---------------
 lib/subordinateio.c | 18 ++++--------------
 3 files changed, 11 insertions(+), 43 deletions(-)
4f033a2e lib/port.c: getportent(): Use STRSEP2ARR() instead of its pattern
 lib/port.c | 22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)
38d0b90c lib/: Use STRSEP2LS() instead of its pattern
 lib/port.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

lib/console.c (resolved)
src/groupadd.c (resolved)
src/groupmod.c (resolved)
Member

@hallyn hallyn left a comment

Just one comment, other than that it all looks good, thanks.

src/login_nopam.c (resolved)
@alejandro-colomar alejandro-colomar force-pushed the strsep branch 2 times, most recently from e1bad69 to 508d614 Compare December 2, 2024 10:46
strsep(3) is stateless, and so is easier to reason about.

It also has a slight difference: strtok(3) jumps over empty fields,
while strsep(3) respects them as empty fields.  In most of the cases
where we were using strtok(3), it makes more sense to respect empty
fields, and this commit probably silently fixes a few bugs.

In other cases (most notably filesystem paths), contiguous delimiters
("//") should be collapsed, so strtok(3) still makes more sense there.
This commit doesn't replace such strtok(3) calls.

While at this, remove some useless variables used by these calls, and
reduce the scope of others.

Signed-off-by: Alejandro Colomar <[email protected]>
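
One way to picture the "stateless" point in the commit message: strsep(3) keeps its position in the caller's pointer, so two scans can be nested freely, whereas strtok(3) keeps a single hidden position that an inner loop would overwrite. The sketch below is only an illustration; count_members and its input format (':' between records, ',' between members) are invented for this example and do not come from shadow.

```c
#define _DEFAULT_SOURCE  /* expose strsep(3) on glibc under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the non-empty members in a string such as "a,b:c::d,e", where ':'
 * separates records and ',' separates members within a record.
 * The input is modified in place.
 */
static size_t
count_members(char *s)
{
	size_t  n = 0;
	char    *rec, *m;

	while (NULL != (rec = strsep(&s, ":"))) {
		/* The inner scan has its own cursor (rec), independent of s. */
		while (NULL != (m = strsep(&rec, ","))) {
			if (m[0] != '\0')
				n++;
		}
	}

	return n;
}

static void
check_count_members(void)
{
	char  buf[] = "a,b:c::d,e";

	assert(count_members(buf) == 5);  /* a, b, c, d, e; empty record skipped */
}
```

Here the caller chooses to skip empty members explicitly; with strsep(3) that policy is visible at the call site rather than built into the tokenizer.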
@hallyn hallyn merged commit 90afe61 into shadow-maint:master Dec 5, 2024
9 checks passed
@alejandro-colomar
Collaborator Author

Thanks!

@alejandro-colomar alejandro-colomar deleted the strsep branch December 5, 2024 23:34