Ponder some solutions to automate searching for terms in content #169

Open
waynebeaton opened this issue Apr 11, 2024 · 0 comments

It would be handy to have a regular expression that one could use to run a quick scan on content to ensure that words in a particular set of tiers do not appear. I maintain, for example, some documentation in AsciiDoc that really lends itself well to this sort of search.

I cobbled this expression together based on the contents of the "term" field in each wordlist file:

$ grep -Pi "\b(?:black[\-\s]?box|Blackout|disable|fellow|master\s?mind|white\s?box|white[\-\s]?label|test1|cripple|master|slave|abort|blackhat|whitehat|Tribe|white[\-\s]?list|sanity(?:\-|\s)check|hallucinate|man\-in\-the\-middle|Segregate)\b" -R .

You'll also notice in my expression that I accounted for some variations ("sanity check" and "sanity-check"; "black box", "black-box", and "blackbox"; ...), and broke apart some of the combinations ("whitehat-blackhat" became "whitehat" and "blackhat"). I didn't get them all, and there may be some errors (I haven't paid any attention to tiers, for example).
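
The variation handling described above (space, hyphen, or nothing between words) can be derived mechanically rather than written by hand. The following is a minimal sketch in Python, not part of the project; the function name `term_to_pattern` is hypothetical. It would not be appropriate for combined entries such as "master-slave", which are better split into separate terms.

```python
import re


def term_to_pattern(term: str) -> str:
    """Turn a term such as "black box" into a pattern that accepts
    a space, a hyphen, or nothing between its words."""
    words = re.split(r"[-\s]+", term.strip())
    return r"[-\s]?".join(re.escape(word) for word in words)


# Wrap the pattern in word boundaries, as the grep expression does.
pattern = re.compile(
    r"\b(?:" + term_to_pattern("black box") + r")\b", re.IGNORECASE
)

for sample in ("black box", "Black-Box", "blackbox", "blackboard"):
    print(sample, "->", bool(pattern.search(sample)))
```

The same helper could be applied to every multi-word term before the alternatives are joined with `|`.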

In typical fashion, I'm probably overthinking this... Since the lists are dynamic and I expect them to change over time, I think the expression should be generated automatically from the wordlist files in /content/word-lists. It should be relatively straightforward to use Hugo to build an expression from this data, or from the content in the JSON word-list.
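
As an illustration of generating the expression from data instead of maintaining it by hand, here is a minimal Python sketch (a swapped-in technique; the issue proposes Hugo). The JSON structure is an assumption: only the "term" field is mentioned in this issue, while the "tier" field name and the tier numbers below are invented for the example.

```python
import json
import re

# Hypothetical stand-in for the JSON word list. The real structure may
# differ, and the tier numbers here are made up for illustration.
WORD_LIST_JSON = """
[
  {"term": "blackhat", "tier": 1},
  {"term": "whitelist", "tier": 1},
  {"term": "sanity check", "tier": 2},
  {"term": "fellow", "tier": 3}
]
"""


def build_expression(entries, tiers):
    """Build one case-insensitive expression from entries in the given tiers."""
    terms = {entry["term"] for entry in entries if entry["tier"] in tiers}
    if not terms:
        raise ValueError("no terms found for the requested tiers")
    # Longer terms first so a long term is tried before any shorter prefix.
    ordered = sorted(terms, key=lambda term: (-len(term), term))
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def scan(text, expression):
    """Return (line number, matched text) pairs for every hit."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in expression.finditer(line):
            hits.append((number, match.group(0)))
    return hits


expression = build_expression(json.loads(WORD_LIST_JSON), tiers={1, 2})
sample = "Add it to the whitelist.\nRun a Sanity Check first.\nThanks, fellow!"
print(scan(sample, expression))
```

Printing `expression.pattern` would give a string that could be pasted into a `grep -Pi` call, so the generated expression stays usable from the command line.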

Having some consistency in the way that terms are captured would make this a lot more useful. The "master-slave" and "whitehat-blackhat" entries don't lend themselves well to the automation (I understand why combining them makes sense). Perhaps adding some version of a variations field could provide a simple solution.
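
To make the variations idea concrete, here is a small Python sketch of how a generator might consume such a field. The field name "variations" and the entry layout are hypothetical; nothing like this exists in the word lists today as far as this issue states.

```python
import re

# Hypothetical entries. "variations" is the proposed field, not an existing one.
ENTRIES = [
    {"term": "master-slave", "variations": ["master", "slave"]},
    {"term": "sanity check", "variations": ["sanity check", "sanity-check"]},
    {"term": "abort"},
]


def searchable_forms(entry):
    """Use the listed variations when present, otherwise fall back to the term."""
    return entry.get("variations") or [entry["term"]]


def expression_from_entries(entries):
    """Join every searchable form into one case-insensitive expression."""
    forms = [form for entry in entries for form in searchable_forms(entry)]
    alternatives = "|".join(re.escape(form) for form in forms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


expression = expression_from_entries(ENTRIES)
for sample in ("the slave node", "a quick sanity-check", "abort the job"):
    match = expression.search(sample)
    print(sample, "->", match.group(0) if match else None)
```

With this layout the combined entry keeps its display form in "term", while the generator only ever searches for the individual forms.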
