Adding bloom command meta data, bloom group and bloom data type documentaion #233

zackcam · 2025-02-20T00:26:56Z

This is one of three PR's that will be done for adding information about the bloom module to the Valkey website:
Bloom repo json command files: valkey-io/valkey-bloom#47
valkey-io.github.io: valkey-io/valkey-io.github.io#212

This PR has three main changes

Adding the bloom command group
Adding bloom command metadata files (Example for bf.add below)

3. Adding bloom data type documents

as well Signed-off-by: zackcam <[email protected]>

zuiderkwast

Very interesting!

I skimmed through it very quickly. The documentation itself looks great AFAICT. I can do a more detailed review later.

The commands look very much like built-in commands. It's not mentioned anywhere that it's a separate module that users need to install. I think we should mentioned it on the bloom filters topic page with a link to the github repo. The BF command pages should link to that topic page, so the pages are all linked together.

To build man pages, the scripts in this repo need to be able to take multiple command JSON files. This needs to be added to the Makefile, the README and maybe the python scripts too. Please try to build the man pages as described in the README of this repo.

zuiderkwast · 2025-02-20T08:22:54Z

groups.json

@@ -3,6 +3,10 @@
    "display": "Bitmap",
    "description": "Operations on the Bitmap data type"
  },
+  "bloom": {
+    "display": "Bloom",


Use the full name? Compare to "Sorted Set".

Suggested change

"display": "Bloom",

"display": "Bloom Filter",

Updated to your suggestion

zuiderkwast · 2025-02-20T08:33:23Z

Many of the spellcheck errors can be fixed simply but writing the command names in backticks. Stuff in backticks are excluded from spellcheck IIRC.

zackcam · 2025-02-20T18:53:34Z

The commands look very much like built-in commands. It's not mentioned anywhere that it's a separate module that users need to install

I think we can make it more explicit on the data type page as well by making a modules section. i.e

Does this seem like something that would be wanted?

zuiderkwast · 2025-02-20T19:07:11Z

I think we can make it more explicit on the data type page as well by making a modules section. i.e

Yes, something like that would be good. In your screenshot it looks like the "Extensions" sub-heading is part of "Module Data Types" though, because of the levels of the headings used. If we do this, then "Module Data Types" should be a level-2 heading and "Bloom Filter" a level-3 heading under it.

How about just mentioning the module within the description? Something like this?

 ## Bloom Filter
 
 [Bloom filters](bloomfilters.md) provides a space efficient probabilistic data structure that allows checking if an element is a member of a set. False positives are possible, but it guarantees no false negatives.
+Bloom filters are provided by the module `valkey-bloom`.
 For more information, see:

 * [Overview of Bloom Filters](bloomfilters.md)
 * [Bloom filter command reference](../commands/#bloom)
+* [The valkey-bloom module on GitHub](https://github.com/valkey-io/valkey-bloom/)

madolson · 2025-02-20T19:54:25Z

@zuiderkwast I also wanted to get your input about how we should structure the modules to make it clear they aren't part of the core. The current structure is they are intermingled. I don't really have an opinion yet, but one alternative would be to at least separate them in a separate folder structure and clarify which module they are apart of.

zuiderkwast · 2025-02-20T20:37:24Z

@zuiderkwast I also wanted to get your input about how we should structure the modules to make it clear they aren't part of the core. The current structure is they are intermingled. I don't really have an opinion yet, but one alternative would be to at least separate them in a separate folder structure and clarify which module they are apart of.

Are you talking about the URLs of the commands? I like that it's a flat structure, just like the commands are in a global flat namespace. The BF. prefix is enough.

But we should definitely show it in some way. A line somewhere on each command page would be good. I hope we can be generate it in some way from an optional key in the command JSON file or something like that.

madolson · 2025-02-20T20:40:45Z

Are you talking about the URLs of the commands? I like that it's a flat structure, just like the commands are in a global flat namespace. The BF. prefix is enough.

I don't have a strong preference one way or the other about flat/nested, so sticking with flat is OK for me.

But we should definitely show it in some way. A line somewhere on each command page would be good. I hope we can be generate it in some way from an optional key in the command JSON file or something like that.

Yeah, I guess immediately let's make sure there is something in the JSON file. Maybe Module Required: <link to Bloom>.

madolson

Not a super deep review. I think we should indicate more clearly that the commands are from a module and not part of the core. That can maybe from the json docs only though.

madolson · 2025-02-20T20:43:09Z

commands/bf.add.md

+* key (required) - A Valkey key of Bloom data type
+* item (required) - Item to add


Suggested change

* key (required) - A Valkey key of Bloom data type

* item (required) - Item to add

We typically omit this, since the usage would be included at the top which will indicate if something is required.

Yeah makes sense I removed all these from the bloom commands and if I think the arguments needed explained updated the heading name

madolson · 2025-02-20T20:44:18Z

commands/bf.add.md

@@ -0,0 +1,12 @@
+Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.


Suggested change

Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.

Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.

If you want to create a bloom filter with non-standard options, use the `BF.INSERT` command.

Updated and made it less wordy as well by removing 'specified' from the description

madolson · 2025-02-20T20:45:03Z

commands/bf.exists.md

@@ -0,0 +1,16 @@
+Determines if a specified item has been added to the specified bloom filter.
+Syntax


Suggested change

Syntax

madolson · 2025-02-20T20:46:16Z

commands/bf.info.md

@@ -0,0 +1,35 @@
+Returns information about a bloomfilter
+
+## Arguments


These need to be kept because they include the info data, but I would change this to be about info fields or something.

madolson · 2025-02-20T20:47:53Z

commands/bf.info.md

+## Arguments
+* key (required) - A valkey key of bloom data type
+* CAPACITY (optional) - Returns the number of unique items that would need to be added before scaling would happen
+* SIZE (optional) - Returns the memory size which is the number of bytes allocated


Suggested change

* SIZE (optional) - Returns the memory size which is the number of bytes allocated

* SIZE (optional) - Returns the number of bytes allocated

Why waste time say lot word when few word do trick?

madolson · 2025-02-20T20:57:57Z

topics/data-types.md

@@ -92,6 +92,14 @@ The [HyperLogLog](hyperloglogs.md) data structures provide probabilistic estimat
 * [Overview of HyperLogLog](hyperloglogs.md)
 * [HyperLogLog command reference](../commands/#hyperloglog)

+## Bloom Filter
+
+[Bloom filters](bloomfilters.md) provides a space efficient probabilistic data structure that allows checking if an element is a member of a set. False positives are possible, but it guarantees no false negatives.


I would translate this to english with an example.

I tried to make this more understandable but I think potentially having what I use in the exists and mexists commands could also work if the new version still isn't great

madolson · 2025-02-20T20:59:37Z

topics/bloomfilters.md

+
+Bloom filters are a space efficient probabilistic data structure that allows checking whether an element is member of a set. False positives are possible, but it guarantees no false negatives.
+
+## Bloom commands


Are other examples include the "basic commands" up front, and then the more sophisticated commands later. I think we should do the same.

madolson · 2025-02-20T21:00:41Z

topics/bloomfilters.md

+
+**Financial fraud detection**
+
+Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.


Is this a real use case? The false positive here is not idea, since it might make it seem like a transaction is legitimate when it is not.

Updated this use case to be more about card fraud instead of location based checking

madolson · 2025-02-20T21:01:14Z

topics/bloomfilters.md

+
+Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.
+
+For the above each user would have a Bloom filter which is then checked for every transaction.


Might just merge this into the previous paragraph.

madolson · 2025-02-20T21:28:53Z

topics/bloomfilters.md

+
+**Check if URL's are malicious**
+
+Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter. 


Suggested change

Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter.

Bloom filters can answer the question "is a URL malicious?". Any URL inputted would be checked against a malicious URL bloom filter.

zuiderkwast

Not a complete review.

We need to think about what we want regarding

How to show which module a command belongs to and how to store this in the JSON file(s).
What to show in the Since fields. If we'll release some valkey-with-modules bundle, then the version number should probably follow valkey's versioning(?).

zuiderkwast · 2025-02-20T21:23:46Z

commands/commands

Is this a symlink added here? Let's not do that.

I think I accidentally added this for git.add will look at removing

zuiderkwast · 2025-02-20T21:28:26Z

resp2_replies.json

+    "* [Integer reply](../topics/protocol.md#integers): '1'. The item was successfully added",
+    "* [Integer reply](../topics/protocol.md#integers): '0'. The item already existed in the bloom filter",


With the single quotes it looks a bit like string literals. Use backticks instead to mark it as code? This seems to be how some other commands' integer replies are documented.

Suggested change

"* [Integer reply](../topics/protocol.md#integers): '1'. The item was successfully added",

"* [Integer reply](../topics/protocol.md#integers): '0'. The item already existed in the bloom filter",

"* [Integer reply](../topics/protocol.md#integers): `1` if the item was successfully added",

"* [Integer reply](../topics/protocol.md#integers): `0` if the item already existed in the bloom filter",

Compare to for example this one:

"CLIENT UNBLOCK": [ "One of the following:", "* [Integer reply](../topics/protocol.md#integers): `0` if the client was unblocked successfully.", "* [Integer reply](../topics/protocol.md#integers): `1` if the client wasn't unblocked." ],

Makes sense, updarted both add and exists in both response files to follow this

zuiderkwast · 2025-02-20T21:30:44Z

topics/bloomfilters.md

+
+Example usage for a default bloom object:
+```
+127.0.0.1:6379> bf.insert validate_scale_fail VALIDATESCALETO 26214301


Uppercase all commands like BF.INSERT and fixed tokens makes it easier to see what is fixed and what is variable.

zuiderkwast · 2025-02-20T21:32:41Z

topics/bloomfilters.md

+
+## Common use cases for bloom filters
+
+**Financial fraud detection**


These look like a sub-headings so I think we should mark them as such. It's semantically more correct. (The others too; not only this one.)

Suggested change

**Financial fraud detection**

### Financial fraud detection

zuiderkwast · 2025-02-20T21:34:42Z

commands/bf.add.md

@@ -0,0 +1,12 @@
+Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.
+## Arguments


An empty line before and after headings, before and after bullet lists, etc. makes it more likely to be rendered correctly on website, man pages and github. The all use different markdown implementation with some subtle differences.

Suggested change

## Arguments

## Arguments

zuiderkwast · 2025-02-20T21:36:06Z

topics/bloomfilters.md

+    Introduction to Bloom Filters
+---
+
+Bloom filters are a space efficient probabilistic data structure that allows checking whether an element is member of a set. False positives are possible, but it guarantees no false negatives.


Here somewhere we should mention that it's a separate module and add a link to the module's repo.

I added a sentence at the top of the page linking the git repo and explaining it is a module

zuiderkwast · 2025-02-20T21:47:24Z

resp3_replies.json

+    "[Array reply](../topics/protocol.md#arrays): List of information about the bloom filter.",
+    "When an optional argument is provided:",
+    "* [Integer reply](../topics/protocol.md#integers): argument value",
+    "* [String reply??](../topics/protocol.md#simple-strings): argument value",


Why "String reply??" with double question marks??

that was accidentally left over, we only have a string reply when one of the optional arguments is provided so was meant to come back to this and try and clear up the differences and provide clarity on which case would have a string or integer

zuiderkwast · 2025-02-20T21:47:33Z

resp3_replies.json

+  ],
+  "BF.INSERT": [
+    "[Array reply](../topics/protocol.md#arrays): Array of ints (1’s and 0’s) - if filter already exists or if creation was successful. An empty array if no items are provided",
+    "[String reply??](../topics/protocol.md#simple-strings): not found, if the filter does not exist and NOCREATE is specified",


Actually don't need the string reply as it is actually an error reply so removed. But it had the question marks as I copied from the info command

madolson · 2025-02-20T23:39:42Z

What to show in the Since fields. If we'll release some valkey-with-modules bundle, then the version number should probably follow valkey's versioning(?).

I think for now we should show the independent modules version number, since we got alignment on that. Internally at AWS we are planning on reviving valkey-io/valkey#408 and posting some suggestions. Once that has alignment, we can maybe add more information about where it's available (i.e. Valkey core since 10.0, valkey-bloom since 1.0)

zackcam · 2025-02-21T07:53:40Z

List of non word choice/ document wording changes
The change to version isn't done in this repo but were discussed on this pr so adding screenshot:

Still looking at how best to determine if a command is from a specific module so that it is easy to expand on for future modules as well (the io pr has not been updated yet to include this module version change I will push that once I find out how to determine between modules)

Man page generation for modules, example for bf.add

For future modules there are only a few places they will need to add to in the make file
Main callout on change they need to do below others should be clear:
Line 187: $(eval VALKEY_ROOTS := $(VALKEY_ROOT) $(VALKEY_BLOOM_ROOT) $(FUTURE_MODULE))

…to generate bloom man pages Signed-off-by: zackcam <[email protected]>

zackcam · 2025-02-21T20:17:37Z

New command page example with hyperlink to module repo:

Adding bloom command meta data and bloom group, adding bloom data type

6f84713

as well Signed-off-by: zackcam <[email protected]>

madolson requested review from zuiderkwast and madolson February 20, 2025 00:37

zackcam mentioned this pull request Feb 20, 2025

Adding functionality for the bloom module to have its commands displayed on the Valkey website valkey-io/valkey-io.github.io#212

Open

1 task

zuiderkwast reviewed Feb 20, 2025

View reviewed changes

madolson reviewed Feb 20, 2025

View reviewed changes

zuiderkwast reviewed Feb 20, 2025

View reviewed changes

zackcam force-pushed the main branch from 2121cad to cd80826 Compare February 21, 2025 08:11

First round of rewording and changes to documentation. Added ability …

1f892b9

…to generate bloom man pages Signed-off-by: zackcam <[email protected]>

zackcam force-pushed the main branch from cd80826 to 1f892b9 Compare February 21, 2025 20:16

		* key (required) - A Valkey key of Bloom data type
		* item (required) - Item to add

		@@ -0,0 +1,12 @@
		Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.

	Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.
	Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.
	If you want to create a bloom filter with non-standard options, use the `BF.INSERT` command.

		@@ -0,0 +1,16 @@
		Determines if a specified item has been added to the specified bloom filter.
		Syntax

		@@ -0,0 +1,35 @@
		Returns information about a bloomfilter

		## Arguments

	* SIZE (optional) - Returns the memory size which is the number of bytes allocated
	* SIZE (optional) - Returns the number of bytes allocated


		Bloom filters are a space efficient probabilistic data structure that allows checking whether an element is member of a set. False positives are possible, but it guarantees no false negatives.

		## Bloom commands


		Financial fraud detection

		Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.


		Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.

		For the above each user would have a Bloom filter which is then checked for every transaction.


		Check if URL's are malicious

		Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter.

	Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter.
	Bloom filters can answer the question "is a URL malicious?". Any URL inputted would be checked against a malicious URL bloom filter.

		"* [Integer reply](../topics/protocol.md#integers): '1'. The item was successfully added",
		"* [Integer reply](../topics/protocol.md#integers): '0'. The item already existed in the bloom filter",


		## Common use cases for bloom filters

		Financial fraud detection

Adding bloom command meta data, bloom group and bloom data type documentaion #233

Are you sure you want to change the base?

Adding bloom command meta data, bloom group and bloom data type documentaion #233

Conversation

zackcam commented Feb 20, 2025 • edited Loading

This PR has three main changes

zuiderkwast left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zuiderkwast commented Feb 20, 2025

zackcam commented Feb 20, 2025

zuiderkwast commented Feb 20, 2025

madolson commented Feb 20, 2025

zuiderkwast commented Feb 20, 2025

madolson commented Feb 20, 2025

madolson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zuiderkwast left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

madolson commented Feb 20, 2025

zackcam commented Feb 21, 2025 • edited Loading

zackcam commented Feb 21, 2025

zackcam commented Feb 20, 2025 •

edited

Loading

zuiderkwast left a comment •

edited

Loading

zackcam commented Feb 21, 2025 •

edited

Loading