Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding bloom command meta data, bloom group and bloom data type documentaion #233

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zackcam
Copy link

@zackcam zackcam commented Feb 20, 2025

This is one of three PR's that will be done for adding information about the bloom module to the Valkey website:
Bloom repo json command files: valkey-io/valkey-bloom#47
valkey-io.github.io: valkey-io/valkey-io.github.io#212

This PR has three main changes

  1. Adding the bloom command group
    image

  2. Adding bloom command metadata files (Example for bf.add below)

image
3. Adding bloom data type documents
image
image
image
image

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting!

I skimmed through it very quickly. The documentation itself looks great AFAICT. I can do a more detailed review later.

The commands look very much like built-in commands. It's not mentioned anywhere that it's a separate module that users need to install. I think we should mentioned it on the bloom filters topic page with a link to the github repo. The BF command pages should link to that topic page, so the pages are all linked together.

To build man pages, the scripts in this repo need to be able to take multiple command JSON files. This needs to be added to the Makefile, the README and maybe the python scripts too. Please try to build the man pages as described in the README of this repo.

groups.json Outdated
@@ -3,6 +3,10 @@
"display": "Bitmap",
"description": "Operations on the Bitmap data type"
},
"bloom": {
"display": "Bloom",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the full name? Compare to "Sorted Set".

Suggested change
"display": "Bloom",
"display": "Bloom Filter",

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to your suggestion

@zuiderkwast
Copy link
Contributor

Many of the spellcheck errors can be fixed simply but writing the command names in backticks. Stuff in backticks are excluded from spellcheck IIRC.

@zackcam
Copy link
Author

zackcam commented Feb 20, 2025

The commands look very much like built-in commands. It's not mentioned anywhere that it's a separate module that users need to install

I think we can make it more explicit on the data type page as well by making a modules section. i.e
image
Does this seem like something that would be wanted?

@zuiderkwast
Copy link
Contributor

I think we can make it more explicit on the data type page as well by making a modules section. i.e

Yes, something like that would be good. In your screenshot it looks like the "Extensions" sub-heading is part of "Module Data Types" though, because of the levels of the headings used. If we do this, then "Module Data Types" should be a level-2 heading and "Bloom Filter" a level-3 heading under it.

How about just mentioning the module within the description? Something like this?

 ## Bloom Filter
 
 [Bloom filters](bloomfilters.md) provides a space efficient probabilistic data structure that allows checking if an element is a member of a set. False positives are possible, but it guarantees no false negatives.
+Bloom filters are provided by the module `valkey-bloom`.
 For more information, see:

 * [Overview of Bloom Filters](bloomfilters.md)
 * [Bloom filter command reference](../commands/#bloom)
+* [The valkey-bloom module on GitHub](https://github.com/valkey-io/valkey-bloom/)

@madolson
Copy link
Member

@zuiderkwast I also wanted to get your input about how we should structure the modules to make it clear they aren't part of the core. The current structure is they are intermingled. I don't really have an opinion yet, but one alternative would be to at least separate them in a separate folder structure and clarify which module they are apart of.

@zuiderkwast
Copy link
Contributor

@zuiderkwast I also wanted to get your input about how we should structure the modules to make it clear they aren't part of the core. The current structure is they are intermingled. I don't really have an opinion yet, but one alternative would be to at least separate them in a separate folder structure and clarify which module they are apart of.

Are you talking about the URLs of the commands? I like that it's a flat structure, just like the commands are in a global flat namespace. The BF. prefix is enough.

But we should definitely show it in some way. A line somewhere on each command page would be good. I hope we can be generate it in some way from an optional key in the command JSON file or something like that.

@madolson
Copy link
Member

Are you talking about the URLs of the commands? I like that it's a flat structure, just like the commands are in a global flat namespace. The BF. prefix is enough.

I don't have a strong preference one way or the other about flat/nested, so sticking with flat is OK for me.

But we should definitely show it in some way. A line somewhere on each command page would be good. I hope we can be generate it in some way from an optional key in the command JSON file or something like that.

Yeah, I guess immediately let's make sure there is something in the JSON file. Maybe Module Required: <link to Bloom>.

Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a super deep review. I think we should indicate more clearly that the commands are from a module and not part of the core. That can maybe from the json docs only though.

Comment on lines 3 to 4
* key (required) - A Valkey key of Bloom data type
* item (required) - Item to add
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* key (required) - A Valkey key of Bloom data type
* item (required) - Item to add

We typically omit this, since the usage would be included at the top which will indicate if something is required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes sense I removed all these from the bloom commands and if I think the arguments needed explained updated the heading name

@@ -0,0 +1,12 @@
Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.
Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.
If you want to create a bloom filter with non-standard options, use the `BF.INSERT` command.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and made it less wordy as well by removing 'specified' from the description

@@ -0,0 +1,16 @@
Determines if a specified item has been added to the specified bloom filter.
Syntax
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Syntax

@@ -0,0 +1,35 @@
Returns information about a bloomfilter

## Arguments
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These need to be kept because they include the info data, but I would change this to be about info fields or something.

## Arguments
* key (required) - A valkey key of bloom data type
* CAPACITY (optional) - Returns the number of unique items that would need to be added before scaling would happen
* SIZE (optional) - Returns the memory size which is the number of bytes allocated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* SIZE (optional) - Returns the memory size which is the number of bytes allocated
* SIZE (optional) - Returns the number of bytes allocated

Why waste time say lot word when few word do trick?

@@ -92,6 +92,14 @@ The [HyperLogLog](hyperloglogs.md) data structures provide probabilistic estimat
* [Overview of HyperLogLog](hyperloglogs.md)
* [HyperLogLog command reference](../commands/#hyperloglog)

## Bloom Filter

[Bloom filters](bloomfilters.md) provides a space efficient probabilistic data structure that allows checking if an element is a member of a set. False positives are possible, but it guarantees no false negatives.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would translate this to english with an example.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make this more understandable but I think potentially having what I use in the exists and mexists commands could also work if the new version still isn't great


Bloom filters are a space efficient probabilistic data structure that allows checking whether an element is member of a set. False positives are possible, but it guarantees no false negatives.

## Bloom commands
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are other examples include the "basic commands" up front, and then the more sophisticated commands later. I think we should do the same.


**Financial fraud detection**

Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a real use case? The false positive here is not idea, since it might make it seem like a transaction is legitimate when it is not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this use case to be more about card fraud instead of location based checking


Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.

For the above each user would have a Bloom filter which is then checked for every transaction.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might just merge this into the previous paragraph.


**Check if URL's are malicious**

Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter.
Bloom filters can answer the question "is a URL malicious?". Any URL inputted would be checked against a malicious URL bloom filter.

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a complete review.

We need to think about what we want regarding

  1. How to show which module a command belongs to and how to store this in the JSON file(s).
  2. What to show in the Since fields. If we'll release some valkey-with-modules bundle, then the version number should probably follow valkey's versioning(?).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a symlink added here? Let's not do that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I accidentally added this for git.add will look at removing

Comment on lines 67 to 68
"* [Integer reply](../topics/protocol.md#integers): '1'. The item was successfully added",
"* [Integer reply](../topics/protocol.md#integers): '0'. The item already existed in the bloom filter",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the single quotes it looks a bit like string literals. Use backticks instead to mark it as code? This seems to be how some other commands' integer replies are documented.

Suggested change
"* [Integer reply](../topics/protocol.md#integers): '1'. The item was successfully added",
"* [Integer reply](../topics/protocol.md#integers): '0'. The item already existed in the bloom filter",
"* [Integer reply](../topics/protocol.md#integers): `1` if the item was successfully added",
"* [Integer reply](../topics/protocol.md#integers): `0` if the item already existed in the bloom filter",

Compare to for example this one:

  "CLIENT UNBLOCK": [
    "One of the following:",
    "* [Integer reply](../topics/protocol.md#integers): `0` if the client was unblocked successfully.",
    "* [Integer reply](../topics/protocol.md#integers): `1` if the client wasn't unblocked."
  ],

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, updarted both add and exists in both response files to follow this


Example usage for a default bloom object:
```
127.0.0.1:6379> bf.insert validate_scale_fail VALIDATESCALETO 26214301
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uppercase all commands like BF.INSERT and fixed tokens makes it easier to see what is fixed and what is variable.


## Common use cases for bloom filters

**Financial fraud detection**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like a sub-headings so I think we should mark them as such. It's semantically more correct. (The others too; not only this one.)

Suggested change
**Financial fraud detection**
### Financial fraud detection

@@ -0,0 +1,12 @@
Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.
## Arguments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty line before and after headings, before and after bullet lists, etc. makes it more likely to be rendered correctly on website, man pages and github. The all use different markdown implementation with some subtle differences.

Suggested change
## Arguments
## Arguments

Introduction to Bloom Filters
---

Bloom filters are a space efficient probabilistic data structure that allows checking whether an element is member of a set. False positives are possible, but it guarantees no false negatives.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here somewhere we should mention that it's a separate module and add a link to the module's repo.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a sentence at the top of the page linking the git repo and explaining it is a module

"[Array reply](../topics/protocol.md#arrays): List of information about the bloom filter.",
"When an optional argument is provided:",
"* [Integer reply](../topics/protocol.md#integers): argument value",
"* [String reply??](../topics/protocol.md#simple-strings): argument value",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "String reply??" with double question marks??

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was accidentally left over, we only have a string reply when one of the optional arguments is provided so was meant to come back to this and try and clear up the differences and provide clarity on which case would have a string or integer

],
"BF.INSERT": [
"[Array reply](../topics/protocol.md#arrays): Array of ints (1’s and 0’s) - if filter already exists or if creation was successful. An empty array if no items are provided",
"[String reply??](../topics/protocol.md#simple-strings): not found, if the filter does not exist and NOCREATE is specified",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

??

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually don't need the string reply as it is actually an error reply so removed. But it had the question marks as I copied from the info command

@madolson
Copy link
Member

What to show in the Since fields. If we'll release some valkey-with-modules bundle, then the version number should probably follow valkey's versioning(?).

I think for now we should show the independent modules version number, since we got alignment on that. Internally at AWS we are planning on reviving valkey-io/valkey#408 and posting some suggestions. Once that has alignment, we can maybe add more information about where it's available (i.e. Valkey core since 10.0, valkey-bloom since 1.0)

@zackcam
Copy link
Author

zackcam commented Feb 21, 2025

List of non word choice/ document wording changes
The change to version isn't done in this repo but were discussed on this pr so adding screenshot:
image
Still looking at how best to determine if a command is from a specific module so that it is easy to expand on for future modules as well (the io pr has not been updated yet to include this module version change I will push that once I find out how to determine between modules)

Man page generation for modules, example for bf.add
image
For future modules there are only a few places they will need to add to in the make file
Main callout on change they need to do below others should be clear:
Line 187: $(eval VALKEY_ROOTS := $(VALKEY_ROOT) $(VALKEY_BLOOM_ROOT) $(FUTURE_MODULE))

@zackcam
Copy link
Author

zackcam commented Feb 21, 2025

New command page example with hyperlink to module repo:
Screenshot 2025-02-21 at 12 14 04 PM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants