Add jmh benchmarks #396

Merged
merged 6 commits into from
Feb 25, 2025
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -33,3 +33,6 @@ hdf5/

### aider
.aider*

# JMH generated files
dependency-reduced-pom.xml
35 changes: 35 additions & 0 deletions benchmarks-jmh/README.md
@@ -0,0 +1,35 @@
# JMH Benchmarks
Microbenchmarks for JVector. While `Bench.java` focuses on recall, the JMH benchmarks
mostly target scalability and latency.

## Building and running the benchmark

Build the project, then run the benchmarks from the shaded jar:
```shell
mvn clean install -DskipTests=true
java --enable-native-access=ALL-UNNAMED \
--add-modules=jdk.incubator.vector \
-XX:+HeapDumpOnOutOfMemoryError \
-Xmx14G -Djvector.experimental.enable_native_vectorization=true \
-jar benchmarks-jmh/target/benchmarks-jmh-4.0.0-beta.2-SNAPSHOT.jar
```
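
As a rough sanity check on the `-Xmx` value, the sketch below estimates a lower bound on the heap taken by the raw vectors alone. It uses the `numBaseVectors` values and the dimension of 128 from `RandomVectorsBenchmark` in this module. The class and method names are hypothetical, and the figure assumes 4 bytes per float while ignoring object headers, the graph index, and JMH overhead, so the real footprint will be higher.

```java
// Hypothetical helper, not part of the benchmark module.
// Lower bound only: counts float payload, not object headers or the graph index.
final class RawVectorFootprint {
    static long rawVectorBytes(long numVectors, int dimension) {
        return numVectors * dimension * Float.BYTES;
    }

    public static void main(String[] args) {
        int dimension = 128; // originalDimension in RandomVectorsBenchmark
        long[] sizes = {1_000L, 10_000L, 100_000L, 1_000_000L}; // numBaseVectors @Param values
        for (long n : sizes) {
            long bytes = rawVectorBytes(n, dimension);
            System.out.printf("%,d vectors -> %.1f MiB%n", n, bytes / (1024.0 * 1024.0));
        }
    }
}
```

For the largest parameter value (1,000,000 vectors) this gives 512,000,000 bytes, or about 488 MiB, of float data before any index structures are built.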

You can pass additional JMH arguments on the command line. For example, to run the benchmarks with 4 forks, 5 warmup iterations, 5 measurement iterations, 2 threads, and a 10-second warmup time per iteration, use the following command:
```shell
java --enable-native-access=ALL-UNNAMED \
--add-modules=jdk.incubator.vector \
-XX:+HeapDumpOnOutOfMemoryError \
-Xmx14G -Djvector.experimental.enable_native_vectorization=true \
-jar benchmarks-jmh/target/benchmarks-jmh-4.0.0-beta.2-SNAPSHOT.jar \
-f 4 -wi 5 -i 5 -t 2 -w 10s
```

Common JMH command-line options:
- `-f <num>` - Number of forks
- `-wi <num>` - Number of warmup iterations
- `-i <num>` - Number of measurement iterations
- `-w <time>` - Warmup time per iteration
- `-r <time>` - Measurement time per iteration
- `-t <num>` - Number of threads
- `-p <param>=<value>` - Override a benchmark `@Param` value (for example `-p numBaseVectors=1000`)
- `-prof <profiler>` - Add profiler
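
The iteration options above multiply quickly, so it can help to estimate how long a run will take before starting it. The sketch below is a minimal, hypothetical helper (not part of this module) that computes the time spent in warmup and measurement iterations for one benchmark method and one `@Param` combination. It ignores `@Setup` work such as building the graph index, which runs outside the timed iterations and may dominate for large inputs.

```java
import java.time.Duration;

// Hypothetical helper: iteration time for ONE benchmark method and ONE @Param combination.
// Setup/teardown and JVM startup are not included.
final class JmhRunTimeEstimate {
    static Duration estimate(int forks,
                             int warmupIterations, Duration warmupTime,
                             int measurementIterations, Duration measurementTime) {
        Duration perFork = warmupTime.multipliedBy(warmupIterations)
                .plus(measurementTime.multipliedBy(measurementIterations));
        return perFork.multipliedBy(forks);
    }

    public static void main(String[] args) {
        // Mirrors "-f 4 -wi 5 -i 5 -w 10s"; "-r 10s" is an assumption, since the
        // example command does not set the measurement time explicitly.
        Duration total = estimate(4, 5, Duration.ofSeconds(10), 5, Duration.ofSeconds(10));
        System.out.println("Estimated iteration time: " + total.getSeconds() + " s");
    }
}
```

Multiply the result by the number of benchmark methods and `@Param` combinations selected to approximate the total iteration time of a full run.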
99 changes: 99 additions & 0 deletions benchmarks-jmh/pom.xml
@@ -0,0 +1,99 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>io.github.jbellis</groupId>
<artifactId>jvector-parent</artifactId>
<version>${revision}</version>
</parent>

<artifactId>benchmarks-jmh</artifactId>

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.release>22</maven.compiler.release>
<jmh.version>1.37</jmh.version>
</properties>

<dependencies>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>jvector-base</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>io.github.jbellis</groupId>
<artifactId>jvector-twenty</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>io.github.jbellis</groupId>
<artifactId>jvector-native</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>io.github.jbellis</groupId>
<artifactId>jvector-examples</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>${jmh.version}</version>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>${jmh.version}</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j2-impl</artifactId>
<version>2.24.3</version>
</dependency>

</dependencies>

<build>
<plugins>
<!--Ensures that annotation processor is running during compilation-->
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.13.0</version>
<configuration>
<annotationProcessorPaths>
<path>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>${jmh.version}</version>
</path>
</annotationProcessorPaths>
</configuration>
</plugin>

<!-- Shade this so we can run as a standalone jar -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.3.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>org.openjdk.jmh.Main</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
/*
* Copyright DataStax, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package io.github.jbellis.jvector.bench;

import io.github.jbellis.jvector.example.SiftSmall;
import io.github.jbellis.jvector.graph.*;
import io.github.jbellis.jvector.graph.similarity.BuildScoreProvider;
import io.github.jbellis.jvector.util.Bits;
import io.github.jbellis.jvector.vector.VectorSimilarityFunction;
import io.github.jbellis.jvector.vector.VectorizationProvider;
import io.github.jbellis.jvector.vector.types.VectorFloat;
import io.github.jbellis.jvector.vector.types.VectorTypeSupport;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
@Fork(1)
@Warmup(iterations = 2)
@Measurement(iterations = 5)
@Threads(1)
public class RandomVectorsBenchmark {
private static final Logger log = LoggerFactory.getLogger(RandomVectorsBenchmark.class);
private static final VectorTypeSupport VECTOR_TYPE_SUPPORT = VectorizationProvider.getInstance().getVectorTypeSupport();
private RandomAccessVectorValues ravv;
private ArrayList<VectorFloat<?>> baseVectors;
private ArrayList<VectorFloat<?>> queryVectors;
private GraphIndexBuilder graphIndexBuilder;
private GraphIndex graphIndex;
int originalDimension;
@Param({"1000", "10000", "100000", "1000000"})
int numBaseVectors;
@Param({"10"})
int numQueryVectors;

@Setup
public void setup() throws IOException {
originalDimension = 128; // Example dimension, can be adjusted

baseVectors = new ArrayList<>(numBaseVectors);
queryVectors = new ArrayList<>(numQueryVectors);

for (int i = 0; i < numBaseVectors; i++) {
VectorFloat<?> vector = createRandomVector(originalDimension);
baseVectors.add(vector);
}

for (int i = 0; i < numQueryVectors; i++) {
VectorFloat<?> vector = createRandomVector(originalDimension);
queryVectors.add(vector);
}

// wrap the raw vectors in a RandomAccessVectorValues
ravv = new ListRandomAccessVectorValues(baseVectors, originalDimension);

// score provider using the raw, in-memory vectors
BuildScoreProvider bsp = BuildScoreProvider.randomAccessScoreProvider(ravv, VectorSimilarityFunction.EUCLIDEAN);

graphIndexBuilder = new GraphIndexBuilder(bsp,
ravv.dimension(),
16, // graph degree
100, // construction search depth
1.2f, // allow degree overflow during construction by this factor
1.2f); // relax neighbor diversity requirement by this factor
graphIndex = graphIndexBuilder.build(ravv);
}

private VectorFloat<?> createRandomVector(int dimension) {
VectorFloat<?> vector = VECTOR_TYPE_SUPPORT.createFloatVector(dimension);
for (int i = 0; i < dimension; i++) {
vector.set(i, (float) Math.random());
}
return vector;
}

@TearDown
public void tearDown() throws IOException {
baseVectors.clear();
queryVectors.clear();
graphIndexBuilder.close();
}

@Benchmark
public void testOnHeapRandomVectors(Blackhole blackhole) {
var queryVector = SiftSmall.randomVector(originalDimension);
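// Search the on-heap graph for the nearest neighbors of the query vector.
// Note: the query vector is generated inside the measured method, so its cost is included in the reported time.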
var searchResult = GraphSearcher.search(queryVector,
10, // number of results
ravv, // vectors we're searching, used for scoring
VectorSimilarityFunction.EUCLIDEAN, // how to score
graphIndex,
Bits.ALL); // valid ordinals to consider
blackhole.consume(searchResult);
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
/*
* Copyright DataStax, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package io.github.jbellis.jvector.bench;

import io.github.jbellis.jvector.example.SiftSmall;
import io.github.jbellis.jvector.example.util.SiftLoader;
import io.github.jbellis.jvector.graph.*;
import io.github.jbellis.jvector.graph.similarity.BuildScoreProvider;
import io.github.jbellis.jvector.util.Bits;
import io.github.jbellis.jvector.vector.VectorSimilarityFunction;
import io.github.jbellis.jvector.vector.types.VectorFloat;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Set;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
@Fork(1)
@Warmup(iterations = 2)
@Measurement(iterations = 5)
@Threads(1)
public class StaticSetVectorsBenchmark {
private static final Logger log = LoggerFactory.getLogger(StaticSetVectorsBenchmark.class);
private RandomAccessVectorValues ravv;
private ArrayList<VectorFloat<?>> baseVectors;
private ArrayList<VectorFloat<?>> queryVectors;
private ArrayList<Set<Integer>> groundTruth;
private GraphIndexBuilder graphIndexBuilder;
private GraphIndex graphIndex;
int originalDimension;

@Setup
public void setup() throws IOException {
var siftPath = "siftsmall";
baseVectors = SiftLoader.readFvecs(String.format("%s/siftsmall_base.fvecs", siftPath));
queryVectors = SiftLoader.readFvecs(String.format("%s/siftsmall_query.fvecs", siftPath));
groundTruth = SiftLoader.readIvecs(String.format("%s/siftsmall_groundtruth.ivecs", siftPath));
log.info("base vectors size: {}, query vectors size: {}, loaded, dimensions {}",
baseVectors.size(), queryVectors.size(), baseVectors.get(0).length());
originalDimension = baseVectors.get(0).length();
// wrap the raw vectors in a RandomAccessVectorValues
ravv = new ListRandomAccessVectorValues(baseVectors, originalDimension);

// score provider using the raw, in-memory vectors
BuildScoreProvider bsp = BuildScoreProvider.randomAccessScoreProvider(ravv, VectorSimilarityFunction.EUCLIDEAN);

graphIndexBuilder = new GraphIndexBuilder(bsp,
ravv.dimension(),
16, // graph degree
100, // construction search depth
1.2f, // allow degree overflow during construction by this factor
1.2f); // relax neighbor diversity requirement by this factor
graphIndex = graphIndexBuilder.build(ravv);
}

@TearDown
public void tearDown() throws IOException {
baseVectors.clear();
queryVectors.clear();
groundTruth.clear();
graphIndexBuilder.close();
}

@Benchmark
public void testOnHeapWithRandomQueryVectors(Blackhole blackhole) throws IOException {
var queryVector = SiftSmall.randomVector(originalDimension);
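// Search the on-heap graph for the nearest neighbors of the query vector.
// Note: the query vector is generated inside the measured method, so its cost is included in the reported time.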
var searchResult = GraphSearcher.search(queryVector,
10, // number of results
ravv, // vectors we're searching, used for scoring
VectorSimilarityFunction.EUCLIDEAN, // how to score
graphIndex,
Bits.ALL); // valid ordinals to consider
blackhole.consume(searchResult);
}
}
15 changes: 15 additions & 0 deletions benchmarks-jmh/src/main/resources/log4j2.xml
@@ -0,0 +1,15 @@
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="INFO">
<Appenders>
<!-- Console Appender -->
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
</Console>
</Appenders>
<Loggers>
<!-- Root Logger -->
<Root level="INFO">
<AppenderRef ref="Console"/>
</Root>
</Loggers>
</Configuration>
1 change: 1 addition & 0 deletions pom.xml
Expand Up @@ -50,6 +50,7 @@
<module>jvector-tests</module>
<module>jvector-multirelease</module>
<module>jvector-examples</module>
<module>benchmarks-jmh</module>
</modules>
<build>
<resources>
3 changes: 2 additions & 1 deletion rat-excludes.txt
Expand Up @@ -8,4 +8,5 @@ pom.xml
src/main/assembly/test-jar-with-dependencies.xml
src/assembly/mrjar.xml
src/assembly/sourcesjar.xml
src/main/java/io/github/jbellis/jvector/vector/cnative/*
src/main/java/io/github/jbellis/jvector/vector/cnative/*
src/main/resources/log4j2.xml