Llama.cpp example for cpp backend (#2904)
* Version 1 of LLM inference with cpp backend

Signed-off-by: Shrinath Suresh <[email protected]>

Updating llm handler - loadmodel, preprocess, inference methods

Signed-off-by: Shrinath Suresh <[email protected]>

Fixed infinite lock by adding request ids to the preprocess method

Signed-off-by: Shrinath Suresh <[email protected]>

Adding test script for measuring tokens per second with llama-7b-chat (ggml version)

Signed-off-by: Shrinath Suresh <[email protected]>

GGUF Compatibility

Signed-off-by: Shrinath Suresh <[email protected]>

Fixing unit tests

Signed-off-by: Shrinath Suresh <[email protected]>

Fix typo

Signed-off-by: Shrinath Suresh <[email protected]>

Using folly to read config path

Signed-off-by: Shrinath Suresh <[email protected]>

Removing debug couts

Signed-off-by: Shrinath Suresh <[email protected]>

Processing all the items in the batch

Signed-off-by: Shrinath Suresh <[email protected]>

Adopted llama.cpp api changes

* Adapt to removal of TS backend

* Re-add test for llama.cpp example

* Add llama.cpp as a submodule

* Point to correct llama.cpp installation

* Build llama.cpp in build.sh

* Skip llama.cpp example test if model weights are not available

* Renamed torchscript_model folder to examples

* Adjust to new base_handler interface

* Remove debug statement

* Rename llamacpp class + remove dummy.pt file

* Move llamacpp config.json

* Moved and created prompt file

* Reset context for multiple batch entries

* Add doc for llamacpp example

* Fix spell check

* Replace output example in llamacpp example

* Move cpp example src into main examples folder

* Convert cerr/cout into logs

---------

Co-authored-by: Shrinath Suresh <[email protected]>
mreso and shrinath-suresh authored Jan 26, 2024
1 parent 3ecaf0b commit a07b7d9
Showing 40 changed files with 564 additions and 67 deletions.
3 changes: 3 additions & 0 deletions .gitmodules
@@ -1,3 +1,6 @@
[submodule "third_party/google/rpc"]
path = third_party/google/rpc
url = https://github.com/googleapis/googleapis.git
[submodule "cpp/third-party/llama.cpp"]
path = cpp/third-party/llama.cpp
url = https://github.com/ggerganov/llama.cpp.git
10 changes: 5 additions & 5 deletions cpp/README.md
@@ -49,23 +49,23 @@ By default, TorchServe cpp provides a handler for TorchScript [src/backends/hand
```
torch-model-archiver --model-name mnist_base --version 1.0 --serialized-file mnist_script.pt --handler TorchScriptHandler --runtime LSP
```
Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/base_handler) of unzipped model mar file.
Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/examples/mnist/base_handler) of unzipped model mar file.
##### Using Custom Handler
* build customized handler shared lib. For example [Mnist handler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/examples/image_classifier/mnist).
* set runtime as "LSP" in model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
* set handler as "libmnist_handler:MnistHandler" in model archiver option [--handler](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
```
torch-model-archiver --model-name mnist_handler --version 1.0 --serialized-file mnist_script.pt --handler libmnist_handler:MnistHandler --runtime LSP
```
Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/mnist_handler) of unzipped model mar file.
Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/examples/mnist/mnist_handler) of unzipped model mar file.
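For orientation, here is a minimal sketch of what a custom handler such as `libmnist_handler:MnistHandler` might contain. The base-class header path, the default method behavior, and the `allocator`/`deleter` entry-point naming are assumptions modeled on the Mnist example linked above, not verbatim source:

```cpp
// Hypothetical handler sketch; the real interface is defined by the base
// handler header under src/backends/handler (path assumed).
#include "src/backends/handler/base_handler.hh"

namespace mnist {

// Inherits the default Preprocess/Inference/Postprocess implementations;
// a real handler would override these to customize behavior.
class MnistHandler : public torchserve::BaseHandler {
 public:
  MnistHandler() = default;
  ~MnistHandler() override = default;
};

}  // namespace mnist

// dlopen entry points: the archiver option --handler
// "libmnist_handler:MnistHandler" tells the backend to load
// libmnist_handler.so and construct/destroy the class through these.
extern "C" {
torchserve::BaseHandler* allocatorMnistHandler() {
  return new mnist::MnistHandler();
}
void deleterMnistHandler(torchserve::BaseHandler* p) {
  delete static_cast<mnist::MnistHandler*>(p);
}
}
```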
##### BabyLLama Example
The babyllama example can be found [here](https://github.com/pytorch/serve/blob/master/cpp/src/examples/babyllama/).
To run the example we need to download the weights as well as tokenizer files:
```bash
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
```
Subsequently, we need to adjust the paths according to our local file structure in [config.json](https://github.com/pytorch/serve/blob/master/serve/cpp/test/resources/torchscript_model/babyllama/babyllama_handler/config.json).
Subsequently, we need to adjust the paths according to our local file structure in [config.json](https://github.com/pytorch/serve/blob/master/serve/cpp/test/resources/examples/babyllama/babyllama_handler/config.json).
```bash
{
"checkpoint_path" : "/home/ubuntu/serve/cpp/stories15M.bin",
@@ -74,7 +74,7 @@ Subsequently, we need to adjust the paths according to our local file structure
```
Then we can create the mar file and deploy it with:
```bash
cd serve/cpp/test/resources/torchscript_model/babyllama/babyllama_handler
cd serve/cpp/test/resources/examples/babyllama/babyllama_handler
torch-model-archiver --model-name llm --version 1.0 --handler libbabyllama_handler:BabyLlamaHandler --runtime LSP --extra-files config.json
mkdir model_store && mv llm.mar model_store/
torchserve --ncs --start --model-store model_store
@@ -85,7 +85,7 @@ The handler name `libbabyllama_handler:BabyLlamaHandler` consists of our shared

To test the model we can run:
```bash
cd serve/cpp/test/resources/torchscript_model/babyllama/
cd serve/cpp/test/resources/examples/babyllama/
curl http://localhost:8080/predictions/llm -T prompt.txt
```
##### Mnist example
21 changes: 11 additions & 10 deletions cpp/build.sh
@@ -136,6 +136,14 @@ function install_yaml_cpp() {
cd "$BWD" || exit
}

function build_llama_cpp() {
BWD=$(pwd)
LLAMA_CPP_SRC_DIR=$BASE_DIR/third-party/llama.cpp
cd "${LLAMA_CPP_SRC_DIR}"
make
cd "$BWD" || exit
}

function build() {
MAYBE_BUILD_QUIC=""
if [ "$WITH_QUIC" == true ] ; then
@@ -206,16 +214,6 @@ function build() {
echo -e "${COLOR_GREEN}torchserve_cpp build is complete. To run unit test: \
./_build/test/torchserve_cpp_test ${COLOR_OFF}"

if [ -f "$DEPS_DIR/../src/examples/libmnist_handler.dylib" ]; then
mv $DEPS_DIR/../src/examples/libmnist_handler.dylib $DEPS_DIR/../../test/resources/torchscript_model/mnist/mnist_handler/libmnist_handler.dylib
elif [ -f "$DEPS_DIR/../src/examples/libmnist_handler.so" ]; then
mv $DEPS_DIR/../src/examples/libmnist_handler.so $DEPS_DIR/../../test/resources/torchscript_model/mnist/mnist_handler/libmnist_handler.so
fi

if [ -f "$DEPS_DIR/../src/examples/libbabyllama_handler.so" ]; then
mv $DEPS_DIR/../src/examples/libbabyllama_handler.so $DEPS_DIR/../../test/resources/torchscript_model/babyllama/babyllama_handler/libbabyllama_handler.so
fi

cd $DEPS_DIR/../..
if [ -f "$DEPS_DIR/../test/torchserve_cpp_test" ]; then
$DEPS_DIR/../test/torchserve_cpp_test
@@ -311,10 +309,13 @@ mkdir -p "$LIBS_DIR"
# Must execute from the directory containing this script
cd $BASE_DIR

git submodule update --init --recursive

install_folly
install_kineto
install_libtorch
install_yaml_cpp
build_llama_cpp
build
symlink_torch_libs
symlink_yaml_cpp_lib
16 changes: 3 additions & 13 deletions cpp/src/examples/CMakeLists.txt
@@ -1,16 +1,6 @@
set(MNIST_SRC_DIR "${torchserve_cpp_SOURCE_DIR}/src/examples/image_classifier/mnist")

set(MNIST_SOURCE_FILES "")
list(APPEND MNIST_SOURCE_FILES ${MNIST_SRC_DIR}/mnist_handler.cc)
add_library(mnist_handler SHARED ${MNIST_SOURCE_FILES})
target_include_directories(mnist_handler PUBLIC ${MNIST_SRC_DIR})
target_link_libraries(mnist_handler PRIVATE ts_backends_core ts_utils ${TORCH_LIBRARIES})
add_subdirectory("../../../examples/cpp/babyllama/" "../../../test/resources/examples/babyllama/babyllama_handler/")

add_subdirectory("../../../examples/cpp/llamacpp/" "../../../test/resources/examples/llamacpp/llamacpp_handler/")

set(BABYLLAMA_SRC_DIR "${torchserve_cpp_SOURCE_DIR}/src/examples/babyllama")
set(BABYLLAMA_SOURCE_FILES "")
list(APPEND BABYLLAMA_SOURCE_FILES ${BABYLLAMA_SRC_DIR}/baby_llama_handler.cc)
add_library(babyllama_handler SHARED ${BABYLLAMA_SOURCE_FILES})
target_include_directories(babyllama_handler PUBLIC ${BABYLLAMA_SRC_DIR})
target_link_libraries(babyllama_handler PRIVATE ts_backends_core ts_utils ${TORCH_LIBRARIES})
target_compile_options(babyllama_handler PRIVATE -Wall -Wextra -Ofast)
add_subdirectory("../../../examples/cpp/mnist/" "../../../test/resources/examples/mnist/mnist_handler/")
13 changes: 6 additions & 7 deletions cpp/test/backends/otf_protocol_and_handler_test.cc
@@ -24,7 +24,7 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
// model_name length
.WillOnce(::testing::Return(5))
// model_path length
.WillOnce(::testing::Return(51))
.WillOnce(::testing::Return(42))
// batch_size
.WillOnce(::testing::Return(1))
// handler length
@@ -44,9 +44,8 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
strncpy(data, "mnist", length);
}))
.WillOnce(testing::Invoke([=](size_t length, char* data) {
ASSERT_EQ(length, 51);
strncpy(data, "test/resources/torchscript_model/mnist/base_handler",
length);
ASSERT_EQ(length, 42);
strncpy(data, "test/resources/examples/mnist/base_handler", length);
}))
.WillOnce(testing::Invoke([=](size_t length, char* data) {
ASSERT_EQ(length, 11);
@@ -60,7 +59,7 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
EXPECT_CALL(*client_socket, SendAll(testing::_, testing::_)).Times(1);
auto load_model_request = OTFMessage::RetrieveLoadMsg(*client_socket);
ASSERT_EQ(load_model_request->model_dir,
"test/resources/torchscript_model/mnist/base_handler");
"test/resources/examples/mnist/base_handler");
ASSERT_EQ(load_model_request->model_name, "mnist");
ASSERT_EQ(load_model_request->envelope, "");
ASSERT_EQ(load_model_request->model_name, "mnist");
@@ -71,7 +70,7 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
auto backend = std::make_shared<torchserve::Backend>();
MetricsRegistry::Initialize("test/resources/metrics/default_config.yaml",
MetricsContext::BACKEND);
backend->Initialize("test/resources/torchscript_model/mnist/base_handler");
backend->Initialize("test/resources/examples/mnist/base_handler");

// load the model
auto load_model_response = backend->LoadModel(load_model_request);
@@ -126,7 +125,7 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
.WillOnce(testing::Invoke([=](size_t length, char* data) {
ASSERT_EQ(length, 3883);
// strncpy(data, "valu", length);
std::ifstream input("test/resources/torchscript_model/mnist/0_png.pt",
std::ifstream input("test/resources/examples/mnist/0_png.pt",
std::ios::in | std::ios::binary);
std::vector<char> image((std::istreambuf_iterator<char>(input)),
(std::istreambuf_iterator<char>()));
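The updated length expectations (51 to 42) follow from the protocol's framing: each string field is sent as a length followed by exactly that many bytes, and `test/resources/examples/mnist/base_handler` is 42 characters. Below is a self-contained sketch of that framing, mirroring the mocked socket reads in the test above; the helper is illustrative, not the backend's actual socket code:

```cpp
// Length-prefixed string framing, in memory and in host byte order for
// brevity; the real protocol runs over a socket.
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

static std::string read_string_field(const std::vector<char>& wire,
                                     size_t& off) {
  int32_t length = 0;
  std::memcpy(&length, wire.data() + off, sizeof(length));
  off += sizeof(length);
  std::string value(wire.data() + off, length);
  off += length;
  return value;
}

int main() {
  const std::string path = "test/resources/examples/mnist/base_handler";
  assert(path.size() == 42);  // matches the updated expectation in the test

  // Encode: 4-byte length, then the payload bytes.
  std::vector<char> wire(sizeof(int32_t) + path.size());
  const int32_t len = static_cast<int32_t>(path.size());
  std::memcpy(wire.data(), &len, sizeof(len));
  std::memcpy(wire.data() + sizeof(len), path.data(), path.size());

  size_t off = 0;
  assert(read_string_field(wire, off) == path);
  return 0;
}
```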
36 changes: 32 additions & 4 deletions cpp/test/examples/examples_test.cc
@@ -1,10 +1,38 @@
#include <fstream>

#include "test/utils/common.hh"

TEST_F(ModelPredictTest, TestLoadPredictBabyLlamaHandler) {
std::string base_dir = "test/resources/examples/babyllama/";
std::string file1 = base_dir + "babyllama_handler/stories15M.bin";
std::string file2 = base_dir + "babyllama_handler/tokenizer.bin";

std::ifstream f1(file1);
std::ifstream f2(file2);

if (!f1.good() && !f2.good())
GTEST_SKIP()
<< "Skipping TestLoadPredictBabyLlamaHandler because of missing files: "
<< file1 << " or " << file2;

this->LoadPredict(
std::make_shared<torchserve::LoadModelRequest>(
base_dir + "babyllama_handler", "llm", -1, "", "", 1, false),
base_dir + "babyllama_handler", base_dir + "prompt.txt", "llm_ts", 200);
}

TEST_F(ModelPredictTest, TestLoadPredictLlmHandler) {
std::string base_dir = "test/resources/examples/llamacpp/";
std::string file1 = base_dir + "llamacpp_handler/llama-2-7b-chat.Q5_0.gguf";
std::ifstream f(file1);

if (!f.good())
GTEST_SKIP()
<< "Skipping TestLoadPredictLlmHandler because of missing file: "
<< file1;

this->LoadPredict(
std::make_shared<torchserve::LoadModelRequest>(
"test/resources/torchscript_model/babyllama/babyllama_handler", "llm",
-1, "", "", 1, false),
"test/resources/torchscript_model/babyllama/babyllama_handler",
"test/resources/torchscript_model/babyllama/prompt.txt", "llm_ts", 200);
base_dir + "llamacpp_handler", "llamacpp", -1, "", "", 1, false),
base_dir + "llamacpp_handler", base_dir + "prompt.txt", "llm_ts", 200);
}
4 changes: 4 additions & 0 deletions cpp/test/resources/examples/babyllama/babyllama_handler/config.json
@@ -0,0 +1,4 @@
{
"checkpoint_path" : "test/resources/examples/babyllama/babyllama_handler/stories15M.bin",
"tokenizer_path" : "test/resources/examples/babyllama/babyllama_handler/tokenizer.bin"
}
10 changes: 10 additions & 0 deletions cpp/test/resources/examples/llamacpp/llamacpp_handler/MAR-INF/MANIFEST.json
@@ -0,0 +1,10 @@
{
"createdOn": "28/07/2020 06:32:08",
"runtime": "LSP",
"model": {
"modelName": "llamacpp",
"handler": "libllamacpp_handler:LlamaCppHandler",
"modelVersion": "2.0"
},
"archiverVersion": "0.2.0"
}
File renamed without changes
File renamed without changes.

This file was deleted.

33 changes: 15 additions & 18 deletions cpp/test/torch_scripted/torch_scripted_test.cc
@@ -9,47 +9,44 @@

TEST_F(ModelPredictTest, TestLoadPredictBaseHandler) {
this->LoadPredict(std::make_shared<torchserve::LoadModelRequest>(
"test/resources/torchscript_model/mnist/mnist_handler",
"test/resources/examples/mnist/mnist_handler",
"mnist_scripted_v2", -1, "", "", 1, false),
"test/resources/torchscript_model/mnist/base_handler",
"test/resources/torchscript_model/mnist/0_png.pt",
"mnist_ts", 200);
"test/resources/examples/mnist/base_handler",
"test/resources/examples/mnist/0_png.pt", "mnist_ts", 200);
}

TEST_F(ModelPredictTest, TestLoadPredictMnistHandler) {
this->LoadPredict(std::make_shared<torchserve::LoadModelRequest>(
"test/resources/torchscript_model/mnist/mnist_handler",
"test/resources/examples/mnist/mnist_handler",
"mnist_scripted_v2", -1, "", "", 1, false),
"test/resources/torchscript_model/mnist/mnist_handler",
"test/resources/torchscript_model/mnist/0_png.pt",
"mnist_ts", 200);
"test/resources/examples/mnist/mnist_handler",
"test/resources/examples/mnist/0_png.pt", "mnist_ts", 200);
}

TEST_F(ModelPredictTest, TestBackendInitWrongModelDir) {
auto result = backend_->Initialize("test/resources/torchscript_model/mnist");
auto result = backend_->Initialize("test/resources/examples/mnist");
ASSERT_EQ(result, false);
}

TEST_F(ModelPredictTest, TestBackendInitWrongHandler) {
auto result = backend_->Initialize(
"test/resources/torchscript_model/mnist/wrong_handler");
auto result =
backend_->Initialize("test/resources/examples/mnist/wrong_handler");
ASSERT_EQ(result, false);
}

TEST_F(ModelPredictTest, TestLoadModelFailure) {
backend_->Initialize("test/resources/torchscript_model/mnist/wrong_model");
backend_->Initialize("test/resources/examples/mnist/wrong_model");
auto result =
backend_->LoadModel(std::make_shared<torchserve::LoadModelRequest>(
"test/resources/torchscript_model/mnist/wrong_model",
"mnist_scripted_v2", -1, "", "", 1, false));
"test/resources/examples/mnist/wrong_model", "mnist_scripted_v2", -1,
"", "", 1, false));
ASSERT_EQ(result->code, 500);
}

TEST_F(ModelPredictTest, TestLoadPredictMnistHandlerFailure) {
this->LoadPredict(std::make_shared<torchserve::LoadModelRequest>(
"test/resources/torchscript_model/mnist/mnist_handler",
"test/resources/examples/mnist/mnist_handler",
"mnist_scripted_v2", -1, "", "", 1, false),
"test/resources/torchscript_model/mnist/mnist_handler",
"test/resources/torchscript_model/mnist/0.png", "mnist_ts",
500);
"test/resources/examples/mnist/mnist_handler",
"test/resources/examples/mnist/0.png", "mnist_ts", 500);
}
2 changes: 1 addition & 1 deletion cpp/test/utils/model_archiver_test.cc
@@ -6,7 +6,7 @@ namespace torchserve {
TEST(ManifestTest, TestInitialize) {
torchserve::Manifest manifest;
manifest.Initialize(
"test/resources/torchscript_model/mnist/base_handler/MAR-INF/"
"test/resources/examples/mnist/base_handler/MAR-INF/"
"MANIFEST.json");
ASSERT_EQ(manifest.GetCreatOn(), "28/07/2020 06:32:08");
ASSERT_EQ(manifest.GetArchiverVersion(), "0.2.0");
1 change: 1 addition & 0 deletions cpp/third-party/llama.cpp
Submodule llama.cpp added at cd4fdd
5 changes: 5 additions & 0 deletions examples/cpp/babyllama/CMakeLists.txt
@@ -0,0 +1,5 @@

add_library(babyllama_handler SHARED src/baby_llama_handler.cc)

target_link_libraries(babyllama_handler PRIVATE ts_backends_core ts_utils ${TORCH_LIBRARIES})
target_compile_options(babyllama_handler PRIVATE -Wall -Wextra -Ofast)
4 changes: 4 additions & 0 deletions examples/cpp/babyllama/config.json
@@ -0,0 +1,4 @@
{
"checkpoint_path" : "/home/ubuntu/serve/examples/cpp/babyllama/stories15M.bin",
"tokenizer_path" : "/home/ubuntu/serve/examples/cpp/babyllama/tokenizer.bin"
}
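The handler resolves these paths at model-load time. The commit message notes "Using folly to read config path", and the handler below includes `folly/FileUtil.h` and `folly/json.h`, so the read plausibly looks like this minimal sketch (file path and error handling are illustrative only):

```cpp
// Minimal sketch: load config.json with folly and pull out the two paths.
#include <folly/FileUtil.h>
#include <folly/json.h>

#include <iostream>
#include <string>

int main() {
  std::string contents;
  // Path is illustrative; the handler receives the real location at load time.
  if (!folly::readFile("examples/cpp/babyllama/config.json", contents)) {
    std::cerr << "failed to read config.json\n";
    return 1;
  }
  const folly::dynamic config = folly::parseJson(contents);
  const std::string checkpoint_path = config["checkpoint_path"].asString();
  const std::string tokenizer_path = config["tokenizer_path"].asString();
  std::cout << "checkpoint: " << checkpoint_path << "\n"
            << "tokenizer: " << tokenizer_path << std::endl;
  return 0;
}
```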
cpp/src/examples/babyllama/baby_llama_handler.cc → examples/cpp/babyllama/src/baby_llama_handler.cc
@@ -1,11 +1,11 @@
#include "src/examples/babyllama/baby_llama_handler.hh"
#include "baby_llama_handler.hh"

#include <folly/FileUtil.h>
#include <folly/json.h>

#include <typeinfo>

#include "src/examples/babyllama/llama2.c/run.c"
#include "llama2.c/run.c"

namespace llm {

@@ -233,7 +233,6 @@ c10::IValue BabyLlamaHandler::Inference(
} catch (...) {
TS_LOG(ERROR, "Failed to run inference on this batch");
}
std::cout << "WOOT?" << std::endl;
return batch_output_vector;
}

File renamed without changes.
File renamed without changes.
20 changes: 20 additions & 0 deletions examples/cpp/llamacpp/CMakeLists.txt
@@ -0,0 +1,20 @@
set(LLAMACPP_SRC_DIR "${torchserve_cpp_SOURCE_DIR}/third-party/llama.cpp")

add_library(llamacpp_handler SHARED src/llamacpp_handler.cc)

set(MY_OBJECT_FILES
${LLAMACPP_SRC_DIR}/ggml.o
${LLAMACPP_SRC_DIR}/llama.o
${LLAMACPP_SRC_DIR}/common.o
${LLAMACPP_SRC_DIR}/ggml-quants.o
${LLAMACPP_SRC_DIR}/ggml-alloc.o
${LLAMACPP_SRC_DIR}/grammar-parser.o
${LLAMACPP_SRC_DIR}/console.o
${LLAMACPP_SRC_DIR}/build-info.o
${LLAMACPP_SRC_DIR}/ggml-backend.o

)

target_sources(llamacpp_handler PRIVATE ${MY_OBJECT_FILES})
target_include_directories(llamacpp_handler PUBLIC ${LLAMACPP_SRC_DIR})
target_link_libraries(llamacpp_handler PRIVATE ts_backends_core ts_utils ${TORCH_LIBRARIES})
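These object files carry the llama.cpp C API that the handler builds on. For orientation, here is a minimal sketch of loading a GGUF checkpoint and greedily picking one next token, written against the API roughly as it stood at the pinned revision; llama.cpp signatures change frequently, so treat every call here as an assumption to verify against `llama.h`:

```cpp
// Sketch of the llama.cpp C API circa the pinned revision; names and
// signatures are assumptions and differ in later versions.
#include <algorithm>
#include <cstdio>
#include <cstring>
#include <vector>

#include "llama.h"

int main() {
  llama_backend_init(false /*numa*/);

  llama_model_params mparams = llama_model_default_params();
  llama_model* model =
      llama_load_model_from_file("llama-2-7b-chat.Q5_0.gguf", mparams);
  if (model == nullptr) return 1;

  llama_context_params cparams = llama_context_default_params();
  llama_context* ctx = llama_new_context_with_model(model, cparams);

  // Tokenize the prompt, prepending BOS.
  const char* prompt = "Hello my name is";
  std::vector<llama_token> tokens(64);
  const int n_tokens =
      llama_tokenize(model, prompt, (int)std::strlen(prompt), tokens.data(),
                     (int)tokens.size(), /*add_bos=*/true, /*special=*/false);
  tokens.resize(n_tokens);

  // Evaluate the prompt, then take the argmax over the vocabulary.
  llama_decode(ctx, llama_batch_get_one(tokens.data(), n_tokens, 0, 0));
  const float* logits = llama_get_logits_ith(ctx, n_tokens - 1);
  const int n_vocab = llama_n_vocab(model);
  const int next = (int)(std::max_element(logits, logits + n_vocab) - logits);
  std::printf("next token id: %d\n", next);

  llama_free(ctx);
  llama_free_model(model);
  llama_backend_free();
  return 0;
}
```

In the example handler this logic sits behind the backend's load/preprocess/inference hooks rather than a standalone main.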