Feature/cpp baby llama rework (#2903)
* Baby Llama - Ported run.c for integration and fixed clang type conversion errors.

Signed-off-by: Shrinath Suresh <[email protected]>

Custom preprocess implementation

Signed-off-by: Shrinath Suresh <[email protected]>

Free memory only after the inference is done

Signed-off-by: Shrinath Suresh <[email protected]>

Implement Postprocess

Signed-off-by: Shrinath Suresh <[email protected]>

Setting Fast compiler option

Signed-off-by: Shrinath Suresh <[email protected]>

Reading checkpoint path and tokenizer path from config file using folly

Signed-off-by: Shrinath Suresh <[email protected]>

Removing run.c from cmake

Signed-off-by: Shrinath Suresh <[email protected]>

Replace auto with appropriate data type

Signed-off-by: Shrinath Suresh <[email protected]>

Using smart pointers and initializing the vector with the appropriate size upfront

Signed-off-by: Shrinath Suresh <[email protected]>

Using smart pointers

Signed-off-by: Shrinath Suresh <[email protected]>

Directly converting the tensor values to prompt token ids

Signed-off-by: Shrinath Suresh <[email protected]>

Moving run.c and common variables to .cc file

Signed-off-by: Shrinath Suresh <[email protected]>

Moving run.c to a separate folder

Signed-off-by: Shrinath Suresh <[email protected]>

Uncommenting the original run.c main method

Signed-off-by: Shrinath Suresh <[email protected]>

Implemented destructor to free up resources

Signed-off-by: Shrinath Suresh <[email protected]>

Supporting files for unit test

Signed-off-by: Shrinath Suresh <[email protected]>

Processing all the batch inputs

Signed-off-by: Shrinath Suresh <[email protected]>

Setting InferenceMode guard

Signed-off-by: Shrinath Suresh <[email protected]>

Updating InferenceMode to use torch::InferenceMode

Signed-off-by: Shrinath Suresh <[email protected]>

Updating class name to BabyLlamaHandler

Signed-off-by: Shrinath Suresh <[email protected]>

Renaming llm_handler target to babyllama_handler

Signed-off-by: Shrinath Suresh <[email protected]>

Adding dummy pt file

Signed-off-by: Shrinath Suresh <[email protected]>

Typo Fix

Signed-off-by: Shrinath Suresh <[email protected]>

Calculate tokens per second for batch input

Signed-off-by: Shrinath Suresh <[email protected]>

Adding README.md for babyllama example

Signed-off-by: Shrinath Suresh <[email protected]>

Fixing out-of-bound mem access in babyllama example

Move model instance out of ts_backend

Use shared_ptr<void> for model to detangle from torchscript

Move BaseHandler to backends/handler

Move model instance into core

Remove Torchscript as a backend and implement it as a handler

Move torchscript test out of backend folder

Remove dummy.pt in babyllama + update README + move babyllama test to new examples/examples_test.cc file

* fix spell check

* Move cpp babyllama example to main example folder

* Add last successful location to error message in handle function

* Fix babyllama batching by changing input/output from tensor to IValue

* rename prompt file

* Fix spellcheck

---------

Co-authored-by: Shrinath Suresh <[email protected]>
mreso and shrinath-suresh authored Jan 26, 2024
1 parent 9e6f1c2 commit 3ecaf0b
Showing 41 changed files with 1,812 additions and 470 deletions.
69 changes: 46 additions & 23 deletions cpp/README.md
@@ -12,7 +12,7 @@ python ts_scripts/install_dependencies.py --cpp [--cuda=cu121|cu118]
### Building the backend
```
## Dev Build
cd serve/cpp
./build.sh [-g cu121|cu118]
## Install TorchServe from source
@@ -34,32 +34,60 @@ cd serve
torchserve --ncs --start --model-store model_store
```
## Backend
TorchServe cpp backend can run as a process, similar to the [TorchServe Python backend](https://github.com/pytorch/serve/tree/master/ts). By default, the cpp backend supports TorchScript models. Other platforms such as MXNet and ONNX can be supported through custom handlers, following the TorchScript example [src/backends/handler/torch_scripted_handler.hh](https://github.com/pytorch/serve/blob/master/src/backends/handler/torch_scripted_handler.hh).
### Custom Handler
By default, TorchServe cpp provides a handler for TorchScript, [src/backends/handler/torch_scripted_handler.hh](https://github.com/pytorch/serve/blob/master/src/backends/handler/torch_scripted_handler.hh). It uses the [BaseHandler](https://github.com/pytorch/serve/blob/master/src/backends/handler/base_handler.hh), which defines the APIs to customize a handler; a simplified sketch of such a handler follows the list below.
* [Initialize](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L29)
* [LoadModel](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L37)
* [Preprocess](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L40)
* [Inference](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L46)
* [Postprocess](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L53)
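To make the division of labor concrete, here is a minimal, self-contained sketch of a custom handler. It uses a simplified stand-in base class rather than the real `BaseHandler` (whose hooks operate on request/response batches and `torch::IValue`), so treat the signatures as illustrative and consult `base_handler.hh` (linked above) for the actual interface.
```cpp
// Illustrative only: SketchBaseHandler stands in for torchserve::BaseHandler;
// the real interface works on request/response batches and torch::IValue.
#include <iostream>
#include <string>
#include <vector>

struct SketchBaseHandler {
  virtual ~SketchBaseHandler() = default;
  virtual void Initialize(const std::string& model_dir) = 0;               // load config/artifacts
  virtual std::vector<float> Preprocess(const std::string& raw) = 0;       // request -> model input
  virtual std::vector<float> Inference(const std::vector<float>& in) = 0;  // run the model
  virtual std::string Postprocess(const std::vector<float>& out) = 0;      // model output -> response
};

struct MyHandler : SketchBaseHandler {
  void Initialize(const std::string& model_dir) override {
    std::cout << "loading artifacts from " << model_dir << "\n";
  }
  std::vector<float> Preprocess(const std::string& raw) override {
    return {static_cast<float>(raw.size())};  // toy featurization of the request body
  }
  std::vector<float> Inference(const std::vector<float>& in) override {
    return in;  // a real handler would run the (TorchScript or custom) model here
  }
  std::string Postprocess(const std::vector<float>& out) override {
    return std::to_string(out.front());
  }
};
```
In the BabyLlama example below, these hooks roughly map to turning the prompt into token ids (Preprocess), running the llama2.c model (Inference), and assembling the generated text into the response (Postprocess).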
#### Example
##### Using TorchScriptHandler
* set runtime as "LSP" in model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
* set handler as "TorchScriptHandler" in model archiver option [--handler](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
```
torch-model-archiver --model-name mnist_base --version 1.0 --serialized-file mnist_script.pt --handler TorchScriptHandler --runtime LSP
```
Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/base_handler) of an unzipped model mar file.
##### Using Custom Handler
* build a custom handler shared library, for example the [Mnist handler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/examples/image_classifier/mnist)
* set runtime as "LSP" in model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
* set handler as "libmnist_handler:MnistHandler" in model archiver option [--handler](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
```
torch-model-archiver --model-name mnist_handler --version 1.0 --serialized-file mnist_script.pt --handler libmnist_handler:MnistHandler --runtime LSP
```
Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/mnist_handler) of an unzipped model mar file.
##### BabyLlama Example
The babyllama example can be found [here](https://github.com/pytorch/serve/blob/master/cpp/src/examples/babyllama/).
To run the example we need to download the model weights as well as the tokenizer file:
```bash
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
```
Subsequently, we need to adjust the paths according to our local file structure in [config.json](https://github.com/pytorch/serve/blob/master/serve/cpp/test/resources/torchscript_model/babyllama/babyllama_handler/config.json).
```json
{
"checkpoint_path" : "/home/ubuntu/serve/cpp/stories15M.bin",
"tokenizer_path" : "/home/ubuntu/serve/cpp/src/examples/babyllama/tokenizer.bin"
}
```
Then we can create the mar file and deploy it with:
```bash
cd serve/cpp/test/resources/torchscript_model/babyllama/babyllama_handler
torch-model-archiver --model-name llm --version 1.0 --handler libbabyllama_handler:BabyLlamaHandler --runtime LSP --extra-files config.json
mkdir model_store && mv llm.mar model_store/
torchserve --ncs --start --model-store model_store

curl -v -X POST "http://localhost:8081/models?initial_workers=1&url=llm.mar"
```
The handler name `libbabyllama_handler:BabyLlamaHandler` combines the name of our shared library (as defined in our [CMakeLists.txt](https://github.com/pytorch/serve/blob/master/serve/cpp/src/examples/CMakeLists.txt)) with the name of the class we chose for our [custom handler class](https://github.com/pytorch/serve/blob/master/serve/cpp/src/examples/babyllama/baby_llama_handler.cc), which derives from BaseHandler. At load time the backend uses these two pieces to open the library and resolve its factory symbols, as sketched below.
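The backend's `LoadHandler` code (shown further down in this diff) builds the symbol names as `allocator<ClassName>` and `deleter<ClassName>` and looks them up in the shared library. The following is a hedged sketch of what a handler library therefore exports; the inline `BabyLlamaHandler` declaration is only a stand-in for the real class in `baby_llama_handler.cc`.
```cpp
// Sketch of the factory symbols the backend resolves via dlopen/dlsym.
// Symbol names follow the allocator<ClassName>/deleter<ClassName> pattern that
// Backend::LoadHandler derives from the --handler string. The class below is a
// stand-in; the real BabyLlamaHandler derives from BaseHandler.
struct BabyLlamaHandler { /* derives from torchserve::BaseHandler in the real example */ };

extern "C" {
BabyLlamaHandler* allocatorBabyLlamaHandler() { return new BabyLlamaHandler(); }
void deleterBabyLlamaHandler(BabyLlamaHandler* p) { delete p; }
}
```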

To test the model we can run:
```bash
cd serve/cpp/test/resources/torchscript_model/babyllama/
curl http://localhost:8080/predictions/llm -T prompt.txt
```
##### Mnist example
* Transform the data on the client side. For example:
```python
import torch
from PIL import Image
from torchvision import transforms
# assumed MNIST preprocessing (ToTensor + Normalize), matching the transform used when scripting the model
image_processing = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
image = Image.open("examples/image_classifier/mnist/test_data/0.png")
image = image_processing(image)
torch.save(image, "0_png.pt")
```
* Run model registration and prediction: [Using BaseHandler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/test/backends/torch_scripted/torch_scripted_backend_test.cc#L54) or [Using customized handler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/test/backends/torch_scripted/torch_scripted_backend_test.cc#L72).
4 changes: 4 additions & 0 deletions cpp/build.sh
@@ -212,6 +212,10 @@ function build() {
mv $DEPS_DIR/../src/examples/libmnist_handler.so $DEPS_DIR/../../test/resources/torchscript_model/mnist/mnist_handler/libmnist_handler.so
fi

if [ -f "$DEPS_DIR/../src/examples/libbabyllama_handler.so" ]; then
mv $DEPS_DIR/../src/examples/libbabyllama_handler.so $DEPS_DIR/../../test/resources/torchscript_model/babyllama/babyllama_handler/libbabyllama_handler.so
fi

cd $DEPS_DIR/../..
if [ -f "$DEPS_DIR/../test/torchserve_cpp_test" ]; then
$DEPS_DIR/../test/torchserve_cpp_test
41 changes: 14 additions & 27 deletions cpp/src/backends/CMakeLists.txt
@@ -15,40 +15,27 @@ target_link_libraries(ts_backends_protocol PRIVATE ts_utils ${FOLLY_LIBRARIES})
install(TARGETS ts_backends_protocol DESTINATION ${torchserve_cpp_SOURCE_DIR}/_build/libs)

# build library ts_backend_core
set(BACKEND_SOURCE_FILES "")
list(APPEND BACKEND_SOURCE_FILES ${TS_BACKENDS_SRC_DIR}/core/backend.cc)
list(APPEND BACKEND_SOURCE_FILES ${TS_BACKENDS_SRC_DIR}/core/model_instance.cc)
list(APPEND BACKEND_SOURCE_FILES ${TS_BACKENDS_SRC_DIR}/handler/base_handler.cc)
list(APPEND BACKEND_SOURCE_FILES ${TS_BACKENDS_SRC_DIR}/handler/torch_scripted_handler.cc)
add_library(ts_backends_core SHARED ${BACKEND_SOURCE_FILES})
target_include_directories(ts_backends_core PUBLIC ${TS_BACKENDS_CORE_SRC_DIR})
target_link_libraries(ts_backends_core PUBLIC ts_utils ts_backends_protocol ${FOLLY_LIBRARIES})
install(TARGETS ts_backends_core DESTINATION ${torchserve_cpp_SOURCE_DIR}/_build/libs)

# build library ts_backend_torch_scripted
set(TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES "")
list(APPEND TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}/torch_scripted_backend.cc)
list(APPEND TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}/handler/base_handler.cc)
add_library(ts_backends_torch_scripted SHARED ${TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES})
target_include_directories(ts_backends_torch_scripted PUBLIC
${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR} ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}/handler ${TORCH_INCLUDE_DIRS})
target_link_libraries(ts_backends_torch_scripted PUBLIC ts_utils ts_backends_core ${TORCH_LIBRARIES})
install(TARGETS ts_backends_torch_scripted DESTINATION ${torchserve_cpp_SOURCE_DIR}/_build/libs)

# build library ts_backend_torch_deploy
#set(TS_BACKENDS_TORCH_DEPLOY_SOURCE_FILES "")
#add_library(ts_backends_torch_deploy SHARED ${TS_BACKENDS_TORCH_DEPLOY_SOURCE_FILES})
#target_include_directories(ts_backends_torch_deploy PUBLIC ${TS_BACKENDS_TORCH_DEPLOY_SRC_DIR})
#target_link_libraries(ts_backends_torch_deploy PRIVATE ts_utils ts_backends_core ${TORCH_LIBRARIES})

# build exe model_worker_socket
add_executable(model_worker_socket
"${TS_BACKENDS_PROCESS_SRC_DIR}/model_worker_socket.cc"
"${TS_BACKENDS_PROCESS_SRC_DIR}/model_worker.cc"
)
target_include_directories(model_worker_socket PRIVATE
${TS_BACKENDS_CORE_SRC_DIR}
${TS_BACKENDS_PROTOCOL_SRC_DIR}
${TS_BACKENDS_PROCESS_SRC_DIR}
${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}
)
target_link_libraries(model_worker_socket
PRIVATE ts_backends_core ts_backends_protocol ${FOLLY_LIBRARIES} ${TORCH_LIBRARIES})
install(TARGETS model_worker_socket DESTINATION ${torchserve_cpp_SOURCE_DIR}/_build/bin)
96 changes: 92 additions & 4 deletions cpp/src/backends/core/backend.cc
@@ -1,6 +1,63 @@
#include "src/backends/core/backend.hh"

#include <memory>

#include "src/backends/handler/handler_factory.hh"

namespace torchserve {
Backend::Backend() {}

Backend::~Backend() {
handler_.reset();
model_instance_table_.clear();
// Todo: do proper cleanup
// dl_loader_->CloseDL();
}

bool Backend::Initialize(const std::string &model_dir) {
random_generator_.seed(time(0));
manifest_ = std::make_shared<torchserve::Manifest>();
// TODO: windows
if (!manifest_->Initialize(
fmt::format("{}/MAR-INF/MANIFEST.json", model_dir))) {
return false;
}

LoadHandler(model_dir);

if (!handler_) {
return false;
}

handler_->Initialize(model_dir, manifest_);

return true;
}

void Backend::LoadHandler(const std::string &model_dir) {
const std::string &handler_str = manifest_->GetModel().handler;
std::size_t delimiter_pos = handler_str.find(manifest_->kHandler_Delimiter);
if (delimiter_pos != std::string::npos) {
#ifdef __APPLE__
std::string lib_path = fmt::format("{}/{}.dylib", model_dir,
handler_str.substr(0, delimiter_pos));
#else
std::string lib_path = fmt::format("{}/{}.so", model_dir,
handler_str.substr(0, delimiter_pos));
#endif
std::string handler_class_name = handler_str.substr(delimiter_pos + 1);
std::string allocator_func = fmt::format("allocator{}", handler_class_name);
std::string deleter_func = fmt::format("deleter{}", handler_class_name);
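// e.g. for handler "libbabyllama_handler:BabyLlamaHandler" the loader below
// resolves the exported symbols "allocatorBabyLlamaHandler" and
// "deleterBabyLlamaHandler" from libbabyllama_handler.so (.dylib on macOS).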

dl_loader_ = std::make_unique<DLLoader<BaseHandler>>(
lib_path, allocator_func, deleter_func);
dl_loader_->OpenDL();
handler_ = dl_loader_->GetInstance();
} else {
handler_ = HandlerFactory::GetInstance().createHandler(handler_str);
}
}

std::unique_ptr<torchserve::LoadModelResponse> Backend::LoadModel(
std::shared_ptr<torchserve::LoadModelRequest> load_model_request) {
/**
@@ -13,12 +70,43 @@ std::unique_ptr<torchserve::LoadModelResponse> Backend::LoadModel(
* - status_READY: return the model instance if it is already.
*
* Common steps:
* https://github.com/pytorch/serve/blob/master/ts/model_loader.py#L62
*/

// TODO: support request envelope:
// serve/tree/master/ts/torch_handler/request_envelope

return LoadModelInternal(std::move(load_model_request));
}

std::unique_ptr<LoadModelResponse> Backend::LoadModelInternal(
std::shared_ptr<LoadModelRequest> load_model_request) {
std::string model_instance_id = BuildModelInstanceId(load_model_request);
try {
model_instance_table_[model_instance_id] = {
ModelInstanceStatus::INIT, std::shared_ptr<ModelInstance>(nullptr)};

auto result = handler_->LoadModel(load_model_request);
SetModelInstanceInfo(model_instance_id, ModelInstanceStatus::READY,
std::make_shared<ModelInstance>(
model_instance_id, std::move(result.first),
handler_, std::move(result.second)));

ready_model_instance_ids_.emplace_back(model_instance_id);
std::string message =
fmt::format("loaded model {}", load_model_request->model_name);
return std::make_unique<LoadModelResponse>(
// TODO: check current response msg content
200, message);
} catch (const c10::Error &e) {
SetModelInstanceInfo(model_instance_id, ModelInstanceStatus::FAILED,
std::shared_ptr<ModelInstance>(nullptr));
return std::make_unique<LoadModelResponse>(
// TODO: check existing
500, e.msg());
}
}

std::string Backend::BuildModelInstanceId(
std::shared_ptr<torchserve::LoadModelRequest> load_model_request) {
std::string device_type("cpu");
@@ -30,15 +118,15 @@ std::string Backend::BuildModelInstanceId(
}

void Backend::SetModelInstanceInfo(
const std::string &model_instance_id, ModelInstanceStatus new_status,
std::shared_ptr<torchserve::ModelInstance> new_model_instance) {
model_instance_table_[model_instance_id].status = new_status;
model_instance_table_[model_instance_id].model_instance =
std::move(new_model_instance);
}

torchserve::Backend::ModelInstanceStatus Backend::GetModelInstanceStatus(
const std::string &model_instance_id) {
auto model_instance_info = model_instance_table_.find(model_instance_id);
if (model_instance_info == model_instance_table_.end()) {
return torchserve::Backend::ModelInstanceStatus::NOT_INIT;
@@ -47,7 +135,7 @@ torchserve::Backend::ModelInstanceStatus Backend::GetModelInstanceStatus(
}

std::shared_ptr<torchserve::ModelInstance> Backend::GetModelInstance(
const std::string &model_instance_id) {
auto model_instance_info = model_instance_table_.find(model_instance_id);
if (model_instance_info == model_instance_table_.end()) {
return std::shared_ptr<torchserve::ModelInstance>(nullptr);