
Xnnpack backend support #159

Merged · 69 commits · Oct 31, 2024

Commits
59cdc15
feat: XpOps, XpDirect.
chenghuaWang Oct 9, 2024
8fbd7df
feat: Xnnpack Add Example
chenghuaWang Oct 9, 2024
7024eab
feat: mllm frontend -> xnn static graph
chenghuaWang Oct 10, 2024
a3f271c
fix: Add Example Done.
chenghuaWang Oct 10, 2024
9c59cd0
feat: xnnpack wrap
chenghuaWang Oct 10, 2024
c4bad7b
fix: include path, update xnnpack to latest
chenghuaWang Oct 10, 2024
0a6f706
feat: xnn backend element wise op function
chenghuaWang Oct 11, 2024
f156f35
feat: xnn weight register and linear op
chenghuaWang Oct 11, 2024
efb3fa4
fix: XpLinear error with NoLoadWeightsDtype
chenghuaWang Oct 11, 2024
5781312
feat: xnnpack matmul rope
chenghuaWang Oct 14, 2024
72dfb7a
feat: fix redefine tensor in xnnpack bug
chenghuaWang Oct 15, 2024
4ad8d31
feat: add relu and rope bug fix
chenghuaWang Oct 15, 2024
ad3b148
feat: xnnpack GELU, Softmax, SiLU impl
chenghuaWang Oct 16, 2024
886eba7
feat: rms norm, tranpose
chenghuaWang Oct 16, 2024
b37ad92
feat: kvcache, still buggy, rfc.
chenghuaWang Oct 17, 2024
90164da
feat: update 3rd party packages
chenghuaWang Oct 17, 2024
b0883bb
feat: xp kvcache fix.
chenghuaWang Oct 18, 2024
05f167b
fix: github action main.yml
chenghuaWang Oct 18, 2024
69e80f9
feat: !!!SDPA!!! (Support B, H, S, D) layout
chenghuaWang Oct 20, 2024
a7471be
feat: XpSDPA torch impl for check mllm's correctness
chenghuaWang Oct 20, 2024
4a95180
fix: XpRoPE, add view func
chenghuaWang Oct 20, 2024
a34240f
fix: rope test example
chenghuaWang Oct 22, 2024
5b4bd71
fix: transpose xnn example bugs find
chenghuaWang Oct 22, 2024
314ca42
fix: xnnpack uuid register bug
chenghuaWang Oct 22, 2024
bea70c1
fix: xnnpack uuid register bug
chenghuaWang Oct 22, 2024
063f37f
fix: rope xnnpack test error
chenghuaWang Oct 23, 2024
3b75a25
feat: matmul xnnpack, failed at stl malloc.
chenghuaWang Oct 23, 2024
2945423
fix: xnnpack illegal memory r/w by using valgrind.
chenghuaWang Oct 23, 2024
d778974
fix: xnnpack attention impl bug
chenghuaWang Oct 24, 2024
dbb642b
feat: XpEmbedding op
chenghuaWang Oct 25, 2024
75a2850
feat: xp llama
chenghuaWang Oct 25, 2024
895e84e
fix: use_layername_2_tensorname = false;
chenghuaWang Oct 26, 2024
d0e31e9
change: llama xp example -> qwen xp example
chenghuaWang Oct 26, 2024
19ba8aa
Merge remote-tracking branch 'upstream/main'
chenghuaWang Oct 26, 2024
a4a8c70
fix: megre confict bugs
chenghuaWang Oct 26, 2024
bf278ef
fix: llm_model_ptr null error
chenghuaWang Oct 26, 2024
1c043ee
Merge branch 'main' into main
UbiquitousLearning Oct 26, 2024
d5982d2
fix: xnn backend load bug fix
chenghuaWang Oct 28, 2024
c689339
Merge remote-tracking branch 'upstream/main'
chenghuaWang Oct 29, 2024
189fa72
fix: add OUTPUT_TENSOR to TensorType. Remove this type after xnnpack …
chenghuaWang Oct 29, 2024
ca7b7f5
update: third_party submodule
chenghuaWang Oct 29, 2024
f97ee69
fix: freeze xnnpack version
chenghuaWang Oct 29, 2024
6b185ec
feat: QWen version 1.5 0.5B and 1.8B xnnpack backend.
chenghuaWang Oct 29, 2024
1bc39c2
Merge branch 'main' into main
yirongjie Oct 29, 2024
5e9345e
fix: xnnpack backend rope
chenghuaWang Oct 29, 2024
cc27a58
feat: reduce memory load time in xnnpack
chenghuaWang Oct 29, 2024
11cfa6f
fix: xnnpack qwen example token backend
chenghuaWang Oct 29, 2024
8c0c300
Merge branch 'UbiquitousLearning:main' into main
chenghuaWang Oct 29, 2024
f77622f
fix: use_layername_2_tensorname
yirongjie Oct 29, 2024
d3d9211
fix: MLLM_BUILD_XNNPACK OFF error and redundant xnnpack exec targets
chenghuaWang Oct 29, 2024
2cd51cf
Merge branch 'main' of https://github.com/chenghuaWang/mllm
chenghuaWang Oct 29, 2024
172bbd9
Merge branch 'main' into main
yirongjie Oct 29, 2024
757905e
fix: remove memory test due to previous workflow.yaml remove it from …
chenghuaWang Oct 30, 2024
c5302e0
fix: mask init error in xnnpack
chenghuaWang Oct 30, 2024
7f38bdf
Merge branch 'UbiquitousLearning:main' into main
chenghuaWang Oct 30, 2024
20393cc
Merge branch 'main' into main
yirongjie Oct 30, 2024
b43180e
Merge branch 'main' into main
yirongjie Oct 30, 2024
da3c295
fix: merge tokenize error
yirongjie Oct 30, 2024
9eb944c
fix: tokenizer apply
yirongjie Oct 30, 2024
779207b
fix: change HardSwish to original Swish function
chenghuaWang Oct 31, 2024
458a1e3
fix: mask bug in xnnpack
chenghuaWang Oct 31, 2024
fd1a449
fix: tokenizer.tokenize
yirongjie Oct 31, 2024
1d62462
fix: remove unused
yirongjie Oct 31, 2024
211ad56
update: move Xp*Test to MLLM_TEST
chenghuaWang Oct 31, 2024
5ceadc3
Merge branch 'main' of https://github.com/chenghuaWang/mllm
chenghuaWang Oct 31, 2024
ddddab3
fix: test bug
chenghuaWang Oct 31, 2024
919520f
fix: xnnpack test setup
chenghuaWang Oct 31, 2024
99f2ebc
fix: XpTest Error
chenghuaWang Oct 31, 2024
b829c24
fix: set xnn default threads to 4
chenghuaWang Oct 31, 2024
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -88,4 +88,4 @@ jobs:





9 changes: 8 additions & 1 deletion CMakeLists.txt
@@ -127,9 +127,16 @@ if(QNN) # QNN lib
add_subdirectory(${CMAKE_CURRENT_LIST_DIR}/src/backends/qnn)
endif()

-option(MLLM_BUILD_XNNPACK_BACKEND "Build mllm's XNNPACK backend" OFF)
+option(MLLM_BUILD_XNNPACK_BACKEND "Build mllm's XNNPACK backend" ON)
if(MLLM_BUILD_XNNPACK_BACKEND)
if(NOT WIN32)
add_compile_options(-fPIC)
else()
# -fPIC is not a windows flag
set(CMAKE_POSITION_INDEPENDENT_CODE FALSE)
endif()
set(XNNPACK_BUILD_TESTS OFF)
set(XNNPACK_BUILD_BENCHMARKS OFF)
add_definitions(-DMLLM_BUILD_XNNPACK_BACKEND)
add_subdirectory(${CMAKE_CURRENT_LIST_DIR}/src/backends/xnnpack)
endif()
13 changes: 10 additions & 3 deletions examples/CMakeLists.txt
@@ -12,6 +12,9 @@ macro(func_link_libaries target)
target_link_libraries(${target} PUBLIC MLLM_CPU MLLM_QNN ${CMAKE_DL_LIBS} -fopenmp -static-openmp)
endif ()
endif()
if (MLLM_BUILD_XNNPACK_BACKEND)
target_link_libraries(${target} PRIVATE MLLM_CPU MllmXnnpackBackend)
endif()
endmacro()


@@ -55,7 +58,9 @@ endmacro()


## new demos
-func_llm_add_executable(benchmark)
+
+# if(NOT MLLM_BUILD_XNNPACK_BACKEND)
+func_llm_add_executable(mllm_benchmark)
func_llm_add_executable(demo_llama)
func_llm_add_executable(demo_tinyllama)
func_llm_add_executable(demo_stablelm)
@@ -81,7 +86,7 @@ func_vlm_add_executable(demo_vit)
func_vlm_add_executable(demo_clip)
func_vlm_add_executable(demo_imagebind)
func_vlm_add_executable(demo_imagebind_1mod)
-# func_vlm_add_executable(demo)
+# endif()

# QNN demo
if(QNN)
@@ -90,7 +95,9 @@
endif()



if(MLLM_BUILD_XNNPACK_BACKEND)
func_llm_add_executable(demo_qwen_xp)
endif()


# old main
74 changes: 74 additions & 0 deletions examples/demo_qwen_xp.cpp
@@ -0,0 +1,74 @@
/**
* @file demo_qwen_xp.cpp
* @author your name ([email protected])
* @version 0.1
* @date 2024-10-20
*
* @copyright Copyright (c) 2024
*
*/
#include "Types.hpp"
#include "cmdline.h"
#include "models/qwen/configuration_qwen.hpp"
#include "models/qwen/tokenization_qwen.hpp"
#include "models/qwen/modeling_qwen_xp_sdpa.hpp"
#include "backends/xnnpack/Utils/Logger.hpp"
#include "xnnpack/XnnpackBackend.hpp"

using namespace mllm;

int main(int argc, char **argv) {
mllm::xnnpack::Log::log_level = mllm::xnnpack::Log::LogLevel::ERROR;

cmdline::parser cmdParser;
cmdParser.add<string>("vocab", 'v', "specify mllm tokenizer model path", false, "../vocab/qwen_vocab.mllm");
cmdParser.add<string>("merge", 'e', "specify mllm merge file path", false, "../vocab/qwen_merges.txt");
cmdParser.add<string>("model", 'm', "specify mllm model path", false, "../models/qwen-1.5-1.8b-fp32.mllm");
cmdParser.add<string>("billion", 'b', "[0.5B | 1.8B]", false, "1.8B");
cmdParser.add<int>("limits", 'l', "max KV cache size", false, 400);
cmdParser.add<int>("thread", 't', "num of threads", false, 4);
cmdParser.parse_check(argc, argv);

string vocab_path = cmdParser.get<string>("vocab");
string merge_path = cmdParser.get<string>("merge");
string model_path = cmdParser.get<string>("model");
string model_billion = cmdParser.get<string>("billion");
int tokens_limit = cmdParser.get<int>("limits");
mllm::xnnpack::XnnpackBackend::xnn_threads = cmdParser.get<int>("thread");

auto tokenizer = QWenTokenizer(vocab_path, merge_path);
QWenConfig config(tokens_limit, model_billion, RoPEType::HFHUBROPE);
auto model = QWenForCausalLM(config);
model.to(BackendType::MLLM_XNNPACK);
model.load(model_path);

vector<string> in_strs = {
"Hello, who are you?",
"What can you do?",
"Please introduce Beijing University of Posts and Telecommunications.",
};
for (const auto &in_str : in_strs) {
auto input_str = tokenizer.apply_chat_template(in_str);
auto input_tensor = tokenizer.tokenize(input_str, "name", MLLM_XNNPACK);
std::cout << "[Q] " << in_str << std::endl;
std::cout << "[A] " << std::flush;

LlmTextGeneratorOpts opt{
.max_new_tokens = 100,
.do_sample = false,
.temperature = 0.3F,
.top_k = 50,
.top_p = 0.F,
};
model.generate(input_tensor, opt, [&](unsigned int out_token) -> bool {
auto out_string = tokenizer.detokenize({out_token});
auto [not_end, output_string] = tokenizer.postprocess(out_string);
if (!not_end) { return false; }
std::cout << output_string << std::flush;
return true;
});
std::cout << "\n";
}

return 0;
}
File renamed without changes.
25 changes: 23 additions & 2 deletions include/OpDefined.hpp
@@ -61,8 +61,19 @@ enum OpType {
MERGEOUTPUT,
SPLITINPUT,
IROPE,
-    OP_NUM,
+
+    // add in xnnpack
+    DIRECT,
+    DISPATCH,
+    SUBGRAPHSTART,
+    SUBGRAPHFINALIZE,
+    D2H,
+    XP_KVCACHE,
+    SDPA,
+
+    // new front-end
+    SUPERSILU,
+    OP_NUM
};

static const vector<string> OpNames = {
@@ -119,8 +130,18 @@
"MergeOutput",
"SplitInput",
"IRoPE",
-    "OP_NUM",
-};
+
+    // in xnnpack
+    "Direct",
+    "Dispatch",
+    "SubgraphStart",
+    "SubgraphFinalize",
+    "D2H",
+    "XP_KVCACHE",
+    "SDPA",
+    "SuperSiLU",
+    "OP_NUM"};
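A maintenance hazard in this diff is that `OpType` and `OpNames` must be edited in lockstep (note the `OP_NUM` sentinel moving to the end of both lists). A compile-time size guard can catch drift; the sketch below uses hypothetical toy names, not mllm's real tables:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Toy version of OpType with a trailing count sentinel, mirroring OP_NUM.
enum MiniOpType { ADD, SOFTMAX, SDPA_OP, OP_COUNT };

// Sizing the array by the sentinel means a forgotten name shows up as an
// empty entry, and the static_assert documents the invariant explicitly.
constexpr std::array<std::string_view, OP_COUNT> kOpNames = {"Add", "Softmax", "SDPA"};

static_assert(kOpNames.size() == static_cast<std::size_t>(OP_COUNT),
              "OpNames must have exactly one entry per OpType");
```

Indexing the table with the enumerator then stays safe by construction.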

enum TensorFuncType {
FUNC_ADD,
4 changes: 3 additions & 1 deletion include/Types.hpp
@@ -27,7 +27,8 @@ typedef enum {
MLLM_DEFAULT,
MLLM_CPU,
MLLM_OPENCL,
-    MLLM_QNN
+    MLLM_QNN,
+    MLLM_XNNPACK,
} BackendType;

enum TensorStatus {
@@ -96,6 +97,7 @@ enum TensorType {
INPUT_TENSOR = 0, // used for input of the model
NORMAL_TENSOR,
GRAPH_OUTPUT, // used for output of a graph
OUTPUT_TENSOR,
};

enum Chl {
9 changes: 9 additions & 0 deletions scripts/run_test.sh
@@ -0,0 +1,9 @@
#!/bin/bash
for file in ../bin/*Test ../bin/*TEST; do
    if [ -x "$file" ]; then
        echo "Running $file..."
        "$file" # $file already includes the ../bin/ prefix
    else
        echo "Skipping non-executable $file..."
    fi
done
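The executable check in the loop can be exercised in isolation. This sketch runs the same pattern over a throwaway directory (`FooTest` and `notes.txt` are hypothetical stand-ins for built test binaries), invoking `$file` directly since the glob already carries the directory prefix:

```shell
#!/bin/sh
# Build a temporary directory with one executable and one plain file.
tmpdir=$(mktemp -d)
printf '#!/bin/sh\nexit 0\n' > "$tmpdir/FooTest"
chmod +x "$tmpdir/FooTest"
echo "not a test" > "$tmpdir/notes.txt"

ran=""
skipped=""
for file in "$tmpdir"/*; do
    if [ -x "$file" ]; then
        echo "Running $file..."
        "$file" # $file already holds the full path; no extra prefix needed
        ran="$ran $file"
    else
        echo "Skipping non-executable $file..."
        skipped="$skipped $file"
    fi
done
rm -rf "$tmpdir"
```

Re-prefixing the path (as in `"../bin/$file"`) would double the directory component and fail to find the binary.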
6 changes: 5 additions & 1 deletion src/Backend.cpp
@@ -9,6 +9,8 @@ namespace mllm {
extern void registerCPUBackendCreator();
#ifdef USE_QNN
extern void registerQNNBackendCreator();
#elif defined(MLLM_BUILD_XNNPACK_BACKEND)
extern void registerXNNBackendCreator();
#endif

static std::once_flag s_flag;
@@ -17,6 +19,8 @@ void registerBackend() {
registerCPUBackendCreator();
#ifdef USE_QNN
registerQNNBackendCreator();
#elif defined(MLLM_BUILD_XNNPACK_BACKEND)
registerXNNBackendCreator();
#endif
});
}
@@ -30,7 +34,7 @@ static std::unordered_map<BackendType, std::shared_ptr<BackendCreator>> &GetBack
}

const std::shared_ptr<BackendCreator> GetBackendCreator(BackendType type) {
-    if (type == MLLM_QNN) {
+    if (type == MLLM_QNN || type == MLLM_XNNPACK) {
Layer::use_layername_2_tensorname = false;
}
registerBackend();
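The registration path in Backend.cpp follows a once-guarded registry pattern: creators are inserted exactly once via `std::call_once`, then looked up by backend type. A self-contained approximation (the types here are simplified stand-ins, not mllm's real classes):

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <unordered_map>

// Simplified stand-ins for mllm's BackendType / BackendCreator.
enum BackendType { MLLM_CPU, MLLM_XNNPACK };

struct Backend { BackendType type; };

using BackendCreator = std::function<std::shared_ptr<Backend>()>;

std::unordered_map<BackendType, BackendCreator> &registry() {
    static std::unordered_map<BackendType, BackendCreator> map;
    return map;
}

void registerBackends() {
    // std::call_once guarantees one-time registration even with concurrent
    // callers, mirroring Backend.cpp's static s_flag.
    static std::once_flag flag;
    std::call_once(flag, [] {
        registry()[MLLM_CPU] = [] { return std::make_shared<Backend>(Backend{MLLM_CPU}); };
        registry()[MLLM_XNNPACK] = [] { return std::make_shared<Backend>(Backend{MLLM_XNNPACK}); };
    });
}

std::shared_ptr<Backend> createBackend(BackendType t) {
    registerBackends(); // idempotent; safe to call on every lookup
    auto it = registry().find(t);
    return it == registry().end() ? nullptr : it->second();
}
```

The lazy `registerBackends()` call in the lookup path is what lets callers skip explicit initialization, at the cost of one `call_once` check per creation.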
97 changes: 94 additions & 3 deletions src/Layer.hpp
@@ -5,10 +5,8 @@
#ifndef OPERATION_H
#define OPERATION_H

-#include <cassert>
#include <cstddef>
#include <cstdlib>
-#include <iostream>
#include <memory>
#include <utility>

@@ -17,7 +15,6 @@
#include "Op.hpp"
#include "ParamLoader.hpp"
#include "Backend.hpp"
-#include "Timing.hpp"

#include <Module.hpp>

@@ -60,6 +57,11 @@ class Layer {
return ts[0].get();
}

Tensor &operator()(Tensor &input0, Tensor &input1, Tensor &input2, Tensor &input3) {
auto ts = run({input0, input1, input2, input3}, 1);
return ts[0].get();
}

private:
std::string name_num_to_X(const std::string &input_string) {
std::regex pattern(R"(\.\d{1,3}\.)"); // Matches a 1- to 3-digit number between two dots
@@ -731,6 +733,21 @@ class Quantize final : public Layer {
}
};

class Direct final : public Layer {
public:
enum DirectType : uint32_t {
Normal = 0,
ExternalInput = 1,
ExternalOutput = 2,
KeepLive = 3,
};

Direct(DirectType t, const std::string &name) {
param_["DirectType"] = (float)t;
init(name, OpType::DIRECT);
}
};
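As the `Direct` constructor shows, `param_` stores every op attribute as a `float`, including enum selectors like `DirectType`. Small integral values round-trip through `float` exactly, which is what makes this work. A minimal sketch of the convention (the helper names are illustrative, not mllm's API):

```cpp
#include <cstdint>
#include <map>
#include <string>

// Same values as the DirectType enum introduced in Layer.hpp.
enum DirectType : uint32_t { Normal = 0, ExternalInput = 1, ExternalOutput = 2, KeepLive = 3 };

// Layer's OpParam convention: a string-keyed map of floats.
using OpParam = std::map<std::string, float>;

OpParam makeDirectParam(DirectType t) {
    OpParam p;
    p["DirectType"] = static_cast<float>(t); // enum encoded as float
    return p;
}

DirectType directTypeOf(const OpParam &p) {
    // Integers this small are represented exactly in float, so the
    // cast back through uint32_t recovers the original enumerator.
    return static_cast<DirectType>(static_cast<uint32_t>(p.at("DirectType")));
}
```

The float encoding keeps one uniform parameter map for all ops; it only breaks down for integers beyond float's 24-bit exact range, which op parameters here never approach.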

class Dequantize final : public Layer {
public:
explicit Dequantize(bool isNSHD, std::string name, bool isFP32 = true) {
@@ -744,6 +761,18 @@
}
};

class Dispatch final : public Layer {
public:
explicit Dispatch(const std::string &name) {
init(name, OpType::DISPATCH);
}

Tensor &operator()(Tensor &input) {
auto ts = run({input}, 1);
return ts[0].get();
}
};

class Add final : public Layer {
public:
explicit Add(std::string name) {
@@ -819,6 +848,18 @@ class View final : public Layer {
}
};

class SubgraphStart final : public Layer {
public:
explicit SubgraphStart(const std::string &name) {
init(name, OpType::SUBGRAPHSTART);
}

Tensor &operator()(Tensor &input) {
auto ts = run({input}, 1);
return ts[0].get();
}
};

class Transpose final : public Layer {
public:
explicit Transpose(std::vector<int> perm, std::string name) {
@@ -834,6 +875,56 @@
}
};

class SubgraphFinalize final : public Layer {
public:
explicit SubgraphFinalize(const std::string &name) {
init(name, OpType::SUBGRAPHFINALIZE);
}

Tensor &operator()(Tensor &input) {
auto ts = run({input}, 1);
return ts[0].get();
}
};

class Device2Host final : public Layer {
public:
explicit Device2Host(const std::string &name) {
init(name, OpType::D2H);
}

Tensor &operator()(Tensor &input) {
auto ts = run({input}, 1);
return ts[0].get();
}
};

class XP_KVCache final : public Layer {
public:
explicit XP_KVCache(int n_rep, int cache_max, std::string name) {
param_["n_rep"] = (float)n_rep;
param_["cache_max"] = (float)cache_max;
init(std::move(name), OpType::XP_KVCACHE);
}

Tensor &operator()(Tensor &input) {
auto ts = run({input}, 1);
return ts[0].get();
}
};

class ScaledDotProductAttention final : public Layer {
public:
explicit ScaledDotProductAttention(std::string name) {
init(std::move(name), OpType::SDPA);
}

// Q, K, V
Tensor &operator()(Tensor &Q, Tensor &K, Tensor &V) {
auto ts = run({Q, K, V}, 1); // Q, K, V
return ts[0].get();
}
};
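For validating an SDPA kernel's output (as commit a7471be does against a torch implementation), a naive scalar reference is handy. This single-head sketch assumes row-major `vector<vector<float>>` inputs and is a correctness baseline only, not the layout-aware (B, H, S, D) kernel from the PR:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;

// Naive single-head scaled dot-product attention:
// out = softmax(Q K^T / sqrt(d)) V, with a numerically stable
// row-wise softmax (max-subtraction before exp).
Mat sdpa(const Mat &Q, const Mat &K, const Mat &V) {
    const std::size_t s = Q.size(), d = Q[0].size(), dv = V[0].size();
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    Mat out(s, std::vector<float>(dv, 0.0f));
    for (std::size_t i = 0; i < s; ++i) {
        std::vector<float> score(K.size());
        float maxv = -1e30f;
        for (std::size_t j = 0; j < K.size(); ++j) {
            float dot = 0.0f;
            for (std::size_t k = 0; k < d; ++k) dot += Q[i][k] * K[j][k];
            score[j] = dot * scale;
            maxv = std::max(maxv, score[j]);
        }
        float sum = 0.0f;
        for (float &x : score) { x = std::exp(x - maxv); sum += x; }
        for (std::size_t j = 0; j < K.size(); ++j)
            for (std::size_t k = 0; k < dv; ++k)
                out[i][k] += (score[j] / sum) * V[j][k];
    }
    return out;
}
```

With a strongly peaked query (e.g. one large dot product per row), each output row collapses to the corresponding V row, which gives an easy sanity check.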
// Only for QNN END

} // namespace mllm