-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[native] Dynamically Linked Library in Presto CPP #24330
base: master
Are you sure you want to change the base?
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also rebase.
@@ -76,7 +76,8 @@ target_link_libraries( | |||
${FOLLY_WITH_DEPENDENCIES} | |||
${GLOG} | |||
${GFLAGS_LIBRARIES} | |||
pthread) | |||
pthread | |||
velox_dynamic_function_loader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this before the velox_encode library.
@@ -89,6 +90,7 @@ set_property(TARGET presto_server_lib PROPERTY JOB_POOL_LINK | |||
presto_link_job_pool) | |||
|
|||
add_executable(presto_server PrestoMain.cpp) | |||
target_link_options(presto_server BEFORE PUBLIC "-Wl,-export-dynamic") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment here why we need these flags?
const fs::path path(systemConfig->pluginDir()); | ||
PRESTO_STARTUP_LOG(INFO) << path; | ||
std::error_code | ||
ec; // For using the non-throwing overloads of functions below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don;t need this comment here and so can fix up the odd formatting.
void PrestoServer::registerDynamicFunctions() { | ||
auto systemConfig = SystemConfig::instance(); | ||
if (!systemConfig->pluginDir().empty()) { | ||
// if it is a valid directory, traverse and call dynamic function loader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure the comments are full sentences beginning with capitalization etc.
auto dirEntryPath = dirEntry.path(); | ||
if (!fs::is_directory(dirEntry, ec) && | ||
extensions.find(dirEntryPath.extension()) != extensions.end()) { | ||
facebook::velox::loadDynamicLibrary(dirEntryPath.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
facebook is not needed here because we are already in the facebook namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc! A few minor formatting and phrasing suggestions.
@@ -0,0 +1,17 @@ | |||
# Dynamic Loading of Presto Cpp Extensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Dynamic Loading of Presto Cpp Extensions | |
# Dynamic Loading of Presto CPP Extensions |
@@ -0,0 +1,17 @@ | |||
# Dynamic Loading of Presto Cpp Extensions | |||
This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are to be loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are to be loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library. | |
This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library. |
This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are to be loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library. | ||
## Getting started | ||
1. Create a cpp file for your dynamic library | ||
For dynamically loaded function registration, the format followed is mirrored of that of built-in function registration with some noted differences. Using [MyDynamicFunction.cpp](examples/MyDynamicFunction.cpp) as an example, the function uses the extern "C" keyword to protect against name mangling. A registry() function call is also necessary here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For dynamically loaded function registration, the format followed is mirrored of that of built-in function registration with some noted differences. Using [MyDynamicFunction.cpp](examples/MyDynamicFunction.cpp) as an example, the function uses the extern "C" keyword to protect against name mangling. A registry() function call is also necessary here. | |
For dynamically loaded function registration, the format is similar to that of built-in function registration, with some noted differences. Using [MyDynamicFunction.cpp](examples/MyDynamicFunction.cpp) as an example, the function uses the extern "C" keyword to protect against name mangling. A registry() function call is also necessary here. |
``` | ||
plugin.dir="User\Test\Path\plugin" | ||
``` | ||
4. When the worker or the sidecar process starts, it will scan the plugin directory and attempt to dynamically load all shared libraries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4. When the worker or the sidecar process starts, it will scan the plugin directory and attempt to dynamically load all shared libraries | |
When the worker or the sidecar process starts, it scans the plugin directory and attempts to dynamically load all shared libraries. |
I don't think of this as part of the steps to configure, it's what happens when things are run using the config in steps 1-3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the documentation! I really liked your including the setup steps on plugin.rst. Some minor suggestions for formatting and phrasing but looks good overall.
presto-docs/src/main/sphinx/presto_cpp/plugin/function_plugin.rst
Outdated
Show resolved
Hide resolved
presto-docs/src/main/sphinx/presto_cpp/plugin/function_plugin.rst
Outdated
Show resolved
Hide resolved
presto-docs/src/main/sphinx/presto_cpp/plugin/function_plugin.rst
Outdated
Show resolved
Hide resolved
Thanks @steveburnett please take another look, I've made the changes. I'm not sure about the tone on the intro to UDFs i have on function_plugin.rst, would appreciate another set of eyes there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the revision! Nice work, your unordered list of UDF benefits was great and the format fixes in the README look good.
I made a couple of small suggestions about the intro to UDFs, let me know what you think.
presto-docs/src/main/sphinx/presto_cpp/plugin/function_plugin.rst
Outdated
Show resolved
Hide resolved
presto-docs/src/main/sphinx/presto_cpp/plugin/function_plugin.rst
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have noticed this spelling nit earlier! After I found this one, I did a complete review of the doc in this PR and found no other errors so I think this is the last one.
presto-docs/src/main/sphinx/presto_cpp/plugin/function_plugin.rst
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local doc build, complete review of new content. No concerns, looks good. Thanks for working with me on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, a couple of comments.
|
||
2. Set the ``plugin.dir`` property to the path of the ``plugins`` directory in the ``config.properties`` file of each of your workers. | ||
|
||
3. Run the coordinator and workers to load your plugins. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should we say start or restart the coordinator/worker to pick up any placed libraries? Run could mean it is already running? It's a bit ambiguous to me in this context.
Creating a Shared Library for UDFs | ||
---------------------------------- | ||
User defined functions (UDFs) allow users to create custom functions without the need to rebuild the executable. | ||
There are many benefits to UDFs other than shorter compile times, such as: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"other than shorter compile times". An user wouldn't care about compile times. I would just leave the "other than ..." out and simply state the benefits.
#include "presto_cpp/main/dynamic_registry/DynamicFunctionRegistrar.h" | ||
|
||
template <typename T> | ||
struct nameOfStruct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In most (all?) the code conventions a struct/class name starts with upper case for a class or struct so should do the same here (and then usage later).
|
||
Note: the ``int64_t`` return type can be changed as needed. For more examples, see the `README <https://github.com/prestodb/presto-native-execution/main/dynamic_registry/README.md>`_. | ||
|
||
2. Create a shared library which may be made using CMakeLists like the following: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: ... may be made using a CMakeLists.txt
like ...
We should highlight the file name because it is fixed for a directory.
@@ -642,6 +642,8 @@ class SystemConfig : public ConfigBase { | |||
static constexpr std::string_view kInternalCommunicationJwtExpirationSeconds{ | |||
"internal-communication.jwt.expiration-seconds"}; | |||
|
|||
/// Optional string containing the path to the plugin directory | |||
static constexpr std::string_view kPluginDir{"plugin.dir"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: add new line before comments on line 647.
add_library(name_of_dynamic_fn SHARED TestFunction.cpp) | ||
target_link_libraries(name_of_dynamic_fn PRIVATE fmt::fmt Folly::folly gflags::gflags) | ||
3. Put your shared libraries in the plugin directory described in :doc:`../plugin`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not true anymore.
The user must configure the new plugin.dir
variable to indicate where to pick up the new libraries. If it is the default (empty string) nothing happens.
On the CPP side this is good because we likely need some kind of option where libraries are not loaded even if a library is present - aka never load anything dynamically as otherwise it is too easy to exploit. Java is a bit more secure in that regard.
|
||
#include "presto_cpp/main/common/Configs.h" | ||
#include "velox/functions/Macros.h" | ||
#include "velox/functions/Registerer.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: leave a new line between the last include and the namespace.
@@ -0,0 +1,44 @@ | |||
======================= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need the README if we have the documentation here?
There is some divergence between the readme and this.
Perhaps the readme can just point to the documentation link instead? We've done something similar with the velox part of the PR.
We could also link to the Velox piece that is underlying the Presto part.
} // namespace facebook::velox::common::dynamicRegistry | ||
|
||
extern "C" { | ||
// In this case, we assume that facebook::velox::registerFunction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this comment? The user here doesn't know about the Velox API. They don't need to know, do they?
Or it might be explained in the doc what it relies on?
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
#include "presto_cpp/main/dynamic_registry/DynamicFunctionRegistrar.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: please add a new line between include and comment below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@soumiiow : Thanks for this code. Would be great to have an e2e test with a SQL statement using the dynamic functions. On the lines of https://github.com/prestodb/presto/blob/master/presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeRemoteFunctions.java
auto systemConfig = SystemConfig::instance(); | ||
cpp_nameSpace = systemConfig->prestoDefaultNamespacePrefix(); | ||
} | ||
std::string cpp_name(cpp_nameSpace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an assumption that namespace should end in "." ? We should make this behavior more robust by adding the period separator if its not present.
const char* nameSpace = "", | ||
const std::vector<velox::exec::SignatureVariable>& constraints = {}, | ||
bool overwrite = true) { | ||
std::string cpp_nameSpace(nameSpace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use camelcase cppNamespace
@@ -1598,5 +1600,28 @@ protocol::NodeStatus PrestoServer::fetchNodeStatus() { | |||
|
|||
return nodeStatus; | |||
} | |||
void PrestoServer::registerDynamicFunctions() { | |||
auto systemConfig = SystemConfig::instance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to have a config of library names to load rather than load all those found in the directory. wdyt ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an example of a function which is added to the non-default namespace.
Description
Depends on facebookincubator/velox#11439 in the Velox space
and based off of the following PR: https://github.com/facebookincubator/velox/pull/1005/files
Motivation and Context
Having these changes will enable users to register custom functions dynamically without requiring a fork of Prestissimo.
Impact
This extends Prestissimo functionality to include dynamic loading of functions, types, connectors, etc.
Test Plan
Unit tested. and Manually end to end tested the changes.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
If release note is NOT required, use: