Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native] Dynamically Linked Library in Presto CPP #24330

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

soumiiow
Copy link

@soumiiow soumiiow commented Jan 7, 2025

Description

Depends on facebookincubator/velox#11439 in the Velox space
and based off of the following PR: https://github.com/facebookincubator/velox/pull/1005/files

Motivation and Context

Having these changes will enable users to register custom functions dynamically without requiring a fork of Prestissimo.

Impact

This extends Prestissimo functionality to include dynamic loading of functions, types, connectors, etc.

Test Plan

Unit tested. and Manually end to end tested the changes.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... :pr:`12345`
* ... :pr:`12345`

Hive Connector Changes
* ... :pr:`12345`
* ... :pr:`12345`

If release note is NOT required, use:

== NO RELEASE NOTE ==

@soumiiow soumiiow self-assigned this Jan 7, 2025
Copy link

linux-foundation-easycla bot commented Jan 7, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: soumiiow / name: Soumya Duriseti (27bd55e)

@soumiiow soumiiow marked this pull request as ready for review January 7, 2025 18:04
@soumiiow soumiiow requested a review from a team as a code owner January 7, 2025 18:04
@tdcmeehan tdcmeehan self-assigned this Jan 8, 2025
@tdcmeehan tdcmeehan added the from:IBM PR from IBM label Jan 30, 2025
@prestodb-ci prestodb-ci requested review from a team, pdabre12 and psnv03 and removed request for a team January 30, 2025 18:47
Copy link
Contributor

@czentgr czentgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also rebase.

@@ -76,7 +76,8 @@ target_link_libraries(
${FOLLY_WITH_DEPENDENCIES}
${GLOG}
${GFLAGS_LIBRARIES}
pthread)
pthread
velox_dynamic_function_loader)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this before the velox_encode library.

@@ -89,6 +90,7 @@ set_property(TARGET presto_server_lib PROPERTY JOB_POOL_LINK
presto_link_job_pool)

add_executable(presto_server PrestoMain.cpp)
target_link_options(presto_server BEFORE PUBLIC "-Wl,-export-dynamic")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment here why we need these flags?

const fs::path path(systemConfig->pluginDir());
PRESTO_STARTUP_LOG(INFO) << path;
std::error_code
ec; // For using the non-throwing overloads of functions below.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don;t need this comment here and so can fix up the odd formatting.

void PrestoServer::registerDynamicFunctions() {
auto systemConfig = SystemConfig::instance();
if (!systemConfig->pluginDir().empty()) {
// if it is a valid directory, traverse and call dynamic function loader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure the comments are full sentences beginning with capitalization etc.

auto dirEntryPath = dirEntry.path();
if (!fs::is_directory(dirEntry, ec) &&
extensions.find(dirEntryPath.extension()) != extensions.end()) {
facebook::velox::loadDynamicLibrary(dirEntryPath.c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

facebook is not needed here because we are already in the facebook namespace.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc! A few minor formatting and phrasing suggestions.

@@ -0,0 +1,17 @@
# Dynamic Loading of Presto Cpp Extensions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Dynamic Loading of Presto Cpp Extensions
# Dynamic Loading of Presto CPP Extensions

@@ -0,0 +1,17 @@
# Dynamic Loading of Presto Cpp Extensions
This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are to be loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are to be loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library.
This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library.

This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are to be loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library.
## Getting started
1. Create a cpp file for your dynamic library
For dynamically loaded function registration, the format followed is mirrored of that of built-in function registration with some noted differences. Using [MyDynamicFunction.cpp](examples/MyDynamicFunction.cpp) as an example, the function uses the extern "C" keyword to protect against name mangling. A registry() function call is also necessary here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For dynamically loaded function registration, the format followed is mirrored of that of built-in function registration with some noted differences. Using [MyDynamicFunction.cpp](examples/MyDynamicFunction.cpp) as an example, the function uses the extern "C" keyword to protect against name mangling. A registry() function call is also necessary here.
For dynamically loaded function registration, the format is similar to that of built-in function registration, with some noted differences. Using [MyDynamicFunction.cpp](examples/MyDynamicFunction.cpp) as an example, the function uses the extern "C" keyword to protect against name mangling. A registry() function call is also necessary here.

```
plugin.dir="User\Test\Path\plugin"
```
4. When the worker or the sidecar process starts, it will scan the plugin directory and attempt to dynamically load all shared libraries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
4. When the worker or the sidecar process starts, it will scan the plugin directory and attempt to dynamically load all shared libraries
When the worker or the sidecar process starts, it scans the plugin directory and attempts to dynamically load all shared libraries.

I don't think of this as part of the steps to configure, it's what happens when things are run using the config in steps 1-3.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the documentation! I really liked your including the setup steps on plugin.rst. Some minor suggestions for formatting and phrasing but looks good overall.

@soumiiow
Copy link
Author

Thanks @steveburnett please take another look, I've made the changes. I'm not sure about the tone on the intro to UDFs i have on function_plugin.rst, would appreciate another set of eyes there.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the revision! Nice work, your unordered list of UDF benefits was great and the format fixes in the README look good.

I made a couple of small suggestions about the intro to UDFs, let me know what you think.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have noticed this spelling nit earlier! After I found this one, I did a complete review of the doc in this PR and found no other errors so I think this is the last one.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local doc build, complete review of new content. No concerns, looks good. Thanks for working with me on this!

Copy link
Contributor

@czentgr czentgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, a couple of comments.


2. Set the ``plugin.dir`` property to the path of the ``plugins`` directory in the ``config.properties`` file of each of your workers.

3. Run the coordinator and workers to load your plugins.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should we say start or restart the coordinator/worker to pick up any placed libraries? Run could mean it is already running? It's a bit ambiguous to me in this context.

Creating a Shared Library for UDFs
----------------------------------
User defined functions (UDFs) allow users to create custom functions without the need to rebuild the executable.
There are many benefits to UDFs other than shorter compile times, such as:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"other than shorter compile times". An user wouldn't care about compile times. I would just leave the "other than ..." out and simply state the benefits.

#include "presto_cpp/main/dynamic_registry/DynamicFunctionRegistrar.h"

template <typename T>
struct nameOfStruct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most (all?) the code conventions a struct/class name starts with upper case for a class or struct so should do the same here (and then usage later).


Note: the ``int64_t`` return type can be changed as needed. For more examples, see the `README <https://github.com/prestodb/presto-native-execution/main/dynamic_registry/README.md>`_.

2. Create a shared library which may be made using CMakeLists like the following:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: ... may be made using a CMakeLists.txt like ...

We should highlight the file name because it is fixed for a directory.

@@ -642,6 +642,8 @@ class SystemConfig : public ConfigBase {
static constexpr std::string_view kInternalCommunicationJwtExpirationSeconds{
"internal-communication.jwt.expiration-seconds"};

/// Optional string containing the path to the plugin directory
static constexpr std::string_view kPluginDir{"plugin.dir"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add new line before comments on line 647.

add_library(name_of_dynamic_fn SHARED TestFunction.cpp)
target_link_libraries(name_of_dynamic_fn PRIVATE fmt::fmt Folly::folly gflags::gflags)
3. Put your shared libraries in the plugin directory described in :doc:`../plugin`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true anymore.
The user must configure the new plugin.dir variable to indicate where to pick up the new libraries. If it is the default (empty string) nothing happens.
On the CPP side this is good because we likely need some kind of option where libraries are not loaded even if a library is present - aka never load anything dynamically as otherwise it is too easy to exploit. Java is a bit more secure in that regard.


#include "presto_cpp/main/common/Configs.h"
#include "velox/functions/Macros.h"
#include "velox/functions/Registerer.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: leave a new line between the last include and the namespace.

@@ -0,0 +1,44 @@
=======================
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need the README if we have the documentation here?
There is some divergence between the readme and this.

Perhaps the readme can just point to the documentation link instead? We've done something similar with the velox part of the PR.

We could also link to the Velox piece that is underlying the Presto part.

} // namespace facebook::velox::common::dynamicRegistry

extern "C" {
// In this case, we assume that facebook::velox::registerFunction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this comment? The user here doesn't know about the Velox API. They don't need to know, do they?
Or it might be explained in the doc what it relies on?

* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "presto_cpp/main/dynamic_registry/DynamicFunctionRegistrar.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: please add a new line between include and comment below.

@aditi-pandit aditi-pandit changed the title Dynamically Linked Library in Presto CPP [native] Dynamically Linked Library in Presto CPP Feb 13, 2025
Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@soumiiow : Thanks for this code. Would be great to have an e2e test with a SQL statement using the dynamic functions. On the lines of https://github.com/prestodb/presto/blob/master/presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeRemoteFunctions.java

auto systemConfig = SystemConfig::instance();
cpp_nameSpace = systemConfig->prestoDefaultNamespacePrefix();
}
std::string cpp_name(cpp_nameSpace);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an assumption that namespace should end in "." ? We should make this behavior more robust by adding the period separator if its not present.

const char* nameSpace = "",
const std::vector<velox::exec::SignatureVariable>& constraints = {},
bool overwrite = true) {
std::string cpp_nameSpace(nameSpace);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use camelcase cppNamespace

@@ -1598,5 +1600,28 @@ protocol::NodeStatus PrestoServer::fetchNodeStatus() {

return nodeStatus;
}
void PrestoServer::registerDynamicFunctions() {
auto systemConfig = SystemConfig::instance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to have a config of library names to load rather than load all those found in the directory. wdyt ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an example of a function which is added to the non-default namespace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:IBM PR from IBM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants