-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add client interface. #16
Changes from all commits
6ffea6e
94af15b
f69523a
ccf6cc8
5e26f58
020093c
ee222f0
1b0ccac
b3a2e1e
ec8f500
5dc12cb
0d59c62
4cd6233
fec5e73
a5e799b
1b15b5d
98104f1
cb369fd
b4a6f36
f3de2ca
70547ae
525311c
c776376
49c571e
43d7e16
5e8e1dd
fe23c3c
064edd8
e7c5240
b0b414e
97761e1
91d36f2
84c2f41
f7ab013
302e68a
a2dc8bc
046e740
92d6489
d27f042
2b49746
c4ee015
0cc231b
069a7a7
9069030
0fe063f
e089107
159aa08
8e035b7
9d90c37
8de26ec
b9bfcdd
351c8b5
7dae8d4
5d15b22
1588b51
1e1e41d
c0a6e6f
99c5935
80f314a
7796e15
036dd51
8affa10
77d2458
026b6f1
448f693
769a708
3555d2e
a0c5b3a
e185bf3
f0d79e9
f444772
530da78
db21fe7
165eb84
06de774
c7a07b1
f8c623a
488eaa3
f0729d9
4d2aa6c
88ed638
bd55552
9897995
73eabef
85d2475
61be939
5bffeee
22f370d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
@@ -0,0 +1,82 @@ | ||||
#ifndef SPIDER_CLIENT_DATA_HPP | ||||
#define SPIDER_CLIENT_DATA_HPP | ||||
|
||||
#include <functional> | ||||
#include <memory> | ||||
#include <string> | ||||
#include <vector> | ||||
|
||||
#include "../core/Serializer.hpp" | ||||
|
||||
kirkrodrigues marked this conversation as resolved.
Show resolved
Hide resolved
|
||||
namespace spider { | ||||
class DataImpl; | ||||
|
||||
/** | ||||
* A representation of data stored on external storage. This class allows the user to define: | ||||
* - how the data should be cleaned up (garbage collected) once it is no longer referenced. | ||||
* - the locality of the data. | ||||
* | ||||
* Example: | ||||
* @code{.cpp} | ||||
* auto disk_file_data = spider::Data<std::string>::Builder() | ||||
* .set_locality({"node_address"}, true) | ||||
* .set_cleanup_func([](std::string const& path) { std::filesystem::remove(path); }) | ||||
* .build("/path/of/file"); | ||||
* @endcode | ||||
* | ||||
* @tparam T Type of the value. | ||||
*/ | ||||
template <Serializable T> | ||||
class Data { | ||||
kirkrodrigues marked this conversation as resolved.
Show resolved
Hide resolved
|
||||
public: | ||||
/** | ||||
* @return The stored value. | ||||
*/ | ||||
auto get() -> T; | ||||
Comment on lines
+32
to
+35
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Consider adding error handling to get() The
|
||||
|
||||
/** | ||||
* Sets the data's locality, indicated by the nodes that contain the data. | ||||
* | ||||
* @param nodes | ||||
* @param hard Whether the data is only accessible from the given nodes (i.e., the locality is a | ||||
* hard requirement). | ||||
*/ | ||||
void set_locality(std::vector<std::string> const& nodes, bool hard); | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Locality information might not be available at the time we create the data. For example, we create an associated data for HDFS files before we create the actual HDFS files and write to them so that Spider can gc the HDFS files on failure during write. However, we don't know the locality of the HDFS files before creating them, and thus we need a way to set locality on the fly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||||
|
||||
class Builder { | ||||
public: | ||||
/** | ||||
* Sets the data's locality, indicated by the nodes that contain the data. | ||||
* | ||||
* @param nodes | ||||
* @param hard Whether the data is only accessible from the given nodes (i.e., the locality | ||||
* is a hard requirement. | ||||
* @return self | ||||
*/ | ||||
auto set_locality(std::vector<std::string> const& nodes, bool hard) -> Builder&; | ||||
|
||||
/** | ||||
* Sets the cleanup function for the data. This function will be called when the data is no | ||||
* longer referenced. | ||||
* | ||||
* @param f | ||||
* @return self | ||||
*/ | ||||
auto set_cleanup_func(std::function<void(T const&)> const& f) -> Builder&; | ||||
|
||||
/** | ||||
* Builds the data object. | ||||
* | ||||
* @param t Value of the data | ||||
* @return The built object. | ||||
* @throw spider::ConnectionException | ||||
*/ | ||||
auto build(T const& t) -> Data; | ||||
}; | ||||
|
||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||
private: | ||||
std::unique_ptr<DataImpl> m_impl; | ||||
}; | ||||
Comment on lines
+77
to
+79
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add destructor declaration for pImpl idiom. When using the pImpl idiom with std::unique_ptr, you need to declare the destructor in the header and define it in the implementation file where the complete type is available. Add the following declaration: ~Data(); // Declaration only, define in .cpp file |
||||
} // namespace spider | ||||
|
||||
#endif // SPIDER_CLIENT_DATA_HPP |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,139 @@ | ||
#ifndef SPIDER_CLIENT_DRIVER_HPP | ||
#define SPIDER_CLIENT_DRIVER_HPP | ||
|
||
#include <optional> | ||
#include <string> | ||
#include <vector> | ||
|
||
#include <boost/uuid/uuid.hpp> | ||
|
||
#include "../worker/FunctionManager.hpp" | ||
#include "Job.hpp" | ||
#include "task.hpp" | ||
#include "TaskGraph.hpp" | ||
|
||
/** | ||
* Registers a Task function with Spider | ||
* @param func | ||
*/ | ||
// NOLINTNEXTLINE(cppcoreguidelines-macro-usage) | ||
#define SPIDER_REGISTER_TASK(func) SPIDER_WORKER_REGISTER_TASK(func) | ||
|
||
/** | ||
* Registers a timed Task function with Spider | ||
* @param func | ||
* @param timeout The time after which the task is considered a straggler, triggering Spider to | ||
* replicate the task. TODO: Use the timeout. | ||
*/ | ||
// NOLINTNEXTLINE(cppcoreguidelines-macro-usage) | ||
#define SPIDER_REGISTER_TASK_TIMEOUT(func, timeout) SPIDER_WORKER_REGISTER_TASK(func) | ||
Comment on lines
+26
to
+29
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Timeout parameter is unused in The Would you like assistance in implementing the timeout functionality, or should we open a new GitHub issue to track this task? |
||
|
||
namespace spider { | ||
|
||
/** | ||
* An interface for a client to interact with Spider and create jobs, access the kv-store, etc. | ||
*/ | ||
class Driver { | ||
public: | ||
/** | ||
* @param storage_url | ||
* @throw spider::ConnectionException | ||
*/ | ||
explicit Driver(std::string const& storage_url); | ||
|
||
/** | ||
* @param storage_url | ||
* @param id A caller-specified ID to associate with this driver. All jobs created by this | ||
* driver will be associated with this ID. This may be useful if, for instance, the caller | ||
* fails and then needs to reconnect and retrieve all previously created jobs. NOTE: It is | ||
* undefined behaviour for two clients to concurrently use the same ID. | ||
* @throw spider::ConnectionException | ||
* @throw spider::DriverIdInUseException | ||
*/ | ||
Driver(std::string const& storage_url, boost::uuids::uuid id); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This method should throw an exception if the client ID is already in use, right? |
||
|
||
/** | ||
* Inserts the given key-value pair into the key-value store, overwriting any existing value. | ||
* | ||
* @param key | ||
* @param value | ||
* @throw spider::ConnectionException | ||
*/ | ||
auto kv_store_insert(std::string const& key, std::string const& value); | ||
|
||
/** | ||
* Gets the value corresponding to the given key. | ||
* | ||
* NOTE: Callers cannot get values created by other clients, but they can get values created by | ||
* previous `Driver` instances with the same client ID. | ||
* | ||
* @param key | ||
* @return An optional containing the value if the given key exists, or `std::nullopt` | ||
* otherwise. | ||
* @throw spider::ConnectionException | ||
*/ | ||
auto kv_store_get(std::string const& key) -> std::optional<std::string>; | ||
|
||
/** | ||
* Binds inputs to a task. Inputs can be: | ||
* - the outputs of a task or task graph, forming dependencies between tasks. | ||
* - any value that satisfies the `TaskIo` concept. | ||
* | ||
* @tparam ReturnType Return type for both the task and the resulting `TaskGraph`. | ||
* @tparam TaskParams | ||
* @tparam Inputs | ||
* @tparam GraphParams | ||
* @param task | ||
* @param inputs Inputs to bind to `task`. If an input is a `Task` or `TaskGraph`, their | ||
* outputs will be bound to the inputs of `task`. | ||
* @return A `TaskGraph` of the inputs bound to `task`. | ||
*/ | ||
template < | ||
TaskIo ReturnType, | ||
TaskIo... TaskParams, | ||
RunnableOrTaskIo... Inputs, | ||
TaskIo... GraphParams> | ||
auto bind(TaskFunction<ReturnType, TaskParams...> const& task, Inputs&&... inputs) | ||
-> TaskGraph<ReturnType(GraphParams...)>; | ||
|
||
/** | ||
* Starts running a task with the given inputs on Spider. | ||
* | ||
* @tparam ReturnType | ||
* @tparam Params | ||
* @param task | ||
* @param inputs | ||
* @return A job representing the running task. | ||
* @throw spider::ConnectionException | ||
*/ | ||
template <TaskIo ReturnType, TaskIo... Params> | ||
auto | ||
start(TaskFunction<ReturnType, Params...> const& task, Params&&... inputs) -> Job<ReturnType>; | ||
|
||
/** | ||
* Starts running a task graph with the given inputs on Spider. | ||
* | ||
* @tparam ReturnType | ||
* @tparam Params | ||
* @param graph | ||
* @param inputs | ||
* @return A job representing the running task graph. | ||
* @throw spider::ConnectionException | ||
*/ | ||
template <TaskIo ReturnType, TaskIo... Params> | ||
auto | ||
start(TaskGraph<ReturnType(Params...)> const& graph, Params&&... inputs) -> Job<ReturnType>; | ||
|
||
/** | ||
* Gets all scheduled and running jobs started by drivers with the current client's ID. | ||
* | ||
* NOTE: This method will not return jobs that have finished. | ||
* | ||
* @return IDs of the jobs. | ||
* @throw spider::ConnectionException | ||
*/ | ||
auto get_jobs() -> std::vector<boost::uuids::uuid>; | ||
}; | ||
} // namespace spider | ||
|
||
#endif // SPIDER_CLIENT_DRIVER_HPP |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
#ifndef SPIDER_CLIENT_EXCEPTION_HPP | ||
#define SPIDER_CLIENT_EXCEPTION_HPP | ||
|
||
#include <exception> | ||
#include <string> | ||
|
||
#include <boost/uuid/uuid.hpp> | ||
#include <boost/uuid/uuid_io.hpp> | ||
#include <fmt/format.h> | ||
|
||
namespace spider { | ||
class ConnectionException final : public std::exception { | ||
public: | ||
explicit ConnectionException(std::string const& addr) | ||
: m_message(fmt::format("Cannot connect to storage {}.", addr)) {} | ||
|
||
[[nodiscard]] auto what() const noexcept -> char const* override { return m_message.c_str(); } | ||
|
||
private: | ||
std::string m_message; | ||
}; | ||
|
||
class DriverIdInUseException final : public std::exception { | ||
public: | ||
explicit DriverIdInUseException(boost::uuids::uuid id) | ||
: m_message( | ||
fmt::format("Driver ID {} is currently in use.", boost::uuids::to_string(id)) | ||
) {} | ||
|
||
[[nodiscard]] auto what() const noexcept -> char const* override { return m_message.c_str(); } | ||
|
||
private: | ||
std::string m_message; | ||
}; | ||
} // namespace spider | ||
|
||
#endif // SPIDER_CLIENT_EXCEPTION_HPP |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
#ifndef SPIDER_CLIENT_JOB_HPP | ||
#define SPIDER_CLIENT_JOB_HPP | ||
|
||
#include <cstdint> | ||
#include <string> | ||
#include <utility> | ||
|
||
#include "task.hpp" | ||
|
||
namespace spider { | ||
// TODO: Use std::expected or Boost's outcome so that the user can get the result of the job in one | ||
// call rather than the current error-prone approach which requires that the user check the job's | ||
// status and then call the relevant method. | ||
|
||
enum class JobStatus : uint8_t { | ||
Running, | ||
Succeeded, | ||
Failed, | ||
Cancelled, | ||
}; | ||
|
||
/** | ||
* A running task graph. | ||
* | ||
* @tparam ReturnType | ||
*/ | ||
template <TaskIo ReturnType> | ||
class Job { | ||
public: | ||
/** | ||
* Waits for the job to complete. | ||
* | ||
* @throw spider::ConnectionException | ||
*/ | ||
auto wait_complete(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💡 Codebase verification Template implementation file is missing The methods in the Job template class are declared but have no implementation. No source files (.cpp), template implementation files (.ipp/.tpp), or inline implementations were found. This could lead to linker errors.
🔗 Analysis chainAdd inline keyword for header-only methods The method declarations don't have implementations. If these are meant to be implemented in a source file, ensure the implementation file exists. If these are meant to be header-only, mark them as inline. Also applies to: 41-41, 46-46, 54-54, 63-63 🏁 Scripts executedThe following scripts were executed for the analysis: Script: #!/bin/bash
# Check if there's a corresponding source file
fd "Job\.(cpp|cc|cxx)$" src/spider/client/
# Check if these methods are implemented elsewhere
rg "Job<.*>::(wait_complete|cancel|get_status|get_result|get_error)"
Length of output: 115 Script: #!/bin/bash
# Let's check the Job.hpp file content to see if methods are implemented inline
cat src/spider/client/Job.hpp
# Also check for any potential template implementation files
fd "Job\..*" src/spider/client/
Length of output: 1664 Script: #!/bin/bash
# Let's check if there's a template implementation file with .ipp or .tpp extension
fd "Job\.(ipp|tpp)$" src/spider/
# Let's also check if implementations are in a separate template implementation file
rg -l "template.*Job<.*>::(wait_complete|cancel|get_status|get_result|get_error)"
# Check JobImpl implementation as it might contain the actual implementations
fd "JobImpl\.(hpp|cpp|cc|h)$" src/spider/
Length of output: 162 |
||
|
||
/** | ||
* Cancels the job and waits for the job's tasks to be cancelled. | ||
* | ||
* @throw spider::ConnectionException | ||
*/ | ||
auto cancel(); | ||
|
||
/** | ||
* @return Status of the job. | ||
* @throw spider::ConnectionException | ||
*/ | ||
auto get_status() -> JobStatus; | ||
|
||
/** | ||
* NOTE: It is undefined behavior to call this method for a job that is not in the `Succeeded` | ||
* state. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Correct the job state names in the documentation There are inconsistencies in the documentation comments regarding job state names:
Apply the following changes to correct the documentation: - * NOTE: It is undefined behaviour to call this method for a job that is not in the `Succeed` state.
+ * NOTE: It is undefined behaviour to call this method for a job that is not in the `Succeeded` state. - * NOTE: It is undefined behaviour to call this method for a job that is not in the `Fail` state.
+ * NOTE: It is undefined behaviour to call this method for a job that is not in the `Failed` state. Also applies to: 63-63 |
||
* | ||
* @return Result of the job. | ||
* @throw spider::ConnectionException | ||
*/ | ||
auto get_result() -> ReturnType; | ||
|
||
/** | ||
* NOTE: It is undefined behavior to call this method for a job that is not in the `Failed` | ||
* state. | ||
* | ||
* @return A pair: | ||
* - the name of the task function that failed. | ||
* - the error message sent from the task through `TaskContext::abort` or from Spider. | ||
* @throw spider::ConnectionException | ||
*/ | ||
auto get_error() -> std::pair<std::string, std::string>; | ||
}; | ||
} // namespace spider | ||
|
||
#endif // SPIDER_CLIENT_JOB_HPP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Standardize header file naming convention
The header file naming is inconsistent:
task.hpp
uses lowercaseTaskGraph.hpp
)Please maintain consistent PascalCase for class header files.
📝 Committable suggestion