diff --git a/README.md b/README.md index 46198be90..32b5b0d1f 100644 --- a/README.md +++ b/README.md @@ -305,8 +305,8 @@ python tools/process_data.py --config ./demos/process_video_on_ray/configs/demo. - To run data processing across multiple machines, it is necessary to ensure that all distributed nodes can access the corresponding data paths (for example, by mounting the respective data paths on a file-sharing system such as NAS). - The deduplicator operators for RAY mode are different from the single-machine version, and all those operators are prefixed with `ray`, e.g. `ray_video_deduplicator` and `ray_document_deduplicator`. Those operators also rely on a [Redis](https://redis.io/) instance. So in addition to starting the RAY cluster, you also need to setup your Redis instance in advance and provide `host` and `port` of your Redis instance in configuration. -> Users can also opt not to use RAY and instead split the dataset to run on a cluster with [Slurm](https://slurm.schedmd.com/) / [Aliyun PAI-DLC](https://www.aliyun.com/activity/bigdata/pai-dlc). In this case, please use the default Data-Juicer without RAY. - +> Users can also opt not to use RAY and instead split the dataset to run on a cluster with [Slurm](https://slurm.schedmd.com/). In this case, please use the default Data-Juicer without RAY. +> [Aliyun PAI-DLC](https://www.aliyun.com/activity/bigdata/pai-dlc) supports the RAY framework, Slurm framework, etc. Users can directly create RAY jobs and Slurm jobs on the DLC cluster. ### Data Analysis - Run `analyze_data.py` tool or `dj-analyze` command line tool with your config as the argument to analyze your dataset. diff --git a/README_ZH.md b/README_ZH.md index 7ee7acc70..03e349547 100644 --- a/README_ZH.md +++ b/README_ZH.md @@ -282,7 +282,8 @@ python tools/process_data.py --config ./demos/process_video_on_ray/configs/demo. - 如果需要在多机上使用RAY执行数据处理,需要确保所有节点都可以访问对应的数据路径,即将对应的数据路径挂载在共享文件系统(如NAS)中。 - RAY 模式下的去重算子与单机版本不同,所有 RAY 模式下的去重算子名称都以 `ray` 作为前缀,例如 `ray_video_deduplicator` 和 `ray_document_deduplicator`。这些去重算子依赖于 [Redis](https://redis.io/) 实例.因此使用前除启动 RAY 集群外还需要启动 Redis 实例,并在对应的配置文件中填写 Redis 实例的 `host` 和 `port`。 -> 用户也可以不使用 RAY,拆分数据集后使用 [Slurm](https://slurm.schedmd.com/) / [阿里云 PAI-DLC](https://www.aliyun.com/activity/bigdata/pai-dlc) 在集群上运行,此时使用不包含 RAY 的原版 Data-Juicer 即可。 +> 用户也可以不使用 RAY,拆分数据集后使用 [Slurm](https://slurm.schedmd.com/) 在集群上运行,此时使用不包含 RAY 的原版 Data-Juicer 即可。 +> [阿里云 PAI-DLC](https://www.aliyun.com/activity/bigdata/pai-dlc) 支持 RAY 框架、Slurm 框架等,用户可以直接在DLC集群上创建 RAY 作业 和 Slurm 作业。 ### 数据分析