This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

DataLab Features

viravit edited this page Nov 8, 2021 · 6 revisions

To be updated

DataLab Feature Mapping

| # | Features | AWS [Debian] | AWS [RedHat] | Azure [Debian] | Azure [RedHat] | GCP [Debian] |
|---|---|---|---|---|---|---|
| 1 | DataLab installation | | | | | |
| 1.1 | Support of installing DataLab into one VPC | yes | yes | yes | yes | yes |
| 1.2 | Support of installing DataLab into two VPCs | yes | yes | no | no | no |
| 1.3 | DataLab installation via public IP | yes | yes | yes | no | yes |
| 1.4 | DataLab installation via private IP | yes | yes | yes | yes | no |
| 2 | Login/Logout | | | | | |
| 2.1 | Login with LDAP | yes | yes | yes | yes | yes |
| 2.2 | Login with OAuth2 authentication and authorization | no | no | yes | yes | no |
| 2.3 | Logout | yes | yes | yes | yes | yes |
| 3 | Edge node management | | | | | |
| 3.1 | Create Edge node | yes | yes | yes | yes | yes |
| 3.2 | Stop Edge node | yes | yes | yes | yes | yes |
| 3.3 | Start Edge node | yes | yes | yes | yes | yes |
| 3.4 | Recreate Edge node | in progress | in progress | in progress | in progress | in progress |
| 4 | Supported notebook templates | | | | | |
| 4.1 | Jupyter notebook template | yes | yes | yes | yes | yes |
| 4.2 | RStudio notebook template | yes | yes | yes | yes | yes |
| 4.3 | Apache Zeppelin notebook template | yes | yes | yes | yes | yes |
| 4.4 | DeepLearning notebook template | yes | yes | yes | no | yes |
| 4.5 | RStudio with TensorFlow notebook template | yes | yes | no | no | no |
| 4.6 | Jupyter with TensorFlow notebook template | yes | yes | yes | no | yes |
| 4.7 | Superset notebook template | no | no | no | no | yes |
| 5 | Notebook instance management | | | | | |
| 5.1 | Stop notebook server instance | yes | yes | yes | yes | yes |
| 5.2 | Start notebook server instance | yes | yes | yes | yes | yes |
| 5.3 | Terminate notebook server instance | yes | yes | yes | yes | yes |
| 5.5 | Go to notebook UI (reverse proxy) | yes | yes | yes | yes | yes |
| 5.6 | Creating custom AMI from running notebook instance | yes | yes | yes | yes | yes |
| 5.7 | Creating notebook instance from custom image | yes | yes | yes | yes | yes |
| 5.8 | Creating notebook instance from shared image | yes | yes | yes | yes | yes |
| 5.9 | Tune local Spark parameters from Web UI on instance creation step | yes | yes | yes | yes | yes |
| 5.10 | Reconfigure local Spark on an already existing notebook server instance | yes | yes | yes | yes | yes |
| 6 | Notebook templates that support Spark Standalone as computational resource | | | | | |
| 6.1 | Jupyter notebook template | yes | yes | yes | yes | yes |
| 6.2 | RStudio notebook template | yes | yes | yes | yes | yes |
| 6.3 | Apache Zeppelin notebook template | yes | yes | yes | yes | yes |
| 6.4 | DeepLearning notebook template | yes | yes | yes | no | yes |
| 6.5 | RStudio with TensorFlow notebook template | yes | yes | no | no | no |
| 6.6 | Jupyter with TensorFlow notebook template | yes | yes | yes | no | yes |
| 6.7 | Superset notebook template | no | no | no | no | no |
| 7 | Data Engine (Spark Standalone) management | | | | | |
| 7.1 | Stop | yes | yes | yes | yes | yes |
| 7.2 | Start | yes | yes | yes | yes | yes |
| 7.3 | Terminate | yes | yes | yes | yes | yes |
| 7.4 | Ability to deploy Spark Standalone using Notebook's images | yes | yes | yes | yes | yes |
| 7.5 | Ability to tune Spark Standalone parameters from Web UI on instance creation step | yes | yes | yes | yes | yes |
| 7.6 | Ability to reconfigure an already existing Spark Standalone from DataLab Web UI | yes | yes | yes | yes | yes |
| 7.7 | Ability to access Spark Standalone job tracker URL from Web UI (via reverse proxy) | yes | yes | yes | yes | yes |
| 8 | Notebook templates that support Cloud provider Data Engine Service as computational resource | | | | | |
| 8.1 | Jupyter notebook template | yes, EMR | yes, EMR | no | no | yes, Dataproc |
| 8.2 | RStudio notebook template | yes, EMR | yes, EMR | no | no | yes, Dataproc |
| 8.3 | Zeppelin notebook template | yes, EMR | yes, EMR | no | no | yes, Dataproc |
| 8.4 | TensorFlow notebook template | no | no | no | no | no |
| 8.5 | DeepLearning notebook template | no | no | no | no | no |
| 8.6 | RStudio with TensorFlow notebook template | no | no | no | no | no |
| 8.7 | Jupyter with TensorFlow notebook template | no | no | no | no | no |
| 8.8 | Superset notebook template | no | no | no | no | no |
| 9 | Data Engine Service management | | | | | |
| 9.1 | Stop | no | no | no | no | no |
| 9.2 | Start | no | no | no | no | no |
| 9.3 | Terminate | yes | yes | yes | yes | yes |
| 9.4 | Ability to tune Engine Service parameters from Web UI on instance creation step | yes | yes | no | no | no |
| 9.5 | Ability to navigate to Data Engine Service job tracker URL from Web UI (via reverse proxy) | yes | yes | no | no | yes |
| 10 | Libraries management | | | | | |
| 10.1 | Ability to deploy libraries on notebook instance | yes | yes | yes | yes | yes |
| 10.2 | Ability to deploy libraries on Data Engine (Spark Standalone) | yes | yes | yes | yes | yes |
| 10.3 | Ability to deploy libraries on Data Engine Service | yes | yes | no | no | yes |
| 11 | Available library groups for installation from Web UI | | | | | |
| 11.1 | Apt/Yum | yes | yes | yes | yes | yes |
| 11.2 | Pip2 | yes | yes | yes | yes | yes |
| 11.3 | Pip3 | yes | yes | yes | yes | yes |
| 11.4 | R packages | yes | yes | yes | yes | yes |
| 11.5 | Java | yes | yes | yes | yes | yes |
| 12 | Instance management via scheduler | | | | | |
| 12.1 | Ability to stop/start notebook instance on scheduled basis | yes | yes | yes | yes | yes |
| 12.2 | Ability to stop/start Data Engine (Spark Standalone) on scheduled basis | yes | yes | yes | yes | yes |
| 12.3 | Ability to stop/start Data Engine Service on scheduled basis | yes | yes | yes | yes | yes |
| 12.4 | Ability to terminate Compute on scheduled basis | yes | yes | yes | yes | yes |
| 12.5 | Support of resources stopping on exceeding idle time via scheduler | yes | yes | yes | yes | yes |
| 12.6 | Reminder after login, notifying that corresponding resources are about to be stopped/terminated | yes | yes | yes | yes | yes |
| 13 | Admin user only functionality | | | | | |
| 13.1 | Ability to stop user's Edge node/Compute/notebook instance separately | yes | yes | yes | yes | yes |
| 13.2 | Ability to terminate user's Compute/notebook instance separately | yes | yes | yes | yes | yes |
| 13.3 | Ability to stop user's Edge node with related instances simultaneously | yes | yes | yes | yes | yes |
| 13.4 | Ability to terminate user's Edge node with related instances simultaneously | yes | yes | yes | yes | yes |
| 13.5 | Ability to connect/disconnect endpoint | yes | yes | yes | yes | yes |
| 13.6 | Ability to restrict available instance shapes based on user login (per user/group) | yes | yes | yes | yes | yes |
| 13.7 | Ability to adjust the total cost limit for a project and for DataLab as a whole | yes | yes | yes | yes | yes |
| 13.8 | Ability to see and export billing data for all users | yes | yes | yes | yes | yes |
| 13.9 | Ability to set permissions for cloud buckets if the user only accesses them via the bucket browser | yes | yes | yes | yes | yes |
| 14 | Notebook templates that support Git (cloning repository/merging/pulling/pushing), ungit | | | | | |
| 14.1 | Jupyter notebook template | yes | yes | yes | yes | yes |
| 14.2 | RStudio notebook template | yes | yes | yes | yes | yes |
| 14.3 | Apache Zeppelin notebook template | no | no | no | no | no |
| 14.4 | DeepLearning notebook template | yes | yes | yes | no | yes |
| 14.5 | RStudio with TensorFlow notebook template | yes | no | no | no | no |
| 14.6 | Jupyter with TensorFlow notebook template | yes | yes | yes | no | yes |
| 15 | Data Storage | | | | | |
| 15.1 | Ability to read/write from/to shared or personal bucket | yes, S3 | yes, S3 | yes, blob storage, data lake | yes, blob storage, data lake | yes, blob storage, data lake |
| 16 | Bucket browser | | | | | |
| 16.1 | Ability to upload file, create folder, delete folder/file, download file, copy path to folder/file | yes | yes | yes | yes | yes |
| 17 | Billing Report | | | | | |
| 17.1 | Ability to see and export billing data | yes | yes | yes | yes | no |
| 18 | Audit Report | | | | | |
| 18.1 | Ability to see all user actions in the DataLab UI | yes | yes | yes | yes | yes |