This repository has been archived by the owner on May 12, 2021. It is now read-only.

Scheduler Configuration Reference

The Aurora scheduler accepts a variety of configuration options as command-line arguments. To list the available options, run aurora-scheduler -help.

Please refer to the Operator Configuration Guide for details on how to properly set the most important options.
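As a rough illustration of how such a command line might be assembled, the Python sketch below builds an argument list covering the options marked with `*` in the usage listing. It assumes the `*` marker denotes required options and that options are passed in `-name=value` form. Every value shown (paths, cluster name, ZooKeeper addresses, port) and the helper `build_args` are hypothetical placeholders, not recommended settings.

```python
import shlex

# Options marked with "*" in the usage listing. All values are hypothetical
# placeholders chosen for illustration only.
REQUIRED = {
    "backup_dir": "/var/lib/aurora/backups",
    "cluster_name": "example",
    "mesos_master_address": "zk://127.0.0.1:2181/mesos",
    "serverset_path": "/aurora/scheduler",
    "zk_endpoints": "127.0.0.1:2181",
}


def build_args(required, optional=None):
    """Return arguments in -name=value form, required options first."""
    merged = dict(required)
    merged.update(optional or {})
    return [f"-{name}={value}" for name, value in merged.items()]


# -http_port is optional; 8081 is an arbitrary example value.
args = build_args(REQUIRED, {"http_port": 8081})
print(shlex.join(["aurora-scheduler", *args]))
```

Keeping the options in a dictionary makes it easy to check that none of the starred options is missing before the scheduler is launched.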

Usage: org.apache.aurora.scheduler.app.SchedulerMain [options]
  Options:
    -allow_container_volumes
      Allow passing in volumes in the job. Enabling this could pose a
      privilege escalation threat.
      Default: false
    -allow_docker_parameters
      Allow passing Docker container parameters in the job.
      Default: false
    -allow_gpu_resource
      Allow jobs to request Mesos GPU resource.
      Default: false
    -allowed_container_types
      Container types that are allowed to be used by jobs.
      Default: [MESOS]
    -allowed_job_environments
      Regular expression describing the environments that are allowed to be
      used by jobs.
      Default: ^(prod|devel|test|staging\d*)$
    -async_slot_stat_update_interval
      Interval on which to try to update open slot stats.
      Default: (1, mins)
    -async_task_stat_update_interval
      Interval on which to try to update resource consumption stats.
      Default: (1, hrs)
    -async_worker_threads
      The number of worker threads to process async task operations with.
      Default: 8
  * -backup_dir
      Directory to store backups under. Will be created if it does not exist.
    -backup_interval
      Minimum interval on which to write a storage backup.
      Default: (1, hrs)
  * -cluster_name
      Name to identify the cluster being served.
    -cron_scheduler_num_threads
      Number of threads to use for the cron scheduler thread pool.
      Default: 10
    -cron_scheduling_max_batch_size
      The maximum number of triggered cron jobs that can be processed in a
      batch.
      Default: 10
    -cron_start_initial_backoff
      Initial backoff delay while waiting for a previous cron run to be
      killed.
      Default: (5, secs)
    -cron_start_max_backoff
      Max backoff delay while waiting for a previous cron run to be killed.
      Default: (1, mins)
    -cron_timezone
      TimeZone to use for cron predictions.
      Default: GMT
    -custom_executor_config
      Path to custom executor settings configuration file.
    -default_docker_parameters
      Default docker parameters for any job that does not explicitly declare
      parameters.
      Default: []
    -dlog_max_entry_size
      Specifies the maximum entry size to append to the log. Larger entries
      will be split across entry Frames.
      Default: (512, KB)
    -dlog_snapshot_interval
      Specifies the frequency at which snapshots of local storage are taken
      and written to the log.
      Default: (1, hrs)
    -enable_cors_for
      List of domains for which CORS support should be enabled.
    -enable_mesos_fetcher
      Allow jobs to pass URIs to the Mesos Fetcher. Note that enabling this
      feature could pose a privilege escalation threat.
      Default: false
    -enable_preemptor
      Enable the preemptor and preemption
      Default: true
    -enable_revocable_cpus
      Treat CPUs as a revocable resource.
      Default: true
    -enable_revocable_ram
      Treat RAM as a revocable resource.
      Default: false
    -enable_update_affinity
      Enable best-effort affinity of task updates.
      Default: false
    -executor_user
      User to start the executor. Defaults to "root". Set this to an
      unprivileged user if the mesos master was started with
      "--no-root_submissions". If set to anything other than "root", the
      executor will ignore the "role" setting for jobs since it can't use
      setuid() anymore. This means that all your jobs will run under the
      specified user and the user has to exist on the Mesos agents.
      Default: root
    -first_schedule_delay
      Initial amount of time to wait before first attempting to schedule a
      PENDING task.
      Default: (1, ms)
    -flapping_task_threshold
      A task that repeatedly runs for less than this time is considered to be
      flapping.
      Default: (5, mins)
    -framework_announce_principal
      When 'framework_authentication_file' flag is set, the FrameworkInfo
      registered with the mesos master will also contain the principal. This
      is necessary if you intend to use mesos authorization via mesos ACLs.
      The default will change in a future release. Changing this value is
      backwards incompatible. For details, see MESOS-703.
      Default: false
    -framework_authentication_file
      Properties file which contains framework credentials to authenticate
      with the Mesos master. Must contain the properties
      'aurora_authentication_principal' and 'aurora_authentication_secret'.
    -framework_failover_timeout
      Time after which a framework is considered deleted.  SHOULD BE VERY
      HIGH.
      Default: (21, days)
    -framework_name
      Name used to register the Aurora framework with Mesos.
      Default: Aurora
    -global_container_mounts
      A comma separated list of mount points (in host:container form) to mount
      into all (non-mesos) containers.
      Default: []
    -history_max_per_job_threshold
      Maximum number of terminated tasks to retain in a job history.
      Default: 100
    -history_min_retention_threshold
      Minimum guaranteed time for task history retention before any pruning is
      attempted.
      Default: (1, hrs)
    -history_prune_threshold
      Time after which the scheduler will prune terminated task history.
      Default: (2, days)
    -hold_offers_forever
      Hold resource offers indefinitely, disabling automatic offer decline
      settings.
      Default: false
    -host_maintenance_polling_interval
      Interval between polling for pending host maintenance requests.
      Default: (1, mins)
    -hostname
      The hostname to advertise in ZooKeeper instead of the locally-resolved
      hostname.
    -http_authentication_mechanism
      HTTP Authentication mechanism to use.
      Default: NONE
      Possible Values: [NONE, BASIC, NEGOTIATE]
    -http_port
      The port to start an HTTP server on.  Default value will choose a random
      port.
      Default: 0
    -initial_flapping_task_delay
      Initial amount of time to wait before attempting to schedule a flapping
      task.
      Default: (30, secs)
    -initial_schedule_penalty
      Initial amount of time to wait before attempting to schedule a task that
      has failed to schedule.
      Default: (1, secs)
    -initial_task_kill_retry_interval
      When killing a task, retry after this delay if mesos has not responded,
      backing off up to transient_task_state_timeout
      Default: (15, secs)
    -ip
      The IP address to listen on. If not set, the scheduler will listen on all
      interfaces.
    -job_update_history_per_job_threshold
      Maximum number of completed job updates to retain in a job update
      history.
      Default: 10
    -job_update_history_pruning_interval
      Job update history pruning interval.
      Default: (15, mins)
    -job_update_history_pruning_threshold
      Time after which the scheduler will prune completed job update history.
      Default: (30, days)
    -kerberos_debug
      Produce additional Kerberos debugging output.
      Default: false
    -kerberos_server_keytab
      Path to the server keytab.
    -kerberos_server_principal
      Kerberos server principal to use, usually of the form
      HTTP/[email protected]
    -max_flapping_task_delay
      Maximum delay between attempts to schedule a flapping task.
      Default: (5, mins)
    -max_leading_duration
      After leading for this duration, the scheduler should commit suicide.
      Default: (1, days)
    -max_parallel_coordinated_maintenance
      Maximum number of coordinators that can be contacted in parallel.
      Default: 10
    -max_registration_delay
      Max allowable delay to allow the driver to register before aborting
      Default: (1, mins)
    -max_reschedule_task_delay_on_startup
      Upper bound of random delay for pending task rescheduling on scheduler
      startup.
      Default: (30, secs)
    -max_saved_backups
      Maximum number of backups to retain before deleting the oldest backups.
      Default: 48
    -max_schedule_attempts_per_sec
      Maximum number of scheduling attempts to make per second.
      Default: 40.0
    -max_schedule_penalty
      Maximum delay between attempts to schedule a PENDING task.
      Default: (1, mins)
    -max_sla_duration_secs
      Maximum duration window for which SLA requirements are to be
      satisfied. This does not apply to jobs that have a CoordinatorSlaPolicy.
      Default: (2, hrs)
    -max_status_update_batch_size
      The maximum number of status updates that can be processed in a batch.
      Default: 1000
    -max_task_event_batch_size
      The maximum number of task state change events that can be processed in
      a batch.
      Default: 300
    -max_tasks_per_job
      Maximum number of allowed tasks in a single job.
      Default: 4000
    -max_tasks_per_schedule_attempt
      The maximum number of tasks to pick in a single scheduling attempt.
      Default: 5
    -max_update_instance_failures
      Upper limit on the number of failures allowed during a job update. This
      helps cap potentially unbounded entries into storage.
      Default: 20000
    -mesos_driver
      Which Mesos Driver to use
      Default: SCHEDULER_DRIVER
      Possible Values: [SCHEDULER_DRIVER, V0_DRIVER, V1_DRIVER]
  * -mesos_master_address
      Address for the mesos master, can be a socket address or zookeeper path.
    -mesos_role
      The Mesos role this framework will register as. The default is to leave
      this empty, and the framework will register without any role and only
      receive unreserved resources in offer.
    -min_offer_hold_time
      Minimum amount of time to hold a resource offer before declining.
      Default: (5, mins)
    -min_required_instances_for_sla_check
      Minimum number of instances required for a job to be eligible for SLA
      check. This does not apply to jobs that have a CoordinatorSlaPolicy.
      Default: 20
    -native_log_election_retries
      The maximum number of attempts to obtain a new log writer.
      Default: 20
    -native_log_election_timeout
      The timeout for a single attempt to obtain a new log writer.
      Default: (15, secs)
    -native_log_file_path
      Path to a file to store the native log data in.  If the parent directory
      does not exist it will be created.
    -native_log_quorum_size
      The size of the quorum required for all log mutations.
      Default: 1
    -native_log_read_timeout
      The timeout for doing log reads.
      Default: (5, secs)
    -native_log_write_timeout
      The timeout for doing log appends and truncations.
      Default: (3, secs)
    -native_log_zk_group_path
      A zookeeper node for use by the native log to track the master
      coordinator.
    -offer_filter_duration
      Duration after which we expect Mesos to re-offer unused resources. A
      short duration improves scheduling performance in smaller clusters, but
      might lead to resource starvation for other frameworks if you run many
      frameworks in your cluster.
      Default: (5, secs)
    -offer_hold_jitter_window
      Maximum amount of random jitter to add to the offer hold time window.
      Default: (1, mins)
    -offer_order
      Iteration order for offers, to influence task scheduling. Multiple
      orderings will be compounded together. E.g. CPU,MEMORY,RANDOM would sort
      first by cpus offered, then memory and finally would randomize any equal
      offers.
      Default: [RANDOM]
    -offer_reservation_duration
      Time to reserve an agent's offers while trying to satisfy a task
      preempting another.
      Default: (3, mins)
    -offer_set_module
      Custom Guice module to provide a custom OfferSet.
      Default: class org.apache.aurora.scheduler.offers.OfferManagerModule$OfferSetModule
    -offer_static_ban_cache_max_size
      The number of offers to hold in the static ban cache. If no value is
      specified, the cache will grow indefinitely. However, entries will
      expire within 'min_offer_hold_time' + 'offer_hold_jitter_window' of
      being written.
      Default: 9223372036854775807
    -partition_aware
      Enable partition-aware status updates.
      Default: false
    -populate_discovery_info
      If true, Aurora populates DiscoveryInfo field of Mesos TaskInfo.
      Default: false
    -preemption_delay
      Time interval after which a pending task becomes eligible to preempt
      other tasks
      Default: (3, mins)
    -preemption_reservation_max_batch_size
      The maximum number of reservations for a task group to be made in a
      batch.
      Default: 5
    -preemption_slot_finder_modules
      Guice modules for custom preemption slot searching for pending tasks.
      Default: [class org.apache.aurora.scheduler.preemptor.PendingTaskProcessorModule, class org.apache.aurora.scheduler.preemptor.PreemptionVictimFilterModule]
    -preemption_slot_hold_time
      Time to hold a preemption slot found before it is discarded.
      Default: (5, mins)
    -preemption_slot_search_initial_delay
      Initial amount of time to delay preemption slot searching after
      scheduler start up.
      Default: (3, mins)
    -preemption_slot_search_interval
      Time interval between pending task preemption slot searches.
      Default: (1, mins)
    -receive_revocable_resources
      Allows receiving revocable resource offers from Mesos.
      Default: false
    -reconciliation_explicit_batch_interval
      Interval between explicit batch reconciliation requests.
      Default: (5, secs)
    -reconciliation_explicit_batch_size
      Number of tasks in a single batch request sent to Mesos for explicit
      reconciliation.
      Default: 1000
    -reconciliation_explicit_interval
      Interval on which scheduler will ask Mesos for status updates of
      all non-terminal tasks known to scheduler.
      Default: (60, mins)
    -reconciliation_implicit_interval
      Interval on which scheduler will ask Mesos for status updates of
      all non-terminal tasks known to Mesos.
      Default: (60, mins)
    -reconciliation_initial_delay
      Initial amount of time to delay task reconciliation after scheduler
      start up.
      Default: (1, mins)
    -reconciliation_schedule_spread
      Difference between explicit and implicit reconciliation intervals
      intended to create a non-overlapping task reconciliation schedule.
      Default: (30, mins)
    -require_docker_use_executor
      If false, Docker tasks may run without an executor (EXPERIMENTAL)
      Default: true
    -scheduling_max_batch_size
      The maximum number of scheduling attempts that can be processed in a
      batch.
      Default: 3
    -serverset_endpoint_name
      Name of the scheduler endpoint published in ZooKeeper.
      Default: http
  * -serverset_path
      ZooKeeper ServerSet path to register at.
    -shiro_after_auth_filter
      Fully qualified class name of the servlet filter to be applied after the
      shiro auth filters are applied.
    -shiro_credentials_matcher
      The shiro credentials matcher to use (will be constructed by Guice).
      Default: class org.apache.shiro.authc.credential.SimpleCredentialsMatcher
    -shiro_ini_path
      Path to shiro.ini for authentication and authorization configuration.
    -shiro_realm_modules
      Guice modules for configuring Shiro Realms.
      Default: [class org.apache.aurora.scheduler.http.api.security.IniShiroRealmModule]
    -sla_aware_action_max_batch_size
      The maximum number of sla aware update actions that can be processed in
      a batch.
      Default: 300
    -sla_aware_kill_non_prod
      Enables SLA awareness for drain and update for non-production tasks
      Default: false
    -sla_aware_kill_retry_max_delay
      Maximum amount of time to wait between attempting to perform an
      SLA-Aware kill on a task.
      Default: (5, mins)
    -sla_aware_kill_retry_min_delay
      Minimum amount of time to wait between attempting to perform an
      SLA-Aware kill on a task.
      Default: (1, mins)
    -sla_coordinator_timeout
      Timeout interval for communicating with Coordinator.
      Default: (1, mins)
    -sla_non_prod_metrics
      Metric categories collected for non production tasks.
      Default: []
    -sla_prod_metrics
      Metric categories collected for production tasks.
      Default: [JOB_UPTIMES, PLATFORM_UPTIME, MEDIANS]
    -sla_stat_refresh_interval
      The SLA stat refresh interval.
      Default: (1, mins)
    -stat_retention_period
      Time for a stat to be retained in memory before expiring.
      Default: (1, hrs)
    -stat_sampling_interval
      Statistic value sampling interval.
      Default: (1, secs)
    -task_assigner_modules
      Guice modules for customizing task assignment.
      Default: [class org.apache.aurora.scheduler.scheduling.TaskAssignerImplModule]
    -thermos_executor_cpu
      The number of CPU cores to allocate for each instance of the executor.
      Default: 0.25
    -thermos_executor_flags
      Extra arguments to be passed to the thermos executor
    -thermos_executor_path
      Path to the thermos executor entry point.
    -thermos_executor_ram
      The amount of RAM to allocate for each instance of the executor.
      Default: (128, MB)
    -thermos_executor_resources
      A comma separated list of additional resources to copy into the
      sandbox. Note: if thermos_executor_path is not the thermos_executor.pex
      file itself, this must include it.
      Default: []
    -thermos_home_in_sandbox
      If true, changes HOME to the sandbox before running the executor. This
      primarily has the effect of causing the executor and runner to extract
      themselves into the sandbox.
      Default: false
    -thrift_method_interceptor_modules
      Custom Guice module(s) to provide additional Thrift method interceptors.
      Default: []
    -tier_config
      Configuration file defining supported task tiers, task traits and
      behaviors.
    -transient_task_state_timeout
      The amount of time after which to treat a task stuck in a transient
      state as LOST.
      Default: (5, mins)
    -unavailability_threshold
      Threshold time, when running tasks should be drained from a host, before
      a host becomes unavailable. Should be greater than min_offer_hold_time +
      offer_hold_jitter_window.
      Default: (6, mins)
    -update_affinity_reservation_hold_time
      How long to wait for a reserved agent to reoffer freed up resources.
      Default: (3, mins)
    -viz_job_url_prefix
      URL prefix for job container stats.
      Default: <empty string>
    -webhook_config
      Path to webhook configuration file.
    -zk_chroot_path
      chroot path to use for the ZooKeeper connections
    -zk_connection_timeout
      The ZooKeeper connection timeout.
      Default: (10, secs)
    -zk_digest_credentials
      user:password to use when authenticating with ZooKeeper.
  * -zk_endpoints
      Endpoint specification for the ZooKeeper servers.
    -zk_in_proc
      Launches an embedded zookeeper server for local testing causing
      -zk_endpoints to be ignored if specified.
      Default: false
    -zk_session_timeout
      The ZooKeeper session timeout.
      Default: (15, secs)
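
To make the default of -allowed_job_environments more concrete, the Python sketch below checks a few environment names against the pattern copied from the listing above. It is only an approximation: the scheduler evaluates the expression on the JVM, and Python's `re` module is assumed to behave the same way for this simple pattern. The helper `is_allowed` and the sample names are hypothetical.

```python
import re

# Default value of -allowed_job_environments, copied from the usage listing.
ALLOWED_ENVIRONMENTS = re.compile(r"^(prod|devel|test|staging\d*)$")


def is_allowed(environment):
    """Return True if the whole environment name matches the default pattern."""
    return ALLOWED_ENVIRONMENTS.fullmatch(environment) is not None


# Sample names are illustrative only.
for name in ["prod", "devel", "staging", "staging12", "production", "qa"]:
    print(f"{name}: {is_allowed(name)}")
```

Under this pattern, `staging` may carry an optional numeric suffix, while names such as `production` or `qa` would be rejected unless the option is set to a different expression.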