Spark Modes of Operation and Deployment

Apache Spark mode of operation or deployment refers to how Spark runs. Spark can run either in local mode or cluster mode: local mode is used to test your application, while cluster mode is used for production deployment. In this article, we will look at the Spark modes of operation and deployment.

Spark Mode of Operation

Apache Spark runs in local mode by default. Usually, local mode is used for developing applications and unit testing. Spark can also be configured to run in cluster mode using a cluster manager such as YARN.

Currently, Spark supports three cluster managers:

  • Spark Standalone – Available as part of Spark Installation
  • Spark on YARN (Hadoop)
  • Spark on Mesos

You can specify the Spark mode of operation or deployment while submitting the Spark application. Usually, Spark applications are submitted using the spark-submit script, where the --master option selects the mode, as sketched below.
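
As a quick orientation, here is a sketch of how the --master option maps to each mode. The host names, ports, and the my_app.py file are placeholders for illustration, not values from this article.

# Local mode: driver and executors run in a single JVM
./bin/spark-submit --master local[4] my_app.py    # 4 worker threads
./bin/spark-submit --master local[*] my_app.py    # one thread per CPU core

# Standalone mode: point at the standalone master (placeholder host)
./bin/spark-submit --master spark://master-host:7077 my_app.py

# YARN mode: the cluster is located via the Hadoop configuration files
./bin/spark-submit --master yarn my_app.py

# Mesos mode: point at the Mesos master (5050 is the Mesos default port)
./bin/spark-submit --master mesos://mesos-host:5050 my_app.py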

Spark Standalone

Spark Standalone refers to Spark's built-in scheduler; it does not require any external scheduler. Spark Standalone is available as part of the standard Spark installation and includes everything you need to start executing your Spark applications.

Spark Standalone Syntax

Below is the spark-submit syntax that you can use to run a Spark application locally.

# Run application locally on 8 cores
# (options must come before the application file; anything after it
#  is passed to the application as arguments)
./bin/spark-submit \
  --master local[8] \
  /script/pyspark_test.py \
  100

Do not get confused by the term “standalone”. You can have a single machine or a multi-node, fully distributed cluster, and both can run in Spark Standalone mode; the term simply means that Spark does not need an external scheduler.
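
As an illustration, here is a minimal sketch of bringing up a standalone cluster and submitting to it; master-host and the script path are placeholders. (The start-worker.sh script is named start-slave.sh in Spark releases before 3.0.)

# Start the standalone master, then a worker that registers with it
./sbin/start-master.sh
./sbin/start-worker.sh spark://master-host:7077

# Submit the application to the standalone master (placeholder host/path)
./bin/spark-submit \
  --master spark://master-host:7077 \
  /script/pyspark_test.py \
  100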

Spark on YARN

You can submit Spark applications to a Hadoop YARN cluster using the yarn master URL. In this mode, Spark relies on YARN's resource scheduler to allocate resources and run the application.

By default, the deployment mode is client.

Spark on YARN Syntax

Below is the spark-submit syntax that you can use to run a Spark application on the YARN scheduler.

# Run Spark application on the YARN scheduler
spark-submit --master yarn mySparkApp.jar
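
Note that with a yarn master URL, spark-submit locates the cluster through the Hadoop client configuration, so HADOOP_CONF_DIR (or YARN_CONF_DIR) must point at your Hadoop configuration directory; the path below is a placeholder.

# Point Spark at the Hadoop/YARN client configuration (placeholder path)
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Override the default client mode and run the driver inside the cluster
spark-submit --master yarn --deploy-mode cluster mySparkApp.jar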

Spark on Mesos

You can submit Spark applications to a Mesos-managed cluster using a mesos:// master URL. In both the YARN and Mesos cases, you need a working YARN or Mesos cluster before installing and configuring Spark.

Spark on Mesos Syntax

Below is the spark-submit syntax that you can use to run a Spark application on Mesos.

# Run on a Mesos cluster in cluster deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000

Spark Deployment Modes

During Spark execution, you have to specify where the driver program resides: either on the machine that submits the application (client) or inside the cluster itself (cluster).

There are two types of Spark deployment modes:

  • Spark Client Mode
  • Spark Cluster Mode

For example, to deploy a Spark application in cluster mode on YARN, you can use the syntax below:

# Run on a YARN cluster in cluster deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  http://path/to/examples.jar \
  1000
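
For contrast, the same application in the default client deploy mode keeps the driver on the submitting machine; this sketch reuses the placeholder jar path from above.

# Run on a YARN cluster in client deploy mode (the default)
# The driver runs on the machine where spark-submit is invoked
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  http://path/to/examples.jar \
  1000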

Hope this helps 🙂