Apache Spark mode of operation, or deployment, refers to how Spark will run. Spark can run either in local mode or cluster mode: local mode is typically used to develop and test your application, and cluster mode for production deployment. In this article, we will check the Spark modes of operation and deployment.
Spark Mode of Operation
Apache Spark runs in local mode by default. Usually, local mode is used for developing applications and unit testing. Spark can also be configured to run in cluster mode using a cluster manager such as YARN.
Currently, Spark supports three cluster managers, each selected through a different master URL (see below):
- Spark Standalone – available as part of the Spark installation
- Spark on YARN (Hadoop)
- Spark on Mesos
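Each cluster manager is selected through the master URL that you pass to spark-submit, as shown in the sections below. The common URL formats are:

local[N]                  # run locally with N worker threads (local mode, no cluster)
spark://host:7077         # connect to a Spark Standalone master (default port 7077)
yarn                      # connect to a YARN cluster (location read from Hadoop config)
mesos://host:5050         # connect to a Mesos master (default port 5050)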
You can specify the Spark mode of operation or deployment while submitting the Spark application. Usually, Spark applications are submitted using the spark-submit script.
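For reference, the general spark-submit syntax follows the pattern below; the placeholders in angle brackets are filled in per application:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  <application-jar-or-python-file> \
  [application-arguments]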
Spark Standalone
Spark Standalone refers to the built-in scheduler; it does not require any external scheduler. Spark Standalone is shipped as part of the standard Spark installation and includes everything you need to start executing your Spark applications.
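As a minimal sketch of what ships with the installation, you can bring up a standalone cluster with the scripts in Spark's sbin directory (replace <master-host> with your master's host name; on Spark versions before 3.0 the worker script is named start-slave.sh):

# Start a standalone master on the current machine
./sbin/start-master.sh

# Start a worker on each node, pointing it at the master URL
./sbin/start-worker.sh spark://<master-host>:7077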
Spark Standalone Syntax
Below is the spark-submit syntax that you can use to run a Spark application locally on 8 cores. Note that spark-submit options such as --master must come before the application file; everything after the application file is passed to the application as arguments.
# Run the application locally on 8 cores
./bin/spark-submit \
  --master local[8] \
  /script/pyspark_test.py \
  100
Do not get confused by the term “standalone”. You can have a single machine or a multi-node, fully distributed cluster running in Spark Standalone mode; the term “standalone” simply means that Spark does not need an external scheduler.
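To submit to such a standalone cluster instead of running locally, point --master at the standalone master URL. A minimal sketch, reusing the script from the example above and assuming the master listens on the default port 7077 (replace <master-host> with your master's host name):

# Run on a Spark Standalone cluster
./bin/spark-submit \
  --master spark://<master-host>:7077 \
  /script/pyspark_test.py \
  100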
Spark on YARN
You can submit Spark applications to a Hadoop YARN cluster using the yarn master URL. Spark on YARN uses YARN as the resource scheduler to run Spark applications. By default, the deployment mode will be client.
Spark on YARN Syntax
Below is the spark-submit syntax that you can use to run a Spark application on the YARN scheduler. Note that there is no host in the yarn master URL: spark-submit locates the cluster through the Hadoop client configuration (HADOOP_CONF_DIR or YARN_CONF_DIR).
# Run a Spark application on the YARN scheduler
spark-submit --master yarn mySparkApp.jar
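Because client is the default deploy mode on YARN, the command above is equivalent to spelling the mode out, as in this sketch (mySparkApp.jar is the illustrative jar name from the example above):

# Equivalent to the above: client is the default deploy mode on YARN
spark-submit --master yarn --deploy-mode client mySparkApp.jar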
Spark on Mesos
You can submit Spark applications to a Mesos cluster using a Mesos master URL. In both the YARN and Mesos cases, you need to establish a working YARN or Mesos cluster before installing and configuring Spark.
Spark on Mesos Syntax
Below is the spark-submit syntax that you can use to run a Spark application on Mesos.
# Run on a Mesos cluster in cluster deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000
Spark Deployment Modes
During Spark execution, you have to specify where the driver program resides: it can run on the machine from which you submit the application (client mode) or on one of the nodes inside the cluster (cluster mode).
There are two types of Spark deployment modes:
- Spark Client Mode
- Spark Cluster Mode
For example, to deploy a Spark application in cluster mode, pass --deploy-mode cluster to spark-submit; the Mesos example above does exactly this.
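The same flag works on YARN. A minimal sketch, where the main class, jar name, and resource settings are illustrative assumptions rather than values from a real application:

# Run a Spark application on YARN in cluster deploy mode
# (class name, jar, and resource settings are illustrative)
spark-submit \
  --class com.example.MySparkApp \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 4G \
  --num-executors 10 \
  mySparkApp.jar \
  1000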
Hope this helps 🙂