How to Install Apache Spark on Ubuntu 20.10/20.04/18.04 & Debian 10/9

In this tutorial, we’ll explain how to install Apache Spark on Ubuntu. Apache Spark is an open-source unified analytics engine for large-scale data processing.

Steps to Install Apache Spark on Ubuntu 20.10/20.04/18.04

Step 1: Update the Ubuntu system

It is recommended to refresh the package index before installing Apache Spark.

sudo apt update
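
Optionally, you can also apply any pending package upgrades before continuing:

sudo apt -y upgrade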

Step 2: Install Java on Ubuntu 20.10

Java is a prerequisite for Apache Spark. Here we’ll install the default JDK on Ubuntu, along with curl (used to download Spark) and mlocate (used to locate files later in this tutorial).

sudo apt install curl mlocate default-jdk -y
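
If you prefer to pin a specific Java release rather than the distribution default, OpenJDK 11 can be installed explicitly with the standard Ubuntu package:

sudo apt install openjdk-11-jdk -y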

Step 3: Verify Java version

$ java -version

Sample Output:

root@Apache-Spark:~# java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.10)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.10, mixed mode, sharing)
root@Apache-Spark:~#

Step 4: Download Apache Spark on Ubuntu 20.10

Check the Apache Spark downloads page for the latest release. This tutorial uses Spark 3.1.1 built for Hadoop 3.2; download it with curl:

curl -O https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
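
Optionally, verify the integrity of the download. Apache publishes a SHA-512 checksum alongside the tarball on archive.apache.org; fetch it and compare the hashes manually:

curl -O https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz.sha512

sha512sum spark-3.1.1-bin-hadoop3.2.tgz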

Extract the Spark tarball:

tar xvf spark-3.1.1-bin-hadoop3.2.tgz

Move the extracted directory to /opt/spark:

sudo mv spark-3.1.1-bin-hadoop3.2/ /opt/spark 

Configure the Spark environment by editing ~/.bashrc:

vim ~/.bashrc

Add the lines below:

export SPARK_HOME=/opt/spark

export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Apply the changes to the current shell session:

source ~/.bashrc
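
To confirm the environment is set correctly, you can check that SPARK_HOME resolves and that the Spark scripts are on the PATH:

echo $SPARK_HOME

which start-master.sh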

Step 5: Start a standalone master server

$ start-master.sh 

Sample Output:

root@Apache-Spark:~# start-master.sh 
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-Apache-Spark.out
root@Apache-Spark:~#
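
If the master does not appear to start, the log file mentioned in the output above is the first place to look, for example:

tail -n 20 /opt/spark/logs/spark-*-org.apache.spark.deploy.master.Master-*.out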

Step 6: Verify that the master web UI is listening on TCP port 8080

$ sudo ss -tunelp | grep 8080

Sample Output:

root@Apache-Spark:~# sudo ss -tunelp | grep 8080
tcp LISTEN 0 1 *:8080 *:* users:(("java",pid=11444,fd=308)) ino:946073 sk:17 v6only:0 <->

Access the Web UI

The Spark master web UI is available at http://localhost:8080/ or http://127.0.0.1:8080. Note the master URL displayed at the top of the page (spark://<hostname>:7077); the worker connects to it in the next step.


Step 7: Start a Spark worker

$ start-workers.sh spark://localhost:7077

Sample Output:

ubuntu@php:/opt$ start-workers.sh spark://localhost:7077
ubuntu@localhost's password: 
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-ubuntu-org.apache.spark.deploy.worker.Worker-1-Apache-Spark.out
ubuntu@php:/opt$
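
Note that start-workers.sh connects over SSH to every host listed in $SPARK_HOME/conf/workers (localhost by default), which is why it prompts for a password. To start a single worker on the local machine without SSH, you can use start-worker.sh with the master URL instead:

$ start-worker.sh spark://localhost:7077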

If the locate command does not find the worker scripts (named start-worker.sh in recent Spark releases, start-slave.sh in older ones), refresh the mlocate database first:

$ sudo updatedb

$ locate start-worker.sh

Once the worker has started, go back to the browser and refresh the Spark master UI; the new worker should appear under the Workers section.
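
To confirm the cluster can actually run a job, you can submit the SparkPi example that ships with the distribution (the exact examples jar name may differ; check /opt/spark/examples/jars/):

/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://localhost:7077 /opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar 10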


How to access the Spark shell

$ /opt/spark/bin/spark-shell
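
By default the shell typically runs with a local master (local[*]); to attach it to the standalone master started above, pass the master URL:

$ /opt/spark/bin/spark-shell --master spark://localhost:7077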


How to access the Python Spark shell (PySpark)

$ /opt/spark/bin/pyspark
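
As with spark-shell, PySpark can be pointed at the standalone master with the --master option:

$ /opt/spark/bin/pyspark --master spark://localhost:7077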


How to shut down the Spark master and worker processes (in older Spark releases the worker script is named stop-slave.sh)

$ $SPARK_HOME/sbin/stop-worker.sh

$ $SPARK_HOME/sbin/stop-master.sh
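
To stop both at once, the distribution also ships a combined script:

$ $SPARK_HOME/sbin/stop-all.sh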

That brings us to the end of the article; we’ve seen how to install Apache Spark on Ubuntu.
