![Install Apache Spark on macOS using Homebrew](https://www.justinjbird.me/images/apps/spark.webp)
Install Apache Spark on macOS using Homebrew
Introduction
Installing Apache Spark on macOS is a simple process using Homebrew. Homebrew is a package manager for macOS that (in their own words) “installs stuff you need”. This guide will walk you through the steps to install Apache Spark on macOS.
Prerequisites
You will need to have Homebrew installed on your Mac. If you don’t have it installed, you can install it by running the following command in terminal:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
You can alternatively download a .pkg file from the Homebrew website.
Install Java
Checking the Apache Spark Homebrew formula shows a dependency on openjdk@17
, an open-source development kit for the Java programming language. You can install Java using Homebrew by running the following command in the terminal:
arch -arm64 brew install openjdk@17 # on Apple silicon; on Intel Macs run: brew install openjdk@17
Install Apache Spark
Once complete, you can run the following command to install Apache Spark:
arch -arm64 brew install apache-spark # on Apple silicon; on Intel Macs run: brew install apache-spark
Run the Spark shell
Once installed, you can run spark-shell
from the command line to test out the install. Running the command should return something similar to this:
24/03/06 00:05:15 WARN Utils: Your hostname, {computer-name} resolves to a loopback address: 127.0.0.1; using 192.168.68.115 instead (on interface en0)
24/03/06 00:05:15 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/03/06 00:05:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://192.168.68.115:4040
Spark context available as 'sc' (master = local[*], app id = local-1709683518635).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.5.1
/_/
Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 17.0.10)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
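Before moving on, it is worth confirming the shell is healthy. A quick sanity check (a minimal sketch; `spark` and `sc` are the session objects spark-shell pre-creates, as noted in the banner above):

```scala
// 'spark' (SparkSession) and 'sc' (SparkContext) are pre-created by spark-shell
spark.version            // evaluates to the version string shown in the banner
spark.range(5).count()   // runs a tiny distributed job; returns 5
```

If `spark.range(5).count()` returns `5`, the local cluster is up and executing jobs.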
Access the web UI
Note the line in the output: Spark context Web UI available at http://192.168.68.115:4040
. This is the address you can use to access the Spark web UI. Your IP address will likely differ, so make sure to use the one shown in your own output.
![A screenshot of the spark web portal, there is a menu at the top that displays jobs, stages, storage, environment and executors, the jobs menu is selected and a timeline is displayed showing executors added / removed and jobs succeeded / failed / running. Nothing has run yet.](https://www.justinjbird.me/posts/2024/2024-06-05-001.webp)
Now to create a dataframe and read in a csv file…
Create a dataframe
I have placed a CSV file called data.csv in the root of my home directory. I am going to load it in and then display the contents of the dataframe using these commands:
// read the data
val df = spark.read.format("csv").option("header","true").option("inferSchema", "true").load("./data.csv")
// show the contents of the dataframe
df.show()
This should return the contents of the csv file:
+-------+----+------+
| person|code|colour|
+-------+----+------+
| luke|jedi| green|
| yoda|jedi| green|
| anakin|jedi| red|
|obi-wan|jedi| blue|
| vader|sith| red|
|sidious|sith| red|
| maul|sith| red|
+-------+----+------+
Note that the path I have used is relative to the directory where I started spark-shell
, so you may need to adjust the path to the CSV file accordingly.
You can exit the Spark shell by pressing Ctrl + D
.
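With the data loaded, the same session can run further transformations. For example (a sketch assuming the `df` and the column names from the sample data above):

```scala
// rows where the code column is "sith"
df.filter($"code" === "sith").show()

// number of characters per lightsaber colour
df.groupBy("colour").count().show()
```

The `$"col"` column syntax works in spark-shell because the session's implicits are imported automatically; in a standalone application you would add `import spark.implicits._` yourself.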
Conclusion
And that is it! In just a few minutes I have spun up Apache Spark and loaded data from a CSV file directly on my machine.
![the jedi order logo](/images/star-wars/jedi.webp)
#mtfbwy