Spark SQL Array Functions – Syntax and Examples

  • Post author:
  • Post last modified:June 14, 2021
  • Post category:Apache Spark
  • Reading time:8 mins read

Similar to relational databases such as Snowflake, Teradata, Spark SQL support many useful array functions. You can use these array manipulation functions to manipulate the array types. In this article, we will check how to work with Spark SQL Array Functions its Syntax and Examples.

Spark SQL Array Functions

Following is the list of Spark SQL array functions with brief descriptions:

Spark SQL Array FunctionDescription
array(expr, …) Returns an array with the given elements.
array_contains(array, value)Returns true if the array contains the value.
array_distinct(array)This function removes duplicate values from the array
array_except(array1, array2)Returns an array of the elements in array1 but not in array2, without duplicates.
array_intersect(array1, array2)Returns an array of the elements in the intersection of array1 and array2, without duplicates.
array_join(array, delimiter[, nullReplacement])This function concatenates the elements of the given array using the delimiter. You can provide optional null replacement character.
array_max(array)This function returns maximum values in the array. It wont consider the NULL values.
array_min(array)This function returns minimum values in the array. It wont consider the NULL values.
array_position(array, element)Return an index of the array element.
array_remove(array, element)Removes the element from an array.
array_repeat(element, count) Returns the array containing element count times.
array_sort(array)Sorts the input array in ascending order. All null values will be placed at the end.
array_union(array1, array2)Returns union of two array variable without duplicate values.
arrays_overlap(array1, array2)Returns true if array1 contains at least a non-null element present also in array2.
arrays_zip(array1, array2, …) Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.

Spark SQL Array Functions Examples

Following are few examples on array function usage.

How to create an array in Spark SQL?

You can use the array function to create an array in Spark SQL

For example,

> SELECT array(0, 1, 2, 3) as arr;
+------------+
|         arr|
+------------+
|[0, 1, 2, 3]|
+------------+

How to check if element present in Spark SQL array?

You can use the array_contains function to check if element present in an array.

For example,

> SELECT array_contains(array(1, 2, 3), 2) as arr;
+----+
| arr|
+----+
|true|
+----+
How to remove duplicate element from Spark SQL Array?

You can use array_distinct function to remove duplicate elements from array.

For example,

> SELECT array_distinct(array(0, 1, 2, 3, 2, 3)) as arr
+------------+
|         arr|
+------------+
|[0, 1, 2, 3]|
+------------+
How to return elements from the first Spark SQL array which are not present in second array?

You can use the array_except function return element present in the first Spark SQL array which are not present in second array.

For example,

> SELECT array_except(array(1, 2, 3, 4), array(1, 3, 5)) as arr;
+------+
|   arr|
+------+
|[2, 4]|
+------+
How to return common elements between two Spark SQL array variables?

You can use array_intersect function to return common element between two Spark SQL array variables

For example,

> SELECT array_intersect(array(1, 2, 3, 4), array(1, 3, 5)) as arr
+------+
|   arr|
+------+
|[1, 3]|
+------+
How to concatenate Spark SQL array element using delimiter?

You can use an array_join function to concatenate Spark SQL array elements using a delimiter.

For example,

> SELECT array_join(array('Array', 'Examples'), ' ') as arr;
+--------------+
|           arr|
+--------------+
|Array Examples|
+--------------+
How to return max element from a Spark SQL array?

You can use the array_max function to return maximum value from a Spark SQL array.

For example,

> SELECT array_max(array(1, 20, null, 3)) as arr;
+---+
|arr|
+---+
| 20|
+---+
How to return min element from a Spark SQL array?

You can use the array_min function to return minimum value from a Spark SQL array.

For example,

> SELECT array_min(array(1, 20, null, 3)) as arr;
+---+
|arr|
+---+
|  1|
+---+
How to return an index of the Spark SQL array element?

You can use the array_position function to return an index of the Spark SQL array element.

For exmaple,

> SELECT array_position(array(3, 2, 1), 1) as arr;
+---+
|arr|
+---+
|  3|
+---+
How to remove an element from the Spark SQL Array?

You can use an array_remove function to remove elements from the Spark SQL array.

For example,

> SELECT array_remove(array(3, 2, 1), 1) as arr
+------+
|   arr|
+------+
|[3, 2]|
+------+
How to sort the Spark SQL array?

You can use an array_sort function to sort the Spark SQL array.

For example,

> SELECT array_sort(array('b', 'd', null, 'c', 'a')) as arr;
+-------------+
|          arr|
+-------------+
|[a, b, c, d,]|
+-------------+

> SELECT array_sort(array(4, 1, 2, 0, null)) as arr
;
+-------------+
|          arr|
+-------------+
|[0, 1, 2, 4,]|
+-------------+
How to combine two Spark SQL arrays?

You can use an array_union function to combine two Spark SQL array variables. It will remove the duplicate elements, if any.

For exmaple,

> SELECT array_union(array(1, 2, 3), array(1, 3, 5)) as arr;
+------------+
|         arr|
+------------+
|[1, 2, 3, 5]|
+------------+

Hope this helps 🙂