Similar to relational databases such as Snowflake, Teradata, Spark SQL support many useful array functions. You can use these array manipulation functions to manipulate the array types. In this article, we will check how to work with Spark SQL Array Functions its Syntax and Examples.
Spark SQL Array Functions
Following is the list of Spark SQL array functions with brief descriptions:
Spark SQL Array Function | Description |
---|---|
array(expr, …) | Returns an array with the given elements. |
array_contains(array, value) | Returns true if the array contains the value. |
array_distinct(array) | This function removes duplicate values from the array |
array_except(array1, array2) | Returns an array of the elements in array1 but not in array2, without duplicates. |
array_intersect(array1, array2) | Returns an array of the elements in the intersection of array1 and array2, without duplicates. |
array_join(array, delimiter[, nullReplacement]) | This function concatenates the elements of the given array using the delimiter. You can provide optional null replacement character. |
array_max(array) | This function returns maximum values in the array. It wont consider the NULL values. |
array_min(array) | This function returns minimum values in the array. It wont consider the NULL values. |
array_position(array, element) | Return an index of the array element. |
array_remove(array, element) | Removes the element from an array. |
array_repeat(element, count) | Returns the array containing element count times. |
array_sort(array) | Sorts the input array in ascending order. All null values will be placed at the end. |
array_union(array1, array2) | Returns union of two array variable without duplicate values. |
arrays_overlap(array1, array2) | Returns true if array1 contains at least a non-null element present also in array2. |
arrays_zip(array1, array2, …) | Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays. |
Spark SQL Array Functions Examples
Following are few examples on array function usage.
How to create an array in Spark SQL?
You can use the array
function to create an array in Spark SQL
For example,
> SELECT array(0, 1, 2, 3) as arr;
+------------+
| arr|
+------------+
|[0, 1, 2, 3]|
+------------+
How to check if element present in Spark SQL array?
You can use the array_contains
function to check if element present in an array.
For example,
> SELECT array_contains(array(1, 2, 3), 2) as arr;
+----+
| arr|
+----+
|true|
+----+
How to remove duplicate element from Spark SQL Array?
You can use array_distinct
function to remove duplicate elements from array.
For example,
> SELECT array_distinct(array(0, 1, 2, 3, 2, 3)) as arr
+------------+
| arr|
+------------+
|[0, 1, 2, 3]|
+------------+
How to return elements from the first Spark SQL array which are not present in second array?
You can use the array_except
function return element present in the first Spark SQL array which are not present in second array.
For example,
> SELECT array_except(array(1, 2, 3, 4), array(1, 3, 5)) as arr;
+------+
| arr|
+------+
|[2, 4]|
+------+
How to return common elements between two Spark SQL array variables?
You can use array_intersect
function to return common element between two Spark SQL array variables
For example,
> SELECT array_intersect(array(1, 2, 3, 4), array(1, 3, 5)) as arr
+------+
| arr|
+------+
|[1, 3]|
+------+
How to concatenate Spark SQL array element using delimiter?
You can use an array_join
function to concatenate Spark SQL array elements using a delimiter.
For example,
> SELECT array_join(array('Array', 'Examples'), ' ') as arr;
+--------------+
| arr|
+--------------+
|Array Examples|
+--------------+
How to return max element from a Spark SQL array?
You can use the array_max
function to return maximum value from a Spark SQL array.
For example,
> SELECT array_max(array(1, 20, null, 3)) as arr;
+---+
|arr|
+---+
| 20|
+---+
How to return min element from a Spark SQL array?
You can use the array_min
function to return minimum value from a Spark SQL array.
For example,
> SELECT array_min(array(1, 20, null, 3)) as arr;
+---+
|arr|
+---+
| 1|
+---+
How to return an index of the Spark SQL array element?
You can use the array_position
function to return an index of the Spark SQL array element.
For exmaple,
> SELECT array_position(array(3, 2, 1), 1) as arr;
+---+
|arr|
+---+
| 3|
+---+
How to remove an element from the Spark SQL Array?
You can use an array_remove
function to remove elements from the Spark SQL array.
For example,
> SELECT array_remove(array(3, 2, 1), 1) as arr
+------+
| arr|
+------+
|[3, 2]|
+------+
How to sort the Spark SQL array?
You can use an array_sort
function to sort the Spark SQL array.
For example,
> SELECT array_sort(array('b', 'd', null, 'c', 'a')) as arr;
+-------------+
| arr|
+-------------+
|[a, b, c, d,]|
+-------------+
> SELECT array_sort(array(4, 1, 2, 0, null)) as arr
;
+-------------+
| arr|
+-------------+
|[0, 1, 2, 4,]|
+-------------+
How to combine two Spark SQL arrays?
You can use an array_union
function to combine two Spark SQL array variables. It will remove the duplicate elements, if any.
For exmaple,
> SELECT array_union(array(1, 2, 3), array(1, 3, 5)) as arr;
+------------+
| arr|
+------------+
|[1, 2, 3, 5]|
+------------+
Hope this helps 🙂