Spark SQL isnumeric Function Alternative and Example

Many organizations are moving their data warehouses to Hive and using Spark as the execution engine, which boosts performance. In SQL, there are many options for dealing with non-numeric values; for example, you can create user defined functions to filter out unwanted data. In this article, we will check the Spark SQL isnumeric function alternative with examples.

Spark SQL isnumeric Function

Neither Spark SQL nor Apache Hive provides support for the isnumeric function. You have to write a user defined function (UDF) in your favorite programming language and register it in Spark, or use an alternative SQL construct to check for numeric values.

In the subsequent sections, we will check different methods that you can use in Spark SQL to check numeric data.

Spark SQL isnumeric Alternative

As mentioned earlier, Spark SQL does not support the isnumeric function. In another article, we have seen Hive isnumeric function alternatives. You can use those methods, such as RLIKE or the CAST function, to filter out records or convert values to numeric types.
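As a quick sketch of those alternatives in Spark SQL (literal values are used here in place of a real column):

# rlike returns true only when the whole value matches the numeric pattern
sqlContext.sql("select '1234' rlike '^[0-9]+$'").show()

# cast returns NULL when the value cannot be converted to a number
sqlContext.sql("select cast('12a123.2' as double)").show()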

You can also write a user defined function and register it in Spark so that you can use it directly in your Spark SQL statements. In this article, we will write the UDF using PySpark.

Create Python UDF to Check Numeric Value

First, on the PySpark terminal, create the Python user defined function that you want to register in Spark.

For example, consider the user defined function below.

def is_numeric(s):
    """Return True if the given value can be converted to a float."""
    try:
        float(s)
        return True
    except (ValueError, TypeError):
        # TypeError covers a NULL value passed in from Spark SQL
        return False

The function tries to convert the given value to a float. If the conversion succeeds, it returns True; if the conversion raises a ValueError (a non-numeric string) or a TypeError (a NULL value), it returns False.
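You can sanity-check the function directly on the PySpark terminal before registering it:

is_numeric('1.234')     # True
is_numeric('12a123.2')  # False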

Import Spark Data Type

The second step is to import the Spark data type. The type should be the same as the one the function created above returns.

In our example, the function returns a Boolean value, so we import BooleanType.

from pyspark.sql.types import BooleanType

Register isnumeric Function into Spark

The final step is to register the Python function in Spark. Use the command below to register the user defined function.
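Note that this article uses the pre-2.0 sqlContext entry point; on Spark 2.0 and later, you can register the same function through the SparkSession with spark.udf.register("is_numeric_type", is_numeric, BooleanType()).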

sqlContext.udf.register("is_numeric_type", is_numeric, BooleanType())

Spark SQL isnumeric Check

Now, use the registered function in your Spark SQL statements to check whether a value is numeric.

For example,

sqlContext.sql("select is_numeric_type('1.234')").show()
+----+
| _c0|
+----+
|true|
+----+

sqlContext.sql("select is_numeric_type('1234.234')").show()
+----+
| _c0|
+----+
|true|
+----+

sqlContext.sql("select is_numeric_type('12a123.2')").show()
+-----+
|  _c0|
+-----+
|false|
+-----+
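Beyond literal checks, the registered UDF can be used in a WHERE clause to filter out non-numeric records. A minimal sketch, assuming a temporary table named sample_table with a string column col1 (both names are hypothetical):

sqlContext.sql("select col1 from sample_table where is_numeric_type(col1)").show()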

Hope this helps 🙂