Many organizations are moving their data warehouses to Hive and using Spark as the execution engine to boost performance. In SQL, there are many options you can use to deal with non-numeric values; for example, you can create user defined functions to filter out unwanted data. In this article, we will check the Spark SQL isnumeric function alternative, with examples.
Spark SQL isnumeric Function
Neither Spark SQL nor Apache Hive provides support for an isnumeric function. You have to write a user defined function using your favorite programming language and register it in Spark, or use an alternative SQL option to check for numeric values.
In the following sections, we will check different methods that you can use in Spark SQL to check for numeric data.
Spark SQL isnumeric Alternative
As mentioned earlier, Spark SQL does not provide an isnumeric function. In another article, we covered Hive isnumeric function alternatives. You can use those same methods, such as RLIKE or the CAST function, to filter out records or convert values to numeric types, as shown in the sketch below.
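For example, here is a minimal sketch of both approaches, assuming a hypothetical table sample_table with a string column col1. The RLIKE pattern below only matches unsigned decimal numbers; the CAST approach works because, under Spark's default (non-ANSI) settings, casting a non-numeric string to a numeric type returns NULL rather than raising an error.
sqlContext.sql("select col1 from sample_table where col1 rlike '^[0-9]+[.]?[0-9]*$'").show()
sqlContext.sql("select col1 from sample_table where cast(col1 as double) is not null").show()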
You can also write a user defined function and register it in Spark so that you can use it directly in your Spark SQL statements. In this article, we will write the UDF using PySpark.
Create Python UDF to Check Numeric Value
First, create the Python user defined function that you want to register in Spark. You can define it directly in the pyspark terminal.
For example, consider the user defined function below.
def is_numeric(s):
    # Attempt to parse the string as a float.
    try:
        float(s)
        return True
    except ValueError:
        # The string could not be parsed as a number.
        return False
The function tries to convert the given string value to a float. If the conversion succeeds, it returns True; if a ValueError is raised, it returns False.
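You can quickly sanity-check the function in the pyspark shell before registering it:
is_numeric('1.234')   # returns True
is_numeric('12a123')  # returns False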
Import Spark Data Type
The second step is to import the Spark data type that matches the return type of the function created above. In our example, the function returns a Boolean value.
from pyspark.sql.types import BooleanType
Register isnumeric Function into Spark
The final step is to register the Python function in Spark. Use the command below to register the user defined function.
sqlContext.udf.register("is_numeric_type", is_numeric, BooleanType())
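Note that sqlContext comes from the older Spark 1.x API. If you are on Spark 2.x or later, you can register the same function through the SparkSession object instead:
spark.udf.register("is_numeric_type", is_numeric, BooleanType())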
Spark SQL isnumeric Check
Now, use the registered function in your Spark SQL queries to check for numeric values.
For example,
sqlContext.sql("select is_numeric_type('1.234')").show()
+----+
| _c0|
+----+
|true|
+----+
sqlContext.sql("select is_numeric_type('1234.234')").show()
+----+
| _c0|
+----+
|true|
+----+
sqlContext.sql("select is_numeric_type('12a123.2')").show()
+-----+
| _c0|
+-----+
|false|
+-----+
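Once registered, you can also use the function in a WHERE clause to filter out non-numeric records. Here is a minimal sketch, again assuming the hypothetical sample_table with a string column col1:
sqlContext.sql("select col1 from sample_table where is_numeric_type(col1)").show()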
Related Articles
- How to Create Spark SQL User Defined Functions? Example
- Create Spark SQL isdate Function – Date Validation
Hope this helps 🙂