Apache Spark SQL Bucketing Support – Explanation

Spark SQL supports clustering column values through bucketing. Bucketing and partitioning in Spark SQL are similar to their Hive counterparts, but the syntax differs. In this article, we will check Apache Spark SQL bucketing support in different versions of Spark, concentrating only on the Spark SQL DDL changes. For applying bucketing to a DataFrame, go through the separate article on that topic.

Apache Spark SQL Bucketing Support

Bucketing is an optimization technique in Spark SQL that uses buckets and bucketing columns to determine data partitioning. The bucketing concept is one of the optimization techniques that use bucketing to…
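
To make the idea of buckets and bucketing columns more concrete, here is a minimal Python sketch of how rows could be assigned to a fixed number of buckets from the value of one column. It is an illustration under stated assumptions, not Spark's implementation: CRC32 stands in for Spark's internal hash function, and the table name `users_bucketed`, the columns `id` and `name`, and the helpers `bucket_for` and `assign_buckets` are hypothetical. The DDL string shows the general `CLUSTERED BY ... INTO n BUCKETS` form of a Spark SQL bucketed table definition.

```python
import zlib
from collections import defaultdict

# Illustrative Spark SQL DDL for a bucketed table.
# The table and column names are hypothetical.
EXAMPLE_DDL = """
CREATE TABLE users_bucketed (id INT, name STRING)
USING parquet
CLUSTERED BY (id) INTO 4 BUCKETS
"""


def bucket_for(value, num_buckets):
    """Return a bucket index in [0, num_buckets) for a bucketing-column value.

    Assumption: CRC32 is a stand-in for Spark's internal hash function, so
    these indices will not match the buckets Spark actually assigns.
    """
    digest = zlib.crc32(str(value).encode("utf-8"))
    return digest % num_buckets


def assign_buckets(rows, column, num_buckets):
    """Group rows by the bucket index derived from the bucketing column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[column], num_buckets)].append(row)
    return dict(buckets)


rows = [{"id": i, "name": f"user{i}"} for i in range(10)]
buckets = assign_buckets(rows, "id", 4)

# Print each bucket index with the ids that landed in it.
for index in sorted(buckets):
    print(index, [row["id"] for row in buckets[index]])
```

The point of the sketch is that equal values of the bucketing column always land in the same bucket, and the number of buckets is fixed when the table is defined. That property is what allows an engine to skip some shuffling when two tables are bucketed the same way on the join column.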
