How to Find Tables Size in Spark SQL? – Scala Example


Be it a relational database, Hive, or Spark SQL, finding the table size is a common requirement. Relational databases such as Snowflake, Teradata, etc. provide system tables that you can query to identify the size of tables. Spark SQL, however, has no such system tables. Instead, you can use the Spark catalog API to find the size of the tables in a Spark SQL database.

Find Tables Size in Spark SQL

Starting with version 2.0, Spark supports the catalog API. It provides many useful methods such as listTables, listDatabases, etc. You can read more about these APIs in my other post – Search Table in Spark Database.
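For example, here is a quick sketch that lists the databases and tables visible to the current session using the catalog API. It assumes a SparkSession named spark is already available, as in spark-shell.

// List all databases known to the catalog
spark.catalog.listDatabases().show(truncate = false)

// List all tables in the "default" database
spark.catalog.listTables("default").show(truncate = false)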

You can determine the size of a table by summing the sizes of the individual files under the table's underlying directory. Alternatively, you can use queryExecution.analyzed.stats on a DataFrame to return the size.

For example, the following snippet returns the size of the "customer" table.

spark.read.table("customer").queryExecution.analyzed.stats.sizeInBytes

And the following is the output:

res0: BigInt = 13
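If you prefer the first approach, below is a minimal sketch that sums the file sizes using the Hadoop FileSystem API. It assumes the "customer" table lives under the default Hive warehouse location; replace the path with your table's actual storage location.

import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical warehouse path for the "customer" table; adjust it to your environment
val tablePath = new Path("/user/hive/warehouse/customer")

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// getContentSummary walks the directory and reports the total length of all files in bytes
val tableSizeInBytes = fs.getContentSummary(tablePath).getLength
println(s"customer: $tableSizeInBytes bytes")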

Identify the Size of All Tables in a Spark SQL Database

Along with the Spark catalog API methods, you can use queryExecution.analyzed.stats to return the size of every table in a Spark SQL database.

The following example returns the size of all tables present in the "default" database. Feel free to change the database name as per your requirements.

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL Find Table Size")
  .getOrCreate()

val db_name = "default"

// Collect the names of all tables in the given database
val tb_list = spark.catalog.listTables(db_name).select("name")

// For each table, read it and print the size reported by the analyzed plan statistics
for (row <- tb_list.collect()) {
  val tableName = row.getString(0)
  val sizeInBytes = spark.read.table(s"$db_name.$tableName").queryExecution.analyzed.stats.sizeInBytes
  println(s"$tableName: $sizeInBytes")
}

Sample output:

customer: 13 
date_dim: 300 
demo_data: 88
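Note that sizeInBytes is derived from whatever statistics Spark has available for the table. If the reported numbers look stale, you can refresh them with the ANALYZE TABLE command, shown here for the "customer" table used above.

// Compute table-level statistics so the analyzed plan reports an up-to-date size
spark.sql("ANALYZE TABLE customer COMPUTE STATISTICS")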


Hope this helps 🙂