
TiSpark Benchmark

shiyuhang0 edited this page Jul 6, 2022 · 4 revisions

Benchmark with TPC-H

Environment

10 machines, each with:
* CPU: 8 cores, Intel Xeon Processor (Icelake)
* Memory: 32G
* Disk: 500G

TiDB 5.4.0: 3 TiDB + 3 TiKV + 1 PD (TiDB and PD share a machine)

Spark 3.0.3 Standalone: 1 master + 3 workers

Parallel Number

The parallel number is bounded by the total number of executor cores: 3 workers * 8 cores = 24
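The arithmetic above can be sketched as a small helper. This is only an illustration: it assumes every worker core is given to executors and that each task occupies one core (Spark's default), and the function names are hypothetical.

```python
import math

WORKERS = 3           # Spark workers in the environment above
CORES_PER_WORKER = 8  # CPU cores per machine


def total_parallelism(workers: int, cores_per_worker: int) -> int:
    """Upper bound on tasks running at once, assuming one core per task."""
    return workers * cores_per_worker


def task_waves(num_tasks: int, parallelism: int) -> int:
    """Number of scheduling rounds needed to run num_tasks tasks."""
    return math.ceil(num_tasks / parallelism)


parallelism = total_parallelism(WORKERS, CORES_PER_WORKER)
print(parallelism)
# The largest TiSpark write in this benchmark runs 226 tasks.
print(task_waves(226, parallelism))
```

A job with fewer tasks than the parallel number (for example the 9-task write) fits in a single round and leaves some cores idle.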

Write Benchmark

Write data generated by TPC-H (ORDERS table) from HDFS to TiDB

TiSpark Write benchmark

| Count(*) | Data size | Task number | Time (s) |
| --- | --- | --- | --- |
| 1,500,000 | 164M | 9 | 62 |
| 15,000,000 | 1.7G | 23 | 396 |
| 150,000,000 | 17G | 226 | 4722 |

Spark JDBC Write benchmark

| Count(*) | Data size | Task number | Time (s) |
| --- | --- | --- | --- |
| 1,500,000 | 164M | 24 | 23 |
| 15,000,000 | 1.7G | 24 | 244 |
| 150,000,000 | 17G | 133 | 2483 |
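To compare the two write paths at a glance, the row counts and times can be turned into throughput. The sketch below only restates the numbers from the two tables; the names `WRITE_RESULTS` and `rows_per_second` are illustrative.

```python
# (row count, elapsed seconds) copied from the write benchmark tables.
WRITE_RESULTS = {
    "TiSpark": [(1_500_000, 62), (15_000_000, 396), (150_000_000, 4722)],
    "Spark JDBC": [(1_500_000, 23), (15_000_000, 244), (150_000_000, 2483)],
}


def rows_per_second(rows: int, seconds: float) -> float:
    """Average write throughput over the whole run."""
    return rows / seconds


for engine, runs in WRITE_RESULTS.items():
    for rows, seconds in runs:
        rate = rows_per_second(rows, seconds)
        print(f"{engine:>10}  {rows:>11,} rows  {rate:>9,.0f} rows/s")
```

These are whole-run averages, so they include job startup and any pre-write steps and say nothing about peak throughput.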

Delete Benchmark

Delete data from TiDB with TiSpark (ORDERS table)

| Count(*) | Data size | Task number | Time (s) |
| --- | --- | --- | --- |
| 1,500,000 | 164M | 3 | 31 |
| 15,000,000 | 1.7G | 5 | 269 |
| 150,000,000 | 17G | 33 | 3225 |

Select Benchmark

Run the 22 TPC-H queries and a full table scan

  • Spark JDBC uses the default configuration, without partitionColumn, lowerBound, or upperBound, so it does not partition the table
  • TiSpark partitions the table automatically

| Query | Data size | TiSpark (s) | Spark JDBC (s) |
| --- | --- | --- | --- |
| TPC-H 22 queries | 1G | 131 | 157 |
| TPC-H 22 queries | 10G | 424 | 1793 (q21 OOM) |
| select * from orders | 164M | 5 | 10 |
| select * from orders | 1.7G | 14 | 89 |
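For context on the Spark JDBC configuration used here: without partitionColumn, lowerBound, upperBound, and numPartitions, Spark reads a JDBC table through a single partition. The sketch below shows roughly how those options would split ORDERS into range predicates. It is a simplified approximation of Spark's behavior, not its actual implementation, and the URL, bounds, and partition count are hypothetical values that were not used in this benchmark.

```python
from typing import List, Optional

# Hypothetical reader options; the option names are Spark JDBC options,
# the values are placeholders chosen for illustration.
JDBC_OPTIONS = {
    "url": "jdbc:mysql://tidb-host:4000/tpch",  # hypothetical address
    "dbtable": "orders",
    "partitionColumn": "o_orderkey",
    "lowerBound": 0,
    "upperBound": 6_000_000,
    "numPartitions": 4,
}


def jdbc_partition_predicates(
    column: str, lower_bound: int, upper_bound: int, num_partitions: int
) -> List[Optional[str]]:
    """Approximate the WHERE clause issued for each JDBC partition."""
    if num_partitions <= 1 or lower_bound >= upper_bound:
        return [None]  # one partition, no predicate: the default case
    stride = upper_bound // num_partitions - lower_bound // num_partitions
    predicates: List[Optional[str]] = []
    current = lower_bound
    for i in range(num_partitions):
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        upper = f"{column} < {current}" if i != num_partitions - 1 else None
        if upper is None:
            predicates.append(lower)  # last partition is open-ended
        elif lower is None:
            predicates.append(f"{upper} OR {column} IS NULL")
        else:
            predicates.append(f"{lower} AND {upper}")
    return predicates


for predicate in jdbc_partition_predicates(
    JDBC_OPTIONS["partitionColumn"],
    JDBC_OPTIONS["lowerBound"],
    JDBC_OPTIONS["upperBound"],
    JDBC_OPTIONS["numPartitions"],
):
    print(predicate)
```

The bounds only decide the stride. Rows outside them still land in the first or last partition, so poorly chosen bounds skew the work rather than drop data.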

If you want to benchmark TiSpark yourself, here is a reference (Chinese only for now)

Benchmark with TPC-DS

Environment

2 machines, each with:
* CPU: 48 cores
* Memory: 187G

TiDB v6.0.0: 3 TiKV

Spark v3.1.3: Local Mode

The first machine runs 2 TiKV instances; the second machine runs 1 TiKV instance and Spark

Data

Load 50G of TPC-DS data into TiDB. See here for the details of the data load

Query

Some queries are not compatible with Spark SQL and need the following changes:

  • Change every date_add(start_date, interval 30 day) to date_add(start_date, 30)
  • Change aliases quoted as 'name' to backtick-quoted `name`
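The two rewrites above can be applied mechanically. The following is a minimal regex-based sketch, not the procedure used for this benchmark: the function name `to_spark_sql` is hypothetical, and the patterns only cover the two forms listed (a day interval inside date_add, and a single-quoted alias after AS).

```python
import re

# date_add(<expr>, interval <n> day)  ->  date_add(<expr>, <n>)
_INTERVAL_DAY = re.compile(
    r"date_add\(\s*(.+?)\s*,\s*interval\s+(\d+)\s+day\s*\)",
    re.IGNORECASE | re.DOTALL,
)

# ... as 'name'  ->  ... as `name`
_QUOTED_ALIAS = re.compile(r"\b(as)\s+'([^']+)'", re.IGNORECASE)


def to_spark_sql(query: str) -> str:
    """Rewrite the two constructs that Spark SQL rejects."""
    query = _INTERVAL_DAY.sub(r"date_add(\1, \2)", query)
    query = _QUOTED_ALIAS.sub(r"\1 `\2`", query)
    return query


print(to_spark_sql(
    "select date_add(start_date, interval 30 day) as 'name' from t"
))
```

A regex pass like this is convenient for a fixed query set, but each rewritten query should still be reviewed, since the patterns do not parse SQL.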

Benchmark

Execute the 99 TPC-DS queries on 50G of data

| Storage | Total time (s) |
| --- | --- |
| TiSpark on TiKV | 7504 |
| TiSpark on TiFlash | 2928 (Q5 failed) |

TPC-DS with large data

Benchmark

| Data | TiFlash on TiDB | TiFlash on TiSpark | Environment |
| --- | --- | --- | --- |
| 1T | 1672.783 | 4673.186 | 80C 512G+2*960SSD * 9 |
| 3T | 1315.159 | 6162.302 | ARM 80C 512G+SSD * 6 |
| 5T | 1947.046 | 6162.302 | 80C 512G+2*960SSD * 10 |
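The gap between the two columns is easier to read as a ratio. The sketch below assumes both columns report total time in the same unit, which the table does not state; `LARGE_DATA_RESULTS` and `relative_time` are illustrative names.

```python
# (TiFlash on TiDB, TiFlash on TiSpark) copied from the table above.
LARGE_DATA_RESULTS = {
    "1T": (1672.783, 4673.186),
    "3T": (1315.159, 6162.302),
    "5T": (1947.046, 6162.302),
}


def relative_time(tidb_time: float, tispark_time: float) -> float:
    """How many times longer the TiSpark run took than the TiDB run."""
    return tispark_time / tidb_time


for size, (tidb_time, tispark_time) in LARGE_DATA_RESULTS.items():
    print(f"{size}: TiSpark / TiDB = {relative_time(tidb_time, tispark_time):.2f}")
```

Each data size was run on a different environment, so the ratios are comparable within a row but not across rows.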