TimescaleDB Performance Test - Liu Da's Blog

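link:https://github.com/timescale/timescaledb[timescaledb] is an extension for PostgreSQL. Through hypertables it splits time-series data into chunks: one large table becomes many small ones, and new data always goes into the most recent chunk. Because each chunk is small, its indexes are small too and can stay in memory instead of being swapped to and from disk, which speeds up inserts.

TimescaleDB provides official benchmark tools and data, see link:https://github.com/timescale/benchmark-postgres[benchmark-postgres]. There is also an official link:https://blog.timescale.com/timescaledb-vs-6a696248104e[benchmark report]: according to it, TimescaleDB's insert performance stays stable as the data volume grows, while plain PostgreSQL's degrades exponentially.

This test uses the official data and programs. The test environment is an Alibaba Cloud ecs.g5.large instance, configured as follows:

* 2 CPU cores
* 8 GB RAM
* Two 100 GB SSD cloud disks, 1800 IOPS
* Debian 9.2

## Test steps

Mount the two SSD cloud disks at /data and /data1. Install postgresql-9.6 and the timescaledb extension, then edit the PostgreSQL configuration:

```ini
shared_preload_libraries = 'timescaledb'
data_directory = '/data/pgdb'
shared_buffers = 2GB
```

Save the file and restart PostgreSQL. See the respective documentation for the detailed steps.

Download the official link:https://timescaledata.blob.core.windows.net/datasets/benchmark_postgres.tar.bz2[data] and extract it under /data1. cpu-data.csv contains 100 million rows, each in the following format:

```shell
2016-01-03 23:59:30+00,host_1999,80.6521114433705,60.9865370795396,38.7496729040943,11.5620089238801,47.2124408432341,82.4793308352443,41.6463534391737,85.2477509705689,53.843722695106,52.3795963946359
```

Download the benchmark programs, install a Go environment, build the three programs, and copy them to /data1. Create the benchmark database and initialize the cpu_ts and cpu_pg tables:

```shell
postgres@debian:/data1$ psql -c 'CREATE DATABASE benchmark;'
postgres@debian:/data1$ psql -d benchmark < benchmark-setup-timescaledb.sql
postgres@debian:/data1$ psql -d benchmark < benchmark-setup-postgresql.sql
```

Insert the data into the cpu_pg and cpu_ts tables in turn:

```
postgres@debian:/data1$ ./copy --db-name=benchmark --table=cpu_pg --verbose --reporting-period=30s --file=cpu-data.csv --connection='host=localhost user=postgres password=postgres'
postgres@debian:/data1$ ./copy --db-name=benchmark --table=cpu_ts --verbose --reporting-period=30s --file=cpu-data.csv --connection='host=localhost user=postgres password=postgres'
```

This gives us insert performance reports for the plain table and the hypertable (link:/wp-content/uploads/2018/04/timescale-copy-cpu-test.txt[click to download]). Inserting into the plain table took 2h39m, while the hypertable took 1h10m in total. Plotting the reports with a charting tool produces the following chart:
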
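image::/wp-content/uploads/2018/04/2018-04-02-22-13-36屏幕截图.png[,width="746px",height="390px"]

The chart shows that insert performance on cpu_pg keeps falling as the data grows, while insert performance on the TimescaleDB table varies periodically. My preliminary explanation for the periodic pattern is that PostgreSQL's buffers are not large enough: even though the hypertable splits the table into small chunks, the index on each chunk grows as rows are inserted, and with limited CPU and memory, performance gradually declines. TimescaleDB then creates a new chunk according to its chunking algorithm, and performance recovers. We can check this by querying the hypertable's chunk information:

```
benchmark=# select * from chunk_relation_size('cpu_ts');
```

I have since emptied the table, so the output is no longer available, but the query returned 6 chunks, which supports this explanation.

Later, increasing the number of workers to 4 cut the insert time to 47 minutes, with a minimum rate of 26,837 rows/s and an average rate of 36,560 rows/s. TimescaleDB clearly improves insert performance for time-series data. Combined with PostgreSQL's own partitioning and the clustering support that TimescaleDB is developing, efficiency could improve several-fold or even tens-fold.

In summary, PostgreSQL + TimescaleDB is a very good choice for a time-series database. It has the strengths of a relational database along with NoSQL-style capabilities (JSON support) and performance. For monitoring, IoT and similar workloads, supporting several hundred thousand devices should be easy. If you are choosing a time-series database, TimescaleDB is worth a try.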
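
As a quick sanity check on the numbers in this post, the total run times can be converted into average insert rates. The sketch below is only arithmetic on figures quoted here (100 million rows; 2h39m, 1h10m and 47 minutes); the helper name `rows_per_second` and the run labels are my own and do not come from the benchmark tools.

```python
# Average insert throughput implied by the wall-clock times quoted in this post.
# Assumption: every run loaded the full cpu-data.csv (100 million rows).
ROWS = 100_000_000


def rows_per_second(rows, hours=0, minutes=0):
    """Return the average rows/s for a run that took hours + minutes."""
    seconds = hours * 3600 + minutes * 60
    return rows / seconds


# Labels are illustrative; durations are the ones reported in the post.
runs = {
    "cpu_pg (plain table)": rows_per_second(ROWS, hours=2, minutes=39),
    "cpu_ts (hypertable)": rows_per_second(ROWS, hours=1, minutes=10),
    "cpu_ts (hypertable, 4 workers)": rows_per_second(ROWS, minutes=47),
}

for name, rate in runs.items():
    print(f"{name}: {rate:,.0f} rows/s")

# Ratio of average rates between the hypertable and the plain table.
speedup = runs["cpu_ts (hypertable)"] / runs["cpu_pg (plain table)"]
print(f"hypertable vs plain table: {speedup:.2f}x")
```

This works out to roughly 10,500 rows/s for the plain table, 23,800 rows/s for the hypertable (about 2.3x faster), and 35,500 rows/s with 4 workers. The last figure is close to, but not the same as, the 36,560 rows/s average reported by the tool; my guess is that the gap comes from rounding the elapsed time to whole minutes.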
