Elasticsearch TSDS vs. Regular Data Stream and Index

Vakhtang Matskeplishvili
3 min read · Jul 15, 2024

Moving to a new technology in a large existing cluster can take significant time and be quite complex. Therefore, it is essential to understand the advantages and disadvantages of this step before proceeding.
In our project “DBeast Monitor” for Elastic Stack, we collect many metrics, which led us to consider migrating to TSDS.

In this article, I will share my comparison of TSDS, Data Stream, and Index, including a benchmark I conducted.

TSDS

A Time Series Data Stream (TSDS) is a specialized type of Elasticsearch data stream optimized for time-based data. TSDS allows for efficient storage, retrieval, and analysis of time-series data by using data structures and indexing techniques tailored to temporal data. It is particularly beneficial for applications that log and retrieve data at high frequency. TSDS is ideal for metric data such as CPU usage, stock prices, or weather sensor readings, where every data point carries a timestamp and is stored in time order.

TSDS Documentation

Advantages of TSDS:

  • Efficient Storage and Query Performance: TSDS optimizes storage and retrieval for time-series data, providing high performance for temporal queries.
  • Automatic Index Management: TSDS simplifies time-series data management by automating index rollover and retention policies.
  • Data Compression: TSDS supports data compression, which can significantly reduce storage requirements.
  • Downsampling: TSDS supports downsampling, which allows you to reduce the granularity of stored data to save disk space and improve query performance. Downsampling converts a set of high-frequency data points into a summary document with aggregated values such as sum, max, min, and average.
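To make the downsampling idea concrete, here is a minimal conceptual sketch in plain Python. It is not how Elasticsearch implements downsampling; the function name, the tuple layout, and the sample readings are all made up for illustration. It only shows how per-minute points collapse into one summary per time bucket.

```python
from collections import defaultdict


def downsample(points, interval_seconds):
    """Collapse (timestamp, dimension, value) points into fixed time buckets.

    Returns one summary dict per (bucket_start, dimension) pair.
    Conceptual illustration only, not the Elasticsearch implementation.
    """
    buckets = defaultdict(list)
    for timestamp, dimension, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = timestamp - (timestamp % interval_seconds)
        buckets[(bucket_start, dimension)].append(value)

    summaries = []
    for (bucket_start, dimension), values in sorted(buckets.items()):
        summaries.append({
            "bucket_start": bucket_start,
            "dimension": dimension,
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "value_count": len(values),
        })
    return summaries


# Hypothetical CPU readings: one per minute for 20 minutes on a single host.
points = [(minute * 60, "host-1", 10.0 + minute) for minute in range(20)]
summaries = downsample(points, interval_seconds=600)
for summary in summaries:
    print(summary)
```

With a 10-minute interval, the 20 sample points become 2 summary documents. The average is not stored separately in this sketch because it can be derived as sum divided by value_count.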

Disadvantages of TSDS:

  • Limited Field Type Support: TSDS restricts which field types can act as dimensions and metrics, which can be a significant limitation for applications requiring a wide variety of data types. Dimensions support keyword, ip, byte, short, integer, long, and unsigned_long; metrics support numeric types and aggregate_metric_double. Field types such as text, nested, object, and geo_point are not supported in these roles.
  • Complexity in Migration: Migrating existing data to TSDS can be complex, especially if the data includes unsupported field types or the system architecture relies heavily on regular indices.
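As a rough sketch of what these constraints look like in practice, the snippet below builds an index template body for a TSDS and checks the dimension fields against the types listed above. The index pattern and the field names ("host", "cpu_usage") are placeholders I made up, not the DBeast Monitor mappings, and the helper `invalid_dimensions` is hypothetical. The settings `index.mode` and `index.routing_path` and the mapping parameters `time_series_dimension` and `time_series_metric` are the ones Elasticsearch uses for TSDS.

```python
import json

# Dimension types as listed in this article.
DIMENSION_TYPES = {
    "keyword", "ip", "byte", "short", "integer", "long", "unsigned_long",
}

# Hypothetical template body; the pattern and field names are placeholders.
tsds_template = {
    "index_patterns": ["metrics-example-*"],
    "data_stream": {},
    "template": {
        "settings": {
            "index.mode": "time_series",
            "index.routing_path": ["host"],
        },
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "host": {"type": "keyword", "time_series_dimension": True},
                "cpu_usage": {"type": "double", "time_series_metric": "gauge"},
            }
        },
    },
}


def invalid_dimensions(properties):
    """Return names of fields marked as dimensions with an unsupported type."""
    return [
        name
        for name, mapping in properties.items()
        if mapping.get("time_series_dimension")
        and mapping.get("type") not in DIMENSION_TYPES
    ]


properties = tsds_template["template"]["mappings"]["properties"]
print("Unsupported dimensions:", invalid_dimensions(properties))
print(json.dumps(tsds_template, indent=2))
```

Running a check like this over existing mappings before a migration is one way to spot fields that would need to be remapped or dropped as dimensions.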

When to Use TSDS:

  • Storing Metrics Data: Ideal for capturing and storing numerical data that changes over time, such as CPU usage, stock prices, or weather sensor readings.
  • Near Real-Time Data: Suited for data that needs to be ingested and queried in near real-time.

Comparison Table:

Benchmark

For the benchmark, we used an index with 6.4 million documents containing various metrics sampled once per minute. We reindexed the data into both a Data Stream and a TSDS. After reindexing, each index was force-merged into a single segment. Here is the storage used by each:

  • Regular Index: 437MB
  • Data Stream: 429MB
  • TSDS: 158MB
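The relative savings follow directly from these three numbers. A short calculation, using only the sizes reported above (the function name is mine):

```python
# Storage sizes from the benchmark, in MB.
sizes_mb = {"Regular Index": 437, "Data Stream": 429, "TSDS": 158}


def reduction_percent(baseline_mb, new_mb):
    """Percentage by which new_mb is smaller than baseline_mb."""
    return (baseline_mb - new_mb) / baseline_mb * 100


for name in ("Regular Index", "Data Stream"):
    saved = reduction_percent(sizes_mb[name], sizes_mb["TSDS"])
    print(f"TSDS vs. {name}: {saved:.1f}% smaller")
```

In other words, TSDS used roughly 63 to 64 percent less disk space than either alternative, while the regular Data Stream was only marginally smaller than the regular index.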

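Additionally, we tested TSDS downsampling with a 10-minute interval, which produced a dataset of 640,000 documents with a size of 40.2MB. The document count dropped tenfold, but the size dropped only about fourfold (from 158MB), so the downsampled result is larger than one might expect. This may be because compression is less effective on a smaller index.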
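The per-document cost before and after downsampling can be checked with simple arithmetic on the benchmark figures. The only assumption here is that 1MB means 1,000,000 bytes; with binary megabytes the absolute values shift slightly but the ratio stays the same.

```python
MB = 1_000_000  # assumption: decimal megabytes


def bytes_per_doc(size_mb, doc_count):
    """Average on-disk bytes per document."""
    return size_mb * MB / doc_count


raw = bytes_per_doc(158, 6_400_000)          # TSDS before downsampling
downsampled = bytes_per_doc(40.2, 640_000)   # after 10-minute downsampling

print(f"Before downsampling: {raw:.1f} bytes per document")
print(f"After downsampling:  {downsampled:.1f} bytes per document")
print(f"Per-document growth: {downsampled / raw:.2f}x")
print(f"Total size reduction: {158 / 40.2:.2f}x")
```

Each downsampled document takes about 2.5 times the space of an original one, which is why a tenfold drop in document count yields only about a fourfold drop in total size.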

Summary

Migrating to TSDS can provide substantial storage savings and improve query performance for time-series data. For projects like ours that handle large volumes of metric data, TSDS offers a compelling advantage. However, be mindful of the limitations in field type support, which may make it impractical to migrate all of your data. While the migration requires careful planning, the reduced storage and improved efficiency make it worthwhile. If you manage similar indices, consider migrating to TSDS.

Feel free to reach out for further consultation or to discuss your project needs.

Written by Vakhtang Matskeplishvili

Try my open-source applications for Elasticsearch on my site: https://dbeast.co
