Optimizing Logstash Data by Removing Null and Empty Fields
Introduction:
Logstash users often have to deal with unclean data, where some fields contain null, empty, or whitespace-only values. These fields carry no information, yet they increase memory usage, inflate output traffic, and can cause inconsistencies when the data is indexed downstream.
Common Problems Caused by Null and Empty Fields:
- Data Inconsistencies: Without proper handling, these fields can lead to errors such as `mapper_parsing_exception` during data ingestion, especially when the data types do not match the expected field specifications.
- Increased Memory Usage and Output Traffic: Null or empty fields consume space and bandwidth, affecting system performance.
Example Scenario:
Consider an index with the following mapping:
{
  "mappings": {
    "properties": {
      "@timestamp": {"type": "date"},
      "ip": {"type": "ip"},
      "message": {"type": "text"}
    }
  }
}
We have two log entries:
[2023-01-01 00:00:00][127.0.0.1] The first log
[2023-01-01 00:00:00][ ] The second log
Using the grok plugin, these entries are parsed into two documents. The first one is indexed without issue, but ingesting the second one fails with a `mapper_parsing_exception`, because its blank `ip` value cannot be parsed as an IP address.
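To make the scenario concrete, here is a minimal Ruby sketch of what the parsing step produces for the two lines. The regular expression `LOG_PATTERN` and the helper `parse_line` are hypothetical stand-ins for the grok pattern, which the example does not show, so treat this as an illustration rather than the actual pipeline:

```ruby
# Hypothetical stand-in for the grok pattern: [timestamp][ip] message
LOG_PATTERN = /\A\[(?<timestamp>[^\]]*)\]\[(?<ip>[^\]]*)\] (?<message>.*)\z/

# Parses one log line into a hash of fields, or returns nil if it does not match.
def parse_line(line)
  match = LOG_PATTERN.match(line)
  return nil if match.nil?
  {
    "@timestamp" => match[:timestamp],
    "ip" => match[:ip],
    "message" => match[:message]
  }
end

lines = [
  "[2023-01-01 00:00:00][127.0.0.1] The first log",
  "[2023-01-01 00:00:00][ ] The second log"
]

lines.each do |line|
  doc = parse_line(line)
  puts "ip=#{doc["ip"].inspect} message=#{doc["message"].inspect}"
end
```

The second document ends up with an `ip` value that is a single space. The field is present but carries nothing an `ip` mapping can accept.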
Solution: Using the Ruby Filter Plugin to Clean Data:
The Ruby filter plugin can remove null and empty fields before they reach the output. Below is an example Logstash configuration whose Ruby code recursively deletes null, empty, or whitespace-only fields:
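input { stdin { codec => json } }
filter {
  ruby {
    code => '
      # Removes the field at path when its value is nil, a blank string,
      # or a container left empty after cleaning.
      # Returns true when the field was removed.
      def recursiveDelete(innerEvent, path, originalEvent)
        if innerEvent.nil?
          originalEvent.remove(path)
          return true
        end
        if innerEvent.is_a?(Hash)
          # delete_if keeps the local copy in sync, so empty? reflects the cleaned hash
          innerEvent.delete_if do |key, value|
            recursiveDelete(value, path + "[" + key + "]", originalEvent)
          end
          # The root path is "", so never try to remove the event itself
          if innerEvent.empty? && !path.empty?
            originalEvent.remove(path)
            return true
          end
        elsif innerEvent.is_a?(Array)
          # Walk backwards so removing an element does not shift the indexes still to visit
          (innerEvent.length - 1).downto(0) do |index|
            if recursiveDelete(innerEvent[index], path + "[" + index.to_s + "]", originalEvent)
              innerEvent.delete_at(index)
            end
          end
          if innerEvent.empty?
            originalEvent.remove(path)
            return true
          end
        elsif innerEvent.is_a?(String) && innerEvent =~ /\A[[:space:]]*\z/
          # The pattern also matches the empty string
          originalEvent.remove(path)
          return true
        end
        false
      end
      recursiveDelete(event.to_hash, "", event)
    '
  }
}
output { stdout { codec => rubydebug } }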
This script walks every field and nested sub-field of the event and removes those whose value is null, an empty string, or whitespace only, so that only meaningful values reach the output.
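The cleaning rule can also be tried outside Logstash on plain Ruby hashes. The sketch below is an assumption-laden illustration, not the filter itself. The helpers `blank_value?` and `prune` are hypothetical names, and instead of removing fields by path they build a cleaned copy, dropping blank array elements one by one and discarding containers that end up empty:

```ruby
# Hypothetical helper (not part of Logstash): true for nil or whitespace-only strings.
def blank_value?(value)
  value.nil? || (value.is_a?(String) && value.match?(/\A[[:space:]]*\z/))
end

# Returns a cleaned copy of value, or nil when nothing meaningful is left.
def prune(value)
  case value
  when Hash
    cleaned = {}
    value.each do |key, child|
      kept = prune(child)
      cleaned[key] = kept unless kept.nil?
    end
    cleaned.empty? ? nil : cleaned
  when Array
    cleaned = value.map { |child| prune(child) }.compact
    cleaned.empty? ? nil : cleaned
  else
    blank_value?(value) ? nil : value
  end
end

sample = {
  "ip" => " ",
  "message" => "The second log",
  "geo" => { "city" => nil, "tags" => ["", "eu"] },
  "extra" => { "note" => "" }
}

p prune(sample)
```

For the sample, only `message` and `geo.tags` (reduced to the single element `eu`) survive. Values such as `false` or `0` are kept, since they are neither null nor blank strings.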
Conclusion
Removing null and empty fields in Logstash keeps documents smaller, reduces output traffic, and avoids ingestion errors such as `mapper_parsing_exception` caused by blank values in typed fields. With the Ruby filter plugin in place, the data reaching your outputs stays clean and consistent.
Happy Logstashing!