This article is available at: https://www.elastic.co/blog/using-parallel-logstash-pipelines-to-improve-persistent-queue-performance
I don’t exactly understand what you mean by:
“…This was accomplished by splitting the source data into multiple streams, running multiple pipelines in parallel within a Logstash process, and targeting each stream at a different pipeline…”
Do you mean you have increased the number of workers? Or do you mean you somehow split one big stream of data into multiple streams before they enter a pipeline?
Can you provide an example of your logstash.yml, pipelines.yml, and a path.config file where you ‘split’ the source data?
My situation: lots of Winlogbeat agents filling up one pipeline, with mediocre Logstash EPS performance.
“Do you mean you somehow split one big stream of data into multiple before they enter a pipeline?” – Yes.
For example, if you are using something like Filebeat you can specify multiple Logstash output destinations, which could be parallel pipelines, and Filebeat would load balance between those destinations – e.g. https://www.elastic.co/guide/en/beats/filebeat/current/load-balancing.html. If you are not using Filebeat, then you might have another way of targeting some of your data at one Logstash pipeline and other data at a different Logstash pipeline. Does that make sense?
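To illustrate the Filebeat approach described above, here is a minimal sketch of a Filebeat output section that load balances across two Logstash endpoints, each of which would be a separate parallel pipeline listening on its own port (the host and port values are placeholders, not from the original post):

```yaml
# filebeat.yml (sketch, placeholder host/ports):
# two Logstash endpoints, each backed by its own pipeline
# listening on a distinct beats port
output.logstash:
  hosts: ["logstash-host:5044", "logstash-host:5045"]
  loadbalance: true
```

With loadbalance set to true, Filebeat distributes batches of events across all listed hosts instead of picking only one.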
Thanks, this helped! It almost doubled our Events Received Rate (/s) and Events Emitted Rate (/s) when we added two more pipelines doing the exact same thing.
CPU utilization went from roughly 40% to 50%. I think we could add even more pipelines, but for now let’s see what happens in the long run.
This should be added to the tuning guide here: https://www.elastic.co/guide/en/logstash/current/tuning-logstash.html and here: https://www.elastic.co/guide/en/logstash/current/performance-troubleshooting.html
Just to explain for anyone reading this:
What we did was create 2 extra beats pipelines on the same Logstash machine.
1. Add 2 additional pipelines in pipelines.yml:
- pipeline.id: beats
  path.config: "/etc/path/to/beats.config"
- pipeline.id: beats2
  path.config: "/etc/path/to/beats2.config"
- pipeline.id: beats3
  path.config: "/etc/path/to/beats3.config"
2. Copy beats.config to beats2.config and beats3.config, where we only modified the (listening) port for each pipeline (so 3 different ports). The rest (filter, output) was untouched. The beatsN.config change:
input {
  beats {
    port => XXXXn
  }
}
3. Modify the Winlogbeat config on the machines where we collect logs, so that it outputs to two pipelines (ports) on the same Logstash machine, and enable load balancing. The winlogbeat.yml change:
output.logstash:
  hosts: ["x.x.x.x:xxxn", "x.x.x.x:xxxn+1"]
  loadbalance: true
Eventually we ended up creating 5 parallel pipelines to triple our throughput.
More importantly, I would like to point people toward pipeline-to-pipeline communication. Sending logs from many different sources to one loaded pipeline is a challenge, but splitting them up into different indices AND keeping your conf files clean and sorted is another. So people should definitely look into pipeline-to-pipeline communication when they connect lots of Beats agents. (Send tags with the Beats client to identify every category/type of log source.)
https://www.elastic.co/guide/en/logstash/current/pipeline-to-pipeline.html
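For anyone exploring that route, here is a minimal sketch of the distributor pattern from that page, combined with the tag-based routing mentioned above. The pipeline IDs, port, tag name, and file paths are made-up examples, not from the original setup:

```yaml
# pipelines.yml (sketch, placeholder names/paths):
# one intake pipeline receives all Beats traffic and fans events
# out to per-source pipelines over internal pipeline addresses
- pipeline.id: beats-intake
  config.string: |
    input { beats { port => 5044 } }
    output {
      if "windows" in [tags] {
        pipeline { send_to => ["windows-logs"] }
      } else {
        pipeline { send_to => ["other-logs"] }
      }
    }
- pipeline.id: windows-logs
  path.config: "/etc/logstash/conf.d/windows.conf"
- pipeline.id: other-logs
  path.config: "/etc/logstash/conf.d/other.conf"
```

Each downstream config then reads from its internal address, e.g. windows.conf would start with: input { pipeline { address => "windows-logs" } }. This keeps the filter/output logic for each log source in its own conf file.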