This article is available at: https://www.elastic.co/blog/using-parallel-logstash-pipelines-to-improve-persistent-queue-performance
I don’t exactly understand what you mean by:
“…This was accomplished by splitting the source data into multiple streams, running multiple pipelines in parallel within a Logstash process, and targeting each stream at a different pipeline…”
Do you mean you have increased the number of workers? Or do you mean you somehow split one big stream of data into multiple streams before they enter a pipeline?
Can you provide an example of your logstash.yml, pipelines.yml, and a path.config file where you ‘split’ the source data?
My situation: lots of Winlogbeat agents filling up one pipeline, with mediocre Logstash EPS performance.
“Do you mean you somehow split one big stream of data into multiple before they enter a pipeline?” – Yes.
For example, if you are using something like Filebeat you can specify multiple Logstash output destinations, which could be parallel pipelines, and Filebeat would load balance between those destinations – e.g. https://www.elastic.co/guide/en/beats/filebeat/current/load-balancing.html. If you are not using Filebeat, then you might have another way of targeting some of your data at one Logstash pipeline and other data at a different Logstash pipeline. Does that make sense?
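To illustrate the Filebeat approach described above, here is a minimal sketch of a Filebeat output section that load balances across two Logstash endpoints, each of which would be a separate parallel pipeline listening on its own port (the host and port values are placeholders, not from the original post):

```yaml
# filebeat.yml (sketch, placeholder host/ports):
# two Logstash endpoints, each backed by its own pipeline
# listening on a distinct beats port
output.logstash:
  hosts: ["logstash-host:5044", "logstash-host:5045"]
  loadbalance: true
```

With loadbalance set to true, Filebeat distributes batches of events across all listed hosts instead of picking only one.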
Thanks, this helped! It almost doubled our Events Received Rate (/s) and Events Emitted Rate (/s) when we added two more pipelines doing the exact same thing.
CPU utilization went from roughly 40% to 50%. I think we could add even more pipelines, but for now let’s see what happens in the long run.
This should be added to the tuning guide here: https://www.elastic.co/guide/en/logstash/current/tuning-logstash.html and here: https://www.elastic.co/guide/en/logstash/current/performance-troubleshooting.html
Just to explain for anyone reading this:
What we did was create 2 extra beats pipelines on the same Logstash machine.
1. Add 2 additional pipelines in pipelines.yml:
- pipeline.id: beats
  path.config: "/etc/path/to/beats.config"
- pipeline.id: beats2
  path.config: "/etc/path/to/beats2.config"
- pipeline.id: beats3
  path.config: "/etc/path/to/beats3.config"
2. Copy beats.config to beats2.config and beats3.config, where we only modified the (listening) port for each pipeline (so 3 different ports). The rest (filter, output) was untouched. The beatsN.config change:
input {
  beats {
    port => XXXXn
  }
}
3. Modify the Winlogbeat config on the machines where we collect logs, so that it outputs to two pipelines (ports) on the same Logstash machine, and enable load balancing. The winlogbeat.yml change:
output.logstash:
  hosts: ["x.x.x.x:xxxn", "x.x.x.x:xxxn+1"]
  loadbalance: true
Eventually we ended up creating 5 parallel pipelines to triple our throughput.
More importantly, I would like to point people toward pipeline-to-pipeline communication. Sending logs from many different sources to one loaded pipeline is a challenge, but splitting them up into different indices AND keeping your conf files clean and sorted is another. So people should definitely look into pipeline-to-pipeline communication when they connect lots of Beats agents. (Send tags with the Beats client to identify every category/type of log source.)
https://www.elastic.co/guide/en/logstash/current/pipeline-to-pipeline.html
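For anyone exploring that route, here is a minimal sketch of the distributor pattern from that page, combined with the tag-based routing mentioned above. The pipeline IDs, port, tag name, and file paths are made-up examples, not from the original setup:

```yaml
# pipelines.yml (sketch, placeholder names/paths):
# one intake pipeline receives all Beats traffic and fans events
# out to per-source pipelines over internal pipeline addresses
- pipeline.id: beats-intake
  config.string: |
    input { beats { port => 5044 } }
    output {
      if "windows" in [tags] {
        pipeline { send_to => ["windows-logs"] }
      } else {
        pipeline { send_to => ["other-logs"] }
      }
    }
- pipeline.id: windows-logs
  path.config: "/etc/logstash/conf.d/windows.conf"
- pipeline.id: other-logs
  path.config: "/etc/logstash/conf.d/other.conf"
```

Each downstream config then reads from its internal address, e.g. windows.conf would start with: input { pipeline { address => "windows-logs" } }. This keeps the filter/output logic for each log source in its own conf file.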