Enriching data with the Logstash translate filter

Introduction Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to one or more outputs. One use of Logstash is for enriching data before sending it to Elasticsearch. Logstash supports several different lookup plugin filters that can be used for enriching data. Many of these rely on components that are external to the Logstash pipeline for storing enrichment data. On the other hand, the translate filter plugin can be used for looking up data and enriching documents without dependencies. Therefore, in this blog article I focus on using Logstash with the translate filter plugin for enriching data. ...

March 6, 2020

How to create maintainable and reusable Logstash pipelines

This article is available at: https://www.elastic.co/blog/how-to-create-maintainable-and-reusable-logstash-pipelines

February 26, 2020

Using Logstash and Elasticsearch scripted upserts to transform eCommerce purchasing data

Introduction Logstash is a tool that can be used to collect, process, and forward events to Elasticsearch. In order to demonstrate the power of Logstash when used in conjunction with Elasticsearch’s scripted upserts, I will show you how to create a near-real-time entity-centric index. Once data is transformed into an entity-centric index, many kinds of analysis become possible with simple (cheap) queries rather than more computationally intensive aggregations. As a note, using the approach demonstrated here would result in documents similar to those generated by Elasticsearch transforms. Nevertheless, the technique that is documented has not been benchmarked against Elasticsearch transforms, as the main goal of this blog is to demonstrate the power and flexibility of Logstash combined with scripted upserts. ...

December 17, 2019

Emulating transactional functionality in Elasticsearch with two-phase commits

Introduction Elasticsearch supports atomic create, update, and delete operations at the individual document level, but does not have built-in support for multi-document transactions. Although Elasticsearch does not position itself as a system of record for storing data, in some cases it may be necessary to modify multiple documents as a single cohesive unit. Therefore, in this blog post we present a two-phase commit protocol which can be used to emulate multi-document transactions. ...

December 5, 2019

Converting local time to ISO 8601 time in Elasticsearch

This article is available at: https://www.elastic.co/blog/converting-local-time-to-iso-8601-time-in-elasticsearch

October 16, 2019

ES Local Indexer - using Elasticsearch for searching locally stored documents

Moved to: https://alexmarquardt.com/es-local-indexer-desktop-search-built-with-elasticsearch/

August 8, 2019

ES Local Indexer – Desktop search powered by Elasticsearch

August 7, 2019 Introduction Elasticsearch provides search functionality for some of the most important websites in the world, including Wikimedia (i.e. Wikipedia), eBay, Yelp, Tinder, and many others. Elasticsearch is super scalable, which means that just as easily as it can be scaled up for use in huge, complex systems, it can also be scaled down for use in smaller projects. ES Local Indexer is a small desktop search application that runs on top of a local Elasticsearch installation. It indexes HTML documents into Elasticsearch and provides an intuitive browser-based interface for searching through the ingested documents. The ES Local Indexer project consists of two main components: ...

August 8, 2019

Counting unique beats agents sending data into Elasticsearch

Introduction When using Beats with Elasticsearch, it may be useful to keep track of how many unique agents are sending data into an Elasticsearch cluster, and how many documents each agent is submitting. Such information could be useful, for example, for detecting whether Beats agents are behaving as expected. In this blog post, I first discuss how to efficiently specify a filter for documents corresponding to a particular time range, followed by several methods for detecting how many Beats agents are sending documents to Elasticsearch within the specified time range. ...

July 18, 2019

Improving the performance of Logstash persistent queues

This article is available at: https://www.elastic.co/blog/using-parallel-logstash-pipelines-to-improve-persistent-queue-performance

June 15, 2019

Debugging Elasticsearch and Lucene with IntelliJ IDEA

This article can be found at: https://www.elastic.co/blog/how-to-debug-elasticsearch-source-code-in-intellij-idea

February 2, 2019