Ranking by Profit and Popularity in Elasticsearch
Moved to Elastic’s blog - https://www.elastic.co/search-labs/blog/function-score-query-boosting-profit-popularity-elasticsearch
Whether you’re looking for a product in an online store, an article in a news archive, or a file in a company knowledge base, the quality of the search experience determines how quickly you find what you need. Behind the scenes, many of these systems are powered by Elasticsearch, a popular open-source search engine designed to handle large volumes of data and return relevant results in milliseconds. At its core, Elasticsearch matches user queries against text fields and ranks results using relevance scoring. But search doesn’t have to stop there. That’s where personalization comes in. By incorporating signals such as past purchases, browsing behavior, or recent activity, search results can be adjusted so the items most relevant to you appear higher. For example, if two people both search for “chips”, one might see classic potato chips at the top, while the other sees crispy thin-cut chips, depending on their history. ...
This article has now been published on Elastic’s website. Please check it out at: https://www.elastic.co/search-labs/blog/efficient-bitwise-matching-in-elasticsearch
Introduction
Kibana is an open-source data visualization and exploration tool used for log and time-series analytics, application monitoring, and operational intelligence use cases. It offers powerful and easy-to-use features that allow users to visualize data from Elasticsearch in various formats such as charts, tables, and maps. While Kibana offers a robust user interface for managing many tasks, certain operations can become tedious and time-consuming when done manually, especially for operations teams managing large and complex environments. One such operation is the migration of Kibana spaces and objects between environments, a task that can be critical in scenarios where clients cannot utilize the snapshot/restore functionality provided by Elasticsearch. ...
Introduction
Elasticsearch Time Series Data Streams (TSDS) are designed to provide an efficient and scalable way to handle time-based data within the Elasticsearch ecosystem. This feature is specifically optimized for storing, searching, and managing time-series data such as metrics and events, where data is continuously indexed in chronological order. However, if events arrive with timestamps that fall outside of a pre-defined range, they will be lost. In this blog I will demonstrate logic that can be added to an Elasticsearch ingest pipeline to intercept documents that would be rejected by the TSDS index due to timestamp range issues, and to instead redirect them to a “failed” index. The redirected documents can then be examined and used, for example, to raise alerts. ...
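A minimal sketch of the rerouting idea (the pipeline name, the `failed-events` index, and the hard-coded date window below are all illustrative assumptions, not taken from the full post, where the accepted range would normally be computed relative to the current time): a script processor parses `@timestamp` and, when it falls outside the accepted range, overwrites `ctx._index` so the document lands in the failed index instead of the TSDS.

```json
PUT _ingest/pipeline/tsds-timestamp-guard
{
  "processors": [
    {
      "script": {
        "description": "Redirect out-of-range documents to a failed index",
        "source": "ZonedDateTime ts = ZonedDateTime.parse(ctx['@timestamp']); if (ts.isBefore(ZonedDateTime.parse(params.start)) || ts.isAfter(ZonedDateTime.parse(params.end))) { ctx._index = 'failed-events'; }",
        "params": {
          "start": "2024-01-01T00:00:00Z",
          "end": "2024-12-31T23:59:59Z"
        }
      }
    }
  ]
}
```

Setting `ctx._index` inside an ingest script is what performs the redirect; documents whose timestamps fall inside the window pass through untouched.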
Introduction
Logstash is commonly used for transforming data before it is sent to another system for storage, and so it is often well positioned for finding and replacing sensitive text, as may be required for GDPR compliance. In this blog I therefore show how Logstash can make use of a ruby filter to scan through the contents of an event and replace each occurrence of sensitive text with the value of its hash. ...
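As a rough illustration of the approach (the SSN-like pattern and the SHA-256 digest below are assumptions for the sketch, not necessarily the choices made in the full post), a ruby filter can walk an event's fields and substitute a hash for each match:

```
filter {
  ruby {
    code => '
      require "digest"
      # Illustrative pattern matching SSN-like values; adapt to your data
      pattern = /\b\d{3}-\d{2}-\d{4}\b/
      event.to_hash.each do |field, value|
        next unless value.is_a?(String)   # skips @timestamp and other non-string fields
        next if field.start_with?("@")    # leave Logstash metadata fields alone
        event.set(field, value.gsub(pattern) { |m| Digest::SHA256.hexdigest(m) })
      end
    '
  }
}
```

Hashing rather than simply redacting keeps equal values mapped to equal tokens, so the replaced text can still be grouped and counted downstream without exposing the original.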
This is now published on Elastic’s official blog. Please check it out at: https://www.elastic.co/blog/improve-search-relevance-by-combining-elasticsearch-stemmers-and-synonyms
Introduction
When driving data into Elasticsearch from Filebeat, the default behaviour is for all data to be sent into the same destination index regardless of the source of the data. This may not always be desirable, since data from different sources may have different access requirements, different retention policies, or different ingest processing requirements. In this post, we’ll use Filebeat to send data from separate sources into multiple indices, and then we’ll use index lifecycle management (ILM), legacy index templates, and a custom ingest pipeline to further control that data. ...
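A minimal sketch of the per-source routing idea in filebeat.yml (the log paths, tag values, and index names below are illustrative assumptions): each input tags its events with a custom field, and the Elasticsearch output routes on that tag via conditional `indices` entries.

```yaml
filebeat.inputs:
  - type: log
    paths: ["/var/log/app-a/*.log"]
    fields:
      source_app: app-a          # tag events from this input
  - type: log
    paths: ["/var/log/app-b/*.log"]
    fields:
      source_app: app-b

output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "filebeat-other-%{+yyyy.MM.dd}"   # fallback for unmatched events
  indices:
    - index: "app-a-%{+yyyy.MM.dd}"
      when.equals:
        fields.source_app: "app-a"
    - index: "app-b-%{+yyyy.MM.dd}"
      when.equals:
        fields.source_app: "app-b"
```

Note that once a custom index name is used, Filebeat also requires `setup.template.name` and `setup.template.pattern` to be set so that its index template still applies.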
Introduction
In several previous blog posts I have shown how a Painless script can be used to process new documents as they are ingested into an Elasticsearch cluster. In each of these posts I have made use of the simulate pipeline API to test the Painless scripts. While developing such scripts, it may be helpful to use Painless Lab (Beta) in Kibana to debug them. In this blog I will show how to use Painless Lab to develop and debug custom scripts, and then how these can easily be copied into ingest pipelines. ...
Authors: Alexander Marquardt, Honza Kral
Introduction
Painless is a simple, secure scripting language designed specifically for use with Elasticsearch. It is the default scripting language for Elasticsearch and can safely be used for inline and stored scripts. In one of its many use cases, Painless can modify documents as they are ingested into your Elasticsearch cluster. In this use case, you may find that you would like to use Painless to evaluate every field in each document that is received by Elasticsearch. However, because of the hierarchical nature of JSON documents, how to iterate over all of the fields may be non-obvious. ...
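The post's scripts are written in Painless, but the traversal idea is language-agnostic: recurse into objects and arrays, and emit a dotted path for every leaf field. As a rough sketch of that recursion (the sample document and field names are made up for illustration), here it is in Python:

```python
def walk_fields(obj, path=""):
    """Recursively yield (dotted_path, value) for every leaf field in a JSON-like document."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from walk_fields(value, f"{path}.{key}" if path else key)
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from walk_fields(value, f"{path}[{i}]")
    else:
        yield path, obj

doc = {"user": {"name": "alice", "tags": ["a", "b"]}, "age": 30}
fields = dict(walk_fields(doc))
# fields == {"user.name": "alice", "user.tags[0]": "a", "user.tags[1]": "b", "age": 30}
```

In Painless the same shape appears as an `instanceof Map` / `instanceof List` check while walking `ctx`, since an ingested document is exposed to the script as nested maps and lists.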