About this site
This site brings together my technical writing on search, data pipelines, and distributed systems at enterprise scale. The work is informed by real production systems and reflects recurring patterns, trade-offs, and architectural decisions observed across system design, deployment, and long-term operation.
The material emphasizes architectural reasoning for large-scale systems. It examines how relevance models, reliability mechanisms, and ingest pipelines behave under real-world constraints, with attention to system-level decisions that shape scalability, maintainability, and long-term operational effort.
Elastic Stack
Search & Relevance
Influencing BM25 ranking with multiplicative boosting in Elasticsearch | Elasticsearch Labs (Dec 22, 2025)
Boosting e-commerce search by profit and popularity with the function score query in Elasticsearch | Elasticsearch Labs (Dec 17, 2025)
How to improve e-commerce search relevance with personalized cohort-aware ranking | Elasticsearch Labs (Dec 10, 2025)
Personalizing e-commerce search results based on purchase history in Elasticsearch (Sep 12, 2025)
Efficient bitwise matching of documents in Elasticsearch | Elasticsearch Labs (Oct 21, 2024)
Improve search relevance by combining Elasticsearch stemmers and synonyms | Elastic Blog (Jun 23, 2021)
Improving search relevance with boolean queries | Elastic Blog (May 26, 2020)
Performance, Reliability & Failure Analysis
How excessive replica counts can degrade performance, and what to do about it | Elasticsearch Labs (Dec 8, 2025)
Understanding and fixing “too many script compilations” errors in Elasticsearch (Oct 21, 2020)
Using slow logs in Elastic Cloud Enterprise (Apr 26, 2020)
Using parallel Logstash pipelines to improve persistent queue throughput | Elastic Blog (Nov 14, 2019)
Improving the performance of high-cardinality terms aggregations | Elastic Blog (May 9, 2019)
How to tune Elasticsearch for aggregation performance (Oct 2, 2018)
Ingest Architecture
Re-directing Elasticsearch documents with out-of-range timestamps that (would) fail to get written into Time Series Data Streams (Apr 16, 2024)
Driving Filebeat data into separate indices (uses legacy index templates) (Mar 15, 2021)
How to create maintainable and reusable Logstash pipelines | Elastic Blog (Feb 27, 2020)
Using Logstash to Split Data and Send it to Multiple Outputs | Elastic Blog (Jan 15, 2019)
Data Parsing & Structuring
Using Kibana’s Painless Lab (Beta) to test an ingest processor script (Nov 9, 2020)
Using Elasticsearch Painless scripting to recursively iterate through JSON fields (Nov 6, 2020)
Debugging broken grok expressions in Elasticsearch ingest processors | Elastic Blog (Sep 3, 2020)
Slow and steady: How to build custom grok patterns incrementally | Elastic Blog (Aug 26, 2020)
Structuring Elasticsearch data with grok on ingest for faster analytics | Elastic Blog (Jul 30, 2020)
Converting CSV to JSON in Filebeat (Mar 17, 2020)
Using Logstash prune capabilities to whitelist sub-documents (Aug 28, 2018)
Data Derivation & Enrichment
Using Logstash to scan inside event contents and to replace sensitive data with a consistent hash (Jan 20, 2022)
Using Logstash and Elasticsearch to calculate transaction duration in a microservices architecture (Sep 16, 2020)
How to enrich logs and metrics using an Elasticsearch ingest node | Elastic Blog (May 12, 2020)
Enriching data with the Logstash translate filter (Mar 6, 2020)
Using Logstash and Elasticsearch scripted upserts to transform eCommerce purchasing data (Dec 17, 2019)
End-to-End Systems & Reference Implementations
ES Local Indexer – Desktop search powered by Elasticsearch (Aug 7, 2019)
How to keep Elasticsearch synchronized with a relational database using Logstash | Elastic Blog (Jun 20, 2019)
Operational Foundations
Safely sample production data into pre-production environments with Logstash | Elastic Blog (Oct 1, 2024)
Automating the Import and Export of Kibana Saved Objects (May 3, 2024)
Calculating ingest lag and storing ingest time in Elasticsearch to improve observability | Elastic Blog (Jun 16, 2020)
Converting local time to ISO 8601 time in Elasticsearch | Elastic Blog (Nov 7, 2019)
Counting unique beats agents sending data into Elasticsearch (Jul 18, 2019)
Correctness, Consistency & Security
Emulating transactional functionality in Elasticsearch with two-phase commits (Dec 5, 2019)
Elasticsearch Security: Configure TLS/SSL & PKI Authentication | Elastic Blog (Dec 12, 2018)
How to Find and Remove Duplicate Documents in Elasticsearch | Elastic Blog (Dec 11, 2018)
System Behavior & Advanced Techniques
Using Elastic machine learning to detect anomalies in derivative values (Apr 21, 2020)
How to Debug Elasticsearch Source Code in IntelliJ IDEA | Elastic Blog (Feb 14, 2019)
Airbyte
Using the new Airbyte API to orchestrate Airbyte Cloud with Airflow (Mar 2, 2023)
The difference between Airbyte and Airflow (Feb 24, 2023)
Learn how to create an Airflow DAG (directed acyclic graph) that triggers Airbyte synchronizations (Feb 8, 2023)
You have collected unstructured data! Now what? (Jan 11, 2022)
Data Warehouse vs. Operational Database! What? How? Which One? (Dec 16, 2022)
What is an ELT data pipeline? (Nov 17, 2022)
EtLT for improved GDPR compliance (Oct 20, 2022)
An overview of Airbyte’s replication modes (Oct 7, 2022)
Explore Airbyte’s Change Data Capture (CDC) synchronization (Sep 29, 2022)
Explore Airbyte’s incremental data synchronization (Sep 8, 2022)
Explore Airbyte’s full refresh data synchronization (Aug 2, 2022)
Build a connector to extract data from the Webflow API (Jun 29, 2022)
Data Integration Guide: Techniques, Technologies, and Tools (May 19, 2022)
MongoDB
Trade-offs to consider when storing binary data in MongoDB (Mar 2, 2017)
How to generate unique identifiers for use with MongoDB (Jan 30, 2017)
How to manually perform a point in time restore in MongoDB (Jan 25, 2017)