How to mitigate hangovers

Introduction

Many natural products are purported to prevent hangovers, but it is difficult to find well-funded scientific studies that validate or refute such claims. This is at least in part because few, if any, companies are willing to invest millions of dollars in clinical trials to validate natural products that cannot then be patented and profited from.

In this article, I briefly discuss the main cause of hangovers, and then discuss two natural supplements that can mitigate the negative impacts of drinking alcohol. This article refers to scientific publications wherever possible, but like many other articles on this topic, it is also partly based on pseudo-science and personal experience.

What causes a hangover

In order to understand the cause of a hangover, it is necessary to first understand, at a high level, how alcohol is broken down. As described in this report from the National Institutes of Health, alcohol is broken down in two stages:

  1. Most of the ethanol (aka alcohol) is broken down in the liver, which transforms the ethanol into a toxic compound called acetaldehyde, a known carcinogen.
  2. In a second step, the acetaldehyde from step 1 is further metabolised by the liver into another, less active byproduct called acetate. This acetate is then broken down into water and carbon dioxide for easy elimination.

As discussed in this article from Scientific American, most hangover symptoms are linked to elevated levels of acetaldehyde, and it specifically states the following:

That dreadful feeling the next day is the condition often called a hangover, which the journal Alcohol and Alcoholism characterizes as “general misery” with symptoms of drowsiness, concentration problems, dry mouth, dizziness, gastrointestinal complaints, sweating, nausea, hyperexcitability and anxiety. Most of these symptoms have been linked to elevated levels of acetaldehyde.

If acetaldehyde is the cause of hangovers, then the natural conclusion is that in order to reduce hangovers, one should try to reduce the amount of acetaldehyde in the body.  This can be achieved in the following ways:

  1. Drink less alcohol.
  2. Improve liver function to speed up the conversion of acetaldehyde into acetate.

For the remainder of this blog, we focus on option #2.

How to reduce acetaldehyde buildup caused by heavy drinking

Given that the breakdown of alcohol occurs primarily in the liver, if one can improve liver function then one could theoretically remove acetaldehyde from one’s system faster, and therefore mitigate the resultant hangover. It turns out that a naturally occurring product that improves liver function exists, and that it is readily and cheaply available. It is called milk thistle, and it has been used for thousands of years for improving liver function.

According to this article from the National Institutes of Health, the following has been shown about milk thistle:

  1. Evidence exists that milk thistle may be hepatoprotective (protects the liver) through a number of mechanisms: antioxidant activity, toxin blockade at the membrane level, enhanced protein synthesis, antifibrotic activity, and possible anti-inflammatory or immunomodulating effects.
  2. Among six studies of milk thistle and chronic alcoholic liver disease, four reported significant improvement in at least one measurement of liver function.

Given that milk thistle has been shown to improve liver function and to protect the liver, it stands to reason that taking it after drinking should help the liver metabolise acetaldehyde. Therefore, if one takes milk thistle after drinking and before going to bed, theoretically the amount of acetaldehyde in the body should be reduced while one sleeps, and the hangover should therefore be reduced.

How to reduce that lethargic feeling after drinking

Alcohol is known to reduce vitamin B concentrations, as reported in this article. Additionally, low vitamin B levels can cause anaemia, as documented in this article. Therefore, taking vitamin B supplements may reduce feelings of lethargy if one’s vitamin B levels have been depleted by excess alcohol consumption.

Conclusions

Two cheap and readily available supplements that can reduce hangovers and the effects of excessive alcohol consumption are milk thistle and vitamin B. If these supplements are taken before going to sleep after a night of drinking, they should help mitigate the hangover that would otherwise have been experienced. It is important to note that these supplements must be taken before going to bed – by morning it may be too late.

Financial implications of exercising share options

Disclaimer

I am not an accountant and this article should not be considered as financial or tax advice. I am providing analysis and calculations which may be used at your own peril. This article is written to demonstrate basic concepts, and does not account for country-specific tax laws or company-specific share option details. Your individual situation may invalidate some or all of the arguments and/or calculations made in this blog post.

Introduction

In many jurisdictions, when share options are exercised, the income from the exercise is taxed at the normal income tax rate. On the other hand, gains on shares (not options) are generally taxed at the more favourable capital gains rate. One might therefore assume that it is beneficial from a taxation perspective to convert company-issued share options into shares. In this blog we analyse the total profit generated by different strategies for when and how share options are exercised, and demonstrate that holding on to share options for as long as possible is likely a better strategy than converting options to shares.

The purpose of this blog is to demonstrate a thought process and methodology that can then be extended to account for country-specific tax rates, and country-specific tax benefits. It does not claim to give a universal answer on whether share options should be exercised to convert into shares.

What is a share option

A share option (aka stock option) is the right to buy shares in a company at a fixed price. For example, if one has been issued 1000 share options with a strike price of $10, this means that at some point in the future one can buy up to 1000 shares for $10 each. If those shares are then trading at $100, then each share option would be worth $90 – the market price at that time minus the strike price. Another way of thinking about this is that if someone can buy something for $10 that they can immediately sell for $100, then whatever they are buying is worth $90. In our example, the 1000 share options would therefore have a value of $90,000.

Assumptions

In the remainder of this blog there are some simplifying assumptions made:

  1. The value of shares of the company that you have share options for will continue to increase.
  2. Tax is incurred when a share option is exercised.
  3. The rate of taxation on gains from a share exercise is higher than the rate of taxation on capital gains.

Country-specific tax considerations and/or benefits are out of scope of this blog.

What does it mean to exercise a share option

Exercising a share option refers to the act of paying the strike price to convert the share option into a share. Continuing with the previous example, we saw that 1000 options to buy shares for $10 would have a value of $90,000, assuming that the shares are currently worth $100 each. However, if we go ahead and actually exercise those options, then we have triggered a taxable event of $90,000.

The $90,000 benefit from exercising the share options would generally be considered income, and will normally be taxed at one’s standard income tax rate. For example, if one is in the 40% income tax bracket then one would have to pay $36,000 in tax in order to be allowed to hold on to one’s shares. In order to convert these share options into shares, the total cost is the strike price of $10,000 plus the $36,000 of income tax. After paying $46,000 one would have shares worth $100,000. One is therefore $54,000 ahead compared to if one had not been granted the stock options.

In order to avoid paying cash out-of-pocket to exercise the options, one also has the option of a cashless exercise – this is where the money to pay the tax and the exercise price is raised by exercising and immediately selling a portion of one’s options to cover the strike price and taxes. In the above example, one would need $46,000 to cover the exercise price plus tax, which can be raised by selling $46,000/$100 = 460 shares. In this approach, one would be left with 540 shares worth $100 each. As expected, in this scenario one is also $54,000 ahead compared to if one had not been granted stock options.
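As a quick check of the arithmetic above, the following minimal Python sketch reproduces the cashless exercise example (the function and variable names are my own, and the numbers are purely illustrative):

def cashless_exercise(n_options, strike, share_price, income_tax_rate):
    """Return the number of shares kept after a cashless exercise, and their value."""
    taxable_income = n_options * (share_price - strike)  # 1000 * (100 - 10) = 90,000
    tax = taxable_income * income_tax_rate                # 36,000 at a 40% tax rate
    exercise_cost = n_options * strike                    # 10,000
    shares_sold = (tax + exercise_cost) / share_price     # 46,000 / 100 = 460 shares
    shares_kept = n_options - shares_sold                 # 540 shares remain
    return shares_kept, shares_kept * share_price         # worth 54,000 at $100 per share

print(cashless_exercise(1000, 10, 100, 0.40))  # (540.0, 54000.0)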

Is it a good idea to exercise share options as soon as possible?

If one expects the share price to continue to rise, then one may be tempted to exercise all share options to convert into shares, to take advantage of the lower tax rate applied to capital gains. In this section, we consider why this is unlikely to be a good strategy.

Let’s imagine a scenario where the share price continues to rise to $1000, and we are still employed by the same company. Would we be farther ahead financially if we had exercised our options at $100 and later sold the shares at $1000, or if we had held on to the original share options and finally exercised and sold at $1000? The table below shows the difference between the two scenarios. The original formulas can be seen in the first tab of this spreadsheet.

Assumed income tax rate: 40%   <– assumes that taxes are due on exercise (may be country specific)
Assumed capital gains tax rate: 18%   <– even if zero, in the calculations below, holding options until sale is preferable
Assumed number of share options: 1,000

                                                      Cashless exercise   Hold and exercise at
                                                      and hold            time of final sale
Per share exercise price                              10                  10
Per share value at exercise                           100                 1,000
Per share value at sale                               1,000               1,000
Taxable income at exercise                            90,000              990,000
Exercise cost (strike price * number of shares)       10,000              10,000
Tax on exercise (income tax rate * taxable income)    36,000              396,000
Cost of cashless exercise (tax + exercise price)      46,000              406,000
Number of shares to sell to pay cost of exercise      460                 406
Number of shares owned after exercise                 540                 594
Capital gain                                          486,000             0
Tax on capital gain                                   87,480              0
Pre-tax gain                                          576,000             990,000
Total tax (income tax + capital gain tax)             123,480             396,000
Total profit after tax                                452,520             594,000

Notice that holding onto the share options for as long as possible has resulted in a greater profit than exercising the share options earlier and benefitting from the lower capital gains rate. In fact, even if the capital gains rate is set to zero, in a scenario where the future value of the shares is $1000, it is still more profitable to hold on to the share options rather than exercising them to convert them to shares.

This happens because in many countries, at the moment that one exercises share options, one incurs an immediate tax liability. Additionally, one also has to pay the strike price. In our example of a cashless exercise of 1000 options at $100, the number of shares left after paying the exercise costs is only 540 – and therefore instead of enjoying growth on 1000 share options, one would only enjoy growth on 540 shares. Because of this reduction in the number of shares after a cashless exercise, the amount of growth is dramatically reduced compared to the growth that would have been experienced if the original options had been held.

On the other hand, the tax has indeed been dramatically reduced by performing a cashless exercise at $100 rather than waiting and exercising at $1000. However, the amount of tax savings does not compensate for the lost growth.
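To make the comparison concrete, here is a minimal Python sketch that reproduces the bottom line of the table above. It is a simplification under the same assumptions (tax due at exercise, no country-specific rules), and the function names are my own:

def exercise_early_then_hold(n_options, strike, exercise_price, sale_price,
                             income_tax_rate, cgt_rate):
    """Cashless exercise at exercise_price, hold the remaining shares, sell at sale_price."""
    cost = n_options * (exercise_price - strike) * income_tax_rate + n_options * strike
    shares_kept = n_options - cost / exercise_price               # 540 shares in the example
    cgt = shares_kept * (sale_price - exercise_price) * cgt_rate  # 87,480
    return shares_kept * sale_price - cgt                         # after-tax profit

def hold_options_until_sale(n_options, strike, sale_price, income_tax_rate):
    """Exercise and sell in a single step at sale_price; the entire gain is taxed as income."""
    gain = n_options * (sale_price - strike)                      # 990,000
    return gain * (1 - income_tax_rate)                           # after-tax profit

print(exercise_early_then_hold(1000, 10, 100, 1000, 0.40, 0.18))  # 452520.0
print(hold_options_until_sale(1000, 10, 1000, 0.40))              # 594000.0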

The above may lead one to conclude that they should therefore use cash to exercise their share options. We will investigate this in the next section.

Should share options be exercised with cash?

If one has extra cash around, they may believe it would be a good idea to use cash to pay the cost of exercising their share options, rather than executing a cashless exercise and hold. This is likely true, and if the share price continues to rise then it would result in more profit than a cashless exercise and hold.

However, if one believes in their company enough to wish to invest cash, then they should consider if it is best to use that cash to exercise their existing share options to convert them to shares, or if it would instead be better to just buy additional shares on the open market with that cash. We therefore compare these two scenarios in the table below. Original calculations can be seen in the second tab of this spreadsheet.

Assumed income tax rate: 40%   <– assumes that taxes are due on exercise (may be country specific)
Assumed capital gains tax rate: 18%   <– even if zero, in the calculations below, using cash to buy more shares is better than using it to exercise
Assumed number of share options: 1,000

                                                      Pay cash to         Wait until sale to exercise,
                                                      exercise            and instead buy more shares
Per share exercise price                              10                  10
Per share value at exercise                           100                 1,000
Per share value at sale                               1,000               1,000
Taxable income at exercise                            90,000              990,000
Exercise cost (strike price * number of shares)       10,000              10,000
Tax on exercise (income tax rate * taxable income)    36,000              396,000
Cost of cash exercise (tax + exercise price)          46,000              0
Cash paid to purchase more shares                     0                   46,000
Sale value of additionally purchased shares           0                   460,000
Capital gain                                          900,000             414,000
Tax on capital gain                                   162,000             74,520
Pre-tax gain                                          990,000             1,404,000
Total tax (income tax + capital gain tax)             198,000             470,520
Total profit after tax                                792,000             933,480

Notice that the above calculations demonstrate that it is more beneficial to buy shares on the open market rather than using that same cash to exercise and hold shares. This is true even if the capital gains rate is zero. Again, this is due to the loss of future growth on any amount that has been paid in tax as well as any future growth on the cash that was used to pay the strike price.
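The second comparison can be sketched in the same way. Again, this is a simplification under the stated assumptions, with hypothetical function names, but it reproduces the totals in the table above:

def pay_cash_to_exercise(n_options, strike, exercise_price, sale_price,
                         income_tax_rate, cgt_rate):
    """Use cash to exercise at exercise_price, keep all the shares, sell at sale_price."""
    income_tax = n_options * (exercise_price - strike) * income_tax_rate  # 36,000
    cgt = n_options * (sale_price - exercise_price) * cgt_rate            # 162,000
    return n_options * (sale_price - strike) - income_tax - cgt           # 792,000

def hold_options_and_buy_shares(n_options, strike, exercise_price, sale_price,
                                income_tax_rate, cgt_rate, cash):
    """Hold the options until sale, and invest the same cash in shares at exercise_price."""
    option_profit = n_options * (sale_price - strike) * (1 - income_tax_rate)     # 594,000
    extra_shares = cash / exercise_price                                          # 460 shares
    extra_profit = extra_shares * (sale_price - exercise_price) * (1 - cgt_rate)  # 339,480
    return option_profit + extra_profit                                           # 933,480

print(pay_cash_to_exercise(1000, 10, 100, 1000, 0.40, 0.18))                # 792000.0
print(hold_options_and_buy_shares(1000, 10, 100, 1000, 0.40, 0.18, 46000))  # 933480.0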

Caveats

In pre-IPO companies where it is not possible to allocate cash to buying additional shares, it is likely beneficial from a tax perspective to early exercise such share options to convert them to shares. This is because the alternative of buying additional shares on the open market is not an option.

Share options generally expire if one leaves their employer. Therefore share options should be exercised before they expire and become worthless.

Share options likely expire a certain number of years after their grant, and should be exercised before they expire. The following article provides additional information related to exercising stock options: https://kellblog.com/2019/08/18/avoiding-the-ten-year-stock-option-trap/.

Conclusions

Based on the above calculations, if one wants to maintain their investment in their publicly listed company, and has the expectation that the company’s stock price will go up, then it is generally financially beneficial to hold on to stock options for as long as possible rather than exercising those options to convert them to stock. If one wishes to invest additional cash, then that cash is likely better allocated to buying additional shares rather than to exercising options.

Disclaimer: this should not be considered as financial or tax advice, and your individual tax circumstances may differ. In the above calculations we disregard any country-specific tax laws that may increase the attractiveness of exercising share options.

 

Counting unique Beats agents sending data into Elasticsearch

Introduction

When using Beats with Elasticsearch, it may be useful to keep track of how many unique agents are sending data into an Elasticsearch cluster, and how many documents each agent is submitting. Such information could, for example, be useful for detecting whether Beats agents are behaving as expected.

In this blog post, I first discuss how to efficiently specify a filter for documents corresponding to a particular time range, followed by several methods for detecting how many beats agents are sending documents to Elasticsearch within the specified time range.

How to filter for documents in a specific time range

This section describes how to efficiently filter for documents from a particular time range. In the following example, we filter for documents that were received yesterday:

GET filebeat-*/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "now-1d/d",
            "lt" : "now/d"
          }
        }
      }
    }
  },
  "sort": [{"@timestamp": "desc"}]
}

Notice that we use the filter context for the range query. Using the filter context is efficient for two reasons:

  1. Operations inside a filter context must answer a yes/no question – either documents fall into the time range or they do not. Because this is a yes/no question, a _score is not computed when filtering documents like this.
  2. The results of a filter can be efficiently cached by the Node Query Cache, which “caches queries which are being used in a filter context”.

It is worth highlighting that if the parameters inside the filter are different on each query, then the results of the filter cannot be efficiently cached. This would be the case if the range that is being queried is continually changing. This may unintentionally occur if “now” is used inside a range query without any rounding.

In the above example we ensure that the filter can be cached by using date math to round the range that we are searching to the nearest day (as indicated by the “/d”). Compare this to the following query, which would give us all documents in the 24 hours prior to the current moment.

GET filebeat-*/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "now-1d",
            "lt" : "now"
          }
        }
      }
    }
  },
  "sort": [{"@timestamp": "desc"}]
}

Note that the above filter cannot be cached because “now” changes every millisecond.

A middle-ground may be to round to the nearest hour to allow the filter to be cached most of the time, except once per hour when the range is modified. Rounding to the nearest hour could be done as follows:

GET filebeat-*/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "now-1d/h",
            "lt" : "now/h"
          }
        }
      }
    }
  },
  "sort": [{"@timestamp": "desc"}]
}

Now that we have covered how to efficiently query for documents in a particular time range, we are ready to demonstrate how to count the number of unique Beats agents that are submitting documents to Elasticsearch.

A basic query to get a count of unique agents

To get a count of unique Beats agents, we can use a cardinality aggregation as shown below.

POST filebeat-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "now-1d/d",
            "lt" : "now/d"          
          }
        }
      }
    }
  },
  "aggs" : {
      "unique_agent_id_count" : {
          "cardinality" : {
              "field" : "agent.id.keyword",
              "precision_threshold": 500 
          }
      }
  }
}

Note that we first filter documents by time (in this case, documents from yesterday), and then execute the cardinality aggregation on the filtered set of documents. Also notice that the size is set to 0 – this tells Elasticsearch that we are not interested in seeing the actual documents that match the range query; we just want the results of the cardinality aggregation computed across those documents.
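The part of the response that we care about is the aggregations section, which should look roughly like the following (the value shown here is purely illustrative, and the cardinality is approximate for very high counts):

{
  ...
  "aggregations" : {
    "unique_agent_id_count" : {
      "value" : 42
    }
  }
}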

Get an example document from each agent using field collapsing

The example below demonstrates how to use field collapsing to return the _source of a single document corresponding to each beats agent that submitted a document yesterday. Be aware that by default a search will only return 10 hits. In order to see all documents that match a given query the size should be increased, or if a large number of results are expected then pagination techniques should be used. In the example below we have set the size to 100, which will return up to 100 unique agents.

GET filebeat-*/_search
{
  "size" : 100,
  "query": {
    "bool": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "now-1d/d",
            "lt": "now/d"
          }
        }
      }
    }
  },
  "collapse": {
    "field": "agent.id.keyword"
  },
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}

Get an example document from each agent using a terms aggregation and top hits

We can use a terms aggregation and top hits aggregation to get each unique agent as well as a count of the number of documents submitted from each unique agent. Be aware that this code is likely less efficient than the above and may not be practical if a very large number of agents are reporting into Elasticsearch.

GET filebeat-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "now-1d/d",
            "lt" : "now/d"
          }
        }
      }
    }
  },
  "aggs": {
    "unique_agents": {
      "terms": {
        "field": "agent.id.keyword",
        "size": 500,
      },
      "aggs": {
        "get_a_single_doc_for_each_unique_id": {
          "top_hits": {
            "sort": [
              {
                "@timestamp": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}

There are three “size” settings in the code above:

  1. We have set the size to 0 for the query results – this just means that we don’t return the documents that match the range query, as we are only interested in the results of the aggregation.
  2. A terms aggregation by default will only return the top 10 hits. In the example above we have increased the size of the terms aggregation to 500. Be careful, as setting this to a very large value to handle a very large number of agents may be slow. For a very large number of agents, terms aggregations may become infeasible.
  3. Inside the top hits aggregations, we have specified a size of 1, meaning that a single document will be returned for each term.

Conclusion

In this blog, we have demonstrated how to efficiently filter for documents within a time range, followed by several methods for determining how many unique Beats agents are submitting documents to an Elasticsearch cluster.

Improving the performance of Logstash persistent queues

By default, Logstash uses in-memory bounded queues between pipeline stages (inputs → pipeline workers) to buffer events. However, in order to protect against data loss during abnormal termination, Logstash has a persistent queue feature which can be enabled to store the message queue on disk. The queue sits between the input and filter stages as follows:

input → queue → filter + output
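For reference, a persistent queue is enabled with settings along the following lines in logstash.yml (or per pipeline in pipelines.yml); the path and size shown here are illustrative:

queue.type: persisted
path.queue: /path/to/queue    # optional, defaults to a queue directory under path.data
queue.max_bytes: 4gb          # optional upper bound on the disk space used by the queue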

According to the following blog post, Logstash persistent queues should have a small impact on overall throughput. While this is likely true for use cases where the pipeline is CPU bound, it is not always the case.

Recently, while consulting at a client for Elastic, I found that enabling Logstash persistent queues caused a slowdown of about 75%, from about 40K events/s down to about 10K events/s. Somewhat surprisingly, disk I/O metrics made it clear that the disks were not saturated. Additionally, standard Logstash tuning techniques, such as testing different batch sizes and adding more worker threads, were unable to remedy this slowdown.

Investigation showed that throughput was limited because a single Logstash pipeline runs a single-threaded persistent queue, or to put it another way, a single Logstash pipeline only writes queue data to disk from a single thread. This is true even if that pipeline has multiple inputs, as additional inputs in a single pipeline do not add disk I/O threads. Furthermore, because enabling the persistent queue adds synchronous disk I/O (wait time) into the pipeline, it reduces throughput even if none of the resources on the system are maxed out.

Given that Logstash throughput was limited by synchronous disk I/O rather than by resource constraints, the solution was to drive the disks from more threads running in parallel in order to increase overall throughput. This was accomplished by splitting the source data into multiple streams, running multiple pipelines in parallel within a single Logstash process, and targeting each stream at a different pipeline. After increasing the number of pipelines to 4 and splitting the input data across these 4 pipelines, Logstash performance with persistent queues increased to about 30K events/s, or only 25% worse than without persistent queues. At this point the disks were saturated, and no further performance improvements were possible.
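As a rough sketch of this setup (the pipeline IDs and config paths below are hypothetical), multiple pipelines are declared in pipelines.yml, each with its own persistent queue, and each stream of source data is pointed at a different pipeline:

- pipeline.id: ingest_1
  path.config: "/etc/logstash/conf.d/ingest_1.conf"
  queue.type: persisted
- pipeline.id: ingest_2
  path.config: "/etc/logstash/conf.d/ingest_2.conf"
  queue.type: persisted
# ... and similarly for ingest_3 and ingest_4, with the source data split
# across the four pipelines (for example, by listening on different ports).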

 

 

 

Debugging Elasticsearch and Lucene with IntelliJ IDEA

Now posted on the Elastic blog

January 14, 2018 update: This article has been published on Elastic’s website as: https://www.elastic.co/blog/how-to-debug-elasticsearch-source-code-in-intellij-idea.

Introduction

IntelliJ IDEA is a Java integrated development environment (IDE) for developing computer software. In this blog post, I discuss how to set up an IntelliJ IDEA project that allows interactive debugging of Elasticsearch and Lucene source code.

The instructions presented in this blog have been tested on Mac OSX with IntelliJ IDEA 2018.3 (Community Edition), and OpenJDK 11.

Download Elasticsearch

Get a copy of the Elasticsearch source code from GitHub as follows:

git clone https://github.com/elastic/elasticsearch.git

Check out the branch for the Elasticsearch release that you want to debug.

cd elasticsearch
git checkout --track origin/6.6

Review text files included with the distribution

Within the “elasticsearch” directory, there are several text files that should be reviewed. In particular, “CONTRIBUTING.md” includes a description of the process for importing Elasticsearch code into an IntelliJ IDEA project, and “TESTING.asciidoc” describes ways to build and debug the code. The remainder of this blog post is based on the instructions in these files.

Configure the code for use with IntelliJ IDEA

The build system used by Elasticsearch is Gradle, and at least Java 11 is required to build the Elasticsearch Gradle tools. Before executing gradlew, ensure that your JAVA_HOME environment variable is set correctly. For example, my JAVA_HOME (on OSX) is set as follows:

JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-11.0.2.jdk/Contents/Home
export JAVA_HOME 

Finally, execute the following command to configure Elasticsearch for use in an IntelliJ IDEA project.

./gradlew idea

The above command may take a few minutes to execute, and once it is complete, your project is ready to be imported into IntelliJ IDEA.

Import Elasticsearch into an IntelliJ IDEA project

1. Open IntelliJ IDEA, and if you don’t have any other projects open, you will see a screen that looks like the image below. Click on “Import project”.

 

Screen Shot 2019-02-02 at 8.56.52 PM

2. Open the “elasticsearch” directory that was created by the previously executed “git clone” command.

Screen Shot 2019-02-02 at 9.22.19 PM

3. Select “Import project from external model” -> “Gradle”, and then click on “Next”.

Screen Shot 2019-02-02 at 9.28.06 PM

4. Select “Use default gradle wrapper (recommended)” and set “Gradle JVM” to version 11, as shown below. Then click on “Finish”.

Screen Shot 2019-02-02 at 9.28.36 PM

5. After completing the above steps, IntelliJ IDEA will start building the source code. The IntelliJ IDEA window should look similar to the image below once the build has completed.

Screen Shot 2019-02-02 at 9.36.27 PM

Start Elasticsearch for debugging

One way to debug Elasticsearch is to start the project in debug mode from the command line with the following command:

./gradlew run --debug-jvm

It may take a few minutes for the above process to fully start, at which point you can connect to the process from IntelliJ IDEA  by clicking on “Run” -> “Attach to Process” as shown below:

Screen Shot 2019-02-02 at 9.49.47 PM.png

This will allow you to select the process to attach to, which should look similar to the following:

Screen Shot 2019-02-02 at 9.53.27 PM

You should now be able to set breakpoints and step through both Elasticsearch and Lucene code.

Conclusion

In this blog post, I have demonstrated how to set up a project in IntelliJ IDEA that allows interactive debugging of Elasticsearch and Lucene source code. You are now ready to dig deep into the internal workings of Elasticsearch!

 

A step-by-step guide to enabling security, TLS/SSL, and PKI authentication in Elasticsearch

Now posted on the Elastic blog

December 12, 2018 update: This article has been published on Elastic’s website as: https://www.elastic.co/blog/elasticsearch-security-configure-tls-ssl-pki-authentication

Introduction

When Elasticsearch security is enabled for a cluster that is running with a production license, the use of TLS/SSL for transport communications is obligatory and must be correctly set up. Additionally, once security has been enabled, all communications to an Elasticsearch cluster must be authenticated, including communications from Kibana and/or application servers.

The simplest way that Kibana and/or application servers can authenticate to an Elasticsearch cluster is by embedding a username and password in their configuration files or source code. However, in many organizations, it is forbidden to store usernames and passwords in such locations. In this case, one alternative is to use Public Key Infrastructure (PKI) (client certificates) for authenticating to an Elasticsearch cluster.

Configuring security along with TLS/SSL and PKI can seem daunting at first, and so this blog gives step-by-step instructions on how to: enable security; configure TLS/SSL; set passwords for built-in users; use PKI for authentication; and finally, how to authenticate Kibana to an Elasticsearch cluster using PKI.

Enabling security

In order to enable security it is necessary to have either a Gold or Platinum subscription, or a trial license enabled via Kibana or API. For example, the following command would enable a trial license via the API:

curl -X POST "localhost:9200/_xpack/license/start_trial?acknowledge=true"

Where localhost must be replaced with the name of a node in our Elasticsearch cluster.

After enabling a license, security can be enabled. We must modify the elasticsearch.yml file on each node in the cluster with the following line:

xpack.security.enabled: true

For a cluster that is running in production mode with a production license, once security is enabled, transport TLS/SSL must also be enabled. However, if we are running with a trial license, then transport TLS/SSL is not obligatory.

If we are running with a production license and we attempt to start the cluster with security enabled before we have enabled transport TLS/SSL, we will see the following error message:

Transport SSL must be enabled for setups with production licenses. Please set [xpack.security.transport.ssl.enabled] to [true] or disable security by setting [xpack.security.enabled] to [false]

Configuration of TLS/SSL is covered in the following sections.

TLS/SSL encryption

Elasticsearch has two levels of communications, transport communications and http communications. The transport protocol is used for internal communications between Elasticsearch nodes, and the http protocol is used for communications from clients to the Elasticsearch cluster. Securing these communications will be discussed in the following paragraphs.

Transport TLS/SSL encryption

The transport protocol is used for communication between nodes within an Elasticsearch cluster. Because each node in an Elasticsearch cluster is both a client and a server to other nodes in the cluster, all transport certificates must be both client and server certificates. If TLS/SSL certificates do not have Extended Key Usage defined, then they are already de facto client and server certificates. If transport certificates do have an Extended Key Usage section, which is usually the case for CA-signed certificates used in corporate environments, then they must explicitly enable both clientAuth and serverAuth.
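If you are unsure whether an existing certificate defines an Extended Key Usage section, one way to check (assuming OpenSSL is available and the certificate is in PEM format; the filename below is hypothetical) is the following:

openssl x509 -in transport.crt -noout -text | grep -A1 "Extended Key Usage"

If the certificate has an Extended Key Usage section, the output should list both “TLS Web Server Authentication” and “TLS Web Client Authentication” for it to be usable as a transport certificate; if the section is absent, nothing is printed.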

Note that Elasticsearch comes with a utility called elasticsearch-certutil that can be used for generating self-signed certificates that can be used for encrypting internal communications within an Elasticsearch cluster.

The following commands can be used for generating certificates that can be used for transport communications, as described in this page on Encrypting Communications in Elasticsearch:

bin/elasticsearch-certutil ca
ENTER ENTER
bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
ENTER ENTER ENTER

Once the above commands have been executed, we will have TLS/SSL certificates that can be used for encrypting communications.

The newly created certificates should be copied into a sub-directory called certs located within the config directory. The certificates will then be specified in the elasticsearch.yml file as follows:

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12

Now restart all of the nodes in our Elasticsearch cluster for the above changes to take effect.

Define built-in users’ passwords

We must now define passwords for the built-in users, as described in Setting built-in user passwords. Note that if we are running with a Gold or Platinum license, the previous steps to enable TLS/SSL for transport communications must be executed before the cluster will start. Additionally, note that setting the built-in users’ passwords should be completed before we enable TLS/SSL for http communications, as the command for setting passwords will communicate with the cluster via unsecured http.

Built-in users’ passwords can be set up with the following command:

bin/elasticsearch-setup-passwords interactive

Be sure to remember the passwords that we have assigned for each of the built-in users. We will make use of the elastic superuser to help configure PKI authentication later in this blog.

Http TLS/SSL encryption

For http communications, the Elasticsearch nodes will only act as servers and therefore can use Server certificates —  i.e. http TLS/SSL certificates do not need to enable Client authentication.

In many cases, certificates for http communications would be signed by a corporate CA. It is worth noting that the certificates used for encrypting http communications can be totally independent from the certificates that are used for transport communications.

To reduce the number of steps in this blog, we’ll use the same certificates for http communications as we have already used for the transport communications. These are specified in elasticsearch.yml file as follows:

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.http.ssl.truststore.path: certs/elastic-certificates.p12
xpack.security.http.ssl.client_authentication: optional

Enabling PKI authentication

As discussed in Configuring a PKI Realm, the following must be added to the elasticsearch.yml file to allow PKI authentication.

xpack.security.authc.realms.pki1.type: pki

Combined changes to elasticsearch.yml

Once the above steps have been followed, we should have the following defined in our elasticsearch.yml configuration:

xpack.security.enabled: true

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.http.ssl.truststore.path: certs/elastic-certificates.p12
xpack.security.http.ssl.client_authentication: optional

xpack.security.authc.realms.pki1.type: pki

Note that once the above changes have been made to our elasticsearch.yml file, we will have to restart all of the Elasticsearch nodes in our cluster in order for the changes to take effect.

Creating a client certificate

Certificates that will be used for PKI authentication must be signed by the same CA as the certificates that are used for encrypting http communications. Normally, these would be signed by an official CA within an organization. However, because we have already used a self-signed CA, we also sign our http client certificates with that same self-signed CA, which we previously saved as elastic-stack-ca.p12. We can create a certificate for client authentication as follows:

bin/elasticsearch-certutil cert --ca \
config/certs/elastic-stack-ca.p12 \
-name "CN=something,OU=Consulting Team,DC=mydomain,DC=com"
ENTER
client.p12 ENTER
ENTER

The above will create a file called client.p12, which contains all of the information required for PKI authentication to our Elasticsearch cluster. However, in order to use this certificate it is helpful to break it into its private key, public certificate, and CA certificate. This can be done with the following commands:

Private Key

openssl pkcs12 -in client.p12 -nocerts -nodes  > client.key

Public Certificate

openssl pkcs12 -in client.p12 -clcerts -nokeys  > client.cer

CA Certificate

openssl pkcs12 -in client.p12 -cacerts -nokeys -chain > client-ca.cer

This should produce three files:

  1. client.key —  The private key
  2. client.cer —  The public certificate
  3. client-ca.cer — The CA that signed the public certificate

Create a directory called certs in Kibana’s config directory, and move all of the client certificates there.
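For example (assuming the three files are in the current directory and that KIBANA_HOME points at the root of the Kibana installation; adjust the paths as needed):

mkdir -p "$KIBANA_HOME/config/certs"
cp client.key client.cer client-ca.cer "$KIBANA_HOME/config/certs/"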

Configure Kibana to authenticate to elasticsearch

Now that we have enabled security on the Elasticsearch cluster, communications to the cluster must be authenticated. Therefore, if we plan on using Kibana to interact with the cluster, then we must enable security and configure Kibana to authenticate to the cluster as the kibana user over https. As we have not yet fully set up PKI authentication from Kibana to the Elasticsearch cluster, authentication can initially be done with the following lines in the kibana.yml file:

elasticsearch.url: "https://localhost:9200" #ensure https not http
xpack.security.enabled: true
elasticsearch.username: "kibana"
elasticsearch.password: "our new kibana password here"
elasticsearch.ssl.certificateAuthorities: config/certs/client-ca.cer
elasticsearch.ssl.verificationMode: certificate

Ensure that we change localhost to the name of one of our Elasticsearch nodes, and that the certificates are available in the config/certs directory within the Kibana folder.

Note that the kibana user is like a service account that works behind the scenes to authenticate the Kibana application to the Elasticsearch cluster. We will generally never directly login to the Elasticsearch cluster or into the Kibana UI as the kibana user.

Restart Kibana in order for it to authenticate to the Elasticsearch cluster as the kibana user. We should now be able to log in through the Kibana UI as the elastic built-in superuser.

PKI Authentication

We can use the three new client certificate files to test PKI authentication to the cluster with curl. Open a new terminal and cd to Kibana’s config/certs directory, and use curl to call the authenticate API as shown below.

curl https://localhost:9200/_xpack/security/_authenticate?pretty \
--key client.key --cert client.cer --cacert client-ca.cer -k -v

Be sure to replace localhost with the name of a node in our Elasticsearch cluster and be sure to use https (not http). Also note that the -k option is required as we did not create certificates with the hostnames specified, and therefore hostname verification must be turned off.

The above command should respond with something similar to the following:

{
 "username" : "something",
 "roles" : [ ],
 "full_name" : null,
 "email" : null,
 "metadata" : {
   "pki_dn" : "CN=something, OU=Consulting Team, DC=mydomain, DC=com"
 },
 "enabled" : true
}

Notice that the roles array is currently empty, which means that although we have authenticated to Elasticsearch, we are not authorized to perform any actions. Authentication succeeded because the client certificate that we sent to the cluster was signed by the same CA as the http TLS/SSL certificates used by the Elasticsearch nodes. Now that we are authenticated, we need to authorize this user to be able to do something.

The pki_dn value returned from the authenticate API will be used to configure the roles that will be assigned to this certificate.

Open the Kibana UI and, if we have not already done so, log in as the elastic user. As the elastic user has superuser privileges, it can assign roles to the certificate. Execute the following command from Dev Tools in Kibana, ensuring that the previously returned pki_dn value is copied into the dn field as follows:

PUT _xpack/security/role_mapping/kibana_certificate_authorization
{
 "roles" : [ "kibana_system" ],
 "rules" : { "field" : { "dn" : "CN=something, OU=Consulting Team, DC=mydomain, DC=com" } },
 "enabled": true
}

Now that we have assigned the kibana_system role to this certificate, verify that it is set correctly with another call to the authenticate API:

curl https://localhost:9200/_xpack/security/_authenticate?pretty \
--key client.key --cert client.cer --cacert client-ca.cer -k -v

And we should see the following response, which indicates that we now have the “kibana_system” role assigned to this certificate.

{
 "username" : "something",
 "roles" : [
   "kibana_system"

 ],
 "full_name" : null,
 "email" : null,
 "metadata" : {
   "pki_dn" : "CN=something, OU=Consulting Team, DC=mydomain, DC=com"
 },
 "enabled" : true
}

Using PKI to authenticate Kibana to the Elasticsearch cluster

Now that we have tested our client-side certificate and assigned the kibana_system role to it, we can use this certificate instead of a username and password to authenticate Kibana to Elasticsearch.

Remove the following lines from our kibana.yml file:

elasticsearch.username: "kibana"
elasticsearch.password: "XXXXXX"

Ensure that all relevant certificates are copied to Kibana’s config/certs directory, and add the following lines to our kibana.yml file:

elasticsearch.url: "https://localhost:9200" #ensure https
xpack.security.enabled: true
elasticsearch.ssl.certificate: config/certs/client.cer
elasticsearch.ssl.key: config/certs/client.key
elasticsearch.ssl.certificateAuthorities: config/certs/client-ca.cer
elasticsearch.ssl.verificationMode: certificate

We can now restart Kibana, and it should authenticate to our Elasticsearch cluster, without any need for an embedded username and password!

Conclusion

In this blog post, I have demonstrated how to: enable security; configure TLS/SSL; set passwords for built-in users; use PKI for authentication; and finally, how to authenticate Kibana to an Elasticsearch cluster using PKI.

If you have any questions about PKI authentication with Elasticsearch, or any other Elasticsearch-related topics, have a look at our Discuss forums for valuable discussion,  insights, and information.

Using Logstash to drive filtered data from a single source into multiple output destinations

Now posted on the Elastic blog

Jan 15, 2019 update: A newer version of this article has been published on Elastic’s website as: https://www.elastic.co/blog/using-logstash-to-split-data-and-send-it-to-multiple-outputs

Overview

In this blog post we demonstrate how Logstash can be used to accomplish the following tasks:

  1. Create multiple copies of an input stream.
  2. Filter each unique copy of the input stream to only contain desired fields.
  3. Drive the modified copies of the input stream into different output destinations.

Note that in this blog post, we do not make use of pipeline-to-pipeline communication (beta) which could also likely achieve much of the functionality described here.

Example input file

As an input to Logstash, we use a CSV file that contains stock market trades. A few example CSV stock market trades are given below. 

1483230600,1628.75,1678.1,1772.8,2443.6
1483232400,1613.63,1688.5,1750.5,2460.2
1483234200,1606.51,1678.6,1718,2448.2
1483236000,1621.04,1684.1,1708.1,2470.4

The comma-separated values represent “time”, “DAX”, “SMI”, “CAC”, and “FTSE”. You may wish to copy and paste the above lines into a CSV file called stocks.csv in order to execute the example Logstash pipeline.

Example Logstash pipeline

Below is a logstash pipeline that should be stored in a file called ‘clones.conf’. This pipeline does the following:

  1. Reads stock market trades as CSV-formatted input from a CSV file. Note that you should modify ‘clones.conf’ to use the correct path to your ‘stocks.csv’ file.
  2. Maps each row of the CSV input to a JSON document, where the CSV columns map to JSON fields.
  3. Converts the time field to Unix format.
  4. Uses the clone filter plugin to create two copies of each document. The clone filter will add a new ‘type’ field to each copy that corresponds to the names given in the clones array. (Note that the original version of each document will still exist in addition to the copies, but will not have a ‘type’ field added to it).
  5. For each copy:
    1. Adds metadata to each document corresponding to the ‘type’ that was added by the clone function. This allows us to later remove the ‘type’ field, while retaining the information required for routing different copies to different outputs.
    2. Uses the prune filter plugin to remove all fields except those which are whitelisted for the specific output.
  6. Removes the ‘type’ field that the clone function inserted into the documents. This is not strictly necessary, but eliminates the ‘type’ data and prevents it from being written to Elasticsearch.
  7. Writes the resulting documents to different outputs, depending on the value defined in the metadata field that we added in step 5.
input {
  file {
    path => "${HOME}/stocks.csv"
    start_position => "beginning"

    # The following line will ensure re-reading of input 
    # each time logstash executes.
    sincedb_path => "/dev/null"
  }
}

filter {
   csv {
    columns => ["time","DAX","SMI","CAC","FTSE"]
    separator => ","
    convert => { 'DAX' => 'float'
    'SMI' => 'float'
    'CAC' => 'float'
    'FTSE' => 'float'}
  }
  date {
    match => ['time', 'UNIX']
  }

  # The following line will create 2 additional 
  # copies of each document (i.e. including the 
  # original, 3 in total). 
  # Each copy will have a "type" field added 
  # corresponding to the name given in the array.
  clone {
    clones => ['copy_only_SMI', 'copy_only_FTSE']
  }

  if [type] == 'copy_only_SMI' {
    mutate { 
      add_field => { "[@metadata][type]" => "copy_only_SMI" } 
    }
    # Remove everything except "SMI"
    prune {
       whitelist_names => [ "SMI"]
    }
  } 

  else if [type] == 'copy_only_FTSE' {
    mutate { 
      add_field => { "[@metadata][type]" => "copy_only_FTSE" } 
    }
    prune {
       whitelist_names => [ "FTSE"]
    }
  } 

  # Remove 'type' which was added in the clone
  mutate {
    remove_field => ['type']
  }
}

output {
  stdout { codec =>  "rubydebug" }

  if [@metadata][type] == 'copy_only_SMI' {
    elasticsearch {
      index => "smi_data"
    }
  }
  else if [@metadata][type] == 'copy_only_FTSE' {
    elasticsearch {
      index => "ftse_data"
    }
  }
  else {
    elasticsearch {
      index => "stocks_original"
    }
  }
}

Testing the logstash pipeline

To test this pipeline with the example CSV data, you could execute something similar to the following command, modifying it to ensure that you use paths that are correct for your system. Note that specifying ‘config.reload.automatic’ is optional, but allows us to automatically reload ‘clones.conf’ without restarting Logstash. Remember that ‘clones.conf’ that is used below is the file that contains the pipeline described in the previous section.

./logstash -f ./clones.conf --config.reload.automatic

Once Logstash has read the stocks.csv file, we can check the various outputs that have been written. We have written documents to three indices called ‘smi_data’, ‘ftse_data’, and ‘stocks_original’.

Check the SMI index

GET /smi_data/_search

Should display documents with the following structure. Notice that only “SMI” data appears in the ‘smi_data’ index.

      {
        "_index": "smi_data",
        "_type": "doc",
        "_id": "_QRskWUBsYalOV9y9hGJ",
        "_score": 1,
        "_source": {
          "SMI": 1688.5
        }
      }

Check the FTSE index

GET /ftse_data/_search

Should display documents with the following structure. Notice that only the “FTSE” field appears in documents in the ‘ftse_data’ index.

      {
        "_index": "ftse_data",
        "_type": "doc",
        "_id": "AgRskWUBsYalOV9y9hL0",
        "_score": 1,
        "_source": {
          "FTSE": 2448.2
        }
      }

Check the original documents index

GET /stocks_original/_search

Should display documents with the following structure. Notice that the entire original version of the documents appears in the ‘stocks_original’ index.

      {
        "_index": "stocks_original",
        "_type": "doc",
        "_id": "-QRskWUBsYalOV9y9hFo",
        "_score": 1,
        "_source": {
          "host": "Alexanders-MBP",
          "@timestamp": "2017-01-01T00:30:00.000Z",
          "SMI": 1678.1,
          "@version": "1",
          "message": "1483230600,1628.75,1678.1,1772.8,2443.6",
          "CAC": 1772.8,
          "DAX": 1628.75,
          "time": "1483230600",
          "path": "/Users/arm/Documents/ES6.3/datasets/stocks_for_clones.csv",
          "FTSE": 2443.6
        }
      }

Conclusion

In this blog post, we have demonstrated how to use Logstash to create multiple copies of an input stream, to then modify documents in each stream as required for different outputs, and to then drive the different streams into different outputs.