Amazon launched the Amazon Elasticsearch Service less than a month ago to let its customers spin up scalable Elasticsearch clusters directly from the AWS Management Console and forget about managing these clusters themselves. While you can spin up and use an Elasticsearch cluster within minutes, this ease of use comes with a small disadvantage: unlike a classic Elasticsearch setup, the Elasticsearch service only exposes a publicly accessible client gateway, so Hadoop applications cannot use discovery mechanisms to connect to the nodes behind that gateway.

Hive and Elasticsearch

To connect to the Elasticsearch service from popular Hadoop applications (Hive, Pig, Spark, etc.), you need the Elasticsearch Hadoop connector. In a Java or Scala application, you can pull it in with a build tool such as Maven or sbt, respectively. To use the connector in Hive, though, you need to download the standalone jar package available on the Elasticsearch website.
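For the sbt route, the dependency declaration might look roughly like the sketch below. The group and artifact IDs are the connector's published Maven coordinates; the version number is only an assumption for illustration, so substitute the release that matches your cluster.

```scala
// build.sbt (sketch): pulls the Elasticsearch Hadoop connector into a Scala project.
// ASSUMPTION: the version below is illustrative, not prescribed by this post;
// use the connector release that matches your Elasticsearch cluster.
val esHadoopVersion = "2.2.0"

libraryDependencies += "org.elasticsearch" % "elasticsearch-hadoop" % esHadoopVersion
```

A Maven project would declare the same coordinates in a `<dependency>` entry of its `pom.xml`.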