Connection modes in Elasticsearch on waitingforcode.com

Elasticsearch has a powerful RESTful web service. But it's only one of the available methods to connect an application to the server.

In this article we'll present the ways to connect an external application to an Elasticsearch server. First, we'll explore the RESTful API mentioned above. After that, assuming that we are using Java, we'll show how to use the Elasticsearch Java API to connect to the server. This part covers two connection modes: through the node client and through the transport client.

Elasticsearch RESTful API

By default, Elasticsearch listens for HTTP traffic on port 9200. This setting (http.port) can be changed, like many others, in the elasticsearch.yml configuration file. Communicating with the API consists of sending requests whose HTTP method matches the REST meaning of the operation (DELETE to delete a document, PUT to index or update it, GET to query it, etc.). These requests can carry a JSON body written in the Elasticsearch query DSL.
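The mapping between HTTP methods and operations can be sketched with the JDK's own java.net.http.HttpRequest builder (Java 11+). This is a minimal, hypothetical illustration rather than code from the project: the document id "1" and the JSON body are made up, and the requests are only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch: builds (but does not send) one request per REST operation.
// The index, type and document id are assumptions taken from this article's examples.
public class RestVerbsExample {

    static final String BASE = "http://localhost:9200/waitingforcode/teams/";

    // PUT with a JSON body: index (create or replace) the document with the given id
    static HttpRequest indexTeam(String id, String json) {
        return HttpRequest.newBuilder(URI.create(BASE + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // GET: read the document back
    static HttpRequest getTeam(String id) {
        return HttpRequest.newBuilder(URI.create(BASE + id)).GET().build();
    }

    // DELETE: remove the document
    static HttpRequest deleteTeam(String id) {
        return HttpRequest.newBuilder(URI.create(BASE + id)).DELETE().build();
    }

    public static void main(String[] args) {
        HttpRequest put = indexTeam("1", "{\"name\": \"RC Roubaix\"}");
        // prints: PUT http://localhost:9200/waitingforcode/teams/1
        System.out.println(put.method() + " " + put.uri());
    }
}
```

To actually send one of these requests to a running server, it can be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).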

API responses are also in JSON format. To change the output format, we can use one of the available plugins. Currently, the only alternative to JSON is YAML, which can be requested by adding the format=yaml parameter to the query string. Another interesting output parameter is pretty=true: it makes the response more readable through logical indentation. The last parameter, also passed in the query string and very useful while debugging, is explain. When specified, it returns detailed information about how the score of each result was computed.
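A small hypothetical helper, searchUri, shows how these output parameters combine in a search URL. It assumes the host/index/type layout used by the example URLs in this article and only builds the URI; the method name and its boolean flags are illustrative, not part of any Elasticsearch API.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a _search URL with the output parameters described above.
public class SearchUrlExample {

    static URI searchUri(String host, String index, String type, String query,
                         boolean pretty, boolean yaml, boolean explain) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        if (pretty) params.put("pretty", "true");     // indented, human-readable output
        if (yaml) params.put("format", "yaml");       // YAML instead of JSON
        if (explain) params.put("explain", "true");   // score computation details per result

        StringJoiner queryString = new StringJoiner("&");
        params.forEach((name, value) ->
                queryString.add(name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8)));
        return URI.create(host + "/" + index + "/" + type + "/_search?" + queryString);
    }

    public static void main(String[] args) {
        System.out.println(searchUri("http://localhost:9200", "waitingforcode", "teams",
                "roubaix", true, false, false));
    }
}
```

For example, searchUri("http://localhost:9200", "waitingforcode", "teams", "roubaix", true, false, false) yields http://localhost:9200/waitingforcode/teams/_search?q=roubaix&pretty=true.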

Let's look at the result of a simple search for all teams called "roubaix" (http://localhost:9200/waitingforcode/teams/_search?q=roubaix&pretty=true):

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 4,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 2.1971753,
    "hits" : [ {
      "_index" : "waitingforcode",
      "_type" : "teams",
      "_id" : "AU38kcywNvU--AapUYRl",
      "_score" : 2.1971753,
      "_source":{"name": "Excelsior Roubaix"}
    }, {
      "_index" : "waitingforcode",
      "_type" : "teams",
      "_id" : "AU38kc2pNvU--AapUYa1",
      "_score" : 2.1971753,
      "_source":{"name": "RC Roubaix"}
    } ]
  }
}

Note also that the Elasticsearch RESTful API accepts a JSON body in search requests. Thanks to it, we can easily use advanced search features such as aggregations or filtering. The search body can be sent with both POST and GET requests.
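As an illustration, a JSON search body can be sent in a POST request built with the JDK's java.net.http classes (Java 11+). The match query on the name field is an assumption chosen for this sketch, and running main requires a node listening on localhost:9200.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch: sends a search query in the request body instead of the query string.
public class SearchBodyExample {

    // Assumed query: full-text match on the "name" field of indexed teams
    static final String QUERY = "{\"query\": {\"match\": {\"name\": \"roubaix\"}}}";

    static HttpRequest searchRequest(String searchUrl, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(searchUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = searchRequest(
                "http://localhost:9200/waitingforcode/teams/_search?pretty=true", QUERY);
        // Requires a running Elasticsearch node on localhost:9200
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

POST is used here because every HTTP client supports a body with it; the same body could also be attached to a GET request, as noted above.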

Elasticsearch Java API clients

Another way to connect to Elasticsearch is through the Java API clients. As mentioned earlier, two types of client are available for this job: the node client and the transport client. The main difference between them is their participation in the cluster. Once initialized, a node client becomes a part of the Elasticsearch cluster. On the other hand, a transport client acts more like an additional proxy layer between the cluster and the application: when it connects to one of the cluster's nodes, that node doesn't connect back to the transport client.

The transport client brings up an interesting subject: the transport module. It's the mechanism used for internal communication between the cluster's nodes, and also the way the transport client connects to the cluster. Instead of joining the cluster, the transport client receives several transport addresses and communicates with them. Requests are dispatched in a round-robin fashion: each new request goes to the next node on the list, so every node handles roughly the same number of requests and all nodes are on equal terms.
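The round-robin principle can be modeled in a few lines. This is a simplified, hypothetical sketch of the idea, not the code the Elasticsearch transport client actually uses: the node addresses are invented, and an AtomicInteger counter selects the next address so that concurrent callers still rotate through the list.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of round-robin dispatch over a fixed list of transport addresses.
public class RoundRobinExample {

    static final class RoundRobin<T> {
        private final List<T> nodes;
        private final AtomicInteger counter = new AtomicInteger();

        RoundRobin(List<T> nodes) {
            if (nodes.isEmpty()) {
                throw new IllegalArgumentException("at least one node is required");
            }
            this.nodes = List.copyOf(nodes);
        }

        // Each call returns the next node; floorMod keeps the index valid after overflow
        T next() {
            return nodes.get(Math.floorMod(counter.getAndIncrement(), nodes.size()));
        }
    }

    public static void main(String[] args) {
        RoundRobin<String> selector =
                new RoundRobin<>(List.of("node1:9300", "node2:9300", "node3:9300"));
        for (int i = 0; i < 6; i++) {
            System.out.println("request " + i + " -> " + selector.next());
        }
    }
}
```

With three nodes, six consecutive requests go to node1, node2, node3, node1, node2, node3, so each node handles exactly two of them.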

Node clients are better suited to a few long-lived connections, while transport clients are supposed to work better in environments with many short-lived connections. In addition, both are aware of the cluster state: when the master server is stopped or started, the client's connection is managed automatically. The log entries below can be observed when the master server is restarted:

2015-07-16 19:08:24.912  INFO 13362 --- [][generic][T#1]] org.elasticsearch.discovery.zen          : [Maha Yogi] master_left [[Jordan Seberius][Qwqa0oUQRoGzNR2LFP-W3w][bartosz][inet[/]]], reason [transport disconnected]
2015-07-16 19:08:24.913  WARN 13362 --- [pdateTask][T#1]] org.elasticsearch.discovery.zen          : [Maha Yogi] master left (reason = transport disconnected), current nodes: {[Maha Yogi][FuLR0dD4TOWijOfqFee9JQ][bartosz][inet[/]]{data=false, client=true},}
2015-07-16 19:08:24.914  INFO 13362 --- [pdateTask][T#1]] org.elasticsearch.cluster.service        : [Maha Yogi] removed {[Jordan Seberius][Qwqa0oUQRoGzNR2LFP-W3w][bartosz][inet[/]],}, reason: zen-disco-master_failed ([Jordan Seberius][Qwqa0oUQRoGzNR2LFP-W3w][bartosz][inet[/]])

2015-07-16 19:10:14.108  INFO 13362 --- [pdateTask][T#1]] org.elasticsearch.cluster.service        : [Maha Yogi] detected_master [Amphibius][CH1fRmyGRnOivLFp_mQdmw][bartosz][inet[/]], added {[Amphibius][CH1fRmyGRnOivLFp_mQdmw][bartosz][inet[/]],}, reason: zen-disco-receive(from master [[Amphibius][CH1fRmyGRnOivLFp_mQdmw][bartosz][inet[/]]])

Elasticsearch node client

In our French football stats project we'll use the node client, because a single connection is enough. The client is configured as a singleton-scoped Spring bean:

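import org.elasticsearch.client.Client;
import org.elasticsearch.node.NodeBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchConfig {

    // Index name, also used as the cluster name below
    public static final String INDEX = "waitingforcode";

    // Spring beans are singletons by default: one client node for the whole application
    @Bean
    public Client elasticSearchClient() {
        return NodeBuilder.nodeBuilder()
                .client(true)        // client node: holds no data
                .clusterName(INDEX)  // joins the cluster named "waitingforcode"
                .node()              // builds and starts the node
                .client();           // returns the Client bound to this node
    }
}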

As you can see, the configuration is very simple and pleasant to read thanks to the fluent interface:

.client(true): marks the node as a client node, which means it won't hold any data (the same effect as setting the node.data configuration entry to false). Another way to specify that a node can't hold any data is the data(boolean data) method.
.clusterName(INDEX): sets the name of the cluster that the created node will join.
.node().client(): builds and starts the node, then returns the Client instance bound to it.

Other options can be used to build a node client: passing exact settings entries, such as node.data or cluster.name, and marking the client as a local one. A local node is visible only within its own JVM. Consequently, if two local nodes are created in the same JVM, they will form one cluster.

Under the hood, the node object created this way is an instance of org.elasticsearch.node.internal.InternalNode.

Elasticsearch transport client

To create a transport client we need to specify the host and port of the nodes to connect to, called transport addresses. One transport client can have one or more of these addresses. If one of them is invalid (cluster stopped, wrong port typed), no exception is thrown. Instead, every failure is caught in the org.elasticsearch.client.transport.TransportClientNodesService class by this block:

try {
  // its a listed node, light connect to it...
  logger.trace("connecting to listed node (light) [{}]", listedNode);
  transportService.connectToNodeLight(listedNode);
} catch (Throwable e) {
  logger.debug("failed to connect to node [{}], removed from nodes list", e, listedNode);
  continue;
}

An exception is thrown only when the misconfigured transport client is used to query the cluster:

private void ensureNodesAreAvailable(ImmutableList<DiscoveryNode> nodes) {
  if (nodes.isEmpty()) {
    String message = String.format(Locale.ROOT, "None of the configured nodes are available: %s", nodes);
    throw new NoNodeAvailableException(message);
  }
}
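Put together, the two fragments above give a "fail late" behavior. The following hypothetical model reproduces it with plain JDK classes only; FakeTransportClient, NoNodeAvailable and the canConnect predicate are invented stand-ins for the real client, its exception and the network connection attempt, not Elasticsearch classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the "fail late" behavior; not Elasticsearch code.
public class LazyFailureExample {

    // Stand-in for NoNodeAvailableException
    static class NoNodeAvailable extends RuntimeException {
        NoNodeAvailable(String message) {
            super(message);
        }
    }

    // Stand-in for a transport client; canConnect replaces the real network check
    static class FakeTransportClient {
        private final List<String> connectedNodes = new ArrayList<>();
        private final Predicate<String> canConnect;

        FakeTransportClient(Predicate<String> canConnect) {
            this.canConnect = canConnect;
        }

        // Like the try/catch block: an unreachable address is dropped without any exception
        FakeTransportClient addTransportAddress(String address) {
            if (canConnect.test(address)) {
                connectedNodes.add(address);
            }
            return this;
        }

        // Like ensureNodesAreAvailable: the problem surfaces only when a request is executed
        String execute(String request) {
            if (connectedNodes.isEmpty()) {
                throw new NoNodeAvailable("None of the configured nodes are available: " + connectedNodes);
            }
            return request + " sent to " + connectedNodes.get(0);
        }
    }

    public static void main(String[] args) {
        FakeTransportClient client = new FakeTransportClient(address -> address.endsWith(":9300"))
                .addTransportAddress("localhost:1234"); // silently ignored
        try {
            client.execute("count");
        } catch (NoNodeAvailable e) {
            System.out.println("Failed only at query time: " + e.getMessage());
        }
    }
}
```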

The following test cases show a valid and an invalid transport client configuration:

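import static org.assertj.core.api.Assertions.assertThat; // AssertJ assumed for assertThat

import org.elasticsearch.action.ActionFuture;
import org.elasticsearch.action.count.CountRequestBuilder;
import org.elasticsearch.action.count.CountResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.NoNodeAvailableException;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.junit.Test;

public class TransportClientIntegrationTest {

    private static final Settings SETTINGS = ImmutableSettings.settingsBuilder()
            .put("cluster.name", "waitingforcode").build();

    @Test
    public void test_connection_through_transport_client() {
        Client client = new TransportClient(SETTINGS)
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
        try {
            // Sample request to count all indexed teams
            CountResponse response = getTestQuery(client);

            assertThat(response.getCount()).isGreaterThan(0L);
        } finally {
            client.close();
        }
    }

    @Test(expected = NoNodeAvailableException.class)
    public void test_connection_to_bad_port() {
        // Connecting to an invalid address fails silently.
        // Only the query executed with the misconfigured client throws an exception.
        Client client = new TransportClient(SETTINGS)
                .addTransportAddress(new InetSocketTransportAddress("localhost", 1234));
        try {
            getTestQuery(client);
        } finally {
            client.close();
        }
    }

    private CountResponse getTestQuery(Client client) {
        CountRequestBuilder countRequestBuilder = new CountRequestBuilder(client)
                .setTypes("teams");
        ActionFuture<CountResponse> responseFuture = client.count(countRequestBuilder.request());
        return responseFuture.actionGet();
    }

}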

This article presented two different modes of connecting an application to an Elasticsearch server. The first one, the most neutral, is based on the HTTP protocol and the RESTful API style. Unlike the other mode, it's not restricted to the Java language. The second mode, based on programmatically created objects called clients, has two subtypes. The first one is the node client, i.e. a client that joins the cluster as one of its nodes. The second one, the transport client, acts more like an independent proxy communicating with the cluster through the transport module.