Dockerize Cassandra troubleshooting

Versions: Cassandra 3.10

Some time ago I tried to create Docker image with Cassandra and some other programs. For the "others", the operation was quite easy but Cassandra caused some problems because of several configuration properties.

Data Engineering Design Patterns

Looking for a book that defines and solves most common data engineering problems? I'm currently writing one on that topic and the first chapters are already available in πŸ‘‰ Early Release on the O'Reilly platform

I also help solve your data engineering problems πŸ‘‰ πŸ“©

This post tries to explain in what creating Cassandra Docker image was difficult. It starts by showing what happens with the default configuration and explains the evolution through parameters changes.

The first victory - --net=host

Below code shows the first step to put Cassandra into Docker container:

RUN wget --quiet -P /home/streaming_user/installs
RUN tar -xvzf /home/streaming_user/installs/apache-cassandra-3.10-bin.tar.gz -C /home/streaming_user/programs
RUN sed -i -E "s/MAX_HEAP_SIZE=\".*\"/MAX_HEAP_SIZE=\"1000M\"/" $cassandraConfig/

It simply downloads specific Cassandra's version, untars it to configured location and starts the process. Docker container was firstly ran with this command: docker run -p --name my_streaming_context -i -t streaming_context. The problem was that 9042 port, even if explicitly exposed through the command, was not reachable from client's application.

After some configuration research, the property --net=host appeared to be the solution. And it was, but rather in terms of just making something work - not necessarily proper solution. In fact --net=host flag allows container to share the network namespace of the host, i.e. container is exposed to public network.

It wasn't necessarily the thing I wanted. It's why I continued the tests.

Solution that fits

After reading some blogs and analyzing different Docker images, I found Hugo Picado's blog post about Containerizing Spring Boot and Cassandra making insight on properties to change before making Cassandra works without container's port public exposition. Among these properties we can find:

Not all of above properties are useful in our case of simple containerization of databases. But to remain totally consistent, they all were changed as advised. And without any surprise, the 9042 port was correctly exposed to client's program.

This post presents the integration of Cassandra into Docker container composed of different data sources. It's mostly focused on troubleshooting part. The first section described the problem of port accessibility and a hack solution consisted on exposing container's ports publicly. The second part gave another, more proper solution, consisted on changing Cassandra's default properties.