Articles about Storage on waitingforcode.com - articles for the pleasure of learning and discovery

Below you can find all articles belonging to Storage.

March 12, 2016 • Apache Avro

Introduction to serialization in Big Data

NoSQL solutions are very often associated with the word "schemaless". Sometimes the absence of a schema can lead to maintenance or backward compatibility problems. Serialization frameworks are one of the solutions to these issues in Big Data systems.

March 12, 2016 • Elasticsearch

Reverse nested aggregation in Elasticsearch

Aggregations are a really powerful Elasticsearch feature. Besides the aggregations known from RDBMS, such as sum, min, max and count, they make it possible to apply aggregations at different levels. This is particularly useful with nested documents.

March 12, 2016 • Elasticsearch

Parent-children relationship in Elasticsearch

Linking entities is quite easy in relational databases. It is not a trivial task in document databases, which are adapted to less normalized data storage. Elasticsearch is no exception to this rule, but it defines some mechanisms to support parent-children relationships between documents.

March 12, 2016 • Elasticsearch

Locks in Elasticsearch

Concurrency issues in Elasticsearch are often caused by the lack of ACID transaction support. However, the search engine provides some locking mechanisms to deal with them.

February 12, 2016 • Elasticsearch

Aggregations in Elasticsearch

Even though Elasticsearch is not a relational system, it allows results to be aggregated. This operation is very helpful if we want to group a set of documents.

February 12, 2016 • Elasticsearch

Routing in Elasticsearch

If you've worked with PHP frameworks like Zend or Symfony, you are certainly familiar with the concept of routing, which dispatches an HTTP request to the appropriate controller. Elasticsearch has a similar feature, also called routing.

January 1, 2016 • Elasticsearch

Proximity matching in Elasticsearch

Elasticsearch and its idea of an inverted index are a kind of magic, infinitely deep hat in which we can hide millions of terms. However, sometimes these terms need to be analyzed with some logic, not just as plain words. This is where proximity matching comes to help.

January 1, 2016 • Elasticsearch

Partial matching and ngrams in Elasticsearch

Elasticsearch search matches only terms defined in the inverted index. So even if we are looking for only the first two letters of a given term, we won't be able to do it with a standard match query. Instead, we should use partial matching, which Elasticsearch provides in different forms.
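
To picture why a two-letter search finds nothing in a plain inverted index, here is a minimal Python sketch. It is not Elasticsearch code: the functions `edge_ngrams` and `build_index`, the sample documents and the gram sizes are all assumptions made for illustration. It only shows the general idea of storing prefixes of each term next to the term itself.

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=2, max_gram=5):
    """Return the prefixes of `term` whose length is between min_gram and max_gram."""
    upper = min(max_gram, len(term))
    return [term[:size] for size in range(min_gram, upper + 1)]


def build_index(documents, use_ngrams):
    """Map every indexed term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
            if use_ngrams:
                # Hypothetical partial matching: also index the word's prefixes.
                for gram in edge_ngrams(word):
                    index[gram].add(doc_id)
    return index


# Sample data invented for this sketch.
documents = {1: "Elasticsearch routing", 2: "elastic bands", 3: "MySQL closure table"}
plain_index = build_index(documents, use_ngrams=False)
ngram_index = build_index(documents, use_ngrams=True)

# The plain index only knows whole words, so "el" matches nothing.
print(sorted(plain_index.get("el", set())))  # []
# With prefixes indexed, "el" leads to both documents starting with it.
print(sorted(ngram_index.get("el", set())))  # [1, 2]
```

The price of this approach is a bigger index, since every word is stored several times under its prefixes.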

January 1, 2016 • Elasticsearch

Filtered queries in Elasticsearch

Queries in Elasticsearch are not limited to full-text search. They can also be filtered. And in the Elasticsearch world, a filter is a different operation from a query.

January 1, 2016 • Elasticsearch

Scoring and boosting in Elasticsearch

A subtle difference between a filter and a full-text search lies in scoring. A filter only tells whether a result matches, while the score tells how well a result matches the query.

December 13, 2015 • Elasticsearch

Aliases in Elasticsearch

Nobody is perfect and my name is not nobody. An Elasticsearch mapping, like the mappings of other storage engines, rarely stays unchanged. Because of that, index changes can cause service downtime, depending on the size of the reindexed data. But there is a trick to avoid this downtime.

December 13, 2015 • Elasticsearch

Indexing documents in Elasticsearch

Retrieving documents in Elasticsearch wouldn't be possible without indexing. Indexes are an intermediate layer between the user and the shards which store the documents' data.

December 13, 2015 • Elasticsearch

Bulk queries in Elasticsearch

Elasticsearch is designed to store large amounts of data. Some operations on this data, such as indexing, can be costly. It's one of the reasons Elasticsearch adopted a feature found in most of the main RDBMS, batch operations, known in Elasticsearch as bulk operations.

December 13, 2015 • Elasticsearch

Queries in Elasticsearch

Using Elasticsearch without querying would be a slightly strange activity. After all, the name of this document-oriented database ends with the "search" suffix.

September 1, 2015 • Elasticsearch

Connection modes in Elasticsearch

Elasticsearch has a powerful RESTful web service. But it's only one of the available methods to connect our application to the server.

September 1, 2015 • Elasticsearch

Elasticsearch architecture and vocabulary

Before starting to learn a new technology, we need to get familiar with its specific vocabulary. In the case of Elasticsearch, this vocabulary is mostly related to architecture terms.

September 1, 2015 • Elasticsearch

Elasticsearch and some concepts of document-oriented database

Every NoSQL solution has some basic concepts associated with it. For example, in graph databases we talk about nodes in a different meaning than in document-oriented and clustered databases such as Elasticsearch. This article presents some of the concepts specific to the Elasticsearch search engine.

June 28, 2014 • MySQL

Managing hierarchical data in MySQL - closure table

So far we have covered 3 ways to manage hierarchical data in MySQL: adjacency list, nested set and path enumeration. One method remains and is covered in this article: the closure table, also called the adjacency relation.
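
As a rough illustration of the idea, the Python sketch below builds the rows a closure table could hold. It assumes a common variant in which the table stores one (ancestor, descendant, depth) row for every ancestor-descendant pair, including a depth 0 row linking each node to itself. The function names and the sample tree are invented for this example and are not taken from the article.

```python
def build_closure(parent_of):
    """Return (ancestor, descendant, depth) rows for every node, self-rows included."""
    rows = []
    for node in parent_of:
        ancestor, depth = node, 0
        # Walk up to the root, emitting one row per ancestor met on the way.
        while ancestor is not None:
            rows.append((ancestor, node, depth))
            ancestor = parent_of[ancestor]
            depth += 1
    return rows


def descendants(closure, node):
    """All nodes below `node`, found with a single filter and no recursion."""
    return sorted(d for a, d, depth in closure if a == node and depth > 0)


# Hypothetical category tree, stored as child -> parent.
parent_of = {"root": None, "books": "root", "music": "root", "novels": "books"}
closure = build_closure(parent_of)

print(len(closure))                   # 8
print(descendants(closure, "root"))   # ['books', 'music', 'novels']
print(descendants(closure, "books"))  # ['novels']
```

In SQL the same lookup would be a plain filter on the ancestor column, which is the main attraction of the method; the cost is the extra rows to maintain on every insert or move.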

June 28, 2014 • MySQL

Managing hierarchical data in MySQL - path enumeration

Previously we saw that there are already two methods, adjacency list and nested set model, to manage hierarchical data in RDBMS. But that's not all. A third method, called path enumeration, also makes it possible to handle trees in a relational database.

June 28, 2014 • MySQL

Managing hierarchical data in MySQL - nested set

Another way to manipulate hierarchical data in MySQL is nested sets. This approach uses an interesting technique to represent hierarchies of data.
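
The teaser does not spell the technique out, so the Python sketch below assumes the usual description of nested sets: every node receives a left and a right bound during a depth-first walk, and a node's descendants are the nodes whose bounds fall inside its own. The names `assign_bounds` and `descendants` and the sample tree are hypothetical.

```python
def assign_bounds(children_of, root):
    """Number the nodes depth-first and return {node: (left, right)}."""
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        left = counter
        counter += 1
        for child in children_of.get(node, []):
            visit(child)
        # The right bound is assigned once all children have been numbered.
        bounds[node] = (left, counter)
        counter += 1

    visit(root)
    return bounds


def descendants(bounds, node):
    """Nodes whose (left, right) interval lies strictly inside the interval of `node`."""
    left, right = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if left < l and r < right)


# Hypothetical category tree, stored as parent -> children.
children_of = {"root": ["books", "music"], "books": ["novels"]}
bounds = assign_bounds(children_of, "root")

print(bounds["root"])                # (1, 8)
print(bounds["books"])               # (2, 5)
print(descendants(bounds, "books"))  # ['novels']
```

Reading a whole subtree becomes a range comparison, while inserting a node means renumbering the bounds located after it.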
