Graphs blog posts on waitingforcode.com

4-day workshop · In-person or online

What would it take for you to trust your Databricks pipelines in production?

A 3-day bug hunt on a 3-person team costs up to €7,200 in lost engineering time. This workshop teaches you to prevent that — unit tests, data tests, and integration tests for PySpark and Databricks Lakeflow, including Spark Declarative Pipelines.

Unit, data & integration tests

Medallion architecture & Lakeflow SDP

Max 10 participants · production-ready templates

See the full curriculum → €7,000 flat fee · cohort of up to 10

Bartosz
Konieczny

May 11, 2019 • Graphs

Bipartite graph recommendation example

When I was analyzing the API of Gelly, I was quite surprised for its support of bipartite graphs. First, because I didn't know that data structure and second because it wasn't supported in other analyzed frameworks. Hence, I added that graph structure to my backlog and sometime later wrote a post to explain it better.

Continue Reading →

February 20, 2019 • Graphs

Chaos in streaming graph processing

Some time ago I wrote a post about the graph data processing with streams. That article was based on X-Stream framework proposed by the searchers of EPFL research institute. At this occasion, I also mentioned the existence of newer alternative for X-Stream, adapted for distributed workloads, called Chaos. I voluntary omitted the explanation of Chaos in the previous post. Putting it aside of X-Stream would introduce too many new concepts. But now, after some weeks of graph processing discoveries, I would like to return to the successor of X-Stream and present it more in details.

Continue Reading →

November 21, 2018 • Graphs

Graph processing frameworks survey

The series about graph processing continues. Today it's the moment to analyze some major graph processing frameworks and choose the framework that I'll present more in details in incoming posts.

Continue Reading →

November 14, 2018 • Graphs

Graph mining

Because of its connected nature, graph structure has its own branch in data mining. Thanks to this branch we can get insight into relationships and dependencies between vertices.

Continue Reading →

November 7, 2018 • Graphs

Graph partitioning

As told many times in previous posts, one of the most challenging tasks in distributed graph processing is the partitioning. Connected nature of the graph components makes the partitioning hard. Hopefully, the researchers continue to propose the solutions.

Continue Reading →

November 1, 2018 • Graphs

Graph algorithms in distributed world - part 1

During last weeks we've discovered a lot about graph data processing in distributed world. However we haven't learned yet about the problems the graphs can solve. And it's as important as the knowledge about the processing techniques. Hopefully, this post will try to catch up this late.

Continue Reading →

November 1, 2018 • Graphs

Graph storage

Until now we've discovered exclusively the concepts devoted to computing distributed graphs. But the compute part can't go without storage. And since for the latter in the context of graph we can't talk about the storage, it requires its own detailed explanation.

Continue Reading →

October 24, 2018 • Graphs

Streaming and graph processing

Use cases of streaming surprise me more and more. In my recent research about graph processing in Big Data era I found a paper presenting the graph framework working on vertices and edges directly from a stream. In case you've missed that paper I'll try to present this idea to you.

Continue Reading →

October 24, 2018 • Graphs

Graph-centric graph processing

Previously described vertex-centric model is not the single one used to process graph data. Another one uses subgraphs as the processing unit.

Continue Reading →

October 18, 2018 • Graphs

Vertex-centric graph processing

Graph data processing, even though seems to be less popular than streaming or files processing, is an important member of data-oriented systems. And as its "colleagues", it also has some different processing logics. The first described in this blog is called vertex-centric.

Continue Reading →

October 11, 2018 • Graphs

Graphs and data processing

In this blog I've covered the topics about relational databases, key-value stores, search engines or log systems. There are still some storage systems deserving some learning effort and one of them are graphs considered here in the context of data processing.

Continue Reading →

Graphs articles

What would it take for you to trust your Databricks pipelines in production?