Having data close to the computation has a lot of advantages. This idea called data locality is not new since it was popularized with Hadoop MapReduce. Despite of that, it's worth recalling some of its main points and trying to adapt it to modern data pipelines most of the time based on cloud services.
If you've ever analyzed Spark UI, you've certainly seen the part of Locality level in the table with tasks. Even if this concept is less exposed than the topics as shuffle, it remains quite important in efficient data processing.