Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add map reduce pattern issue 2927 #3057

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
124 changes: 124 additions & 0 deletions map-reduce/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
---
title: "MapReduce Pattern in Java"
shortTitle: MapReduce
description: "Learn the MapReduce pattern in Java with real-world examples, class diagrams, and tutorials. Understand its intent, applicability, benefits, and known uses to enhance your design pattern knowledge."
category: Performance optimization
language: en
tag:
- Data processing
- Code simplification
- Delegation
- Performance
---

# MapReduce in Java

## Intent of Map Reduce Pattern

The MapReduce design pattern is intended to simplify the processing of large-scale data sets by breaking down complex tasks into smaller, manageable units of work. This pattern leverages parallel processing to increase efficiency, scalability, and fault tolerance across distributed systems.

## Detailed Explanation of Map Reduce with Real-World Examples

1. Map Phase
The Map phase is the first step in the process. It takes the input data (often unstructured or semi-structured) and transforms it into intermediate key-value pairs. Each data element is processed independently and converted into one or more key-value pairs.
2. Reduce Phase
The Reduce phase aggregates the intermediate key-value pairs produced in the Map phase. All values associated with the same key are passed to the Reduce function, which processes them to produce a final output.

Real-world example

> Imagine you work for a company that processes millions of customer reviews, and you want to find out how often each word is used across all the reviews. Doing this on one machine would take a lot of time, so you want to use a MapReduce process to speed it up by distributing the work across multiple computers.

In plain words

> By abstracting the distribution and coordination of these tasks, MapReduce enables developers to focus on the processing logic, rather than the complexities of managing concurrency

Wikipedia say
> MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.

## Programmatic Example of DTO Pattern in Java

Let's first start with our map method

```java
public static List<Map.Entry<String, Integer>> map(List<String> sentences) {
List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
for (String sentence : sentences) {
String[] words = sentence.split("\\s+");
for (String word : words) {
mapped.add(new AbstractMap.SimpleEntry<>(word.toLowerCase(), 1));
}
}
return mapped;
}
```
The purpose of the Map function is to process the input data and convert it into key-value pairs.

Now lets go with the reduce function:
```java
public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> mappedWords) {
Map<String, Integer> reduced = new HashMap<>();
for (Map.Entry<String, Integer> entry : mappedWords) {
reduced.merge(entry.getKey(), entry.getValue(), Integer::sum);
}
return reduced;
}
```
In the Reduce phase, the grouped key-value pairs are processed to combine or summarize the data. The idea is to aggregate all the values for each key to produce the final result.



An example of the console output after running the algorithm with the following input:
input
```java
List<String> sentences = Arrays.asList(
"hello world",
"hello java java",
"map reduce pattern in java",
"world of java map reduce"
);
```
Output
```
reduce: 2
java: 4
world: 2
in: 1
of: 1
pattern: 1
hello: 2
map: 2
```

## When to Use the MapReduce Pattern in Java

Use the MapReduce pattern when:

* You are working with massive datasets that can't fit on a single machine.
* You need a batch-processing solution to analyze or transform large volumes of data.
* Fault tolerance and high availability are important for your use case.

## MapReduce Pattern Java Tutorials

* [MapReduce Algorithm (Baeldung)](https://www.baeldung.com/cs/mapreduce-algorithm)
* [MapReduce Tutorial (javatpoint)](https://www.javatpoint.com/mapreduce)


## Real-World Applications of MapReduce Pattern in Java
MapReduce is a powerful tool for processing large-scale data because it breaks down complex tasks into smaller, manageable parts, allowing for efficient parallel computation across distributed systems

## Benefits of MapReduce pattern

1. Developers only need to define two simple functions: Map and Reduce. They don't have to worry about how tasks are distributed, how failures are handled, or how the data is split.
2. MapReduce scales horizontally. You can add more machines (nodes) to the cluster to handle larger data sets or process more tasks in parallel. This makes it perfect for large datasets in the order of terabytes or petabytes.
3. The system is designed to handle failures. If a worker node crashes or a task fails, it automatically retries the task on another available node without developer intervention.

## Trade-offs of Map Reduce pattern

1. MapReduce is designed for batch processing, meaning it’s not suitable for real-time data processing or low-latency jobs. The process of splitting data, shuffling it between nodes, and reducing it can introduce significant delays.
2. If the dataset is small or can fit in memory on a single machine, MapReduce can be overkill. The overhead of distributing tasks across nodes outweighs the benefits for small-scale tasks. Single-node solutions like a relational database or in-memory processing tools.
3. MapReduce frequently reads from and writes to disk (between the Map and Reduce phases), leading to performance bottlenecks. This is particularly problematic for jobs with lots of intermediate data.

## References and Credits

* [Designing Data-intensive Applications](https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321/ref=sr_1_1?s=books&sr=1-1)
* [MapReduce Design Patterns](https://www.amazon.com/MapReduce-Design-Patterns-Effective-Algorithms/dp/1449327176/ref=sr_1_1?crid=3N6I3219DQBM&dib=eyJ2IjoiMSJ9.v6J5LaH30wtWyGQ7t20oSWIhd3rZs9GOaU3r-fSfZbd11rwjP0d0lL4tdcsD_yMt-WY6-XDWWakgkvMv38W9YD7CZDIgJ1G-LuazC8rNILObJBIRg09-7-ugQHZbtkqZFEt1ZCyFiDV4E3Iq2Db41vOpjbrU_B-phwzNQoRU175m1i-WvzTdcWL5GwVcbIWClmYB99kszZ1wX76nfjfq9YUHAFZtlpvLNMavBY4KTjI.QhcDrdrN5Bdd5ZVRTf9cZw0lAXNX83ncVVws8UbVDKU&dib_tag=se&keywords=MapReduce+Design+Patterns&qid=1728522338&s=books&sprefix=mapreduce+design+patterns%2Cstripbooks-intl-ship%2C198&sr=1-1)
Binary file added map-reduce/etc/map-reduce.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
10 changes: 10 additions & 0 deletions map-reduce/etc/model-view-intent.urm.puml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
@startuml
package com.iluwatar.mapreduce {

class App {
+ App()
+ main(args : String[]) {static}
+ map(sentences : List<String>) : List<Map.Entry<String, Integer>> {static}
+ reduce(mappedWords : List<Map.Entry<String, Integer>>) : Map<String, Integer> {static}
}
@enduml
67 changes: 67 additions & 0 deletions map-reduce/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--

This project is licensed under the MIT license. Module model-view-viewmodel is using ZK framework licensed under LGPL (see lgpl-3.0.txt).

The MIT License
Copyright © 2014-2022 Ilkka Seppälä

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>java-design-patterns</artifactId>
<groupId>com.iluwatar</groupId>
<version>1.26.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>marker-interface</artifactId>
<dependencies>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-core</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<executions>
<execution>
<configuration>
<archive>
<manifest>
<mainClass>App</mainClass>
</manifest>
</archive>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
80 changes: 80 additions & 0 deletions map-reduce/src/main/java/com/iluwatar/mapreduce/App.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
package com.iluwatar.mapreduce;

import lombok.extern.slf4j.Slf4j;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
* The main intent of the MapReduce design pattern is to allow for the processing of large data sets with a distributed algorithm,
* minimizing the overall time of computation by exploiting various parallel computing nodes. This design pattern simplifies the complexity
* of concurrency and hides the details of data distribution,fault tolerance, and load balancing, making it an effective model for
* processing vast amounts of data.
*/
@Slf4j
public class App {

/**
* Program entry point.
*
* @param args command line args
*/
public static void main(String[] args) {

// Sample input: List of sentences
List<String> sentences = Arrays.asList(
"hello world",
"hello java java",
"map reduce pattern in java",
"world of java map reduce"
);

// Step 1: Map phase
List<Map.Entry<String, Integer>> mappedWords = map(sentences);

// Step 2: Reduce phase
Map<String, Integer> wordCounts = reduce(mappedWords);

// Step 3: Output the final result
wordCounts.forEach((word, count) -> LOGGER.info("{}: {}", word, count));
}

/**
* The map function processes a list of input data and produces key-value pairs.
*
* @param sentences The input data to be processed by the map function.
* @return A List of maps entries containing keys (e.g., words) and their occurrences.
*/
public static List<Map.Entry<String, Integer>> map(List<String> sentences) {
List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
for (String sentence : sentences) {
// Split the sentence into words using whitespace as a delimiter
String[] words = sentence.split("\\s+");
for (String word : words) {
// Create a key-value pair where the key is the word and the value is 1
mapped.add(new AbstractMap.SimpleEntry<>(word.toLowerCase(), 1));
}
}
return mapped;
}

/**
* The reduce function processes the grouped data and aggregates the values
* (e.g., sums up the occurrences for each word).
*
* @param mappedWords A List of maps where each key has a list of associated values.
* @return A final map with each key and its aggregated result.
*/
public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> mappedWords) {
Map<String, Integer> reduced = new HashMap<>();
for (Map.Entry<String, Integer> entry : mappedWords) {
// If the word is already in the map, increment the count, otherwise set it to 1
reduced.merge(entry.getKey(), entry.getValue(), Integer::sum);
}
return reduced;
}

}
103 changes: 103 additions & 0 deletions map-reduce/src/test/java/com/iluwatar/AppTest.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
/*
* This project is licensed under the MIT license. Module model-view-viewmodel is using ZK framework licensed under LGPL (see lgpl-3.0.txt).
*
* The MIT License
* Copyright © 2014-2022 Ilkka Seppälä
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
package com.iluwatar;

import com.iluwatar.mapreduce.App;
import org.junit.jupiter.api.Test;

import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import static org.junit.jupiter.api.Assertions.assertEquals;

class AppTest {

@Test
void testMap_singleSentence() {

List<String> sentences = Arrays.asList("hello world");
List<Map.Entry<String, Integer>> result = App.map(sentences);

// Assert
assertEquals(2, result.size(), "Should have 2 entries");
assertEquals("hello", result.get(0).getKey());
assertEquals(1, result.get(0).getValue());
assertEquals("world", result.get(1).getKey());
assertEquals(1, result.get(1).getValue());
}

@Test
void testMap_multipleSentences() {

List<String> sentences = Arrays.asList("hello world", "hello java");
List<Map.Entry<String, Integer>> result = App.map(sentences);

// Assert
assertEquals(4, result.size(), "Should have 4 entries (2 words per sentence)");
assertEquals("hello", result.get(0).getKey());
assertEquals(1, result.get(0).getValue());
assertEquals("world", result.get(1).getKey());
assertEquals(1, result.get(1).getValue());
assertEquals("hello", result.get(2).getKey());
assertEquals(1, result.get(2).getValue());
assertEquals("java", result.get(3).getKey());
assertEquals(1, result.get(3).getValue());
}

@Test
void testReduce_singleWordMultipleEntries() {

List<Map.Entry<String, Integer>> mappedWords = Arrays.asList(
new AbstractMap.SimpleEntry<>("hello", 1),
new AbstractMap.SimpleEntry<>("hello", 1)
);
Map<String, Integer> result = App.reduce(mappedWords);

// Assert
assertEquals(1, result.size(), "Should only contain one unique word");
assertEquals(2, result.get("hello"), "The count of 'hello' should be 2");
}

@Test
void testReduce_multipleWords() {
List<Map.Entry<String, Integer>> mappedWords = Arrays.asList(
new AbstractMap.SimpleEntry<>("hello", 1),
new AbstractMap.SimpleEntry<>("world", 1),
new AbstractMap.SimpleEntry<>("hello", 1),
new AbstractMap.SimpleEntry<>("java", 1)
);
Map<String, Integer> result = App.reduce(mappedWords);

// Assert
assertEquals(3, result.size(), "Should contain 3 unique words");
assertEquals(2, result.get("hello"), "The count of 'hello' should be 2");
assertEquals(1, result.get("world"), "The count of 'world' should be 1");
assertEquals(1, result.get("java"), "The count of 'java' should be 1");
}

}

1 change: 1 addition & 0 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -217,6 +217,7 @@
<module>virtual-proxy</module>
<module>function-composition</module>
<module>microservices-distributed-tracing</module>
<module>map-reduce</module>
</modules>
<repositories>
<repository>
Expand Down
Loading