What is Map/Filter/Reduce?
Map/Filter/Reduce is a high-level design pattern that simplifies the implementation of functions that operate over sequences (any datatype that has an iterator) making use of functional programming. This paradigm enables the programmer to treat sequences as a unit, making it possible to get rid of the entire control flow (no need of for or if statements).
How to use it
There are three basic functional operators: Map, Filter and Reduce. We will look over them one at a time and then see how they work together.
Map is a high-order function that applies a given function to each element of a sequence, returning a sequence of results in the same order. It can be formally defined as follows:
map : (E → F) × Seq<E> → Seq<F>
Two simple examples in Python:
>>> from math import sqrt >>> map(sqrt, [1, 16, 49]) [1.0, 4.0, 7.0] >>> map(str, [1, 2]) ['1', '2']
The map function takes two arguments: a reference to a function using its name, and a sequence. Let’s focus on the function reference. In Python, functions are first-class. T, this means that they can be assigned to variables, passed as parameters, be returned by other functions and be stored in data structures. In the above example, built-in functions were used, but it is possible to use any function defined by the programmer with the def keyword. For example:
>>> def inc(x): ... return x+1
As in the previous example, that function may be referenced by its name.
However, if a function is meant to be used only in one place, it is usually more convenient to define an anonymous function using a lambda expression. The previously defined inc function can be defined this way as follows:
>>> lambda x: x+1 <function <lambda> at 0x107ba1e60>
Which returns the same function as inc and can be used just like it. Even though this is quite useful, it has some syntactic limitations. Functions defined with a lambda expression are limited to those which can be defined just using a return statement and nothing else (no if or for statements, nor local variables). Luckily this will not be a great limitation for us, since we will be using it along with Map, Filter and Reduce.
The next operation to be defined is filter. It applies a unary predicate to each element of a sequence. The ones that satisfy that predicate are kept. Those that do not are removed. Then, it returns a new sequence with the filtered elements, while the original sequence is not modified. It is formally defined as follows:
filter : (E → boolean) × Seq<E> → Seq<E>
For example, a filter combined with a lambda expression would be the following:
>>> filter(lambda x: x>0, [-1,0,1]) 
It is also possible to pass any function by reference as a parameter to filter, identically as in the map operator.
Reduce is the last operator, which is used to aggregate elements in a sequence, using a binary function. Additionally to the function and the sequence, it can optionally take an initial value which initializes the reduction operation. This will also be the return value in case the sequence is empty (otherwise an exception would be thrown). The reduce operation is formally defined as follows:
reduce : (F × E → F) × Seq<E> × F → F
The function reduce(func, list, init) combines the elements of the list from left to right:
result0 = init result1 = func(result0, list) result2 = func(result1, list) ... resultn = func(resultn-1, list[n-1])
Where resultn is the return value. If the initial value is omitted, list is used as the first value.
An important design decision was made in the reduce implementation, and it is about the order in which the elements are aggregated. For associative operators like addition and product, it does not matter at all, but for other non-associative operators like subtractions or divisions, it will.
In Python it works starting from the left of the sequence (the first element), also called fold-left. Fold-right works the other way round, it takes the last element of the sequence as the initial value, defined as follows:
fold-right : (E × F → F) × Seq<E> × F → F
The function fold-right(func, list, init) works with the same logic as reduce but inverts the order:
result0 = init result1 = func(list[n-1], result0) result2 = func(list[n-2], result1) ... resultn = func(list, resultn-1)
Where resultn is the return value.
To show how this affects non-associative operators, consider the subtraction operator and a list like [1,2]. If fold-left is used, the reduction process is like follows:
fold-right(-, [1, 2], 0) result0 = 0 result1 = list[n-1] - result0 = 2 - 0 = 2 result2 = list[n-2] - result1 = 1 - 2 = -1
So it returns -1. Then if fold-left is used:
fold-left(-, [1, 2], 0) result0 = 0 result1 = result0 - list = 0 - 1 = -1 result2 = result1 - list = -1 -2 = -3
So the returned value is different depending on the aggregation order.
Map/Filter/Reduce is useful to perform a large variety of tasks, being one of those database queries. Actually, relational databases implement the map/filter/reduce paradigm (where it is called project/select/aggregate). Consider the following SQL query from a database about cellphones:
#What is the longest battery duration among Nokia cellphones? select max(batery_duration) from cellphones where brand = "Nokia"
"cellphones" is a sequence (a list of rows, where each row has the data for one cellphones)
where brand = "Nokia" is a filter
battery_duration is a map (extracting just the battery_duration field from the row)
max is a reduce
Then, the same query can be done using Map/Filter/Reduce assuming the database information is into a sequence called “cellphones”, where each object Cellphone has a method brand() and battery_duration(). The following query returns the same information as the previous one:
reduce(max, map(Cellphone.battery_duration, filter(lambda c: c.brand() == "Nokia", cellphones)))
You can check a sample project here, where Map, Filter and Reduce operations have been used to process data.