Let's Implement the Map Method from First Principles

💡

This post originally started as an answer to this question on Reddit: When to use lambda or proc vs. regular functions. As with most of my writing, it grew so large that I had to turn it into a blog post.

A higher order function is a function that

takes another function as a parameter,
returns a function, or
does both.
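Here's a minimal sketch of all three cases. The method names (apply_twice, multiplier, compose) are made up for illustration and are not part of Ruby.

```ruby
# Takes another function as a parameter.
def apply_twice(value, fn)
  fn.call(fn.call(value))
end

# Returns a function.
def multiplier(factor)
  ->(n) { n * factor }
end

# Does both: takes two functions and returns a new one that combines them.
def compose(f, g)
  ->(x) { f.call(g.call(x)) }
end

double    = multiplier(2)
increment = ->(n) { n + 1 }

apply_twice(3, double)              # => 12
compose(double, increment).call(4)  # => 10
```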

It's a very important and powerful concept in functional programming, and Ruby has first-class support for higher-order functions through its delicious flavors of blocks, procs, and lambdas.

However, if you're a new programmer, it can be really confusing to wrap your head around just when you might want to use higher order functions, i.e. write functions that accept other functions as parameters.

Alternatively, when do you use blocks, procs, or lambdas in Ruby?

After writing the above post, I got a few emails asking me to explain how anonymous functions are different from regular functions and when to use them.

In short, you know they are an important concept, and all the examples make sense, but you have no idea when you might use them in the real world.

I banged my head against a brick wall for a long time trying to figure out how higher-order functions work, and there didn't seem to be any concise, clear explanation of just how they work. So here's my attempt at explaining them with a very simple example.

Let's try to implement the ubiquitous map method in Ruby from first principles.

Map is used to transform each item in an array into something else. Given some array of items and a function, map applies that function to every item and returns a new array containing the transformed (mapped) elements.
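For a quick feel of that behavior, here's a small sketch using Ruby's built-in Array#map. The numbers are arbitrary sample data.

```ruby
numbers = [1, 2, 3]

# The block is the function that map applies to every item.
doubled = numbers.map { |n| n * 2 }

doubled  # => [2, 4, 6]
numbers  # => [1, 2, 3] (the original array is left untouched)
```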

I hope that by the end of the post, you'll have a solid understanding of when you might need to write a function that takes other functions (blocks, procs, or lambdas in Ruby) as a parameter.

Imagine you have a list of email subscribers as plain-old Ruby objects (instances of the Subscriber class), and you need to get a list of their email addresses.

We can implement this feature (without writing any fancy code) like this:

def collect_emails(subscribers)
  emails = []

  subscribers.each do |subscriber|
    emails << subscriber.email
  end

  emails
end

emails = collect_emails(subscribers)
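If you'd like to run this in irb, you'll need some subscribers first. The sketch below assumes a bare-bones Subscriber built with Struct; the name and email fields and the sample data are assumptions made for illustration.

```ruby
# Hypothetical stand-in for the Subscriber class (fields are assumed).
Subscriber = Struct.new(:name, :email)

subscribers = [
  Subscriber.new("Ada", "ada@example.com"),
  Subscriber.new("Grace", "grace@example.com")
]

# Same method as above, repeated so this snippet runs on its own.
def collect_emails(subscribers)
  emails = []

  subscribers.each do |subscriber|
    emails << subscriber.email
  end

  emails
end

emails = collect_emails(subscribers)
# => ["ada@example.com", "grace@example.com"]
```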

Now imagine you also have a list of products and you want to know the price of each item. You might write something like this:

def collect_prices(products)
  prices = []

  products.each do |product|
    prices << product.price
  end

  prices
end

prices = collect_prices(products)

If you look carefully, you'll notice that they're very similar. In both cases, we perform the following operations:

create an empty array,
iterate over another list of items,
create a new item by performing some operation on each item in that list,
append the result of the previous operation to the new array, and finally
return that array

There's only one real difference between them: the operation we are performing on each item in the list.

In the first example, we're calling the email method on the item.

email = subscriber.email

In the second example, we're extracting the price of a product.

price = product.price

Let's generalize the names of everything except the two blocks of code that are different. We get the following functions.

def collect_emails(items)
  results = []

  items.each do |item|
    result = item.email  # code that changes
    results << result
  end

  results
end

def collect_prices(items)
  results = []

  items.each do |item|
    result = item.price  # code that changes
    results << result
  end

  results
end

emails = collect_emails(subscribers)
prices = collect_prices(products)

The code still works as expected.

Let's remove the duplication by extracting the part that's changing into a separate function that's stored in a variable. Specifically, we'll achieve this by extracting those chunks of code into Ruby lambdas or procs. These are anonymous functions.

# using lambda

email_collector = ->(subscriber) { subscriber.email }
price_collector = ->(product) { product.price }

# using proc

email_collector = Proc.new { |subscriber| subscriber.email }
price_collector = proc { |product| product.price }

We're simply storing the code that we want to execute later in a separate variable. Nothing fancy.
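To see that nothing runs until we ask for it, here's a small sketch. The Subscriber struct is a hypothetical stand-in; call, .() and [] are three equivalent ways Ruby offers to invoke a lambda or proc.

```ruby
# Hypothetical stand-in for the Subscriber class.
Subscriber = Struct.new(:email)

# Defining the lambda does not run its body; it only stores the code.
email_collector = ->(subscriber) { subscriber.email }

ada = Subscriber.new("ada@example.com")

# The body runs only when we invoke the lambda.
email_collector.call(ada)  # => "ada@example.com"
email_collector.(ada)      # => "ada@example.com"
email_collector[ada]       # => "ada@example.com"
```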

Here are the resulting examples. I'll use the lambda, as it's my favorite.

def collect_emails(items)
  results = []
  email_collector = ->(subscriber) { subscriber.email }  # Code that changes
  
  items.each do |item|
    result = email_collector.call(item)
    results << result
  end

  results
end

def collect_prices(items)
  results = []
  price_collector = ->(product) { product.price }  # Code that changes
  
  items.each do |item|
    result = price_collector.call(item)
    results << result
  end

  results
end

emails = collect_emails(subscribers)
prices = collect_prices(products)

We're getting close to completely removing the duplicated code.

Let's extract the big chunk of code that's repeated in both functions. We'll do this by parameterizing the {email/price}_collector variable, which is a lambda. I'll call the new parameter collector, and pass it from the code that calls our functions.

def collect(items, collector)
  results = []
  
  items.each do |item|
    result = collector.call(item)
    results << result
  end

  results
end

email_collector = ->(subscriber) { subscriber.email }
emails = collect(subscribers, email_collector)

price_collector = ->(product) { product.price }
prices = collect(products, price_collector)

We can further simplify the usage by eliminating the temporary variables as follows:

emails = collect subscribers, ->(subscriber) { subscriber.email }

prices = collect products, ->(product) { product.price }

Let's use blocks, which are prettier. This also lets us eliminate the second parameter, collector. We can simply yield the item after checking whether a block was provided.

The yield keyword calls the provided block and passes its arguments to the block. Note that if no block is given, result stays nil, so the method returns an array of nils.

def collect(items)
  results = []
  
  items.each do |item|
    result = yield(item) if block_given?
    results << result
  end

  results
end

emails = collect(subscribers) { |subscriber| subscriber.email }

prices = collect(products) { |product| product.price }
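If you already have a lambda or proc lying around, you don't have to rewrite it as a block. Prefixing it with & passes it in the block slot, so the same yield picks it up. Here's a sketch, again with a hypothetical Subscriber struct and sample data:

```ruby
# Hypothetical stand-in for the Subscriber class.
Subscriber = Struct.new(:email)

# Same collect as above, repeated so this snippet runs on its own.
def collect(items)
  results = []

  items.each do |item|
    result = yield(item) if block_given?
    results << result
  end

  results
end

subscribers = [
  Subscriber.new("ada@example.com"),
  Subscriber.new("grace@example.com")
]

email_collector = ->(subscriber) { subscriber.email }

# A literal block and a lambda passed with & give the same result.
with_block  = collect(subscribers) { |subscriber| subscriber.email }
with_lambda = collect(subscribers, &email_collector)
```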

Congratulations, we've implemented a higher order function called collect, also known as map.

def map(items)
  results = []
  
  items.each do |item|
    result = yield(item) if block_given?
    results << result
  end

  results
end

Ruby already implements the map and collect methods on arrays (collect is an alias of map), so we can call them directly on subscribers and products.

emails = subscribers.collect { |subscriber| subscriber.email }

prices = products.map { |product| product.price }
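When the block only calls a single method on each item, Ruby has a common shorthand: &:method_name turns the symbol into a proc and passes it as the block. A sketch, with hypothetical Subscriber and Product structs and made-up data:

```ruby
# Hypothetical stand-ins for the Subscriber and Product classes.
Subscriber = Struct.new(:email)
Product    = Struct.new(:price)

subscribers = [
  Subscriber.new("ada@example.com"),
  Subscriber.new("grace@example.com")
]
products = [Product.new(10), Product.new(25)]

# Equivalent to map { |subscriber| subscriber.email }
emails = subscribers.map(&:email)
# => ["ada@example.com", "grace@example.com"]

# Equivalent to map { |product| product.price }
prices = products.map(&:price)
# => [10, 25]
```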

And that's how you can create a higher-order function that accepts another function (a block, proc, or lambda in Ruby) and invokes it at some later point in execution.

That's a wrap. I hope you liked this article and you learned something new. If you're new to the blog, check out the start here page for a guided tour or browse the full archive to see all the posts I've written so far.

As always, if you have any questions or feedback, didn't understand something, or found a mistake, please leave a comment below or send me an email. I reply to all emails I get from developers, and I look forward to hearing from you.

If you'd like to receive future articles directly in your email, please subscribe to my blog. If you're already a subscriber, thank you.


Posted by Akshay Khot
