Reading text from a file is one of the most common tasks in programming. However, a few gotchas await new programmers, and some of them have bitten me in the past. So in this post, we'll explore a few different ways to read data from a file and when to choose each one.
TL;DR
If the file is small, slurp it:
content = File.read "data.txt"
If the file is large, stream it:
File.foreach("data.txt") { |line| puts line }
Let's assume we have a file named companies.csv containing the following data:
id,company,product
1,Microsoft,windows
2,Apple,iphone
3,Meta,facebook
4,Google,search
5,Amazon,ecommerce
Let's see a few different ways to read this file in Ruby.
Using File.new
The simplest way to access this file in Ruby is to create an instance of the File class, passing the name of the file. Using this file instance, you can manipulate the file to your heart's content: read it, write to it, inspect the permissions, etc.
To read the whole file at once, use the read method.
file = File.new "companies.csv"
contents = file.read
# "id,company,product\n1,Microsoft,windows\n2,Apple,iphone\n ..."
Once you're done processing the file, don't forget to close it to avoid leaking resources.
file.close # Important!
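If an exception is raised between File.new and close, the close call is skipped. One common way to guard against that is begin/ensure. The following is a minimal sketch; the temporary directory and the sample data are assumptions added only so the snippet runs on its own.

```ruby
require "tmpdir"

# Create a throwaway copy of the sample file so the snippet is self-contained.
dir = Dir.mktmpdir
path = File.join(dir, "companies.csv")
File.write(path, "id,company,product\n1,Microsoft,windows\n2,Apple,iphone\n")

file = File.new(path)
begin
  contents = file.read
ensure
  # Runs even if read raises, so the file handle is always released.
  file.close
end

puts contents.lines.first # prints the header row
puts file.closed?         # prints true
```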
Using File.open
In the previous solution, you have to remember to close the file. Ruby provides a better option: when you pass a block to the File.open method, the block receives an instance of the File class representing the underlying file, and the file is closed automatically at the end of the block.
File.open("companies.csv") do |file|
  # work with the file here; it is closed when the block exits
end
Using the file instance, you can read the whole file at once using the read method.
File.open("companies.csv") do |file|
  content = file.read
end
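When given a block, File.open returns the block's value, so the contents can be assigned directly instead of staying trapped in a block-local variable. Here is a small sketch, with a temporary file standing in for companies.csv; `handle` is a hypothetical variable kept only to show that the file is closed afterward.

```ruby
require "tmpdir"

# Throwaway sample file so the snippet runs on its own.
dir = Dir.mktmpdir
path = File.join(dir, "companies.csv")
File.write(path, "id,company,product\n1,Microsoft,windows\n")

handle = nil
content = File.open(path) do |file|
  handle = file # keep a reference only to inspect it after the block
  file.read     # the block's last value becomes File.open's return value
end

puts content
puts handle.closed? # prints true
```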
Using File.read
In everyday programming, the simplest (and most readable) way is to directly read the contents of the file using the File.read class method.
content = File.read "companies.csv"
If all you are doing is reading the file, and you don't need the file object for anything else, use this method. Ruby takes care of opening and closing the file behind the scenes, so you don't have to worry about it.
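One thing File.read does not handle for you is a missing file: it raises Errno::ENOENT. Below is a hedged sketch of one way to deal with that. `read_or_default` is a hypothetical helper written for this example, not part of Ruby, and the temporary file is there only to make the snippet runnable.

```ruby
require "tmpdir"

# Throwaway sample file so the snippet runs on its own.
dir = Dir.mktmpdir
path = File.join(dir, "companies.csv")
File.write(path, "id,company,product\n1,Microsoft,windows\n")

# Hypothetical helper: returns the file's contents, or a default
# value when the file does not exist.
def read_or_default(path, default = "")
  File.read(path)
rescue Errno::ENOENT
  default
end

content = read_or_default(path)
missing = read_or_default(File.join(dir, "nope.csv"), "no data")

puts content.lines.count # prints 2
puts missing             # prints no data
```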
Reading All Lines
To read all lines from the file into an array, Ruby provides the readlines method.
File.open("companies.csv") do |file|
  lines = file.readlines
end
# Output
[
[0] "id,company,product\n",
[1] "1,Microsoft,windows\n",
[2] "2,Apple,iphone\n",
[3] "3,Meta,facebook\n",
[4] "4,Google,search\n",
[5] "5,Amazon,ecommerce"
]
Also, just like the read method, you can call the readlines method on the File class itself.
lines = File.readlines("companies.csv")
# Output
[
[0] "id,company,product\n",
[1] "1,Microsoft,windows\n",
[2] "2,Apple,iphone\n",
[3] "3,Meta,facebook\n",
[4] "4,Google,search\n",
[5] "5,Amazon,ecommerce"
]
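Notice the trailing "\n" on each line in the output. If you don't want it, readlines accepts a chomp: true keyword argument (available since Ruby 2.4). A minimal sketch, using a shortened temporary copy of the sample data as a stand-in for companies.csv:

```ruby
require "tmpdir"

# Throwaway sample file so the snippet runs on its own.
dir = Dir.mktmpdir
path = File.join(dir, "companies.csv")
File.write(path, "id,company,product\n1,Microsoft,windows\n2,Apple,iphone")

# chomp: true strips the line terminator from every element.
lines = File.readlines(path, chomp: true)
p lines
# ["id,company,product", "1,Microsoft,windows", "2,Apple,iphone"]
```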
Now let's address a common issue when reading files. Often, during development, you're working with smaller files. You read the file and everything works fine. Then you deploy your code to production, where it needs to work with large files, and suddenly your program crashes. Oops!
Read Large Files as Streams
All the solutions we've seen so far load the whole file in memory at once. If the file is huge, it will consume too much memory. What's more, good luck loading a 10 GB log file on a machine with 8 GB of memory.
To read a huge file, a better solution is to treat it as a flowing stream.
We don't need to have the whole file in memory at once to process it. We can process the file one line at a time, or even one character at a time.
File.open("companies.csv") do |f|
  f.each_line do |line|
    puts line
  end
end
# OR (each is an alias for each_line)
File.open("companies.csv") do |f|
  f.each do |line|
    puts line
  end
end
# OR
File.foreach "companies.csv" do |line|
  puts line
end
The advantage of treating a file as a stream is this: at no point do we have the whole file in memory, so memory usage stays roughly constant as the file grows.
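To make the streaming idea concrete, here is a small sketch that computes a couple of statistics while holding only one line in memory at a time, and then reads the file character by character with each_char. The temporary file and the variable names are assumptions for illustration.

```ruby
require "tmpdir"

# Throwaway sample file so the snippet runs on its own.
dir = Dir.mktmpdir
path = File.join(dir, "companies.csv")
File.write(path, "id,company,product\n1,Microsoft,windows\n2,Apple,iphone\n")

line_count = 0
longest = ""

# Only the current line is held in memory on each iteration.
File.foreach(path) do |line|
  line_count += 1
  stripped = line.chomp
  longest = stripped if stripped.length > longest.length
end

# each_char without a block returns an enumerator; first(2) stops
# after two characters instead of reading the whole file.
first_chars = File.open(path) { |f| f.each_char.first(2) }

puts line_count # prints 3
puts longest    # prints 1,Microsoft,windows
p first_chars   # prints ["i", "d"]
```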
Bonus: File is an Enumerable
As you can see in the above example, you can use the each method on the File instance to read each line. In addition, the File class includes the Enumerable module via its superclass, the IO class.
Since the Enumerable module contains useful methods such as map, filter, reduce, etc. that operate on collections, you can use these methods directly on the file's lines. In other words, you can manipulate the lines of a file just like the elements of an array. One caveat: eager methods such as map and drop return arrays holding all of their results, so on a very large file, call lazy first or stick to each to keep memory usage flat.
For example, to group the above CSV records by company name, you could write:
File.open "companies.csv" do |f|
  puts f.drop(1) # skip the header row
    .map { |line| { company: line.split(',')[1], product: line.split(',')[2] } }
    .group_by { |record| record[:company] }
end
# Output
#
# {"Microsoft"=>[{:company=>"Microsoft", :product=>"windows\n"}], "Apple"=>[{:company=>"Apple", :product=>"iphone\n"}], "Meta"=>[{:company=>"Meta", :product=>"facebook\n"}], "Google"=>[{:company=>"Google", :product=>"search\n"}], "Amazon"=>[{:company=>"Amazon", :product=>"ecommerce"}]}
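Calling drop and map directly on a File builds intermediate arrays containing every line, which is fine for a small file like this one. For large files, a lazy enumerator keeps the pipeline streaming. The sketch below is one possible approach, not the only one; the temporary file stands in for companies.csv, and chomp is added to strip the trailing newline from the product field.

```ruby
require "tmpdir"

# Throwaway sample file so the snippet runs on its own.
dir = Dir.mktmpdir
path = File.join(dir, "companies.csv")
File.write(path, "id,company,product\n1,Microsoft,windows\n2,Apple,iphone\n3,Meta,facebook\n")

# File.foreach without a block returns an enumerator; lazy makes each
# step process one line at a time instead of building full arrays.
records = File.foreach(path).lazy
  .drop(1) # skip the header row
  .map { |line| line.chomp.split(",") }
  .map { |fields| { company: fields[1], product: fields[2] } }
  .first(2) # stops reading after two records

puts records.map { |r| r[:company] }.join(", ") # prints Microsoft, Apple
```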
That's a wrap. I hope you found this article helpful and you learned something new.
As always, if you have any questions or feedback, didn't understand something, or found a mistake, please leave a comment below or send me an email. I reply to all emails I get from developers, and I look forward to hearing from you.