Julia: switch from slow CSV to Feather

You have most probably already noticed that reading CSV files in Julia is slow as hell. It is much slower than R's readr::read_csv or the latest version of data.table.

For example, reading a 40 MB file with 515356 rows and 25 columns can take up to 7 seconds and about 607 MiB of memory, and that's on a MacBook Pro with an SSD.

julia> @benchmark CSV.read(file_path, DataTable, nullable = false, types = data_types)
BenchmarkTools.Trial:
memory estimate: 607.77 MiB
allocs estimate: 37767548
--------------
minimum time: 4.684 s (0.79% GC)
median time: 5.032 s (2.04% GC)
mean time: 5.032 s (2.04% GC)
maximum time: 5.379 s (3.13% GC)
--------------
samples: 2
evals/sample: 1

So what’s the solution?

The solution is to save and read files in the Feather format. Feather was created by Wes McKinney and Hadley Wickham, on top of Apache Arrow, as a very fast file format for storing data frames.

What is Feather?

Feather is a fast, lightweight, and easy-to-use binary file format for storing data frames. It has a few specific design goals:

  • Lightweight, minimal API: make pushing data frames in and out of memory as simple as possible
  • Language agnostic: Feather files are the same whether written by Julia, Python or R code.
  • High read and write performance. When possible, Feather operations should be bound by local disk performance.

How good is it?

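The same CSV file was saved as Feather and then read back in Julia. According to the results below, Feather was at least 25 times faster: even its slowest read (183.505 ms) beat the fastest CSV read (4.684 s) by that margin. It also allocated roughly 14 times less memory.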

julia> @benchmark Feather.read("opens/data/data.contacts.feather")
BenchmarkTools.Trial:
memory estimate: 43.96 MiB
allocs estimate: 4860
--------------
minimum time: 47.767 ms (1.92% GC)
median time: 73.757 ms (22.80% GC)
mean time: 118.652 ms (52.99% GC)
maximum time: 183.505 ms (68.02% GC)
--------------
samples: 43
evals/sample: 1
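
The conversion step itself is not shown above. A minimal sketch of it could look like the following, assuming a recent CSV.jl, DataFrames.jl and Feather.jl (the benchmarks in this post used the older DataTable sink, so the exact calls may differ on your setup). The tiny table, the temporary directory and the file names are made up for illustration and stand in for the real 515356-row file.

```julia
using CSV, DataFrames, Feather

# Hypothetical stand-in data: two numeric columns instead of the real 25.
df = DataFrame(id = 1:3, score = [0.5, 1.5, 2.5])

# Hypothetical paths in a temporary directory.
dir = mktempdir()
csv_path = joinpath(dir, "contacts.csv")
feather_path = joinpath(dir, "contacts.feather")

# Create the CSV file that plays the role of the slow source.
CSV.write(csv_path, df)

# One-time conversion: pay the slow CSV parse once, then store as Feather.
from_csv = CSV.read(csv_path, DataFrame)
Feather.write(feather_path, from_csv)

# Every later load reads the Feather file instead of the CSV.
from_feather = Feather.read(feather_path)
```

After the one-time conversion, later sessions only need the Feather.read call.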

It's a no-brainer: I will use it the next time I have to save files.

The package is registered in METADATA.jl, so it can be installed with Pkg.add("Feather").
