Julia: switching from slow CSV to Feather

You have most probably already noticed that reading CSV files in Julia is slow as hell. It is much slower than R's readr::read_csv or the latest version of data.table.

For example, reading a 40 MB file consisting of 515,356 rows and 25 columns can take up to 7 seconds and about 607 MiB of memory, and that's on a MacBook Pro with an SSD.

julia> @benchmark CSV.read(file_path, DataTable, nullable = false, types = data_types)
BenchmarkTools.Trial:
  memory estimate:  607.77 MiB
  allocs estimate:  37767548
  --------------
  minimum time:     4.684 s (0.79% GC)
  median time:      5.032 s (2.04% GC)
  mean time:        5.032 s (2.04% GC)
  maximum time:     5.379 s (3.13% GC)
  --------------
  samples:          2
  evals/sample:     1

So what’s the solution?
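
The rest of the post is not included in this excerpt, so the following is only a minimal sketch of the approach the title points to: parse the CSV once, save it as a Feather file, and read the Feather file from then on. It assumes the CSV.jl, DataFrames.jl and Feather.jl packages are installed, and it uses the current `CSV.read(path, DataFrame)` call rather than the `DataTable` call benchmarked above. The sample table and the file names are hypothetical.

```julia
using CSV, DataFrames, Feather

# Hypothetical sample data standing in for the real 40 MB file.
df = DataFrame(id = 1:1_000, value = rand(1_000))

dir = mktempdir()
csv_path = joinpath(dir, "sample.csv")          # hypothetical file name
feather_path = joinpath(dir, "sample.feather")  # hypothetical file name

CSV.write(csv_path, df)

# One-off conversion: parse the text file once and store the result
# in Feather's binary, column-oriented format.
from_csv = CSV.read(csv_path, DataFrame)
Feather.write(feather_path, from_csv)

# Later sessions read the Feather file directly, with no text parsing.
from_feather = Feather.read(feather_path)
```

To compare the two on your own data, wrap both read calls in `@benchmark` from BenchmarkTools, as in the output above. How large the gain is depends on the file, so it is worth measuring rather than assuming.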

Julia: getting the best performance out of your data types

Anyone using Julia?

When you are just starting with Julia, every book suggests that you forget about data type annotations and let the compiler infer the types. For example:

Yes, it works great… in 99% of scenarios. I followed the same approach until I noticed enormous memory consumption when working with large datasets.

Today I would like to show how using the right data type can go a long way toward minimising problems, optimising performance and reducing memory consumption.
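
The post's own example is not part of this excerpt. As a rough illustration of the kind of effect described, and not the author's code, the sketch below stores the same one million values with different element types and compares the bytes used by the array data. The variable names are made up for the example.

```julia
n = 1_000_000

# The same integer values stored with two element types.
as_int64 = fill(Int64(1), n)   # Int is Int64 on 64-bit systems
as_int8  = fill(Int8(1), n)    # enough when every value fits in -128..127

# The same floating-point values in double and single precision.
as_float64 = fill(1.5, n)      # Float64 is the default float type
as_float32 = fill(1.5f0, n)    # half the size, at lower precision

# sizeof on an array reports the bytes used by its elements.
println("Int64:   ", sizeof(as_int64), " bytes")
println("Int8:    ", sizeof(as_int8), " bytes")
println("Float64: ", sizeof(as_float64), " bytes")
println("Float32: ", sizeof(as_float32), " bytes")
```

A narrower type only helps when the values actually fit in it. An `Int8` column silently wraps around on overflow, so the saving has to be weighed against the range of the data.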

21-111, Applications of the derivative, Ch2

I am still working through Calculus 1, 21-111, from Carnegie Mellon University. Today I went through Chapter 2. Everything seemed clear and obvious.

  • Describing Graphs of Functions
  • The First- and Second-Derivative Rules
  • The First- and Second-Derivative Tests and Curve Sketching
  • Curve Sketching (Conclusion)
  • Optimization Problems
  • Further Optimization Problems
  • Applications of Derivatives to Business and Economics

21-111, The derivative, Ch1

Yesterday I managed to complete the next chapter of Brief Calculus & Its Applications.

The following topics were covered by Chapter 1:

  • The Slope of a Straight Line
  • The Slope of a Curve at a Point
  • The Derivative and Limits
  • Limits and the Derivative
  • Differentiability and Continuity
  • Some Rules for Differentiation
  • More About Derivatives
  • The Derivative as a Rate of Change

So far so good, but I guess my pace is too high right now. I will spend some time going through the book once again, just in case I missed anything.

~450 pages left to finish. I might slow down a little bit but hope to be done with the book by the end of next week.

21-111, Review of algebra, Ch0

Today I had a chance to go through the first chapter (Chapter 0) of the textbook (Brief Calculus & Its Applications, 13th edition, by Larry J. Goldstein).

It consisted purely of school math, namely:

  • Functions and Their Graphs
  • Some Important Functions
  • The Algebra of Functions
  • Zeros of Functions—The Quadratic Formula and Factoring
  • Exponents and Power Functions
  • Functions and Graphs in Applications

I use most of this material on a daily basis, but it was still great to refresh my memory.

The book itself is amazing, with great examples and solutions throughout each chapter. It was a pleasure to page through it 🙂

The most interesting part so far has been compound interest and its formula, which you can find below.
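
The formula itself is not part of this excerpt. Presumably it is the standard compound interest formula, written here with symbols chosen for illustration: $P$ is the principal, $r$ the annual interest rate, $m$ the number of compounding periods per year, $t$ the number of years, and $A$ the resulting balance.

```latex
A = P\left(1 + \frac{r}{m}\right)^{mt}
```

For example, $1000 at 6% compounded monthly for one year gives $A = 1000\,(1 + 0.06/12)^{12} = 1000\,(1.005)^{12} \approx 1061.68$.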

My first blog post

Hello,

My name is Dmitry and I have set myself a target to improve my statistics and machine learning skills.

Throughout the year I will be completing different MOOCs from Carnegie Mellon University, participating in machine learning competitions and contributing to Julia Lang. You will have a chance to follow my progress directly here.

Carnegie Mellon University provides access to syllabi and course materials. On top of that, I was lucky to find a page describing the full path to becoming a machine learning expert.

I will start with the basics: Calculus 1, 21-111. The textbook for this course is Brief Calculus & Its Applications by Larry J. Goldstein, David C. Lay, and David I. Schneider.
