You have probably already noticed that reading CSV files in Julia is slow as hell. The process is much slower than R's readr::read_csv or the latest version of data.table.
For example, reading a 40 MB file with 515,356 rows and 25 columns can take up to 7 seconds and 607 MB of memory, and that's on a MacBook Pro with an SSD.
julia> @benchmark CSV.read(file_path, DataTable, nullable = false, types = data_types)
memory estimate: 607.77 MiB
allocs estimate: 37767548
minimum time: 4.684 s (0.79% GC)
median time: 5.032 s (2.04% GC)
mean time: 5.032 s (2.04% GC)
maximum time: 5.379 s (3.13% GC)
So what’s the solution?
Anyone using Julia?
When you first start with Julia, every book suggests that you forget about type annotations and let the compiler infer the types. For example:
Yes, it works great… in 99% of scenarios. I followed the same approach until I noticed enormous memory consumption when working with large datasets.
Today I would like to show how using the right data type can go a long way toward minimising problems, optimising performance and reducing memory consumption.
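As a rough illustration of the memory side of this, here is a minimal sketch. It assumes a hypothetical column of small integers (the name `ages` and the value range are made up for the example and are not taken from my dataset). It compares the storage needed for the same values held as 64-bit and as 8-bit integers.

```julia
# Hypothetical example data: one million small integers (e.g. ages 1..100).
n = 1_000_000

# Stored explicitly as 64-bit integers: 8 bytes per element.
ages_default = Int64[rand(1:100) for _ in 1:n]

# The same values fit into 8-bit integers: 1 byte per element.
ages_compact = Int8.(ages_default)

# sizeof on an array reports the bytes used by its elements.
println("Int64 column: ", sizeof(ages_default), " bytes")
println("Int8 column:  ", sizeof(ages_compact), " bytes")

# The values themselves are unchanged by the narrower type.
println("Same values:  ", ages_default == ages_compact)
```

The narrower type only works when every value fits into it (Int8 covers -128 to 127), so this is a trade-off to check per column rather than a blanket rule.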
I am continuing my path toward completing 21-111, and today's topic is Techniques of Differentiation.
The following topics were covered in the book and classes:
- The Product and Quotient Rules
- The Chain Rule and the General Power Rule
- Implicit Differentiation and Related Rates
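For reference, the rules named in the list above can be written out as follows. These are the standard textbook statements, with f and g differentiable functions and r a real exponent; the notation is my own choice and may differ slightly from the book's.

```latex
% Product rule
\frac{d}{dx}\bigl[f(x)\,g(x)\bigr] = f(x)\,g'(x) + g(x)\,f'(x)

% Quotient rule (where g(x) \neq 0)
\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{g(x)\,f'(x) - f(x)\,g'(x)}{[g(x)]^{2}}

% Chain rule
\frac{d}{dx}\,f\bigl(g(x)\bigr) = f'\bigl(g(x)\bigr)\,g'(x)

% General power rule (the chain rule applied to a power)
\frac{d}{dx}\,[g(x)]^{r} = r\,[g(x)]^{r-1}\,g'(x)
```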
I am still doing Calculus 1, 21-111 from Carnegie Mellon UC. Today I was going through Chapter 2. Everything seemed to be clear and obvious.
- Describing Graphs of Functions
- The First- and Second-Derivative Rules
- The First- and Second-Derivative Tests and Curve Sketching
- Curve Sketching (Conclusion)
- Optimization Problems
- Further Optimization Problems
- Applications of Derivatives to Business and Economics
Yesterday I managed to complete the next chapter of Brief Calculus & Its Applications.
The following topics were covered by Chapter 1:
- The Slope of a Straight Line
- The Slope of a Curve at a Point
- The Derivative and Limits
- Limits and the Derivative
- Differentiability and Continuity
- Some Rules for Differentiation
- More About Derivatives
- The Derivative as a Rate of Change
So far so good, but I guess my pace is too high right now. I will spend some time going through the book once again just in case I missed anything.
~450 pages left to finish. I might slow down a little bit but hope to be done with the book by the end of next week.
Today I had a chance to go through the first (0) chapter from the textbook (Brief Calculus & Its Applications 13th edition by Larry J. Goldstein).
It consisted purely of school math, namely:
- Functions and Their Graphs
- Some Important Functions
- The Algebra of Functions
- Zeros of Functions—The Quadratic Formula and Factoring
- Exponents and Power Functions
- Functions and Graphs in Applications
I use most of the material on a daily basis, but it was great to refresh my memory anyway.
The book itself is amazing, with great examples and solutions throughout each chapter. It was a pleasure to page through 🙂
The most interesting part so far has been compound interest and its formula, which you can find below.
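This is the standard compound interest formula in its usual textbook form. The symbols are the conventional ones and may not match exactly what the book uses: P is the principal, r the annual interest rate as a decimal, m the number of compounding periods per year, t the time in years, and A the resulting balance.

```latex
A = P\left(1 + \frac{r}{m}\right)^{mt}

% Illustrative numbers (not from the book): P = 1000, r = 0.06, m = 12, t = 1
A = 1000\left(1 + \frac{0.06}{12}\right)^{12} = 1000\,(1.005)^{12} \approx 1061.68
```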
My name is Dmitry and I have set a target to improve my statistics and machine learning skills.
Throughout the year I will be completing different MOOCs from Carnegie Mellon UC, participating in machine learning competitions and contributing to Julia Lang. You will have a chance to follow my progress directly here.
Carnegie Mellon UC provides access to the syllabus and course materials. On top of that, I was lucky to find a page describing the full path to becoming a machine learning expert.
I will start with the basics: Calculus 1, 21-111. The textbook for this course is Brief Calculus & Its Applications by Larry J. Goldstein, David C. Lay, and David I. Schneider.