Anyone using Julia?
Whenever you start with Julia, every book suggests that you forget about data type annotations and let the compiler decide them for you. For example:
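A minimal version of that default-typed example (the values are chosen to match the Int8 rewrite later in the post):

```julia
# No annotation: the compiler infers the element type, which on a
# 64-bit system means Int64 for integer literals.
a = [1, 2, 3, 4]
typeof(a)   # Vector{Int64} (alias for Array{Int64, 1})
```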
Yes, it works great… in 99% of scenarios. I followed the same approach until I noticed enormous memory consumption when working with large datasets.
Today I would like to show how using the right data type can go a long way toward minimising problems, optimising performance and reducing memory consumption.
So the reason for the enormously high memory consumption is the number of bits required to store your values. I suspect you are using a 64-bit system, where Int64 is the default integer type.
| Type | Signed? | Number of bits | Smallest value | Largest value |
| --- | --- | --- | --- | --- |
| Int8 | ✓ | 8 | -2^7 | 2^7 - 1 |
| UInt8 | | 8 | 0 | 2^8 - 1 |
| Int16 | ✓ | 16 | -2^15 | 2^15 - 1 |
| UInt16 | | 16 | 0 | 2^16 - 1 |
| Int32 | ✓ | 32 | -2^31 | 2^31 - 1 |
| UInt32 | | 32 | 0 | 2^32 - 1 |
| Int64 | ✓ | 64 | -2^63 | 2^63 - 1 |
| UInt64 | | 64 | 0 | 2^64 - 1 |
| Int128 | ✓ | 128 | -2^127 | 2^127 - 1 |
| UInt128 | | 128 | 0 | 2^128 - 1 |
| Bool | N/A | 8 | false (0) | true (1) |
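You can verify these bounds in the REPL with `typemin` and `typemax`:

```julia
# Smallest and largest representable values for a few of the types above.
typemin(Int8), typemax(Int8)       # (-128, 127), i.e. -2^7 and 2^7 - 1
typemin(UInt16), typemax(UInt16)   # (0x0000, 0xffff)
typemax(Int64)                     # 9223372036854775807, i.e. 2^63 - 1
```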
So the example we had above could easily be defined as an array of Int8, saving 56 × 4 = 224 bits in total:
a = Int8[1, 2, 3, 4]
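You can confirm the saving with `sizeof`, which reports the size of the array's data in bytes:

```julia
a64 = [1, 2, 3, 4]         # Int64 by default: 4 × 8 bytes = 32 bytes
a8  = Int8[1, 2, 3, 4]     # Int8: 4 × 1 byte = 4 bytes
sizeof(a64) - sizeof(a8)   # 28 bytes, i.e. 56 × 4 = 224 bits saved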
But does it really matter that much? I will run some tests below to prove that choosing the appropriate data type matters.
We will randomly generate integers from 0 to 100 and store them as Int64, Float32, Int8 and compare benchmarks.
NB: the script below runs in global scope, which can affect the results.
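The original script was not included in this copy of the post; a sketch of what it might look like (the array length of 10 million is my assumption), using `sizeof` for memory and `@time` for a rough speed check:

```julia
n = 10_000_000                  # assumed size, not stated in the original post

vals = rand(0:100, n)           # Vector{Int64} by default

@time a64  = Int64.(vals)       # 64-bit integers
@time af32 = Float32.(vals)     # 32-bit floats
@time a8   = Int8.(vals)        # 8-bit integers: 0:100 fits comfortably

println(sizeof(a64))            # 80_000_000 bytes
println(sizeof(af32))           # 40_000_000 bytes
println(sizeof(a8))             # 10_000_000 bytes
```

For rigorous timings, prefer wrapping the work in a function and using `@btime` from BenchmarkTools.jl rather than `@time` in global scope.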
From the results below you can see that both speed and memory consumption are tightly linked to the data type: there is over 60 MB of difference between Int64 and Int8!
Now imagine having a multi-dimensional array. What would the difference be there?
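For instance, with a 1000 × 1000 matrix (dimensions chosen for illustration), the per-element saving multiplies across every cell:

```julia
m64 = rand(0:100, 1_000, 1_000)   # Matrix{Int64}: 8 bytes per element, 8 MB
m8  = Int8.(m64)                  # Matrix{Int8}:  1 byte per element, 1 MB
sizeof(m64) - sizeof(m8)          # 7_000_000 bytes saved
```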
We chose Julia because its speed is close to that of C/Fortran, so let's be careful to keep it that way.