Scala Notes Number Madness
This is another article in the scala-notes series. This one deals with what I call number madness.
What!?
Numbers are used everywhere in code, and often the type used is something like
Int (Scala) or Integer/int (Java). These simple types are familiar to
those of us who used languages like C but should not be used without
considering their inherent limitations. Let’s imagine an example
web application that during its normal operation records interesting events…
Imagine a field used to represent the key of a database table named
INTERESTING_EVENT, specifically an automatically incrementing surrogate key,
as is typical. (This is not the best example, in part because one would rarely
need perform arithmetic on a key value, but it is simple to explain.) Let’s say
that on average, only twenty INTERESTING_EVENT records are populated per
second.
Is an Int or Integer safe? Let’s get some perspective…
- 365 days per year
- 8760 hours per year
- 525,600 minutes per year
- 31,536,000 seconds per year
Given a liner expansion of the INTERESTING_EVENT sequence, by twenty each
second, the value would grow by 630,720,000 each year.
- 630,720,000 events per year
In four years that would grow to 2,524,608,000.
- 2,522,880,000 events in four years
The MAX_VALUE for java.lang.Integer is 2,147,483,647.
The example is contrived and unlikely but also recall that automatically
incremented keys do not always grow linearly, and twenty events per second is
conservative for a web server… (Insert your own web scale humor here.) But
even with those constraints the key grows beyond the capacity of
java.lang.Integer in less that four years. Four years are enough time to
forget about the capacity constraint built into the example application.
You might forget how these simple number types work and think:
No problem, statistics or other processes are running, constantly computing metrics against
Eventsand some computations use the key value… Surely an exception will be thrown as soon as the Integer overflows.
This is where you would be wrong.
Division will throw an ArithmeticException on division by zero, but a sum,
for example, will quietly continue on an overflow. Let me repeat that…
The sum will return an incorrect value on overflow, but it will not throw a
runtime exception! This is serious problem. The application will quietly
continue to run, but the mathematical operations on the value will be
incorrect. This should be a real “What!?” moment.
Two’s complement
The JVM stores an int as a two’s complement binary (the core of
java.lang.Integer and scala.Int), and thus it is often referred to as a
two’s complement type.
Specifically, the JVM uses a 32-bit signed two’s complement integer for int.
The most significant bit is used to designate the sign of the number. The max
value for int is 2,147,483,647, and would thus be represented1 like 0111 1111 1111 1111 1111 1111 1111 1111.
Simple sum
First lets simply add 1 to 2,147,483,646.
0111 1111 1111 1111 1111 1111 1111 1110
+ 0000 0000 0000 0000 0000 0000 0000 0001
-------------------------------------------
0111 1111 1111 1111 1111 1111 1111 1111
The result is what we expect, 0111 1111 1111 1111 1111 1111 1111 1111, or
2,147,483,647 in decimal.
Negative values
The JVM int is signed, that is to say both positive and negative values can
be stored as an int.
Using the MSB (most significant bit) to designate sign makes it possible to
store negative values using the same placeholders and arithmetic. A negative
is represented with the MSB set to 1. To convert -2,147,483,647 to a two’s
complement binary for int, we must find the complement of each bit then add
one to the result.
Find the complement of each bit of
0111 1111 1111 1111 1111 1111 1111 1111.
1000 0000 0000 0000 0000 0000 0000 0000Add one
1000 0000 0000 0000 0000 0000 0000 0000
+ 0000 0000 0000 0000 0000 0000 0000 0001
-------------------------------------------
1000 0000 0000 0000 0000 0000 0000 0001
The minimum value would be the value with the greatest magnitude and the
opposite sign. The 32 bit, negative, two’s complement number with the greatest
magnitude would be where the sign bit is set to 1 and all the other bits are
set to 0, i.e.. 1000 0000 0000 0000 0000 0000 0000 0000. This means the
min value2 is -2,147,483,648 in decimal.
Bad arithmetic without error
Let’s see what happens when we overflow by adding one to 2,147,483,647.
A carry bit needed…
1
0111 1111 1111 1111 1111 1111 1111 1111
+ 0000 0000 0000 0000 0000 0000 0000 0001
-------------------------------------------
0
and so on…
1
0111 1111 1111 1111 1111 1111 1111 1111
+ 0000 0000 0000 0000 0000 0000 0000 0001
-------------------------------------------
1000 0000 0000 0000 0000 0000 0000 0000
Wait! We have seen this one before, it is -2,147,483,648! So in this
case the result is mathematically incorrect, but a valid int!
You can try out Scala code snips in the online REPL
here, and Java in the online REPL
here. The following return
-2147483648.
- Scala
-
Int.MaxValue + 1 - Java
-
Integer.MAX_VALUE + 1
BigInteger (java.math.BigInteger) to the rescue?
Clearly when arithmetic is needed, or when it is possible to overflow a field, one should use something safe like BigInteger over its primitive counterpart. Since one can never know for certain how a field will be used, selecting a safe type should be the default.
If one can be sure that arithmetic will never be performed on a field, and benchmarking shows that using a primitive is needed, then maybe it is worth the risk.
Primitive Type
- Pros
- Simple quick arithmetic.
- Fixed memory allocation size.
-
Symbol operators. (
+ - * /) - Cons
- Arithmetic can fail without error!
BigInteger (java.math.BigInteger)
- Pros
- Correct arithmetic
- Cons
-
Can not use symbol operators; must use
add,subtract,multiply, anddividemethods. - Arithmetic could be slower.
- Higher memory allocation requirements because more information is stored and capacity is dynamic.
BigInt (scala.math.BigInt)
If you live in the Scala world you have better options, such as
scala.math.BigInt. It can be thought of a Scala’s BigInteger. In Scala,
characters such as + - * / can be used as method names, so this makes using
scala.math.BigInt a pleasant experience.
Spire
Finally, when talking about handling numbers in Scala, we need to talk about spire. Spire makes working with numbers a beautiful experience.
//spire example
object Inspired extends App {
import spire.syntax.literals.si._
val y = big"2 147 483 647"
println(s"y, using spire string interpolation = ${y + 1}")
}Look at how clear and simple it is to create a BigInt using
spire string interpolation. Using spaces makes
it easier to identify the place values for each digit and results in source
that is simpler to read. Spire supports additional number types, type classes
and more. Let’s take
SafeLong for
example. It works as a nice replacement for BigInt. While it provides
accurate arithmetic, independant of value, it performs better than BigInt for
values inside the java.lang.Long range.3
Conclusion
A two’s complement type like int should be considered an optimization choice
for Scala and Java software development, and not a default. Its performance
comes at a cost and that cost is silent failure. The default choice for a
whole number field should be something like BigInt or SafeLong because they
ensure correct arithmetic without regard to the size of the operands. This is
made far more pleasant in Scala, especially when using a library like
spire.



