NaN-boxing
To understand why we gave the name “NaN bug” to this bug, we first need to understand the IEEE754 standard. We shall also dive into how JSValues are represented in memory by means of a technique called “NaN-boxing”.
IEEE754
JavaScriptCore uses the IEEE Standard for Floating-Point Arithmetic (IEEE754). This standard serves the purpose of representing floating point values in memory. It does so by encoding, for example on a 64-bit value (double-precision floating-point format), data such as the sign, the exponent, and the significand. There are also 16-bit (half-precision) and 32-bit (single-precision) representations that are outside of the scope of this blog post.
| Sign | Exponent | Significand |
|---|
| Bit 63 | Bits 62-52 | Bits 51-0 |
Depending on these bits, the calculation for the representation would be as follows.
With exponent 0: (-1)**(sign bit) * 2**(1-1023) * 1.significand
With exponent other than 0: (-1)**(sign bit) * 2**(exponent-1023) * 0.significand
With all bits of exponent set and significand is 0: (-1)**(sign bit)*Infinity
With all bits of exponent set and significand not 0: Not a number (NaN)
The reason why 1023 is used on the exponent is because it is encoded using an offset-binary representation which aides in implementing negative numbers with 1023 as the zero offset. In order to understand offset-binary representation, we can picture an example with a 3 digit binary exponent. In this representation it would be possible to encode up to number 7 and the offset would be 4
(2**2). This way we would encode the number 0 as (2**1) in this offset-binary representation and therefore the encoded range would be (-4, 3) corresponding to the binary range of (000, 111).
NaN
If all the bits of the exponent on the IEE754 standard representation are set, it describes a value that is not a number (NaN). These values are described in the standard as a way to establish values that are either undefined or unrepresentable. In addition, there exist Quiet and Signaling NaN values (QNaN, sNaN) which serve the purpose of either notifying of a normal undefined or unrepresentable value or, in the case of a signaling NaN, a representation to add diagnostics info (other data encoded in the payload of the value).
There are 2**51 possible values we can encode in the payload of the NaN number in the double-precision floating-point format. This allows a huge value space for implementers to encode all sorts of information. In hexadecimal, this range would be any values between 0xFFF0000000000000 and 0xFFFFFFFFFFFFFFFF.
Specifically, JavaScriptCore uses NaN values to encode different types of information.
JSValue
Most JavaScript engines choose to represent JavaScript objects in memory in a way that enables efficient handling of the values. JavaScriptCore is no exception, and to do so, it backs up JavaScript objects with the C class JSValue. It is possible to find a detailed explanation on how values in the JavaScript engine are encoded in JavaScriptCore within the file Source/JavaScriptCore/runtime/JSCJSValue.h: