NaN-boxing
To understand why we named this bug the “NaN bug”, we first need to understand the IEEE754 standard. We will also dive into how JSValues are represented in memory by means of a technique called “NaN-boxing”.
IEEE754
JavaScriptCore uses the IEEE Standard for Floating-Point Arithmetic (IEEE754). This standard defines how floating-point values are represented in memory. In the 64-bit double-precision format, a value is encoded as three fields: the sign, the exponent, and the significand. There are also 16-bit (half-precision) and 32-bit (single-precision) representations, which are outside the scope of this blog post.
Sign | Exponent | Significand |
---|---|---|
Bit 63 | Bits 62-52 | Bits 51-0 |
Depending on these bits, the encoded value is computed as follows.
With exponent 0 (zero and subnormal numbers): (-1)**(sign bit) * 2**(1-1023) * 0.significand
With exponent other than 0 and not all bits set (normal numbers): (-1)**(sign bit) * 2**(exponent-1023) * 1.significand
With all bits of exponent set and significand 0: (-1)**(sign bit) * Infinity
With all bits of exponent set and significand not 0: Not a number (NaN)
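The rules above can be checked with a short Python sketch. It assumes Python's float is an IEEE754 double, which holds on common platforms. The helper names decode_double, classify and rebuild are ours and purely illustrative. A normal number gets an implicit leading 1, while an exponent field of 0 means a leading 0.

```python
import math
import struct

def decode_double(x):
    """Split a float into its (sign, exponent, significand) bit fields."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63                      # bit 63
    exponent = (bits >> 52) & 0x7FF        # bits 62-52
    significand = bits & ((1 << 52) - 1)   # bits 51-0
    return sign, exponent, significand

def classify(x):
    """Name the case that applies to x in the list above."""
    _, exponent, significand = decode_double(x)
    if exponent == 0x7FF:
        return "infinity" if significand == 0 else "nan"
    if exponent == 0:
        return "zero" if significand == 0 else "subnormal"
    return "normal"

def rebuild(sign, exponent, significand):
    """Recompute a finite value from its fields."""
    if exponent == 0x7FF:
        raise ValueError("infinity and NaN have no finite value")
    if exponent == 0:
        # 0.significand * 2**(1-1023), with the 52 fraction bits folded in
        magnitude = math.ldexp(significand, 1 - 1023 - 52)
    else:
        # 1.significand * 2**(exponent-1023)
        magnitude = math.ldexp((1 << 52) | significand, exponent - 1023 - 52)
    return -magnitude if sign else magnitude
```

For instance, decode_double(1.0) yields an exponent field of 1023 and a significand of 0, since 1.0 is 1.0 * 2**0.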
The reason 1023 is subtracted from the exponent is that the exponent is stored in an offset-binary representation, which allows negative exponents to be encoded without a separate sign bit; 1023 is the zero offset. To picture offset-binary representation, take a 3-digit binary exponent. It can hold stored values up to 7, and the offset would be 4 (2**2). The number 0 is then stored as 4 (2**2, binary 100), and the encoded range is [-4, 3], corresponding to the binary range (000, 111). Note that IEEE754 uses an offset of 2**(n-1) - 1 rather than 2**(n-1), which gives 1023 for its 11-bit exponent.
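As a minimal sketch of the 3-digit example, the following Python helpers (hypothetical names, not taken from any library) store a signed number by adding the offset and recover it by subtracting the offset again.

```python
def encode_offset(value, bits=3):
    """Store a signed value in offset-binary with offset 2**(bits-1)."""
    offset = 1 << (bits - 1)           # 4 for a 3-digit field
    if not -offset <= value <= offset - 1:
        raise ValueError("value does not fit in the field")
    return value + offset

def decode_offset(stored, bits=3):
    """Recover the signed value from its stored form."""
    return stored - (1 << (bits - 1))

# Print the whole 3-digit table, from 000 (-4) to 111 (3).
for stored in range(8):
    print(format(stored, "03b"), decode_offset(stored))
```

The double-precision exponent follows the same idea with 11 bits, except that the standard subtracts 1023 instead of 1024.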
NaN
If all the bits of the exponent in the IEEE754 representation are set and the significand is not 0, the value is not a number (NaN). The standard defines these values to represent results that are undefined or unrepresentable. There are two kinds, quiet and signaling NaN values (qNaN, sNaN). A quiet NaN propagates silently through most operations, whereas a signaling NaN is meant to raise an invalid-operation exception when it is used. In both cases, the remaining significand bits form a payload that can carry diagnostic information.
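To make the distinction concrete, here is a small Python sketch that classifies a raw 64-bit pattern. It assumes the convention used by most current hardware, where the top significand bit (bit 51) marks a quiet NaN. The name nan_kind is illustrative only.

```python
import math

EXPONENT_MASK = 0x7FF << 52
SIGNIFICAND_MASK = (1 << 52) - 1
QUIET_BIT = 1 << 51  # assumption: top significand bit set means "quiet"

def nan_kind(bits):
    """Classify a raw 64-bit pattern as 'qnan', 'snan' or 'not-nan'."""
    all_exponent_bits_set = (bits & EXPONENT_MASK) == EXPONENT_MASK
    if not all_exponent_bits_set or (bits & SIGNIFICAND_MASK) == 0:
        return "not-nan"  # finite number or infinity
    return "qnan" if bits & QUIET_BIT else "snan"

# A quiet NaN propagates through arithmetic and never compares equal to itself.
nan = float("nan")
print(math.isnan(nan + 1.0), nan == nan)
```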
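There are 2**51 possible values we can encode in the payload of a quiet NaN in the double-precision floating-point format, since bit 51 marks the NaN as quiet and bits 50-0 remain free. This gives implementers a huge value space to encode all sorts of information. In hexadecimal, with the sign bit set, NaN values lie between 0xFFF0000000000001 and 0xFFFFFFFFFFFFFFFF (0xFFF0000000000000 itself encodes -Infinity), and quiet NaN values start at 0xFFF8000000000000.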
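The following Python sketch illustrates how data can ride inside a NaN. It is not JavaScriptCore's actual encoding, only an illustration of the payload idea, and make_quiet_nan and nan_payload are hypothetical helper names. It assumes a typical CPython build where struct copies the 64 bits of a double unchanged.

```python
import math
import struct

EXPONENT_MASK = 0x7FF << 52
QUIET_BIT = 1 << 51
PAYLOAD_MASK = (1 << 51) - 1   # bits 50-0, 2**51 distinct payloads

def make_quiet_nan(payload, sign=1):
    """Build a quiet NaN whose low 51 bits carry `payload`."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload must fit in 51 bits")
    bits = (sign << 63) | EXPONENT_MASK | QUIET_BIT | payload
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def nan_payload(x):
    """Read back the low 51 bits of a double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & PAYLOAD_MASK

boxed = make_quiet_nan(0x1234)
print(math.isnan(boxed), hex(nan_payload(boxed)))
```

To floating-point code the boxed value is just a NaN, yet the bits underneath still hold the original data.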
Specifically, JavaScriptCore uses NaN values to encode different types of information.
JSValue
Most JavaScript engines choose to represent JavaScript values in memory in a way that enables efficient handling. JavaScriptCore is no exception: it backs JavaScript values with the C++ class JSValue. A detailed explanation of how values are encoded in JavaScriptCore can be found in the file Source/JavaScriptCore/runtime/JSCJSValue.h: