Skip to content

LwIP issue, or how a simple "if" can go wrong

Ivan Grokhotkov edited this page Jan 26, 2016 · 1 revision

There once was an issue of apparently random exceptions coming from the WebServer. It was hard to track, but was ultimately resolved by re-building LwIP library from source provided by Espressif. It was quite strange that lwip.a built by Espressif with xt-xcc resulted in issues, while the one built with gcc did not. However, the issue was resolved, everyone was happy, and I have already lost too much time to analyze this further.

So time went by, we have upgraded to SDK 1.5.1, and with the Espressif-provided LwIP the issue, naturally, reappeared. I couldn't fix this again by building LwIP from source, because LwIP source for this SDK version was not released yet. So I decided to dig into it again.

The offending piece of code was found within tcp_input function. Here is the source (manually pre-processed for clarity):

err_t err;
if(pcb->recv != NULL) {
    err = pcb->recv(pcb->callback_arg, pcb, NULL, ERR_OK);
} else {
    err = ERR_OK;
}
if (err == ERR_ABRT) {
    goto aborted;
}
/// some code

tcp_output(pcb);
/// more code

aborted:
/// jump here if connection was aborted from callback

This part of code gets executed when TCP connection is closed by the remote side. LwIP calls recv callback with NULL pbuf pointer, indicating that connection is closed. Application code may simply return ERR_OK, or it may abort the connection (calling tcp_abort(pcb)) and return ERR_ABRT. LwIP will check the return value of callback and will either go through normal close (calling tcp_output), or bail out if ERR_ABRT was returned. It is particularly important to bail out in this case, because tcp_abort frees memory allocated for pcb, so no further operations on pcb may be performed, including tcp_output.

Let's look at the code generated for this function:

|4022f600 loc_4022f600:
│4022f600     $a7 = *(u32*)($a12 + 0x8c)    ; l32i     $a7, $a12, 0x8c     ; $a7 = pcb->recv
│4022f603     if (!$a7) goto recv_is_null   ; beqz.n   $a7, recv_is_null
│4022f605     $a2 = *(u32*)($a12 + 0x18)    ; l32i.n   $a2, $a12, 0x18
│4022f607     $a3 = $a12                    ; mov.n    $a3, $a12
│4022f609     $a4 = 0x0                     ; movi.n   $a4, 0x0
│4022f60b     $a5 = 0x0                     ; movi.n   $a5, 0x0
│4022f60d     call $a7                      ; callx0   $a7                 ; call recv callback
│4022f610     goto if_block_done            ; j        if_block_done       ; return value (err) is in $a2
│4022f613              ; xref: 0x4022f603 j
│4022f613 recv_is_null:
│4022f613     $a2 = 0x0                     ; movi.n   $a2, 0x0            ; err = 0 (ERR_OK)
│4022f615              ; xref: 0x4022f610 j
│4022f615 if_block_done:
│4022f615     $a4 = $a2 + 0x8               ; addi.n   $a4, $a2, 0x8       ; err += 8
│4022f617     if (!$a4) goto label_aborted  ; beqz     $a4, label_aborted

This looks pretty straightforward. First we check if the callback function pointer is non-zero. If it's not, we branch to recv_is_null and set err variable in register a2 to zero (which is ERR_OK). If a callback is set, we set up arguments and invoke it. Return value is placed into the same a2 register.

Then the code compares err with ERR_ABRT (which is defined as -8) by adding 8 and comparing to 0. This is fine until you remember that err_t is a typedef for int8_t. So if recv callback returns ERR_ABRT, the value of a2 will be 248. Add 8 to that, and you get 256, not 0. Uh-oh! The code fails to properly handle ERR_ABRT, goes on to call tcp_output on a destroyed pcb, and we are in trouble.

Looking through the code, sent callback is also affected by the same issue. However, in another part of tcp_input compiler decided to generate a different sequence of instructions:

│4022f4b9     $a11 = -0x8                        ; movi.n   $a11, -0x8
│4022f4bb     if ($a2 == $a11) goto aborted      ; beq      $a2, $a11, aborted

Until we find a proper fix for this issue (on the xt-xcc side), we will convert return values of callbacks to int32_t.

Clone this wiki locally