Bug 3939 (int-96588)

Summary: sapwood loops infinitely at malformed request
Product: [Maemo Official Platform] Desktop platform Reporter: Carsten Munk <carsten.munk>
Component: sapwoodAssignee: unassigned <nobody>
Status: RESOLVED WONTFIX QA Contact: sapwood-bugs
Severity: normal    
Priority: Medium CC: andre_klapper, carsten.munk, eero.tamminen, quim.gil, sven
Version: unspecifiedKeywords: patch
Target Milestone: ---   
Hardware: All   
OS: Maemo   
Attachments: testcase doing a too short request

Description Carsten Munk (reporter) maemo.org 2008-12-19 00:32:29 UTC
SOFTWARE VERSION:
from SVN source

STEPS TO REPRODUCE THE PROBLEM:

send sapwood a pixbufrequest where length of data written to socket <
base->length but > sizeof(PixbufBaseRequest)

EXPECTED OUTCOME:

To deal with this gracefully.

ACTUAL OUTCOME:

It pegs CPU to 100%

REPRODUCIBILITY:
(always/sometimes/once)

always:

from sapwood-server.c:

  ofs = 0;
  while (ofs < buflen)
    {
      n = process_buffer (fd, buf + ofs, buflen - ofs, user_data);
      if (n < 0)
        return FALSE;

      ofs += n;
    }

and from process_buffer:

if (buflen < sizeof (PixbufBaseRequest) || buflen < base->length)
    return 0;

if base->length is larger than recieved data, and buflen larger than sizeof
PixbufBaseRequest, it will loop infinitely.

Before closing this bug because "we trust our clients", the reason why this bug
has been posted is because we are looking at some debug output at a user (.. on
a 770) that suggests a corrupted base->length, and this causes sapwood to peg
the cpu.

EXTRA SOFTWARE INSTALLED:

OTHER COMMENTS:

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; da; rv:1.9.0.4)
Gecko/2008102920 Firefox/3.0.4
Comment 1 Eero Tamminen nokia 2008-12-22 16:05:58 UTC
To have this remain critical severity, you should provide a repeatable end-user
test-case.


> Before closing this bug because "we trust our clients", the reason why this
> bug has been posted is because we are looking at some debug output at a user
> (.. on a 770) that suggests a corrupted base->length, and this causes sapwood
> to peg the cpu.

As if you're not trusting the clients, why you would trust the client to write
data to the socket on which Sapwood is reading?
Comment 2 Carsten Munk (reporter) maemo.org 2008-12-22 19:48:47 UTC
Fair enough - it was in terms of that we cannot safely assume clients send
properly structured data - which is what the bug is about, a case where a
corrupted stack, compiler bug, or whatever cause sapwood server to peg CPU
completely and lock up.

We're seeing this on armv5t archs (a zaurus and a 770) with hildon-desktop &
demos/gradient in sapwood, a suspicion of a stack corruption/compiler
bug/whatever where the client sends badly formatted data (length moved into
padding, wrong values in borders , etc). Through chasing down this bug we
noticed this bug as well, so that's why this bug was reported as fixing it may
increase stability with unstable applications.

I've moved severity to normal. Fix -looks- like it would be a matter of
changing 'if (n < 0)' to 'if (n <= 0)' without looking much closer to it.
Comment 3 Sven Herzberg 2008-12-22 23:29:38 UTC
Sapwood trunk has a proper testing framework. So please provide a patch adding
a test for sapwood's behavior (there's already a test you can use as a starting
place for yours).

And with such a testcase you could easily verify your patch-guess and submit a
proper patch if it really fixes this issue.

So I have to agree with Eero here. Unless there is a proper testcase, there's
no trial-and-error fixing of this bug (and also no high severity). We added the
test framework for exactly this purpose to get out of the blue-sky bug fixing
for sapwood.
Comment 4 Carsten Munk (reporter) maemo.org 2008-12-24 01:28:01 UTC
Created an attachment (id=1076) [details]
testcase doing a too short request

Attached is a testcase built on double-free that does the following:

* Connects to sapwood, sends a standard request, but actually only sends one
byte of data. This is one of the cases where the method indicated would return
0, and causing sapwood to spin into a infinite loop. 

On my FREMANTLE_X86 target, sapwood-3.1.1, this spins sapwood into an infinite
loop. Due to limited time I've not had a chance to test using the built-in test
framework (which requires Xvfb which doesn't exist in the Fremantle pre-alpha
SDK). 

A more proper way to test (given the existence of a within-sdk working test
framework of sapwood) would be to take double-free, do a single modification to
the pixbuf_proto_request which changes the length of the packet sent to sapwood
to 1.

Honestly, the problem report is quite simple. Given the situation where
process_buffer returns 0 (which it will in case of malformed header or a too
short buffer), the while (ofs < buflen) loop will not iterate, as we do ofs +=
n, where n in this case is 0, which means it will try process_buffer in next
loop iteration, get same return value again, repeat loop contents over and over
again. 

The problem about how to go about the fix is if to simply rely on that we get
all the data at once in a socket callback, and terminate the connection in the
case of short data, like is done later in process_buffer (causing return FALSE
in socket callback). 

Or make sane buffering: break from the while loop, and keep input in a buffer
for next portion of data/socket callback and keep connection alive (return
TRUE) until the incoming data makes sense.
Comment 5 Eero Tamminen nokia 2009-01-19 15:46:36 UTC
> Due to limited time I've not had a chance to test using the built-in test
> framework (which requires Xvfb which doesn't exist in the Fremantle pre-alpha
> SDK). 

You can run it outside the Scratchbox, similarly to Xephyr.

(Xvfb is used because some of our internal build machines don't have displays.)
Comment 6 Sven Herzberg 2009-01-19 20:17:31 UTC
(In reply to comment #5)
> > Due to limited time I've not had a chance to test using the built-in test
> > framework (which requires Xvfb which doesn't exist in the Fremantle pre-alpha
> > SDK). 
> 
> You can run it outside the Scratchbox, similarly to Xephyr.

Not without patching the source tree.

> (Xvfb is used because some of our internal build machines don't have displays.)

Xvfb is not used because of the build machines. It's used to not distract the
user and because it has some features that Xephyr doesn't provide (eg. multiple
screens, which we will need for fixing a different issue with sapwood…).
Comment 7 Carsten Munk (reporter) maemo.org 2009-01-21 12:21:58 UTC
Related bug: #4023 (sapwood engine request buffer corruption due to misaligned
data structure)
Comment 8 Eero Tamminen nokia 2009-01-21 13:06:26 UTC
(In reply to comment #7)
> Related bug: #4023 (sapwood engine request buffer corruption due to misaligned
> data structure)

Hm.  Misalignment and the infinite loop mentioned in this bug reminds me of
another issue.

Older ARM kernels had the problem that they don't handle unaligned data
accesses properly (on ARMv6 kernel assume CPU handles the alignment, but
unaligned multiword access still causes alignment exception.  Because that's
not handled, process goes into infinite loop with 100% CPU usage instead of
being terminated with SIGBUS).  This should be fixed in Diablo kernel, but it
at least wasn't fixed in upstream ARM kernel[1] if you're using that instead.

[1] See "Misaligned memory accesses handling & ARMv6/ARMv7" thread on ARM
kernel mailing list.
Comment 9 Andre Klapper maemo.org 2010-08-03 01:59:29 UTC
If this is still valid it probably makes more sense to upstream this to
https://bugzilla.gnome.org/enter_bug.cgi?product=sapwood .
I don't expect any process here.
Comment 10 Andre Klapper maemo.org 2010-08-13 15:00:34 UTC
Yes - please upstream in case it makes sense.