maemo.org Bugzilla – Bug 3939
sapwood loops infinitely at malformed request
Last modified: 2010-08-13 15:00:34 UTC
You need to log in before you can comment on or make changes to this bug.
SOFTWARE VERSION: from SVN source STEPS TO REPRODUCE THE PROBLEM: send sapwood a pixbufrequest where length of data written to socket < base->length but > sizeof(PixbufBaseRequest) EXPECTED OUTCOME: To deal with this gracefully. ACTUAL OUTCOME: It pegs CPU to 100% REPRODUCIBILITY: (always/sometimes/once) always: from sapwood-server.c: ofs = 0; while (ofs < buflen) { n = process_buffer (fd, buf + ofs, buflen - ofs, user_data); if (n < 0) return FALSE; ofs += n; } and from process_buffer: if (buflen < sizeof (PixbufBaseRequest) || buflen < base->length) return 0; if base->length is larger than recieved data, and buflen larger than sizeof PixbufBaseRequest, it will loop infinitely. Before closing this bug because "we trust our clients", the reason why this bug has been posted is because we are looking at some debug output at a user (.. on a 770) that suggests a corrupted base->length, and this causes sapwood to peg the cpu. EXTRA SOFTWARE INSTALLED: OTHER COMMENTS: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; da; rv:1.9.0.4) Gecko/2008102920 Firefox/3.0.4
To have this remain critical severity, you should provide a repeatable end-user test-case. > Before closing this bug because "we trust our clients", the reason why this > bug has been posted is because we are looking at some debug output at a user > (.. on a 770) that suggests a corrupted base->length, and this causes sapwood > to peg the cpu. As if you're not trusting the clients, why you would trust the client to write data to the socket on which Sapwood is reading?
Fair enough - it was in terms of that we cannot safely assume clients send properly structured data - which is what the bug is about, a case where a corrupted stack, compiler bug, or whatever cause sapwood server to peg CPU completely and lock up. We're seeing this on armv5t archs (a zaurus and a 770) with hildon-desktop & demos/gradient in sapwood, a suspicion of a stack corruption/compiler bug/whatever where the client sends badly formatted data (length moved into padding, wrong values in borders , etc). Through chasing down this bug we noticed this bug as well, so that's why this bug was reported as fixing it may increase stability with unstable applications. I've moved severity to normal. Fix -looks- like it would be a matter of changing 'if (n < 0)' to 'if (n <= 0)' without looking much closer to it.
Sapwood trunk has a proper testing framework. So please provide a patch adding a test for sapwood's behavior (there's already a test you can use as a starting place for yours). And with such a testcase you could easily verify your patch-guess and submit a proper patch if it really fixes this issue. So I have to agree with Eero here. Unless there is a proper testcase, there's no trial-and-error fixing of this bug (and also no high severity). We added the test framework for exactly this purpose to get out of the blue-sky bug fixing for sapwood.
Created an attachment (id=1076) [details] testcase doing a too short request Attached is a testcase built on double-free that does the following: * Connects to sapwood, sends a standard request, but actually only sends one byte of data. This is one of the cases where the method indicated would return 0, and causing sapwood to spin into a infinite loop. On my FREMANTLE_X86 target, sapwood-3.1.1, this spins sapwood into an infinite loop. Due to limited time I've not had a chance to test using the built-in test framework (which requires Xvfb which doesn't exist in the Fremantle pre-alpha SDK). A more proper way to test (given the existence of a within-sdk working test framework of sapwood) would be to take double-free, do a single modification to the pixbuf_proto_request which changes the length of the packet sent to sapwood to 1. Honestly, the problem report is quite simple. Given the situation where process_buffer returns 0 (which it will in case of malformed header or a too short buffer), the while (ofs < buflen) loop will not iterate, as we do ofs += n, where n in this case is 0, which means it will try process_buffer in next loop iteration, get same return value again, repeat loop contents over and over again. The problem about how to go about the fix is if to simply rely on that we get all the data at once in a socket callback, and terminate the connection in the case of short data, like is done later in process_buffer (causing return FALSE in socket callback). Or make sane buffering: break from the while loop, and keep input in a buffer for next portion of data/socket callback and keep connection alive (return TRUE) until the incoming data makes sense.
> Due to limited time I've not had a chance to test using the built-in test > framework (which requires Xvfb which doesn't exist in the Fremantle pre-alpha > SDK). You can run it outside the Scratchbox, similarly to Xephyr. (Xvfb is used because some of our internal build machines don't have displays.)
(In reply to comment #5) > > Due to limited time I've not had a chance to test using the built-in test > > framework (which requires Xvfb which doesn't exist in the Fremantle pre-alpha > > SDK). > > You can run it outside the Scratchbox, similarly to Xephyr. Not without patching the source tree. > (Xvfb is used because some of our internal build machines don't have displays.) Xvfb is not used because of the build machines. It's used to not distract the user and because it has some features that Xephyr doesn't provide (eg. multiple screens, which we will need for fixing a different issue with sapwood…).
Related bug: #4023 (sapwood engine request buffer corruption due to misaligned data structure)
(In reply to comment #7) > Related bug: #4023 (sapwood engine request buffer corruption due to misaligned > data structure) Hm. Misalignment and the infinite loop mentioned in this bug reminds me of another issue. Older ARM kernels had the problem that they don't handle unaligned data accesses properly (on ARMv6 kernel assume CPU handles the alignment, but unaligned multiword access still causes alignment exception. Because that's not handled, process goes into infinite loop with 100% CPU usage instead of being terminated with SIGBUS). This should be fixed in Diablo kernel, but it at least wasn't fixed in upstream ARM kernel[1] if you're using that instead. [1] See "Misaligned memory accesses handling & ARMv6/ARMv7" thread on ARM kernel mailing list.
If this is still valid it probably makes more sense to upstream this to https://bugzilla.gnome.org/enter_bug.cgi?product=sapwood . I don't expect any process here.
Yes - please upstream in case it makes sense.