maemo.org Bugzilla – Bug 4023
Sapwood doesn't align PixbufOpenRequest properly (crash on 770)
Last modified: 2010-08-03 12:13:15 UTC
You need to log in before you can comment on or make changes to this bug.
SOFTWARE VERSION: unspecified (all pre-ARMV6 platforms) STEPS TO REPRODUCE THE PROBLEM: Try running Sapwood on an ARM CPU that does not support reading unaligned 16bit values (ARMV5TE found in Nokia 770 or any other pre-ARMV6 CPU). EXPECTED OUTCOME: Sapwood should work perfectly well. ACTUAL OUTCOME: Sapwood crashes due to misaligned data structure in sapwood-pixmap.c. If you look at the sapwood_pixmap_get_for_file() function, you will see the following code: ... char buf[ sizeof(PixbufOpenRequest) + PATH_MAX + 1 ] = {0}; PixbufOpenRequest *req = (PixbufOpenRequest *) buf; ... The PixbufOpenRequest structure contains 16bit data members while the buf[] address is not guaranteed to be aligned to a 16bit boundary. This causes data corruption and crashing on any pre-ARMV6 platform, such as Nokia 770. Starting with ARMV6, the CPU handles misaligned 16bit accesses gracefully, at least when making them over a 32bit data bus. Proposed fix is to replace the above lines with struct { PixbufOpenRequest request; char namebuf[PATH_MAX+1]; } req; and clear this structure with memset(req,0,sizeof(req)); REPRODUCIBILITY: always OTHER COMMENTS: Please, do not claim that this bug should not be fixed because it "does not occur on the latest Nokia hardware". It is a data alignment bug. It has to be fixed.
Could we get a patch?
Does this happen also with the old 770 "specific" version of Sapwood? Is there a test-case that reliably triggers this? (would help in verification and finding out what's the actual severity of this for 770 etc)
Also see bug 3939.
(In reply to comment #2) > Does this happen also with the old 770 "specific" version of Sapwood? It happens with the current Sapwood, when run on ARMV5TE platforms as part of Mer. > Is there a test-case that reliably triggers this? (would help in verification > and finding out what's the actual severity of this for 770 etc) The actual severity is a Sapwood crash on any pre-ARMV6 platform, 770 or anything else that does not allow misaligned 16bit ints (it does not even have to be an ARM). More details are available from Carsten Munk (putting him on the CC list for this bug).
(In reply to comment #4) >> Is there a test-case that reliably triggers this? (would help in >> verification and finding out what's the actual severity of this for 770 etc) > > The actual severity is a Sapwood crash on any pre-ARMV6 platform, 770 or > anything else that does not allow misaligned 16bit ints (it does not even have > to be an ARM). More details are available from Carsten Munk (putting him on > the CC list for this bug). My questions was about how often that happens: - blocker: - always/often on boot - critical: - always/often when using some specific application - major: - not on boot and rarely/non-reproducibly when using some application, or only with synthetic test-case The test-case is needed e.g. to test that same kind of issue isn't triggered elsewhere in the code, but if this issue is non-reproducible, it has less value. On ARMv6 this would be just a performance issue, aligned data is faster. -> adding also perf keyword.
It happens immediately in sapwood applications when running on pre-ARMV6 platforms, both 770 and on Zaurus (PXA). We had it happening when using the gradient demo, consistently. What happens is a corruption of the request structure, such as length ending up in the padding, and the other structure members shifted to the left memory cell/corrupted, causing bug 3939. Example of this case: #1 0x40df2f60 in pixbuf_proto_request (req=0xbecf1817 "\001G", reqlen=71, rep=0xbecf17c4 "", replen=80, err=0xbecf2860) at sapwood-pixmap.c:71 #2 0x40df3284 in sapwood_pixmap_get_for_file ( filename=0x171fa8 "/usr/share/themes/plankton/images/qgn_plat_view_button.png", border_left=14, border_right=14, border_top=14, border_bottom=14, err=0xbecf2860) at sapwood-pixmap.c:126 (gdb) frame 2 #2 0x40df3284 in sapwood_pixmap_get_for_file ( filename=0x171fa8 "/usr/share/themes/plankton/images/qgn_plat_view_button.png", border_left=14, border_right=14, border_top=14, border_bottom=14, err=0xbecf2860) at sapwood-pixmap.c:126 126 if (!pixbuf_proto_request ((char*)req, reqlen, (gdb) print *(PixbufOpenRequest *) buf $1 = {base = {op = 1 '\001', _pad1 = 71 'G', length = 3584}, border_left = 3584, border_right = 3584, border_top = 3584, border_bottom = 0, filename = 0xbecf1823 "/usr/share/themes/plankton/images/qgn_plat_view_button.png"} Notice that reqlen is initially 71, but it ends up in padding instead. Also, length is 3584 (256*14), and border_bottom is now suddenly 0. Example of the link to bug 3939: Breakpoint 1 at 0x9f00: file sapwood-server.c, line 336. Breakpoint 1, process_buffer (fd=5, buf=0x1329c "\001\\", buflen=92, user_data=0x16ac8) at sapwood-server.c:336 336 in sapwood-server.c (gdb) print *(PixbufBaseRequest *) buf $1 = {op = 1 '\001', _pad1 = 92 '\\', length = 5120} In this case, we see buflen=92 (as it should be), but length is not matching cos of length being moved into padding
Adding the mail mentioned in bug 3939 http://www.spinics.net/lists/arm-kernel/msg52260.html
(In reply to comment #0) > The PixbufOpenRequest structure contains 16bit data members while the buf[] > address is not guaranteed to be aligned to a 16bit boundary. This causes data > corruption and crashing on any pre-ARMV6 platform, such as Nokia 770. Starting > with ARMV6, the CPU handles misaligned 16bit accesses gracefully, at least when > making them over a 32bit data bus. Which version of sapwood are you referring to? The trunk version of sapwood has 8, 16 and 32bit fields, but the struct sizes are all multiples of 32...
(In reply to comment #8) > (In reply to comment #0) > > The PixbufOpenRequest structure contains 16bit data members while the buf[] > > address is not guaranteed to be aligned to a 16bit boundary. This causes data > > corruption and crashing on any pre-ARMV6 platform, such as Nokia 770. Starting > > with ARMV6, the CPU handles misaligned 16bit accesses gracefully, at least when > > making them over a 32bit data bus. > Which version of sapwood are you referring to? The trunk version of sapwood has > 8, 16 and 32bit fields, but the struct sizes are all multiples of 32... I mean any version of Sapwood that contains above-mentioned code fragment. The problem is not related to the data structure size. Rather, it is caused by a cast from the unaligned char buffer to a structure that is supposed to be aligned so that its non-char members are accessible.
Proposed patch for this problem (tested on Zaurus and 770 and found to work), http://bazaar.launchpad.net/~carsten-munk/m-r/sapwood/revision/177
Adding patch keyword. Kind of. See comment 10.
Sven: Realistically speaking as I don't see any progress here: Is this a WONTFIX as there are no plans to include this, because 770 is not supported anymore?
No real need to send private emails. Quote: "As I can't reproduce this bug right now and as there more important sapwood work, I don't have time for this now -- but I still plan to look into it later."
I do not understand why can't you simply fix this alignment problem? Especially when provided a ready made patch? - It is already debugged for you. - The solution provided. - The fix is minor and does not require extensive testing. If you do not fix this alignment problem, Sapwood will consistently go bonkers on architectures that do not allow misaligned data accesses (older ARMs, MIPS, SPARC, etc). Ignoring bug reports, especially as serious as this one, does not say anything good about Maemo development team.
Currently developers really concentrate on Fremantle hence there is not much time left. I understand that this is very unfortunate and can be a source of frustration but almost always it is not the developer to blame for having to follow a tight schedule. As already written: This bug will be handled.
Hi, I've just came across this bug. This is marked internally as WONTFIX. Besides that, and since it cannot be reproduced at all in the current Maemo hardware I don't think it's possible to have this fix accepted and released in any future update. However if there's anything I can do to make life easier for the Mer guys I would like to do it. How about commiting the patch to a separate branch in the Git repository. Would that help you guys?
If this is still valid it probably makes more sense to upstream this to https://bugzilla.gnome.org/enter_bug.cgi?product=sapwood . I don't expect any process here.
WONTFIX