Bug 4023 (int-99528)

Summary: Sapwood doesn't align PixbufOpenRequest properly (crash on 770)
Product: [Maemo Official Platform] Desktop platform Reporter: luarvique
Component: sapwoodAssignee: Sven Herzberg <sven>
Status: RESOLVED WONTFIX QA Contact: sapwood-bugs
Severity: critical    
Priority: Low CC: agarcia, andre_klapper, carsten.munk, eero.tamminen, sven
Version: unspecifiedKeywords: crash, patch, performance
Target Milestone: ---   
Hardware: All   
OS: Maemo   

Description luarvique (reporter) 2009-01-21 12:10:05 UTC
SOFTWARE VERSION:
unspecified (all pre-ARMV6 platforms)

STEPS TO REPRODUCE THE PROBLEM:
Try running Sapwood on an ARM CPU that does not support reading unaligned 16bit
values (ARMV5TE found in Nokia 770 or any other pre-ARMV6 CPU).

EXPECTED OUTCOME:
Sapwood should work perfectly well.

ACTUAL OUTCOME:
Sapwood crashes due to misaligned data structure in sapwood-pixmap.c. If you
look at the sapwood_pixmap_get_for_file() function, you will see the following
code:

...
char               buf[ sizeof(PixbufOpenRequest) + PATH_MAX + 1 ] = {0};
PixbufOpenRequest *req = (PixbufOpenRequest *) buf;
...

The PixbufOpenRequest structure contains 16bit data members while the buf[]
address is not guaranteed to be aligned to a 16bit boundary. This causes data
corruption and crashing on any pre-ARMV6 platform, such as Nokia 770. Starting
with ARMV6, the CPU handles misaligned 16bit accesses gracefully, at least when
making them over a 32bit data bus. Proposed fix is to replace the above lines
with

struct
{
  PixbufOpenRequest request;
  char namebuf[PATH_MAX+1];
} req;

and clear this structure with

memset(req,0,sizeof(req));

REPRODUCIBILITY:
always

OTHER COMMENTS:
Please, do not claim that this bug should not be fixed because it "does not
occur on the latest Nokia hardware". It is a data alignment bug. It has to be
fixed.
Comment 1 Ryan Abel maemo.org 2009-01-21 12:20:11 UTC
Could we get a patch?
Comment 2 Eero Tamminen nokia 2009-01-21 12:28:52 UTC
Does this happen also with the old 770 "specific" version of Sapwood?

Is there a test-case that reliably triggers this? (would help in verification
and finding out what's the actual severity of this for 770 etc)
Comment 3 Andre Klapper maemo.org 2009-01-21 12:37:44 UTC
Also see bug 3939.
Comment 4 luarvique (reporter) 2009-01-21 12:40:29 UTC
(In reply to comment #2)
> Does this happen also with the old 770 "specific" version of Sapwood?
It happens with the current Sapwood, when run on ARMV5TE platforms as part of
Mer.

> Is there a test-case that reliably triggers this? (would help in verification
> and finding out what's the actual severity of this for 770 etc)
The actual severity is a Sapwood crash on any pre-ARMV6 platform, 770 or
anything else that does not allow misaligned 16bit ints (it does not even have
to be an ARM). More details are available from Carsten Munk (putting him on the
CC list for this bug).
Comment 5 Eero Tamminen nokia 2009-01-21 13:18:27 UTC
(In reply to comment #4)
>> Is there a test-case that reliably triggers this? (would help in
>> verification and finding out what's the actual severity of this for 770 etc)
>
> The actual severity is a Sapwood crash on any pre-ARMV6 platform, 770 or
> anything else that does not allow misaligned 16bit ints (it does not even have
> to be an ARM). More details are available from Carsten Munk (putting him on
> the CC list for this bug). 

My questions was about how often that happens:
- blocker:
  - always/often on boot
- critical:
  - always/often when using some specific application
- major:
  - not on boot and rarely/non-reproducibly when using some application,
    or only with synthetic test-case

The test-case is needed e.g. to test that same kind of issue isn't triggered
elsewhere in the code, but if this issue is non-reproducible, it has less
value.

On ARMv6 this would be just a performance issue, aligned data is faster. ->
adding also perf keyword.
Comment 6 Carsten Munk maemo.org 2009-01-21 13:22:24 UTC
It happens immediately in sapwood applications when running on pre-ARMV6
platforms, both 770 and on Zaurus (PXA). We had it happening when using the
gradient demo, consistently. What happens is a corruption of the request
structure, such as length ending up in the padding, and the other structure
members shifted to the left memory cell/corrupted, causing bug 3939.

Example of this case:

#1  0x40df2f60 in pixbuf_proto_request (req=0xbecf1817 "\001G", reqlen=71, 
rep=0xbecf17c4 "", replen=80, err=0xbecf2860) at sapwood-pixmap.c:71
#2  0x40df3284 in sapwood_pixmap_get_for_file ( filename=0x171fa8
"/usr/share/themes/plankton/images/qgn_plat_view_button.png", border_left=14,
border_right=14, border_top=14, border_bottom=14,  err=0xbecf2860) at
sapwood-pixmap.c:126

(gdb) frame 2
#2  0x40df3284 in sapwood_pixmap_get_for_file ( filename=0x171fa8
"/usr/share/themes/plankton/images/qgn_plat_view_button.png", border_left=14,
border_right=14, border_top=14, border_bottom=14,  err=0xbecf2860) at
sapwood-pixmap.c:126
126       if (!pixbuf_proto_request ((char*)req,  reqlen,
(gdb) print *(PixbufOpenRequest *) buf
$1 = {base = {op = 1 '\001', _pad1 = 71 'G', length = 3584},  border_left =
3584, border_right = 3584, border_top = 3584,  border_bottom = 0,  filename =
0xbecf1823 "/usr/share/themes/plankton/images/qgn_plat_view_button.png"}

Notice that reqlen is initially 71, but it ends up in padding instead. Also,
length is 3584 (256*14), and border_bottom is now suddenly 0.

Example of the link to bug 3939:

Breakpoint 1 at 0x9f00: file sapwood-server.c, line 336.
Breakpoint 1, process_buffer (fd=5, buf=0x1329c "\001\\", buflen=92, 
user_data=0x16ac8) at sapwood-server.c:336
336     in sapwood-server.c
(gdb) print *(PixbufBaseRequest *) buf
$1 = {op = 1 '\001', _pad1 = 92 '\\', length = 5120}

In this case, we see buflen=92 (as it should be), but length is not matching
cos of length being moved into padding
Comment 7 Sven Herzberg 2009-01-21 14:43:05 UTC
Adding the mail mentioned in bug 3939

http://www.spinics.net/lists/arm-kernel/msg52260.html
Comment 8 Sven Herzberg 2009-01-21 14:57:37 UTC
(In reply to comment #0)
> The PixbufOpenRequest structure contains 16bit data members while the buf[]
> address is not guaranteed to be aligned to a 16bit boundary. This causes data
> corruption and crashing on any pre-ARMV6 platform, such as Nokia 770. Starting
> with ARMV6, the CPU handles misaligned 16bit accesses gracefully, at least when
> making them over a 32bit data bus.

Which version of sapwood are you referring to? The trunk version of sapwood has
8, 16 and 32bit fields, but the struct sizes are all multiples of 32...
Comment 9 luarvique (reporter) 2009-01-21 15:10:15 UTC
(In reply to comment #8)
> (In reply to comment #0)
> > The PixbufOpenRequest structure contains 16bit data members while the buf[]
> > address is not guaranteed to be aligned to a 16bit boundary. This causes data
> > corruption and crashing on any pre-ARMV6 platform, such as Nokia 770. Starting
> > with ARMV6, the CPU handles misaligned 16bit accesses gracefully, at least when
> > making them over a 32bit data bus.
> Which version of sapwood are you referring to? The trunk version of sapwood has
> 8, 16 and 32bit fields, but the struct sizes are all multiples of 32...
I mean any version of Sapwood that contains above-mentioned code fragment. The
problem is not related to the data structure size. Rather, it is caused by a
cast from the unaligned char buffer to a structure that is supposed to be
aligned so that its non-char members are accessible.
Comment 10 Carsten Munk maemo.org 2009-01-22 00:20:08 UTC
Proposed patch for this problem (tested on Zaurus and 770 and found to work),
http://bazaar.launchpad.net/~carsten-munk/m-r/sapwood/revision/177
Comment 11 Andre Klapper maemo.org 2009-01-22 01:33:02 UTC
Adding patch keyword. Kind of. See comment 10.
Comment 12 Andre Klapper maemo.org 2009-06-23 17:41:20 UTC
Sven:
Realistically speaking as I don't see any progress here:
Is this a WONTFIX as there are no plans to include this, because 770 is not
supported anymore?
Comment 13 Andre Klapper maemo.org 2009-06-23 23:46:14 UTC
No real need to send private emails.
Quote:
"As I can't reproduce this bug right now and as there more important
sapwood work, I don't have time for this now -- but I still plan to look
into it later."
Comment 14 luarvique (reporter) 2009-06-24 00:06:34 UTC
I do not understand why can't you simply fix this alignment problem? Especially
when provided a ready made patch?

- It is already debugged for you.
- The solution provided.
- The fix is minor and does not require extensive testing.

If you do not fix this alignment problem, Sapwood will consistently go bonkers
on architectures that do not allow misaligned data accesses (older ARMs, MIPS,
SPARC, etc). Ignoring bug reports, especially as serious as this one, does not
say anything good about Maemo development team.
Comment 15 Andre Klapper maemo.org 2009-06-24 02:20:53 UTC
Currently developers really concentrate on Fremantle hence there is not much
time left.
I understand that this is very unfortunate and can be a source of frustration
but almost always it is not the developer to blame for having to follow a tight
schedule.
As already written: This bug will be handled.
Comment 16 Alberto Garcia Gonzalez 2010-03-08 16:05:56 UTC
Hi, I've just came across this bug.

This is marked internally as WONTFIX.

Besides that, and since it cannot be reproduced at all in the current
Maemo hardware I don't think it's possible to have this fix accepted
and released in any future update.

However if there's anything I can do to make life easier for the Mer
guys I would like to do it.

How about commiting the patch to a separate branch in the Git
repository. Would that help you guys?
Comment 17 Andre Klapper maemo.org 2010-08-03 01:59:34 UTC
If this is still valid it probably makes more sense to upstream this to
https://bugzilla.gnome.org/enter_bug.cgi?product=sapwood .
I don't expect any process here.
Comment 18 Alberto Garcia Gonzalez 2010-08-03 12:13:15 UTC
WONTFIX