Bug 4870 - Red & blue color channels swapped sometimes in SDK
Status: NEW
Product: Desktop platform
Component: clutter
Version: 5.0:(10.2010.19-1)
Platform: x86 Linux
Severity: Low minor (3 votes)
Target Milestone: 5.0+
Assigned To: unassigned
CC: clutter-bugs
URL: http://talk.maemo.org/showthread.php?...
Keywords: patch
Reported: 2009-08-03 02:58 UTC by Javier S. Pedro
Modified: 2010-10-25 18:59 UTC
CC: 8 users

Attachments
Testcase (1.85 KB, text/x-csrc)
2009-08-03 02:59 UTC, Javier S. Pedro
proposed patch to clutter-0.8.2-0maemo49+0m5 (534 bytes, patch)
2009-11-05 02:01 UTC, Javier S. Pedro
more complete patch (2.00 KB, patch)
2009-11-05 19:23 UTC, Javier S. Pedro
Sets "texture from pixmap" textures byte order appropriately. (1.35 KB, patch)
2009-12-31 21:02 UTC, Javier S. Pedro
newer patch as per comment #22 (2.59 KB, patch)
2010-04-27 22:19 UTC, Javier S. Pedro




Description Javier S. Pedro (reporter) 2009-08-03 02:58:40 UTC
SOFTWARE VERSION: FREMANTLE_X86 SDK beta2.
PLATFORM: Debian Lenny amd64.

STEPS TO REPRODUCE THE PROBLEM:
1. Build the attached test application with "gcc `sdl-config --libs --cflags`
video.c -o video" (or write any SDL application showing a black-to-yellow
gradient that specifies a preferred window size in the call to SDL_SetVideoMode).
2. Start Xephyr & Hildon (af-sb-init.sh start).
3. Run the application built in step 1.

EXPECTED OUTCOME:
Black to yellow gradient.
http://cosas.javispedro.com/nit/bug1/good.png

ACTUAL OUTCOME:
Black to blue gradient.
http://cosas.javispedro.com/nit/bug1/bad.png

REPRODUCIBILITY:
This happens ONLY if the window size is specified in the SDL_SetVideoMode
call AND Hildon is up. Running the application after "af-sb-init.sh stop"
results in the correct black to yellow gradient.

OTHER COMMENTS:
I detected this behavior while trying to port OpenTTD (an SDL game).
There are some references to what may or may not be the same problem, even in
GTK apps:
* http://talk.maemo.org/showthread.php?t=30331
* http://www.youtube.com/watch?v=T3K6MJAIJFU mentions something about "color
distortions"

It is my understanding that this does not happen on the real device; that's the
reason I'm filing it against the SDK and setting minor severity.
Comment 1 Javier S. Pedro (reporter) 2009-08-03 02:59:36 UTC
Created an attachment (id=1288) [details]
Testcase

The testcase SDL application mentioned in comment #0.
Comment 2 Andre Klapper maemo.org 2009-08-03 12:08:23 UTC
(In reply to comment #0)
> It is my understanding that this does not happen in the real device

It does not, at least not with (another) testcase Javier provided.
Comment 3 Andre Klapper maemo.org 2009-08-03 12:19:10 UTC
A second confirmation in the talk.maemo.org thread, hence confirming.
Comment 4 Soumya nokia 2009-09-08 14:04:56 UTC
Moving the bug to the Desktop team, hoping they can help us understand why
there is such a difference.
Comment 5 Graham Cobb maemo.org 2009-10-08 02:16:51 UTC
Just in case someone needs a non-SDL test case, this is reproducible with
gpe-calendar (available in extras-testing).  The icons at the bottom of the
screen are orange on the real device.  In scratchbox most (but not all) of them
start off orange, but the orange becomes blue when you click on an icon.
Sometimes it goes back to orange if you click again, or click a different icon!
Comment 6 Nicolas Frenay 2009-10-08 08:32:52 UTC
I think I have the same problem on the final SDK. Colors are swapped.

To reproduce: open a web page on the browser (example: www.youtube.com)

The colors will probably be wrong (Youtube's red logo appears blue). After
some screen changes, or scrolling, it sometimes changes to the correct color,
but the behavior is almost random (I couldn't pin down what exactly triggers
the color change).
Comment 7 Javier S. Pedro (reporter) 2009-10-08 10:34:22 UTC
Clearly, this is a more generic bug now.
Comment 8 Lucas Maneos 2009-10-12 11:07:20 UTC
*** Bug 5307 has been marked as a duplicate of this bug. ***
Comment 9 Javier S. Pedro (reporter) 2009-10-12 17:31:12 UTC
More info: this is not a Xephyr issue, and not a Mesa libGL issue.

I've run the compositor with $DISPLAY pointing to a real X11 DRI-enabled
display with nvidia's libGL and the issue was still there (the transitions were
really fast tho :) ).
Comment 10 Javier S. Pedro (reporter) 2009-11-05 02:01:49 UTC
Created an attachment (id=1537) [details]
proposed patch to clutter-0.8.2-0maemo49+0m5

While this does not fix the failing texture format conversion, it works around
the issue by allowing Clutter to send Mesa the RGB565 textures "as is".

As you can see, the patch is a one line fix for what I believe is a typo in
COGL.

Can anyone check the official GL specs for the "GL_EXT_packed_pixel"[sic]
extension? Google returns nearly no hits at all about it (and they're mostly
maemo.org related).
I do know about the GL_EXT_packed_pixels (note the trailing s) extension, which
is implemented by both the Mesa libGL shipped in the final SDK and my NVidia
proprietary libGL.

The only scenario where this workaround fails is when using VMGL (so far I'm
the only one I know of using it with the SDK ;) ), since VMGL does not support
GL_EXT_packed_pixels and thus falls back to the faulty texture conversion.
Comment 11 Javier S. Pedro (reporter) 2009-11-05 19:23:41 UTC
Created an attachment (id=1542) [details]
more complete patch

There's also the issue that only accepting GL_UNSIGNED_SHORT_5_6_5 when
GL_EXT_packed_pixel{,s} is present seems wrong to me, since _5_6_5 was
introduced with GL 1.2 and not with packed_pixels (which introduced 4_4_4_4 and
the like).

This does not matter much since the Mesa libGL shipped with the X86 SDK does
support packed_pixels and thus everything is fine, but when using VMGL (which
as said doesn't support packed_pixels) clutter won't allow RGB565 textures to
be uploaded to GL and the bug is back. Since the check is redundant due to the
above argument, the attached patch removes it.

Note that all this means Clutter requires OpenGL >= 1.2. Please correct me if
I'm wrong.
Comment 12 Gordon Williams 2009-12-01 13:31:40 UTC
Hi, thanks for the patch. GL_EXT_packed_pixel (without the s) is definitely a
typo, and fixing it probably solves the issue for pretty much everyone.

The original check (and code) was added because IIRC the version of Xephyr we
were using at the time didn't support 565 unless the user had a DRI-capable
video card (I'm unsure of the reasons behind that though).

It seems a shame about the removal of the check - but it does seem that things
have moved on, and I don't think it is unreasonable to require users to have
GL>=1.2. I'll actually just remove the whole COGL_FEATURE_PACKED_PIXEL flag, as
it was added for this and is no longer required either. If anyone does have a
problem with GL<1.2 we can always add a COGL_FEATURE_GL_1_2 flag.

Obviously it would still be nice to get this properly fixed somehow as I guess
it could show up elsewhere.

Do you have a clue about where the channel swapping happened? It seemed to me
that it only happened for partial texture updates (does it happen for every
partial update?)
Comment 13 Javier S. Pedro (reporter) 2009-12-01 17:28:14 UTC
(In reply to comment #12)
> It seems a shame about the removal of the check - but it does seem that things
> have moved on, and I don't think it is unreasonable to require users to have
> GL>=1.2. I'll actually just remove the whole COGL_FEATURE_PACKED_PIXEL flag, as
> it was added for this and is no longer required either. If anyone does have a
> problem with GL<1.2 we can always add a COGL_FEATURE_GL_1_2 flag.

Or something like
- GL>=1.2 enables 5_6_5 and 4_4_4_4, 5_5_5_1, etc.
- packed pixels enables 4_4_4_4, 5_5_5_1, etc. (but not 5_6_5).

You have an example of an implementation with packed_pixels but not GL 1.2
(Xephyr without DRI?) and I have an example of a 1.2 implementation without
packed_pixels (VMGl), so I don't know if either 1.2 or packed pixels is more
common :)

> Obviously it would still be nice to get this properly fixed somehow as I guess
> it could show up elsewhere.
> 
> Do you have a clue about where the channel swapping happened? It seemed to me
> that it only happened for partial texture updates (does it happen for every
> partial update?)
> 

You're right. During my testing I found that image data loaded through
cogl_texture_set_region had its channels swapped, while data loaded with
_bitmap_prepare/new_with_data did not (note though that I couldn't reproduce it
in a smaller testcase, so the bug may be in the callers of those functions). I
was looking for an unneeded RGB<->BGR conversion (since that could cause this
effect) until I realized that it was converting 5_6_5 textures to 8_8_8_8 when
that was not really needed, so I made this workaround.
Since 5_6_5 textures (the most common, of course) now go through without any
conversion at all, it also happened to hide the bug :).
Comment 14 Andre Klapper maemo.org 2009-12-04 21:54:56 UTC
Merged:
http://maemo.gitorious.org/fremantle-hildon-desktop/clutter_0_8/merge_requests/2248

I normally close reports as FIXED here once stuff has been released
(internally). No idea though which release will include this. :-/
Comment 15 Gordon Williams 2009-12-07 12:24:04 UTC
This hasn't been sent to integration yet - I'll do it now. Keep an eye open for
clutter 0.8.2-0maemo61
Comment 16 Carsten Munk maemo.org 2009-12-31 10:40:17 UTC
A quick heads up: The committed patch (and the original patch) seems to cause
empty white surfaces on ARM (Mesa-SWX11) but not on i386. We're investigating
why that is.
Comment 17 Javier S. Pedro (reporter) 2009-12-31 21:02:14 UTC
Created an attachment (id=1890) [details]
Sets "texture from pixmap" textures byte order appropriately.

Ok then, I believe this is a fix for the underlying cause.

clutter-x11 was assuming XImages were in BGR format, but that doesn't always
seem to be true. The XImage struct itself has a field for determining exactly
that, so I just query it.

(why is clutter not using texture_from_pixmap on x86, btw?)

You can use this fix without the workaround I posted here, though of course you
lose the performance gain from Clutter not having to convert rgb565 textures to
rgb888. Fixing the whole mess with packed_pixels would be nice, especially if
Mer is going to use the GLX backend on performance-constrained devices, though
that is not the main scope of this bug.
Comment 18 Gordon Williams 2010-01-18 12:41:24 UTC
Thanks! That seems like a great fix - did you try it without the packed_pixels
changes? (CLUTTER_TEXTURE_RGB_FLAG_BGR is ignored with them in.) I must admit
I'm still confused about why the previous change solved the problem if this
one does too.

I'll apply that asap, and set support_gl_1_2==FALSE, which should effectively
revert the previous changes and make ARM Mesa-SWX11 work again (comment #16).

wrt #16, My guess is that the version of mesa used doesn't support packed
pixels?
Comment 19 Javier S. Pedro (reporter) 2010-01-19 22:22:07 UTC
Unfortunately, it is not a great fix :( since it doesn't make much sense when
put in perspective.

If I keep to clutter upstream definition of the BGR flag, which is independent
of host endianness, when the XImage byte_order is MSBFirst the BGR flag should
be cleared, not set (considering the default Maemo Visual, etc.).

However, the Maemo Clutter doesn't seem to agree on this definition. For
example, the _cogl_rgb565_to_rgba function in cogl-bitmap-fallback.c (which,
btw, is not used as a fallback but rather is the main path) byteswaps in a
hidden way, since it reads the image via gushort and this is a little-endian
platform. So rgb565 is not really rgb565 but bgr565 on LE platforms, and
rgb565 on BE platforms.

But that function is in the COGL shared code (code that's running on the N900,
and not only on the SDK), and fixing it means possibly uncovering a lot of new
bugs everywhere in clutter, so all we can do is workarounds.


This explains why it works when using packed_pixels: no conversion functions
invoked, no bug. And why it also works without testing the byte_order flag in
XImages (this is an LE platform, so "all is well").

I still fail to explain the issue Carsten mentions, though.
Comment 20 Gordon Williams 2010-01-25 19:40:46 UTC
Thanks for the explanation. It makes a lot of sense, but I'm still quite
wary...

On the N900, in some cases SGX can't actually map any more pixmaps directly
(20-odd windows), so we revert to the slower clutter-x11-texture-pixmap method
which currently works fine. Do you think it is the case that always
'image->byte_order == MSBFirst' for the N900? If not, we could possibly break
it...

To be honest I think we might be better off changing the image format
conversion code (since it's stuff I added anyway - and yes, I know it's not
fallback code :), although I'm still not sure about the endianness issue. If
you consider we have the 16 bit RGB565 colour with the bits as follows:

[ R4 R3 R2 R1 R0 G5 G4 G3  ] [ G2 G1 G0 B4 B3 B2 B1 B0 ]

Then if you swap the bytes, you get:
[ G2 G1 G0 B4 B3 B2 B1 B0 ] [ R4 R3 R2 R1 R0 G5 G4 G3  ]

And when you try and pull out the colour, the bits of each colour channel get
totally messed up - which I didn't think was what was happening at all, it
seemed like a straight channel swap?

It may be that the safest solution is just to #ifdef __i386__ the
image->byte_order == MSBFirst check, and then we know we're safe.

Even then I'm still mighty confused. You'd expect that byte_order would always
be the same for a certain platform, eg:

 pixel_is_bgr = image->byte_order == MSBFirst;

would be the same as:

#ifdef __i386__
 pixel_is_bgr = false;
#else
 pixel_is_bgr = true;
#endif

However if you did that then I don't believe it would work? You'd get
fullscreen updates with the wrong colours, and partial updates with the right
ones?
Comment 21 Andre Klapper maemo.org 2010-04-26 13:20:55 UTC
Javier, can you comment on the latest comment?
Comment 22 Javier S. Pedro (reporter) 2010-04-26 22:55:27 UTC
Thanks for the ping, Andre. I totally missed the latest post, which is
really interesting. Sorry Gordon!

Somebody poked me a few months ago with the following snippet from Clutter
documentation:

http://www.clutter-project.org/docs/cogl/stable/cogl-General-API.html :
> So for example COGL_PIXEL_FORMAT_RGB_888 would have the red component
> in the lowest address, green in the next address and blue after that
> regardless of the endianness of the system. 
> For the 16-bit formats the component order specifies the order within a
> 16-bit number from most significant bit to least significant. So for
> COGL_PIXEL_FORMAT_RGB_565, the red component would be in bits 11-15, the
> green component would be in 6-11 and the blue component would be in 1-5.
> Therefore the order in memory depends on the endianness of the system. 

So, upstream here takes a mixed approach: for most formats RGB/BGR flag is
host endianness independent, but for packed pixel formats it is host 
endianness dependent. Thus I was wrong in a core assumption of my previous
rambling; apologies.

Knowing this, and also knowing the default visual color masks on x86 for:
- 32/24 bits per pixel: R=0xFF0000, G=0xFF00, B=0xFF
- 16 bits per pixel: R=0xF800 G=0x7E0 B=0x1F

We can say that while 24 bpp XImages with those masks and LSBFirst bit &
byte order would map to COGL_PIXEL_FORMAT_BGR_888, 16 bpp XImages would map
to COGL_PIXEL_FORMAT_RGB_565 (no BGR flag!). As such, the previous fix is
not "entirely" correct and would break 24 & 32 bpp windows. I tested it
with a 32bpp animation actor and behold -- it was channel swapped :).


Also, I've spent a few hours trying to understand why the issue only
manifests itself with partial updates.

I found that cogl_pixel_format_to_gl would, when asked about
COGL_PIXEL_FORMAT_BGR_565, say the nearest COGL format is
"COGL_PIXEL_FORMAT_BGR_888" (channel swapping!! see my introductory point)
and that the appropriate GL format is GL_RGB, GL_UNSIGNED_BYTE (wrong!
channel swapping again!! -- COGL's BGR_888 should map to OGL's GL_BGR
instead).

Channel swapping + channel swapping = :) ;)

However, when partially updating the texture, the texture's internal format
is COGL_PIXEL_FORMAT_BGR_888 (because of the above). So the partial
texture data will get converted from BGR_565 to BGR_888 (and channel
swapped!), and THEN cogl_pixel_format_to_gl will be asked for the GL format
matching the texture's internal format (BGR_888). It will correctly say
GL_BGR, GL_UNSIGNED_BYTE, which means that GL
won't channel swap, and thus the (only once) channel-swapped data gets
uploaded to the screen, and the problem manifests.

---------------------------------------

So, to recap:

1. Fix _cogl_pixel_format_to_gl (the GL<1.2 version).
On Little Endian platforms, when asked about format
COGL_PIXEL_FORMAT_RGB_565, it should output COGL_PIXEL_FORMAT_BGR_888 
and GL_BGR. When asked about BGR_565, it should output RGB_888 and GL_RGB.
Opposite on Big Endian platforms.
This will make the bug consistent at last :), and make it fail for both
partial and full texture updates.

2. Fix clutter_x11_texture_pixmap_update_area_real. Using something like
the following logic (note that it does not take care of visuals, masks, ..):
if (depth == 24 || depth == 32) bgr_flag = (image_byte_order == LSBFirst);
else if (depth == 16) bgr_flag = (image_byte_order != host_byte_order);
else bgr_flag = false;

I will clean (and upgrade) my source tree for a proper patch tomorrow if I
find some time.

-----------------------------------------

(In reply to comment #20)
> On the N900, in some cases SGX can't actually map any more pixmaps directly
> (20-odd windows), so we revert to the slower clutter-x11-texture-pixmap method
> which currently works fine. Do you think it is the case that always
> 'image->byte_order == MSBFirst' for the N900? If not, we could possibly break
> it...
Actually, image->byte_order == LSBFirst for most windows. And indeed the 
patch was breaking 32 bpp textures.
I think I could create a window where byte_order == MSBFirst, but I've seen
no app doing it so far.
(BTW, here be dragons when byte_order and bit_order don't match, but I've
not seen that either)

> To be honest I think we might be better off changing the image format
> conversion code (since it's stuff I added anyway - and yes, I know it's not
> fallback code :), although I'm still not sure about the endianness issue.
You're completely right here -- I was confusing byte swapping and "B/R
channel swapping".
The conversion routines are fine if we are to accept the above definition of
BGR flag, though.

> You'd expect that byte_order would always be the same for a certain
> platform
It might be; I don't know enough about X11 to say whether you can get a
"non-native" byte or bit order pixmap. If that is the case, the bgr_flag
logic would reduce to:
bgr_flag = host_byte_order == LE && (depth == 24 || depth == 32);

Note that ARM is also LE; this bug doesn't manifest at all on ARM for the
same reason it doesn't manifest on x86 when support_gl_1_2 is TRUE. The
code as it is now doesn't even check the BGR flag of 16 bpp textures on
either GLES or GL>=1.2, so patching the above two points should be
relatively harmless for the N900 (and most SDK users).

Ah, long post. At least this fix makes sense -- I hope it's OK for you too.

BTW, Carsten, did you find the ARM/Mesa-SWX11 issue?
Comment 23 Gordon Williams 2010-04-27 11:16:35 UTC
Thanks for putting so much work into this, that sounds like the issue. I guess
maybe the RGB565 format comes from OpenGL's internal definition of it - I never
thought to check, but it makes a lot of sense that the ordering would be like
that.

Sorry I didn't get to put any real time into this myself - I'm afraid I got
distracted by other stuff and basically forgot about it after I'd posted the
last comment :(

I look forward to your patches - It'll be great to finally see this fixed.
Comment 24 Javier S. Pedro (reporter) 2010-04-27 22:05:39 UTC
(In reply to comment #23)
> Thanks for putting so much work into this, that sounds like the issue. I guess
> maybe the RGB565 format comes from OpenGL's internal definition of it - I never
> thought to check, but it makes a lot of sense that the ordering would be like
> that.
Aha, that's the case: COGL's RGB_565 => GL_UNSIGNED_SHORT_5_6_5, BGR_565 =>
GL_UNSIGNED_SHORT_5_6_5_REV, whatever the platform. I've modified
cogl_pixel_format_to_gl appropriately. I'm trying my changes (and Gitorious
itself) here:
http://maemo.gitorious.org/~javispedro/fremantle-hildon-desktop/javispedros-clutter_0_8
Comment 25 Javier S. Pedro (reporter) 2010-04-27 22:19:44 UTC
Created an attachment (id=2665) [details]
newer patch as per comment #22

I don't really have a BE platform to test, so I decided to go for LE only. This
patch:
1) Changes _cogl_pixel_format_to_gl so that cogl's rgb_565 is mapped to GL's
5_6_5, and bgr_565 to GL's 5_6_5_REV. For GL < 1.2, it always asks for
conversion to RGB_888 whatever the platform (profiling could tell which
option performs best on each platform).
2) Changes clutter_x11_texture_pixmap_update_area_real so that the BGR flag is
NOT set for 16bpp windows.

Clutter's GLES backend conveniently (and possibly incorrectly) ignores the BGR
flag for 16bpp textures, so this shouldn't affect device operation.
Comment 26 Andre Klapper maemo.org 2010-06-03 19:04:29 UTC
So this is still valid for PR1.2 I assume?
Comment 27 Gordon Williams 2010-06-03 19:26:35 UTC
I haven't got around to putting these patches in yet, I'm afraid, so yes - most
likely it is still visible in PR1.2.
Comment 28 Javier S. Pedro (reporter) 2010-06-04 01:27:11 UTC
Still in PR1.2 but "your average user" will never see it.