maemo.org Bugzilla – Bug 4870
Red & blue color channels swapped sometimes in SDK
Last modified: 2010-10-25 18:59:20 UTC
SOFTWARE VERSION: FREMANTLE_X86 SDK beta2.

PLATFORM: Debian Lenny amd64.

STEPS TO REPRODUCE THE PROBLEM:
1. Build the attached test application with "gcc `sdl-config --libs --cflags` video.c -o video" (or write any SDL application that shows a black-to-yellow gradient and specifies a preferred window size in its call to SDL_SetVideoMode).
2. Start Xephyr & Hildon (af-sb-init.sh start).
3. Run the application built in step 1.

EXPECTED OUTCOME: Black-to-yellow gradient. http://cosas.javispedro.com/nit/bug1/good.png

ACTUAL OUTCOME: Black-to-blue gradient. http://cosas.javispedro.com/nit/bug1/bad.png

REPRODUCIBILITY: This happens ONLY if the window size is specified in the SDL_SetVideoMode call AND Hildon is up. Running the application after "af-sb-init.sh stop" produces the correct black-to-yellow gradient.

OTHER COMMENTS: I detected this behavior while trying to port OpenTTD (an SDL game). There are some references to what may or may not be the same problem, even in GTK apps:
* http://talk.maemo.org/showthread.php?t=30331
* http://www.youtube.com/watch?v=T3K6MJAIJFU mentions something about "color distortions"

It is my understanding that this does not happen on the real device; that's the reason I'm filing it against the SDK and setting minor severity.
Created an attachment (id=1288) [details]
Testcase

The testcase SDL application mentioned in comment #0.
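For reference, a minimal sketch of what such a testcase could look like (this is not the attached file itself; window size, bit depth, and timing here are arbitrary choices):

#include <SDL.h>

int
main (void)
{
  SDL_Surface *screen;
  int x;

  if (SDL_Init (SDL_INIT_VIDEO) < 0)
    return 1;

  /* an explicit size is requested: the swap only happens when a
   * preferred window size is passed to SDL_SetVideoMode */
  screen = SDL_SetVideoMode (400, 240, 16, SDL_SWSURFACE);
  if (screen == NULL)
    return 1;

  for (x = 0; x < screen->w; x++)
    {
      /* yellow = full red + full green; ramp both from 0 to 255 */
      Uint8 c = (Uint8) (255 * x / (screen->w - 1));
      Uint32 color = SDL_MapRGB (screen->format, c, c, 0);
      SDL_Rect column;

      column.x = (Sint16) x;
      column.y = 0;
      column.w = 1;
      column.h = (Uint16) screen->h;
      SDL_FillRect (screen, &column, color);
    }
  SDL_Flip (screen);
  SDL_Delay (5000);
  SDL_Quit ();
  return 0;
}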
(In reply to comment #0)
> It is my understanding that this does not happen in the real device

It does not, at least not with (another) testcase Javier provided.
There is a second confirmation in the talk.maemo.org thread, hence setting the status to confirmed.
Moving the bug to the Desktop team, hoping they can help in understanding why there is such a difference.
Just in case someone needs a non-SDL test case, this is reproducible with gpe-calendar (available in extras-testing). The icons at the bottom of the screen are orange on the real device. In scratchbox most (but not all) of them start off orange, but the orange becomes blue when you click on an icon. Sometimes it goes back to orange if you click again, or click a different icon!
I think I have the same problem on the final SDK. Colors are swapped. To reproduce: open a web page in the browser (example: www.youtube.com). The colors will probably be wrong (Youtube's red logo appears blue). After some screen changes, or scrolling, it sometimes changes to the correct color, but the behavior is almost random (I couldn't pin down what exactly triggers the color change).
Clearly, this is a more generic bug now.
*** Bug 5307 has been marked as a duplicate of this bug. ***
More info: this is not a Xephyr issue, nor a Mesa libGL issue. I've run the compositor with $DISPLAY pointing to a real X11 DRI-enabled display with NVidia's libGL and the issue was still there (the transitions were really fast though :) ).
Created an attachment (id=1537) [details]
proposed patch to clutter-0.8.2-0maemo49+0m5

While this does not fix the failing texture format conversion, it works around the issue by allowing Clutter to send Mesa the RGB565 textures "as is". As you can see, the patch is a one-line fix for what I believe is a typo in COGL.

Can anyone check the official GL specs for the "GL_EXT_packed_pixel"[sic] extension? Google returns nearly no hits at all for it (and they're mostly maemo.org related). I do know about the GL_EXT_packed_pixels (note the trailing s) extension, which is implemented both by the Mesa libGL shipped in the final SDK and by my NVidia proprietary libGL.

The only scenario where this workaround fails is when using VMGL (so far I only know of myself using it with the SDK ;) ), since VMGL does not support GL_EXT_packed_pixels and thus falls back to the faulty texture conversion.
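To illustrate the shape of the fix (a hypothetical sketch; names follow the thread, not the literal COGL source): the misspelled string can never match the extension list, so the feature was never detected.

#include <string.h>
#include <GL/gl.h>

/* probe the GL extension string for the correctly spelled name;
 * requires a current GL context */
static int
have_packed_pixels (void)
{
  const char *exts = (const char *) glGetString (GL_EXTENSIONS);

  /* strstr is a naive substring match; production code should compare
   * whole space-delimited extension tokens */
  return exts != NULL && strstr (exts, "GL_EXT_packed_pixels") != NULL;
}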
Created an attachment (id=1542) [details]
more complete patch

There's also the issue that only accepting GL_UNSIGNED_SHORT_5_6_5 when GL_EXT_packed_pixel{,s} is present seems wrong to me, since _5_6_5 was introduced with GL 1.2, not with packed_pixels (which introduced 4_4_4_4 and the like).

This does not matter much, since the Mesa libGL shipped with the X86 SDK does support packed_pixels and thus everything is fine; but when using VMGL (which, as said, doesn't support packed_pixels) Clutter won't allow RGB565 textures to be uploaded to GL and the bug is back. Since the check is redundant per the above argument, the attached patch removes it.

Note that all this means Clutter requires OpenGL >= 1.2. Please correct me if I'm wrong.
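For illustration, a version gate along those lines could look like this (hypothetical helper; the attached patch simply removes the check instead):

#include <stdio.h>
#include <GL/gl.h>

/* 5_6_5 is core since GL 1.2, so test the version string rather than
 * the packed_pixels extension; requires a current GL context */
static int
have_gl_1_2 (void)
{
  int major = 0, minor = 0;
  const char *version = (const char *) glGetString (GL_VERSION);

  if (version == NULL || sscanf (version, "%d.%d", &major, &minor) != 2)
    return 0;
  return major > 1 || (major == 1 && minor >= 2);
}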
Hi, thanks for the patch. Definitely GL_EXT_packed_pixel(s) is a typo, and fixing it probably solves the issue for pretty much everyone.

The original check (and code) was added because IIRC the version of Xephyr we were using at the time didn't support 565 unless the user had a DRI-capable video card (I'm unsure of the reasons behind that, though).

It seems a shame about the removal of the check - but it does seem that things have moved on, and I don't think it is unreasonable to require users to have GL >= 1.2. I'll actually just remove the whole COGL_FEATURE_PACKED_PIXEL flag, as it was added for this and is no longer required either. If anyone does have a problem with GL < 1.2 we can always add a COGL_FEATURE_GL_1_2 flag.

Obviously it would still be nice to get this properly fixed somehow, as I guess it could show up elsewhere. Do you have a clue about where the channel swapping happens? It seemed to me that it only happened for partial texture updates (does it happen for every partial update?)
(In reply to comment #12)
> It seems a shame about the removal of the check - but it does seem that things
> have moved on, and I don't think it is unreasonable to require users to have
> GL>=1.2. I'll actually just remove the whole COGL_FEATURE_PACKED_PIXEL flag, as
> it was added for this and is no longer required either. If anyone does have a
> problem with GL<1.2 we can always add a COGL_FEATURE_GL_1_2 flag.

Or something like:
- GL >= 1.2 enables 5_6_5 and 4_4_4_4, 5_5_5_1, etc.
- packed_pixels enables 4_4_4_4, 5_5_5_1, etc. (but not 5_6_5).

You have an example of an implementation with packed_pixels but not GL 1.2 (Xephyr without DRI?) and I have an example of a 1.2 implementation without packed_pixels (VMGL), so I don't know whether 1.2 or packed_pixels is more common :)

> Obviously it would still be nice to get this properly fixed somehow as I guess
> it could show up elsewhere.
>
> Do you have a clue about where the channel swapping happened? It seemed to me
> that it only happened for partial texture updates (does it happen for every
> partial update?)

You're right. During my testing I found that image data loaded through cogl_texture_set_region had its channels swapped, while data loaded with _bitmap_prepare/new_with_data did not (note, though, that I couldn't reproduce this in a smaller testcase, so the bug may be in the callers of those functions). I was looking for an unneeded RGB<->BGR conversion (since that could cause this effect) until I realized that it was converting 5_6_5 textures to 8_8_8_8 when that was not really needed, and made this workaround. Since 5_6_5 textures (the most common, of course) now go through without any conversion at all, it also happened to hide the bug :).
Merged: http://maemo.gitorious.org/fremantle-hildon-desktop/clutter_0_8/merge_requests/2248 I normally close reports as FIXED here once stuff has been released (internally). No idea though which release will include this. :-/
This hasn't been sent to integration yet - I'll do it now. Keep an eye open for clutter 0.8.2-0maemo61
A quick heads up: The committed patch (and the original patch) seems to cause empty white surfaces on ARM (Mesa-SWX11) but not on i386. We're investigating why that is.
Created an attachment (id=1890) [details]
Sets "texture from pixmap" textures' byte order appropriately

Ok then, I believe this is a fix for the underlying cause. clutter-x11 was assuming XImages were in BGR format, but that doesn't always seem to be true. The XImage struct itself has a field for determining exactly that, so I just query it. (Why is Clutter not using texture_from_pixmap on x86, btw?)

You can use this fix without the workaround I posted here, though of course you lose the performance gain from Clutter not having to convert rgb565 textures to rgb888. Fixing the whole mess with packed_pixels would be nice, especially if Mer is going to use the GLX backend on performance-constrained devices, though that is not the main scope of this bug.
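The core of the idea, as a sketch (illustrative helper name only; as the later comments show, this turns out to be wrong for 24/32 bpp windows):

#include <X11/Xlib.h>

/* derive the BGR flag from the XImage's own byte_order field instead
 * of hard-coding BGR */
static int
image_looks_bgr (const XImage *image)
{
  return image->byte_order == MSBFirst;
}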
Thanks! That seems like a great fix - did you try it without the packed_pixels changes? (as CLUTTER_TEXTURE_RGB_FLAG_BGR is ignored with them in) I must admit I'm still confused about why the previous change solved the problem if this does too?

I'll apply that asap, and set support_gl_1_2 == FALSE, which should effectively revert the previous changes and make ARM Mesa-SWX11 work again (comment #16). Wrt comment #16, my guess is that the version of Mesa used doesn't support packed_pixels?
Unfortunately, it is not a great fix :( since it doesn't make much sense when put in perspective. If I keep to the Clutter upstream definition of the BGR flag, which is independent of host endianness, then when the XImage byte_order is MSBFirst the BGR flag should be cleared, not set (considering the default Maemo Visual, etc.).

However, the Maemo Clutter doesn't seem to agree with this definition. For example, the _cogl_rgb565_to_rgba function in cogl-bitmap-fallback.c (which, btw, is not used as a fallback but rather is the main path) byteswaps in a hidden way, since it reads the image via gushort and this is a little-endian platform. So rgb565 is not really rgb565 but bgr565 on LE platforms, and rgb565 on BE platforms. But that function is in the COGL shared code (code that runs on the N900, not only on the SDK), and fixing it means possibly uncovering a lot of new bugs everywhere in Clutter, so all we can do is work around it.

This explains why it works when using packed_pixels: no conversion functions invoked, no bug. And why it also works without testing the byte_order flag in XImages (this is a LE platform, so "all is well"). I still fail to explain the issue Carsten mentions, though.
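A standalone illustration of that hidden swap, assuming a little-endian host (not the COGL code itself):

#include <stdio.h>
#include <string.h>

int
main (void)
{
  /* two bytes in memory, intended as the 5_6_5 value 0xF800 (pure red)
   * stored most-significant byte first */
  const unsigned char mem[2] = { 0xF8, 0x00 };
  unsigned short v;

  /* a 16-bit load on a little-endian CPU yields 0x00F8, not 0xF800 */
  memcpy (&v, mem, sizeof v);
  printf ("r=%d g=%d b=%d\n",
          (v >> 11) & 0x1f, (v >> 5) & 0x3f, v & 0x1f);
  /* little-endian: prints r=0 g=7 b=24 -- the "red" pixel decodes as
   * bluish */
  return 0;
}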
Thanks for the explanation. It makes a lot of sense, but I'm still quite wary...

On the N900, in some cases SGX can't actually map any more pixmaps directly (20-odd windows), so we revert to the slower clutter-x11-texture-pixmap method, which currently works fine. Do you think it is the case that 'image->byte_order == MSBFirst' always holds for the N900? If not, we could possibly break it...

To be honest I think we might be better off changing the image format conversion code (since it's stuff I added anyway - and yes, I know it's not fallback code :), although I'm still not sure about the endianness issue. If you consider the 16-bit RGB565 colour with the bits as follows:

[ R4 R3 R2 R1 R0 G5 G4 G3 ] [ G2 G1 G0 B4 B3 B2 B1 B0 ]

Then if you swap the bytes, you get:

[ G2 G1 G0 B4 B3 B2 B1 B0 ] [ R4 R3 R2 R1 R0 G5 G4 G3 ]

And when you try to pull out the colour, the bits of each colour channel get totally messed up - which I didn't think was what was happening at all; it seemed like a straight channel swap?

It may be that the safest solution is just to #ifdef __i386__ the image->byte_order == MSBFirst test, and then we know we're safe. Even then I'm still mighty confused. You'd expect that byte_order would always be the same for a certain platform, e.g.:

pixel_is_bgr = image->byte_order == MSBFirst;

would be the same as:

#ifdef __i386__
pixel_is_bgr = false;
#else
pixel_is_bgr = true;
#endif

However, if you did that then I don't believe it would work? You'd get fullscreen updates with the wrong colours, and partial updates with the right ones?
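A worked example of the byte-swap argument above, starting from a saturated red pixel (standalone illustration):

#include <stdio.h>

int
main (void)
{
  unsigned short red  = 0xF800;                 /* R=31 G=0 B=0 */
  unsigned short swap = (unsigned short) ((red >> 8) | (red << 8));

  /* swapped value is 0x00F8; the channel bit fields no longer line up */
  printf ("r=%d g=%d b=%d\n",
          (swap >> 11) & 0x1f, (swap >> 5) & 0x3f, swap & 0x1f);
  /* prints r=0 g=7 b=24 -- not the r=0 g=0 b=31 a clean R<->B channel
   * swap would give, although for saturated colours it can still look
   * roughly like one */
  return 0;
}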
Javier, can you respond to the latest comment?
Thanks for the ping, Andre. I totally missed the latest post, which is really interesting. Sorry Gordon!

Somebody poked me a few months ago with the following snippet from the Clutter documentation (http://www.clutter-project.org/docs/cogl/stable/cogl-General-API.html):

> So for example COGL_PIXEL_FORMAT_RGB_888 would have the red component
> in the lowest address, green in the next address and blue after that
> regardless of the endianness of the system.
> For the 16-bit formats the component order specifies the order within a
> 16-bit number from most significant bit to least significant. So for
> COGL_PIXEL_FORMAT_RGB_565, the red component would be in bits 11-15, the
> green component would be in 6-11 and the blue component would be in 1-5.
> Therefore the order in memory depends on the endianness of the system.

So upstream takes a mixed approach here: for most formats the RGB/BGR flag is host-endianness independent, but for packed pixel formats it is host-endianness dependent. Thus I was wrong in a core assumption of my previous rambling; apologies.

Knowing this, and also knowing the default visual color masks on _X86:
- 32/24 bits per pixel: R=0xFF0000, G=0xFF00, B=0xFF
- 16 bits per pixel: R=0xF800, G=0x7E0, B=0x1F

we can say that while 24 bpp XImages with those masks and LSBFirst bit & byte order map to COGL_PIXEL_FORMAT_BGR_888, 16 bpp XImages map to COGL_PIXEL_FORMAT_RGB_565 (no BGR flag!). As such, the previous fix is not "entirely" correct and would break 24 & 32 bpp windows. I tested it with a 32 bpp animation actor and behold -- it was channel swapped :).

Also, I've spent a few hours trying to understand why the issue only manifests itself with partial updates. I found that cogl_pixel_format_to_gl would, when asked about COGL_PIXEL_FORMAT_BGR_565, say the nearest COGL format is COGL_PIXEL_FORMAT_BGR_888 (channel swapping!! see my introductory point) and that the appropriate GL format is GL_RGB, GL_UNSIGNED_BYTE (wrong! channel swapping again!! -- COGL's BGR_888 should map to OpenGL's GL_BGR instead). Channel swapping + channel swapping = :) ;)

However, when partially updating the texture, the texture's internal format is COGL_PIXEL_FORMAT_BGR_888 (because of the above). So the partial texture data gets converted from BGR_565 to BGR_888 (and channel swapped!), and THEN cogl_pixel_format_to_gl is asked for the GL format matching the texture's internal format (BGR_888). It will correctly say GL_BGR, GL_UNSIGNED_BYTE, which means that GL won't channel swap; thus the (only once) channel-swapped data gets uploaded to the screen, and the problem manifests.

So, to recap:

1. Fix _cogl_pixel_format_to_gl (the GL < 1.2 version). On little-endian platforms, when asked about format COGL_PIXEL_FORMAT_RGB_565, it should output COGL_PIXEL_FORMAT_BGR_888 and GL_BGR. When asked about BGR_565, it should output RGB_888 and GL_RGB. The opposite on big-endian platforms. This will make the bug consistent at last :), and make it fail for both partial and full texture updates.

2. Fix clutter_x11_texture_pixmap_update_area_real, using something like the following logic (note that it does not take care of visuals, masks, ...; see the compilable sketch below):

if (depth == 24 || depth == 32)
    bgr_flag = (image_byte_order == LSBFirst);
else if (depth == 16)
    bgr_flag = (image_byte_order != host_byte_order);
else
    bgr_flag = false;

I will clean up (and upgrade) my source tree for a proper patch tomorrow if I find some time.
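A compilable version of that recap logic might look like this (hypothetical helper name; as noted, it ignores visuals and channel masks):

#include <X11/Xlib.h>
#include <glib.h>

static gboolean
bgr_flag_for_image (const XImage *image)
{
  /* the X11 byte order that matches the host's endianness */
  int host_byte_order =
      (G_BYTE_ORDER == G_LITTLE_ENDIAN) ? LSBFirst : MSBFirst;

  if (image->depth == 24 || image->depth == 32)
    return image->byte_order == LSBFirst;
  else if (image->depth == 16)
    return image->byte_order != host_byte_order;
  else
    return FALSE;
}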
(In reply to comment #20)
> On the N900, in some cases SGX can't actually map any more pixmaps directly
> (20-odd windows), so we revert to the slower clutter-x11-texture-pixmap method
> which currently works fine. Do you think it is the case that always
> 'image->byte_order == MSBFirst' for the N900? If not, we could possibly break
> it...

Actually, image->byte_order == LSBFirst for most windows, and indeed the patch was breaking 32 bpp textures. I think I could create a window where byte_order == MSBFirst, but I've seen no app doing it so far. (BTW, here be dragons when byte_order and bit_order don't match, but I've not seen that either.)

> To be honest I think we might be better off changing the image format
> conversion code (since it's stuff I added anyway - and yes, I know it's not
> fallback code :), although I'm still not sure about the endianness issue.

You're completely right here -- I was confusing byte swapping and "B/R channel swapping". The conversion routines are fine if we are to accept the above definition of the BGR flag, though.

> You'd expect that byte_order would always be the same for a certain
> platform

It might be; I don't know enough about X11 to say whether you might get a "non-native" byte or bit order pixmap. If that is the case, the bgr_flag logic would reduce to:

bgr_flag = host_byte_order == LE && (depth == 24 || depth == 32);

Note that ARM is also LE; this bug doesn't manifest at all on ARM for the same reason it doesn't manifest on x86 when support_gl_1_2 is TRUE. The code as it is now doesn't even check the BGR flag in 16 bpp textures for GLES and GL >= 1.2, so patching the above two points should be relatively harmless for the N900 (and most SDK users).

Ah, long post. At least this fix makes sense -- I hope it's OK for you too. BTW, Carsten, did you find the ARM/Mesa-SWX11 issue?
Thanks for putting so much work into this; that sounds like the issue. I guess maybe the RGB565 format comes from OpenGL's internal definition of it - I never thought to check, but it makes a lot of sense that the ordering would be like that.

Sorry I didn't get to put any real time into this myself - I'm afraid I got distracted by other stuff and basically forgot about it after I'd posted the last comment :( I look forward to your patches - it'll be great to finally see this fixed.
(In reply to comment #23)
> Thanks for putting so much work into this, that sounds like the issue. I guess
> maybe the RGB565 format comes from OpenGL's internal definition of it - I never
> thought to check, but it makes a lot of sense that the ordering would be like
> that.

Aha, that's the case: COGL's RGB_565 => GL_UNSIGNED_SHORT_5_6_5, BGR_565 => GL_UNSIGNED_SHORT_5_6_5_REV, whatever the platform. I've modified cogl_pixel_format_to_gl appropriately. I'm trying my changes (and Gitorious itself) here: http://maemo.gitorious.org/~javispedro/fremantle-hildon-desktop/javispedros-clutter_0_8
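In code, the mapping just described would look something like this (hypothetical helper, not the literal cogl source; the _REV type requires GL 1.2 headers):

#include <GL/gl.h>

/* the packed 16-bit COGL formats pick their GL type independent of
 * host endianness */
static void
cogl_565_to_gl (int is_bgr, GLenum *glformat, GLenum *gltype)
{
  *glformat = GL_RGB;
  *gltype   = is_bgr ? GL_UNSIGNED_SHORT_5_6_5_REV
                     : GL_UNSIGNED_SHORT_5_6_5;
}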
Created an attachment (id=2665) [details]
newer patch as per comment #22

I don't really have a BE platform to test on, so I decided to go for LE only. This patch:

1) Changes _cogl_pixel_format_to_gl so that COGL's rgb_565 is mapped to GL's 5_6_5 and bgr_565 to GL's 5_6_5_REV. For GL < 1.2, it always asks for conversion to RGB_888 whatever the platform (profiling could tell which option performs best on each platform).

2) Changes clutter_x11_texture_pixmap_update_area_real so that the BGR flag is NOT set for 16 bpp windows. Clutter's GLES backend conveniently (and possibly incorrectly) ignores the BGR flag for 16 bpp textures, so this shouldn't affect device operation.
So this is still valid for PR1.2 I assume?
I haven't got around to putting these patches in yet, I'm afraid, so yes - most likely it is still visible in PR1.2.
Still in PR1.2 but "your average user" will never see it.