maemo.org Bugzilla – Bug 9948
Contact quick search entering L does not recognize Polish "Ł"
Last modified: 2010-11-24 21:15:39 UTC
SOFTWARE VERSION: (Settings > General > About product) 3.2010.02-8

EXACT STEPS LEADING TO PROBLEM: When you try to find a contact by typing its first letters on the hardware keyboard, and the contact contains Polish special characters (ą, ę, ś, ć, ń, ó, ż, ź), everything works fine, except for contacts containing "ł".

Example contact 1: Józef Nowak. On the hardware keyboard type "jozef": the system automatically recognizes "o" as "ó", and all Józefs in my contact book are listed. The same holds for the other letters (a/ą, c/ć, e/ę, etc.).

Example contact 2: Łukasz Nowak. On the hardware keyboard type "lukasz": the system DOESN'T recognize "l" as "ł", and no contact is listed.

Of course there is a way to enter "ł" with the virtual keyboard of special characters, but opening it requires pressing Fn+Ctrl, which works only when the text/filter bar is open.

EXPECTED OUTCOME: During contact filtering, the "l" key of the Polish hardware keyboard layout should match both "l" and "ł", as the other letters already do.
Confirming.
(guessing correct component)
This is because no Unicode decomposition exists for this character. é can be decomposed as e + ´; we decompose it and take the ASCII base character. Ł, however, cannot be decomposed.
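A minimal Python sketch approximating that decompose-and-keep-the-base approach, using the standard `unicodedata` module in place of GLib's `g_unicode_canonical_decomposition()`. The helper name `fold_canonical` is hypothetical, and this is not the hildon code. It shows why "jozef" finds "Józef" while "lukasz" cannot find "Łukasz":

```python
import unicodedata


def fold_canonical(text: str) -> str:
    """Lowercase, canonically decompose (NFD) and drop combining marks.

    Characters that have no canonical decomposition, such as 'ł',
    come out unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for name in ("Józef", "Żęścina", "Łukasz"):
        print(name, "->", fold_canonical(name))
    # Józef -> jozef
    # Żęścina -> zescina
    # Łukasz -> łukasz   (nothing to strip, so typing "l" never matches)
```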
You can do that with iconv:

    $ echo Łukasz | iconv -f utf-8 -t ascii//translit
    Lukasz
In the live search we use g_unicode_canonical_decomposition(). I'm now checking how to solve this. More examples of where this fails would be welcome. Cyrillic-to-Latin doesn't count, though.
(In reply to comment #5)
> More examples on where this fails would be welcome. Cyrillic to
> latin doesn't count though..

In fact iconv doesn't handle them :)
(In reply to comment #3)
> This is because there is no unicode decomposition possible for this character.

Does that also mean that bug 2259 is not fixable?
http://www.fileformat.info/info/unicode/char/fe63/index.htm
Isn't this the character that, combined with L, gives Ł?
(In reply to comment #8)
> http://www.fileformat.info/info/unicode/char/fe63/index.htm
> isn't this the object that combined with L gives Ł?

No, at least not in Unicode. ﹡﹢﹣﹤﹥﹦
(In reply to comment #7)
> (In reply to comment #3)
> > This is because there is no unicode decomposition possible for this character.
>
> Does that also mean that bug 2259 is not fixable?
>

I never said this is not fixable :) Using g_convert() we can transliterate it to ASCII. However, that brings some issues with the Cyrillic alphabet. I am looking for a way to solve this now.
(In reply to comment #8)
> http://www.fileformat.info/info/unicode/char/fe63/index.htm
>
> isn't this the object that combined with L gives Ł?
>

Unicode (as of the 5.0.0 character database) doesn't consider Ł a character that can be canonically decomposed into L and something else. GLib's Unicode implementation reflects this, which is why g_unicode_canonical_decomposition() returns Ł itself for Ł. I don't know whether this could be changed; I will try to ask Behdad about it.
(In reply to comment #7)
> (In reply to comment #3)
> > This is because there is no unicode decomposition possible for this character.
>
> Does that also mean that bug 2259 is not fixable?
>

But bug 2259 is about composing, and this one is about decomposing.

If I understood correctly, Ł has been made non-decomposable because speakers of languages where Ł exists consider L and Ł two different letters. In Italian, at least, we do not consider è, é and e to be different letters. But this sounds like a matter of definitions.
>
> if I understood correctly, Ł has been made non decomposable because speakers
> of languages where Ł exists consider L and Ł two different letters. (at
> least in Italian) we do not consider èée different letters. but this sounds
> like a matter of definitions.
>

Not really. All of the following are different letters in their respective languages, but they can nevertheless be decomposed:

- ñ and n (Spanish)
- ä and a (Finnish, German, etc.)
- á and a (Hungarian)
- ő and o (Hungarian)

and so on.
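These claims can be checked against the Unicode Character Database. Below is a small Python sketch using the stdlib `unicodedata` module, whose `decomposition()` returns the UCD mapping as hex code points, or an empty string when there is none. The helper name `decomposition_of` is hypothetical, and this code is not part of hildon:

```python
import unicodedata

# Letters mentioned in this thread: the first four are listed above as
# decomposable; Ł is the one reported in this bug.
SAMPLES = ["ñ", "ä", "á", "ő", "Ł"]


def decomposition_of(ch: str) -> str:
    """UCD decomposition mapping as hex code points, or '' if there is none."""
    return unicodedata.decomposition(ch)


if __name__ == "__main__":
    for ch in SAMPLES:
        mapping = decomposition_of(ch) or "(none)"
        print(f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping}")
```

Each of the four listed letters maps to a base letter plus a combining mark. Only Ł prints "(none)".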
(In reply to comment #13)
> >
> > [...] this sounds
> > like a matter of definitions.
> >
>
> Not really. All of the following are different letters in their respective
> languages but nevertheless they can be decomposed:
>
> - ñ and n (spanish)
> - ä and a (finnish, german, etc)
> - á and a (hungarian)
> - ő and o (hungarian)
>
> and so on..

This is what I meant by "matter of definition"... From what I know of Spanish and Polish, I would say that Spanish ñ and n differ in a similar way to Polish L and the barred L (which I can't type on Maemo, see bug 2259): they come at different places in traditional dictionaries, for example. This is in contrast with German ä and a. But as a non-native speaker of either language, I may be misunderstanding.
Hi,

Thank you for your reaction. Now I see that this problem has not been understood correctly. By this bug 9948 I mean:

1. In the Polish language, L and ł are two different letters; they are not the same.
2. Nokia has a very helpful feature, contact quick search: on the main screen you can just start typing a contact name on the HWKB, and a list of the contacts that include the typed string appears. The same happens if you filter contacts after opening the address book.
3. All Polish special letters are recognized correctly. For example, when you have a contact named żęścina, it is enough to type zescina and the contact will be found, because the application reads z as z, ź and ż; a as a and ą; etc.
4. The problem is only with ł. When you have a contact named łukasz, it is impossible to find it! You cannot just type lukasz, because the application doesn't read l as both l and ł, but only as l. So how do you find a contact named łukasz??? Even if you wanted to use the virtual keyboard, how do you activate it?! To activate the virtual keyboard you have to press Fn+Sym, but that works only when the text bar is open.

Conclusion: how do you find/filter a contact named łukasz? It is a really serious BUG and should be fixed in the next firmware!

Regards,
Daniel
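The quick-search behaviour described above lists the contacts whose names contain the typed string, ignoring diacritics. It can be sketched in Python as follows. This is a hypothetical illustration, not the Contacts or hildon implementation: `fold` and `live_search` are made-up names. Because it folds both strings with canonical decomposition only, it reproduces the bug for ł:

```python
import unicodedata


def fold(text: str) -> str:
    """Case-fold, decompose (NFD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def live_search(contacts: list[str], typed: str) -> list[str]:
    """Return the contacts whose folded name contains the folded input."""
    needle = fold(typed)
    return [name for name in contacts if needle in fold(name)]


if __name__ == "__main__":
    book = ["Józef Nowak", "Żęścina", "Łukasz Nowak"]
    print(live_search(book, "zescina"))  # ['Żęścina']
    print(live_search(book, "nowak"))    # ['Józef Nowak', 'Łukasz Nowak']
    print(live_search(book, "lukasz"))   # [] (the reported bug)
```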
(In reply to comment #15)
> Hi,
> Thank you for you reaction. now i see that this problem is not correctly
> understand.

I understand the problem correctly. Here we are discussing why it happens with the current implementation, which relies on the canonical decomposition of Unicode characters.

> 3. all polish special leatters are recognized correctly (expl: when you have a
> contact named żęścina it is enough when you type zescina and contact will be
> found correctly because application reads Z as Z, ź and ż; a as a and ą etc.

That's because all of these characters are canonically decomposable.

> 4. Problem is only with ł.

Because this character is not canonically decomposable.

> How to find/fiter a contact named łukasz.
> It is realy serious BUG and shall be fixed by next firmware!

It's not that serious, to be honest; Poland is not such a big country. However, rest assured that I'm spending quite some time on this.
(In reply to comment #16)
> Here we are discussing why it happens with
> the current implementation, which relies in the canonical decomposition of
> Unicode characters.

Such an approach (ą->a, etc.) works in Polish, because it is (or was) common to use just the Latin alphabet letters without acutes, ogoneks, etc., due to problems with multiple character encodings, availability of fonts, writer's convenience, and so on. However, I really can't imagine how such an approach would work in languages that have strict transcription rules. For example, in German "Müller" can be transcribed only as "Mueller", and consequently "Muller" is incorrect. And what about languages that are not based on the Latin alphabet? Take for example Russian "чиж", which transliterates to "Chizh". (What about non-English transliterations? In Polish that would be "Cziż", which again would be reduced to "Cziz". Oh, fun.)

> It's not that serious to be honest, Poland is not such a big country.

And you are lynching Negroes.
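The German example can be made concrete with a short Python sketch that contrasts two strategies. The first strips combining marks after canonical decomposition, roughly what the live search does. The second applies a hypothetical German transcription table covering only ä, ö, ü and ß. Both helper names are made up for illustration:

```python
import unicodedata

# Hypothetical German transcription table (umlauts and eszett only).
GERMAN_TRANSCRIPTION = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def strip_marks(text: str) -> str:
    """Decompose (NFD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def transcribe_german(text: str) -> str:
    """Apply German transcription rules instead of dropping the marks."""
    # NFC first, so a decomposed "u + combining diaeresis" still hits the table.
    composed = unicodedata.normalize("NFC", text.lower())
    return "".join(GERMAN_TRANSCRIPTION.get(ch, ch) for ch in composed)


if __name__ == "__main__":
    print(strip_marks("Müller"))        # muller
    print(transcribe_german("Müller"))  # mueller
```

With mark stripping, a search for the correct transcription "mueller" would not find "Müller".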
(In reply to comment #17)
> (In reply to comment #16)
> > Here we are discussing why it happens with
> > the current implementation, which relies in the canonical decomposition of
> > Unicode characters.
> Such approach (ą->a, etc) works in Polish, because it is (or was) common to
> use just the latin alphabet letters without acutes, ogoneks, etc, due to
> problems with multiple character encodings, availability of fonts, writer's
> convenience, and so on. However, I really can't imagine how such approach would
> work in languages that have strict transcription rules. For example, in German
> "Müller" can be transcribed only to "Mueller" and consequently "Muller" is
> incorrect. And what about languages that are not basing on latin alphabet? Take
> for example Russian "чиж", which transliterates to "Chizh" (What about
> non-english transliterations? In Polish that would be "Cziż", which again
> would be reduced to "Cziz". Oh fun.).

We are not going to transliterate. It's way out of scope.

> > It's not that serious to be honest, Poland is not such a big country.
> And you are lynching Negroes.

Don't be so dramatic. I am being extremely honest in telling you that Maemo has other bugs that are really serious, in the sense that they affect a bigger user base, and priorities are set according to that.
(In reply to comment #16)
> It's not that serious to be honest, Poland is not such a big country.

From a European perspective, this isn't completely true: it's the 6th largest country by population in the EU. To see how many users are impacted, don't forget that Polish women are very popular all over Europe. I guess this problem affects more or less half of the European guys (conservative estimate: 25% of the Nokia user base in Europe)!
Let's please stop the off-topic noise here. Thanks.
Created an attachment (id=2661)
Fix for this

Since we have people who consider this bug so important, here is a patch against hildon to fix this issue. Some applications (like the file manager) will need to be fixed to use the new methods introduced here. However, other applications that use a touch selector directly should already be fixed by this patch.
Created an attachment (id=2662)
Correct patch

My bad, the previous patch was incomplete.
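The attached patches are not reproduced here. As a rough illustration only, here is a Python sketch of the idea: canonical decomposition first, then a fallback for letters that have no decomposition. The `FALLBACK` table and the helper names are hypothetical, and the table is tiny. The committed hildon methods are iconv-based instead, and iconv's transliteration tables are much larger and locale-dependent:

```python
import unicodedata

# Hypothetical fallback for letters with no canonical decomposition.
# iconv's //TRANSLIT tables cover far more and depend on the locale.
FALLBACK = {"ł": "l", "ø": "o"}


def fold(text: str) -> str:
    """Decompose, drop combining marks, then apply the fallback table."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    base = (ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(FALLBACK.get(ch, ch) for ch in base)


def matches(haystack: str, needle: str) -> bool:
    """True if the folded needle occurs anywhere in the folded haystack."""
    return fold(needle) in fold(haystack)


if __name__ == "__main__":
    print(matches("Łukasz Nowak", "lukasz"))  # True
    print(matches("Józef Nowak", "jozef"))    # True
    print(matches("Søren", "soren"))          # True
    print(matches("Łukasz Nowak", "jozef"))   # False
```

Because the fallback runs after the marks are stripped, a letter like ǿ (ø plus a combining acute) also folds to "o".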
Created an attachment (id=2663)
test case

This test case shows all the combinations of matches that I tested for. These include standard ASCII, Latin with accents, some decomposable characters like ñ, ü and so on, Cyrillic, and the by-now-famous Polish Ł. If anyone wants to patch the test to add other cases that we should take into consideration, now is the time.
Here is the output of the test case as of now. As you can see, the tests we already knew would fail do fail with the current method. Also, a Cyrillic-to-ASCII transliterated search fails. All of these pass with the newer matching code.

    [sbox-maemo5-i486: ~/git/hildon/tests] > run-standalone.sh ./test-helper-matching
    Testing methods using g_unicode_canonical_decomposition()
    test-helper-matching[4108]: GLIB WARNING ** default - test 35 failed: (łukasz, lukasz, 0)
    test-helper-matching[4108]: GLIB WARNING ** default - test 36 failed: (łukasz, L, 0)
    test-helper-matching[4108]: GLIB WARNING ** default - test 37 failed: (Łucasz, lucasz, 0)
    test-helper-matching[4108]: GLIB WARNING ** default - test 39 failed: (-Łucasz-, -l, 0)
    test-helper-matching[4108]: GLIB WARNING ** default - test 40 failed: (-Łucasz-, l, 1)
    test-helper-matching[4108]: GLIB WARNING ** default - test 43 failed: (Мария, maria, 0)
    Testing methods using g_convert()
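The (haystack, needle, number) triples in the log suggest a table-driven test. Here is a Python sketch of such a harness for the Polish cases, under two assumptions. First, the third field is the expected match offset in the haystack; this is inferred from the log, not confirmed. Second, folding works one character at a time: it keeps the first code point of the canonical decomposition and adds a hypothetical `FALLBACK` entry for ł. Folding one character to exactly one character keeps offsets aligned with the original string. The Cyrillic case (Мария, maria) needs real transliteration and is deliberately left out:

```python
import unicodedata

# Hypothetical fallback for letters with no canonical decomposition.
FALLBACK = {"ł": "l"}


def fold_char(ch: str) -> str:
    """Fold one character to exactly one character.

    Keeps the first code point of the canonical decomposition (the base
    letter), then applies the fallback table.
    """
    base = unicodedata.normalize("NFD", ch.lower())[0]
    return FALLBACK.get(base, base)


def find_folded(haystack: str, needle: str) -> int:
    """Offset of needle in haystack after folding, or -1 if absent."""
    folded_haystack = "".join(fold_char(c) for c in haystack)
    folded_needle = "".join(fold_char(c) for c in needle)
    return folded_haystack.find(folded_needle)


# (haystack, needle, expected offset), modelled on the failing cases above.
CASES = [
    ("łukasz", "lukasz", 0),
    ("łukasz", "L", 0),
    ("Łucasz", "lucasz", 0),
    ("-Łucasz-", "-l", 0),
    ("-Łucasz-", "l", 1),
]


if __name__ == "__main__":
    for haystack, needle, expected in CASES:
        got = find_folded(haystack, needle)
        status = "ok" if got == expected else "FAILED"
        print(f"{status}: ({haystack}, {needle}, {expected}) -> {got}")
```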
Just to be sure, you might also want to test ø/Ø and the Romanian ă. But thanks!
Maybe I'd better explain why I suggest also testing ø (o-slash) and ă (a-breve); I don't want to sound like I'm teasing or joking again. O-slash and a-breve are precisely symbols for which bug 2259 also fails. If I understood Andre Klapper correctly, he thinks there might be a link between the two problems, so it is better to make sure now, while the work is being done anyway.
Created an attachment (id=2664)
maemo9948-hildon-fm.patch

(In reply to comment #22)
> Created an attachment (id=2662)
> Correct patch
>
> My bad, previous patch was incomplete.

Here is the patch for hildon-fm. I tested Live Search in the File Manager with your patch and this one, and it works fine.
(In reply to comment #25)
> just to be sure, you might want to test also for øØ and the Romanian ă.
> but thanks!
>

Tests for ø fail; however, tests for ǿ work fine. ă works fine. I guess iconv doesn't know what to do with ø.
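For reference, here is a Python sketch (stdlib `unicodedata`; the helper name `describe` is hypothetical) that prints what the Unicode Character Database records for the three letters discussed. It shows that ă and ǿ have canonical decompositions, with ǿ decomposing into ø plus a combining acute, while ø itself has none. It says nothing about iconv's own transliteration tables:

```python
import unicodedata


def describe(ch: str) -> str:
    """Name a character and spell out its canonical decomposition, if any."""
    mapping = unicodedata.decomposition(ch)
    # Empty means no decomposition; a leading "<tag>" marks a
    # compatibility (not canonical) decomposition.
    if not mapping or mapping.startswith("<"):
        return f"{unicodedata.name(ch)}: no canonical decomposition"
    parts = [unicodedata.name(chr(int(cp, 16))) for cp in mapping.split()]
    return f"{unicodedata.name(ch)}: " + " + ".join(parts)


if __name__ == "__main__":
    for ch in ("ø", "ǿ", "ă"):
        print(describe(ch))
```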
See http://taschenorakel.de/mathias/2007/11/06/iconv-transliterations/ for reference. It seems that iconv transliteration is heavily dependent on the locale setting (after reading the comments at the link, that even makes sense). Is that taken into account in the tests?
Yes, I know that. To be honest, the tests assume en_GB, but in any case it shouldn't matter what iconv transliterates to, as long as the person typing is aware of their locale and types accordingly. Anyway, that doesn't seem to be related to the problem: I tested with da_DK.UTF-8 and it didn't make any real difference.
Committed into master and hildon-2-2.

    commit 2b2a3e8ecb19bbe00f334f6e88d2bcfa89408b24
    Author: Claudio Saavedra <csaavedra@igalia.com>
    Date:   Tue Apr 27 12:23:12 2010 +0300

        Add and use new iconv based matching methods

        These methods allow for more complete matching rules, including
        for example the polish Ł. Use them now in the HildonTouchSelector
        live search.

        Fixes: MB#9948 (Contact quick search entering L does not recognize Polish "Ł")
        Fixes: NB#165083 (Contact quick search entering L does not recognize Polish "Ł")
The problem reported here should be fixed in the update that was released today for public: The Maemo5 update version 20.2010.36-2 (also called "PR1.3" sometimes). Please leave a comment if the problem is not fixed for you in this update version.
Bartosz: Note that the patch for *hildon-widgets* has been submitted, but it does not affect Contacts, as Contacts does not use Hildon's matching function to filter contacts. Please feel free to file a new report.
Sorry, but it still doesn't work, unfortunately :(
Daniel: See comment 33.