Bug 9948 - (int-165083) Contact quick search entering L does not recognize Polish "Ł"
Alias: int-165083
Status: RESOLVED FIXED
Product: Desktop platform
Component: hildon-widgets
Version: 5.0/(3.2010.02-8)
Hardware: N900 Maemo
Importance: Unspecified normal
Target Milestone: 5.0/(20.2010.36-2)
Assigned To: Claudio Saavedra
QA Contact: hildon-libs-bugs
Reported: 2010-04-17 23:51 UTC by Daniel Poznański
Modified: 2010-11-24 21:15 UTC
CC List: 6 users


Attachments
Fix for this (6.32 KB, patch)
2010-04-27 12:25 UTC, Claudio Saavedra
Correct patch (6.39 KB, patch)
2010-04-27 12:39 UTC, Claudio Saavedra
test case (4.53 KB, text/x-csrc)
2010-04-27 12:41 UTC, Claudio Saavedra
maemo9948-hildon-fm.patch (1.90 KB, patch)
2010-04-27 15:53 UTC, Alban Crequy




Description Daniel Poznański (reporter) 2010-04-17 23:51:04 UTC
SOFTWARE VERSION:
(Settings > General > About product)
3.2010.02-8

EXACT STEPS LEADING TO PROBLEM: 
When you try to find a contact by typing its first letters on the hardware
keyboard, and the contact includes Polish special characters like (ą, ę, ś, ć,
ń, ó, ż, ź), everything works fine except for contacts that include (ł).

Example contact 1: Józef Nowak
On the hardware keyboard type jozef - then the system automatically recognizes
"o" as "ó" and all Józefs in my contact book are listed - the same for other
letters (a-ą, c-ć, e-ę etc.)

Example contact 2: Łukasz Nowak
On the hardware keyboard type lukasz - then the system DOESN'T recognize "l" as
"ł" and no contact is listed.

Of course there is a way to enter "ł" with the virtual keyboard of special
characters, but to open it, it is necessary to press Fn+Ctrl, which works only
when the text/filter bar is open.


EXPECTED OUTCOME:
During contact filtering, the "l" letter in the Polish layout of the hardware
keyboard should be recognized as both "l" and "ł", like the other letters
already are.
Comment 1 Bartosz Taudul 2010-04-21 13:34:12 UTC
Confirming.
Comment 2 Andre Klapper maemo.org 2010-04-21 14:41:41 UTC
(guessing correct component)
Comment 3 Claudio Saavedra 2010-04-21 15:00:39 UTC
This is because there is no unicode decomposition possible for this character.

é can be decomposed as e + ´. We decompose it and take the ASCII character.
However, Ł cannot be decomposed.
Comment 4 Alberto Garcia Gonzalez 2010-04-21 15:04:48 UTC
You can do that with iconv:

$ echo Łukasz | iconv -f utf-8 -t ascii//translit
Lukasz
Comment 5 Claudio Saavedra 2010-04-21 15:27:34 UTC
In the live search we use g_unicode_canonical_decomposition(). I'm checking now
how to solve this.

More examples on where this fails would be welcome. Cyrillic to latin doesn't
count though..
Comment 6 Alberto Garcia Gonzalez 2010-04-21 15:41:55 UTC
(In reply to comment #5)
> More examples on where this fails would be welcome. Cyrillic to
> latin doesn't count though..

In fact iconv doesn't handle them :)
Comment 7 Andre Klapper maemo.org 2010-04-22 17:10:22 UTC
(In reply to comment #3)
> This is because there is no unicode decomposition possible for this character.

Does that also mean that bug 2259 is not fixable?
Comment 8 Mario Frasca 2010-04-22 17:23:46 UTC
http://www.fileformat.info/info/unicode/char/fe63/index.htm

isn't this the object that combined with L gives Ł?
Comment 9 Andre Klapper maemo.org 2010-04-22 17:26:13 UTC
(In reply to comment #8)
> http://www.fileformat.info/info/unicode/char/fe63/index.htm
> isn't this the object that combined with L gives Ł?

No, at least not in Unicode. ﹡﹢﹣﹤﹥﹦
Comment 10 Claudio Saavedra 2010-04-22 17:28:17 UTC
(In reply to comment #7)
> (In reply to comment #3)
> > This is because there is no unicode decomposition possible for this character.
> 
> Does that also mean that bug 2259 is not fixable?
> 

I never said this is not fixable :)

Using g_convert() we can transliterate it to ASCII. However, that brings some
issues with the Cyrillic alphabet. I am looking for a way to solve this now.
Comment 11 Claudio Saavedra 2010-04-22 17:30:56 UTC
(In reply to comment #8)
> http://www.fileformat.info/info/unicode/char/fe63/index.htm
> 
> isn't this the object that combined with L gives Ł?
> 

Unicode (as of the 5.0.0 character database) doesn't consider Ł a character
that can be canonically decomposed into L and something else. The GLib Unicode
implementation reflects this, and that's the reason why
g_unicode_canonical_decomposition() returns Ł itself for Ł.

I don't know if this could be changed, I will try to ask Behdad about it.
Comment 12 Mario Frasca 2010-04-22 18:03:51 UTC
(In reply to comment #7)
> (In reply to comment #3)
> > This is because there is no unicode decomposition possible for this character.
> 
> Does that also mean that bug 2259 is not fixable?
> 

But bug 2259 is about composing and this one is about decomposing.

If I understood correctly, Ł has been made non-decomposable because speakers of
languages where Ł exists consider L and Ł two different letters. In Italian, at
least, we do not consider è, é and e different letters. But this sounds like a
matter of definitions.
Comment 13 Claudio Saavedra 2010-04-22 19:26:30 UTC
> 
> if I understood correctly, Ł has been made non decomposable because speakers
> of languages where Ł exists consider L and Ł two different letters.  (at
> least in Italian) we do not consider èée different letters.  but this sounds
> like a matter of definitions.
> 

Not really. All of the following are different letters in their respective
languages but nevertheless they can be decomposed:

- ñ and n (spanish)
- ä and a (finnish, german, etc)
- á and a (hungarian)
- ő and o (hungarian)

and so on..
Comment 14 Mario Frasca 2010-04-22 19:52:51 UTC
(In reply to comment #13)
> > 
> > [...] this sounds
> > like a matter of definitions.
> > 
> 
> Not really. All of the following are different letters in their respective
> languages but nevertheless they can be decomposed:
> 
> - ñ and n (spanish)
> - ä and a (finnish, german, etc)
> - á and a (hungarian)
> - ő and o (hungarian)
> 
> and so on..  

This is what I meant by "matter of definition". From what I know of Spanish
and Polish, I would say that Spanish ñ and n are different in a similar way as
Polish L and (I can't type the barred L on Maemo, see bug 2259). They come at
different places in traditional dictionaries, for example. This is in contrast
with German ä and a. But as a non-native speaker of either language, I could be
misunderstanding.
Comment 15 Daniel Poznański (reporter) 2010-04-22 22:00:05 UTC
Hi,
Thank you for your reaction. Now I see that this problem is not correctly
understood.
By this bug 9948 I mean:
1. In the Polish language, L and ł are two different letters; they are not the
same.
2. Nokia has a very helpful feature, contact quick search. On the main
screen you can just start typing a contact name on the HWKB and a list appears
with the contacts that include the typed string. The same happens if you
filter contacts when you open the address book.
3. All Polish special letters are recognized correctly (example: when you have a
contact named żęścina it is enough to type zescina and the contact will be
found correctly, because the application reads Z as Z, ź and ż; a as a and ą, etc.)
4. The problem is only with ł. When you have a contact named łukasz it is
impossible to find it! You cannot just type lukasz, because the application
doesn't read l as both l and ł, but only as l. So how do you find a contact
named łukasz? Even if you wanted to use the virtual keyboard, how do you
activate it? To activate the virtual keyboard you have to press Fn+Sym, but it
works only when the text bar is open.

Conclusion:
How do you find/filter a contact named łukasz?
It is a really serious BUG and should be fixed in the next firmware!

Regards
Daniel
Comment 16 Claudio Saavedra 2010-04-22 22:12:47 UTC
(In reply to comment #15)
> Hi,
> Thank you for you reaction. now i see that this problem is not correctly
> understand.

I understand the problem correctly. Here we are discussing why it happens with
the current implementation, which relies on the canonical decomposition of
Unicode characters.

> 3. all polish special leatters are recognized correctly (expl: when you have a
> contact named żęścina it is enough when you type zescina and contact will be
> found correctly because application reads Z as Z, ź and ż; a as a and ą etc.

That's because all of these characters are canonically decomposable.

> 4. Problem is only with ł.

Because this character is not canonically decomposable.

> How to find/fiter a contact named łukasz.
> It is realy serious BUG and shall be fixed by next firmware!

It's not that serious to be honest, Poland is not such a big country. However
rest assured that I'm spending quite some time on this.
Comment 17 Bartosz Taudul 2010-04-22 22:59:08 UTC
(In reply to comment #16)
> Here we are discussing why it happens with
> the current implementation, which relies in the canonical decomposition of
> Unicode characters.
Such approach (ą->a, etc) works in Polish, because it is (or was) common to use
just the latin alphabet letters without acutes, ogoneks, etc, due to problems
with multiple character encodings, availability of fonts, writer's convenience,
and so on. However, I really can't imagine how such approach would work in
languages that have strict transcription rules. For example, in German "Müller"
can be transcribed only to "Mueller" and consequently "Muller" is incorrect.
And what about languages that are not based on the Latin alphabet? Take for
example Russian "чиж", which transliterates to "Chizh" (What about non-english
transliterations? In Polish that would be "Cziż", which again would be reduced
to "Cziz". Oh fun.).

> It's not that serious to be honest, Poland is not such a big country.
And you are lynching Negroes.
Comment 18 Claudio Saavedra 2010-04-23 00:10:32 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > Here we are discussing why it happens with
> > the current implementation, which relies in the canonical decomposition of
> > Unicode characters.
> Such approach (ą->a, etc) works in Polish, because it is (or was) common to
> use just the latin alphabet letters without acutes, ogoneks, etc, due to
> problems with multiple character encodings, availability of fonts, writer's
> convenience, and so on. However, I really can't imagine how such approach would
> work in languages that have strict transcription rules. For example, in German
> "Müller" can be transcribed only to "Mueller" and consequently "Muller" is
> incorrect. And what about languages that are not basing on latin alphabet? Take
> for example Russian "чиж", which transliterates to "Chizh" (What about
> non-english transliterations? In Polish that would be "Cziż", which again
> would be reduced to "Cziz". Oh fun.).

We are not going to transliterate. It's way out of scope.

> > It's not that serious to be honest, Poland is not such a big country.
> And you are lynching Negroes.

Don't be so dramatic. I am being extremely honest by telling you that Maemo has
other bugs that are really serious, in the sense that they affect a bigger user
base, and priorities are set according to that.
Comment 19 Mario Frasca 2010-04-23 08:52:45 UTC
(In reply to comment #16)
> It's not that serious to be honest, Poland is not such a big country.

from a European perspective, this isn't completely true: it's the 6th largest
country by population in the EU.  to see how many users are impacted, don't
forget that Polish women are very popular in all of Europe.  I guess this
problem affects more or less half of the European guys (conservative estimate:
25% of the Nokia user base in Europe)!
Comment 20 Andre Klapper maemo.org 2010-04-23 12:55:43 UTC
Let's please stop the Offtopic noise here. Thanks.
Comment 21 Claudio Saavedra 2010-04-27 12:25:30 UTC
Created an attachment (id=2661)
Fix for this

Since we have people that consider this bug so important, this is a patch
against hildon to fix this issue.

Some applications (like the filemanager) will need to be fixed to use the new
methods here. However, other applications using a touchselector directly should
be fixed with this patch already.
Comment 22 Claudio Saavedra 2010-04-27 12:39:33 UTC
Created an attachment (id=2662)
Correct patch

My bad, previous patch was incomplete.
Comment 23 Claudio Saavedra 2010-04-27 12:41:35 UTC
Created an attachment (id=2663)
test case

This test case shows all the possible combination of matches that I tested for.
These include standard ascii, latin with accents, some characters that are
decomposable like ñ, ü, and so on, cyrillic, and the famous by now polish Ł.

If anyone wants to patch the test to add some other test cases that we should
be taking into consideration now is the time.
Comment 24 Claudio Saavedra 2010-04-27 12:43:32 UTC
Here is the output of the test case as of now. As you can see, the tests we
already know to fail, fail with the current method. Also, a Cyrillic to ascii
transliterated search fails. All of these pass with the newer matching code. 

[sbox-maemo5-i486: ~/git/hildon/tests] > run-standalone.sh ./test-helper-matching
Testing methods using g_unicode_canonical_decomposition()
test-helper-matching[4108]: GLIB WARNING ** default - test 35 failed: (łukasz, lukasz, 0)
test-helper-matching[4108]: GLIB WARNING ** default - test 36 failed: (łukasz, L, 0)
test-helper-matching[4108]: GLIB WARNING ** default - test 37 failed: (Łucasz, lucasz, 0)
test-helper-matching[4108]: GLIB WARNING ** default - test 39 failed: (-Łucasz-, -l, 0)
test-helper-matching[4108]: GLIB WARNING ** default - test 40 failed: (-Łucasz-, l, 1)
test-helper-matching[4108]: GLIB WARNING ** default - test 43 failed: (Мария, maria, 0)
Testing methods using g_convert()
Testing methods using g_convert()
Comment 25 Mario Frasca 2010-04-27 13:31:05 UTC
just to be sure, you might want to test also for øØ and the Romanian ă.
but thanks!
Comment 26 Mario Frasca 2010-04-27 15:09:16 UTC
Maybe I should better explain why I suggest also testing ø (o-slash) and ă
(a-breve). I don't want to sound like I'm teasing or joking again.

O-slash and a-breve are precisely the symbols for which bug 2259 also fails. If
I understood Andre Klapper correctly, he thinks that there might be a link
between the two problems, so it may be better to check them now, while this is
being worked on anyway.
Comment 27 Alban Crequy 2010-04-27 15:53:56 UTC
Created an attachment (id=2664)
maemo9948-hildon-fm.patch

(In reply to comment #22)
> Created an attachment (id=2662)
> Correct patch
> 
> My bad, previous patch was incomplete.

Here is the patch for hildon-fm. I tested the Live Search in File Manager with
your patch and this one, and it works fine.
Comment 28 Claudio Saavedra 2010-04-27 16:14:50 UTC
(In reply to comment #25)
> just to be sure, you might want to test also for øØ and the Romanian ă.
> but thanks!
> 

Tests for ø fail, however tests for ǿ work fine.

ă works fine.

I guess iconv doesn't know what to do with ø..
Comment 29 Bartosz Taudul 2010-04-27 16:43:28 UTC
See http://taschenorakel.de/mathias/2007/11/06/iconv-transliterations/ for
reference. It seems that iconv transliteration is heavily dependent on the
locale setting (after reading comments in the link it even makes sense). Is it
taken into account in the tests?
Comment 30 Claudio Saavedra 2010-04-27 17:42:58 UTC
Yes, I know that. The tests are assuming en_GB tbh, but in any case, it
shouldn't be a problem what iconv transliterates to, as long as the person
typing is aware of his locale and typing accordingly.

In any case, that doesn't seem to be related to the problem, because I tested
with da_DK.UTF-8 and it didn't really make any difference.
Comment 31 Claudio Saavedra 2010-06-01 18:14:01 UTC
Committed into master and hildon-2-2.

commit 2b2a3e8ecb19bbe00f334f6e88d2bcfa89408b24
Author: Claudio Saavedra <csaavedra@igalia.com>
Date:   Tue Apr 27 12:23:12 2010 +0300

    Add and use new iconv based matching methods

    These methods allow for more complete matching rules, including for
    example the polish Ł. Use them now in the HildonTouchSelector live
    search.

    Fixes: MB#9948 (Contact quick search entering L does not recognize Polish "Ł")
    Fixes: NB#165083 (Contact quick search entering L does not recognize Polish "Ł")
Comment 32 Andre Klapper maemo.org 2010-10-25 17:13:39 UTC
The problem reported here should be fixed in the update that was released today
for public: The Maemo5 update version 20.2010.36-2 (also called "PR1.3"
sometimes). Please leave a comment if the problem is not fixed for you in this
update version.
Comment 33 Andre Klapper maemo.org 2010-10-26 15:33:24 UTC
Bartosz:
Note that the patch for *hildon-widgets* has been submitted but it does not
affect Contacts as Contacts is not using Hildon's matching function to filter
contacts.
Please feel free to file a new report.
Comment 34 Daniel Poznański (reporter) 2010-11-24 20:42:27 UTC
Sorry, but it still doesn't work, unfortunately :(
Comment 35 Andre Klapper maemo.org 2010-11-24 21:15:39 UTC
Daniel: See comment 33.