Closed Bug 1048899 Opened 10 years ago Closed 10 years ago

::first-letter pseudo-element does not extend to enough letters in some cases for Telugu

Categories

(Core :: Layout: Text and Fonts, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla34

People

(Reporter: veeven, Assigned: jfkthame)

References

(Depends on 1 open bug)

Details

(Keywords: testcase)

Attachments

(4 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1; rv:34.0) Gecko/20100101 Firefox/34.0 (Beta/Release)
Build ID: 20140804030205

Steps to reproduce:

1. Open the attached file in the browser.

(Background: On few Telugu blogs, I noticed that dropcaps are not rendered as expected. When I checked, I found that those themes are using "::first-letter" pseudo-element to style the drop-caps.

To get an understanding of what is working well and what is not, I created the attached file to see different combinations.)




Actual results:

The text in "Reference" and "Actual" columns should be rendered same. But, last 3 rows are not matching.


Expected results:

The text in "Actual" column should have been rendered exactly as in the "Reference" column (barring the color difference).
Component: Untriaged → Layout
Keywords: testcase
OS: Windows 7 → All
Product: Firefox → Core
Hardware: x86 → All
See Also: → 597510
Version: 34 Branch → Trunk
Chrome and IE are doing worse. Here is their renderings juxtaposed. Red line indicates incorrectly rendered ones.
I reported bug 603710 some time ago on similar issues in Devanagari, but I'm still not clear how to define the cases which we get wrong.

nsTextFrame.cpp uses FindClusterEnd to find the characters which are given first-letter style. Is this a bug in FindClusterEnd, or is it correct for some purposes and we need a different API for first-letter?
Component: Layout → Layout: Text
Depends on: 603710
Flags: needinfo?(jfkthame)
Part of the problem in the attached testcase is that it applies font-weight:bold to the first-letter, in addition to changing the color. We don't currently support complex-script shaping across font-run boundaries (and it's unclear in general whether this is even something that should be attempted). So this is why the text shaping breaks down in the last three examples.

Removing the font-weight property, leaving just color to indicate the extent of the first-letter, provides a clearer test of what that selector is matching. This leaves the glyph shaping correct, but the first-letter color still does not cover the full extent of the initial cluster.

We should probably split first-letter behavior out from FindClusterEnd to its own helper, as it is unlikely that a single definition of "cluster" will serve for all purposes. The "cluster end" for first-letter purposes may well be different than the cluster end for letter-spacing or for cursor movement, for example.

(http://www.w3.org/TR/css3-selectors/#first-letter currently leaves the precise definition of ::first-letter rather vague, largely because I don't think anyone has figured out exactly what the behavior should be across all scripts and languages.)
Flags: needinfo?(jfkthame)
Here's an experimental hack that improves our behavior in a bunch of Indic cases, at least. The idea is simply to avoid ending the ::first-letter range if we're within a ligature (in addition to the existing cluster-boundary check). This is unlikely to be perfect, but I think it's an improvement. Note that this changes behavior for a Latin-script example that starts with an "fi" ligature or similar; previously, we'd color only half the ligature, or split it if the ::first-letter style changes the font; with this change, the complete ligature is treated as the first letter. I'm not sure that's a bad thing, actually...
Comment on attachment 8468375 [details] [diff] [review]
(experimental patch) don't end ::first-letter in the middle of a ligature

:smontagu, WDYT about trying something like this, in the absence of a more precise definition of what the behavior "ought" to be?
Attachment #8468375 - Flags: feedback?(smontagu)
Comment on attachment 8468375 [details] [diff] [review]
(experimental patch) don't end ::first-letter in the middle of a ligature

Review of attachment 8468375 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think this is desirable for all scripts, especially not Arabic, but not even in the Latin script example you give with fi.
Attachment #8468375 - Flags: feedback?(smontagu) → feedback-
(In reply to Simon Montagu :smontagu from comment #6)
> I don't think this is desirable for all scripts, especially not Arabic,

Arabic is a tricky one: I think it'd be desirable to treat lam-alef as a single unit for ::first-letter purposes, but if a font implements other ligatures (rather than using contextual forms, for instance) those probably shouldn't be.

The most glaring problems here, though, are probably with Indic and SEAsian scripts, where ::first-letter really needs to handle the "orthographic syllable". This can not in general be determined purely from the Unicode data, as it may depend on the degree of conjunct formation or stacking implemented by the font. E.g. if a Devanagari font implements a stacked conjunct for NGA + HALANT + GA, this should be treated as a unit for ::first-letter; but if the font doesn't have this stacked form, but displays an explicit NGA-HALANT followed by separate GA, then only the NGA-HALANT cluster would be styled by ::first-letter. This is why I was experimenting with checking the ligature-start flag in the textrun.

(There are hints of this behavior in http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries, although it does not attempt to fully specify such rules.)

How about making this behavior script-dependent, then: If we're dealing with an Indic (etc) script, avoid splitting ligatures for ::first-letter, but if it's a simple script like Latin, then we ignore ligature-ness and maintain the existing behavior?
How about trying something like this? ISTM that for Indic-family scripts, this provides substantially better behavior than we currently have.
Attachment #8468375 - Attachment is obsolete: true
Attachment #8471875 - Flags: feedback?(smontagu)
Should we send spec feedback about what the spec says should happen, given that this (I think) disagrees with the spec?
Try run with the patch from comment 8: https://tbpl.mozilla.org/?tree=Try&rev=1026e9caa31a.

Note that the reftest orange here is a good thing: it's an "unexpected pass" on layout/reftests/first-letter/329069-5.html, which is indeed an Indic cluster first-letter test, and currently passes only under CoreText/AAT (where we get different "cluster" behavior from the shaper).
(In reply to David Baron [:dbaron] (UTC-7) (needinfo? for questions) from comment #9)
> Should we send spec feedback about what the spec says should happen, given
> that this (I think) disagrees with the spec?

AFAICT, the spec leaves this up to the UA, and doesn't attempt an exhaustive definition of exactly what ::first-letter includes. The explanatory "Note" at http://www.w3.org/TR/css3-selectors/#first-letter says (in part), "Additionally, some languages may have specific rules about how to treat certain letter combinations. The UA definition of ::first-letter should include at least the default grapheme cluster as defined by UAX29 and may include more than that as appropriate."

So I don't think we'd be in disagreement with the spec if we do something like this.
Comment on attachment 8471875 [details] [diff] [review]
don't end ::first-letter in the middle of a ligature for Indic and SEAsian scripts.

Review of attachment 8471875 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, I think this approach is the way to go.
Attachment #8471875 - Flags: feedback?(smontagu) → feedback+
Cleaned up the patch a bit, and added a reftest based on the Telugu examples here. This also allows us to remove the 'fails' annotation from reftest 329069-5, resolving bug 603710.
Attachment #8481395 - Flags: review?(smontagu)
Attachment #8471875 - Attachment is obsolete: true
Assignee: nobody → jfkthame
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Try run to confirm the reftests pass: https://tbpl.mozilla.org/?tree=Try&rev=78efd48b03e0.
Comment on attachment 8481395 [details] [diff] [review]
don't end ::first-letter in the middle of a ligature for Indic and SEAsian scripts.

Review of attachment 8481395 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsTextFrame.cpp
@@ +6991,5 @@
> +    case MOZ_SCRIPT_ORIYA:
> +    case MOZ_SCRIPT_TAMIL:
> +    case MOZ_SCRIPT_TELUGU:
> +    case MOZ_SCRIPT_SINHALA:
> +    case MOZ_SCRIPT_BALINESE:

I'm not sure why you need the division into "Indic", "Tibetan" etc., but if so, shouldn't Balinese through Khmer be under "other SEAsian"?
Attachment #8481395 - Flags: review?(smontagu) → review+
Attachment #8481396 - Flags: review?(smontagu) → review+
(In reply to Simon Montagu :smontagu from comment #16)
> I'm not sure why you need the division into "Indic", "Tibetan" etc., but if
> so, shouldn't Balinese through Khmer be under "other SEAsian"?

We don't really need it; it was just a convenience when collecting the list of scripts. The division and comments there reflect the shaping engines that harfbuzz uses for the respective scripts. Balinese through Khmer still go through the Indic shaping engine, as their fonts rely on the same set of features, etc.

(See http://mxr.mozilla.org/mozilla-central/source/gfx/harfbuzz/src/hb-ot-shape-complex-private.hh#154, etc.)
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: