Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[css-selectors-4] Include whitespace in non-optional <combinator> #7027

Closed
cdoublev opened this issue Feb 9, 2022 · 9 comments
Closed

[css-selectors-4] Include whitespace in non-optional <combinator> #7027

cdoublev opened this issue Feb 9, 2022 · 9 comments
Labels
selectors-4 Current Work

Comments

@cdoublev
Copy link
Collaborator

cdoublev commented Feb 9, 2022

Selectors 4 does not include the whitespace in the production rule for <combinator>:

<complex-selector> = <compound-selector> [ <combinator>? <compound-selector> ]*
[...]
<combinator> = '>' | '+' | '~' | [ '|' '|' ]

[...]
Note: The grammar above states that a combinator is optional between two <compound-selector>s in a <complex-selector>. This is only for grammatical purposes, as the CSS Value Definition Syntax’s lax treatment of whitespace makes it difficult to indicate that a grammar term can be whitespace. "Omitting" a combinator is actually just specifying the descendant combinator.

What is wrong with <combinator> = ' ' | '>' | '+' | '~' | [ '|' '|' ]? (whitespace between quotes)

With svg|* as input, my current implementation for parsing this grammar result to two <compound-selector>. I would like to know if whitespaces are actually required between tokens for applying the correct fix.

<whitespace-token> is never indicated in the grammar; <whitespace-token>s are allowed before, after, and between any two tokens, unless explicitly specified otherwise in prose definitions. (For example, if the prelude of a rule is a selector, whitespace is significant.)

https://drafts.csswg.org/css-syntax-3/#rule-defs

@Loirooriol
Copy link
Contributor

I don't understand the problem, what makes you think that whitespaces are required? svg|* is a type selector, so it's a single compound selector, not 2. https://drafts.csswg.org/selectors/#typedef-type-selector

<type-selector> = <wq-name> | <ns-prefix>? '*'
<ns-prefix> = [ <ident-token> | '*' ]? '|'

Also note:

White space is forbidden:

  • Between any of the components of a <type-selector> or a <class-selector>.
@tabatkins
Copy link
Member

Yeah, I'm not certain what issue you're running into exactly either. Whitespace is implicitly allowed between any two grammar productions, so it's never indicated explicitly. The prose indicates where whitespace is not allowed.

Can you clarify what grammar productions, precisely, you're decomposing into when parsing svg|*? What are the two compound selectors?

@cdoublev
Copy link
Collaborator Author

cdoublev commented Feb 9, 2022

svg matches <wq-name>, then the parser goes upwards in <type-selector>, <compound-selector>, and continues parsing <complex-selector>. It does not find a match for <combinator> (optional) but |* matches <ns-prefix>? '*' as the second <coumpound-selector>.

@tabatkins
Copy link
Member

...huh, I was gonna say that CSS grammars obey longest-match semantics, not ordered-match semantics, but that's not stated anywhere.

I still don't think it's quite possible to indicate whitespace in the grammar (the quoted-character notation indicates delim tokens, which whitespace is not), but indicating in the prose that an "omitted" combinator must be whitespace should be fine.

@tabatkins tabatkins added the selectors-4 Current Work label Feb 9, 2022
@cdoublev
Copy link
Collaborator Author

cdoublev commented Feb 10, 2022

CSS grammars obey longest-match semantics, not ordered-match semantics

Is there a difference between obey longest-match semantics and greedy? Is it possible to obey longest-match semantics without being greedy?

I still don't think it's quite possible to indicate whitespace in the grammar (the quoted-character notation indicates delim tokens, which whitespace is not), but indicating in the prose that an "omitted" combinator must be whitespace should be fine.

I perfectly understand this and it answers my questions about the fix I need to apply on my implementation. Thanks.

@tabatkins
Copy link
Member

Is there a difference between obey longest-match semantics and greedy?

Yes, greedy means the first thing to match a segment claims that segment forever, even if it would cause the overall match to fail. Longest-match is somewhat greedy - the parse that consumes the most tokens for a production is the one that wins, even if it would cause the overall match to fail, but that isn't necessarily the "first" (it depends on how you order the productions).

CSS tokenization is explicitly designed to be longest-match; it will always parse foo( as a function-token, not as an ident-token followed by a parentheses delim-token, for example. CSS grammar parsing is not longest-match, at least in general.


Anyway, fixed the issue here by adding an explicit prose requirement that whitespace is required if you omit the combinator.

@cdoublev
Copy link
Collaborator Author

cdoublev commented Feb 18, 2022

Thanks, I'm closing this issue then.

Just so I'm sure I understood correctly, and to explain a bit better the issue than in my initial comment (9 days later I had a hard time figuring it out myself, sorry):

Input: `svg|*`

 1. Match `svg|*` vs. `<complex-selector> = <compound-selector> [<combinator>? <compound-selector>]*`
 2. Match `svg|*` vs. ...sub-productions that finally includes `<type-selector>`
 3. Match `svg|*` vs. `<type-selector> = <wq-name> | <ns-prefix>? '*'`
 4. Match `svg|*` vs. `<wq-name> = <ns-prefix>? <ident-token>`
 5. Match `svg|*` vs. `<ns-prefix> = [ <ident-token> | '*' ]? '|'`
    - result: `svg|`
    - resume 4
 6. Match `*` vs. `<ident-token>`
    - result: fails
    - backtrack to 4 and discard the match for `<ns-prefix>?` (omitted)
 7. Match `svg|*` vs. `<ident-token>`
    - result: `svg`
    - resume 4 (end) then 3, then 2, then 1
 8. Match `|*` vs. `<combinator>?`
    - result: fails but optional, resume 1
 9. Match `|*` vs. ...sub-productions that finally includes `<type-selector>`
10. Match `|*` vs. `<type-selector> = <wq-name> | <ns-prefix>? '*'`
11. Match `|*` vs. `<wq-name> = <ns-prefix>? <ident-token>`
12. Match `|*` vs. `<ns-prefix> = [ <ident-token> | '*' ]? '|'`
    - result: `|` (omitted namespace prefix)
    - resume 11
13. Match `*` vs. `<ident-token>`
    - result: fails
    - backtrack to 10 and discard the match for `<ns-prefix>?`
14. Match `|*` vs. `<ns-prefix> = [ <ident-token> | '*' ]? '|'`
    - result: `|` (omitted namespace prefix)
    - resume 10
14. Match `*` vs. `'*'`
    -> result: `*`

Results: 
  - `<compound-selector>` matches `svg`
  - `<combinator>?` is omitted
  - `<compound-selector>` matches `|*`

But because CSS matching against a grammar is implicitly defined to obey longest-match, instead of the above step 7, the parser should try with the second alternative for <type-selector> in <wq-name> | <ns-prefix>? '*', even if a match for <wq-name> were found. If it would fail with <ns-prefix>? '*', then it moves back and returns its initial match for <wq-name>, right?

I feel like this is fundamental and probably missing from the spec, as well as these comments:

parsing is non-greedy; if the first branch that starts to match eventually fails, you just move on to the second branch and try again

Right, that's a backtracking parser vs a greedy/first-match parser. CSS grammars are intended for use with a backtracking parser.

@cdoublev
Copy link
Collaborator Author

That is, to match foo bar baz vs. foo | foo bar | foo bar baz, the parser should try every alternative before returning a result for the grammar, instead of trying every alternative as a result of backtracking (until the whole input is consumed).

@tabatkins
Copy link
Member

  1. Match |* vs. <combinator>?
    result: fails but optional, resume 1

No, my edit fixed this case. Combinator is optional, but if omitted the two <compound-selector> productions must have whitespace separating them. So the next step after 8 here would be to fail to find whitespace, thus failing the attempt to start a new <compound-selector> production, so you have to go back and keep trying to build the first one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
selectors-4 Current Work
3 participants