本文介绍了LR、SLR 和 LALR 解析器有什么区别?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

LR、SLR 和 LALR 解析器之间的实际区别是什么?我知道 SLR 和 LALR 是 LR 解析器的类型,但就它们的解析表而言,它们的实际区别是什么?

What is the actual difference between LR, SLR, and LALR parsers? I know that SLR and LALR are types of LR parsers, but what is the actual difference as far as their parsing tables are concerned?

以及如何显示一个语法是 LR、SLR 还是 LALR?对于 LL 语法,我们只需要证明解析表的任何单元格都不应包含多个产生式规则.LALR、SLR 和 LR 是否有类似的规则?

And how to show whether a grammar is LR, SLR, or LALR? For an LL grammar we just have to show that any cell of the parsing table should not contain multiple production rules. Any similar rules for LALR, SLR, and LR?

例如,我们如何证明语法

For example, how can we show that the grammar

S --> Aa | bAc | dc | bda
A --> d

是 LALR(1) 但不是 SLR(1)?

is LALR(1) but not SLR(1)?

编辑(ybungalobill):对于 LALR 和 LR 之间的区别,我没有得到满意的答案.所以 LALR 的表尺寸更小,但它只能识别 LR 语法的一个子集.有人可以详细说明 LALR 和 LR 之间的区别吗?LALR(1) 和 LR(1) 足以回答.他们都使用 1 个令牌前瞻,并且 两者 都是表驱动的!它们有何不同?

EDIT (ybungalobill): I didn't get a satisfactory answer for what's the difference between LALR and LR. So LALR's tables are smaller in size but it can recognize only a subset of LR grammars. Can someone elaborate more on the difference between LALR and LR please? LALR(1) and LR(1) will be sufficient for an answer. Both of them use 1 token look-ahead and both are table driven! How they are different?

推荐答案

SLR、LALR 和 LR 解析器都可以使用完全相同的表驱动机制来实现.

SLR, LALR and LR parsers can all be implemented using exactly the same table-driven machinery.

从根本上说,解析算法收集下一个输入标记 T,并参考当前状态 S(以及相关的前瞻、GOTO 和归约表)来决定要做什么:

Fundamentally, the parsing algorithm collects the next input token T, and consults the current state S (and associated lookahead, GOTO, and reduction tables) to decide what to do:

  • SHIFT:如果当前表对标记 T 说 SHIFT,则将 (S,T) 对推入解析堆栈,根据 GOTO 表对当前标记(例如,GOTO(T)),获取另一个输入令牌 T',并重复该过程
  • REDUCE:每个状态都有 0、1 或许多可能发生的减少.如果解析器是 LR 或 LALR,则针对该状态的所有有效缩减的前瞻集检查令牌.如果标记与语法规则 G = R1 R2 .. Rn 的归约的前瞻集匹配,则会发生堆栈归约和移位:调用 G 的语义操作,堆栈被弹出 n(从 Rn)次,对 (S,G) 被压入堆栈,新状态 S' 设置为 GOTO(G),循环以相同的标记 T 重复.如果解析器是 SLR 解析器,则最多有一个归约规则状态等归约动作可以盲目地完成,而无需搜索以查看适用的归约.SLR 解析器知道是否存在 减少是很有用的;这很容易判断每个状态是否明确记录了与其关联的减少次数,并且无论如何在实践中 L(AL)R 版本都需要该计数.
  • 错误:如果 SHIFT 和 REDUCE 都不可能,则声明语法错误.
  • SHIFT: If the current table says to SHIFT on the token T, the pair (S,T) is pushed onto the parse stack, the state is changed according to what the GOTO table says for the current token (e.g, GOTO(T)), another input token T' is fetched, and the process repeats
  • REDUCE: Every state has 0, 1, or many possible reductions that might occur in the state. If the parser is LR or LALR, the token is checked against lookahead sets for all valid reductions for the state. If the token matches a lookahead set for a reduction for grammar rule G = R1 R2 .. Rn, a stack reduction and shift occurs: the semantic action for G is called, the stack is popped n (from Rn) times, the pair (S,G) is pushed onto the stack, the new state S' is set to GOTO(G), and the cycle repeats with the same token T. If the parser is an SLR parser, there is at most one reduction rule for the state and so the reduction action can be done blindly without searching to see which reduction applies. It is useful for an SLR parser to know if there is a reduction or not; this is easy to tell if each state explicitly records the number of reductions associated with it, and that count is needed for the L(AL)R versions in practice anyway.
  • ERROR: If neither SHIFT nor REDUCE is possible, a syntax error is declared.

那么,如果他们都使用相同的机器,那有什么意义呢?

So, if they all the use the same machinery, what's the point?

SLR 的价值在于其实现的简单性;您不必扫描检查前瞻集的可能减少,因为最多有一个,如果状态没有 SHIFT 退出,这是唯一可行的操作.哪个减少适用可以专门附加到状态,因此 SLR 解析机器不必寻找它.在实践中,L(AL)R 解析器可以处理大量有用的语言,而且实现起来的额外工作非常少,以至于除了作为学术练习之外,没有人实现 SLR.

The purported value in SLR is its simplicity in implementation; you don't have to scan through the possible reductions checking lookahead sets because there is at most one, and this is the only viable action if there are no SHIFT exits from the state. Which reduction applies can be attached specifically to the state, so the SLR parsing machinery doesn't have to hunt for it. In practice L(AL)R parsers handle a usefully larger set of langauges, and is so little extra work to implement that nobody implements SLR except as an academic exercise.

LALR 和 LR 的区别在于表 generator.LR 解析器生成器跟踪特定状态的所有可能减少及其精确的前瞻集;您最终会得到这样的状态,在这种状态下,每个归约都与其左侧上下文中的确切前瞻集相关联.这往往会建立相当大的状态集.如果 GOTO 表和减少的查找头集兼容且不冲突,则 LALR 解析器生成器愿意组合状态;这会产生相当少的状态,代价是无法区分 LR 可以区分的某些符号序列.因此,LR 解析器可以解析比 LALR 解析器更大的语言集,但解析器表要大得多.在实践中,可以找到与目标语言足够接近的 LALR 语法,以使状态机的大小值得优化;LR 解析器更好的地方由解析器外部的临时检查来处理.

The difference between LALR and LR has to do with the table generator. LR parser generators keep track of all possible reductions from specific states and their precise lookahead set; you end up with states in which every reduction is associated with its exact lookahead set from its left context. This tends to build rather large sets of states. LALR parser generators are willing to combine states if the GOTO tables and lookhead sets for reductions are compatible and don't conflict; this produces considerably smaller numbers of states, at the price of not be able to distinguish certain symbol sequences that LR can distinguish. So, LR parsers can parse a larger set of languages than LALR parsers, but have very much bigger parser tables. In practice, one can find LALR grammars which are close enough to the target langauges that the size of the state machine is worth optimizing; the places where the LR parser would be better is handled by ad hoc checking outside the parser.

所以:这三个都使用相同的机器.SLR 是简单"的,因为您可以忽略一点点机器,但不值得这么麻烦.LR 解析更广泛的语言集,但状态表往往很大.这使得 LALR 成为实际的选择.

So: All three use the same machinery. SLR is "easy" in the sense that you can ignore a tiny bit of the machinery but it is just not worth the trouble. LR parses a broader set of langauges but the state tables tend to be pretty big. That leaves LALR as the practical choice.

说了这么多,值得知道 GLR 解析器 可以解析任何上下文无关语言,使用更复杂的机器但完全相同的表(包括 LALR 使用的较小版本).这意味着 GLR 严格来说比 LR、LALR 和 SLR 更强大;如果你能写出标准的 BNF 语法,GLR 就会根据它进行解析.机制的不同之处在于,当 GOTO 表和/或前瞻集之间存在冲突时,GLR 愿意尝试多次解析.(GLR 如何有效地做到这一点纯粹是天才 [不是我的],但不适合这篇 SO 帖子).

Having said all this, it is worth knowing that GLR parsers can parse any context free language, using more complicated machinery but exactly the same tables (including the smaller version used by LALR). This means that GLR is strictly more powerful than LR, LALR and SLR; pretty much if you can write a standard BNF grammar, GLR will parse according to it. The difference in the machinery is that GLR is willing to try multiple parses when there are conflicts between the GOTO table and or lookahead sets. (How GLR does this efficiently is sheer genius [not mine] but won't fit in this SO post).

这对我来说是一个非常有用的事实.我构建程序分析器和代码转换器和解析器是必要的,但无趣";有趣的工作是你对解析结果所做的事情,所以重点是做解析后的工作.与破解语法以进入 LALR 可用形式相比,使用 GLR 意味着我可以相对轻松地构建工作语法.这在尝试处理 C++ 或 Fortran 等非学术语言时非常重要,在这些语言中,您实际上需要数千条规则才能很好地处理整个语言,而且您不想花费一生来尝试破解语法规则以满足LALR(甚至LR)的限制.

That for me is an enormously useful fact. I build program analyzers and code transformers and parsers are necessary but "uninteresting"; the interesting work is what you do with the parsed result and so the focus is on doing the post-parsing work. Using GLR means I can relatively easily build working grammars, compared to hacking a grammar to get into LALR usable form. This matters a lot when trying to deal to non-academic langauges such as C++ or Fortran, where you literally needs thousands of rules to handle the entire language well, and you don't want to spend your life trying to hack the grammar rules to meet the limitations of LALR (or even LR).

作为一个著名的例子,C++ 被认为是极难解析的......被进行 LALR 解析的人认为.使用 GLR 机器解析 C++ 非常简单,几乎使用 C++ 参考手册背面提供的规则.(我正好有这样一个解析器,它不仅可以处理普通 C++,还可以处理各种供应商方言.这只有在实践中才有可能,因为我们使用的是 GLR 解析器,恕我直言).

As a sort of famous example, C++ is considered to be extremely hard to parse... by guys doing LALR parsing. C++ is straightforward to parse using GLR machinery using pretty much the rules provided in the back of the C++ reference manual. (I have precisely such a parser, and it handles not only vanilla C++, but also a variety of vendor dialects as well. This is only possible in practice because we are using a GLR parser, IMHO).

[2011 年 11 月我们扩展了解析器以处理所有 C++11.GLR 让这件事变得容易多了.编辑 2014 年 8 月:现在处理所有 C++17.没有任何问题或变得更糟,GLR 仍然是猫的叫声.]

这篇关于LR、SLR 和 LALR 解析器有什么区别?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

05-20 07:13