Thread-safe / re-entrant bison + flex

Question

I would really prefer a working example to any explanation. Whatever I read so far on Bison's documentation site contradicts whatever Flex says. One says to declare yylex as

int yylex (yyscan_t yyscanner);

The other wants it to be:

int yylex(YYSTYPE *lvalp, YYLTYPE *llocp);

What I really need is the location information. I'm not sure as of yet if I need YYSTYPE (I don't have a use for this information right now, but maybe in the future I will).

Unrelated to the above, and as a bonus, I'd be interested to know why this infrastructure is so bad. It seems like such a straightforward thing to do, and yet it's otherworldly bad. It never works with defaults. Even writing the simplest textbook calculator example requires many days of fixing configuration errors... why?

Recommended answer

1. Sample code

A kind of explanation of how reentrancy is configured into bison and flex is provided in section 2 of this answer. Other annotations of the sample code are in section 3.

1.1 eval.l

%option noinput nounput noyywrap 8bit nodefault
%option yylineno
%option reentrant bison-bridge bison-locations

%{
  #include <stdlib.h>
  #include <string.h>
  #include "eval.tab.h"

  #define YY_USER_ACTION                                             \
    yylloc->first_line = yylloc->last_line;                          \
    yylloc->first_column = yylloc->last_column;                      \
    if (yylloc->last_line == yylineno)                               \
      yylloc->last_column += yyleng;                                 \
    else {                                                           \
      yylloc->last_line = yylineno;                                  \
      yylloc->last_column = yytext + yyleng - strrchr(yytext, '\n'); \
    }
%}
%%
[ \t]+            ;
#.*               ;

[[:digit:]]+      *yylval = strtol(yytext, NULL, 0); return NUMBER;

.|\n              return *yytext;

1.2 eval.y

%define api.pure full
%locations
%param { yyscan_t scanner }

%code top {
  #include <stdio.h>
}
%code requires {
  typedef void* yyscan_t;
}
%code {
  int yylex(YYSTYPE* yylvalp, YYLTYPE* yyllocp, yyscan_t scanner);
  void yyerror(YYLTYPE* yyllocp, yyscan_t unused, const char* msg);
}

%token NUMBER UNOP
%left '+' '-'
%left '*' '/' '%'
%precedence UNOP
%%
input: %empty
     | input expr '\n'      { printf("[%d]: %d\n", @2.first_line, $2); }
     | input '\n'
     | input error '\n'     { yyerrok; }
expr : NUMBER
     | '(' expr ')'         { $$ = $2; }
     | '-' expr %prec UNOP  { $$ = -$2; }
     | expr '+' expr        { $$ = $1 + $3; }
     | expr '-' expr        { $$ = $1 - $3; }
     | expr '*' expr        { $$ = $1 * $3; }
     | expr '/' expr        { $$ = $1 / $3; }
     | expr '%' expr        { $$ = $1 % $3; }

%%

void yyerror(YYLTYPE* yyllocp, yyscan_t unused, const char* msg) {
  fprintf(stderr, "[%d:%d]: %s\n",
                  yyllocp->first_line, yyllocp->first_column, msg);
}

1.3 eval.h

See 3.1 for an explanation of the need for this file.

#include "eval.tab.h"
#include "eval.lex.h"

1.4 main.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "eval.h"
#if !YYDEBUG
  static int yydebug;
#endif

int main(int argc, char* argv[]) {
  yyscan_t scanner;
  yylex_init(&scanner);

  do {
    switch (getopt(argc, argv, "sp")) {
      case -1: break;
      case 's': yyset_debug(1, scanner); continue;
      case 'p': yydebug = 1; continue;
      default: exit(1);
    }
    break;
  } while(1);

  yyparse(scanner);
  yylex_destroy(scanner);
  return 0;
}

1.5 Makefile

all: eval

eval.lex.c: eval.l
        flex -o $@ --header-file=$(patsubst %.c,%.h,$@) --debug $<

eval.tab.c: eval.y
        bison -o $@ --defines=$(patsubst %.c,%.h,$@) --debug $<

eval: main.c eval.tab.c eval.lex.c eval.h
        $(CC) -o $@ -Wall --std=c11 -ggdb -D_XOPEN_SOURCE=700 $(filter %.c,$^)

clean:
        rm -f eval.tab.c eval.lex.c eval.tab.h eval.lex.h eval

2.重新进入问题

2. Re-entrancy issues

The most important thing to remember is that Bison/Yacc and Flex/Lex are two independent code generators. While they are frequently used together, this is not necessary; either one can be used by itself or with other tools.

Note: The following discussion only applies to normal "pull" parsers. Bison can generate push parsers (similar to Lemon) and that allows a useful control flow inversion, which actually simplifies several of the issues mentioned below. In particular, it completely avoids the circular dependency analysed in 3.1. I usually prefer push parsers, but they seemed out of scope for this particular question.

2.1 Parser re-entrancy

A Bison/Yacc generated parser is called once to parse an entire body of text, so it has no need to maintain mutable persistent data objects between calls. It does rely on a number of tables which guide the progress of the parser, but the fact that these immutable tables have static lifetime does not affect re-entrancy. (With Bison, at least, these tables do not have external linkage but of course they are still visible by user-written code inserted into the parser.)

The main issue, then, is the pair of externally-visible mutable globals yylval and yylloc, used to augment the parser-lexer interface. These globals are definitely part of Bison/Yacc; Flex-generated code does not even mention them, and all use of them is explicitly performed in user actions in the Flex definition files. To make a bison parser re-entrant, it is necessary to modify the API which the parser uses to collect information from the lexer about each token, and the solution adopted by Bison is the classic one of providing additional parameters which are pointers to the data structures being "returned" to the parser. So this re-entrancy requirement changes the way the Bison-generated parser calls yylex; instead of invoking

int yylex(void);

the prototype becomes either

int yylex(YYSTYPE* yylvalp);

or

int yylex(YYSTYPE* yylvalp, YYLTYPE* yyllocp);

depending on whether or not the parser requires the location information stored in yylloc. (Bison will automatically detect use of location information in actions, but you can also insist that a location object be provided to yylex.)

That means that the scanner must be modified in order to correctly communicate with a re-entrant bison parser, even if the lexer itself is not re-entrant. (See below.)
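
To make the changed calling convention concrete, here is a minimal hand-written sketch of a lexer that reports everything through its return value and out-parameters. It is not Flex or Bison output: toy_lex, toy_value, toy_loc and toy_sum are hypothetical stand-ins for yylex, YYSTYPE, YYLTYPE and the parser loop, and the cursor argument is my own simplification of the scanner state.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-ins for Bison's YYSTYPE and YYLTYPE. */
typedef int toy_value;
typedef struct { int first_column, last_column; } toy_loc;

enum { TOY_EOF = 0, TOY_NUMBER = 258 };

/* Pure-style lexer: results come back through the return value and the two
   out-parameters. The caller owns the cursor, so no globals are involved. */
static int toy_lex(toy_value *lvalp, toy_loc *llocp, const char **cursor) {
  const char *p = *cursor;
  assert(lvalp != NULL && llocp != NULL && p != NULL);

  /* Skip blanks, keeping the column count in step. */
  while (*p == ' ' || *p == '\t') { ++p; ++llocp->last_column; }
  llocp->first_column = llocp->last_column;

  if (*p == '\0') { *cursor = p; return TOY_EOF; }

  if (isdigit((unsigned char)*p)) {
    int v = 0;
    while (isdigit((unsigned char)*p)) {
      v = v * 10 + (*p - '0');
      ++p;
      ++llocp->last_column;
    }
    *lvalp = v;                 /* the semantic value goes to the caller */
    *cursor = p;
    return TOY_NUMBER;
  }

  /* Any other character is its own token type. */
  ++llocp->last_column;
  *cursor = p + 1;
  return (unsigned char)*p;
}

/* Pulls tokens the way a pure parser would: by passing pointers to locals. */
static int toy_sum(const char *text) {
  toy_value val = 0;
  toy_loc loc = {1, 1};
  int total = 0, tok;
  while ((tok = toy_lex(&val, &loc, &text)) != TOY_EOF)
    if (tok == TOY_NUMBER) total += val;
  return total;
}
```

Because val and loc live on the stack of toy_sum, two calls running at the same time cannot interfere with each other, which is the property the extra parameters are meant to provide.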

There are a small number of additional Bison/Yacc variables which are intended for use by user code, which might force source code changes if used:

  • yynerrs counts the number of syntax errors which have been encountered; with a re-entrant parser, yynerrs is local to yyparse and therefore can only be used in actions. (In legacy applications, it is sometimes referenced by yyparse's caller; such uses need to be modified for re-entrant parsers.)

  • yychar is the token type of the lookahead symbol, and is sometimes used in error reporting. In a re-entrant parser, it is local to yyparse, so if it is needed by an error reporting function, it will have to be passed explicitly.

  • yydebug controls whether a parse trace is produced, if debugging code has been enabled. yydebug is still global in a re-entrant parser, so it is not possible to enable debugging traces only for a single parser instance. (I regard this as a bug, but it could be considered a feature request.)

    Debugging code is enabled by defining the preprocessor macro YYDEBUG or by using the -t command-line flag. These are defined by Posix; Bison also provides the --debug command-line flag, the %debug directive and the parse.trace configuration directive (which can be set with -Dparse.trace on the bison command line).

2.2 Lexer re-entrancy

yylex is called repeatedly over the course of the parse; each time it is called, it returns a single token. It needs to maintain a large amount of persistent state between calls, including its current buffer and various pointers tracking lexical progress.

In a default lexer, this information is kept in a global struct which is not intended to be referenced by user code, except for specific global variables (which are mostly macros in modern Flex templates).

In a re-entrant lexer, all of Flex's persistent information is collected into an opaque data structure pointed to by a variable of type yyscan_t. This variable must be passed to every call to Flex functions, not just yylex. (The list includes, for example, the various buffer management functions.) The Flex convention is that the persistent state object is always the last argument to a function. Some globals which have been relocated into this data structure have associated macros, so that it is possible to refer to them by their traditional names in Flex actions. Outside of yylex, all accesses (and modifications, in the case of mutable variables) must be done with getter and setter functions documented in the Flex manual. Obviously, the list of getter/setter functions does not include accessors for Bison variables, such as yylval.

So yylex in a re-entrant scanner has the prototype

int yylex(yyscan_t state);
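
The opaque-handle convention described above can be sketched in plain C. This is a toy illustration of the pattern, not Flex's implementation: every toy_* name is hypothetical, and only the shape (a void* handle, state passed as the last argument, getters and setters instead of globals) is meant to mirror the re-entrant scanner API.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *toy_scan_t;            /* opaque handle, in the spirit of yyscan_t */

/* Private layout: callers only ever see the void* handle. */
struct toy_state {
  const char *pos;                   /* current read position */
  int lineno;                        /* per-instance line counter */
  int debug;                         /* per-instance trace flag, default 0 */
};

/* Returns 0 on success, non-zero on allocation failure. */
int toy_init(toy_scan_t *out) {
  struct toy_state *st = calloc(1, sizeof *st);
  if (st == NULL) return 1;
  st->lineno = 1;
  *out = st;
  return 0;
}

void toy_destroy(toy_scan_t s) { free(s); }

/* The state object is always the last argument. */
void toy_set_input(const char *text, toy_scan_t s) {
  ((struct toy_state *)s)->pos = text;
}
void toy_set_debug(int flag, toy_scan_t s) {
  ((struct toy_state *)s)->debug = flag;
}
int toy_get_debug(toy_scan_t s)  { return ((struct toy_state *)s)->debug; }
int toy_get_lineno(toy_scan_t s) { return ((struct toy_state *)s)->lineno; }

/* Returns the next character as a "token", or 0 at end of input. */
int toy_next(toy_scan_t s) {
  struct toy_state *st = s;
  assert(st != NULL && st->pos != NULL);
  if (*st->pos == '\0') return 0;
  if (*st->pos == '\n') ++st->lineno;
  return (unsigned char)*st->pos++;
}
```

Two handles created with toy_init can be advanced in any interleaving without affecting each other, since all mutable state hangs off the handle rather than off file-scope variables.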

2.3 Communication between parser and scanner

Flex/lex itself only recognizes tokens; it is up to the user action associated with each pattern to communicate the result of the match. Conventionally, parsers expect that yylex will return a small integer representing the token's syntactic type or 0 to indicate that the end of input has been reached. The token's text is stored in the variable (or yyscan_t member) yytext (and its length in yyleng) but since yytext is a pointer to an internal buffer in the generated scanner, the string value can only be used before the next call to yylex. Since LR parsers do not generally process semantic information until several tokens have been read, yytext is not an appropriate mechanism for passing semantic information.
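
The lifetime problem with yytext can be reproduced with a toy scanner that, like a generated one, hands out a pointer into a buffer it overwrites on the next call. The names toy_scanner, toy_next_word and toy_copy_token are hypothetical; the sketch only shows why a token's text has to be copied if it must survive until a later parser action. Words longer than the fixed buffer are simply split, which is a simplification.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy scanner whose token text lives in a buffer reused by every call. */
struct toy_scanner {
  const char *input;   /* remaining input */
  char text[32];       /* plays the role of yytext */
  size_t leng;         /* plays the role of yyleng */
};

/* Reads the next space-delimited word into sc->text; returns 0 at the end. */
static int toy_next_word(struct toy_scanner *sc) {
  assert(sc != NULL && sc->input != NULL);
  while (*sc->input == ' ') ++sc->input;
  sc->leng = 0;
  while (*sc->input != '\0' && *sc->input != ' ' &&
         sc->leng + 1 < sizeof sc->text)
    sc->text[sc->leng++] = *sc->input++;
  sc->text[sc->leng] = '\0';
  return sc->leng != 0;
}

/* Copies the current token so that it outlives the next scanner call.
   The caller owns the result and must free() it. */
static char *toy_copy_token(const struct toy_scanner *sc) {
  char *copy = malloc(sc->leng + 1);
  if (copy != NULL) memcpy(copy, sc->text, sc->leng + 1);
  return copy;
}
```

A pointer saved straight from sc->text shows the second word after the second call, while the copy made with toy_copy_token still holds the first one; a scanner action that needs the text later would store such a copy in the semantic value.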

As mentioned above, non-reentrant Bison/Yacc generated parsers assume the use of the global yylval to communicate semantic information, as well as the yylloc global to communicate source location information, if that is desired (Bison only).

But, as noted above, in a re-entrant parser these variables are local to yyparse and the parser passes pointers to the variables on each call to the lexer. This requires changes to the prototype of yylex, as well as to any scanner actions which use yylval and/or yylloc.

The prototype expected by a reentrant bison-generated parser is:

int yylex(YYSTYPE* yylvalp, YYLTYPE* yyllocp, yyscan_t state);

(If locations are not used, the yyllocp argument is eliminated.)

Flex's %option bison-bridge (or the combination of bison-bridge and bison-locations, if location tracking is being used) will ensure that the yylex prototype is correct.

All references to yylval in scanner actions also need to be modified, since bison's reentrant API passes pointers to the semantic value and location objects. If the semantic type is a union (normally produced by placing a %union declaration in the bison source), then you'll need to change scanner actions which use yylval.tag to yylval->tag. Similarly, if you use a single semantic type, either the default type or one declared (in the bison source) with %define api.value.type, then you'll need to replace yylval = ... with *yylval = ..., as in the sample code above.

3. Notes on the sample code

3.1 Circular dependency in header files

Given the above, it is impossible to declare yylex() until YYSTYPE has been declared. Also it is impossible to declare yyparse() until yyscan_t has been declared. Since yylex and yyscan_t are in the flex-generated header and yyparse and YYSTYPE are in the bison-generated header, neither inclusion order for the two headers can work. Or, to put it another way, there is a circular dependency.

Since yyscan_t is just a type alias for void* (rather than being a pointer to an incomplete type, which is arguably a cleaner way of passing pointers to opaque data structures), the cycle can be broken by inserting a redundant typedef:

typedef void* yyscan_t;
#include "eval.tab.h"
#include "eval.lex.h"

That works fine. The next step would appear to be to put both the typedef and the second #include inside the bison-generated header eval.tab.h, using a %code requires block to put the typedef near the beginning and a %code provides block to put the #include near the end (or at least after the YYSTYPE declaration). Unfortunately, that does not work, because eval.tab.h is included in the flex-generated scanner code. That would have the result of including the flex-generated header into the flex-generated source code, and that is not supported. (Although the flex-generated header does have a header guard, the generated source file does not require the header file to exist, so it contains a copy of the contents rather than an #include statement, and the copy does not include the header guard.)

In the sample code, I did the next best thing: I used a %code requires block to insert the typedef into the bison-generated header, and created an additional header file, eval.h, which includes the bison- and flex-generated headers in the correct order and can be used by other translation units.

That's ugly. Other solutions have been proposed, but they are all, IMHO, equally ugly. This just happens to be the one which I use.
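
The redundant typedef relies on a language rule worth spelling out: since C11, a typedef name may be declared again at the same scope as long as it names the same type, which fits the --std=c11 flag in the Makefile. The sketch below simulates the two headers inside one file; all toy_* names are hypothetical and the function bodies exist only so that the example links.

```c
#include <assert.h>
#include <stddef.h>

/* --- roughly what a %code requires block adds to the parser header --- */
typedef void *toy_scan_t;
typedef int toy_value;
int toy_parse(toy_scan_t scanner);

/* --- roughly what the scanner header declares afterwards --- */
typedef void *toy_scan_t;            /* same type again: accepted in C11 and later */
int toy_lex(toy_value *lvalp, toy_scan_t scanner);

/* Dummy "scanner": the state is just a countdown; yields n, n-1, ..., 1. */
int toy_lex(toy_value *lvalp, toy_scan_t scanner) {
  int *remaining = scanner;
  assert(lvalp != NULL && remaining != NULL);
  if (*remaining == 0) return 0;
  *lvalp = (*remaining)--;
  return 1;
}

/* Dummy "parser": adds up every value the scanner produces. */
int toy_parse(toy_scan_t scanner) {
  toy_value v;
  int sum = 0;
  while (toy_lex(&v, scanner)) sum += v;
  return sum;
}
```

With a C99 compiler in strict mode the second typedef would be diagnosed, so this approach assumes the project is built as C11 or newer.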

3.2 Source locations

Both the yylex and yyerror prototypes vary depending on whether or not source locations are required by the parser. Since these changes will reverberate through the various project files, I think that the most advisable course is to force the usage of location information, even if it is not (yet) being used by the parser. Someday you might want to use it, and the runtime overhead of maintaining it is not enormous (although it is measurable, so you might want to ignore this advice in resource-constrained environments).

To simplify the load, I include a simple general implementation in lines 10-18 of eval.l, which uses the YY_USER_ACTION macro to insert code at the beginning of all flex rule actions. This YY_USER_ACTION macro should work for any scanner which does not use yyless(), yymore(), input() or REJECT. Correctly coping with these features is not too difficult but it seemed out of scope here.
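
The bookkeeping done by that macro can be checked outside of flex with a small emulation. In this sketch toy_loc stands in for YYLTYPE and toy_track repeats the macro's logic for one matched token; the lineno argument plays the role of yylineno and is advanced here by counting newlines, on the assumption that flex has already updated yylineno when the action runs. The location is also assumed to start at line 1, column 1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for YYLTYPE. */
typedef struct {
  int first_line, first_column, last_line, last_column;
} toy_loc;

/* Applies the YY_USER_ACTION logic from eval.l to one matched token. */
static void toy_track(toy_loc *loc, int *lineno, const char *text) {
  size_t leng = strlen(text);
  assert(leng > 0);

  /* Emulate %option yylineno: count the newlines in the matched text. */
  for (const char *p = text; *p != '\0'; ++p)
    if (*p == '\n') ++*lineno;

  /* The new token starts where the previous one ended. */
  loc->first_line = loc->last_line;
  loc->first_column = loc->last_column;

  if (loc->last_line == *lineno) {
    loc->last_column += (int)leng;           /* still on the same line */
  } else {
    loc->last_line = *lineno;
    /* Column just past the token, measured from its last newline. */
    loc->last_column = (int)(text + leng - strrchr(text, '\n'));
  }
}
```

Feeding it the tokens "12", "+", a newline and "7" in turn gives columns 1-3, 3-4, a jump to line 2 column 1, and then 1-2 on the second line, so last_column always points one past the token.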

3.3 Error recovery

The sample code implements a simple line-oriented calculator, which can be used for interactive evaluation. (Some other features useful for interactive evaluation were not included. An interactive calculator could benefit greatly from readline() integration and access to previously calculated values; variables and named constants would also be handy.) To make interactive use reasonable, I inserted a very minimal error recovery strategy: the error production at line 24 of eval.y discards tokens until a newline is encountered and then uses yyerrok to avoid discarding error messages.

3.4 Debugging traces

Bison and Yacc generated parsers follow the Posix requirement that debugging code in the generated source is not compiled unless the preprocessor macro YYDEBUG is defined and has a non-zero value. If debugging code is compiled into the binary, then debugging traces are controlled by the global variable yydebug. If YYDEBUG is non-zero, yydebug is given a default value of 0, which disables traces. If YYDEBUG is 0, yydebug is not defined by the bison/yacc-generated code. If YYDEBUG is not defined, then it will be defined by the generated code, with value 0 unless the -t command-line option is used, in which case it will have default value 1.

Bison inserts the YYDEBUG macro definition into the generated header file (although it is not obliged by Posix to do so), so I test for it in main.c and provide an alternative definition of the yydebug variable if it has not been defined. This allows the code which enables debugging traces to compile even if it is not going to be able to turn on tracing.
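
The interplay between the compile-time macro and the runtime flag, including the fallback definition used in main.c, can be sketched with a self-contained analogue. TOY_DEBUG, toy_debug, TOY_TRACE and toy_run are hypothetical names standing in for YYDEBUG, yydebug and the generated trace code; none of this is Bison output.

```c
#include <assert.h>
#include <stdio.h>

static int toy_trace_count;          /* counts trace lines actually emitted */

/* "Generated code" side: tracing exists only if TOY_DEBUG is non-zero. */
#ifndef TOY_DEBUG
# define TOY_DEBUG 0
#endif

#if TOY_DEBUG
int toy_debug = 0;                   /* runtime switch, off by default */
# define TOY_TRACE(msg)                                              \
    do {                                                             \
      if (toy_debug) { fprintf(stderr, "%s\n", msg); ++toy_trace_count; } \
    } while (0)
#else
# define TOY_TRACE(msg) ((void)0)    /* trace code compiled out entirely */
#endif

/* "Client" side, as in main.c: supply a dummy flag when none was defined,
   so that code which sets it still compiles. */
#if !TOY_DEBUG
static int toy_debug;
#endif

/* Sets the flag, runs two traced steps, and reports how many were printed. */
static int toy_run(int want_trace) {
  assert(want_trace == 0 || want_trace == 1);
  toy_debug = want_trace;
  toy_trace_count = 0;
  TOY_TRACE("shift");
  TOY_TRACE("reduce");
  return toy_trace_count;
}
```

Built without -DTOY_DEBUG=1, toy_run(1) compiles and runs but prints nothing, which is the same trade-off main.c makes: the option is accepted even when tracing was not compiled in.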

Flex-generated code normally uses the global variable yy_flex_debug to turn traces on and off; unlike yacc/bison, the default value of yy_flex_debug is 1 if debugging code is compiled into the executable. Since a reentrant scanner cannot use global variables, the reentrant scanner puts the debug enabler into the yyscan_t object, where it can be accessed with the yyset_debug and yyget_debug access functions, which are defined whether or not debugging code has been compiled. However, the default value of the re-entrant debugging flag is 0, so if you create a reentrant scanner, you need to explicitly enable tracing even if tracing has been compiled into the executable. (This makes a reentrant scanner more like a parser.)

The sample main program turns on scanner tracing if run with the -s command-line option, and parser tracing with the -p option.
