1. Lex and Yacc Introduction
1. Background
Lex and Yac Mechanics
1.1 Lexers and parsers
- Lexers (lex) and parsers (yacc) are used together because they divide the task of interpreting and understanding code. The lexer tokenizes the input (splitting it into meaningful units or tokens), while the parser uses those tokens to form a syntax tree based on grammatical rules.
- Communication: The lexer provides tokens to the parser one by one. The parser requests tokens from the lexer as needed to analyze and build its parse tree.
1.2 Parser require knowledge of all tokens?
- No, the parser doesn’t need to know all tokens upfront. It works on the tokens supplied by the lexer and uses the grammar rules to determine if the sequence of tokens forms a valid structure. The parser needs to understand the types of tokens (e.g., identifiers, operators) and the structure that they can form, but it doesn’t need to know every possible token at once.
1.3 Reasons behind creating a language?
- To simplify solving a specific class of problems (domain-specific languages).
- To improve productivity or expressiveness for developers.
- To introduce new features or paradigms not well-supported by existing languages.
- For educational or research purposes, to explore language design concepts.
1.4 Yacc with shift and reduce
- Shift: Moving a token from the input to the parser’s stack (i.e., reading the next token without any further actions).
- Reduce: Replacing a series of tokens on the stack that matches a rule’s right-hand side with the rule’s left-hand side, effectively collapsing them into a higher-level construct.
1.5 reduce/reduce error in Yacc
- A reduce/reduce error occurs when the parser cannot decide which rule to reduce, meaning two or more rules can apply to the same input at the same point. This ambiguity makes it unclear how to proceed with parsing.
1.6 Limitations of single-symbol lookahead?
- Single-symbol lookahead means the parser can only peek one token ahead to decide what rule to apply. This limitation can make it harder to parse more complex or ambiguous languages, as it may not have enough information to decide on the correct parsing rule with just one token in advance.
1.7 Yacc & Lex for a English-language?
- Not really. Yacc and Lex are designed for deterministic parsing of structured, formal languages like programming languages. Natural languages like English are highly ambiguous and require more complex parsing strategies (e.g., probabilistic or context-sensitive grammars) than Yacc/Lex can handle effectively.
2. Getting Started
To create a simple Lex and Yacc program on Ubuntu, follow these steps:
2.1 Install Lex (Flex) and Yacc (Bison)
sudo apt-get install flex bison
2.2 Simple Drone Language Specification
Lets design a simple language for controlling a remote control drone. Define basic commands for movement, turning, and altitude control.
- Commands:
- MOVE direction [distance]: Moves the drone forward or backward by a certain distance.
MOVE FORWARD [distance]
MOVE BACKWARD [distance]
- TURN direction [degrees]: Rotates the drone left or right by a specific number of degrees.
TURN LEFT [degrees]
TURN RIGHT [degrees]
- ALTITUDE [up/down] [distance]: Changes the altitude of the drone by a specified amount.
ALTITUDE UP [distance]
ALTITUDE DOWN [distance]
- END: Ends the program and lands the drone.
- MOVE direction [distance]: Moves the drone forward or backward by a certain distance.
2.2 Example Drone Program
MOVE FORWARD 100TURN LEFT 90ALTITUDE UP 20MOVE FORWARD 50TURN RIGHT 45ALTITUDE DOWN 10MOVE BACKWARD 30END
2.3 Lex and Yacc Implementation
2.3.1 Lex File (drone.l
)
The lexer will recognize the commands and pass tokens to the parser:
%{#include "y.tab.h"%}
%%MOVE { return MOVE; }FORWARD { return FORWARD; }BACKWARD { return BACKWARD; }TURN { return TURN; }LEFT { return LEFT; }RIGHT { return RIGHT; }ALTITUDE { return ALTITUDE; }UP { return UP; }DOWN { return DOWN; }END { return END; }[0-9]+ { yylval.num = atoi(yytext); return NUMBER; }[ \t\n]+ { /* Ignore whitespace */ }. { return yytext[0]; }%%
2.3.2 Yacc File (drone.y
)
The parser will define the structure of valid programs:
%{#include <stdio.h>#include <stdlib.h>
int yyerror(const char *s);void move(char *direction, int distance);void turn(char *direction, int degrees);void change_altitude(char *direction, int distance);%}
%union { int num;}
%token MOVE FORWARD BACKWARD TURN LEFT RIGHT ALTITUDE UP DOWN END%token <num> NUMBER
%%program: commands END { printf("Drone landed.\n"); } ;
commands: /* empty */ | commands command ;
command: MOVE FORWARD NUMBER { move("forward", $3); } | MOVE BACKWARD NUMBER { move("backward", $3); } | TURN LEFT NUMBER { turn("left", $3); } | TURN RIGHT NUMBER { turn("right", $3); } | ALTITUDE UP NUMBER { change_altitude("up", $3); } | ALTITUDE DOWN NUMBER { change_altitude("down", $3); } ;%%
void move(char *direction, int distance) { printf("Moving %s %d units.\n", direction, distance);}
void turn(char *direction, int degrees) { printf("Turning %s %d degrees.\n", direction, degrees);}
void change_altitude(char *direction, int distance) { printf("Changing altitude %s by %d units.\n", direction, distance);}
int main() { yyparse(); return 0;}
int yyerror(const char *s) { fprintf(stderr, "Error: %s\n", s); return 0;}
2.4 Compiling and Running
-
Generate and Compile:
Terminal window yacc -d drone.ylex drone.lgcc y.tab.c lex.yy.c -o drone_control -lfl -
Run:
Terminal window ./drone_control -
Example Input:
MOVE FORWARD 100TURN LEFT 90ALTITUDE UP 20END
2.5 Example Output:
Moving forward 100 units.Turning left 90 degrees.Changing altitude up by 20 units.Drone landed.