Falldog的程式戰場: [Lex] A Quick Syntax Tutorial

27 September 2007

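Lex is an old tool.
Old as it is, it is still quite handy!
Its main job is to let you write rules describing a kind of text file,
and then generate a scanner (lexical analyzer) that splits such files into tokens.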

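Installation:
    Lex / Flex can be installed and run on Linux.
    On Ubuntu, for example, just run
    % sudo apt-get install flex
    and it is installed for you.
    After that, the flex command is available for processing Lex files.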

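Running Lex:
    By convention, a Lex input file is named *.l (the extension is a lowercase L).
    Run
    % flex test.l
    and Lex generates an output file: lex.yy.c
    Compile lex.yy.c to get an executable scanner:
    % gcc lex.yy.c -ll
    The -ll option links the lex library, which supplies a default yywrap() and main().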
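    If linking with -ll fails (some systems ship the flex library as -lfl instead), one workaround is to avoid the library altogether. The sketch below assumes flex, since the %option noyywrap line is flex-specific, and supplies its own main(); the patterns are only placeholders for trying out the commands.

```lex
%option noyywrap
%%
[0-9]+    printf("number: %s\n", yytext);
.|\n      ;   /* ignore everything else */
%%
int main(void)
{
    yylex();      /* scan stdin until end of file */
    return 0;
}
```

    Saved as test.l, this should build with "flex test.l" followed by plain "gcc lex.yy.c", with no -ll needed.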

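Structure of a Lex input file:
    A *.l file has three parts: definitions, rules, and user code.
    The parts are separated by lines containing only "%%":

    definitions
    %%
    rules
    %%
    user code

    definitions: C declarations and named patterns defined by the user go here
    rules: the patterns the scanner matches against the input, each with an action
    user code: copied verbatim to the end of the generated lex.yy.c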
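    To make the three parts concrete, here is a small annotated sketch. It assumes flex (for %option noyywrap), and the counter name words is made up for illustration.

```lex
%{
/* definitions section: C declarations copied into lex.yy.c */
#include <stdio.h>
int words = 0;   /* hypothetical counter, for illustration only */
%}
%option noyywrap
%%
[a-zA-Z]+   { words++;  /* rules section: a pattern, then its action */ }
.|\n        { /* skip anything that is not a letter */ }
%%
/* user code section: copied verbatim to the end of lex.yy.c */
int main(void)
{
    yylex();
    printf("words = %d\n", words);
    return 0;
}
```

    Note that %{, %}, %% and each pattern start in the first column; indented lines are treated differently by Lex.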

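Definition:
    In the definitions section you can declare variables used by the code in the rules (written exactly as in C).
    This code must be enclosed between %{ and %}, each on a line of its own,
    because everything between them is copied verbatim into lex.yy.c,
    which is what lets lex.yy.c compile without errors.
    In a real *.l file, %{, %}, %% and each pattern must start in the first column.
    Ex:
%{
/* record the total number of characters and lines in the input file */
int num_char = 0;
int num_line = 0;
%}
%%
\n { num_line++; }
. { num_char++; }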

    You can also define named patterns to keep the rules concise.
    The syntax is:
    name definition

    Ex:
number [0-9]+
identifier [a-zA-Z_][a-zA-Z_0-9]*
%%
{number} printf("%s this token is a number\n", yytext);
{identifier} printf("%s this token is an identifier\n", yytext);

    The rules above are equivalent to:
([0-9]+) printf("%s this token is a number\n", yytext);
([a-zA-Z_][a-zA-Z_0-9]*) printf("%s this token is an identifier\n", yytext);
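    Named patterns can also refer to earlier named patterns, which helps when the rules grow. The following is a minimal sketch assuming flex; the names digit, letter, integer and real are chosen here for illustration and are not part of Lex.

```lex
%option noyywrap
digit       [0-9]
letter      [a-zA-Z_]
integer     {digit}+
real        {digit}+"."{digit}+
identifier  {letter}({letter}|{digit})*
%%
{real}        printf("%s is a real number\n", yytext);
{integer}     printf("%s is an integer\n", yytext);
{identifier}  printf("%s is an identifier\n", yytext);
[ \t\n]+      ;   /* skip whitespace */
.             printf("%s is unrecognized\n", yytext);
%%
int main(void)
{
    yylex();
    return 0;
}
```

    Given the input "3.14 42 foo", this should report 3.14 as a real number, 42 as an integer and foo as an identifier.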

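Rule:
    All the rules for splitting the input into tokens are written here.
    Each rule has the form:
    pattern action

    The pattern may be a regular expression or a literal word; regular expressions are not covered here.
    If you want to learn them, you will have to find your own way, the author is tired.... Or2...
    The action is the C code executed when the pattern matches; it is copied unchanged into the output file.
    If the action spans more than one line, enclose it in "{" and "}".
    Ex:
[0-9]+ ECHO;printf("this is a number!\n");
    is equivalent to...
[0-9]+ {
    ECHO;
    printf("this is a number!\n");
}

    One special word is available inside actions:
    ECHO prints the contents of yytext (the string that matched the pattern) to the output.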
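    One detail worth knowing once there are several rules: when more than one pattern matches at the same position, Lex takes the longest match, and if the lengths tie, the rule listed first wins. A small sketch of this, assuming flex; the "keyword" and "identifier" labels are just for illustration.

```lex
%option noyywrap
%%
"if"                    printf("keyword: %s\n", yytext);
[a-zA-Z_][a-zA-Z_0-9]*  printf("identifier: %s\n", yytext);
[ \t\n]+                ;   /* skip whitespace */
.                       ECHO;
%%
int main(void)
{
    yylex();
    return 0;
}
```

    For the input "if iffy", both patterns match "if" with the same length, so the first rule wins and it is reported as a keyword; "iffy" is matched as a whole by the second rule because that match is longer.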

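Global Variable:
    These are variables predefined by Lex; they can be used directly in the definitions and rules of a *.l file.
    yyin      the input source, of type FILE *; defaults to stdin
    yytext    the string that matched the current pattern; of type char * (flex's default)
    yyleng    the length of yytext
    yylineno  the current line number of the input (in flex, enable it with %option yylineno)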
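    The variables above can be seen working together in the sketch below. It assumes flex (both %option settings are flex-specific) and reads from a file named on the command line when one is given, falling back to stdin otherwise.

```lex
%option noyywrap yylineno
%{
#include <stdio.h>
%}
%%
[a-zA-Z]+   printf("line %d: %s (length %d)\n", yylineno, yytext, (int)yyleng);
.|\n        ;   /* ignore everything else */
%%
int main(int argc, char *argv[])
{
    /* point yyin at the named file; otherwise the scanner reads stdin */
    if (argc > 1) {
        yyin = fopen(argv[1], "r");
        if (yyin == NULL) {
            perror(argv[1]);
            return 1;
        }
    }
    yylex();
    if (argc > 1)
        fclose(yyin);
    return 0;
}
```

    Setting yyin before calling yylex() is what redirects the scanner; the cast on yyleng is only there because its exact integer type varies between flex versions.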
　　

Example:
    This Lex file counts the total number of characters and lines in the input file.

%{
int num_lines = 0, num_chars = 0;
%}

%%
\n   { ++num_lines; ++num_chars; }
.    { ++num_chars; }

%%
int main(void)
{
    yylex();
    printf( "# of lines = %d, # of chars = %d\n",
            num_lines, num_chars );
    return 0;
}

5 comments:

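Daybreak said...: This article is a great help for Lex beginners! Thank you for sharing so generously.; 15 March 2008, 7:08 PM
Falldog said...: :)
You're too kind; 19 March 2008, 3:07 PM
Anonymous said...: Thanks so much~~ I now have a rough idea of how to use it
May I ask how to use the final executable?
(How does the source file go through the executable to produce the parsed output?); 17 April 2008, 11:13 PM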
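Falldog said...: It's simple
Save the last example above as test.l
% flex test.l
# generates lex.yy.c
% gcc lex.yy.c -ll
# generates the executable a.out
% cat [Any File] | ./a.out
# cat any text file and pipe it to a.out to see the result

Because Lex's yyin defaults to stdin (that is, user input), you can cat a text file and pipe it so that it becomes a.out's stdin
:); 18 April 2008, 9:49 AM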
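neco said...: This post is truly a godsend!
It is clearly written and a big help for beginners
(I happened to find it while doing homework XD)

However, when I copied and pasted the example just now, I got errors, apparently because of the spaces (it worked once I removed them); 7 April 2009, 1:20 AM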
