The Racket Reference Reading Notes [8]: Regexp

June 12, 2021

Table of Contents

4.8 Regular Expressions

Reading notes on The Racket Reference.

4.8 Regular Expressions

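Regular expressions are specified as strings or byte strings. The pattern language follows either Unix egrep or Perl. A string pattern produces a character regexp matcher, and a byte-string pattern produces a byte regexp matcher. When a character regexp is matched against a byte string or an input port, it matches the UTF-8 encoding of the characters. When a byte regexp is matched against a string, it matches the bytes of the string's UTF-8 encoding.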
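A minimal sketch of how the pattern kind and the input kind interact. The inputs "caat" and "λx" are made up for illustration. The last two definitions assume that λ (U+03BB) takes two bytes in UTF-8, so a byte regexp sees it as two separate bytes.

```racket
#lang racket

;; Character regexp against a string: results are strings.
(define str-result (regexp-match #rx"a+" "caat"))       ; '("aa")

;; Byte regexp against a byte string: results are byte strings.
(define bytes-result (regexp-match #rx#"a+" #"caat"))   ; '(#"aa")

;; Mixed cases go through UTF-8 and produce byte strings.
(define mixed-1 (regexp-match #rx"a+" #"caat"))         ; '(#"aa")
(define mixed-2 (regexp-match #rx#"a+" "caat"))         ; '(#"aa")

;; `.` means one character for a character regexp,
;; but one byte for a byte regexp.
(define char-dot (regexp-match #rx"." "λx"))            ; '("λ")
(define byte-dot (regexp-match #rx#"." "λx"))           ; one byte of λ's encoding
(define byte-dot-length (bytes-length (car byte-dot)))  ; 1
```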

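A regular expression given as a string or byte string can be compiled (with regexp or byte-regexp) into a regexp value. Functions such as regexp-match use a regexp value more efficiently than the string form. regexp and byte-regexp use the pattern language closest to egrep, while pregexp and byte-pregexp use a slightly different pattern language that is closer to Perl.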
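A short sketch of the two constructor families. The patterns "ap*le" and "\\d+" are examples chosen here and do not come from the notes. \d is a Perl-style class, which is why it is given to pregexp.

```racket
#lang racket

;; egrep-style pattern, compiled once and reused.
(define rx (regexp "ap*le"))
;; Perl-style pattern; \d is a px-mode character class.
(define px (pregexp "\\d+"))

(define rx-match (regexp-match rx "apple"))    ; '("apple")
(define px-match (regexp-match px "abc123"))   ; '("123")

;; regexp? accepts values made by regexp or pregexp;
;; pregexp? accepts only values made by pregexp.
(define kinds
  (list (regexp? rx) (pregexp? rx) (regexp? px) (pregexp? px)))  ; '(#t #f #t #t)

;; A plain string is also accepted as a pattern; it is converted
;; to a regexp as part of the call.
(define from-string (regexp-match "ap*le" "apple"))  ; '("apple")
```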

Two regexp values are equal? when they have the same source, use the same pattern language, and are both character regexps or both byte regexps.

Regexp literals are written with the #rx and #px prefixes. The default reader interns regexp literals (in read-syntax mode).
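The equality rule and the literal syntax can be checked with a few expressions. This is only a sketch, and the pattern "a+b" is arbitrary.

```racket
#lang racket

;; Same source, same pattern language, both character regexps.
(define same (equal? (regexp "a+b") #rx"a+b"))            ; #t

;; Same source, but different pattern languages (rx vs px).
(define different-language (equal? #rx"a+b" #px"a+b"))    ; #f

;; Same source, but character regexp vs byte regexp.
(define different-kind (equal? #rx"a+b" #rx#"a+b"))       ; #f

;; The four literal forms and their predicates.
(define literal-kinds
  (list (regexp? #rx"a")
        (pregexp? #px"a")
        (byte-regexp? #rx#"a")
        (byte-pregexp? #px#"a")))                         ; '(#t #t #t #t)

;; object-name recovers the source of a regexp value.
(define source (object-name #rx"a+b"))                    ; "a+b"
```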

4.8.1 Regexp Syntax

This section gives the grammar for regexp and pregexp patterns.

4.8.2 Additional Syntactic Constraints

Some additional syntactic constraints.

4.8.3 Regexp Constructors

(regexp? v)
(pregexp? v)
(byte-regexp? v)
(byte-pregexp? v)
(regexp str)
(regexp str handler): if str is not a valid regexp, handler is called with a string describing the problem
(pregexp str)
(pregexp str handler)
(byte-regexp bstr)
(byte-regexp bstr handler)
(byte-pregexp bstr)
(byte-pregexp bstr handler)
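(regexp-quote str [case-sensitive?]): escapes str so that it matches literally when used as a regexp
(regexp-quote bstr [case-sensitive?])
(regexp-max-lookbehind pattern): interesting. Returns the maximum number of bytes that pattern may consult before the starting position of a match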
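A few of the constructors above in use, as a sketch. It assumes a Racket version that accepts the optional handler argument. The pattern "+" is deliberately invalid, and the handler shown is a hypothetical one that just tags the error message.

```racket
#lang racket

;; "+" is not a valid pattern, so the handler is called with a
;; description of the problem instead of an exception being raised.
(define bad (regexp "+" (lambda (msg) (list 'invalid msg))))

;; Without quoting, "." matches any character.
(define dot-any (regexp-match "." "apple.scm"))                      ; '("a")
;; With regexp-quote, "." matches only a literal dot.
(define dot-literal (regexp-match (regexp-quote ".") "apple.scm"))   ; '(".")

;; How far back the matcher may need to look before a match start;
;; here the pattern contains a lookbehind for "ab".
(define lookbehind (regexp-max-lookbehind #rx"(?<=ab)c"))
```

Only the type of lookbehind is relied on here (an exact nonnegative integer); the concrete number depends on the pattern.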

4.8.4 Regexp Matching

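regexp-match: matches a regexp against a string (or byte string, or input port)
regexp-match*: same as above, but the result is a list of all matches
regexp-try-match: when the match fails, nothing is consumed from the input port
regexp-match-positions: returns the positions of the match
regexp-match-positions*
regexp-match?: only reports whether the match succeeded or failed
regexp-match-exact?: the match must cover the entire input
regexp-match-peek: matches against an input port by peeking, without consuming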
regexp-match-peek-positions
regexp-match-peek-immediate
regexp-match-peek-positions-immediate
regexp-match-peek-positions*
regexp-match/end: returns an additional result
regexp-match-positions/end
regexp-match-peek-positions/end
regexp-match-peek-positions-immediate/end
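The differences between the main matching functions are easier to see side by side. A sketch with made-up inputs: the first half uses strings, the second half uses input ports to show which functions consume input. The names in and in2 are local to this example.

```racket
#lang racket

;; --- Matching against strings ---
(define first-match    (regexp-match #rx"x." "12x4x6"))             ; '("x4")
(define all-matches    (regexp-match* #rx"x." "12x4x6"))            ; '("x4" "x6")
(define first-pos      (regexp-match-positions #rx"x." "12x4x6"))   ; '((2 . 4))
(define all-pos        (regexp-match-positions* #rx"x." "12x4x6"))  ; '((2 . 4) (4 . 6))
(define any-match?     (regexp-match? #rx"x." "12x4x6"))            ; #t
(define partial-exact? (regexp-match-exact? #rx"x." "12x4x6"))      ; #f
(define whole-exact?   (regexp-match-exact? #rx"1.*x." "12x4x6"))   ; #t

;; --- Matching against input ports (results are byte strings) ---
(define in (open-input-string "hello world"))

;; Peeking finds the match but leaves the port untouched.
(define peeked (regexp-match-peek #rx"world" in))    ; '(#"world")

;; A failed try-match also leaves the port untouched.
(define tried (regexp-try-match #rx"xyz" in))        ; #f

;; A successful regexp-match consumes input through the end of the match.
(define matched (regexp-match #rx"hello" in))        ; '(#"hello")
(define remaining (read-line in))                    ; " world"

;; A failed regexp-match on a port consumes everything up to eof.
(define in2 (open-input-string "hello world"))
(define failed (regexp-match #rx"xyz" in2))          ; #f
(define after-failure (read-char in2))               ; eof
```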

4.8.5 Regexp Splitting

regexp-split: the complement of regexp-match*. It uses a regexp as the separator to split the target string
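A small sketch of the "complement" relationship: regexp-match* returns the parts that match, and regexp-split returns the parts in between. The input "a,b,,c" is made up, and the doubled comma shows that adjacent separators yield an empty string.

```racket
#lang racket

;; The pieces that match the pattern.
(define separators (regexp-match* #rx"," "a,b,,c"))   ; '("," "," ",")

;; The pieces between the matches.
(define fields (regexp-split #rx"," "a,b,,c"))        ; '("a" "b" "" "c")

;; A separator of one or more spaces.
(define words (regexp-split #rx" +" "12  34"))        ; '("12" "34")
```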

4.8.6 Regexp Substitution

regexp-replace
regexp-replace*
regexp-replaces
regexp-replace-quote
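The notes list the four substitution functions without comments, so here is a rough sketch of how they differ. The inputs are illustrative only. In an insert string, & stands for the matched text and \\1 for the first group, which is why regexp-replace-quote is needed for a literal &.

```racket
#lang racket

;; regexp-replace substitutes only the first match.
(define first-only (regexp-replace #rx"a" "banana" "o"))     ; "bonana"

;; regexp-replace* substitutes every match.
(define every (regexp-replace* #rx"a" "banana" "o"))         ; "bonono"

;; The replacement can also be a procedure applied to the matched text.
(define upcased
  (regexp-replace* #rx"[a-z]+" "hello world" string-upcase)) ; "HELLO WORLD"

;; regexp-replaces applies a list of (pattern insert) pairs in order.
(define renamed
  (regexp-replaces "zero-or-more?"
                   '([#rx"-" "_"]
                     [#rx"(.*)\\?$" "is_\\1"])))             ; "is_zero_or_more"

;; "&" in an insert string means "the matched text" ...
(define unquoted (regexp-replace #rx"UT" "Go UT!" "A&M"))    ; "Go AUTM!"
;; ... unless the insert string is quoted first.
(define quoted
  (regexp-replace #rx"UT" "Go UT!" (regexp-replace-quote "A&M")))  ; "Go A&M!"
```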

(To be continued)