Regex Patterns Skill: Usage Guide
2026-03-28
Source: 网淘吧
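Regex Patterns
A practical regex quick reference. It covers patterns for validation, parsing, extraction, and refactoring in JavaScript, Python, Go, and command-line tools.
When to use
- Validating user input (email, URL, IP, phone, date)
- Parsing log lines or structured text
- Extracting data from strings (IDs, numbers, tokens)
- Search and replace in code (renaming variables, updating imports)
- Filtering lines in files or command output
- Debugging a regex that does not match as expected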
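Quick reference
Metacharacters
| Pattern | Matches | Example |
|---|---|---|
| `.` | Any character except newline | `a.c` matches `abc`, `a1c` |
| `\d` | Digit `[0-9]` | `\d{3}` matches `123` |
| `\w` | Word character `[a-zA-Z0-9_]` | `\w+` matches `hello_123` |
| `\s` | Whitespace `[ \t\n\r\f]` | `\s+` matches spaces/tabs |
| `\b` | Word boundary | `\bcat\b` matches `cat` but not `scatter` |
| `^` | Start of line (start of string unless multiline mode is on) | `^Error` matches lines starting with `Error` |
| `$` | End of line (end of string unless multiline mode is on) | `\.js$` matches lines ending in `.js` |
| `\D`, `\W`, `\S` | Negations: non-digit, non-word, non-whitespace | |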
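Quantifiers
| Pattern | Meaning |
|---|---|
| `*` | 0 or more times (greedy) |
| `+` | 1 or more times (greedy) |
| `?` | 0 or 1 time (optional) |
| `{3}` | Exactly 3 times |
| `{2,5}` | Between 2 and 5 times |
| `{3,}` | 3 or more times |
| `*?`, `+?` | Lazy (match as few as possible) |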
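Grouping and alternation
| Pattern | Meaning |
|---|---|
| `(abc)` | Capturing group |
| `(?:abc)` | Non-capturing group |
| `(?P<name>abc)` | Named group (Python, Go) |
| `(?<name>abc)` | Named group (JS; Go 1.22+) |
| `a\|b` | Alternation (a or b) |
| `[abc]` | Character class (a, b, or c) |
| `[^abc]` | Negated class (anything except a, b, or c) |
| `[a-z]` | Range |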
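Lookahead and lookbehind
| Pattern | Meaning |
|---|---|
| `(?=abc)` | Positive lookahead (followed by `abc`) |
| `(?!abc)` | Negative lookahead (not followed by `abc`) |
| `(?<=abc)` | Positive lookbehind (preceded by `abc`) |
| `(?<!abc)` | Negative lookbehind (not preceded by `abc`) |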
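The four assertions in the table can be tried out with a minimal Python sketch. The sample strings are made up for illustration; the point is that a lookaround checks context without consuming it.

```python
import re

sample = "foobar foobaz"

# Positive lookahead: "foo" only when followed by "bar" ("bar" is not consumed).
followed = re.findall(r"foo(?=bar)", sample)

# Negative lookahead: start offsets of "foo" NOT followed by "bar".
not_followed = [m.start() for m in re.finditer(r"foo(?!bar)", sample)]

receipt = "cost $42, qty 7"

# Positive lookbehind: digits preceded by a dollar sign.
prices = re.findall(r"(?<=\$)\d+", receipt)

# Negative lookbehind: digits preceded by neither "$" nor another digit.
plain = re.findall(r"(?<![$\d])\d+", receipt)

print(followed, not_followed, prices, plain)
```

Python's `re` module requires lookbehind patterns to have a fixed width, which is why both lookbehinds here check exactly one character.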
Validation patterns
Email
# Basic (covers most real-world addresses)
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
# Stricter (local part must start and end with an alphanumeric; consecutive dots are still allowed)
^[a-zA-Z0-9]([a-zA-Z0-9._%+-]*[a-zA-Z0-9])?@[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z]{2,})+$
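As a rough sketch of how the basic pattern could be used in Python: the helper name `is_email` is an assumption for illustration. `fullmatch` is used instead of `^...$` because Python's `$` also matches just before a trailing newline.

```python
import re

# Basic email pattern from above, without anchors (fullmatch anchors both ends).
EMAIL_BASIC = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def is_email(candidate):
    """Return True if the whole string matches the basic email pattern."""
    return EMAIL_BASIC.fullmatch(candidate) is not None


print(is_email("user@example.com"))    # well-formed address
print(is_email("user@localhost"))      # no dot-separated TLD
print(is_email("user@example.com\n"))  # trailing newline is rejected by fullmatch
```

A pattern like this only checks the shape of the string; whether the mailbox exists still has to be verified by sending mail to it.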
URL
# HTTP/HTTPS URLs
https?://[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?)*(/[^\s]*)?
# With optional port and query
https?://[^\s/]+(/[^\s?]*)?(\?[^\s#]*)?(#[^\s]*)?
IP addresses
# IPv4
\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b
# IPv4 (simple, allows invalid like 999.999.999.999)
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
# IPv6 (simplified: full 8-group form only, no :: compression)
(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}
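The difference between the strict and the simple IPv4 patterns is easiest to see side by side. This is a small sketch; `OCTET` and `find_ipv4` are illustrative names, and the sample text is invented.

```python
import re

# One octet in the range 0-255, as in the strict pattern above.
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"
IPV4_STRICT = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")
IPV4_SIMPLE = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")


def find_ipv4(text):
    """Return all substrings that look like valid dotted-quad IPv4 addresses."""
    return IPV4_STRICT.findall(text)


text = "from 192.168.1.10 to 999.1.1.1"
print(find_ipv4(text))             # only the address with in-range octets
print(IPV4_SIMPLE.findall(text))   # the simple pattern also accepts 999.1.1.1
```

For validating a single value rather than scanning text, Python's standard `ipaddress` module is usually a simpler choice than a regex.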
Phone numbers
# US phone (various formats)
(?:\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
# Matches: +1 (555) 123-4567, 555.123.4567, 5551234567
# International (E.164)
\+[1-9]\d{6,14}
Dates and times
# ISO 8601 date
\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])
# ISO 8601 datetime
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})
# US date (MM/DD/YYYY)
(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}
# Time (HH:MM:SS, 24h)
(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d
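The date patterns check the format but not the calendar: `2026-02-30` passes the ISO 8601 regex. One possible approach, sketched here with a hypothetical helper `parse_iso_date`, is to let the regex filter the shape and let `datetime.date` reject impossible dates.

```python
import re
from datetime import date
from typing import Optional

# ISO 8601 date pattern from above, with capture groups added for the parts.
ISO_DATE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")


def parse_iso_date(value: str) -> Optional[date]:
    """Return a date for a well-formed, real calendar date; otherwise None."""
    m = ISO_DATE.fullmatch(value)
    if m is None:
        return None  # wrong shape
    try:
        return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    except ValueError:
        return None  # right shape, impossible date (e.g. February 30)


print(parse_iso_date("2026-02-03"))
print(parse_iso_date("2026-02-30"))
```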
Passwords (strength check)
# At least 8 chars, 1 upper, 1 lower, 1 digit, 1 special
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+=-]).{8,}$
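The single lookahead pattern gives a yes/no answer. When the user needs to know which rule failed, the same character classes can be checked one at a time. This is an illustrative sketch; `RULES` and `missing_rules` are assumed names, not part of any library.

```python
import re

# The same four character classes as the lookaheads in the pattern above.
RULES = {
    "lowercase letter": r"[a-z]",
    "uppercase letter": r"[A-Z]",
    "digit": r"\d",
    "special character": r"[!@#$%^&*()_+=-]",
}


def missing_rules(password):
    """Return the names of the rules the password does not satisfy."""
    missing = [name for name, pat in RULES.items() if not re.search(pat, password)]
    if len(password) < 8:
        missing.append("at least 8 characters")
    return missing


print(missing_rules("Abcdef1!"))
print(missing_rules("abcdef1!"))
```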
UUID
[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}
Semantic versioning
\bv?(\d+)\.(\d+)\.(\d+)(?:-([\w.]+))?(?:\+([\w.]+))?\b
# Captures: major, minor, patch, prerelease, build
# Matches: 1.2.3, v1.0.0-beta.1, 2.0.0+build.123
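To show what the five capture groups are useful for, here is a small Python sketch. `parse_semver` is a hypothetical helper; it converts the numeric parts to integers so that versions compare numerically rather than as strings.

```python
import re

# Semver pattern from above, without \b (fullmatch anchors both ends).
SEMVER = re.compile(r"v?(\d+)\.(\d+)\.(\d+)(?:-([\w.]+))?(?:\+([\w.]+))?")


def parse_semver(text):
    """Return (major, minor, patch, prerelease, build) or None if malformed."""
    m = SEMVER.fullmatch(text)
    if m is None:
        return None
    major, minor, patch, prerelease, build = m.groups()
    return int(major), int(minor), int(patch), prerelease, build


print(parse_semver("v1.0.0-beta.1"))
print(parse_semver("2.0.0+build.123"))
# Integer tuples order correctly where plain strings would not ("1.10.0" vs "1.9.0").
print(parse_semver("1.10.0")[:3] > parse_semver("1.9.0")[:3])
```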
Parsing patterns
Log lines
# Apache/Nginx access log
# Format: IP - - [date] "METHOD /path HTTP/x.x" status size
grep -oP '(\S+) - - \[([^\]]+)\] "(\w+) (\S+) \S+" (\d+) (\d+)' access.log
# Extract IP and status code
grep -oP '^\S+|"\s\K\d{3}' access.log
# Syslog format
# Format: Mon DD HH:MM:SS hostname process[pid]: message
grep -oP '^\w+\s+\d+\s[\d:]+\s(\S+)\s(\S+)\[(\d+)\]:\s(.*)' syslog
# JSON log — extract a field
grep -oP '"level"\s*:\s*"\K[^"]+' app.log
grep -oP '"message"\s*:\s*"\K[^"]+' app.log
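When the fields are needed individually rather than as one matched string, named groups in Python are a convenient alternative to `grep -oP`. The sketch below reuses the access-log pattern above; the sample line and the function name `parse_access_line` are made up for illustration, and it assumes the size field is always numeric (some servers log `-` instead).

```python
import re

# Same structure as the grep pattern above, with named groups.
ACCESS_LINE = re.compile(
    r'(?P<ip>\S+) - - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<size>\d+)'
)


def parse_access_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    m = ACCESS_LINE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    fields["size"] = int(fields["size"])
    return fields


# Hypothetical sample line in the format described above.
line = '203.0.113.7 - - [03/Feb/2026:12:30:00 +0000] "GET /index.html HTTP/1.1" 200 1024'
print(parse_access_line(line))
```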
Code patterns
# Find function definitions (JavaScript/TypeScript)
grep -nP '(?:function\s+\w+|(?:const|let|var)\s+\w+\s*=\s*(?:async\s*)?\([^)]*\)\s*=>|(?:async\s+)?function\s*\()' src/*.ts
# Find class definitions
grep -nP 'class\s+\w+(?:\s+extends\s+\w+)?' src/*.ts
# Find import statements
grep -nP '^import\s+.*\s+from\s+' src/*.ts
# Find TODO/FIXME/HACK comments
grep -rnP '(?:TODO|FIXME|HACK|XXX|WARN)(?:\([^)]+\))?:?\s+' src/
# Find console.log left in code
grep -rnP 'console\.(log|debug|info|warn|error)\(' src/ --include='*.ts' --include='*.js'
Data extraction
# Extract all email addresses from a file
grep -oP '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt
# Extract all URLs
grep -oP 'https?://[^\s<>"]+' file.html
# Extract all quoted strings
grep -oP '"[^"\\]*(?:\\.[^"\\]*)*"' file.json
# Extract numbers (integer and decimal)
grep -oP -- '-?\d+\.?\d*' data.txt  # "--" stops grep from reading the pattern as an option
# Extract key-value pairs (key=value)
grep -oP '\b(\w+)=([^\s&]+)' query.txt
# Extract hashtags
grep -oP '#\w+' posts.txt
# Extract hex colors
grep -oP '#[0-9a-fA-F]{3,8}\b' styles.css
Language-specific usage
JavaScript
// Test if a string matches
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
emailRegex.test('user@example.com'); // true
// Extract with capture groups
const match = '2026-02-03T12:30:00Z'.match(/(\d{4})-(\d{2})-(\d{2})/);
// match[1] = '2026', match[2] = '02', match[3] = '03'
// Named groups
const m = 'John Doe, age 30'.match(/(?<name>[A-Za-z ]+), age (?<age>\d+)/);
// m.groups.name = 'John Doe', m.groups.age = '30'
// Find all matches (matchAll returns iterator)
const text = 'Call 555-1234 or 555-5678';
const matches = [...text.matchAll(/\d{3}-\d{4}/g)];
// [{0: '555-1234', index: 5}, {0: '555-5678', index: 17}]
// Replace with callback
'hello world'.replace(/\b\w/g, c => c.toUpperCase());
// 'Hello World'
// Replace with named groups
'2026-02-03'.replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, '$<m>/$<d>/$<y>');
// '02/03/2026'
// Split with regex
'one, two; three'.split(/[,;]\s*/);
// ['one', 'two', 'three']
Python
import re
# Match (anchored to start)
m = re.match(r'^(\w+)@(\w+)\.(\w+)$', 'user@example.com')
if m:
    print(m.group(1))  # 'user'
# Search (find first match anywhere)
m = re.search(r'\d{3}-\d{4}', 'Call 555-1234 today')
print(m.group()) # '555-1234'
# Find all matches
emails = re.findall(r'[\w.+-]+@[\w.-]+\.\w{2,}', text)
# Named groups
m = re.match(r'(?P<name>\w+)\s+(?P<age>\d+)', 'Alice 30')
print(m.group('name')) # 'Alice'
# Substitution
result = re.sub(r'\bfoo\b', 'bar', 'foo foobar foo')
# 'bar foobar bar'
# Sub with callback
result = re.sub(r'\b\w', lambda m: m.group().upper(), 'hello world')
# 'Hello World'
# Compile for reuse (faster in loops)
pattern = re.compile(r'\d{4}-\d{2}-\d{2}')
dates = pattern.findall(log_text)
# Multiline and DOTALL
re.findall(r'^ERROR.*$', text, re.MULTILINE) # ^ and $ match line boundaries
re.search(r'start.*end', text, re.DOTALL) # . matches newlines
# Verbose mode (readable complex patterns)
pattern = re.compile(r'''
^ # Start of string
(?P<year>\d{4}) # Year
-(?P<month>\d{2}) # Month
-(?P<day>\d{2}) # Day
$ # End of string
''', re.VERBOSE)
Go
import (
	"fmt"
	"regexp"
	"strings"
)
// Compile pattern (panics on invalid regex)
re := regexp.MustCompile(`\d{4}-\d{2}-\d{2}`)
// Match test
re.MatchString("2026-02-03") // true
// Find first match
re.FindString("Date: 2026-02-03 and 2026-03-01") // "2026-02-03"
// Find all matches
re.FindAllString(text, -1) // []string of all matches
// Capture groups (re is reassigned with "=", since it was declared above)
re = regexp.MustCompile(`(\w+)@(\w+)\.(\w+)`)
match := re.FindStringSubmatch("user@example.com")
// match[0] = "user@example.com", match[1] = "user", match[2] = "example", match[3] = "com"
// Named groups
re = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
match = re.FindStringSubmatch("2026-02-03")
for i, name := range re.SubexpNames() {
	if name != "" {
		fmt.Printf("%s: %s\n", name, match[i])
	}
}
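// Replace (uses its own digit pattern so the example does not depend on the current value of re)
regexp.MustCompile(`\d+`).ReplaceAllString("foo123bar", "NUM") // "fooNUMbar"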
// Replace with function
re.ReplaceAllStringFunc(text, strings.ToUpper)
// Note: Go uses RE2 syntax — no lookahead/lookbehind
Command line (grep/sed)
# grep -P uses PCRE (Perl-compatible, full features; requires GNU grep built with PCRE support)
# grep -E uses Extended regex (no lookahead/lookbehind)
# Find lines matching a pattern
grep -P '\d{3}-\d{4}' file.txt
# Extract only the matching part
grep -oP '\d{3}-\d{4}' file.txt
# Invert match (lines NOT matching)
grep -vP 'DEBUG|TRACE' app.log
# sed replacement
sed 's/oldPattern/newText/g' file.txt # Basic
sed -E 's/foo_([a-z]+)/bar_\1/g' file.txt # Extended with capture group
# Perl one-liner (most powerful)
perl -pe 's/(?<=price:\s)\d+/0/g' file.txt # Lookbehind works in Perl
Search and replace patterns
Code refactoring
# Rename a variable across files
grep -rlP '\boldName\b' src/ | xargs sed -i 's/\boldName\b/newName/g'
# Convert var to const (JavaScript)
sed -i -E 's/\bvar\b/const/g' src/*.js
# Convert single quotes to double quotes
sed -i "s/'/\"/g" src/*.ts
# Add trailing commas to object properties
sed -i -E 's/^(\s+\w+:.+[^,])$/\1,/' config.js
# Update import paths
sed -i 's|from '\''../old-path/|from '\''../new-path/|g' src/*.ts
# Convert snake_case to camelCase (Python → JavaScript naming)
perl -pe 's/_([a-z])/uc($1)/ge' file.txt
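The same snake_case to camelCase conversion can be sketched in Python with a replacement callback, mirroring the Perl one-liner. The function name `snake_to_camel` is an assumption for illustration.

```python
import re


def snake_to_camel(name):
    """Replace each "_x" with "X", like the Perl one-liner above."""
    return re.sub(r"_([a-z])", lambda m: m.group(1).upper(), name)


print(snake_to_camel("user_first_name"))
print(snake_to_camel("already"))
```

Like the Perl version, this also rewrites a leading underscore (`_private` becomes `Private`), so names with a meaningful leading underscore need separate handling.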
Text cleanup
# Remove trailing whitespace
sed -i 's/[[:space:]]*$//' file.txt
# Remove blank lines
sed -i '/^$/d' file.txt
# Collapse runs of blank lines (keep at most one)
sed -i '/^$/N;/\n$/D' file.txt
# Trim leading and trailing whitespace from each line
sed -i 's/^[[:space:]]*//;s/[[:space:]]*$//' file.txt
# Remove HTML tags
sed 's/<[^>]*>//g' file.html
# Remove ANSI color codes
sed 's/\x1b\[[0-9;]*m//g' output.txt
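Several of these cleanup steps can be combined in one Python function, as a rough equivalent of chaining the sed commands. The names `clean`, `ANSI_SGR`, `TRAILING_WS`, and `EXTRA_BLANK` are illustrative.

```python
import re

ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")             # same pattern as the sed command above
TRAILING_WS = re.compile(r"[ \t]+$", re.MULTILINE)   # spaces/tabs at the end of each line
EXTRA_BLANK = re.compile(r"\n{3,}")                  # two or more consecutive blank lines


def clean(text):
    """Strip ANSI color codes and trailing whitespace, and collapse blank-line runs."""
    text = ANSI_SGR.sub("", text)
    text = TRAILING_WS.sub("", text)
    return EXTRA_BLANK.sub("\n\n", text)


raw = "\x1b[31mred\x1b[0m  \n\n\n\nnext\t"
print(repr(clean(raw)))
```

The order matters: color codes are removed first so that whitespace hidden in front of a trailing reset code is seen as trailing whitespace.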
Common pitfalls
Greedy vs. lazy matching
Input: <b>bold</b>
Pattern: <.*> (greedy) matches <b>bold</b> (everything from the first < to the last >)
Pattern: <.*?> (lazy) matches <b> (stops at the first >)
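The difference is easy to reproduce in Python. The sketch below also includes a negated character class, a common alternative to the lazy quantifier that gives the same result here.

```python
import re

html = "<b>bold</b>"

greedy = re.findall(r"<.*>", html)      # runs to the last ">"
lazy = re.findall(r"<.*?>", html)       # stops at the first ">"
negated = re.findall(r"<[^>]*>", html)  # cannot cross a ">" at all

print(greedy)
print(lazy)
print(negated)
```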
Escaping special characters
Characters that need escaping in regex: . * + ? ^ $ { } [ ] ( ) | \
In character classes []: only ] - ^ \ need escaping
# To match a literal dot: \.
# To match a literal *: \*
# To match a literal \: \\
# To match [ or ]: \[ or \]
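When the text to match comes from a variable, escaping by hand is error-prone. In Python, `re.escape` does it for you. A minimal sketch, where `count_literal` is a hypothetical helper:

```python
import re


def count_literal(needle, haystack):
    """Count occurrences of needle as literal text, not as a pattern."""
    return len(re.findall(re.escape(needle), haystack))


haystack = "a.c abc a.c"

# Unescaped, the dot matches any character, so "abc" is counted too.
print(len(re.findall("a.c", haystack)))

# Escaped, only the literal "a.c" is counted.
print(count_literal("a.c", haystack))
print(re.escape("a.c"))
```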
Newlines and multiline mode
By default . does NOT match newline.
By default ^ and $ match start/end of STRING.
# To make . match newlines:
JavaScript: /pattern/s (dotAll flag)
Python: re.DOTALL or re.S
Go: (?s) inline flag
# To make ^ $ match line boundaries:
JavaScript: /pattern/m (multiline flag)
Python: re.MULTILINE or re.M
Go: (?m) inline flag
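A short Python sketch of both defaults and both flags, using an invented three-line log string:

```python
import re

text = "INFO boot\nERROR disk full\nERROR net down"

# Default: ^ matches only at the start of the string, so nothing is found.
default_hits = re.findall(r"^ERROR.*$", text)

# MULTILINE: ^ and $ match at every line boundary.
line_hits = re.findall(r"^ERROR.*$", text, re.MULTILINE)

# Default: "." does not cross newlines, so this span is not found.
span_default = re.search(r"boot.*down", text)

# DOTALL: "." also matches "\n", so the span can cover several lines.
span_dotall = re.search(r"boot.*down", text, re.DOTALL)

print(default_hits, line_hits)
print(span_default, span_dotall.group() if span_dotall else None)
```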
Backtracking and performance
# Catastrophic backtracking (avoid these patterns on untrusted input):
(a+)+ # Nested quantifiers
(a|a)+ # Overlapping alternation
(.*a){10} # Ambiguous .* with repetition
# Safe alternatives:
a+ # Instead of (a+)+
a+ # Instead of (a|a)+
[^a]*a # Negated class instead of .*a (works without possessive/atomic syntax)
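The flat rewrite accepts exactly the same strings as the nested pattern; only the amount of backtracking on a failed match differs. The sketch below checks that equivalence on a few short inputs. The samples are deliberately small, since the nested form can take exponential time on long runs of `a` followed by a non-matching character.

```python
import re

NESTED = re.compile(r"^(a+)+$")  # risky: nested quantifiers
FLAT = re.compile(r"^a+$")       # safe: same language, linear matching

# Short samples only; keep the failing inputs small for the nested pattern.
samples = ["a", "aaaa", "", "aab", "baa", "a" * 10 + "b"]

same = all(bool(NESTED.match(s)) == bool(FLAT.match(s)) for s in samples)
print(same)
```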
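Tips
- Start simple and add complexity step by step. `\d+` is almost always enough; `[0-9]+` is rarely needed (one exception: Python 3 `str` patterns, where `\d` also matches non-ASCII digits).
- Test regexes on real data, not just the happy path. Edge cases (empty strings, special characters, Unicode) break naive patterns.
- Use non-capturing groups `(?:...)` when you don't need the captured value. They are slightly faster and cleaner.
- In JavaScript, always use the `g` flag with `matchAll` and for global `replace`. Without it, `matchAll` throws a `TypeError` and `replace` changes only the first match.
- Go's `regexp` package uses RE2 (no lookahead/lookbehind). If you need them, restructure the match or use the `regexp2` package.
- `grep -P` (PCRE) is the most powerful command-line regex option. Use it instead of `grep -E` when you need lookarounds, `\K`, or `\d`.
- For complex patterns, use verbose mode (`re.VERBOSE` in Python, `/x` in Perl) and comment each part.
- Regex is a poor fit for parsing HTML, XML, or JSON; use a dedicated parser. It is fine for pulling simple values out of these formats, but not for structural parsing.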