Grok Patterns
The grok processor includes built-in patterns for parsing common log formats and data types. These patterns make it easy to parse structured data from logs without writing complex regular expressions.
Quick Start
Parse Apache combined logs:
pipeline:
  processors:
    - grok:
        expressions:
          - '%{COMBINEDAPACHELOG}'
Parse custom formats using composable patterns:
pipeline:
  processors:
    - grok:
        expressions:
          - '%{IPORHOST:client} - %{USER:user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{URIPATH:path}" %{INT:status}'
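To make "composable" concrete, here is a minimal Python sketch of how a grok expression can be expanded into an ordinary regular expression with named groups. It is not the processor's actual implementation: the `PATTERNS` table holds simplified stand-ins for a few built-ins, the `expand` helper is hypothetical, and the sample line omits the timestamp to keep the table short.

```python
import re

# Simplified stand-ins for a few built-in patterns (assumptions for
# illustration; the real definitions are more thorough).
PATTERNS = {
    "INT": r"[+-]?\d+",
    "WORD": r"\w+",
    "IPV4": r"\d{1,3}(?:\.\d{1,3}){3}",
    "HOSTNAME": r"[A-Za-z0-9][A-Za-z0-9.-]*",
    "IPORHOST": r"(?:%{IPV4}|%{HOSTNAME})",  # built from other patterns
    "USER": r"[A-Za-z0-9._-]+",
    "URIPATH": r'/[^\s?"]*',
}

# Matches %{NAME} and %{NAME:field}
REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(expression: str) -> str:
    """Recursively replace pattern references with regex source."""
    def substitute(match: re.Match) -> str:
        name, field = match.group(1), match.group(2)
        body = expand(PATTERNS[name])  # nested references are expanded too
        # A field name becomes a named capture group; otherwise no capture.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REF.sub(substitute, expression)


expression = r'%{IPORHOST:client} - %{USER:user} "%{WORD:method} %{URIPATH:path}" %{INT:status}'
regex = re.compile(expand(expression))

fields = regex.match('192.168.1.100 - john "GET /api/users" 200').groupdict()
print(fields)
```

Each `%{NAME:field}` reference contributes one named group, which is why the output keys mirror the field names chosen in the expression.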
Common Log Format Patterns
Apache & Nginx Logs
| Pattern | Description | Example Output |
|---|---|---|
| COMMONAPACHELOG | Apache common log format | Extracts: clientip, ident, auth, timestamp, verb, request, httpversion, response, bytes |
| COMBINEDAPACHELOG | Apache combined log format | Same as common + referrer, agent |
| HTTPD20_ERRORLOG | Apache 2.0 error logs | Extracts: timestamp, loglevel, clientip, errormsg |
| HTTPD24_ERRORLOG | Apache 2.4 error logs | Extracts: timestamp, module, loglevel, pid, tid, client, message |
| HTTPD_ERRORLOG | Apache error logs (any version) | Matches both 2.0 and 2.4 formats |
Syslog
| Pattern | Description | Example Output |
|---|---|---|
| SYSLOGBASE | Basic syslog format | Extracts: timestamp, logsource, program, pid |
| SYSLOGTIMESTAMP | Syslog timestamp (e.g., "Jan 15 10:30:00") | - |
| SYSLOGPROG | Syslog program with optional PID | Extracts: program, pid |
| SYSLOGFACILITY | Syslog facility and priority | Extracts: facility, priority |
Note: For syslog, also consider the dedicated parse_log processor, which handles the RFC3164 and RFC5424 formats.
Network & Data Types
IP Addresses & Hosts
| Pattern | Description | Example |
|---|---|---|
| IP | IPv4 or IPv6 address | 192.168.1.1 or 2001:db8::1 |
| IPV4 | IPv4 address only | 192.168.1.1 |
| IPV6 | IPv6 address | 2001:0db8:85a3::8a2e:0370:7334 |
| HOSTNAME | DNS hostname | example.com |
| IPORHOST | IP address or hostname | Either format |
| HOSTPORT | Host and port | example.com:8080 |
| MAC | MAC address (any format) | 00:1B:44:11:3A:B7 |
URLs & Paths
| Pattern | Description | Example |
|---|---|---|
| URI | Complete URI | https://user:[email protected]:8080/path?query=1 |
| URIPROTO | URI protocol | http, https, ftp |
| URIHOST | URI host with optional port | example.com:8080 |
| URIPATH | URI path | /api/v1/users |
| URIPARAM | URI query parameters | ?key=value&foo=bar |
| PATH | File system path (Unix or Windows) | /var/log/app.log or C:\logs\app.log |
| UNIXPATH | Unix file path | /var/log/app.log |
| WINPATH | Windows file path | C:\Program Files\app |
Date & Time Patterns
| Pattern | Description | Example |
|---|---|---|
| TIMESTAMP_ISO8601 | ISO8601 timestamp | 2024-01-15T10:30:00Z |
| HTTPDATE | HTTP/Apache log date | 15/Jan/2024:10:30:00 +0000 |
| SYSLOGTIMESTAMP | Syslog timestamp | Jan 15 10:30:00 |
| DATESTAMP_RFC822 | RFC822 date | Mon Jan 15 2024 10:30:00 UTC |
| DATESTAMP_RFC2822 | RFC2822 date | Mon, 15 Jan 2024 10:30:00 +0000 |
| DATE_US | US date format | 01/15/2024 |
| DATE_EU | European date format | 15/01/2024 |
| YEAR | Year (2 or 4 digits) | 2024 or 24 |
| MONTH | Month name | January, Jan |
| MONTHNUM | Month number (1-12) | 1 or 01 |
| MONTHDAY | Day of month | 15 |
| DAY | Day name | Monday, Mon |
| TIME | Time (HH:MM:SS) | 10:30:00 |
Basic Data Types
| Pattern | Description | Example |
|---|---|---|
| INT | Integer (signed) | 42, -123 |
| NUMBER | Number (int or float) | 42, 3.14, -0.5 |
| POSINT | Positive integer | 123 |
| NONNEGINT | Non-negative integer | 0, 123 |
| WORD | Single word (alphanumeric + underscore) | user_name |
| NOTSPACE | Non-whitespace string | foo-bar_123 |
| DATA | Lazy match (non-greedy) | Matches minimal text |
| GREEDYDATA | Greedy match | Matches all remaining text |
| QUOTEDSTRING | Quoted string | "hello world" or 'hello world' |
| UUID | UUID | 550e8400-e29b-41d4-a716-446655440000 |
User & Email
| Pattern | Description | Example |
|---|---|---|
| USERNAME | Username | john.doe, user123 |
| USER | Alias for USERNAME | Same as USERNAME |
| EMAILADDRESS | Email address | [email protected] |
| HTTPDUSER | HTTP user (email or username) | Either format |
Log Levels
| Pattern | Description | Matches |
|---|---|---|
| LOGLEVEL | Common log levels | DEBUG, INFO, WARN, WARNING, ERROR, CRITICAL, FATAL, etc. (case insensitive) |
Examples
Parse Apache Combined Logs
Input:
192.168.1.100 - john [15/Jan/2024:10:30:00 +0000] "GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"
Pipeline:
pipeline:
  processors:
    - grok:
        expressions:
          - '%{COMBINEDAPACHELOG}'
Output:
{
  "clientip": "192.168.1.100",
  "ident": "-",
  "auth": "john",
  "timestamp": "15/Jan/2024:10:30:00 +0000",
  "verb": "GET",
  "request": "/api/users",
  "httpversion": "1.1",
  "response": "200",
  "bytes": "1234",
  "referrer": "\"https://example.com\"",
  "agent": "\"Mozilla/5.0\""
}
Parse Custom Application Logs
Input:
2024-01-15 10:30:00 [ERROR] [email protected] failed login from 192.168.1.100
Pipeline:
pipeline:
  processors:
    - grok:
        expressions:
          - '%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:level}\] %{EMAILADDRESS:user} %{GREEDYDATA:message}'
Output:
{
  "timestamp": "2024-01-15 10:30:00",
  "level": "ERROR",
  "user": "[email protected]",
  "message": "failed login from 192.168.1.100"
}
Parse Syslog
Input:
Jan 15 10:30:00 server1 sshd[12345]: Failed password for invalid user admin from 192.168.1.100
Pipeline:
pipeline:
  processors:
    - grok:
        expressions:
          - '%{SYSLOGBASE} %{GREEDYDATA:message}'
Output:
{
  "timestamp": "Jan 15 10:30:00",
  "logsource": "server1",
  "program": "sshd",
  "pid": "12345",
  "message": "Failed password for invalid user admin from 192.168.1.100"
}
Extract Multiple Fields
Input:
Request from 192.168.1.100:8080 to https://api.example.com/v1/users?id=123 took 250ms
Pipeline:
pipeline:
  processors:
    - grok:
        expressions:
          - 'Request from %{HOSTPORT:client} to %{URI:url} took %{NUMBER:duration}ms'
Output:
{
  "client": "192.168.1.100:8080",
  "url": "https://api.example.com/v1/users?id=123",
  "duration": "250"
}
Custom Patterns
Define your own patterns for reuse:
pipeline:
  processors:
    - grok:
        pattern_definitions:
          MYAPP_TIMESTAMP: '%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}'
          MYAPP_SEVERITY: '(CRITICAL|HIGH|MEDIUM|LOW)'
        expressions:
          - '%{MYAPP_TIMESTAMP:timestamp} \[%{MYAPP_SEVERITY:severity}\] %{GREEDYDATA:message}'
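To check what the custom definitions above would accept, it can help to hand-expand them into a plain regular expression. The Python sketch below does that under simplifying assumptions: the timestamp regex only approximates YEAR, MONTHNUM, MONTHDAY, and TIME (the real built-ins are more permissive), and the sample log lines are invented for illustration.

```python
import re

# Hand-written approximations of the two custom definitions (assumptions).
MYAPP_TIMESTAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
MYAPP_SEVERITY = r"(?:CRITICAL|HIGH|MEDIUM|LOW)"

# Equivalent of:
# %{MYAPP_TIMESTAMP:timestamp} \[%{MYAPP_SEVERITY:severity}\] %{GREEDYDATA:message}
LINE_RE = re.compile(
    rf"(?P<timestamp>{MYAPP_TIMESTAMP}) \[(?P<severity>{MYAPP_SEVERITY})\] (?P<message>.*)"
)


def parse(line: str):
    """Return the extracted fields, or None when the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


matched = parse("2024-01-15 10:30:00 [HIGH] Disk usage above threshold")
unmatched = parse("2024-01-15 10:30:00 [INFO] not one of the four severities")
print(matched)
print(unmatched)
```

The second line is rejected because `INFO` is not one of the alternatives in `MYAPP_SEVERITY`, which shows how a narrow custom pattern also acts as a filter.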
Type Hints
Extract values as specific types:
pipeline:
  processors:
    - grok:
        expressions:
          # :int suffix converts to integer
          - 'Status: %{INT:status:int}, Size: %{INT:bytes:int}'
Input: Status: 200, Size: 1024
Output: {"status": 200, "bytes": 1024} (numbers, not strings)
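The effect of a type hint can be pictured as a conversion step applied after matching. The Python sketch below illustrates the idea and is not the processor's code: the regex is a hand-expanded equivalent of the expression, and the `HINTS` table is a hypothetical stand-in for the `:int` suffixes.

```python
import json
import re

# Hand-expanded equivalent of the expression; [+-]?\d+ stands in for INT.
LINE_RE = re.compile(r"Status: (?P<status>[+-]?\d+), Size: (?P<bytes>[+-]?\d+)")

# Hypothetical hint table: field name -> converter.
# Fields without a hint would stay strings.
HINTS = {"status": int, "bytes": int}


def parse(line: str):
    """Match the line, then cast each captured value using its hint."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return {
        name: HINTS.get(name, str)(value)
        for name, value in match.groupdict().items()
    }


result = parse("Status: 200, Size: 1024")
print(json.dumps(result))  # numbers are serialized without quotes
```

Without the hints, both values would be serialized as the quoted strings "200" and "1024", as in the earlier examples on this page.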
Performance Tips
- Order patterns by likelihood - Put the most common patterns first
- Use specific patterns - %{IPV4} is faster than %{IP} if you know it's IPv4
- Avoid greedy matching - Use %{DATA} instead of %{GREEDYDATA} when possible
- Consider alternatives - For simple parsing, regex in mapping may be faster
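Lazy and greedy matching differ in what they capture as well as in speed. The Python sketch below uses `.*?` and `.*` as rough equivalents of DATA and GREEDYDATA; the key-value log line is made up for illustration.

```python
import re

line = 'user="alice" action="login" result="ok"'

# Greedy (like GREEDYDATA): .* runs to the end of the line, then backtracks
# to the LAST closing quote, swallowing the other fields.
greedy = re.search(r'user="(?P<user>.*)"', line).group("user")

# Lazy (like DATA): .*? stops at the FIRST closing quote.
lazy = re.search(r'user="(?P<user>.*?)"', line).group("user")

print(repr(greedy))
print(repr(lazy))
```

A greedy pattern in the middle of an expression therefore tends to do extra backtracking and can capture more than intended, so it is usually best kept for the final field of a line.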
Multiple Pattern Matching
Try multiple patterns until one matches:
pipeline:
  processors:
    - grok:
        expressions:
          - '%{COMBINEDAPACHELOG}'  # Try Apache combined first
          - '%{COMMONAPACHELOG}'    # Fall back to common
          - '%{HTTPD_ERRORLOG}'     # Fall back to error log
The first pattern that extracts at least one value will be used.
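The first-match rule can be sketched as a simple loop, which also shows why ordering matters. This Python example is an illustration under assumptions, not the processor's implementation: the two regexes are rough approximations of the common and combined formats, and `first_match` is a hypothetical helper.

```python
import re

# Rough approximations of the two access-log formats (assumptions).
COMMON_SRC = (
    r'(?P<clientip>\S+) \S+ (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<response>\d{3}) (?P<bytes>\d+|-)'
)
COMBINED_SRC = COMMON_SRC + r' "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'

COMMON = re.compile(COMMON_SRC)
COMBINED = re.compile(COMBINED_SRC)


def first_match(line, expressions):
    """Return fields from the first expression that extracts at least one value."""
    for regex in expressions:
        match = regex.match(line)
        if match:
            values = {k: v for k, v in match.groupdict().items() if v is not None}
            if values:
                return values
    return None


line = (
    '192.168.1.100 - john [15/Jan/2024:10:30:00 +0000] '
    '"GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"'
)

specific_first = first_match(line, [COMBINED, COMMON])
general_first = first_match(line, [COMMON, COMBINED])
print(sorted(specific_first))
print(sorted(general_first))
```

With the more general pattern listed first, it matches the prefix of the combined line and the referrer and agent are never extracted. Listing the most specific expression first avoids that.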
Next Steps
- Grok Processor: Full configuration options
- Parse Log Processor: Alternative for syslog
- Log Processing Example: Complete pipeline examples
- Bloblang Guide: Alternative parsing with functions