
Grok Patterns

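The grok processor includes built-in patterns for parsing common log formats and data types. Reference a pattern as %{PATTERN} to match it, or as %{PATTERN:field} to store the matched text in a field named field. Composing these patterns lets you parse structured data from logs without writing complex regular expressions.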

Quick Start

Parse Apache combined logs:

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          - '%{COMBINEDAPACHELOG}'
```

Parse custom formats using composable patterns:

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          - '%{IPORHOST:client} - %{USER:user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{URIPATH:path}" %{INT:status}'
```
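Conceptually, grok expands each %{NAME:field} reference into a named regular-expression group and each bare %{NAME} into a non-capturing group. The Python sketch below mimics that expansion for the custom expression above. The pattern bodies in PATTERNS are simplified, hypothetical stand-ins (not the processor's built-in definitions), compile_grok is an illustrative helper rather than part of any API, and the sample log line is made up.

```python
import re

# Simplified, hypothetical stand-ins for a few built-in patterns.
# The processor's real definitions are more thorough than these.
PATTERNS = {
    "IPORHOST": r"[\w.:-]+",
    "USER": r"[a-zA-Z0-9._-]+",
    "HTTPDATE": r"\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}",
    "WORD": r"\b\w+\b",
    "URIPATH": r"/[^\s?#]*",
    "INT": r"[+-]?\d+",
}

# Matches %{NAME} or %{NAME:field}
REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def compile_grok(expression: str) -> re.Pattern:
    """Expand %{NAME:field} into (?P<field>...) and %{NAME} into (?:...)."""
    def expand(m: re.Match) -> str:
        name, field = m.group(1), m.group(2)
        body = PATTERNS[name]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    # A function replacement is inserted verbatim, so backslashes survive.
    return re.compile(REF.sub(expand, expression))


expr = r'%{IPORHOST:client} - %{USER:user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{URIPATH:path}" %{INT:status}'
line = '10.0.0.5 - alice [15/Jan/2024:10:30:00 +0000] "GET /api/v1/users" 200'
match = compile_grok(expr).search(line)
print(match.groupdict())
```

Every named capture becomes a string field; literal text between references (spaces, brackets, quotes) must match the log line exactly.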

Common Log Format Patterns

Apache & Nginx Logs

| Pattern | Description | Extracted fields |
|---|---|---|
| COMMONAPACHELOG | Apache common log format | clientip, ident, auth, timestamp, verb, request, httpversion, response, bytes |
| COMBINEDAPACHELOG | Apache combined log format | Same fields as COMMONAPACHELOG, plus referrer and agent |
| HTTPD20_ERRORLOG | Apache 2.0 error logs | timestamp, loglevel, clientip, errormsg |
| HTTPD24_ERRORLOG | Apache 2.4 error logs | timestamp, module, loglevel, pid, tid, client, message |
| HTTPD_ERRORLOG | Apache error logs (any version) | Matches either the 2.0 or the 2.4 format |

Syslog

| Pattern | Description | Extracted fields |
|---|---|---|
| SYSLOGBASE | Syslog header (everything before the message) | timestamp, logsource, program, pid |
| SYSLOGTIMESTAMP | Syslog timestamp (e.g., "Jan 15 10:30:00") | - |
| SYSLOGPROG | Syslog program name with optional PID | program, pid |
| SYSLOGFACILITY | Syslog facility and priority | facility, priority |

Note: For syslog, also consider the dedicated parse_log processor, which handles the RFC3164 and RFC5424 formats.
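The optional PID in SYSLOGPROG is what lets SYSLOGBASE match lines with or without a bracketed process ID. Below is a rough Python approximation; both regexes are simplified, hypothetical stand-ins rather than the processor's definitions, and the two sample lines are made up.

```python
import re

# Hypothetical approximation of SYSLOGPROG: a program name optionally
# followed by a bracketed PID.
SYSLOGPROG = r"(?P<program>[\w._/%-]+)(?:\[(?P<pid>\d+)\])?"

# Hypothetical approximation of SYSLOGBASE: timestamp, host, program, then ":".
SYSLOGBASE = re.compile(
    r"(?P<timestamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<logsource>\S+) " + SYSLOGPROG + r":"
)

with_pid = SYSLOGBASE.match("Jan 15 10:30:00 server1 sshd[12345]: Failed password")
without_pid = SYSLOGBASE.match("Jan 15 10:30:00 server1 cron: job started")

print(with_pid.groupdict())
print(without_pid.groupdict())  # the unmatched pid group is None here
```

In this sketch the missing PID comes back as None; whether the grok processor omits the field or emits an empty value is not covered here.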

Network & Data Types

IP Addresses & Hosts

| Pattern | Description | Example |
|---|---|---|
| IP | IPv4 or IPv6 address | 192.168.1.1 or 2001:db8::1 |
| IPV4 | IPv4 address only | 192.168.1.1 |
| IPV6 | IPv6 address | 2001:0db8:85a3::8a2e:0370:7334 |
| HOSTNAME | DNS hostname | example.com |
| IPORHOST | IP address or hostname | 192.168.1.1 or example.com |
| HOSTPORT | Host and port | example.com:8080 |
| MAC | MAC address (any format) | 00:1B:44:11:3A:B7 |

URLs & Paths

| Pattern | Description | Example |
|---|---|---|
| URI | Complete URI | https://example.com:8080/path?query=1 |
| URIPROTO | URI protocol | http, https, ftp |
| URIHOST | URI host with optional port | example.com:8080 |
| URIPATH | URI path | /api/v1/users |
| URIPARAM | URI query parameters | ?key=value&foo=bar |
| PATH | File system path (Unix or Windows) | /var/log/app.log or C:\logs\app.log |
| UNIXPATH | Unix file path | /var/log/app.log |
| WINPATH | Windows file path | C:\Program Files\app |
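To see which slice of a URI each URI* pattern targets, the sketch below splits a sample URI with Python's standard urllib.parse.urlsplit. This is a swapped-in technique used purely for illustration; it is not how the grok processor parses URIs, and the sample URI is made up.

```python
from urllib.parse import urlsplit

# urlsplit is only used to show which part of a URI each grok pattern covers.
parts = urlsplit("https://example.com:8080/api/v1/users?key=value&foo=bar")

print(parts.scheme)       # https            (the part URIPROTO matches)
print(parts.netloc)       # example.com:8080 (the part URIHOST matches)
print(parts.path)         # /api/v1/users    (the part URIPATH matches)
print("?" + parts.query)  # ?key=value&foo=bar (URIPARAM includes the "?")
```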

Date & Time Patterns

| Pattern | Description | Example |
|---|---|---|
| TIMESTAMP_ISO8601 | ISO8601 timestamp | 2024-01-15T10:30:00Z |
| HTTPDATE | HTTP/Apache log date | 15/Jan/2024:10:30:00 +0000 |
| SYSLOGTIMESTAMP | Syslog timestamp | Jan 15 10:30:00 |
| DATESTAMP_RFC822 | RFC822 date | Mon Jan 15 2024 10:30:00 UTC |
| DATESTAMP_RFC2822 | RFC2822 date | Mon, 15 Jan 2024 10:30:00 +0000 |
| DATE_US | US date format | 01/15/2024 |
| DATE_EU | European date format | 15/01/2024 |
| YEAR | Year (2 or 4 digits) | 2024 or 24 |
| MONTH | Month name | January, Jan |
| MONTHNUM | Month number (1-12) | 1 or 01 |
| MONTHDAY | Day of month | 15 |
| DAY | Day name | Monday, Mon |
| TIME | Time (HH:MM:SS) | 10:30:00 |
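Date patterns only locate the timestamp text; the captured value is still a string. If you need a real datetime downstream, conversion is a separate step. A minimal Python sketch, assuming the HTTPDATE layout shown in the table above:

```python
from datetime import datetime

# A grok capture for %{HTTPDATE:timestamp} is plain text like this.
captured = "15/Jan/2024:10:30:00 +0000"

# Parse it into a timezone-aware datetime (day/month-abbrev/year:time offset).
parsed = datetime.strptime(captured, "%d/%b/%Y:%H:%M:%S %z")

print(parsed.isoformat())  # 2024-01-15T10:30:00+00:00
```

The %b directive expects English month abbreviations, which holds under the default C locale.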

Basic Data Types

| Pattern | Description | Example |
|---|---|---|
| INT | Integer (signed) | 42, -123 |
| NUMBER | Number (int or float) | 42, 3.14, -0.5 |
| POSINT | Positive integer | 123 |
| NONNEGINT | Non-negative integer | 0, 123 |
| WORD | Single word (alphanumeric + underscore) | user_name |
| NOTSPACE | Non-whitespace string | foo-bar_123 |
| DATA | Lazy match (non-greedy) | Matches minimal text |
| GREEDYDATA | Greedy match | Matches all remaining text |
| QUOTEDSTRING | Quoted string | "hello world" or 'hello world' |
| UUID | UUID | 550e8400-e29b-41d4-a716-446655440000 |
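The difference between DATA and GREEDYDATA is the difference between a lazy and a greedy wildcard. The Python sketch below assumes DATA behaves like the regex .*? and GREEDYDATA like .* (their usual grok definitions); the sample line is made up.

```python
import re

line = "key=alpha key=beta end"

# Lazy (DATA-like): stops at the first space that lets the match succeed.
lazy = re.match(r"key=(?P<value>.*?) ", line)
# Greedy (GREEDYDATA-like): runs to the end, then backs up to the last space.
greedy = re.match(r"key=(?P<value>.*) ", line)
# With nothing after it, a lazy capture is satisfied by an empty string.
trailing = re.match(r"key=(?P<value>.*?)", line)

print(lazy.group("value"))            # alpha
print(greedy.group("value"))          # alpha key=beta
print(repr(trailing.group("value")))  # ''
```

The last case is why %{GREEDYDATA} is the usual choice for capturing "the rest of the line" at the end of an expression.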

User & Email

| Pattern | Description | Example |
|---|---|---|
| USERNAME | Username | john.doe, user123 |
| USER | Alias for USERNAME | Same as USERNAME |
| EMAILADDRESS | Email address | user@example.com |
| HTTPDUSER | HTTP user (email or username) | user@example.com or john.doe |

Log Levels

| Pattern | Description | Matches |
|---|---|---|
| LOGLEVEL | Common log levels | DEBUG, INFO, WARN, WARNING, ERROR, CRITICAL, FATAL, etc. (case insensitive) |

Examples

Parse Apache Combined Logs

Input:

```
192.168.1.100 - john [15/Jan/2024:10:30:00 +0000] "GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"
```

Pipeline:

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          - '%{COMBINEDAPACHELOG}'
```

Output:

```json
{
  "clientip": "192.168.1.100",
  "ident": "-",
  "auth": "john",
  "timestamp": "15/Jan/2024:10:30:00 +0000",
  "verb": "GET",
  "request": "/api/users",
  "httpversion": "1.1",
  "response": "200",
  "bytes": "1234",
  "referrer": "\"https://example.com\"",
  "agent": "\"Mozilla/5.0\""
}
```
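To see how each output field lines up with the log line, here is a rough Python approximation of COMBINEDAPACHELOG. It is a simplified, hypothetical regex built from the field names in the output above, not the processor's actual pattern. Note that, as in the output, referrer and agent keep their surrounding quotes.

```python
import re

# Hypothetical approximation of COMBINEDAPACHELOG, one named group per field.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\w+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'(?P<referrer>"[^"]*") (?P<agent>"[^"]*")'
)

line = ('192.168.1.100 - john [15/Jan/2024:10:30:00 +0000] '
        '"GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"')
fields = COMBINED.match(line).groupdict()
print(fields)
```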

Parse Custom Application Logs

Input:

```
2024-01-15 10:30:00 [ERROR] user@example.com failed login from 192.168.1.100
```

Pipeline:

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          - '%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:level}\] %{EMAILADDRESS:user} %{GREEDYDATA:message}'
```

Output:

```json
{
  "timestamp": "2024-01-15 10:30:00",
  "level": "ERROR",
  "user": "user@example.com",
  "message": "failed login from 192.168.1.100"
}
```

Parse Syslog

Input:

```
Jan 15 10:30:00 server1 sshd[12345]: Failed password for invalid user admin from 192.168.1.100
```

Pipeline:

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          - '%{SYSLOGBASE} %{GREEDYDATA:message}'
```

Output:

```json
{
  "timestamp": "Jan 15 10:30:00",
  "logsource": "server1",
  "program": "sshd",
  "pid": "12345",
  "message": "Failed password for invalid user admin from 192.168.1.100"
}
```

Extract Multiple Fields

Input:

```
Request from 192.168.1.100:8080 to https://api.example.com/v1/users?id=123 took 250ms
```

Pipeline:

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          - 'Request from %{HOSTPORT:client} to %{URI:url} took %{NUMBER:duration}ms'
```

Output:

```json
{
  "client": "192.168.1.100:8080",
  "url": "https://api.example.com/v1/users?id=123",
  "duration": "250"
}
```

Custom Patterns

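Define your own patterns for reuse under pattern_definitions. Custom patterns can reference built-in patterns and are then used in expressions just like them: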

```yaml
pipeline:
  processors:
    - grok:
        pattern_definitions:
          MYAPP_TIMESTAMP: '%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}'
          MYAPP_SEVERITY: '(CRITICAL|HIGH|MEDIUM|LOW)'
        expressions:
          - '%{MYAPP_TIMESTAMP:timestamp} \[%{MYAPP_SEVERITY:severity}\] %{GREEDYDATA:message}'
```
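Because a custom pattern such as MYAPP_TIMESTAMP is itself built from other patterns, references have to be expanded recursively. The Python sketch below shows that idea; the built-in bodies in PATTERNS are simplified, hypothetical stand-ins, expand is an illustrative helper (not the processor's API), and the sample log line is made up.

```python
import re

REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

# Hypothetical, simplified built-ins plus the custom definitions above.
PATTERNS = {
    "YEAR": r"\d{4}",
    "MONTHNUM": r"0?[1-9]|1[0-2]",
    "MONTHDAY": r"0?[1-9]|[12]\d|3[01]",
    "TIME": r"\d{2}:\d{2}:\d{2}",
    "GREEDYDATA": r".*",
    "MYAPP_TIMESTAMP": r"%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}",
    "MYAPP_SEVERITY": r"(CRITICAL|HIGH|MEDIUM|LOW)",
}


def expand(expression: str, depth: int = 0) -> str:
    """Recursively replace %{NAME} / %{NAME:field} with regex groups."""
    if depth > 10:
        raise ValueError("pattern nesting too deep (possible cycle)")

    def sub(m: re.Match) -> str:
        # Expand the referenced pattern first, since it may contain references.
        body = expand(PATTERNS[m.group(1)], depth + 1)
        return f"(?P<{m.group(2)}>{body})" if m.group(2) else f"(?:{body})"

    return REF.sub(sub, expression)


pattern = re.compile(expand(
    r"%{MYAPP_TIMESTAMP:timestamp} \[%{MYAPP_SEVERITY:severity}\] %{GREEDYDATA:message}"
))
m = pattern.match("2024-01-15 10:30:00 [HIGH] disk usage above threshold")
print(m.groupdict())
```

Wrapping each expansion in a group keeps alternations such as MONTHNUM from leaking into the surrounding expression.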

Type Hints

Extract values as specific types:

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          # :int suffix converts the captured value to an integer
          - 'Status: %{INT:status:int}, Size: %{INT:bytes:int}'
```

Input: Status: 200, Size: 1024

Output: {"status": 200, "bytes": 1024} (numbers, not strings)
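One way to picture a type hint is as a conversion applied after matching: the field is captured as text, then converted if it carries a suffix. The Python sketch below assumes that model; apply_type_hints and CONVERTERS are hypothetical names, and only the :int suffix shown in this section is covered.

```python
import re

# Hypothetical converter table; only the ":int" suffix appears in this section.
CONVERTERS = {"int": int}


def apply_type_hints(captures: dict, hints: dict) -> dict:
    """Convert captured strings whose field carries a type hint; keep the rest as str."""
    return {
        name: CONVERTERS[hints[name]](value) if name in hints else value
        for name, value in captures.items()
    }


match = re.match(r"Status: (?P<status>[+-]?\d+), Size: (?P<bytes>[+-]?\d+)",
                 "Status: 200, Size: 1024")
result = apply_type_hints(match.groupdict(), {"status": "int", "bytes": "int"})
print(result)  # {'status': 200, 'bytes': 1024}
```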

Performance Tips

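  1. Order expressions by likelihood - Expressions are tried in order, so put the most common format first
  2. Use specific patterns - %{IPV4} is faster than %{IP} if you know the address is IPv4
  3. Avoid greedy matching mid-expression - When a literal delimiter follows the field, %{DATA} (or a more specific pattern such as %{NOTSPACE}) reduces the backtracking %{GREEDYDATA} causes; keep %{GREEDYDATA} for capturing the rest of the line, because %{DATA} at the very end of an expression can match an empty string
  4. Consider alternatives - For simple parsing, a regular expression in a mapping may be faster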

Multiple Pattern Matching

Try multiple patterns until one matches:

```yaml
pipeline:
  processors:
    - grok:
        expressions:
          - '%{COMBINEDAPACHELOG}' # Try Apache combined first
          - '%{COMMONAPACHELOG}'   # Fall back to common
          - '%{HTTPD_ERRORLOG}'    # Fall back to error log
```

Expressions are tried in order, and the first one that extracts at least one value is used.
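The fallback rule can be sketched in Python as a loop over the expressions in order. The three regexes are simplified, hypothetical stand-ins for the built-in patterns above, first_match is an illustrative helper, and the sample lines are made up.

```python
import re

# Hypothetical stand-ins for the three expressions above, in the same order.
EXPRESSIONS = [
    ("combined", re.compile(
        r'(?P<clientip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
        r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"')),
    ("common", re.compile(
        r'(?P<clientip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<response>\d{3}) \S+$')),
    ("error", re.compile(
        r'\[(?P<timestamp>[^\]]+)\] \[(?P<loglevel>\w+)\] (?P<errormsg>.*)')),
]


def first_match(line: str) -> tuple:
    """Return (name, values) for the first expression that extracts a value."""
    for name, pattern in EXPRESSIONS:
        m = pattern.search(line)
        if m is None:
            continue
        # Keep only groups that actually captured non-empty text.
        values = {k: v for k, v in m.groupdict().items() if v}
        if values:
            return name, values
    return None, {}


print(first_match('10.0.0.1 - - [15/Jan/2024:10:30:00 +0000] "GET / HTTP/1.1" 200 512'))
print(first_match('[Mon Jan 15 10:30:00 2024] [error] [client 10.0.0.1] File does not exist'))
```

A line that matches none of the expressions yields (None, {}) in this sketch.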

Next Steps