jc.parsers.clf_s

jc - JSON Convert Common Log Format file streaming parser

This streaming parser outputs JSON Lines (cli) or returns an Iterable of Dictionaries (module).

This parser will handle the Common Log Format standard as specified at https://www.w3.org/Daemon/User/Config/Logging.html#common-logfile-format.

Combined Log Format, which adds the Referer and User Agent fields, is also supported.

Extra fields may be present and will be enclosed in the extra field as a single string.

If a log line cannot be parsed, the output object will contain an unparsable field whose value is the original line.
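A minimal sketch of separating parsed records from unparsable ones when consuming the module output. It works on any iterable of dictionaries shaped like the schema. The helper name `split_unparsable` and the sample records are hypothetical and are not part of jc.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_unparsable(
    records: Iterable[Dict[str, Any]]
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Split records into parsed entries and original unparsable lines."""
    parsed: List[Dict[str, Any]] = []
    bad_lines: List[str] = []
    for record in records:
        if "unparsable" in record:
            # The value is the original log line that could not be parsed.
            bad_lines.append(record["unparsable"])
        else:
            parsed.append(record)
    return parsed, bad_lines


# Hand-written sample records standing in for parser output.
sample = [
    {"host": "127.0.0.1", "status": 200},
    {"unparsable": "not a log line"},
]
parsed, bad_lines = split_unparsable(sample)
```

Note that this helper collects everything into lists, so it consumes the whole stream. For very large logs, checking for the `unparsable` key inside the processing loop keeps memory use flat.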

The calculated epoch timestamp field is naive (i.e. based on the local time of the system the parser is run on).

The epoch_utc calculated timestamp field is timezone-aware and is only available if the timezone field is UTC.
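The difference between the two fields can be sketched with the standard library. This illustrates the naive versus timezone-aware distinction only and is not jc's actual implementation. The function name `epoch_fields`, the date layout without the offset, and the `+0000` spelling for UTC are assumptions.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def epoch_fields(date_str: str, tz_str: str) -> Dict[str, Optional[int]]:
    """Compute illustrative epoch values from a CLF-style date and offset."""
    # Parse without timezone info, so the result is a naive datetime.
    naive_dt = datetime.strptime(date_str, "%d/%b/%Y:%H:%M:%S")

    # A naive datetime is interpreted in the local timezone of the machine
    # running this code, so this value differs from system to system.
    epoch = int(naive_dt.timestamp())

    # Only produce a timezone-aware value when the offset is UTC
    # (assumed here to be spelled "+0000").
    epoch_utc = None
    if tz_str == "+0000":
        epoch_utc = int(naive_dt.replace(tzinfo=timezone.utc).timestamp())

    return {"epoch": epoch, "epoch_utc": epoch_utc}


utc_result = epoch_fields("10/Oct/2000:13:55:36", "+0000")
offset_result = epoch_fields("10/Oct/2000:13:55:36", "-0700")
```

In this sketch the naive value ignores the offset entirely, which is why it is the same for both calls on a given machine but can differ between machines in different timezones.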

Usage (cli):

$ cat file.log | jc --clf-s

Usage (module):

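import jc

# common_log_file_output is a string containing the log file contents
result = jc.parse('clf_s', common_log_file_output.splitlines())
for item in result:
    print(item)  # do something with each parsed dictionary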

Schema:

Empty strings and `-` values are converted to `null`/`None`.

{
  "host":                         string,
  "ident":                        string,
  "authuser":                     string,
  "date":                         string,
  "day":                          integer,
  "month":                        string,
  "year":                         integer,
  "hour":                         integer,
  "minute":                       integer,
  "second":                       integer,
  "tz":                           string,
  "request":                      string,
  "request_method":               string,
  "request_url":                  string,
  "request_version":              string,
  "status":                       integer,
  "bytes":                        integer,
  "referer":                      string,
  "user_agent":                   string,
  "extra":                        string,
  "epoch":                        integer,  # [0]
  "epoch_utc":                    integer,  # [1]
  "unparsable":                   string    # [2]
}

[0] naive timestamp
[1] timezone-aware timestamp. Only available if timezone field is UTC
[2] exists only if the line could not be parsed
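The null conversion noted at the top of the schema can be sketched as follows. This is a hypothetical helper (`normalize_nulls`), not code from jc. It only illustrates how a raw record, as produced with `-r`, would differ from the processed one. The sample values are hand-written.

```python
from typing import Any, Dict


def normalize_nulls(raw_record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with empty strings and '-' values replaced by None."""
    return {
        key: (None if value in ("", "-") else value)
        for key, value in raw_record.items()
    }


# Hand-written raw record; "-" marks a field the server did not log.
raw = {"host": "1.1.1.2", "ident": "-", "authuser": "-", "referer": ""}
processed = normalize_nulls(raw)
```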

Examples:

$ cat file.log | jc --clf-s
{"host":"127.0.0.1","ident":"user-identifier","authuser":"frank","...}
{"host":"1.1.1.2","ident":null,"authuser":null,"date":"11/Nov/2016...}
...

$ cat file.log | jc --clf-s -r
{"host":"127.0.0.1","ident":"user-identifier","authuser":"frank","...}
{"host":"1.1.1.2","ident":"-","authuser":"-","date":"11/Nov/2016:0...}
...

parse

def parse(
    data: Iterable[str],
    raw: bool = False,
    quiet: bool = False,
    ignore_exceptions: bool = False
) -> Iterator[Union[Dict[str, Any], Tuple[BaseException, str]]]

Main text parsing generator function. Returns an iterable object.

Parameters:

data:              (iterable)  line-based text data to parse
                               (e.g. sys.stdin or str.splitlines())

raw:               (boolean)   unprocessed output if True
quiet:             (boolean)   suppress warning messages if True
ignore_exceptions: (boolean)   ignore parsing exceptions if True

Returns:

Iterable of Dictionaries
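Because the function is a generator, records can be handled one at a time without loading the whole log into memory. A minimal sketch that lazily serializes each dictionary as one JSON Lines string, similar in spirit to the cli output. `to_json_lines` is a hypothetical helper, and `sample_records` stands in for the iterable returned by `parse`.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def to_json_lines(records: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Lazily serialize each record as one compact JSON document."""
    for record in records:
        yield json.dumps(record, separators=(",", ":"))


def sample_records() -> Iterator[Dict[str, Any]]:
    # Stand-in for parser output; real code would pass the parse() result.
    yield {"host": "127.0.0.1", "status": 200}
    yield {"host": "1.1.1.2", "ident": None}


lines = list(to_json_lines(sample_records()))
```

Calling `list()` here is only for demonstration. Iterating over `to_json_lines(...)` directly preserves the streaming behavior.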

Parser Information

Compatibility: linux, darwin, cygwin, win32, aix, freebsd

Source: jc/parsers/clf_s.py

Version 1.0 by Kelly Brazil (kellyjonbrazil@gmail.com)