Skip to the content.

Home

jc.parsers.clf

jc - JSON Convert Common Log Format file parser

This parser will handle the Common Log Format standard as specified at https://www.w3.org/Daemon/User/Config/Logging.html#common-logfile-format.

Combined Log Format is also supported. (Referer and User Agent fields added)

Extra fields may be present and will be enclosed in the extra field as a single string.

If a log line cannot be parsed, an object with an unparsable field will be present with a value of the original line.

The epoch calculated timestamp field is naive. (i.e. based on the local time of the system the parser is run on)

The epoch_utc calculated timestamp field is timezone-aware and is only available if the timezone field is UTC.

Usage (cli):

$ cat file.log | jc --clf

Usage (module):

import jc
result = jc.parse('clf', common_log_file_output)

Schema:

Empty strings and - values are converted to null/None.

[
  {
    "host":                         string,
    "ident":                        string,
    "authuser":                     string,
    "date":                         string,
    "day":                          integer,
    "month":                        string,
    "year":                         integer,
    "hour":                         integer,
    "minute":                       integer,
    "second":                       integer,
    "tz":                           string,
    "request":                      string,
    "request_method":               string,
    "request_url":                  string,
    "request_version":              string,
    "status":                       integer,
    "bytes":                        integer,
    "referer":                      string,
    "user_agent":                   string,
    "extra":                        string,
    "epoch":                        integer,  # [0]
    "epoch_utc":                    integer,  # [1]
    "unparsable":                   string    # [2]
  }
]

[0] naive timestamp
[1] timezone-aware timestamp. Only available if timezone field is UTC
[2] exists if the line was not able to be parsed

Examples:

$ cat file.log | jc --clf -p
[
  {
    "host": "127.0.0.1",
    "ident": "user-identifier",
    "authuser": "frank",
    "date": "10/Oct/2000:13:55:36 -0700",
    "day": 10,
    "month": "Oct",
    "year": 2000,
    "hour": 13,
    "minute": 55,
    "second": 36,
    "tz": "-0700",
    "request": "GET /apache_pb.gif HTTPS/1.0",
    "status": 200,
    "bytes": 2326,
    "referer": null,
    "user_agent": null,
    "extra": null,
    "request_method": "GET",
    "request_url": "/apache_pb.gif",
    "request_version": "HTTPS/1.0",
    "epoch": 971211336,
    "epoch_utc": null
  },
  {
    "host": "1.1.1.2",
    "ident": null,
    "authuser": null,
    "date": "11/Nov/2016:03:04:55 +0100",
    "day": 11,
    "month": "Nov",
    "year": 2016,
    "hour": 3,
    "minute": 4,
    "second": 55,
    "tz": "+0100",
    "request": "GET /",
    "status": 200,
    "bytes": 83,
    "referer": null,
    "user_agent": null,
    "extra": "- 9221 1.1.1.1",
    "request_method": "GET",
    "request_url": "/",
    "request_version": null,
    "epoch": 1478862295,
    "epoch_utc": null
  },
  ...
]

$ cat file.log | jc --clf -p -r
[
  {
    "host": "127.0.0.1",
    "ident": "user-identifier",
    "authuser": "frank",
    "date": "10/Oct/2000:13:55:36 -0700",
    "day": "10",
    "month": "Oct",
    "year": "2000",
    "hour": "13",
    "minute": "55",
    "second": "36",
    "tz": "-0700",
    "request": "GET /apache_pb.gif HTTPS/1.0",
    "status": "200",
    "bytes": "2326",
    "referer": null,
    "user_agent": null,
    "extra": "",
    "request_method": "GET",
    "request_url": "/apache_pb.gif",
    "request_version": "HTTPS/1.0"
  },
  {
    "host": "1.1.1.2",
    "ident": "-",
    "authuser": "-",
    "date": "11/Nov/2016:03:04:55 +0100",
    "day": "11",
    "month": "Nov",
    "year": "2016",
    "hour": "03",
    "minute": "04",
    "second": "55",
    "tz": "+0100",
    "request": "GET /",
    "status": "200",
    "bytes": "83",
    "referer": "-",
    "user_agent": "-",
    "extra": "- 9221 1.1.1.1",
    "request_method": "GET",
    "request_url": "/",
    "request_version": null
  },
  ...
]

parse

def parse(data: str,
          raw: bool = False,
          quiet: bool = False) -> List[Dict[str, Any]]

Main text parsing function

Parameters:

data:        (string)  text data to parse
raw:         (boolean) unprocessed output if True
quiet:       (boolean) suppress warning messages if True

Returns:

List of Dictionaries. Raw or processed structured data.

Parser Information

Compatibility: linux, darwin, cygwin, win32, aix, freebsd

Source: jc/parsers/clf.py

Version 1.0 by Kelly Brazil (kellyjonbrazil@gmail.com)