Skip to the content.

Home

jc.parsers.url

jc - JSON Convert URL string parser

Normalized, Encoded, and Decoded versions of the original URL and URL parts are included in the output. Encoding and Decoding is best effort.

Note: Do not use the Encoded fields for a URL that has already been Encoded. Similarly, do not use the Decoded fields for a URL that has already been Decoded.

This parser will work with naked and wrapped URL strings:

Usage (cli):

$ echo "http://example.com/test/path?q1=foo&q2=bar#frag" | jc --url

Usage (module):

import jc
result = jc.parse('url', url_string)

Schema:

{
  "url":                       string,
  "scheme":                    string or null,
  "netloc":                    string or null,
  "path":                      string or null,
  "parent":                    string or null,
  "filename":                  string or null,
  "stem":                      string or null,
  "extension":                 string or null,
  "path_list": [               array or null
                               string
  ],
  "query":                     string or null,
  "query_obj": {               object or null
    <query-key>: [             array or null
      <query-value>            string            # [0]
    ]
  },
  "fragment":                  string or null,
  "username":                  string or null,
  "password":                  string or null,
  "hostname":                  string or null,
  "port":                      integer or null,  # [1]
  "encoded": {
    "url":                     string,
    "scheme":                  string or null,
    "netloc":                  string or null,
    "path":                    string or null,
    "parent":                  string or null,
    "filename":                string or null,
    "stem":                    string or null,
    "extension":               string or null,
    "path_list": [             array or null
                               string
    ],
    "query":                   string or null,
    "fragment":                string or null,
    "username":                string or null,
    "password":                string or null,
    "hostname":                string or null,
    "port":                    integer or null,  # [1]
  },
  "decoded": {
    "url":                     string,
    "scheme":                  string or null,
    "netloc":                  string or null,
    "path":                    string or null,
    "parent":                  string or null,
    "filename":                string or null,
    "stem":                    string or null,
    "extension":               string or null,
    "path_list": [             array or null
                               string
    ],
    "query":                   string or null,
    "fragment":                string or null,
    "username":                string or null,
    "password":                string or null,
    "hostname":                string or null,
    "port":                    integer or null,  # [1]
  }
}

[0] Duplicate query-keys will have their values consolidated into the
    array of query-values

[1] Invalid port values will be converted to null/None and a warning
    message will be printed to `STDERR` if quiet=False

Examples:

$ echo "http://example.com/test/path?q1=foo&q1=bar&q2=baz#frag" \\
       | jc --url -p
{
  "url": "http://example.com/test/path?q1=foo&q1=bar&q2=baz#frag",
  "scheme": "http",
  "netloc": "example.com",
  "path": "/test/path",
  "parent": "/test",
  "filename": "path",
  "stem": "path",
  "extension": null,
  "path_list": [
    "test",
    "path"
  ],
  "query": "q1=foo&q1=bar&q2=baz",
  "query_obj": {
    "q1": [
      "foo",
      "bar"
    ],
    "q2": [
      "baz"
    ]
  },
  "fragment": "frag",
  "username": null,
  "password": null,
  "hostname": "example.com",
  "port": null,
  "encoded": {
    "url": "http://example.com/test/path?q1=foo&q1=bar&q2=baz#frag",
    "scheme": "http",
    "netloc": "example.com",
    "path": "/test/path",
    "parent": "/test",
    "filename": "path",
    "stem": "path",
    "extension": null,
    "path_list": [
      "test",
      "path"
    ],
    "query": "q1=foo&q1=bar&q2=baz",
    "fragment": "frag",
    "username": null,
    "password": null,
    "hostname": "example.com",
    "port": null
  },
  "decoded": {
    "url": "http://example.com/test/path?q1=foo&q1=bar&q2=baz#frag",
    "scheme": "http",
    "netloc": "example.com",
    "path": "/test/path",
    "parent": "/test",
    "filename": "path",
    "stem": "path",
    "extension": null,
    "path_list": [
      "test",
      "path"
    ],
    "query": "q1=foo&q1=bar&q2=baz",
    "fragment": "frag",
    "username": null,
    "password": null,
    "hostname": "example.com",
    "port": null
  }
}

$ echo "ftp://localhost/filepath" | jc --url -p
{
  "url": "ftp://localhost/filepath",
  "scheme": "ftp",
  "netloc": "localhost",
  "path": "/filepath",
  "parent": "/",
  "filename": "filepath",
  "stem": "filepath",
  "extension": null,
  "path_list": [
    "filepath"
  ],
  "query": null,
  "query_obj": null,
  "fragment": null,
  "username": null,
  "password": null,
  "hostname": "localhost",
  "port": null,
  "encoded": {
    "url": "ftp://localhost/filepath",
    "scheme": "ftp",
    "netloc": "localhost",
    "path": "/filepath",
    "parent": "/",
    "filename": "filepath",
    "stem": "filepath",
    "extension": null,
    "path_list": [
      "filepath"
    ],
    "query": null,
    "fragment": null,
    "username": null,
    "password": null,
    "hostname": "localhost",
    "port": null
  },
  "decoded": {
    "url": "ftp://localhost/filepath",
    "scheme": "ftp",
    "netloc": "localhost",
    "path": "/filepath",
    "parent": "/",
    "filename": "filepath",
    "stem": "filepath",
    "extension": null,
    "path_list": [
      "filepath"
    ],
    "query": null,
    "fragment": null,
    "username": null,
    "password": null,
    "hostname": "localhost",
    "port": null
  }
}

parse

def parse(data: str, raw: bool = False, quiet: bool = False) -> Dict

Main text parsing function

Parameters:

data:        (string)  text data to parse
raw:         (boolean) unprocessed output if True
quiet:       (boolean) suppress warning messages if True

Returns:

Dictionary. Raw or processed structured data.

Parser Information

Compatibility: linux, darwin, cygwin, win32, aix, freebsd

Source: jc/parsers/url.py

This parser can be used with the --slurp command-line option.

Version 1.1 by Kelly Brazil (kellyjonbrazil@gmail.com)