gfm.semi_sane_lists – GitHub-like list parsing

The gfm.semi_sane_lists module provides an extension that causes lists to be treated the same way GitHub does.

Like the sane_lists extension, GitHub considers a list to end if it’s separated by multiple newlines from another list of a different type. Unlike the sane_lists extension, GitHub will mix list types if they’re not separated by multiple newlines.

GitHub also recognizes lists that start in the middle of a paragraph. This extension does not currently support that, because the Python-Markdown parser has a deeply ingrained assumption that blocks are always separated by multiple newlines.
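To see why mid-paragraph lists are awkward under that assumption, here is a rough sketch of block splitting. It uses only the standard library and is not the parser's actual implementation; split_blocks is a hypothetical helper that assumes blocks are delimited by one or more blank lines.

```python
import re


def split_blocks(text):
    """Split text into blocks at runs of blank lines (hypothetical helper)."""
    return [b for b in re.split(r"\n\s*\n", text.strip("\n")) if b.strip()]


source = "Shopping list:\n- eggs\n- milk\n\n1. mix\n2. stew\n"
blocks = split_blocks(source)

# The unordered list shares a block with the paragraph that precedes it,
# so a processor that only inspects the start of a block never sees it.
for block in blocks:
    print(repr(block))
```

Under this model the ordered list gets its own block, while the unordered list stays attached to the "Shopping list:" paragraph.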

Typical usage

import markdown
from gfm import SemiSaneListExtension

print(markdown.markdown("""
- eggs
- milk

1. mix
2. stew
""", extensions=[SemiSaneListExtension()]))

<ul>
<li>eggs</li>
<li>milk</li>
</ul>
<ol>
<li>mix</li>
<li>stew</li>
</ol>

class gfm.semi_sane_lists.SemiSaneListExtension(**kwargs)

Bases: markdown.extensions.Extension

An extension that causes lists to be treated the same way GitHub does.

extendMarkdown(md)

Add the various processors and patterns to the Markdown instance.

This method must be overridden by every extension.

Keyword arguments:

  • md: The Markdown instance.
  • md_globals: Global variables in the markdown module namespace (not part of the signature shown above; deprecated as of Python-Markdown 3.0).

getConfig(key, default='')

Return a setting for the given key or an empty string.

getConfigInfo()

Return all config descriptions as a list of tuples.

getConfigs()

Return all config settings as a dict.

setConfig(key, value)

Set a config setting for key with the given value.

setConfigs(items)

Set multiple config settings given a dict or list of tuples.

class gfm.semi_sane_lists.SemiSaneOListProcessor(parser)

Bases: markdown.blockprocessors.OListProcessor

detab(text)

Remove a tab from the front of each line of the given text.

get_items(block)

Break a block into list items.

lastChild(parent)

Return the last child of an etree element.

looseDetab(text, level=1)

Remove a tab from the front of each line, but allow dedented lines.

run(parent, blocks)

Run processor. Must be overridden by subclasses.

When the parser determines the appropriate type of a block, the parser will call the corresponding processor’s run method. This method should parse the individual lines of the block and append them to the etree.

Note that both parent and blocks refer to existing objects that must be edited in place. Each processor must make its changes to those objects, as there is no mechanism for returning new or different objects to replace them.

This means that this method should be adding SubElements or adding text to the parent, and should remove (pop) or add (insert) items to the list of blocks.

Keywords:

  • parent: An etree element which is the parent of the current block.
  • blocks: A list of all remaining blocks of the document.
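
The in-place contract described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the processor's real code: run_paragraph is a hypothetical function that mimics the shape of a run method by consuming one block and attaching a new element to the parent.

```python
import xml.etree.ElementTree as etree


def run_paragraph(parent, blocks):
    """Hypothetical run-style function: consume one block, edit parent in place."""
    block = blocks.pop(0)              # remove the block being handled
    p = etree.SubElement(parent, "p")  # attach a new element to the existing parent
    p.text = block.strip()
    # Nothing is returned: callers observe the changes on parent and blocks.


root = etree.Element("div")
remaining = ["first paragraph", "second paragraph"]
run_paragraph(root, remaining)

print(etree.tostring(root, encoding="unicode"))
print(remaining)
```

After the call, root contains one new child and remaining holds only the unprocessed block, which is how the caller learns what the function did.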

test(parent, block)

Test for block type. Must be overridden by subclasses.

As the parser loops through processors, it calls the test method on each to determine whether the given block of text is of that type. This method must return a boolean True or False. The actual method of testing is left to the needs of that particular block type. It could be as simple as block.startswith(some_string) or a complex regular expression. As the block type may differ depending on the parent of the block (e.g. inside a list), the parent etree element is also provided and may be used as part of the test.

Keywords:

  • parent: An etree element which will be the parent of the block.
  • block: A block of text from the source which has been split at blank lines.
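
As a minimal sketch of a test of this kind, the following standard-library example checks a block against a regular expression and also consults the parent. The function is_ordered_list, the ORDERED_ITEM pattern, and the rule about pre elements are all assumptions made for illustration; they are not taken from this module.

```python
import re
import xml.etree.ElementTree as etree

# Assumed pattern: up to three spaces, digits, a period, whitespace, then text.
ORDERED_ITEM = re.compile(r"^ {0,3}\d+\.\s+\S")


def is_ordered_list(parent, block):
    """Hypothetical test-style function: True if block looks like an ordered list."""
    # The parent allows context-sensitive checks; here it is only used to
    # refuse ordered-list detection inside a <pre> element.
    if parent.tag == "pre":
        return False
    return bool(ORDERED_ITEM.match(block))


root = etree.Element("div")
print(is_ordered_list(root, "1. mix\n2. stew"))
print(is_ordered_list(root, "- eggs\n- milk"))
print(is_ordered_list(etree.Element("pre"), "1. mix"))
```

The function always returns a plain bool, which matches the requirement that test return True or False rather than a match object.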

class gfm.semi_sane_lists.SemiSaneUListProcessor(parser)

Bases: markdown.blockprocessors.UListProcessor

detab(text)

Remove a tab from the front of each line of the given text.

get_items(block)

Break a block into list items.

lastChild(parent)

Return the last child of an etree element.

looseDetab(text, level=1)

Remove a tab from the front of each line, but allow dedented lines.

run(parent, blocks)

Run processor. Must be overridden by subclasses.

When the parser determines the appropriate type of a block, the parser will call the corresponding processor’s run method. This method should parse the individual lines of the block and append them to the etree.

Note that both parent and blocks refer to existing objects that must be edited in place. Each processor must make its changes to those objects, as there is no mechanism for returning new or different objects to replace them.

This means that this method should be adding SubElements or adding text to the parent, and should remove (pop) or add (insert) items to the list of blocks.

Keywords:

  • parent: An etree element which is the parent of the current block.
  • blocks: A list of all remaining blocks of the document.

test(parent, block)

Test for block type. Must be overridden by subclasses.

As the parser loops through processors, it calls the test method on each to determine whether the given block of text is of that type. This method must return a boolean True or False. The actual method of testing is left to the needs of that particular block type. It could be as simple as block.startswith(some_string) or a complex regular expression. As the block type may differ depending on the parent of the block (e.g. inside a list), the parent etree element is also provided and may be used as part of the test.

Keywords:

  • parent: An etree element which will be the parent of the block.
  • block: A block of text from the source which has been split at blank lines.