Running the Extension API sample code

《 First published: 2022/03/26, last updated: not yet 》

This article actually runs the sample code from "Writing Extensions for Python-Markdown".


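Which processors are registered, and at what priority?

To create your own Extension and register a processor in a Registry, you have to specify the processor's priority through the priority argument of register.
To pick a sensible value, however, it seems necessary to know which processors are currently registered and at what priorities.

So I analyzed the Registry's register method and related code in my own way and put together, somewhat forcibly, a script that shows the priority at which each processor is registered.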

#! python3

import markdown

extensions = ["extra", "codehilite", "fenced_code", "toc", "admonition", "meta"]
md = markdown.Markdown(extensions=extensions)

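# Each entry pairs a display label with one of the Markdown instance's registries.
registries = [
    ("Preprocessors", md.preprocessors),
    ("Block Processors", md.parser.blockprocessors),
    ("Tree Processors", md.treeprocessors),
    ("Inline Processors", md.inlinePatterns),
    ("Postprocessors", md.postprocessors),
]

# NOTE: _priority and _data are private attributes of markdown.util.Registry,
# so this script may break with other versions of Python-Markdown.
for name, registry in registries:
    print()
    print(f"processor_name: {name}")
    for priority_item in registry._priority:
        processor = registry._data[priority_item.name]
        print(f"name: {priority_item.name}, priority: {priority_item.priority}, type: {type(processor).__name__}")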

Output

processor_name: Preprocessors
name: normalize_whitespace, priority: 30, type: NormalizeWhitespace
name: html_block, priority: 20, type: HtmlBlockPreprocessor
name: fenced_code_block, priority: 25, type: FencedBlockPreprocessor
name: meta, priority: 27, type: MetaPreprocessor

processor_name: Block Processors
name: empty, priority: 100, type: EmptyBlockProcessor
name: indent, priority: 90, type: ListIndentProcessor
name: code, priority: 80, type: CodeBlockProcessor
name: hashheader, priority: 70, type: HashHeaderProcessor
name: setextheader, priority: 60, type: SetextHeaderProcessor
name: hr, priority: 50, type: HRProcessor
name: olist, priority: 40, type: OListProcessor
name: ulist, priority: 30, type: UListProcessor
name: quote, priority: 20, type: BlockQuoteProcessor
name: reference, priority: 15, type: ReferenceProcessor
name: paragraph, priority: 10, type: ParagraphProcessor
name: footnote, priority: 17, type: FootnoteBlockProcessor
name: defindent, priority: 85, type: DefListIndentProcessor
name: deflist, priority: 25, type: DefListProcessor
name: table, priority: 75, type: TableProcessor
name: abbr, priority: 16, type: AbbrPreprocessor
name: markdown_block, priority: 105, type: MarkdownInHtmlProcessor
name: admonition, priority: 105, type: AdmonitionProcessor

processor_name: Tree Processors
name: inline, priority: 20, type: InlineProcessor
name: prettify, priority: 10, type: PrettifyTreeprocessor
name: footnote, priority: 50, type: FootnoteTreeprocessor
name: footnote-duplicate, priority: 15, type: FootnotePostTreeprocessor
name: attr_list, priority: 8, type: AttrListTreeprocessor
name: hilite, priority: 30, type: HiliteTreeprocessor
name: toc, priority: 5, type: TocTreeprocessor

processor_name: Inline Processors
name: backtick, priority: 190, type: BacktickInlineProcessor
name: escape, priority: 180, type: EscapeInlineProcessor
name: reference, priority: 170, type: ReferenceInlineProcessor
name: link, priority: 160, type: LinkInlineProcessor
name: image_link, priority: 150, type: ImageInlineProcessor
name: image_reference, priority: 140, type: ImageReferenceInlineProcessor
name: short_reference, priority: 130, type: ShortReferenceInlineProcessor
name: short_image_ref, priority: 125, type: ShortImageReferenceInlineProcessor
name: autolink, priority: 120, type: AutolinkInlineProcessor
name: automail, priority: 110, type: AutomailInlineProcessor
name: linebreak, priority: 100, type: SubstituteTagInlineProcessor
name: html, priority: 90, type: HtmlInlineProcessor
name: entity, priority: 80, type: HtmlInlineProcessor
name: not_strong, priority: 70, type: SimpleTextInlineProcessor
name: em_strong, priority: 60, type: AsteriskProcessor
name: em_strong2, priority: 50, type: UnderscoreProcessor
name: footnote, priority: 175, type: FootnoteInlineProcessor

processor_name: Postprocessors
name: footnote, priority: 25, type: FootnotePostprocessor
name: amp_substitute, priority: 20, type: AndSubstitutePostprocessor
name: unescape, priority: 10, type: UnescapePostprocessor
name: raw_html, priority: 30, type: MarkdownInHTMLPostprocessor
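
Note that the listing above is in registration order, because the script reads the private _priority list as it is. Python-Markdown runs processors from the highest priority to the lowest, so a view sorted by priority may be easier to read when choosing a priority for a new processor. The following is a minimal sketch under that assumption. by_priority is a helper name invented for this example, and _priority is a private attribute of Registry that may change between versions.

```python
import markdown

md = markdown.Markdown()


def by_priority(registry):
    """Return (name, priority) pairs, highest priority first.

    by_priority is a hypothetical helper for illustration only.
    It reads registry._priority, a private attribute of
    markdown.util.Registry, so it is not guaranteed to keep working.
    """
    pairs = [(item.name, item.priority) for item in registry._priority]
    # sorted() is stable, so processors with equal priority keep
    # their registration order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for name, priority in by_priority(md.parser.blockprocessors):
    print(f"{priority:>5}  {name}")
```

With this view, a new processor's priority can be chosen by finding the two existing processors it should run between.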

Running the Preprocessor sample code

The full code of the norender_extention module, which contains the NoRender classes, is as follows.

norender_extention.py

from markdown.extensions import Extension
from markdown.preprocessors import Preprocessor
import re

class NoRenderPreprocessor(Preprocessor):
    """ Skip any line with words 'NO RENDER' in it. """
    def run(self, lines):
        new_lines = []
        for line in lines:
            m = re.search("NO RENDER", line)
            if not m:
                # any line without NO RENDER is passed through
                new_lines.append(line)
        return new_lines

class NoRenderExtension(Extension):
    def extendMarkdown(self, md):
        md.preprocessors.register(NoRenderPreprocessor(md), 'NoRender', 40)

def makeExtension(**kwargs):
    return NoRenderExtension(**kwargs)

Let's test this norender_extention.

norender_extention_test.py

import markdown

makedown_text = u'''
ドキュメントタイトル
===================

# NO RENDER
Markdownについては

- [MarkdownでHTMLを簡単に - 愚鈍人](http://ichitcltk.hustle.ne.jp/gudon2/index.php?pageType=file&id=markdown_memo)
****NO RENDER****
- [MarkdownでHTMLを簡単に - Sublime Tex 編 - 愚鈍人](http://ichitcltk.hustle.ne.jp/gudon2/index.php?pageType=file&id=sublime_markdown)
'''

extensions = ["norender_extention",]
print(markdown.Markdown(extensions=extensions).convert(makedown_text))

Output

<h1>ドキュメントタイトル</h1>
<p>Markdownについては</p>
<ul>
<li><a href="http://ichitcltk.hustle.ne.jp/gudon2/index.php?pageType=file&amp;id=markdown_memo">MarkdownでHTMLを簡単に - 愚鈍人</a></li>
<li><a href="http://ichitcltk.hustle.ne.jp/gudon2/index.php?pageType=file&amp;id=sublime_markdown">MarkdownでHTMLを簡単に - Sublime Tex 編 - 愚鈍人</a></li>
</ul>
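
As a variation, the extensions argument also accepts Extension instances, not only module names, so the extension and its test can live in a single file without placing a module on the import path. The following is a minimal sketch of that approach. The sample text is made up for this example, and the preprocessor is shortened to a list comprehension with the same filtering rule.

```python
import markdown
from markdown.extensions import Extension
from markdown.preprocessors import Preprocessor


class NoRenderPreprocessor(Preprocessor):
    """Skip any line that contains the words 'NO RENDER'."""

    def run(self, lines):
        return [line for line in lines if "NO RENDER" not in line]


class NoRenderExtension(Extension):
    def extendMarkdown(self, md):
        # Priority 40 is higher than the built-in preprocessors listed
        # earlier, so this runs before them.
        md.preprocessors.register(NoRenderPreprocessor(md), "NoRender", 40)


# Illustrative input, not taken from the article.
text = "keep this\n\ndrop this NO RENDER\n"

# Pass the extension as an instance instead of a module name.
html = markdown.markdown(text, extensions=[NoRenderExtension()])
print(html)
```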

Running the BlockProcessor sample code

The full code of the box_extention module is as follows.

box_extention.py

from markdown.extensions import Extension
from markdown.blockprocessors import BlockProcessor
import re
import xml.etree.ElementTree as etree

class BoxBlockProcessor(BlockProcessor):
    RE_FENCE_START = r'^ *!{3,} *\n' # start line, e.g., `   !!!! `
    RE_FENCE_END = r'\n *!{3,}\s*$'  # last non-blank line, e.g, '!!!\n  \n\n'

    def test(self, parent, block):
        return re.match(self.RE_FENCE_START, block)

    def run(self, parent, blocks):
        original_block = blocks[0]
        blocks[0] = re.sub(self.RE_FENCE_START, '', blocks[0])

        # Find block with ending fence
        for block_num, block in enumerate(blocks):
            if re.search(self.RE_FENCE_END, block):
                # remove fence
                blocks[block_num] = re.sub(self.RE_FENCE_END, '', block)
                # render fenced area inside a new div
                e = etree.SubElement(parent, 'div')
                e.set('style', 'display: inline-block; border: 1px solid red;')
                self.parser.parseBlocks(e, blocks[0:block_num + 1])
                # remove used blocks
                for i in range(0, block_num + 1):
                    blocks.pop(0)
                return True  # or could have had no return statement
        # No closing marker!  Restore and do nothing
        blocks[0] = original_block
        return False  # equivalent to our test() routine returning False

class BoxExtension(Extension):
    def extendMarkdown(self, md):
        md.parser.blockprocessors.register(BoxBlockProcessor(md.parser), 'box', 175)

def makeExtension(**kwargs):
    return BoxExtension(**kwargs)

Let's test this box_extention.

box_extention_test.py

import markdown

makedown_text = '''
A regular paragraph of text.

!!!!!
First paragraph of wrapped text.

Second Paragraph of **wrapped** text.
!!!!!

Another regular paragraph of text.
'''

extensions = ["box_extention",]
print(markdown.Markdown(extensions=extensions).convert(makedown_text))

Output

<p>A regular paragraph of text.</p>
<div style="display: inline-block; border: 1px solid red;">
<p>First paragraph of wrapped text.</p>
<p>Second Paragraph of <strong>wrapped</strong> text.</p>
</div>
<p>Another regular paragraph of text.</p>
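
The test above only exercises the case where a closing fence exists. The run method also has a "No closing marker" branch that restores the block and returns False, which tells the parser to let the remaining block processors handle it. The following sketch repeats the processor in one file and feeds it text with an opening fence only. The input text is made up for this example, and the extension is passed as an instance rather than by module name.

```python
import re
import xml.etree.ElementTree as etree

import markdown
from markdown.blockprocessors import BlockProcessor
from markdown.extensions import Extension


class BoxBlockProcessor(BlockProcessor):
    RE_FENCE_START = r'^ *!{3,} *\n'
    RE_FENCE_END = r'\n *!{3,}\s*$'

    def test(self, parent, block):
        return re.match(self.RE_FENCE_START, block)

    def run(self, parent, blocks):
        original_block = blocks[0]
        blocks[0] = re.sub(self.RE_FENCE_START, '', blocks[0])
        for block_num, block in enumerate(blocks):
            if re.search(self.RE_FENCE_END, block):
                blocks[block_num] = re.sub(self.RE_FENCE_END, '', block)
                e = etree.SubElement(parent, 'div')
                e.set('style', 'display: inline-block; border: 1px solid red;')
                self.parser.parseBlocks(e, blocks[0:block_num + 1])
                for _ in range(block_num + 1):
                    blocks.pop(0)
                return True
        # No closing fence: undo the change and decline the block.
        blocks[0] = original_block
        return False


class BoxExtension(Extension):
    def extendMarkdown(self, md):
        md.parser.blockprocessors.register(BoxBlockProcessor(md.parser), 'box', 175)


# Illustrative input with an opening fence but no closing fence.
text = "!!!!!\nNo closing fence here.\n\nNext paragraph."
html = markdown.markdown(text, extensions=[BoxExtension()])
print(html)
```

Since no div is created in this case, the fence line should remain in the output as ordinary paragraph text.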

Treeprocessor sample code

For Treeprocessor sample code, see the linked page.

InlineProcessor sample code

The full code of the del_extension module is as follows.

del_extension.py

from markdown.inlinepatterns import SimpleTagInlineProcessor
from markdown.extensions import Extension

class DelExtension(Extension):
    def extendMarkdown(self, md):
        md.inlinePatterns.register(SimpleTagInlineProcessor(r'()--(.*?)--', 'del'), 'del', 175)

def makeExtension(**kwargs):
    return DelExtension(**kwargs)

The code to test this del_extension is as follows.

del_extension_test.py

import markdown

makedown_text = '''
First line of the block.
This is --strike one--.
This is --strike two--.
End of the block.
'''

extensions = ["del_extension",]
print(markdown.Markdown(extensions=extensions).convert(makedown_text))

Output

<p>First line of the block.
This is <del>strike one</del>.
This is <del>strike two</del>.
End of the block.</p>
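
SimpleTagInlineProcessor takes the tag text from group 2 of the pattern, which is why the pattern above starts with an empty group. When more control is needed, the same result can be had by subclassing InlineProcessor and implementing handleMatch(m, data), which returns the element together with the start and end offsets of the matched text. The following is a minimal sketch of that approach. DelInlineProcessor is a name chosen for this example, and the extension is passed as an instance rather than by module name.

```python
import xml.etree.ElementTree as etree

import markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import InlineProcessor


class DelInlineProcessor(InlineProcessor):
    def handleMatch(self, m, data):
        # Build <del>...</del> from the text between the double hyphens.
        el = etree.Element('del')
        el.text = m.group(1)
        # Return the element and the span of source text it replaces.
        return el, m.start(0), m.end(0)


class DelExtension(Extension):
    def extendMarkdown(self, md):
        del_pattern = r'--(.*?)--'  # e.g. --deleted text--
        md.inlinePatterns.register(DelInlineProcessor(del_pattern, md), 'del', 175)


html = markdown.markdown("This is --strike one--.", extensions=[DelExtension()])
print(html)
```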

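Sample code using the Pattern class

The Pattern class appears to have been superseded by the InlineProcessor class, so there seems to be no need to use it.
Still, for sample code that uses the Pattern class, the linked page should be a useful reference.

Using it as a reference, the EmphasisPattern code can be completed as follows.
(In practice, *emphasis* is already built into Python-Markdown, so this is only placeholder code and is not suitable for real use.)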

from markdown.inlinepatterns import Pattern
from markdown.extensions import Extension
import xml.etree.ElementTree as etree

MYPATTERN = r'\*([^*]+)\*'

class EmphasisPattern(Pattern):
    def handleMatch(self, m):
        el = etree.Element('em')
        el.text = m.group(2)
        return el

class EmphasisExtension(Extension):
    def extendMarkdown(self, md):
        emphasis = EmphasisPattern(MYPATTERN)
        md.inlinePatterns.register(emphasis, 'emphasis', 175)

def makeExtension(**kwargs):
    # Return the Extension subclass, not the Pattern
    return EmphasisExtension(**kwargs)
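
No test run is shown for this sample, so here is a sketch of how it might be exercised. It assumes the legacy Pattern class is still available in the installed Python-Markdown version. With a legacy Pattern, the library wraps the pattern so that group 1 is the text before the match, which is why handleMatch reads group 2. The input string is made up for this example, and the extension is passed as an EmphasisExtension instance, so makeExtension is not involved.

```python
import xml.etree.ElementTree as etree

import markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import Pattern

MYPATTERN = r'\*([^*]+)\*'


class EmphasisPattern(Pattern):
    def handleMatch(self, m):
        el = etree.Element('em')
        # group(1) is the text before the match (added by Pattern),
        # so the captured emphasis text is group(2).
        el.text = m.group(2)
        return el


class EmphasisExtension(Extension):
    def extendMarkdown(self, md):
        md.inlinePatterns.register(EmphasisPattern(MYPATTERN), 'emphasis', 175)


# Illustrative input, not taken from the article.
html = markdown.markdown("some *emphasis* here", extensions=[EmphasisExtension()])
print(html)
```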

Postprocessor sample code

The full code of the show_actual_html_extention module is as follows.

show_actual_html_extention.py

from markdown.extensions import Extension
from markdown.postprocessors import Postprocessor
import re

class ShowActualHtmlPostprocessor(Postprocessor):
    """ Wrap entire output in <pre> tags as a diagnostic. """
    def run(self, text):
        return '<pre>\n' + re.sub('<', '&lt;', text) + '</pre>\n'

class ShowActualHtmlExtension(Extension):
    def extendMarkdown(self, md):
        md.postprocessors.register(ShowActualHtmlPostprocessor(), 'show_actual_html', 40)

def makeExtension(**kwargs):
    return ShowActualHtmlExtension(**kwargs)

The code to test this show_actual_html_extention is as follows.

show_actual_html_extention_test.py

import markdown

makedown_text = u'''
ドキュメントタイトル
===================

Markdownについては

- [MarkdownでHTMLを簡単に - 愚鈍人](http://ichitcltk.hustle.ne.jp/gudon2/index.php?pageType=file&id=markdown_memo)
- [MarkdownでHTMLを簡単に - Sublime Tex 編 - 愚鈍人](http://ichitcltk.hustle.ne.jp/gudon2/index.php?pageType=file&id=sublime_markdown)
'''

extensions = ["show_actual_html_extention",]
print(markdown.Markdown(extensions=extensions).convert(makedown_text))

Output

<pre>
&lt;h1>ドキュメントタイトル&lt;/h1>
&lt;p>Markdownについては&lt;/p>
&lt;ul>
&lt;li>&lt;a href="http://ichitcltk.hustle.ne.jp/gudon2/index.php?pageType=file&amp;id=markdown_memo">MarkdownでHTMLを簡単に - 愚鈍人&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://ichitcltk.hustle.ne.jp/gudon2/index.php?pageType=file&amp;id=sublime_markdown">MarkdownでHTMLを簡単に - Sublime Tex 編 - 愚鈍人&lt;/a>&lt;/li>
&lt;/ul></pre>