This script extracts HTML tags as JSON: a single tag is returned as an object, and multiple tags are returned as a list of objects. For example, `<head><title>Test</title></head>` becomes:
{
"name": "head",
"content": "<head><title>Test</title></head>",
"innerHtml": {
"name": "title",
"content": "<title>Test</title>",
"innerHtml": "Test"
}
}
myHtmlContent = Html('<head><title>Test</title></head>')
myHtmlContent.extractHtmlTagAsDict()
Output:
{
name: 'head',
content: '<head><title>Test</title></head>',
innerHtml: {
name: 'title',
content: '<title>Test</title>',
innerHtml: 'Test'
}
}
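Since the result for a single tag appears to be a plain nested dictionary, it should be possible to serialize it with the standard `json` module. The sketch below is only an illustration: it uses a hand-written copy of the output above instead of calling `Html`, and it assumes the returned value contains nothing but dicts, lists, and strings.

```python
import json

# Hand-written copy of the documented output. In practice this would be the
# value returned by extractHtmlTagAsDict() (assumed here to be a plain dict).
result = {
    "name": "head",
    "content": "<head><title>Test</title></head>",
    "innerHtml": {
        "name": "title",
        "content": "<title>Test</title>",
        "innerHtml": "Test",
    },
}

# json.dumps converts the nested dict into a JSON string.
as_json = json.dumps(result, indent=2)
print(as_json)
```

Parsing the string again with `json.loads` gives back an equal dictionary, so the structure survives a round trip.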
If there's no HTML content, the main extraction function (extractHtmlTagAsDict) will just return the string back:
myOnlyTextContent = Html('This is not HTML.')
myOnlyTextContent.extractHtmlTagAsDict()
Output:
'This is not HTML.'
If there's HTML content mixed with text and comments, the main extraction function (extractHtmlTagAsDict) will return a list containing each tag, text fragment and comment:
myHtmlAndTextContent = Html('''
<p>
This is a paragraph.<br>
</p>
This is text with HTML (and a comment).
<!-- A comment -->
''')
myHtmlAndTextContent.extractHtmlTagAsDict()
Output:
[
{
name: 'p',
content: '<p>This is a paragraph.<br></p>',
innerHtml: [
'This is a paragraph.',
{
name: 'br',
content: '<br>'
}
]
},
'This is text with HTML (and a comment).',
{
comment: '<!-- A comment -->'
}
]
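Because the return value can be a string, a dictionary, or a list depending on the input, calling code usually has to branch on the type. The following is a minimal sketch of one way to do that. `collect_tag_names` is a hypothetical helper, not part of this script, and it runs on a hand-written copy of the output above, assuming the key names `name`, `innerHtml`, and `comment` shown in the examples.

```python
def collect_tag_names(node):
    """Return the names of all tags in an extraction result, depth first."""
    if isinstance(node, str):
        # Plain text carries no tag name.
        return []
    if isinstance(node, list):
        # Mixed content: visit each tag, text fragment, or comment in order.
        names = []
        for item in node:
            names.extend(collect_tag_names(item))
        return names
    if isinstance(node, dict):
        # Comment entries have no "name" key, so they contribute nothing.
        names = [node["name"]] if "name" in node else []
        if "innerHtml" in node:
            names.extend(collect_tag_names(node["innerHtml"]))
        return names
    raise TypeError(f"unexpected node type: {type(node).__name__}")


# Hand-written copy of the documented output for the mixed example.
result = [
    {
        "name": "p",
        "content": "<p>This is a paragraph.<br></p>",
        "innerHtml": ["This is a paragraph.", {"name": "br", "content": "<br>"}],
    },
    "This is text with HTML (and a comment).",
    {"comment": "<!-- A comment -->"},
]

print(collect_tag_names(result))  # ['p', 'br']
```

The same helper accepts the text-only result as well: passing `'This is not HTML.'` returns an empty list.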