1. HTML file format
```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>My test page</title>
  </head>
  <body>
    <p>This is my page</p>
  </body>
</html>
```
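2. Generating HTML documents in Python
On the surface, an .html document is a series of lines of text containing tags, but it can be abstracted as a tree of tags.
Here xml.etree.ElementTree is used to manage the tag tree of the .html document and to serialize that tree into an .html document.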
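To make the tag-tree view concrete, the sketch below parses the page from section 1 with `et.fromstring` and prints each element indented by its depth. The `<!DOCTYPE html>` line is left out because it is not an element of the tree, and `<meta>` is written as `<meta charset="utf-8"/>` because the XML parser rejects an unclosed tag. The helper `print_tree` is a hypothetical name used only for illustration.

```python
import xml.etree.ElementTree as et

# The page from section 1, minus the DOCTYPE line and with <meta/> closed for the XML parser
page = """<html>
<head>
<meta charset="utf-8"/>
<title>My test page</title>
</head>
<body>
<p>This is my page</p>
</body>
</html>"""

root = et.fromstring(page)


def print_tree(element, depth=0):
    """Hypothetical helper: print one tag per line, indented by its depth in the tree."""
    print("  " * depth + element.tag)
    for child in element:
        print_tree(child, depth + 1)


print_tree(root)
# html
#   head
#     meta
#     title
#   body
#     p
```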
2.1 Basic tree structure
```python
import xml.etree.ElementTree as et


class HtmlTree(object):
    doctype_str = "<!DOCTYPE html>"

    def __init__(self):
        # Root <html> with <head> and <body> as its two children, in that order
        self.html_ele = et.Element("html")
        self.head_ele = et.SubElement(self.html_ele, "head")
        self.body_ele = et.SubElement(self.html_ele, "body")
        # <head> holds the charset declaration and the (initially empty) <title>
        self.charset_ele = et.SubElement(self.head_ele, "meta", attrib={"charset": "utf-8"})
        self.title_ele = et.SubElement(self.head_ele, "title")
```
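`et.SubElement(parent, tag, attrib)` creates a new element and appends it to `parent`, so the order of the calls in `__init__` fixes the document order (head before body, meta before title). A minimal check that builds the same skeleton once with `SubElement` and once with `Element` plus `append`; the variable names are illustrative only:

```python
import xml.etree.ElementTree as et

# Skeleton built with SubElement, as in HtmlTree.__init__
html_a = et.Element("html")
head_a = et.SubElement(html_a, "head")
et.SubElement(html_a, "body")
et.SubElement(head_a, "meta", attrib={"charset": "utf-8"})
et.SubElement(head_a, "title")

# The same skeleton built with Element + append
html_b = et.Element("html")
head_b = et.Element("head")
html_b.append(head_b)
html_b.append(et.Element("body"))
head_b.append(et.Element("meta", attrib={"charset": "utf-8"}))
head_b.append(et.Element("title"))

print([child.tag for child in html_a])  # ['head', 'body']
print(et.tostring(html_a, encoding="unicode") == et.tostring(html_b, encoding="unicode"))  # True
```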
2.2 Converting the tree to a string
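```python
class HtmlTree(object):
    # ...
    def __str__(self):
        # method="html" writes void elements such as <meta> without a closing tag
        # and gives empty non-void elements an explicit one (<title></title>);
        # the default XML method would emit <title /> and <body />, which HTML does not honor
        html_str = et.tostring(self.html_ele, encoding="unicode", method="html")
        return self.doctype_str + "\n" + html_str
```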
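`et.tostring` defaults to XML serialization, which writes a childless, textless element in self-closing form such as `<title />`. HTML parsers ignore the `/` on non-void elements like `<title>` and `<body>`, so that form does not close them. The standard library also accepts `method="html"`, which closes non-void elements explicitly and leaves void elements like `<meta>` unclosed. A small comparison on a fresh `<head>`:

```python
import xml.etree.ElementTree as et

head = et.Element("head")
et.SubElement(head, "meta", attrib={"charset": "utf-8"})
et.SubElement(head, "title")  # no text yet, like a freshly created HtmlTree

xml_str = et.tostring(head, encoding="unicode")                  # default: method="xml"
html_str = et.tostring(head, encoding="unicode", method="html")

print(xml_str)   # <head><meta charset="utf-8" /><title /></head>
print(html_str)  # <head><meta charset="utf-8"><title></title></head>
```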
2.3 Setting the title
```python
class HtmlTree(object):
    # ...
    def set_title(self, title_str):
        self.title_ele.text = title_str
```
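Because `set_title` assigns to `Element.text`, the string is treated as plain text: when the tree is serialized, `&`, `<` and `>` are escaped, so a title cannot inject markup. A quick check with a made-up title:

```python
import xml.etree.ElementTree as et

title = et.Element("title")
title.text = "Tips & tricks for <html>"  # the same assignment set_title performs

serialized = et.tostring(title, encoding="unicode")
print(serialized)  # <title>Tips &amp; tricks for &lt;html&gt;</title>
```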
2.4 Setting the body
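```python
class HtmlTree(object):
    # ...
    def set_body(self, body_str):
        # Wrap the fragment in <body> so it parses as a single root element.
        # ElementTree is an XML parser: body_str must be well-formed XML
        # (e.g. <br/> rather than <br>), otherwise et.ParseError is raised.
        body_str = "<body>" + body_str + "</body>"
        body_subtree = et.fromstring(body_str)
        # Copy the content of the parsed <body>, modeled on the source of Element.copy()
        self.body_ele.text = body_subtree.text
        self.body_ele.tail = body_subtree.tail
        self.body_ele[:] = list(body_subtree)  # copy the child nodes
```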
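Putting sections 2.1 to 2.4 together, the sketch below assembles the class in one place and rebuilds the page from section 1. It assumes serialization with `method="html"` (so empty non-void tags get explicit closing tags) and uses `list(body_subtree)` for the child copy; the sample strings passed to `set_body` are illustrative only. The last two calls show how mixed text is split between `body.text` and a child's `tail`, and what happens with markup that is valid HTML but not well-formed XML.

```python
import xml.etree.ElementTree as et


class HtmlTree(object):
    """HtmlTree from sections 2.1-2.4 in one place, serialized with method="html"."""

    doctype_str = "<!DOCTYPE html>"

    def __init__(self):
        self.html_ele = et.Element("html")
        self.head_ele = et.SubElement(self.html_ele, "head")
        self.body_ele = et.SubElement(self.html_ele, "body")
        self.charset_ele = et.SubElement(self.head_ele, "meta", attrib={"charset": "utf-8"})
        self.title_ele = et.SubElement(self.head_ele, "title")

    def __str__(self):
        # method="html": <meta> stays unclosed, empty non-void tags get </tag>
        html_str = et.tostring(self.html_ele, encoding="unicode", method="html")
        return self.doctype_str + "\n" + html_str

    def set_title(self, title_str):
        self.title_ele.text = title_str

    def set_body(self, body_str):
        # body_str must be well-formed XML, otherwise et.ParseError is raised
        body_subtree = et.fromstring("<body>" + body_str + "</body>")
        self.body_ele.text = body_subtree.text
        self.body_ele.tail = body_subtree.tail
        self.body_ele[:] = list(body_subtree)


page = HtmlTree()
page.set_title("My test page")
page.set_body("<p>This is my page</p>")
print(page)
# <!DOCTYPE html>
# <html><head><meta charset="utf-8"><title>My test page</title></head><body><p>This is my page</p></body></html>

# Mixed content: "Hello " becomes body.text, "!" becomes the tail of <b>
page.set_body("Hello <b>world</b>!")
print(str(page).splitlines()[1])
# <html><head><meta charset="utf-8"><title>My test page</title></head><body>Hello <b>world</b>!</body></html>

try:
    page.set_body("line one<br>line two")  # <br> is never closed, so this is not XML
except et.ParseError as exc:
    # fromstring fails before the body is touched, so the previous body is kept
    print("not well-formed XML:", exc)
```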