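The problem to solve now: for certain rich-text endpoints, the common HTML rich-text tags have to be kept and cannot be stripped out wholesale.
Targeting specific endpoints or URLs is the easy part: just check the URI inside the filter.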
@Override
protected void doFilterInternal(HttpServletRequest httpServletRequest, HttpServletResponse httpServletResponse, FilterChain filterChain) throws ServletException, IOException {
    String uri = httpServletRequest.getRequestURI();
    if (uri.startsWith("/filter/richText")) {
        // Rich-text endpoints go through the rich-text wrapper
        RichTextParamtersWrapper wrapper = new RichTextParamtersWrapper(httpServletRequest);
        filterChain.doFilter(wrapper, httpServletResponse);
        return;
    }
    // Everything else goes through the regular wrapper
    ModifyParametersWrapper wrapper = new ModifyParametersWrapper(httpServletRequest);
    filterChain.doFilter(wrapper, httpServletResponse);
}
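If more than one endpoint ends up needing the rich-text treatment, the URI check could be pulled out into a small helper. This is only a sketch: the class name RichTextRoutes and the second prefix are made up for illustration, and only "/filter/richText" comes from the filter above.

```java
import java.util.Arrays;
import java.util.List;

public class RichTextRoutes {
    // Hypothetical prefix list; only "/filter/richText" is from the post.
    private static final List<String> RICH_TEXT_PREFIXES =
            Arrays.asList("/filter/richText", "/article/save");

    /** Returns true when the request URI should go through the rich-text wrapper. */
    public static boolean isRichText(String uri) {
        if (uri == null) {
            return false;
        }
        for (String prefix : RICH_TEXT_PREFIXES) {
            if (uri.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

The filter would then call RichTextRoutes.isRichText(uri) in place of the inline startsWith. One thing to keep in mind: getRequestURI() includes the context path, so if the application is not deployed at the root, the context path would have to be stripped before comparing.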
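The hard part is the common rich-text tags themselves. Writing regexes for them is not happening; I will never in this lifetime be able to write regex ( ╯□╰ )
After some searching, it looks like Jsoup can do the job. Its main purpose is parsing HTML and it is commonly used for crawlers, but we can also filter rich text with it by supplying a whitelist (a blacklist is not recommended; a whitelist gives tighter control).
Reference: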
https://blog.csdn.net/skyrunner06/article/details/25876693
Dependency added:
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.8.3</version>
</dependency>
The allowed tags are loaded from a whitelist config file (easy to change later) and added to Jsoup's Whitelist rules.
// Imports assumed from the calls used: fastjson, commons-io, Spring
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.apache.commons.io.IOUtils;
import org.jsoup.safety.Whitelist;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.Resource;

import java.io.IOException;
import java.io.InputStream;
import java.util.Map;

public class JsoupUtil {
    // volatile so the double-checked locking below publishes a fully built whitelist
    public static volatile Whitelist whitelist = null;

    /**
     * Builds the Jsoup tag whitelist from whitelist.conf on first use.
     * @return the shared whitelist
     */
    public static Whitelist initWhiteList() {
        if (whitelist == null) {
            // Lock on the class: synchronized (new Object()) locks a fresh object on every call and guards nothing
            synchronized (JsoupUtil.class) {
                if (whitelist == null) {
                    Whitelist result = Whitelist.relaxed();
                    String jsonString;
                    Resource resource = new ClassPathResource("/whitelist.conf");
                    // Read through getInputStream(): getFile() fails when the resource is packed inside a jar
                    try (InputStream input = resource.getInputStream()) {
                        // Explicit UTF-8, since the config contains Chinese font names
                        jsonString = IOUtils.toString(input, "UTF-8");
                    } catch (IOException e) {
                        // Fail fast instead of continuing with a null config
                        throw new IllegalStateException("Cannot read whitelist.conf", e);
                    }
                    JSONObject whiteJson = JSONObject.parseObject(jsonString);
                    JSONObject whiteListMap = whiteJson.getJSONObject("whiteList");
                    JSONObject protocolsMap = whiteJson.getJSONObject("protocols");
                    // Allowed tags and, per tag, the allowed attribute names
                    for (Map.Entry<String, Object> entry : whiteListMap.entrySet()) {
                        String tag = entry.getKey();
                        result.addTags(tag);
                        JSONObject attributes = (JSONObject) entry.getValue();
                        for (Map.Entry<String, Object> attributeEntry : attributes.entrySet()) {
                            result.addAttributes(tag, attributeEntry.getKey());
                        }
                    }
                    // Keys have the form "tag.attribute", e.g. "a.href"
                    for (Map.Entry<String, Object> entry : protocolsMap.entrySet()) {
                        String name = entry.getKey();
                        int dot = name.indexOf(".");
                        String tag = name.substring(0, dot);
                        String key = name.substring(dot + 1);
                        JSONArray jsonArray = JSONArray.parseArray(entry.getValue().toString());
                        for (int i = 0; i < jsonArray.size(); i++) {
                            String value = jsonArray.getString(i);
                            // Add protocols to a URL attribute, e.g. addProtocols("a", "href", "ftp", "http", "https")
                            // lets the href of an <a> tag point to ftp, http and https
                            result.addProtocols(tag, key, value);
                        }
                    }
                    whitelist = result;
                }
            }
        }
        return whitelist;
    }
}
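The protocol keys in the config are split on the first dot, so a key without a dot makes substring throw a rather unhelpful StringIndexOutOfBoundsException. A stricter split could look something like the sketch below. ProtocolKey is a name I made up for illustration; it is not part of Jsoup or of the class above.

```java
public class ProtocolKey {
    public final String tag;
    public final String attribute;

    private ProtocolKey(String tag, String attribute) {
        this.tag = tag;
        this.attribute = attribute;
    }

    /** Splits "a.href" into tag "a" and attribute "href"; rejects malformed keys. */
    public static ProtocolKey parse(String key) {
        int dot = (key == null) ? -1 : key.indexOf('.');
        // Reject a null key, a missing dot, an empty tag, or an empty attribute
        if (dot <= 0 || dot == key.length() - 1) {
            throw new IllegalArgumentException("Expected \"tag.attribute\", got: " + key);
        }
        return new ProtocolKey(key.substring(0, dot), key.substring(dot + 1));
    }
}
```

With this in place a typo in the config file fails with a message that names the offending key, which is easier to track down than an index error.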
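Usage:
/**
 * Character replacement for rich text.
 * Only HTML tags that appear in the whitelist are kept.
 * @param value the raw rich-text HTML
 * @return the cleaned HTML
 */
public static String richText(String value){
    // initWhiteList() returns the shared whitelist, building it on the first call
    return Jsoup.clean(value, "", JsoupUtil.initWhiteList());
}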
Finally, the whitelist config file:
{
"whiteList":{
"a":{
"href":"",
"title":""
},
"b":{
},
"blockquote":{
"cite":""
},
"br":{
},
"caption":{
},
"cite":{
},
"code":{
},
"col":{
"span":"",
"width":""
},
"colgroup":{
"span":"",
"width":""
},
"dd":{
},
"div":{
"style":"/^text-align:\\s*(left|right|center);?\\s*$/i"
},
"dl":{
},
"dt":{
},
"em":{
},
"h1":{
},
"h2":{
},
"h3":{
},
"h4":{
},
"h5":{
},
"h6":{
},
"i":{
},
"img":{
"align":"",
"alt":"",
"height":"",
"src":"",
"title":"",
"width":""
},
"li":{
"class":"",
"style":"/^text-align:\\s*(left|right|center);?\\s*$/i"
},
"ol":{
"start":"",
"type":""
},
"p":{
"style":"/^text-align:\\s*(left|right|center);?\\s*$/i"
},
"pre":{
},
"q":{
"cite":""
},
"small":{
},
"span":{
"style":"/^\\s*font-family\\s*:\\s*(('|\\\"|\"|')?(楷體|楷體_GB2312|宋體|微軟雅黑|黑體|,|\\s|\\w|sans-serif)('|\\\"|\"|')?)+;?\\s*|\\s*(color|font-size|background-color)\\s*:\\s*(#\\w*|[\\w\\s]*|rgb\\s*\\(\\s*\\d+\\s*,\\s*\\d+\\s*,\\s*\\d+\\s*\\));?\\s*|\\s*text-decoration\\s*:\\s*(underline|overline|line-through|blink)\\s*;?\\s*$/i"
},
"strike":{
},
"strong":{
},
"sub":{
},
"sup":{
},
"table":{
"summary":"",
"width":""
},
"tbody":{
},
"td":{
"abbr":"",
"axis":"",
"colspan":"",
"rowspan":"",
"width":""
},
"tfoot":{
},
"th":{
"abbr":"",
"axis":"",
"colspan":"",
"rowspan":"",
"scope":"",
"width":""
},
"thead":{
},
"tr":{
},
"u":{
},
"ul":{
"type":"",
"class":""
}
},
"protocols":{
"a.href":[
"ftp",
"http",
"https",
"mailto"
],
"blockquote.cite":[
"http",
"https"
],
"cite.cite":[
"http",
"https"
],
"img.src":[
"http",
"https"
],
"q.cite":[
"http",
"https"
]
}
}
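One thing worth noting about this file: the style entries carry JavaScript-style regex literals, but JsoupUtil only reads the attribute names, and Whitelist.addAttributes itself takes names only, so those patterns are never applied. If they were to be enforced, the text-align one translates to Java roughly as below. This is a sketch under that assumption; StyleCheck is a made-up name, and where to call it (for example in a pass over the cleaned HTML) is left open.

```java
import java.util.regex.Pattern;

public class StyleCheck {
    // Java form of the text-align pattern from the config:
    // the /.../i wrapper becomes Pattern.CASE_INSENSITIVE.
    private static final Pattern TEXT_ALIGN = Pattern.compile(
            "^text-align:\\s*(left|right|center);?\\s*$", Pattern.CASE_INSENSITIVE);

    /** Returns true when the style value is nothing but a text-align declaration. */
    public static boolean isAllowedTextAlign(String style) {
        return style != null && TEXT_ALIGN.matcher(style).matches();
    }
}
```

A value like "text-align: center;" passes, while anything with extra declarations appended does not.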
One rich-text filter, acquired √