
Removing Extraneous HTML with C# Code

Date: 2010-09-11  Source: Simcoder

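Every company profile in the database was saved from a rich-text editor, so the styling markup was stored along with the text. When reading a profile back I only want the text content. How can that be done?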

 

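        #region Filter HTML
        /// <summary>
        /// Strips HTML tags and decodes a few common entities.
        /// </summary>
        /// <param name="strHtml">The HTML content</param>
        /// <returns>The input with tags, comments and line breaks removed</returns>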
        public static string StripHTML(string strHtml)
        {
            string[] aryReg ={
                                  // [\s\S] instead of . so that script blocks spanning several lines are matched too
                                  @"<script[^>]*?>[\s\S]*?</script>",

                                  @"<(\/\s*)?!?((\w+:)?\w+)(\w+(\s*=?\s*(([""'])(\\[""'tbnr]|[^\7])*?\7|\w+)|.{0})|\s)*?(\/\s*)?>",
                                  @"([\r\n])[\s]+",
                                  @"&(quot|#34);",
                                  @"&(amp|#38);",
                                  @"&(lt|#60);",
                                  @"&(gt|#62);", 
                                  @"&(nbsp|#160);", 
                                  @"&(iexcl|#161);",
                                  @"&(cent|#162);",
                                  @"&(pound|#163);",
                                  @"&(copy|#169);",
                                  @"&#(\d+);",
                                  @"-->",
                                  @"<!--.*\n"
                              };

            string[] aryRep = {
                                   "",
                                   "",
                                   "",
                                   "\"",
                                   "&",
                                   "<",
                                   ">",
                                   " ",
                                   "\xa1",//chr(161),
                                   "\xa2",//chr(162),
                                   "\xa3",//chr(163),
                                   "\xa9",//chr(169),
                                   "",
                                   "\r\n",
                                   ""
                               };

            string strOutput = strHtml;
            for (int i = 0; i < aryReg.Length; i++)
            {
                System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(aryReg[i], System.Text.RegularExpressions.RegexOptions.IgnoreCase);
                strOutput = regex.Replace(strOutput, aryRep[i]);
            }
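            // String.Replace returns a new string, so the result has to be assigned back.
            strOutput = strOutput.Replace("<", "");
            strOutput = strOutput.Replace(">", "");
            strOutput = strOutput.Replace("\r\n", "");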
            return strOutput;
        }
        #endregion
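
For comparison, here is a shorter sketch of the same idea. It is not the code from this article. It assumes .NET 4.0 or later, so that System.Net.WebUtility.HtmlDecode can decode the entities instead of a hand-written table. The class name HtmlTextSketch and the method name ToPlainText are made up for illustration. A regex is not a real HTML parser, so attribute values that contain ">" may still trip it up.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original article.
public static class HtmlTextSketch
{
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Remove script and style blocks together with their contents.
        // [\s\S] matches any character, including line breaks.
        string text = Regex.Replace(html,
            @"<(script|style)\b[^>]*>[\s\S]*?</\1\s*>", "",
            RegexOptions.IgnoreCase);

        // Remove HTML comments.
        text = Regex.Replace(text, @"<!--[\s\S]*?-->", "");

        // Remove whatever tags are left.
        text = Regex.Replace(text, @"<[^>]+>", "");

        // Decode entities such as &amp; &lt; &quot; &#169;.
        text = WebUtility.HtmlDecode(text);

        // Turn non-breaking spaces into plain spaces, then collapse
        // runs of whitespace (including line breaks) into one space.
        text = text.Replace('\u00A0', ' ');
        text = Regex.Replace(text, @"\s+", " ").Trim();

        return text;
    }

    public static void Main()
    {
        string html = "<p style=\"color:red\">Acme &amp; Co.</p>\n" +
                      "<script>alert(1);</script><!-- note -->" +
                      "<b>Founded&nbsp;1999</b>";
        Console.WriteLine(ToPlainText(html)); // prints: Acme & Co. Founded 1999
    }
}
```

Unlike StripHTML above, this variant decodes entities after the tags are gone and keeps a decoded "<" or ">" in the result, so text like "1 &lt; 2" comes back as "1 < 2".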

 

 
