文章详情

  • 游戏榜单
  • 软件榜单
关闭导航
热搜榜
热门下载
热门标签
php爱好者> php文档>搜索框关键词优先匹配算法

搜索框关键词优先匹配算法

时间:2011-03-10  来源:jannyz

结合jquery的autocomplete的应用,通常为根据输入关键词进行匹配查找给出相关的suggestion选项列表,对于一个关键词(或将用户输入看作一个词进行完全匹配)的查找比较简单,但类似google搜索引擎的搜索输入框则要求多关键词的组合匹配(对于关键词的取舍还要进行语义分析,这部分暂且不说)。

对于多个关键词(以“a b c d”为例,全匹配度M=4)匹配优先级及相应的正则表达式应依次为:

  1. 匹配所有[匹配度m=4]关键词(忽略空字符,连续且保持原有顺序) :a\s*b\s*c\s*d
  2. 匹配所有[m=4]关键词(允许不连续,但保持原有顺序) :a.*b.*c.*d
  3. 匹配所有[m=4]关键词(不要求顺序):(该步暂未实现,寻求好的正则表达)
  4. 从"a b c d"中去掉一个关键词(从后往前删),即匹配其中[m=3]关键词,得到关键词组合"a b c","a b d","a c d","b c d"共4组,然后对这四组按全匹配度M=3分别做1~3步骤;
  5. 然后从前面4组中分别按步骤4的方法得到[m=2]的关键词组合,应为"a b","a c","b c";"a b","a d","b d";"a c","a d","c d";"b c","b d","c d",然后distinct一下得到共6组[m=2]的关键词组合,同样按全匹配度M=2分别做1~3步骤;
  6. 若m>1则继续按步骤5依次对递减的关键词组合进行匹配;
  7. 最后匹配[m=1]的关键词:a|b|c|d

C#中的算法实现如下:

1. 在"public static class DataFactory"类中添加通用方法:

        public static IEnumerable<TSource> FilterList<TSource>
        (
            IEnumerable<TSource> source
            , String term
            , int? maxCount
            , IEqualityComparer<TSource> comparer
            , Func<TSource, Regex, bool> regexMatch
        )
        {
            if (!maxCount.HasValue)
            {
                maxCount = int.MaxValue;
            }

            //简单提取关键词
            term = Regex.Replace(term, @"(\*|\^|\\|\$|\.|\+|\?|\=|\!|\:|\||\/|\(|\)|\[|\]|\{|\})", @"\$1");
            term = term.ToUpper(System.Threading.Thread.CurrentThread.CurrentUICulture);
            term = Regex.Replace(term, " +", "|");
            term = term.Trim(new char[] { '|' });

            //按关键词组合的优先匹配规则生成匹配正则表达式列表
            int cnt = term.Split('|').Length;
            List<string> regList = new List<string>();
            List<string> buildWords = new List<string>(new string[] { term });
            do
            {
                List<string> tempBuildWords = new List<string>();
                for (int widx = 0; widx < buildWords.Count; widx++)
                {
                    string tempTerm = buildWords[widx];
                    regList.AddRange(new string[]{
                        tempTerm.Replace("|", "\\s*"), //步骤1
                        tempTerm.Replace("|", ".*") //步骤2
                    });

                    string[] words = tempTerm.Split('|');
                    for (int i = words.Length - 1; i >= 0; i--)
                    {
                        tempBuildWords.Add(("|" + tempTerm + "|").Replace("|" + words[i] + "|", "|").Trim(new char[] { '|' }));
                    }
                }
                buildWords = tempBuildWords.Distinct<string>().ToList<string>();
                cnt--;
            } while (cnt > 1);
            //match a keyword at least
            regList.Add(term);

            //按优先级从高到低匹配返回最终结果
            List<TSource> matchList = new List<TSource>();
            foreach (string regKey in regList)
            {
                Regex regex = new Regex(regKey, RegexOptions.IgnoreCase);
                int targetCount = maxCount.Value - matchList.Count<TSource>();
                foreach (var item in source.Except<TSource>(matchList, comparer))
                {
                    if (regexMatch(item, regex))
                    {
                        matchList.Add(item);
                        if (--targetCount == 0)
                        {
                            return matchList;
                        }
                    }
                }
            }

            return matchList;
        }

2. 在实体工厂类中添加实例方法:

    public class ProductFactory : BaseFactory, IFactory
    {
        private List<Product> linerProductList = new List<Product>();

        //……

        public IEnumerable<Product> FilterFromAllPESProducts(String term)
        {
            return FilterFromAllPESProducts(term, null);
        }
        public IEnumerable<Product> FilterFromAllPESProducts(String term, int? maxCount)
        {
            ProductEqualityComparer comparer = new ProductEqualityComparer();
            var pesProducts = this.linerProductList.Where(p => p.PESID.HasValue).Distinct(comparer);
            //调用静态搜索方法
            return DataFactory.FilterList<Product>(pesProducts, term, maxCount, comparer, ProductRegexMatch);
        }

        //设定具体实例的匹配规则
        public bool ProductRegexMatch(Product item, Regex regex)
        {
            return regex.IsMatch(item.Name);
        }

    }

    /// <summary>
    /// Equality comparer for product
    /// Used to remove duplicate products
    /// TODO: revise later     
    /// </summary>
    public class ProductEqualityComparer : IEqualityComparer<Product>
    {
        #region IEqualityComparer<Product> Members

        bool IEqualityComparer<Product>.Equals(Product x, Product y)
        {
            return x.ID == y.ID;
        }

        int IEqualityComparer<Product>.GetHashCode(Product obj)
        {
            return obj.ID.GetHashCode();
        }

        #endregion
    }

3. 应用层或业务逻辑层调用实体工厂类中的搜索方法:

        [OperationContract]
        [AspNetCacheProfile("ClientCacheProfile")]
        [WebGet(ResponseFormat = WebMessageFormat.Json)]
        public IEnumerable<product> FilterProducts(String term)
        {
            IEnumerable<product> products = null;
            products = DataFactory.GetProductFactory(Context.Current).FilterFromAllPESProducts(term, Constant.MaxItemOfAutoComplete);

            if (products == null || products.Count() == 0)
            {
                products = new List<product>();
                (products as List<product>).Add(new Product()
                {
                    ID = 0,
                    PESID = 0,
                    Name = Resources.String.Filter_NoProgramMatch
                });
            }
            return products;
        }

至此,搜索出来的结果基本能满足一般项目的需求。其中步骤3的匹配规则未实现,待后续补充……

相关阅读 更多 +
排行榜 更多 +
辰域智控app

辰域智控app

系统工具 下载
网医联盟app

网医联盟app

运动健身 下载
汇丰汇选App

汇丰汇选App

金融理财 下载