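Continuing with the Baidu URL problem
Date: 2008-09-04  Source: ubuntuer
The task: given a URL as input, report whether it is an index page (home page), a directory page, or some other URL.
URLs of the following form are index pages: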
militia.info/
www.apcnc.com.cn/
http://www.cyjzs.comwww.greena888.com/
www.800cool.net/
http://hgh-products.my-age.net/
URLs of the following form are directory pages:
thursdaythree.net/greenhouses--gas-global-green-house-warming/
http://www.mw.net.tw/user/tgk5ar1r/profile/
http://www.szeasy.com/food/yszt/chunjie/
www.fuckingjapanese.com/Reality/
Note:
a) A URL may or may not carry the http:// prefix.
b) Dynamic URLs (those containing "?") never count as directory pages, for example:
www.buddhismcity.net/utility/mailit.php?l=/activity/details/3135/
www.buddhismcity.net/utility/mailit.php?l=/activity/details/2449/
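The rules above can be sketched as a small bash function. This is only an illustration under stated assumptions: the name classify_url and the two regular expressions are mine, not part of the original problem, and it assumes that index and directory pages always end in a slash (as every sample does) and that any URL containing "?" is reported as an other page.

```shell
#!/bin/bash
# classify_url: hypothetical helper (not from the original post).
# Prints "index page", "directory page" or "other page" for one URL.
classify_url() {
    local url=$1
    # Rule a): the http:// prefix is optional, so drop it if present.
    local rest=${url#http://}
    # Patterns are kept in variables so [[ =~ ]] treats them as regexes.
    local index_re='^[^/]+/$'            # host/ and nothing else
    local dir_re='^[^/]+(/[^/]+)+/$'     # host/dir/.../ ending in a slash

    if [[ $rest == *\?* ]]; then
        # Rule b): dynamic URLs are never directory pages.
        # Assumption: they are not index pages either.
        echo "other page"
    elif [[ $rest =~ $index_re ]]; then
        echo "index page"
    elif [[ $rest =~ $dir_re ]]; then
        echo "directory page"
    else
        echo "other page"
    fi
}

classify_url 'militia.info/'
classify_url 'http://www.szeasy.com/food/yszt/chunjie/'
classify_url 'www.buddhismcity.net/utility/mailit.php?l=/activity/details/3135/'
```

The three calls should print index page, directory page and other page, in that order. Testing for "?" first keeps the two regular expressions simple, since neither has to exclude query strings itself.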
Here is the code I wrote:
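#!/bin/bash
# Classify every URL listed in the file "urls" (one URL per line).
while IFS= read -r url
do
    # Skip blank lines.
    [ -z "$url" ] && continue
    # Strip the optional http:// prefix before matching.
    path=$(printf '%s\n' "$url" | sed 's/^http:\/\///')
    if printf '%s\n' "$path" | grep -qF '?'
    then
        # Dynamic URLs (containing "?") count as other pages.
        echo "$url is other page"
    elif printf '%s\n' "$path" | grep -qE '^[^/]+/$'
    then
        # host/ and nothing else
        echo "$url is index page"
    elif printf '%s\n' "$path" | grep -qE '^[^/]+(/[^/]+)+/$'
    then
        # host/dir/.../ ending in a slash
        echo "$url is directory page"
    else
        echo "$url is other page"
    fi
done < urls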
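One way to check a classifier like the script above is a small table-driven test built from the sample URLs in the problem statement. The sketch below is an assumption-laden illustration: classify is a stand-in written with case patterns and parameter expansion rather than the script itself, and the one-word labels index, directory and other are my own shorthand.

```shell
#!/bin/bash
# classify: stand-in classifier (an assumption, not the author's script).
classify() {
    local rest=${1#http://}          # drop the optional http:// prefix
    case $rest in
        *\?*) echo other; return ;;  # dynamic URL
        */)   rest=${rest%/} ;;      # ends in a slash: drop that slash
        *)    echo other; return ;;  # no trailing slash
    esac
    case $rest in
        '')  echo other ;;           # nothing before the slash
        */*) echo directory ;;       # a slash remains, so there is a path
        *)   echo index ;;           # only the host is left
    esac
}

# Each line of the table is "<url> <expected label>".
failures=0
while read -r url expected
do
    actual=$(classify "$url")
    if [ "$actual" != "$expected" ]; then
        echo "MISMATCH: $url expected=$expected actual=$actual"
        failures=$((failures + 1))
    fi
done <<'EOF'
militia.info/ index
www.apcnc.com.cn/ index
http://hgh-products.my-age.net/ index
thursdaythree.net/greenhouses--gas-global-green-house-warming/ directory
http://www.mw.net.tw/user/tgk5ar1r/profile/ directory
www.buddhismcity.net/utility/mailit.php?l=/activity/details/3135/ other
EOF
echo "failures: $failures"
```

The loop reads from a here-document rather than a pipe, so it runs in the current shell and the failures counter is still set when the final echo runs.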