How to Write robots.txt [Zhangjiajie IT net]

Syntax: the simplest robots.txt file uses two rules:

User-agent: the robot the following rules apply to
Disallow: the web pages you want to block

Common configurations:

1. Allow all search engines to index the site: leave robots.txt empty; nothing else is needed.

2. Forbid all search engines from certain directories of the site:
User-agent: *
Disallow: /directory-name-1/
Disallow: /directory-name-2/
Disallow: /directory-name-3/

3. Forbid one search engine from indexing the site, for example Baidu:
User-agent: Baiduspider
Disallow: /

4. Forbid all search engines from indexing the site:
User-agent: *
Disallow: /

robots.txt is used to tell search engine robots which content they may fetch. It must be placed in the root directory of the site, and the file name must be all lowercase.

The format of the robots.txt file:

User-agent: names the search engine spider a record applies to
Disallow: defines addresses the search engine is forbidden to fetch
Allow: defines addresses the search engine is allowed to include

The spider names in common use are (User-agent values are case sensitive):

Google spider: Googlebot
Baidu spider: Baiduspider
Yahoo spider: Yahoo! Slurp
Alexa spider: ia_archiver
Bing spider: MSNbot
AltaVista spider: scooter
Lycos spider: lycos_spider_(T-Rex)
AlltheWeb spider: fast-webcrawler
Inktomi spider: slurp
Soso spider: Sosospider
Google AdSense spider: Mediapartners-Google
Youdao spider: YoudaoBot

What is robots.txt?

robots.txt is the first file a search engine looks at when it visits a website; it tells the spider which files on the server may be viewed. When a search spider visits a site, it first checks whether robots.txt exists in the site's root directory. If it exists, the spider determines its crawl scope from the file's contents; if it does not, the spider can reach every page on the site that is not password protected.

The role of robots.txt in SEO

When optimizing a website, the robots file is often used to keep spiders away from content we do not want crawled. Having used robots.txt files while optimizing sites before, I am writing this article to set down a little of that knowledge.

Search engines collect pages from the Internet automatically through crawler programs (also called search spiders, robots, search robots and so on) and extract information from them. Out of consideration for security and privacy, search engines follow the robots.txt protocol: by creating the plain-text file robots.txt in its root directory, a website can declare the parts it does not want robots to visit, so each site can decide for itself whether it is willing to be indexed at all, or direct search engines to include only specified content. When a search engine crawler visits a site, it first checks whether robots.txt exists in the site's root directory. If the file does not exist, the crawler simply follows links onward through the site; if it does exist, the crawler determines the scope of its visit from the contents of that file.
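
For a quick sanity check of rules like these, Python's standard urllib.robotparser module can parse a block of rules and answer fetch questions per robot. Below is a minimal sketch of scenario 3 above (block Baidu only); the rules block and the example.com URLs are made-up illustrations, not part of the article:

from urllib.robotparser import RobotFileParser

# The rules from scenario 3 above: block Baidu's spider from the whole site.
rules = """\
User-agent: Baiduspider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())   # feed the rules in as a list of lines

# Baiduspider is refused everywhere; robots with no matching record may fetch.
print(parser.can_fetch("Baiduspider", "http://example.com/news/"))   # False
print(parser.can_fetch("Googlebot", "http://example.com/news/"))     # True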

How to write the robots.txt file:

User-agent: *   ("*" here is a wildcard that stands for every kind of search engine)
Disallow: /admin/   (forbids crawling of the directories under the admin directory)
Disallow: /require/   (forbids crawling of the directories under the require directory)
Disallow: /ABC/   (forbids crawling of the directories under the ABC directory)
Disallow: /cgi-bin/*.htm   (forbids access to every URL with the ".htm" suffix under the /cgi-bin/ directory, subdirectories included)
Disallow: /*?*   (forbids access to every dynamic page on the site)
Disallow: /.jpg$   (forbids crawling of all images in .jpg format)
Disallow: /ab/adc.html   (forbids crawling of the adc.html file under the ab folder)
Allow: /cgi-bin/   (allows crawling of the directories under the cgi-bin directory)
Allow: /tmp   (allows crawling of the entire tmp directory)
Allow: .htm$   (allows access only to URLs with the ".htm" suffix)
Allow: .gif$   (allows crawling of web pages and images in .gif format)
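
The "*" and "$" forms above are wildcard extensions popularized by Google rather than part of the original robots.txt standard, and not every spider honors them; Python's urllib.robotparser, for one, treats rule paths as plain prefixes. As a rough sketch of how the extended matching behaves, the hypothetical helper below, assuming just the two wildcard forms described above, translates a rule path into a regular expression:

import re

def rule_matches(pattern, path):
    # Hypothetical sketch: match a robots.txt rule path that may use the
    # '*' (any run of characters) and trailing '$' (end of URL) extensions.
    anchored = pattern.endswith("$")                  # '$' pins the match to the end
    if anchored:
        pattern = pattern[:-1]
    regex = re.escape(pattern).replace(r"\*", ".*")   # let '*' match anything
    regex = "^" + regex + ("$" if anchored else "")   # rules always match from the left
    return re.match(regex, path) is not None

# Disallow: /cgi-bin/*.htm catches .htm files in subdirectories too:
print(rule_matches("/cgi-bin/*.htm", "/cgi-bin/sub/page.htm"))   # True
# Disallow: /*?* catches dynamic URLs anywhere on the site:
print(rule_matches("/*?*", "/products.php?id=5"))                # True
# A plain rule such as /admin/ is an ordinary prefix match:
print(rule_matches("/admin/", "/admin/login.html"))              # True
print(rule_matches("/admin/", "/blog/admin-tips.html"))          # False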

***********************************************************

This article links: [Zhangjiajie IT net]

Zhangjiajie IT network (formerly: Zero IT network) is a comprehensive portal site for webmasters whose main content covers news and information, computer assembly, computer maintenance and repair, website-building tutorials, software tutorials, the mobile phone market, and mobile phone / computer operating system information. Reminder: this site takes on secondary development of open-source programs (CMS) from PHPCMS, PHP168 and DEDECMS to DISCUZ and PHPWIND, as well as site building (site imitation) / templates / website reconstruction / CSS slicing work. [contact QQ:858448386]