Find orphan URLs

Posted on 2023-12-25 17:01:29 (GMT+8)
Another way to use log files is to discover orphan URLs: URLs that you want search engines to crawl and index but that have no internal links pointing to them. We can find them by taking the HTML URLs that Googlebot requested with a 200 status code and checking them in Ahrefs' Site Audit for URLs that have no internal links. I named the resulting report "Orphan URLs". There is one caveat: because these URLs were discovered by Googlebot rather than by Ahrefs, they may be non-indexable, and therefore not URLs we actually want to link to.

When setting up the crawl sources for your Ahrefs project, I recommend using the "Custom URL list" option and pasting these URLs in. That way, Ahrefs takes the orphan URLs found in your log files into account and reports any issues with them the next time it crawls the site.

10. Crawling and monitoring by directory

Suppose you have implemented structured URLs that reflect your site's architecture (for example, /features/feature-page/). In that case, you can also analyze the log files by directory to see whether Googlebot crawls certain parts of the site more than others.
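Both log checks described above, finding orphan URLs and counting Googlebot hits per directory, come down to a set difference and a grouping over log entries. The Python below is a minimal sketch, not the author's Google Sheet or Ahrefs' implementation. It assumes access logs in the Apache/Nginx "combined" format and a list of URL paths exported from your own site crawl, normalized to the same form as the log paths. The function names and sample data are hypothetical. Note that it identifies Googlebot only by a user-agent substring, which can be spoofed (proper verification uses a reverse DNS lookup), and it does not filter for HTML responses.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical sample log lines for illustration only.
SAMPLE_LOG = [
    f'66.249.66.1 - - [25/Dec/2023:10:00:00 +0800] "GET /features/feature-page/ HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [25/Dec/2023:10:05:00 +0800] "GET /blog/old-post/ HTTP/1.1" 200 2048 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.9 - - [25/Dec/2023:10:06:00 +0800] "GET /blog/ HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    f'66.249.66.1 - - [25/Dec/2023:10:07:00 +0800] "GET /missing HTTP/1.1" 404 0 "-" "{GOOGLEBOT_UA}"',
]


def googlebot_paths(lines, status="200"):
    """Yield request paths whose user agent mentions Googlebot and whose status matches.

    A user-agent check alone can be spoofed; reverse DNS verification is omitted.
    """
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m and "Googlebot" in m.group("agent") and m.group("status") == status:
            yield m.group("path")


def orphan_urls(log_paths, crawled_paths):
    """Paths Googlebot fetched successfully that the internal-link crawl never found."""
    return sorted(set(log_paths) - set(crawled_paths))


def hits_by_directory(paths):
    """Count requests per top-level directory, e.g. /features/feature-page/ -> /features/.

    Top-level pages such as /about are counted under the root "/".
    """
    counts = Counter()
    for raw in paths:
        path = urlsplit(raw).path  # drop any query string
        segments = [s for s in path.split("/") if s]
        if segments and (len(segments) > 1 or path.endswith("/")):
            counts[f"/{segments[0]}/"] += 1
        else:
            counts["/"] += 1
    return counts


if __name__ == "__main__":
    bot_paths = list(googlebot_paths(SAMPLE_LOG))
    crawled = {"/features/feature-page/", "/blog/"}  # hypothetical site-crawl export
    print(orphan_urls(bot_paths, crawled))  # ['/blog/old-post/']
    print(hits_by_directory(bot_paths))
```

Grouping on the first path segment only works if your URLs really mirror the site architecture. With flat URLs, you would need some other mapping from URL to section.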


I have implemented this analysis in the "Directories Overview" tab of the Google Sheet. I also included data on the number of internal links within each directory and its total organic traffic, so you can check whether Googlebot spends more time crawling low-traffic directories than high-value ones. Again, keep in mind that this can happen simply because URLs in some directories change more frequently than others. But if you notice an odd trend, it's worth investigating further. In addition to this report, there is a "Directories Crawl trend" report if you want to see crawl trends for each directory on your site.

11. View the Cloudflare cache ratio

Go to the "CF cache status" tab to see a summary of how often Cloudflare caches files on its edge servers. When Cloudflare caches content (the HIT in the image above), the request is no longer sent to your origin server but is served directly from Cloudflare's global CDN. This improves Core Web Vitals, especially for sites with a global audience.

HINT. It's also worth setting up caching on your origin server (such as Varnish, Nginx FastCGI caching, or a Redis full-page cache) so that even when Cloudflare doesn't cache a URL, you still benefit from some caching.
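If you have the raw cache statuses rather than the summary tab, the cache ratio is a simple count. The Python below is a minimal sketch that assumes you already have one cf-cache-status value per request, such as HIT, MISS, or DYNAMIC. How you collect those values is outside the sketch, and the function names are hypothetical.

```python
from collections import Counter


def cache_status_summary(statuses):
    """Return each cache status's share of all requests (0.0-1.0), most common first.

    `statuses` is any iterable of cf-cache-status values; case and surrounding
    whitespace are normalized, and empty values are skipped.
    """
    counts = Counter(s.strip().upper() for s in statuses if s and s.strip())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: count / total for status, count in counts.most_common()}


def edge_hit_ratio(statuses):
    """Share of requests Cloudflare answered from its edge cache (HIT)."""
    return cache_status_summary(statuses).get("HIT", 0.0)


if __name__ == "__main__":
    sample = ["HIT", "hit", "MISS", "DYNAMIC", ""]
    print(cache_status_summary(sample))  # {'HIT': 0.5, 'MISS': 0.25, 'DYNAMIC': 0.25}
    print(f"Edge hit ratio: {edge_hit_ratio(sample):.0%}")  # Edge hit ratio: 50%
```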




If you see a lot of "MISS" or "DYNAMIC" cache statuses, I recommend investigating further to understand why Cloudflare is not caching the content. Common reasons include:

- You link to URLs that contain parameters: Cloudflare passes these requests to your origin server by default because they may be dynamic.
- Your cache expiration times are too short: if you set short cache lifetimes, more users are likely to receive uncached content.
- You aren't preloading your cache: if your cache needs to expire often (because the content changes frequently), use a preloader bot such as Optimus Cache Preloader to warm the cache, rather than letting users be the first to request uncached URLs.

HINT. It is highly recommended to set up HTML edge caching through Cloudflare, which can significantly reduce TTFB. This is easy to do with WordPress and Cloudflare's Automatic Platform Optimization.
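To tell the first and third reasons above apart, you can split uncached requests by whether the URL carries query parameters. You can also rank the parameter-free URLs that most often miss the cache as candidates for preloading. The Python below is a minimal sketch, assuming you have (URL, cache status) pairs from your logs. The set of statuses treated as "uncached" is an assumption, the function names are hypothetical, and the sketch does not show how Cloudflare or Optimus Cache Preloader work internally. It also cannot diagnose short cache expiration times, which you would check in your cache settings instead.

```python
from collections import Counter
from urllib.parse import urlsplit

# Statuses treated as "served by the origin" in this sketch (an assumption;
# adjust to the values that actually appear in your data).
UNCACHED = {"MISS", "DYNAMIC", "EXPIRED", "BYPASS"}


def is_uncached(status):
    return status.strip().upper() in UNCACHED


def uncached_breakdown(records):
    """Count uncached requests, split by whether the URL has a query string."""
    breakdown = Counter()
    for url, status in records:
        if is_uncached(status):
            key = "with_params" if urlsplit(url).query else "without_params"
            breakdown[key] += 1
    return breakdown


def preload_candidates(records, top_n=10):
    """Most-requested parameter-free URLs that were served uncached.

    These are the URLs a cache preloader would most usefully warm first.
    """
    counts = Counter(
        url for url, status in records
        if is_uncached(status) and not urlsplit(url).query
    )
    return [url for url, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    records = [  # hypothetical sample data
        ("/pricing/", "MISS"),
        ("/pricing/", "MISS"),
        ("/blog/?utm_source=newsletter", "DYNAMIC"),
        ("/", "HIT"),
        ("/docs/", "expired"),
    ]
    print(uncached_breakdown(records))
    print(preload_candidates(records, top_n=2))  # ['/pricing/', '/docs/']
```

A high "with_params" count points toward parameterized internal links. A long list of parameter-free misses suggests the cache is expiring before anyone warms it.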
