Blog hosted on GitHub Pages not getting indexed by Google Search #7478
-
I've got a weird problem where none of my Material for MkDocs pages hosted on GitHub Pages are getting indexed by Google Search. No matter what I do in Google Search Console, I just get the "Discovered - currently not indexed" status, and the only thing that works is to manually submit individual URLs for indexing, one by one (very tedious and time-consuming, and rate-limited at around 10 URLs a day). What am I doing wrong - or what could be the root cause of this behavior?

What happened?

I used to run a Jekyll-based blog for this, also hosted on GitHub Pages (but using the built-in GitHub Pages GHA workflow for publishing). I also used Google Analytics at the time. This worked just fine and I never had to bother making sure my blog was indexed by Google. Maybe Google Analytics made sure all valid and visited URLs were indexed automatically.

I moved my blog onto Material for MkDocs along with a custom GHA workflow which builds and publishes my blog, and I also moved away from Google Analytics. At the same time, I did change the URL of my blog pages (although I have a forwarding mechanism to the new URL for each blog post). At some point, I noticed all of my blog posts got de-listed by Google, as I saw a massive drop in visits to my website. Perhaps this happened because I changed the URLs of the posts 🤷. I also use Umami instead of Google Analytics now. Once I logged into Google Search Console (GSC), I noticed none of the blog posts were indexed by Google under the new URLs. That's fair, I did change the post URLs after all. So I figured I needed to tell GSC where those blog posts reside now.

What did I do?
Months later, GSC still won't index any of my pages unless I explicitly tell it to do so - and I can only do this one URL at a time.

Screenshots

What am I doing wrong?

I've noticed another user (👋 @Zwyx) is having pretty much the exact same problem, where Google refuses to index their pages. And since they are using Docusaurus (source here), it really feels like this is not a Material for MkDocs problem per se.
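As a quick sanity check for pages stuck in "Discovered - currently not indexed", it's worth ruling out an accidental noindex or a missing canonical on the published pages. A minimal sketch, not taken from the original post; it assumes the Python `requests` package and uses a placeholder URL:

```python
import requests

# Placeholder URL - substitute one of your own blog post URLs.
URL = "https://example.github.io/blog/2024/some-post/"

resp = requests.get(URL, timeout=10)
html = resp.text.lower()

print("HTTP status:    ", resp.status_code)
print("X-Robots-Tag:   ", resp.headers.get("x-robots-tag", "<none>"))
print("noindex in HTML:", "noindex" in html)

# MkDocs (and therefore Material for MkDocs) emits a canonical link when
# `site_url` is set in mkdocs.yml; a missing or wrong canonical can hurt indexing.
for line in resp.text.splitlines():
    if 'rel="canonical"' in line:
        print("canonical tag:  ", line.strip())
```

If the status is 200, there is no noindex, and the canonical matches the page's own URL, the problem is more likely on the crawling/quota side, as discussed in the replies below.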
-
There seems to be a tool for figuring out why pages are discovered but not indexed. The URL Inspection Tool should give you a report, but you need to prove ownership of the site first. Hope that helps, and please do let us know if this has anything at all to do with Material for MkDocs.
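The URL Inspection Tool can also be driven programmatically via the Search Console URL Inspection API, which makes checking many pages less tedious than clicking through the UI one by one. A minimal sketch, assuming a Google Cloud service account that has already been added as a user on the verified Search Console property; the key file path, property URL, and page URL below are placeholders:

```python
from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

# Placeholders - substitute your own service-account key, property, and page URL.
KEY_FILE = "service-account.json"
SITE_URL = "https://example.github.io/"          # the property as registered in GSC
PAGE_URL = "https://example.github.io/blog/2024/some-post/"

SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]

creds = service_account.Credentials.from_service_account_file(KEY_FILE, scopes=SCOPES)
session = AuthorizedSession(creds)

# URL Inspection API endpoint (Search Console API v1).
resp = session.post(
    "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect",
    json={"inspectionUrl": PAGE_URL, "siteUrl": SITE_URL},
)
resp.raise_for_status()

result = resp.json()["inspectionResult"]["indexStatusResult"]
# coverageState is e.g. "Submitted and indexed" or "Discovered - currently not indexed".
print(PAGE_URL, "->", result.get("coverageState"))
```

Note that the service account has to be granted access to the property in Search Console before this works; that is an assumption of this sketch, not something stated in the reply above.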
-
Over at Nype - https://npe.cm/ - we had the same issue; we moved away from GitHub Pages and we're still recovering from it.

1st issue is basically that, due to how GitHub Pages is set up, everything is stored on a limited number of servers behind a limited number of IP addresses, so a quota threshold gets hit and Google pauses crawling those servers altogether.

2nd issue (my guess) is that any problem with a site is "amplified" by the slow crawling, and Google sort of abandons a site after too many problems occur. You can check the status of your crawl stats in Google Search Console.

3rd issue (my guess) is that in your case, you set up your redirections on the 404 page. Googlebot should in theory handle JavaScript and detect the location change, but when a bad URL loads, GitHub sends a 404 server status code first, and only later does the correct page load (see the first sketch after this reply). At Nype we played around with 404-page JavaScript redirects.

4th issue (my guess) is that in your case, you now have a lot of pages with possibly low traffic / a low number of backlinks, so Google is reluctant to add so many links at once 🤔

We have also used the currently trending script https://github.com/goenning/google-indexing-script - it helped a bit, but due to the 2nd issue, manual requests for indexing worked better. I personally didn't do it as I'm not the property owner. (A sketch of that kind of API call follows further below.)

I hope at least some of the above helps you out 😅 but yeah, after working a bit with Google "docs", A LOT of things are not said directly, and users have to guess too much imo.
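To illustrate the 3rd point, here is a quick way to see what status code GitHub Pages actually returns for an old, now-forwarded URL before any JavaScript runs - which is the status a crawler records on the initial fetch. A minimal sketch, assuming the Python `requests` package and a placeholder old URL:

```python
import requests

# Placeholder - substitute one of your old (pre-migration) blog post URLs.
OLD_URL = "https://example.github.io/old/path/to/post/"

# Fetch without executing any JavaScript - roughly what a plain crawler fetch sees.
resp = requests.get(OLD_URL, timeout=10, allow_redirects=True)

# If the forwarding lives on the custom 404 page, this prints 404 even though a
# browser user ends up on the new post - Google may treat the URL as a hard 404.
print("status code:", resp.status_code)
print("final URL:  ", resp.url)
```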
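For reference, the script linked above appears to rely on Google's Indexing API, which is officially intended only for job-posting and livestream pages - which may be part of why results vary for regular blogs. A minimal sketch of that kind of call, assuming a service account with the Indexing API enabled and using placeholder values:

```python
from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

# Placeholders - substitute your own service-account key and page URL.
KEY_FILE = "service-account.json"
PAGE_URL = "https://example.github.io/blog/2024/some-post/"

SCOPES = ["https://www.googleapis.com/auth/indexing"]

creds = service_account.Credentials.from_service_account_file(KEY_FILE, scopes=SCOPES)
session = AuthorizedSession(creds)

# Notify Google that the URL was added/updated; the default quota is around
# 200 publish requests per day.
resp = session.post(
    "https://indexing.googleapis.com/v3/urlNotifications:publish",
    json={"url": PAGE_URL, "type": "URL_UPDATED"},
)
resp.raise_for_status()
print(resp.json())
```

Whether Google actually indexes ordinary blog URLs submitted this way is not guaranteed, which matches the experience described above that manual indexing requests worked better.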