Normally, when I do a site:h3manth.com search on Google, the first things that come up are my indexed pages. But last Friday evening things were shockingly different: none of my pages were showing up in Google search, and instead it was showing links with spammy titles!
Wondering what could be wrong, I logged into Google Search Console and requested a live test of my URL. The bot reported that the site was not reachable (404), even though the site was loading perfectly well with a 200 OK!
The Investigation Timeline
h3manth.com has an unusual hosting setup, so here is a quick timeline of my investigation:
- In 2009, h3manth.com was bought via Google G Suite
- G Suite internally bought it through GoDaddy
- It was hosted on a separate server slice bought from GoDaddy
- Long ago, I moved the DNS to Cloudflare for SSL, the firewall, and other benefits
- GoDaddy recently moved my hosting, so I had to change my A record
- `dig` was still showing the old A record
- It stayed in the same state even after 48 hours
- Google's crawler thought my site was a 404, and I lost my PageRank and index
- Now I was left with three different tech support windows and no resolution yet
- Finally, in the soup!
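The `dig` check from the timeline is easy to script so it can be re-run while waiting for a DNS change to propagate. This is an illustrative sketch rather than the exact commands used during the investigation: `has_a_record` and `check_resolvers` are hypothetical helper names, and 203.0.113.10 is a documentation placeholder standing in for the new origin IP.

```bash
#!/usr/bin/env bash
# Sketch: check whether public resolvers already return the new A record.

# Succeed only if the expected IP is one of the lines on stdin
# (the format printed by `dig +short`). -x and -F prevent partial matches
# such as 203.0.113.10 matching 203.0.113.100.
has_a_record() {
  grep -qxF -- "$1"
}

# Ask each resolver for the domain's A records and report what it sees.
check_resolvers() {
  local domain="$1" expected="$2" resolver
  shift 2
  for resolver in "$@"; do
    if dig +short "$domain" A @"$resolver" | has_a_record "$expected"; then
      echo "$resolver: returns $expected"
    else
      echo "$resolver: still missing $expected"
    fi
  done
}
```

Usage: `check_resolvers h3manth.com 203.0.113.10 1.1.1.1 8.8.8.8`. One caveat: when a record is proxied through Cloudflare (orange cloud), resolvers return Cloudflare's edge IPs instead of the origin IP, so this comparison only makes sense for DNS-only (grey cloud) records.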
Tech Support Circus
- Tried to log in to Google Workspace (formerly G Suite) and failed, as my saved password wasn't working.
- Had to write to support for recovery, as I was no longer using the phone number registered for 2FA!
- Had to answer a detailed email with multiple questions to prove my identity for password recovery.
- It had been a long time: the credit card I had previously paid with had expired and I had discarded it, yet one of the recovery questions asked for the last mode of payment and the amount.
- I failed twice to prove my identity; only after searching my email for payment notifications did I realize that I had used a different card for this payment.
- Finally proved my identity, logged in, and checked that the DNS was pointing to Cloudflare, which looked fine.
- Meanwhile, I had multiple conversations with GoDaddy, and nothing was really happening on their end.
- Reached out to Cloudflare, thinking it might have something to do with the proxied DNS; as I was on the free plan, there was some delay in hearing back.
- Emailed Google Workspace support; they called me back and said this wasn't really within their scope, and that there is no tech support as such for Google Search Console since it is a free product.
A Ray of Hope
While I was waiting and replying to each of the email threads asynchronously, I asked the lazyweb whether anyone had seen something like this before, and Šime Vidas pointed out that my domain was responding with a 404 whenever the User-Agent was Googlebot's.
Well, my robots.txt and .htaccess looked fine. At this point, I started suspecting Cloudflare's firewall settings, so I read a few posts and disabled a few of the advanced settings, but Googlebot was still being blocked.
The Final Move
Following my conversation with Damian Parker at Cloudflare, we tried bypassing the Cloudflare proxy with curl by connecting directly to the GoDaddy host, to rule out Cloudflare issues:
Bypassing Cloudflare with a Googlebot UA
```bash
# godaddy.ip is a placeholder for the origin server's IP address
$ curl -svko /dev/null https://h3manth.com/ \
  -A 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' \
  --connect-to ::godaddy.ip
< HTTP/2 404
< date: Tue, 08 Jun 2021 09:17:54 GMT
< server: Apache
< content-length: 315
< content-type: text/html; charset=iso-8859-1
```
Bypassing Cloudflare with a Browser UA
```bash
# godaddy.ip is a placeholder for the origin server's IP address
$ curl -svko /dev/null https://h3manth.com/ \
  -A 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36' \
  --connect-to ::godaddy.ip
< HTTP/2 200
< date: Tue, 08 Jun 2021 09:19:16 GMT
< server: Apache
< content-type: text/html; charset=utf-8
```
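The two manual curl calls can be folded into one helper that prints both status codes side by side. This is a minimal sketch: `status_for_ua` and `compare_ua_status` are hypothetical names, the 20-second timeout is an arbitrary choice, and the optional origin IP is passed to curl's `--connect-to` the same way as above.

```bash
#!/usr/bin/env bash
# Sketch: compare the HTTP status a URL returns for a Googlebot UA versus a
# browser UA. Passing an origin IP routes the request with --connect-to,
# bypassing the CDN like the manual calls above.

GOOGLEBOT_UA='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
BROWSER_UA='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36'

# Print only the status code for one request. curl writes 000 when no HTTP
# response arrives (e.g. connection refused or timeout).
status_for_ua() {
  local url="$1" ua="$2" origin_ip="${3:-}"
  if [ -n "$origin_ip" ]; then
    curl -sk --max-time 20 -o /dev/null -w '%{http_code}' -A "$ua" \
      --connect-to "::$origin_ip" "$url" || true
  else
    curl -sk --max-time 20 -o /dev/null -w '%{http_code}' -A "$ua" "$url" || true
  fi
}

# Print both codes; return non-zero when they differ, which is the symptom
# a User-Agent-based cloaking rule produces.
compare_ua_status() {
  local url="$1" origin_ip="${2:-}" bot browser
  bot=$(status_for_ua "$url" "$GOOGLEBOT_UA" "$origin_ip")
  browser=$(status_for_ua "$url" "$BROWSER_UA" "$origin_ip")
  echo "googlebot=$bot browser=$browser"
  [ "$bot" = "$browser" ]
}
```

Usage: `compare_ua_status https://h3manth.com/ 203.0.113.10`, where 203.0.113.10 is a placeholder for the origin IP. Given the responses above, the broken site would have reported `googlebot=404 browser=200` and a non-zero exit status.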
Finally, it was clear that the issue was on the hosting server and not in Google Workspace or Cloudflare. So I was back to discussions with GoDaddy, who suggested disabling the DNS proxy and testing again; the result was still the same. They later suggested that if I bought a security package, they would do a detailed investigation.
During the discussion, we came across a PHP file that was unusually large and strange. The GoDaddy team suspected malware and suggested a virus scan, but I remembered seeing such a file and had already deleted it, assuming it had been created by mistake. The source made no sense; it looked almost like uglified and minified JS.
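Since the malicious file stood out by its size, a targeted search for PHP files that are unusually large or were modified recently is a cheap first pass. A minimal sketch, assuming a hypothetical helper name `suspicious_php_files` and arbitrary default thresholds (100 KiB, 7 days) that you would tune for your own site:

```bash
#!/usr/bin/env bash
# Sketch: list PHP files that are unusually large or recently modified.
# Defaults (100 KiB, 7 days) are arbitrary starting points, not tuned values.
suspicious_php_files() {
  local dir="${1:-.}" min_kb="${2:-100}" days="${3:-7}"
  # -size +Nk: larger than N KiB; -mtime -N: modified within the last N days.
  find "$dir" -type f -name '*.php' \
    \( -size +"${min_kb}"k -o -mtime -"${days}" \) | sort
}
```

Usage: `suspicious_php_files . 100 7` from the web root. Plugin or module updates will also show up as recent changes, so the output is a list to review by hand, not a verdict.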
Later on, I ran `find . -type f -exec du -h {} + | sort -r -h` to list files sorted by size and see if anything else looked fishy. Nothing suspicious turned up, so I decided to look into the .htaccess yet again, and voilà, this time my eyes finally caught something suspicious!
I had already removed hungers-jemie.php earlier, but this was still screwing things up! After cleaning it up and testing again, the bot was able to fetch my URLs!
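The culprit lived in .htaccess, and mod_rewrite conditions on `%{HTTP_USER_AGENT}` are one way a server can answer crawlers differently from browsers. The exact lines from this incident are not reproduced here; the sketch below is a hypothetical scanner (`flag_ua_rewrites` is an illustrative name) that prints every RewriteCond matching on the User-Agent, plus the RewriteRule it guards.

```bash
#!/usr/bin/env bash
# Sketch: flag User-Agent-based rewrite rules in one or more .htaccess files.
# Prints "file:line: text" for each UA RewriteCond and the next RewriteRule.
flag_ua_rewrites() {
  awk '
    FNR == 1 { pending = 0 }          # reset state at the start of each file
    { line = tolower($0) }            # Apache directives are case-insensitive
    line ~ /^[ \t]*rewritecond/ && line ~ /http_user_agent/ {
      print FILENAME ":" FNR ": " $0
      pending = 1
      next
    }
    pending && line ~ /^[ \t]*rewriterule/ {
      print FILENAME ":" FNR ": " $0
      pending = 0
    }
  ' "$@"
}
```

Usage: `flag_ua_rewrites .htaccess`. Legitimate setups (blocking abusive bots, for instance) also match on the User-Agent, so every hit still needs a human look.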
Root Cause
I was thinking hard about how an external user could have gotten access to my .htaccess, and then I realized that I was also hosting an old Drupal blog of mine with 300+ posts. I logged in as admin to make sure the user rules were fine and noticed some issues.
That was still OK, but a quick security scan made me realize that this was the backdoor used for the breach. Many of those red flags were definitely not initiated by me! (Maybe a few modules had become outdated and were never checked.)
So, I quickly got into my DB and checked. Surprise! Thousands of fake users had been created! In a hurry, I wiped the users table with a blanket `DELETE FROM users` and then went and checked my site. Dang, none of the posts were loading. I had forgotten that deleting the users would also take out the posts that referenced them, and there was no way to recover them. I was so disappointed that after all this I had lost all of my posts, 300+ posts!!
Then I remembered that I had done a few experiments with migrating away from Drupal just before I moved to Octopress (2013). I wasn't able to find that export in my cloud storage, but then I recalled a daemon that had been running DB backups. Luckily, I found the latest DB backup and restored it, but it still had the fake users, so I cleaned them up again, and this time I didn't delete myself.
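A safer cleanup filters the spam accounts instead of emptying the table, and takes a backup first. This is a hedged sketch, not the query used here: it assumes a Drupal 6/7-style schema (`users.uid`, `node.uid`) with no table prefix, treats "owns no nodes" as the spam heuristic, always keeps uid 0 (anonymous) and uid 1 (the admin), and uses hypothetical helper names and a placeholder database name.

```bash
#!/usr/bin/env bash
# Sketch: delete accounts that own no content, never touching uid 0 or 1.
# Assumes a Drupal 6/7-style schema (users.uid, node.uid), no table prefix.

spam_user_cleanup_sql() {
  cat <<'SQL'
DELETE FROM users
WHERE uid > 1
  AND uid NOT IN (SELECT DISTINCT uid FROM node);
SQL
}

# Back up the users table and only run the DELETE if the dump succeeded.
# $1 is the database name (placeholder, e.g. drupal_db); credentials are
# assumed to come from ~/.my.cnf.
run_spam_user_cleanup() {
  local db="$1" backup
  backup="users-backup-$(date +%Y%m%d%H%M%S).sql"
  mysqldump "$db" users > "$backup" || return 1
  spam_user_cleanup_sql | mysql "$db"
}
```

Raw SQL leaves related rows in other tables (roles, sessions, and so on) untouched; going through Drupal's own user deletion API instead lets modules clean up their data as well.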
Finally, my weekend was over! (Remember, this started on Friday evening.) Still, it was an awesome feeling at the end: the site was back on track. As I type this, the site is indexed by search engines just like before, but there are still a few cached entries that I have asked the search engines to flush.
P.S. If the intruder is reading this post, I would like to say 👋
About Hemanth HM
Hemanth HM is a Sr. Machine Learning Manager at PayPal, Google Developer Expert, TC39 delegate, FOSS advocate, and community leader with a passion for programming, AI, and open-source contributions.