TL;DR: Consolidated 18 years of blog content from 6 different platforms (Blogspot → Drupal → WordPress → Octopress → 11ty → Astro) into a single static site with 372 posts, recovered 200+ posts from database dumps and backups, removed 36 duplicates, and fixed 220+ posts with broken formatting - all with AI pair programming.
The Blogging Timeline
2007 ──► Blogspot (Google's free platform, where it all began)
│
2008 ──► Drupal 6 (self-hosted, felt like a "real" developer)
│
2012 ──► WordPress (everyone was doing it)
│
2014 ──► Octopress (Ruby-based static site generator, hipster phase)
│
2020 ──► 11ty (JavaScript ecosystem, modern SSG)
│
2025 ──► Astro (the final form?)
Each migration carried forward content from the previous platform. Some posts survived all 6 migrations. Most accumulated formatting quirks along the way.
Why So Many Migrations?
Each platform served its purpose at the time:
- Blogspot - Zero config, just write and publish. Perfect for a student.
- Drupal - Wanted more control, PHP was everywhere.
- WordPress - Plugins for everything! (Until you need to patch 47 of them)
- Octopress - “Static sites are the future!” Also, I was learning Ruby.
- 11ty - JavaScript everywhere, simple and fast.
- Astro - Type-safe content collections, island architecture, and actually good DX.
The Great Consolidation
The content was scattered across:
- A Drupal 6 SQL dump (~50MB)
- WordPress exports (XML)
- Octopress markdown files
- 11ty markdown files
The Drupal content was the messiest - 18 years of different editors, plugins, and format changes had left scars in the data.
The Drupal Challenge
Parsing the SQL dump required some Python wizardry:
import re

def parse_drupal_nodes(sql_file):
    # Read the whole dump into memory (fine for a ~50MB file)
    with open(sql_file, 'r') as f:
        content = f.read()
    # Parse INSERT statements for node_revisions
    pattern = r"INSERT INTO `node_revisions` VALUES\s*\((.*?)\);"
    matches = re.findall(pattern, content, re.DOTALL)
    # parse_values is a helper that is not shown in this post
    return parse_values(matches)
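The parse_values helper is not shown above. As a rough sketch of what its core might look like, the function below splits the body of one VALUES tuple into fields while respecting quoted strings. The name split_sql_values and its behaviour are assumptions for illustration, not the script used in the migration, and it assumes one tuple per match (a multi-row INSERT would need splitting into tuples first).

```python
from typing import List, Optional

# MySQL backslash escapes that map to a different character.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "0": "\0"}


def _finish(buf: List[str], was_quoted: bool) -> Optional[str]:
    """Turn the collected characters into a field value."""
    text = "".join(buf)
    if was_quoted:
        return text
    text = text.strip()
    # An unquoted NULL becomes None; numbers stay as strings.
    return None if text.upper() == "NULL" else text


def split_sql_values(row: str) -> List[Optional[str]]:
    """Split one VALUES tuple body into fields (hypothetical helper)."""
    fields: List[Optional[str]] = []
    buf: List[str] = []
    in_quote = False    # currently inside a '...' string
    was_quoted = False  # the current field is a string literal
    i = 0
    while i < len(row):
        ch = row[i]
        if in_quote:
            if ch == "\\" and i + 1 < len(row):
                # Backslash escape such as \' or \n
                nxt = row[i + 1]
                buf.append(_ESCAPES.get(nxt, nxt))
                i += 2
                continue
            if ch == "'" and row[i + 1:i + 2] == "'":
                # Doubled quote '' inside a string is a literal quote
                buf.append("'")
                i += 2
                continue
            if ch == "'":
                in_quote = False
            else:
                buf.append(ch)
        elif ch == "'":
            in_quote, was_quoted = True, True
            buf.clear()  # drop whitespace seen before the opening quote
        elif ch == ",":
            fields.append(_finish(buf, was_quoted))
            buf.clear()
            was_quoted = False
        elif not was_quoted:
            buf.append(ch)
        i += 1
    fields.append(_finish(buf, was_quoted))
    return fields
```

A naive split on commas would break on any post body that contains a comma or an escaped quote, which is why this walks the string one character at a time.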
The URL aliases needed 322 redirect rules:
RewriteRule ^content/(.*)$ /scribe/$1 [R=301,L]
RewriteRule ^node/(.*)$ /scribe/$1 [R=301,L]
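Per-alias rules like these are tedious to write by hand. A minimal sketch of generating them from a mapping of old paths to new slugs might look like this; the function name, the mapping, and the example slug are hypothetical, and it assumes Python 3.7+ (where re.escape leaves "/" untouched).

```python
import re
from typing import Dict, List


def build_rewrite_rules(aliases: Dict[str, str], prefix: str = "/scribe/") -> List[str]:
    """Build one 301 RewriteRule line per old path (hypothetical helper)."""
    rules = []
    for old_path, slug in sorted(aliases.items()):
        # Escape regex metacharacters so the old path is matched literally.
        pattern = re.escape(old_path.strip("/"))
        rules.append(f"RewriteRule ^{pattern}/?$ {prefix}{slug} [R=301,L]")
    return rules


if __name__ == "__main__":
    # Made-up alias purely for illustration.
    for line in build_rewrite_rules({"node/42": "hello-world"}):
        print(line)
```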
The Code Block Nightmare
Each platform handled code blocks differently:
- Drupal used <code> tags with custom filter modules
- WordPress had its own syntax highlighters
- Octopress used {% codeblock %} Liquid tags
- 11ty used standard fenced blocks
The result after consolidation? Fragment fences everywhere:
```javascript
function foo() {
```javascript ← WHY
return bar;
```javascript ← STOP
}
```
The Fix Scripts
Two Python scripts saved the day:
Language detection:
import re

def detect_language(code):
    # Heuristics only: the first matching pattern wins
    if re.search(r'^\s*import\s+\w+', code, re.MULTILINE):
        return 'python'
    if re.search(r'\.each\s+do\s*\|', code):
        return 'ruby'
    if re.search(r'^\s*\$\s+\w+', code, re.MULTILINE):
        return 'bash'
    # Fall back to JavaScript when nothing else matches
    return 'javascript'
Fragment fence removal:
patterns = [
    (r'(\n[^\n`]+)\n```javascript\n([^\n`]+\n)', r'\1\n\2'),
    (r'(\n[^\n`]+)\n```python\n([^\n`]+\n)', r'\1\n\2'),
]
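One way these patterns could be applied is sketched below. It builds the same two regexes from a language list and repeats the substitution until the text stops changing, because a single pass skips a stray fence that sits right after one it has just removed. The function name, the language tuple, and the pass limit are assumptions, not the original script.

```python
import re

FENCE = "`" * 3  # three backticks, built here to keep this example readable

# Same shape as the patterns above: a code line, a stray fence, another code line.
FRAGMENT_PATTERNS = [
    (rf'(\n[^\n`]+)\n{FENCE}{lang}\n([^\n`]+\n)', r'\1\n\2')
    for lang in ("javascript", "python")
]


def remove_fragment_fences(text: str, max_passes: int = 50) -> str:
    """Strip stray opening fences that appear between two code lines."""
    for _ in range(max_passes):
        previous = text
        for pattern, replacement in FRAGMENT_PATTERNS:
            text = re.sub(pattern, replacement, text)
        if text == previous:
            break  # nothing left to fix
    return text
```

Note that these patterns would also remove a legitimate opening fence that directly follows a line of prose with no blank line in between, so the output is worth diffing before committing.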
The Statistics
| Metric | Value |
|---|---|
| Platforms migrated from | 6 |
| Total posts consolidated | 372 |
| Posts recovered from DB/backup | 200+ |
| URL redirects (100% working) | 292 |
| Posts cleaned (nav/tweet links) | 220 |
| Duplicate posts removed | 36 |
| Years of content | 18 |
| Build time | ~6 seconds |
| Final static site size | ~91MB |
The AI Pair Programming Stats
The entire migration was completed over multiple sessions with AI pair programming. Here’s the breakdown:
| Metric | Value |
|---|---|
| Total project time | ~4-5 hours (across sessions) |
| AI tool calls | 3800+ |
| Posts analyzed | 400+ |
| Posts recovered/fixed | 250+ |
| Python scripts written | 10+ |
| Deployments | 20+ |
| Git commits | 30+ |
The AI Workflow
- Human reports broken post → AI views file
- AI rewrites with proper code blocks → Human tests locally
- Human approves → AI commits and deploys
- Repeat for 70+ posts
Time Saved
Without AI, manually fixing 120+ posts with code block issues would have taken:
- ~5-10 min per post to view, analyze, and rewrite
- 120 posts × 7 min = ~14 hours of manual work
AI turned a 2-day project into a few hours of evening sessions.
What I’ve Learned
- Content outlives platforms - Posts from 2007 are still getting traffic. The platform is temporary; the content is permanent.
- Each migration leaves scars - Every platform has its quirks. Migrating content 5+ times means 5+ layers of formatting weirdness.
- Static sites win - No more security patches. No more database maintenance. Just HTML served from edge locations.
- AI pair programming is a game-changer - What would have been a weekend project became an evening session. Writing migration scripts, fixing broken posts, iterating on solutions - AI made it possible.
- Beware of auto-fix scripts - A script that “fixes” code blocks across 200+ files can break more than it fixes. Some posts needed manual restoration after aggressive auto-fixing. Always test on a subset first.
- Live databases are gold - The SQL dump wasn’t enough. Connecting to the live MySQL database recovered 13 more posts that were missing from the static dump.
- llms.txt matters - Added h3manth.com/llms.txt for AI crawlers. If bots are reading our content anyway, might as well make it structured.
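On the auto-fix lesson: a dry run over a small random sample is one way to test on a subset first. The sketch below applies a fix function in memory and returns unified diffs without writing anything back. The dry_run name, its parameters, and the sample size are assumptions for illustration.

```python
import difflib
import random
from pathlib import Path
from typing import Callable, List


def dry_run(paths: List[Path], fix: Callable[[str], str],
            sample_size: int = 10, seed: int = 0) -> List[str]:
    """Return diffs for a random sample of files; never modifies the files."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    sample = rng.sample(paths, min(sample_size, len(paths)))
    diffs = []
    for path in sample:
        original = path.read_text(encoding="utf-8")
        fixed = fix(original)
        if fixed != original:
            diff = difflib.unified_diff(
                original.splitlines(keepends=True),
                fixed.splitlines(keepends=True),
                fromfile=str(path),
                tofile=f"{path} (fixed)",
            )
            diffs.append("".join(diff))
    return diffs
```

Reading the diffs for ten posts takes a few minutes and would likely have caught the over-eager replacements before they touched 200+ files.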
The Final Architecture
h3manth.com
├── /scribe/ → 372 blog posts (Astro)
├── /ai/ → AI projects
├── /fun/ → Interactive demos
├── llms.txt → AI-friendly site description
└── *.html → Landing pages
The blog is now live at h3manth.com/scribe with all 372 posts, proper syntax highlighting, and 292 redirects preserving 18 years of SEO juice.
From 6 disparate platforms to one unified Astro site - it’s been an evolution. Here’s to the next 18 years.
P.S. - If you’re migrating legacy content, don’t underestimate the code block problem. And get yourself an AI pair programmer.
About Hemanth HM
Hemanth HM is a Sr. Machine Learning Manager at PayPal, Google Developer Expert, TC39 delegate, FOSS advocate, and community leader with a passion for programming, AI, and open-source contributions.