Hemanth's Scribes

Consolidating 18 Years of Writing: From 6 Platforms to One

Hemanth HM

TL;DR: Consolidated 18 years of blog content from 6 different platforms (Blogspot → Drupal → WordPress → Octopress → 11ty → Astro) into a single static site with 372 posts, recovered 200+ posts from database dumps and backups, removed 36 duplicates, and fixed 220+ posts with broken formatting - all with AI pair programming.

The Blogging Timeline

2007 ──► Blogspot (Google's free platform, where it all began)

2008 ──► Drupal 6 (self-hosted, felt like a "real" developer)

2012 ──► WordPress (everyone was doing it)

2014 ──► Octopress (Ruby-based static site generator, hipster phase)

2020 ──► 11ty (JavaScript ecosystem, modern SSG)

2025 ──► Astro (the final form?)

Each migration carried forward content from the previous platform. Some posts survived all 5 migrations across the 6 platforms. Most accumulated formatting quirks along the way.

Why So Many Migrations?

Each platform served its purpose at the time:

  • Blogspot - Zero config, just write and publish. Perfect for a student.
  • Drupal - Wanted more control, PHP was everywhere.
  • WordPress - Plugins for everything! (Until you need to patch 47 of them)
  • Octopress - “Static sites are the future!” Also, I was learning Ruby.
  • 11ty - JavaScript everywhere, simple and fast.
  • Astro - Type-safe content collections, island architecture, and actual good DX.

The Great Consolidation

The content was scattered across:

  • A Drupal 6 SQL dump (~50MB)
  • WordPress exports (XML)
  • Octopress markdown files
  • 11ty markdown files

The Drupal content was the messiest - 18 years of different editors, plugins, and format changes had left scars in the data.

The Drupal Challenge

Parsing the SQL dump required some Python wizardry:

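import re

def parse_drupal_nodes(sql_file):
    with open(sql_file, 'r') as f:
        content = f.read()

    # Grab the VALUES payload of each INSERT into node_revisions.
    # Caveat: the non-greedy match stops at the first ");", even one inside a post body.
    pattern = r"INSERT INTO `node_revisions` VALUES\s*\((.*?)\);"
    matches = re.findall(pattern, content, re.DOTALL)

    # parse_values (not shown in this post) turns each payload into row fields
    return parse_values(matches)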
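The post does not show `parse_values`, so the following is only a sketch of the core step such a helper would need: splitting one VALUES payload into fields without breaking on commas or escaped quotes inside a post body. The name `split_sql_values` and the escape table are assumptions for illustration, and it handles a single row tuple, not multi-row inserts.

```python
# Hypothetical helper; the post's own parse_values is not shown.
UNESCAPE = {"n": "\n", "r": "\r", "t": "\t", "0": "\0"}


def _finish(buf, was_quoted):
    """Turn collected characters into a field; unquoted NULL becomes None."""
    token = "".join(buf)
    if was_quoted:
        return token
    return None if token.upper() == "NULL" else token


def split_sql_values(row):
    """Split the inside of one VALUES tuple into fields, honoring quotes."""
    fields, buf = [], []
    in_str = was_quoted = False
    i = 0
    while i < len(row):
        ch = row[i]
        if in_str:
            if ch == "\\" and i + 1 < len(row):
                # Backslash escape: \n, \t, \', \\ and friends
                buf.append(UNESCAPE.get(row[i + 1], row[i + 1]))
                i += 2
                continue
            if ch == "'":
                if i + 1 < len(row) and row[i + 1] == "'":
                    buf.append("'")  # doubled quote inside a string
                    i += 2
                    continue
                in_str = False  # closing quote
            else:
                buf.append(ch)
        elif ch == "'":
            in_str = was_quoted = True
        elif ch == ",":
            fields.append(_finish(buf, was_quoted))
            buf, was_quoted = [], False
        elif not ch.isspace():
            buf.append(ch)
        i += 1
    fields.append(_finish(buf, was_quoted))
    return fields
```

Numeric fields come back as strings here; converting them is left to the caller, since the column layout of `node_revisions` is not given in the post.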

The URL aliases needed 322 redirect rules:

RewriteRule ^content/(.*)$ /scribe/$1 [R=301,L]
RewriteRule ^node/(.*)$ /scribe/$1 [R=301,L]
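The two rules above are catch-alls, and the post does not show how the per-alias rules were produced. One plausible approach, assuming the old aliases are available as a mapping from old path to new slug (the `aliases` dict and the `build_rewrite_rules` name are hypothetical), is to generate one rule per alias:

```python
import re


def build_rewrite_rules(aliases, prefix="/scribe/"):
    """Return one mod_rewrite 301 rule per old path, sorted for stable diffs."""
    rules = []
    for old_path, slug in sorted(aliases.items()):
        # Escape regex metacharacters so the old path is matched literally
        pattern = re.escape(old_path.strip("/"))
        rules.append(f"RewriteRule ^{pattern}/?$ {prefix}{slug} [R=301,L]")
    return rules


if __name__ == "__main__":
    for rule in build_rewrite_rules({"node/42": "hello-world"}):
        print(rule)
```

Generating the rules from data keeps the redirect list reviewable in version control, which helps when there are hundreds of them.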

The Code Block Nightmare

Each platform handled code blocks differently:

  • Drupal used <code> tags with custom filter modules
  • WordPress had its own syntax highlighters
  • Octopress used {% codeblock %} liquid tags
  • 11ty used standard fenced blocks

The result after consolidation? Fragment fences everywhere:

```javascript
function foo() {
```javascript  ← WHY
  return bar;
```javascript  ← STOP
}
```
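Finding affected posts can be automated before any fixing starts. The sketch below is not the author's script; it relies on one assumption, that a closing fence never carries a language tag, so a tagged fence seen while a block is already open is flagged as a fragment. `find_fragment_fences` is an illustrative name.

```python
import re

FENCE_MARK = "`" * 3  # built at runtime so this example contains no literal fence
FENCE_RE = re.compile(r"^\s*" + FENCE_MARK + r"(\w*)")


def find_fragment_fences(markdown):
    """Return 1-based line numbers of language-tagged fences inside an open block."""
    suspicious = []
    in_block = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE_RE.match(line)
        if not match:
            continue
        if in_block and match.group(1):
            # A tagged fence cannot close a block, so it must be a stray fragment
            suspicious.append(lineno)
        else:
            in_block = not in_block
    return suspicious
```

Run against the broken example above, this flags the two inner fences and leaves the real opening and closing fences alone.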

The Fix Scripts

Two Python scripts saved the day:

Language detection:

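import re

def detect_language(code):
    # Python: a line starting with "import name"
    # (caveat: this also matches JS default imports like `import x from 'y'`)
    if re.search(r'^\s*import\s+\w+', code, re.MULTILINE):
        return 'python'
    # Ruby: block iteration such as `.each do |item|`
    if re.search(r'\.each\s+do\s*\|', code):
        return 'ruby'
    # Bash: lines that start with a "$ " shell prompt
    if re.search(r'^\s*\$\s+\w+', code, re.MULTILINE):
        return 'bash'
    # Everything else falls back to JavaScript
    return 'javascript'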

Fragment fence removal:

patterns = [
    (r'(\n[^\n`]+)\n```javascript\n([^\n`]+\n)', r'\1\n\2'),
    (r'(\n[^\n`]+)\n```python\n([^\n`]+\n)', r'\1\n\2'),
]
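How the patterns above were applied is not shown in the post. A minimal sketch, assuming they are run with `re.subn` in a loop: a single pass is not enough, because matches cannot overlap and consecutive fragment fences share a line. The names `make_pattern` and `remove_fragment_fences` are illustrative.

```python
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def make_pattern(lang):
    # Same shape as the patterns above: code line, stray tagged fence, code line
    return re.compile(r"(\n[^\n`]+)\n" + FENCE + lang + r"\n([^\n`]+\n)")


PATTERNS = [make_pattern("javascript"), make_pattern("python")]


def remove_fragment_fences(text):
    """Apply every pattern repeatedly until nothing changes; return (text, removed)."""
    total = 0
    while True:
        changed = 0
        for pattern in PATTERNS:
            text, count = pattern.subn(r"\1\n\2", text)
            changed += count
        if changed == 0:
            return text, total
        total += changed
```

One caution with this shape of pattern: it also strips a legitimate opening fence when a line of prose sits directly above it with no blank line in between, so the output needs review before it is committed.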

The Statistics

| Metric | Value |
| --- | --- |
| Platforms migrated from | 6 |
| Total posts consolidated | 372 |
| Posts recovered from DB/backup | 200+ |
| URL redirects (100% working) | 292 |
| Posts cleaned (nav/tweet links) | 220 |
| Duplicate posts removed | 36 |
| Years of content | 18 |
| Build time | ~6 seconds |
| Final static site size | ~91MB |

The AI Pair Programming Stats

The entire migration was completed over multiple sessions with AI pair programming. Here’s the breakdown:

| Metric | Value |
| --- | --- |
| Total project time | ~4-5 hours (across sessions) |
| AI tool calls | 3800+ |
| Posts analyzed | 400+ |
| Posts recovered/fixed | 250+ |
| Python scripts written | 10+ |
| Deployments | 20+ |
| Git commits | 30+ |

The AI Workflow

  1. Human reports broken post → AI views file
  2. AI rewrites with proper code blocks → Human tests locally
  3. Human approves → AI commits and deploys
  4. Repeat for 70+ posts

Time Saved

Without AI, manually fixing 120+ posts with code block issues would have taken:

  • ~5-10 min per post to view, analyze, and rewrite
  • 120 posts × 7 min = ~14 hours of manual work

AI turned a 2-day project into a few hours of evening sessions.
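The estimate above is plain arithmetic, with 7 minutes taken as a rough midpoint of the 5-10 minute range. A small sketch makes the bounds explicit (`manual_hours` is just an illustrative helper):

```python
def manual_hours(posts, minutes_per_post):
    """Hours of manual work for a given post count and per-post effort."""
    return posts * minutes_per_post / 60


if __name__ == "__main__":
    for minutes in (5, 7, 10):
        print(f"{minutes} min/post -> {manual_hours(120, minutes):.0f} hours")
```

At 120 posts the range works out to 10 hours at the low end and 20 hours at the high end, with 14 hours in the middle.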

What I’ve Learned

  1. Content outlives platforms - Posts from 2007 are still getting traffic. The platform is temporary; the content is permanent.

  2. Each migration leaves scars - Every platform has its quirks. Migrating content 5+ times means 5+ layers of formatting weirdness.

  3. Static sites win - No more security patches. No more database maintenance. Just HTML served from edge locations.

  4. AI pair programming is a game-changer - What would have been a weekend project became an evening session. Writing migration scripts, fixing broken posts, iterating on solutions - AI made it possible.

  5. Beware of auto-fix scripts - A script that “fixes” code blocks across 200+ files can break more than it fixes. Some posts needed manual restoration after aggressive auto-fixing. Always test on a subset first.

  6. Live databases are gold - The SQL dump wasn’t enough. Connecting to the live MySQL database recovered 13 more posts that were missing from the static dump.

  7. llms.txt matters - Added h3manth.com/llms.txt for AI crawlers. If bots are reading our content anyway, might as well make it structured.
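Lesson 5 can be turned into a habit with a dry run. The sketch below is an assumption about how one might do it, not the workflow used in this migration: it applies a fix function to a random sample of posts held in memory and returns unified diffs for review, writing nothing to disk. The `dry_run` name and the dict-of-posts input are hypothetical.

```python
import difflib
import random


def dry_run(posts, fix, sample_size=10, seed=0):
    """Apply `fix` to a random sample of posts and return {name: unified diff}."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-checked
    names = sorted(posts)
    sample = rng.sample(names, min(sample_size, len(names)))
    diffs = {}
    for name in sample:
        before = posts[name]
        after = fix(before)
        if after != before:
            diff = difflib.unified_diff(
                before.splitlines(keepends=True),
                after.splitlines(keepends=True),
                fromfile=name,
                tofile=name + " (fixed)",
            )
            diffs[name] = "".join(diff)
    return diffs
```

Reading a handful of diffs before running a fixer across every file is cheap compared with restoring posts by hand afterwards.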

The Final Architecture

h3manth.com
├── /scribe/           → 372 blog posts (Astro)
├── /ai/               → AI projects
├── /fun/              → Interactive demos
├── llms.txt           → AI-friendly site description
└── *.html             → Landing pages

The blog is now live at h3manth.com/scribe with all 372 posts, proper syntax highlighting, and 292 redirects preserving 18 years of SEO juice.

From 6 disparate platforms to one unified Astro site - it’s been an evolution. Here’s to the next 18 years.


P.S. - If you’re migrating legacy content, don’t underestimate the code block problem. And get yourself an AI pair programmer.

#javascript #python #astro
About Hemanth HM

Hemanth HM is a Sr. Machine Learning Manager at PayPal, Google Developer Expert, TC39 delegate, FOSS advocate, and community leader with a passion for programming, AI, and open-source contributions.