博客从Typecho迁移至Hugo

自从建这个博客到现在已经快 6 年了,最开始用的是 Jekyll,用的服务器甚至是放在家里的树莓派 Zero W。接着是使用 Handsome 主题的 Typecho,虽然没有几张图片但是还为他专门架设了一个 Chevereto 图床。配合上 Handsome 主题的 Typecho 功能很齐全也很美观,但是热衷于重装系统的我自然而然的每隔一两年都要重装一次服务器的系统,并徘徊在安装 Typecho 和 Chevereto 的各种文件权限问题中。而 Handsome 主题大而全的另一个反面就是繁杂的配置选项,尽管外观上仍然保持了作者的克制,但我更希望我的博客能更专注于传达信息。于是借着给实验室搭建官网的工夫,我就顺手将博客迁移到了 Hugo。

Typecho 数据导出

我最希望能够原样从 Typecho 原样迁移的数据大概有以下几个部分:

  • 文章内容及封面
  • 几年来积累的评论
  • 友链
  • 由于 Typecho 和 Hugo 的 URL 结构有所不同,我需要将原先博客文章的链接 301 映射到新链接,避免 SEO 降低权重。

幸运的是,我的大部分需求都能被 从 Typecho 完美迁移到 Hexo 这篇文章满足,基于这位作者提供的代码,我进行了一些修改以满足我的需求。

导出文章内容及封面

使用这个脚本可以导出文章的标题、头图、发表日期、标签、分类、slug 等信息。由于 slug 通常是唯一的,且相对于标题更容易搜索,后续脚本将使用这个字段在创建新旧链接对应表时进行数据库搜索。

请注意,该脚本仅能处理文章属于一个分类的情况。如果有多个分类,建议先在 Typecho 中手动将所有属于多个分类的文章去掉多余的分类。

article_exporter.py
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
import re
from datetime import datetime

import mariadb

db = mariadb.connect(
    host="127.0.0.1",
    port=3306,
    user="root",
    password="",
    database="Your-Database-Name"
)

cur = db.cursor()
cur.execute(
    "SELECT * FROM `typecho_contents` WHERE `template` IS NULL AND `type`='post'")
res = cur.fetchall()

cur.execute(
    "SELECT * FROM `typecho_fields` WHERE `name` = 'thumb'")
covers = cur.fetchall()
print(covers)

for x in res:
#    print(x)
    content = x[5].removeprefix('<!--markdown-->')
    created = datetime.fromtimestamp(x[3]).strftime("%Y-%m-%d %H:%M:%S")
    updated = datetime.fromtimestamp(x[4]).strftime("%Y-%m-%d %H:%M:%S")
    draft = 'true'
    if x[-9] == 'publish':
        draft = 'false'

    # check cover
    cover_link = ""
    for cover in covers:
        if cover[0] == x[0]:
            cover_link = cover[3]
    if cover_link == "":
        print("no cover", x[1])

    cur.execute(
        f'SELECT * FROM `typecho_relationships` JOIN `typecho_metas` WHERE `typecho_relationships`.`mid`=`typecho_metas`.`mid` AND `typecho_relationships`.`cid`={x[0]}')
    meta = cur.fetchall()
    category = []
    tag = []
    for y in meta:
        if y[5] == 'category':
            category.append([y[3], y[4]])
        elif y[5] == 'tag':
            tag.append(y[4])
        else:
            print(x[1], 'error unhandled', y[5])
    if len(category) != 1:
        print(x[1], 'category length warning', len(category))
        continue
    link = f'{category[0][1]}/{x[0]}.html'
    title = re.sub(r'[^\w\-_\. ]', '-', x[1])
    with open(f'./source/_posts/{title}.md', 'w') as f:
        f.write(f'''---
title: {x[1]}
image: {cover_link}
slug: {x[2]}
date: {created}
updated: {updated}
tags: [{','.join(tag)}]
categories: {category[0][0]}
toc: true
draft: {draft}
---

{content}
''')

映射新旧文章链接

你需要将以下脚本的倒数第二行链接修改为你在 Typecho 中设置的格式。下图是我之前设置的文章路径,你需要将脚本修改为你设置的路径格式。

image-20240215125444452

link_convert.py
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
import glob
import re

import mysql.connector

db = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="Your-Database-Name"
)

for fname in glob.glob('./source/_posts/*.md'):
    with open(fname, 'r') as f:
        data = f.read()
        ret = re.search(r'title: (.*)$', data, re.M)
        if not ret:
            print(f"error1! {fname}")
            exit(0)
        title = ret.group(1)
        ret = re.search(r'slug: \'?(.*?)\'?$', data, re.M)
        slug = ret[1]
        if not ret:
            print(f"error2! {fname}")
            exit(0)
        link = f'/p/{ret.group(1)}/'
        searchtitle = title.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
        cur = db.cursor()
        cur.execute(
            f"SELECT * FROM `typecho_contents` WHERE `slug`='{slug}'")
        res = cur.fetchall()
        if len(res) != 1:
            print(f'error3! {fname}', searchtitle, slug)
            exit(0)
        x = res[0]
        cur.execute(
            f'SELECT * FROM `typecho_relationships` JOIN `typecho_metas` WHERE `typecho_relationships`.`mid`=`typecho_metas`.`mid` AND `typecho_relationships`.`cid`={x[0]}')
        meta = cur.fetchall()
        category = []
        tag = []
        for y in meta:
            if y[5] == 'category':
                category.append([y[3], y[4]])
            elif y[5] == 'tag':
                tag.append(y[4])
            else:
                print(x[1], 'error unhandled', y[5])
        if len(category) != 1:
            print(x[1], 'category length warning', len(category))
            continue
        prevlink = f'/{category[0][1]}/ar{x[0]}.html' # 注意这一项,
        print(prevlink + ',' + link)

运行以上脚本后新建 postmap.txt,复制输出的内容到该文本文件中。

将 postmap.txt 转换为 nginx 规则

运行一下脚本,并将输出的内容粘贴到 nginx 配置文件中即可。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
cat = {}
with open('postmap.txt', 'r') as f:
    l = f.readlines()

for i in l:
    d = i.split(',')
    name = d[0].removeprefix('/').split('/')[0]
    if name not in cat:
        cat[name] = []
    cat[name].append(d)

for k, v in cat.items():
    content = ''
    for j in v:
        content += f'location = {j[0]} {{ return 301 {j[1].strip()}; }}\n'
    print(content)

评论数据迁移

由于 Hugo 是静态网站,需要部署一个评论后端,这里我使用了 Hugo Stack 主题原生支持的 Waline。当时用 Jekyll 做博客的时候似乎就听说过它的前身 Valine。

部署好 Waline 后,安装 Export2Valine 插件导出评论。

如果你的评论数据较多,有概率在点击导出后显示 502 Bad Gateway,并且日志中提示 upstream sent too big header while reading response header from upstream,这时你需要在 nginx 中的 location 字段添加如下代码启用缓存。

1
2
3
4
        ## TUNE buffers to avoid error ##  
        fastcgi_buffers 16 32k; 
        fastcgi_buffer_size 64k; 
        fastcgi_busy_buffers_size 64k; 

修改后如图所示:

image-20240215132423837

成功下载 valine.2024-02-15.jsonl 后,删去文件头部的 #filetype:JSON-streaming ,将其改名为 valine.json。并将该文件与 Waline 的 sqlite 数据库文件放在同一个目录下。之后运行如下脚本:

comment_convert.py
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
import json
import sqlite3
from datetime import datetime, timedelta

with open('valine.json', 'r') as f:
    data = f.read()
    data = data.split('}\\n{')

mp = {}
with open('postmap.txt', 'r') as f:
    l = f.readlines()
    for i in l:
        d = i.split(',')
        mp['/' + d[0].removeprefix('/').split('/')[1]] = d[1].strip()



def parseTime(t):
    return datetime.strftime(datetime.strptime(t, '%Y-%m-%dT%H:%M:%S.%fZ') + timedelta(hours=8), '%Y-%m-%d %H:%M:%S')


idmap = {None: None}
id = 1
conn = sqlite3.connect('./waline.sqlite')
c = conn.cursor()

for i in data:
    print(i)
    if i[0] != '{':
        i = '{' + i
    if i[-1] != '}':
        i = i + '}'
    obj = json.loads(i)
    idmap[obj['objectId']] = id
    id += 1

id = 1
for i in data:
    if i[0] != '{':
        i = '{' + i
    if i[-1] != '}':
        i = i + '}'
    obj = json.loads(i)
    u = obj['url']
    if u not in ['/msg.html', '/links.html', '/guestbook.html', '/cross.html', '/start-page.html', '/friends.html',]:
        u = '/ar' + u[1:] # 这里修改为你原先的路径样式
        print(u)
        u = mp[u]
    if 'pid' not in obj:
        obj['pid'] = None
    if 'rid' not in obj:
        obj['rid'] = None
    user = None
    if obj['link'] == 'https://raiot.me/':
        user = 1
        obj['mail'] = 'raiot.lee@hotmail.com'
    obj['createdAt'] = parseTime(obj['createdAt'])
    obj['updatedAt'] = parseTime(obj['updatedAt'])
    obj['insertedAt']['iso'] = parseTime(obj['insertedAt']['iso'])
    p = (id, user, obj['comment'], obj['insertedAt']['iso'],
         obj['createdAt'], obj['updatedAt'], obj['ip'], obj['link'], obj['mail'], obj['nick'], obj['ua'], u, "approved", idmap[obj['pid']], idmap[obj['rid']])
    c.execute(
        "INSERT INTO wl_Comment (id,user_id,comment,insertedAt,createdAt,updatedAt,ip,link,mail,nick,ua,url,status,pid,rid) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?);", p)
    id += 1
conn.commit()
conn.close()

运行成功后,将 waline 的数据库文件传回服务器,如果你使用的是 Stack 主题,添加如下配置项启用 Waline

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
## Comments
[comments]
enabled = true
provider = "waline"


[comments.waline]
serverURL = "https://Your.Waline.Address"
lang = "zh-CN"
login = "disable"
avatar = ""
emoji = ["https://cdn.jsdelivr.net/gh/walinejs/emojis/weibo"]
requiredMeta = ["name", "email", "url"]
placeholder = ""

[comments.waline.locale]
admin = "Admin"

导出友链

我使用 phpMyAdmin 管理我的数据库,可以在 phpMyAdmin 中进入 typecho_links 数据表,选择导出,格式选择 JSON。

image-20240215133339838

复制导出的 JSON 文件内容,粘贴至下面的脚本中运行,会将其转换为 YAML 格式。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
import json
from urllib.parse import unquote
null = 0
# 原始JSON数据
raw_json = [
    {"type":"header","version":"5.2.1","comment":"Export to JSON plugin for PHPMyAdmin"},
    {"type":"database","name":"typecho"},
    {"type":"table","name":"typecho_links","database":"typecho","data":[
        {"lid":"1","name":"Rat's Blog","url":"http:\/\/www.moerats.com\/","sort":"ten","image":"https:\/\/www.moerats.com\/usr\/picture\/rats.png","description":"主机教程,建站知识……\r\n这个blog也是他推荐的\r\n想买VPS的可以去那看看","user":null,"order":"1"},
        {"lid":"2","name":"中二小学长","url":"https:\/\/iwalyou.com\/","sort":"ten","image":"https:\/\/iwalyou.oss-cn-beijing.aliyuncs.com\/avater.png","description":"一个中二演员的日常博客……","user":null,"order":"2"},
        {"lid":"6","name":"左岸博客","url":"https:\/\/www.zrahh.com\/","sort":"ten","image":"https:\/\/www.zrahh.com\/img\/avatar.jpg","description":"比较有质量~","user":null,"order":"3"},
        {"lid":"7","name":"Escher’s Log","url":"https:\/\/jkgblog.com","sort":"ten","image":"https:\/\/i.loli.net\/2019\/06\/24\/5d108e8e33aa130214.jpg","description":"日常杂记、工具分享。","user":null,"order":"4"},
        {"lid":"12","name":"柳橙睿","url":"https:\/\/www.liuchengrui.cn","sort":"ten","image":"https:\/\/www.liuchengrui.cn\/myphoto.png","description":"其实我是一个假柳橙","user":null,"order":"6"},
        {"lid":"13","name":"I Am I ","url":"https:\/\/5ime.cn\/","sort":"ten","image":"https:\/\/cdn.jsdelivr.net\/gh\/5ime\/img\/avatar.jpg","description":null,"user":null,"order":"7"},
        {"lid":"14","name":"香菇肥牛的博客","url":"https:\/\/qing.su\/","sort":"ten","image":null,"description":null,"user":null,"order":"8"},
        {"lid":"15","name":"城南旧事","url":"https:\/\/blog.uso.cc","sort":"ten","image":"https:\/\/blog.uso.cc\/logo.png","description":"有关于成长、生活与情感。","user":null,"order":"9"}
    ]
    }
    ]


# 转换成所需的形式
# 转换成所需的形式
result = []
for entry in raw_json[2]["data"]:
    item = {
        "title": entry["name"],
        "description": entry["description"],
        "website": unquote(entry["url"]),  # 解码链接
        "image": unquote(entry["image"]) if entry.get("image") else None  # 解码图片链接,如果存在的话
    }
    result.append(item)

# 打印结果
for item in result:
    print("- title:", item["title"])
    print("  description:", item["description"])
    print("  website:", item["website"].replace('\\/', '/'))  # 移除转义字符
    if item["image"]:
        print("  image:", item["image"].replace('\\/', '/'))  # 移除转义字符
    print()

接下来,对于 Stack 主题,您只需新建一个页面(Page),并将脚本输出的内容粘贴到 index.md 的 Frontmatter 部分即可。

至此,大部分的迁移工作已经完成,只需将在 source/_post 目录中生成的文件复制到 Hugo 中即可。

如果使用 hugo server 命令渲染网站,包括评论在内的大部分内容应该可以正常访问。

一些优化

(绝赞更新中

Licensed under CC BY-NC-SA 4.0