RSS feed: historical charset issues fixed

All mojibake issues in the RSS feed have now been fixed

After previously converting the full system to UTF-8, we have now also cleaned up the remaining mojibake issues in the RSS feed.

The issues dated back to older character-encoding problems that caused Swedish å, ä and ö, quotation marks, and other special characters to display incorrectly.
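For anyone curious how this kind of damage arises, here is a minimal sketch. It assumes the `iconv` and `od` tools are available, and it uses ISO-8859-1 as the wrongly assumed encoding, which is an illustration rather than a statement about what the old system actually did. It takes the correct UTF-8 bytes for "å", misreads them as ISO-8859-1, converts them to UTF-8 a second time, and then reverses the mistake.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the bytes read from stdin as one lowercase hex string.
to_hex() {
    od -An -v -tx1 | tr -d ' \n'
}

# Correct UTF-8 bytes for the Swedish letter "å" (U+00E5).
good_hex="$(printf '\xc3\xa5' | to_hex)"

# Simulate the fault: treat those bytes as ISO-8859-1 and encode them as UTF-8 again.
bad_hex="$(printf '\xc3\xa5' | iconv -f ISO-8859-1 -t UTF-8 | to_hex)"

# Reverse it: converting the damaged text back to ISO-8859-1 recovers the original bytes.
fixed_hex="$(printf '\xc3\x83\xc2\xa5' | iconv -f UTF-8 -t ISO-8859-1 | to_hex)"

echo "good=$good_hex bad=$bad_hex fixed=$fixed_hex"
```

The damaged sequence `c383c2a5` is the same byte pattern that the fix script looks for and replaces with `c3a5`.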

At the same time, we have created a script that can fix these issues directly in the database while the system is running. This means that any remaining or newly discovered mojibake issues can be handled live, without having to manually rebuild the RSS feed.

The RSS feed should now be much cleaner and correctly encoded.

#!/usr/bin/env bash
set -euo pipefail

usage() {
    cat <<'EOF'
Usage:
  mojibake.sh --database DB --user USER --table TABLE --columns col1,col2 [options]

Required:
  --database DB              Database/schema name
  --user USER                MySQL/MariaDB user
  --table TABLE              Table name
  --columns col1,col2        Columns to fix, comma-separated

Password:
  --pass PASS                MySQL/MariaDB password
  --ask-pass                 Ask for password
  MYSQL_PWD='secret'         Alternative to --pass

Optional:
  --host HOST                Default: localhost
  --port PORT                Default: 3306
  --pk COLUMN                Numeric primary/range column. Auto-detected if possible.
  --column COLUMN            Add one column. Can be repeated.
  --json-columns col1,col2   Treat these columns as JSON text and escape double quotes.
  --json-column COLUMN       Add one JSON text column. Can be repeated.
  --batch-size N             Default: 5000
  --sample-limit N           Default: 2
  --sample-batches N         Default: 3
  --backup-dir DIR           Default: ./mysql-backups
  --apply                    Actually update rows. Default is dry-run.
  --dry-run                  Only scan and show progress.
  --no-backup                Do not create mysqldump backup. Not recommended.
  --help                     Show this help.

Examples:
  mojibake.sh --database rss --user root --ask-pass --table content --columns json,title,description --dry-run

  mojibake.sh --database rss --user root --ask-pass --table content --columns json,title,description --json-columns json --batch-size 5000 --apply

  mojibake.sh --database rss --user root --ask-pass --table content --pk id --columns json,title,description --apply
EOF
}

die() {
    echo "Error: $*" >&2
    echo >&2
    usage >&2
    exit 1
}

trim() {
    local value="$*"
    value="${value#"${value%%[![:space:]]*}"}"
    value="${value%"${value##*[![:space:]]}"}"
    printf '%s' "$value"
}

quote_ident() {
    local value="$1"
    value="${value//\`/\`\`}"
    printf '`%s`' "$value"
}

sql_string() {
    local value="$1"
    value="${value//\\/\\\\}"
    value="${value//\'/\'\'}"
    printf "'%s'" "$value"
}

add_csv_columns() {
    local csv="$1"
    local target_array_name="$2"
    local item
    local clean
    local items

    IFS=',' read -r -a items <<< "$csv"
    for item in "${items[@]}"; do
        clean="$(trim "$item")"
        [[ -z "$clean" ]] && continue
        eval "$target_array_name+=(\"\$clean\")"
    done
}

join_sql() {
    local separator="$1"
    shift
    local output=""
    local item

    for item in "$@"; do
        if [[ -z "$output" ]]; then
            output="$item"
        else
            output+="$separator$item"
        fi
    done

    printf '%s' "$output"
}

contains_column() {
    local needle="$1"
    shift
    local item

    for item in "$@"; do
        [[ "$item" == "$needle" ]] && return 0
    done

    return 1
}

is_integer_type() {
    local data_type="$1"

    case "$data_type" in
        tinyint|smallint|mediumint|int|integer|bigint)
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}

DATABASE=""
MYSQL_USER_NAME=""
MYSQL_PASSWORD_VALUE=""
MYSQL_PASSWORD_PROVIDED=0
ASK_PASS=0
HOST="localhost"
PORT="3306"
TABLE=""
PK_COLUMN=""
BACKUP_DIR="./mysql-backups"
MODE="dry-run"
DO_BACKUP=1
BATCH_SIZE=5000
SAMPLE_LIMIT=2
SAMPLE_BATCHES=3
FIX_COLUMNS=()
JSON_FIX_COLUMNS=()

if [[ $# -eq 0 ]]; then
    usage
    exit 1
fi

while [[ $# -gt 0 ]]; do
    case "$1" in
        --database)
            [[ $# -ge 2 ]] || die "Missing value for --database"
            DATABASE="$2"
            shift 2
            ;;
        --database=*)
            DATABASE="${1#*=}"
            shift
            ;;
        --user)
            [[ $# -ge 2 ]] || die "Missing value for --user"
            MYSQL_USER_NAME="$2"
            shift 2
            ;;
        --user=*)
            MYSQL_USER_NAME="${1#*=}"
            shift
            ;;
        --pass|--password)
            [[ $# -ge 2 ]] || die "Missing value for --pass"
            MYSQL_PASSWORD_VALUE="$2"
            MYSQL_PASSWORD_PROVIDED=1
            shift 2
            ;;
        --pass=*|--password=*)
            MYSQL_PASSWORD_VALUE="${1#*=}"
            MYSQL_PASSWORD_PROVIDED=1
            shift
            ;;
        --ask-pass)
            ASK_PASS=1
            shift
            ;;
        --host)
            [[ $# -ge 2 ]] || die "Missing value for --host"
            HOST="$2"
            shift 2
            ;;
        --host=*)
            HOST="${1#*=}"
            shift
            ;;
        --port)
            [[ $# -ge 2 ]] || die "Missing value for --port"
            PORT="$2"
            shift 2
            ;;
        --port=*)
            PORT="${1#*=}"
            shift
            ;;
        --table)
            [[ $# -ge 2 ]] || die "Missing value for --table"
            TABLE="$2"
            shift 2
            ;;
        --table=*)
            TABLE="${1#*=}"
            shift
            ;;
        --pk)
            [[ $# -ge 2 ]] || die "Missing value for --pk"
            PK_COLUMN="$2"
            shift 2
            ;;
        --pk=*)
            PK_COLUMN="${1#*=}"
            shift
            ;;
        --columns)
            [[ $# -ge 2 ]] || die "Missing value for --columns"
            add_csv_columns "$2" FIX_COLUMNS
            shift 2
            ;;
        --columns=*)
            add_csv_columns "${1#*=}" FIX_COLUMNS
            shift
            ;;
        --column)
            [[ $# -ge 2 ]] || die "Missing value for --column"
            FIX_COLUMNS+=("$2")
            shift 2
            ;;
        --column=*)
            FIX_COLUMNS+=("${1#*=}")
            shift
            ;;
        --json-columns)
            [[ $# -ge 2 ]] || die "Missing value for --json-columns"
            add_csv_columns "$2" JSON_FIX_COLUMNS
            shift 2
            ;;
        --json-columns=*)
            add_csv_columns "${1#*=}" JSON_FIX_COLUMNS
            shift
            ;;
        --json-column)
            [[ $# -ge 2 ]] || die "Missing value for --json-column"
            JSON_FIX_COLUMNS+=("$2")
            shift 2
            ;;
        --json-column=*)
            JSON_FIX_COLUMNS+=("${1#*=}")
            shift
            ;;
        --batch-size)
            [[ $# -ge 2 ]] || die "Missing value for --batch-size"
            BATCH_SIZE="$2"
            shift 2
            ;;
        --batch-size=*)
            BATCH_SIZE="${1#*=}"
            shift
            ;;
        --sample-limit)
            [[ $# -ge 2 ]] || die "Missing value for --sample-limit"
            SAMPLE_LIMIT="$2"
            shift 2
            ;;
        --sample-limit=*)
            SAMPLE_LIMIT="${1#*=}"
            shift
            ;;
        --sample-batches)
            [[ $# -ge 2 ]] || die "Missing value for --sample-batches"
            SAMPLE_BATCHES="$2"
            shift 2
            ;;
        --sample-batches=*)
            SAMPLE_BATCHES="${1#*=}"
            shift
            ;;
        --backup-dir)
            [[ $# -ge 2 ]] || die "Missing value for --backup-dir"
            BACKUP_DIR="$2"
            shift 2
            ;;
        --backup-dir=*)
            BACKUP_DIR="${1#*=}"
            shift
            ;;
        --apply)
            MODE="apply"
            shift
            ;;
        --dry-run)
            MODE="dry-run"
            shift
            ;;
        --no-backup)
            DO_BACKUP=0
            shift
            ;;
        --help|-h)
            usage
            exit 0
            ;;
        *)
            die "Unknown option: $1"
            ;;
    esac
done

[[ -n "$DATABASE" ]] || die "--database is required"
[[ -n "$MYSQL_USER_NAME" ]] || die "--user is required"
[[ -n "$TABLE" ]] || die "--table is required"
[[ "${#FIX_COLUMNS[@]}" -gt 0 ]] || die "--columns or --column is required"
[[ "$BATCH_SIZE" =~ ^[0-9]+$ ]] || die "--batch-size must be numeric"
[[ "$SAMPLE_LIMIT" =~ ^[0-9]+$ ]] || die "--sample-limit must be numeric"
[[ "$SAMPLE_BATCHES" =~ ^[0-9]+$ ]] || die "--sample-batches must be numeric"
[[ "$BATCH_SIZE" -gt 0 ]] || die "--batch-size must be greater than 0"

if [[ "$ASK_PASS" -eq 1 ]]; then
    read -r -s -p "MySQL password: " MYSQL_PASSWORD_VALUE
    echo
    MYSQL_PASSWORD_PROVIDED=1
fi

if [[ "$MYSQL_PASSWORD_PROVIDED" -eq 1 ]]; then
    export MYSQL_PWD="$MYSQL_PASSWORD_VALUE"
elif [[ -z "${MYSQL_PWD:-}" ]]; then
    die "--pass, --ask-pass, or MYSQL_PWD is required"
fi

MYSQL=(
    mysql
    --default-character-set=utf8mb4
    -h "$HOST"
    -P "$PORT"
    -u "$MYSQL_USER_NAME"
    --database="$DATABASE"
    --batch
    --raw
    --silent
)

MYSQLDUMP=(
    mysqldump
    --default-character-set=utf8mb4
    -h "$HOST"
    -P "$PORT"
    -u "$MYSQL_USER_NAME"
    --single-transaction
    --quick
)

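# Replacement maps. Each line holds two hex byte strings: the mojibake sequence
# to look for, then the bytes that replace it.
TEXT_MAP='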
c383c2a5 c3a5
c383c2a4 c3a4
c383c2b6 c3b6
c383e280a6 c385
c383c285 c385
c383e2809e c384
c383c284 c384
c383e28093 c396
c383c296 c396
c383c2a9 c3a9
c383c2a8 c3a8
c383c2a1 c3a1
c383c2b3 c3b3
c383c2b2 c3b2
c383c2ba c3ba
c383c2b9 c3b9
c383c2bc c3bc
c383c2b1 c3b1
c383c2b8 c3b8
c383c2a6 c3a6
c383c2a7 c3a7
c383e280b0 c389
c383c289 c389
c383cb86 c388
c383c288 c388
c383c281 c381
c383c293 c393
c383e2809c c393
c383c292 c392
c383e28099 c392
c383c29a c39a
c383c5a1 c39a
c383c29c c39c
c383c593 c39c
c383c291 c391
c383e28098 c391
c383c298 c398
c383cb9c c398
c383c286 c386
c383e280a0 c386
c383c287 c387
c383e280a1 c387
c3a2e282acc593 22
c3a2c280c29c 22
c3a2e282acc29d 22
c3a2c280c29d 22
c3a2e282accb9c 27
c3a2c280c298 27
c3a2e282ace284a2 27
c3a2c280c299 27
c3a2e282ace2809c 2d
c3a2c280c293 2d
c3a2e282ace2809d 2d
c3a2c280c294 2d
c3a2e282acc2a6 2e2e2e
c3a2c280c2a6 2e2e2e
c3a2e282acc2a2 2d
c3a2c280c2a2 2d
c3a2e2809ec2a2 e284a2
c3a2c284c2a2 e284a2
c3a2e2809ac2ac e282ac
c3a2c282c2ac e282ac
c382c2a0 20
c382c2a7 c2a7
c382c2a9 c2a9
c382c2ae c2ae
c382c2b0 c2b0
c382c2b1 c2b1
c382c2ab c2ab
c382c2bb c2bb
'

# Same as TEXT_MAP, except that mojibake double quotes become \" (5c22) so JSON text stays valid.
JSON_MAP='
c383c2a5 c3a5
c383c2a4 c3a4
c383c2b6 c3b6
c383e280a6 c385
c383c285 c385
c383e2809e c384
c383c284 c384
c383e28093 c396
c383c296 c396
c383c2a9 c3a9
c383c2a8 c3a8
c383c2a1 c3a1
c383c2b3 c3b3
c383c2b2 c3b2
c383c2ba c3ba
c383c2b9 c3b9
c383c2bc c3bc
c383c2b1 c3b1
c383c2b8 c3b8
c383c2a6 c3a6
c383c2a7 c3a7
c383e280b0 c389
c383c289 c389
c383cb86 c388
c383c288 c388
c383c281 c381
c383c293 c393
c383e2809c c393
c383c292 c392
c383e28099 c392
c383c29a c39a
c383c5a1 c39a
c383c29c c39c
c383c593 c39c
c383c291 c391
c383e28098 c391
c383c298 c398
c383cb9c c398
c383c286 c386
c383e280a0 c386
c383c287 c387
c383e280a1 c387
c3a2e282acc593 5c22
c3a2c280c29c 5c22
c3a2e282acc29d 5c22
c3a2c280c29d 5c22
c3a2e282accb9c 27
c3a2c280c298 27
c3a2e282ace284a2 27
c3a2c280c299 27
c3a2e282ace2809c 2d
c3a2c280c293 2d
c3a2e282ace2809d 2d
c3a2c280c294 2d
c3a2e282acc2a6 2e2e2e
c3a2c280c2a6 2e2e2e
c3a2e282acc2a2 2d
c3a2c280c2a2 2d
c3a2e2809ec2a2 e284a2
c3a2c284c2a2 e284a2
c3a2e2809ac2ac e282ac
c3a2c282c2ac e282ac
c382c2a0 20
c382c2a7 c2a7
c382c2a9 c2a9
c382c2ae c2ae
c382c2b0 c2b0
c382c2b1 c2b1
c382c2ab c2ab
c382c2bb c2bb
'

build_replace_expr() {
    local column="$1"
    local map="$2"
    local quoted_column
    local expr
    local bad_hex
    local good_hex

    quoted_column="$(quote_ident "$column")"
    expr="CAST(${quoted_column} AS CHAR CHARACTER SET utf8mb4)"

    while read -r bad_hex good_hex; do
        [[ -z "${bad_hex:-}" ]] && continue
        expr="REPLACE(${expr}, CONVERT(0x${bad_hex} USING utf8mb4), CONVERT(0x${good_hex} USING utf8mb4))"
    done <<< "$map"

    printf '%s' "$expr"
}

build_column_where() {
    local column="$1"
    local map="$2"
    local quoted_column
    local expr
    local bad_hex
    local good_hex

    quoted_column="$(quote_ident "$column")"
    expr="CAST(${quoted_column} AS CHAR CHARACTER SET utf8mb4)"

    while read -r bad_hex good_hex; do
        [[ -z "${bad_hex:-}" ]] && continue
        printf 'LOCATE(CONVERT(0x%s USING utf8mb4), %s) > 0\n' "$bad_hex" "$expr"
    done <<< "$map"
}

column_data_type() {
    local column="$1"

    "${MYSQL[@]}" -N -e "
        SELECT DATA_TYPE
        FROM information_schema.COLUMNS
        WHERE TABLE_SCHEMA = $(sql_string "$DATABASE")
          AND TABLE_NAME = $(sql_string "$TABLE")
          AND COLUMN_NAME = $(sql_string "$column")
        LIMIT 1;
    "
}

detect_single_numeric_pk() {
    local pk_count
    local pk_name
    local pk_type

    pk_count="$("${MYSQL[@]}" -N -e "
        SELECT COUNT(*)
        FROM information_schema.KEY_COLUMN_USAGE
        WHERE TABLE_SCHEMA = $(sql_string "$DATABASE")
          AND TABLE_NAME = $(sql_string "$TABLE")
          AND CONSTRAINT_NAME = 'PRIMARY';
    ")"

    [[ "$pk_count" == "1" ]] || return 1

    read -r pk_name pk_type < <("${MYSQL[@]}" -N -e "
        SELECT k.COLUMN_NAME, c.DATA_TYPE
        FROM information_schema.KEY_COLUMN_USAGE k
        JOIN information_schema.COLUMNS c
          ON c.TABLE_SCHEMA = k.TABLE_SCHEMA
         AND c.TABLE_NAME = k.TABLE_NAME
         AND c.COLUMN_NAME = k.COLUMN_NAME
        WHERE k.TABLE_SCHEMA = $(sql_string "$DATABASE")
          AND k.TABLE_NAME = $(sql_string "$TABLE")
          AND k.CONSTRAINT_NAME = 'PRIMARY'
        LIMIT 1;
    ")

    if is_integer_type "$pk_type"; then
        printf '%s' "$pk_name"
        return 0
    fi

    return 1
}

echo "Checking connection and table metadata..."
"${MYSQL[@]}" -e "SET NAMES utf8mb4; SELECT 1;" >/dev/null

if [[ -z "$PK_COLUMN" ]]; then
    if PK_COLUMN="$(detect_single_numeric_pk)"; then
        echo "Range column: $PK_COLUMN (auto-detected primary key)"
    else
        die "Could not auto-detect a single numeric primary key. Use --pk id or another indexed numeric column."
    fi
else
    pk_type="$(column_data_type "$PK_COLUMN")"
    [[ -n "$pk_type" ]] || die "PK/range column does not exist: ${DATABASE}.${TABLE}.${PK_COLUMN}"
    is_integer_type "$pk_type" || die "--pk column must be numeric/integer, got: $PK_COLUMN ($pk_type)"
    echo "Range column: $PK_COLUMN (manual)"
fi

TARGET="$(quote_ident "$DATABASE").$(quote_ident "$TABLE")"
QUOTED_PK="$(quote_ident "$PK_COLUMN")"

SET_PARTS=()
WHERE_PARTS=()
SAMPLE_COLUMNS=()

for column in "${FIX_COLUMNS[@]}"; do
    column="$(trim "$column")"
    [[ -z "$column" ]] && continue

    data_type="$(column_data_type "$column")"
    [[ -n "$data_type" ]] || die "Column does not exist: ${DATABASE}.${TABLE}.${column}"

    map="$TEXT_MAP"
    mode_label="text"

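    # The ${array[@]+...} form keeps "set -u" from failing on an empty array in Bash older than 4.4.
    if [[ "$data_type" == "json" ]] || contains_column "$column" ${JSON_FIX_COLUMNS[@]+"${JSON_FIX_COLUMNS[@]}"}; then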
        map="$JSON_MAP"
        mode_label="json"
    fi

    echo "Column: $column ($data_type, $mode_label mode)"

    SET_PARTS+=("$(quote_ident "$column") = $(build_replace_expr "$column" "$map")")
    SAMPLE_COLUMNS+=("LEFT(CAST($(quote_ident "$column") AS CHAR CHARACTER SET utf8mb4), 220) AS $(quote_ident "$column")")

    while IFS= read -r condition; do
        [[ -z "$condition" ]] && continue
        WHERE_PARTS+=("$condition")
    done < <(build_column_where "$column" "$map")
done

[[ "${#SET_PARTS[@]}" -gt 0 ]] || die "No usable columns found"
[[ "${#WHERE_PARTS[@]}" -gt 0 ]] || die "No replacement map loaded"

SET_SQL="$(join_sql $',\n    ' "${SET_PARTS[@]}")"
WHERE_SQL="$(join_sql $' OR\n        ' "${WHERE_PARTS[@]}")"
SAMPLE_SQL="$(join_sql $',\n        ' "${SAMPLE_COLUMNS[@]}")"

read -r PK_MIN PK_MAX < <("${MYSQL[@]}" -N -e "
    SET NAMES utf8mb4;
    SELECT COALESCE(MIN(${QUOTED_PK}), 0), COALESCE(MAX(${QUOTED_PK}), 0)
    FROM ${TARGET};
")

if [[ "$PK_MIN" == "0" && "$PK_MAX" == "0" ]]; then
    echo "Table appears empty. Nothing to do."
    exit 0
fi

TOTAL_SPAN=$((PK_MAX - PK_MIN + 1))
if [[ "$TOTAL_SPAN" -le 0 ]]; then
    die "Invalid range: min=$PK_MIN max=$PK_MAX"
fi

echo
echo "Mode: $MODE"
echo "Target: ${DATABASE}.${TABLE}"
echo "Range: ${PK_COLUMN} ${PK_MIN}..${PK_MAX}"
echo "Batch size: $BATCH_SIZE"
echo

if [[ "$MODE" == "apply" && "$DO_BACKUP" -eq 1 ]]; then
    mkdir -p "$BACKUP_DIR"
    BACKUP_FILE="${BACKUP_DIR}/${DATABASE}.${TABLE}.before-mojibake-fix.$(date +%Y%m%d-%H%M%S).sql"

    echo "Creating backup: $BACKUP_FILE"

    if command -v pv >/dev/null 2>&1; then
        "${MYSQLDUMP[@]}" "$DATABASE" "$TABLE" | pv -b -r -t > "$BACKUP_FILE"
    else
        echo "Tip: install pv if you want byte progress during backup."
        "${MYSQLDUMP[@]}" "$DATABASE" "$TABLE" > "$BACKUP_FILE"
    fi

    echo "Backup done."
    echo
elif [[ "$MODE" == "apply" ]]; then
    echo "Backup disabled with --no-backup. Continuing without dump."
    echo
fi

CURRENT_START="$PK_MIN"
BATCH_NO=0
TOTAL_MATCHES=0
TOTAL_CHANGED=0
SAMPLES_SHOWN=0

while [[ "$CURRENT_START" -le "$PK_MAX" ]]; do
    CURRENT_END=$((CURRENT_START + BATCH_SIZE - 1))
    if [[ "$CURRENT_END" -gt "$PK_MAX" ]]; then
        CURRENT_END="$PK_MAX"
    fi

    BATCH_NO=$((BATCH_NO + 1))
    PROCESSED_SPAN=$((CURRENT_END - PK_MIN + 1))
    PERCENT=$((PROCESSED_SPAN * 100 / TOTAL_SPAN))

    BATCH_WHERE="
        ${QUOTED_PK} BETWEEN ${CURRENT_START} AND ${CURRENT_END}
        AND (
        ${WHERE_SQL}
        )
    "

    MATCHES="$("${MYSQL[@]}" -N -e "
        SET NAMES utf8mb4;
        SELECT COUNT(*)
        FROM ${TARGET}
        WHERE ${BATCH_WHERE};
    ")"

    TOTAL_MATCHES=$((TOTAL_MATCHES + MATCHES))

    if [[ "$MODE" == "apply" ]]; then
        CHANGED="$("${MYSQL[@]}" -N -e "
            SET NAMES utf8mb4;
            START TRANSACTION;
            UPDATE ${TARGET}
            SET
                ${SET_SQL}
            WHERE ${BATCH_WHERE};
            SELECT ROW_COUNT();
            COMMIT;
        ")"

        TOTAL_CHANGED=$((TOTAL_CHANGED + CHANGED))

        printf '[%s] Batch %d | %s=%s..%s | %3d%% | matches=%s | changed=%s | total_changed=%s\n' \
            "$(date '+%H:%M:%S')" \
            "$BATCH_NO" \
            "$PK_COLUMN" \
            "$CURRENT_START" \
            "$CURRENT_END" \
            "$PERCENT" \
            "$MATCHES" \
            "$CHANGED" \
            "$TOTAL_CHANGED"
    else
        printf '[%s] Batch %d | %s=%s..%s | %3d%% | matches=%s | total_matches=%s\n' \
            "$(date '+%H:%M:%S')" \
            "$BATCH_NO" \
            "$PK_COLUMN" \
            "$CURRENT_START" \
            "$CURRENT_END" \
            "$PERCENT" \
            "$MATCHES" \
            "$TOTAL_MATCHES"
    fi

    if [[ "$MATCHES" -gt 0 && "$SAMPLES_SHOWN" -lt "$SAMPLE_BATCHES" && "$SAMPLE_LIMIT" -gt 0 ]]; then
        SAMPLES_SHOWN=$((SAMPLES_SHOWN + 1))
        echo "Sample batch $SAMPLES_SHOWN:"
        "${MYSQL[@]}" -e "
            SET NAMES utf8mb4;
            SELECT
                ${QUOTED_PK} AS pk,
                ${SAMPLE_SQL}
            FROM ${TARGET}
            WHERE ${BATCH_WHERE}
            LIMIT ${SAMPLE_LIMIT};
        "
        echo
    fi

    CURRENT_START=$((CURRENT_END + 1))
done

echo
echo "Finished."
echo "Total matching rows seen: $TOTAL_MATCHES"

if [[ "$MODE" == "apply" ]]; then
    echo "Total changed rows: $TOTAL_CHANGED"
else
    echo "Dry-run only. Re-run with --apply to update data."
fi
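To make it easier to see what the script actually sends to the database, here is a small standalone sketch of how the nested `REPLACE()` expression grows with each map entry. It is a simplified stand-in for `build_replace_expr`: the function name `build_expr`, the two-entry sample map, and the column name `title` are chosen only for illustration, and the `CAST` and identifier escaping from the real script are left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Two-entry sample map in the script's format: "bad hex" followed by "good hex".
SAMPLE_MAP='
c383c2a5 c3a5
c383c2b6 c3b6
'

# Wrap the column in one REPLACE() per map entry; the first entry ends up innermost.
build_expr() {
    local column="$1" map="$2" expr bad good
    expr="\`${column}\`"
    while read -r bad good; do
        [[ -z "${bad:-}" ]] && continue
        expr="REPLACE(${expr}, CONVERT(0x${bad} USING utf8mb4), CONVERT(0x${good} USING utf8mb4))"
    done <<< "$map"
    printf '%s' "$expr"
}

sql="$(build_expr title "$SAMPLE_MAP")"
echo "$sql"
```

With the full map, the same loop produces one deeply nested expression per column, which is why a single `UPDATE` per batch can repair every known pattern at once.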

Support our work with Tornevall Tools

Tornevall Tools is growing with DNS tools, fact checking, OpenAI-powered services, and upcoming Microsoft Copilot-based features.

Keeping the platform running costs money. Hosting, servers, maintenance, development, and AI usage all add up over time.

If you want to help keep Tornevall Tools available, stable, and improving, you can support the project here:

GoFundMe:
https://gofund.me/13bed61f4

Ko-fi memberships and subscriptions:
https://ko-fi.com/tornevall/

Ko-fi is especially useful for ongoing support. Memberships and subscriptions help cover recurring costs and make it easier to keep paid or resource-heavy features available over time.

Thank you for helping keep Tornevall Tools running and improving.

DNSBL Cloudflare Turnstile fixed

We have been using Cloudflare Turnstile with the DNSBL for a while to protect against spam. That Turnstile integration was broken. It has now been fixed and released in the official DNSBL package on WordPress.

SocialGPT 1.2.19: clearer verifications, saved fact cards, and better debugging

SocialGPT 1.2.19 makes fact verifications easier to review, easier to share, and much less frustrating when something is slow or behaving strangely.

This update mainly focuses on three things:

  1. better control over Verify fact
  2. saved verifications you can reopen later
  3. clearer visibility when you want to understand what is happening in the background

What’s new?

Fact verifications are now saved automatically

When a verification succeeds, it is now saved as a real fact card in your Tools account. That means you no longer have to run the same check again just because you want to read the result later.

This makes it easier to:

  • go back to an earlier verification
  • review the sources at your own pace
  • compare multiple verifications over time
  • share a saved result afterwards

Share links for saved verifications

Saved fact cards can now get a public share link. In practice, this means you can run a verification once, save the result, and then open or share that same version again without having to run it again.

This is especially useful when you want to:

  • show a finished verification result to someone else
  • save a reference to an important check
  • reuse the same result in follow-ups or documentation

Fact cards now have a better reading format

Saved verification cards are now shown in a more editorial and readable format instead of feeling like a raw text dump.

Markdown is also supported better, which makes headings, lists, links, and simple structure much clearer when a card is reopened or shared publicly.

In short: results now feel more like finished articles and less like internal debug text.

New discreet debug panel in verification view

If you want to understand why a verification is taking time or acting strangely, there is now a small dbg opener directly inside the verification box.

It shows things like:

  • which phase the verification is in
  • how long different parts take
  • request token and timeout
  • transport details between the extension and Tools
  • a compact preview of the response or error

This is mainly for troubleshooting and follow-up, without making the normal verification UI noisy for everyone else.

Verify fact timeout behaves better

A slow or stuck verification should no longer keep counting forever without actually ending.

In 1.2.19, the verification flow now uses the same timeout logic as the other AI calls in the toolbox. That makes stuck verifications easier to catch and makes it clearer when a response is simply not coming back.

The debug panel is easier to use

The new debug panel also received important usability fixes:

  • it can be opened without getting caught by the drag behavior of the box
  • it has its own scroll when the information gets long
  • the content is easier to read and copy

For people using Facebook admin and review workflows

This release also continues improving visibility in Facebook-related workflows.

Among other things, the admin overlay is clearer with:

  • visible queue and dedupe status directly in the panel
  • better presentation of the latest batch result
  • clearer marking when scrolling has reached content that is already known or already sent

That makes it easier to see what is actually happening without having to guess.

Short summary

SocialGPT 1.2.19 makes Verify fact more practical in day-to-day use:

  • verifications can be saved and reopened
  • saved results can be shared
  • fact cards are easier to read
  • timeout behavior is better
  • there is now a built-in debug surface for people who need more insight

This release is mainly about improving the workflow after the verification itself: not just the answer you get, but how easy it is to understand, save, reuse, and troubleshoot it.

Restored site from yesterday

An internal incident recently affected parts of the platform and interrupted some normal account and data flows. This was not an external hack, but a recovery situation tied to internal automation and backup handling. A large part of the service is already back online, but some older data still has to be rebuilt carefully before everything can be treated as fully restored again.

For some users, that may mean reconnecting credentials, resetting passwords, or recreating specific tokens instead of expecting every older connection to return automatically. The recovery work is continuing, and the goal is to restore access safely without reintroducing broken or uncertain data. At the same time, development has not stopped completely around SocialGPT, and new features are still planned once the current stabilization work is in a better place.

Do We Need a Trustpilot for Social Media – and What Would It Mean?

We have started laying the foundation for Trustpilot for Social Media.

The idea is simple: people should get better help understanding whether a website, link, post, or source seems trustworthy, questionable, or worth checking more carefully.

Today, we often meet information without any useful context. A link appears in a feed. A post gets shared. A website looks serious enough. A claim is repeated often enough. And suddenly, people are expected to decide for themselves whether it deserves trust.

That is not always easy.

A trust layer for the web

Trustpilot for Social Media is meant to become a kind of reputation layer for the web.

Not a system that decides truth for everyone. Not a censorship tool. Not an automatic judge.

More like a warning light.

If a site has a long history of misleading content, scams, conspiracy material, or other serious problems, the user should be able to see that before trusting it. If a source has a stronger reputation, that should also be visible. If nothing is known, the system should simply say that.

The browser could show a small label or icon when visiting a website. The user could click it to read more, report a page, or see whether others have flagged the same source.

That kind of context could help people slow down before sharing, reacting, or believing something too quickly.

But reputation systems are risky

A system like this also comes with problems.

A report is not the same thing as a fact. People can misunderstand things. They can also abuse reporting systems on purpose. Competitors, political groups, trolls, and angry users could all try to damage someone else’s reputation.

That means reports must be handled carefully.

A single report should not become a public warning. Community signals should not automatically become truth. Admin review, correction options, and transparency need to be built in from the start.

The system should help people think, not tell them what to think.

GDPR cannot be an afterthought

There is also a privacy side to this.

Reporting a website is one thing. Reporting a person, a social media profile, a comment, or behavior connected to an account is something else.

That can become personal data very quickly.

Because of that, GDPR has to be part of the design from the beginning. The system should collect as little data as possible, avoid storing unnecessary personal details, and clearly separate website reputation from anything related to individuals.

Users must understand what happens when they report something. Is the report private? Can it become part of community data? Will an admin review it? Could AI be used to help analyze it? Can it be corrected or removed later?

Those answers need to be clear.

A reputation system without a correction process is dangerous. Websites can improve. Reports can be wrong. Context can change. People must be able to challenge, correct, or remove bad information where appropriate.

AI can help, but should not decide

AI can be useful in a project like this. It can help summarize reports, compare sources, detect patterns, and support fact-checking.

But AI should not become the final judge.

If AI is used, it should be clearly marked as support. It should also be optional, because every AI-assisted check costs money and may involve sensitive context.

The default should be cautious and privacy-friendly.

Starting small

The first version should not try to classify the entire internet.

A better start is to support moderation and basic source reputation, for example by helping admins understand whether a user, comment, post, or link needs a closer look before approval.

From there, the system can grow into browser warnings, community reports, public reputation pages, and deeper fact-checking tools.

So, do we need it?

Probably.

The web has a trust problem. People are constantly asked to judge sources, claims, links, and posts without enough context.

A careful reputation layer could help.

But it has to be built with limits. It needs transparency, privacy, GDPR-aware design, human review, and a way to fix mistakes.

The goal is not to control what people read.

The goal is to help people understand what they are looking at before they trust it.

Fact-checking Tools for Chrome

We are continuing to develop Tornevall Networks Toolbox for Social Medias with a clear focus on fact-checking.

The idea is simple: to gather information from established fact-checking organizations such as Snopes, Källkritikbyrån, Motargument and others, and use that to create a clearer picture of what is actually accurate in what we see online.

Rather than pointing out individual posts, the goal is to improve the overall understanding of how information spreads. By following different sources over time, it becomes easier to see how topics change, grow, or shift direction.

This is especially relevant during election periods, when the amount of information increases and it becomes harder to separate facts from misleading claims.

By combining multiple sources, we want to make it easier to see the bigger picture without relying on any single actor.

This is still a work in progress, but the ambition is straightforward: to make it easier to understand the flow of information online.

If you have suggestions for fact-checkers we should include, feel free to reach out.


Things are moving again

It has been a while since there was any real movement across the wider Tornevall Networks ecosystem.

That was not because everything had stopped. Most of it kept running just fine. But like many privately maintained projects, a lot of ideas ended up sitting in the background for far too long simply because life, time, and energy had to go elsewhere.

That has started to change.

Over the last few weeks, several parts of the platform have begun moving again – not just in maintenance terms, but in actual development. Some older services are being cleaned up, some tools are being rebuilt properly, and a few things that had been sitting half-finished for too long are finally getting the attention they should have had earlier.

One of the biggest shifts is happening around tools.tornevall.net, where a larger rebuild has made it possible to modernize parts of the ecosystem that had become too slow, too fragmented, or simply too outdated to keep patching forever. DNS-related tooling is being refreshed, documentation is being brought closer to reality, and a number of internal and public-facing interfaces are becoming more usable than before.

This also connects with changes already visible on the site. SocialGPT marked one kind of step forward. The ongoing DNSBL removal rebuild marks another. Older infrastructure is not being thrown away for the sake of it, but where something needs a cleaner structure, it is now being rebuilt with that in mind.

So while this is not a grand relaunch of everything at once, it is a very real shift in direction.

The platform is active again. Development is active again. And several long-running ideas are finally starting to look like real, usable systems instead of permanent work in progress.

Current areas of focus include

  • Rebuilding and modernizing tools.tornevall.net
  • Refreshing DNS-related tooling and removal workflows
  • Cleaning up older services and legacy structure
  • Improving documentation so it better reflects reality
  • Making internal and public-facing interfaces more usable
  • Bringing long-running ideas closer to fully usable systems
  • And much more that has not even been written down yet

DNSBL Removal Tool Upgrade in Progress

The DNSBL and FraudBL page is currently undergoing an upgrade as part of a broader rebuild of our DNS-related tooling. The removal functionality applies to entries listed in DNSBL and FraudBL, which for a while have been handled not through a traditional database but via direct access to the underlying zone files.

Since the current setup does not work properly, the removal service is being modernized and restructured to improve reliability, security, and long-term maintainability. During this period, the existing web-based removal interface is partly offline (removal requests have been reported as not working).

The rebuilt system will introduce a cleaner separation between web tools and API functionality. Removal requests will be handled through a dedicated API endpoint available at https://tools.tornevall.net, allowing for more predictable behavior and better automation support. The DNSBL plugin for WordPress will also be upgraded and refreshed.

The upcoming implementation focuses on proper CIDR handling, accurate single-IP removals, and support for server-side usage through a CLI endpoint. Access to CLI functionality will require a manually generated token to ensure controlled and auditable use.
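To make the difference between CIDR handling and single-IP removals concrete, here is a minimal bash sketch of the underlying check: does a given IPv4 address fall inside a listed block? The function names ip_to_int and ip_in_cidr are made up for this example and are not taken from the rebuilt tool, which may well solve this differently. A bare address without a prefix is treated as a /32, meaning a single IP.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the actual removal tool.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The 10# prefix forces base 10 so an octet like "010" is not read as octal.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<<"$1"
    echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Return 0 if IPv4 address $1 lies inside CIDR block $2, otherwise 1.
# A block given without a prefix (e.g. 198.51.100.7) is treated as /32.
ip_in_cidr() {
    local cidr=$2 ip_int net_int prefix mask
    [[ $cidr == */* ]] || cidr="$cidr/32"
    ip_int=$(ip_to_int "$1")
    net_int=$(ip_to_int "${cidr%/*}")
    prefix=${cidr#*/}
    # Build the netmask: the top $prefix bits set, limited to 32 bits.
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    (( (ip_int & mask) == (net_int & mask) ))
}
```

Used as a condition, ip_in_cidr 192.0.2.5 192.0.2.0/24 succeeds, while ip_in_cidr 192.0.3.5 192.0.2.0/24 fails. This sketch does no input validation and covers IPv4 only.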

The web interface will return in a new form, protected by modern verification mechanisms such as CAPTCHAs and Cloudflare Turnstile. The goal is to reduce abuse while keeping legitimate self-service removals straightforward.

Why the Page Is Changing

Earlier versions of this page relied on legacy components that no longer met technical or security requirements. Rather than patching outdated functionality, the decision was made to rebuild the removal system and related DNS tools as a coherent package.

The new solution will be delivered together with updated DNS utilities and an improved DNSBL Removal Kit, replacing older integrations.

About Availability and Support

The site-based tools are maintained as a self-service resource. Response times and availability may vary due to private-life constraints. For this reason, the tooling is designed to minimize the need for manual intervention wherever possible.

For common questions and background information, please refer to our documentation and/or FAQ.

The site is privately maintained, not owned by any company or organization, and operates without commercial funding. All development and maintenance are done in spare time.

If you wish to support continued development, optional donation alternatives are available on the support page.

Current Status

The removal tool is under active redevelopment and will return as part of a consolidated DNS toolset with a fully functional DNSBL removal workflow.

Introducing SocialGPT

SocialGPT is a lightweight but powerful Chrome extension that integrates ChatGPT directly into your social media experience. For years, I’ve wanted a tool that would let me write smarter, sharper, more context-aware replies without opening new tabs or juggling windows. Every time I needed to draft a rebuttal or clarify a point, I wished for something embedded – something right there on the page.

Now it exists.

With SocialGPT, you can mark comment threads, automatically pull their content into an in-page editor panel, and generate AI-assisted replies in seconds – no reloads, no switching, no bullshit.

Source code: Bitbucket (GitHub mirror)

Key Features

  • Context Marking – Highlight any number of elements in a thread to build a structured conversation context with block indexes (like [1], [2], etc).
  • Floating Reply Panel – A modal-less in-page editor where you can:
    • choose tone (e.g. cynical, friendly, brutally honest)
    • select response length (short, micro, extended)
    • switch models (GPT-4o, GPT-4, o3-mini)
    • input modifiers and custom instructions
  • Facebook-Aware – Automatically detects your profile name and injects it into the prompt for authentic replies.
  • Right-Click Access – Mark content or open the reply interface with a right-click.
  • Mark Mode Toggle – One-click switch to enable or disable GPT reading mode.
  • Response Modification – Use the Modify button to rework or fine-tune previous replies with a new tone, new instructions, or a shorter length. This is especially useful after generation, since context and prompt fields are cleared upon reply.
  • Visual Loader – Subtle spinning loader shows when ChatGPT is generating content.
  • Fact Check Reminder – Prompts include reminders to validate and cross-reference controversial claims or disputed data before producing a final draft. Designed to prevent regurgitation of unchecked social media noise.
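
As a rough illustration of the block indexes mentioned under Context Marking, the sketch below numbers a set of marked text blocks as [1], [2] and so on. The extension itself is a Chrome extension and does not work this way internally; build_context is a made-up name, and bash is used here only to match the other scripts in this feed.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only; not code from the SocialGPT extension.

# Print each argument as a numbered context block: [1] first, [2] second, ...
build_context() {
    local i=1 block
    for block in "$@"; do
        printf '[%d] %s\n' "$i" "$block"
        i=$((i + 1))
    done
}
```

Calling build_context with two marked comments as arguments prints them on separate lines, prefixed with [1] and [2], which is the kind of structured context a prompt can then refer back to.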

Tone Profiles

Organized into four categories:

  • Objective & Informative – neutral and formal, fact-based and concise, academic and precise, analytical and critical
  • Confrontational & Direct – critical and direct, cynical and sharp, aggressive and unapologetic, brutally honest
  • Satirical & Sarcastic – sarcastic and dry, snarky and dismissive, satirical and ironic, witty and clever
  • Approachable & Light – friendly and casual, conversational and soft

Ideal Use Cases

  • Rebuttals in comment sections
  • High-speed debate replies
  • Satirical or snarky thread injections
  • Public moderation with edge
  • Clarifying academic-style posts

Requirements

  • OpenAI API key (GPT-4 or GPT-4o recommended)
  • Chrome browser with extensions enabled

Created by Thomas Tornevall for real-world online interaction. Feedback and pull requests are welcome on either GitHub or Bitbucket.

Stay sharp – speak smart – strike fast.