A Network Blip Is Not Just a Blip

A network blip is brief, self-healing, and invisible after the fact. It is also the kind of thing that silently wipes out a backup job while your site keeps running normally.

What is a network blip#

A network blip is a temporary interruption in a network connection. Not a full outage. The server stays up, DNS resolves, ping responds. But for a few seconds (sometimes longer), packets get dropped or delayed enough that an active TCP connection times out and closes.

Common causes:

  • A router or switch along the path reboots or flushes its connection table

  • A cloud provider does brief maintenance on a network link

  • BGP route changes between two hosting providers mid-transfer

  • ISP congestion causing packet loss severe enough that TCP retransmissions give up

The defining characteristic: by the time you notice and go to investigate, everything works fine again. There is no broken host to point at, no down service to restart. The failure window is already closed.

Why long-running operations take the hit#

A web request that hits a blip just fails and retries in milliseconds. The user barely notices, the next request goes through, life continues. The entire exposure window is under a second.

A long-running transfer is different. An SCP upload that hits a blip mid-transfer loses the entire transfer. No retry, no resume. The connection is gone, the destination file is incomplete or missing, and the script that launched it either crashes or silently moves on.

This is the asymmetry: short operations tolerate blips, long operations do not. Database backups, large file transfers, remote sync jobs — anything that holds a TCP connection open for more than a few seconds is exposed. The longer the transfer, the higher the probability that a momentary blip lands inside it.
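
One way to narrow that exposure is to retry the transfer instead of treating the first drop as final. The sketch below is not part of my backup script; the `retry` helper, the attempt count, and the delays are all assumptions for illustration. It reruns a command with a doubling pause between attempts:

```shell
# Sketch: rerun a command until it succeeds or the attempts run out.
# Usage: retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    while true; do
        # Run the command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))   # back off: 30s, 60s, 120s, ...
    done
}

# Example call (hypothetical path and host):
# retry 3 30 rsync --partial --timeout=60 -e ssh \
#     ~/backups/backup_20260331.tar.gz user@backup-destination:/path/to/backups/
```

The example call uses rsync rather than scp because `--partial` keeps a partially transferred file on the destination, so a later attempt can reuse it instead of starting from zero. With plain scp the same wrapper still works, but every attempt resends the whole archive.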

Case study: the failed backup#

My backup timer fired at 3am. The script packaged SQLite databases, config files, and content into a .tar.gz, then uploaded to a remote VPS over SCP through a jump host. Every local step completed cleanly. Then the upload dropped.

The setup is a standard bash backup script scheduled via systemd timer. If you want a reference implementation, here is the one I use.

What the logs said#

journalctl -u your-backup.service --since today
bash

Output:

Mar 31 03:01:05 backup[12301]: [03:01:05] Creating archive...
Mar 31 03:01:05 backup[12302]: [03:01:05]   archive: backup_20260331.tar.gz (1.1M)
Mar 31 03:01:05 backup[12303]: [03:01:05] Uploading to remote...
Mar 31 03:08:18 backup[12304]: scp: Connection closed
Mar 31 03:08:21 systemd[1]: backup.service: Main process exited, code=exited, status=255/EXCEPTION
Mar 31 03:08:21 systemd[1]: backup.service: Failed with result 'exit-code'.
Mar 31 03:08:21 systemd[1]: Failed to start backup.service
plaintext

Two things stand out. First, the gap: Uploading to remote... logged at 03:01, then silence until scp: Connection closed at 03:08. SCP hung for seven minutes before the connection dropped. Second, the log file itself has no “uploaded” confirmation line:

[03:01:05] Uploading to remote...
[03:08:18] ← scp: Connection closed (from journalctl, not the log file)
plaintext

The backup script used set -euo pipefail. When scp exited non-zero, the script bailed immediately without writing another log entry. No “upload failed” message, because nothing in the failure path wrote to the log before exiting. The gap in the log file is the signal.
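
The seven-minute hang is a separate problem from the missing log line. OpenSSH has keepalive options that make a dead connection fail sooner. This is a minimal sketch, assuming the values below suit your link; `scp_failfast` is a hypothetical wrapper name, not something from my script:

```shell
# Sketch: scp with OpenSSH options that give up on a silent connection.
#   ConnectTimeout=15       abort if the connection cannot be set up within 15s
#   ServerAliveInterval=15  probe the server after 15s without any data from it
#   ServerAliveCountMax=3   disconnect after 3 unanswered probes (about 45s)
# The numbers are illustrative assumptions, not tuned values.
scp_failfast() {
    scp -o ConnectTimeout=15 \
        -o ServerAliveInterval=15 \
        -o ServerAliveCountMax=3 \
        "$@"
}

# Example call (hypothetical path and host):
# scp_failfast ~/backups/backup_20260331.tar.gz user@backup-destination:/path/to/backups/
```

With these settings a stalled session should end after roughly 45 seconds of silence instead of waiting for the TCP stack to give up. It does not prevent the failure. It only shortens the window in which the script sits there doing nothing.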

What caused it#

The SCP route went through a jump host. The jump host stayed up (ping responded, port open), but the SSH session dropped mid-transfer. The likely cause was a brief interruption on the relay itself, or a momentary blip on the path between the relay and the backup destination. The connection was healthy before and after.

To rule out a key issue or downed destination, test each hop separately:

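# Test the relay (jump host) directly
ssh -p <port> user@relay-host "echo OK"

# Test the full chain (assumes ~/.ssh/config routes backup-destination through the relay, e.g. via ProxyJump)
ssh user@backup-destination "echo OK"

# Manual SCP test with a small file over the same route
scp /tmp/testfile user@backup-destination:/tmp/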
bash

All three succeeded immediately. The relay was healthy. The failure was a moment-in-time blip: gone before the investigation started.

Why it is easy to miss#

The systemd unit did report Failed with result 'exit-code'. So systemctl status your-backup.service would show a failed state. But you only check that when something is obviously broken. A 3am backup failure does not break anything visible. The site stays up, requests keep coming in, nothing pages you.

The only reason I caught it was an alert email. Without that, I would have gone days believing the remote copy existed.

The real consequence#

If a disk fails or a deploy goes wrong the next morning and you reach for the latest backup, you are restoring from data that is older than you think. In a low-traffic personal project that might be acceptable. In anything with active writes, that gap is data loss.
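
A freshness check is one way to find out before you need the restore. The sketch below assumes GNU find and the `backup_*.tar.gz` naming from the log above; `has_fresh_backup`, the directory, and the threshold are hypothetical. It could run on whichever host holds the copies you intend to restore from:

```shell
# Sketch: succeed only if DIR contains a backup archive modified within
# the last MAX_AGE_MINUTES.
# Usage: has_fresh_backup DIR MAX_AGE_MINUTES
has_fresh_backup() {
    local dir="$1" max_age_minutes="$2" match
    # -mmin -N matches files modified less than N minutes ago;
    # -print -quit stops at the first match.
    match=$(find "$dir" -maxdepth 1 -type f -name 'backup_*.tar.gz' \
        -mmin "-$max_age_minutes" -print -quit)
    [ -n "$match" ]
}

# Example call: complain if nothing newer than 26 hours (1560 minutes) exists.
# has_fresh_backup /path/to/backups 1560 || echo "latest backup is stale" >&2
```

A threshold slightly longer than the backup interval leaves room for a slow night without hiding a missed run.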

What to do after a failure#

Check whether the local archive is still there. A backup script that packages locally before uploading will still have the archive even when the upload fails:

ls -lh ~/backups/
bash
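
Seeing the file in the listing does not prove it is a usable archive. Before uploading by hand, it may be worth a quick integrity check. A minimal sketch, assuming gzip and tar are available; `archive_ok` is a hypothetical helper name:

```shell
# Sketch: check that a .tar.gz is non-empty, decompresses cleanly,
# and has a readable file listing.
# Usage: archive_ok ARCHIVE
archive_ok() {
    local archive="$1"
    [ -s "$archive" ] || return 1                 # exists and is not empty
    gzip -t "$archive" 2>/dev/null || return 1    # compressed stream is intact
    tar -tzf "$archive" >/dev/null 2>&1           # tar listing can be read
}

# Example call:
# archive_ok ~/backups/backup_20260331.tar.gz && echo "archive looks intact"
```

This only confirms the archive can be read end to end. It says nothing about whether the SQLite databases inside were captured in a consistent state.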

If the file is there, the data is not lost. Upload it manually:

scp ~/backups/backup_20260331.tar.gz user@backup-destination:/path/to/backups/
bash

Then run the full backup script again to get a fresh timestamped copy and let it handle remote rotation normally.

What to add to prevent silent failures#

Trap on exit and log the outcome. With set -euo pipefail, the failure path exits without writing to the log by default. Add a trap:

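# Runs on every exit; logs and alerts only when the exit code is non-zero.
# log and send_failure_alert are assumed to be defined earlier in the script.
on_exit() {
    local code=$1
    if [[ $code -ne 0 ]]; then
        log "Backup FAILED (exit code: $code)"
        send_failure_alert "$code"
    fi
}
# Single quotes defer $? until the trap fires, so it holds the script's exit code.
trap 'on_exit $?' EXIT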
bash

Send a failure notification. The systemd unit failure state is not enough because it requires you to look for it. Email via curl --ssl-reqd to an SMTP relay, a webhook, anything. The goal is a push notification, not a pull check.
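
As a rough illustration of the webhook variant, here is one possible shape for the `send_failure_alert` helper. The endpoint URL, the `text` form field, and `build_alert_message` are assumptions for the sketch; adapt them to whatever your notifier expects:

```shell
# Sketch: push a failure alert to a webhook.
# ALERT_WEBHOOK_URL is a hypothetical endpoint; override it in the environment.
ALERT_WEBHOOK_URL="${ALERT_WEBHOOK_URL:-https://example.invalid/hooks/backup}"

# Build a one-line message from the exit code.
build_alert_message() {
    local code="$1"
    printf 'Backup FAILED on %s (exit code: %s)' "${HOSTNAME:-unknown}" "$code"
}

# Usage: send_failure_alert EXIT_CODE
send_failure_alert() {
    local code="$1"
    # --fail turns HTTP error responses into a non-zero exit.
    # --max-time keeps the alert from hanging if the network is still flaky.
    # --data-urlencode sends the message as a POST form field named "text".
    curl --fail --silent --show-error --max-time 10 \
        --data-urlencode "text=$(build_alert_message "$code")" \
        "$ALERT_WEBHOOK_URL" || echo "alert delivery failed" >&2
}
```

The alert travels over the same network that just failed, so it can be lost too. That is one more reason to keep the failure line in the log file as well.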

Verify the remote after upload. After scp returns, SSH into the destination and confirm the file exists and is non-empty:

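# test -s succeeds only if the file exists and is larger than zero bytes
ssh backup-destination "test -s /backups/backup_${TIMESTAMP}.tar.gz && ls -lh /backups/backup_${TIMESTAMP}.tar.gz"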
bash

A zero-byte file after a dropped connection is a real failure mode. SCP can create the destination file before the transfer completes, so a dropped connection can leave an empty or truncated file behind.
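
A size check catches an empty file but not one that stopped halfway. Comparing checksums covers both. The sketch below assumes `sha256sum` exists on both ends and that paths contain no single quotes; `BACKUP_HOST`, `remote_run`, and `verify_remote_copy` are hypothetical names:

```shell
# Sketch: compare the SHA-256 of the local archive with the remote copy.
# BACKUP_HOST is an assumption; it defaults to the host alias used above.
BACKUP_HOST="${BACKUP_HOST:-backup-destination}"

# Run a command string on the remote host.
remote_run() {
    ssh "$BACKUP_HOST" "$1"
}

# Usage: verify_remote_copy LOCAL_FILE REMOTE_FILE
verify_remote_copy() {
    local local_file="$1" remote_file="$2" local_sum remote_sum
    # sha256sum prints "<hash>  <name>"; keep only the hash.
    local_sum=$(sha256sum "$local_file" | cut -d' ' -f1)
    remote_sum=$(remote_run "sha256sum -- '$remote_file'" | cut -d' ' -f1)
    # An empty local hash means the local file could not be read.
    [ -n "$local_sum" ] && [ "$local_sum" = "$remote_sum" ]
}

# Example call (hypothetical paths):
# verify_remote_copy ~/backups/backup_20260331.tar.gz /backups/backup_20260331.tar.gz \
#     || echo "remote copy does not match" >&2
```

The cost is one extra SSH round trip and a read of the file on each side, which is small for an archive of about a megabyte.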

The check that matters#

After any incident, read the log file directly alongside journalctl. The log file shows what the script wrote. journalctl shows what systemd saw, including stderr. Together they give you the full picture. A gap between “Uploading…” and the next entry is the signature of a blip-killed transfer.

Network blips are outside your control. What is inside your control: whether your script logs the outcome, whether a failure sends you a notification, and whether you verify the remote copy after every upload. The blip cannot be prevented. The silent failure can.

A Network Blip Is Not Just a Blip
https://srmdn.com/blog/network-blip
Author srmdn
Published at March 31, 2026