What 2 GB of Logs on a Fresh VPS Actually Means

A few weeks after moving to a self-managed VPS, I noticed the system journal had grown to over 2 GB. The server had only been running about a month. Nothing about the apps was unusual: traffic was normal, no crashes, no deployments that week.

So I started digging.

Finding the source

journalctl --disk-usage confirmed the size. To find what was writing so aggressively, I pulled the last seven days of logs and counted entries per service:

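journalctl --no-pager -q --since "7 days ago" -o json \
  | python3 -c "
import sys, json, collections

# Count journal entries per systemd unit, falling back to the syslog identifier.
counts = collections.Counter()
for line in sys.stdin:
    try:
        d = json.loads(line)
        svc = d.get('_SYSTEMD_UNIT') or d.get('SYSLOG_IDENTIFIER') or 'unknown'
        counts[svc] += 1
    except (ValueError, TypeError):
        # Skip malformed lines and entries whose fields are not plain strings.
        continue
for svc, n in counts.most_common(10):
    print(f'{n:>8}  {svc}')
"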

The result:

   73978  ssh.service
    9308  cron.service
    4752  app-backend.service
    1740  app-production.service
     606  kernel

SSH had nearly 74,000 log entries in seven days. Everything else combined didn’t come close.

What’s actually in there

journalctl -u ssh.service --no-pager -n 10

Mar 03 04:07:02 vps sshd[63622]: Invalid user xiedr from 45.148.10.118 port 43446
Mar 03 04:07:02 vps sshd[63622]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=45.148.10.118
Mar 03 04:07:05 vps sshd[63622]: Failed password for invalid user xiedr from 45.148.10.118 port 43446 ssh2
Mar 03 04:09:15 vps sshd[63704]: Failed password for root from 181.23.107.93 port 38173 ssh2
Mar 03 04:11:41 vps sshd[63950]: Invalid user a from 134.122.46.171 port 53312
Mar 03 04:11:42 vps sshd[63951]: Invalid user a from 134.122.46.171 port 53318
Mar 03 04:11:42 vps sshd[63952]: Invalid user a from 134.122.46.171 port 53322

SSH brute-force attempts. About 10,000 log entries a day, every day, since the server went live.

My first reaction was concern. But before doing anything, I wanted to understand what I was actually looking at.

Why this actually matters

About 10,000 log entries a day sounds containable until you do the math. At that rate, with no journal size limit configured, the journal on this server grew by roughly 2 GB in a month, and nothing I had set up would stop it. I’ve had this happen before on a different server: a log file that started as noise quietly grew to 60 GB over a few months. There was no warning. The disk just filled up.
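The arithmetic behind those figures can be sketched as follows. This is a rough projection that assumes growth stays linear at the rate I observed; the helper names are mine, and the 2 GB and 30 day inputs are the approximate figures from this server, not precise measurements.

```python
def daily_rate(total: float, days: float) -> float:
    """Average amount per day over an observation window."""
    if days <= 0:
        raise ValueError("days must be positive")
    return total / days


def projected_gb(mb_per_day: float, days: float) -> float:
    """Projected size in GB after `days`, assuming linear growth."""
    return mb_per_day * days / 1024


# 73,978 ssh.service entries were counted over 7 days.
ssh_entries_per_day = daily_rate(73_978, 7)

# Approximate: about 2 GB of journal after about 30 days of uptime.
journal_mb_per_day = daily_rate(2 * 1024, 30)

print(f"{ssh_entries_per_day:,.0f} SSH entries per day")
print(f"{journal_mb_per_day:.0f} MB of journal per day")
print(f"{projected_gb(journal_mb_per_day, 365):.1f} GB after a year at this rate")
```

The numbers are small per day, which is exactly why this kind of growth goes unnoticed until the disk is full.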

When a Linux disk hits 100%, it doesn’t degrade gracefully. nginx stops writing access logs and starts returning errors. Databases that need to write to disk, whether that’s a WAL file, a lock, or a temp file, start failing. Applications throw write errors that look like bugs until you realize the real cause. Depending on what’s running, recovery can mean emergency cleanup under pressure while services are down.

The second problem is signal loss. Your journal is also where real security events show up: failed sudo attempts, service crashes, actual intrusion attempts with valid usernames. When it’s buried under 70,000 SSH noise entries a week, you lose the ability to notice anything real. The logs become a liability instead of a tool.

None of this is the fault of the bots. They’re doing what bots do. The failure mode is leaving the journal unconfigured and assuming it self-manages.

Bots, not people

There are a few ways to tell automated scanning from a targeted attack.

The usernames give it away. Pull the top targets:

journalctl -u ssh.service --no-pager --since "7 days ago" -q \
  | grep -oP '(Invalid user|Failed password for) \K\S+' \
  | sort | uniq -c | sort -rn | head -20

9114  invalid
7534  root
 160  admin
  98  hik
  55  oracle
  53  test
  51  ubuntu
  42  git
  40  postgres
  36  dell
  28  deploy
  27  ansible
  21  tomcat

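This is a wordlist. (The top row, invalid, is an artifact of the pattern: in “Failed password for invalid user xiedr” the word captured after “for” is invalid, not the username.) hik targets Hikvision cameras. dell targets Dell hardware such as iDRAC. oracle, postgres, tomcat are server software default accounts. Nobody targeting my server specifically would try hik or orangepi. They’re running the same list against every IP they can reach.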
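If you want the counts without the stray “invalid” row, a small parser that understands both message shapes does the job. This is a minimal sketch, not the pipeline I actually ran: the regex and function name are my own, and the sample lines are copied from the log excerpt earlier in this post.

```python
import re
from collections import Counter

# Matches both sshd message shapes seen in the journal:
#   "Invalid user NAME from IP ..."
#   "Failed password for [invalid user ]NAME from IP ..."
USER_RE = re.compile(
    r"(?:Invalid user|Failed password for(?: invalid user)?) (\S+) from "
)


def count_usernames(lines):
    """Count attempted usernames across sshd log lines."""
    counts = Counter()
    for line in lines:
        match = USER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "Mar 03 04:07:02 vps sshd[63622]: Invalid user xiedr from 45.148.10.118 port 43446",
    "Mar 03 04:07:05 vps sshd[63622]: Failed password for invalid user xiedr from 45.148.10.118 port 43446 ssh2",
    "Mar 03 04:09:15 vps sshd[63704]: Failed password for root from 181.23.107.93 port 38173 ssh2",
    "Mar 03 04:11:41 vps sshd[63950]: Invalid user a from 134.122.46.171 port 53312",
]

for name, n in count_usernames(SAMPLE).most_common():
    print(f"{n:>5}  {name}")
```

Note that one attempt against a nonexistent user produces both an “Invalid user” line and a “Failed password” line, so those usernames are counted twice. That is fine for spotting a wordlist, less so for counting attempts.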

The timing is mechanical. When I looked at one of the most persistent IPs:

journalctl -u ssh.service --no-pager --since "7 days ago" -q \
  | grep "80.94.92.65"

Feb 24 04:29:44 vps sshd[31690]: Invalid user equipment from 80.94.92.65 port 59946
Feb 24 04:43:04 vps sshd[31933]: Failed password for sshd from 80.94.92.65 port 41176 ssh2
Feb 24 04:57:25 vps sshd[32182]: Invalid user zhangdong from 80.94.92.65 port 59176
Feb 24 05:10:45 vps sshd[32441]: Invalid user thum from 80.94.92.65 port 36168
Feb 24 05:24:59 vps sshd[32619]: Invalid user huan from 80.94.92.65 port 37594
Feb 24 05:38:35 vps sshd[32743]: Invalid user shengziqi from 80.94.92.65 port 51948

Roughly every 13 to 14 minutes, like clockwork. That looks like deliberate throttling to stay under rate-limit windows such as fail2ban’s. Not a human typing.
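To check the spacing rather than eyeball it, the gaps between consecutive timestamps can be computed directly. A small sketch, assuming syslog-style timestamps like the ones above; the function name is mine, and the placeholder year exists only because these timestamps carry none.

```python
from datetime import datetime


def gaps_in_seconds(stamps, year=2000):
    """Gaps between consecutive timestamps of the form 'Feb 24 04:29:44'.

    The journal's short format omits the year, so a placeholder year is
    supplied purely to make the strings parseable.
    """
    times = [datetime.strptime(f"{year} {s}", "%Y %b %d %H:%M:%S") for s in stamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


# Timestamps copied from the 80.94.92.65 excerpt above.
STAMPS = [
    "Feb 24 04:29:44",
    "Feb 24 04:43:04",
    "Feb 24 04:57:25",
    "Feb 24 05:10:45",
    "Feb 24 05:24:59",
    "Feb 24 05:38:35",
]

gaps = gaps_in_seconds(STAMPS)
print([f"{g // 60}m{g % 60:02d}s" for g in gaps])
# prints ['13m20s', '14m21s', '13m20s', '14m14s', '13m36s']
```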

Coordinated subnets. The top attackers included 80.94.92.65, .69, .70, .64: four IPs from the same /24, hitting simultaneously. That’s a botnet or a rented VPS farm running a distributed scan across the entire IPv4 space.
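Grouping source addresses by /24 makes this kind of clustering easy to spot. A minimal sketch using Python’s standard ipaddress module; the function name is mine, and the input is the handful of addresses that appear in this post, not my full attacker list.

```python
import ipaddress
from collections import defaultdict


def group_by_slash24(ips):
    """Group IPv4 addresses by the /24 network that contains them."""
    groups = defaultdict(list)
    for ip in ips:
        # strict=False lets us pass a host address and get its enclosing network.
        network = ipaddress.ip_network(f"{ip}/24", strict=False)
        groups[str(network)].append(ip)
    return dict(groups)


SEEN = [
    "80.94.92.65", "80.94.92.69", "80.94.92.70", "80.94.92.64",
    "45.148.10.118", "181.23.107.93",
]

for network, members in sorted(group_by_slash24(SEEN).items(),
                               key=lambda item: -len(item[1])):
    print(f"{len(members):>3}  {network}")
```

Any network with more than one or two members near the top of the list is worth a second look.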

Your server is not the target. It’s just an address that exists.

Why anyone bothers

IPv4 has about 4 billion addresses. Tools like Masscan can sweep the entire space in under an hour. Running these scans costs almost nothing, and the economics work even at a very low hit rate.
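For a sense of scale, here is the probe rate a one-hour sweep implies. It is plain arithmetic on the size of the IPv4 space, not a statement about how any particular scanner is configured; the function name is mine.

```python
IPV4_ADDRESSES = 2 ** 32  # 4,294,967,296 addresses, including reserved ranges


def probes_per_second(sweep_seconds: float, ports: int = 1) -> float:
    """Probe rate needed to touch every IPv4 address on `ports` ports in the given time."""
    return IPV4_ADDRESSES * ports / sweep_seconds


print(f"{probes_per_second(3600):,.0f} probes per second for a one-hour sweep of one port")
```

That works out to a little over a million probes per second for a single port.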

A server with default credentials gets compromised in seconds and immediately put to work: spam relays, crypto mining, DDoS botnet nodes, proxy services. Operators sell access to these networks or run them directly. The bots don’t care what your server does or who you are. They’re looking for the small percentage of newly spun-up machines where someone left root/password as the login, or where a cloud provider silently re-enabled password auth on provisioning.

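This is why the noise starts within minutes of a server going live. Scanners don’t need to notice that your server is new: they sweep the whole address space continuously, so an IP that starts answering on port 22 gets picked up on the next pass. By the time you finish setting up nginx, you’re already being scanned.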

Am I already compromised?

This is the right question to ask before anything else. Check successful logins:

journalctl -u ssh.service --no-pager --since "30 days ago" -q \
  | grep "Accepted"

What you want to see is every successful login using the same key fingerprint, from IPs you recognise. What I saw was exactly that: my ED25519 key, from my ISP’s dynamic IP range and a cloud provider I use. Nothing else.

If you see an Accepted entry (publickey or password) from an IP you don’t recognise, that’s worth investigating immediately. SSH brute-force noise by itself is not evidence of a breach. It’s background radiation on any public IP.
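That check can be scripted. The sketch below flags any successful login that is not both from a trusted network and made with a trusted key fingerprint. Everything specific in it is hypothetical: the sample log lines, the usernames, the fingerprints, and the networks (which come from the reserved documentation ranges) are placeholders for illustration, not values from my server.

```python
import ipaddress
import re

# sshd logs successful logins as:
#   Accepted <method> for <user> from <ip> port <n> ssh2[: <keytype> <fingerprint>]
ACCEPTED_RE = re.compile(
    r"Accepted (?P<method>\S+) for (?P<user>\S+) from (?P<ip>\S+) port \d+ ssh2"
    r"(?:: (?P<keytype>\S+) (?P<fingerprint>\S+))?"
)


def unexpected_logins(lines, trusted_networks, trusted_fingerprints):
    """Return (user, ip, method, fingerprint) for logins that look unfamiliar."""
    networks = [ipaddress.ip_network(n) for n in trusted_networks]
    flagged = []
    for line in lines:
        match = ACCEPTED_RE.search(line)
        if not match:
            continue
        ip = ipaddress.ip_address(match["ip"])
        known_ip = any(ip in network for network in networks)
        # Password logins have no fingerprint, so they are always flagged.
        known_key = match["fingerprint"] in trusted_fingerprints
        if not (known_ip and known_key):
            flagged.append(
                (match["user"], match["ip"], match["method"], match["fingerprint"])
            )
    return flagged


# Hypothetical sample lines for illustration only.
SAMPLE = [
    "Mar 03 09:12:44 vps sshd[70011]: Accepted publickey for deploy from 203.0.113.7 port 50522 ssh2: ED25519 SHA256:EXAMPLEKEY",
    "Mar 04 02:31:10 vps sshd[71200]: Accepted publickey for deploy from 198.51.100.9 port 40112 ssh2: ED25519 SHA256:EXAMPLEKEY",
    "Mar 05 03:02:59 vps sshd[72345]: Accepted password for root from 198.51.100.23 port 40000 ssh2",
]

for entry in unexpected_logins(SAMPLE, ["203.0.113.0/24"], {"SHA256:EXAMPLEKEY"}):
    print(entry)
```

Feed it the output of the journalctl command above, one line per list element, with your own networks and fingerprints.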

The part that was actually worrying

While investigating the SSH config, I found this in /etc/ssh/sshd_config.d/:

50-cloud-init.conf       →  PasswordAuthentication yes
60-cloudimg-settings.conf  →  PasswordAuthentication no

Cloud-init, the provisioning tool most VPS providers use, had dropped a config file setting password authentication to yes. A second file set it to no, and the effective setting on my server was no. But 50-cloud-init.conf was a loaded gun. Which file wins is easy to get wrong: sshd keeps the first value it reads for each keyword and processes drop-ins in lexical order, so a higher-numbered file is not a reliable override. If a package update or a cloud-init re-run ever changed which files exist or what they contain, password auth could silently re-enable. The brute-force bots hammering the server every few minutes would immediately start getting password prompts instead of rejections.

This is the actual risk. Not the scanning, which is just noise, but the possibility that a routine system update quietly removes one file and undoes your security config without any indication that anything changed.

The fix: edit 50-cloud-init.conf and change the value to no. Don’t delete it, because cloud-init may recreate it. Just make it say the right thing so both files agree.

If you’re on a VPS with cloud-init, check yours:

grep -r "PasswordAuthentication" /etc/ssh/sshd_config.d/

Then confirm the effective config:

sshd -T | grep passwordauthentication

That last command is what matters: it shows what sshd actually resolved after processing all the drop-in files, not what any single file says.
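If you want this check in a script or a cron job, the output of sshd -T is easy to parse: one lowercase keyword per line, followed by its value. A minimal sketch with function names of my own choosing; the sample text is a hypothetical fragment of that output, and running sshd -T for real normally requires root.

```python
import subprocess


def parse_effective_config(sshd_t_output: str) -> dict:
    """Map each lowercase keyword in `sshd -T` output to its list of values."""
    config = {}
    for line in sshd_t_output.splitlines():
        keyword, _, value = line.strip().partition(" ")
        if keyword:
            # Some keywords (for example hostkey) can appear more than once.
            config.setdefault(keyword, []).append(value)
    return config


def password_auth_disabled(sshd_t_output: str) -> bool:
    """True only if the resolved config says 'passwordauthentication no'."""
    return parse_effective_config(sshd_t_output).get("passwordauthentication") == ["no"]


def read_effective_config() -> str:
    """Run `sshd -T` and return its output (not called here; needs root)."""
    return subprocess.run(
        ["sshd", "-T"], capture_output=True, text=True, check=True
    ).stdout


# Hypothetical fragment of `sshd -T` output for illustration.
SAMPLE = "port 22\npasswordauthentication no\npubkeyauthentication yes\n"
print(password_auth_disabled(SAMPLE))
```

In real use you would pass read_effective_config() instead of SAMPLE and alert if the result is ever False.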

If you want to audit your SSH config more systematically, I wrote ssh-audit.sh, which checks for common weaknesses, flags misconfigurations, and runs a reputation check on your server’s public IP.

SSH hardening worth adding

With password auth confirmed off, two more settings help.

MaxStartups 10:30:60: this controls how many unauthenticated connections sshd tolerates at once. The default is 10:30:100: once 10 are pending, sshd drops new ones with 30% probability, rising linearly until it refuses everything at 100. Bots can hold open dozens of connections doing nothing, each one occupying an sshd process. Setting this to 10:30:60 keeps the same early-drop behaviour and lowers the hard ceiling from 100 to 60.

ClientAliveInterval 300 with ClientAliveCountMax 2: sshd sends a keepalive after 5 minutes without data from the client and disconnects once two go unanswered, so a dead session is gone after about 10 minutes. This cleans up ghost sessions from dropped connections.
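The start:rate:full triple in MaxStartups is easier to reason about as a curve. The sketch below models the behaviour as sshd_config(5) describes it: no drops below start, a rate percent chance at start, rising linearly to certain refusal at full. It is my own model for illustration, not code from OpenSSH.

```python
def drop_probability(pending: int, start: int = 10, rate: int = 30, full: int = 60) -> float:
    """Chance that a new unauthenticated connection is refused.

    Models MaxStartups start:rate:full as documented: 0 below `start`,
    `rate` percent at `start`, rising linearly to 100 percent at `full`.
    """
    if pending < start:
        return 0.0
    if pending >= full:
        return 1.0
    percent = rate + (100 - rate) * (pending - start) / (full - start)
    return percent / 100


for pending in (5, 10, 35, 59, 60):
    print(f"{pending:>3} pending -> {drop_probability(pending):.0%} refused")
```

With 10:30:60, a bot holding 35 connections open already loses about two out of three new attempts.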

Put these in a new drop-in file to keep the change auditable:

# /etc/ssh/sshd_config.d/70-hardening.conf
MaxStartups 10:30:60
ClientAliveInterval 300
ClientAliveCountMax 2

Validate and reload:

sshd -t && systemctl reload ssh

Fixing the log bloat

fail2ban was already running and banning aggressively: 3 failures in 10 minutes gets an IP banned for 24 hours. After a month it had banned 1,541 IPs. That’s working correctly. If you want a cleaner view of what it’s actually doing, fail2ban-report.sh shows per-jail stats, top offending IPs, and recent bans in one output.
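The ban rule also explains why the slow scanners from earlier keep getting through. Here is a simplified model of a “3 failures in 10 minutes” window; it is my own sketch of the idea, not fail2ban’s implementation, and the parameter names only borrow fail2ban’s maxretry and findtime terminology.

```python
def would_be_banned(attempt_times, maxretry=3, findtime=600):
    """True if any `maxretry` failures fall within a `findtime`-second window.

    `attempt_times` are failure timestamps in seconds from one IP.
    """
    times = sorted(attempt_times)
    for i in range(len(times) - maxretry + 1):
        if times[i + maxretry - 1] - times[i] <= findtime:
            return True
    return False


# A noisy bot: three failures within a few minutes.
print(would_be_banned([0, 100, 200]))

# A throttled bot: one attempt roughly every 13 to 14 minutes,
# using the gaps seen from 80.94.92.65 (800 s, 861 s, 800 s, ...).
print(would_be_banned([0, 800, 1661, 2461, 3315, 4131]))
```

The first case trips the rule and the second never does, which is why one attempt every 13 minutes can run for days from a single IP.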

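The logs themselves were still growing because the journal had no explicit size limit configured. Left alone, journald caps persistent storage at 10% of the filesystem (up to 4 GB), which is far more than I want to give to bot noise. Two changes fix this permanently.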

First, vacuum what’s already there:

journalctl --vacuum-size=500M
journalctl --vacuum-time=30d

Then set a permanent cap in /etc/systemd/journald.conf so it never grows back:

[Journal]
SystemMaxUse=500M
MaxRetentionSec=30day

Restart journald to apply:

systemctl restart systemd-journald
journalctl --disk-usage

On my server that brought 2 GB down to 499 MB in a few seconds. With the cap set, the journal self-manages: oldest entries are dropped automatically when it hits the limit.

500M is generous for most personal servers. If you need long retention for debugging or compliance, raise it. Just set something so it doesn’t grow unbounded.
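To verify that a cap is actually set, for example from a provisioning script, the file can be read with Python’s configparser. This is a minimal sketch under a couple of assumptions: the function name is mine, it only looks at the text you give it, and it does not read drop-ins under /etc/systemd/journald.conf.d/, which can also set these keys.

```python
import configparser


def journal_limits(conf_text: str) -> dict:
    """Return the explicit SystemMaxUse and MaxRetentionSec values, or None if unset."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # systemd keys are case-sensitive; keep them as written
    parser.read_string(conf_text)
    section = parser["Journal"] if parser.has_section("Journal") else {}
    return {key: section.get(key) for key in ("SystemMaxUse", "MaxRetentionSec")}


CAPPED = """\
[Journal]
#Storage=auto
SystemMaxUse=500M
MaxRetentionSec=30day
"""

# A file where every setting is still commented out.
UNTOUCHED = """\
[Journal]
#SystemMaxUse=
#MaxRetentionSec=
"""

print(journal_limits(CAPPED))
print(journal_limits(UNTOUCHED))
```

In real use, pass in the contents of /etc/systemd/journald.conf and treat a None for SystemMaxUse as “no explicit cap”.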

Is this right for you?

If you run a public-facing server on any major cloud or VPS provider, you are getting scanned. There’s no configuration that stops it. It’s just the internet. The right response is to make sure your actual defenses are solid, not to try to make the noise stop.

Password auth off, key-only login, fail2ban active, journal capped: these are the floor, not a complete hardening guide. If your threat model goes beyond random bots, look at AllowUsers, IP allowlisting, and port knocking on top of this. For a personal server or small project, these steps get you to a state where the noise is contained and you can actually notice if something real happens.

The cloud-init config issue is worth checking regardless of anything else. It’s easy to miss, it won’t show up in any obvious error, and the consequence of missing it is that the wall you think is solid has a door in it.
