How to identify the source of network egress?

I've been getting billed for "Compute Engine Network Internet Egress from Americas to APAC" on the order of $60-$70 a month (500-600GB). My instances are all in us-central1.

I've seen plenty of one-off probes at my HTTP services in my logs, but nothing like this kind of volume. Just a request for a WordPress admin resource now and then.

I've also been seeing bunches of ssh attempts that get rejected. I haven't worried too much about them until seeing this line item in my bill. Is it possible that these ssh attempts can add up to that volume of egress traffic?

Currently my firewall allows https through the load balancer only, and ssh from anywhere. The other defaults are still in effect too--ICMP from anywhere and RDP from anywhere (I don't use RDP).

I'd like to have a better idea where the traffic is coming from so I can update my firewall effectively.

EDIT

I've closed off my firewall for ICMP and SSH and verified that there are no new invalid login attempts, and no impact on the rate of egress. I gotta say this is what I'd expect.

After running iftop and mtr on my instances, I've begun to suspect this volume of egress might be my application's own traffic to graph.facebook.com. However, this leaves me confused, since I've seen that domain resolve to IPs in the US and in Ireland. So I could believe a small volume of traffic to Europe, and I'd expect the "Americas to Americas" volume to be higher. But how does this translate into APAC?

EDIT 2

I inverted the bulk of my traffic to graph.facebook.com. Instead of pushing image data in my posts, I'm providing a URL to my Cloud Storage with CDN, and letting Facebook pull the image data.

This action cut my daily egress to APAC from about 16GB to about 1GB. It seems pretty clear that Google is accounting traffic to Facebook as APAC.

But that makes very little sense to me given the locations of Facebook's data centers. However, that's a new and different question, and I'm going to consider this one pretty much closed. I'm going to add an answer to this question which includes the network tools I used in diagnosing.


Solution 1:

Reiterating for anyone who is facing a similar issue in 2021:

Ingress (incoming packets) is always free; egress (packets leaving GCP) pricing can be checked here.

You can use VPC Flow Logs to determine where your egress traffic is going. There is no charge for enabling flow logs, but you will be charged for the amount of data logged.
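
Once flow logs are exported, the records can be summed per destination to see which endpoint accounts for the volume. The following is a minimal sketch, not an official tool. It assumes each record is the parsed payload of a flow log entry with the fields `reporter`, `connection.dest_ip`, `bytes_sent`, and `dest_location.continent`; check these names against your own exported logs. The function name `egress_by_destination` and the sample records (including the IPs and continent values) are made up for illustration.

```python
from collections import Counter


def egress_by_destination(records):
    """Sum bytes_sent per (continent, dest_ip) for flows reported by the sender.

    `records` is an iterable of dicts shaped like flow log payloads
    (field names are an assumption; verify against your export).
    """
    totals = Counter()
    for rec in records:
        # Only count records reported by the source VM, so that a flow
        # logged on both ends is not counted twice.
        if rec.get("reporter") != "SRC":
            continue
        dest_ip = rec.get("connection", {}).get("dest_ip", "unknown")
        continent = rec.get("dest_location", {}).get("continent", "unknown")
        # Byte counts are often serialized as strings in JSON exports.
        totals[(continent, dest_ip)] += int(rec.get("bytes_sent", 0))
    return totals


# Hypothetical sample data using documentation-only IP ranges.
sample_records = [
    {"reporter": "SRC", "bytes_sent": "1500",
     "connection": {"src_ip": "10.128.0.2", "dest_ip": "203.0.113.7"},
     "dest_location": {"continent": "Asia"}},
    {"reporter": "SRC", "bytes_sent": "2500",
     "connection": {"src_ip": "10.128.0.2", "dest_ip": "203.0.113.7"},
     "dest_location": {"continent": "Asia"}},
    {"reporter": "SRC", "bytes_sent": "700",
     "connection": {"src_ip": "10.128.0.2", "dest_ip": "198.51.100.9"},
     "dest_location": {"continent": "America"}},
    {"reporter": "DEST", "bytes_sent": "9999",
     "connection": {"src_ip": "198.51.100.9", "dest_ip": "10.128.0.2"}},
]

if __name__ == "__main__":
    # Print the largest destinations first.
    for (continent, ip), total in egress_by_destination(sample_records).most_common(5):
        print(f"{continent:10s} {ip:15s} {total} bytes")
```

Grouping by continent as well as IP makes it easier to match the result against a billing line item such as "Americas to APAC".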

Tools like iftop and mtr, together with your application logs, are useful when analyzing data ingress and egress.
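
If you just want a quick read on how fast an instance is sending data before digging into per-destination detail, the kernel's interface counters can be sampled directly. This is a rough sketch that assumes a Linux instance exposing `/proc/net/dev`. The function names `parse_tx_bytes` and `tx_rate` are hypothetical. Note that it counts all transmitted bytes on each interface, including traffic that never leaves Google's network, so treat it as an upper bound on billable egress.

```python
import time


def parse_tx_bytes(proc_net_dev_text):
    """Return {interface: transmitted_bytes} from the text of /proc/net/dev."""
    tx = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in proc_net_dev_text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, stats = line.split(":", 1)
        fields = stats.split()
        # Columns 1-8 are receive counters; column 9 is transmit bytes.
        if len(fields) >= 9:
            tx[name.strip()] = int(fields[8])
    return tx


def tx_rate(interval=5.0, path="/proc/net/dev"):
    """Sample the counters twice and return bytes/second sent per interface."""
    with open(path) as f:
        before = parse_tx_bytes(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_tx_bytes(f.read())
    return {
        iface: (after[iface] - before[iface]) / interval
        for iface in after
        if iface in before
    }


if __name__ == "__main__":
    for iface, rate in tx_rate().items():
        print(f"{iface}: {rate:.0f} bytes/s sent")
```

A sustained rate here can be multiplied out to a daily figure and compared with the billing report; identifying the actual destinations still needs a per-connection tool or flow logs.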

Unwanted traffic can be blocked using Firewall rules.
For better control, enable Firewall Rules Logging.