Modelling University student Internet usage

I am working on a project for a university housing client where I need to model the usage patterns of students living on campus. There are obviously many variables at play here, and I am keen to understand how they would affect such a model.

There are many parallels between this and a normal office/data center scenario. However, I believe university students in a residential setting will not fit any corporate model (due to online gaming, file sharing, Skype, etc.). In my project the design will be hub-and-spoke. The data center will have a large Internet trunk feeding into various firewalls, proxies and servers managing user access, with WAN links out to each of the student sites. I need to be quite accurate in modelling link size and usage patterns on each of the links.

For example, as a baseline I have assumed that the Internet pipe will need to be at least 200 Mbps at the data center. For the WAN links I have a mix of 50 Mbps, 100 Mbps and 200 Mbps. Are there any models I can use to test my baseline and see what sort of performance the students can expect? For example, if Skype is allowed on the network, will my model stand up if the load is at 60% across the network?

I know this is a very open-ended question. There is not going to be a correct answer (unless someone has a model they built for this very scenario). I am more interested in the discussion that might come from it, as there are so many things that need to be factored in. Would love to hear some opinions.


I don't have a model for this utilisation, but I did manage a University halls network back in 2005.
We had a central hub-and-spoke topology, with 1Gbit/s inbound from Cable & Wireless. We broke that down into 100Mbit/s allocations, and piped those out to the halls over single-mode fibre.
At the access level, we had a metric boatload of Cisco 4006 chassis switches, each with as many 48 port 10/100 line cards as we could fit in.

All ports had a max speed of 10Mbit/s, half-duplex (no idea why half-duplex, but it "had always been that way"). There was also MAC address port security, and a complex student signup procedure that meant we had to configure the port security on each student's port from their MAC address when they registered. This was supposed to be some protection against students putting a switch in their room. It didn't work.

Lessons I learnt:

  • If you can imagine students might do it, they are doing it. (This pretty much covers all types of VoIP, gaming and pornography.)

  • If you think you've got good firewalls to block P2P traffic, you haven't. (DC++ was the bane of our existence at the time. The problem wasn't so much people sharing and seeding out to the internet as people sharing inside the LAN.)

Other thoughts:

Testing

Consider contacting Spirent as they make a bunch of traffic generators / network tester hardware which could be invaluable in simulating / emulating 16,000 horny students.

Caching

Consider putting a transparent proxy in between the main feed out of the halls and the external connection to the internet. I guess you'll want 10-15TB of cache space, and using something like a cluster of Squid proxies, you should be able to massively limit the amount of internet traffic. It's something I do at events sometimes, especially when the bandwidth is limited. So much of what people browse for is cacheable, and you don't need to re-request it every time.
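
To get a feel for what a cache might buy you, here is a rough back-of-envelope sketch. The 300 Mbps demand figure and the hit ratios are made-up placeholders, not measurements, and upstream_mbps is just a name chosen for illustration. The only idea it encodes is that traffic served from the cache never crosses the Internet link.

```python
def upstream_mbps(demand_mbps: float, byte_hit_ratio: float) -> float:
    """Traffic that still has to cross the Internet link after the cache.

    byte_hit_ratio is the fraction of requested bytes served from cache
    (0.0 = nothing cached, 1.0 = everything cached).
    """
    if not 0.0 <= byte_hit_ratio <= 1.0:
        raise ValueError("byte_hit_ratio must be between 0 and 1")
    return demand_mbps * (1.0 - byte_hit_ratio)


if __name__ == "__main__":
    # Hypothetical peak demand and candidate hit ratios; measure your own.
    peak_demand_mbps = 300.0
    for ratio in (0.10, 0.25, 0.50):
        remaining = upstream_mbps(peak_demand_mbps, ratio)
        print(f"byte hit ratio {ratio:.0%}: {remaining:.0f} Mbps upstream")
```

Once the proxy is in place, its own logs should give you a real byte hit ratio to plug in instead of a guess.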

Clever buggers

No matter what restrictions you place on the speed, the amount of QoS, the level of VLANs, you'll always get a few bright spark students who try to circumvent the network. Hire them. (That's how I got a job working for hallsnet!)


In order to build your model you need observations from your environment. The best place to get them would be from your network's current traffic. If I were in your place I would be trying to get NetFlow data from your routers for the past year (if possible), or at least a full semester.

You can determine types of traffic using flow-tools (and optionally JKFlow if you want pretty pictures).

Armed with that information you now know (a) what kind of traffic you're producing / consuming, and (b) how much of each type of traffic you're generating. You can combine this information with campus population data (number of students, faculty, staff) to figure out, roughly, how much traffic a person produces, and work out an equation for the average student/professor/staffer.
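
As a minimal sketch of that per-person equation, assuming you have already reduced the flow data to a peak-hour throughput figure for a group: the numbers below are invented for illustration, GroupStats and both function names are hypothetical, and the 60% target utilisation is borrowed from the question.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    population: int    # number of people in the group
    peak_mbps: float   # measured peak-hour throughput for the group


def per_person_mbps(stats: GroupStats) -> float:
    """Average peak-hour throughput attributable to one person."""
    return stats.peak_mbps / stats.population


def required_link_mbps(per_person: float, residents: int,
                       target_utilisation: float = 0.6) -> float:
    """Link size at which the expected peak load sits at target_utilisation."""
    return per_person * residents / target_utilisation


if __name__ == "__main__":
    # Hypothetical observation: 4000 students peaking at 120 Mbps in total.
    students = GroupStats(population=4000, peak_mbps=120.0)
    rate = per_person_mbps(students)
    # Hypothetical site housing 1000 students.
    link = required_link_mbps(rate, residents=1000)
    print(f"{rate:.3f} Mbps per student, {link:.0f} Mbps link for 1000 residents")
```

This is a straight average, so it says nothing about burstiness or the handful of heavy users. Treat the result as a starting point for sizing and check it against the measured peaks.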


How detailed you make the model is up to you, and semi-dependent on your network architecture. For example, if your dorms are contained to a specific subnet you can model dorm traffic separately.
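
Splitting traffic by dorm subnet could look something like the sketch below. It assumes the flow records have already been exported into (source IP, byte count) pairs. The subnet plan and the names DORM_SUBNETS, classify and bytes_per_dorm are all hypothetical, so substitute your own address allocations.

```python
import ipaddress
from collections import defaultdict

# Hypothetical subnet plan; replace with the real allocations.
DORM_SUBNETS = {
    "dorm-a": ipaddress.ip_network("10.10.0.0/22"),
    "dorm-b": ipaddress.ip_network("10.10.4.0/22"),
}


def classify(ip: str) -> str:
    """Return the name of the dorm whose subnet contains ip, else 'other'."""
    addr = ipaddress.ip_address(ip)
    for name, network in DORM_SUBNETS.items():
        if addr in network:
            return name
    return "other"


def bytes_per_dorm(flows):
    """Sum byte counts per dorm.

    flows is an iterable of (source_ip, byte_count) tuples.
    """
    totals = defaultdict(int)
    for source_ip, byte_count in flows:
        totals[classify(source_ip)] += byte_count
    return dict(totals)


if __name__ == "__main__":
    sample = [("10.10.1.5", 1000), ("10.10.5.9", 500), ("192.0.2.1", 10)]
    print(bytes_per_dorm(sample))
```

Keying on the source address only captures traffic sent by each dorm. To count downloads as well, run the same grouping over the destination address.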

Going further, you can model specific dorms, and with the help of university administration telling you how many students in each dorm are in a given major, even correlate that data to a limited extent.


The NetFlow traffic data is also a very useful monitoring tool. If you aren't already collecting it, you should be. It will be interesting (at a minimum), and helpful (when stuff goes wonky on the network and you need to figure out why).