Traditionally on Linux hosts, your Ethernet devices are prefixed with eth, so your primary Ethernet port is eth0, followed by eth1, etc. Almost every Linux distro follows this (for wired LAN interfaces - wireless is a whole other bag). It makes things like scripting network configuration in your automated installers a lot easier, since you know the primary interface should almost always be named eth0 (there are some exceptions, but we deal with those as they come).
That is, until Dell decided to change their naming scheme for RHEL 6 hosts.
Under the New World Order, Dell has "helpfully" decided to rename network interfaces: emX for onboard devices and pXpY for PCI-slot cards (slot X, port Y). Now, I sort of understand their reasons for doing this. There are cases where network devices get enumerated differently, particularly when using PCI-based network cards, and also when switching kernel versions. This can result in unwanted changes to your network order, where eth1 suddenly becomes eth0, or eth[0-3] disappear and become eth[4-7]. The problem can be mitigated by leveraging udev's persistent network interface naming, which is precisely what Dell has done.
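For reference, udev's persistent naming is just a rules file that pins a name to a MAC address. A minimal sketch of the sort of entry you'd find in /etc/udev/rules.d/70-persistent-net.rules on a RHEL 6 box (the MAC address here is a made-up placeholder):

    # Pin the NIC with this MAC to eth0, regardless of probe order
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:11:22:33:44:55", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

Once a rule like that exists, the device with that MAC comes up as eth0 no matter what order the kernel enumerates things in.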
I think they went a step too far, though, especially since emX is not a standard across server platforms. For Dell-only sites this isn't a big issue, since all of your hardware will follow that paradigm. When your install scripts depend on interfaces being named eth0, however, things break horribly.
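For instance, a kickstart line like this one (a sketch; your options will differ) quietly stops matching anything once the onboard NIC shows up as em1 instead of eth0:

    # Kickstart network config that assumes the traditional naming
    network --device=eth0 --bootproto=dhcp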
Fortunately, you can disable this "feature" by passing biosdevname=0 as a kernel boot argument, both at the installer boot prompt and on the installed system.
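On an already-installed RHEL 6 box that means appending it to the kernel line in /boot/grub/grub.conf; for automated installs, the kickstart bootloader directive does the same job. A rough sketch (the kernel version and root device are placeholders for whatever your systems actually run):

    # /boot/grub/grub.conf -- biosdevname=0 tacked onto the kernel line
    kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=/dev/mapper/vg_root-lv_root rhgb quiet biosdevname=0

    # Kickstart equivalent for the automated installers
    bootloader --location=mbr --append="biosdevname=0"

After a reboot (or a fresh install) the interfaces should come back as plain old ethX; if the box was already installed with em names, expect to rename the matching ifcfg files as well.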
Monday, January 30, 2012
Saturday, January 28, 2012
The case of the disappearing node
When you work in an environment with thousands of computers, you're bound to come across strange problems that make you go "Hmm..." Parts fail, computers die, switches go bad. But they don't always die in the most obvious ways.
We got a report a month or two ago about a node in one of our 10 GigE clusters which apparently went offline for some reason. After a student went to look at the node, they found the node seemed fine, but it didn't have any network connectivity. They started down the normal path of troubleshooting: flip the cable, change the cable out, replace the NIC. Still nothing. So they decided to exchange cables with the node next to it. This is where things got really weird...
Normally when you switch cables with the adjacent node after you've already replaced the cable, you're looking to see whether the node or the switch port is bad. Again, switches go bad; it's nothing new. What you'd normally expect is for the problem either to stay with the original node (the node is bad) or to move to the adjacent node (the port is bad). What you don't expect is for the problem to disappear. Both nodes now connect without problems. Move the cable back, and the problem reappears on the original node. Hmm...
Diving into low-level networking, you learn that every network interface has a clock source in the PHY layer. This seems pretty logical, as the two ends of a link have to agree on timing somehow, so we want all of our interfaces running at the same clock rate. Unfortunately, clock sources aren't perfect, and over time they drift one way or another. We call this network jitter. Normally it stays within a range the link can compensate for. Sometimes, however, the two ends drift in different directions: one clock slows down while the other speeds up.
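If you want to watch this sort of trouble from the host side, ethtool is the quickest window into the PHY. A sketch, where eth2 is just a stand-in for whichever interface the cluster actually uses and the exact counter names vary by driver:

    ethtool eth2        # negotiated speed, duplex, and whether a link is detected
    ethtool -S eth2     # driver-specific NIC statistics; watch for CRC/symbol error counters creeping up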
We haven't yet resolved the problem of the disappearing node. An obvious solution is to leave the cables flopped. This solves the short-term goal of getting a node returned to service, but leaves the original problem of PHY jitter driving connections into the ground.
The long-term solution will probably be replacing the switch. Our switch vendor probably won't like that solution, but we don't like nodes that go poof.
And I get tired of saying "hmm..." all the time.
Sunday, January 22, 2012
Back home again in Indiana
Time to try blogging again. I've been wanting to get into blogging (and podcasting) for a while now, and figure life is getting interesting enough to start again.
So what's the blog about?
The main focus will probably be work (as the title suggests), since I don't think high-performance computing gets enough of a voice from the sysadmin perspective. HPC programmers love to talk and dream and voice their opinions, but a lot of the time I feel their perspective gets skewed. Even the admins who are voicing opinions seem out of touch, as they tend to work on smaller-scale machines. More on those ideas in future posts.
When I run out of technical topics to talk about, there's always music and other current events in my life. At least I find them interesting. Your mileage may vary.