Part I Getting Started

Chapter 2 Climb Out of the Hole

2.1 Tips for Improving System Administration

2.1.5 Other Tips

2.1.5.1 Make Email Work Well

The people who approve your budget are high enough in the management chain to use only email and calendaring if it exists. Make sure that these applications work well. When these applications become stable and reliable, management will have new confidence in your team. Requests for resources will become easier. Having a stable email system can give you excellent cover as you fight other battles. Make sure that management’s administrative support people also see improvements. Often, these people are the ones running the company.

2.1.5.2 Document as You Go

Documentation does not need to be a heavy burden; set up a wiki, or simply create a directory of text files on a file server. Create checklists of common tasks such as how to set up a new employee or how to configure a customer’s email client. Once documented, these tasks are easier to delegate to a junior person or a new hire.

Lists of critical servers for each application or service also are useful. Labeling physical devices is important because it helps prevent mistakes and makes it easier for new people to help out. Adopt a policy that you will pause to label an unlabeled device before working on it, even if you are in a hurry. Label the front and back of machines. Stick a label with the same text on both the power adapter and its device. (See Chapter 9.)

2.1.5.3 Fix the Biggest Time Drain

Pick the single biggest time drain, and dedicate one person to it until it is fixed. This might mean that the rest of your group has to work a little harder in the meantime, but it will be worth it to have that problem fixed. This person should provide periodic updates and ask for help as needed when blocked by technical or political dependencies.

Success in Fixing the Biggest Time Drain

When Tom worked for Cibernet, he found that the company’s London SA team was prevented from making any progress on critical, high-priority projects because it was drowning in requests for help with people’s individual desktop PCs. He couldn’t hire a senior SA to work on the high-priority projects, because the training time would exceed the project’s deadline. Instead, he realized that entry-level Windows desktop support technicians were plentiful and inexpensive and wouldn’t require much training beyond normal assimilation. Management wouldn’t let him hire such a person but finally agreed to bring someone in on a temporary 6-month contract. (Logically, within 6 months, the desktop environment would be cleaned up enough that the person would no longer be needed.) With that person handling the generic desktop problems (virus cleanup, new PC deployment, password resets, and so on), the remaining SAs were freed to complete the high-priority projects that were key to the company.

By the end of the 6-month contract, management could see the improvement in the SAs’ performance. Common outages were eliminated both because the senior SAs finally had time to “climb out of the hole” and because the temporary Windows desktop technician had cleaned up so many of the smaller problems. As a result, the contract was extended and eventually made permanent when management saw the benefit of specialization.

2.1.5.4 Select Some Quick Fixes

The remainder of this book tends to encourage long-term, permanent solutions. However, when stuck in a hole, one is completely justified in strategically selecting short-term solutions for some problems so that the few important, high-impact projects will get completed. Maintain a list of long-term solutions that get postponed. Once stability is achieved, use that list to plan the next round of projects. By then, you may have new staff with even better ideas for how to proceed. (For more on this, see Section 33.1.1.4.)

2.1.5.5 Provide Sufficient Power and Cooling

Make sure that each computer room has sufficient power and cooling. Every device should receive its power from an uninterruptible power supply (UPS). However, when you are trying to climb out of a hole, it is good enough to make sure that the most important servers and network devices are on a UPS. Individual UPSs, one in the base of each rack, can be a great short-term solution. UPSs should have enough battery capacity for servers to survive a 1-hour outage and gracefully shut themselves down before the batteries have run down. Outages longer than an hour tend to be very rare. Most outages are measured in seconds. Small UPSs are a good solution until a larger-capacity UPS that can serve the entire data center is installed. When you buy a small UPS, be sure to ask the vendor what kind of socket is required for a particular model. You’d be surprised at how many require something special.
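The 1-hour guideline above can be turned into a rough battery-capacity estimate. The following is only a back-of-the-envelope sketch: the function name, the 10-minute shutdown allowance, and the 90 percent inverter efficiency are assumptions for illustration, not figures from this chapter. Real battery runtime does not scale linearly with load, so treat the vendor's runtime chart as the authority.

```python
def battery_watt_hours(load_watts, outage_minutes=60,
                       shutdown_minutes=10, efficiency=0.9):
    """Rough watt-hours of battery needed to ride out an outage and
    still shut down gracefully.

    shutdown_minutes and efficiency are assumed values for illustration.
    """
    # The battery must cover the outage plus the time the servers need
    # to shut themselves down cleanly.
    runtime_hours = (outage_minutes + shutdown_minutes) / 60.0
    # Divide by efficiency because some stored energy is lost in conversion.
    return load_watts * runtime_hours / efficiency


# Example: a rack drawing a (hypothetical) 1,200 watts.
needed = battery_watt_hours(1200)
print("Approximate battery capacity needed: %.0f Wh" % needed)
```

A number like this is useful mainly for ruling out UPS models that are obviously too small before you start talking to vendors.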

Cooling is even more important than power. Every watt of power a computer consumes generates a certain amount of heat. Thanks to the laws of thermodynamics, you will expend more than 1 watt of energy to provide the cooling for the heat generated by 1 watt of computing power. That is, it is very typical for more than 50 percent of your energy to be spent on cooling. Organizations trying to climb out of a hole often don’t have big data centers but do have small computer closets, often with no cooling. These organizations scrape by simply on the building’s cooling. This is fine for one server, maybe two. When more servers are installed, the room is warm, but the building cooling seems sufficient. Nobody notices that the building’s cooling isn’t on during the weekend and that by Sunday, the room is very hot. A long weekend comes along, and your holiday is ruined when all your servers have overheated by Monday. In the United States, summer unofficially begins with the three-day Memorial Day weekend at the end of May. Because it is a long weekend and often the first hot weekend of the year, that is often when people realize that their cooling isn’t sufficient. If you have a failure on this weekend, your entire summer is going to be bad. Be smart; check all cooling systems in April.

For about $400 or less, you can install a portable cooler that will cool a small computer closet and exhaust the heat into the space above the ceiling or out a window. This fine temporary solution is inexpensive enough that it does not require management approval. For larger spaces, renting a 5- or 10-ton cooler is a fast solution.
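To judge whether a closet needs a portable unit or a rented multi-ton cooler, it helps to convert equipment load into cooling capacity. The sketch below uses two standard conversions: 1 watt of heat is about 3.412 BTU per hour, and 1 ton of cooling is 12,000 BTU per hour. The function names and the 25 percent headroom factor are assumptions for illustration only; a real sizing exercise should also account for lighting, people, and heat entering from the building.

```python
BTU_PER_HOUR_PER_WATT = 3.412    # 1 watt of heat is about 3.412 BTU/hr
BTU_PER_HOUR_PER_TON = 12000.0   # 1 ton of cooling is 12,000 BTU/hr


def cooling_tons_needed(load_watts, headroom=1.25):
    """Estimate tons of cooling for a given equipment load in watts.

    headroom is an assumed safety factor, not a figure from the text.
    """
    btu_per_hour = load_watts * BTU_PER_HOUR_PER_WATT * headroom
    return btu_per_hour / BTU_PER_HOUR_PER_TON


def watts_handled(tons):
    """Equipment load in watts that a cooler of the given tonnage can
    remove, with no headroom applied."""
    return tons * BTU_PER_HOUR_PER_TON / BTU_PER_HOUR_PER_WATT


# Example: ten servers drawing a (hypothetical) 400 watts each.
print("Tons needed for 4,000 W: %.2f" % cooling_tons_needed(4000))
print("Load a 5-ton cooler can remove: %.0f W" % watts_handled(5))
```

By this estimate, a 5-ton rental unit removes the heat from roughly 17.6 kW of equipment, which is far more than a small closet holds but reasonable for a larger room.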

2.1.5.6 Implement Simple Monitoring

Although we’d prefer to have a pervasive monitoring system with many bells and whistles, a lot can be gained by having one that pings key servers and alerts people to a problem via email. Some customers have the impression that servers tend to crash on Monday morning. The reality is that without monitoring, crashed machines accumulate all weekend and are discovered on Monday morning. With some simple monitoring, a weekend crash can be fixed before people arrive Monday. (If nobody hears a tree fall in the forest, it doesn’t matter whether it made a noise.) This does not mean that a monitoring system should be used to hide outages that happen over the weekend; always send out email announcing that the problem was fixed. It’s good PR.
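A ping-and-email monitor of the kind described above can be very small. The following is a minimal sketch, not a recommended product: the host names, email addresses, and the mail relay on localhost are hypothetical placeholders, and the ping flags (-c for count, -W for timeout in seconds) assume a Linux-style ping command.

```python
import smtplib
import subprocess
from email.message import EmailMessage

# Hypothetical values for illustration; replace with your own.
KEY_SERVERS = ["mail.example.com", "files.example.com"]
ALERT_FROM = "monitor@example.com"
ALERT_TO = "sa-team@example.com"
SMTP_HOST = "localhost"  # assumes a mail relay is listening locally


def is_up(host):
    """Return True if the host answers one ping (Linux-style flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_down(hosts, check=is_up):
    """Return the hosts for which the check fails, in the original order."""
    return [h for h in hosts if not check(h)]


def build_alert(down_hosts):
    """Build the alert email listing the unreachable hosts."""
    msg = EmailMessage()
    msg["Subject"] = "ALERT: %d server(s) not answering ping" % len(down_hosts)
    msg["From"] = ALERT_FROM
    msg["To"] = ALERT_TO
    msg.set_content("No ping response from:\n" + "\n".join(down_hosts) + "\n")
    return msg


def run_once():
    """Check every key server once and send one email if any are down."""
    down = find_down(KEY_SERVERS)
    if down:
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(build_alert(down))
```

Calling run_once() every few minutes from a scheduler such as cron is enough to catch a weekend crash. One caveat of this design is that the alert travels by email, so if the mail server itself is down, the alert may not arrive; running the monitor from a machine that uses a different mail path is one way around that.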