Backups are a necessary evil in System Administration, and although most of us dislike the process, it is by far the most critical element under the IT umbrella. I like to think of the whole process as a Recovery Plan instead of a backup plan, because in the end all I care about is that the data is recovered properly and quickly. One of the biggest mistakes System Administrators make, whether they are new to the job or new to a particular company, is failing to test their own backups. Not being able to recover information from a system you designed or recommended is the quickest and surest way to get fired.

 

1. Risk Assessment

Although this sounds simple, a true risk assessment is rarely done and is far out of reach for the average business. Hiring actuaries and combing through insurance statistics would be ideal, but it is far more than most companies are willing to do for a data recovery plan. Many System Administrators find that companies that have not experienced data loss are less willing to be thorough in their analysis and budgeting.

One of the first steps in a recovery plan is to write down the possible external risks. Some examples:

  1. Fire
  2. Flood
  3. Earthquake
  4. Tornado
  5. Physical Break-in / Theft
  6. Virtual Break-in (Cracker)

Ask your insurance company which issues are most likely; they will be happy to tell you every possible disaster scenario.

Next, think of internal risks such as:

  1. Viruses/Malware
  2. Data Corruption
  3. Hardware Failure (Electrical or Mechanical)
  4. User Error (accidental deletion)
  5. User Malice (non-accidental deletion)

To supplement risks, look through your company’s history for any previous data loss and the reasons for it. Enlist the help of your industry colleagues for any scenarios not on the list. You’ll be amazed at some of the “once-in-a-lifetime” stories you’ll hear – some may be applicable to you.

 

2. Impact Rating

Create an impact analysis for each scenario listed above. It should cover which hardware, software and data (all systems) are affected, and how. Does the impact involve a full or partial outage? If hardware is likely to be damaged, how quickly can it be replaced? Suppose there is a fire in a server room and the servers are damaged. You have the LTO tapes as backups, but no server, and no LTO drive to restore them with. How many days will it take to get an LTO drive, and from where? During this phase of planning, a vendor and consultant list should be compiled.

 

3. Risk Rating

This can be included in the budgetary section, or done beforehand. Combine the risk assessment items and impact ratings and sort them from most to least serious. This is important. You should implement a recovery plan that encompasses as many items as possible. Sorting makes the budgetary step easier when you need to cut coverage because of costs.
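As a rough illustration of combining and sorting, here is a minimal Python sketch. The 1 to 5 scales, the likelihood-times-impact formula and the sample numbers are all assumptions made for the example, not part of any standard; use whatever scoring your company agrees on.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (minor) to 5 (business-stopping)

    @property
    def rating(self) -> int:
        # Assumed formula: a simple product of the two scores.
        return self.likelihood * self.impact


def rank_scenarios(scenarios):
    """Return scenarios sorted from highest to lowest combined rating."""
    return sorted(scenarios, key=lambda s: s.rating, reverse=True)


# Made-up sample scores for a few of the risks listed earlier.
scenarios = [
    Scenario("Fire", likelihood=1, impact=5),
    Scenario("Hardware failure", likelihood=4, impact=3),
    Scenario("User error (accidental deletion)", likelihood=5, impact=2),
    Scenario("Physical break-in / theft", likelihood=2, impact=4),
]

ranked = rank_scenarios(scenarios)
for s in ranked:
    print(f"{s.rating:>2}  {s.name}")
```

When the budget gets cut, the items at the bottom of this list are the first candidates to drop.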

 

4. Methods

There are many methods to ensure data continuity in certain scenarios, but be careful as there is rarely a one-size-fits-all approach to backups.

Mirroring: Is designed to mitigate single-point hardware failures. If your database server fails, having a mirrored server will ensure your data is available. Mirroring at another location may also solve router, switch and connection issues for external clients. Mirroring does not protect against corruption, viruses, user error or malice. On-site mirroring does not protect against theft, fire, flood, etc.

Removable Backup Media: Backups to media protect against issues such as corruption, viruses, user error or malice. If you leave the media on-site, they will not protect against theft, fire and flood. If they are taken off-site, it will take longer to retrieve your data in case of loss. With backup tapes, hard drives and CDs, the backup data itself is typically a day old or older. If this is your only backup method, be sure your business can survive with older data. Be sure to keep multiple days of backups, or a weekly backup with daily incremental backups. Users often do not report data loss until days after the event, by which time the relevant backups have already been overwritten with newer, useless ones.
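To see why retention matters, here is a small Python sketch that checks whether any retained backup predates a loss. The dates and the five-day rotation are hypothetical, and the function name is made up for this example.

```python
from datetime import date, timedelta


def latest_backup_before(backup_dates, loss_date):
    """Return the most recent backup taken before the loss, or None."""
    earlier = [d for d in backup_dates if d < loss_date]
    return max(earlier) if earlier else None


today = date(2024, 1, 15)  # hypothetical "today"

# Five daily backups in rotation: Jan 10 through Jan 14.
daily_only = [today - timedelta(days=n) for n in range(1, 6)]

# The same rotation plus one weekly backup kept from Jan 7.
with_weekly = daily_only + [date(2024, 1, 7)]

# A file was deleted on Jan 9 but only reported today.
loss_date = date(2024, 1, 9)

print(latest_backup_before(daily_only, loss_date))
print(latest_backup_before(with_weekly, loss_date))
```

With the daily rotation alone, every retained backup was taken after the deletion, so nothing is recoverable; the weekly copy is what saves the file.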

Non-removable Backup Media: Items such as NAS (Network Attached Storage), DAS (Direct Attached Storage) or SAN (Storage Area Network) can be used to back up servers, virtual machines and data. The issue with these is that they are not removable. This will not protect against theft, fire, flood, etc.

Be careful of proprietary systems used to back up your data. Be sure to audit your recovery scenario regularly to ensure your backups can be recovered. Companies go out of business, and items are discontinued. Do you have any backups on Jaz drives? How difficult would it be to recover if you had to find a new Jaz drive? Don’t know what a Jaz drive is? Exactly!

 

5. Budgetary Concerns

With your sorted list in hand, you can now plan for the items you need to mitigate any disasters. Protecting against many scenarios may prove to be prohibitive in cost. If you do not make the budgetary decisions, be sure that your list is as comprehensive as possible. It is up to IT to determine the impact of all scenarios, and it is up to the budgetary members to determine how much they want to spend. If they say no, you have at least outlined all the possibilities.

In your cost analysis, include replacement or redundancy items such as:

  1. Backup Storage (Tapes & Tape Drives, Hard Drives, CDs)
  2. Backup Servers, NAS, SAN, DAS
  3. Mirrored Servers
  4. Redundant Connections (Internet and Cabling)
  5. Backup Routers, Switches, etc

Part of your impact analysis should include what is damaged or lost. If you have the tapes, but no tape drive, you will need to replace it in order to retrieve the data. Make sure you have the ability to read your backups when you need them. If it takes 3 days to ship a new tape drive, but the cost is minimal, consider having a backup tape drive in stock.

While there are numbers out there for determining how much to spend on a backup and recovery system, you should make your decisions based on the impact and risk. If your data is your business’ main asset, you should spend a larger chunk of your budget to protect it. If time is critical in retrieving your data, the solution may include keeping extra hard drives, servers, routers and switches in stock. If time is not an issue and the business can tolerate an outage for days, you can order items at the time of recovery.

Determining budget can be a mix of preventative costs and the cost of downtime to the business (lost sales, lost productivity). Ideally disaster scenarios should have a cost to the business attached to them. If a server failure results in $0 productivity for the day, the overall impact can be many thousands of dollars per day – that fact alone may convince management to have a redundant server available.
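The comparison can be sketched in a few lines of Python. Every dollar figure below is an invented placeholder, and the function names are made up for the example; plug in your own estimates.

```python
def downtime_cost(daily_loss: float, days_down: float) -> float:
    """Estimated cost to the business of an outage lasting days_down days."""
    return daily_loss * days_down


def spare_pays_off(daily_loss: float, days_down: float, spare_cost: float) -> bool:
    """True if one outage would cost more than keeping a spare in stock."""
    return downtime_cost(daily_loss, days_down) > spare_cost


# Placeholder numbers, not real data.
daily_loss = 5000.0         # lost sales and productivity per day
days_to_replace = 3         # shipping time for a replacement
spare_server_cost = 4000.0  # price of a redundant server

outage_cost = downtime_cost(daily_loss, days_to_replace)
print(f"Outage cost: ${outage_cost:,.0f}")
print("Spare is worth it:", spare_pays_off(daily_loss, days_to_replace, spare_server_cost))
```

This ignores how likely the failure is in the first place; weighting the outage cost by the risk rating would be a reasonable next refinement.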

 

6. Deployment and Testing

Don’t forget this step! Backups are useless unless they can be recovered. Take a weekend to simulate a likely recovery scenario. You may be surprised at all the “gotchas” when recovering data. Common stumbling blocks include not backing up database logs that are critical to recovery (e.g. Exchange Server), or recovering to dissimilar hardware (e.g. RAID5 on a different controller).
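One simple way to check a test restore of plain files is to compare checksums between the original tree and the restored copy. The Python sketch below is only an illustration: the directory layout and file names are made up, and it does not cover databases or system state, which need their own recovery tests.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large files are not loaded whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(original: Path, restored: Path) -> list[str]:
    """List files that are missing from, or differ in, the restored tree."""
    problems = []
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        dst = restored / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src) != file_digest(dst):
            problems.append(f"mismatch: {rel.as_posix()}")
    return problems


# Demo with throwaway directories and hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original"
    restored = Path(tmp) / "restored"
    original.mkdir()
    restored.mkdir()
    (original / "data.txt").write_text("payroll")
    (original / "db.log").write_text("transaction log")
    (restored / "data.txt").write_text("payrol")  # simulated corruption
    problems = verify_restore(original, restored)
    print(problems)
```

An empty list means every original file came back byte-for-byte; anything else is a gotcha you would rather find on a quiet weekend than during a real recovery.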



How do you solve simple computer issues? Hopefully not like I do.

Do you start with the simplest solution first? Or do you start with the most likely solution?

Suppose you return to your computer, open up Google and nothing displays. What do you do first?

Do you:

  • Check the router for status indicators and connectivity?
  • Check the modem for status indicators and connectivity?
  • Call your ISP immediately?
  • Ping a known good server?
  • Check for viruses and malware?
  • Delete all DNS cache?
  • Confirm IP and gateway settings?
  • Examine the Hosts file?
  • Check the network cable?
  • Reseat the network card?
  • Something else?

Most of us tend to test the most likely cause first. Every once in a while even a good technician will overlook the obvious, either by discounting the possibility of it happening or by simply forgetting it as a step.

In my case the Internet wasn’t working. With a complete lack of problem solving procedure, after checking numerous items and not the obvious ones, I was ready to declare it a hardware problem – when I noticed my cat was chewing on something…

Chewed up CAT5

How many times have you overlooked the obvious or forgotten a simple step that would have saved countless hours in diagnosing a problem?



Intel has released the long-awaited trim feature for its newer 34nm SSDs (Intel X25-M and X18-M G2). For those of you with the older G1 series, there is no trim feature available.

Trim is enabled by installing the firmware update from Intel to bring all G2 SSDs to the 02HA firmware. http://www.intel.com/go/ssdfirmware. When updating, remember to take your system out of AHCI mode (but put it back after the update). The firmware tool cannot update drives in RAID.

You must also download the SSD Toolbox from Intel. It is recommended to run the trim tool daily for optimal performance (scheduled task) if you are using Windows XP or Vista. Windows 7 users will not need to run it as long as they have the drive in AHCI mode. http://www.intel.com/go/ssdtoolbox

The trim feature in Windows 7 helps to alleviate the ‘re-write’ penalty found in most SSDs. When you have a fresh SSD, unwritten blocks only require one operation to fill, whereas a previously full (even if the data is deleted) SSD requires two operations to fill a block.

What trim does is tell the SSD which blocks no longer hold valid data, so the drive can erase them ahead of time and return its free space to a near factory fresh state. In a previous entry I described how to do this (but it required erasing the whole drive – not very useful).

Here are some benchmarks from PC Perspective outlining the IO improvements of the Intel X25-M with trim enabled. If you run a database or web server you’ll want to use the trim feature. Anandtech’s writeup is here (more technical).

UPDATE: I’ve just updated the firmware on my Thinkpad T400 with the 34nm G2 X25-M 80GB SSD. Everything went smoothly, and the firmware update DID NOT erase any files. Before you do the firmware update, back up all of your files with the expectation that you will need to reinstall everything! Just because it worked for me doesn’t mean it will work for you. Below are the screenshots step by step, just in case you are curious. It took about 12 minutes from downloading the ISO to booting Windows 7 back up.

AHCI SATA in BIOS

Intel SSD Firmware Update Step 1

Intel SSD Firmware Update Step 2

Intel SSD Firmware Update Step 3

Intel SSD Firmware Update Step 4

Intel SSD Firmware Update Step 5



I’ve been waiting for the new 1156 processors and boards to come out in order to “upgrade” my home system. Last week I put together the following:

  • Core i7 860 (2.8 GHz) – New
  • Asus P7P55D – New
  • Corsair 8GB XMS PC3-10600 – New
  • EVGA GeForce GTX 260 Superclocked – Existing
  • Intel X25-E 32GB – Existing
  • WD VelociRaptor 150GB (x2) – Existing
  • WD 1 TB – Existing

I was looking through Futureshop’s website tonight and found the “Upgrade Advisor” tool…I thought: OK, let’s see how my new system ranks against the best.

Futureshop Upgrade Advisor ranking

Well. Can’t say I’m surprised. I know the machine isn’t the best, but still, 4/5 on TV recording? 3.5/5 for ripping music and managing photos?! I’d like to know how they do the rankings – even if the i7 860 was “too new” to be included, surely the 8GB RAM would rank pretty well for managing photos?

The tool is made by FutureMark and Intel. I guess I’ll need to wait to get an 8 core system, 32GB RAM and 4x GTX 295s in SLI before I can surf the net and manage my photos.

But seriously, there are “regular” people using this tool and thinking “oh no, my Phenom 940 is slow, it says I should get a Core 2 5600…”

BTW, the i7 860 is pretty good. It is roughly the same price as the i7 920 (1366 pin) but the 1156 platform is much less expensive.



By now many have heard about the performance degradation found in Intel SSDs due to the write/rewrite commands. Although the drives remain incredibly fast, there are some instances where you may wish to “reset” the drive, or at least secure erase it before reselling it or installing it in a different computer or server.

An Intel quote: “An alternative method (faster) is to use a tool to perform a SECURE ERASE command on the drive. This command will release all of the user LBA locations internally in the drive and result in all of the NAND locations being reset to an erased state. This is equivalent to resetting the drive to the factory shipped condition, and will provide the optimum performance.”

The Center for Magnetic Recording Research no longer has HDDErase 3.3 on their website which is needed to secure erase the Intel X18-M, X25-M and X25-E. HDDErase 4.0 is not compatible with the Intel SSDs but should be used for all other hard drives. HDDErase 3.3 is available below:

Download HDDErase 3.3 (Intel SSD Compatible) here.

Included in the zip file are usage instructions. Be sure you can create a DOS 6.22 boot disk (in Windows XP explorer, right click on the “A drive” and select “format” and “create boot disk”). Then include the HDDErase.exe file on the disk.

You must also disable AHCI (SATA Mode) if enabled in your BIOS before you boot into DOS for the utility to run and work properly. Most BIOS will have an option to emulate IDE mode for SATA ports. Be sure to switch it back to AHCI once you are done.

Secure erasing the Intel SSD only takes about a minute.


