Data Recovery Tutorial (Basic)

As I hinted at with the last news post, I recently suffered what some might describe as a “catastrophic” data loss. An obscene number of important family photos, documents, and all of my music suddenly vanished before my eyes thanks to a number of conspiring factors. All was not completely lost though, since I know a few (very) basic rules about recovering files from NTFS partitions. To help out those of you who may be going into ventricular tachycardia because you’re in a similar situation, here’s a very basic guide on getting some of your data back. If you’ve used data recovery apps before, or if you’re like me and just make shit up as you go along, you won’t find much of value here. For everyone else, hit the jump. Disclaimer: I don’t do this for a job, so try any of this at your own risk. Comments, corrections or suggestions are welcome.

I’m a nurse by profession, so I do a lot of patient education. This article will be no different. Before you jump into it, it’s important to remember a few rules and a bit of theory for data recovery. We’ll be discussing NTFS file systems here, not ext3 or some other weird system you’ve dug up. Firstly, in most cases data dies hard; in many cases when you “delete” a file, or format a partition (particularly using the “quick” format option) the data is actually still there. In basic terms, what happens is that the actual data for the file is still written to the drive or the storage device, but the space it occupies is now flagged as “free” so that something else can overwrite whatever space the file was using. The file appears invisible to the user and can’t be accessed normally (ie from Explorer) but the data still exists. This is completely different to the Recycle Bin, which is just a holding folder for files that you intend to delete at some other point.

Even when something major goes wrong, like you accidentally perform a quick format on a drive, or the partition or master file table goes tango uniform, data might still be recoverable. The files might still be there, just invisible because there are no pointers for the system to locate them. Failing that, some apps will simply go sector by sector to search for media and rebuild it cluster by cluster. The way most data dies in the home environment is either through a hardware failure (ie a head crash or flash memory failure) or by new data overwriting the old data. Remember, if a file is deleted, the OS considers the space to be “free”, so sooner or later it will come along and write something else to that space. It might overwrite part of the file, or the whole lot, depending on what it wants to do at the time. If a file is overwritten your chances of recovering it drop drastically; you might be able to get part of it back, or you’ll get nothing back at all.
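If the theory so far is a bit abstract, here's a toy model of it in Python. It's a sketch only: the ToyDisk class and everything in it is made up for illustration, and real NTFS bookkeeping is far more involved. It just shows the two ideas that matter: deleting a file drops the pointer and flags the space as free without touching the data, and a later write into that "free" space is what actually kills it.

```python
# Toy model only: real NTFS bookkeeping (the MFT, the cluster bitmap) is far more involved.
class ToyDisk:
    def __init__(self, n_clusters):
        self.clusters = [b""] * n_clusters   # raw contents of each cluster
        self.in_use = [False] * n_clusters   # "is this cluster allocated?" flags
        self.index = {}                      # file name -> cluster number

    def write(self, name, data):
        # Use the first cluster flagged as free, whatever it still contains.
        slot = self.in_use.index(False)
        self.clusters[slot] = data
        self.in_use[slot] = True
        self.index[name] = slot
        return slot

    def delete(self, name):
        # Deleting only drops the pointer and flips the flag; the bytes are untouched.
        slot = self.index.pop(name)
        self.in_use[slot] = False
        return slot


disk = ToyDisk(4)
slot = disk.write("holiday.jpg", b"JPEG DATA")
disk.delete("holiday.jpg")
print("holiday.jpg" in disk.index)   # the file is no longer listed
print(disk.clusters[slot])           # but its bytes are still sitting there
disk.write("new.doc", b"NEW STUFF")  # a later write reuses the "free" cluster
print(disk.clusters[slot])           # and now the old data really is gone
```

The gap between the delete and that second write is the window a recovery app works in.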

Therefore the first rule of data recovery is DO NOT WRITE TO THE DRIVE YOU ARE RECOVERING FROM. At all. Don’t defrag it. Don’t install your data recovery software on it. And don’t, whatever you do, restore data to it! Restoring data back to the drive you’re recovering it from creates a fatal loop which will wreck your chances of recovering further data. Sometimes though you might not be able to avoid writing to the drive; in my case I didn’t realise that the data wasn’t backed up when I formatted the RAID array and installed Windows again, so I just had to try to recover what I could. Always recover the data to a separate DRIVE (not a separate partition!), so that you minimise all writing operations to the drive with your data that you intend to recover.
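If you end up scripting any part of your recovery, a cheap sanity check is to refuse to run when the source and destination share a drive letter. This is a minimal sketch with a made-up helper name (same_volume), and it has an important limit: two different letters can still be partitions on the same physical disk, and it can't see that, so you still need to know your own hardware.

```python
import ntpath  # Windows path rules, usable from any OS


def same_volume(source, destination):
    """Return True if both Windows paths sit on the same drive letter or share."""
    src_drive = ntpath.splitdrive(source)[0].lower()
    dst_drive = ntpath.splitdrive(destination)[0].lower()
    # Paths with no drive letter compare equal, so they are treated as unsafe.
    return src_drive == dst_drive


print(same_volume("D:\\", "d:\\Recovered"))
print(same_volume("D:\\", "E:\\Recovered"))
```

Treat a True result as "stop and pick somewhere else", and a False result as "probably fine, but go check it really is a separate drive".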

So, with that in mind, what can you recover? We won’t know until we have a go, but in general the fewer write operations performed, the greater your chance of recovery. So if you just wiped a partition, went “OH SHIT!” and shut the system down, chances are (depending on how you wiped it) the data is still there and the majority will be recoverable. If you have written to the drive, but only a little bit, you’ll probably recover a fair chunk of data. If you’ve written a bunch of stuff since then, your chances reduce. If there’s a physical hardware failure then I can’t help you, and you might just have to use a data recovery service or just bin the drive and cry in the corner.

Alright, classroom theory over. Procedure manual starts now.

Step 1: Get your software
I tend to use Recuva, a free app for recovering files from NTFS/FAT partitions. Recuva does a fairly decent job, though it’s fairly slow. My main reason for using it is because it’s free and it does the job. There’s an obscene amount of software out there, but I’d advise you to be cautious when spending money to purchase one; if possible get a trial version to see exactly what it can find. If the trial doesn’t show you all of the files that can be recovered, or doesn’t give you an accurate idea of your chances of recovering a file, don’t purchase the program; more often than not, they’re completely useless. Whatever happens, get your application and install it onto a second drive, or onto a USB flash drive; most of them will fit on a small drive.

Other programs that are quite popular are PhotoRec and TestDisk; these two form part of a companion package. PhotoRec is primarily aimed at recovering specific filetypes (mostly photos, documents, and archives), and it does a pretty damn good job at it. It’s pretty slow but it may be more successful than Recuva. It’s a pretty inelegant result though; basically it searches the entire drive and dumps whatever it can find into a folder, preserving no file names. That said PhotoRec tends to be very good at recovering lost files, so it’s worth learning how to use it. It’s not hard.

TestDisk does a hell of a lot more than just recover data; it also recovers lost partitions, boot sectors, fixes MFT problems, and a lot more. It can also recover files but it’s not quite as good at it as PhotoRec; all it does is scan NTFS MFT entries for any deleted files and lets you attempt to recover them, which isn’t quite so useful if things have really gone to shit. If the partition simply isn’t showing up at all, or the disk is reported as being empty with no partitions, then it’s worth trying TestDisk to see if you can recover it and fix the partition. Both PhotoRec and TestDisk are command line apps (though they do have a rudimentary console interface so it’s not a long command with like 20 switches trailing behind it) so if you demand a GUI you’re out of luck, but they’re fairly easy to use.

Alternatively, you can use something like UBCD, a bootable CD which contains a large number of tools for fixing major problems, or run an Ubuntu Live CD and download some of the same tools if you’d rather work in a full GUI. I won’t cover any of those scenarios but Lifehacker has a good guide on the Ubuntu method. This method is particularly beneficial if you’re intending to recover something from a drive which has the only functional operating system on it, and you can’t take the drive and put it into another box. For this article I’ll assume you’re using Recuva or PhotoRec.

Step 2: Isolate the drive
If you’re using a live CD you’ve done that already, but for anything else just remember not to write to the drive. Don’t install new applications, don’t defrag it, and don’t copy anything to it. If the drive happens to contain the operating system or is used for a lot of write operations for the OS (ie contains a swap file or temp directories) it’s best to take the drive out and put it into a different box to minimise the number of writes to it. You don’t want to boot it up and have Windows start writing shit to the drive (and Windows loves to write shit).

Step 3: Scan
If you’re using Recuva, it’ll show up with a lovely little wizard, which you can either ignore or choose to use. Whatever you do, your best bet is to select Deep Scan and let it get to work. If the storage device is big, you can expect it to take a long time, so go amuse yourself as you see fit. If you’re using some other program, refer to that program’s guide. Some will do a “deep scan” by default. If it takes a few seconds to scan and brings up nothing, it’s probably just finding orphaned files or recent deletions, which is fine if you’ve accidentally emptied the recycle bin without thinking, but next to useless if you’ve accidentally formatted the drive or done something else. To get the best chance of finding the file, a deep/full scan is the best way to go.

After scanning, the program should display all the files it has found. From here you can usually start to look at the discovered files and figure out what it is you want to do with them. Recuva’s interface will display all files that it found, along with a guide on what it can do with the file. Red files most probably can’t be recovered; normally this means they’ve been overwritten. Strangely enough though I have been able to recover some of these files. Yellow files may be recoverable with some damage, or they may not. Green files can most probably be recovered, but I know from experience that this is no guarantee. Recuva also gives you a few bits of information about the file, like its most probable former directory, the date it was last modified, and its filesize. It can also filter results either by keyword or by filetype. Recuva’s interface is fairly slow (especially if searching by keyword, man it’s shockingly slow!) but it does get the job done.

Find the files you want and take a look at them. Recuva includes a preview frame which shows the file’s contents (if possible), info about the file, and the header for the file (in hex). Generally, if you can see it in the preview, you can probably recover the file. Some of the files you’re looking at will no longer have their appropriate filenames and Recuva might not be able to determine what directory they were originally found in (ie Path might display “C:\?\”). If that’s the case, you’re going to have to use a bit of brainpower to track down the files; try sorting the list by filesize or modified dates, or you can try the path to start with. Unfortunately there’s no quick way to be certain that the file definitely wasn’t found except to carefully go through everything. If you can’t be bothered, then hey, it’s your data, so I don’t really care!

If you’re using PhotoRec, the first thing to do is select the drive or media that you want to recover. PhotoRec doesn’t present friendly names; it lists drives in the pretty raw format of “mount point – drive stats”. Check that you’ve got the right drive, and double check; you don’t want to waste time with PhotoRec scanning the wrong drive, because the scan will take a LONG time. Once the drive has been selected, you’ll be asked to choose the partition table type. If the drive came out of a Windows box, select the first option (Intel), which is the standard PC partition layout that FAT/NTFS partitions normally sit on. Next you’ll be asked to select the Source Partition: this is the partition that you want to recover files from. After that it’ll ask for the file system type for the partition; since we’re recovering from a Windows partition, hit OTHER (in fact you’d hit OTHER for any system that wasn’t EXT2/EXT3).

Next, select whether to scan the entire partition or just unallocated space. If you’re trying to recover data from a drive that appears to be corrupted or has somehow failed (ie the files are visible but can’t be opened, or you want to recover files that weren’t actually deleted) then select WHOLE. This will go through the entire thing and recover every file it can possibly find. If you’re only after deleted files, and the rest of the drive is fine, select FREE. This will only check unallocated space, so it’s only useful if the files were deleted; for a full recovery attempt, use WHOLE. Finally, choose a location to send the files to. As with any app, send them to a separate drive. Note that PhotoRec will take quite a long time, depending on the size of the drive, and it performs two passes to get all the data. The timer is fairly inaccurate though, so if it says 12 hours or something, take it with a grain of salt. You can interrupt the process. Pass 0 is where the first 10 files are searched to determine block size and shouldn’t take too long. Pass 1 is where the files are recovered. Note that PhotoRec might appear to get stuck on a series of sectors, but if you watch the “files found” counter and watch the directories, you can see it finding more files. I don’t know if this is a bug, or just the way that it works, but it’s definitely still doing something (namely, recovering files).

Step 4: Recover
Once you’ve located the data, in Recuva tick the files you want to restore, and restore them to a second HDD or other storage device. Check through the files to establish that they have been recovered successfully. If some files aren’t displaying, check to see if they display in Recuva. If they do, try to restore again (to a different folder). If they don’t, the file is toast. Note that from what I’ve seen, Recuva seems to screw up large batches of recovery commands, so for the best chance at recovering your files, do it in small groups. This seems to happen even on files that supposedly have an “excellent” chance at recovery.
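Checking a big batch of restored files by opening each one gets old fast. A rough first pass, assuming the files kept a sensible extension, is to compare the first few bytes of each file against the well-known signature for that type. This is a minimal sketch; the SIGNATURES table and the header_matches name are mine, not anything from Recuva, and a matching header only tells you the start of the file survived, not the whole thing.

```python
import os

# A few well-known file signatures ("magic numbers"); extend as needed.
SIGNATURES = {
    ".jpg": [b"\xff\xd8\xff"],
    ".jpeg": [b"\xff\xd8\xff"],
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".gif": [b"GIF87a", b"GIF89a"],
    ".pdf": [b"%PDF"],
    ".zip": [b"PK\x03\x04"],
    ".docx": [b"PK\x03\x04"],  # modern Office files are zip containers
}


def header_matches(path):
    """Return True/False for known extensions, or None if no signature is listed."""
    extension = os.path.splitext(path)[1].lower()
    expected = SIGNATURES.get(extension)
    if expected is None:
        return None
    with open(path, "rb") as handle:
        head = handle.read(16)
    return any(head.startswith(signature) for signature in expected)
```

Anything that comes back False is a good candidate for another restore attempt on its own, as described above.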

With PhotoRec, it creates a massive number of folders called “recup_dir.#” where # is a number (strangely enough) starting at 1. There are no valid filenames during recovery; everything gets some sort of alphanumeric designation; I don’t know how it decides to name files (could be arbitrary for all I know) but nothing gets its proper file name. While the recovery is in progress feel free to browse the folders. I’ve noticed that PhotoRec seems to occasionally clump files together that were in the same directory tree, but otherwise there’s little to no organisation; files are just recovered and dumped in the folders. Now you have the wonderful task of searching through all the folders and figuring out which files are the files you want.
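Before wading in by hand, it can help to get a rough idea of what's actually in there. Here's a small read-only sketch that counts files and total bytes per extension under the recovery folder. The function name and the example path are hypothetical, and it deliberately doesn't move, rename, or delete anything.

```python
import os
from collections import Counter


def inventory(recovery_root):
    """Count files and total bytes per extension under recovery_root (read-only)."""
    counts = Counter()
    sizes = Counter()
    for folder, _subdirs, files in os.walk(recovery_root):
        for name in files:
            extension = os.path.splitext(name)[1].lower() or "(none)"
            counts[extension] += 1
            sizes[extension] += os.path.getsize(os.path.join(folder, name))
    return counts, sizes
```

Something like counts, sizes = inventory("E:\\Recovered") followed by counts.most_common() would give you a quick list of which file types dominate, so you know where to start looking.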

Step 5: Profit
After you’re certain that you’ve got everything you possibly can, you’re finished. Now you need to look at a suitable backup solution, or if you’re me, you need to rage at your misfortune.

Q: My files weren’t discovered by Recuva/PhotoRec! Are they gone for good?
A: Probably. If you really dig around you can probably find yourself a crash course in computer forensics and see if you can manually bring the files back, but you’d need to get a fairly good understanding of how the filesystem works, and that’s something I can’t tell you (because I have very little idea myself of exactly how it all works). If the data is life-or-death stuff, you can take it to a recovery specialist who has all the equipment and training to work wonders. Note that it will cost you (and might cost an obscene amount).

Q: Recuva recovered some files, but they don’t open. What’s gone wrong?
A: Sometimes Recuva will write a file, but there’s nothing actually there when you go to open the file. In most cases this simply means Recuva hasn’t been able to properly recover the file. Whether or not the file can still be used depends on how badly damaged it is; some files might be somewhat recoverable through manual means (ie you might be able to rip part of the text content from a Word doc if you have the tools to do so), but otherwise the file is dead in the water. If you’ve still got Recuva open, try to restore just that single file on its own (ie don’t select it as part of a group) and see if that helps.
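If you've restored a lot of files, you can weed out the completely hollow ones without opening each one. The sketch below assumes that "nothing actually there" shows up as a zero-length file or one filled entirely with zero bytes, which is my guess at the common case rather than anything Recuva documents; the looks_empty name is made up.

```python
def looks_empty(path, chunk_size=1024 * 1024):
    """Return True if the file has zero length or contains nothing but 0x00 bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return True    # reached the end without seeing any real data
            if chunk.strip(b"\x00"):
                return False   # found at least one non-zero byte
```

A file that passes this check can still be damaged; it only rules out the ones with no content at all.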

Q: Recuva recovered some pictures, but they’re all messed up. What gives?
A: Taking a guess here, but it’s probably an attempt at recovery which has gone wrong, probably due to the file being partially overwritten or fragmented, but Recuva has had a go at it anyway. PhotoRec tends to do the same thing. You might also notice similar behaviours with music or movies; part of the file will be there (like a few minutes) or it might be missing large gaps. Similar sort of story; the app did its best, but it couldn’t get the complete file back.

Q: I got back a lot of files, but some of them seem to be much larger than they were originally. What gives?
A: Again, taking a guess, but I think occasionally the recovery app recovers some further junk data, or writes something to the file that was useful in recovery but doesn’t make up part of the actual content. I can’t remember exactly what the reason is but I have noticed that some files are inflated (sometimes by several megabytes) after recovery. I believe there are some apps to fix this, but I can’t remember. Google it.

Q: I searched for pictures but there’s a lot of these little pictures that have come from Christ knows where. What are they?
A: If you search the whole drive for image files you’ll likely pull up images that are used as application icons or that were stored as part of your browser’s cache. Either way they’re probably useless. Well, most people would consider them useless, maybe you like to sift through your cache to find pictures. Not my cup of tea, but to each their own.
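One rough way to push the icons and cache junk to one side is to split the recovered images by file size. This is only a sketch: the 50 KB cut-off is an arbitrary number I picked, the function name is made up, and it only builds two lists rather than deleting anything, because a small file can still be a photo you care about.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".ico"}


def split_by_size(folder, min_bytes=50 * 1024):
    """Return (likely_photos, likely_junk) lists of image paths under folder."""
    keep, junk = [], []
    for root, _subdirs, files in os.walk(folder):
        for name in files:
            if os.path.splitext(name)[1].lower() not in IMAGE_EXTENSIONS:
                continue  # not an image type we know about
            path = os.path.join(root, name)
            if os.path.getsize(path) >= min_bytes:
                keep.append(path)
            else:
                junk.append(path)
    return sorted(keep), sorted(junk)
```

Skim the "junk" list before you write it off.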

Q: Recuva/PhotoRec brought back my files, but they’re all named weird things and numbers. What did I do wrong?
A: Nothing. Sometimes Recuva can’t pick up on the original file name, so the file ends up with a numerical designation (presumably given by Recuva). Sometimes Recuva attempts to use other identifiers to create a filename; I’ve had it pull apparently random tags out of photos and use them as the file name. PhotoRec just doesn’t give a shit about filenames and gives everything a new name.

Q: The drive simply isn’t recognised at all/makes a weird scratching noise/caught fire/got up and walked off. What can I do?
A: In the event of hardware failure, you’re on your own. Simple little failures might be fixable by using a few unorthodox methods but most of the time if the hardware has shit itself, it’s gone unless a specialist can bring it back. Your home office/kitchen/garage/pillow fort is not sufficient for opening up a hard drive and tinkering with the insides; you need a proper clean room to prevent anything getting on the surface of the platters.

Q: My friend says that because the Zero Sector of my Hard Disk Drive is confabulated, I can’t protract my datas. What does this mean?
A: It means your friend is an arsehole and should be shot.

Q: What sort of media should I back up to, so that this never happens again?
A: Multiple forms and multiple locations if at all possible. I’m planning to follow this up with a backup guide, but in general remember that all forms of media can fail. DVDs especially are notorious for being useless coasters after a period of time. Hard drives seem like the logical choice, especially since they’re cheap for massive amounts of data, but they too can fail. An external drive, properly stored and not in constant use, would probably be “reasonably safe” assuming that no harm came to it. I’d advise against “syncing to the cloud” unless you have small amounts of data, a robust connection, or it’s a fully paid service that isn’t going anywhere soon. That’s just my opinion; I’d much rather trust my local efforts than one on a server that I can’t see or control.

Q: Is there a way to configure PhotoRec to find specific types of files?
A: Yes. There’s an options screen available when you select a partition. See the PhotoRec wiki site for more details. Whether or not this makes the scanning any faster is something I’m not sure of, but it would reduce write operations during recovery, so maybe it’d make it slightly faster.

Q: Does PhotoRec recover everything?
A: Not really, but it does get most things. PhotoRec looks for known file headers; if the header isn’t recognised by PhotoRec, it can’t recover the file. The list of supported file formats is on the PhotoRec wiki. Fragmentation matters too: the less fragmented the data is, the greater the chance of PhotoRec getting it back in one piece.
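To give a feel for what "looking for known file headers" means, here's a heavily simplified sketch of header-based carving for JPEGs. It is not PhotoRec's actual algorithm, and the function name is made up. It assumes each picture is stored in one contiguous run, which is exactly why fragmentation hurts: if the middle of a file lives somewhere else on the disk, a scan like this glues the header to whatever happens to follow it.

```python
JPEG_START = b"\xff\xd8\xff"  # start-of-image marker plus the first byte of the next marker
JPEG_END = b"\xff\xd9"        # end-of-image marker


def carve_jpegs(raw):
    """Return byte strings that start at a JPEG header and stop at the next end marker."""
    found = []
    position = 0
    while True:
        start = raw.find(JPEG_START, position)
        if start == -1:
            break  # no more headers in the data
        end = raw.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # header with no end marker: truncated or overwritten
        found.append(raw[start:end + len(JPEG_END)])
        position = end + len(JPEG_END)
    return found
```

Real photos often contain an embedded thumbnail with its own end marker, so this naive version would cut them short; proper tools do a lot more checking than this.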

Q: Is there any automated way to make PhotoRec’s recovered file structure look better?
A: Not really, no. There are a few scripts that might help you, but I’d advise against it; you might miss something. Unfortunately, the only right way to do it is to go through it piece by piece. Yeah, it’ll suck, but it’ll suck less than recovering what you want then deleting it assuming it was random junk.

Q: Which is better – Recuva or PhotoRec?
A: Lots of people will say that PhotoRec is better. I’m not certain. In a real disaster (literally using them to actually recover life-or-death data) I noticed that Recuva found files a lot faster than PhotoRec and gave me the opportunity to actually specify which ones I wanted backed up, and also made a good attempt at attaching a meaningful filename to the recovered file. It can even attempt to recover directory structures. That said, while PhotoRec took a hell of a lot longer, it did recover more useful files. It also recovered a lot of shit that I didn’t want, but I can’t change that. So which is best? Honestly they seem roughly the same to me.

