By: Brent McKenna, FAU M.S.
Step One: Find Data, Get Data
As a scientist, this sounds like it will be the easiest step. Tag some fish, download the tag receivers, get an email, save the data. Step one done.
In fact, finding and getting the data might be even easier. Pretend that you inherited a large data set. Each file was meticulously catalogued and organized. Before you ever saw the files, fish movements had been divided into spawning and non-spawning seasons, then by tag identification number. Each ID was grouped and arranged chronologically. This data is easy to consume and handle.
Step Two: But Wait! There’s More!
It turns out, though, that your data is organized perfectly for the last person who used it. For you, maybe not so much.
With every change of hands, the interest in the data changes. What you want to accomplish isn’t the same as what the previous user wanted to accomplish.
Luckily, there are only some 30,000 lines of detections in your data set. That should be easy to organize how you need it, right? Right.
Step Three: 99 Excel files open, 99 files of Excel, take one down, open 4 more, 110 files of Excel… on your desktop
A short period of reflection later, you decide that the best way to handle the data is either to combine every data set into a single gigantic data set or to separate each data set by some identification parameter. Turns out that 30,000 rows in Microsoft Excel either crash the program or form one unwieldy file, so that isn’t an option. Instead, you make each identification number into its own Excel file.
You diligently set about partitioning the data in ways that you need. Check the tag identification number, copy, and paste. Open the next file. Check the tag identification number, copy, and paste. After the 1000th row, however, every line starts to blend together. Your eyes cross. You philosophize that there is no real difference between 2 and 5 or 3 and 8. After all, two is but five upside down, and eight is only three next to a mirror. Moreover, individual fish aren’t that important. What’s important is that all the fish are the same species.
Alas, each individual fish is important. Incredibly so. Three is not eight, and 2 is not 5. No matter how much you want them to be. You must continue to organize the data. Each tagged fish gets its own file.
After copying over every instance of an ID’s detection, you open the next file, and the next and the next until your desktop is cluttered with files.
At last, you are done, but don’t forget to save them.
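For what it's worth, the check-copy-paste loop above is the kind of thing a short script can take over. Here is a minimal sketch, assuming the detections have first been exported to one CSV file with a header row. The function name `split_by_tag` and the column names `tag_id` and `timestamp` are hypothetical, so swap in whatever your receiver export actually uses.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_tag(source_csv, out_dir, id_column="tag_id", time_column="timestamp"):
    """Write one CSV per tag ID, each sorted by its timestamp text.

    The column names are assumptions; change them to match your export.
    Returns a dict mapping each tag ID to the number of rows written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group every detection row under its tag ID.
    groups = defaultdict(list)
    with open(source_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        for row in reader:
            groups[row[id_column]].append(row)

    counts = {}
    for tag_id, rows in groups.items():
        # Sorting text timestamps only gives chronological order when they
        # are written year-first (e.g. 2015-06-01 04:30:00); assumed here.
        rows.sort(key=lambda r: r[time_column])
        # Assumes tag IDs contain only characters that are safe in file names.
        with open(out_dir / f"{tag_id}.csv", "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        counts[tag_id] = len(rows)
    return counts
```

Calling `split_by_tag("detections.csv", "by_tag")` would leave one file per fish in the `by_tag` folder, and the returned counts give a quick check that no rows went missing. Unlike tired eyes, the script never mistakes a 2 for a 5.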
Step Four: So Close, but not quite.
You click “save as” for every file you made, name each one after its tag number, and save them all in a single folder. Then you back up that folder onto an external hard drive. With all of your files saved, you can load them into ArcGIS. Your excitement peaks. The first step is almost done. Instead of a beautiful map of colored dots in ArcGIS, however, you get an error message. You try another file and get the same error message. You saved every one of your files in the wrong format.
You go back and save each file again, this time after using Google to make sure you know which file format is best for ArcGIS. This takes a while.
Step Five: One drop of water in the bucket
You get that first file loaded into GIS. Perfectly. It is easy to manipulate and effective at showing the data that you want it to show. Now, you just load another 109 files because you want to see your whole dataset.
Step Six: You learn something new every 2 hours or so
Next, you try to animate your map. How do your fish move with time? You somehow manage to create a time animation of fish you didn’t know you had from 250 years ago before figuring out that the times were saved in the wrong format in your files. This will be easy to fix, or so you think. In the file format that ArcGIS needs, you cannot reformat the whole date and time column at once. You must reopen the original Excel file, change the date and time format there, and resave the file for use in ArcGIS. You must do this for every new file you made.
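Reopening every file by hand is one way to fix the dates. Another is a small script that rewrites the date and time column across a whole folder. The sketch below is only an illustration: it assumes the per-tag files are CSVs, that the column is called `timestamp`, and that the times look like `06/01/2015 04:30` and should become `2015-06-01 04:30:00`. Those formats and the function name `rewrite_timestamps` are assumptions, so check what your own files contain and what your version of ArcGIS expects.

```python
import csv
from datetime import datetime
from pathlib import Path


def rewrite_timestamps(folder, time_column="timestamp",
                       old_format="%m/%d/%Y %H:%M",
                       new_format="%Y-%m-%d %H:%M:%S"):
    """Rewrite the timestamp column of every CSV in `folder`, in place.

    Both formats are assumptions; set them to match your data.
    Returns the number of files rewritten.
    """
    rewritten = 0
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            fieldnames = reader.fieldnames
            rows = list(reader)
        if fieldnames is None:
            continue  # empty file, nothing to convert

        for row in rows:
            # strptime raises ValueError when a value does not match
            # old_format, so a bad row stops the run before that file
            # is overwritten.
            parsed = datetime.strptime(row[time_column], old_format)
            row[time_column] = parsed.strftime(new_format)

        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        rewritten += 1
    return rewritten
```

Because the script overwrites files in place, run it on a copy of the folder first. Given how Step Seven goes, that is not the worst habit to pick up.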
Step Seven: Your Computer Crashed. Do Not Pass Go.
Finally, everything is going smoothly. Your data is organized in a way that you can load into your mapping program and do whatever you need to do. The number of adjustments needed in ArcGIS is minimal. It seems like you’ve accomplished a major goal in your project. You’re close to being done. Excitement abounds. You make what should be your last click, that click to save. That final moment before completion when you can show your lab that you can do something! But… the program doesn’t reply. Your click is not going through. You try clicking again… maybe one more time. Ok. It froze. Not a big deal. Leave it alone. Let the computer think. Then, it’s like slow motion. Is that blue you see? Is it blue? Surely it’s just green or maybe some holdover from staring out the window too long. Not a chance.
Blue screen of death.
Your stomach slams into the ground. Despair and disappointment permeate your body. Your hands reach towards the sky, you fall to your knees, and you scream, “WHY!!” while cursing every deity of silicon and heavy metals.
Step Eight: Time to start over
As it turns out, your files were corrupted when your computer crashed. Time to redo everything. You text your friends. You won’t be going to the bar with them tonight. In fact, let’s cancel the weekend’s plans. You have a lot of work to do again. But this time, you have learned your lesson. Every 30 seconds, you click that save button.
Step Nine: What I Learned, a list of Pro-tips:
- Consider getting a Mac.
- Save everything in the right file format the first time.
- Write your hypothesis and how you plan to test it on a piece of paper. Then post that paper on the wall behind your computer. Make sure you adhere it along the center line of your computer screen so that you can see it even when you go cross-eyed.
- Thank the people who worked with the data before you, a lot. They worked hard to turn the data into what it is now. They made your work that much faster and simpler.