Data Hygiene
Jul. 1st, 2010 03:18 pm
I'm not much of a software bigot, but if I could banish Excel from the sciences entirely, I would!
There is a family of genes with names like Mar1, Mar2... and another named Sept1, Sept2... et cetera. Try typing or pasting those into Excel and it TURNS THEM INTO FREAKING DATES!!! As a result, the scientific literature is now full of references to genes called 1-Mar, 2-Mar, et cetera. These are not recognized names that anyone would use deliberately, yet a Google search turns up plenty of them.
Try to find genes in common between two lists - say, something from a scientific paper and something in your own database - and these ones very often drop out because this involuntary name change corrupted one or the other. Since these lists are often tens of thousands of names long, you never even notice. GAH!
This is just one of many, many ways in which using Excel has silently compromised scientific datasets. It is evil and must die.
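For anyone who wants to check their own lists before comparing them, here is a rough sketch in Python. It assumes the damaged names look like a day number, a hyphen, and a month abbreviation ("1-Mar", "2-Sep"); the function names and the sample gene lists are made up for illustration. It only flags suspicious entries and does not try to restore them, since the original name cannot be recovered with certainty from the date form alone.

```python
import re

# Assumed pattern: day number, hyphen, month abbreviation, e.g. "1-Mar" or "2-Sep".
# Only the two gene families mentioned above are covered; extend as needed.
MANGLED = re.compile(r"^\d{1,2}-(Mar|Sep|Sept)$", re.IGNORECASE)


def find_mangled(names):
    """Return the names that look like gene symbols converted to dates."""
    return [n for n in names if MANGLED.match(n.strip())]


def compare_gene_lists(list_a, list_b):
    """Intersect two lists and report suspicious names instead of losing them silently."""
    shared = sorted(set(list_a) & set(list_b))
    suspicious = sorted(set(find_mangled(list_a)) | set(find_mangled(list_b)))
    return shared, suspicious


# Hypothetical example lists: one clean, one that has been through a spreadsheet.
paper_genes = ["Mar1", "Sept2", "Tp53"]
database_genes = ["1-Mar", "2-Sep", "Tp53"]

shared, suspicious = compare_gene_lists(paper_genes, database_genes)
print(shared)      # ['Tp53']
print(suspicious)  # ['1-Mar', '2-Sep']
```

A non-empty second list is the warning that the intersection is probably missing genes, which is exactly the thing you never notice in a list tens of thousands of names long.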
no subject
Date: 2010-07-02 12:24 am (UTC)
PowerPoint is/was the bane of my life, especially in consultancy. Just pure evil on a stick. People tried to draw in it or create graphs and give us the resulting mess, which I'd have to totally redraw. Once something goes into PPT it rarely comes out the same. I feel the same way when I have to open the package, too.
Word can have its uses, but really, give me TextPad or OpenOffice any day!
no subject
Date: 2010-07-02 01:02 am (UTC)
That said - to be perfectly honest, it's not Excel's fault (or OpenOffice Calc's, which does the same thing). If one just starts typing or pasting into an unformatted cell, the application tries to "make sense" of the data as best it can, resulting in the problem you describe. At least in OO Calc - and I presume in Excel, though I don't have a copy available to check - if one sets the cell to "Text" formatting first, the problem doesn't occur.
I have some idea of the magnitude of your frustration, but blaming a tool because people misuse it - on two different levels at the same time - strikes me as unfair.
no subject
Date: 2010-07-02 02:17 am (UTC)
Why can't they make the date algorithm an option you can switch on or off for any specific column?