Data Hygiene
Jul. 1st, 2010 03:18 pm
I'm not much of a software bigot, but if I could banish Excel from the sciences entirely, I would!
There is a family of genes with names like Mar1, Mar2... and another named Sept1, Sept2... et cetera. Try typing or pasting those into Excel and it TURNS THEM INTO FREAKING DATES!!! As a result, the scientific literature is now full of references to genes called 1-Mar, 2-Mar, et cetera. These are not recognized names that anyone would use deliberately, but a Google search turns them up all over the place.
Try to find genes in common between two lists - say, something from a scientific paper and something in your own database - and these ones very often drop out because this involuntary name change corrupted one or the other. Since these lists are often tens of thousands of names long, you never even notice. GAH!
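To make the failure concrete, here is a minimal Python sketch of how the overlap silently shrinks, plus a crude check for date-like entries. The gene lists are made up for illustration, `flag_date_like` is a hypothetical helper, and the pattern assumes the mangled names look like "1-Mar" (a number, a hyphen, a three-letter month); Excel's actual output depends on locale and cell formatting, so treat this as a starting point rather than a complete detector.

```python
import re

# Assumption: mangled names look like "1-Mar" or "2-Sep" (number, hyphen, month).
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def flag_date_like(names):
    """Return the entries that look like Excel-converted dates."""
    return [n for n in names if DATE_LIKE.match(n.strip())]


# Hypothetical lists: one copied from a paper (after a trip through Excel),
# one from your own database with the names intact.
paper_genes = ["1-Mar", "2-Mar", "Tp53", "Brca1"]
database_genes = ["Mar1", "Mar2", "Tp53", "Brca1", "Sept1"]

# A plain set intersection raises no error; the mangled genes just vanish.
shared = sorted(set(paper_genes) & set(database_genes))
suspects = flag_date_like(paper_genes)

print(shared)    # ['Brca1', 'Tp53']
print(suspects)  # ['1-Mar', '2-Mar']
```

Running a check like this on every list before comparing at least tells you the corruption is there. It does not recover the original names, since a date-like string does not say unambiguously which gene it started out as.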
This is just one of many, many ways in which using Excel has silently compromised scientific datasets. It is evil and must die.
no subject
Date: 2010-07-02 02:17 am (UTC)
Why can't they make the date algorithm an option you can choose or not for any specific column?