Wednesday, January 11, 2006

"Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers," by Green and Karp in NAR, is crucial reading for people in the bioinformatics community. In the context of pathway databases, it highlights two pitfalls for bioinformaticians: misinterpreting other researchers' data, and building automated curation software without fully understanding the data types involved.

The paper points out a system-wide error in how pathway databases such as KEGG deal with partial EC numbers (e.g. 1.1.1.-). This notation has two distinct meanings, the databases conflate them, and as a result many genes are assigned to incorrect functions. The authors' results show that only 28% of genes with a partial EC number are correctly annotated in KEGG. They propose a new notation for partial EC numbers to resolve the ambiguity.
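
To make the pitfall concrete, here is a minimal Python sketch of what can go wrong when software reads the dash in a partial EC number as a plain wildcard. It is not taken from the paper or from KEGG: the function names and the small REACTIONS table are hypothetical, chosen only for illustration (the EC numbers in the table are real ones).

```python
# Illustrative only: a toy reaction table keyed by name, with full EC numbers.
REACTIONS = {
    "alcohol dehydrogenase": "1.1.1.1",
    "L-lactate dehydrogenase": "1.1.1.27",
    "malate dehydrogenase": "1.1.1.37",
    "hexokinase": "2.7.1.1",
}


def parse_ec(ec):
    """Split an EC number such as '1.1.1.-' into four fields; '-' becomes None."""
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields: {ec!r}")
    return tuple(None if p == "-" else int(p) for p in parts)


def is_partial(ec):
    """True if at least one field of the EC number is left unspecified."""
    return None in parse_ec(ec)


def reactions_matching(ec, reactions):
    """Naive matching (assumed behaviour, for illustration): treat every
    unspecified field as a wildcard and return all reactions that fit."""
    pattern = parse_ec(ec)
    return [
        name
        for name, full_ec in reactions.items()
        if all(p is None or p == f for p, f in zip(pattern, parse_ec(full_ec)))
    ]


if __name__ == "__main__":
    # A gene annotated only as 1.1.1.- is linked to every 1.1.1.x reaction.
    print(reactions_matching("1.1.1.-", REACTIONS))
```

In this toy table, a gene annotated only as 1.1.1.- ends up attached to all three 1.1.1.x reactions. The string alone cannot say which meaning the annotator intended, so a wildcard expansion like this one has no way to tell a reasonable assignment from a wrong one.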

Overall, the paper is a quick and instructive read, though it proved fairly controversial in our group.

*written by a guest reviewer

