Impact of Social Sciences – Data Descriptors: Providing the necessary information to make data open, discoverable and reusable

Data need to be more than just available; they need to be discoverable and understandable. Iain Hrynaszkiewicz introduces Nature’s new format for published data papers, the Data Descriptor. Peer review and curation of these data papers will facilitate open access to knowledge and interdisciplinary research, pushing the boundaries of discovery. Some of the most tangible benefits of open data stem from social and interdisciplinary sciences, as these fields require effective cross-disciplinary communication.

Dengue fever (officially, human dengue virus infection) can cause headaches, pain behind the eyes, nausea and vomiting, and it kills thousands. The World Health Organization says that more than 2.5 billion people – 40% of the world’s population – are now at risk from dengue. It is spread by mosquitoes and there is no vaccine for the virus. Reporting of cases is of inconsistent quality, and can be biased by difficulties in diagnosis, limited resources for diagnostic testing, and the varying reporting capacities of national health systems.

The virus is a global public health challenge which Dr Simon Hay and his team sought to battle – with data. They have collected the largest database of human dengue virus occurrences (8,309 in total), derived from peer-reviewed literature and case reports as well as informal online sources, with entries dating from 1960 to 2012. To make these data more easily reusable, they published a Data Descriptor in Nature Publishing Group’s new journal Scientific Data, which describes all data collection processes in full, as well as geo-positioning, database management and quality-control procedures.

A Data Descriptor is a peer-reviewed article that describes and links to scientifically valuable datasets. It is citable (so researchers get more credit for their work) and is designed to make datasets more discoverable, interpretable and reusable. A Data Descriptor provides the information needed for interpretation; links through to one or more trusted data resources where data files, code or workflows are stored; fulfils a significant part of funders’ data management requirements; and uses open licenses that enable reuse. Many publishers, including Springer/BioMed Central, Elsevier, Wiley, Faculty of 1000 and Ubiquity Press, offer some form of data-driven publication. These are sometimes called data papers or data notes rather than Data Descriptors, but all have largely similar aims of increasing the visibility of datasets in the peer-reviewed literature.
http://blogs.lse.ac.uk/impactofsocialsciences/2014/10/22/data-descriptor...