SMWCon Fall 2013/Organic Data Science: Opening Scientific Data Curation

Although scientists in many disciplines (e.g., astronomy and physics) share data through community repositories so that others can harvest those data for analysis and publications, this paradigm has failed in other scientific disciplines. In some cases, scientists prefer to define for themselves the metadata they want to specify, rather than being required to provide what the repository pre-defines. In other cases, data collectors have no incentive to invest the effort required to formally and fully describe their datasets. At the heart of this problem is the traditional treatment of data sharing as a separate function in science, where data repositories are not integrated with the practices of science. We are investigating the use of semantic wikis as a platform for scientific communities to create and converge organically on metadata properties that suit their needs. We are also investigating "organic data science" as a paradigm to support data sharing as a collective activity that is integrated with other activities in scientific research, such as the joint formulation of shared science questions and their pursuit through shared workflows for data analysis. This approach is consistent with recent trends to make scientific software and data more open and broadly accessible across disciplines, as well as open to volunteer contributors and citizen scientists. A key aspect of this work is crediting contributors through provenance and developing proactive mechanisms to encourage structure and convergence. Our organic data science approach can benefit other Semantic MediaWiki projects for social knowledge collection, particularly those focusing on big data integration and analysis.

This talk describes joint work with Paul Hanson from the Center for Limnology at the University of Wisconsin-Madison and Chris Duffy from the Department of Civil Engineering at Pennsylvania State University.