Author: Prashanth Kumar
Date of Publication: 30th November 2017
Abstract: In this paper, we address the problem of dataset extraction from research articles. With the growth of digital data repositories and the demand for data-centric research in the data mining community, finding an appropriate dataset for a research problem has become an essential step in scientific research. However, given the wide variety of data usage in scientific research, it is difficult to determine which datasets are most useful for a particular research topic. An automated dataset search engine would be a powerful tool for alleviating this problem. In this work, we propose a novel approach to extracting dataset names from research articles, using “web intelligence” from academic search engines and online dictionaries to mine the names. We also compare different sources of “web knowledge”, namely academic search engines such as Google Scholar and Microsoft Academic Search. The performance of the approach is evaluated using standard information retrieval metrics: precision, recall and F-measure. We obtain an F-measure of 80%, which is significant for an unsupervised approach.
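For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how precision, recall and F-measure could be computed for a set of extracted dataset names. It is not the author's evaluation code. The function name `precision_recall_f1`, the case-insensitive exact-match rule, and the example name lists are all assumptions made for illustration; the paper may match names differently.

```python
def precision_recall_f1(extracted, gold):
    """Score extracted dataset names against a gold-standard list.

    Assumption: names match if they are equal after trimming whitespace
    and lowercasing. This matching rule is illustrative only.
    """
    extracted_set = {name.strip().lower() for name in extracted}
    gold_set = {name.strip().lower() for name in gold}

    # True positives: extracted names that also appear in the gold set.
    true_positives = len(extracted_set & gold_set)

    # Precision: fraction of extracted names that are correct.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    # Recall: fraction of gold names that were found.
    recall = true_positives / len(gold_set) if gold_set else 0.0

    # F-measure (F1): harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data, not taken from the paper.
extracted_names = ["MNIST", "Reuters-21578", "WordNet", "DBLP"]
gold_names = ["MNIST", "Reuters-21578", "ImageNet", "DBLP", "CIFAR-10"]

p, r, f = precision_recall_f1(extracted_names, gold_names)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

In this made-up example, three of the four extracted names are correct and three of the five gold names are found, so precision is higher than recall and the F-measure falls between the two.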