International Journal of Engineering Research in Electronics and Communication Engineering

Heterogenous Documents Using Hierarchical Dirichlet Process

Authors: Gopi Krishna L ¹, Sandeep Naramgari ²

Date of Publication: 7th October 2016

Abstract: Considering the hierarchical data groupings in a text corpus, e.g., words, sentences, and documents, we conduct structural learning and infer the latent themes for sentences and the latent topics for words from a collection of documents. The relation between themes and topics under different data groupings is explored through an unsupervised procedure that does not limit the number of clusters. A tree stick-breaking process is presented to draw theme proportions for different sentences. We build a hierarchical theme and topic model, which flexibly represents heterogeneous documents using Bayesian nonparametrics. Thematic sentences and topical words are extracted. In the experiments, the proposed method is shown to be effective at building a semantic tree structure for sentences and the corresponding words. The superiority of the tree model for selecting expressive sentences for document summarization is also illustrated.
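For readers unfamiliar with stick-breaking, the following is a minimal sketch of the standard (flat) stick-breaking construction that underlies Dirichlet process models. It is not the tree stick-breaking process proposed in the paper, whose details are not given in the abstract. The function name `stick_breaking`, the concentration parameter `alpha`, and the truncation level `num_sticks` are illustrative assumptions.

```python
import random


def stick_breaking(alpha, num_sticks, seed=0):
    """Draw truncated stick-breaking weights (illustrative sketch only).

    Assumption: this is the standard flat construction, truncated at
    num_sticks pieces. It is not the paper's tree-structured variant.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(num_sticks - 1):
        # Break off a Beta(1, alpha) fraction of whatever stick is left.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    # Truncation: give the leftover stick to the last piece so weights sum to 1.
    weights.append(remaining)
    return weights


weights = stick_breaking(alpha=2.0, num_sticks=10)
print([round(w, 3) for w in weights])
```

A smaller `alpha` tends to concentrate mass on the first few pieces, while a larger `alpha` spreads it over more pieces, which is how such models avoid fixing the number of clusters in advance.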


Monthly Journal for Electronics and Communication Engineering
