Statistical Interpretation for Mining Hybrid Regional Web Documents

Prakash, Kolla Bhanu and Rangaswamy, M. A. Dorai and Raman, Arun Raja (2012) Statistical Interpretation for Mining Hybrid Regional Web Documents. WIRELESS NETWORKS AND COMPUTATIONAL INTELLIGENCE, ICIP 2012, 292. 503-+. ISSN 1865-0929

Full text not available from this repository.

Abstract

Media mining has taken a major shift from conventional data mining due to the ever-increasing complexity of web documents. Another dimension is added when the web documents are of Indian origin, since a variety of languages and dialects go into the development of web pages. Web documents in which words from different languages are used, with or without translation, can be termed hybrid documents. A typical Yahoo news page in different languages is an example of this. Extracting information or content, and eventually knowledge, becomes more involved when words from other languages are used as they are, without translation, such as 'computer' or 'mobile' being used freely in regional languages. Even though the reader or surfer can follow the content easily, no translation has been done. Such documents are the focus of this study, and a statistical approach for describing the features of the words in different languages is used as the basis for correlation to assess the content of such web documents. As a benchmark study, six words related to education are taken in four different languages (English, Tamizh, Telugu and Hindi), and different ways of normalizing within and outside the group are taken as the base vectors. Using a correlation study, any new data or group of data is checked to assess the probability of getting the content. The words, being in different scripts, are converted to three-layer pixel map groups so that translational and text-related issues do not affect the mining procedure. Further, as textual data is well structured irrespective of language, this approach of getting attributes and using them as bases is more general and has the ability to include texts from any language.
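
The correlation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which statistical features, normalization schemes, or correlation measure were used. The sketch assumes that each word image is a three-layer pixel map (for example, three colour channels), that the features are the per-layer mean and standard deviation, that normalization divides by the largest absolute value, and that similarity is the Pearson correlation. The function names layer_features, normalize, pearson and best_match are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def layer_features(pixel_map):
    """Return [mean, std] for each layer of a three-layer pixel map.

    pixel_map is assumed to be a list of three layers, each a list of
    rows of pixel intensities. The feature choice is an assumption.
    """
    features = []
    for layer in pixel_map:
        values = [v for row in layer for v in row]
        features.extend([mean(values), pstdev(values)])
    return features


def normalize(vector):
    """Scale a feature vector by its largest absolute value (assumed scheme)."""
    peak = max(abs(v) for v in vector)
    return [v / peak for v in vector] if peak else list(vector)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant vector has no defined correlation; report 0.0 instead.
    return cov / (sx * sy) if sx and sy else 0.0


def best_match(candidate, base_vectors):
    """Return the label and score of the base vector most correlated
    with the candidate feature vector."""
    scores = {label: pearson(candidate, vec) for label, vec in base_vectors.items()}
    label = max(scores, key=scores.get)
    return label, scores[label]
```

In use, each benchmark word image would be reduced to a normalized feature vector and stored under its label as a base vector. A new word image is reduced the same way and passed to best_match, whose score serves as a rough indicator of how likely the word is to carry the same content.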

Item Type: Article
Uncontrolled Keywords: HTML, Media Mining, Multi-Lingual, Web Communication, Web Documents
Depositing User: Unnamed user with email techsupport@mosys.org
Last Modified: 06 Feb 2026 07:12
URI: https://ir.vmrfdu.edu.in/id/eprint/7036
