Streamlining web mining

2009/07/15 Andrzej Góralczyk

Last Sunday I submitted a comment to the people vs. machine debate in Research Magazine. Some readers of that comment asked me how I get 97% accuracy in measuring sentiment changes in web mining.

Web text analytics is a rather new field of research, and everybody uses their own approach. So my only advice is: don't be in too much of a hurry. If you collect millions of records and focus on thousands of specific sentiment-rich expressions, look at the data first. Compute some basic descriptive statistics (yes!), chart the frequency distributions, and so on. Try to find a proper way to stratify the data, using your best proven approaches and tools. Don't skip this basic examination. I write this because I see many newcomers to the analytics business who want to cut corners.
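As a rough illustration of this first look at the data, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the records, the choice of source type as the stratum, the two tracked expressions, and the naive whitespace tokenizer. A real corpus would need proper tokenization and a far larger expression list.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical records: (stratum, text). Here the stratum is the source type.
RECORDS = [
    ("forum", "great phone but terrible battery"),
    ("forum", "terrible battery and terrible support"),
    ("news", "the company reported great results"),
    ("blog", "great camera, great screen"),
]

# Hypothetical sentiment-rich expressions to track.
EXPRESSIONS = {"great", "terrible"}


def expression_counts(records, expressions):
    """Count tracked expressions overall and within each stratum."""
    overall = Counter()
    per_stratum = {}
    for stratum, text in records:
        # Naive tokenization: lowercase, drop commas, split on whitespace.
        tokens = text.lower().replace(",", " ").split()
        hits = Counter(t for t in tokens if t in expressions)
        overall.update(hits)
        per_stratum.setdefault(stratum, Counter()).update(hits)
    return overall, per_stratum


overall, per_stratum = expression_counts(RECORDS, EXPRESSIONS)

# Basic descriptive statistics over the per-expression totals.
totals = list(overall.values())
print("mean:", mean(totals), "median:", median(totals))

# A crude text chart of the frequency distribution.
for expr, n in overall.most_common():
    print(f"{expr:<10} {'#' * n} ({n})")

# The same counts broken down by stratum.
for stratum, counts in sorted(per_stratum.items()):
    print(stratum, dict(counts))
```

Even a table this simple shows whether an expression is spread evenly across strata or piles up in one of them, which is what the stratification is meant to reveal.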

If you find a good way to stratify the data, you will undoubtedly notice that some expressions occur most frequently in one, two, or three specific contexts or subject domains. Follow this clue and limit further research to these expressions. This is the first step toward discourse mining (not simply text mining).
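One possible way to turn this clue into a filter is sketched below. It keeps an expression only when its few most frequent strata account for most of its occurrences. The function name, the counts, and the `top_k` and `min_share` thresholds are all hypothetical; sensible values depend on the corpus.

```python
from collections import Counter

# Hypothetical counts: stratum (context or subject domain) -> expression -> frequency.
PER_STRATUM = {
    "forum": {"terrible": 30, "great": 10},
    "news": {"great": 10},
    "blog": {"great": 10},
    "reviews": {"great": 10, "terrible": 2},
    "social": {"great": 10},
}


def concentrated_expressions(per_stratum, top_k=3, min_share=0.8):
    """Return expressions whose top_k strata hold at least min_share of all hits.

    The result maps each kept expression to those strata, most frequent first.
    """
    totals = Counter()
    for counts in per_stratum.values():
        totals.update(counts)

    selected = {}
    for expr, total in totals.items():
        ranked = sorted(
            (
                (counts.get(expr, 0), stratum)
                for stratum, counts in per_stratum.items()
                if counts.get(expr, 0) > 0
            ),
            reverse=True,
        )
        top = ranked[:top_k]
        share = sum(n for n, _ in top) / total
        if share >= min_share:
            selected[expr] = [stratum for _, stratum in top]
    return selected


selected = concentrated_expressions(PER_STRATUM, top_k=2, min_share=0.8)
print(selected)
```

In this toy data "terrible" lives almost entirely in two strata and is kept, while "great" is spread evenly over five and is dropped from further study.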

The next steps are obvious. Look for relations between the various characteristics of the contexts, the subject domains, and these “good” expressions. Use clustering to select the subject domains and texts you need, then make that selection from your corpus of texts.
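The post does not say which clustering method to use, so the sketch below picks one simple option purely for illustration: subject domains are described by their frequency profiles over the "good" expressions, and two domains land in the same group when they are linked by a chain of cosine similarities above a threshold (a single-linkage cut). The profiles, the domain names, and the 0.9 threshold are assumptions.

```python
import math

# Hypothetical profiles: subject domain -> frequencies of three "good" expressions.
PROFILES = {
    "phones": [30, 2, 1],
    "tablets": [25, 3, 0],
    "banking": [1, 20, 15],
    "insurance": [0, 18, 17],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_by_similarity(profiles, threshold=0.9):
    """Group items connected by pairwise cosine similarity >= threshold."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


clusters = cluster_by_similarity(PROFILES)
print(clusters)
```

Once the groups look meaningful, selecting from the corpus is just a matter of keeping the texts whose subject domain falls in the cluster of interest.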

There are a lot of tools for extracting rich and accurate information from data selected in this way.

Limiting the scope of study is the first and most basic way to streamline any research process. It is also a basic step used in industrial engineering to streamline any manufacturing or business process.

Categories: Semantic Technology, Web Mining Tags: Semantic Technologies, Text Mining, Web Mining
