Archive for the ‘Web Mining’ Category

Monitoring brand using discourse analysis


Learning public opinion (or sentiment) about your brand the traditional way is expensive, because surveys and focus groups take a lot of time and human work. An alternative will probably gain recognition in the near future, as some technology vendors have launched tools for brand monitoring using text analytics. An initial review of these attempts appeared yesterday on SmartData Collective.

Monitoring a brand using discourse analysis differs, to some extent, from the approach based on text analysis. I have a very fresh example: a tool for monitoring opinion about retail chains (supermarkets). Here are a few words on how it is built and how it works.

Building Monitor. Analysis of the discourse in a corpus of Internet discussions related to supermarkets gave a collection of subjects that interest the interlocutors, and a collection of expressions of their attitudes. Using these results, complex queries for semantic search were built for the learning research. This is the crucial stage: we have to learn the discourse in great detail and, at the same time, obtain its mathematical description, as the basis for justifying and calibrating the Monitor. The final task is relatively easy: implement the results and build a “machine” using accessible technology.

How Monitor works. The data for each retail brand is collected using semantic search. The Monitor performs all calculations according to the calibration formulas and provides figures ready for presentation. Please see the pictures, made for presentation only (not the production version).

First is the “profile”: how the brand is perceived, i.e. how it stands out against the average Internet discourse. Results of this kind are often astonishing, because the picture differs dramatically from the wishes of the Customer (the user of the Monitor), from the official image, and from the marketing buzz. Moreover, the interlocutors’ categories (the vertical axis in the charts) also differ.
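One way to read the “profile” idea is as a ratio of shares: how often a category comes up when people discuss the brand, relative to how often it comes up in the discourse as a whole. The sketch below is only an illustration under that assumption. The function name, the category labels, and the counts are all invented, and the Monitor's actual calibration formulas are not reproduced here.

```python
def brand_profile(brand_counts, baseline_counts):
    """Return, per category, the brand's share divided by the baseline share.

    A value above 1.0 means the category is over-represented in talk about
    the brand; below 1.0 means under-represented. (Hypothetical measure.)
    """
    brand_total = sum(brand_counts.values())
    baseline_total = sum(baseline_counts.values())
    if brand_total == 0 or baseline_total == 0:
        raise ValueError("both count tables must contain at least one mention")
    profile = {}
    for category, baseline_n in baseline_counts.items():
        if baseline_n == 0:
            continue  # category absent from the baseline; ratio is undefined
        brand_share = brand_counts.get(category, 0) / brand_total
        baseline_share = baseline_n / baseline_total
        profile[category] = brand_share / baseline_share
    return profile


# Invented counts of mentions per category.
baseline = {"prices": 400, "staff": 200, "queues": 200, "freshness": 200}
brand = {"prices": 20, "staff": 40, "queues": 30, "freshness": 10}

profile = brand_profile(brand, baseline)
for category, ratio in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{category:10s} {ratio:.2f}")
```

With these made-up numbers, talk about staff is twice as frequent for the brand as in the discourse overall, while prices come up half as often. That is the kind of contrast a profile chart would show.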

Then there is a comparison of the brands monitored. The charts show how people value each brand with regard to the same categories. Two charts with negative opinions (general index only) are presented as an example.

The third important group of results concerns monitoring itself, i.e. the presentation of changes over time. It depends on the Customer's needs. Some customers want to observe the effects of promotional campaigns, and for that purpose day-to-day monitoring is appropriate. Some want to know the general trends… etc.

Objectivity of subjectivity


For some months I have been studying the discourse of attitude, and I sometimes look back at the nice talk by Shilpa Arora and Mahesh Joshi. At the beginning of this work they consider a comment subjective if it cannot be objectively verified. Such a definition seems quite reasonable, provided you rely on some arbitrary judgement of what can be objectively verified and what cannot. However, arbitrary judgement is itself subjective.

There is a long tradition in opinion mining of making arbitrary classifications of words or expressions. For example, many researchers consider opinions subjective and statements about facts objective. Many use an external standard dictionary to classify expressions as positive or negative, WordNet to identify semantic similarity, etc. It’s no wonder that so many studies have trouble identifying ironic expressions, sarcasm, etc.

In my studies on expressing attitudes the discourse is taken as is, without arbitrary classification. Instead, I examine the contexts in which the expressions appear. And the result (for the Polish language) is different. In the rough picture, expressions of attitude seem to fall into two categories: “at the point” statements, and private, even intimate statements. There is a sharp distinction between these two kinds of expressions, clearly visible in the matrices of context relatedness of the expressions.
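To make “matrices of context relatedness” a little more concrete, here is a minimal sketch of one possible construction: count the contexts each expression appears in, then compare expressions by the cosine similarity of those counts. This is an assumption for illustration, not necessarily the measure used in the study. The English expressions, the context labels, and all function names are invented.

```python
from collections import Counter
from math import sqrt


def context_vectors(observations):
    """Map each expression to a Counter of the contexts it was seen in."""
    vectors = {}
    for expression, context in observations:
        vectors.setdefault(expression, Counter())[context] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relatedness_matrix(vectors):
    """Return {(expr_i, expr_j): similarity} for every pair of expressions."""
    names = sorted(vectors)
    return {(i, j): cosine(vectors[i], vectors[j]) for i in names for j in names}


# Invented (expression, context) pairs; real data would come from the corpus.
observations = [
    ("overpriced", "prices"), ("overpriced", "prices"), ("overpriced", "quality"),
    ("rip-off", "prices"), ("rip-off", "quality"),
    ("I felt ashamed", "family"), ("I felt ashamed", "personal story"),
    ("it hurt me", "personal story"), ("it hurt me", "family"),
]

matrix = relatedness_matrix(context_vectors(observations))
for (a, b), value in sorted(matrix.items()):
    if a < b:  # print each unordered pair once
        print(f"{a!r} vs {b!r}: {value:.2f}")
```

In this toy data the two groups of expressions share no contexts at all, so the matrix splits into two blocks. A real corpus would be noisier, but a sharp distinction would show up as the same kind of block structure.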

Streamlining web mining


Last Sunday I submitted my comment to the people vs machine debate in Research Magazine. Some readers of this comment asked me how I get 97% accuracy in measuring sentiment changes in web mining.

Web text analytics is a rather new field of research, and everybody uses their own approach. So my only advice is: don’t be in too much of a hurry. If you collect millions of records and focus on thousands of specific sentiment-rich expressions, first look at the data. Compute some basic descriptive statistics (yes!), draw some charts of the frequency distributions, etc. Try to find a proper way of stratifying the data, using your best proven approaches and tools. Don’t skip this basic examination. I write this because I see many newcomers to the analytics business who want to cut corners.
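As a rough illustration of this first look at the data, the sketch below computes a few descriptive statistics over expression frequencies and prints a crude text histogram. The sample records and the function names are invented. With millions of records you would use your usual statistics tools, but the questions asked of the data are the same.

```python
from collections import Counter
from statistics import mean, median


def describe_frequencies(records):
    """Basic descriptive statistics for how often each expression occurs."""
    counts = Counter(records)
    values = sorted(counts.values(), reverse=True)
    return {
        "records": len(records),
        "distinct": len(counts),
        "mean": mean(values),
        "median": median(values),
        "max": values[0],
        "seen_once": sum(1 for v in values if v == 1),
    }


def text_histogram(records, width=30):
    """Crude text bar chart of expression frequencies, most frequent first."""
    counts = Counter(records)
    top = counts.most_common(1)[0][1]
    lines = []
    for expression, n in counts.most_common():
        bar = "#" * max(1, round(width * n / top))
        lines.append(f"{expression:12s} {n:3d} {bar}")
    return lines


# Invented sample: one sentiment-rich expression per record.
records = ["awful"] * 6 + ["great"] * 3 + ["so-so"] * 2 + ["meh"]

summary = describe_frequencies(records)
print(summary)
print("\n".join(text_histogram(records)))
```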

If you find a good way of stratifying the data, you will undoubtedly notice that some expressions occur most frequently in one, two, or three specific contexts or subject domains. Follow this clue, and limit further research to these expressions. This is the first step towards discourse mining (not simply text mining).
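The selection described here can be sketched as a simple filter: keep an expression only if most of its occurrences fall within its few most frequent contexts. The thresholds (`top_k`, `min_share`, `min_count`), the function name, and the sample data below are assumptions chosen for illustration, not values from the study.

```python
from collections import Counter, defaultdict


def focused_expressions(observations, top_k=3, min_share=0.8, min_count=5):
    """Return {expression: share} for expressions concentrated in few contexts.

    share is the fraction of an expression's occurrences that fall in its
    top_k most frequent contexts. Rare expressions (fewer than min_count
    occurrences) are skipped because their share is not informative.
    """
    by_expression = defaultdict(Counter)
    for expression, context in observations:
        by_expression[expression][context] += 1

    focused = {}
    for expression, contexts in by_expression.items():
        total = sum(contexts.values())
        if total < min_count:
            continue
        top = sum(n for _, n in contexts.most_common(top_k))
        share = top / total
        if share >= min_share:
            focused[expression] = share
    return focused


# Invented (expression, context) pairs.
all_contexts = ("prices", "staff", "parking", "queues", "freshness")
observations = (
    [("rip-off", "prices")] * 8
    + [("rip-off", "staff"), ("rip-off", "parking")]
    + [("ok", c) for c in all_contexts] * 2   # spread evenly, not focused
    + [("dreadful", "queues")] * 2            # too rare to judge
)

focused = focused_expressions(observations, top_k=2)
print(focused)
```

Here only "rip-off" survives: nine of its ten occurrences sit in two contexts, while "ok" is spread evenly and "dreadful" is too rare to say anything about.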

The next steps are obvious. Look for relations between the various characteristics of the contexts, the subject domains, and these “good” expressions. Use clustering to select the subject domains and texts you need. Then make the selection from your corpus of texts.
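The clustering step can be sketched with a deliberately simple method: single-linkage grouping with a similarity threshold, which amounts to taking connected components of the "similar enough" graph. It stands in for whatever clustering method you actually prefer. The subject-domain names, the expression counts, and the threshold are invented for the example.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count dictionaries."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def threshold_clusters(profiles, threshold=0.7):
    """Group items that are linked by a chain of similarities >= threshold."""
    names = sorted(profiles)
    cluster_of = {name: i for i, name in enumerate(names)}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(profiles[a], profiles[b]) >= threshold:
                old, new = cluster_of[b], cluster_of[a]
                if old != new:
                    # Merge the whole cluster containing b into a's cluster.
                    for name in names:
                        if cluster_of[name] == old:
                            cluster_of[name] = new
    clusters = {}
    for name in names:
        clusters.setdefault(cluster_of[name], []).append(name)
    return sorted(clusters.values())


# Invented subject domains, described by counts of the "good" expressions.
profiles = {
    "prices forum": {"rip-off": 8, "overpriced": 5},
    "discount thread": {"rip-off": 4, "overpriced": 6, "bargain": 1},
    "staff complaints": {"rude": 7, "ignored me": 3},
    "checkout rants": {"rude": 5, "ignored me": 4, "queue": 2},
}

clusters = threshold_clusters(profiles)
for members in clusters:
    print(members)
```

Once the domains are grouped like this, selecting texts from the corpus is a matter of keeping those that belong to the cluster you are interested in.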

There are a lot of tools for extracting rich and accurate information from data selected in this way.

Limiting the scope of study is the first and most basic way to streamline any research process. It is also a basic step used in Industrial Engineering to streamline any manufacturing or business process.