Archive for the ‘Research’ Category

Discourse targeting


Thanks to Eric Fredine’s blog I found a Microsoft study on behavioural targeting. The report interests me because it clearly showed that tracking short-term behaviour is more effective for targeted ad delivery than tracking long-term behaviour. For me, it is one more proof that “context is the King”.

For some years my research has focused on one specific context: online discussion. This context is extremely rich in thoughts, dreams, emotions, attitudes etc., and all this wealth manifests itself in the discourse of the discussion. Discussion within an online community is much richer than the context of online purchase, browsing, search,…

The main general conclusion of my research is that discourse is granulated. For example, there are big differences between the topics discussed by active and passive people.

For targeted ad delivery, the more important finding is probably that there are specific patterns in the topics discussed and in the attitudes towards various subjects (products, institutions, other people etc.). Take a car, for example.

On automotive discussion forums people look for advice on which car to choose, or for a bargain on new tires, etc. But the idea of buying a new car is most often born on… family forums! The difference is so big that I sometimes joke that “a car is a member of the family”.

We can say that in the context of online discussion a car is often associated with another element of this context: the topic of family.

Why not target car ads at family discussion forums?

The above result is valid for Polish discussion forums, and I’m not sure whether French, German or American forums are similar. And, of course, there are many more peculiarities in the patterns of attitudes towards particular products associated with other elements of the context.
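The association between a topic and a forum type can be illustrated with a simple "lift" ratio. This is only a minimal sketch under stated assumptions: the posts are invented toy data, the phrase list is hypothetical, and plain substring matching stands in for the semantic analysis used in the actual research.

```python
from collections import Counter

# Toy, invented posts: (forum_category, text). Not real research data.
posts = [
    ("family", "we need a bigger car now that the baby is here"),
    ("family", "thinking of buying a new car for school runs"),
    ("family", "any tips for toddler sleep"),
    ("automotive", "which tires are a bargain this winter"),
    ("automotive", "thinking of buying a new car, which model"),
    ("automotive", "how to change the oil filter"),
    ("automotive", "best winter tires for my car"),
    ("cooking", "easy dinner recipes"),
]

def intent_lift(posts, phrases):
    """Lift per forum: share of matching posts in the forum / share overall.

    A value above 1 means the topic is over-represented in that forum.
    """
    total = Counter()
    hits = Counter()
    for forum, text in posts:
        total[forum] += 1
        if any(phrase in text for phrase in phrases):
            hits[forum] += 1
    overall = sum(hits.values()) / len(posts)
    if overall == 0:
        return {}
    return {forum: (hits[forum] / total[forum]) / overall for forum in total}

# Hypothetical phrases standing in for "the idea of buying a new car".
lift = intent_lift(posts, ["buying a new car", "need a bigger car"])
for forum, value in sorted(lift.items(), key=lambda kv: -kv[1]):
    print(f"{forum}: {value:.2f}")
```

With real data the interesting forums are those whose lift stays well above 1 over a large number of posts; on a toy sample like this the numbers only show the mechanics.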

The idea of discourse targeting is stunning, isn’t it?

Monitoring brand using discourse analysis


Learning public opinion (or sentiment) about your brand in the traditional way is expensive, because surveys and focus groups take a lot of time and human work. An alternative will probably gain recognition in the near future, as some technology vendors have launched tools for brand monitoring using text analytics. An initial review of these attempts appeared yesterday on SmartData Collective.

Monitoring a brand using discourse analysis differs, to some extent, from the approach based on text analysis. I have a very fresh example: a tool for monitoring opinion about retail chains (supermarkets). Here are a few words on how it is built and how it works.

Building Monitor. Analysis of the discourse in a corpus of Internet discussions related to supermarkets gave a collection of subjects that interest the interlocutors, and a collection of expressions of their attitudes. Using these results, complex semantic search queries were built for the learning research. This is the crucial stage: we should learn the fine details of the discourse and, at the same time, get its math as the basis for justifying and calibrating the Monitor. The final task is relatively easy: implement the results and build a “machine” using accessible technology.

How Monitor works. The data for each retail brand is collected using semantic search. Monitor performs all the calculations according to the calibrating formulas and provides figures ready for presentation. Please see the pictures, made for presentation only (not the production version).

First comes the “profile”: how the brand is perceived, i.e. how it is distinguished from the average Internet discourse. Results of this kind are often astonishing, because the picture differs dramatically from the wishes of the Customer (the user of Monitor), from the official image and from the marketing buzz. Moreover, the interlocutors’ categories (vertical in the charts) also differ.
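The idea of a profile "versus the average discourse" can be sketched as a ratio of category shares. This is an illustrative assumption, not the Monitor's calibrating formulas (which are not given here); the category names and counts below are invented.

```python
def profile(brand_counts, baseline_counts):
    """Ratio of each category's share in the brand's discourse to its share
    in the baseline (average) discourse.

    A ratio above 1 means the category is over-represented for the brand.
    Categories with a zero baseline count are skipped.
    """
    brand_total = sum(brand_counts.values())
    baseline_total = sum(baseline_counts.values())
    result = {}
    for category, base in baseline_counts.items():
        if base == 0:
            continue
        brand_share = brand_counts.get(category, 0) / brand_total
        baseline_share = base / baseline_total
        result[category] = brand_share / baseline_share
    return result

# Hypothetical mention counts per category (invented for illustration).
brand = {"prices": 50, "queues": 30, "staff": 20}
baseline = {"prices": 400, "queues": 100, "staff": 500}

brand_profile = profile(brand, baseline)
for category, ratio in brand_profile.items():
    print(f"{category}: {ratio:.2f}")
```

Comparing several brands then amounts to computing the same ratios for each brand against one shared baseline.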

Then there is a comparison of the monitored brands. The charts show how people value each brand with regard to the same categories. Two charts with negative opinions (general index only) are presented as an example.

The third important group of results concerns the monitoring itself, i.e. the presentation of changes. It depends on the Customer’s needs. Some customers want to observe the effects of promotional campaigns, and for that purpose day-to-day monitoring is appropriate. Some want to know the general trends… etc.

Discourse of excellence


All companies are good, and some are excellent. How to grasp the difference?

Companies active in a particular sector are basically similar: similar products, machines and technology, and people doing similar things. Why do some fall, some merely exist and stagnate, and some excel and climb, for example, to the top of World Class Manufacturing?

Excellent companies think differently. It is a cultural difference, and many studies have found a relationship between the culture of an enterprise and its business results. Some 10 years ago I also made a study of this relationship, and discovered an evolution in the perception and hierarchy of values during the transformation from a centrally planned economy to a market economy. The essay on the “cultural revolution in business”, published in the press and in a book, won some renown.

The discourse of a company reflects its culture. There are, for example, some differences between the discourse of excellent companies and that of merely good ones:

  1. in the average company people think and speak in terms of results, whereas in excellent companies people think more about processes (the flow, the quality, control etc.); in between are the companies in which the Board members focus on results only while the other managers are concerned with processes, and this discrepancy sometimes induces conflicts;
  2. in the average company, during breaks, people talk about sport, children, politics, last weekend etc., whereas in excellent companies lunchtime is full of business chats, a sign that people are interested in the business;
  3. the average company is quiet and politically correct, and wrangles are rare and full of accusations, whereas in excellent companies wrangles are almost permanent and focused on problems, not people (they are not wars);
  4. people in the average company take problems personally, whereas in an excellent company problems are treated simply as tasks and expressed in terms of figures and facts (objectivism);
  5. an excellent company has a common language that dramatically facilitates communication; in companies with a KAIZEN culture this language is replete with the terms of organisational techniques (“Pareto language”).

The discourse also differs between enterprising or innovating companies and average ones. Take, for example, attitudes to uncertainty: entrepreneurs are often “taking risk”, innovating companies are “looking for opportunities”, and average companies rather “protect themselves” against threat.

The difference means an opportunity to measure and analyse. The question is, of course, whether analytics of the discourse of excellence could be useful and make sense. For example: could we monitor the entrepreneurial climate more cheaply, more accurately or faster using discourse analytics instead of conventional methods? Could we rate companies more efficiently using analysis of the “discourse of excellence” instead of (or in addition to) traditional rating? Could we measure the competitive capacity of a particular company and identify the space for improvement by analysing its discourse? Could we monitor its culture and raise an alarm if something goes wrong? Simply speaking: could we develop intangible capital analytics?

The above questions are inspired by the short discussion of Nicholas.Carbis’ post on AnalyticBridge. He is developing “human capital analytics” (as he calls it). I think that the broader idea of intellectual (or intangible) capital analytics is worth considering.

As far as I know, there is so far no empirical evidence of a relationship between Intellectual Capital (IC) and company productivity. Perhaps IC analytics could help?

There are many questions about the practical implementation of IC analytics. First: the data source. Second: sometimes discourse reflects a company’s culture only indirectly, especially when official language prevails and is used to hide rather than reveal the issues…

Managing limitations of prediction


The thrilling question of whether “All Predictive Models Are Wrong?” already has a second page of proposed answers on LinkedIn, and the discussion will probably continue for a long time. However, another question seems equally important: “What to do if the predictions are not sufficiently accurate?”

The standard answer is seductive and well known in security management: it is necessary to build a strategy to cope with uncertainty and the risk that follows from it. For example, contemporary military strategies for such cases pursue the goal “to preserve the ability to continue operations”. Narrow-minded business often develops a strategy “to minimize loss”, and open-minded business develops an organisation able “to benefit most of opportunities and optimize the measures against threat”.

The case under discussion is hurricanes. No one can predict hurricane losses with satisfactory accuracy. Uncertainty is costly. If the risk is overestimated, people and companies bear excessive costs to protect or insure themselves. If the risk is underestimated, the insurers go bankrupt…

It is not possible to predict disaster losses accurately. This does not mean, however, that data analytics has little to do in this field. Instead of tilting at windmills, it is probably better to analyse the strategies of response to uncertainty and to build models of optimum strategies. Moreover, the variance of strategies can explain a part of the variance of total loss, and so contribute to the accuracy of total loss prediction.
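One standard way to make the point about variance precise is the law of total variance. As a sketch, with notation assumed here only for illustration, let L be the total loss and S the response strategy adopted by an insurer or a company:

```latex
\operatorname{Var}(L)
  = \underbrace{\mathbb{E}\bigl[\operatorname{Var}(L \mid S)\bigr]}_{\text{not explained by strategy}}
  + \underbrace{\operatorname{Var}\bigl(\mathbb{E}[L \mid S]\bigr)}_{\text{explained by strategy}}
```

The second term is the part of the loss variance attributable to differences between strategies. The larger it is relative to the first, the more a model of strategies can contribute to predicting total loss.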

The idea of modelling reinsurance strategies is not new. It requires some knowledge and models of the behaviour of open dynamic systems. It seems reasonable to build a general model of a reinsurance strategy that secures insurers against bankruptcy: the strategy “to preserve the ability to continue operations”.

Objectivity of subjectivity


For some months I have been studying the discourse of attitude, and I sometimes look back at the nice talk by Shilpa Arora and Mahesh Joshi. At the beginning of this work they consider a comment subjective if it cannot be objectively verified. Such a definition seems quite reasonable if you use some arbitrary judgement of what can be objectively verified and what cannot. However, arbitrary judgement is subjective.

There is a long tradition in opinion mining of making arbitrary classifications of words or expressions. For example, many researchers consider opinions subjective and statements about facts objective. Many use an external standard dictionary to classify expressions as positive or negative, WordNet to identify semantic similarity etc. It’s no wonder that so many studies have trouble identifying ironic expressions, sarcasm etc.

In my studies on expressing attitudes the discourse is taken as is, without arbitrary classification. Instead, I examine the contexts in which the expressions appear. And the result (for the Polish language) is different. In the rough picture, expressions of attitude seem to fall into two categories: “at the point” statements, and private, even intimate statements. There is a sharp distinction between these two kinds of expressions, clearly visible in the matrices of context relatedness of the expressions.

Streamlining web mining


Last Sunday I submitted my comment to the people vs machine debate in Research Magazine. Some readers of this comment asked me how I get 97% accuracy in measuring sentiment changes in Web Mining.

Web text analytics is a rather new field of research and everybody uses their own approach. So my only advice is: don’t be in too much of a hurry. If you collect millions of records and focus on thousands of specific sentiment-rich expressions, first look at the data. Compute some basic descriptive statistics (Yes!), make some charts of the frequency distributions etc. Try to find a proper way of stratifying the data, using your best proven approaches and tools. Don’t skip this basic examination. I write this because I see many newcomers to the analytics business who want to cut corners.

If you find a good way of stratifying the data you will undoubtedly notice that some expressions occur most frequently in one, two or three specific contexts or subject domains. Follow this clue, and limit further research to these expressions. This is the first step towards discourse mining (not simply text mining).
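The selection of such expressions can be sketched as follows. This is a minimal illustration, not the procedure behind the accuracy figure quoted above: the function name, the thresholds and the toy records are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def concentrated_expressions(records, top_k=3, min_share=0.8, min_count=5):
    """Return expressions whose occurrences concentrate in at most top_k contexts.

    records   -- iterable of (expression, context) pairs
    top_k     -- number of most frequent contexts to consider
    min_share -- minimum share of occurrences falling into those contexts
    min_count -- ignore expressions seen fewer times than this
    The default thresholds are arbitrary and should be tuned on real data.
    """
    contexts_by_expression = defaultdict(Counter)
    for expression, context in records:
        contexts_by_expression[expression][context] += 1

    kept = {}
    for expression, contexts in contexts_by_expression.items():
        n = sum(contexts.values())
        if n < min_count:
            continue
        in_top = sum(count for _, count in contexts.most_common(top_k))
        share = in_top / n
        if share >= min_share:
            kept[expression] = share
    return kept

# Invented toy records: one expression tied to two contexts, one spread evenly.
records = (
    [("love it", "retail")] * 6
    + [("love it", "cars")] * 2
    + [("ok", context) for context in "abcdefghij"]
)
selected = concentrated_expressions(records, top_k=2)
print(selected)
```

Expressions spread evenly over many contexts, like "ok" in the toy data, drop out, and further research is limited to what remains.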

The next steps are obvious. Look for relations between various characteristics of the contexts, the subject domains, and these “good” expressions. Use clustering to select the subject domains and texts you need. Make the selection from your corpus of texts.

There are a lot of tools for extracting rich and accurate information from data selected in this way.

Limiting the scope of study is the first and very basic way to streamline any research process. It is also a basic step used in Industrial Engineering in streamlining any manufacturing or business process.