Discourse targeting


Thanks to Eric Fredine’s blog I found a Microsoft study on behavioural targeting. The report interests me because it clearly showed that tracking short-term behaviour is more effective for targeted ad delivery than tracking long-term behaviour. For me, it is one more piece of evidence that “context is king”.

For some years my research has focused on one specific context: online discussion. This context is extremely rich in thoughts, dreams, emotions, attitudes etc., and all this wealth manifests itself in the discourse of the discussion. Discussion within an online community is much richer than the context of online purchasing, browsing or search.

The main general conclusion of my research is that discourse is granulated. For example, there are big differences between the topics discussed by active and passive people.

For targeted ad delivery, a probably more important finding is that there are specific patterns in the topics discussed and in the attitudes towards various subjects (products, institutions, other people etc.). Take a car, for example.

On automotive discussion forums people look for advice on which car to choose, or for a bargain on new tires, etc. But the idea of buying a new car is most often born on… family forums! The difference is so big that I sometimes joke that “a car is a member of the family”.

We can say that in the context of online discussion a car is often associated with another element of this context: the topic of family.

Why not target car ads at family discussion forums?

The above result is valid for Polish discussion forums, and I’m not sure whether French, German or American forums are similar. And, of course, there are many more peculiarities in the patterns of attitudes towards particular products associated with other elements of the context.

The idea of discourse targeting is stunning, isn’t it?


Intellectual Communication Systems


The article of EISIE
Andrzej Góralczyk

The purpose of intellectual communication is not merely to send and receive a message. It is to understand the message.

It seems trivial as long as it concerns everyday life. We are accustomed to everyday misunderstandings that cause no more than a smile or a laugh.

However, understanding is not trivial when it comes to communication with big value or big risk behind it. Think, for example, of the huge loss when millions of people do not find relevant information on the Internet at first glance, even though it exists there. How much value could education carry if it were truly multicultural, thanks to a better understanding of people with different mindsets, concepts and lexicons? How much greater could the chances of business success be thanks to a better understanding of the trends hidden in the background of the stream of messages from inside the company and from outside? How secure could we feel thanks to a better understanding of the dynamics of road congestion, epidemic disease, a “terrorist” attack, a blackout or the evolution of a hurricane?

Understanding and technology.

The notion of intellectual communication appeared for a short time at the beginning of the 1970s. There was a model of two different communicating thesauri. And there was a simple question: how do they come to understanding from an initial state of misunderstanding, or of understanding nothing at all, caused by the differences in their content and structure? There were attempts to build a theory of communication between two such different thesauri. Today it is clear that the notion is too broad to be the subject of a useful theory, although it is still an inspiring and convenient framework for depicting the broad domain of issues of communication and understanding.

Intellectual communication with an artificial agent, or with the use of an artificial mediator, has become a hot issue today. The research and practical applications cover such diverse fields as machine translation, intelligent buildings, geographic information systems (GIS), cognitive systems, the semantic web and semantic search, expert systems, advanced Business Intelligence, “intelligent” user interfaces, security and anti-criminal analytics, early threat detection etc.

Some of the technologies and concepts underlying these applications are quite new and immature, and suffer from ignoring thousands of years of humanistic knowledge about meaning, reasoning and learning. Some suffer from dreams, for example the dream of a computer able to understand, think and decide like a human, or even more adequately, exactly, logically etc. We are sure that, except in very special applications such as expert systems or mission-critical devices, imitating a human or human thinking (reasoning) is nonsense. It is the user who has to understand, think and decide! The computer is, potentially, a strong tool to facilitate these tasks. The more pertinent the division of tasks between human and machine, the more sensible the application of IT in intellectual communication.

Consider, for example, the so-called Semantic Web [1]. The idea of Tim Berners-Lee, the inventor of the World Wide Web, is to extend this net in such a manner that its content can be comprehended by machines. What for? In order to enable artificial agents (i.e. computer applications) to provide some services to users. For example, to make an appointment at a therapist’s clinic: finding a “trustworthy” clinic near the patient’s house, checking the calendars of the patient and the doctor, etc. So the agent has to be able to reformulate the user’s request into a query appropriate for searching the web, to use the query to find appropriate data on the web, to present options to the user, and finally to make some transactions in both calendars. We could describe such behaviour of the agent as “using data in a particular manner” or “combining data from various sources for a particular purpose” or so. It seems to be much less than human understanding.

The experts in the field of the Semantic Web tend to equip machines with huge controlled vocabularies called “ontologies”. This has nothing to do with ontology in its original philosophical sense. It is rather a specific “map” of a particular domain (or a conceptualisation of a part of the world). The vocabulary has to provide the substance for machine reasoning, and therefore it has to have some special properties. First, it has to be exact and unambiguous, which is very different from our everyday human conceptualisation. Second, the concepts of an ontology have to be organized in a strict tree-like hierarchy (a so-called taxonomy), with formally defined relations between them, a structure very different from that of the real world.

In this way, using two inadequate models, the experts want to enable an artificial agent to communicate about the real world with a real human in real everyday situations! The task seems not impossible, albeit very complicated and resource-demanding.

The Meaning and the Understanding

There are several ways to model meaning in order to make it computable by machine, if we want to use computing to simulate human thinking. For example, we can say that the meaning of an expression (a sentence, a word, a concept) is the set of “things” (entities, beings) it denotes (indicates). Such a definition is very convenient as long as simulating thinking with logic or the algebra of sets is sufficient for the particular task. In fact, this classical approach is widely used in the tree-like formal ontologies mentioned above. However, many experts in the field are becoming aware that the usefulness of the approach is very limited if the relations in the tree are purely logical. Many rediscover context: a wealth of semantic, factual, causal, modal etc. relations enriching meaning in real use (see [2], theses 3.3 and 3.314, and for example [3]). Some go to the other extreme, proclaiming “meaning is context” or even “context is king”. In fact, understanding has very little to do with indicating, and very much to do with context.

By the way, the two concepts of meaning seem to be incompatible because the first is about function (indicating) and the second is about structure (relations). This is probably the fundamental reason why the problem of the identifier is not solvable within the framework of the Semantic Web, nor within the framework of Topic Maps (contrary to the illusion of their advocates) [4-6]. However, there is no reason to dismiss the supposition that there exist frameworks in which the problem of the identifier vanishes.

Even if we relax the requirement of machine reasoning, leaving only the requirement of machine “understanding”, the obvious solution is a huge dictionary defining all the words (terms) useful in the communication. If no restriction is imposed on the domain of communication, the dictionary would be countable, albeit infinite. So a machine “having knowledge” about Everything is another piece of nonsense, for technical reasons. There can be domain ontologies instead (see for example [7] and [8]).

Hermeneutic Searching Engine

We are developing solutions for intellectual communication based on requirements much more relaxed than those discussed above. First, we don’t assume that the machine has to “understand”: it is the human who has to understand. Second, we don’t suppose that the idea of communicating Everything is reasonable in practice, because there can be nothing like a “shared vocabulary” about Everything. Moreover, in the general case of intellectual communication the vocabulary is only partially shared, and this is why one party often asks the other about something she doesn’t know! Third, we believe that properly balanced abstraction (see below) is the proper way to reduce the number of dictionary terms that both parties must understand and, at the same time, to cover a context broad enough to enable rich understanding. Simply speaking, abstraction makes it possible to communicate rich content with a minimum of words. This “economy” complies with one of the main rules of our approach to system engineering.

Take, for example, a searching engine. We require that it should find, in the “documentation pool”, a set of information objects documenting the context of a particular matter in order to improve the User’s understanding of this matter. What exactly should the engine do? It should “understand” the User’s query and find documentation of the same context plus some extra RELEVANT context beyond the meaning of the query. And nothing more, if we want the communication between the User and the documentation pool to be effective. Then it is the User’s task to gain new knowledge from this extra context. A searching machine mediating such communication between the User and the “documentation pool” is more than a semantic searching engine. It can be called a hermeneutic searching engine.

As in many known semantic search projects, our hermeneutic searching engine is not for Everything. The User has to choose “a domain” before submitting a query. The query submitted is in fact a boolean expression of the query term and the code of a domain. And here is the trick: the “domain” represents some preconfigured context. More precisely, it is a “piece of context” small enough to avoid confusing the User and overloading the machine with computing tasks, and big enough to “probe” the documentation pool and find all meaningful documentation and nothing more (i.e. complete documentation).

In practice, there is a margin of precision and unambiguity, and a margin of completeness of the documentation harvested. However, in most practical cases we have examined to date, the margin is narrower than in other searching engines, especially those striving for “relevance”, for example Google.

How to build “a domain” is our secret and the heart of the solution, which is based on an invention we made in the early 1980s. The ontology used is based on a rigorous philosophical analysis of human “knowledge systems”. Of course, it is not a tree-like hierarchy, and only a few relations are subject to formal requirements. We extended this original ontology appropriately and decomposed it into a set of categories. Unlike in classical philosophy, these categories are neither the “highest genera of entities” nor the “millions of terms defining Everything”. They are a little more abstract than the entities and relations we are used to talking about in everyday life. We can call this a “weak abstraction”. The number of these categories is limited because they are abstractions. Out of about 120 categories possible in our ontology, only about 35 are sufficiently meaningful. These 35 categories form a kind of “alphabet”. Using this alphabet we construct the working contexts. Then we translate them into a particular language, since the meaning is language independent. Finally, we adjust the semantic representation of the context and get “the domain”.

It is easy to imagine how powerful a hermeneutic machine can be when its inverted index has not only words but also the codes of “domains” as entries.

Finally, it is worth noting the big difference between our hermeneutic searching engine and both the “relevance” searching engines and the “artificial reasoning” solutions. The latter are based on the assumption that the angel is in the details. “Be more specific” is the usual advice of the “relevance” searching engine. “Narrow your searching terms”, urge the “reasoning machines”. Our hermeneutic searching engine does something partially opposite: using a specific vocabulary it narrows the search but, at the same time, it attempts to EXTEND the query with some more general context. For example, in order to learn “who gained and who lost in relation to Hurricane Katrina”, one should submit a query consisting of the term “Hurricane Katrina” and the codes of 3 contexts with “weak abstraction”: people, organisations and economic values. We believe that a little bit of abstraction in the context is the essence of understanding.
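The Hurricane Katrina query can be illustrated with a toy inverted index that stores domain codes next to ordinary words. This is only a sketch under loud assumptions: the code names (`D:PEOPLE` etc.), the word lists standing in for "domains", the sample documents and the choice to combine the three codes with OR are all invented here, and nothing below reflects how the real domains are built.

```python
from collections import defaultdict

# Hypothetical domain codes with stand-in vocabularies (assumption: a real
# "domain" is a preconfigured context, not a plain word list).
DOMAIN_WORDS = {
    "D:PEOPLE": {"residents", "families", "workers"},
    "D:ORGANISATIONS": {"insurers", "agency", "company"},
    "D:ECONOMIC_VALUES": {"profit", "loss", "cost"},
}

# Invented sample documents.
DOCS = {
    1: "Hurricane Katrina caused heavy loss for insurers.",
    2: "Hurricane Katrina made landfall in August.",
    3: "The company reported a profit.",
}


def index_entries(text):
    """Return the index entries for a text: its words plus matching domain codes."""
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    codes = {code for code, vocab in DOMAIN_WORDS.items() if words & vocab}
    return words | codes


def build_index(docs):
    """Map every entry (word or domain code) to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for entry in index_entries(text):
            index[entry].add(doc_id)
    return index


def search(index, terms, domain_codes):
    """Boolean query: all terms AND at least one of the domain codes."""
    hits = None
    for term in terms:
        postings = index.get(term.lower(), set())
        hits = postings if hits is None else hits & postings
    in_domain = set()
    for code in domain_codes:
        in_domain |= index.get(code, set())
    return (hits or set()) & in_domain


index = build_index(DOCS)
result = search(
    index,
    ["Hurricane", "Katrina"],
    ["D:PEOPLE", "D:ORGANISATIONS", "D:ECONOMIC_VALUES"],
)
```

In this toy corpus the query keeps only the document that mentions the hurricane together with an organisation or an economic value, and drops both the purely meteorological text and the unrelated business text.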

The System

The example of the hermeneutic searching engine illustrates some particularities of our approach to system engineering. Let’s summarize:

  1. A clear distinction between the tasks of the user and those of the technical (and organisational) device enables us to avoid both unjustified automation and unnecessary human work;
  2. Of great importance is the proper compromise between the benefits of precise control over the course of events and the “cost” of control; in the case of the searching engine the question is one of sufficient unambiguity and completeness of the documentation of a particular context;
  3. Our ideal is the rule of economy: minimum measures causing maximum effect; in Intellectual Communication Systems the challenge is to find content-rich expressions in order to communicate more with fewer words; the “weak abstraction” applied in our searching engine makes the “domains” specific and general at the same time.


[1] Berners-Lee, T., Hendler, J., Lassila, O., The Semantic Web. A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities, Scientific American, May 17, 2001, http://www.sciam.com/print_version.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21

[2] Wittgenstein, L., Tractatus Logico-Philosophicus

[3] Wilson, S., Comment & Analysis: Why Context Is King, http://zope.cetis.ac.uk/content/20010827123828

[4] Clark, K. G., Identity Crisis, XML.com, September 11, 2002, http://www.xml.com/pub/a/2002/09/11/deviant.html

[5] Berners-Lee, T., What HTTP URIs Identify, Design Issues for the World Wide Web, June 9, 2002, rev. October 29, 2006, http://www.w3.org/DesignIssues/HTTP-URI2

[6] Pepper, S., Schwab, S., Curing the Web’s Identity Crisis. Subject Indicators for RDF., Ontopia, http://www.ontopia.net/topicmaps/materials/identitycrisis.html

[7] Knowledge Zone – One Stop Shop for Ontologies, http://smi-protege.stanford.edu:8080/KnowledgeZone/

[8] DAML Ontology Library, http://www.daml.org/ontologies/


September 2nd, 2007

Monitoring brand using discourse analysis


Learning public opinion (or sentiment) about your brand in the traditional way is expensive because surveys and focus groups take much time and human work. In the near future an alternative solution will probably gain recognition, as some technology vendors have launched tools for brand monitoring using text analytics. An initial review of these attempts appeared yesterday on SmartData Collective.

Monitoring a brand using discourse analysis differs, to some extent, from the approach based on text analysis. I have a very fresh example: a tool for monitoring opinion about retail chains (supermarkets). Here are a few words about how it is made and how it works.

Building the Monitor. Analysis of the discourse in a corpus of Internet discussions related to supermarkets gave a collection of subjects of interest to the interlocutors, and a collection of expressions of their attitudes. Using these results, complex queries for semantic search were built for the learning research. This is the crucial stage: we have to learn the fine details of the discourse and, at the same time, get its mathematics, as the basis for justifying and calibrating the Monitor. The final task is relatively easy: to implement the results and build a “machine” using accessible technology.

How the Monitor works. The data for each retail brand is collected using semantic search. The Monitor performs all the calculations according to the calibration formulas and provides figures ready for presentation. Please see the pictures, which were made for presentation only (not the production version).

First is the “profile”: how the brand is perceived, i.e. how it is distinguished from the average Internet discourse. Results of this kind are often astonishing because the picture differs dramatically from the wishes of the Customer (the user of the Monitor), from the official image and from the marketing buzz. Moreover, the interlocutors’ categories (vertical in the charts) also differ.

Then there is a comparison of the brands monitored. The charts show how people value each brand with regard to the same categories. Two charts with negative opinions (general index only) are presented as an example.

The third important group of results concerns the monitoring itself, i.e. the presentation of changes. It depends on the Customer’s needs. Some customers want to observe the effects of promotional campaigns, and for such a purpose day-to-day monitoring is appropriate. Some want to know the general trends… etc.
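One simple way to picture a “profile” is to compare how often each category appears in the discussions about a brand with how often it appears in the discourse as a whole. The sketch below does exactly that and nothing more. It is not the Monitor’s calibration formula, which is not given here; the function name, the category names and the counts are all invented for illustration.

```python
def brand_profile(brand_counts, corpus_counts):
    """Ratio of each category's share in the brand's discourse to its share
    in the whole corpus. A value above 1 means the category is
    over-represented for the brand, a value below 1 under-represented."""
    brand_total = sum(brand_counts.values())
    corpus_total = sum(corpus_counts.values())
    if brand_total == 0 or corpus_total == 0:
        raise ValueError("both count tables must contain at least one mention")
    profile = {}
    for category, corpus_n in corpus_counts.items():
        if corpus_n == 0:
            continue  # no baseline to compare with
        brand_share = brand_counts.get(category, 0) / brand_total
        corpus_share = corpus_n / corpus_total
        profile[category] = brand_share / corpus_share
    return profile


# Invented counts of mentions per category.
brand = {"prices": 30, "queues": 50, "staff": 20}
average_discourse = {"prices": 400, "queues": 200, "staff": 400}
profile = brand_profile(brand, average_discourse)
```

With these made-up numbers, queues are discussed far more often in connection with the brand than in the average discourse, while prices and staff are discussed less often.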

Discourse of excellence


All companies are good, and some are excellent. How to grasp the difference?

Companies active in a particular sector are basically similar. Similar products, machines and technology, and people doing similar things. Why do some fall, some merely exist and stagnate, and some excel and climb, for example, to the top of World Class Manufacturing?

Excellent companies think differently. It is a cultural difference, and many studies have discovered a relationship between the culture of an enterprise and its business results. Some 10 years ago I also made a study of this relationship, and discovered an evolution in the perception and hierarchy of values during the transformation from a centrally planned economy to a market economy. The essay on the “cultural revolution in business”, published in the press and in a book, won some renown.

The discourse of the company reflects its culture. There are, for example, some differences between the discourse of excellent and good companies:

  1. in the average company people think and speak in terms of results, whereas in excellent companies people think more about processes (the flow, the quality, control etc.); in between are the companies in which the Board members focus on results only and the other managers are concerned with processes, and this discrepancy sometimes induces conflicts;
  2. in the average company, during breaks, people talk about sport, children, politics, last weekend etc., whereas in excellent companies lunchtime is full of business chat, a sign that people are interested in the business;
  3. the average company is quiet and politically correct, and wrangles are rare and full of accusations, whereas in excellent companies wrangles are almost permanent and focused on problems, not people; they are not wars;
  4. people in the average company take problems personally, whereas in an excellent company problems are treated simply as tasks and expressed in terms of figures and facts (objectivism);
  5. an excellent company has a common language that dramatically facilitates communication; in companies with a KAIZEN culture this language is replete with the terms of organisational techniques (“Pareto language”).

Discourse also differs between enterprising or innovating companies and average ones. Take, for example, the attitudes to uncertainty: entrepreneurs are often “taking risk”, innovating companies are “looking for opportunities”, and average companies rather “protect themselves” against threats.

The difference means an opportunity to measure and analyse. The question is, of course, whether analytics of the discourse of excellence could be useful and make sense. For example: could we monitor the entrepreneurial climate more cheaply, more accurately or faster using discourse analytics instead of conventional methods? Or could we rate companies more efficiently using analysis of the “discourse of excellence” instead of (or in addition to) traditional rating? Could we measure the competitive capacity of a particular company and identify the room for improvement by analysing its discourse? Could we monitor its culture and raise the alarm if something goes wrong? Simply speaking, could we develop intangible capital analytics?

The above questions are inspired by the short discussion of Nicholas.Carbis’ post on AnalyticBridge. He develops “human capital analytics” (as he says). I think that the broader idea of intellectual (or intangible) capital analytics is worth considering.

As far as I know, there is so far no empirical evidence of a relationship between Intellectual Capital (IC) and company productivity. Perhaps IC analytics could help?

There are many questions about the practical implementation of IC analytics. First: the data source. Second: sometimes discourse reflects a company’s culture only indirectly, especially when official language prevails and is used to hide rather than to reveal the issues…

Managing limitations of prediction


The thrilling question of whether “All Predictive Models Are Wrong?” already has a second page of proposed answers on LinkedIn, and the discussion will probably continue for a long time. However, another question seems equally important: “What to do if the predictions are not sufficiently accurate?”

The standard answer is seductive and well known in security management: it is necessary to build a strategy to cope with uncertainty and the resulting risk. For example, contemporary military strategies for such cases pursue the goal “to preserve the ability to continue operations”. Narrow-minded business often develops a strategy “to minimize loss”, and open-minded business develops an organisation able “to benefit most of opportunities and optimize the measures against threat”.

The case under discussion is that of hurricanes. No one can predict the loss due to hurricanes with satisfactory accuracy. Uncertainty is costly. If the risk is overestimated, people and companies bear excessive cost to protect or insure themselves. If the risk is underestimated, the insurers go bankrupt…

It is not possible to predict the loss from disasters accurately. This does not mean, however, that data analytics has little to do in this field. Instead of tilting at windmills, it is probably better to analyse the strategies of response to uncertainty and to build models of optimum strategies. Moreover, the variance of strategies can explain a part of the variance of total loss, and so contribute to the accuracy of total loss prediction.
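The remark about variance can be written down with the law of total variance. The symbols are introduced here only for illustration: $L$ stands for the total loss and $S$ for the response strategy adopted.

```latex
\operatorname{Var}(L)
  = \underbrace{\mathbb{E}\big[\operatorname{Var}(L \mid S)\big]}_{\text{remains once the strategy is known}}
  + \underbrace{\operatorname{Var}\big(\mathbb{E}[L \mid S]\big)}_{\text{explained by differences between strategies}}
```

The second term is the part of the variance of loss that is attributable to the strategies. A prediction that takes the strategy into account is left with the first term only, which can never exceed the total.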

The idea of modelling reinsurance strategies is not new. It requires some knowledge and models of the behaviour of open dynamic systems. It seems reasonable to build a general model of a reinsurance strategy that secures insurers against bankruptcy: the strategy “to preserve the ability to continue operations”.

Objectivity of subjectivity


For some months I have been studying the discourse of attitude, and I sometimes look back to the nice talk by Shilpa Arora and Mahesh Joshi. At the beginning of this work they consider a comment subjective if it cannot be objectively verified. Such a definition seems quite reasonable if you use some arbitrary judgement of what can be objectively verified and what cannot. However, arbitrary judgement is subjective.

There is a long tradition in opinion mining of making arbitrary classifications of words or expressions. For example, many researchers consider opinions subjective and statements about facts objective. Many use an external standard dictionary to classify expressions as positive or negative, WordNet to identify semantic similarity, etc. It’s no wonder that so many studies have problems identifying ironic expressions, sarcasm etc.

In my studies of how attitudes are expressed, the discourse is taken as is, without arbitrary classification. Instead, I examine the contexts in which the expressions appear. And the result (for the Polish language) is different. In the rough picture, expressions of attitude seem to fall into two categories: “to the point” statements, and private, even intimate statements. There is a sharp distinction between these two kinds of expressions, clearly visible in the matrices of context relatedness of the expressions.

Streamlining web mining


Last Sunday I submitted my comment to the people vs machine debate in Research Magazine. Some readers of this comment asked me how I get 97% accuracy in measuring sentiment changes in web mining.

Web text analytics is a rather new field of research and everybody uses their own approach. So I would only advise: don’t be in too much of a hurry. If you collect millions of records and focus on thousands of specific sentiment-rich expressions, first look at this data. Make some basic descriptive statistics (Yes!), make some charts of the frequency distributions etc. Try to find a proper way of stratification, using your best proven approaches and tools. Don’t skip this basic examination. I write this because I see many newcomers to the analytics business who want to cut corners.

If you find a good way of stratifying the data you will undoubtedly notice that some expressions occur most frequently in one, two or three specific contexts or subject domains. Follow this clue, and limit further research to these expressions. This is the first step towards discourse mining (not simply text mining).
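The selection step just described could look roughly like this in code. It is a minimal sketch, assuming the stratified data is already available as (expression, context) pairs. The function names, the thresholds (`min_share`, `min_count`) and the sample records are arbitrary choices made for illustration, not values from my research.

```python
from collections import Counter, defaultdict


def context_distribution(records):
    """Count, for every expression, how often it occurs in each context.

    `records` is an iterable of (expression, context) pairs.
    """
    dist = defaultdict(Counter)
    for expression, context in records:
        dist[expression][context] += 1
    return dist


def concentrated_expressions(records, max_contexts=3, min_share=0.8, min_count=5):
    """Keep expressions whose `max_contexts` most frequent contexts cover at
    least `min_share` of their occurrences; expressions seen fewer than
    `min_count` times are ignored as too rare to judge."""
    selected = {}
    for expression, counts in context_distribution(records).items():
        total = sum(counts.values())
        if total < min_count:
            continue
        top = counts.most_common(max_contexts)
        share = sum(n for _, n in top) / total
        if share >= min_share:
            selected[expression] = [context for context, _ in top]
    return selected


# Invented sample: "rip-off" clusters in one context, "nice" is spread evenly.
contexts = ["prices", "staff", "parking", "queues", "opening hours"]
records = (
    [("rip-off", "prices")] * 8
    + [("rip-off", "staff")] * 2
    + [("nice", c) for c in contexts for _ in range(2)]
    + [("rare", "prices")]
)
good = concentrated_expressions(records)
```

On this sample only “rip-off” survives: “nice” is spread too evenly over the contexts and “rare” occurs too seldom to say anything about it.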

The next steps are obvious. Look for relations between the various characteristics of the contexts, the subject domains, and these “good” expressions. Use clustering to select the subject domains and texts you need. Then make the selection from your corpus of texts.

There are a lot of tools for extracting rich and accurate information from data selected in this way.

Limiting the scope of study is the first and very basic way to streamline any research process. It is also a basic step used in Industrial Engineering in streamlining any manufacturing or business process.

Two purposes of this blog


For many years, in hundreds of articles, lectures and discussions, I have urged managers and IT professionals to apply best practices, the humanities and economy to their work. As a consultant, I have demonstrated the practical usefulness of such an attitude many times. Today, after many achievements in the past, I see that bringing knowledge and practice together is a never-ending necessity. The post-industrial society and the knowledge-based economy are still to be built on the always fresh layers of the ignorance-based economy. Therefore I open this blog with two purposes in mind.

The first purpose of this blog is to bring the humanities and technology together. IT and computer science could benefit much, and develop much more efficiently, by collaborating closely with the experts in the fields they enter. For example, why not learn process management from industrial engineers when developing Business Process Management applications? Why not develop ontologies together with philosophers who are experts in ontology? Why develop semantic technologies without experts in linguistics?

As a generalist, I have managed to base my inventions in semantic technologies on strong philosophical grounds, and have applied the experience of discourse analysis. The results are promising, and I will use them in this blog as examples of the benefits of such an integral approach.

The second purpose of this blog is to promote knowledge and expertise as resources for economic and human growth. This is quite a difficult task, since a variety of hype and a multitude of shams pose as knowledge in this field. Probably more difficult still is to make knowledge and expertise comprehensible without losing their profundity and pertinence.

I invite everybody interested in these two tasks to join this blog.