Posts Tagged ‘context’

Discourse targeting

Thanks to Eric Fredine’s blog I found a Microsoft study on behavioural targeting. The report is interesting to me because it clearly shows that tracking short-term behaviour is more effective for targeted ad delivery than tracking long-term behaviour. For me, it is one more piece of evidence that “context is king”.

For some years my research has focused on one specific context: online discussion. This context is extremely rich in thoughts, dreams, emotions, attitudes etc., and all this wealth manifests itself in the discourse of the discussion. Discussion within an online community is much richer than the context of online purchasing, browsing or search.

The main general conclusion of my research is that discourse is granulated. For example, there are big differences between the topics discussed by active and passive people.

For targeted ad delivery, a probably more important finding is that there are specific patterns in the topics discussed and in the attitudes towards various subjects (products, institutions, other people etc.). Take a car, for example.

On automotive discussion forums people look for advice on which car to choose, for a bargain on new tires, and so on. But the idea of buying a new car is most often born on… family forums! The difference is so big that I sometimes joke that “a car is a member of the family”.

We can say that, in the context of online discussion, a car is often associated with another element of this context: the topic of family.

Why not target car ads at family discussion forums?

The above result holds for Polish discussion forums, and I’m not sure whether French, German or American forums are similar. And, of course, there are many more peculiarities in the patterns of attitudes towards particular products associated with other elements of the context.

The idea of discourse targeting is stunning, isn’t it?

Intellectual Communication Systems

An EISIE article by Andrzej Góralczyk

The purpose of intellectual communication is not merely to send and receive a message. It is to understand the message.

It seems trivial as long as it concerns everyday life. We are accustomed to everyday misunderstandings that cause no more than a smile or a laugh.

However, understanding is not trivial when the communication carries big value or big risk. Think, for example, of the huge loss when millions of people do not find relevant information on the Internet at first glance, even though it exists there. How much value could education carry if it were truly multicultural, thanks to better understanding between people with different mindsets, concepts and lexicons? How much greater could a business’s chances of success be, thanks to a better understanding of the trends hidden in the background of the stream of messages from inside the company and from outside? How secure could we feel thanks to a better understanding of the dynamics of road congestion, epidemic disease, a “terrorist” attack, a blackout or the evolution of a hurricane?

Understanding and Technology

The notion of intellectual communication appeared briefly at the beginning of the 1970s. There was a model of two different communicating thesauri, and a simple question: how do they arrive at understanding from an initial state of misunderstanding, or of understanding nothing at all, given the differences in their content and structure? There were attempts to build a theory of communication between two such thesauri. Today it is clear that the notion is too broad to be the subject of a useful theory, although it remains an inspiring and convenient framework for depicting the broad domain of issues in communication and understanding.

Intellectual communication with an artificial agent, or through an artificial mediator, has become a hot issue today. Research and practical applications cover such diverse fields as machine translation, intelligent buildings, geographic information systems (GIS), cognitive systems, the semantic web and semantic search, expert systems, advanced Business Intelligence, “intelligent” user interfaces, security and anti-crime analytics, early threat detection etc.

Some of the technologies and concepts underlying these applications are quite new and immature, and suffer from ignoring thousands of years of humanistic knowledge about meaning, reasoning and learning. Some suffer from dreams: for example, the dream of a computer able to understand, think and decide like a human, or even more adequately, exactly, logically etc. We are sure that, except in very special applications such as expert systems or mission-critical devices, imitating a human or human thinking (reasoning) is nonsense. It is the user who has to understand, think and decide! The computer is, potentially, a strong tool to facilitate these tasks. The more pertinent the division of tasks between human and machine, the more sensible the application of IT in intellectual communication.

Consider, for example, the so-called Semantic Web [1]. The idea of Tim Berners-Lee, the inventor of the World Wide Web, is to extend the web in such a manner that its content can be comprehended by machines. What for? To enable artificial agents (i.e. computer applications) to provide some services to users. For example, to make an appointment at a therapist’s clinic: finding a “trustworthy” clinic near the patient’s house, checking the calendars of the patient and the doctor, etc. So the agent has to be able to reformulate the user’s request into a query suitable for searching the web, to use the query to find appropriate data on the web, to present options to the user, and finally to make some transactions in both calendars. We could describe such behaviour of the agent as “using data in a particular manner” or “combining data from various sources for a particular purpose” or so. It seems to be much less than human understanding.

The experts in the field of the Semantic Web tend to equip machines with huge controlled vocabularies called “ontologies”. This has nothing to do with ontology in its original philosophical sense. It is rather a specific “map” of a particular domain (or a conceptualisation of a part of the world). The vocabulary has to provide the substance for machine reasoning, and therefore it has to have some special properties. First, it has to be exact and unambiguous, very different from our everyday human conceptualisation. Second, the concepts of an ontology have to be organized in a strict tree-like hierarchy (a so-called taxonomy), with formally defined relations between them, a structure very different from that of the real world.
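
As a rough illustration of the strict tree-like hierarchy described above, a taxonomy can be sketched as a mapping from each concept to its single parent. The concept names below are invented for the example and do not come from any real ontology; real Semantic Web ontologies carry many more kinds of formally defined relations than this one "is a" link.

```python
# Hypothetical toy taxonomy: every concept has exactly one parent (a strict tree).
PARENT = {
    "clinic": "health facility",
    "hospital": "health facility",
    "health facility": "organisation",
    "organisation": "thing",
}


def ancestors(concept):
    """Return the chain of broader concepts, from the nearest parent up to the root."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def is_a(concept, broader):
    """True if `broader` is the concept itself or one of its ancestors."""
    return concept == broader or broader in ancestors(concept)
```

With this data, `is_a("clinic", "organisation")` holds while `is_a("organisation", "clinic")` does not. The machine can only "reason" along the single path the tree provides.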

In this way, using two inadequate models, the experts want to enable an artificial agent to communicate about the real world with a real person in real everyday situations! This seems not impossible, albeit a very complicated and resource-demanding task.

The Meaning and the Understanding

There are several ways to model meaning in order to make it computable by machine, if we want to use computing to simulate human thinking. For example, we can say that the meaning of an expression (a sentence, a word, a concept) is the set of “things” (entities, beings) it denotes (points to). Such a definition is very convenient as long as simulating thinking with logic or the algebra of sets is sufficient for the particular task. In fact, this classical approach is widely used in the tree-like formal ontologies mentioned above. However, many experts in the field are becoming aware that the usefulness of the approach is very limited if the relations in the tree are purely logical. Many rediscover the context: a wealth of semantic, factual, causal, modal etc. relations enriching the meaning in real use (see [2], theses 3.3 and 3.314, and for example [3]). Some go to the other extreme, proclaiming “meaning is context” or even “context is king”. In fact, understanding has very little to do with indicating, and very much to do with the context.
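
To make the "meaning as a set of denoted things" view concrete, here is a minimal sketch using plain Python sets. The expressions and the entities they denote are made up for illustration. The point is only that, under this model, combining meanings reduces to the algebra of sets.

```python
# Hypothetical denotations: each expression maps to the set of things it denotes.
DENOTES = {
    "car": {"fiat_126", "vw_golf"},
    "vehicle": {"fiat_126", "vw_golf", "city_bus"},
    "red thing": {"vw_golf", "fire_engine"},
}


def both(a, b):
    """Meaning of 'a and b': things denoted by both expressions (intersection)."""
    return DENOTES[a] & DENOTES[b]


def either(a, b):
    """Meaning of 'a or b': things denoted by at least one expression (union)."""
    return DENOTES[a] | DENOTES[b]


def entails(narrower, broader):
    """'Every narrower is a broader' holds when one denotation is a subset of the other."""
    return DENOTES[narrower] <= DENOTES[broader]
```

Here `entails("car", "vehicle")` is true and `both("car", "red thing")` is `{"vw_golf"}`. Nothing in these sets says in which situations, or with what attitude, people actually talk about a car, which is exactly the context this model leaves out.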

By the way, the two concepts of meaning seem to be incompatible, because the first is about function (indicating) and the second is about structure (relations). This is probably the fundamental reason why the problem of the identifier is not solvable within the framework of the Semantic Web, nor within the framework of Topic Maps (contrary to the illusion of their advocates) [4-6]. However, there is no reason to dismiss the supposition that there exist frameworks in which the problem of the identifier vanishes.

Even if we relax the requirement of machine reasoning, leaving only the requirement of machine “understanding”, the obvious solution is a huge dictionary defining all the words (terms) useful in the communication. If no restriction is imposed on the domain of communication, the dictionary would be countable, albeit infinite. So a machine “having knowledge” about Everything is another piece of nonsense, for technical reasons. There can be domain ontologies instead (see for example [7] and [8]).

Hermeneutic Searching Engine

We are developing solutions for intellectual communication based on requirements much more relaxed than those discussed above. First, we don’t assume that the machine has to “understand”; it is the human who has to understand. Second, we don’t suppose that the idea of communicating Everything is reasonable in practice; there can be nothing like a “shared vocabulary” about Everything. Moreover, in the general case of intellectual communication the vocabulary is only partially shared, and this is why one party often asks the other about something she doesn’t know! Third, we believe that properly balanced abstraction (see below) is the proper way to reduce the demand for dictionary terms mutually understood by both parties and, at the same time, to cover a context broad enough to enable rich understanding. Simply speaking, abstraction makes it possible to communicate rich content with a minimum of words. This “economy” complies with one of the main rules of our approach to system engineering.

Take, for example, a searching engine. We require that it find, in the “documentation pool”, a set of information objects documenting the context of a particular matter, in order to improve the User’s understanding of this matter. What exactly should the engine do? It should “understand” the User’s query and find documentation of the same context plus some extra RELEVANT context beyond the meaning of the query. And nothing more, if we want the communication between the User and the documentation pool to be effective. Then it is the User’s task to gain new knowledge from this extra context. A searching machine mediating such communication between the User and the “documentation pool” is more than a semantic searching engine. It can be called a hermeneutic searching engine.

As in many known semantic search projects, our hermeneutic searching engine is not for Everything. The User has to choose “a domain” before submitting a query. The submitted query is in fact a boolean expression of the query term and the code of a domain. And here is the trick: a “domain” represents some preconfigured context. More precisely, it is a “piece of context” small enough to avoid confusing the User and overloading the machine with computing tasks, and big enough to “probe” the documentation pool and find all the meaningful documentation and nothing more (i.e. complete documentation).

In practice, there is a margin of precision and unambiguity and a margin of completeness of the documentation harvested. However, in most practical cases we have examined to date, the margin is narrower than in other searching engines, especially those striving for “relevance”, for example Google.

How to build “a domain” is our secret and the heart of the solution, based on an invention we made in the early 1980s. The ontology used is based on a rigorous philosophical analysis of human “knowledge systems”. Of course, it is not a tree-like hierarchy, and only a few relations are subject to formal requirements. We extended this original ontology appropriately and decomposed it into a set of categories. Unlike in classical philosophy, these categories are neither the “highest genera of entities” nor the “millions of terms defining Everything”. They are a little more abstract than the entities and relations we are used to talking about in everyday life. We can call this a “weak abstraction”. There is a limited number of these categories because they are abstractions. Out of about 120 categories possible in our ontology, only about 35 are sufficiently meaningful. These 35 categories form a kind of “alphabet”. Using this alphabet we construct the working contexts. Then we translate them into a particular language, since the meaning is language independent. Finally, we adjust the semantic representation of the context and get “the domain”.

It is easy to imagine how powerful a hermeneutic machine can be when its inverted index has not only words but also the codes of “domains” as entries.
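
The following is only a sketch of what such an index might look like, not a description of the actual engine. The documents, the `D:` code format and the hand-assigned domain codes are all assumptions made for the example, since how real "domains" are built is not disclosed here.

```python
from collections import defaultdict

# Hypothetical documentation pool. Domain codes are assigned by hand here;
# the real procedure for building "domains" is not described in the text.
DOCS = {
    1: {"text": "family car for weekend trips", "domains": {"D:FAMILY"}},
    2: {"text": "bargain tires for your car", "domains": {"D:TRADE"}},
    3: {"text": "family holiday by the sea", "domains": {"D:FAMILY"}},
}


def build_index(docs):
    """Inverted index whose entries are both plain words and domain codes."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc["text"].lower().split():
            index[word].add(doc_id)
        for code in doc["domains"]:
            index[code].add(doc_id)
    return index


def search(index, term, domain):
    """Evaluate the boolean query `term AND domain` by intersecting postings."""
    return index.get(term, set()) & index.get(domain, set())
```

With this toy pool, `search(build_index(DOCS), "car", "D:FAMILY")` returns `{1}`: the word narrows the search, and the domain code selects the context in which the word is documented.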

Finally, it is worth noting the big difference between our hermeneutic searching engine and both the “relevance” searching engines and the “artificial reasoning” solutions. The latter two are based on the assumption that the angel is in the details. “Be more specific” is the usual advice of a “relevance” searching engine. “Narrow your search terms”, appeal the “reasoning machines”. Our hermeneutic searching engine does something partially opposite: using a specific vocabulary it narrows the search but, at the same time, it attempts to EXTEND the query with some more general context. For example, in order to learn “who gained and who lost in relation to Hurricane Katrina”, one should submit a query consisting of the term “Hurricane Katrina” and the codes of 3 contexts with “weak abstraction”: people, organisations and economic values. We believe that a little bit of abstraction in the context is the essence of understanding.
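
The Hurricane Katrina query can be sketched as follows. The postings, the `CTX:` code names and, above all, the way the three context codes are combined (each one ANDed with the term, and the results then merged) are assumptions for illustration; the text above does not specify the exact boolean form.

```python
# Hypothetical postings: entry -> ids of documents containing that entry.
# The three context codes are invented labels for the contexts named above.
INDEX = {
    "hurricane katrina": {1, 2, 3, 4},
    "CTX:PEOPLE": {1, 5},
    "CTX:ORGANISATIONS": {2, 3, 6},
    "CTX:ECONOMIC_VALUES": {3, 7},
}


def extended_query(index, term, contexts):
    """Return, per context code, the documents matching `term AND context`."""
    term_docs = index.get(term, set())
    return {ctx: term_docs & index.get(ctx, set()) for ctx in contexts}


def all_hits(per_context):
    """Union of the per-context results: `term AND (ctx1 OR ctx2 OR ...)`."""
    hits = set()
    for docs in per_context.values():
        hits |= docs
    return hits
```

In this toy index, document 4 mentions the hurricane but falls into none of the three contexts, so it is left out. Documents 5 to 7 belong to a context but do not concern the hurricane, so they are left out too.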

The System

The example of the hermeneutic searching engine illustrates some particularities of our approach to system engineering. Let’s summarize:

  1. A clear distinction between the tasks of the user and those of the technical (and organisational) device enables us to avoid both unjustified automation and unnecessary human work;
  2. Of great importance is a proper compromise between the benefits of precise control over the course of events and the “cost” of control; in the case of the searching engine the question is one of sufficient unambiguity and completeness of the documentation of a particular context;
  3. Our ideal is the rule of economy: minimum measures causing maximum effect; in Intellectual Communication Systems the challenge is to find rich-content expressions in order to communicate more with fewer words; the “weak abstraction” applied in our searching engine makes the “domains” specific and general at the same time.

References

————–
[1] Berners-Lee, T., Hendler, J., Lassila, O., The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities, Scientific American, May 17, 2001, http://www.sciam.com/print_version.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21

[2] Wittgenstein, L., Tractatus Logico-Philosophicus

[3] Wilson, S., Comment & Analysis: Why Context Is King, http://zope.cetis.ac.uk/content/20010827123828

[4] Clark, K. G., Identity Crisis, XML.com, September 11, 2002, http://www.xml.com/pub/a/2002/09/11/deviant.html

[5] Berners-Lee, T., What HTTP URIs Identify, Design Issues for the World Wide Web, June 9, 2002, rev. October 29, 2006, http://www.w3.org/DesignIssues/HTTP-URI2

[6] Pepper, S., Schwab, S., Curing the Web’s Identity Crisis: Subject Indicators for RDF, Ontopia, http://www.ontopia.net/topicmaps/materials/identitycrisis.html

[7] Knowledge Zone – One Stop Shop for Ontologies, http://smi-protege.stanford.edu:8080/KnowledgeZone/

[8] DAML Ontology Library, http://www.daml.org/ontologies/

——————–

September 2nd, 2007

Objectivity of subjectivity

For some months I have been studying the discourse of attitude, and sometimes I look back at the nice talk by Shilpa Arora and Mahesh Joshi. At the beginning of that work they consider a comment subjective if it cannot be objectively verified. Such a definition seems quite reasonable if you rely on some arbitrary judgement of what can be objectively verified and what cannot. However, arbitrary judgement is subjective.

There is a long tradition in opinion mining of making arbitrary classifications of words or expressions. For example, many researchers consider opinions subjective and statements about facts objective. Many use an external standard dictionary to classify expressions as positive or negative, WordNet to identify semantic similarity etc. It’s no wonder that so many studies have problems identifying ironic expressions, sarcasm etc.

In my studies of the expression of attitudes the discourse is taken as is, without arbitrary classification. Instead, I examine the contexts in which the expressions appear. And the result (for the Polish language) is different. In the rough picture, expressions of attitude seem to fall into two categories: “to the point” statements, and private, even intimate statements. There is a sharp distinction between these two kinds of expressions, clearly visible in the matrices of context relatedness of the expressions.
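
One simple way such a matrix could be computed is sketched below. This is an assumption for illustration, not the method actually used in the study: each expression is a single word, its context is the set of other words in the posts where it occurs, and relatedness is the Jaccard similarity of two such sets.

```python
def context_sets(posts, expressions):
    """Map each expression to the set of other words in the posts where it occurs."""
    contexts = {e: set() for e in expressions}
    for post in posts:
        words = set(post.lower().split())
        for e in expressions:
            if e in words:
                contexts[e] |= words - {e}
    return contexts


def relatedness_matrix(posts, expressions):
    """Square matrix of Jaccard similarities between the expressions' contexts."""
    ctx = context_sets(posts, expressions)
    matrix = []
    for a in expressions:
        row = []
        for b in expressions:
            union = ctx[a] | ctx[b]
            # Two empty contexts share nothing, so their similarity is set to 0.
            row.append(len(ctx[a] & ctx[b]) / len(union) if union else 0.0)
        matrix.append(row)
    return matrix
```

Expressions that keep appearing in the same contexts score close to 1 against each other and close to 0 against the rest, so two sharply separated kinds of expressions would show up as two blocks in the matrix.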