July 8, 2004

Evaluating Activity Inference

Archery target

Here is a small interesting problem.

A user in a smart home performs a sequence of activities and the smart home records a sensor stream that was generated by those activities. The sensors might be motion sensors or RFID tags or infrared beam breaks or whatever. What the user really did is recorded as a sequence of time stamped activities. For discussion we'll call the true description of what happened and when it happened the gold trace.

Now let's suppose there is a hypothetical piece of software which is trying to recreate what happened based on the sensor stream. The inference mechanism might be an HMM or a stochastic solution to a DBN - how you infer the user's actions is not material to the issue I am describing. However, the software must produce a sequence of time stamped activities comparably to the gold trace. Let's call the software inferred trace the black trace.

The problem is this: If you have multiple pieces of software that infer the activities in different ways, you would like to be able to assign a score of how close the black trace is to the gold trace so that you can determine which software did the best job. What is the best way to do this scoring?

One way (method A) is to take both the gold trace and the black trace and slice them into N bins of equal amounts of time. Then for each bin, determine if the black trace and the gold trace agree. The final score is the percentage of bins in which the black trace matches the gold trace.

This method is definitely biased by long activities. An activity which takes a long amount of time is going to be disproportionately represented in the scoring function. In particular you could have a very easy activity to detect (say sleeping) which takes 12 hours. Your activity inferencer might correctly figure out when you are sleeping, but that's all it can do. Using this method, your lame inference engine would get 50% accuracy. This seems to be a bad metric on one hand because it doesn't test an inferencing engine's ability to detect activities. On the other hand, from the perspective of a user of this system it does make sense. Every time they look at the output of the inference engine they know that the engine will be right 50% of the time.

Another method (method B) is to forget the timing information and just translate the gold trace into a sequence of activities. If each activity is represented by a character then the gold trace becomes a string. Now do the same thing with the black trace. Given these two strings you can find a minimum cost edit distance between the two strings. Using the edit mapping you can determine false positives and false negatives and use those metrics to score your traces.

This method is more satisfying from a machine learning perspective because it exercises the inference engine's ability to discriminate among activities. In contrast to method A though it might be entirely unsatisfying to a user. In order to see the degenerate case, imagine that an inference engine correctly guesses the user's current activity for 1 seconds in the middle of the activity and picks a default activity for the rest of the time. The edit distance will be relatively low because there will be one extra activity (the default one) inserted in the black trace for every true activity, but by clock time it will be right a miniscule amount of time.

A third method proposed by Tanzeem is to segment the black trace according to when the true activities (in the gold trace) occurred. For each of these chunks of the black trace integrate over the likelihood of each of the possible activities to determine the time and confidence weighted vote that the inference engine casts for each activity over the length of the activity. If the highest vote is for the correct activity then the black trace is correct, otherwise not.

This method is satisfying because it doesn't bias longer activities against shorter activities, but it does allow the inference engine to rapidly switch it's current best guess while an activity is under way. (Of course you can only switch the instantaneous best guess so often and still achieve overall accuracy, but it is still possible). This behavior would be completely unsatisfying to a user and would be penalized by methods B and A.

So in conclusion and in light of these techniques the desiderata for a good scoring scheme are:

  • It is not biased toward long activities.
  • It measures an inference engine's ability to discriminate among different activities.
  • It penalizes rapid changes in instantaneous prediction.
Posted by djp3 at 12:23 PM | Comments (0) | TrackBack (0)

Hierarchical Topic Models and the Nested Chinese Restaurant Process

Chinese Restaurant Sign

This paper presents a method of determining priors over hierarchical topic structures in a world with infinite topics.

The paper relies on a generative description of how topic hierarchies are generated and then uses Gibbs sampling to find the best model for the data.

The generative process generates topic hierarchy trees in which each node in a tree is a "topic". A document is defined by a path from the root to a leaf node. The topics aren't necessarily semantically meaningful things, but rather a distribution over words that will be found in any document that is a child of the node. Since all documents are children of the root node, the "topic" of the root node will tend to be words that are common to all documents such as "a", "the", "or", etc. So the distribution isn't a "topic" in the LSI sense, but rather a generalization over the words common to all its children.

It relies on a Chinese Restaurant Process which is defined elsewhere and is extended to generate trees as well as branches in the tree. It has the following properties: if you choose an existing topic then choose a more populated topic, otherwise add more tables according to a gamma parameter.

It was a tough paper, but had pretty good results when it tried to learn a hierarchy over NIPS abstracts. (There is something weird going on when your research is directed toward the conference in which it is published, but...)

I have the following unanswered questions:
  • What is a conjugate prior?
  • Why is a Dirichlet prior a conjugate prior over a multinomial?
  • How does a Latent Dirichlet Prior differ from a Dirichlet Prior?
Posted by djp3 at 10:15 AM | Comments (0) | TrackBack (0)

July 7, 2004

Information Communication Technology for Developing Countries

Woman on cell-phone This was a talk presented at Intel Research Seattle by James Dailey from the Grameen Technology Center in Seattle WA. Mr. Dailey spoke about emerging technologies in the developing world. His aim was to technologically empower the "bottom of the pyramid" (meaning the lowest economic strata) in an economically viable way. This is in contrast to most technology which markets to the top economic strata. GTC was started 6-7 years ago to promote their principles world-wide.
Three components that Grameen is apparently pushing on:
  • Scaling up micro-finance with technology. (some stats about micro-finance as implemented by the "Grameen bank" (not sure what that is): 96% women borrowers, 90% repayment rate)
  • The Village Phone run by a local entrepreneur. (These seemed to be a particular instance of a micro-finance operation that GTC was involving themselves in, or liked or something)
  • Remote transactions that create an ATM-like device that works with cell-phones and web browsers because the ATM model doesn't work in developing countries, and poor transportation infrastructure restricts remote physical payments.

Aspects of business models that they endorse include shared access (computer cafe' model), a focus on entrepreneurs, add value to communities and financial sustainability.

Challenges include lack of capital, high illiteracy, cultural barriers, poor infrastructure (electricity/phone lines).

Scaling up micro-finance can be aided by credit bureaus, remote transactions (no ATMS exist), remote data collection, guaranteeing identity, agent models (allowing people to act as your agent), and intermediation.

MOAP (Micro-finance Open Architecture Project) is a vision to support micro-finance with open source software and open standards.

My opinion of this talk was that it was heavy on vision and light on actual research. There were some clear points made about why current technologies won't work in the developing world, and some potential solutions (primarily cell phones), but it wasn't at all clear what steps GTC was making to push the technology other than by identifying challenges. I didn't get any idea of what GTC's role was in any real micro-finance operation. I wasn't sure if the Village Phone was an idea they liked, or a real implemented program in the field and I was pretty sure that the remote transaction with cell-phones was just an idea.

More generally I wasn't convinced that the overall vision was sound. It seemed to me that GTC's direction was to replicate the developed world with alternative technologies, yet at the same time there was a contrasting discussion about how the developing world was free to reinvent structures that wouldn't work for them. For example, Mr. Dailey suggested establishing credit bureaus and implementing strong government sponsored identity guarantees. I'm not sure that the best thing for countries where human rights are frequently abhorrent and semi-organized crime is rampant is to establish credit reporting data centers and strong identity control and tracking under a central agency's control.

In an effort to understand the vision more, I asked why 96% of the micro-finance recipients were women. The answer was a blend of memes including "men want more money", "men don't repay loans as well", and "this is also about empowering women". All of these issues may be true, but none of these issues were previously raised, so they seemed to confuse the overall picture of what the micro-financiers role was.

Overall I came away with the impression that GTC just wanted things not to be so bad in the third world - it would be good if there wasn't so much poverty, it would be good if there was more agricultural information, it would be good if women were more empowered, it would be good if it was all open source, etc. There was a noticeable lack of data to support the fact that these ideas were in fact good ideas or necessary ideas and subsequently there wasn't much commitment to a vision of action along those lines.

Posted by djp3 at 4:22 PM | Comments (0) | TrackBack (0)