July 21, 2004
Bleu: a Method for Automatic Evaluation of Machine Translation
In a previous article I discussed some challenges associated with trying to grade two sequences of activities against each other. One is the true, or gold, trace, and the other is the inferred, or black, trace. In that discussion I suggested that the desiderata for a metric that scores the black trace against the gold trace (so that you can evaluate the quality of different black traces) were:
- It is not biased toward long activities.
- It measures an inference engine's ability to discriminate among different activities.
- It penalizes rapid changes in instantaneous prediction.
As a result of that discussion Henry referred me to this paper.
This paper was written to describe a method of scoring different machine translations. Instead of a black trace, they looked at a machine translation of a given sentence. Instead of a gold trace, they looked at an expert human translation of the same sentence. Whereas in our application we have activity steps, in their application they have words. Basically it's a straightforward mapping of our problem to theirs.
Their solution was to use a metric which matches successively longer n-grams in the black trace against n-grams in the gold trace, with a penalty for black traces which are shorter than the gold trace. The match rates for the different n-gram lengths are combined with a geometric mean, which compensates for the way match rates fall off as n grows. In the results of their paper they showed that their model discriminated well among human-generated black traces as well as subtle differences among machine translations.
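A minimal sketch of that scoring scheme applied to activity traces might look like the following. It assumes the commonly cited BLEU formulation (clipped n-gram precision, uniform weights, a single gold trace, and a brevity penalty of exp(1 - r/c) when the black trace is shorter). The function names and the example traces are made up for illustration and are not from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-gram in tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(black, gold, n):
    """Fraction of black n-grams found in gold, with counts clipped to gold's."""
    black_counts = Counter(ngrams(black, n))
    gold_counts = Counter(ngrams(gold, n))
    total = sum(black_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(c, gold_counts[g]) for g, c in black_counts.items())
    return matched / total


def bleu_like(black, gold, max_n=4):
    """Geometric mean of the n-gram precisions times a brevity penalty."""
    if not black:
        return 0.0
    precisions = [modified_precision(black, gold, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Any zero precision drives the geometric mean to zero.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Penalize only black traces that are shorter than the gold trace.
    if len(black) >= len(gold):
        penalty = 1.0
    else:
        penalty = math.exp(1.0 - len(gold) / len(black))
    return penalty * math.exp(log_mean)


# Hypothetical traces: one activity label per time step.
gold = ["sleep", "sleep", "cook", "cook", "eat", "eat", "wash", "wash"]
stable = ["sleep", "sleep", "cook", "cook", "eat", "eat", "eat", "wash"]
flicker = ["sleep", "cook", "sleep", "cook", "eat", "wash", "eat", "wash"]

print("stable :", bleu_like(stable, gold))
print("flicker:", bleu_like(flicker, gold))
```

Both example black traces use nearly the same activity labels as the gold trace, so their 1-gram matches are similar. They differ in how many of the longer n-grams survive.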
Because it is based on a tokenized set of activities, it is not biased toward long activities. It does an effective job of scoring a black trace based on sequential orderings (n-gram matches), which means that it exercises an inference engine's ability to discriminate among activities, and in particular multistep activities. It also effectively penalizes an algorithm that rapidly switches among activity predictions, because that would cause n-gram matches to degrade quickly. So it meets our desiderata.
It seems like it would be a good method, but it needs to be slightly modified to make sure that all tokenized matches overlap at least at some point in time.
Residential Mobility: How people change homes
This talk was an Intel Research Seattle talk by Irina Shklovski from Intel's PaPR group.
17% of Americans change their residence in any 12-month period. This means there is a lot of business around moving. Moving isn't considered a very big deal. Moving is "okay." But it disrupts social networks, causes stress and depression, and destroys social identity.
Technology is affecting relationships, but social norms and communication culture are evolving with technology. It isn't clear whether technology is allowing easier maintenance of distributed social networks.
Does Internet use alleviate stress associated with geographic mobility?
This talk discussed a study of eight families in various stages of moving on the West Coast and Tennessee.
Themes:
- Moving is an opportunity to clean physical space.
- Yard Sales
- People are more likely to throw things away while unpacking, since they tend to do the unpacking themselves
- Things that have sentimental value don't get thrown out, but are hard to identify to an outsider. The sentimental items provide familiarity in a new place
- Moving causes people to preserve, approximate, or transform old ways of living.
- Music spaces were preserved
- Refrigerators were mostly preserved
- Approximations were required by space limitations
- New practices helped to adjust to a new place such as collecting things that are cool in the new location.
- Moving requires relearning where things are.
- People bought books about new places, but not maps because maps were available online
- Keeping "in touch"
- New communication modalities: the move caused people to buy cellphones
- People with cellphones had to decide if they wanted landlines
- Cellphones don't have a "place"
I was a little bit underwhelmed by this presentation. While it was interesting to engage with the idea of how technology impacts residential moves, I don't think that anything was presented that was that surprising, exciting, or even actionable. This seems to be a study that aims to scientifically evaluate something that people don't really question, but for which no one has a reference to quote. During the presentation there was a lot of argument about the creation of space in new locations that seemed to be splitting hairs based on anecdotes. I'm also not sure that any conclusions could really be drawn from this other than that technology impacts residential mobility in different ways.