February 2, 2005

Strawman results

Here is the strawman:

I calculated the mean length of each activity from the two training runs in which I performed that activity by itself. From that mean length I fit an exponential distribution to the amount of time expected to be spent in each activity. From the same training runs I also estimated the probability of observing each object given the activity.
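To make that concrete, here is a minimal Python sketch of the two estimates, using placeholder activity and object names (the real durations, object lists, and the exact fitting code are not in this post):

```python
from collections import Counter

# Hypothetical inputs: durations (in time slices) of the two solo training
# runs for each activity, and the objects touched during those runs.
solo_durations = {"activity_%d" % i: [100.0 + i, 120.0 + i] for i in range(11)}
solo_objects = {"activity_%d" % i: ["obj_a", "obj_b", "obj_a"] for i in range(11)}

def mean_duration(durations):
    """Mean length of the activity across its solo training runs."""
    return sum(durations) / len(durations)

def exponential_rate(durations):
    """MLE of the exponential rate parameter: lambda = 1 / mean duration."""
    return 1.0 / mean_duration(durations)

def observation_probs(objects):
    """P(object | activity): normalized counts of the objects seen while
    the activity was performed by itself."""
    counts = Counter(objects)
    total = sum(counts.values())
    return {obj: c / total for obj, c in counts.items()}

mean_lengths = {a: mean_duration(d) for a, d in solo_durations.items()}
rates = {a: exponential_rate(d) for a, d in solo_durations.items()}
obs_model = {a: observation_probs(o) for a, o in solo_objects.items()}
```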

Then I created an HMM with a single random variable that can take on one of 11 states, one state per activity. The self-transition probability of each state was determined from the exponential duration distribution above, and the out-transition probability was set to be uniform across the other activities. Each state also has a distribution over expected observations, trained as described above. This model is called the "prior" strawman since no training was done on the interleaved trials.

Using this model I evaluated the per-time-slice accuracy in the presence of empty observations (i.e. one observation per time slice, even if it is empty). I also evaluated whether the model picked any of the currently in-progress activities as the primary activity (this value must be greater than or equal to the previous measure). Finally, I evaluated the edit distance between the sequence of activities the model inferred and the true activity sequence; this measure penalizes frequent switching between activities.
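Below is a rough sketch of how these pieces could fit together in code. The self-transition rule p_self = 1 - 1/D (with D the expected duration in time slices) is my reading of "determined from the exponential distribution above", not something spelled out here, and the metric functions are generic stand-ins for the three measures just described:

```python
import numpy as np

ACTIVITIES = ["activity_%d" % i for i in range(11)]  # placeholder names
N = len(ACTIVITIES)

def transition_matrix(mean_lengths):
    """Self-transition derived from the expected duration D (in time slices)
    via the geometric-duration relationship p_self = 1 - 1/D; the remaining
    probability mass is spread uniformly over the other ten activities."""
    A = np.zeros((N, N))
    for i, act in enumerate(ACTIVITIES):
        p_self = 1.0 - 1.0 / mean_lengths[act]
        A[i, :] = (1.0 - p_self) / (N - 1)
        A[i, i] = p_self
    return A

def time_slice_accuracy(predicted, primary, in_progress):
    """predicted: inferred activity per time slice; primary: the labeled
    primary activity per slice; in_progress: set of all activities active
    in that slice (so the 'any' measure >= the 'primary' measure)."""
    n = len(predicted)
    primary_acc = sum(p == t for p, t in zip(predicted, primary)) / n
    any_acc = sum(p in s for p, s in zip(predicted, in_progress)) / n
    return primary_acc, any_acc

def edit_distance(inferred, truth):
    """Levenshtein edit distance between the inferred and true activity
    sequences; frequent spurious switching inflates this number."""
    d = np.zeros((len(inferred) + 1, len(truth) + 1), dtype=int)
    d[:, 0] = np.arange(len(inferred) + 1)
    d[0, :] = np.arange(len(truth) + 1)
    for i in range(1, len(inferred) + 1):
        for j in range(1, len(truth) + 1):
            d[i, j] = min(d[i - 1, j] + 1,
                          d[i, j - 1] + 1,
                          d[i - 1, j - 1] + (inferred[i - 1] != truth[j - 1]))
    return int(d[len(inferred), len(truth)])
```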

Next, using ten-fold cross-validation, I took the above model and trained it with exact EM inference using GMTK, again in the presence of the "no_observation" observations. This became the "trained" strawman model, and I evaluated it on the same measures. (In the tables below, the untrained prior model appears as the "single-trained" straw man and this EM-trained model as the "full-trained" straw man.)
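The EM training itself was done with GMTK, so the sketch below only covers the fold bookkeeping around it; run_em and evaluate are hypothetical stand-ins for the GMTK run and the measures above:

```python
def ten_fold_splits(trials, k=10):
    """Partition the interleaved trials into k folds; each fold serves once
    as the held-out test set while the rest are used for EM training."""
    folds = [trials[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        yield train, test

# Usage sketch (run_em and evaluate are placeholders, not real GMTK calls):
# for train, test in ten_fold_splits(interleaved_trials):
#     model = run_em(prior_model, train)
#     results.append(evaluate(model, test))
```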

A little info about the data set: averaged per time slice, there were 1.98 activities in progress at any given time (std 0.0899). The average activity length was about 27 minutes.

Here are the results:
Model                      | Time-Slice Accuracy (Primary) | Time-Slice Accuracy (Any) | Edit Distance
"single-trained" straw man | 73.7% (std 0.096%)            | 82.28% (std 0.068%)       | 30.7 (std 9.59)
"full-trained" straw man   | 65.1% (std 0.08%)             | 78.4% (std 0.062%)        | 74.8 (std 12.2)

These results are troubling. Why would the numbers get worse across the board when you do training? It appears that the EM learning process lets the model wander, so that the semantic meaning of a given node no longer corresponds to the activity I intended it to represent. In order to fix this we are going to have to train on some labeled and some unlabeled data simultaneously to pin down the semantics of the nodes.
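One way to do that pinning (an assumption on my part, not a description of what GMTK provides) is to clamp the E-step evidence on the labeled time slices so EM cannot reassign those states:

```python
import numpy as np

def clamp_labeled_slices(state_likelihoods, labels):
    """state_likelihoods: (T, N) per-time-slice likelihoods P(obs_t | state)
    fed into the E-step; labels: per-slice activity index, or None when the
    slice is unlabeled.  Zeroing the other states on labeled slices forces
    the posterior there onto the intended activity, so EM cannot drift the
    meaning of a node away from its activity."""
    clamped = state_likelihoods.copy()
    for t, lab in enumerate(labels):
        if lab is not None:
            keep = clamped[t, lab]
            clamped[t, :] = 0.0
            clamped[t, lab] = keep
    return clamped
```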

Dieter also wanted me to run the same experiments without the "no_observation" observations. This was to correct a problem in the strawman where, during lengthy periods without an observation, the model tended to drift to whichever activity had the highest likelihood of emitting "no_observation".
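The filtering itself is trivial; a sketch, assuming observations are stored per time slice with a literal "no_observation" token:

```python
def drop_empty_slices(observations, labels):
    """Keep only the time slices in which an RFID tag was actually seen,
    discarding the filler "no_observation" slices."""
    kept = [(o, l) for o, l in zip(observations, labels) if o != "no_observation"]
    if not kept:
        return [], []
    obs, labs = zip(*kept)
    return list(obs), list(labs)
```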

Here are the results when the data only contains time slices in which an RFID was actually observed.

Model                      | Time-Slice Accuracy (Primary) | Time-Slice Accuracy (Any) | Edit Distance
"single-trained" straw man | 89% (std 4.7%)                | 89.1% (std 4.8%)          | 19.1 (std 7.3)
"full-trained" straw man   | 81.5% (std 4.4%)              | 81.9% (std 4.4%)          | 64.3 (std 9.4)

As expected, the numbers for the strawman got better when there were no "no_observation" observations. I don't think this is the right way to go, though, because the only way it makes sense to talk about interruptions is when an activity has an expected duration.

So the next step is to correct my model so that it does not have such strict deterministic steps (not mentioned in this write-up) and to get numbers comparable to the above.

Here are the results for the pinned model:

Model            | Time-Slice Accuracy (Primary) | Time-Slice Accuracy (Any) | Edit Distance
pinned straw man | 33.5% (std 6.4%)              | 33.6% (std 6.4%)          | 279.3 (std 27.2)
Posted by djp3 at 3:34 PM