The anomalous package provides some tools to detect unusual time series in a large collection of time series. This is joint work with Earo Wang (an honours student at Monash) and Nikolay Laptev (from Yahoo Labs). Yahoo is interested in detecting unusual patterns in server metrics.
The package is based on this paper with Earo and Nikolay.
The basic idea is to measure a range of features of the time series (such as strength of seasonality, an index of spikiness, first order autocorrelation, etc.). Then a principal component decomposition of the feature matrix is calculated, and outliers are identified in the 2-dimensional space of the first two principal component scores.
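To make the feature-then-PCA step concrete, here is a minimal sketch in plain Python. It is not the package's code (the package is written in R), and every function name in it is hypothetical. The three features are crude stand-ins chosen for brevity: only the lag-1 autocorrelation corresponds to a feature named above, while the standard deviation and largest jump are my own placeholders rather than the package's seasonality or spikiness measures. The principal components are found by power iteration with deflation, purely to keep the example free of dependencies.

```python
import math
import random


def lag1_autocorrelation(x):
    """First-order autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        return 0.0
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    return num / denom


def crude_features(x):
    """Illustrative stand-in features, NOT the package's definitions."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    largest_jump = max(abs(x[t] - x[t - 1]) for t in range(1, n))
    return [lag1_autocorrelation(x), sd, largest_jump]


def standardise_columns(rows):
    """Centre each column and scale it to unit standard deviation."""
    n, p = len(rows), len(rows[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    return out


def leading_component(matrix, iterations=500, seed=0):
    """Dominant eigenvector and eigenvalue of a symmetric matrix (power iteration)."""
    p = len(matrix)
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(p)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigenvalue


def first_two_pc_scores(feature_rows):
    """Scores of each row on the first two principal components."""
    z = standardise_columns(feature_rows)
    n, p = len(z), len(z[0])
    cov = [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1) for j in range(p)]
           for i in range(p)]
    v1, lam1 = leading_component(cov)
    # Deflate: remove the first component so the next one becomes dominant.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(p)] for i in range(p)]
    v2, _ = leading_component(deflated)
    return [[sum(r[j] * v1[j] for j in range(p)),
             sum(r[j] * v2[j] for j in range(p))] for r in z]


# Simulated collection: 20 autoregressive series, one with a planted spike.
rng = random.Random(42)
collection = []
for _ in range(20):
    level, series = 0.0, []
    for _ in range(100):
        level = 0.7 * level + rng.gauss(0, 1)
        series.append(level)
    collection.append(series)
collection[0][50] += 25.0

scores = first_two_pc_scores([crude_features(s) for s in collection])
print(scores[0])  # PC scores of the series with the planted spike
```

Each series ends up as one point in the plane, so the outlier search becomes a two-dimensional problem whatever the length of the original series.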
We use two methods to identify outliers.
A bivariate kernel density estimate of the first two PC scores is computed, and the points are ordered based on the value of the density at each observation. This gives us a ranking from most outlying (lowest density) to least outlying (highest density).
A series of
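The first of these two methods, the density ranking, can be sketched in a few lines of Python. Again this is only an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, the kernel is a Gaussian product kernel, and the bandwidths come from a normal reference rule, none of which is specified above. The sketch also assumes both coordinates have non-zero spread.

```python
import math


def kde_density_at_points(points):
    """Bivariate Gaussian product-kernel density estimate at each input point."""
    n = len(points)
    bandwidths = []
    for j in range(2):
        col = [p[j] for p in points]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        # Normal reference rule for two dimensions (an assumed choice).
        bandwidths.append(sd * n ** (-1.0 / 6.0))
    hx, hy = bandwidths
    norm = 1.0 / (n * 2.0 * math.pi * hx * hy)
    densities = []
    for x, y in points:
        total = 0.0
        for u, v in points:
            total += math.exp(-0.5 * (((x - u) / hx) ** 2 + ((y - v) / hy) ** 2))
        densities.append(norm * total)
    return densities


def rank_by_outlyingness(points):
    """Indices ordered from lowest density (most outlying) to highest density."""
    densities = kde_density_at_points(points)
    return sorted(range(len(points)), key=lambda i: densities[i])


# Toy PC scores: a tight cluster plus one point far away from it.
pc_scores = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
             [-0.1, 0.0], [0.0, -0.1], [5.0, 5.0]]
print(rank_by_outlyingness(pc_scores))
```

The ranking is relative: every point gets a position in the ordering, so deciding how many of the top-ranked series to flag as anomalies is left to the user.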




I explained the ideas in a talk last Tuesday given at a joint meeting of the Statistical Society of Australia and the Melbourne Data Science Meetup Group. Slides are available here. A link to a video of the talk will also be added there when it is ready.
The density-ranking of PC scores was also used in my work on detecting outliers in functional data. See my 2010 JCGS paper and the associated rainbow package for R.
There are two versions of the package: one under an ACM licence, and a limited version under a GPL licence. Eventually we hope to make the GPL version contain everything, but we are currently dependent on the alphahull package which has an ACM licence.