Mining Financial Time Series
by Arno Siebes
Much financial data takes the form of time series, e.g. the tick data from stock markets. Interesting patterns mined from such data could be used, for example, to clean the data or to spot possible market opportunities.
Mining time-series data is, however, not trivial. Simply treating each individual time series as a (large) record in a table presupposes that all series have the same length and sampling frequency. Moreover, applying standard mining algorithms directly to such tables ignores the temporal structure of the series. To overcome these problems, one can work with a fixed set of characteristics derived from each individual time series. These characteristics should preserve similarity: time series that are similar should have similar characteristics, and vice versa. If such a set of characteristics can be found, the mining can be done on the characteristics rather than on the original time series.
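To make the idea concrete, here is a minimal sketch that maps series of different lengths to feature vectors of one fixed length and compares the vectors instead of the raw data. The three features and the function names `characteristics` and `distance` are illustrative assumptions, not the characteristics used in the project, and they still depend on the sampling frequency.

```python
import math


def characteristics(series):
    """Map a series of any length (>= 2) to a fixed-length feature vector.

    The three features are hypothetical placeholders chosen for illustration:
    mean step, mean absolute step, and standard deviation of the steps.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    n = len(diffs)
    mean_step = sum(diffs) / n
    mean_abs_step = sum(abs(d) for d in diffs) / n
    spread = math.sqrt(sum((d - mean_step) ** 2 for d in diffs) / n)
    return (mean_step, mean_abs_step, spread)


def distance(series_a, series_b):
    """Euclidean distance between the feature vectors of two series."""
    fa = characteristics(series_a)
    fb = characteristics(series_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


# Two smooth ramps of different length, and one jagged series.
smooth_short = [0, 1, 2, 3]
smooth_long = [0, 1, 2, 3, 4, 5, 6, 7]
jagged = [0, 3, 0, 3, 0, 3]
```

With these toy inputs the two ramps end up closer to each other than either is to the jagged series, even though they differ in length. A standard mining algorithm could then run on the feature vectors as ordinary table records.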
A complicating factor in defining such characteristics is that similarity of time series is not a well-defined criterion. In the Dutch HPCN project IMPACT, in which CWI participates, we take two series to be similar if they look similar to the human eye, and we use wavelet analysis to define and compute the characteristics. One of the attractive features of this approach is that different characterisations capture different aspects of similarity. For example, Hölder exponents capture roughness at a pre-defined scale, whereas a Haar representation focuses on local slope.
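The link between a Haar representation and local slope can be seen in a small sketch of the textbook orthonormal Haar transform. This is a generic illustration, not the project's code. The names `haar_step` and `haar_transform` are chosen here, and the input length is assumed to be a power of two.

```python
import math


def haar_step(values):
    """One level of the orthonormal Haar transform on an even-length list.

    Returns (approximation, detail): scaled pairwise sums and differences.
    """
    if len(values) % 2 != 0:
        raise ValueError("length must be even")
    s = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    # Left minus right, so a rising pair gives a negative detail coefficient.
    detail = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return approx, detail


def haar_transform(values):
    """Full Haar decomposition of a list whose length is a power of two.

    Returns the final approximation coefficient and the detail levels,
    ordered from the finest scale to the coarsest.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(values)
    details = []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx[0], details
```

Each detail coefficient is a scaled difference between neighbouring values or neighbouring block sums, so it reflects the slope at that position and scale. For the ramp `[0, 1, 2, 3, 4, 5, 6, 7]`, every finest-level detail equals `-1/sqrt(2)`, while a constant series yields only zeros.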
Experiments are currently under way with the Dutch bank ABN AMRO to filter errors from online tick data. In the first stage, a Haar representation is used to identify spikes in the data. In the next stage, clustering on Hölder exponents and/or Haar representations will be used to identify smaller-scale errors.
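The article does not describe how the first stage detects spikes, so the following is only a hypothetical sketch of one way finest-scale Haar details could flag an isolated bad tick. The function name `find_spikes`, the use of undecimated (overlapping) details, the median-absolute-deviation threshold and the default `k=5.0` are all assumptions made for illustration.

```python
import math
import statistics


def find_spikes(prices, k=5.0):
    """Return indices of isolated one-tick spikes (hypothetical rule).

    A tick is flagged when the finest-scale Haar details on both sides of it
    are large and of opposite sign: a sharp move immediately undone.
    """
    if len(prices) < 3:
        return []
    # Undecimated finest-scale Haar details: scaled neighbour differences.
    root2 = math.sqrt(2.0)
    details = [(prices[i + 1] - prices[i]) / root2 for i in range(len(prices) - 1)]
    # Robust scale estimate: median absolute deviation of the details.
    med = statistics.median(details)
    mad = statistics.median([abs(d - med) for d in details])
    threshold = k * mad
    spikes = []
    for i in range(1, len(prices) - 1):
        before, after = details[i - 1], details[i]
        if abs(before) > threshold and abs(after) > threshold and before * after < 0:
            spikes.append(i)
    return spikes


# Toy tick series with one obviously wrong price at index 5.
ticks = [100.0, 100.1, 100.0, 100.2, 100.1, 150.0, 100.2, 100.1, 100.3, 100.2]
```

On this toy series the rule flags only index 5. A genuine level shift, such as `[1, 1, 1, 5, 5, 5]`, is not flagged because the large move is not immediately reversed.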
Please contact:
Arno Siebes - CWI
Tel: +31 20 592 4139
E-mail: Arno.Siebes@cwi.nl