The world around us is changing rapidly. It is clear that the future will be as unrecognizable to us as our world of the Internet, mobile telephony, computers and satellite navigation would have been to earlier generations.
Although physics and engineering are fundamental to these existing technologies and to future (as yet unknowable) developments, the common denominator underlying them is not quantum physics or nanotechnology but information. We are surrounded by technology that collects, transmits and manipulates data, and ultimately needs to convert it to information. The sheer quantity of this data is hard to comprehend. Every minute of every day, the Internet processes more than 200 million e-mails, 2 million Google search queries and an almost unmeasurable amount of financial data. In a single minute the computers at CERN can produce 600 gigabytes of data, the equivalent of two HD video downloads every second. A single MRI scan can include up to 10 gigabytes of data but may contain only one piece of information: the presence or absence of a disease.
This deluge has become known by the trendy term big data. However, there is a problem with big data: No matter how big it is, it is still just data. Although the Internet is the largest repository of data that we humans have ever constructed, anybody who has looked at the full Twitter feed or has searched for kitten videos will realize that it is remarkably difficult to find the big information lurking in the undifferentiated mess of data. The recently announced Cantab Capital Institute for the Mathematics of Information at Cambridge University is tasked with discovering solutions to the information problem in fields as diverse as imaging and machine learning.
The information problem is particularly acute in finance. The big data hype is often found when one is talking about new, unusual sources of data: satellite imaging of crops and Walmart car parks, or mining the Twitter feed for consumer trends. But the largest source of big data in finance is history. How should one go about processing the petabytes of historical financial data already in existence (and the gigabytes of new historical data produced every minute) to make optimal investment decisions?
To address these considerable obstacles, applying mathematics and statistics (what has come to be known as information science) is essential.
Investors only have historical data (except those using astrology or tarot cards), and every decision that they make is guided in some way by it. Unfortunately, there is a great deal of noise and superfluous information in historical data, and extracting signals or relevant information from it is exceptionally challenging. Furthermore, investors' confidence in historical information and their commitment to using it as part of their investment process are inconsistent.
An example of this is asset-liability modeling for pension funds. The liability side of the equation uses historical data on the lifestyles and habits of large groups of people to forecast the statistical distributions of retirement ages and mortality. From this, a very precise probabilistic distribution of the future liability cash flows can be determined.
For the asset side of the equation, the use of historical data is less consistent. Although the data may show, say, that value investing has outperformed the market or that certain systematic styles have consistently added value, investors are less inclined to use this data rigorously to define their processes.
Why is this? In almost all cases, it is a result of two data or information biases. Investors believe there is either security selection information or security timing information in the data. For example, an investor may believe that interest rates will definitely rise next year, so now would be a bad time to embark on a risk parity investment. Alternatively, the investor may believe that either his own analysis of the data or a forecaster's analysis indicates that a particular security or asset is a better purchase than another.
Although there is some timing and selection information in historical data, the signal is small and elusive compared with the noise. Only sophisticated information science techniques and large amounts of computing power can hope to extract information from the data, and therefore big-data-driven investment processes are best left to the experts. Conversely, for the vast majority of less technologically equipped investors, the big data revolution is likely to just reinforce the traditional view of investment processes: Do what worked in the past because it will probably work in the future, and try not to change your mind too often.
Ewan Kirk is CIO and Partner at Cantab Capital Partners LLP, a multibillion-dollar systematic asset manager. The Cantab Capital Institute for the Mathematics of Information sits within Cambridge University's mathematics faculty and aims to galvanize progress in the mathematics of information.