Title | Analytics for Insurance |
---|---|
Author | Tony Boobier |
Genre | Foreign educational literature |
Series | |
Publisher | Foreign educational literature |
Year of publication | 0 |
isbn | 9781119141082 |
1.1.1 Big Data Defined by Its Characteristics
Big Data may be ‘big news’ but it is not entirely ‘new news’. The rapid growth of information has been recognized for over 50 years, although, according to Gil Press, who wrote about the history of Big Data in Forbes,[1] the expression was first used in a white paper published in 2008.
With multiple definitions available, Big Data is best described by five key characteristics (Figure 1.4) which are:
Figure 1.4 Big Data defined by its characteristics
■ Volume – the sheer amount of structured and unstructured data that is available. There are differing opinions as to how much data is being created on a daily basis, usually measured in petabytes or gigabytes, one suggestion being that 2.5 billion gigabytes (2.5 exabytes) of information is created daily.[2] (A ‘byte’ is the basic unit of computer memory, enough to represent a single letter or number. A ‘petabyte’ is 10¹⁵ bytes; a ‘gigabyte’ is one thousand million bytes, or 10⁹ bytes.) But what does this mean? In 2010 the outgoing CEO of Google, Eric Schmidt, said that the same amount of information, about 5 exabytes, is now created every 48 hours as had existed from ‘the birth of the world to 2003.’ For many it is easier to think in terms of numbers of filing cabinets and whether they might reach the moon or beyond, but such comparisons are superfluous. Others suggest that it is the equivalent of the entire contents of the British Library being created every day.
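Because these unit prefixes are easy to confuse, a short conversion helps anchor the daily figure. This is a minimal sketch assuming decimal (SI) units, as used in the definitions above; the constant and function names are illustrative only.

```python
# Decimal (SI) storage units, matching the definitions in the text.
GIGABYTE = 10**9    # one thousand million bytes
PETABYTE = 10**15
EXABYTE = 10**18

def to_bytes(amount: float, unit: int) -> float:
    """Convert an amount expressed in the given unit into bytes."""
    return amount * unit

# The suggested daily volume: 2.5 billion gigabytes.
daily_bytes = to_bytes(2.5e9, GIGABYTE)

print(f"{daily_bytes:.2e} bytes per day")
print(f"{daily_bytes / PETABYTE:,.0f} petabytes per day")
print(f"{daily_bytes / EXABYTE:.1f} exabytes per day")
```

Run as written, this reports 2.50e+18 bytes, or 2,500 petabytes, or 2.5 exabytes per day: the same quantity expressed three ways.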
It is also tempting to try to put this into an insurance context. In 2012 the UK insurance industry created almost 90 million policies, which conservatively equates to somewhere around 900 million pages of policy documentation. The 14m books (at, say, 300 pages apiece) in the British Library equate to about 4.2 billion pages, or around five years of UK policy documentation. In other words, it would take insurers five years to fill the equivalent of the British Library with policy documents (assuming they wanted to). But let's not play games: it is sufficient to acknowledge that the amount of data and information now available to us is at an unprecedented level.
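The comparison rests on a few lines of arithmetic, sketched below using only the figures quoted above. The figure of 10 pages per policy is an assumption inferred from ‘90 million policies’ equating to ‘900 million pages’.

```python
# Figures quoted in the text; PAGES_PER_POLICY is inferred from
# "90 million policies ~ 900 million pages", not stated directly.
POLICIES_PER_YEAR = 90_000_000
PAGES_PER_POLICY = 10
BOOKS_IN_LIBRARY = 14_000_000
PAGES_PER_BOOK = 300

policy_pages_per_year = POLICIES_PER_YEAR * PAGES_PER_POLICY  # 900 million
library_pages = BOOKS_IN_LIBRARY * PAGES_PER_BOOK             # 4.2 billion

# How many years of policy documentation would match the library's page count.
years_to_fill = library_pages / policy_pages_per_year

print(f"Policy pages per year: {policy_pages_per_year:,}")
print(f"British Library pages: {library_pages:,}")
print(f"Years of policy documents to match: {years_to_fill:.1f}")
```

The exact ratio is about 4.7 years, which the text rounds to ‘around five years’.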
Perhaps because of the enormity of scale, we seek to define Big Data not just by its size but by its characteristics.
■ Velocity – the speed at which the data comes to us, especially in terms of live streamed data. We also describe this as ‘data in motion’ as opposed to stable, structured data which might sit in a data warehouse (which is not, as some might think, a physical building, but rather a repository of information that is designed for query and analysis rather than for transaction processing).
Streamed data, such as movies and TV delivered over the internet, is a good example of data in motion. Its speed is measured not in distance travelled but in bytes per second, and it is governed not only by the ability of the source to transmit the information but also by the ability of the receiver to ‘absorb’ it. Increasingly, the technical challenge is not so much creating enough bandwidth to support high-speed transmission as managing the security of the information.
In an insurance context, perhaps the most obvious example is the whole issue of telematics information, which flows from mobile devices not only at the speed of technology but also at the speed of the vehicle (and driver) involved.
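To make ‘data in motion’ concrete, the sketch below processes telematics readings one at a time as they arrive, rather than first storing a whole trip in a warehouse and querying it later. The `Reading` record, its fields and the 10 km/h-per-second harsh-braking threshold are all hypothetical, chosen for illustration; real telematics feeds and scoring rules will differ.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

@dataclass
class Reading:
    """Hypothetical telematics reading: seconds since trip start and speed in km/h."""
    t: float
    speed_kmh: float

def harsh_braking_events(stream: Iterable[Reading],
                         threshold_kmh_per_s: float = 10.0) -> Iterator[float]:
    """Yield the timestamp of each reading where deceleration exceeds the threshold.

    Readings are consumed one by one; only the previous reading is kept in memory.
    """
    previous: Optional[Reading] = None
    for reading in stream:
        if previous is not None:
            dt = reading.t - previous.t
            if dt > 0:
                deceleration = (previous.speed_kmh - reading.speed_kmh) / dt
                if deceleration > threshold_kmh_per_s:
                    yield reading.t
        previous = reading

# A short, made-up trip: the drop from 52 to 38 km/h in one second is flagged.
trip = [Reading(0, 50), Reading(1, 52), Reading(2, 38), Reading(3, 37)]
print(list(harsh_braking_events(trip)))
```

Because the function is a generator, the same code works whether `stream` is a list, as here, or a live feed that never ends.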
■ Variety – Big Data comes to the user from many sources and therefore in many forms: a combination of structured, semi-structured and unstructured. Semi-structured data presents problems because it is seldom consistent. Unstructured data (for example plain text or voice) has no predefined structure at all.
In recent years an increasing proportion of data has been unstructured, perhaps as much as 80%. It has been suggested that the winners of the future will be those organizations that can obtain insight, and therefore extract value, from unstructured information.
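The inconsistency of semi-structured data can be illustrated with a small normalization step. In this sketch, hypothetical claim notifications arrive as JSON, but each source names the same fields differently and some omit them. The aliases map every variant onto one fixed schema. All field names, values and aliases are invented for illustration.

```python
import json

# Hypothetical semi-structured claim notifications: the same information
# arrives under different keys, and some fields are missing altogether.
raw_messages = [
    '{"policy_id": "P001", "loss_date": "2015-12-26", "postcode": "AB1"}',
    '{"policyNumber": "P002", "date_of_loss": "2015-12-27"}',
    '{"policy": "P003", "postcode": "CD2", "notes": "water in ground floor"}',
]

# Map each known variant of a key onto one canonical field name.
KEY_ALIASES = {
    "policy_id": "policy_id", "policyNumber": "policy_id", "policy": "policy_id",
    "loss_date": "loss_date", "date_of_loss": "loss_date",
    "postcode": "postcode",
}
CANONICAL_FIELDS = ("policy_id", "loss_date", "postcode")

def normalise(message: str) -> dict:
    """Parse one JSON message into a fixed-schema record.

    Unrecognized keys are dropped and missing fields are set to None.
    """
    record = dict.fromkeys(CANONICAL_FIELDS)
    for key, value in json.loads(message).items():
        canonical = KEY_ALIASES.get(key)
        if canonical is not None:
            record[canonical] = value
    return record

rows = [normalise(m) for m in raw_messages]
for row in rows:
    print(row)
```

The free-text `notes` field is effectively unstructured. This sketch simply discards it; extracting value from it would require text analytics.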
In an insurance context this might comprise data which is based on weather, location, sensors, and also structured data from within the insurer itself – all ‘mashed’ together to provide new and compelling insights. One of the clearer examples of this is in the case of catastrophe modeling where insurers have the potential capability to combine policy data, policyholder input (from social media), weather, voice analysis from contact centers, and perhaps other key data sources which all contribute to the equation.
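A very small version of such a ‘mash-up’ is to join the insurer's own policy records with an external weather feed by region and total the sum insured in regions facing heavy rain. Everything below is hypothetical: the regions, the rainfall forecast and the 50 mm trigger are illustrative, and real catastrophe models are far more sophisticated.

```python
# Hypothetical structured policy data held by the insurer.
policies = [
    {"policy_id": "P001", "region": "North", "sum_insured": 250_000},
    {"policy_id": "P002", "region": "North", "sum_insured": 180_000},
    {"policy_id": "P003", "region": "South", "sum_insured": 320_000},
]

# Hypothetical external weather feed: forecast rainfall per region (mm in 24h).
rainfall_mm = {"North": 85.0, "South": 12.0}

FLOOD_THRESHOLD_MM = 50.0  # assumed trigger level, for illustration only

def exposure_by_region(policies, rainfall, threshold):
    """Total the sum insured in each region whose forecast rainfall exceeds the threshold."""
    exposed = {}
    for policy in policies:
        region = policy["region"]
        if rainfall.get(region, 0.0) > threshold:
            exposed[region] = exposed.get(region, 0) + policy["sum_insured"]
    return exposed

print(exposure_by_region(policies, rainfall_mm, FLOOD_THRESHOLD_MM))
```

Here only the North region exceeds the threshold, so the two North policies (430,000 in total) are reported as exposed.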
■ Veracity – This is normally taken to mean the reliability of the data. Not all data is equally reliable, as it comes from different sources. One measure of veracity is the ‘signal-to-noise’ ratio, which expresses the usefulness of information compared to false or irrelevant data. (The expression has its origin in the quality of a radio signal compared to the background noise.)
In an insurance context this may relate to the amount of ‘spam’ or off-topic posts on a social media site where an insurer is looking for insight into the customers' reaction to a new media campaign.
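A crude signal-to-noise ratio for such a campaign can be computed by counting relevant posts against off-topic ones. This sketch assumes a simple keyword test for relevance; the posts and the `RELEVANT_TERMS` list are hypothetical. In practice a trained text classifier would replace the keyword match.

```python
import math

# Hypothetical posts collected from a social media page during a campaign.
posts = [
    "Love the new advert from the insurer!",
    "WIN A FREE PHONE click here",
    "The campaign makes me want to switch my car insurance",
    "Anyone watching the match tonight?",
    "Cheap pills best price",
    "Not impressed by the new insurance ad",
    "Does the advert mean prices are going up?",
]

# Assumed relevance test: a post counts as "signal" if it mentions any of these terms.
RELEVANT_TERMS = ("insur", "advert", "campaign")

def is_signal(post: str, terms=RELEVANT_TERMS) -> bool:
    """Return True if the post mentions at least one relevant term (case-insensitive)."""
    text = post.lower()
    return any(term in text for term in terms)

def signal_to_noise(posts) -> float:
    """Ratio of relevant posts to irrelevant ones; infinite if there is no noise."""
    signal = sum(1 for p in posts if is_signal(p))
    noise = len(posts) - signal
    return signal / noise if noise else math.inf

ratio = signal_to_noise(posts)
print(f"Signal-to-noise ratio: {ratio:.2f}")
```

With four relevant posts and three off-topic ones, the ratio is about 1.33.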
As organizations become obsessed with data governance and integrity, there is a risk of treating any data which is less than perfect as unreliable. This is not necessarily true. One major UK bank, for example, gives a weighting to the veracity, or ‘truthfulness’, of the data, which allows it to use imperfect information in its decisions. The reality is that even in daily life, decisions are made on the best information available, even if it is not perfect, and our subsequent actions are influenced accordingly.
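One simple way to use imperfect data without discarding it is a weighted average in which each source counts in proportion to a veracity score. The sketch below is an illustrative technique only, not the bank's actual method. The sources, mileage figures and weights are all hypothetical.

```python
# Hypothetical estimates of a customer's annual mileage from sources of
# differing reliability; the weights are illustrative veracity scores (0 to 1).
sources = [
    ("telematics device", 9_800, 0.9),
    ("application form", 8_000, 0.6),
    ("social media inference", 14_000, 0.2),
]

def veracity_weighted_estimate(sources):
    """Weighted mean in which each source counts in proportion to its veracity score."""
    total_weight = sum(weight for _, _, weight in sources)
    if total_weight == 0:
        raise ValueError("at least one source needs a positive weight")
    return sum(value * weight for _, value, weight in sources) / total_weight

print(round(veracity_weighted_estimate(sources)))
```

The least reliable source still contributes, but it pulls the estimate (about 9,659 miles) only slightly above the better-trusted figures.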
■ Value – the final characteristic, and one not widely commented on, is the value of the data. This can be measured in different ways: the value to the user in terms of deeper insight into a particular issue, or perhaps the cost of acquiring the key data that provides that insight, for example the creditworthiness of a customer.
There is a risk in thinking that all essential information is out there ‘in the ether’ and it is simply a matter of finding it and creating a mechanism for absorption. It may well be that certain types of data are critical to particular insights, and that there is a cost-benefit case for actively seeking them out.
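A cost-benefit case of this kind can be framed as a simple expected-value calculation. The sketch assumes a hypothetical external data item, such as a credit check, bought for every application. Its benefit comes only from the share of decisions it changes. All parameter names and figures are invented for illustration, and the one-off integration cost is charged to a single year for simplicity.

```python
from dataclasses import dataclass

@dataclass
class DataSourceCase:
    """Hypothetical inputs for deciding whether to buy an external data item."""
    cost_per_lookup: float             # price charged per record
    fixed_integration_cost: float      # one-off cost of building the feed (charged to year one)
    lookups_per_year: int
    share_of_decisions_changed: float  # fraction of cases where the data changes the outcome
    value_per_changed_decision: float  # e.g. average loss avoided when it does

    def net_annual_benefit(self) -> float:
        """Expected benefit from changed decisions minus the total cost of acquiring the data."""
        benefit = (self.lookups_per_year * self.share_of_decisions_changed
                   * self.value_per_changed_decision)
        cost = self.lookups_per_year * self.cost_per_lookup + self.fixed_integration_cost
        return benefit - cost

case = DataSourceCase(cost_per_lookup=0.50, fixed_integration_cost=40_000,
                      lookups_per_year=200_000, share_of_decisions_changed=0.02,
                      value_per_changed_decision=60.0)
print(f"Net annual benefit: {case.net_annual_benefit():,.0f}")
```

With these made-up numbers the data earns its keep (a net benefit of 100,000). Halving the share of decisions it changes would make the case negative.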
In an insurance context, one example might be remote aerial imagery obtained from a satellite or an unmanned aerial vehicle (i.e., a drone), which would help in determining the scale of a major loss and assist insurers in setting a financial reserve more accurately. Drones were used after the New Zealand earthquake of 2011, and US insurers are already investigating the use of this technology.
Beyond these five ‘V’s of data, it is likely that other forms of data and information will emerge. Perhaps future data analysis might even consider the use of ‘graphology’ (the study of people's handwriting to establish character) as a useful source of information. Those who are slightly skeptical of this as a form of insight might reflect on the words of Confucius, who in about 500 BC warned: ‘Beware of a man whose handwriting sways like a reed in the wind.’
Graphology has become a recognized subject in many European countries and even today is used in some recruitment processes. Perhaps one day analytics will demonstrate a clearer correlation between handwriting, personality, speech and behavior. However, in an insurance context where online applications prevail, handwriting is increasingly likely to be the exception rather than the norm, so such a correlation between handwriting and behavior is unlikely to be very helpful to insurers in the short term.
1.1.2 The Hierarchy of Analytics, and How Value is Obtained from Data
Analytics, or the analysis of data, is generally recognized as the key by which data insights are obtained. Put another way, analytics unlocks the ‘value’ of the data.
There is a hierarchy of analytics (Figure 1.5).
[1] Press, Gil. ‘A Very Short History of Big Data.’ Forbes Magazine, 2013. http://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data/ (accessed May 17, 2016).