Big data – it’s everywhere. We’re told it will increase revenues, help us sell more and know more about what our customers want. Studies suggest that 90% of the world’s data has been created in the last two years alone, and the volume of data is set to continue to increase. Scientific research has been particularly challenged by the data deluge.
Tera- and zettabytes of complex information are generated daily by laboratories, individuals, universities and commercial organisations. If scientists can’t take advantage of this data, R&D innovation suffers significantly. Be it in the technology, food, chemicals, sportswear or cosmetics industries, validated scientific data drives new product R&D and, with it, commercial success.
The ability to filter through the ‘noise’ is vital whether you are a life scientist working in drug development or a technology company designing a new product. To effectively address the challenges of spiralling costs and increasingly complex and competitive markets, the tools used in scientific research are becoming particularly astute at screening, aggregating and integrating large data sets to successfully uncover unique insights.
With the sheer volume and density of published research and the rise of multiple data-intensive programs since the success of the Human Genome Project, the life sciences space is managing increasingly complex and diverse data. What can fields outside the life sciences learn from how life science companies manage big data, and how can those lessons be applied to boost productivity?
Learning how to fail early
Bringing an innovative drug to market can take 10-15 years and cost up to $5 billion. The stakes are high and success is not guaranteed, so mitigating R&D risks by applying the ‘fail early and fail cheap’ approach is crucial. The challenge lies in sifting and sorting published research results and data from varied fields, such as chemistry and biology. By working with quality, validated data, researchers can be more confident that their decisions to keep investing time, money and resources in a program are well informed.
So whatever the particular driver behind a business using big data, whether a marketing campaign or new product development, companies must use data to identify successful lines of enquiry while closing down dead-ends early on in the process.
Those businesses that can efficiently analyse data and so reduce the risk of ploughing time and money into doomed projects will increase their chances of commercial success. Indeed, a recent report from Bain & Co found that, of 400 large companies, those that adopted big data analytics 'gained a significant lead over the rest of the corporate world.'
Mining for gold
Handling immense data sets requires a combination of scientific and technological skills to determine how data is stored, searched and accessed. In science, the importance of data scientists in ensuring that data is handled correctly from the outset is not underestimated; other industries can learn from the scientific approach. Text-mining tools and the use of relevant taxonomies are essential.
If we think about big data as a huge number of data points in some multi-dimensional space, the problem is one of analysis: the task is frequently to find points that are very similar or very dissimilar to a given point, yet many points cannot meaningfully be compared at all. In life sciences, taxonomies assign each data point a class, so finding points comparable to a given one is as easy as looking up the other data points in the same class.
Without taxonomies, the only way to find data points comparable to the one of interest is to compute the distance from this point to every other point in the space, which is a huge number of computations. Taxonomies therefore make big data analysis enormously faster.
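The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, class labels, feature vectors and function names below are all hypothetical, and Euclidean distance stands in for whatever similarity measure a real system would use.

```python
from collections import defaultdict
from math import dist

# Hypothetical records: (identifier, taxonomy class, feature vector).
RECORDS = [
    ("p1", "kinase", (0.10, 0.20)),
    ("p2", "kinase", (0.15, 0.22)),
    ("p3", "protease", (0.90, 0.80)),
    ("p4", "kinase", (0.12, 0.18)),
    ("p5", "protease", (0.88, 0.79)),
]


def rank_by_distance(query_id, records):
    """Without a taxonomy: measure the distance from the query to every other point."""
    vectors = {rid: vec for rid, _, vec in records}
    query = vectors[query_id]
    distances = [
        (dist(query, vec), rid) for rid, vec in vectors.items() if rid != query_id
    ]
    return sorted(distances)  # one distance computation per other point


def build_taxonomy_index(records):
    """Build the two lookups once: identifier -> class and class -> identifiers."""
    class_of = {}
    members = defaultdict(list)
    for rid, cls, _ in records:
        class_of[rid] = cls
        members[cls].append(rid)
    return class_of, members


def comparable_points(query_id, class_of, members):
    """With a taxonomy: look up the other points that share the query's class."""
    return [rid for rid in members[class_of[query_id]] if rid != query_id]


class_of, members = build_taxonomy_index(RECORDS)
same_class = comparable_points("p1", class_of, members)
ranked = rank_by_distance("p1", RECORDS)
```

Here `rank_by_distance` does work proportional to the size of the whole data set for every query, whereas `comparable_points` is a dictionary lookup once the index has been built. The trade-off is that the lookup is only as good as the class assignments themselves.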
Taxonomies combined with semantic technology and text-mining tools provide a more efficient way to discover the relevant content. Text-mining has traditionally been a largely manual process, but recent advances such as automation have transformed it. Technology now plays an important role in complementing the human element of text-mining and curation.
Text-mining extracts key data from multiple and disparate data sources; the resulting nuggets of data have enormous value and can help users make associations where none existed before. By investing in the right technology, businesses can ensure their employees are able to get the most value from the data available to them.
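As a rough illustration of how text-mining can surface associations across separate sources, the sketch below tags terms from a small controlled vocabulary and counts how often two terms appear in the same sentence. The vocabulary, the example sentences and the function names are all hypothetical, and real text-mining tools handle synonyms, ambiguity and context far more carefully than this simple term matching does.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical controlled vocabulary: term -> taxonomy class.
VOCABULARY = {
    "brca1": "gene",
    "tp53": "gene",
    "p53": "protein",
    "olaparib": "compound",
}

# Made-up sentences standing in for text drawn from different sources.
DOCUMENTS = [
    "BRCA1 mutations alter the response to olaparib.",
    "TP53 encodes the p53 protein.",
    "Olaparib was tested in BRCA1 carriers.",
]


def extract_terms(text):
    """Return the vocabulary terms found in a piece of text, in sorted order."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({token for token in tokens if token in VOCABULARY})


def co_occurrences(documents):
    """Count how often each pair of vocabulary terms appears in the same text."""
    counts = Counter()
    for text in documents:
        for pair in combinations(extract_terms(text), 2):
            counts[pair] += 1
    return counts


associations = co_occurrences(DOCUMENTS)
```

A pair that recurs across sources, such as `("brca1", "olaparib")` in this toy data, is a candidate association for a human curator to review; the count alone says nothing about whether the link is meaningful.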
Plotting a course for success
Big data allows manufacturing, production and research processes to be more efficient and cost-effective. It should help cut through the clutter of opinion and guide a targeted approach based on underlying insights. Those who can successfully distil the data will have a significant competitive advantage.
Capturing and understanding internal project results, both past and present, enables companies to retain knowledge and augment new initiatives without having to start from scratch. Successful data integration can accelerate research, help validate hypotheses more reliably and ultimately improve research outcomes.
This aggregated approach must be harnessed in other industries. For example, a pharmaceutical researcher mines and analyses different literature and data sources to make relevant associations between genes and proteins to inform potential new treatments. Similarly, through data mining, other enterprises can uncover new insights to better inform product development.
A combination of the right people and the right technology is the antidote: it ensures that less time is wasted following unproductive avenues and boosts commercial productivity.