Twitter has announced plans to make the source code for Storm, a data processing system developed by a company it recently acquired, available under an open source licence later this year.
Storm is a system for processing continuous streams of data in real time. It is comparable to massively parallel processing systems such as Hadoop, in that it breaks the processing task into smaller jobs that are executed in parallel.
It differs in that it processes the data continuously as it is produced, and is therefore also comparable to the complex event processing (CEP) systems commonly used by financial trading organisations.
Storm was originally developed by BackType Technology, a social media analytics company that Twitter bought in June of this year.
In a blog post written before Twitter’s acquisition of BackType, the company explained that although Storm had been developed to analyse the kind of data streams produced by social messaging sites like Twitter, "Storm enables a whole new range of applications we didn’t anticipate when we initially designed it".
Storm is the latest in a line of open source releases of technology that large web companies developed to process their vast volumes of data. Previous examples include Hadoop, to which Yahoo! has been a significant contributor, and Cassandra, a storage system built by Facebook.
These so-called ‘big data’ technologies are finding some applications beyond the web, such as processing data produced by smart electricity meters.