Thursday, November 5, 2015

The important big data

IBM took another step in software-defined storage this week, announcing Spectrum Scale 4.2 (and a lot more). This product has a long history of leading performance in scale-out technical computing, and it is still growing in that market (see IBM at SC15). However, this release goes well beyond that installed base.

It used to be that the important big data was the raw data from which great decisions could be made. The old important big data was things like manufacturing simulation, financial risk analysis, genomics, and seismic processing. Things computers could crunch!

The new important big data is user behaviors, digital marketing results, output from thousands of sensors, healthcare records and images -- lots and lots of images, videos, digitized voice. Some of it is big, but most of it we just used to throw away. This data isn't just crunched. It's massaged, interpreted, visualized, and factored. The inputs are messy and the outputs can be surprising.

The new important big data needs to be accessed via object storage because RESTful APIs are easier to program. The spectrum of HDFS tools is needed to walk through it, plus the various commercial plugins to validate and visualize it. The new release of Spectrum Scale has unified object and HDFS transparency, as well as the traditional file support. For the new important big data, file servers don't cut it.
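To make the "easier to program" point concrete, here is a minimal sketch of what object-style access looks like: a couple of plain HTTP calls against a Swift-style endpoint of the kind Spectrum Scale's unified object layer exposes. The endpoint URL, container name, and token below are illustrative placeholders, not real configuration, and this is not the product's documented client, just the general RESTful pattern.

```python
# Minimal sketch: writing and reading data through a Swift-style object
# interface. All names (endpoint, container, token) are hypothetical.
import requests

ENDPOINT = "https://scale-object.example.com:8080/v1/AUTH_demo"  # placeholder
CONTAINER = "sensor-data"                                        # placeholder
HEADERS = {"X-Auth-Token": "token-from-your-auth-service"}       # placeholder

# PUT an object: one HTTP call, no mount points or file-server client needed.
payload = b'{"sensor_id": 42, "reading": 98.6}'
put = requests.put(f"{ENDPOINT}/{CONTAINER}/reading-0001.json",
                   headers=HEADERS, data=payload)
put.raise_for_status()

# GET it back. With a unified data store, the same bytes can also be reached
# through the file and HDFS views rather than only through this object path.
get = requests.get(f"{ENDPOINT}/{CONTAINER}/reading-0001.json", headers=HEADERS)
print(get.json())
```

The appeal is that any language with an HTTP library can produce or consume the data, while the analytics side can still walk the same data with HDFS tooling.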

Going even further, the new important big data isn't just for experts. Spectrum Scale has a UI - and a pretty good one by the look of it. (See Bob Oesterlin's post on his first impressions.) The new important big data needs to be quick, adaptable, and multi-application - and easy to use.

Spectrum Scale 4.2 - a good example of why it isn't just GPFS anymore. 

Cross-posted to LinkedIn
