INFORMS Big Data Conference in San Jose, CA

About 300 practitioners and academics with a passion for Big Data gathered at the San Jose Convention Center for a great INFORMS event on June 22-24 (#INFORMSBigData).

The big themes of this conference included streaming data, open source tools, and Internet of Things (IoT). Moreover, there’s a consensus to move from predictive to prescriptive analytics for improved decision making.

On June 22nd, pre-conference vendor workshops took place. I decided to first attend the FICO session (@FICO, see two pictures below). I mainly learned that FICO is not just doing credit scores, but has also entered the market of software tools for predictive analytics (including FICO® Model Builder and FICO® Xpress Optimization Suite, and of course a cloud offering: FICO® Analytic Cloud and the FICO® Decision Management Platform). The presenter particularly highlighted the robust optimization capabilities of their tool.

Tuesday 24 June 2014

This was a great event. Congratulations to the organizers! Smaller events such as these allow for more intense interactions. Personally, I prefer them over much larger events.

Next, yours truly (@dirkvandenpoel) attended the SAS technology session. They showed the machine learning capabilities of SAS® Enterprise Miner(TM). Patrick Hall (@SASSoftware, Research Statistician Developer, see picture below) discussed supervised and unsupervised machine learning. A particular highlight was the improved R integration in Enterprise Miner 13.1 (see slide below).

Then, it was time for a welcome reception with nice food. It was the ideal opportunity to do some networking and visit the booths in the exhibit area.


The keynote was given by Bill Franks (@BillFranksGA, Chief Analytics Officer at Teradata Corporation, see picture below). Titled “Putting Big Data To Work”, it offered an introductory helicopter view of the industry.

Next, I attended the case study track. The first talk was by John Erik Koch (Director, Informatics, Merck & Co., see picture below) titled “Big Data in Practice - Scientific Information as a Business Asset – Driving Productivity at Merck Research Labs Through Novel Approaches to Scientific Information Management”. This presentation was about managing scientific information, i.e., study results, analyses and historical records. These are often lost due to poor information management practices and failure to steward information in a way that can be leveraged for future purposes.

Next, I switched to the emerging trends track to listen to Prof. Dr. Ion Stoica (UC Berkeley and CEO at Databricks and CTO at Conviva, see picture below on the right). The talk was titled “Taming Big Data with Berkeley Data Analytics Stack (BDAS)”. Today’s data analytics tools are slow in answering even simple queries, as they typically need to sift through huge amounts of data stored on disk, and are even less suitable for complex computations, such as machine learning algorithms. To address this challenge, they are developing BDAS, an open source data analytics stack that provides interactive response times for complex computations on massive data.

After a nice lunch, I attended the session by Paul Yacci (Data Scientist at Booz Allen Hamilton, see picture on the right) titled “Machine Learning on Streaming Data with Storm and MOA”. Analysis of streaming data enables real-time actions in applications such as network defense and fraud detection. Storm provides an open source distributed stream processing framework designed to scale to the demands of big data. Storm can be extended to include machine learning by integrating the open source library MOA (Massive Online Analysis), which provides machine learning algorithms capable of online learning.
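For readers new to online learning, here is a minimal sketch of the idea in plain Python. It is not the Storm or MOA API (both are Java-based), and it is not taken from the talk: the class name, the learning rate and the toy stream are all my own assumptions. The point is only that the model is updated one record at a time, without ever storing the stream.

```python
import math


class OnlineLogisticRegression:
    """Hypothetical illustration: logistic regression trained one example at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate  # assumed value, not tuned

    def predict_proba(self, x):
        # Probability that the record belongs to class 1.
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # Single stochastic gradient step on the log loss for one record.
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


# Toy stream (made up): positive feature values are labelled 1, negative ones 0.
model = OnlineLogisticRegression(n_features=1)
for _ in range(200):
    model.learn_one([1.0], 1)
    model.learn_one([-1.0], 0)
```

In a real deployment, the `learn_one` call would sit inside the stream processing step, so that each incoming event both gets scored and updates the model.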

Next, it was time for our poster session. Michiel Van Herwegen (UGent and Virdata) and Jeremy De Clercq (Virdata, a Technicolor project) compiled a nice poster titled “Scalable, Real-time Big Data Analytics for Connected Cars”, which yours truly (@dirkvandenpoel, see picture below) was happy to present and explain at the event. We are proud to have been awarded 3rd place in the poster competition.

Due to the overwhelming interest in our poster, I missed out on the other posters during our session (and even part of the subsequent presentations). I just had the opportunity to take one picture.

Next, I attended (part of) the presentation by Pinar Donmez (Chief Data Scientist at Kabbage, Inc., see picture below) titled “Smart Machine Learning to Lend to Small Businesses”. Consumer/business loans and other areas of credit scoring have been disrupted by interdisciplinary technologies combining risk management with machine learning and advanced data analytics.

Next, Birds-of-a-Feather Discussion Groups took place. Below, you see a list of topics (and their respective popularity). Given our own research interest, I attended the streaming analytics interest group.

That evening, the conference walking dinner took place at the Marriott San Jose Grand Ballroom. This was another great networking event.

On June 24th, the event started off with a keynote talk by Michael Svilar (@msvilar, Managing Director at Accenture, see picture below) titled “Big Data in Action: Applying Analytics to the Internet of Everything”. There are upwards of a trillion connected and instrumented things: cars, appliances, cameras, roadways, pipelines...even pharmaceuticals and livestock. The talk focused on how organizations can drive positive business outcomes in the connected world using big data analytics.

Next, I attended the talk by Alan Papir (Software Engineer at Analytics Media Group AMG, see picture to the right) titled “Optimizing Media Purchasing Through Big Data”. Using various modeling and data mining techniques in conjunction with large and rich datasets (such as billions of set top box records), AMG discovers who is most likely to “convert” to a product or candidate at the person-level. AMG then takes these desirable targets and uses a trove of set top box data to produce a near-optimal solution to the problem of purchasing the most valuable placements given a limited budget (a multi-objective variation of the knapsack problem).
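To make the knapsack analogy concrete, here is a small sketch of the classic single-objective 0/1 knapsack solved by dynamic programming. This is a textbook simplification, not AMG's method (the talk described a multi-objective variation and a near-optimal solution). The placement names, costs and values below are invented for illustration.

```python
def best_placements(placements, budget):
    """Classic 0/1 knapsack by dynamic programming.

    placements: list of (name, cost, value) tuples with integer costs.
    budget: integer spending limit.
    Returns (total_value, chosen_names).
    """
    # best[b] holds the best (value, names) achievable with budget b.
    best = [(0, [])] * (budget + 1)
    for name, cost, value in placements:
        # Walk budgets downwards so each placement is bought at most once.
        for b in range(budget, cost - 1, -1):
            candidate = best[b - cost][0] + value
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - cost][1] + [name])
    return best[budget]


# Hypothetical ad placements: (name, cost, expected value).
placements = [
    ("morning_news", 4, 40),
    ("prime_time", 6, 70),
    ("late_night", 3, 30),
]
total_value, chosen = best_placements(placements, budget=9)
```

The exact approach only scales to modest budgets and catalog sizes, which is presumably one reason why a near-optimal solution is attractive at the scale of billions of set top box records.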

After the coffee break, Anton J. Mobley (Kaiser Permanente, see picture below) delivered the talk titled “Python, R, and SQL in MPP Databases”. This talk discussed MPP (Massively Parallel Processing) databases and showed how to combine them with analytic tools to start developing models.

After lunch, Link C. Jaw (Fellow at Intel Corporation, see picture below) delivered a talk titled “Big Data Analytics Application to Jet Engine Diagnostics”. As an #avgeek and IoT researcher, yours truly had to attend this talk!

Next, it was time for another poster session.

Below, you see the winning poster as well as the winning team: Vinh Q. Nguyen, Nimra Siddiqi and Nishant Rohatgi (Toyota Financial Services, see picture below), with “Using Big Data and Text Analytics to Manage Attrition”. Congratulations!

As a final session, I attended the talk by Kevin Foster (Big Data Solutions Architect at IBM, see picture below) titled “Babies, Brains, and Buses... and Why Stream Computing is the Right Approach to Real-time Predictions and Decision Making”. Babies, brains, and buses. What do they have in common? Each of these has been the subject of stream computing projects that monitor data in real time, with decisions made quicker and in higher volume than is possible with any disk-based or even memory-based store-then-query architectural approach. Real-time decision making can often require sub-second results, but can also just be within seconds or even minutes, depending on the needs of the project.