The “What” and The “How” of Big Data

In Profiles & Interviews by Sunand Menon

Here’s a discussion on big data and analytics and their potential application within the social sector. Sunand Menon walks through the landscape, adding further comment on analysis (the skills we’re using to make sense of data) and analytics (the tools being developed to help that work). This kind of exploration is important given that the word “data” is sometimes used, incorrectly, as a casual synonym for “analysis.” But what “the data say” is not as self-evident or certain as that common refrain suggests. Analysis says a lot more. And then there’s still a decision left to make. Time to hike up a sleeve and take a tour inside the machine.

“What is ‘big data’ and ‘analytics’? How does it apply to my sector? How can it help my organization?”

Those were the opening questions posed by a prospective client, as we sat down to breakfast.

I appreciated the directness and immediately launched into a simple explanation, without the technical jargon. Big data is really just information that is continually generated, collected and stored. It’s called ‘big’ because it represents a volume so immense, and growing so quickly, that we cannot meaningfully quantify it. Think about it: if we gave it a value (e.g., 100 zettabytes, a 1 with 23 zeroes after it), what would we call it once we pass that mark? ‘Bigger data’?
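To put that figure in perspective, the arithmetic is easy to check in a couple of lines of Python (the 100-zettabyte figure is purely illustrative, as above):

```python
# A zettabyte is 10**21 bytes, so 100 zettabytes is 10**23 bytes:
# a 1 followed by 23 zeroes.
ZETTABYTE = 10 ** 21
print(f"{100 * ZETTABYTE:,}")  # prints 100,000,000,000,000,000,000,000
```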

Big data comes from everywhere. Text messages, emails, climate sensors, Facebook ‘likes’ and ‘comments’, iPhone photos and videos, you name it. In the nonprofit world, it could be financial data, donor information, outcome data, or individual viewpoints on a foundation or a charity.

Big data consists of both ‘structured’ and ‘unstructured’ data. ‘Structured data’ is what we traditionally called data: lots of numbers, all categorized into specific groupings or fields, and recorded in spreadsheet-type rows and columns. ‘Unstructured data’ is basically any other data, e.g., text, comments, pictures, videos, individual or group sentiments. One of the reasons our data creation, collection and storage have improved so much recently is that developments in technology now allow us to process unstructured data increasingly efficiently.
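To make the distinction concrete, here is a minimal sketch in Python; the donation records and the comment text are invented for illustration:

```python
# Structured data: every record has the same named fields,
# like rows and columns in a spreadsheet.
donations = [
    {"donor_id": 101, "charity": "CleanWater Org", "amount": 50.00},
    {"donor_id": 102, "charity": "CleanWater Org", "amount": 25.00},
]

# Unstructured data: free-form text (or images, video, audio)
# with no predefined fields to query against.
comment = "Loved volunteering here last summer, the staff were wonderful!"

# Structured data can be aggregated directly...
total = sum(row["amount"] for row in donations)
print(f"Total donations: ${total:.2f}")

# ...while unstructured data needs processing first; here a crude
# keyword check stands in for real sentiment analysis.
positive_words = {"loved", "wonderful", "great"}
is_positive = any(word.strip("!,.") in positive_words
                  for word in comment.lower().split())
print(f"Positive sentiment? {is_positive}")
```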

So what can we do with the big data we collect? We can analyze it and make better decisions. This is where the power of ‘analytics’ comes in: rules that we construct and apply, generally in the form of mathematical/logical algorithms (IF, THEN, AND, ELSE).

For example: IF a registered user ‘likes’ a specific charity on Facebook, AND then ‘comments’ positively on a similar charity on LinkedIn, AND has identified themselves as a previous donor to yet another similar charity, THEN one can assume that donation requests from charitable organizations serving that cause have a higher chance of success. Imagine the efficiency leap in such situations – rules like these can actually be used to predict future behaviours and patterns.
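As a toy illustration, here is what such a rule might look like in Python. The signals and weights are invented for the sketch, not a real scoring model:

```python
def donation_likelihood(user):
    """Toy rule-based score: each observed signal about a cause
    bumps the estimated chance that a donation appeal succeeds."""
    score = 0.1  # assumed baseline likelihood, for illustration only
    if user.get("liked_similar_charity_on_facebook"):
        score += 0.2
    if user.get("commented_positively_on_linkedin"):
        score += 0.2
    if user.get("prior_donor_to_similar_charity"):
        score += 0.3
    return min(score, 1.0)

prospect = {
    "liked_similar_charity_on_facebook": True,
    "commented_positively_on_linkedin": True,
    "prior_donor_to_similar_charity": True,
}
print(f"Estimated chance of success: {donation_likelihood(prospect):.0%}")
```

In practice, hand-written rules like these are usually just the starting point; with enough historical data, the weights would be learned statistically rather than set by hand.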

So that, in a nutshell, is big data and analytics: how they can be applied to virtually any sector, including the nonprofit world, and how they can help an organization, by facilitating better decision-making.

How do I get to the point where I am making these better decisions?

It all starts with defining the need. For instance, let’s suppose that: “The nonprofit world will benefit tremendously by facilitating the collection and storage of all relevant nonprofit data, and by building relevant performance analytics around them. This, in turn, will facilitate better decision-making (e.g. resource allocation), which will lead to better outcomes, more efficiently.” Not an unreasonable hypothesis – it’s been done in other sectors (see a previous blog post).

In my view, the most difficult part involves defining the scope and breadth of “relevant data” and “relevant performance analytics”. Let’s assume that the data and analytical methodologies are defined to a “good enough” level. How do we now build this?

Firstly, the data has to be collected. It may come from different databases, in varying forms such as numbers or words, and even at different time intervals, e.g., one-off data versus real-time, streaming data (think of stock price data). It can all be loaded thanks to developments like ‘Hadoop’, a free software framework that supports distributed processing of large datasets.
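To show what that looks like in practice, here is a minimal sketch of Hadoop’s streaming interface, which lets plain Python scripts act as the map and reduce steps. The comma-separated input format (one “charity,amount” record per line) is an assumption made for the sketch:

```python
#!/usr/bin/env python3
# mapper.py - reads raw input lines from stdin and emits
# tab-separated "charity<TAB>amount" pairs for Hadoop to sort.
import sys

for line in sys.stdin:
    try:
        charity, amount = line.strip().split(",")
        float(amount)  # validate before emitting
        print(f"{charity}\t{amount}")
    except ValueError:
        continue  # skip malformed lines rather than failing the job
```

```python
#!/usr/bin/env python3
# reducer.py - Hadoop delivers the pairs sorted by key, so a
# running sum per charity is enough to total the donations.
import sys

current, total = None, 0.0
for line in sys.stdin:
    charity, amount = line.strip().split("\t")
    if charity != current:
        if current is not None:
            print(f"{current}\t{total:.2f}")
        current, total = charity, 0.0
    total += float(amount)
if current is not None:
    print(f"{current}\t{total:.2f}")
```

The same pair of scripts can be tested locally with an ordinary Unix pipeline (cat donations.csv | python3 mapper.py | sort | python3 reducer.py) before being submitted to a cluster.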

Secondly, the data has to be integrated and stored in analytical appliances. Often, the various forms of data are placed in preconfigured hardware and software systems, ready to be retrieved according to a set of criteria. You may have heard of capabilities such as ‘Netezza’, ‘Greenplum’ or ‘Exalytics’ – these are simply analytical appliances owned by IBM, EMC and Oracle, respectively, that do this work.

Thirdly, the data has to be processed and analyzed to derive insights. In big data analysis, a variety of mathematical techniques are used – e.g., collaborative filtering, clustering, categorization – on programming platforms such as ‘MapReduce’ and ‘Mahout’. They essentially identify patterns and linkages between diverse datasets, no matter how unlikely the potential link between them.
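As one small, concrete example of the clustering technique mentioned above, here is a sketch using scikit-learn’s k-means implementation; the donor features and numbers are invented, and a real analysis would use far more dimensions and records:

```python
# Group donors by giving behaviour with k-means clustering.
import numpy as np
from sklearn.cluster import KMeans

# Each row: [donations per year, average gift size in dollars]
donors = np.array([
    [1, 500], [2, 450], [1, 600],    # a few large gifts
    [12, 20], [10, 25], [15, 15],    # many small gifts
    [4, 100], [5, 90],               # somewhere in between
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(donors)
for center in kmeans.cluster_centers_:
    print(f"segment: ~{center[0]:.0f} gifts/year, ~${center[1]:.0f} each")
```

Each resulting segment might then warrant a different outreach strategy, which is exactly the kind of insight this processing step is meant to surface.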

And therein lies the power of big data and analytics: sometimes correlations might exist that we could never have imagined.

Finally, the data has to be visualized to convey the results of the analysis. For complex datasets, this can be done with statistical packages such as ‘SAS’, ‘SPSS’ and ‘R’, or with intuitive dashboards and interfaces in a variety of styles, e.g., Tableau, Geckoboard, Domo or Inverra.
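As a simple stand-in for those tools, here is a short matplotlib sketch; the charity names and totals are invented, representing output from the analysis step:

```python
# Visualize (hypothetical) analysis results as a bar chart.
import matplotlib.pyplot as plt

charities = ["CleanWater Org", "ReadMore Fund", "City Shelter"]
totals = [48_200, 31_750, 22_400]  # hypothetical annual donations, USD

fig, ax = plt.subplots()
ax.bar(charities, totals)
ax.set_ylabel("Donations received (USD)")
ax.set_title("Hypothetical donation totals by charity")
plt.tight_layout()
plt.savefig("donations.png")  # or plt.show() for interactive use
```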

The good news is that this has been done many times over in several industries, and there are many organizations that can help in the build-out. At the large and complex end of the execution scale are big technology and services multinationals like IBM, Accenture and EMC. At the smaller end, a multitude of boutique consulting and technology services companies exist. The implementation cost can range from six figures to tens of millions of dollars, depending on the size and complexity of the data and analytics involved. In the context of the vision for Markets for Good, it may be worthwhile to work on a proof-of-concept project involving a subset of “relevant” structured and unstructured data and a draft set of “relevant” performance analytics. We may be pleasantly surprised by what we find.


Comments

  1. Wow, Sunand! We’re long lost cousins! The system you described in your article is exactly what my partner and I are developing. Not only that, but I managed the data team at Thomson Reuters that maintained TRBC. I love to see data experts applying principles of knowledge management toward addressing social challenges. Would love to continue this conversation with you!

    1. Britt, great to hear from you, and glad to see that your TRBC experience has spurred your interest in helping transform the social sector. Stay tuned, there are plenty of opportunities to contribute in this space.

  2. This was an incredibly helpful blog post in explaining, in jargon-free prose, just what “Big Data” is and how it could be used. I think an under-appreciated barrier to the use of big data in the social sector is that non-profit staff don’t understand big data and feel it’s something that only extremely large corporations have the resources to employ.

    Another obstacle, though, and one that is hard to overcome, is hinted at when you say, “the most difficult part involves defining scope and breadth of ‘relevant data’ and ‘relevant performance analytics.'” Social problems are extremely complex, and the very task of defining what success looks like in addressing them can be challenging. If we struggle to know what exactly we’re looking to achieve, even the most sophisticated data analysis is not going to help us get there. What’s more, even when we have a shared definition of success, it’s not always clear what questions we should ask of data in order to achieve it.

  3. Hello, Nathan. Your points are well taken. There is a recurring theme in the social sector re unquantifiable success and impact.

    I think it’s time to add to this problem identification some thought on the emerging quantification efforts going on, while continuing to find more ways. For example: while third-grade reading scores may not be linked to prison funding (http://bit.ly/12vz53J), they do indicate a probability of not graduating from high school (http://huff.to/11IOqq9). Each student who can read at grade level by third grade represents a clear economic value. To bring the point home, we could connect the two concepts. See this study that notes a correlation between high school dropouts and prison populations: http://nyti.ms/15G7Kc5. No need for a new study – only linking the data we already have. Now we get quantifiable social outputs: each student who can read proficiently represents an arbitrage opportunity versus the cost of prison. In 2011, California spent $9.6 billion on prisons, versus $5.7 billion on higher education. The state spends $8,667 per student per year. It spends about $50,000 per inmate per year.

    That $8,667, by definition, must generate more follow-on economic activity than the $50,000.

    Much more alchemy goes into the valuation of internet-based companies offering apps with admittedly narrow use. They continue to get funded.

    Thus the job of the social sector must be not only to codify successes and impact scenarios, but, more importantly, to create market pressure that will lead to accepted valuations that can motivate funding of social sector work.

    A beginning step would be to stop using definitions of impact that are too wide. One organization, or one issue area solved, cannot cure a complex problem. If a school were to identify the organizations dealing with health and food as those providing its best input – healthy, well-fed students – then we could examine the work that should happen at specific intersections.

    Perhaps it may make more sense for a school to buy more food than books. Or, maybe the school should collaborate with organizations combatting sleep deprivation – a significant contributor to obesity.

    I don’t pretend that any of these are new ideas. They’re not. What we want to do is find examples of people executing a way out of this.

  4. Sunand’s piece gets us expertly into what we in the UK call the “brass tacks” of big data. The way forward is absolutely to take slices of data to show what we can do in analytic terms before scaling out. Consumer research in other fields has long since developed extensive segmentation of customers. We can do something similar with service users, to understand them in their variety and indeed explore what works better for specific types of person or need. This calls for powerful forms of cluster and discriminant analysis. We can also rank programmes by the total added social value per £/$ expended – stochastic frontier analysis. There are great possibilities; but first off we need some decent data sets, and that is no trivial matter.

    Matthew Pike, ResultsMark
