Era of Data Explosion
Whether we like it or not, most of our daily activities, rituals and interactions are migrating to cyberspace, where every single detail can be measured, stored and retrieved. In August 2010, Google CEO Eric Schmidt noted that every two days we create as much information as we did between the dawn of civilization and 2003. That is something like five exabytes of data! Not surprisingly, by 2013 that number had doubled. The amount of data created every day is almost beyond our grasp. The unstructured and bulky nature of this big data has introduced new challenges for companies and forces them to adopt emerging frameworks and paradigms (NoSQL, Hadoop, MapReduce, Spark, etc.). While the popular technologies around big data constantly change, one thing has remained the same: exploiting this data is essential for business success.
Why is Social Media so Important?
Among many different sources, social media has been one of the most popular and influential players in the cyber arena. For example, Facebook alone takes up 15.8% of the total time spent on the internet. Let’s take a look at some basic Facebook stats as of December 2014:
- 1.35 billion monthly active users worldwide
- 152 million daily active users in US and Canada
- On average 7 billion pieces of content are shared by Facebook users daily
- 2 billion Facebook users shared a post during Thanksgiving 2014
- 350 million photos are uploaded to Facebook daily
- Each user on average spends 21 minutes on Facebook daily
- 57% of millennials use Facebook to organize an event at least once a week
- On average 1500 posts are eligible to appear on a Facebook user news feed each day
- Every minute 100,000 new friend requests are made
- Facebook takes in 600 terabytes of data daily
- 30% of Americans get their news from Facebook
- Wall Street estimated the value of each Facebook user at $128
Facebook accounts for only 36% of North America’s total social sharing; Twitter, YouTube, Instagram and other social media platforms have their own audiences. For instance:
- Twitter users tweet nearly 300,000 times per minute
- Instagram users post nearly 220,000 new photos per minute
- YouTube users upload 72 hours of new video content per minute
By looking at the above numbers, anyone can see the tremendous influence that social networking has gained over our lives.
Nevertheless, social media goes beyond social networking: blogs, forums, wikis, news aggregation (e.g. Digg and Reddit) and social bookmarking (e.g. StumbleUpon) are also examples of social media. Since people often reveal their honest thoughts and preferences on social media, there are tremendous opportunities for businesses to ramp up their sales, cut down marketing costs and interact with their consumers directly on these platforms. The amount of insight and valuable information a business can extract from social media data is enormous.
What is Social Media Analytics?
Social media analytics, also known as social media listening or online listening, is the process of measuring, analyzing, and interpreting interactions and associations between people, topics and ideas in cyberspace. In essence, it usually starts with gathering data from social media websites, followed by mining that data to improve business decisions and customer experience. Examples of social media analytics applications include:
Brand Monitoring: monitoring the company’s social image as well as its brand and products (e.g. gathering early feedback on a new product, monitoring product quality and identifying potential issues, and tracking brand and product sentiment as a key performance indicator).
Trend Recognition: early identification of new trends (e.g. shifts in consumer demand across different products, emerging hot topics of interest, and growing consumer segments for business expansion). This is much more cost-efficient and up-to-date than traditional market research approaches such as surveys and panels.
- Identifying the right consumer segments through online discussions and, consequently, the best websites to advertise on.
- Identifying centrality points, i.e. the most influential people in a particular segment, in order to target marketing campaigns and give those people an incentive to endorse the products.
Enhancing End-User Experience: helping consumers with their online experience (e.g. showing trends, product recommendation engines, and reputation monitoring).
Other Applications: e.g. assessing the general mood of a population on a daily basis and taking advantage of good days; finding the most efficient way to increase the company’s presence in an online network by studying information propagation patterns in that network; and extracting and tracking the flow of relevant information.
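To make the idea of tracking brand sentiment as a key performance indicator a bit more concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the tiny word lists, the sample posts and the function names `post_score` and `daily_sentiment` are all made up for this example, and real sentiment analysis would rely on a curated lexicon or a trained model rather than simple word counting.

```python
from collections import defaultdict

# Hypothetical mini-lexicons, invented for this sketch only.
POSITIVE = {"love", "great", "good", "fast", "awesome"}
NEGATIVE = {"hate", "bad", "slow", "broken", "terrible"}


def post_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def daily_sentiment(posts):
    """Average post score per day; posts is an iterable of (day, text) pairs."""
    totals = defaultdict(lambda: [0, 0])  # day -> [sum of scores, number of posts]
    for day, text in posts:
        totals[day][0] += post_score(text)
        totals[day][1] += 1
    return {day: score_sum / count for day, (score_sum, count) in totals.items()}


# Made-up posts about a made-up product.
sample_posts = [
    ("2014-12-01", "Love the new phone, great camera!"),
    ("2014-12-01", "Battery is bad."),
    ("2014-12-02", "The update is slow and the app is broken."),
]

kpi = daily_sentiment(sample_posts)
for day in sorted(kpi):
    print(day, kpi[day])
```

A daily average like this can be plotted over time, so that a sudden drop after a product release stands out early.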
Social Media Analytics Maturity Levels for Enterprises
The Collaborative Consulting white paper “Developing Big Data Capabilities to Govern and Influence Sentiment” introduces a capability maturity model (CMM) for social media analytics. It lays out a roadmap for how organizations can exploit the full potential of social media sentiment analysis as they advance through five maturity levels:
- Level 1 (Motivated): The enterprise becomes aware of the importance of social media and senior management is determined to take actions to leverage its potential.
- Level 2 (Organized): This level includes strategic and tactical planning: defining a joint business/technology/skills blueprint, a road map and acquiring the right skillset(s).
- Level 3 (Aware): This phase includes establishing a social media lab environment and initial data collection and model experimentation.
- Level 4 (Informed): In this phase the social media lab evolves and sentiment analysis is integrated into existing processes and everyday operations.
- Level 5 (Assertive): At this level a culture of sentiment analysis and social data science has been established and there is a closed loop between product/campaign strategy and sentiments.
The paper also provides a visual guide for each of the above levels, including the connections between the different Hadoop ecosystem components and the data lab.
That’s interesting, but I don’t have access to a big data infrastructure. Is there a quick way to get my hands dirty with social media data on my local machine?
While adopting a systematic maturity model like the one mentioned above is highly advisable for enterprises, data enthusiasts can access and play with smaller amounts of social media data even without a big data streaming agent (e.g. Flume) or a distributed computing environment. Most social media platforms have a free API, and there are open source wrappers available for popular programming platforms like R and Python that simplify working with those APIs even further. For example, the R package “twitteR” provides an easy interface to the Twitter web API and allows users to fetch data on followers, the accounts they follow, tweets, hashtags, etc.
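Once some tweets have been fetched, a common first step is to turn them into something a network package can consume. The Python sketch below assumes, purely for illustration, that each tweet has already been reduced to a dictionary with a `user` and a `text` field; this is not the actual format returned by the Twitter API or by any wrapper, and the sample tweets and function names are invented.

```python
import re
from collections import Counter

# Simplistic patterns: "@name" is treated as a mention, "#tag" as a hashtag.
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")


def mention_edges(tweets):
    """Return directed (author, mentioned_user) pairs, lowercased."""
    edges = []
    for tweet in tweets:
        for name in MENTION.findall(tweet["text"]):
            edges.append((tweet["user"].lower(), name.lower()))
    return edges


def hashtag_counts(tweets):
    """Count how often each hashtag appears, ignoring case."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(tweet["text"]))
    return counts


# Made-up tweets in the assumed {"user", "text"} shape.
sample_tweets = [
    {"user": "alice", "text": "Loving the keynote with @bob #BigData"},
    {"user": "bob", "text": "@alice thanks! cc @carol #bigdata #analytics"},
    {"user": "carol", "text": "No mentions here, just #analytics"},
]

edges = mention_edges(sample_tweets)
tags = hashtag_counts(sample_tweets)
print(edges)
print(tags.most_common())
```

The edge list describes who talks to whom and can be handed to a graph library, while the hashtag counts give a rough first look at trending topics.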
After extracting the data, it’s time to digest, process, and visualize it. To understand network dynamics, it helps to be familiar with basic graph and network theory concepts. There are free packages available that can help you extract basic insights from network data even if you don’t have a deep understanding of the underlying principles:
igraph Package in R: provides routines for simple graphs and network analysis. The igraph package handles large graphs very well and includes functions for generating random and regular graphs, graph visualization, centrality indices, etc. It has a built-in function that implements the PageRank algorithm, one of the mechanisms that Google uses to rank web pages. Applying this function can help identify the most influential people or elements in a network, and the results can be used to market your products more effectively.
Network Package: provides tools to create and modify network objects. It accepts a range of relational data types and supports arbitrary vertex/edge/graph attributes.
SNA Package: provides a range of tools for social network analysis, including node and graph-level indices, structural distance and covariance methods, structural equivalence detection, network regression, random graph generation, exploratory edge set comparison, and 2D/3D network visualization.
Statnet Packages: statnet is a suite of software packages for statistical network analysis. The packages implement recent advances in network modeling based on exponential-family random graph models (ERGM), as well as latent space models and more traditional network methods. The components of the suite provide a comprehensive framework for ERGM-based network modeling: tools for model estimation, model evaluation, model-based network simulation, and network visualization. This broad functionality is powered by a central Markov chain Monte Carlo (MCMC) algorithm, and the code is optimized for speed and robustness. Both the Network and SNA packages are part of the statnet suite.
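To give a feel for what a PageRank routine like the one mentioned for igraph computes, here is a small sketch of the algorithm in plain Python using power iteration. This is not igraph's implementation; the function name, the default damping factor of 0.85, the fixed iteration count and the tiny "who follows whom" graph are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Every other node splits its rank equally among the nodes it points to.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Made-up follower graph: (a, b) means "a follows b".
follows = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "alice"),
    ("dave", "bob"),
]

scores = pagerank(follows)
top = max(scores, key=scores.get)
print(top)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

In this toy graph the account followed by everyone else ends up with the highest score, and an account followed only by that influential account ranks well above one with no followers at all. That is the intuition behind using PageRank to find people worth targeting in a campaign.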
Social media is a prime example of “things we own, end up owning us”. Created as a tool to facilitate our social interactions, social media has gained tremendous power over our daily lives, and most of us are constantly influenced and affected by it. However, this emerging phenomenon has introduced many exciting opportunities as well. Businesses can harness the power of social media to increase their sales, cut marketing costs, monitor market trends, and enhance their brand image. There are many free APIs and code wrappers available that can help interested individuals gather and analyze social media data. So, what are you waiting for? Go ahead, get your hands dirty and have fun with it.
The era of one-size-fits-all data management has almost ended, and Hadoop is one of the many options for companies to choose from.
- Lo, Bobby, “Social media analytics in business intelligence applications”, Thesis (M. Eng.)–Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008