Friday, May 1, 2020

Innovation Pathways Large Data Analysis

Question: Discuss the Innovation Pathways Large Data Analysis.

Answer

Introduction

Ever since social media entered the mainstream, its use has exploded for both social and commercial purposes. As businesses seek a greater market share, many have turned to online advertising through paid ads, and social media has become an even more popular advertising platform. Businesses need to make sense of social media data by analyzing the sentiment of comments, so that they can improve their products or better understand their clientele. However, mining social media data, especially hashtags, is challenging because hashtags vary widely in the quality and frequency of their semantic usage. This creates problems for automated data-mining approaches, such as obtaining metadata about a hashtag's lexical semantics or mining information from the text associated with hashtags (contextual semantics).

Sentiment analysis is the process of identifying and categorizing people's opinions in text using computational approaches, in order to understand the writer's attitude towards a topic, and it is a fast-growing research area (Medhat, Hassan, & Korashy, 2014). Clustering entails grouping objects by similarity, so that objects in the same group are more similar to each other than to objects in other groups. Semantics refers to the aspects of language and logic concerned with meaning (Daim, Chiavetta, Porter, & Saritas, 2016). This thesis proposes to review existing methods for solving the identified problems of semantic analysis by evaluating past research, and to develop a novel approach for clustering social media hashtag semantics. The aim is to give businesses a better method for deriving meaning and metrics from their social media activity, so that they can understand their customers and products more accurately.
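To illustrate contextual semantics and similarity-based grouping, the following minimal Python sketch represents each hashtag by the words that co-occur with it in posts and compares hashtags by cosine similarity. The posts, the hashtags, and the function names `context_vectors` and `cosine` are hypothetical illustrations, not part of the method the thesis proposes; a real system would at least remove stopwords and cope with far noisier text.

```python
import math
import re
from collections import Counter, defaultdict


def context_vectors(posts):
    """Map each hashtag to a Counter of the non-hashtag words in the posts that contain it."""
    vectors = defaultdict(Counter)
    for post in posts:
        tokens = re.findall(r"#?\w+", post.lower())
        words = [t for t in tokens if not t.startswith("#")]
        for tag in (t for t in tokens if t.startswith("#")):
            vectors[tag].update(words)  # the hashtag's context = words around it
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either vector is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical example posts.
posts = [
    "Battery life on this phone is great #tech #gadgets",
    "New phone battery died fast #gadgets",
    "Great pasta tonight #foodie",
    "Pasta sauce recipe is great #foodie #dinner",
]
vectors = context_vectors(posts)
print(cosine(vectors["#tech"], vectors["#gadgets"]))  # contexts overlap heavily
print(cosine(vectors["#tech"], vectors["#foodie"]))   # contexts overlap little
```

Here the similarity of #tech and #gadgets is about 0.85, against about 0.33 for #tech and #foodie, so a clustering step working on these vectors would tend to group the first pair together.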
Problem Statement

Present approaches to sentiment analysis include symbolic techniques, in which every term (feature) is assigned a sentiment score that measures its intensity and its direction, either positive or negative. The score of a document is then computed by aggregating the scores of its terms. Another approach is supervised machine learning, in which a classification function is inferred from labeled training data. Both techniques have drawbacks. The symbolic approach relies entirely on term scores to assign a document class, yet the method is too simplistic, rudimentary, and inaccurate to be used meaningfully. Supervised learning, though an improvement over the symbolic technique, is very expensive, since it requires large quantities of training data whose classes must be defined manually in advance (Li & Liu, 2012). A better method for sentiment analysis is therefore necessary: one that offers greater stability, efficiency, and accuracy, requires little human input, and is as cost-effective as possible.

Background to the Problem

People post comments and opinions on social media, using hashtags related to specific topics. To make sense of these opinions and use them, for example to improve products, businesses mine them using computational approaches, a process termed sentiment analysis (Medhat, Hassan, & Korashy, 2014). The data sets used for sentiment analysis, and the way they are used, are of immense importance, because they shape the meanings on which important decisions are based. The most commonly used methods are supervised machine learning and symbolic techniques. Symbolic techniques include approaches such as human scoring of sentiments on a scale ranging from positive to negative; for example, "poor" can be scored as negative.
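A minimal Python sketch of the symbolic technique follows. Each term carries a score whose sign is its direction and whose magnitude is its intensity, and the document score is the sum of its term scores. The lexicon values and the function names are hypothetical; real sentiment lexicons are far larger.

```python
import re

# Hypothetical toy lexicon: the sign of each score is the sentiment direction,
# the magnitude is its intensity.
LEXICON = {"great": 2.0, "love": 2.0, "good": 1.0,
           "poor": -1.0, "bad": -1.0, "terrible": -2.0}


def document_score(text, lexicon=LEXICON):
    """Aggregate (sum) the scores of all terms; terms missing from the lexicon score 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0.0) for token in tokens)


def document_class(text, lexicon=LEXICON):
    """Map the aggregated score to a document class."""
    score = document_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["Great camera, love it", "The battery is terrible", "Not bad at all"]:
    print(text, "->", document_class(text))
```

The three examples are classed as positive, negative, and negative. The last one shows the weakness described in the problem statement: "Not bad at all" is classed as negative because term-by-term aggregation ignores negation and context.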
Human scoring is the simplest approach to sentiment analysis, but it is highly subjective and inaccurate. To get better results, lexical databases such as WordNet are used for scoring, but they mainly cover English, grouping English words into synsets, and the scoring covers only adjectives. WordNet defines relationships between synonyms, which allows a word to be scored by its distance or similarity to other words. Scoring can also be achieved through web search, which works similarly to WordNet except that the association between words is measured from search results. The supervised machine learning approach entails extracting objective sentences from a document, rating them on either a 3- or 4-point scale, and applying negation processing, with classifiers such as maximum entropy (ME), Naïve Bayes (NB), and support vector machines (SVM) (Li & Liu, 2012). The challenges of both approaches are magnified when hashtags are involved in sentiment analysis (Fernandez, Martínez-Barco, Gutierrez, & Gomez, 2015). With training data, the accuracy of these methods ranged between 73% and 83%, which is still low, so better approaches that deliver greater accuracy efficiently and with little human input are of great importance.

Research Design

This paper proposes a descriptive qualitative research design in which past work will be reviewed and the algorithms previously used in sentiment analysis will be evaluated for their merits and limitations.
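Among the past algorithms to be evaluated is the supervised approach described in the background. The following minimal Python sketch trains a multinomial Naïve Bayes classifier with add-one smoothing and applies one common negation-processing scheme: words that follow a negation word are prefixed with `NOT_` until the next punctuation mark. The training sentences, the negation list, and all names are hypothetical, and this is not the specific pipeline evaluated by Li and Liu (2012); it only shows how labeled training data, the costly ingredient identified in the problem statement, drives the classifier.

```python
import math
import re
from collections import Counter

# Hypothetical negation list; real systems use longer lists.
NEGATIONS = {"not", "no", "never", "don't", "isn't", "wasn't", "can't"}
PUNCTUATION = set(".,!?;")


def tokenize_with_negation(text):
    """Lower-case tokens; words after a negation word get a NOT_ prefix until the next punctuation mark."""
    tokens = re.findall(r"[a-z']+|[.,!?;]", text.lower())
    out, negated = [], False
    for token in tokens:
        if token in PUNCTUATION:
            negated = False  # punctuation ends the negation scope and is dropped
        elif token in NEGATIONS:
            negated = True
            out.append(token)
        else:
            out.append("NOT_" + token if negated else token)
    return out


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize_with_negation(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        tokens = [t for t in tokenize_with_negation(doc) if t in self.vocab]  # skip unseen words

        def log_posterior(c):
            denominator = self.totals[c] + len(self.vocab)
            return self.log_priors[c] + sum(
                math.log((self.counts[c][t] + 1) / denominator) for t in tokens)

        return max(self.classes, key=log_posterior)


# Hypothetical labeled training data.
docs = ["great phone, love it", "good battery and great screen",
        "not good, battery died", "terrible screen, never buying again"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("not good at all"))
```

Because of the `NOT_` marking, "not good at all" is classified as negative even though "good" appears in a positive training sentence; without negation processing the word "good" would pull the prediction towards the positive class.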
Works by past researchers in the field will also be reviewed for their suitability and practicality in solving the stated research problem. Data and metadata for identifying hashtag lexical semantics will be proposed, and sense-level algorithms will be developed and tested, using examples to demonstrate a novel method for clustering the semantics used in social media. After reviewing and evaluating some of the most popular approaches and algorithms in semantic analysis, this thesis will propose sense-level semantic clustering based on hashtag metadata, together with a hybrid clustering method that employs consensus clustering. Using examples, it will demonstrate the accuracy of the hybrid approach in solving the research (and practical) problem of providing a simple, efficient method for sentiment analysis that requires no human mediation. This will add to the existing body of research on sentiment analysis, specifically on hashtag sentiment in social media, with the goal of helping businesses build more accurate profiles of their products based on customer sentiment.

References

Daim, T., Chiavetta, D., Porter, A., & Saritas, O. (2016). Anticipating Future Innovation Pathways Through Large Data Analysis (1st ed., p. 68). Cham: Springer International Publishing.

Fernandez, J., Martínez-Barco, P., Gutierrez, Y., & Gomez, J. (2015). GPLSI: Supervised Sentiment Analysis in Twitter using Skipgrams. In 8th International Workshop on Semantic Evaluation (SemEval 2014) (pp. 296-297). Alicante: University of Alicante, Department of Software and Computing Systems.

Li, G., & Liu, F. (2012). Application of a clustering method on sentiment analysis. Journal of Information Science, 38(2), 127-139. https://dx.doi.org/10.1177/0165551511432670

Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5(4), 1093-1113. https://dx.doi.org/10.1016/j.asej.2014.04.011
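The hybrid approach proposed in the Research Design section employs consensus clustering. The following minimal Python sketch shows one simple consensus scheme, chosen here as an assumption because the thesis does not specify one. Several base partitions of the same hashtags (for instance, one derived from lexical metadata and one from contextual semantics) are combined into a co-association matrix that records how often each pair of hashtags shares a cluster. Pairs grouped together in more than half of the partitions are then merged. The partitions, the 0.5 threshold, and the function names are hypothetical.

```python
from itertools import combinations


def co_association(partitions, items):
    """Fraction of base partitions in which each pair of items shares a cluster label."""
    m = len(partitions)
    return {
        (a, b): sum(p[a] == p[b] for p in partitions) / m
        for a, b in combinations(items, 2)
    }


def consensus_clusters(partitions, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the partitions; return connected components.

    Every partition is a dict mapping each item to a cluster label; labels only
    need to be consistent within one partition, not across partitions.
    """
    items = sorted(partitions[0])
    parent = {x: x for x in items}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), share in co_association(partitions, items).items():
        if share > threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


# Hypothetical base partitions of the same four hashtags.
lexical = {"#tech": 0, "#gadgets": 0, "#foodie": 1, "#dinner": 1}
contextual = {"#tech": "a", "#gadgets": "a", "#foodie": "b", "#dinner": "a"}
another_run = {"#tech": 5, "#gadgets": 5, "#foodie": 7, "#dinner": 7}
print(consensus_clusters([lexical, contextual, another_run]))
```

The contextual partition disagrees about #dinner, but the consensus follows the majority and returns the groups #dinner/#foodie and #gadgets/#tech. This kind of agreement across views is what makes consensus clustering attractive for noisy hashtag data.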
