21 December 2018
I recently ran a webinar for The Information Lab's Let's Talk Data meetup group. I will put together a full-text rundown of the process later, but I wanted a quick highlight of the topic first.

In the webinar (and the recorded video) I covered three different methods for implementing a sentiment analysis project in Alteryx. I used the IMDb reviews dataset compiled by Stanford.

The first, quickest and easiest way to implement sentiment analysis is to use some form of API service. In the webinar, I used the Microsoft Cognitive Services API. There are other services from Google and Amazon, and from start-ups like TheySay. (As a note, I used the Microsoft API because there is a prebuilt macro on the Gallery; you can find the help information here.)

The next two methods, unlike the API method, required some further data preparation to isolate individual and important words.

The second method was to use a dictionary lookup with the subjectivity lexicon from the University of Pittsburgh. Using a Find and Replace on the individual words, I identified the words likely to be more positive or negative. Once the polarising words were found, aggregating back to the original review level allowed the positive and negative reviews to be identified.

The final method was to create a simple logistic regression model using the built-in Alteryx R tools. We focused only on the 20 most common words (for the sake of time), but you could use all the words to develop the model if you wanted. Taking the reviews that were split to one row per word, I transformed the records into a 'one-hot encoded' dataset where each word represents a column and each row a review. The value is 1 for reviews containing the word and 0 for reviews without it. This dataset is then run through the logistic regression model to check which words are most likely to result in a positive or negative review.

The webinar was posted on The Information Lab YouTube page, or you can check it out below.

https://youtu.be/EM_qAshXWPI
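
For readers who want to see the data preparation outside Alteryx, the dictionary lookup and one-hot encoding steps described above can be roughly sketched in Python. This is not the workflow from the webinar: the tiny lexicon, the sample reviews, and the function names are all made up for illustration, and the real subjectivity lexicon is far larger and carries more detail than a simple +1/-1 score.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only (word -> polarity).
LEXICON = {"great": 1, "wonderful": 1, "boring": -1, "awful": -1}


def tokenize(text):
    """Lower-case a review and split it into individual words."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text, lexicon=LEXICON):
    """Look up each word's polarity and sum back to the review level."""
    return sum(lexicon.get(word, 0) for word in tokenize(text))


def label(score):
    """Turn an aggregated score into a review-level label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def one_hot(reviews, top_n=20):
    """Build one column per common word and one row per review.

    "Common" here means appearing in the most reviews, which is an
    assumption; counting total occurrences would also be reasonable.
    """
    # De-duplicate words within each review while keeping their order.
    tokenized = [list(dict.fromkeys(tokenize(r))) for r in reviews]
    counts = Counter(word for words in tokenized for word in words)
    vocab = [word for word, _ in counts.most_common(top_n)]
    rows = [[1 if word in words else 0 for word in vocab] for words in tokenized]
    return vocab, rows


# Made-up sample reviews.
reviews = [
    "A great film with a wonderful cast",
    "Boring plot and awful acting",
    "Great start, boring ending",
]

for review in reviews:
    print(review, "->", label(lexicon_score(review)))

vocab, rows = one_hot(reviews, top_n=5)
print(vocab)
for row in rows:
    print(row)
```

The rows returned by `one_hot` are the kind of 0/1 table that would then be fed to a logistic regression, with the known review sentiment as the target.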