In this article, we’ll be exploring the results of text analysis on YouTube comments for the mountain biking reality TV show “Pinkbike Academy.” If you’re a fan of the series, or just interested in analyzing social media comments, read on to learn more about the insights gained and challenges faced from this project!
Background
I started mountain biking in the summer of 2021 and quickly became addicted to the adrenaline and chance to focus and get into the “flow” it provided. I find the sport to be very engaging and a healthy escape, and I enjoy being a part of the fun community as well. Mountain biking is big on social media, especially YouTube, and after a couple of years of watching videos on the platform, I stumbled upon the mountain biking reality TV show “Pinkbike Academy” (Pinkbike Academy Season 3). Pinkbike Academy focuses each season on 10 aspiring professional mountain bikers in Big White, British Columbia, as they compete in a series of challenges for a $30K prize and a pro contract from Orbea Bikes. The challenges are an interesting mix of enduro races, fitness challenges, mechanic tests, and even a social media production competition. I loved the series! I saw a fun opportunity to try something I had been wanting to do, which was scraping YouTube comments. You can find my article on how to do that here.
In the article below, I’ll walk you through some of the results I found interesting, along with some of the challenges of doing sentiment analysis on YouTube comments.
The Results: Top Liked Comments
To start, let’s just dive right into some of the data. Below is a table of the Top 10 Comments sorted by # of user likes across all 3 seasons. I’ll encourage you to read through the comments yourself first, then I’ll add some thoughts of mine and context for those who haven’t seen the series below.
Rank | Comment | # of Likes | Episode |
1 | The answer “quit my job to be here” No one else was going to win after that. | 1,040 | S1 Episode 10 |
2 | Based on the instagram pages Brad is now racing for Yeti and Emmet for Norco. So it seems like we have 3 winners 🙂 | 1,023 | S2 Episode 10 |
3 | Addison wouldve been perfect with this challenge. Hes got expertise on being a bike shop mechanic | 910 | S1 Episode 7 |
4 | I hate reality TV……when does the next episode drop? | 848 | S1 Episode 1 |
5 | Cam hurts himself badly, flats for a loss, and acts like nothing happened!! What an absolute example of an inspirational athlete. Thanks Cam! | 790 | S2 Episode 6 |
6 | How many people want a season two like if u do | 734 | S1 Episode 10 |
7 | Gutted for Tarmo, what was the point of having them race when it was basically about who already has the biggest social media following. Poorest season so far. | 733 | S3 Episode 10 |
8 | Am I the only one that finds Addison the most likeable? | 706 | S1 Episode 4 |
9 | How you gonna eliminate the guy who rode incredibly despite an injury, but keep in the girl who didn’t even look like she was racing? Edit: Brody himself has given us an update in the replies to this comment. I didn’t have the full picture while writing the original comment. | 693 | S3 Episode 3 |
10 | It seemed like Cody was just staying with Flo and encouraging her so he’d have an excuse to not try hard. NOBODY else in that race could talk on the climbs but he was constantly talking to Flo. Also between the pillow thing and excuses at the end… just eliminate that man already | 649 | S2 Episode 4 |
- General Insights
- Anyone who has spent even a few days in YouTube comment sections knows they can get a little hairy. Overall, though, I was impressed that most of the top comments were positive and supportive of the Pinkbike Academy series. The mountain biking community seems excited about content like PB Academy, and the top comments show more excitement than criticism of how the show is executed.
- There is a slight trend of the comments becoming more critical from Season 1 to Season 3. I wonder if it’s because the audience became more used to the show (and more comfortable criticizing it), or because there was actually more to criticize in later seasons.
- It doesn’t show up much in the top comments above (except for a bit in comment #9), but I noticed that the riders who competed in the series are fairly active in the comments section. It’s cool to see the riders interact with the community and even push back on some of the negativity (e.g. Brody clarifying his situation with the author of comment #9). Examples like these make me love the mountain bike community even more, and I hope it stays as supportive and positive as it has been in my experience.
- Specific Comment Notes
- Comment #1: This short comment won the most likes from PB Academy fans. It references the moment when the eventual champion, Evan, told the judges he had quit his job to compete at PB Academy. I haven’t actually seen Season 1 yet, but this comment alone makes me want to go back and watch it to learn more of his story.
- Comment #7: The author expresses frustration with the judges for crowning Max the Season 3 champion over Tarmo. I noticed this frustration throughout the comment section on the Season 3 finale. I agree that it was a bit frustrating to see Tarmo (the strongest competitor) lose to Max. But, after some reflection, I can see that Orbea (the bike sponsor) needs to make sure its sponsor dollars are put to good use. Without Orbea, it’s very likely that we wouldn’t be able to enjoy this series at all, as I’m sure it takes plenty of funding to make it possible. Max certainly has a better social media presence than Tarmo, which will help Orbea more, and Max is more than impressive enough as a competitor. Overall, I think they made the right decision for the show and for Orbea. But if Pinkbike Academy is better funded in the future, it may be less reliant on sponsor needs and able to focus more on pure competition.
- Comment #9: This is one of the more critical comments to surface. However, I was impressed that the author took the time to listen to Brody’s response (I’m guessing his injuries at this point in the series forced him to retire) and updated their stance. It shows a bit of character in the otherwise unruly land of YouTube comments.
The Results: Top Keywords Over Time
Next, let’s take a look at some of the top-mentioned names and keywords across the seasons. I’ve highlighted a few words that stood out to me, but if you’d like to interact with the data yourself you can access the Tableau Public Dashboard here.
Below, you’ll find a line chart for each of the three seasons showing the “% of comments that mention a keyword”; in other words, what percentage of the comments mentioned a particular keyword. One way to read these charts is to focus on the end of each line, where the first name of the season’s winner is typically the most-mentioned term (except in Season 3!).
Season 1 Keyword Counts Over Time
Season 2 Keyword Counts Over Time (Winner: Flo)
Season 3 Keyword Counts Over Time
- Keyword Counts Insights
- In both Season 1 and Season 2, the winners (Evan and Flo) were the most-mentioned keywords in the final episode. However, in Season 3, both Tarmo (the 2nd place finisher) and the show sponsor (Orbea) were mentioned more than the actual winner, Max.
- In Season 1, mentions of Addison peaked mid-season, and the same happened for Cody in Season 2. After reading through some of the comments to check for positive vs. negative sentiment, there seems to be a lot of love from the fans for these two!
- It’s interesting that Orbea is mentioned so much in the final episode of Season 3, while there are few sponsor mentions otherwise.
Overall, it was interesting to track and see who and what was getting mentioned in the YouTube comments section as the seasons progressed. I encourage you to check out the data for yourself! (Tableau Public Dashboard here.)
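For anyone curious how a metric like “% of comments that mention a keyword” could be computed, here is a minimal sketch in plain Python. It is not the code behind the charts above: the function name, the `(episode, text)` input shape, and the sample comments are all assumptions made up for illustration. It also assumes a comment counts once per keyword, however many times the keyword appears in it.

```python
import re
from collections import defaultdict


def keyword_mention_pct(comments, keywords):
    """Return {episode: {keyword: % of that episode's comments mentioning it}}.

    `comments` is an iterable of (episode, text) pairs (a hypothetical shape).
    Matching is case-insensitive and on whole words only.
    """
    patterns = {
        kw: re.compile(rf"\b{re.escape(kw)}\b", re.IGNORECASE) for kw in keywords
    }
    totals = defaultdict(int)                      # comments per episode
    hits = defaultdict(lambda: defaultdict(int))   # mentions per episode/keyword

    for episode, text in comments:
        totals[episode] += 1
        for kw, pattern in patterns.items():
            if pattern.search(text):
                hits[episode][kw] += 1

    return {
        ep: {kw: 100.0 * hits[ep][kw] / totals[ep] for kw in keywords}
        for ep in totals
    }


# Made-up sample comments, for illustration only.
sample = [
    ("S3E10", "Gutted for Tarmo"),
    ("S3E10", "Well done Tarmo!"),
    ("S3E10", "Congrats Max, and thanks Orbea"),
    ("S3E10", "Great season"),
]
result = keyword_mention_pct(sample, ["Tarmo", "Max", "Orbea"])
print(result["S3E10"])
```

One caveat with this approach: case-insensitive matching on a name like “Max” would also count a phrase such as “to the max”, so short or common names may need manual checking.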
Challenges
Now, the challenges. Admittedly, I could have dug further into some of these issues, but I’m choosing to move on for the moment to focus on other opportunities. Here are the main challenges I faced:
- FAIL: Using sentiment analysis to gauge how the crowd felt: I started by classifying the positive vs. negative sentiment of some of the YouTube comments. Initially, I focused on the Season 3 Episode 10 comments, which I knew were more critical than those on other episodes. I tried a few different sentiment analysis packages, but they all seemed to struggle with the commenters’ sarcastic tone and with the lack of context (for instance, many people expressed their frustration by complimenting Tarmo, the 2nd place finisher). Some examples below:
- Joke of a decision. Congratulations to Tarmo, he was the best rider. Hopefully he will get picked up by a good team for next season
- Lots of positive words are mixed in here (“Congratulations,” “best,” “hopefully,” “good”) that likely confused the sentiment package. But a human reader can easily tell this is a criticism of the show.
- It’s great that Max gets to present Orbea to customers at the pit, in races where Tarmo wins!
- Without the context, the sarcasm goes undetected by the NLP packages here. (For context, the joke is that the 1st place finisher Max will always be trailing the 2nd place finisher Tarmo in any mountain bike competition.)
- Funnily enough, they didn’t even bother to show the final results of the race. Well done Tarmo!
- Again, a criticism of the show here. The lack of context makes it tough for any sentiment package to detect its criticism though.
- CHALLENGE: Lots of filler words (bike, great, show) in the results: I used the general stopwords provided by NLTK to remove the most common filler words. However, many keywords that weren’t removed were still pretty distracting in the results (bike, good, rider, mountain, etc.).
- A few options for the future: I could look into different stopword lists, such as those from gensim, spaCy, or sklearn. Another thing I could have tried is TF-IDF (term frequency-inverse document frequency), which would surface the words most distinctive to each season (e.g. names and certain keywords unique to a season would score highly, while general words like “bike” or “mountain” would be down-weighted).
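To illustrate why word-level sentiment scoring can misread a comment like the first example above, here is a toy lexicon-based scorer. It is not one of the packages I tried, and real packages are more sophisticated than this. The word lists are tiny and hand-made purely for illustration.

```python
import re

# Tiny hand-made lexicon, for illustration only (not from any real package).
POSITIVE = {"congratulations", "best", "hopefully", "good", "great", "well"}
NEGATIVE = {"joke", "worst", "bad", "poorest"}


def lexicon_score(text):
    """Count positive words minus negative words; > 0 reads as 'positive'."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg


comment = (
    "Joke of a decision. Congratulations to Tarmo, he was the best rider. "
    "Hopefully he will get picked up by a good team for next season"
)
score = lexicon_score(comment)
print(score)  # positive, even though the comment criticizes the show
```

The four positive words outweigh the single negative one, so a scorer of this kind labels the comment positive. Nothing in the individual words says the compliments to Tarmo are a way of criticizing the judges.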
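The TF-IDF idea mentioned above can be sketched in plain Python by treating all of a season’s comments as one document. This is a minimal version using the textbook formula (term frequency times log of total documents over documents containing the term). Library implementations often use smoothed variants, so their exact numbers would differ. The season texts below are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """docs: {name: text}. Returns {name: {term: tf-idf score}}."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)

    # Number of documents each term appears in.
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())

    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in c.items()
        }
    return scores


# Made-up stand-ins for each season's combined comments.
seasons = {
    "Season 1": "evan bike great addison bike mountain evan",
    "Season 2": "flo bike cody great mountain flo",
    "Season 3": "tarmo bike orbea max tarmo mountain great",
}
scores = tf_idf(seasons)
top_terms = {name: max(s, key=s.get) for name, s in scores.items()}
print(top_terms)
```

With this unsmoothed formula, words that appear in every season (like “bike”) get a score of exactly zero, while names that appear in only one season rise to the top.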
Conclusion
This was a fun mini-project to get my hands into some YouTube comments data and some basic sentiment analysis. This project has me thinking of different questions I could answer using YouTube comments data, but I’d love to hear your ideas as well. Add your project ideas, thoughts, or questions in the comments below!
Thank you for reading! If you have any feedback or thoughts, would love to continue the conversation — add a comment below. Or, you can reach me directly at @JacksonBurton11 on Twitter or email me at [email protected].
If you’d like to stay up to date on any future Off Road Analyst posts, sign up below!
cool project. can we apply this to mtb gear videos to predict bike trends? how wheel size, frame design, suspension etc will evolve in the future.
OR, using past world cup results, along with news articles to predict world cup standings over the season. hard data like competition standings and news articles would give you less trouble with filler words and sarcasm
Thanks! Some interesting ideas. For your first idea, the first thing I thought of was 99spokes.com/trends, where 99spokes tracks trends on 69,065 bikes in their database. The benefit of the 99spokes view is that the data is likely very high quality, whereas any YouTube comment scraping could be quite messy and potentially unreliable.
But one criticism of 99spokes’ process is that they are only measuring the “supply-side trends”; in other words, what are the manufacturers doing? With YouTube comments, we could measure the voice of the consumer and more accurately define what the consumer values in the industry, as opposed to just what’s coming out of the factories.
And for your other idea, I haven’t followed the world cup in the past but now I’m interested in maybe getting into it this year.. would you recommend it?