Multilingual Sentiment Analysis: What It Is and How to Do It Right
Social media has revolutionised the way businesses connect with their customers, offering valuable insights into their sentiments and opinions. Within this realm, sentiment analysis, also known as opinion mining, plays a vital role as a subset of social listening.
As businesses expand globally, the complexities and opportunities surrounding social media multilingual sentiment analysis become increasingly evident. In this blog post, we’ll explore these challenges and advantages, and how social media multilingual sentiment analysis can help businesses unlock valuable insights in real time.
In this post:
- What is multilingual sentiment analysis?
- Why is multilingual sentiment analysis important?
- How does multilingual sentiment analysis apply to marketing?
- The connection between emotions and language
- How to carry out multilingual sentiment analysis
- Does sentiment analysis work in all languages?
- Types of sentiment analysis
- Challenges of conducting multilingual sentiment analysis as part of a global marketing strategy
- Recommendations for successful multilingual sentiment analysis
- Final thoughts
What is multilingual sentiment analysis?
Multilingual sentiment analysis, also called multilingual opinion mining, is a subset of social listening.
Social listening involves the monitoring and analysis of conversations across social media platforms to identify sentiment, causes, public opinion, and social trends. Within this discipline, sentiment analysis focuses on the identification, extraction, analysis, and labelling of consumers’ feelings and opinions expressed on social media. When sentiment analysis is carried out in more than one language, we speak of multilingual sentiment analysis.
Multilingual sentiment analysis is becoming an increasingly important part of social listening because of the growing number of people actively using social media: 4.48 billion, to be precise. When you consider that only 25.9% of internet users speak English natively, sentiment analysis in languages other than English becomes increasingly important.
In other words, the diversity of languages and cultures on the world wide web strongly influences social analytics and social listening, and sentiment analysis in English alone isn’t enough.
By acknowledging that sentiment is inherently linked to language and culture, multilingual sentiment analysis helps companies break down language barriers and capture valuable insights in real time.
Why is multilingual sentiment analysis important?
Multilingual sentiment analysis is important because sentiment drives action:
- If people love a product, they’re going to keep buying it, and they might even tell their friends and family about it.
- If people love the ideas of a politician or hate a public figure, they might elect them into office or discredit them completely.
- If public sentiment related to a future demonstration or parade indicates potential violence, governments might take precautions to prevent the sentiment from becoming reality.
Conducting sentiment analysis only in English isn’t enough because our emotions and consumer behaviour are greatly influenced by our culture and language. It’s a bit like SEO and SEO translation: if you don’t have a good grasp of the user’s cultural context, your efforts are likely to fail.
Therefore, for organisations with an international customer or user base, sentiment analysis cannot be an English-only game. The sentiment of your customers from Portugal, for instance, will require analysis in Portuguese if you want to avoid sentiment inaccuracies and misinterpretations.
How does multilingual sentiment analysis apply to marketing?
In marketing, sentiment analysis allows companies to see how customers feel about their products and services. This information can be used to improve offerings, fine-tune marketing campaigns, and reach out to audiences who might not know about the brand yet. This makes sentiment analysis a fabulous tool for:
- Reputation management
- Consumer behaviour analysis
- Brand management
- Brand awareness
- Competitive intelligence
- Customer service
For example, if a brand finds that the sentiment around their latest product launch is negative, they can take a step back and see what went wrong. Is it the price point? The colour combination? The ingredient list? With sentiment information, companies can develop better products for their customers.
Another example of the power of multilingual sentiment analysis in marketing that I always use is the case of Vans, a footwear brand striving to become a vehicle for creative self-expression. Doug Palladini, Global Brand President for Vans, explains: “We are focused on building culture from the inside out with people who are organically connected to our brand.”
The best way for a brand like Vans to understand and connect with its global audience is through multilingual sentiment analysis. In 2016, Buezo and Weleh Dennis, both first-generation Americans, started a unisex streetwear line Kids of Immigrants celebrating the diversity of the immigrant experience. Social media users who related to the line’s message fully embraced it and shared it with their networks – in more than one language – and Vans took notice.
Vans were able to tap into sentiment data to identify the trend and create a successful partnership with Buezo and Dennis (based on acculturation hyperlocalisation) and design a shoe that was in line with Vans’ mission: “To enable creative expression—and inspire youth culture—by celebrating and encouraging the Off the Wall attitude that comes from expressing your true self.”
The connection between emotions and language
Emotions and sentiments are not universal; they are heavily influenced by language and culture. In other words, different cultures and languages often have distinct ways of expressing and interpreting emotions because emotions are culturally driven and shaped by societal norms, values, and language-specific expressions.
Studies have explored the connection between emotions, language, and culture, providing evidence for the cultural shaping of emotions.
One influential study conducted by Shaver, Schwartz, Kirson, and O’Connor (1987) explored the relationship between emotions and culture. They found that individuals from different cultural backgrounds not only differ in the frequency and intensity of specific emotions they experience but also in how they label and interpret emotional experiences. This study highlights the cultural variability in emotional experiences and the need for cultural sensitivity in activities such as sentiment analysis.
Another study by Mesquita and Frijda (1992) investigated cultural differences in emotional responses. They found that cultural norms and values influenced the regulation and expression of emotions. The study demonstrated that different cultures had varying norms for when and how emotions should be expressed, suggesting that cultural contexts shape emotional experiences and expressions.
Moreover, research by psychologist Lisa Feldman Barrett suggests that emotions are not prewired universal responses but rather constructed based on cultural and contextual factors. Her work emphasises the role of cultural learning and language in shaping emotional experiences and expressions.
These findings support the notion that emotions are culturally driven and shaped by language and societal influences. By considering the connection between emotions, language, and culture, businesses can employ multilingual sentiment analysis to better understand and interpret the sentiments expressed in different languages and cultural contexts. This comprehensive approach allows businesses to tailor their marketing strategies effectively and engage with diverse audiences on a global scale.
How to carry out multilingual sentiment analysis
Analysing sentiment (in one language or across several languages) is a task that typically requires the use of machine learning (trained models), data analysis techniques, and natural language processing (NLP) to derive quantitative sentiment scores from raw text.
NLP combines computer science, artificial intelligence, and linguistics to enable a machine to read and analyse human language with some level of understanding. The machine will look at words, phrases, and themes to determine sentiment.
Machine learning and NLP aren’t the only ways to conduct sentiment analysis, though – there are several techniques available, including manual human analysis. However, while a person could do the job manually by browsing the web, finding relevant posts, reading them, and assessing the sentiment or emotion behind them, in practical terms, an algorithm will be able to do the job much faster and more accurately.
When sentiment scoring is done by machine learning, sentiment models learn from data. They start with training samples, which are texts marked up with sentiment annotation so that they can be used to train sentiment analysis algorithms. These samples are extracted and annotated by human experts; in the case of multilingual sentiment analysis, you’d need a marketing translation specialist.
The models use this training sample data to find patterns, identify associations between sentiment and sentiment-carrying words, and create sentiment scores.
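To make the idea of learning from annotated samples concrete, here is a minimal sketch of a Naive Bayes classifier written with the Python standard library only. The training sentences, the labels, and the function names are all hypothetical, and a real system would need far more data and language-aware tokenisation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, hand-labelled training samples: (text, sentiment label).
TRAINING_SAMPLES = [
    ("I love this product", "positive"),
    ("great quality and great service", "positive"),
    ("terrible experience", "negative"),
    ("I hate the new design", "negative"),
]


def tokenize(text):
    """Lowercase and split on whitespace (an oversimplification for most languages)."""
    return text.lower().split()


def train(samples):
    """Count how often each word appears under each sentiment label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log-probability, using add-one smoothing."""
    word_counts, label_counts, vocabulary = model
    total_samples = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total_samples)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING_SAMPLES)
print(predict("great product", model))
```

The model has never seen the exact phrase “great product”, but it has learned that both words are associated with the positive label, which is the pattern-finding step described above.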
Sentiment analysis systems can be rule-based, automatic, or hybrid.
- Rule-based: These algorithms perform the analysis by applying rules programmed by experts. For example, sentiment analysis rules might look for specific words or phrases that indicate sentiment, such as “great”, “outstanding”, and “terrible”. An example is the VADER sentiment analysis model (VADER stands for Valence Aware Dictionary and sEntiment Reasoner).
- Automatic: These models perform sentiment analysis without human interaction, using machine learning and training samples to infer sentiment scores.
- Hybrid: These sentiment scoring systems combine both sentiment scoring approaches.
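The three approaches can be sketched in a few lines of Python. This is an illustration only: the mini-lexicon and its scores are invented, it is not how VADER itself is implemented, and `fallback_model` stands in for any trained classifier.

```python
# Hypothetical mini-lexicon: word -> valence score chosen by a human expert.
LEXICON = {"great": 2.0, "outstanding": 3.0, "terrible": -3.0}


def rule_based_score(text):
    """Rule-based: sum the valence of every lexicon word found in the text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(LEXICON.get(w, 0.0) for w in words)


def label(score):
    """Turn a numeric score into a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def hybrid_label(text, fallback_model):
    """Hybrid: trust the rules when they fire, otherwise defer to a trained model."""
    score = rule_based_score(text)
    if score != 0:
        return label(score)
    return fallback_model(text)


print(label(rule_based_score("The service was outstanding!")))
```

Passing the trained model in as a function keeps the rule-based part easy to audit, while the automatic part handles posts that contain none of the expert-chosen words.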
Does sentiment analysis work in all languages?
In sentiment analysis, language detection is a prerequisite to sentiment interpretation. Without identifying the language of a social media post, an algorithm will not be able to decipher sentiment information.
This ability, however, needs to be built into the sentiment analysis model; it’s just not feasible to involve human linguists to translate into English the whole pool of foreign-language social media posts that require sentiment scoring.
This leads us to one of the most frequent challenges in sentiment analysis for companies operating internationally: their sentiment scoring systems aren’t trained in sentiment analysis for languages other than English.
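As a rough sketch of language detection as a first step, the snippet below guesses a post’s language from stopword overlap and then routes the post to a language-specific analyser. The stopword lists and the `analysers` mapping are hypothetical; a production system would rely on a properly trained language identifier.

```python
# Hypothetical stopword lists; far too small for real use.
STOPWORDS = {
    "en": {"the", "and", "is", "this", "of", "to"},
    "es": {"el", "la", "es", "y", "de", "que"},
    "pt": {"o", "a", "é", "e", "não", "do", "muito"},
}


def detect_language(text):
    """Return the language whose stopwords overlap most with the text, or None."""
    words = set(text.lower().split())
    overlaps = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None


def route(text, analysers):
    """Send the post to the analyser for its language, or flag it as unsupported."""
    lang = detect_language(text)
    analyser = analysers.get(lang)
    if analyser is None:
        return lang, "unsupported"
    return lang, analyser(text)


# Only an English analyser is available in this example.
analysers = {"en": lambda text: "positive"}
print(route("o produto é muito bom", analysers))
```

The Portuguese post is detected correctly but comes back as unsupported, which is exactly the gap an English-only scoring system leaves.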
Types of sentiment analysis
To establish the different types of sentiment analysis, we first need to establish the criteria on which the classification is made.
Monolingual sentiment analysis vs multilingual sentiment analysis
The first distinction that can be made in sentiment analysis is whether it’s conducted in one language (monolingual sentiment analysis) or more (multilingual sentiment analysis).
As sentiment is affected by language, it’s important to be able to conduct sentiment analysis in the native language of your target audience. In the section below, we’ll look at how this is done to ensure maximum sentiment accuracy.
Polarity-based sentiment analysis vs advanced sentiment analysis
The second distinction concerns how advanced the analysis is.
Polarity-based sentiment analysis is the most basic form: using natural language processing (NLP), sentiment is classified as positive, negative or neutral by looking at single words.
Advanced sentiment analysis is more complex and uses advanced linguistic methods, such as syntax and semantics, to take into account other factors such as:
- The strength of the sentiment: Polarity categories are expanded (very positive, positive, neutral, negative, very negative). This is called fine-grained sentiment analysis.
- Emotion detection: The analysis goes beyond the positive/negative dichotomy to look at emotional states such as enjoyment, happiness, frustration, anger, surprise, disgust, etc.
- Context: The analysis takes into account sentiment in other parts of the text, such as whether sentiment is strong or weak elsewhere, what sentiment words are nearby, etc. This is useful to identify sarcasm and avoid false sentiment results.
- The time or place in which the sentiment was expressed: Sentiment changes depending on the time and situation, such as sentiment expressed during a national tragedy or after an election.
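Fine-grained analysis, the first item in the list above, can be illustrated by mapping a numeric polarity score to five classes instead of three. The function name and the cut-off values below are assumptions chosen for illustration, not standard thresholds.

```python
def fine_grained_label(score):
    """Map a polarity score in [-1.0, 1.0] to one of five sentiment classes.

    The cut-off values are illustrative assumptions.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    if score >= 0.6:
        return "very positive"
    if score >= 0.2:
        return "positive"
    if score > -0.2:
        return "neutral"
    if score > -0.6:
        return "negative"
    return "very negative"


for example_score in (0.9, 0.3, 0.0, -0.4, -0.8):
    print(example_score, fine_grained_label(example_score))
```

In practice the thresholds would be tuned per language, since the same score may correspond to different strengths of feeling in different cultures.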
General sentiment analysis vs aspect-based sentiment analysis
The last distinction to be made is whether sentiment analysis relates to an entity (product, event, person, etc.) as a whole or focuses on particular features or aspects of such entity.
For example, a tweet can simultaneously refer to a hotel’s excellent location and mediocre food: sentiment analysis would be looking at the sentiment for each of those aspects separately.
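The hotel example can be sketched as follows. This is a deliberately naive approach that pairs each aspect with the nearest opinion word; the aspect and opinion lists are hypothetical, and real aspect-based systems use syntactic parsing or trained models instead.

```python
# Hypothetical aspect terms and opinion words for a hotel review.
ASPECTS = {"location", "food"}
OPINIONS = {"excellent": 1, "great": 1, "mediocre": -1, "awful": -1}


def aspect_sentiment(text):
    """Pair each aspect term with the closest opinion word in the text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    opinion_positions = [(i, OPINIONS[w]) for i, w in enumerate(words) if w in OPINIONS]
    results = {}
    for i, word in enumerate(words):
        if word in ASPECTS and opinion_positions:
            _, polarity = min(opinion_positions, key=lambda p: abs(p[0] - i))
            results[word] = polarity
    return results


print(aspect_sentiment("Excellent location but mediocre food."))
```

A general sentiment score would average these two opinions into something close to neutral, hiding the fact that the location is a strength and the food a weakness.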
Challenges of conducting multilingual sentiment analysis as part of a global marketing strategy
Many of the challenges associated with conducting multilingual sentiment analysis as part of a global marketing strategy stem from the limitations of existing natural language processing (NLP) techniques and open source tools. While these tools have undoubtedly advanced sentiment analysis capabilities, they still face certain shortcomings when dealing with the complexities of multiple languages and cultural nuances.
Let’s explore these challenges in more detail:
1. Need for training
Sentiment analysis tools are primarily trained to categorise words and phrases in a single language, usually English. When applied to other languages, they may lack the necessary language-specific training data, which reduces their accuracy and reliability.
Having human linguists translate every foreign-language post into English isn’t a feasible workaround, so language detection and language-specific scoring need to be built into the model itself. In practice, though, many companies operating internationally rely on sentiment scoring systems that aren’t trained for languages other than English.
2. Machine translation shortcomings
While neural machine translation has come a long way and produces relatively accurate translations, sentiment and emotion cannot be fully addressed by machine translation.
Machine translation is, for example, unable to detect sarcasm or irony. Because sentiment analysis requires understanding the text in context and sentiment cannot be seen as separate from context, this task is still better performed by sentiment scoring models trained in sentiment analysis for each language.
However, if the analysis only seeks some basic sentiment information (positive vs negative), sentiment analysis tools that use machine translation can learn sentiment from translated text quite effectively.
The ideal scenario would involve input from sentiment experts who can train sentiment models for sentiment scoring in each language.
3. Context-dependent meaning and sarcasm
Even when no machine translation is involved, and the data is collected in the company’s original language (e.g., English), sentiment analysis tools often struggle to detect context-dependent meanings, such as sarcasm, irony, or subtle nuances that heavily influence sentiment interpretation.
Polysemy challenges fall within this category, too. Sentiment analysis tools face difficulties in dealing with polysemy, where a single word may have multiple meanings. This ambiguity can lead to inaccurate sentiment classification.
Negation handling is another aspect of context-dependent meaning. Sentiment analysis tools sometimes fail to understand that the presence of negation doesn’t automatically indicate a negative sentiment. Negation can reverse the sentiment’s polarity, complicating accurate sentiment classification.
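A minimal sketch of negation handling shows why a negator cannot simply be counted as a negative word. The lexicon and negator list are hypothetical, and the rule only looks at the word immediately before each sentiment word, which is far cruder than what a real system needs.

```python
# Hypothetical polarity lexicon and negator list.
LEXICON = {"good": 1, "happy": 1, "bad": -1, "disappointing": -1}
NEGATORS = {"not", "never", "no"}


def score_with_negation(text):
    """Sum word polarities, flipping a word's sign when a negator directly precedes it."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    total = 0
    for i, word in enumerate(words):
        polarity = LEXICON.get(word, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        total += polarity
    return total


print(score_with_negation("The camera is not bad"))
print(score_with_negation("The camera is not good"))
```

“Not bad” comes out positive and “not good” negative, even though both contain the same negator. Word order and negation markers differ between languages, so a rule like this has to be written separately for each one.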
4. Pre-processing with different encoding
Multilingual sentiment analysis encounters challenges when processing text encoded in different formats. Languages with non-Latin alphabets or older text data sources may have diverse encoding styles, requiring additional pre-processing steps for effective analysis.
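One possible pre-processing step, sketched with the Python standard library: decode raw bytes by trying a list of candidate encodings, then normalise the result to Unicode NFC so that the same character always has the same representation. The function name and the order of fallback encodings are assumptions; trying encodings in sequence is a heuristic and can guess wrong.

```python
import unicodedata


def to_clean_text(raw, fallback_encodings=("utf-8", "cp1252", "latin-1")):
    """Decode bytes with the first encoding that works, then normalise to NFC."""
    if isinstance(raw, bytes):
        for encoding in fallback_encodings:
            try:
                raw = raw.decode(encoding)
                break
            except UnicodeDecodeError:
                continue
        else:
            raise ValueError("could not decode text with the given encodings")
    return unicodedata.normalize("NFC", raw)


# The same Portuguese word arriving in two different encodings.
print(to_clean_text("não".encode("utf-8")))
print(to_clean_text("não".encode("latin-1")))
```

Both inputs end up as the same string, so a lexicon lookup or a trained model sees one word rather than two unrelated byte sequences.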
5. Emoji interpretation
Emojis play a significant role in expressing sentiment on social media. However, sentiment analysis tools often struggle to accurately classify emojis, resulting in their exclusion from many analyses or misinterpretation of the associated sentiment.
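Rather than stripping emojis out, a tool can score them with a dedicated lexicon. The sketch below assumes a tiny, invented emoji-to-polarity mapping; the values are placeholders, since the sentiment attached to an emoji varies by culture, platform, and context.

```python
# Hypothetical emoji polarity lexicon; real values would be learned per market.
EMOJI_POLARITY = {"😍": 1, "👍": 1, "😡": -1, "😭": -1}


def emoji_score(text):
    """Sum the polarity of known emojis instead of discarding them."""
    return sum(EMOJI_POLARITY.get(char, 0) for char in text)


print(emoji_score("Love it 😍👍"))
print(emoji_score("Broken again 😡"))
```

The crying face is a good example of the difficulty: it is scored as negative here, but it is also widely used to express laughter or affection, so a fixed lexicon will misread some posts.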
6. Other audiovisual elements
Many social media posts nowadays contain audiovisual elements such as GIFs, videos, or images. Sentiment analysis tools are still unable to analyse these elements in a meaningful way, leading to false positives and negatives or the exclusion of some sentiment-carrying data from the analysis.
7. Different domains and industries
Sentiment analysis models frequently encounter challenges when confronted with diverse domains and industries. Each industry possesses its distinct vocabulary, colloquialisms, and specific terms, all of which can significantly impact the interpretation of sentiments. An illustrative case is the word “sick,” which holds a negative meaning in the healthcare sector but can be used positively to express admiration or excitement in informal contexts.
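One simple way to handle the “sick” problem is to layer domain-specific overrides on top of a general lexicon. The lexicons, the domain names, and the `polarity` helper below are hypothetical and serve only to show the mechanism.

```python
# Hypothetical general-purpose lexicon.
BASE_LEXICON = {"sick": -1, "great": 1}

# Hypothetical per-domain overrides that take precedence over the base lexicon.
DOMAIN_OVERRIDES = {"streetwear": {"sick": 1}}


def polarity(word, domain=None):
    """Look up a word's polarity, letting the domain lexicon override the base one."""
    lexicon = {**BASE_LEXICON, **DOMAIN_OVERRIDES.get(domain, {})}
    return lexicon.get(word.lower(), 0)


print(polarity("sick", domain="healthcare"))
print(polarity("sick", domain="streetwear"))
```

The same word scores negative for the healthcare domain and positive for the streetwear one, while words without an override keep their general-purpose score.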
8. Biases and imbalances in model training
Sentiment analysis models are trained using large datasets, which can introduce biases. Biases in labelling and training data can impact the accuracy and fairness of sentiment analysis results. For instance, if an algorithm is trained to label the sentence “I am a sensitive person” as negative and “I can be very ambitious” as positive, the results can be biased and lead to skewed insights.
An additional instance arises when a particular sentiment class (e.g., positive or negative) is disproportionately prevalent in training data compared to the others. This imbalance can result in biased model predictions, making the model more inclined to predict the overrepresented class.
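A common way to counter class imbalance is to weight each class inversely to its frequency during training. The sketch below computes such weights with the heuristic n_samples / (n_classes × class count), which is also the formula behind scikit-learn’s “balanced” class weights; the label list is invented.

```python
from collections import Counter


def class_weights(labels):
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical training labels: eight positive samples for every two negative ones.
labels = ["positive"] * 8 + ["negative"] * 2
print(class_weights(labels))
```

Here each negative sample counts four times as much as a positive one (2.5 versus 0.625), which discourages the model from defaulting to the overrepresented class. Checking the label distribution separately for each language is worthwhile, because a dataset that is balanced in English may be heavily skewed in another language.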
9. Data privacy regulations
Engaging in the analysis of sensitive customer data without appropriate consent or violating data privacy regulations can have grave repercussions, such as legal penalties and the erosion of customer trust.
Recommendations for successful multilingual sentiment analysis
Experts recommend taking the following steps to overcome the challenges of multilingual sentiment analysis:
- Train separate sentiment analysis models for each language.
- Combine software and human analysts – the latter can help with polysemy, sarcasm and irony detection, and context-sensitive meaning interpretation.
- Where possible, ensure that human analysts are trained translators or linguists – these professionals are more aware of the subtle nuances in each language and even of how two languages differ in terms of encoding sentiment in linguistic devices.
- Utilise pre-processing steps to manage encoding discrepancies in different languages.
- Opt for modern generative analysis tools that keep data private and secure through strict data handling policies.
- Couple sentiment analysis with consumer surveys to make up for the shortcomings of both strategies.
- Don’t approach sentiment analysis with a preconceived notion of the results – rather, keep an open mind and wait to review all the data.
- Use unexpected insights to inspire new strategies, such as new uses for products.
- Consider that human review of a random sample from a large dataset can reveal insights that are as valuable, accurate, and valid as those obtained by running all mentions through an AI-based sentiment analysis tool.
- Employ data scientists to analyse whether people across cultures use specific emojis more frequently in positive or negative events, and train the machine learning models accordingly.
- Adopt an industry-specific approach when dealing with sentiment analysis from specialised domains.
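The recommendation about reviewing a random sample is easy to put into practice. Here is a minimal sketch, assuming the mentions are already collected in a list; the function name, the sample size, and the seed are arbitrary choices.

```python
import random


def sample_for_review(mentions, sample_size, seed=42):
    """Draw a reproducible simple random sample of mentions for human review."""
    rng = random.Random(seed)
    return rng.sample(mentions, min(sample_size, len(mentions)))


# Hypothetical dataset of 1,000 collected mentions.
mentions = [f"mention {i}" for i in range(1000)]
review_batch = sample_for_review(mentions, 50)
print(len(review_batch))
```

Fixing the seed means that two analysts, or the same analyst on two different days, review exactly the same posts. For multilingual projects, it may make sense to draw one sample per language so that smaller markets are not drowned out by English-language mentions.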
Final thoughts
Multilingual sentiment analysis is a powerful technology that can help businesses identify customer sentiments, understand current market trends, and uncover new opportunities. However, it comes with its share of challenges that should be considered and addressed before deploying such models across languages.
By following the recommendations listed above, companies can gain valuable insights from multilingual sentiment analysis while avoiding potential risks. With the right methodology and tools, businesses will be able to reliably use sentiment analytics in their multilingual marketing strategies and operations.
Related article: Marketing Translation vs Marketing Localisation: Differences, Benefits, and Challenges
Author: Maria Scheibengraf
Maria Scheibengraf is an English-to-Spanish marketing and SEO translator specialised in software (SaaS, martech, fintech), and Operations Manager at Crisol Translation Services, which she co-founded in 2016. With a solid background in programming and marketing, Maria has an in-depth understanding of the technical intricacies involved in software programs, websites, and digital platforms. Maria is also the author of The SEO Translation Bible.