Trending New Challenges for Trust & Safety Teams

Following the recent announcement of our partnership with, and investment by, TaskUs, we wanted to share some highlights of what is under the hood of our flagship moderation product, Safety Operations Center.

To keep up to date on toxic phenomena, L1ght’s team of researchers and data analysts tracks, identifies, and categorizes new trends. Here are some trends we are seeing that challenge Trust & Safety teams, and that Safety Operations Center helps identify and remove from customer environments:

Bypassing Word Restrictions: Users can bypass traditional, non-AI blocked word lists by obfuscating words, substituting characters, and adding accents. In addition, using another language (especially in Asian regions) can also bypass Trust & Safety policy restrictions.

For example: #titsx-hot, bokep, or Bĺòwjòb will challenge most models and lists. L1ght’s systems, by contrast, are autoconfigured to understand the context of such words and flag them accordingly.
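To illustrate why plain word lists are so easy to bypass, here is a minimal sketch of the kind of normalization a basic filter would need before a lookup. It is not L1ght’s method. The substitution map, the blocklist entry, and the function names are hypothetical, and the approach does nothing for other languages or for context.

```python
import unicodedata

# Hypothetical substitution map and blocklist, for illustration only.
LEET_MAP = str.maketrans(
    {"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)
BLOCKLIST = {"badword"}


def normalize(token: str) -> str:
    """Strip accents, undo simple character substitutions, drop separators."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold, then map common look-alike characters back to letters.
    lowered = stripped.casefold().translate(LEET_MAP)
    # Remove anything that is not a letter or digit (hashes, hyphens, dots).
    return "".join(ch for ch in lowered if ch.isalnum())


def is_blocked(token: str) -> bool:
    """Return True if the normalized token appears in the blocklist."""
    return normalize(token) in BLOCKLIST
```

With this sketch, is_blocked("Bádw0rd") and is_blocked("#b-a-d-w-0-r-d") both return True, while a term written in another language would still slip through, which is where context-aware models come in.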

Profile Images: Traditional AI models often fail to assess user profiles, either because of the size of the image or because it is treated as just a profile image. Users can therefore use subtler imagery to evade toxicity detection.

For example: Users can reach their followers with toxic imagery through nothing more than their profile image. This is why L1ght’s best practices include assessment of all profile imagery to minimize risk.

Spam & Emotions: ML models can falsely flag emoticons or repeated characters as toxic when they are often just expressing emotion. At the same time, specific emojis should be detected as toxic by default. Detecting the correct context is a growing challenge.

For example: L1ght’s contextual analysis accounts for cultural nuances across all regions of the world. We can help customers adjust their Trust & Safety policies to accommodate these differences without raising the risk of unsafe content. In one particular case, L1ght was able to build a custom classifier to reduce human moderation of spam to near zero while eliminating nearly all false positives.
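As a rough illustration of the repeated-character problem, the sketch below shows one common pre-processing step: collapsing long runs of the same character before text reaches a classifier. It is an assumption about how such a step might look, not a description of L1ght’s custom classifier. The function name and the default run length are hypothetical.

```python
import re


def collapse_repeats(text: str, max_run: int = 2) -> str:
    """Shorten any run of one repeated character to at most max_run copies."""
    # (.) captures a character; \1{max_run,} matches max_run or more repeats
    # of it, so the whole match is a run longer than max_run.
    pattern = re.compile(r"(.)\1{" + str(max_run) + r",}")
    return pattern.sub(lambda m: m.group(1) * max_run, text)
```

For instance, collapse_repeats("sooooo goooood!!!!") returns "soo good!!". The emphasis is still visible to a downstream model, but the input looks less like spam.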

Hashtags & Links: On chat and review sites, hashtags and links are often either ignored or minimally processed. They can therefore be used as a means to promote unsafe content. Links and hashtags should be processed independently from the textual content to determine if there is toxic intent behind them.

For example: Bad actors often regenerate old hashtags that were previously identified as part of trafficking or abuse. L1ght proactively searches for and isolates such tags so that customers can stop new trends before they develop.
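Processing hashtags and links independently from the text could be sketched as follows. This is a simplified illustration under stated assumptions, not L1ght’s pipeline. The regular expressions are deliberately naive, and FLAGGED_TAGS is a hypothetical stand-in for a list of previously identified tags.

```python
import re

# Deliberately simple patterns, for illustration only.
HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")

# Hypothetical set of tags flagged in earlier investigations.
FLAGGED_TAGS = {"flaggedtag"}


def split_message(message: str) -> dict:
    """Separate a message into body text, hashtags, and links."""
    links = URL_RE.findall(message)
    # Remove links first so URL fragments (#...) are not read as hashtags.
    without_links = URL_RE.sub(" ", message)
    hashtags = [tag.casefold() for tag in HASHTAG_RE.findall(without_links)]
    # Drop the hashtags and tidy up the whitespace left behind.
    body = " ".join(HASHTAG_RE.sub(" ", without_links).split())
    return {"body": body, "hashtags": hashtags, "links": links}


def flagged_hashtags(message: str) -> list:
    """Return the hashtags in a message that match previously flagged tags."""
    return [t for t in split_message(message)["hashtags"] if t in FLAGGED_TAGS]
```

Each of the three parts can then go to its own check: the body to a text model, the links to a URL reputation check, and the hashtags to a lookup such as flagged_hashtags.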

To learn more about Safety Operations Center, contact us for a demo at: