Some thoughts about Telegram’s “Aggressive Filtering”

TL;DR: Telegram's new anti-spam system leaves some serious questions unanswered and, for this reason, we question its processes and cannot safely recommend its use. Whilst we admire the effort on Telegram's part, we believe they need to provide more information to users about how it works and how it guarantees user privacy.

A little while back, we spotted a verified bot in the wild, called “Telegram Anti-Spam”. Not knowing what it was, we asked around, and didn’t get much information, other than the fact that it adds an “Aggressive Anti-Spam” option to a group's settings, presumably when the bot is added.

Beyond that, we just had to wait and see. Which we did…

Telegram recently announced this new feature, providing only vague details about how it works and, in so doing, throwing some shade over third-party anti-spam bots that “can be efficient” as they are “not constrained by Telegram's commitment to user privacy.” Whilst they are not entirely wrong, the remark paints an unfair picture of bots that, by Telegram's own admission, do a good job of getting rid of spam, and that do so with privacy in mind – like HAL does (more on this at the end of this post).

Beyond this, we’ve noted some issues that we’d like to share with the public. Before we start, we should note that our intention is not to make Telegram look like the bad guy, but rather to point out some obvious issues, faults, and questions that we think should be addressed. We strongly believe that tools of this nature need to be transparent to a large degree (without providing spammers with the tools needed to evade them) and should inspire strong confidence that they are acting in the best interests of users, and nobody else.

Privacy

Telegram claims their system respects their original commitments to privacy, but does not explain how it works.

On the one hand, we don’t expect any anti-spam system to detail its inner workings, as that would most certainly help spammers evade such a system – but, on the other hand, we feel that some indications are in order, even if they are very high-level explanations.

Telegram has only said that their “experienced anti-spam algorithms” can be turned on to “unleash their full potential”.

What kind of algorithms are we talking about? Are they based on machine learning? Do they do sentiment analysis? If not, what are they doing? Simple filtering systems, perhaps?

Our primary concern, however, is that in order to fight spam at this level – where the content of every message sent anywhere needs to be analyzed – certain ‘rules’ need to be put into place pertaining to user privacy; specific rules that dictate how information is processed, and how that processing keeps a user’s information private. Many groups on Telegram are public, so this may sound like a moot point; however, data-processing comes in many forms, and the same processing would affect private groups as well.

Alas, their privacy policy has not been updated since 14 August 2018, as of the time of writing, and they have given no indication of any specific processes regarding user privacy in relation to this new system, which we feel they ought to do.

Their current policy only makes provision for reading people’s messages when those messages are reported to them by other users. There are no additional clauses/provisions when it comes to their new opt-in anti-spam system.

Operational Workflow, Censorship

This point relates back to privacy, but deserves a mention of its own.

The bot, @tgsantispambot, appears to have administrative rights across all groups in Telegram. We say “appears to have” because we do not have concrete evidence that it does. It does not appear in the admins list of any group, and it cannot be added to a group manually. To enable it, one simply turns on a switch, and the bot kicks into high gear.

However, the bot does appear on the admins list in the “Recent Actions” filters, even when "Aggressive Anti-Spam" has not been enabled.

This is enough information for us to state – with some level of confidence – that the bot is automatically an admin everywhere, even when the toggle has not been turned on.

Given the nature of how this anti-spam system is advertised to work, we can reasonably assume, then, that it has the ability to read all messages, whether or not group owners have enabled Aggressive Anti-Spam.

To us, this is a notable and significant breach of privacy.

Whilst we acknowledge that Telegram has access to messages anyway (all of them are stored in their data-centers), there has been no evidence to suggest that Telegram has been analysing or reading any of them, except within the bounds of the clause in their privacy policy, noted above.

If this is not the case, and they are not analysing all messages, then we think Telegram ought to make this clear, and additionally state, on record, what system processes kick in when a group owner turns it on.

Separately from this, we cannot rule out the possibility that Telegram could use this tool, when activated, to discreetly apply censorship processes without letting the public know that they might do this. We have been informed, via public and private channels alike, that the bot has been removing posts that are clearly not spam, with no clear and immediate recourse in sight – quote, “it can’t be undone.”

This points back to the thoughts we have about algorithms – Telegram has not produced any information that describes them at a high level, nor any information to let users know what to expect. To us, this is a serious issue that Telegram needs to resolve with the utmost urgency.

Account IDs

This one is somewhat up in the air, as Telegram has never produced any documentation relating to how account IDs are assigned at sign-up.

However, we feel it is important to note that 90%+ of all bans in at least two anti-spam systems, including HAL, are for accounts that have IDs in the 5-billion range. We’re not entirely sure why Telegram's system assigns IDs this way, but we find it strange that the vast majority of spammers have IDs in this range.

For some insight, we’ve identified that there are no accounts in the 3-billion and 4-billion ranges (why?), and the majority of legitimate accounts have IDs under 3-billion (makes some sense).
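
For anyone who wants to check a pattern like this against their own ban list, a minimal sketch follows. It assumes only that account IDs are plain integers; the function names are hypothetical and are not taken from HAL or from Telegram.

```python
from collections import Counter

BILLION = 1_000_000_000


def bucket_by_billion(account_ids):
    """Count account IDs per billion-wide range.

    Key 0 means "under 1 billion", key 5 means "the 5-billion range", etc.
    """
    return Counter(account_id // BILLION for account_id in account_ids)


def share_in_range(account_ids, billions):
    """Return the fraction of IDs that fall in the given billion-wide range."""
    ids = list(account_ids)
    if not ids:
        return 0.0
    return bucket_by_billion(ids)[billions] / len(ids)
```

Running share_in_range over a list of banned IDs with billions=5 would give the kind of figure quoted above, and bucket_by_billion would show whether any range is empty.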

We find it hard to not ask ourselves this simple question, then: what’s to say their anti-spam system is not taking this into account when looking for spam?

Telegram has been around for years, and we feel they’ve had the tools to fight spam properly for quite some time. And yet, this strange phenomenon appeared roughly a year ago, and all of a sudden they have the “algorithms” in place.

There are some more technical questions that can be asked here, but we won’t go into them just yet – we’re simply highlighting something that we’ve noticed, and are hopeful that this 'phenomenon', whilst strange, doesn't hold any water, and is just a technical process they use to make database indexing across data-centres efficient.

(As a side note: HAL’s account ID is in the 5-billion range, so we do not discount the fact that legitimate accounts are in it as well, much like early spammers would have been in earlier ID ranges. Our point is that a pattern has been identified, and deserves to be questioned as it has not been documented anywhere. Telegram’s Anti-Spam bot is also in the same range.)

Anonymous Phone Numbers

Alongside their announcement of an anti-spam system, Telegram also announced that users will be able to sign up for the platform with anonymous phone numbers obtained through their Fragment platform, which is linked to their TON blockchain – meaning that a SIM card is not required.

We won't go into the specifics about how blockchains work, but, suffice it to say, we feel strongly that this move is rather dangerous, considering that the people behind the spam problem will very likely use the platform to create anonymous accounts that share spam – they have the resources to do so. Some anonymous phone numbers are very pricey, while others cost as little as a few Big Mac burgers, and might, perhaps, be easier for spammers to acquire.

On the one hand, spammers use anonymous phone numbers anyway, but on the other, this looks like Telegram is opening the flood-gates to spam, and then providing a solution to the problem, whilst still making money from it. We certainly hope this is not the case, but at face value, this connection cannot be ignored.

In Summary…

…and with extensive knowledge about how spam works in Telegram to back it, we’re going on record to state that we reserve our judgement about Telegram’s new anti-spam system, and cannot recommend its use, no matter how effective it may be, until such time as Telegram clarifies its practices and can guarantee that its commitment to privacy is being respected at all levels, and at all times.

We admire the effort, and appreciate the fact that Telegram is finally taking the leap and doing something about the spam problem.

However, as noted before, anti-spam systems that work using content-analysis tools should be as transparent as possible when it comes to answering the questions that matter.

What about HAL? How does it guarantee privacy?

Admittedly, we do not yet have a privacy policy in place. However, we are still in beta, and features are still being developed. That having been said, we make the following statements about how HAL processes messages:

1. When HAL’s filtering system looks at a message, it simply compares the message to a large database of flexible filters. It does not 'read' those messages, and has no idea what they mean. It's not an AI, and it doesn't comprise any machine-learning systems. (This step only takes place when its "federation partners", as we call them, have identified that an account is not a known spammer.)

2. If a match is not found, the message-content is discarded immediately, and HAL moves on to the next one. No logs are stored in relation to the message, other than who sent it, where they sent it, and when. (We will be making adjustments to discard this information in favour of something simpler, before we reach General Availability.)

3. If one or more matches are found, the message is identified as spam and then stored in the system, along with some new filters to target the message. This simply makes it easier to identify that message when sent by another spammer.

4. Beyond that, we do not log anything, and we only store information that is already accessible to anyone, either via Telegram’s apps, or their APIs.
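
To make the four points above a little more concrete, here is a rough sketch of that flow in Python. It is not HAL's actual code. The class, field names, and return values are hypothetical, and two things are assumed purely for illustration: that filters are regular expressions, and that a "new filter" is a literal match on the stored message.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FilterPipeline:
    # All fields are illustrative stand-ins, not HAL's real data model.
    filters: list = field(default_factory=list)        # compiled regex patterns (assumption)
    known_spammers: set = field(default_factory=set)   # IDs flagged by federation partners
    spam_store: list = field(default_factory=list)     # content is kept only for matches
    metadata_log: list = field(default_factory=list)   # who, where, when; never content

    def process(self, sender_id, chat_id, timestamp, text):
        # Filtering only runs for accounts that are not already known spammers.
        if sender_id in self.known_spammers:
            return "known_spammer"

        # Point 1: compare the message against the filters, nothing more.
        matched = any(pattern.search(text) for pattern in self.filters)

        # Point 2: no match, so the content is dropped and only metadata is kept.
        if not matched:
            self.metadata_log.append((sender_id, chat_id, timestamp))
            return "clean"

        # Point 3: a match, so store the message and add a filter that targets it.
        # A literal pattern is used here only as a placeholder for "new filters".
        self.spam_store.append(text)
        self.filters.append(re.compile(re.escape(text)))
        return "spam"
```

The point of the sketch is the shape of the data: message content only ever lands in storage on the spam branch, and the clean branch keeps nothing but sender, chat, and time.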

HAL is built in such a way that it can never be used as a tool to gather mass data about how people use the platform, or any data that would identify a user and details about the messages they send. It stores only what it needs in order to operate effectively, and its databases are inaccessible to anyone, with the obvious exception of its creator.

We will be releasing a privacy policy in due course, likely within the next two cycle-updates in the private beta. The policy will be subject to small changes where so required, but will always honor the points made above.