aaron blog

What link previews leak

Link previews in messaging apps are a convenient way to get a peek at content before clicking on the link. To work inside a messaging app, a link preview requires someone, either you or the messaging app, to fetch the preview content from the link destination. This post takes a look at some popular messaging apps and what data they send when they fetch the preview from the requested website.

First things first: link preview comparison by messaging app

Bottom line up front, here are the results of testing the web requests that these messaging apps send when they request a link preview.

| App       | Platform  | Request Origin | Self-identifying |
| --------- | --------- | -------------- | ---------------- |
| BlueSky   | Web       | Datacenter     |        ✅        |
| BlueSky   | Mobile    | Datacenter     |        ✅        |
| Discord   | Desktop   | Datacenter     |        ✅        |
| Discord   | Mobile    | Datacenter     |        ✅        |
| iMessage  | Desktop   | Sender IP      |        🚫        |
| iMessage  | Mobile    | Sender IP      |        🚫        |
| Instagram | Mobile    | Datacenter     |        ✅        |
| Instagram | Web       | Datacenter     |        ✅        |
| Keybase   | Desktop   | Sender IP      |        ✅        |
| Keybase   | Mobile    | Sender IP      |        ✅        |
| Mastodon  | Web       | Datacenter     |        ✅        |
| Signal    | Desktop   | Sender IP      |        🚫        |
| Signal    | Mobile    | Sender IP      |        🚫        |
| Telegram  | Mobile    | Datacenter     |        ✅        |
| Telegram  | Desktop   | Datacenter     |        ✅        |
| Twitter   | Web       | Datacenter     |        ✅        |
| Twitter   | Mobile    | Datacenter     |        ✅        |
| WhatsApp  | Desktop   | Sender IP      |        ✅        |
| WhatsApp  | Mobile    | Sender IP      |        ✅        |
| Wire      | Desktop   | Sender IP      |        🚫        |
| Wire      | Mobile    | Sender IP      |        🚫        |

Some apps will send the request from their own datacenter, while others will send the request from the IP address of the sender. Some apps will self-identify as the app that is requesting the link preview, while others will not. Read on for further explanation of each app.

What is a link preview?

A link preview is a small preview of the content that is available at a link. It is typically displayed in a messaging app when a user shares a link. The preview is fetched from the link destination, and is displayed in the messaging app before the user clicks on the link.

Screenshot of a link preview on X, the website formerly known as Twitter.

For web developers, the most common way to publish the link preview content of your site is by using Open Graph tags. Open Graph was created by Facebook to enrich the look and feel of shared hyperlinks on its platform, and its tags are now supported by most major social media platforms. There isn't really an RFC or published specification beyond the Open Graph website itself, but the authors of the Open Graph "protocol" insist that its use is governed by the Open Web Foundation Agreement 0.9.

What happens when a link preview is requested?

When a user inserts a link into a messaging app, the following likely happens:

1. User inserts a link into the app, or, sends a message with a link.

2. The app sends a request to the included URL, either from the app on the device itself, or, from a datacenter operated by the app developer.

3. The app (or datacenter) takes the web page response from the URL, and parses out the Open Graph tags, such as og:title, og:description, og:image, etc.

4. The app displays the received title, description and image within the app, as a preview of the content that is available at the link.

Title, description, and image are just the most common Open Graph tags. While other mechanisms such as link-rel canonical might also be used, these specific Open Graph tags are the ones you are most likely to see on social media and messaging apps.
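To make step 3 a little more concrete, here is a minimal sketch of pulling Open Graph tags out of a page using only the Python standard library. The sample markup and the names `OpenGraphParser` and `extract_open_graph` are my own illustration, not code from any of the apps tested; real preview services likely handle many more edge cases (encodings, redirects, fallbacks to the page title).

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        # Keep the first value seen for each og:* property.
        if prop.startswith("og:") and content is not None:
            self.tags.setdefault(prop, content)


def extract_open_graph(html_text):
    """Return a dict of og:* properties found in the given HTML text."""
    parser = OpenGraphParser()
    parser.feed(html_text)
    return parser.tags


# Hypothetical page markup, used only for illustration.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="What link previews leak">
<meta property="og:description" content="A look at preview requests.">
<meta property="og:image" content="https://example.com/card.png">
</head><body></body></html>
"""

preview = extract_open_graph(SAMPLE_PAGE)
print(preview["og:title"])
```

The app would then render the title, description, and image from that dictionary as the preview card.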

Screenshot of a link preview in Signal.

Setting up the experiment and testing methodology

In order to observe what data is being sent by the app developers when a link preview is requested, I used a link shortener I wrote called Telex that logs the incoming requests before redirecting them to a destination. I then inserted the short links into various messaging apps and social media websites, and observed what request data was logged.
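For readers who want to try something similar, below is a rough sketch of a logging redirector built on Python's standard `http.server`. It is not Telex itself; the short-link table, the `build_log_record` helper, and the port are all assumptions made for illustration. It also only sees the IP of whatever connects to it directly, so behind a reverse proxy the real client address would need to come from elsewhere.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical short-link table; a real shortener would use a datastore.
SHORT_LINKS = {"/abc123": "https://example.com/article"}


def build_log_record(client_ip, path, headers):
    """Gather the request details discussed in this post into one record."""
    return {
        "ip": client_ip,
        "path": path,
        "user_agent": headers.get("User-Agent", ""),
    }


class LoggingRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Log first, then redirect, so every preview fetch leaves a trace.
        record = build_log_record(self.client_address[0], self.path, self.headers)
        print(json.dumps(record))
        destination = SHORT_LINKS.get(self.path)
        if destination is None:
            self.send_error(404)
            return
        self.send_response(302)
        self.send_header("Location", destination)
        self.end_headers()


def serve(port=8080):
    """Start the redirector locally; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), LoggingRedirectHandler).serve_forever()
```

Calling `serve()` starts the server; pasting a short link into an app should then produce one JSON log line per preview request.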

Another way one would be able to see the request data is simply by looking at requests to their website. However, my experiment was designed to directly attribute these sessions to the particular app being examined. A publicly available website may be too noisy with other crawler traffic and human visitors.

For desktop testing, a Macbook Air on macOS 13 was used. For mobile testing, an iPhone on iOS 17 was used.

The results from each app are discussed below.

BlueSky

Tested Platforms: Web, Mobile

Request Origin: Datacenter

Self-identifying: Yes

BlueSky is a decentralized social media protocol and messaging service. When a link is posted on the BlueSky webapp or mobile app, a request is made to that link with the following request details:

App: BlueSky
Platform: Web and Mobile
IP Address: 18.191.104.94
ISP: AS16509 Amazon.com
Geolocation: Columbus, Ohio, United States
User Agent: Mozilla/5.0 (compatible; Bluesky Cardyb/1.1; +mailto:[email protected])

So from this we can surmise that BlueSky is using Amazon Web Services to host their link preview service, and that they are self-identifying the traffic via the user agent. The cute Cardyb artifact may be a build version of the app protocol or code name for the link preview service.

Discord

Tested Platforms: Desktop, Mobile

Request Origin: Datacenter

Self-identifying: Yes

Discord is a social messaging app. When a link is posted on the Discord desktop app, a request is made to that link with the following request details:

App: Discord
Platform: Desktop
IP Address: 35.227.62.178 
ISP: AS396982 Google Cloud
Geolocation: North Charleston, South Carolina, United States
User Agent: Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)

On Discord mobile, the only difference is the IP address, albeit from the same provider and geolocation. It may be a serverless worker or microservice that allocates available IPs to running jobs.

App: Discord
Platform: Mobile
IP Address: 35.237.4.214
ISP: AS396982 Google Cloud
Geolocation: North Charleston, South Carolina, United States
User Agent: Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)

iMessage

Tested Platforms: Desktop, Mobile

Request Origin: Sender IP

Self-identifying: No, with caveats

iMessage is a messaging app for Apple devices. When a link is posted on the iMessage desktop app, a request is made to that link with the following request details:

App: iMessage
Platform: Desktop and Mobile
IP Address: The sender's IP address
ISP: The sender's ISP
Geolocation: The sender's geolocation
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0

While iMessage link previews self-identify as something, the stated user agent does not explicitly say iMessage or Apple. It does purport to be a specific version of macOS, but that version did not match the tested device. Note the Facebook and Twitter artifacts in the user agent: this is what makes it identifiable as possible link preview traffic, whilst not explicitly identifying itself as iMessage.

Instagram

Tested Platforms: Web, Mobile

Request Origin: Datacenter

Self-identifying: Yes

Instagram is a social media app. When a link is posted on the Instagram webapp or mobile app, a request is made to that link with the following request details:

App: Instagram
Platform: Mobile
IP Address: 2a03:2880:21ff:e::face:b00c
ISP: AS32934 Facebook
Geolocation: Forest City, North Carolina, United States
User Agent: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

App: Instagram
Platform: Web
IP Address: 2a03:2880:20ff:8::face:b00c
ISP: AS32934 Facebook
Geolocation: Ashburn, Virginia, United States
User Agent: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

For Instagram, the user agents for both mobile and web are identical, as is the ISP. However, the IP addresses and their geolocations differ. It is kind of cool how they allocated an IPv6 address with face:b00c in it.

Keybase

Tested Platforms: Desktop, Mobile

Request Origin: Sender IP

Self-identifying: Yes

Keybase is a secure messaging app. When a link is posted on the Keybase desktop app, a request is made to that link with the following request details:

App: Keybase
Platform: Desktop and Mobile
IP Address: The sender's IP address
ISP: The sender's ISP
Geolocation: The sender's geolocation
User Agent: Mozilla/5.0 (compatible; KeybaseBot; +https://keybase.io)

Keybase ends up being the most consistent, with identical user agents between mobile and desktop, and with requests always coming from the sender's IP.

Mastodon

Tested Platforms: Web

Request Origin: Datacenter

Self-identifying: Yes

Mastodon is a decentralized social media protocol and messaging service. When a link is posted on the Mastodon webapp, a request is made to that link with the following request details:

App: Mastodon (infosec.exchange)
Platform: Web
IP Address: 2a01:4f8:222:1c9d::2
ISP: AS24940 Hetzner Online
Geolocation: Falkenstein, Saxony, Germany
User Agent: http.rb/5.1.1 (Mastodon/4.2.0-rc2+glitch; +https://infosec.exchange/) Bot

Due to the federated nature of Mastodon, the observed request info will be determined by the infrastructure of that particular Mastodon server. In my case, the information observed is specific to my home server of https://infosec.exchange.

Signal

Tested Platforms: Desktop, Mobile

Request Origin: Sender IP

Self-identifying: No, with caveats

Signal is a secure messaging app. When a link is posted on the Signal desktop app, a request is made to that link with the following request details:

App: Signal
Platform: Desktop and Mobile
IP Address: The sender's IP address
ISP: The sender's ISP
Geolocation: The sender's geolocation
User Agent: WhatsApp/2

Signal is pretty consistent with what they send out between desktop and mobile. What is most interesting here is that Signal's requests self-identify as WhatsApp. Now, WhatsApp is known to use the Signal protocol for delivering end-to-end encryption, but that would not explain the presence of WhatsApp in the user agent for Signal. It is possible that Signal is using the WhatsApp user agent to avoid being blocked by websites that block Signal traffic.

Telegram

Tested Platforms: Desktop, Mobile

Request Origin: Datacenter

Self-identifying: Yes

Telegram is a messaging app. When a link is posted on the Telegram desktop app, a request is made to that link with the following request details:

App: Telegram
Platform: Mobile
IP Address: 149.154.161.248
ISP: AS62041 Telegram Messenger Inc
Geolocation: London, England, United Kingdom
User Agent: TelegramBot (like TwitterBot)

App: Telegram
Platform: Desktop
IP Address: 2607:fb90:e9e3:1d09:f97b:d76f:7a57:3f57
ISP: AS62041 Telegram Messenger Inc
Geolocation: London, England, United Kingdom
User Agent: TelegramBot (like TwitterBot)

For Telegram, the requests are basically identical, other than IP address. The presence of TwitterBot in the user agent may indicate that sites readily understand traffic marked as TwitterBot as legitimate app-initiated web requests.
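One way to picture why borrowing the TwitterBot name could help: a site that recognizes preview traffic by a simple case-insensitive substring check would also match Telegram's user agent. The sketch below is purely illustrative. The token list is hypothetical and not taken from any real site's configuration; the user agent strings are the ones logged in this post.

```python
# Hypothetical allow-list tokens a site might match against.
PREVIEW_TOKENS = ("twitterbot", "facebookexternalhit")


def matched_tokens(user_agent):
    """Return the allow-list tokens found anywhere in the user agent."""
    lowered = user_agent.lower()
    return [token for token in PREVIEW_TOKENS if token in lowered]


# User agents as observed in the experiment.
observed = {
    "Telegram": "TelegramBot (like TwitterBot)",
    "X": "Twitterbot/1.0",
    "Signal": "WhatsApp/2",
}

for app, user_agent in observed.items():
    print(app, matched_tokens(user_agent))
```

Under this assumed matching rule, Telegram and X both hit the `twitterbot` token while Signal matches nothing.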

X (Twitter)

Tested Platforms: Web, Mobile

Request Origin: Datacenter

Self-identifying: Yes

X, formerly known as Twitter, is a social media app. When a link is posted on the X webapp or mobile app, a request is made to that link with the following request details:

App: X
Platform: Mobile
IP Address: 199.16.157.180
ISP: AS13414 Twitter
Geolocation: Atlanta, Georgia, United States
User Agent: Twitterbot/1.0

App: X
Platform: Web
IP Address: 199.16.157.182
ISP: AS13414 Twitter
Geolocation: Atlanta, Georgia, United States
User Agent: Twitterbot/1.0

Again, consistent across webapp and mobile platforms. Only difference is the IP allocation, though they appear to be on the same subnet.
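A quick way to sanity-check how close two logged addresses are is to count the leading bits they share. This is a small sketch using Python's standard `ipaddress` module; the helper name `common_prefix_length` is my own, and a shared prefix is only a rough hint, since it says nothing about how the operator actually carves up its address space.

```python
import ipaddress


def common_prefix_length(first, second):
    """Number of leading bits two addresses of the same IP version share."""
    a = ipaddress.ip_address(first)
    b = ipaddress.ip_address(second)
    if a.version != b.version:
        raise ValueError("addresses must both be IPv4 or both be IPv6")
    # XOR leaves a 1 wherever the addresses differ; the highest set bit
    # marks where the shared prefix ends.
    differing = int(a) ^ int(b)
    return a.max_prefixlen - differing.bit_length()


# Address pairs as logged in this post.
print(common_prefix_length("199.16.157.180", "199.16.157.182"))  # X
print(common_prefix_length("35.227.62.178", "35.237.4.214"))     # Discord
```

For the two X addresses this gives 30 shared bits, so they sit within the same /24.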

WhatsApp

Tested Platforms: Desktop, Mobile

Request Origin: Sender IP

Self-identifying: Yes

WhatsApp is a messaging app. When a link is posted on the WhatsApp desktop app, a request is made to that link with the following request details:

App: WhatsApp
Platform: Mobile
IP Address: The sender's IP address
ISP: The sender's ISP
Geolocation: The sender's geolocation
User Agent: WhatsApp/2.23.18.78 i

App: WhatsApp
Platform: Desktop
IP Address: The sender's IP address
ISP: The sender's ISP
Geolocation: The sender's geolocation
User Agent: WhatsApp/2.2336.9 N

Note that there appears to be a distinction between the mobile and desktop apps in the user agent: the characters i for mobile and N for Desktop. The version numbers also follow a different convention.

It should also be noted that Signal's self-identification as WhatsApp/2 may still stick out, as it fails to contain as much information as a legitimate WhatsApp user agent.
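As a sketch of how that difference could be spotted, the pattern below encodes only what the two WhatsApp samples above have in common: a dotted version with at least three parts, followed by a space and a single platform letter. This is a heuristic I derived from two observations, not a documented WhatsApp format, so it may well reject other legitimate builds.

```python
import re

# Heuristic built only from the two WhatsApp samples logged in this post:
# "WhatsApp/<dotted version with 3+ parts> <single platform letter>".
FULL_WHATSAPP_UA = re.compile(r"WhatsApp/\d+(?:\.\d+){2,} [A-Za-z]")


def looks_like_full_whatsapp_ua(user_agent):
    """True if the user agent has the shape of the observed WhatsApp samples."""
    return FULL_WHATSAPP_UA.fullmatch(user_agent) is not None


for sample in ("WhatsApp/2.23.18.78 i", "WhatsApp/2.2336.9 N", "WhatsApp/2"):
    print(sample, looks_like_full_whatsapp_ua(sample))
```

Both WhatsApp samples match, while Signal's shorter `WhatsApp/2` does not.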

Wire

Tested Platforms: Desktop, Mobile

Request Origin: Sender IP

Self-identifying: No, with caveats

Wire is a secure messaging app. When a link is posted on the Wire desktop app, a request is made to that link with the following request details:

App: Wire
Platform: Desktop
IP Address: The sender's IP address
ISP: The sender's ISP
Geolocation: The sender's geolocation
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36

App: Wire
Platform: Mobile
IP Address: The sender's IP address
ISP: The sender's ISP
Geolocation: The sender's geolocation
User Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) FxiOS/117.3  Mobile/15E148 Safari/605.1.15

For Wire, we have a mixed result with regard to self-identification. On mobile, Wire will send the actual user agent for the device's browser. On desktop, however, there appears to be a generic user agent purporting to be Chrome 69 on Windows 10. This may be an attempt to avoid being blocked by websites that block Wire traffic.

Conclusion

The datacenter-based solutions may be useful for safely loading external content, as well as caching the content across the global user base. However, that approach is far more centralized, which is likely why secure messaging apps like iMessage, Signal, and Wire stick to sending requests from the device's IP.

The user agents for Signal and Wire raise questions as to how network operators may be blocking the traffic from those apps based on their user agents (which is possibly an ineffective way to block such traffic).

Finally, it would be interesting to experiment with these user agents on guarded websites, as it might be the case that some of them are allow-listed as legitimate traffic.
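A starting point for that kind of experiment might look like the sketch below, which prepares a request carrying one of the observed user agents using Python's standard `urllib`. The helper name is hypothetical, `example.com` is a placeholder, and the actual send is left commented out; it is best pointed at a site you operate yourself.

```python
import urllib.request


def build_preview_style_request(url, user_agent):
    """Prepare (but do not send) a GET request carrying the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# example.com is a placeholder; point this at a site you operate.
request = build_preview_style_request("https://example.com/", "Twitterbot/1.0")
print(request.get_header("User-agent"))

# To actually send it and inspect the status code, one could run:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```

Comparing the responses for a browser user agent and a preview-bot user agent would show whether the site treats them differently.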

Thanks for reading! Hope you learned something new. If you have any questions or comments, you are welcome to reach out over email or any of the platforms discussed in this post. ;)