SMSTester - Monitoring SMS Delivery (and Keyword Filtering, Possibly)

Posted by SaferMobile on Apr 25, 2011

NOTE: This article was updated with an addendum and additional data.

Inspired by Michael Benedict's original blog post on monitoring SMS delivery reliability in Tanzania and recent reports of SMS keyword blocking in Uganda, MobileActive.org set out to replicate Michael's work - and add to it. SMS is such a crucial part of many mobile projects and just day-to-day life across the developing world, yet there’s a lack of public knowledge of mobile network operator interdependency, latency, and reliability (how mobile network operators work together to transmit SMS, the lag time between sending and receiving a message, and the guarantee that a message will reach its recipient).

Michael's post got us thinking: Can this type of experiment be replicated without extra hardware required (GPRS modems, etc.)? After a few quick brainstorming sessions at the OpenMobile Lab in New York, we created an alpha version of a mobile application that recreates a number of latency tests. It’s far from perfect - and there is still plenty of work to be done - but we’re confident that this project will lead us to extremely valuable data about the transparency and reliability of SMS on mobile networks.

SMSTester - The App

SMSTester is a simple Android app that allows a user to create a set of keywords to be sent as SMS messages. This allows the user to explore differences in latency for any type of message - from basic, everyday text like ‘milk’ or ‘newspaper’ to politically inflammatory text such as ‘revolution.’ We then set up a logging mechanism to timestamp and record each SMS as it is sent (on the sending side) or received (on the receiving side). By comparing the sent and received timestamps, we can easily calculate message latency from one SIM to another.
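To illustrate the comparison described above, here is a minimal Python sketch of the latency calculation (the app itself runs on Android; the function and field layout here are our own illustration, not SMSTester's actual code):

```python
# Match sent and received log entries on their payload and compute
# per-message latency in milliseconds. A payload missing from the
# receiving side is recorded as None (the message never arrived).

def compute_latencies(sent_log, recv_log):
    """sent_log / recv_log: lists of (payload, timestamp_ms) tuples."""
    received = {payload: ts for payload, ts in recv_log}
    return {
        payload: (received[payload] - sent_ts) if payload in received else None
        for payload, sent_ts in sent_log
    }

sent = [("milk", 1000), ("revolution", 2000)]
recv = [("milk", 1500)]
print(compute_latencies(sent, recv))  # {'milk': 500, 'revolution': None}
```

Matching on payload works only while each payload is unique within a test set, which is one reason a sequential message ID (discussed below under Next Steps) is a planned improvement.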

Initial Deployment

We deployed SMSTester in a test in Egypt a few weeks ago. As this was the initial trial for a fully untested application, we were careful. While we did run our tests across a number of local mobile operator networks, we kept the test volume small enough to keep us under the radar (for now!). Our test methodology included:

  1. Testing across all three major mobile operator networks in Egypt: Etisalat, Mobinil, and Vodafone
  2. Consistent keyword test bed containing both ‘safe’ and ‘political’ terms, where ‘safe’ refers to everyday vocabulary and ‘political’ refers to politically sensitive words
  3. Language coverage across both English and Arabic
  4. Roughly 270 messages successfully sent, received and analyzed

What We Looked For And Why

The main focus of our analysis was SMS delivery latency and, more generally, delivery failures. There are plenty of anecdotal stories of seemingly random delays lasting multiple hours or even days in many countries where we work. While network congestion and growing infrastructure are often to blame for SMS unreliability, there is also legitimate concern that delays may be an indication of deliberate message filtering and monitoring. What has emerged is an environment in which activists and human rights defenders are unable to clearly understand which networks - and what behavior - are safe or hazardous for themselves or their contacts. The end goal of this research, put simply, is to change this paradigm. Rumors of keyword filtering are not helpful; what is helpful is evidence of surveillance.

This small experiment is just a start, of course. Our hypothesis is that keyword filtering and other malicious behavior on the part of mobile network operators may manifest in the form of increased message latency or overt message blockage. If we could detect any sign of a correlation between message content and delivery with just some initial in-country testing, this would be a great first step toward our overall goal. However, it’s very important to note that while message latency or failure may be indicative of bad behavior on the part of the carriers, it could be due to any number of contributing factors and is by no means proof of foul play. For now we’re merely hypothesizing.

Results

Despite the minor bugs discovered, we gathered very valuable information about message latency in Egypt during this trial. The most valuable data was on the Etisalat network (also known as the Emirates Telecommunications Corporation), based in the UAE. The majority of the data we recovered from this trial was between an Etisalat SIM and other Egyptian networks. (Note: This was the result of inadvertent data loss for other test scenarios and we did not specifically target Etisalat).

Main Conclusions

Big Caveat (READ THIS!): Given the small sample size of this test, it should be noted that none of these conclusions are definitive. In fact, the very nature of such a small sample size warrants much further investigation.

(1) Delivery between Etisalat & Mobinil networks warrants further investigation. As shown below, the delivery time for English language text messages from Etisalat to Mobinil is significantly greater than delivery time to any other network, for both English and Arabic texts. This may be due to any number of network delays, and it may also be indicative of English language filtering by one or both of the mobile network operators.

 (2) Delivery time of politically sensitive English messages on Etisalat warrants further investigation. The chart below shows that politically sensitive English messages sent across the Etisalat network were delivered with significantly more latency than others, with the possible exception of politically sensitive Arabic messages. In addition, each of the three messages that were delivered out of order fell into this category. Similar to the above conclusion, this may be indicative of specific filtering on behalf of Etisalat.

Secondary Conclusions

Aside from the two observations above, it’s difficult to infer conclusions from the remaining results. Much as Michael Benedict posited in his original blog post, message latency and dropped messages appear to occur at random. English messages sent from Etisalat to Mobinil showed a significant number of mis-ordered / delayed messages; however, this pattern did not correlate well with message content.

Messages sent from Etisalat to Vodafone showed no discernible pattern, with the final messages taking nearly 500x as long to be delivered as the mean for the test.

The converse test (Vodafone to Etisalat) resulted in the largest set of missing messages that we saw in all of our tests. At first glance the dropped messages may appear to coincide with the border between English ‘safe’ and ‘political’ messages - which might have suggested purposeful blocking of SMS delivery due to keyword detection - but in actuality the gap encompasses a ‘safe’ message as well.

Message latency between two Mobinil SIMs also showed no obvious pattern from our tests, and delivery times fell within a reasonable range across all messages sent.

Next Steps

1. Application Improvements

The number one priority for us moving forward is to improve SMSTester itself. Due to our initial design, collecting and analyzing the captured data took too long to be scalable. Building simple analysis of SMS latency into the application could make real-time latency analysis a reality.

2. More Data!

The scale of data collected was not enough to support solid conclusions. While it points us toward some interesting possibilities, we’ll need to collect significantly more data before drawing real conclusions.

3. Crowdsourcing

We will make the SMSTester app and code available once we have made some bug fixes and improvements. We welcome allies around the world to collect data, understanding that running the app at scale might be problematic in some countries (where SIM registration requirements, for example, tie SIM numbers to a specific person). Because of the risk involved in running SMSTester, we want to be clear about the potential risks for anyone running the app. That said, we are a small team, and in order to gather meaningful, comparative data we have to rely on this collective network. We look forward to working with you all on this.


Addendum: Source Data & Methodology

We’ve uploaded the source data for our posting on SMSTester here. Below we describe our test data and methodology. Thanks to all for their comments and questions to date - further insights and suggestions are always welcome!

Networks & Hardware

Our team on the ground in Cairo successfully acquired SIM cards for each of the major mobile data carriers in the country: Etisalat, Mobinil, and Vodafone. Our hardware consisted of various Android handsets: a few HTC Wildfires and a T-Mobile G2.

Software

Our handsets were equipped with SMSTester software (we’ll be releasing the source after fixing a few very obvious bugs that we’ll get into here) for sending and receiving messages en masse. This Android application has a very simple design - it accepts a simple text file containing message payloads (one message per line), a destination mobile number, and a configurable delay between messages (can be set to n milliseconds). This software records the following fields whenever a message is sent or received: type (sent or received), origin (phone #), destination (phone #), payload, and timestamp.
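As a rough illustration of the log schema just described, a record might be modeled like this (the field names and example values are our assumption for illustration; the app's actual storage format may differ):

```python
from dataclasses import dataclass

@dataclass
class SmsLogEntry:
    type: str          # "sent" or "received"
    origin: str        # originating phone number
    destination: str   # destination phone number
    payload: str       # message text
    timestamp_ms: int  # device clock when the message was sent/received

# Hypothetical example entry; the numbers are placeholders.
entry = SmsLogEntry("sent", "+20100000001", "+20100000002", "milk", 1303732800000)
print(entry.type, entry.payload)  # sent milk
```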

Note: As this was the first time conducting these tests, we did make a user error in the handset setup process that resulted in the SMSTester application only being installed on sending devices. Recipient devices, on the other hand, had Whisper Systems' TextSecure application installed. This may be a contributing factor to some of the issues with message timestamping that we saw (see below).

Message Payload

All but one of our tests sent the payload indicated in a detailed spreadsheet. One test in particular (Vodafone → Etisalat) only successfully sent through 48 of the 56 messages due to a prepaid account issue: the sending SIM card unfortunately ran out of credit in the middle of a test. The messages sent can be split into two language categories (English & Arabic) as well as two categories of sensitivity (everyday / safe language & political). The specific content in these messages was chosen at random and we intend to create a more comprehensive dataset in future experiments. Suggestions on this are welcome!

 Note: The Arabic message content categorized as ‘safe’ was incorrectly entered, and while there were Arabic characters, they did not form sensible words. As the intent of ‘safe’ messages is to approximate a type of message that is non-discriminatory and not filtered by carriers, we determined that this error should not be enough to adversely affect our results.

Logs & Latency

After completion of each test session, we retrieved log data from each of our test devices and compiled them into two data tables: sent and recv. Message latency was computed as the simple difference between the receipt and sent timestamps. There were certainly irregularities in the message latencies we saw in this initial round of tests. Many of the computed latencies actually appeared to be negative. Some hypotheses for this type of discrepancy include:

  • Potential differences in network time between carriers

  • SMSTester app only being used on sent device, not recipient

  • Possible user error, including misconfigured devices

For the purposes of this initial, experimental trial run, we normalized each dataset to remove negative latencies. The methods we used for dataset normalization were relatively straightforward, as we chose to err on the side of simplicity for this initial run: we uniformly increased the message latency across a test set (~56 messages) by the absolute value of the most negative latency. In other words, the most negatively latent message in a test set is normalized to a delivery time of roughly zero.
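The normalization step described above can be sketched as follows (a simplified illustration, not the exact script we used):

```python
# Shift every latency in a test set up by the absolute value of the most
# negative latency, so the most negatively latent message lands at zero.

def normalize_latencies(latencies_ms):
    shift = -min(latencies_ms)
    if shift <= 0:
        return list(latencies_ms)  # no negative latencies; nothing to do
    return [lat + shift for lat in latencies_ms]

raw = [-3000, 1200, 45000, -500]
print(normalize_latencies(raw))  # [0, 4200, 48000, 2500]
```

Note that this preserves latency differences within a test set but says nothing about absolute delivery times, which is acceptable for comparing messages against each other.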

Note 1: One major improvement to SMSTester that we’ll be making in the next iteration is the inclusion of a unique, sequential message ID within each message itself. This will allow us to more easily track and report on message delays that result in messages being received in an incorrect order.

Note 2: In addition to a unique sequential ID for each message, including a timestamp in sent messages will allow the receiving handset to automatically compute latency based on network time, and make the job of retrieving usable logs much easier. This will hopefully even allow us to do some simple calculation and graphing of message latency on-device.
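A sketch of what embedding a sequence number and send timestamp in the payload could look like (the "seq|timestamp|text" framing is purely our illustration; the next SMSTester release may use a different format):

```python
import time

def make_payload(seq, text, sent_ms=None):
    """Prefix the message text with a sequence number and send timestamp."""
    if sent_ms is None:
        sent_ms = int(time.time() * 1000)
    return f"{seq}|{sent_ms}|{text}"

def parse_payload(payload):
    seq, sent_ms, text = payload.split("|", 2)  # keep any '|' inside the text
    return int(seq), int(sent_ms), text

def receiver_latency_ms(payload, recv_ms):
    """Latency as computed on the receiving handset from network time."""
    _, sent_ms, _ = parse_payload(payload)
    return recv_ms - sent_ms

p = make_payload(7, "hello", sent_ms=1000)
print(parse_payload(p))              # (7, 1000, 'hello')
print(receiver_latency_ms(p, 1350))  # 350
```

Out-of-order delivery then becomes easy to detect: any received sequence number smaller than the previous one indicates reordering.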

Comments

"each of the three messages that were delivered out of order fell into this category. Similar to the above conclusion, this may be indicative of specific filtering on behalf of Etisalat."

Now, the above statement could be correct, or it could simply be a very common scenario seen with signaling issues in SMSCs in general.

There is (to my knowledge) no decent system for filtering SMS, really, as like any form of filtering it needs human analysts to continually check what is being filtered; this is mainly done to prevent "false negatives" (passage of traffic that should be blocked) and "false positives" (blocked legitimate messages).

Analysis engines take milliseconds, but milliseconds are not really accurate in measuring SMS delivery, as SMS uses non-real-time protocols, involves air interfaces, etc.

But all said, I think this is a very interesting effort! :)

Make your data available?

Are you willing to make your data publicly available, together with an exact account of your testing methods (i.e. keyword lists)? Without that, no one can verify your findings.

Absolutely

Yes, we are - especially the keyword list! We are keen on seeing this replicated with the improved SMSTester, though, to get a larger data set and to see whether the patterns hold across carriers with more data points.

I'll post a link as soon as the data is up.

Katrin
