Leakage paths for the Apple / Google bluetooth tracing system

Update: see also Moxie Marlinspike’s thread which captures most of this.

Overview of the tracing system
Find out if someone specific is sick
Use stationary beacons to track someone’s travel path
Increased hit rate of stationary / marketing beacons
Leakage of information when someone isn’t sick
Fraud resistance
Conclusions
Appendix: Singapore’s BlueTrace
Appendix: Alternatives

Overview of the tracing system

I read through Apple’s docs for their contact tracing partnership with Google. This article summarizes how it works and possible data leakage paths. This isn’t totally my area, send me corrections if I’m wrong.

The tracing system works using Bluetooth advertise packets which are short radio broadcasts that tell a short piece of information to nearby devices, usually about service availability.

The A/G tracing system populates these packets with a unique identifier called a Rolling Proximity Identifier (henceforth RPI).

To make it hard to track someone over time, the RPIs are opaque and don’t contain any kind of user ID. They’re created using a one-way hash (click the link if you don’t know what that is) from a Daily Tracing Key, or DTK.

The DTK is in turn created via a key derivation function from a root Tracing Key that’s unique to your device (but never leaves it). Possible reason they’re deriving the TKs, rather than randomly generating them daily, is to regenerate the DTK history when you test positive and need to upload them (but you could also just store the history).

To summarize:

TK -> DTK (daily) -> RPI (every 10 minutes)

If someone tests positive, they upload their recent DTKs to a database. Other devices download the last N days of positive keys on some periodic basis, run it against their list of RPIs with timestamps (because DTK + time + 1-way hash = RPI), and get a list of RPIs that are sick.

Update: I should emphasize that the apps which use these APIs don’t have access to the raw observations; they can only query for matches between the daily ID download and the observations and get back a yes / no. A point that I missed until I read this tweet from a University College London digital regulation professor is that this is A/G’s subtle way of making sure that government-sponsored apps on top of this API can’t be used for spying on people en masse. A/G don’t say this out loud, probably wisely.

As far as I know, nobody is talking about making this mandatory: either to participate in tracing or to report when you’re sick. I think that’s a good thing. As with all non-mandatory systems, the most effective legislative path to making them mandatory is to normalize them first, convince a majority, and then make them mandatory later.

Also, there are leakage paths. Read on:

Find out if someone specific is sick

If I’m targeting an individual, I can capture their RPIs pretty easily and get notified that they’re sick.

If I operate an office building, I can pretty easily narrow down an RPI to N people entering a building at once. I don’t know if there are bluetooth scrapers for employers; I’d be shocked if there aren’t. I think this is what the estimote guy Steve Cheney is building this month.

If this continues for a while, and if sickness status is worth any money (the latter is a big if), we’ll see darkweb marketplaces where you can buy an individual’s RPIs.

My point is that even if this is fine for emergencies (another big if), don’t make the mistake of letting it be normalized for non-epidmic times or seasonal flu.

Use stationary beacons to track someone’s travel path

Let’s say I had one iPhone per subway entrance in NYC, just sitting there collecting everyone’s RPIs. When someone tests positive and publishes their keys, I can then track their . I won’t know who they are, but I can at least grab aggregate information about where coronavirus getters travel.

Does this sound like a bridge too far? It isn’t: passive bluetooth observation stations are already ubiquitous, so this isn’t insane.

Increased hit rate of stationary / marketing beacons

If everyone has bluetooth on all the time for health reasons, this is like duck season for companies that already operate consumer surveillance platforms targeting bluetooth / wifi. It’s a bait ball.

These companies aren’t signatory to any special privacy rules that affect this emergency, and in fact have relatively few privacy obligations generally because they don’t have a contract with the owners of the phones they’re targeting.

Not only will their data be much richer, not only can they now merge in people’s epidemiological data, not only do they have an expertise in de-anonymizing bluetooth traces, but the data that they collect now will enrich their database for a long time; understanding what their normal sparse DB looks like at, say, 80% population adherence will allow them to beef up their inference models. And this is a capability that will only slowly decay as consumer behavior and devices switch out.

Update: a reader claims ~half of users already have bluetooth on.

Also bluetooth by itself has been relatively non-private. Historically, there have been lots of ways around bluetooth MAC randomization – it’s sensitive to configuration, issues with refresh token timing. In 2017, researchers claimed to get around it every time when it was even enabled.

Leakage of information when someone isn’t sick

It seems like this isn’t possible given their spec, except:

You still have to phone home once a day to get a list of sick people’s tokens
The system can encourage somebody to go to a hospital and get tested, at which point an institution can collect a DNA sample.

Am I paranoid? Maybe, but if the question is ‘can you use this system to make someone think they need to go to the hospital based on approximate location’, the answer is yes.

Fraud resistance

This isn’t a leakage path, but I’m wondering what stops someone from sending fake positive results that cause overloads of our testing capacity as a low-grade form of terrorism.

Will we offer a ‘signed testing payload’ from labs? Will I share my DTKs with labs? The spec doesn’t say.

Every product that supports anonymous use needs to plan for fraud.

The docs I read don’t say whether the API will be locked down in any way, but I’m guessing that even if it is on ios it won’t be on droid.

Some readers point out that public health agencies will build the reporting piece themselves, reporting is beyond the scope of the spec, and this is just about making enabling changes to platform bluetooth. I think those claims are right but the US government doesn’t have a great track record of rolling out usable health products to large audiences of consumers on a crash basis.

Will it be nationwide? Multiple agencies with separate apps? State by state? These unknowns aren’t flaws in the bluetooth spec, but they reduce the odds of having a system that’s (1) simple to analyze from a privacy perspective and (2) any good. Did a government agency ask for this or is this a spec that A/G have been working on for a while and swept the dust off of.

Conclusions

I think there is information that could help us answer the ‘do we need this’ question:

Are you less likely to transmit the disease outdoors than in close quarters?
Do masks work effectively to prevent transmission by a sick person? If yes, where do they need to be worn?
How important are asymptomatic spreaders as a vector?
How effective is fever as a screening tool?

I’m not a doctor and don’t know the answers to these questions. I think the knowledge-base here is evolving and doctors may not know the answers to these questions yet.

I don’t understand the supply chain / lab capacity questions affecting test availability.

As best I understand the public policy question, it’s ‘when do we open, how do we prevent a huge wave, and how do we prepare for it’.

Given all those unknowns, I shouldn’t express an opinion on ‘do we need this’ and so I won’t.

Separately from questions of necessity, I’ll say that I hate:

All forms of centralized location tracking, mandatory or otherwise, because there is never transparency about what is collected or how it’s used (google has had multiple ‘mea culpas’ over confusing or totally ignored location settings)
Apple and Google collaborating on data collection. I think Apple lovers are saying ‘Apple keeps Google honest here’. I think that’s true in the web standards space, where they’re not collaborating but competing. I think it wasn’t true in the giant silicon valley hiring scandal, where ‘collaboration’ was actually collusion.
Having bluetooth or wifi turned on outdoors – they leak

Separately from the necessity of contact tracing, there are questions about its efficacy. This contact tracing in the real world article is from a cryptography person, not someone who works on the public health / epidemiology systems, but at least raises an important question.

If a public health authority rolls this out and encourages people to use it, we should consider also making it illegal to collect trace data for purposes other than personal testing decisions for the duration of this crisis.

All that said, we should do what we have to to stay healthy.

Appendix: Singapore’s BlueTrace

Singapore acted early to roll out bluetooth-based contact tracing (TraceTogether), which they opened up as bluetrace, whitepaper here. According to a Straits Times article from a sharp reader of this blog, only 1/6 of Singaporeans have installed the app (based on download count, which is an upper bound on usage), and they need 2/3 to bring R0 below 1.

Some takeaways from SG’s whitepaper:

The old ios bluetooth API turned out to be too private i.e. shittier for this:

the iOS version of OpenTrace is bound by restrictions that iOS has on background Bluetooth functionality. … the iOS app advertises in a proprietary advertisement format that is not part of the Bluetooth standard and thus not readable by non-iOS devices. … The current workaround is to encourage iOS users to keep their app in the foreground

They had to do some pretty neat radio calibration (which is open-sourced on their github):

through tests of devices in anechoic chambers, we have established that the variance in transmission power … can be as large as 30 dB (1000x). … transmission power varies little between different devices of the same model … we have taken reference signal strength readings for popular mobile devices in Singapore. We use this to calibrate RSSI readings when classifying encounters by proximity.

BlueTrace has a clearer process role for health authorities, as well as fraud design.

We see various challenges with a purely decentralised contact tracing system. Individuals falsely declaring themselves infected would cause unnecessary anxiety and panic in other users, and erode trust in the system. Some form of authorisation for users to either flag themselves as positive COVID-19 cases, or to upload encounter history, is therefore necessary to protect against abuse.

To protect users and the system from fraudulent uploads, an authorisation code is provided by the health authority and entered through the app in order to obtain a valid token to transmit the logs.

In the US version, this piece will presumably be taken care of by the health agencies that develop the apps. I wonder if the ACA marketplace was the last time a government agency tried to roll out a high-profile web service to everyone at once? Since then, we’ve gotten 18F and USDS – this will be their first real test.

Their encryption is also centralized:

The health authority decrypts the TempID for each encounter in the uploaded encounter history, in order to obtain the UserID and validity period. … The health authority then contacts individuals assessed to have a high likelihood of exposure to the disease, to provide medical guidance and care. … Note that this workflow can be automated and decentralised without affecting interoperability with other BlueTrace implementations. However, we do not recommend this.

the experience of Singapore’s contact tracers suggest that contact tracing should remain a human-fronted process

Appendix: Alternatives

Up front: I’m not promoting any of these and I’m not an expert here.

Thinking purely in terms of technology, if apple enables a ‘background bluetooth’ permission (similar to background location) app developers can do the rest. Lack of background bluetooth was the main reason SG’s TraceTogether was bananas on ios.

Another alternative is check-ins. Check-ins the way foursquare does it aren’t very private (they’re designed for sharing), but you could do it with most of the information kept private on the phone until you test positive, and send down a list of venues rather than individual IDs.

It’s possible to build a check-in system on top of a tracing system by having individual users check into venues and post to a system when they get a disease hit for a contact from that venue. This is less private for users who are checking in but more private for users who consume the ‘illness at venue’ reports.

If concentrated places like offices and subways are the biggest risk (I’m not an epidemiologist and I don’t know if this ‘if’ is a fact), check-ins should help a lot, especially if you provide a way to check in to individual subway cars. (Bluetooth beacons could do that, but now we’re back in always-on-bluetooth land).

Beyond check-ins, I think reducing density on mass transit will matter if this goes on for a while. I’m working on a staggering system to schedule people’s subway usage to cap simultaneous riders per car. (Which wouldn’t be super private). The city did something similar when the WTC opened in the 70s to prevent new office workers from overwhelming the station at Chambers St.