Digital Green Certificates: Security analysis not included

Every EU member state is rushing to implement Digital Green Certificates by the end of June, yet no one is stopping to look at their security.

Digital Green Certificates are a European solution to the problem of free movement during the COVID pandemic. The idea is that when traveling to another EU country, you won’t have to mess about with whatever paper confirmation of vaccination you got at the vaccination site; instead, you will be able to present a standardized and interoperable vaccination or test certificate that the authorities of each member state can validate. The need for interoperability is significant, and thanks to the European Commission, the opportunity to standardize on one format was taken. In this post, I look at the design of Digital Green Certificates from a security perspective and outline several security issues.

Digital Green Certificates

The Digital Green Certificate (DGC) is digital proof that a person has been vaccinated against COVID-19, received a negative test result, or recovered from COVID-19. It is valid in all EU countries, and even though it has the word digital in its name, it can take the form of a paper or a digital certificate¹. In the end, it is just a QR code that carries the vaccination/test/recovery data, signed by an issuing body from some member state. This is supported by a public key infrastructure, similar to the one used in e-passports, that will be centrally operated by the EU (DIGIT). The QR code contains data such as the name of the holder, date of birth, type of the certificate, and the respective certificate data (e.g., date of vaccination, vaccine name). Contrary to many claims in the media (one even from the Slovak government agency implementing the DGC apps²), the data in the QR code can be read by anyone and its confidentiality is not protected.
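To make the "readable by anyone" point concrete: as far as I understand the hcert specification, the QR code text is a prefix (`HC1:`) followed by Base45 text, which wraps a zlib-compressed, CBOR-encoded COSE message. None of these layers involves a secret. Below is a minimal sketch of the first step only, a Base45 decoder following RFC 9285; the function name `base45Decode` is my own, and the remaining steps (inflating and parsing the CBOR) are left out.

```javascript
// Base45 alphabet (the QR alphanumeric character set), as in RFC 9285.
const BASE45_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:";

// Decode a Base45 string into bytes. No key or secret is involved.
function base45Decode(text) {
    const bytes = [];
    for (let i = 0; i < text.length; i += 3) {
        const chunk = text.slice(i, i + 3);
        if (chunk.length === 1) {
            throw new Error("invalid Base45 length");
        }
        // Each character is one base-45 digit, least significant digit first.
        let value = 0;
        for (let j = chunk.length - 1; j >= 0; j--) {
            const digit = BASE45_ALPHABET.indexOf(chunk[j]);
            if (digit < 0) {
                throw new Error("invalid Base45 character: " + chunk[j]);
            }
            value = value * 45 + digit;
        }
        if (chunk.length === 3) {
            // Three characters encode two bytes.
            if (value > 0xffff) throw new Error("invalid Base45 triplet");
            bytes.push(value >> 8, value & 0xff);
        } else {
            // A trailing pair of characters encodes a single byte.
            if (value > 0xff) throw new Error("invalid Base45 pair");
            bytes.push(value);
        }
    }
    return Uint8Array.from(bytes);
}

// Test vector from RFC 9285.
console.log(new TextDecoder().decode(base45Decode("BB8"))); // prints "AB"
```

Anyone with a QR scanner and a few dozen lines of code like this can get at the payload; the signature protects integrity, not confidentiality.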

The technical specifications can be found here and are published by the eHealth Network, an EU body with members from all member states and Norway. The main technical specifications discussed in this post are the Technical Specifications for Digital Green Certificates.

The European Commission has tasked Deutsche Telekom and SAP with developing reference implementations of all the required components. These can be found under the eu-digital-green-certificates GitHub organization; take particular note of the dgc-overview repository. Some additional specification details and components are under the ehn-dcc-development GitHub organization, specifically in the hcert-spec repository. There is also a Slack space provided by the Linux Foundation Public Health initiative that you can join here.

In the typical usage scenario, a person gets vaccinated and receives a paper or an email document containing the DGC (a QR code). This QR code can be scanned into a DGC wallet app, which stores the person’s certificates. If the person then decides to travel to some EU state, they can present the DGC, using the wallet app or on paper, to a border officer, who will use a verifier app to check that the DGC is valid and check the person’s identity against the data in the DGC. Both the wallet app and the verifier app communicate with a national backend and are developed by each member state.

The TAN factor

Many aspects of the DGC architecture are well specified and binding on implementors. However, the member states are free to implement their parts of the infrastructure (wallet app, verifier app, issuer app, national backend) as they see fit, with only some requirements on interoperability. Volume 4 of the specifications, which concerns the wallet and verifier apps, is not a normative specification but rather a description of the reference implementation, and it is the closest guide one can get to what the nationally developed apps are going to look like.

The wallet app specification describes how a user imports a DGC into the app. This is where a TAN (Transaction Authentication Number) first appears. The TAN is described as a second factor that is generated together with the DGC at issuance and then sent to the user. To import a DGC into the wallet app, the user has to scan the DGC and enter the correct TAN, which is sent to the backend and validated against the TAN stored with the ID of the DGC. The backend allows the import only if the DGC hasn’t been imported before and the TAN is correct. If several incorrect TAN guesses are sent to the backend for a given DGC ID, that ID is blocked and the DGC can no longer be imported into the wallet app. During the import, the app also generates a keypair (of an unspecified type) and sends the public key to the backend, where it is stored if the import succeeds.

User story 2, transferring a Green Certificate to the wallet app
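The import check described above can be sketched as a small in-memory model of the backend. This is only my reading of Volume 4, not the reference implementation: the class name, the status strings and the limit of three attempts are all assumptions (the specification only says "several").

```javascript
// Hypothetical in-memory model of the national backend's import check.
class ImportBackend {
    constructor(maxAttempts = 3) {
        this.maxAttempts = maxAttempts; // assumed limit, not taken from the spec
        this.records = new Map(); // DGC ID -> { tan, failedAttempts, publicKey }
    }

    // Called at issuance: the TAN is generated together with the DGC.
    issue(dgcId, tan) {
        this.records.set(dgcId, { tan, failedAttempts: 0, publicKey: null });
    }

    // Called by the wallet app with the DGC ID, the TAN and a fresh public key.
    importDgc(dgcId, tan, publicKey) {
        const record = this.records.get(dgcId);
        if (!record) return "unknown";
        if (record.failedAttempts >= this.maxAttempts) return "blocked";
        if (record.publicKey !== null) return "already-imported";
        if (tan !== record.tan) {
            record.failedAttempts += 1;
            return record.failedAttempts >= this.maxAttempts ? "blocked" : "wrong-tan";
        }
        // The key is stored, but nothing else in the document ever uses it.
        record.publicKey = publicKey;
        return "imported";
    }
}

const backend = new ImportBackend();
backend.issue("dgc-1", "123456");
console.log(backend.importDgc("dgc-1", "123456", "pk-A")); // imported
console.log(backend.importDgc("dgc-1", "123456", "pk-B")); // already-imported
```

Note that the second call fails even with the correct TAN: one DGC, one device, one import.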

That’s enough description of how the app works; let’s now get to the security issues. To start off, the TAN is not really a second factor. The specification explicitly allows the TAN to be sent to the user alongside the DGC (page 15 of Volume 4):

Second, the TAN could be returned in conjunction with the signed DGC (as shown in the flow diagram below) or sent directly to the user’s phone.

See the issue? If the TAN is simply sent alongside the DGC, it cannot be a second factor. And why is a second factor needed here at all? If I somehow obtain a user’s DGC, I will just print it on paper and not bother with an app that will not let me import it. Or, if I want to make the extra effort, I will build a clone of the app that has no TAN check on import but is otherwise identical. Now you might be thinking: okay, but the legitimate app generates a keypair on import and sends the public key to the backend, so surely having the DGC in a rogue app will not help, because the verifier app checks with the backend that the DGC was imported and somehow challenges the wallet app to prove possession of the corresponding private key. Well, think again. The generated keypair is not mentioned anywhere else in the document, so it doesn’t matter that the rogue app lacks the private key. In fact, it doesn’t even matter that there is a TAN check, as the original paper DGC is still valid and can be used just like the digital one.

Thus, the whole TAN check and public key binding on the backend is a completely useless security measure, and it will remain useless as long as paper and in-app DGCs have to be treated the same. As one cannot discriminate against people with paper DGCs, any attempt at better security properties than those of the paper DGCs is pointless. The inclusion of security measures that provide no additional security is a red flag when looking at any system, as it usually means that the designers did not know why they added them.

In fact, this TAN check and public key binding nonsense introduces properties that make the app less usable and thus the system as a whole less secure. As designed, a DGC can be imported into only one wallet app on one device, and only once. See the issue again? This already rules out the scenario where two parents traveling with children both want their children’s certificates in their wallet apps, as well as similar multi-user scenarios. And what happens if the user uninstalls the app or loses the device they imported their DGCs into? If a DGC is allowed to reside in only one (official) wallet app, then the wallet app provides worse usability and user experience than simply keeping the DGC as a document or image file. This renders any security properties gained from users using the app pointless, and it drives users away from the app and potentially towards malicious apps. It is also an undocumented limitation of the system that the reference wallet app does not warn the user about. The added complexity of DGC recovery after a lost device, which is currently unhandled, is another unnecessary burden introduced by the innocuous-looking TAN check and public key binding.

As the final cherry on the cake of bad security properties, validating the TAN on the backend during DGC import creates the possibility of a DoS attack against DGCs that have not yet been imported and whose ID the attacker knows or can guess. The attack is simple: the attacker sends a few requests with a wrong TAN, as if importing the target DGC in the app (the request doesn’t contain the whole DGC, just its ID, the TAN and the generated public key), and after several tries the backend blocks the DGC from being imported into the official app. As the specification does not require DGC IDs to be unpredictable or unknown to the attacker, this attack is clearly feasible, and nothing stops a member state from issuing DGCs with IDs that simply increment. This missing requirement, with its significant impact on security, is a clear example of the overall state of the DGC specifications with regard to security.

I was not the first to spot and point out some of the issues presented here. The GitHub user jquade posted this issue on the dgc-overview repository on May 2, outlining the pointlessness of the TAN check. It was closed on May 6 with a comment that did not refute any of its points and gave a hand-waving argument for the public key binding, mentioning some future online scenarios.
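To illustrate how little the attacker needs, here is a toy simulation of the lockout. Everything in it is hypothetical: the sequential numeric IDs are the worst case the specification permits rather than something it mandates, and the limit of three attempts is my assumption.

```javascript
// Hypothetical backend state: sequential DGC IDs, none of them imported yet.
const MAX_ATTEMPTS = 3; // assumed; the specification only says "several"
const pending = new Map();
for (let id = 1; id <= 5; id++) {
    pending.set(id, { tan: "secret-" + id, failed: 0 });
}

// An import request as the backend sees it: no DGC, only ID, TAN and public key.
function tryImport(id, tan, publicKey) {
    const entry = pending.get(id);
    if (!entry || entry.failed >= MAX_ATTEMPTS) return false;
    if (tan !== entry.tan) {
        entry.failed += 1;
        return false;
    }
    return true;
}

// The attacker never sees a DGC or a TAN; guessing the IDs is enough.
function blockAll(firstId, lastId) {
    for (let id = firstId; id <= lastId; id++) {
        for (let i = 0; i < MAX_ATTEMPTS; i++) {
            tryImport(id, "wrong", "attacker-key");
        }
    }
}

blockAll(1, 5);
// The legitimate holder of DGC 3 now fails even with the correct TAN.
console.log(tryImport(3, "secret-3", "holder-key")); // false
```

Fifteen unauthenticated requests lock five people out of the official app.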

Security analysis not included

The DGC specification does not even have a clear threat model describing what sorts of attackers it is aimed at. Even basic questions are left unanswered: What security properties does the system claim to have? Is it supposed to stop theft of DGCs, counterfeiting of DGCs, impersonation, …? The only part of the specification explicitly focusing on security is found in Volume 1 on pages 8 and 9. It considers such risks as the signing algorithm (ECDSA) being found weak (!?) but disregards many important risks with regard to the apps by claiming that:

These cannot preemptively be accounted for in this specification but must be identified, analyzed and monitored by the Participants.

How incredibly helpful to find this in place of a proper security design and analysis 🤦.

Conclusions

This post presented several issues with the current specification of the Digital Green Certificates. To address them, I suggest the following:

Drop the TAN and public key binding parts from Volume 4 of the specification as well as from the reference implementation. Doing so decreases the complexity of the design (Keep It Simple Stupid), increases the usability of the app and aligns its security with that of the paper form of DGCs.
Make proper security design and analysis part of the specification, written and reviewed by experts who know what they are doing. Digital contact tracing caught the attention of many researchers: a search on eprint.iacr.org for “contact tracing” gives 33 results, yet it gives 0 for “digital green certificate”. The EU has plenty of great researchers, and an ordinary IT security student should be able to spot issues like the ones mentioned here.
Provide better guidance to the developers in the member states, not only on interoperability but also on their apps, their security, user experience and usability. If this is the state of the specification the developers are working from, the implementations are going to differ and will likely be worse (see ² for an example).

So that this post is not completely negative, I will end on a positive note. The transparency of the DGC implementation process is laudable. The specifications are public and readable, the reference implementations and many of the components used are open-source, and the GitHub repositories are active, with issues and pull requests being handled. Without all of this, this post wouldn’t exist, and one could not even begin to look at the security of a system that is being implemented all across the EU.

¹ https://ec.europa.eu/info/live-work-travel-eu/coronavirus-response/safe-covid-19-vaccines-europeans/eu-digital-covid-certificate_en
² They claimed that only a special “certified” verification app would be able to access the data in the certificate, as the data would be encrypted and the key stored in the verification app.

COVID-19 vaccination notifications in Slovakia

If you are looking for the page for subscribing to notifications about COVID-19 vaccination in Slovakia, you can find it at covid.neuromancer.sk. The page sends email notifications about free COVID-19 vaccination slots and about the moment the vaccination form opens for new population groups. The page uses information from NCZI but is not associated with NCZI or the Ministry of Health in any way. To register for vaccination, use the NCZI form.

Population-wide antigen testing for COVID-19 in Slovakia

Slovakia is planning population-wide antigen testing for COVID-19 and, judging from the press conferences, it does not seem to be doing so in a particularly well-informed way. This post contains an interactive tool for estimating and calculating the error rates of these tests on the population. Calculating how many positive cases the test catches (true positives) or how many non-infected people it declares positive (false positives) requires several parameters. The parameters Population, Participation and Infected are estimates; the estimate of the number of infected in the population (and the motivation for this post) comes from a post by Richard Kollár. The estimate of the test Sensitivity comes from a comparative study by FN Motol. The estimate of the test Specificity is fairly optimistic; most studies put it lower for the planned antigen tests.

The tool is interactive and the parameter estimates can be changed.

Method 1

Inputs: Population, Participation, Infected, Test sensitivity, Test specificity

Outputs: True positive, False positive, True negative, False negative, Missed positive

The code that performs the calculation can be found below (JavaScript) and, as a Jupyter notebook, also on Binder.

// Method 1: population and infected are head counts; participation,
// sensitivity and specificity are fractions between 0 and 1.
function method1(population, participation, infected, sensitivity, specificity) {
    // Calculate the population that will get tested
    const tested_population = population * participation;

    // Calculate the infected among the tested and non-tested
    // Assumption that attendance is uniform among infected and non-infected
    const tested_infected = infected * participation;
    const tested_clean = tested_population - tested_infected;

    // Calculate the true/false and negative/positive from the tested sample,
    // with given sensitivity and specificity
    const true_clean = tested_clean * specificity;
    const false_infected = tested_clean * (1 - specificity);
    const true_infected = tested_infected * sensitivity;
    const false_clean = tested_infected * (1 - sensitivity);

    // Calculate the missed infected
    const missed_infected = infected * (1 - participation);
    return {
        "true_negative": true_clean,
        "false_positive": false_infected,
        "true_positive": true_infected,
        "false_negative": false_clean,
        "missed_positive": missed_infected
    };
}
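A worked example may help show why the specificity estimate matters so much. The sketch below repeats the Method 1 arithmetic with made-up inputs (they are not the post's estimates, just round numbers for illustration) and adds one derived figure that the tool does not show, the positive predictive value.

```javascript
// Hypothetical inputs, chosen only for illustration.
const population = 5000000;
const participation = 0.8;   // fraction of the population that gets tested
const infected = 50000;      // infected people in the whole population
const sensitivity = 0.7;
const specificity = 0.99;

// Same assumption as Method 1: attendance is uniform among infected and non-infected.
const testedInfected = infected * participation;
const testedClean = population * participation - testedInfected;

const truePositive = testedInfected * sensitivity;
const falsePositive = testedClean * (1 - specificity);

// Positive predictive value: the probability that a positive result is correct.
const ppv = truePositive / (truePositive + falsePositive);
console.log(Math.round(truePositive), Math.round(falsePositive), ppv.toFixed(2));
// prints: 28000 39600 0.41
```

With these numbers, a test that is 99% specific still produces more false positives than true positives, simply because the non-infected vastly outnumber the infected.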

Method 2

Inputs: Population, Tested, Test sensitivity, Test specificity, Tested positive

Outputs: Tested infected, Tested clean, Total infected, Total clean, True positive, False positive, True negative, False negative, Missed positive

The code that performs the calculation can be found below (JavaScript) and, as a Jupyter notebook, also on Binder.

// Method 2: population, tested and tested_positive are head counts;
// sensitivity and specificity are fractions between 0 and 1.
function method2(population, tested, tested_positive, sensitivity, specificity) {
    const tested_negative = tested - tested_positive;

    // Calculate the number of infected among the tested
    const tested_infected = (specificity * tested_positive - (1 - specificity) * tested_negative) / (specificity + sensitivity - 1);
    const tested_clean = tested - tested_infected;

    // Assumption that attendance is uniform among infected and non-infected
    const total_infected = (tested_infected / tested) * population;
    const total_clean = (tested_clean / tested) * population;

    // Calculate the missed infected
    const missed_infected = total_infected - tested_infected;

    // Calculate the true/false and negative/positive from the tested sample,
    // with given sensitivity and specificity
    const true_clean = tested_clean * specificity;
    const false_infected = tested_clean * (1 - specificity);
    const true_infected = tested_infected * sensitivity;
    const false_clean = tested_infected * (1 - sensitivity);
    return {
        "tested_infected": tested_infected,
        "tested_clean": tested_clean,
        "total_infected": total_infected,
        "total_clean": total_clean,
        "true_negative": true_clean,
        "false_positive": false_infected,
        "true_positive": true_infected,
        "false_negative": false_clean,
        "missed_positive": missed_infected
    };
}
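The formula for the number of infected among the tested may look like it comes out of nowhere, so here is where it comes from. The expected number of positive results is sensitivity × infected + (1 − specificity) × (tested − infected); solving that for the number of infected gives (positives − (1 − specificity) × tested) / (sensitivity + specificity − 1), which is algebraically the same as the expression in the code above. A small round-trip check, with hypothetical numbers and function names of my own:

```javascript
// Forward direction: expected number of positive results among `tested`
// people, of whom `testedInfected` are actually infected.
function expectedPositives(tested, testedInfected, sensitivity, specificity) {
    const testedClean = tested - testedInfected;
    return testedInfected * sensitivity + testedClean * (1 - specificity);
}

// Inverse direction: solve the equation above for the number of infected.
function estimateInfected(tested, testedPositive, sensitivity, specificity) {
    const denominator = sensitivity + specificity - 1;
    if (denominator === 0) {
        throw new Error("sensitivity + specificity = 1: the test carries no information");
    }
    return (testedPositive - (1 - specificity) * tested) / denominator;
}

// Round trip with hypothetical numbers: 1000 infected among 100000 tested.
const positives = expectedPositives(100000, 1000, 0.7, 0.99);
const recovered = estimateInfected(100000, positives, 0.7, 0.99);
console.log(Math.abs(recovered - 1000) < 1e-6); // true
```

The estimate can come out negative when fewer positives are observed than the specificity alone would predict, which is a hint that the assumed specificity is too low for the data.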

Legend

True positive: A case where an infected person was correctly identified by the test as positive. Symptomatic as well as asymptomatic people are thus isolated from the population, cutting off this branch of disease transmission.
False positive: A case where a person does not have COVID-19 but was falsely identified by the test as positive (they will go through quarantine and may afterwards believe that they have already had COVID-19 and are immune).
True negative: A case where a non-infected person was correctly identified by the test as negative.
False negative: A case where a person has COVID-19 but was falsely identified by the test as negative (and will have extended freedom of movement in public).
Missed positive: A case where a person has COVID-19 but did not take part in the population-wide testing (and will have restricted freedom of movement in public).

Analysis of the Covid19 ZostanZdravy app - Contact-tracing

This post analyzes the Slovak contact-tracing app Covid19 ZostanZdravy, specifically its security and privacy. The app is being developed by volunteers from Sygic, but officially runs under the control of NCZI, the National Health Information Center; the controller and processor of personal data is ÚVZ, the Public Health Authority (see the privacy policy). This analysis was made from publicly available information, which was possible because both the app and its backend server are open-source (the analyzed commits were 400aa52, 2710f09 and f9b9d2c). It shows problems in the current workings of the contact-tracing part of the app and proposes a way to remove them. It is a best-effort analysis done in roughly a day and may contain errors; I am open to feedback and comments.

Privacy

The app does not use any of the known contact-tracing protocols such as DP-3T, PEPP-PT NTK or ROBERT; instead, it uses a custom contact-tracing protocol designed by the developers themselves. This came about because the app was developed before the mentioned contact-tracing protocols appeared.

The contact-tracing protocol used is centralized, is based on BLE (Bluetooth Low Energy) broadcasts with static identifiers, and works roughly as follows:

The user installs the app, which generates a deviceID (a random device UUID) and registers the device with the server, receiving back a profileID (an unsigned integer) that the server assigns incrementally (so the n-th registered user has profileID = n).
While in use, the app then periodically broadcasts the device's profileID over BLE and also listens for and records the profileIDs broadcast by other devices.
The app periodically uploads to the server the list of profileIDs it came into contact with (where the contact lasted more than 5 minutes). The upload is authenticated by the deviceID, which is required for uploading contacts (and for other interactions with the server API) but otherwise never leaves the device. Until recently, the uploaded contact list also contained the time and duration for which the two devices were in contact (one device heard the BLE broadcasts of the other); this has changed, and the app now sends only the day of the contact.
Once a user is confirmed to be infected, the actions of the protocol become unclear, because the open-source backend server only provides an HTTP API, and the administration application that uses this API to administer the whole system is not open-source. Still, something about what happens after a confirmed infection can be learned from the API the backend server provides. This API offers an administrative call that, for a given user (identified by their deviceID and profileID), returns a list of contacts (that is, profileIDs). The call is probably used in the administration to obtain the contacts of an infected user and to send notifications to the users who came into contact with them. It is important to note that this API call also counts a contact reported by only one side: it returns the list of profileIDs that at some point reported seeing the infected user's broadcast profileID for more than 5 minutes. This list is not checked against the contacts reported by the infected user.
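The steps above can be condensed into a small in-memory model of the server side. This is a sketch of my understanding, not the actual backend code: the class and method names are invented, and contact duration, dates and the BLE part are left out entirely.

```javascript
// Hypothetical in-memory model of the server side of the protocol.
class TracingServer {
    constructor() {
        this.nextProfileId = 1;
        this.devices = new Map(); // deviceID -> profileID
        this.reports = new Map(); // reporter profileID -> Set of seen profileIDs
    }

    // Registration: profile IDs are handed out incrementally.
    register(deviceId) {
        const profileId = this.nextProfileId++;
        this.devices.set(deviceId, profileId);
        this.reports.set(profileId, new Set());
        return profileId;
    }

    // Upload of observed contacts, authenticated by the secret deviceID.
    uploadContacts(deviceId, seenProfileIds) {
        const reporter = this.devices.get(deviceId);
        if (reporter === undefined) throw new Error("unknown device");
        for (const seen of seenProfileIds) {
            this.reports.get(reporter).add(seen);
        }
    }

    // Administrative lookup: everyone who reported seeing `profileId`.
    // The infected user's own uploads are not consulted.
    contactsOf(profileId) {
        const result = [];
        for (const [reporter, seen] of this.reports) {
            if (seen.has(profileId)) result.push(reporter);
        }
        return result;
    }
}

const server = new TracingServer();
const alice = server.register("uuid-alice"); // profileID 1
const bob = server.register("uuid-bob");     // profileID 2
server.uploadContacts("uuid-bob", [alice]);  // only Bob reports the contact
console.log(server.contactsOf(alice));       // [ 2 ]
```

Alice never uploaded anything, yet Bob appears as her contact; the lookup trusts whatever the reporting side claims.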

This approach hands the central server the whole contact network of all users, regardless of whether a given user is infected. Such a contact graph, even though pseudonymous, carries a considerable amount of information about the users (see for example this document, section 4).

Notification of contact with an infected user

As described above, it is likely that notifying the contacts of an infected user relies on contacts reported by only one side: when all contacts of user X are requested from the server, the server searches the contacts of all users and finds those who reported seeing the given profileID (see the code). Notifying contacts this way can make sense if one has to account for some devices being offline for a while and not uploading their contacts. Requiring uploads from both sides could also create a problem with false negatives due to the unreliability of Bluetooth communication. However, this implicit trust in the contacts uploaded by a user's device, together with the way profileIDs are generated (as a monotonically incrementing sequence of numbers, see the code here and here), enables an attack in which the attacker learns the infection status of all users.

The attacker registers a new profile and thereby obtains their profileID. As profileIDs are generated incrementally, the attacker can now enumerate all previously registered profileIDs.
The attacker then creates a new profile for each of the previously registered profiles.
The attacker then periodically reports, from each of their profiles, a contact with exactly one registered profile: a contact with profile #1 on the attacker's profile #1, a contact with profile #2 on the attacker's profile #2, and so on.
If an infection is subsequently confirmed for any user registered before the attacker's first profile, that user's contact list will always contain exactly one of the attacker's profiles, so the attacker receives a notification of contact with an infected user on exactly one profile, which indicates which user was infected.
The attack can be extended to a complete deanonymization of the infected user by placing passive BLE devices in public places together with a camera aimed at the spot, and then correlating the overheard profileIDs from the BLE broadcasts with the camera footage (see here). This data collection can also be done before the attack itself or before the user becomes infected.
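The first four steps of the attack fit in a few lines when run against a toy stand-in for the server. All names here are hypothetical, and the model keeps only what the attack relies on: incremental profileIDs and one-sided contact reports.

```javascript
// Minimal stand-in for the server (hypothetical names and structure).
let nextProfileId = 1;
const reports = []; // entries of the form { reporter, seen }

function register() {
    return nextProfileId++;
}
function reportContact(reporter, seen) {
    reports.push({ reporter, seen });
}
// Profiles that get notified when `infected` is confirmed: one-sided lookup.
function notifiedProfiles(infected) {
    return reports.filter((r) => r.seen === infected).map((r) => r.reporter);
}

// Three honest users register first and receive profileIDs 1, 2 and 3.
const victims = [register(), register(), register()];

// Step 1: the attacker's first profile reveals how many profiles exist.
const firstAttackerId = register(); // 4

// Steps 2 and 3: one extra attacker profile per victim, each reporting
// a contact with exactly one victim.
const watcherFor = new Map(); // attacker profileID -> victim profileID
for (let victim = 1; victim < firstAttackerId; victim++) {
    const watcher = register();
    reportContact(watcher, victim);
    watcherFor.set(watcher, victim);
}

// Step 4: user 2 is confirmed infected and the server notifies the "contacts".
const notified = notifiedProfiles(2);
const revealed = notified
    .filter((profile) => watcherFor.has(profile))
    .map((watcher) => watcherFor.get(watcher));
console.log(revealed); // [ 2 ]
```

The attacker never has to be near anyone; registering profiles and uploading fabricated contacts is all it takes.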

If, however, implicit trust were placed in the other side of the contact, so that looking up the contacts of user X trusted the contacts uploaded by that user, a different attack would be possible, allowing the attacker to mark all users as having been in contact with an infected person.

The attacker registers a new profile and receives a profileID. As profileIDs are generated incrementally, the attacker can now enumerate all previously registered profileIDs. The attacker cannot, however, send messages to the server on behalf of other devices, for example to report contacts for them, because these messages require the deviceID, which is a random UUID with enough entropy.
The attacker can report the profileIDs of all users as contacts, daily for some period.
The attacker can hand over their profile/device data to a cooperating person who is likely infected. This person gets tested and receives a positive result and a confirmation of infection. The attacker then uses this confirmation to report their own profile as infected in the app, which immediately marks all users as having been in contact with an infected person (the attacker's profile).

Modifying the system to require a contact to be reported by both sides may look like a simple fix, but it brings the false-negative problems mentioned above, caused by devices going offline or by Bluetooth issues (when only one device saw enough BLE broadcasts of the other to report a contact). A system with predictable and static identifiers, like the current one, will probably never be free of similar attacks.

Using a custom contact-tracing protocol, as the system currently does, is a security risk even if the mentioned attack were fixed, because a proper specification and security analysis are needed to design such protocols. Both can be had by adopting an established protocol such as DP-3T. As the mantra of the cryptographic community goes: Don't roll your own crypto!

Reproducibility and deployment

The three components of the system, the Android app, the iOS app and the backend server, are fully open-source, which is good from the perspective of analysis and is also the absolute minimum of what any contact-tracing system should be.

However, transparency is lacking in how the app is built and released: it is unclear which versions of the code currently run on the server, or which versions are in the apps published on the Google and Apple app stores. The Android app repository does not contain its complete configuration, so it is not possible to build it locally such that it is identical to the APK published on the Google Play store.

Reproducible builds are important to ensure that the code can be analyzed and that the conclusions of this code analysis carry over directly to the released app. Without reproducibility, the app has to be decompiled and the decompiled code analyzed; with reproducible builds, all that is needed is an analysis of the open code, a build and a comparison of hashes.

Specification and documentation

The system has no specification, whether of the contact-tracing protocol, the backend API or any other component. Without a detailed specification of all components of the system and their behavior, a proper analysis of the system is very difficult, if not impossible. This can be seen in my statements above, where an unavailable component of the system, the administration application, makes decisions that affect whether and how an attack works. Without a specification, which should have been written before the implementation, the existence of further vulnerabilities in the system cannot easily be ruled out, yet they are harder to find and fix.

The components of the system also lack documentation, apart from a lone README file here and there. Well-documented components would make a security analysis of the system easier and would also help new people and developers contribute code to the project.

Tests

The Android app contains no tests; the iOS app contains a test directory with no tests. The server is the only component with any tests, and even it contains just a few tests for the push-notification service, the SMS service and a couple of unit tests for the core repository. The absence of tests is a significant problem for such a sensitive application, as bugs are more likely in code that has no tests.

Calibration and practical testing

The applicability of this system to contact tracing has not been practically tested in the real world. Such testing is necessary for a proper calibration of what an epidemiologically significant encounter is and of what such an encounter looks like in terms of BLE broadcasts. Modern devices have strong transmitting and receiving capabilities; if any sequence of received BLE messages longer than 5 minutes counts as a contact (which is how the app currently works), the number of false positives will probably be high.

The DP-3T team is currently working on calibration and practical real-world testing using an app built on their decentralized contact-tracing protocol, and is doing so before the release of their app in Switzerland (see here and here).

Other systems

Compared with the current development and contact-tracing plans of various countries, this app is clearly the least privacy-preserving, because of the issues mentioned (full contact network on the server, attacks, static and predictable IDs).

The DP-3T project presents a decentralized contact-tracing protocol that preserves the privacy of users, gives strong security guarantees, has a detailed specification, a published SDK and open source code, and has undergone a proper security analysis. It is also backed by a large group of security & privacy researchers. This protocol will be used in Switzerland (app). There is also an effort to design systems for the interoperability of contact-tracing protocols, which includes DP-3T (here).

The situation in the UK is certainly worse from a privacy perspective than in Switzerland: NHSX/NCSC recently published a specification for their own centralized contact-tracing system, which does not preserve the privacy of users (see here for an analysis by Martin Albrecht and here for an analysis by Kenny Paterson).

Since the first contact-tracing protocols were created, several public statements have been published by hundreds of European and international scientists and researchers, mostly from the security & privacy field, calling for responsible approaches to contact tracing that preserve people's privacy. They are available here and here. These statements support a decentralized approach to contact tracing such as DP-3T and clearly reject the centralized approach used by the Covid19 ZostanZdravy app.

Závery a odporúčania

Myslím si že aplikácia, v stave v ktorom momentálne je, predstavuje značné riziko z pohľadu súkromia jej používateľov. Nasledujúci zoznam sumarizuje prezentované problémy:

Aplikácia odhaľuje úplnú sieť kontaktov všetkých používateľov centrálnemu serveru.
Aplikácia používa statické a predikovateľné uživateľské identifikátory.
Aplikácia umožňuje útok pri ktorom útočník získa informáciu o infikovanosti všetkých používateľov.
Aplikácia neumožňuje reproducibilitu kompilácie a teda korešpondencia medzi zverejneným kódom a vydanou aplikáciou nieje jednoducho dokázateľná.
Apkilácia nemá špecifikáciu a dokumentáciu.
Aplikácia nemá skoro žiadne testy.
Neprebehla žiadna verejná bezpečnostná analýza contact-tracing protokolu alebo jednotlivých aplikácii.
Neprebehla žiadna kalibrácia alebo praktické testovanie v reálnom svete.

When comparing the app with the principles proposed in the Joint Statement on Contact Tracing, the app fails three of the four points.

"Contact tracing Apps must only be used to support public health measures for the containment of COVID-19. The system must not be capable of collecting, processing, or transmitting any more data than what is necessary to achieve this purpose." The app collects the full contact network of every user, which is not necessary.
"Any considered solution must be fully transparent. The protocols and their implementations, including any sub-components provided by companies, must be available for public analysis. The processed data and if, how, where, and for how long they are stored must be documented unambiguously. Such data collected should be minimal for the given purpose." The data collected by the app is not minimal.
"When multiple possible options to implement a certain component or functionality of the app exist, then the most privacy-preserving option must be chosen. Deviations from this principle are only permissible if this is necessary to achieve the purpose of the app more effectively, and must be clearly justified with sunset provisions." The contact-tracing protocol that was implemented clearly does not preserve privacy to the greatest possible extent.
"The use of contact tracing Apps and the systems that support them must be voluntary, used with the explicit consent of the user and the systems must be designed to be able to be switched off, and all data deleted, when the current crisis is over." The app is voluntary.

I would like to stress that an analysis similar to this one should have taken place long before the app reached its current levels of popularity and development. One way to remove some of the problems mentioned is to move the app to the DP-3T contact-tracing protocol, which has implementations available for both Android and iOS and has passed significant security analysis. This step would remove the problems stemming from the contact-tracing protocol used, but it would also help with other problems, as the need for a full specification would be lower, the code requiring documentation would be simpler and there would be less code left to test. The problems with calibration and real-world testing would also be removed thanks to the ongoing testing by the DP-3T team.

One practical problem that I did not mention, as it is unrelated to security or privacy, is the problem with Bluetooth broadcasting on iOS devices when the app is in the background. This problem would also be solved by using DP-3T, since DP-3T plans to use the Apple contact-tracing API in its iOS SDK when it becomes available.

This post analyzes the Slovak contact-tracing app Covid19 ZostanZdravy from a security and privacy perspective. The app is being developed by volunteers from Sygic, but is officially running under the control of NCZI, the National Health Information Center, with data ownership by UVZ, the Public Health Authority of Slovakia (see the privacy policy). This analysis was performed from publicly available sources, which was possible as both the app and backend are open-source (the analyzed commits were 400aa52, 2710f09 and f9b9d2c). The text below describes the issues I see in the current workings of the contact-tracing part of the app and provides an outlook on fixing them and moving forward. The analysis is a best-effort one done in a day. It might contain errors, or I might have misrepresented something, and I am open to comments.

Privacy

The app does not use an established contact-tracing protocol, such as DP-3T, PEPP-PT NTK or ROBERT, but instead uses a custom-designed protocol to perform contact-tracing. This is because the app predates those protocols by a few weeks. The protocol is BLE-based, uses static IDs, and roughly works as follows:

The user installs the app, which generates a deviceID, a random UUID of the device. The app enrolls this device with the server and receives back a profileID, which is an unsigned integer assigned in an increasing sequence by the server.
The app then broadcasts the profileID of the device on BLE and listens for the profileIDs broadcast by other devices.
The app then periodically uploads a list of seen profileIDs to the server. This upload, like all of the app's interaction with the server, is authenticated by the deviceID, which is sent to the server in every request and otherwise kept on the device. The uploaded list of contacts used to contain the time and duration of the contacts, but this was abandoned and now only the day of contact is uploaded.
When the user becomes infected, the actions of the protocol become unclear. The open-source backend is just an HTTP API, and the administration of the whole system is done through an admin app that interacts with the backend but is not open-source. However, something can be deduced from the API offered by the backend, as it offers one administrative call to query the seen profileIDs for a given device (identified by both the deviceID and profileID). This call is likely used by the admin app to query the contacts of a newly infected user and send alerts/quarantine recommendations to them. It is important to note that this call reports one-sided contacts as reported by the users.

This approach clearly provides the whole contact graph of a user's device to the server, whether the user is infected or not. Such a contact graph, while it is pseudonymous, leaks significant private information about the users to the server (see this document, section 4).
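To make the privacy consequence concrete, here is a minimal sketch of the server-side state this protocol implies. The `ToyServer` class and its method names are hypothetical and only model the steps described above (sequential profileIDs, deviceID-authenticated uploads, day-granularity contacts); they are not the real backend API.

```python
import uuid
from collections import defaultdict


class ToyServer:
    """Hypothetical in-memory stand-in for the backend, not the real API."""

    def __init__(self):
        self.next_profile_id = 1
        self.device_to_profile = {}
        # reporter profileID -> set of (seen profileID, day of contact)
        self.seen = defaultdict(set)

    def enroll(self, device_id):
        # profileIDs are handed out as an increasing integer sequence.
        profile_id = self.next_profile_id
        self.next_profile_id += 1
        self.device_to_profile[device_id] = profile_id
        return profile_id

    def upload_contacts(self, device_id, contacts):
        # The deviceID authenticates the upload and maps to a profileID.
        reporter = self.device_to_profile[device_id]
        self.seen[reporter].update(contacts)

    def contact_graph(self):
        # Everything needed to rebuild the full graph is already stored,
        # regardless of whether anyone is infected.
        edges = set()
        for reporter, contacts in self.seen.items():
            for other, day in contacts:
                edges.add((min(reporter, other), max(reporter, other), day))
        return edges


server = ToyServer()
devices = [str(uuid.uuid4()) for _ in range(3)]
profiles = [server.enroll(d) for d in devices]
server.upload_contacts(devices[0], [(profiles[1], "2020-04-20")])
server.upload_contacts(devices[2], [(profiles[1], "2020-04-21")])
graph = sorted(server.contact_graph())
print(graph)
```

In this toy run the server learns that profile 2 met profiles 1 and 3 on consecutive days, even though profile 2 never uploaded anything and nobody is infected.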

Contact reporting

As described above, it is likely that the reporting of contacts of an infected user uses only one-sided contact reports, i.e. when querying the contacts of a user X, the contacts uploaded by all users are searched for X's profileID (see the code). This might make sense if one accounts for the possibility of some devices going offline and not uploading their contacts. If contact reports from both parties were necessary to report a contact, this might pose problems. However, this implicit trust in users' reported contacts, together with the way profileIDs are assigned (an unsigned increasing integer sequence, see the code here and here), creates an attack on the system in which an attacker can get the infection status of all of the users.

The attacker first creates a new profile and receives back its profileID. As profileIDs are generated incrementally, the attacker can now enumerate all previously registered profileIDs.
The attacker creates one new profile for each of the existing user profiles.
The attacker then reports a contact from each of their profiles with exactly one of the legitimate user profiles, i.e. attacker's profile #1 reports contact with user profile #1, attacker's profile #2 reports contact with user profile #2, and so on.
When any user registered before the attacker's first profile is confirmed infected, the query for their contacts will always include the one attacker profile paired with them, and the attacker will get a notification of being in contact with an infected user.
There is also the possibility of extending this attack to complete deanonymization of an infected user, by placing BLE listening devices in particular public places, together with a camera capturing the area, and then correlating the captured broadcasts with the camera view of the area (see here). This data collection can be performed even before the attack itself or before the user's infection.
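The enumeration attack can be simulated against a toy model. The sketch below assumes, as deduced above, that the contact query for an infected profile returns every profile that reported seeing it. `ToyServer`, `reporters_of` and the other names are hypothetical illustrations, not the real backend.

```python
from collections import defaultdict


class ToyServer:
    """Hypothetical model: trusts whatever contacts each profile reports."""

    def __init__(self):
        self.next_profile_id = 1
        self.seen = defaultdict(set)  # reporter profileID -> seen profileIDs

    def enroll(self):
        profile_id = self.next_profile_id
        self.next_profile_id += 1
        return profile_id

    def report(self, reporter, seen_profile):
        self.seen[reporter].add(seen_profile)

    def reporters_of(self, infected):
        # Assumed admin query: who claims to have seen the infected profile?
        return sorted(r for r, s in self.seen.items() if infected in s)


server = ToyServer()
victims = [server.enroll() for _ in range(5)]  # legitimate users, IDs 1..5

# Step 1: one enrollment reveals how many profiles exist already.
probe = server.enroll()
targets = range(1, probe)

# Steps 2 and 3: one attacker profile per target, each reporting one contact.
target_of = {}
for target in targets:
    sybil = server.enroll()
    server.report(sybil, target)
    target_of[sybil] = target

# Step 4: user 3 is confirmed infected, so the paired attacker profile is alerted.
alerted = server.reporters_of(3)
identified = [target_of[p] for p in alerted if p in target_of]
print(identified)
```

Because each attacker profile is paired with exactly one victim, a single alert is enough to tell the attacker which profileID was confirmed infected.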

If, however, the implicit trust were one-sided the other way, i.e. querying the contacts of a user X trusted the contacts X reported, a different attack would be possible, one that would mark all users as having had contact with an infected person.

The attack would work as follows:

The attacker registers a profile with the server and receives back its profileID. As profileIDs are generated incrementally, the attacker can now enumerate all previously registered profileIDs. They cannot, however, spoof messages to the API as the users with those profileIDs, because deviceIDs are required for that, and those are random UUIDs that contain enough entropy.
The attacker can, however, report any and all profileIDs in use to the server as contacts, possibly daily for some period of time.
The attacker can now give the account details (or a device holding them) to a cooperating person who is likely infected, who will get tested and obtain a confirmation of infection from a health authority. The person then confirms their infection with the attacker's account details, which immediately marks all of the users in the system as exposed to an infected person.
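A toy model of this variant, assuming the server trusts the contact list uploaded by the infected profile itself, is sketched here. All names (`ToyServer`, `contacts_reported_by`) are hypothetical and only illustrate the steps above.

```python
from collections import defaultdict


class ToyServer:
    """Hypothetical model: trusts the infected profile's own contact list."""

    def __init__(self):
        self.next_profile_id = 1
        self.seen = defaultdict(set)  # reporter profileID -> seen profileIDs

    def enroll(self):
        profile_id = self.next_profile_id
        self.next_profile_id += 1
        return profile_id

    def report(self, reporter, seen_profile):
        self.seen[reporter].add(seen_profile)

    def contacts_reported_by(self, infected):
        # Assumed admin query: whom did the infected profile claim to see?
        return sorted(self.seen[infected])


server = ToyServer()
users = [server.enroll() for _ in range(5)]  # legitimate users, IDs 1..5

# The attacker enrolls once and learns the range of existing profileIDs.
attacker = server.enroll()

# The attacker reports every earlier profileID as a contact.
for profile_id in range(1, attacker):
    server.report(attacker, profile_id)

# The cooperating infected person confirms infection on the attacker's account.
exposed = server.contacts_reported_by(attacker)
print(exposed)
```

In this model every previously registered user ends up on the exposure list after a single confirmed infection.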

Modifying the system to rely on both sides of an encounter to report it might seem like an easy fix. However, that brings the aforementioned issues of false negatives created by devices going offline, or by devices with different Bluetooth signal strength (where only one device saw enough broadcasts of the other device to report a contact), and so on. The current system with predictable and static user IDs will likely always suffer from similar attacks.

Using a custom contact-tracing protocol, as the system does, is a security risk even if the above attack is fixed, as proper specification and security analysis is necessary to get it right. One can get both of those by using an established protocol such as DP-3T. As the cryptography community mantra rightfully states, Don't roll your own crypto!

Build reproducibility and deployment

The three components of the system, the Android app, the iOS app and the backend server, are all open-source, which is quite nice from an analysis perspective and also the bare minimum a contact-tracing system should offer.

There is however no transparency over the build and deployment process, e.g. what versions of code actually run on the server, or are provided in the respective app stores. The Android app does not contain the full configuration and it is thus not possible to build it reproducibly such that the built APK matches the app store APK perfectly.

Having build reproducibility for a privacy-sensitive app is important to ensure that the code can be analyzed and that arguments from this code analysis apply to the deployed app. It also makes decompilation and analysis of deployed apps unnecessary, since comparing the app's hash is enough.
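As a rough illustration of the hash comparison, the sketch below hashes two artifacts and checks whether they are byte-identical. The helper names are hypothetical, and the temporary files merely stand in for a locally built APK and the one downloaded from the store. A real comparison would also have to account for signing data added at release time, which this sketch ignores.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(local_build, store_download):
    """True when both files are byte-for-byte identical."""
    return sha256_of(local_build) == sha256_of(store_download)


# Demo with stand-in files (assumption: real inputs would be APK files).
workdir = tempfile.mkdtemp()
local_apk = os.path.join(workdir, "local.apk")
store_apk = os.path.join(workdir, "store.apk")
other_apk = os.path.join(workdir, "other.apk")
for path, content in [(local_apk, b"build-1"), (store_apk, b"build-1"),
                      (other_apk, b"build-2")]:
    with open(path, "wb") as f:
        f.write(content)

matches = same_artifact(local_apk, store_apk)
differs = same_artifact(local_apk, other_apk)
print(matches, differs)
```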

Specification and documentation

The system lacks any proper specification of the contact-tracing protocol, the backend API or really any component. Without a detailed specification of all of the system's components and their responsibilities and behavior, proper analysis is resource-intensive if not impossible. This can be seen from my statements about the attacks above, where an unavailable component of the system, the admin app, makes decisions that influence how and if an attack would work. Without this specification, which should have been created before implementation took place, more vulnerabilities in the system cannot be ruled out, and they will remain harder to find and fix.

The components also lack documentation, apart from a README here and there. Having properly documented components would make security analysis of the system easier, as well as help new contributors to contribute to the project.

Tests

The Android app contains no tests at all, and the iOS app contains a test directory that contains no tests. The server is the only component with any tests: a few for the push-notification service and the SMS messaging service, and a few unit tests for the core repository. This absence of tests is a serious issue for a privacy-sensitive app, as the likelihood of errors in code with absolutely no tests is high.

Calibration and real-world testing

The contact-tracing capabilities of the app have not been properly tested in the real world, to the best of my knowledge. Such testing is necessary for proper calibration of what an epidemiologically significant encounter is and how it manifests in the BLE broadcasts. Modern devices have strong capabilities to both send and receive the broadcasts. If any sequence of correctly received broadcasts longer than 5 minutes is counted as an encounter (as currently done in the app), the number of false positives would likely be quite high.
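A duration-only rule of this kind can be sketched as follows. This is not the app's actual code: the function name, the `max_gap` parameter (how far apart two received broadcasts may be while still counting as one sequence) and the sample data are assumptions made for illustration.

```python
def encounters(timestamps, min_duration=300, max_gap=30):
    """Group sighting times (in seconds) into runs and keep the long ones.

    A run is a sequence of sightings where consecutive sightings are at
    most max_gap seconds apart. A run counts as an encounter when it
    lasts longer than min_duration seconds. Signal strength is ignored.
    """
    runs = []
    start = prev = None
    for t in sorted(timestamps):
        if start is None:
            start = prev = t
        elif t - prev <= max_gap:
            prev = t
        else:
            if prev - start > min_duration:
                runs.append((start, prev))
            start = prev = t
    if start is not None and prev - start > min_duration:
        runs.append((start, prev))
    return runs


# One device heard every 10 s for almost 6 minutes, then briefly again later.
sightings = list(range(0, 360, 10)) + [1000, 1010]
result = encounters(sightings)
print(result)
```

The rule never looks at how strong the received signal was, so a device heard faintly through a wall for six minutes counts the same as a conversation at arm's length. Real-world calibration is what would tell these cases apart.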

Calibration and real-world testing is currently being performed by the DP-3T team, using an app built using their decentralized contact-tracing protocol, even before the deployment of the app in Switzerland (see here and here).

Conclusions and recommendations

I believe the app, as it is now, presents a significant risk from a privacy perspective. The following list summarizes the issues presented:

The app reveals the full contact graph of all of its users to the server.
The app uses static and predictable user IDs.
The app allows for an attack in which an attacker gains the infection status of all users.
The app is not built reproducibly and thus correspondence between the deployed apps and the sources cannot be easily confirmed.
The app has no specification or documentation.
The app has almost no tests.
There was no public security analysis of the contact-tracing protocol or the apps.
There was no calibration and real-world testing of the app and system.

When comparing the app to the principles outlined in the Joint Statement on Contact Tracing, the app fails all but one.

"Contact tracing Apps must only be used to support public health measures for the containment of COVID-19. The system must not be capable of collecting, processing, or transmitting any more data than what is necessary to achieve this purpose." The app collects the full contact graph of all users, which is unnecessary.
"Any considered solution must be fully transparent. The protocols and their implementations, including any sub-components provided by companies, must be available for public analysis. The processed data and if, how, where, and for how long they are stored must be documented unambiguously. Such data collected should be minimal for the given purpose." The data collected by the app is not minimal.
"When multiple possible options to implement a certain component or functionality of the app exist, then the most privacy-preserving option must be chosen. Deviations from this principle are only permissible if this is necessary to achieve the purpose of the app more effectively, and must be clearly justified with sunset provisions." The contact-tracing protocol implemented is clearly not the most privacy-preserving, but likely the simplest.
"The use of contact tracing Apps and the systems that support them must be voluntary, used with the explicit consent of the user and the systems must be designed to be able to be switched off, and all data deleted, when the current crisis is over." The app is currently voluntary.

I want to stress that an analysis like this one should have been performed long before the app achieved its current levels of deployment. A way to fix some of the issues above would be to move the app to the DP-3T contact-tracing protocol, which has SDKs available for both Android and iOS, and has passed significant security and privacy analysis. This would fix the privacy and security issues inherent in the protocol used, but also help with other issues, as the need for a full specification would be lower, the code to document would be simpler and there would be less code to test. Calibration and testing issues would also be resolved by the currently ongoing testing by the DP-3T team.

One practical issue that I did not mention, as it does not pertain to security or privacy, is that of Bluetooth broadcast issues on iOS. This would be resolved by using DP-3T as well, since the iOS SDK of DP-3T plans to utilize the Apple provided contact-tracing APIs, when they become available.