The state of tooling for verifying constant-timeness of cryptographic implementations

This post explores the current state of tools, both static and dynamic, for verifying constant-time properties of cryptographic implementations. These tools are mostly unused in the development of open-source cryptographic libraries and remain only as results of academic work. I know of only four open-source cryptographic libraries that use these tools in an automated manner, similar to how unit tests, test vectors, or even fuzzing are commonplace. Below is a list of what popular open-source cryptographic libraries run in their Continuous Integration (CI) setups, collected on a best-effort basis.

UPDATE: For an updated list of tools, see the following GitHub page: https://crocs-muni.github.io/ct-tools/

  • OpenSSL: Builds, tests, fuzzing (OSS-Fuzz)
  • LibreSSL: Builds, tests, fuzzing (OSS-Fuzz)
  • BoringSSL: Builds, tests, fuzzing (Custom buildbots + OSS-Fuzz), constant-time verification using a ctgrind-like approach
  • BearSSL: No public CI, fuzzing (OSS-Fuzz) (constant-time documentation)
  • Botan: Builds, tests, fuzzing (Travis + OSS-Fuzz), constant-time verification using ctgrind
  • Crypto++: Builds, tests
  • wolfSSL: No public CI, fuzzing (OSS-Fuzz)
  • mbedTLS: Builds, tests, fuzzing (OSS-Fuzz), constant-time verification using ctgrind and MemSan
  • libtomcrypt: Builds, tests
  • libgcrypt: No public CI?
  • libsodium: Builds, tests, fuzzing (OSS-Fuzz)
  • MatrixSSL: No public CI?
  • Amazon s2n: Builds, tests, fuzzing, constant-time verification using ct-verif and SideTrail
  • GnuTLS: Builds, tests, fuzzing (OSS-Fuzz)
  • NSS: Builds, tests, fuzzing (OSS-Fuzz)

Of particular note is the cryptofuzz project, which fuzzes the above (and more) cryptographic libraries as part of OSS-Fuzz.

Evaluation#

The tools presented here are evaluated and categorized based on several characteristics.

The first and most significant criterion is the general approach the tool takes and whether it is dynamic or static, i.e., whether it runs the target or not. Dynamic tools usually instrument the target in some way, observe its runs with varying inputs, and then evaluate whether differences between runs leak via an observable timing side-channel. Static tools, on the other hand, usually use formal techniques from static analysis and program verification to analyze the target and conclude whether it is constant-time with regard to some leakage model.

A second significant criterion that differentiates many of the tools is the input level at which they work. Several tools work on the source code level, most with the C language, some with a custom domain-specific language. Other tools choose to work on a lower level, often in LLVM Intermediate Representation (IR), an assembly-like language in Static Single Assignment (SSA) form used in the LLVM toolchain but also in many open-source program analysis tools. As all of the mentioned input levels are above the assembly/binary level, they are exposed to a level of risk that a compiler will somehow introduce timing leakage, which will not be caught. This risk is entirely realistic, as general-purpose compilers do not offer side-channel resistance and have no obligation to keep the code leakage free. Finally, some tools work directly with compiled binaries and thus provide the highest guarantees that a compiler will not introduce leakage.

The leakage model used and the possibility to configure it forms another essential property of the tools. I consider the leakage model to be what the tool considers to be sources of timing leakage. There are three common leakage models that are often combined in the tools. The branching leakage model considers all branching instructions (program counter changes) conditional on secret values to be leaking. The memory-access leakage model considers all secret dependent memory accesses to be leaking. Lastly, the operand leakage model considers all use of particular instructions with secret dependent operands to be leaking the operands. This leakage model is specific to some processor architectures and instructions which take a variable time to execute, see, for example, the variable time multiplier on ARM (ARM7TDMI Technical Reference Manual). The three mentioned leakage models are usually used together. Both the branching and memory-access leakage models have a version that modifies them such that the existence of processor cache is properly modeled. For example, this allows for code that accesses memory based on a secret value, but only in the space of one cache line (sometimes as a result of applying a cache preloading countermeasure), thus a cache attacker gains no secret information. The above models, especially the operand one, have drawbacks as they require assumptions on hardware behavior or modeling of hardware, and are hardware-specific. Dynamic tools might consider a completely different leakage model, for example, the attacker might only learn the number of instructions that were executed during a run of a function, or sometimes its runtime. This model is applicable for remote or local - but not cache - timing attackers.
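
As a rough illustration of the branching and memory-access models, the sketch below writes the same two-way selection in three ways. It is written in Python purely for readability: the interpreter itself gives no timing guarantees, so only the structure of each function matters, and all names are made up for this example.

```python
MASK32 = 0xFFFFFFFF  # the example treats values as 32-bit words


def select_branching(secret_bit: int, a: int, b: int) -> int:
    # Flagged by the branching model: control flow depends on the secret bit.
    if secret_bit:
        return a
    return b


def select_lookup(secret_bit: int, a: int, b: int) -> int:
    # Flagged by the memory-access model: the index depends on the secret bit.
    table = (b, a)
    return table[secret_bit]


def select_masked(secret_bit: int, a: int, b: int) -> int:
    # No secret-dependent branch or index: the mask is all ones when
    # secret_bit is 1 and all zeros when it is 0.
    mask = (-secret_bit) & MASK32
    return (a & mask) | (b & ~mask & MASK32)
```

All three functions return the same value for `secret_bit` in {0, 1}; only the last one has the shape that tools using the branching and memory-access models would accept.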

For static tools, the properties of soundness and completeness are achievable and essential. A sound tool only deems secure programs secure, thus has no false negatives, while a complete one only deems insecure programs insecure, thus has no false positives. Here I adopt the notion that the tool aims to detect the presence of timing leakage (positive result) and thus derive the false positive/negative notions as above. In the case of dynamic tools, soundness and completeness are often unachievable, and one can only argue about the false-negative rate and the false-positive rate or generally about classes of leakage the tool can find and of classes of program constructs the tool will flag falsely.

Connected to the notion of errors in the tool’s output is its flexibility regarding which values it considers secret and whose leakage it detects. Some tools give the developer the ability to declassify a secret value, thereby exempting it from analysis. This is clearly a double-edged sword. While necessary for some program constructions, its abuse would lead to false negatives. One common application of declassification in cryptographic implementations is in rejection sampling, i.e., when a secret is repeatedly sampled and thrown away until a condition is satisfied. Rejection sampling leads to variable-time code; however, with some slight assumptions on the random number generator and the condition on the secret, such leakage is benign and should be ignored. Another source of benign leakage exists in the form of publicly observable outputs. They arise, for example, in the decryption function of an authenticated encryption cipher, as the verification result (which is clearly secret-dependent and which is returned) also affects whether the ciphertext gets decrypted. Tool support for such outputs forms another criterion, as it expands the possibilities of use.

Cryptographic implementations pose a unique challenge for (static) program verification tools. With cryptographic sizes of inputs and outputs, the state space (which program verification tools often work with) can be too large for the tools, even with their many analysis tricks. Thus, performance or even whether the tool is practically usable on real-world cryptographic codebases is an important criterion.

Last but not least, there is a concern for usability. It is the ease of use of the tool that, in the end, drives most of the adoption. Usability is hard to characterize, as it contains elements of all criteria discussed thus far. For example, if a domain-specific language (DSL) is used as the input level to the tool, existing projects will likely not adopt it, as it will require a rewrite of a part of their codebase. As these tools are products of research papers, the usual saying about the quality of research code, tooling and packaging applies as well. Furthermore, with the absence of proper packaging, the tools and their dependencies are often left outdated and no longer work on current versions of their dependencies. All of this points to usability being a significant concern for the adoption of tools for verification of constant-time properties.

The platform support of the tools is unclear as they usually only explicitly target and evaluate on x86. However, as their input level is often the LLVM IR or source code, they could work on other platforms.

Tools#

The tools below are discussed in this report, ordered chronologically. Some properties of the tools are unclear and are marked with a ?. This list will hopefully grow as I find time to look at more tools and add them.

ctgrind#

(2010) github(agl) github(dfaranha)

ctgrind is a patch for the Valgrind (memcheck) tool that adds functionality to mark areas of memory as uninitialized. This is to be used on secrets. At runtime, the memcheck tool then checks that the secret (uninitialized) memory is not used in branches or for memory access. As Valgrind’s memcheck supports the VALGRIND_MAKE_MEM_UNDEFINED and VALGRIND_MAKE_MEM_DEFINED client requests, it is now possible to implement a ctgrind-like approach without patches to Valgrind.

  • Approach: Dynamic
  • Input level: Source code required to embed annotations, then binary for analysis.
  • Leakage model: Branching model, memory-access model
  • Soundness: No, Completeness: No
  • Declassification: Yes, Publicly observable outputs: No
  • Performance: Ok
  • Usability: Good. Developers are often experienced with Valgrind and similar tools. However, the need for a custom patch, and thus a recompile of Valgrind, hinders usability, as the patch is not maintained and might get out of date or no longer be supported by upstream Valgrind.
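
The idea of poisoning secrets and flagging their use in branches or as indices can be mimicked in a few lines of Python. This is only a toy model of the technique, not how Valgrind works internally; the `Secret` class, its `declassify` method and the exception name are hypothetical and exist only for this sketch.

```python
class SecretDependentOperation(Exception):
    """Raised when a poisoned value reaches a branch or an index."""


def _raw(value):
    # Unwrap a Secret, leave plain integers untouched.
    return value._value if isinstance(value, Secret) else value


class Secret:
    """Wraps an int, playing the role of memory marked as undefined."""

    def __init__(self, value: int):
        self._value = value

    # Data-flow operations stay allowed and propagate the poison.
    def __xor__(self, other):
        return Secret(self._value ^ _raw(other))

    __rxor__ = __xor__

    def __and__(self, other):
        return Secret(self._value & _raw(other))

    __rand__ = __and__

    def __bool__(self):
        # Evaluating the value in an `if` or `while` is a secret-dependent branch.
        raise SecretDependentOperation("branch on secret")

    def __index__(self):
        # Using the value as a sequence index is a secret-dependent access.
        raise SecretDependentOperation("memory access indexed by secret")

    def declassify(self) -> int:
        # Counterpart of marking the memory as defined again.
        return self._value
```

With this wrapper, `[0, 1, 2, 3][Secret(2)]` or `if Secret(1): ...` raises, while XOR and AND are permitted, which is roughly the behaviour the memcheck-based approach gives for real binaries.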

ct-verif#

(2015) github(imdea) github(michael-emmi) paper

The ct-verif tool is a static analysis tool verifying constant-time properties of code, working on the level of LLVM IR, with source code annotations. It uses the SMACK modular software verification toolchain, Bam-Bam-Boogieman for Boogie source transformation, Boogie intermediate verification language as well as the Corral and Z3 solvers.

The tool is actively deployed in the CI of Amazon’s s2n library at link. However, even there, it is only used to verify two functions that together have less than 100 lines of code.

  • Approach: Static
  • Input level: Source code required to embed annotations, then LLVM IR for analysis.
  • Leakage model: Branching model, memory-access model, or even the operand model is possible.
  • Soundness: Yes, Completeness: Yes
  • Declassification: Yes, Publicly observable outputs: Yes
  • Performance: ?
  • Usability: Bad. The tool relies on a whole host of other tools and has been broken by updates at times. The latest update to the repository is in 2018.

dudect#

(2016) github ePrint

dudect is a dynamic tool that uses leakage assessment techniques from physical (power and EM) side-channel analysis, namely test-vector leakage assessment (TVLA). It first runs the target using two classes of secret input data with varying public input data and measures the duration of execution for each run. It then applies a test to the two distributions of the duration of execution for the two classes (either Welch’s t-test for equality of means or Kolmogorov-Smirnov test for equality of distributions), and if the distributions differ, leakage is reported. This is analogous to how leakage assessment is used in power side-channel attacks, in that instead of comparing distributions of power consumption at points during the execution of the target, the runtime distributions are compared.

  • Approach: Dynamic
  • Input level: Binary
  • Leakage model: Instruction counter or the runtime of a function call
  • Soundness: No, Completeness: No
  • Declassification: No, Publicly observable outputs: No
  • Performance: Good
  • Usability: Ok
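
The statistical core of this approach can be sketched with the standard library alone. The snippet computes Welch's t-statistic for two lists of timing measurements; the function names are invented here, and the threshold of 4.5 is the value commonly used in leakage assessment, taken as an assumption rather than as what dudect itself hard-codes.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))


def leaks(sample_a, sample_b, threshold=4.5):
    """Report leakage when the two timing distributions differ significantly."""
    return abs(welch_t(sample_a, sample_b)) > threshold
```

In practice the two samples would be execution times measured for the two input classes, with many more measurements than a toy example would use.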

FlowTracker#

(2016) page paper code

The FlowTracker tool is a static tool that works by analyzing the Program Dependence Graph (PDG) of the target in LLVM IR form.

  • Approach: Static
  • Input level: LLVM IR
  • Leakage model: Branching model, memory-access model.
  • Soundness: Yes, Completeness: No
  • Declassification: No, Publicly observable outputs: No
  • Performance: Good
  • Usability: Bad, uses a very old version of the LLVM compiler stack.

SideTrail#

(2018) paper preprint

SideTrail (at one point called SideWinder) is a tool for verifying time-balanced implementations. The notion of time-balance is a weakening of the constant-time notion that allows for the presence of leakage that is provably under some bound $\delta$ (execution time is negligibly influenced by secrets). For $\delta = 0$ this notion fits well with the notion of constant-time. The tool uses a cross-product technique similar to that of ct-verif. However, instead of asserting the equality of memory accesses and program counter, it asserts the equality of an instruction counter. Its leakage model and technique are well suited against remote (non-cache) attackers.

The tool is deployed in the CI of Amazon’s s2n library at link, where it is used to verify the time-balancedness of several parts of the codebase, handling the CBC decryption, HMAC padding, and AEAD decryption.

  • Approach: Static
  • Input level: Source code for annotations, then LLVM IR
  • Leakage model: Duration as measured by an instruction counter and a model of instruction runtime
  • Soundness: ?, Completeness: ?
  • Declassification: No, Publicly observable outputs: No
  • Performance: Ok.
  • Usability: Ok. It requires code annotations, as well as providing manual assumptions and loop invariants to ease the verifier’s work.
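
The time-balance property can be made concrete with a toy cost model. In the sketch below, the number of loop iterations stands in for the instruction counter, and `is_time_balanced` merely checks a handful of enumerated secrets, whereas SideTrail proves the bound for all secrets statically. All function names are hypothetical.

```python
def compare_early_exit(secret: bytes, guess: bytes):
    """Comparison that stops at the first mismatch; returns (result, cost)."""
    cost = 0
    for s, g in zip(secret, guess):
        cost += 1
        if s != g:
            return False, cost
    return True, cost


def compare_full(secret: bytes, guess: bytes):
    """Comparison that always scans every byte; returns (result, cost)."""
    cost = 0
    diff = 0
    for s, g in zip(secret, guess):
        cost += 1
        diff |= s ^ g
    return diff == 0, cost


def is_time_balanced(func, public_input, secrets, delta):
    """Check that the modelled cost varies by at most delta across the secrets."""
    costs = [func(secret, public_input)[1] for secret in secrets]
    return max(costs) - min(costs) <= delta
```

For $\delta = 0$ only `compare_full` passes, which matches the remark that time-balance with a zero bound coincides with this instruction-counting notion of constant time.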

MicroWalk#

(2018) github arXiv

The MicroWalk framework is a dynamic tool that uses Dynamic Binary Instrumentation (DBI) and Mutual Information Analysis (MIA). As a dynamic tool, it runs the target with random inputs and uses dynamic binary instrumentation to log events such as memory allocations, branches, calls, returns, memory reads/writes, as well as stack operations into an execution trace. It then processes these traces by applying the chosen leakage model, e.g., in the branching model, it only keeps the control flow events in the execution traces. After collecting the traces, it offers several analysis options: either directly comparing the traces or using mutual information analysis, on the whole trace or at a specific offset in the execution traces (a specific instruction).

  • Approach: Dynamic
  • Input level: Binary
  • Leakage model: Branching model, memory-access model, with the possibility of extending that allows the operand model as well.
  • Soundness: No, Completeness: No
  • Declassification: No, Publicly observable outputs: No
  • Performance: Good
  • Usability: Good. It was Windows-only until recently, but now it also supports Linux.
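
To make the mutual information step more tangible, here is a small estimator over (secret, trace) observations. It is a plain plug-in estimate over empirical frequencies, written for this post; how MicroWalk computes its figures internally may differ, and the function name is an assumption.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Estimate I(secret; trace) in bits from a list of (secret, trace) pairs.

    Traces must be hashable, e.g. tuples of events or a hash of the trace.
    """
    n = len(pairs)
    joint = Counter(pairs)
    secrets = Counter(secret for secret, _ in pairs)
    traces = Counter(trace for _, trace in pairs)
    mi = 0.0
    for (secret, trace), count in joint.items():
        p_joint = count / n
        # p(s, t) / (p(s) * p(t)) expressed with raw counts
        ratio = count * n / (secrets[secret] * traces[trace])
        mi += p_joint * math.log2(ratio)
    return mi
```

If the trace only reveals the lowest bit of a uniformly random two-bit secret, the estimate is one bit; a trace that never changes yields zero.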

DATA#

(2018,2020) github paper(2018) paper(2020)

DATA (Differential Address Trace Analysis) is a tool quite similar to the MicroWalk framework in that it is a dynamic tool that records memory accesses of the target into address traces as it processes random secret inputs. The traces are then aligned and analyzed using generic and specific leakage tests. The tool reports the location of leakage and even offers a graphical user interface for analysis.

  • Approach: Dynamic
  • Input level: Binary
  • Leakage model: Branching model, memory-access model.
  • Soundness: No, Completeness: No
  • Declassification: No, Publicly observable outputs: No
  • Performance: Ok
  • Usability: Good

FaCT#

(2019) github paper

The FaCT tool is less of a tool for analysis of implementations for timing leakage and more of a domain-specific language for writing constant-time implementations that automatically removes leakage during compilation. The language is C-like, compiles into LLVM IR, and offers the secret keyword, which is used to mark certain variables as secret, which then triggers the compiler to generate constant-time code with regards to their values.

  • Approach: Static
  • Input level: A domain-specific language called FaCT
  • Leakage model: Branching model, memory-access model, and operand model.
  • Soundness: Yes, Completeness: No
  • Declassification: Yes, Publicly observable outputs: No
  • Performance: Ok
  • Usability: Bad for projects with existing codebases as it requires the use of a compiled DSL. Good for new implementations, as it was user-tested and found to improve the developer’s ability to write constant-time code.

ct-fuzz#

(2019) github paper arXiv

ct-fuzz takes inspiration from ct-verif in its method but diverges significantly. It first constructs a product program by self-composing the target with itself, asserting at each point that the memory address accessed by the two copies, whether through control flow or indexing, is the same. It then runs a fuzzer against this product program, which splits its fuzzing input equally into the secret inputs for the two instances of the original program in the product program. If the fuzzer triggers a failed assert, leakage is detected, as it has found two runs through the target that differ only in secret inputs yet access different offsets in memory.

  • Approach: Dynamic, using fuzzing, with static analysis used to construct the fuzzed program.
  • Input level: LLVM IR, implemented as an LLVM IR transformation in afl-fuzz but requires source code to embed annotations.
  • Leakage model: The branching model and the memory-access model are used together, with the configurable option of extending them to be cache-aware.
  • Soundness: No, Completeness: Yes
  • Declassification: Yes, Publicly observable outputs: No
  • Performance: Good
  • Usability: Ok, uses afl-fuzz, which is a well-known fuzzing tool. However, it uses outdated versions of several dependencies and thus cannot be built without downgrading or fixing them.
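
The product-program idea can be imitated in Python with explicit trace recording and a random search in place of a coverage-guided fuzzer. This is a simplified sketch under those assumptions: the targets, the `("load", index)` trace format and all function names are invented, and the real tool instruments LLVM IR rather than recording traces by hand.

```python
import random

SBOX = list(range(256))  # stand-in lookup table


def lookup(public: int, secret: int, trace: list) -> int:
    """Leaky target: the accessed index depends on the secret."""
    index = (public ^ secret) & 0xFF
    trace.append(("load", index))
    return SBOX[index]


def masked_scan(public: int, secret: int, trace: list) -> int:
    """Target that touches every table entry regardless of the secret."""
    wanted = (public ^ secret) & 0xFF
    result = 0
    for index in range(256):
        trace.append(("load", index))
        mask = -(index == wanted)  # -1 (all ones) on a match, 0 otherwise
        result |= SBOX[index] & mask
    return result


def product_program(target, public, secret_a, secret_b) -> bool:
    """Run two copies on the same public input; True if their traces agree."""
    trace_a, trace_b = [], []
    target(public, secret_a, trace_a)
    target(public, secret_b, trace_b)
    return trace_a == trace_b


def fuzz(target, iterations=1000, seed=0):
    """Random search for a pair of secrets whose traces differ."""
    rng = random.Random(seed)
    for _ in range(iterations):
        public, secret_a, secret_b = (rng.randrange(256) for _ in range(3))
        if not product_program(target, public, secret_a, secret_b):
            return public, secret_a, secret_b  # witness of leakage
    return None
```

`fuzz(lookup)` returns a witness almost immediately, while `fuzz(masked_scan)` returns `None`, which only means no counterexample was found within the budget.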

TIMECOP#

(2020) page page(SUPERCOP)

The TIMECOP tool is a tool that uses Valgrind’s memcheck client requests VALGRIND_MAKE_MEM_{UN}DEFINED to essentially implement a method like ctgrind. It is a part of the SUPERCOP toolkit (System for Unified Performance Evaluation Related to Cryptographic Operations and Primitives) and is used to evaluate the constant-time properties of implementations in SUPERCOP.

  • Approach: Dynamic
  • Input level: Source code required to embed annotations, then binary for analysis.
  • Leakage model: Branching model, memory-access model
  • Soundness: No, Completeness: No
  • Declassification: Yes, Publicly observable outputs: No
  • Performance: Good
  • Usability: Ok

Binsec/Rel#

(2020) paper preprint

Binsec/Rel is a static analysis tool that works on the binary level, thereby overcoming the issue of compilers inserting non-constant-time code or turning constant-time code into non-constant-time code.

  • Approach: Static
  • Input level: Binary
  • Leakage model: Branching model, memory-access model.
  • Soundness: Yes, Completeness: Yes
  • Declassification: No, Publicly observable outputs: No
  • Performance: Good
  • Usability: ?

Miscellaneous#

Several other tools and languages exist that are related to the analysis of implementations for verifying constant-time properties. I chose to omit a detailed analysis of these tools for now, as the analyzed selection presents a good sample of what the landscape has to offer. Several of these tools also enable one to remove leaks from leaking implementations or create constant-time implementations. We list the tools below in no particular order.

Conclusions#

There is an abundance of tools for verifying constant-time properties of cryptographic implementations, yet very few seem to be actually used in an automated way outside of the papers that introduced them. This is troubling, as their impact is then limited to the select few implementations that the authors chose to verify in a given work, while real-world cryptographic libraries change daily and have no automated verification.

From the analyzed static tools, ct-verif and SideTrail stand out, as they are actively deployed; of note is also the Binsec/Rel tool for its approach on a binary level.

The usability of dynamic tools is usually much better than that of static tools, with the ctgrind/TIMECOP approach being almost zero cost in terms of integration, as ordinary tests or fuzzing together with Valgrind suffice. The dudect tool could also be used in continuous integration, provided some test harnesses are created for it. The MicroWalk/DATA tools are quite similar and might be more suited to interactive testing of implementations.


hxp ctf 2020 - Hyper

HXP CTF 2020

This year I took part in the hxp CTF with a bunch of friends from and around the CRoCS lab. Our team, Crocs-Side-Scripting, finished in 58th place. As this was our first CTF as a team and my first real CTF, I am quite happy with the outcome and enjoyed the experience. This post outlines our solution to the crypto challenge hyper, which focuses on hyperelliptic curves.

The challenge#

The challenge constructs a random number generator using a hyperelliptic curve of genus 3 (and arithmetic on it), then generates some bytes using it, which it XORs with the message Hello! The flag is: hxp{...}.

The code is in Sage and first creates a hyperelliptic curve \(C\) of genus \(3\) of the form \(y^2 = x^7 + x\) over a 64-bit prime field. What is interesting about hyperelliptic curves is that their set of \(\mathbb{K}\)-rational points (for \(\mathbb{K}\) being some extension of the base field) does not form a group as it does for elliptic curves. There is, however, a group associated with each hyperelliptic curve, its Jacobian. The elements of the Jacobian are divisors; one might imagine them as a formal sum \( \sum_i^m n_i P_i \) of some points \( P_i \) with \( n_i \in \mathbb{Z} \). To my best understanding, when mathematicians are talking about a formal sum they are talking about doing a sum, but then instead of doing it, they … don’t do it. In this case it also cannot be computed, as there is no addition defined on the points of the hyperelliptic curve. The sum in the divisor can contain a number of points up to and equal to the genus of the curve; some might, however, be defined over extensions of the base field. The operation defined on the divisors is too complicated and unnecessary for this write-up, but more can be found in the SageMath docs and in the helpful An Introduction to Elliptic and Hyperelliptic Curve Cryptography and the NTRU Cryptosystem.

#!/usr/bin/env sage
import struct
from random import SystemRandom

p = 10000000000000001119

R.<x> = GF(p)[]; y=x
f = y + prod(map(eval, 'yyyyyyy'))
C = HyperellipticCurve(f, 0)
J = C.jacobian()

The RNG is seeded with three random integers in the range \([0, p^3)\), which makes them roughly 190 bits each. During initialization, the RNG also creates three fixed points on the curve (with \(x\)-coords \(11, 22\) and \(33\)) and transforms them into divisors on the Jacobian. The RNG calls the clk function and processes the results whenever it runs out of bytes. The generation of the first byte triggers this clocking. The clocking takes the current list of the three divisors and multiplies each of them by the corresponding secret random integer in self.es. These three divisors are then summed and this sum is decomposed into two elements (u, v = sum(self.clk())).

In Sage, divisors on the Jacobian are represented using the Mumford representation. This representation consists of two polynomials \(u(x)\) and \(v(x)\). It holds that \(u(x)\) is monic and that \(u(x)\) divides \(f(x) - h(x) v(x) - v(x)^2\), where \(f(x)\) and \(h(x)\) are polynomials associated with the curve. It also holds that \(\deg(v(x)) < \deg(u(x)) \le g\), where \(g\) is the genus of the curve, in our case \(3\). When one has a divisor on the Jacobian in Sage and indexes into it or decomposes it into two elements, the two polynomials \(u(x)\) and \(v(x)\) are returned, which is what happens on the u, v = sum(self.clk()) line.

The rest of the RNG is straightforward: it takes the six coefficients from the Mumford representation of the summed divisor (three from \(u(x)\) and three from \(v(x)\)) and converts them to 8-byte integers, then concatenates them into a 48-byte string which is used as the RNG output.

class RNG(object):

    def __init__(self):
        self.es = [SystemRandom().randrange(p**3) for _ in range(3)]
        self.Ds = [J(C(x, min(f(x).sqrt(0,1)))) for x in (11,22,33)]
        self.q = []

    def clk(self):
        self.Ds = [e*D for e,D in zip(self.es, self.Ds)]
        return self.Ds

    def __call__(self):
        if not self.q:
            u,v = sum(self.clk())
            rs = [u[i] for i in range(3)] + [v[i] for i in range(3)]
            assert 0 not in rs and 1 not in rs
            self.q = struct.pack('<'+'Q'*len(rs), *rs)
        r, self.q = self.q[0], self.q[1:]
        return r

    def __iter__(self): return self
    def __next__(self): return self()

The rest of the challenge uses the RNG to compute 48 random bytes and XORs the output with the message.

flag = open('flag.txt').read().strip()
import re; assert re.match(r'hxp\{\w+\}', flag, re.ASCII)

text = f"Hello! The flag is: {flag}"
print(bytes(k^^m for k,m in zip(RNG(), text.encode())).hex())

The solution#

The first important realization is that due to the message starting with Hello! The flag is: hxp{ the first 24 bytes of the output of the RNG are known. In the algorithm, this corresponds to the knowledge of the u[i] for i in range(3) in the __call__ function. These u[i] represent the coefficients of the \(u(x)\) polynomial in the Mumford representation of the divisor. What we do not know are the next 24 bytes of the RNG output, which correspond to the coefficients of the \(v(x)\) polynomial in the Mumford representation of the divisor, so our task is to somehow recover them.

To recover the \(v(x)\) polynomial, one needs to think about the correspondence between the points in the divisor and its Mumford representation. It holds that the roots of the \(u(x)\) polynomial are the \(x\)-coordinates of the points in the formal sum in the divisor. Furthermore, for \(x_i\) a root of \(u(x)\), \(y_i = v(x_i)\) is the \(y\)-coordinate of the represented point. We can use this to recover the \(v(x)\) polynomial. The \(v(x)\) polynomial has degree smaller than three, so we can represent it as \(a x^2 + b x + c\) with \(a, b, c \in \mathbb{K}\). Since each point \((x_i, v(x_i))\) lies on the curve, the three roots \(x_i\) of the \(u(x)\) polynomial give three equations over \(\overline{\mathbb{K}}\) of the form \(v(x_i)^2 = x_i^7 + x_i\). Plugging in our form of the polynomial \(v(x)\) we get \((a x_i^2 + b x_i + c)^2 = x_i^7 + x_i\) for each root \(x_i\). These are quadratic in the unknowns, so for easier solving we take the square root of each equation and get \(a x_i^2 + b x_i + c = (x_i^7 + x_i)^{1/2}\), which is linear in \(a, b, c\); the sign of each square root is not known in advance and has to be guessed. This is what the Sage script below does, with some additional fiddling with the ordering of the coefficients of \(v(x)\) when they are transformed into bytes for the RNG output.
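
The linear step can be tried out on toy numbers in plain Python. The sketch below uses the same curve shape \(y^2 = x^7 + x\) but over the tiny field GF(11), takes three roots that happen to lie in the base field, and recovers \(v(x)\) by Lagrange interpolation. The prime, the roots and the helper names are assumptions made for illustration; in the challenge the roots may live in an extension field and the square-root signs are unknown.

```python
P = 11  # toy prime, not the 64-bit prime of the challenge


def f(x: int) -> int:
    """Right-hand side of the curve equation y^2 = x^7 + x over GF(P)."""
    return (pow(x, 7, P) + x) % P


def sqrt_mod(a: int):
    """All square roots of a modulo the toy prime, found by brute force."""
    return [y for y in range(P) if (y * y - a) % P == 0]


def interpolate_quadratic(xs, ys):
    """Coefficients [c, b, a] of a*x^2 + b*x + c through three points mod P."""
    coeffs = [0, 0, 0]
    for i in range(3):
        xj, xk = (xs[j] for j in range(3) if j != i)
        # Lagrange basis: (x - xj)(x - xk) / ((xi - xj)(xi - xk))
        denominator = ((xs[i] - xj) * (xs[i] - xk)) % P
        scale = ys[i] * pow(denominator, -1, P)
        coeffs[0] += scale * xj * xk
        coeffs[1] -= scale * (xj + xk)
        coeffs[2] += scale
    return [c % P for c in coeffs]


xs = [2, 3, 4]                             # assumed roots of u(x)
ys = [min(sqrt_mod(f(x))) for x in xs]     # one sign choice per root
v = interpolate_quadratic(xs, ys)          # [c, b, a]
```

Evaluating the recovered polynomial at each root and squaring gives back \(f(x_i)\), which is the divisibility condition of the Mumford representation checked at the roots of \(u(x)\).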

#!/usr/bin/env sage
from binascii import unhexlify, hexlify
from itertools import permutations

p = 10000000000000001119
k = GF(p)
kc = k.algebraic_closure()

R.<x> = k[]; y=x
f = y + prod(map(eval, 'yyyyyyy'))
C = HyperellipticCurve(f, 0)
J = C.jacobian()

msg_prefix = b"Hello! The flag is: hxp{"

with open("output.txt") as output_file:  # avoid shadowing the curve polynomial f
    content = unhexlify(output_file.read().strip())

bs = []
for c, m in zip(content, msg_prefix):
    # Do the XOR, obtain k
    b = c^^m
    print(b)
    bs.append(b)

u0 = int.from_bytes(bytes(bs[:8]), byteorder="little")
u1 = int.from_bytes(bytes(bs[8:16]), byteorder="little")
u2 = int.from_bytes(bytes(bs[16:24]), byteorder="little")
print(hex(u0), hex(u1), hex(u2))

ps = x^3 + u2 * x^2 + u1 * x + u0  # TODO: this ordering might be the other way around.
aps_roots = ps.roots(ring=kc, multiplicities=False)
x0, x1, x2 = aps_roots

A = Matrix(((x0^2, x0, kc(1)), (x1^2, x1, kc(1)), (x2^2, x2, kc(1))))
Y = vector((x0^7 + x0, x1^7 + x1, x2^7 + x2))
Ys = vector((-Y[0].sqrt(), -Y[1].sqrt(), -Y[2].sqrt())) # TODO: Maybe the other sqrt?

v = A.solve_right(Ys)
print(v)
v0 = int(str(v[0])).to_bytes(8, byteorder="little")
v1 = int(str(v[1])).to_bytes(8, byteorder="little")
v2 = int(str(v[2])).to_bytes(8, byteorder="little")

for a0, a1, a2 in permutations((v0, v1, v2)):
    cs = []
    q = bytes(bs) + a0 + a1 + a2
    for c, m in zip(content, q):
        # Do the XOR, obtain k
        b = c^^m
        cs.append(chr(b))
    print("".join(cs))

And there it is: Hello! The flag is: hxp{ez_P4rT_i5_ez__tL0Cm}.


Population-wide antigen testing for COVID-19 in Slovakia

Slovakia is planning population-wide antigen testing for COVID-19 and, judging from the press conferences, it does not seem to be going about it in a particularly informed way. This post contains an interactive tool for estimating and computing the error rates of these tests on a population. Computing how many positive cases the test will catch (true positives) or how many negative people it will declare positive (false positives) requires several parameters. The Population, Participation and Infected parameters are estimates, where the estimate of the number of infected in the population (and also the motivation behind this post) comes from a post by Richard Kollár. The estimate of the test's Sensitivity comes from a comparative study by FN Motol. The estimate of the test's Specificity is fairly optimistic, and most studies put it lower for the planned antigen tests.

The tool is interactive and the parameter estimates can be changed.

Method 1

Inputs: Population, Participation, Infected, Test sensitivity, Test specificity
Outputs: True positives, False positives, True negatives, False negatives, Untested positives
The code that performs the calculation can be found below (JavaScript) and also as a Jupyter notebook on Binder (mybinder.org).
// Illustrative wrapper: the function name and parameter order are assumptions.
// population and infected are head counts; the other parameters are fractions.
function method1(population, participation, infected, sensitivity, specificity) {
    // Calculate the population that will get tested
    let tested_population = population * participation;

    // Calculate the infected among the tested and non-tested
    // Assumption that attendance is uniform among infected and non-infected
    let tested_infected = infected * participation;
    let tested_clean = tested_population - tested_infected;

    // Calculate the true/false and negative/positive from the tested sample,
    // with given sensitivity and specificity
    let true_clean = tested_clean * specificity;
    let false_infected = tested_clean * (1 - specificity);
    let true_infected = tested_infected * sensitivity;
    let false_clean = tested_infected * (1 - sensitivity);

    // Calculate the missed infected
    let missed_infected = infected * (1 - participation);
    return {
        "true_negative": true_clean,
        "false_positive": false_infected,
        "true_positive": true_infected,
        "false_negative": false_clean,
        "missed_positive": missed_infected
    };
}

Method 2

Inputs: Population, Tested, Test sensitivity, Test specificity, Tested positive
Outputs: Infected among tested, Non-infected among tested, Infected in total, Non-infected in total, True positives, False positives, True negatives, False negatives, Untested positives
The code that performs the calculation can be found below (JavaScript) and also as a Jupyter notebook on Binder (mybinder.org).
let attendance = tested / population;
let tested_negative = tested - tested_positive;

// Calculate the number of infected among the tested
let tested_infected = (specificity * tested_positive - (1 - specificity) * tested_negative) / (specificity + sensitivity - 1);
let tested_clean = tested - tested_infected;

// Assumption that attendance is uniform among infected and non-infected
let total_infected = (tested_infected / tested) * population;
let total_clean = (tested_clean / tested) * population;

// Calculate the missed infected
let missed_infected = total_infected - tested_infected;

// Calculate the true/false and negative/positive from the tested sample, with given sensitivity and specificity
let true_clean = tested_clean * specificity;
let false_infected = tested_clean * (1 - specificity);
let true_infected = tested_infected * sensitivity;
let false_clean = tested_infected * (1 - sensitivity);
return {
    "tested_infected": tested_infected,
    "tested_clean": tested_clean,
    "total_infected": total_infected,
    "total_clean": total_clean,
    "true_negative": true_clean,
    "false_positive": false_infected,
    "true_positive": true_infected,
    "false_negative": false_clean,
    "missed_positive": missed_infected 
};
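As a sanity check on the inversion used in Method 2, the following sketch runs the calculation forwards and then backwards. The function names `expectedPositives` and `estimateTestedInfected` and all of the numbers are illustrative assumptions of mine, not part of the original calculator. Note that the estimate is undefined when sensitivity + specificity = 1, since the denominator becomes zero.

```javascript
// Forward direction: expected number of positive tests when the number of
// infected people among those tested is known.
function expectedPositives(testedInfected, tested, sensitivity, specificity) {
    const testedClean = tested - testedInfected;
    return testedInfected * sensitivity + testedClean * (1 - specificity);
}

// Inverse direction (the Method 2 formula): estimate the infected among the
// tested from the observed number of positive tests.
function estimateTestedInfected(tested, testedPositive, sensitivity, specificity) {
    const testedNegative = tested - testedPositive;
    return (specificity * testedPositive - (1 - specificity) * testedNegative)
        / (specificity + sensitivity - 1);
}

// Illustrative numbers only, not real data.
const tested = 1000000;
const sensitivity = 0.7;
const specificity = 0.995;
const positives = expectedPositives(20000, tested, sensitivity, specificity);
const recovered = estimateTestedInfected(tested, positives, sensitivity, specificity);
console.log(positives, recovered);
```

Up to floating-point rounding, the recovered value matches the number of infected people that went into the forward calculation, which is what the formula is expected to do.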

Explanatory notes

  • True positive: A case where an infected person was correctly identified by the test as positive. Both symptomatic and asymptomatic people are thereby isolated from the population, which interrupts this branch of disease transmission.
  • False positive: A case where a person does not have COVID-19 but was falsely identified by the test as positive (they will go through quarantine and may afterwards believe that they have already had COVID-19 and are immune).
  • True negative: A case where an uninfected person was correctly identified by the test as negative.
  • False negative: A case where a person has COVID-19 but was falsely identified by the test as negative (and will have greater freedom of movement in public).
  • Untested positive: A case where a person has COVID-19 but did not take part in the nationwide testing (and will have restricted freedom of movement in public).

Analysis of the Covid19 ZostanZdravy app - Contact-tracing

Covid19 ZostanZdravy

This post analyzes the Slovak contact-tracing app Covid19 ZostanZdravy from a security and privacy perspective. The app is being developed by volunteers from Sygic, but is officially running under control of NCZI, the National Health Information Center, with data ownership by UVZ, the Public Health Authority of Slovakia (see the privacy policy). This analysis was performed from publicly available sources, which was possible as both the app and backend are open-source (the analyzed commits were 400aa52, 2710f09 and f9b9d2c). The text below presents the issues I see in the current workings of the contact-tracing part of the app and provides an outlook on fixing them and moving forward. This is a best-effort analysis done in a day; it might contain errors, or I might have misrepresented something. I am open to comments.

Privacy

The app does not use an established contact-tracing protocol, such as DP-3T, PEPP-PT NTK or ROBERT, but instead uses a custom designed protocol to perform contact-tracing. This is because the app predates those protocols by a few weeks. The contact-tracing protocol is a BLE-based contact-tracing protocol with static IDs that roughly works as follows:
  1. The user installs the app, which generates a deviceID (a random UUID for the device), enrolls this device with the server and receives back a profileID, which is an unsigned integer assigned in an increasing sequence by the server.
  2. The app then broadcasts the profileID of the device on BLE and listens to other broadcasted profileIDs of other devices.
  3. The app then periodically uploads a list of seen profileIDs to the server. This upload, like all of the app's interaction with the server, is authenticated by the deviceID, which is sent to the server in every request and is otherwise kept on the device. The uploaded list of contacts used to contain the time and duration of the contacts, but this was abandoned and now only the day of contact is uploaded.
  4. When the user becomes infected, the actions of the protocol become unclear: the open-source backend is just an HTTP API, and the administration of the whole system is done through an admin app that interacts with the backend but is not open-source. However, something can be deduced from the API offered by the backend, as it offers one administrative call to query the seen profileIDs by a given device (identified by both the deviceID and profileID). This call is likely used by the admin app to query the contacts of a newly infected user and send alerts/quarantine recommendations to them. It is important to note that this call reports one-sided contacts as reported by the users.
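The steps above can be modelled in a few lines. This is a minimal in-memory sketch under my own assumptions, not the actual backend: the class and method names (`TracingServer`, `enroll`, `uploadContacts`, `contactsOf`) are hypothetical, and the administrative query is implemented as the one-sided lookup that the post deduces (scan every profile's uploaded list for the infected profileID).

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical model of the server side of the protocol.
class TracingServer {
    constructor() {
        this.nextProfileId = 1;    // assigned sequentially, hence predictable
        this.devices = new Map();  // profileID -> deviceID (the shared secret)
        this.seen = new Map();     // profileID -> Set of profileIDs it reported
    }

    // Step 1: the device presents its random deviceID and gets a profileID back.
    enroll(deviceId) {
        const profileId = this.nextProfileId++;
        this.devices.set(profileId, deviceId);
        this.seen.set(profileId, new Set());
        return profileId;
    }

    // Step 3: the upload is authenticated only by knowledge of the deviceID.
    uploadContacts(profileId, deviceId, seenProfileIds) {
        if (this.devices.get(profileId) !== deviceId) {
            throw new Error("authentication failed");
        }
        for (const id of seenProfileIds) this.seen.get(profileId).add(id);
    }

    // Step 4 (deduced): which profiles reported seeing the infected profile?
    contactsOf(infectedProfileId) {
        const result = [];
        for (const [profileId, seenIds] of this.seen) {
            if (seenIds.has(infectedProfileId)) result.push(profileId);
        }
        return result;
    }
}

const server = new TracingServer();
const aliceDevice = randomUUID();
const bobDevice = randomUUID();
const alice = server.enroll(aliceDevice);
const bob = server.enroll(bobDevice);
server.uploadContacts(bob, bobDevice, [alice]); // Bob's phone saw Alice's broadcast
console.log(server.contactsOf(alice));          // profiles to alert if Alice is infected
```

Even in this toy model the `seen` map is the complete contact graph, held by the server for every user regardless of infection status.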

This approach clearly provides the whole contact graph of a user's device to the server, whether the user is infected or not. Such a contact graph, while it is pseudonymous, leaks significant private information about the users to the server (see this document, section 4).

Contact reporting

As described above, it is likely that the reporting of contacts of an infected user uses only one-sided contacts submitted by other users' devices, i.e. when querying the contacts of a user X, the contacts of all users are queried for X's profileID (see the code). This might make sense if one accounts for the possibility of some devices going offline and not uploading their contacts: if contact reports from both parties were necessary to report a contact, such devices would cause contacts to be missed. However, this implicit trust in users' reported contacts, together with the way profileIDs are assigned (unsigned increasing integer sequence, see the code here and here), creates an attack on the system in which an attacker can get the infection status of all of the users.

  1. Attacker first creates a new profile, and receives back their profileID. As profileIDs are generated incrementally, the attacker can now enumerate all previously registered profileIDs.
  2. The attacker creates a new profile for each of the existing users' profiles.
  3. Then the attacker reports a contact from each of their profiles with exactly one of the legitimate users' profiles, i.e. attacker's profile #1 reports contact with user profile #1, attacker's profile #2 reports contact with user profile #2, and so on.
  4. When any of the users registered before the attacker's profile registration is confirmed infected, the query for their contacts will always include the one attacker profile that reported contact with them, and the attacker will get a notification of being in contact with an infected user.
  5. There is also the possibility of extending this attack to complete deanonymization of an infected user, by placing BLE listening devices in particular public places, together with a camera capturing the area, and then correlating the captured broadcasts with the camera view of the area (see here). This data collection can be performed even before the attack itself or before the user's infection.
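Steps 1 to 4 of this attack can be simulated against a toy model of the server. The sketch below is an assumption-laden illustration, not the real backend: authentication is omitted because the attacker only ever acts through their own profiles, and all function names are hypothetical.

```javascript
// Toy server state: sequential profileIDs and one-sided contact reports.
let nextProfileId = 1;
const reports = new Map(); // reporter profileID -> Set of profileIDs it claims to have seen

function enroll() {
    const id = nextProfileId++;
    reports.set(id, new Set());
    return id;
}

function reportContact(reporter, seen) {
    reports.get(reporter).add(seen);
}

// One-sided lookup: everyone who claims to have seen the infected profile is alerted.
function alertedWhenInfected(infected) {
    return [...reports].filter(([, seen]) => seen.has(infected)).map(([id]) => id);
}

// Five legitimate users enroll first.
const victims = Array.from({ length: 5 }, () => enroll());

// Step 1: the attacker enrolls once and learns that IDs 1..probe-1 exist.
const probe = enroll();

// Steps 2 and 3: one attacker profile per existing user, each reporting one contact.
const targetOf = new Map(); // attacker profileID -> targeted user profileID
for (let victim = 1; victim < probe; victim++) {
    const sybil = enroll();
    reportContact(sybil, victim);
    targetOf.set(sybil, victim);
}

// Step 4: user 3 is confirmed infected; the alert reaches exactly one attacker profile.
const alerted = alertedWhenInfected(3);
const learned = alerted.filter((id) => targetOf.has(id)).map((id) => targetOf.get(id));
console.log(learned); // the profileIDs the attacker now knows to be infected
```

The cost to the attacker is one enrollment and one contact report per existing user, which is what makes sequential IDs combined with unverified one-sided reports so damaging.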

If, however, the implicit trust was one-sided the other way, i.e. querying the contacts of a user X would trust X's own reported contacts, a different attack would be possible, one that would mark all users as having contact with an infected person.

The attack would work as follows:
  1. The attacker registers a profile with the server, and receives back their profileID. As profileIDs are generated incrementally, the attacker can now enumerate all previously registered profileIDs. They cannot, however, spoof messages to the API as users with those profileIDs, as deviceIDs are required for that, and those are random UUIDs that contain enough entropy.
  2. The attacker can however report any and all profileIDs in use to the server as contacts, possibly daily for some period of time.
  3. The attacker can now give the account details, or the device with the account details, to a likely infected cooperating person, who will get tested and obtain a confirmation of infection from a health authority. The person then confirms their infection with the attacker's account details, which immediately marks all of the users in the system as exposed to an infected person.
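The same kind of toy model shows this second attack. Again, this is a hedged sketch with hypothetical names; the only change from the previous direction of trust is that the infected profile's own uploaded list is believed.

```javascript
// Toy server state: sequential profileIDs and self-reported contact lists.
let nextId = 1;
const uploaded = new Map(); // profileID -> Set of profileIDs it claims to have seen

function enroll() {
    const id = nextId++;
    uploaded.set(id, new Set());
    return id;
}

// Reverse trust: the infected profile's own list decides who is marked as exposed.
function exposedBy(infected) {
    return [...uploaded.get(infected)];
}

// Four legitimate users enroll first.
const users = Array.from({ length: 4 }, () => enroll());

// Step 1: the attacker enrolls and learns the range of existing profileIDs.
const attacker = enroll();

// Step 2: the attacker claims contact with every earlier profileID.
for (let id = 1; id < attacker; id++) {
    uploaded.get(attacker).add(id);
}

// Step 3: the attacker's profile is confirmed infected by a cooperating person.
const flagged = exposedBy(attacker);
console.log(flagged); // every earlier user is now marked as exposed
```

No deviceID of any victim is needed at any point, which is why the entropy of the UUIDs does not help here.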

Modifying the system to rely on both sides of an encounter to report it might seem like an easy fix; however, that brings the aforementioned issues of false negatives created by devices going offline, or by devices with different Bluetooth strength (where only one device saw enough broadcasts of the other device to report a contact), and so on. The current system with predictable and static user IDs will likely always suffer from similar attacks.

Using a custom contact-tracing protocol, as the system does, is a security risk even if the above attack is fixed, as proper specification and security analysis is necessary to get it right. One can get both of those by using an established protocol such as DP-3T. As the cryptography community mantra rightfully states, Don't roll your own crypto!

Build reproducibility and deployment

The three components of the system, the Android app, the iOS app and the backend server, are all open-source, which is quite nice from an analysis perspective and also the bare minimum a contact-tracing system should offer.

There is however no transparency over the build and deployment process, e.g. what versions of code actually run on the server, or are provided in the respective app stores. The Android app does not contain the full configuration and it is thus not possible to build it reproducibly such that the built APK matches the app store APK perfectly.

Having build reproducibility for a privacy-sensitive app is important to ensure that the code can be analyzed and that arguments from this code analysis can be applied to the deployed app. It also makes decompilation and analysis of the deployed apps unnecessary, as a comparison of the app's hash is enough.
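With a reproducible build, the hash comparison mentioned above is a few lines of code. The sketch below uses Node's built-in `crypto` module; the helper names `sha256Hex` and `sameArtifact` are mine, and the byte strings are stand-ins for file contents that would normally be read from disk. In practice, signing data embedded in an app package would also have to be accounted for before the hashes can match.

```javascript
const { createHash } = require("node:crypto");

// SHA-256 digest of a build artifact, hex encoded.
function sha256Hex(bytes) {
    return createHash("sha256").update(bytes).digest("hex");
}

// Two artifacts correspond if and only if their digests are equal.
function sameArtifact(localBuild, published) {
    return sha256Hex(localBuild) === sha256Hex(published);
}

// Stand-in contents; real use would read the two files with fs.readFileSync.
const localBuild = Buffer.from("example artifact bytes");
const published = Buffer.from("example artifact bytes");
const modified = Buffer.from("example artifact bytes, plus something extra");

console.log(sameArtifact(localBuild, published));
console.log(sameArtifact(localBuild, modified));
```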

Specification and documentation

The system lacks any proper specification of the contact-tracing protocol, the backend API, or really any component. Without a detailed specification of all of the system's components and their responsibilities and behavior, proper analysis is resource-intensive if not impossible. This can be seen from my statements about the attacks above, where an unavailable component of the system, the admin app, makes decisions that influence how and if an attack would work. Without this specification, which should have been created before implementation took place, more vulnerabilities in the system cannot be ruled out; they will, however, remain harder to find and fix.

The components also lack documentation, apart from a README here and there. Having properly documented components would make security analysis of the system easier, as well as help new contributors to contribute to the project.

Tests

The Android app contains no tests at all, and the iOS app contains a test directory that contains no tests. The server is the only component with any tests: it contains a few tests for the push-notification service and the SMS messaging service, and a few unit tests for the core repository. This absence of tests is a serious issue for a privacy-sensitive app, as the likelihood of errors in code with absolutely no tests is high.

Calibration and real-world testing

The contact-tracing capabilities of the app have not been properly tested in the real world, to the best of my knowledge. Such testing is necessary for proper calibration of what an epidemiologically significant encounter is and how it manifests in the BLE broadcasts. Modern devices have strong capabilities to both send and receive the broadcasts. If any sequence of correctly received broadcasts longer than 5 minutes is counted as an encounter (as currently done in the app), the number of false positives will likely be quite high.
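To make the concern concrete, here is a minimal sketch of such a duration-only rule. It is not the app's code: the function name, the `maxGap` parameter (how long a pause still counts as the same sequence) and the optional `minRssi` signal-strength threshold are all hypothetical, the last one standing in for the kind of parameter that calibration would have to determine.

```javascript
// observations: [{ t: seconds, rssi: dBm }] received from one remote device, sorted by t.
function countEncounters(observations, { minDuration = 300, maxGap = 60, minRssi = -Infinity } = {}) {
    let encounters = 0;
    let start = null; // time of the first broadcast in the current sequence
    let last = null;  // time of the most recent broadcast in the current sequence

    const closeSequence = () => {
        if (start !== null && last - start > minDuration) encounters++;
    };

    for (const { t, rssi } of observations) {
        if (rssi < minRssi) continue;                 // too weak to count (hypothetical rule)
        if (last !== null && t - last > maxGap) {     // pause too long: sequence ended
            closeSequence();
            start = null;
        }
        if (start === null) start = t;
        last = t;
    }
    closeSequence();
    return encounters;
}

// A weak but steady signal for ten minutes, e.g. a neighbour behind a wall.
const weakSignal = Array.from({ length: 21 }, (_, i) => ({ t: i * 30, rssi: -90 }));

console.log(countEncounters(weakSignal));                    // duration-only rule
console.log(countEncounters(weakSignal, { minRssi: -80 }));  // with a strength threshold
```

The duration-only rule counts the weak signal as an encounter, while the variant with a threshold does not. Which threshold, if any, is epidemiologically meaningful is exactly what real-world testing would need to establish.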

Calibration and real-world testing is currently being performed by the DP-3T team, using an app built using their decentralized contact-tracing protocol, even before the deployment of the app in Switzerland (see here and here).

Other solutions

In comparison with the current contact-tracing efforts and plans of different countries, the app is clearly the least privacy-preserving, due to the aforementioned privacy issues (full contact graph on the server, attacks possible, static and predictable IDs used).

The DP-3T project presents a decentralized privacy-preserving approach to contact-tracing, with strong guarantees, a detailed specification, published SDKs and extensive security analysis. It is also backed by a large group of researchers from the security & privacy area. This approach will be deployed in Switzerland (app). There has also been extensive work on interoperability of contact-tracing protocols, focusing on DP-3T (here).

The situation in the UK seems worse than that in Switzerland. The NHSX/NCSC recently released a specification for a custom centralized contact-tracing system, which does not have privacy-preserving properties (see here for an analysis by Martin Albrecht and here for an analysis by Kenny Paterson).

There have been several statements from hundreds of scientists and researchers mainly in the fields of security & privacy that called for a responsible, privacy-preserving by design, approach to contact-tracing. See here and here. These statements endorse the decentralized privacy-preserving approach taken by DP-3T and clearly advise against the centralized approach taken by the Covid19 ZostanZdravy app (obviously without directly mentioning it).

Conclusions and recommendations

I believe the app, as it is now, presents a significant risk from a privacy perspective. The following list summarizes the issues presented:

  • The app reveals the full contact graph of all of its users to the server.
  • The app uses static and predictable user IDs.
  • The app allows for an attack in which an attacker gains the infection status of all users.
  • The app is not built reproducibly and thus correspondence between the deployed apps and the sources cannot be easily confirmed.
  • The app has no specification and documentation.
  • The app has almost no tests.
  • There was no public security analysis of the contact-tracing protocol or the apps.
  • There was no calibration and real-world testing of the app and system.

When comparing the app to the principles outlined in the Joint Statement on Contact Tracing, the app fails all but one.

  • "Contact tracing Apps must only be used to support public health measures for the containment of COVID-19. The system must not be capable of collecting, processing, or transmitting any more data than what is necessary to achieve this purpose." The app collects the full contact graph of all users, which is unnecessary.
  • "Any considered solution must be fully transparent. The protocols and their implementations, including any sub-components provided by companies, must be available for public analysis. The processed data and if, how, where, and for how long they are stored must be documented unambiguously. Such data collected should be minimal for the given purpose." The data collected by the app is not minimal.
  • "When multiple possible options to implement a certain component or functionality of the app exist, then the most privacy-preserving option must be chosen. Deviations from this principle are only permissible if this is necessary to achieve the purpose of the app more effectively, and must be clearly justified with sunset provisions." The contact-tracing protocol implemented is clearly not the most privacy-preserving, but likely the simplest.
  • "The use of contact tracing Apps and the systems that support them must be voluntary, used with the explicit consent of the user and the systems must be designed to be able to be switched off, and all data deleted, when the current crisis is over." The app is currently voluntary.

I want to stress that an analysis like this one should have been performed long before the app achieved current levels of deployment. A way to fix some of the issues above would be to move the app to the DP-3T contact-tracing protocol, which has SDKs available for both Android and iOS, and has passed significant security and privacy analysis. This would fix the privacy and security issues inherent in the protocol used, but also help with other issues, as the need for a full specification would be lower, the code to document would be simpler and there would be less code to test. Calibration and testing issues would also be resolved by the currently ongoing testing by the DP-3T team.

One practical issue that I did not mention, as it does not pertain to security or privacy, is that of Bluetooth broadcast issues on iOS. This would be resolved by using DP-3T as well, since the iOS SDK of DP-3T plans to utilize the Apple provided contact-tracing APIs, when they become available.


GSoC 2017 - Final work submission

As the GSoC 2017 final evaluation period just ended, my final work product is finally submitted. This post is a summary of my final work product.

Mailman-pgp#

  • repository@gitlab
  • docs@rtd
  • Plugin for Mailman Core.
  • Enables creating a PGP mailing list, which has a list key, can receive and serve messages encrypted, can sign and receive signed messages from subscribers.
  • Creates the key email command, which is used for per-address user key management.
  • Subscription to a PGP enabled mailing list requires the subscribing address to send and confirm an address public key, which the moderator must verify.
  • Somewhat confirms the user has possession of the appropriate private key to the one sent on subscription.
  • Has per-list settings for encryption/signatures/what to do with non encrypted / non signed messages, etc.
  • Optionally exposes a REST API for list configuration.
  • Has local archivers which can store the messages encrypted by the list key.
  • Stores list and address keys in configurable key directories.
  • Requires (some not merged) MRs in Mailman Core:
  • Additional MR (not required):
  • Required branches are merged and maintained at J08nY/mailman/plugin.
  • To install, do pip install mailman-pgp, warning: it will pull in a development version of Mailman Core and PGPy.

django-pgpmailman#

mailman-rest-events#

  • repository@gitlab
  • A plugin for Mailman Core that turned out to be unnecessary for the working of django-pgpmailman, but implemented a similar feature as this MR.
  • This plugin sends the events (and some information about them) from Mailman Core to a list of configurable endpoints using JSON in HTTP POST requests.

Other contributions#

Overall#

I think I met almost all goals that the project idea required and my original proposal stated, with the noteworthy exception of remote archiving to HyperKitty which I just couldn’t find a way to integrate.


GSoC 2017 - Web UI progress

django-pgpmailman progress#

Successfully created the mailing list views. They are inspired heavily by Postorius, to get the same look, both in templates and views. There is a list index view, which lists only PGP enabled lists, and their key fingerprints. This also allows one to download the list key, as it’s linked from the list key fingerprint. The list name link leads to a list settings/info view. The info tab is available to any logged in user, while the settings are list owner only. All the per-list PGP settings are configurable there.

django-mailman3 template chunks#

In order to make plugging the django-mailman3 based apps together easier and deduplicate some of their code, as well as to integrate the django-pgpmailman app into any Postorius + HyperKitty project, I refactored the direct references of Postorius to HyperKitty and vice versa.

This is done in the template chunk MR. It introduces a new template tag in django-mailman3, which is intended to be used by all django-mailman3 based apps to let other installed apps add their entries to the navbar and user menu, which are the two main ways Postorius and HyperKitty reference each other.

