Wednesday, December 24, 2014

52 Things: Number 12: What is the elliptic curve group law?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know' to do Cryptography: a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. We continue the Mathematical Background section by introducing the Elliptic Curve Group Law...

The Elliptic Curve group law is a method by which a binary operation is defined on the set of rational points of an elliptic curve to form a group. Now, lets go through what that actually means, and what it's used for. Thanks to Dr Dan Page for providing the group law diagram.

An Elliptic Curve and its rational points

An Elliptic Curve is a cubic equation in two variables over some mathematical field. They can be written in various forms, but over most fields 1 can be written in short Weierstrass form:
$$E : y^2 = x^3 + ax + b $$
For now we will assume that we are working in the field of real numbers, and ignore any complications that come from using finite fields. With some simple requirements on $a,b$ (specifically, that $27b^2\neq -4a^3$) this is an elliptic curve.

The set of points that will be elements of our group are going to be the rational points of the elliptic curve. This is simply the collection of points $(x,y)$ that satisfy the curve equation where both $x,y$ are rational. So, that's the set of $(x,y)\in \mathbb{Q}$ where $y^2=x^3+x+b$.  For reasons that will become clear, we also include a point at infinity 2.

Adding a Group Law to an elliptic curve

The simplest way to describe the relation we're going to add to the set of rational points is with a diagram:

Elliptic Curve (blue) with two points (P,Q) and their sum (P+Q) plotted, along with construction lines (red)
So, to add together $P$ and $Q$, we draw a line through $P$ and $Q$, and make $T=(T_x,T_y)$ the third point this line intersects the curve. Then, $P+Q=(T_x,-T_y)$. To add a point to itself, we take the tangent at that point. Now, the surprising fact is that this operation defines a group, with the point at infinity the neutral element.
Most of the requirements of being a group are easy to see geometrically3. For example, it is easy to find the inverse of an element. In the diagram above $(P+Q)+T=0$, because the line from $T$ to $(P+Q)$ has it's third intersection at infinity, and so $(P+Q)=-T$. In fact, for any elliptic curve in short Weierstrass form, to negate a point we simply change the sign of it's y-coordinate.

Is that all there is to it?

Pretty much yes. The same method holds to over finite fields, although in this case it tends to be simpler to think of the group's operation as being an algebraic construct rather than geometrical, since Elliptic Curves over finite fields do not have such an intuitive structure.  Also, we don't need to view curves in short Weierstrass form, since there are many different coordinate schemes and equations that represent the same curve. Indeed, some choices of curve and coordinate system assist us in doing certain types of computation.

What's that got to do with Cryptography?

It turns out that over certain finite fields the Elliptic Curve Group has several nice properties for cryptographers. There are a surprisingly large number of curve and field pairs where it's not too costly to do group computations4, but for which the various discrete log or DH problems (see last week's blog) are hard. Moreover, compared to using large multiplicative groups (eg RSA groups) the variables computed with are much smaller. Putting all these together, elliptic curves allow cryptographers to efficiently calculate ciphertexts that are much smaller than those created by alternative means without reducing security.

  1. Specifically, fields of characteristic not equal to 2,3. That is, fields where $2\neq0$ and $3\neq 0$. Unfortunately, this obviously means that the results we discuss won't hold in binary fields, but that is rather beyond the scope of this talk.
  2. Justification for this comes from considering the elliptic curve as a curve in projective space, but for now it suffices that such a point exists.
  3. Associativity is by far the most complicated to show. This diagram on wikipedia explains the concept behind the proof, although the details are rather involved.
  4. Even as I write this, I'm sure someone will question the validity of this claim, but it is true that compared to many groups that one could construct in which the required problems are sufficiently hard, point arithmetic on an elliptic curve is comparatively tractable.

Friday, December 19, 2014

Study Group: Efficient Smart Phone Forensics Based on Relevance Feedback

This week's study group was given by Panos on the subject of Efficient Smart Phone Forensics Based on Relevance Feedback[1].  

 The Digital Forensics Procedure has 4 major steps: Device Secure, Data Extraction, Data Analysis and Data Representation. To improve the efficiency of data analysis, a sub procedure known as forensic triage is involved to mark out the most relative information among in the phone. In this paper, a ranking system called LIFTR was presented to prioritize the information recovered from Android phones where the amounts of data are huge comparing to feature phones.

LIFTR is designed under following assumptions:
  1. Physical image of phone is acquired.
  2. The data is not encrypted. They are filtered binary files (which are the output of previous procedure).
  3. The format of data are list of strings, e.g. the result of DEC0DE or strings command.
LIFTR then takes the information parsed by recovery engine as the input alongside with optional suspect information, and then outputs a ranked list of the most important NAND pages.

There are two major steps in the design of LIFTR:
  1. Initial Ranking. The input of LIFTR are unranked pages. The initial ranking will mark the pages based on a relevance metric. After this procedure, the pages will be ranked for the next step.
  2. Relevance Feedback. After the initial ranking, LIFTR will present some possibly relevant fields of pages to the investigator and ask for the labelling, which is usually manually, of true and false positive. LIFTR then resorts the ranking by the labelling information assigning relevant pages with higher ranks and irrelevant pages lower. This process can be repeated multiple time to reach a better performance.

The experimental result shows that LIFTR can efficiently mark out the relevant pages. However, it does not guarantee the target information will be included in its result; therefore all data should still be analyzed during an investigation. However, LIFTR is still an effective tool to improve the efficiency of the investigation.

[1]Saksham Varma, Robert J. Walls, Brian Lynn, and Brian Neil Levine. 2014. Efficient Smart Phone Forensics Based on Relevance Feedback. InProceedings of the 4th ACM Workshop on Security and Privacy in Smartphones & Mobile Devices (SPSM '14). ACM, New York, NY, USA, 81-91. DOI=10.1145/2666620.2666628

52 Things: Number 11: What are the DLP, CDH and DDH problems?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This blog is the second in the section on Mathematical Background and concerns how group operations can be used to design cryptographic primitives.

As you probably know by now, cryptography often relies on 'hard problems'. That is, we design cryptographic protocols whose security can be proven if we assume that the adversary is unable to solve a certain (mathematical) problem in a reasonable amount of time. This blog post introduces three such problems which are widely used in security proofs. Luckily for me, a) this is really just Group Theory, not Computer Science and b) just two days before writing this I attended a very decent guest lecture by fellow Bristol Crypto researcher Susan Thomson on this exact topic. (That is to say, any inaccuracies in the following can and should be blamed on her!)

The Discrete Logarithm Problem (DLP)

Okay, let $G$ be an abelian group. First we write the operation in $G$ multiplicatively. For any $g \in G$ and any integer $a>1$ let $g^a$ denote $g * g * ... * g$ with $a$ occurrences of $g$. The discrete logarithm problem (DLP) is:

Given $G$, $g$ and $h=g^a$, find $a$. 

Here $a$ is called the discrete logarithm of $h$ with base $g$, hence the name of the problem.

Is the discrete logarithm problem hard? Sometimes, maybe. As a counter-example, let $G$ be the integers under addition. So now it makes sense to write the group operation additively, not multiplicatively. So the same process of repeating the group operation with the same element $g$ is now written $g + g + ... + g$ with, say, $a$ summands and, since we're working in the integers, this sum, call it $h$, is just equal to the integer $ag$. Therefore $a$, the discrete logarithm of $h$ to the base $g$, can be found by just dividing $h$ by $g$. For example, if I say to you, "find the discrete logarithm to the base 3 of 18 in the integers under addition", you'd just write $3 + 3 + ... + 3 = 18$ with $a$ summands on the left, simplify this to $3a = 18$ and find that $a = 6$. I could change the group to integers modulo $N$ for some integer $N>1$ (still under addition) but the problem wouldn't be much harder: one has to solve an equation like $ag \equiv h \: (\mathrm{mod} \: N)$ which is solved by performing the Extended Euclidean Algorithm to find $g^{-1}\: (\mathrm{mod} \: N)$ (if it exists), multiplying this by $h$ and reducing modulo $N$ to obtain $a$. All of this is can be done in polynomial time - no good for building secure cryptographic primitives.

On the other hand, the DLP in a finite field of prime order viewed as a group under multiplication (after throwing out the zero element) or in elliptic curve groups (more on those next week) is believed to be hard. That is, we do not yet know of any polynomial time algorithms for finding discrete logarithms in these groups. As a concrete example, suppose I ask you, "find the discrete logarithm to the base 3 of 5 in the multiplicative group of the integers modulo 7". This means find an integer $a$ such that $3^a \equiv 5\: (\mathrm{mod} \: 7)$. Now that we are in the multiplicative group, not the additive one and so we really do have to 'take logarithms' somehow, not just multiply by the inverse of 3. In this case, since 7 is fairly small, we can find the answer by just trying all the possibilities one at a time until we find a solution:
  1. $3^1 = 3 \not\equiv 5\: (\mathrm{mod} \: 7)$
  2. $3^2 = 9 \equiv 2 \not\equiv 5 \: (\mathrm{mod} \: 7)$
  3. $3^3 = (3^2)\times3 \equiv 2\times3 = 6 \not\equiv 5\: (\mathrm{mod} \: 7)$
  4. $3^4 = (3^3)\times3 \equiv 6\times3 = 18 \equiv 4 \not\equiv 5\: (\mathrm{mod} \: 7)$
  5. $3^5 = (3^4)\times3 \equiv 4\times 3 = 12 \equiv 5\: (\mathrm{mod} \: 7)$
so $a = 5$. The way we 'bounce around' the integers modulo 7 by repeatedly multiplying by the base 3 (getting 3, 2, 6, 4 and then 5) should give you some intuition as to why DLP seems hard in this setting. If our prime was much larger than 7, say thousands of bits long in binary, even a computer would take a very long time to solve DLP this way (though there are better, sub-exponential time algorithms and it is not proven that no polynomial time algorithm exists for solving DLP in this kind of group).

The Computational Diffie-Hellman Problem (CDH)
A problem related to DLP is named after Whit Diffie and Martin Hellman who devised a way of two parties agreeing on a secret key over a public channel without revealing it:
  • Alice and Bob publicly agree on a cyclic group $G$ and generator $g$. 
  • Alice chooses a random secret integer $a$ and Bob chooses a random secret integer $b$. 
  • Alice computes $g^a$ and publicly sends this to Bob. Bob computes $g^b$ and publicly sends this to Alice. 
  • Alice and Bob both compute $g^{ab}=(g^a)^b=(g^b)^a$ by raising what they received from the other party to power of their own secret integer. 
Now $g^{ab}$ is a secret key that can be used for symmetric encryption and decryption by Alice and Bob. But someone listening in to the exchange has in their possession $G$, $g$, $g^a$ and $g^b$. So secrecy of the key $g^{ab}$ depends on this problem, called the Computational Diffie-Hellman Problem (CDH):

Given $G$, $g$, $g^a$ and $g^b$, find $g^{ab}$.

CDH is clearly related to DLP, but which is harder? Well, if I can solve DLP then I can efficiently compute the secret integer $a$ from $g^a$ and then find $g^{ab}$ by raising $g^{b}$ to the power $a$ in the same way Alice does, therefore solving CDH. So anyone who can solve DLP can also solve CDH, meaning DLP is at least as hard as CDH.

The Decisional Diffie-Hellman Problem (DDH)
This is another 'discrete logarithm' style problem used to prove indistinguishability properties. Say Alice and Bob perform the Diffie-Hellman key agreement protocol as above so that $G$, $g$, $g^a$ and $g^b$ are all public and $g^{ab}$ is the shared secret key. Intuitively, the Decisional Diffie-Hellman Problem (DDH) asks whether an adversary can distinguish Alice and Bob's secret key $g^{ab}$ from a random group element of $G$. Formally:

Given  $G$, $g$, $g^a$, $g^b$ and $T_x$ such that $T_0$ is a random element of $G$, $T_1 = g^{ab}$ and $x$ is chosen uniformly at random from $ \lbrace 0,1 \rbrace $, find $x$.

If an adversary can solve DDH (i.e. output the correct value of $x$ with probability greater than $\frac{1}{2}$), then $G$, $g$, $g^a$ and $g^b$ must leak some information about the secret key $g^{ab}$ that distinguishes it from a random group element, even if it can't be computed directly. What should be clear is that if the adversary can solve the computational Diffie-Hellman problem, then they can actually compute $g^{ab}$ and hence trivially distinguish this element from a random group element, thereby solving the decisional Diffie-Hellman problem. So anyone who can solve CDH can also solve DDH, meaning CDH is at least as hard as DDH.

These are the three problems we wanted to talk about and we've given a sketch proof of their ordering in terms of hardness: DLP is the most hard, then CDH and then DDH. As we've seen, DLP is sometimes easy, making CDH and DDH easy. So the choice of group $G$ and generator $g$ is very important when doing cryptography!

Wednesday, December 17, 2014

Study Group: Attack on XLS

This week's study group, delivered by Guy, featured a paper by Mridul Nandi attacking the XLS domain completion method. This attack was initially given at DIAC in August, and was formally presented earlier this month at Asiacrypt 2014 in Taiwan. The idea of a domain completion method is to preserve length in an encryption scheme, so that the ciphertext is of the same length as the message. More formally, this means turning an encryption scheme that works on blocks of size n to a cipher that works on inputs of size mn. This contrasts to the strategy of padding (for example to the block length of a block cipher) that infers ciphertext expansion, which is used in many applications and standards. A number of approaches have appeared in the literature including ciphertext stealing, hash-then-counter (XCB, HCTR, HCH) and applying a block cipher twice to create a one-time pad for the final block (EME, TET, HEH) - but these methods are not applicable to generic blockciphers and lack formal security analysis.

At FSE 2007, Ristenpart and Rogaway presented XLS (eXtended Latin Squares), a domain completion method that inherits the assumed strong pseudorandom permutation property of the underlying blockcipher. The method is conceptually straightforward, utilising a (linear) mixing permutation and bit-flipping and the security proof is, as you would expect from the authors, relatively easy to follow despite its numerous intricacies. No fewer than six teams have incorporated XLS into their CAESAR submissions.

Despite this acquired confidence, Nandi presents an attack to demonstrate that XLS is in fact not a strong pseudorandom permutation (SPRP), giving an attack that makes just three encryption/decryption oracle queries and succeeds with probability 1/2. In addition, the author shows that replacing the mixing function defined in the original paper with some other stronger linear permutation(s) does not help, and gives a distinguishing attack (with success probability 1/4) against this generalised version of XLS.

Nandi does not precisely pinpoint any incorrect assertions in the proof nor provide any indication to give hope that XLS is fixable, and it will be interesting to see how the authors of the original paper (and the teams relying on XLS in their CAESAR submissions) respond in the coming months.

Friday, December 12, 2014

52 Things: Number 10: What is the difference between the RSA and strong-RSA problem?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This blog post introduces the RSA and Strong-RSA problems and highlights the differences between the two.

Cryptography relies heavily on the assumption that certain mathematical problems are hard to solve in a realistic amount of time. When looking at Public-Key (Asymmetric) Cryptography, which is what we'll be focusing on in this blog post we use the assumed existence of One-Way functions, i.e. functions that are easy to compute one way but are difficult to invert. We use problems from algorithmic number theory to produce these functions.


The first difficult problem from number theory to talk about is factoring. Given a composite integer $N$ the factoring problem is to find positive integers $p,q$ such that $N = pq$. Although on the face of it this seems like a very simple problem, this is in fact a very tough, well studied problem. This can be solved in exponential time by checking all the numbers $p = 2, \ldots, \sqrt{N}$. However, solving a problem in exponential time is not fast enough. No polynomial time algorithm has been developed to solve the factoring problem, despite many years of research. Clearly there are examples of $N$ for which this is very easy to solve, for example whenever $N$ is even. Therefore, when starting to think about using this in a Cryptographic construction we consider $N$ as very large and being constructed by 2 large primes $p,q$. 

The RSA Problem

In RSA public-key encryption [1] Alice encrypts a plaintext $M$ using Bob's public key $(n,e)$ to ciphertext $C$ by $C = M^e (\textrm{mod } n)$ where $n$ is the product of two large primes and $e \geq 3$ is an odd integer that is coprime to the order of $\mathbb{Z}_n^{*}$, the group of invertible elements of $\mathbb{Z}_n$. Bob knows the private key $(n,d)$ where $de = 1 (\textrm{ mod } (p-1)(q-1))$ meaning he can compute $M = C^d (\textrm{mod } n)$. An adversary can eavesdrop $C$ and can know the public key $(n, e)$ however to calculate $M$ the adversary must find the factors of $n$. Therefore, this means the RSA problem is no harder than integer factorisation but is still a very hard problem to solve provided a suitable $n$ is chosen. 

The Strong RSA Assumption

The strong RSA assumption differs from the RSA assumption in that the adversary can choose the (odd) public exponent $e \geq 3$. The adversary's task is to compute the plaintext $M$ from the ciphertext given that $C = M^e (\textrm{mod } n)$. This is at least as easy as the RSA problem meaning that the strong RSA assumption is, unsurprisingly, a stronger assumption. The RSA problem is now over a quarter of a century old. Public key encryption schemes have been developed that derive their strength fully from the RSA problem.

[1] -