Simple and hardware friendly RSA threshold signatures
A \((n,t)\) threshold signature schemes allow splitting a key into \(n\) pieces, in such a way that at least \(t < n\) participants must jointly use their key shards in order to generate a valid signature.
Many techniques for RSA threshold signatures have been developed. Currently published techniques require either a trusted dealer, or use of a distributed key generation algorithm. In addition, the signers must perform a non-standard RSA signature; that is, signing a message with a private exponent which is not equal to \(e^{-1} \bmod n\). Both requirements prevent using standard hardware such as HSMs or smartcards to protect the key shards.
I discovered a technique for \(n\)-of-\(n\) RSA signatures where both keys and signatures can be computed using standard cryptographic hardware.
A normal instance of the RSA signature system is created by first fixing a public exponent \(e\) then choosing two random primes \(p\) and \(q\) of suitable size such that \(p-1\) and \(q-1\) are relatively prime to \(e\). Multiplying \(p\) and \(q\) produces the public modulus \(n\), and the private exponent \(d\) is computed as \(e^{-1} \bmod \phi(n)\). Having established a key, a signature can be formed by first choosing a suitable hash function \(H : \{0,1\}^\star \rightarrow Z_n\) and computing \(s\) = \(H(m)^d \bmod n\). Common choices for \(H\) are the PSS and PKCS #1 v1.5 schemes, in combination with some secure hash function such as SHA-512. The signature can be verified by computing \(H(m)' = s^e \bmod n\) and checking that \(H(m)'\) is a valid representative for the purported message \(m\). Commonly, the Chinese Remainder Theorem (CRT) is used to compute \(s\) using two modular exponentiations, one modulo \(q\) and another modulo \(p\). Since modular exponentiation algorithms have a runtime that is quadratic based on the size of the modulus, CRT can offer substantial speedups.
Many threshold signature schemes based on RSA have been proposed, but all rely on either a trusted dealer or else the use of a distributed key generation algorithm. A trusted dealer implies a trusted party whose deviation from the protocol will compromise the security of the system. Distributed key generation does away with the dealer, but at the cost of complicating key generation, and, worse, requiring a non-standard key generation mechanism. Thus, it is not possible to create a threshold RSA key in such a way that the resulting key shards are stored within a standard PKCS #11 hardware security module.
But there is a simple approach based on multiprime RSA which neatly avoids these problems, at the cost of larger public keys, and with the limitation of only working for \(n\)-of-\(n\). It is probably most suitable for \(2\)-of-\(2\).
Each of the \(n\) parties involved in the threshold scheme generate a public/private RSA keypair. This can be done using standard hardware, and no interaction between the parties is required during the key generation step. The only restriction is all keypairs must be generated with the same public exponent \(e\). All parties publish their public key. The combined public key is simply the common \(e\) and the product of the public moduli: \((e, n_1 \cdot n_2 \cdot ... \cdot n_n)\)
To perform a distributed signature, all parties are given the input message \(m\). They must first jointly agree on the exact value of \(H(m)\). For PKCS #1 v1.5 signatures, the padding is deterministic. For randomized schemes such as PSS, the salt could for instance be chosen by hashing the input message with a shared secret known to all signers. The encoding of \(H(m)\) should be sized according the public modulus rather than that of the individual keys, since verifiers will judge the correctness of \(H(m)\) as an encoding for \(m\) relative to the size of the public key. This restriction unfortunately inhibits use of hardware which provides padding as part of the signature operation. Instead mechanisms for "raw" RSA such as PKCS #11's CKM_RSA_X_509 must be used.
Each key holder signs \(H(m) \bmod n_i\) using their private key, and publishes their signature share \(s_i = H(m)^{d_i} \bmod n_i\). All signature operations can be done in parallel. Given the signature shares \(s_1\), \(s_2\), ... \(s_n\), the CRT can be used to create a joint signature \(s\). This final recombination step can be done by an untrusted third party. Deviations from the protocol can be easily detected by verifying each signature share against the signer's respective public key.
Extending this scheme to \(t\)-of-\(n\) can be done by generating \({n \choose t}\) different public keys, and allowing authentication using any of them.
This scheme is at least as secure as breaking the strongest of the \(n\) sub-keys, which can easily be seen by observing that any valid signature generated without the cooperation of one of the key holders, implies either the existence of some pair \(y, s_y\) where \(s_y = y^{d} \bmod n\) which the signer has not issued (and thus a direct forgery of RSA) or otherwise a signature where some previously issued (valid) \(y, s_y\) pair was reused to forge a signature on some other message. This implies the existence of distinct \(m_1, m_2\) where \(H(m_1) \equiv H(m_2) \bmod n\) where \(n\) is the signer's public modulus. For PKCS #1 v1.5, producing such messages requires a collision attack on the hash function, because the hash of the message is placed into the low bits of \(H(m)\). The situation for PSS here is less clear, and needs further analysis.
Downsides of this scheme are that the resulting (threshold) key is \(n\) times larger than the key length of the individual users, increasing both transmission costs and verification time. While RSA verification is quite fast, likely this approach is only practical for \(n \leq 4\). In addition the larger public key gives a somewhat false sense of security in terms of safety against NFS attacks; factoring a 4096 bit RSA key requires \(\approx 2^{150}\) effort but a pair of 2048 bit keys can be broken with just \(\approx 2^{112}\) operations.