Bitbashing (Posts about programming)https://randombit.net/bitbashing/categories/programming.atom2019-08-02T22:27:07ZJack LloydNikolaBit manipulations using BMI2https://randombit.net/bitbashing/posts/haswell_bit_permutations.html2012-06-02T00:00:00-04:002012-06-02T00:00:00-04:00Jack Lloyd<div><p>Intel's Haswell design (expected in 2013) will include a new
instruction set called BMI2 with various fun bit manipulation
instructions. Particularly of note are the <tt class="docutils literal">pext</tt> and <tt class="docutils literal">pdep</tt>
instructions which are essentially bit-level gather/scatter
operations. Combining two <tt class="docutils literal">pext</tt> operations results in the GRP
instruction described in <a class="reference external" href="http://palms.ee.princeton.edu/PALMSopen/lee01efficient.pdf">Efficient Permutation Instructions for Fast
Software Cryptography</a>, where
the authors show how to implement bit level permutations using a
variety of instructions. Perhaps not coincidentally at least one of
the authors now works at Intel.</p>
<p><a href="https://randombit.net/bitbashing/posts/haswell_bit_permutations.html">Read more…</a> (4 min remaining to read)</p></div>Using std::async for easy parallel computationshttps://randombit.net/bitbashing/posts/cpp_async.html2009-11-24T00:00:00-05:002009-11-24T00:00:00-05:00Jack Lloyd<div><p>C++0x, the next major revision of C++, includes a number of new
language and library facilities that I am greatly looking forward to,
including a standard thread interface. Initially the agenda for C++0x
had included facilities built on threads, such as a thread pool, but as
part of the so-called 'Kona compromise' (named after the location of
the committee meeting where the compromise was made) all but the most
basic facilities were deferred for a later revision.</p>
<p><a href="https://randombit.net/bitbashing/posts/cpp_async.html">Read more…</a> (3 min remaining to read)</p></div>Converting Line Endings in InnoSetuphttps://randombit.net/bitbashing/posts/convert_line_endings_in_innosetup.html2009-11-23T00:00:00-05:002009-11-23T00:00:00-05:00Jack Lloyd<div><p>I recently packaged <a class="reference external" href="https://botan.randombit.net">botan</a> using
<a class="reference external" href="http://www.jrsoftware.org/isinfo.php">InnoSetup</a>, an open source
installation creator. Overall I was pretty pleased with it - it seems
to do everything I need it to do without much of a hassle, and I'll
probably use it in the future if I need to package other programs or
tools for Windows.</p>
<p><a href="https://randombit.net/bitbashing/posts/convert_line_endings_in_innosetup.html">Read more…</a> (1 min remaining to read)</p></div>4x4 integer matrix transpose in SSE2https://randombit.net/bitbashing/posts/integer_matrix_transpose_in_sse2.html2009-10-08T00:00:00-04:002009-10-08T00:00:00-04:00Jack Lloyd<div><p>The Intel SSE2 intrinsics has a macro <tt class="docutils literal">_MM_TRANSPOSE4_PS</tt>
which performs a matrix transposition on a 4x4 array represented by
elements in 4 SSE registers. However, it doesn't work with integer
registers because Intel intrinsics make a distinction between integer
and floating point SSE registers. Theoretically one could cast and use
the floating point operations, but it seems quite plausible that this
will not round trip properly; for instance if one of your integer
values happens to have the same value as a 32-bit IEEE denormal.</p>
<p>However it is easy to do with the punpckldq, punpckhdq, punpcklqdq,
and punpckhqdq instructions; code and diagrams ahoy.</p>
<p><a href="https://randombit.net/bitbashing/posts/integer_matrix_transpose_in_sse2.html">Read more…</a> (1 min remaining to read)</p></div>Inverting Mersenne Twister's final transformhttps://randombit.net/bitbashing/posts/inverting_mt19937_tempering.html2009-07-21T00:00:00-04:002009-07-21T00:00:00-04:00Jack Lloyd<div><p>The Mersenne twister RNG 'tempers' its output using an invertible
transformation:</p>
<pre class="literal-block">
unsigned int temper(unsigned int x)
{
x ^= (x >> 11);
x ^= (x << 7) & 0x9D2C5680;
x ^= (x << 15) & 0xEFC60000;
x ^= (x >> 18);
return x;
}
</pre>
<p>The inversion function is:</p>
<pre class="literal-block">
unsigned int detemper(unsigned int x)
{
x ^= (x >> 18);
x ^= (x << 15) & 0xEFC60000;
x ^= (x << 7) & 0x1680;
x ^= (x << 7) & 0xC4000;
x ^= (x << 7) & 0xD200000;
x ^= (x << 7) & 0x90000000;
x ^= (x >> 11) & 0xFFC00000;
x ^= (x >> 11) & 0x3FF800;
x ^= (x >> 11) & 0x7FF;
return x;
}
</pre>
<p>This inversion has been confirmed correct with exhaustive search.</p></div>Optimizing Forward Error Correction Coding Using SIMD Instructionshttps://randombit.net/bitbashing/posts/forward_error_correction_using_simd.html2009-01-19T00:00:00-05:002009-01-19T00:00:00-05:00Jack Lloyd<div><p>Forward error correction (FEC) is a technique for handling lossy
storage devices or transmission channels. A FEC code takes <em>k</em> blocks
of data and produces an additional <em>m</em> blocks of encoding information,
such that any set of <em>k</em> of the blocks (out of the <em>k+m</em> total) is
sufficient to recover the original data. One can think of RAID5 as a
FEC with arbitrary <em>k</em> and <em>m</em> fixed at 1; most FEC algorithms allow
wide latitude for the values that can be sent, allowing the code to be
adjusted for the reliability expectations and needs of the particular
channel and application. For instance, the <a class="reference external" href="http://allmydata.org/trac/tahoe">Tahoe</a> distributed filesystem splits
stored files using <em>k</em> of 3 and <em>m</em> of 7, so as long as at least 30%
of the devices storing the file survive, the original file can be
recoved.</p>
<p><a href="https://randombit.net/bitbashing/posts/forward_error_correction_using_simd.html">Read more…</a> (9 min remaining to read)</p></div>A Failure Case in a Linux Random Number Generatorhttps://randombit.net/bitbashing/posts/linux_random32_fail.html2008-07-01T00:00:00-04:002008-07-01T00:00:00-04:00Jack Lloyd<div><p>The Linux kernel implements a random number generator called a
Tausworthe generator, in the file <tt class="docutils literal">lib/kernel32.c</tt>. The kernel uses
this generator for a variety of non-cryptographic purposes, such as
calculating network delays and random ports numbers, choosing a random
element to drop from full caches, and many other places where a
randomized algorithm is useful. While looking through this source, I
found some cases where it could fail quite dramatically.</p>
<p><a href="https://randombit.net/bitbashing/posts/linux_random32_fail.html">Read more…</a> (3 min remaining to read)</p></div>Roman Proverbs Applicable to Softwarehttps://randombit.net/bitbashing/posts/doc_proverb.html2008-06-30T00:00:00-04:002008-06-30T00:00:00-04:00Jack Lloyd<p><em>Quod non est in actis, non est in mundo.</em>
("What is not in the documents does not exist")</p>Adventures in Signal Handlinghttps://randombit.net/bitbashing/posts/f_notify.html2008-03-02T00:00:00-05:002008-03-02T00:00:00-05:00Jack Lloyd<div><p>I was reading the man page for Linux <tt class="docutils literal">fcntl(2)</tt>, because
I've never used it and was curious what exactly it could do. For
a couple of hours this afternoon, I thought I had perhaps found
a security vulnerability in the design, this post is to trace
my logic and describe what I learned.</p>
<p><a href="https://randombit.net/bitbashing/posts/f_notify.html">Read more…</a> (3 min remaining to read)</p></div>Search Based Filesystemhttps://randombit.net/bitbashing/posts/search_filesystem.html2007-10-13T00:00:00-04:002007-10-13T00:00:00-04:00Jack Lloyd<div><p>I was using a friend's OS X machine briefly, and at one point thought
that how the "Movies" folder worked was it acted as an index into the
actual filesystem, using some search technology or another to find all
of your movies and present them to you. Presenting search results as a
directory of files instead of a list (especially with movies, since
they are normally a single file that are more or less independent of
each other) seemed so obviously user-friendly that I figured that had
to be what Apple would do. (Apparently not: my friend said it's just a
plain old directory with actual files in it). But my misconception
planted the idea that this would be pretty useful!</p>
<p><a href="https://randombit.net/bitbashing/posts/search_filesystem.html">Read more…</a> (1 min remaining to read)</p></div>Python Format String Annoyancehttps://randombit.net/bitbashing/posts/python_format_strings.html2007-04-09T00:00:00-04:002007-04-09T00:00:00-04:00Jack Lloyd<div><p>Python's format string operator is useful, but if you wish to provide
arguments to it you must do so all at once. This makes some situations
harder to deal with, including this one wherein I am annoyed at being
unable to compact some Python code manipulating a MySQL database.</p>
<p><a href="https://randombit.net/bitbashing/posts/python_format_strings.html">Read more…</a> (1 min remaining to read)</p></div>Finding Equivalences of Boolean Functionhttps://randombit.net/bitbashing/posts/booleans.html2006-08-30T00:00:00-04:002006-08-30T00:00:00-04:00Jack Lloyd<div><p>A fairly common class of functions in crypto are functions mapping
{0,1}<sup>3</sup> onto {0,1}. In particular, these show up a lot in hash
functions derived from MD4, including MD5, SHA-1, RIPEMD, and
SHA-512. These range in complexity from simple three-term expressions
like "(A xor B xor C)" to functions like "((A and B) or (C and (A or
B)))". One interesting and important difference between these two
functions becomes very important when you consider how to implement
these functions on an x86 (or x86-64) processor. The x86 uses
two-operand instructions, and has very few registers, so computing
something like "(A and B) xor (not(A) and C)", which requires two
temporaries (one to hold A and B, the other (not(A) and C)) might
require you to spill values to the stack. Often, that means a major
performance hit. Finding an alternate form for this function that only
requires fewer temporary registers could be a major benefit. Obviously
finding these equivalences could be done by hand, but having a
computer do it seemed both faster and more interesting.</p>
<p><a href="https://randombit.net/bitbashing/posts/booleans.html">Read more…</a> (2 min remaining to read)</p></div>Fun with assemblyhttps://randombit.net/bitbashing/posts/x86_asm_hashing.html2006-08-13T00:00:00-04:002006-08-13T00:00:00-04:00Jack Lloyd<div><blockquote>
<dl class="docutils">
<dt>"If you can explain how you do something, then you're very very bad at it."</dt>
<dd>-- John Hopfield</dd>
</dl>
</blockquote>
<p>The <a class="reference external" href="http://venge.net/monotone">Monotone</a> folks have been doing some
profiling and performance work of late. One thing that came out of
that was the finding that Botan's SHA-1 implementation was causing a
bottleneck; because Monotone identifies everything via hashes, there
are times where it needs to hash many (many) megabytes of source data,
and the faster that happens, the better. Since low-level C++ wasn't
cutting it, I felt that it was time to try my hand at x86 assembly
again.</p>
<p><a href="https://randombit.net/bitbashing/posts/x86_asm_hashing.html">Read more…</a> (3 min remaining to read)</p></div>Initial Impressions of C#https://randombit.net/bitbashing/posts/csharp.html2006-08-03T00:00:00-04:002006-08-03T00:00:00-04:00Jack Lloyd<div><p>For the last month or so, I've been spending some time on a C#
application to perform vulnerability analysis of PHP and ASP code. The
language was imposed on me by external constraints, but it turns out
to be a very reasonable choice for this sort of problem. I'm not doing
anything particularly groundbreaking or clever, so it ends up that
having really good libraries and a reasonably expressive language is
more useful, in terms of immediate productivity, than having a really
powerful language and half-assed library support (see: Common Lisp).</p>
<p><a href="https://randombit.net/bitbashing/posts/csharp.html">Read more…</a> (2 min remaining to read)</p></div>