(Mostly) good password resets

Hey everybody!

This is part 3 to my 2-part series on password reset attacks (Part 1 / Part 2). Overall, I got awesome feedback on the first two parts, but I got the same question over and over: what’s the RIGHT way to do this?

So, here’s the thing. I like to break stuff, but I generally leave the fixing to somebody else. It’s just safer that way, since I’m not really a developer or anything like that. Instead, I’m going to continue the trend of looking at others’ implementations by looking at three major opensource projects - Wordpress, SMF, and MediaWiki. Then, since all of these rely on PHP’s random number implementation to some extent, I’ll take a brief look at PHP.

SMF

SMF 1.1.13 implements the password-reset function in Sources/Subs-Auth.php:

  // Generate a random password.
  require_once($sourcedir . '/Subs-Members.php');
  $newPassword = generateValidationCode();
  $newPassword_sha1 = sha1(strtolower($user) . $newPassword);

Looking at Sources/Subs-Members.php, we find:

// Generate a random validation code.
function generateValidationCode()
{
  global $modSettings;

  $request = db_query('
    SELECT RAND()', __FILE__, __LINE__);

  list ($dbRand) = mysql_fetch_row($request);
  mysql_free_result($request);

  return substr(preg_replace('/\W/', '', sha1(microtime() . mt_rand() . $dbRand .
      $modSettings['rand_seed'])), 0, 10);
}

Which is pretty straight forward, but also, in my opinion, very strong. It takes entropy from a bunch of different places:

  • The current time (microtime())
  • PHP's random number generator (mt_rand())
  • MySQL's random number generator ($dbRand)
  • A user-configurable random seed

Essentially, it puts these difficult-to-guess values through a cryptographically secure function, sha1(), and takes the first 10 characters of the hash.

The hash consists of lowercase letters and numbers, which means there are 36 possible choices for 10 characters, for a total of 3610 or 3,656,158,440,062,976 possible outputs. That isn’t as strong as it could be, since there’s no reason to limit its length to 10 characters (or its character set to 36 characters). That being said, three quadrillion different passwords would be nearly impossible to guess. (By my math, exhaustively cracking all possible passwords, assuming md5 cracks at 5 million guesses/second, would take about 23 CPU-years). Not that cracking is terribly useful - remote bruteforce guessing is much more useful and is clearly impossible.

SMF is my favourite implementation of the three, but let’s take a look at Wordpress!

Wordpress

Wordpress 3.1 implements the password-reset function in wp-login.php:

  $key = $wpdb->get_var($wpdb->prepare("SELECT user_activation_key FROM
      $wpdb->users WHERE user_login = %s", $user_login));
  if ( empty($key) ) {
    // Generate something random for a key...
    $key = wp_generate_password(20, false);
    do_action('retrieve_password_key', $user_login, $key);
    // Now insert the new md5 key into the db
    $wpdb->update($wpdb->users, array('user_activation_key=> $key),       array('user_login=> $user_login));
  }

wp_generate_password() is found in wp-includes/pluggable.php:

/**
 * Generates a random password drawn from the defined set of characters.
 *
 * @since 2.5
 *
 * @param int $length The length of password to generate
 * @param bool $special_chars Whether to include standard special characters.
      Default true.

 * @param bool $extra_special_chars Whether to include other special characters.
 *   Used when generating secret keys and salts. Default false.
 * @return string The random password
 **/
function wp_generate_password( $length = 12, $special_chars = true, $       extra_special_chars = false ) {
  $chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
  if ( $special_chars )
    $chars .= '!@#$%^&*()';
  if ( $extra_special_chars )
    $chars .= '-_ []{}<>~`+=,.;:/?|';

  $password = '';
  for ( $i = 0; $i < $length; $i++ ) {
    $password .= substr($chars, wp_rand(0, strlen($chars) - 1), 1);
  }

  // random_password filter was previously in random_password function which was       deprecated
  return apply_filters('random_password', $password);
}

This generates a string of random characters (and possibly symbols) up to a defined length, choosing the characters using wp_rand(). So, for the final step, how is wp_rand() implemented? It’s also found in wp-includes/pluggable.php and looks like this:

  global $rnd_value;

  // Reset $rnd_value after 14 uses
  // 32(md5) + 40(sha1) + 40(sha1) / 8 = 14 random numbers from $rnd_value
  if ( strlen($rnd_value) < 8 ) {
    if ( defined( 'WP_SETUP_CONFIG) )
      static $seed = '';
    else
      $seed = get_transient('random_seed');
    $rnd_value = md5( uniqid(microtime() . mt_rand(), true ) . $seed );
    $rnd_value .= sha1($rnd_value);
    $rnd_value .= sha1($rnd_value . $seed);
    $seed = md5($seed . $rnd_value);
    if ( ! defined( 'WP_SETUP_CONFIG) )
      set_transient('random_seed', $seed);
  }

  // Take the first 8 digits for our value
  $value = substr($rnd_value, 0, 8);

  // Strip the first eight, leaving the remainder for the next call to wp_rand().
  $rnd_value = substr($rnd_value, 8);

  $value = abs(hexdec($value));

  // Reduce the value to be within the min - max range
  // 4294967295 = 0xffffffff = max random number
  if ( $max != 0 )
    $value = $min + (($max - $min + 1) * ($value / (4294967295 + 1)));

  return abs(intval($value));
}

This is quite complex for generating a number! But the points of interest are:

  • Hashing functions (sha1 and md5) are used, which are going to be a lot slower than a standard generator, but they, at least in theory, have cryptographic strength
  • The random number is seeded with microtime() and mt_rand(), which is PHP's "advanced" randomization function)
  • The random number is restricted to 0 - 0xFFFFFFFF, which is pretty typical

In practice, due to the multiple seeds with difficult-to-predict values and the use of a hashing function to generate strong random numbers, this seems to be a good implementation of a password reset. My biggest concern is the complexity - using multiple hashing algorithms and hashing in odd ways (like hasing the value alone, then the hash with the seed). It has the feeling of being unsure what to do, so trying to do everything ‘just in case’. While I don’t expect to find any weaknesses in the implementation, it’s a little concerning.

Now, let’s take a look at my least favourite (although still reasonably strong) password-reset implementation: MediaWiki!

MediaWiki

MediaWiki 1.16.2 was actually the most difficult to find the password reset function in. Eventually, though, I managed to track it down to includes/specials/SpecialUserlogin.php:

    $np = $u->randomPassword();
    $u->setNewpassword( $np, $throttle );
    $u->saveSettings();
    $userLanguage = $u->getOption( 'language);
    $m = wfMsgExt( $emailText, array( 'parsemag', 'language=> $userLanguage ),       $ip, $u->getName(), $np,
        $wgServer . $wgScript, round( $wgNewPasswordExpiry / 86400 ) );
    $result = $u->sendMail( wfMsgExt( $emailTitle, array( 'parsemag',       'language=> $userLanguage ) ), $m );

$u->randomPassword() is found in includes/User.php looks like this:

  /**
   * Return a random password. Sourced from mt_rand, so it's not particularly secure.
   * @todo hash random numbers to improve security, like generateToken()
   *
   * @return \string New random password
   */
  static function randomPassword() {
    global $wgMinimalPasswordLength;
    $pwchars = 'ABCDEFGHJKLMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz';
    $l = strlen( $pwchars ) - 1;

    $pwlength = max( 7, $wgMinimalPasswordLength );
    $digit = mt_rand( 0, $pwlength - 1 );
    $np = '';
    for ( $i = 0; $i < $pwlength; $i++ ) {
      $np .= $i == $digit ? chr( mt_rand( 48, 57 ) ) : $pwchars{ mt_rand( 0, $l ) };
    }
    return $np;
  }

This is easily the most complex, and also most dangerous, password-reset implementation that I’ve found.

First, the length is only 7 characters by default. That’s already an issue.

Second, the set of characters is letters (uppercase + lowercase) and exactly one number. And it looks to me like they put a lot of effort into figuring out just how to put that one number into the password. Initially, I thought this made the password slightly weaker due to the following calculations:

  • 7 characters @ 52 choices = 527 = 1,028,071,702,528
  • 6 characters @ 52 choices + 1 character @ 10 choices = 526 * 10 = 197,706,096,640

However, as my friend pointed out, because you don’t know where, exactly, the number will be placed, that actually adds an extra multiplier to the strength:

  • 6 characters @ 52 choices + 1 characters @ 10 choices + unknown number location = 526 * 10 * 7 = 1,383,942,676,480

So, in reality, adding a single number does improve the strength, but only by a bit.

Even with the extra number, though, the best we have at 7 characters is about 1.4 trillion choices. As with the others, that’s essentially impossible to guess/bruteforce remotely. That’s a good thing. However, with a password cracker and 5 million checks/second, it would take a little over 3.2 CPU-days to exhaustively crack all generated passwords, so that can very easily be achieved.

The other issue here is that the only source of entropy is PHP’s mt_rand() function. The next section will look at how PHP seeds this function.

PHP

All three of these implementations depend, in one way or another, on PHP’s mt_rand() function. The obvious question is, how strong is mt_rand()?

I’m only going to look at this from a high level for now. When I have some more time, I’m hoping to dig deeper into this and, with luck, bust it wide open. Stay tuned for that. :)

For now, though, let’s look at the function that’s used by all three password-reset functions: mt_rand(). mt_rand() is an implementation of the Mersenne Twister algorithm, which is a well tested random number generator with an advertised average period of 219937-1. That means that it won’t repeat until 219937-1 values are generated. I don’t personally have the skills to analyze the strength of the algorithm itself, but what I CAN look at is the seed.

Whether using rand() or mt_rand(), PHP automatically seeds the random number generator. The code is in ext/standard/rand.c, and looks like this:

PHPAPI long php_rand(TSRMLS_D)
{
    long ret;

    if (!BG(rand_is_seeded)) {
        php_srand(GENERATE_SEED() TSRMLS_CC);
    }
    // ...
}

Simple enough - if rand() is called without a seed, then seed it with the GENERATE_SEED() macro, which is found in ext/standard/php_rand.h:

#ifdef PHP_WIN32
#define GENERATE_SEED() (((long) (time(0) * GetCurrentProcessId())) ^
     ((
long)
(
1000000.0 * php_combined_lcg(TSRMLS_C))))
#else
#define GENERATE_SEED() (((long) (time(0) * getpid())) ^
     ((
long) (1000000.0 * php_combined_lcg(TSRMLS_C))))
#endif

So it’s seeded with the current time() (known), process id (weak), and php_combined_lcg(). What the heck is php_combined_lcg? Well, an LCG is a Linear Congruential Generator, a type of random number generator, and it’s defined at ext/standard/lcg.c so let’s take a look:

PHPAPI double php_combined_lcg(TSRMLS_D) /* {{{ */
{
    php_int32 q;
    php_int32 z;

    if (!LCG(seeded)) {
        lcg_seed(TSRMLS_C);
    }

    MODMULT(53668, 40014, 12211, 2147483563L, LCG(s1));
    MODMULT(52774, 40692, 3791, 2147483399L, LCG(s2));

    z = LCG(s1) - LCG(s2);
    if (z < 1) {
        z += 2147483562;
    }

    return z * 4.656613e-10;
}

This function also needs to be seeded! It’s pretty funny to seed a random number generator with another random number generator - what, exactly, does that improve?

Here is what lcg_seed(), in the same file, looks like:

static void lcg_seed(TSRMLS_D) /* {{{ */
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) == 0) {
        LCG(s1) = tv.tv_sec ^ (tv.tv_usec<<11);
    } else {
        LCG(s1) = 1;
    }
#ifdef ZTS
    LCG(s2) = (long) tsrm_thread_id();
#else 
    LCG(s2) = (long) getpid();
#endif

    /* Add entropy to s2 by calling gettimeofday() again */
    if (gettimeofday(&tv, NULL) == 0) {
        LCG(s2) ^= (tv.tv_usec<<11);
    }

    LCG(seeded) = 1;
}

This is seeded with the current time (known), the process id (weak), and the current time again (still known).

So to summarize, unless I’m missing something, PHP’s automatic seeding uses the following for entropy:

  • Current time (known value)
  • Process ID (predictable range)
  • php_combined_lcg
    • Current time (again)
    • Process id (again)
    • Current time (yet again)

I haven’t done any further research into PHP’s random number generator, but from what I’ve seen I don’t get a good feeling about it. It would be interesting if somebody took this a step further and actually wrote an attack against PHP’s random number implementation. That, or discovered a source of entropy that I was unaware of. Because, from the code I’ve looked at, it looks like there may be some problems.

An additional issue is that every seed generated is cast to a (long), which is 32-bits. That means that at the very most, despite the ridiculously long period of the mt_rand() function, there are only 4.2 billion possible seeds. That means, at the very best, an application that relies entirely on mt_rand() or rand() for their randomness are going to be a lot less random than they think!

It turns out, after a little research, I’m not the only one who’s noticed problems with PHP’s random functions. In fact, in that article, Stefan goes over a history of PHP’s random number issues. It turns out, what I’ve found is only the tip of the iceberg!

Observations

I hope the last three blogs have raised some awareness on how randomization can be used and abused. It turns out, using randomness is far more complex than people realize. First, you have to know how to use it properly; otherwise, you’ve already lost. Second, you have to consider how you’re generating the it in the first place.

It seems that the vast majority of applications make either one mistake or the other. It’s difficult to create “good” randomness, though, and I think the one that does the best job is actually SMF.

Recommendation

Here is what I would suggest:

  • Get your randomness from multiple sources
  • Save a good random seed between sessions (eg, save the last output of the random number generator to the database)
  • Use cryptographically secure functions for random generation (for example, hashing functions)
  • Don't limit your seeds to 32-bit values
  • Collect entropy in the application, if possible (what happens in your application that is impossible to guess/detect/force but that can accumulate?)

I’m sure there are some other great suggestions for ensuring your random numbers are cryptographically secure, and I’ve love to hear them!

Comments

Join the conversation on this Mastodon post (replies will appear below)!

    Loading comments...