(Mostly) good password resets

Hey everybody!

This is part 3 to my 2-part series on password reset attacks (Part 1 / Part 2). Overall, I got awesome feedback on the first two parts, but I got the same question over and over: what's the RIGHT way to do this?

So, here's the thing. I like to break stuff, but I generally leave the fixing to somebody else. It's just safer that way, since I'm not really a developer or anything like that. Instead, I'm going to continue the trend of looking at others' implementations by looking at three major opensource projects - WordPress, SMF, and MediaWiki. Then, since all of these rely on PHP's random number implementation to some extent, I'll take a brief look at PHP.

SMF

SMF 1.1.13 implements the password-reset function in Sources/Subs-Auth.php:

  // Generate a random password.
  require_once($sourcedir . '/Subs-Members.php');
  $newPassword = generateValidationCode();
  $newPassword_sha1 = sha1(strtolower($user) . $newPassword);

Looking at Sources/Subs-Members.php, we find:

// Generate a random validation code.
function generateValidationCode()
{
  global $modSettings;

  $request = db_query('
    SELECT RAND()', __FILE__, __LINE__);

  list ($dbRand) = mysql_fetch_row($request);
  mysql_free_result($request);

  return substr(preg_replace('/\W/', '', sha1(microtime() . mt_rand() . $dbRand .
      $modSettings['rand_seed'])), 0, 10);
}

Which is pretty straight forward, but also, in my opinion, very strong. It takes entropy from a bunch of different places:

  • The current time (microtime())
  • PHP's random number generator (mt_rand())
  • MySQL's random number generator ($dbRand)
  • A user-configurable random seed

Essentially, it puts these difficult-to-guess values through a cryptographically secure function, sha1(), and takes the first 10 characters of the hash.

The hash consists of lowercase letters and numbers, which means there are 36 possible choices for 10 characters, for a total of 3610 or 3,656,158,440,062,976 possible outputs. That isn't as strong as it *could* be, since there's no reason to limit its length to 10 characters (or its character set to 36 characters). That being said, three quadrillion different passwords would be nearly impossible to guess. (By my math, exhaustively cracking all possible passwords, assuming md5 cracks at 5 million guesses/second, would take about 23 CPU-years). Not that cracking is terribly useful - remote bruteforce guessing is much more useful and is clearly impossible.

SMF is my favourite implementation of the three, but let's take a look at WordPress!

WordPress

WordPress 3.1 implements the password-reset function in wp-login.php:

  $key = $wpdb->get_var($wpdb->prepare("SELECT user_activation_key FROM
      $wpdb->users WHERE user_login = %s", $user_login));
  if ( empty($key) ) {
    // Generate something random for a key...
    $key = wp_generate_password(20, false);
    do_action('retrieve_password_key', $user_login, $key);
    // Now insert the new md5 key into the db
    $wpdb->update($wpdb->users, array('user_activation_key=> $key),       array('user_login=> $user_login));
  }

wp_generate_password() is found in wp-includes/pluggable.php:

/**
 * Generates a random password drawn from the defined set of characters.
 *
 * @since 2.5
 *
 * @param int $length The length of password to generate
 * @param bool $special_chars Whether to include standard special characters.
      Default true.

 * @param bool $extra_special_chars Whether to include other special characters.
 *   Used when generating secret keys and salts. Default false.
 * @return string The random password
 **/
function wp_generate_password( $length = 12, $special_chars = true, $       extra_special_chars = false ) {
  $chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
  if ( $special_chars )
    $chars .= '!@#$%^&*()';
  if ( $extra_special_chars )
    $chars .= '-_ []{}<>~`+=,.;:/?|';

  $password = '';
  for ( $i = 0; $i < $length; $i++ ) {
    $password .= substr($chars, wp_rand(0, strlen($chars) - 1), 1);
  }

  // random_password filter was previously in random_password function which was       deprecated
  return apply_filters('random_password', $password);
}

This generates a string of random characters (and possibly symbols) up to a defined length, choosing the characters using wp_rand(). So, for the final step, how is wp_rand() implemented? It's also found in wp-includes/pluggable.php and looks like this:

  global $rnd_value;

  // Reset $rnd_value after 14 uses
  // 32(md5) + 40(sha1) + 40(sha1) / 8 = 14 random numbers from $rnd_value
  if ( strlen($rnd_value) < 8 ) {
    if ( defined( 'WP_SETUP_CONFIG) )
      static $seed = '';
    else
      $seed = get_transient('random_seed');
    $rnd_value = md5( uniqid(microtime() . mt_rand(), true ) . $seed );
    $rnd_value .= sha1($rnd_value);
    $rnd_value .= sha1($rnd_value . $seed);
    $seed = md5($seed . $rnd_value);
    if ( ! defined( 'WP_SETUP_CONFIG) )
      set_transient('random_seed', $seed);
  }

  // Take the first 8 digits for our value
  $value = substr($rnd_value, 0, 8);

  // Strip the first eight, leaving the remainder for the next call to wp_rand().
  $rnd_value = substr($rnd_value, 8);

  $value = abs(hexdec($value));

  // Reduce the value to be within the min - max range
  // 4294967295 = 0xffffffff = max random number
  if ( $max != 0 )
    $value = $min + (($max - $min + 1) * ($value / (4294967295 + 1)));

  return abs(intval($value));
}

This is quite complex for generating a number! But the points of interest are:

  • Hashing functions (sha1 and md5) are used, which are going to be a lot slower than a standard generator, but they, at least in theory, have cryptographic strength
  • The random number is seeded with microtime() and mt_rand(), which is PHP's "advanced" randomization function)
  • The random number is restricted to 0 - 0xFFFFFFFF, which is pretty typical

In practice, due to the multiple seeds with difficult-to-predict values and the use of a hashing function to generate strong random numbers, this seems to be a good implementation of a password reset. My biggest concern is the complexity - using multiple hashing algorithms and hashing in odd ways (like hasing the value alone, then the hash with the seed). It has the feeling of being unsure what to do, so trying to do everything 'just in case'. While I don't expect to find any weaknesses in the implementation, it's a little concerning.

Now, let's take a look at my least favourite (although still reasonably strong) password-reset implementation: MediaWiki!

MediaWiki

MediaWiki 1.16.2 was actually the most difficult to find the password reset function in. Eventually, though, I managed to track it down to includes/specials/SpecialUserlogin.php:

    $np = $u->randomPassword();
    $u->setNewpassword( $np, $throttle );
    $u->saveSettings();
    $userLanguage = $u->getOption( 'language);
    $m = wfMsgExt( $emailText, array( 'parsemag', 'language=> $userLanguage ),       $ip, $u->getName(), $np,
        $wgServer . $wgScript, round( $wgNewPasswordExpiry / 86400 ) );
    $result = $u->sendMail( wfMsgExt( $emailTitle, array( 'parsemag',       'language=> $userLanguage ) ), $m );

$u->randomPassword() is found in includes/User.php looks like this:

  /**
   * Return a random password. Sourced from mt_rand, so it's not particularly secure.
   * @todo hash random numbers to improve security, like generateToken()
   *
   * @return \string New random password
   */
  static function randomPassword() {
    global $wgMinimalPasswordLength;
    $pwchars = 'ABCDEFGHJKLMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz';
    $l = strlen( $pwchars ) - 1;

    $pwlength = max( 7, $wgMinimalPasswordLength );
    $digit = mt_rand( 0, $pwlength - 1 );
    $np = '';
    for ( $i = 0; $i < $pwlength; $i++ ) {
      $np .= $i == $digit ? chr( mt_rand( 48, 57 ) ) : $pwchars{ mt_rand( 0, $l ) };
    }
    return $np;
  }

This is easily the most complex, and also most dangerous, password-reset implementation that I've found.

First, the length is only 7 characters by default. That's already an issue.

Second, the set of characters is letters (uppercase + lowercase) and exactly one number. And it looks to me like they put a lot of effort into figuring out just how to put that one number into the password. Initially, I thought this made the password slightly weaker due to the following calculations:

  • 7 characters @ 52 choices = 527 = 1,028,071,702,528
  • 6 characters @ 52 choices + 1 character @ 10 choices = 526 * 10 = 197,706,096,640

However, as my friend pointed out, because you don't know where, exactly, the number will be placed, that actually adds an extra multiplier to the strength:

  • 6 characters @ 52 choices + 1 characters @ 10 choices + unknown number location = 526 * 10 * 7 = 1,383,942,676,480

So, in reality, adding a single number does improve the strength, but only by a bit.

Even with the extra number, though, the best we have at 7 characters is about 1.4 trillion choices. As with the others, that's essentially impossible to guess/bruteforce remotely. That's a good thing. However, with a password cracker and 5 million checks/second, it would take a little over 3.2 CPU-days to exhaustively crack all generated passwords, so that can very easily be achieved.

The other issue here is that the only source of entropy is PHP's mt_rand() function. The next section will look at how PHP seeds this function.

PHP

All three of these implementations depend, in one way or another, on PHP's mt_rand() function. The obvious question is, how strong is mt_rand()?

I'm only going to look at this from a high level for now. When I have some more time, I'm hoping to dig deeper into this and, with luck, bust it wide open. Stay tuned for that. :)

For now, though, let's look at the function that's used by all three password-reset functions: mt_rand(). mt_rand() is an implementation of the Mersenne Twister algorithm, which is a well tested random number generator with an advertised average period of 219937-1. That means that it won't repeat until 219937-1 values are generated. I don't personally have the skills to analyze the strength of the algorithm itself, but what I CAN look at is the seed.

Whether using rand() or mt_rand(), PHP automatically seeds the random number generator. The code is in ext/standard/rand.c, and looks like this:

PHPAPI long php_rand(TSRMLS_D)
{
    long ret;

    if (!BG(rand_is_seeded)) {
        php_srand(GENERATE_SEED() TSRMLS_CC);
    }
    // ...
}

Simple enough - if rand() is called without a seed, then seed it with the GENERATE_SEED() macro, which is found in ext/standard/php_rand.h:

#ifdef PHP_WIN32
#define GENERATE_SEED() (((long) (time(0) * GetCurrentProcessId())) ^
     ((
long)
(
1000000.0 * php_combined_lcg(TSRMLS_C))))
#else
#define GENERATE_SEED() (((long) (time(0) * getpid())) ^
     ((
long) (1000000.0 * php_combined_lcg(TSRMLS_C))))
#endif

So it's seeded with the current time() (known), process id (weak), and php_combined_lcg(). What the heck is php_combined_lcg? Well, an LCG is a Linear Congruential Generator, a type of random number generator, and it's defined at ext/standard/lcg.c so let's take a look:

PHPAPI double php_combined_lcg(TSRMLS_D) /* {{{ */
{
    php_int32 q;
    php_int32 z;

    if (!LCG(seeded)) {
        lcg_seed(TSRMLS_C);
    }

    MODMULT(53668, 40014, 12211, 2147483563L, LCG(s1));
    MODMULT(52774, 40692, 3791, 2147483399L, LCG(s2));

    z = LCG(s1) - LCG(s2);
    if (z < 1) {
        z += 2147483562;
    }

    return z * 4.656613e-10;
}

This function also needs to be seeded! It's pretty funny to seed a random number generator with another random number generator - what, exactly, does that improve?

Here is what lcg_seed(), in the same file, looks like:

static void lcg_seed(TSRMLS_D) /* {{{ */
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) == 0) {
        LCG(s1) = tv.tv_sec ^ (tv.tv_usec<<11);
    } else {
        LCG(s1) = 1;
    }
#ifdef ZTS
    LCG(s2) = (long) tsrm_thread_id();
#else 
    LCG(s2) = (long) getpid();
#endif

    /* Add entropy to s2 by calling gettimeofday() again */
    if (gettimeofday(&tv, NULL) == 0) {
        LCG(s2) ^= (tv.tv_usec<<11);
    }

    LCG(seeded) = 1;
}

This is seeded with the current time (known), the process id (weak), and the current time again (still known).

So to summarize, unless I'm missing something, PHP's automatic seeding uses the following for entropy:

  • Current time (known value)
  • Process ID (predictable range)
  • php_combined_lcg
    • Current time (again)
    • Process id (again)
    • Current time (yet again)

I haven't done any further research into PHP's random number generator, but from what I've seen I don't get a good feeling about it. It would be interesting if somebody took this a step further and actually wrote an attack against PHP's random number implementation. That, or discovered a source of entropy that I was unaware of. Because, from the code I've looked at, it looks like there may be some problems.

An additional issue is that every seed generated is cast to a (long), which is 32-bits. That means that at the very most, despite the ridiculously long period of the mt_rand() function, there are only 4.2 billion possible seeds. That means, at the very best, an application that relies entirely on mt_rand() or rand() for their randomness are going to be a lot less random than they think!

It turns out, after a little research, I'm not the only one who's noticed problems with PHP's random functions. In fact, in that article, Stefan goes over a history of PHP's random number issues. It turns out, what I've found is only the tip of the iceberg!

Observations

I hope the last three blogs have raised some awareness on how randomization can be used and abused. It turns out, using randomness is far more complex than people realize. First, you have to know how to use it properly; otherwise, you've already lost. Second, you have to consider how you're generating the it in the first place.

It seems that the vast majority of applications make either one mistake or the other. It's difficult to create "good" randomness, though, and I think the one that does the best job is actually SMF.

Recommendation

Here is what I would suggest:

  • Get your randomness from multiple sources
  • Save a good random seed between sessions (eg, save the last output of the random number generator to the database)
  • Use cryptographically secure functions for random generation (for example, hashing functions)
  • Don't limit your seeds to 32-bit values
  • Collect entropy in the application, if possible (what happens in your application that is impossible to guess/detect/force but that can accumulate?)

I'm sure there are some other great suggestions for ensuring your random numbers are cryptographically secure, and I've love to hear them!

10 thoughts on “(Mostly) good password resets

  1. Reply

    paranoia

    Nice analysis.

    Dokuwiki:

    [code]
    /**
    * Create a pronouncable password
    *
    * @author Andreas Gohr
    * @link http://www.phpbuilder.com/annotate/message.php3?id=1014451
    *
    * @return string pronouncable password
    */
    function auth_pwgen(){
    $pw = '';
    $c = 'bcdfghjklmnprstvwz'; //consonants except hard to speak ones
    $v = 'aeiou'; //vowels
    $a = $c.$v; //both

    //use two syllables...
    for($i=0;$i < 2; $i++){
    $pw .= $c[rand(0, strlen($c)-1)];
    $pw .= $v[rand(0, strlen($v)-1)];
    $pw .= $a[rand(0, strlen($a)-1)];
    }
    //... and add a nice number
    $pw .= rand(10,99);

    return $pw;
    }
    [/code]

  2. Reply

    Niels2000

    Thank you for an interesting article.

    I wrote a bit of code that attempts to address some of your points (although I intentionally neglected to use a carry-over to keep things simple):

    http://codereview.stackexchange.com/questions/1494/password-generation-function-using-custom-seed

  3. Reply

    CrazyD

    taking only the first 10 char of the sha1() is A LOT less entropy then you state. Its not as simple as you make it.

    1. Reply

      Ron Bowes Post author

      @CrazyD - the plus side of using sha1 is that the entropy from all sources is (in theory) evenly divided throughout the 160-bit SHA value before being chopped to the first 10 characters (40 bits). You definitely lose a lot of potential values, but the entropy from the sources is still there.

  4. Reply

    jop

    Isn't the best way to generate random numbers just to use the OS:es random? The kernel have access to much better entropy sources (hardware latencys etc) then anything in user land.

    I'm really surprised to see that PHP doesn't use that...

    1. Reply

      Ron Bowes Post author

      @jop - There are a few issues:
      - Each OS has a different interface to its randomness, so it's difficult to know how
      - Some OSes don't have a true random store - I'm not sure if Windows (especially older versions?) does
      - Attackers can exhaust the entropy store before performing the attack - this will either create a DoS (because no entropy is left) or predictable numbers (because it falls back to a PRNG)

      Those can can be fixed, and likely are in modern OSes. I'm just thinking of possible drawbacks.

  5. Reply

    jop

    @ron

    The first two things are inconveniences but a large project like PHP should be able to overcome them.

    However I'd love to see a blog post elaborating on the last issue. As a random noob I would just think that by using hashes and not letting an attacker see the internal state of the PRNG would be enough. (So even though there are no entropy left, one would need to break the hash to predict numbers)?

    / Jonatan

  6. Reply

    Mike

    Great writeup of password generation!

    1. There were problems in PHP's LCG as noted by samy kumar: http://samy.pl/phpwn/ (2010)

    2. LCG has been given increased entropy by calling gettimeofday() again and using it for high bits (tv.tv_usec<<11) example here:
    http://www.wickedweb.co.uk/blog-entry/_/under-the-hood/48/

  7. Reply

    MD5_is_faster

    Adding to CrazyD's comment for the SMF estimates, I generally assume that MD5 can be guessed offline, today, at 45 billion guesses/sec*machine, not 5 million, per (http://hashcat.net/oclhashcat-lite/).
    This takes your 23 CPU-year estimate down to 23 machine-hours for an exhaustive brute force search.

  8. Reply

    Scott Arciszewski

    I prefer to use either mcrypt_create_iv($length, MCRYPT_DEV_URANDOM) or openssl_random_pseudo_bytes($length) based on availability. ($length = number of bytes; so for 256 bits we'd ask for $length = 32)

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>