4
\$\begingroup\$

When PHP converts a number to a string (for printing, for example), it may unexpectedly switch to scientific notation (0.000021, for example, is printed as 2.1E-5). Or the number may already be a string containing an exponent. But most people prefer to see a conventional decimal number instead.
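A minimal illustration of the conversion described above, assuming the default `precision` ini setting of 14. The two values are only examples.

```php
<?php
// Small and large floats switch to scientific notation when cast to string.
var_dump((string) 0.000021); // string(6) "2.1E-5"
var_dump((string) 1e25);     // string(7) "1.0E+25"

// Floats in the "ordinary" range keep their plain decimal form.
var_dump((string) 0.5);      // string(3) "0.5"
```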

As it turned out, the existing answers on Stack Overflow are, as expected, short-sighted or at least incomplete, so I decided to try a complete, universal solution. I am not sure, though, whether it covers all edge cases or whether it can be improved in general.

Type hinting is intentionally left out for compatibility.

function scientific2decimal($num, $decimal_separator = ".", $thousands_separator = ",")
{
    if (!preg_match('!\d+(\.(\d+))?e([+-]?)(\d+)$!i', $num, $matches)) {
        return $num;
    }
    
    list(,,$decimals, $sign, $exponent) = $matches;
    $sign = $sign ?: "+";
    $actual_decimals = strlen($decimals);
    if ($sign === '+') {
        $number_of_decimals = max(0, $actual_decimals - $exponent);
    } else {
        $number_of_decimals = $exponent + $actual_decimals;
    }
    return number_format($num, $number_of_decimals, $decimal_separator, $thousands_separator);
}

A simple test:

$test = [
    0.000021,
    '1e3',
    '1.1337228E-3',
    '1.1337228E-6',
    '1.0002E3',
    '1.13372223434E+6',
    '2.133333E-5',
];
foreach ($test as $num) {
    echo $num, ": ", scientific2decimal($num), "\n";
}
\$\endgroup\$
1
  • \$\begingroup\$ I'd simplify it like return rtrim(rtrim(number_format(floatval($num), strlen($num), $decimal_separator, $thousands_separator), '0'), $decimal_separator); \$\endgroup\$ Commented Sep 19 at 18:38

4 Answers

5
\$\begingroup\$

I don't think there's much to improve here, it seems to work quite well. You check the input, determine the number of decimals and then output the number. It's not all that complex.

Personally I don't like regular expressions, so I wouldn't use one of those. The whole preg_match() thing is quite a heavy operation and requires thorough analysis to understand completely and prevent bugs from creeping in.

So, I propose an alternative that does away with it:

function number2decimal($number, $decimal_separator = ".", $thousands_separator = ",")
{
   if (!is_numeric($number)) {
       return $number;
   }
   $parts = explode('e', strtolower($number));    
   if (count($parts) != 2) {
       return $number;
   }
   [$base, $exponent] = $parts;
   $number_of_decimals = -$exponent + strlen($base) - strrpos($base, '.') - 1;
   return number_format($number, $number_of_decimals, $decimal_separator, $thousands_separator);
}

If the input isn't a number, we return it. Then we check if it is a scientific number, and if not we return it. Those two checks should cover your regular expression. They are simpler, easier to read, and probably quicker to execute.

I've simplified the way to get the needed number of decimals. It seems that number_format() ignores a negative number of decimals, and I get the sign of the exponent along with the exponent itself.
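The decimals calculation can also be written defensively. The sketch below is not the answer's code: `decimals_needed()` is a hypothetical helper that handles a significand without a decimal point (where `strrpos()` returns `false`) and clamps the result at zero instead of relying on number_format() to ignore negative values. As far as I know, PHP 8.3 and later no longer ignore a negative `$decimals` but round to the left of the decimal point, so the clamp matters there.

```php
<?php
// Hypothetical helper (not from the answer): number of decimals needed to
// print a scientific-notation string such as "1.0002E3" without losing digits.
function decimals_needed($number)
{
    [$significand, $exponent] = explode('e', strtolower($number));
    $dot = strrpos($significand, '.');
    // Digits after the decimal point of the significand (0 when there is none).
    $fraction_digits = $dot === false ? 0 : strlen($significand) - $dot - 1;
    // Clamp at zero so number_format() never receives a negative count.
    return max(0, $fraction_digits - (int) $exponent);
}

var_dump(decimals_needed('1e3'));          // int(0)
var_dump(decimals_needed('12e-3'));        // int(3)
var_dump(decimals_needed('1.1337228E-3')); // int(10)
echo number_format((float) '12e-3', decimals_needed('12e-3')), "\n"; // 0.012
```

The helper assumes its input has already passed the is_numeric() and explode() checks shown in the function above.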

Finally I do the same as you: Return the formatted string.

Live demo: https://3v4l.org/o58KM

I did find that I'm a bit stricter than you when it comes to numbers. For instance '1.0002 E 3' is not a number in my routine.

\$\endgroup\$
1
  • \$\begingroup\$ Above all I like the exponent handling trick! and yes, overall code is much simpler, my respect \$\endgroup\$ Commented Sep 17 at 17:23
6
\$\begingroup\$

round tripping

There's a finite number of floats, fewer than \$2^{64}\$ of them. The OP scientific2decimal() is serializing a float to a string.

I am skeptical that your library's callers would be happy with a routine that won't successfully send the same quantity back and forth, float to string and with floatval() back to float, unharmed and unchanged. (Use str_replace(',', '', $number) to elide , commas prior to the float conversion.)

The difficulty stems from the last line where we call number_format(). The documentation explains that we format

decimal digits using the rounding half up rule

which is guaranteed to break round tripping. There's a whole literature on this topic.

The decimal base ten has a pair of factors, \$2\$ and \$5\$, and one of them is odd, which spells trouble for binary base conversions. (BTW where @KIKO Software wrote $base he meant $significand, or perhaps mantissa.)

Some decimal quantities, like .5 and .25, convert to binary FP nicely. Most conversions, like .1 and .3, will be inexact, with repeating bits similar to the repeating digits a calculator displays for 1/3 or 1/7. So floatval() chooses the 64-bit quantity that comes closest.
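A quick way to see the difference between exact and inexact conversions, assuming PHP floats are IEEE 754 doubles (which is the case on common platforms):

```php
<?php
// 0.5, 0.25 and 0.75 are sums of negative powers of two, so they are exact.
var_dump(0.5 + 0.25 === 0.75); // bool(true)

// 0.1, 0.2 and 0.3 are each rounded to the nearest double, and the
// rounded sum of the first two is not the double nearest to 0.3.
var_dump(0.1 + 0.2 === 0.3);   // bool(false)
```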

When converting to decimal string, we should strive to perform the reverse operation. But the "rounding half up" approach messes that up. You might prefer to build conversion on top of sprintf(). An important use case that number_format() was designed to handle is deliberately truncating trailing digits in a way that humans will find pleasing, and it is well suited for that. But the OP code is trying to preserve a number, not truncate some trailing digits.
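A small sketch of a round-trip check along these lines. `survives_round_trip()` is a hypothetical helper, and `%.17g` is just one way to ask sprintf() for enough significant digits to identify a double; it is not a complete formatting solution, since it adds no thousands separators and may itself fall back to an exponent.

```php
<?php
// Hypothetical helper: true when the formatted string parses back to the same float.
function survives_round_trip($value, $formatted)
{
    // Strip the thousands separators before parsing.
    return floatval(str_replace(',', '', $formatted)) === $value;
}

$x = 0.1 + 0.2; // not the same double as 0.3

// Rounding to one decimal yields "0.3", which parses to a different double.
var_dump(survives_round_trip($x, number_format($x, 1)));        // bool(false)

// Seventeen significant digits are enough to recover the original double.
var_dump(survives_round_trip($x, sprintf('%.17g', $x)));         // bool(true)

// Values that are exactly representable survive either way.
var_dump(survives_round_trip(1234.5, number_format(1234.5, 1))); // bool(true)
```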

BTW, the proposed Kiko test case of (unquoted) 123456789842794767576576 seems unfair, given that php can't quite represent that quantity. It has too many digits. Of the two numbers below, the second seems like a more appropriate test case.

  • 123456789842794767576576
  • 123456789842794767500000
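The same effect is easier to verify on smaller literals. The sketch below assumes IEEE 754 doubles with a 53-bit significand; the values are chosen for illustration and are not the test cases discussed above.

```php
<?php
// 2^53 + 1 needs 54 significant bits, so the literal is rounded to 2^53.
var_dump(9007199254740993.0 === 9007199254740992.0); // bool(true)

// 2^53 + 2 fits in 53 bits and stays a distinct value.
var_dump(9007199254740994.0 === 9007199254740992.0); // bool(false)
```

Two literals that collapse to the same float make one test, not two.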

type stability

Caller might pass in diverse types.

    if (!preg_match ... ) {
        return $num;
    }

It seems reasonable to force $num to a string, here. Then caller can safely assume a string type as it continues to work with the result.
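One possible shape for this, as a sketch only: the function name is hypothetical, the regular expression and the arithmetic are the OP's, and the only change is that the input is cast to a string once so that every return path yields a string.

```php
<?php
// Sketch: the OP's function with a string result on every path.
function scientific2decimal_string($num, $decimal_separator = ".", $thousands_separator = ",")
{
    $text = (string) $num; // a float such as 0.000021 becomes "2.1E-5" here
    if (!preg_match('!\d+(\.(\d+))?e([+-]?)(\d+)$!i', $text, $matches)) {
        return $text; // a string even when the caller passed an int or float
    }

    list(,,$decimals, $sign, $exponent) = $matches;
    $actual_decimals = strlen($decimals);
    if ($sign === '-') {
        $number_of_decimals = (int) $exponent + $actual_decimals;
    } else {
        $number_of_decimals = max(0, $actual_decimals - (int) $exponent);
    }
    return number_format((float) $text, $number_of_decimals, $decimal_separator, $thousands_separator);
}

var_dump(scientific2decimal_string(42));    // string(2) "42"
var_dump(scientific2decimal_string('1e3')); // string(5) "1,000"
```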

\$\endgroup\$
5
  • 1
    \$\begingroup\$ I don't think that round tripping is the intention here, number_format() is an output function. The unfair test case 123456789842794767576576 was added to check whether both routines could handle it. Unfair tests should be part of most software testing. \$\endgroup\$ Commented Sep 17 at 17:20
  • \$\begingroup\$ I only wrote an answer at all because I found it remarkable that an original and a refactored function would be apparently identical and yet could produce such very different results. I was saying that when reading the source I found the last several decimal digits distracting since they do not really mean anything -- php parsing crushes such constants down to a 53-bit significand. A pair of tests for ...576 and for ...577 wouldn't actually be two distinct tests, they are the same thing, and I always worry about misleading the reader with code having surprising behavior. \$\endgroup\$
    – J_H
    Commented Sep 17 at 17:30
  • \$\begingroup\$ Yes, there are definitely limits to the numbers that PHP can represent, and I intentionally went over them, just to see what would happen using both routines. I didn't realize that this could mislead any readers. \$\endgroup\$ Commented Sep 17 at 17:36
  • \$\begingroup\$ Oh, I did consider sprintf(), but, as far as I know, it doesn't insert thousands separators, so I stayed with number_format(). \$\endgroup\$ Commented Sep 17 at 17:39
  • \$\begingroup\$ Right. I mean, I'm not saying floats are easy 8^) -- there's ever so many details! Obtaining decimal digits with sprintf() and then post-processing to jam the , commas in there could be one viable approach, I'm sure there's others. When I see code that manipulates integers, I know we're on solid ground (assuming no overflow). When I see FP code, I get nervous, perhaps more than many other engineers. Been bitten too many times. (Like when 80-bit intermediate results on one host don't match up with pure 64-bit results that another host serialized. Both are "correct". Sigh!) \$\endgroup\$
    – J_H
    Commented Sep 17 at 17:42
3
\$\begingroup\$

Kiko already covered many points and suggested an alternative, which uses destructuring assignment instead of calling list().

It is rare to see exclamation marks as delimiters for regular expressions.

One might want to consider using single quotes for strings unless variable expansion is needed.
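For illustration, here is how the matching step might look with the more common slash delimiters and short destructuring syntax. This is a sketch; the pattern itself is unchanged from the question.

```php
<?php
// Same pattern as in the question, delimited with slashes instead of "!".
preg_match('/\d+(\.(\d+))?e([+-]?)(\d+)$/i', '1.0002E3', $matches);

// Short array destructuring; the empty slots skip the full match and group 1.
[, , $decimals, $sign, $exponent] = $matches;

var_dump($decimals); // string(4) "0002"
var_dump($sign);     // string(0) ""
var_dump($exponent); // string(1) "3"
```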

\$\endgroup\$
-3
\$\begingroup\$

Here’s an improved version of your PHP function to convert scientific notation to a decimal format. I've made the function more efficient, clearer, and capable of handling edge cases.

function scientificToDecimal($num) {
    // Validate if the input is in scientific notation
    if (!preg_match('/^[+\-]?\d+(\.\d+)?([eE][+\-]?\d+)?$/', $num)) {
        return $num;
    }

    // Split the number into base and exponent
    $numParts = explode('e', strtolower($num));
    $base = $numParts[0];
    $exponent = $numParts[1] ?? 0;

    // If there's no exponent, return the base as it is
    if ($exponent == 0) {
        return $base;
    }

    // Calculate the decimal form
    $decimalForm = number_format($base * pow(10, $exponent), max(0, (int) -$exponent), '.', '');

    // Remove trailing zeros and unnecessary decimal point
    return rtrim(rtrim($decimalForm, '0'), '.');
}

// Test cases to demonstrate the function
$tests = [
    '0.000021', 
    '1e3', 
    '1.1337228e-6', 
    '1.0002E3', 
    '1.13372223434E+6', 
    '2.13333E-5'
];

foreach ($tests as $testNum) {
    echo $testNum . " => " . scientificToDecimal($testNum) . "\n";
}

Key Improvements:

  1. Validation: I added preg_match to ensure the number follows a valid scientific notation format before trying to process it.
  2. Handling positive and negative signs: The function now properly handles both positive and negative numbers.
  3. Cleanup of output: The result is trimmed to remove unnecessary trailing zeros and decimal points, making the output cleaner.

This version is more robust and should handle various edge cases while providing clean output in decimal format.

\$\endgroup\$
1
  • 6
    \$\begingroup\$ Thank you for trying, but I would like to see some explanations instead of just statements. 1. Why "add" preg_match when it's already there? 2. What exactly is wrong with positive and negative numbers in the current function? 3. Why would there be trailing zeros and/or decimal points? 4. Which AI model were you using? \$\endgroup\$ Commented Sep 18 at 5:38
