To double quote or not, that's the question!

Florian Engelhardt - Aug 16 - Dev Community

Just recently I heard once again that PHP folks still argue about single quotes vs. double quotes. Using single quotes is just a micro-optimisation, the argument goes, but if you get used to using them all the time, you'd save a bunch of CPU cycles!

"Everything has already been said, but not yet by everyone" – Karl Valentin

It is in this spirit that I am writing an article about the same topic Nikita Popov already covered 12 years ago (if you have read his article, you can stop reading here).

What is the fuss all about?

PHP performs string interpolation, in which it searches for the use of variables in a string and replaces them with the value of the variable used:

$juice = "apple";
echo "They drank some $juice juice.";
// will output: They drank some apple juice.

This feature is limited to strings in double quotes and heredoc. Using single quotes (or nowdoc) will yield a different result:

$juice = "apple";
echo 'They drank some $juice juice.';
// will output: They drank some $juice juice.
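
The same split applies to heredoc and nowdoc. The following is a minimal sketch (the identifier TXT is an arbitrary choice) showing that heredoc interpolates like double quotes, while nowdoc leaves the text untouched like single quotes:

```php
<?php
$juice = "apple";

// Heredoc behaves like a double-quoted string: $juice is interpolated.
$heredoc = <<<TXT
They drank some $juice juice.
TXT;

// Nowdoc (note the quoted identifier) behaves like a single-quoted string.
$nowdoc = <<<'TXT'
They drank some $juice juice.
TXT;

echo $heredoc, PHP_EOL; // They drank some apple juice.
echo $nowdoc, PHP_EOL;  // They drank some $juice juice.
```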

Look at that: PHP will not search for variables in that single quoted string. So we could just start using single quotes everywhere. So people started suggesting changes like this ..

- $juice = "apple";
+ $juice = 'apple';

.. because it would be faster: PHP does not look for variables in single quoted strings (there are none in the example anyway), so every execution of that code would save a bunch of CPU cycles. Everyone is happy, case closed.

Case closed?

Obviously there is a difference in using single quotes vs. double quotes, but in order to understand what is going on we need to dig a bit deeper.

Even though PHP is an interpreted language, it has a compile step in which several components work together to produce something the virtual machine can actually execute: opcodes. So how do we get from PHP source code to opcodes?

The lexer

The lexer scans the source code file and breaks it down into tokens. A simple example of what this means can be found in the token_get_all() function documentation. A PHP source file containing just <?php echo ""; becomes these tokens:

T_OPEN_TAG (<?php )
T_ECHO (echo)
T_WHITESPACE ( )
T_CONSTANT_ENCAPSED_STRING ("")

We can see this in action and play with it in this 3v4l.org snippet.
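
To reproduce the token list locally, something along these lines should work. It is a small sketch built on token_get_all() and token_name(); the helper name dumpTokens() is made up for this example:

```php
<?php
// Turn a PHP source string into "TOKEN_NAME (text)" lines.
// dumpTokens() is a hypothetical helper name, not part of PHP.
function dumpTokens(string $source): array
{
    $lines = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Named tokens come as [id, text, line number].
            $lines[] = token_name($token[0]) . ' (' . $token[1] . ')';
        } else {
            // Single characters such as ";" come as plain strings.
            $lines[] = $token;
        }
    }
    return $lines;
}

echo implode(PHP_EOL, dumpTokens('<?php echo "";')), PHP_EOL;
```

Besides the four named tokens listed above, this also prints the trailing ";", which the lexer returns as a plain one-character string.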

The parser

The parser takes these tokens and generates an abstract syntax tree (AST) from them. The AST of the above example looks like this when represented as JSON:

{
  "data": [
    {
      "nodeType": "Stmt_Echo",
      "attributes": {
        "startLine": 1,
        "startTokenPos": 1,
        "startFilePos": 6,
        "endLine": 1,
        "endTokenPos": 4,
        "endFilePos": 13
      },
      "exprs": [
        {
          "nodeType": "Scalar_String",
          "attributes": {
            "startLine": 1,
            "startTokenPos": 3,
            "startFilePos": 11,
            "endLine": 1,
            "endTokenPos": 3,
            "endFilePos": 12,
            "kind": 2,
            "rawValue": "\"\""
          },
          "value": ""
        }
      ]
    }
  ]
}

In case you wanna play with this as well and see what the AST for other code looks like, I found https://phpast.com/ by Ryan Chandler and https://php-ast-viewer.com/, which both show you the AST of a given piece of PHP code.

The compiler

The compiler takes the AST and creates opcodes. The opcodes are what the virtual machine executes. They are also what gets stored in the OPcache if you have that set up and enabled (which I highly recommend).

To view the opcodes we have multiple options (there may be more, but these are the three I know):

  1. use the Vulcan Logic Dumper (VLD) extension, which is also baked into 3v4l.org
  2. use phpdbg -p script.php to dump the opcodes
  3. or use the opcache.opt_debug_level INI setting for OPcache to make it print out the opcodes
    • a value of 0x10000 outputs opcodes before optimisation
    • a value of 0x20000 outputs opcodes after optimisation

For example, with the third option (if OPcache is not enabled for the CLI on your machine, you may additionally need -dopcache.enable_cli=1):

$ echo '<?php echo "";' > foo.php
$ php -dopcache.opt_debug_level=0x10000 foo.php
$_main:
...
0000 ECHO string("")
0001 RETURN int(1)

Hypothesis

Coming back to the initial idea of saving CPU cycles when using single quotes vs. double quotes, I think we all agree that this would only be true if PHP evaluated these strings at runtime for every single request.

What happens at runtime?

So let's see which opcodes PHP creates for the two different versions.

Double quotes:

<?php echo "apple";

0000 ECHO string("apple")
0001 RETURN int(1)

vs. single quotes:

<?php echo 'apple';

0000 ECHO string("apple")
0001 RETURN int(1)

Hey wait, something weird happened. This looks identical! Where did my micro optimisation go?

Well maybe, just maybe the ECHO opcode handler's implementation parses the given string, although there is no marker or something else which tells it to do so ... hmm 🤔

Let's try a different approach and see what the lexer does for those two cases:

Double quotes:

T_OPEN_TAG (<?php )
T_ECHO (echo)
T_WHITESPACE ( )
T_CONSTANT_ENCAPSED_STRING ("")

vs. single quotes:

T_OPEN_TAG (<?php )
T_ECHO (echo)
T_WHITESPACE ( )
T_CONSTANT_ENCAPSED_STRING ('')

The tokens still distinguish between double and single quotes, but the AST is identical for both cases. The only difference is the rawValue in the Scalar_String node attributes, which still contains the original single or double quotes. The value itself is the same in both cases (JSON simply renders it with double quotes).

New Hypothesis

Could it be that string interpolation is actually done at compile time?

Let's check with a slightly more "sophisticated" example:

<?php
$juice="apple";
echo "juice: $juice";

Tokens for this file are:

T_OPEN_TAG (<?php)
T_VARIABLE ($juice)
T_CONSTANT_ENCAPSED_STRING ("apple")
T_WHITESPACE ()
T_ECHO (echo)
T_WHITESPACE ( )
T_ENCAPSED_AND_WHITESPACE (juice: )
T_VARIABLE ($juice)

Look at the last two tokens! The search for variables inside the string is handled by the lexer and as such is a compile time thing. At runtime, PHP no longer scans the string for variables.
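
A small sketch to check this locally: it lists only the named tokens for a double quoted and a single quoted version of the same statement. The helper name namedTokens() is made up for this example, and the sources are passed in single quotes so that the example itself does not interpolate $juice:

```php
<?php
// Return only the names of the named tokens in a PHP source string.
// namedTokens() is a hypothetical helper name, not part of PHP.
function namedTokens(string $source): array
{
    $names = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            $names[] = token_name($token[0]);
        }
    }
    return $names;
}

// Double quotes: the lexer splits the string into text and variable tokens.
print_r(namedTokens('<?php echo "juice: $juice";'));

// Single quotes: the whole string stays one constant token.
print_r(namedTokens('<?php echo \'juice: $juice\';'));
```

The double quoted version yields T_ENCAPSED_AND_WHITESPACE followed by T_VARIABLE, while the single quoted version yields a single T_CONSTANT_ENCAPSED_STRING.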

BUSTED

For completeness, let's have a look at the opcodes generated by this (after optimisation, using 0x20000):

0000 ASSIGN CV0($juice) string("apple")
0001 T2 = FAST_CONCAT string("juice: ") CV0($juice)
0002 ECHO T2
0003 RETURN int(1)

These are different opcodes than we had in our simple <?php echo ""; example, but that is okay because we are doing something different here.

Get to the point: should I concat or interpolate?

Let's have a look at these three different versions:

<?php
$juice = "apple";
echo "juice: $juice $juice";
echo "juice: ", $juice, " ", $juice;
echo "juice: ".$juice." ".$juice;
Each echo statement builds the same output in a different way:

  • the first version uses string interpolation
  • the second uses comma separation (echo accepts multiple arguments; this does not work when assigning to a variable or in other expressions)
  • the third uses string concatenation
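
Before comparing opcodes, it may help to confirm that all three variants really print the same bytes. The sketch below captures the output of each with output buffering; the helper name render() is made up for this example:

```php
<?php
// Capture everything a callable echoes and return it as a string.
// render() is a hypothetical helper name, not part of PHP.
function render(callable $printer): string
{
    ob_start();
    $printer();
    return ob_get_clean();
}

$juice = "apple";

$interpolated = render(function () use ($juice) {
    echo "juice: $juice $juice";
});
$commas = render(function () use ($juice) {
    echo "juice: ", $juice, " ", $juice;
});
$concatenated = render(function () use ($juice) {
    echo "juice: " . $juice . " " . $juice;
});

var_dump($interpolated === $commas && $commas === $concatenated); // bool(true)
```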

The first opcode assigns the string "apple" to the variable $juice:

0000 ASSIGN CV0($juice) string("apple")

The first version (string interpolation) uses a rope as the underlying data structure, which is optimised to do as few string copies as possible.

0001 T2 = ROPE_INIT 4 string("juice: ")
0002 T2 = ROPE_ADD 1 T2 CV0($juice)
0003 T2 = ROPE_ADD 2 T2 string(" ")
0004 T1 = ROPE_END 3 T2 CV0($juice)
0005 ECHO T1

The second version is the most memory-efficient, as it does not create an intermediate string representation. Instead, it makes multiple calls to ECHO, which is a blocking call from an I/O perspective, so depending on your use case this might be a downside.

0006 ECHO string("juice: ")
0007 ECHO CV0($juice)
0008 ECHO string(" ")
0009 ECHO CV0($juice)

The third version uses CONCAT/FAST_CONCAT to create an intermediate string representation and as such might do more memory copies and/or use more memory than the rope version.

0010 T1 = CONCAT string("juice: ") CV0($juice)
0011 T2 = FAST_CONCAT T1 string(" ")
0012 T1 = CONCAT T2 CV0($juice)
0013 ECHO T1

So ... what is the right thing to do here and why is it string interpolation?

String interpolation uses either a FAST_CONCAT in the case of echo "juice: $juice"; or highly optimised ROPE_* opcodes in the case of echo "juice: $juice $juice";, but most importantly, it communicates the intent clearly. None of this has been a bottleneck in any of the PHP applications I have worked with so far, so none of this actually matters.

TLDR

String interpolation is a compile time thing. Granted, without OPcache the lexer will have to check for variables used in double quoted strings on every request, even if there aren't any, wasting CPU cycles, but honestly: the problem is not the double quoted strings, but not using OPcache!

However, there is one caveat: PHP up to 4 (and I believe even including 5.0 and maybe even 5.1, I don't know) did string interpolation at runtime, so using these versions ... hmm, I guess if anyone really still uses PHP 5, the same as above applies: The problem is not the double quoted strings, but the use of an outdated PHP version.

Final advice

Update to the latest PHP version, enable OPcache and live happily ever after!
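
If you are unsure whether OPcache is actually active, a check along these lines can help. It is a rough sketch based on opcache_get_status(); the helper name isOpcacheActive() is made up, and the result applies only to the SAPI you run it in (CLI and web server are often configured differently):

```php
<?php
// Report whether OPcache is loaded and enabled for the current SAPI.
// isOpcacheActive() is a hypothetical helper name, not part of PHP.
function isOpcacheActive(): bool
{
    if (!function_exists('opcache_get_status')) {
        return false; // the OPcache extension is not loaded
    }
    // Passing false skips the per-script details we do not need here.
    // The call returns false when OPcache is disabled or the API is restricted.
    $status = @opcache_get_status(false);
    return is_array($status) && !empty($status['opcache_enabled']);
}

echo isOpcacheActive() ? "OPcache is active" : "OPcache is not active", PHP_EOL;
```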

[Edit: August 16th]

What about sprintf()?

What I actually intended to say is that none of this is a performance problem, whether you use string interpolation, single quotes, concatenation or anything else. Someone stepped up and mentioned sprintf() and asked where it clocks in performance-wise. So for the sake of completeness, let's have a look at sprintf():

<?php
$juice = "apple";
echo sprintf("juice: %s %s", $juice, $juice);

This compiles to the following opcodes:

0000 ASSIGN CV0($juice) string("apple")
0001 INIT_FCALL 3 128 string("sprintf")
0002 SEND_VAL string("juice: %s %s") 1
0003 SEND_VAR CV0($juice) 2
0004 SEND_VAR CV0($juice) 3
0005 V1 = DO_ICALL
0006 ECHO V1

A quick benchmark shows that the sprintf() variant takes 14 to 21 times as long as the string interpolation variant on my local machine.
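
If you want to measure this on your own machine, a rough sketch could look like the following. It is not the benchmark used for the numbers above: the helper name measureMs() and the iteration count are arbitrary choices, and wrapping each variant in a closure adds call overhead that narrows the gap, so expect different ratios.

```php
<?php
// Run a callable $iterations times and return the elapsed time in milliseconds.
// measureMs() is a hypothetical helper name, not part of PHP.
function measureMs(callable $fn, int $iterations): float
{
    $start = hrtime(true); // monotonic clock in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e6;
}

$juice = "apple";
$viaSprintf = fn(): string => sprintf("juice: %s %s", $juice, $juice);
$viaInterpolation = fn(): string => "juice: $juice $juice";

$iterations = 1000000; // arbitrary; raise it for more stable numbers
printf("sprintf:       %.1f ms\n", measureMs($viaSprintf, $iterations));
printf("interpolation: %.1f ms\n", measureMs($viaInterpolation, $iterations));
```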

Here comes the catch: this is only true up to PHP 8.3. PHP 8.4 comes with another compile time optimisation that will treat sprintf() calls that just have %s and %d in them as if you wrote string interpolation:

0000 ASSIGN CV0($juice) string("apple")
0001 T2 = ROPE_INIT 4 string("juice: ")
0002 T2 = ROPE_ADD 1 T2 CV0($juice)
0003 T2 = ROPE_ADD 2 T2 string(" ")
0004 T1 = ROPE_END 3 T2 CV0($juice)
0005 ECHO T1

So the final advice still holds: update to the latest PHP version (well, maybe wait with upgrading to PHP 8.4 until there is a stable release).
