UserTalk scripts often need to build large strings from smaller ones, such as when assembling a web page or RSS feed in Conversant, Manila, or even in the old static site engine. To make this process as fast as possible, the scripter needs to remember two things: "avoid copying large strings whenever possible," and "trigger the secret in-place append."
Here's the most basic of all string building routines:
text = text + s
In this simple assignment, we're building a string variable named "text" by appending another string variable named "s".
There's a hidden optimization here which Doug Baron mentioned years ago (when it was new). We'll call it the secret "in-place append." Internally, Frontier does "text += s".
We can think of the secret "in-place append" as a one-step add-and-assign, for strings. In a normal assignment statement, even one as simple as "x = y", an entirely new variable is allocated in memory. If x is three characters long, and y is five characters long, then a third chunk of memory five characters in size must be allocated for the new value of x (which is a copy of y's value).
Unfortunately, the simple example above still returns a copy of the result if it's the last line of a code block (like a local handler, a script or even an "if" block). We need to eliminate that copy.
text = text + s;
This is exactly like the first example, but the semicolon causes it to return true (the simplest, most efficient value in UserTalk) instead of a copy of text.
Now let's consider this more difficult example:
text = text + s + cr
This one makes a copy of text and adds s to it, and then adds cr to that copy. The result of all this copying is assigned to text. Finally, a copy of the result is "returned" (as mentioned above). Very inefficient.
The optimization (secret "in-place append") mentioned above is never triggered here, because there's more than one operation on the right side of the assignment operator.
text = text + s + cr;
This still makes the first two copies, but a copy of the result isn't returned because of the semicolon. Still inefficient, but improving.
text = text + ( s + cr )
Parentheses are evaluated first, so we end up with exactly the same thing as our first example. There's a significant optimization here: a new string is made that is the concatenation of s and cr, and then the += is performed on text. Think of it as something like text += ( s + cr ).
Unfortunately, a copy of the result is still returned because there's no semicolon.
text = text + ( s + cr );
This is the most efficient form if we're appending more than a single string, because it gives us the secret "in-place append" (again, that's implemented internally according to Doug, and my tests prove it to be true) and no implicit copy is 'returned' thanks to the semicolon.
There's still a string-copying step happening when the parentheses are evaluated (the first step). A new string is created that is the concatenation of s and cr. In UserTalk, this is totally unavoidable, so this is still the most efficient form of appending more than one string to a base string.
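Here's a rough sketch of how that form might look inside a loop, written in the brace-and-semicolon text form (in the outline editor, the braces are implied by indentation). The names "page" and "lines" are made up for this illustration, and the sample list contents are just placeholders.

```usertalk
« Sketch only: "page" and "lines" are hypothetical names, not from any real script.
local (page = "", lines = {"<html>", "<body>", "</body>", "</html>"}, i);
for i = 1 to sizeOf (lines) {
	« The parentheses build the small string (one line plus cr) first,
	« so the statement has the simple "x = x + y" shape described above.
	« The trailing semicolon follows the advice about not returning a copy.
	page = page + (lines [i] + cr);
};
```

The only string that gets built from scratch on each pass is the short one inside the parentheses. If the article's description holds, the large string in "page" is appended to in place rather than copied.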
This is essentially the same thing:
text = text + ( s + cr + "whatever" );
Some have wondered aloud if adding extra parentheses somewhere would make for even faster string concatenation in this final example. The answer: no, it doesn't do us any good. All we can do is change the order of the operations.
There are two goals when concatenating strings: trigger the secret "in-place append" by making our string concatenation look like a simple "x = x + y", and make as few copies of our strings as possible.
Knowing how UserTalk is processed "under the hood" is the only way to know if we're doing it right.
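Short of reading the source, a quick timing script is one way to check this on our own machine. This is only a sketch: the handler name "compareAppends" and all of its variable names are invented for the example, and the numbers it reports will depend on the machine, the Frontier version, and the size of the strings.

```usertalk
« Rough timing sketch; "compareAppends" and its locals are hypothetical names.
on compareAppends (ct) {
	local (s = "some line of text", slow = "", fast = "", i, t1, t2, t3);
	t1 = clock.ticks ();
	for i = 1 to ct {
		slow = slow + s + cr;  « two operations on the right side
	};
	t2 = clock.ticks ();
	for i = 1 to ct {
		fast = fast + (s + cr);  « parenthesized, so it looks like "x = x + y"
	};
	t3 = clock.ticks ();
	« clock.ticks counts sixtieths of a second, so these are elapsed ticks.
	return ("without parentheses: " + string (t2 - t1) + " ticks, with parentheses: " + string (t3 - t2) + " ticks")
};
msg (compareAppends (5000))
```

If the two loops finish too quickly to tell apart, raising the count passed to the handler should make any difference easier to see.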
This little essay was originally posted on the Frontier message board, but it just took me forty-five minutes to find it so I've posted this modified version of it here.
Page last updated: 5/2/2003
This page is part of Seth Dillingham's personal web site.