in this post i’ll be discussing how i fixed an issue in the supercollider codebase. this is the first entry in what i believe will be a long series about my experiences as a developer on the supercollider project.

for those of you who don’t know, supercollider is a music programming language like max/msp, which consists of a synthesis engine (scsynth), an interpreted language (sclang), and an IDE (scide). the issue in question, issue #2337, was that links in the help documentation browser were sometimes appearing with “%20” where there should be spaces.

i am choosing to write about this issue for two reasons. first, i think it demonstrates a time when i really knew What I Was Doing, or at least thought i did. by writing about it, i can reinforce that good behavior in myself and also reflect on what i could have done better. secondly, i think it’s an interesting example of how programming errors can become very plain once code is written and organized clearly.

this post is going to be split into four parts:

  1. archaeology. finding the responsible code, digging it out of dusty obscurity.
  2. architecture. redesigning the existing code, before attempting to change anything.
  3. fix it
  4. archive. document what has changed through tests and comments.

Archaeology

to reiterate, the original problem was that links rendered in the IDE’s help browser were appearing with %20 where there should be a space. another piece of important information that came up during initial discussion was that spaces still appeared, sometimes. however, the issue was reliably reproducible. we just couldn’t tell why it was happening with this particular set of conditions.

scide uses a custom tool called scdoc to translate from its custom help file format (.schelp) into the HTML that the help browser displays. from previous experience, i’ve learned that the place where most of this rendering happens is in a class called SCDocHTMLRenderer. so, i started my search there, scanning for methods that might be related to constructing an <a> tag.

i quickly came across a method named htmlForLink that seemed to be the culprit. it was the first method called in order to render a link object, and made several calls to relevant-sounding things like escapeSpecialChars and escapeSpacesInAnchor. here is the method in its entirety:

*htmlForLink {|link,escape=true|
    var n, m, f, c, doc;
    // FIXME: how slow is this? can we optimize
    #n, m, f = link.split($#); // link, anchor, label
    if(m.size > 0) {
        m = this.escapeSpacesInAnchor(m);
    };
    ^if ("^[a-zA-Z]+://.+".matchRegexp(link) or: (link.first==$/)) {
        if(f.size<1) {f=link};
        c = if(m.size>0) {n++"#"++m} {n};
        if(escape) { f = this.escapeSpecialChars(f) };
        "<a href=\""++c++"\">"++f++"</a>";
    } {
        if(n.size>0) {
            c = baseDir+/+n;
            doc = SCDoc.documents[n];
            // link to other doc (might not be rendered yet)
            if(doc.notNil) {
                c = c ++ ".html";
            } {
                // link to ready-made html (Search, Browse, etc)
                if(File.exists(SCDoc.helpTargetDir+/+n++".html")) {
                    c = c ++ ".html";
                } {
                    // link to other file?
                    if(File.exists(SCDoc.helpTargetDir+/+n).not) {
                        "SCDoc: In %\n"
                        "  Broken link: '%'"
                        .format(currDoc.fullPath, link).warn;
                    };
                };
            };
        } {
            c = ""; // link inside same document
        };
        if(m.size>0) { c = c ++ "#" ++ m }; // add #anchor
        if(f.size<1) { // no label
            if(n.size>0) {
                f = if(doc.notNil) {doc.title} {n.basename};
                if(m.size>0) {
                    f = f++": "++m;
                }
            } {
                f = if(m.size>0) {m} {"(empty link)"};
            };
        };
        if(escape) { f = this.escapeSpecialChars(f) };
        "<a href=\""++c++"\">"++f++"</a>";
    };
}

cryptastic. even though i thought i had found the problematic method, i had a very hard time telling, because with all due respect to the original author (who i have decided not to name), i find this code extremely difficult to read, almost intentionally so. i have a pet theory that some programmers think their code will run faster if they give their variables single-letter names, either out of affinity with C’s historical practice or simply because less words -> more speed. i know i’ve found myself thinking that subconsciously, but having recently made an effort to use long and – more importantly – descriptive variable names, i am quickly being cured of this habit.

aside from using single-letter variable names, there were a couple other things that made this code difficult to understand:

  • inconsistent spacing: spaces are sometimes placed around operators and within braces; sometimes not.
  • tight spacing: a statement like "<a href=\""++c++"\">"++f++"</a>"; looks almost like a clever regex at first. i would like it if supercollider had a more convenient string interpolation syntax, but it’s just not there, and trying to compensate for that by smushing this all together is not helping.
  • no empty lines separating paragraphs
  • deep nesting
  • multiple statements on the same line
  • overuse of end-of-line comments. i used to love end-of-line comments when i was only writing code, but experiences like this convince me that comment-before-code is always clearer for the reader except in certain canonical cases, like variable declaration.
  • return flow is unclear: the point of exit (^) is actually midway up the method, which is easy to miss at first glance.

for once, i decided to do the responsible thing while coding, and redesign the code before attempting to fix anything. not only would it help me understand what was going on; it would also help future readers.

Architecture

for this section, i’m more or less going to step through my commit history and discuss what i changed, and why i changed it.

Renaming variables

i started with the simplest and easiest improvements: whitespace and variable names. i gave all the single-letter variables more descriptive names, like linkText, linkAnchor, and linkBase. then, i reformatted the code so its use of whitespace is consistent with our WIP style guide. i kept my commits small and atomic, so that i could be sure that chaining them together would produce a logical and consistent result. i was especially careful about this, because, again, i didn’t yet understand exactly what this code was trying to do!

out of 28 commits in this file, the first nine were simply formatting and name changes. at this point, the method looked somewhat more approachable, but i’d say it was still too long for its own good:

// argument link: the raw link text from the schelp document
// argument escape: whether or not to escape special characters in the link text itself
// returns: the <a> tag HTML representation of the original `link`
*htmlForLink { |link, escape = true|
    var linkBase, linkAnchor, linkText, linkTarget, doc;
    // FIXME: how slow is this? can we optimize
    #linkBase, linkAnchor, linkText = link.split($#); // link, anchor, label
    if(linkAnchor.size > 0) {
        linkAnchor = this.escapeSpacesInAnchor(linkAnchor);
    };

    ^if("^[a-zA-Z]+://.+".matchRegexp(link) or: (link.first == $/)) {
        if(linkText.size < 1) {linkText = link};
        linkTarget = if(linkAnchor.size > 0) {linkBase ++ "#" ++ linkAnchor} {linkBase};
        if(escape) { linkText = this.escapeSpecialChars(linkText) };

        "<a href=\"" ++ linkTarget ++ "\">" ++ linkText ++ "</a>";
    } {
        if(linkBase.size>0) {
            linkTarget = baseDir +/+ linkBase;
            doc = SCDoc.documents[linkBase];

            // link to other doc (might not be rendered yet)
            if(doc.notNil) {
                linkTarget = linkTarget ++ ".html";
            } {
                // link to ready-made html (Search, Browse, etc)
                if(File.exists(SCDoc.helpTargetDir +/+ linkBase ++ ".html")) {
                    linkTarget = linkTarget ++ ".html";
                } {
                    // link to other file?
                    if(File.exists(SCDoc.helpTargetDir +/+ linkBase).not) {
                        "SCDoc: In %\n"
                        "  Broken link: '%'"
                        .format(currDoc.fullPath, link).warn;
                    };
                };
            };
        } {
            linkTarget = ""; // link inside same document
        };

        if(linkAnchor.size > 0) { linkTarget = linkTarget ++ "#" ++ linkAnchor }; // add #anchor
        if(linkText.size < 1) { // no label
            if(linkBase.size > 0) {
                linkText = if(doc.notNil) {doc.title} {linkBase.basename};
                if(linkAnchor.size > 0) {
                    linkText = linkText ++ ": " ++ linkAnchor;
                }
            } {
                linkText = if(linkAnchor.size > 0) {linkAnchor} {"(empty link)"};
            };
        };
        if(escape) { linkText = this.escapeSpecialChars(linkText) };
        "<a href=\"" ++ linkTarget ++ "\">" ++ linkText ++ "</a>";
    };
}

i like to use the “control-flow” rule of thumb for estimating code complexity – essentially, you start from the beginning with 1 point, and count another point for each if, switch, while, and so on. you also add a point for each additional test within a conditional (ands and ors). if you’ve got a score greater than 10-ish by the time you hit the end of the method, something’s up. i pulled this idea from the venerable software engineering Code Complete, by Steve McConnell.

in case you’re curious, this method’s score is 18. i already thought it could be clearer when i first looked at it, but having this quantitative measure is nice to confirm that suspicion.

Refactor, Part 1

now that the method was readable for me, i began making more significant design changes. i began by noticing that these two lines appear at the end of both branches of the top-level if block:

if(escape) { linkText = this.escapeSpecialChars(linkText) };
"<a href=\"" ++ linkTarget ++ "\">" ++ linkText ++ "</a>";

i snipped them both and brought them to the end of the method. this allowed me to fix another point of confusion by placing the return statement at the real end of the method, rather than about 1/3 of the way through.

i was a little stumped about what to do next, so i spent some time adding space and comments before every consequential paragraph of code. this was my way of talking myself through the behavior of the method. i also used logging to determine what kinds of strings were being passed to this method, and added that information to the method comment.

at this point, the method with attendant comments was almost double its size (103 lines vs. 50). this was mostly due to the comments i added. since i want to show a few more versions of the method and don’t want to make this post too long, here are the first 30 lines:

// argument link: the raw link text from the schelp document
// argument escape: whether or not to escape special characters in the link text itself
// returns: the <a> tag HTML representation of the original `link`
// Possible, non-exhaustive input types for `link`:
//   "#-decorator#decorator"
//   "#-addAction"
//   "Classes/View#-front#shown"
//   "Guides/GUI-Introduction#view"
//   "Classes/FlowLayout"
//   "#*currentDrag#drag&drop data"
//   "#Key actions"
//   "http://qt-project.org/doc/qt-4.8/qt.html#Key-enum"
*htmlForLink { |link, escape = true|
    var linkBase, linkAnchor, linkText, linkTarget, doc;
    // FIXME: how slow is this? can we optimize

    // Get the link base, anchor, and text from the original string
    #linkBase, linkAnchor, linkText = link.split($#);

    if(linkAnchor.size > 0) {
        linkAnchor = this.escapeSpacesInAnchor(linkAnchor);
    };

    // Check whether the link is a URL or a relative path (starts with a `/`),
    // NOTE: the second condition is not triggered by anything in the core library's
    // help system. I am not sure if it is safe to remove. - Brian H
    if("^[a-zA-Z]+://.+".matchRegexp(link) or: (link.first == $/)) {
        // Process a link that goes to a URL outside the help system

        // If there was no link text, set it to be the same as the original link
        if(linkText.size < 1) {linkText = link};

if you’re thinking these comments are too verbose, i agree with you. i considered them an exercise in thinking through the design, and i hoped that i wouldn’t be leaving the method in this state. but this exercise had a useful product, and that was words. those little section summaries i wrote, like “Process a link that goes to a URL outside the help system”, do me a great service. when i factor out that section into a separate method, i know exactly what to call it now. i’ll use the Words i worded while i was talking myself out of psyduck state.

i made a few more small commits at this point: i moved the declaration of doc into the smaller scope where it was actually used, and rewrote some of my comments for clarity.

Refactor, Part 2

i didn’t mention this before, but one of the things that really bugged me about the original code was its use of string size comparisons: linkText.size > 0, linkBase.size < 1. at first, i thought i could jump straight to the more meaningful isEmpty and isEmpty.not, but that proved problematic, because nil strings don’t respond to isEmpty. and, in the line:

    #linkBase, linkAnchor, linkText = link.split($#);

some of those strings can become nil if link has fewer than two #s.

nevertheless! i thought isEmpty was worth it, so my next step was to assign each of these variables to empty-string if they turned out to be nil, by adding three lines after the link.split statement i just referenced:

    #linkBase, linkAnchor, linkText = link.split($#);
    linkBase = linkBase ? "";
    linkAnchor = linkAnchor ? "";
    linkText = linkText ? "";

now i could use isEmpty and isEmpty.not without fear.

at this point, i had become familiar enough with the method that i could see how to fix it.

Fixing it

by tracking all the uses of linkAnchor in the method, i was able to quickly pinpoint what was causing the issue. we know from the issue discussion that the problem only happens if the link is (1) within the same document (i.e., not a URL), and (2) doesn’t have any link text. the pre-existing workaround for this issue was to manually add link text in the .schelp file: instead of writing link::Classes/Ringz#Interaction with sample rate:: (which renders as “Ringz: Interaction%20with%20sample%20rate”), we write link::Classes/Ringz#Interaction with sample rate#Ringz: Interaction with sample rate::. the functional difference between these two examples in the context of this method is that the linkText will be empty in the former and filled in the latter.

rewriting the method so that we only see the relevant control flow path makes the issue glaringly obvious:

*htmlForLink { |link, escape = true|
    var linkBase, linkAnchor, linkText, linkTarget, doc;

    #linkBase, linkAnchor, linkText = link.split($#);
    linkText = "";

    linkAnchor = this.escapeSpacesInAnchor(linkAnchor);
    linkTarget = baseDir +/+ linkBase;
    doc = SCDoc.documents[linkBase];
    linkTarget = linkTarget ++ ".html";
    linkTarget = linkTarget ++ "#" ++ linkAnchor;
    linkText = doc.title;
    linkText = linkText ++ ": " ++ linkAnchor;
    linkText = this.escapeSpecialChars(linkText);
    ^"<a href=\"" ++ linkTarget ++ "\">" ++ linkText ++ "</a>";
}

linkAnchor is used to generate the linkText, but after its spaces have already been escaped. and, as we expected, it takes specific and rare conditions for this line to be executed. first, there has to be exactly one # in the string passed to this method. then, the part of that string after the # has to contain spaces. the spaces are escaped because the linkAnchor is non-empty, and this space-escaped anchor is appended to the linkText when the original linkText itself is empty. i was a little surprised at how simple the problem was, but when you consider the original state of the method, it makes sense that this problem went unsolved for so long. it was hidden in plain sight.

here is the still-broken method in its entirety, with my comment pointing out the problematic block.

// argument link: the raw link text from the schelp document
// argument escape: whether or not to escape special characters in the link text itself
// returns: the <a> tag HTML representation of the original `link`
// Possible, non-exhaustive input types for `link`:
//   "#-decorator#decorator"
//   "#-addAction"
//   "Classes/View#-front#shown"
//   "Guides/GUI-Introduction#view"
//   "Classes/FlowLayout"
//   "#*currentDrag#drag&drop data"
//   "#Key actions"
//   "http://qt-project.org/doc/qt-4.8/qt.html#Key-enum"
*htmlForLink { |link, escape = true|
    var linkBase, linkAnchor, linkText, linkTarget;
    // FIXME: how slow is this? can we optimize

    // Get the link base, anchor, and text from the original string
    // Replace them with empty strings if any are nil
    #linkBase, linkAnchor, linkText = link.split($#);
    linkBase = linkBase ? "";
    linkAnchor = linkAnchor ? "";
    linkText = linkText ? "";

    if(linkAnchor.empty.not) {
        linkAnchor = this.escapeSpacesInAnchor(linkAnchor);
    };

    // Check whether the link is a URL or a relative path (starts with a `/`),
    // NOTE: the second condition is not triggered by anything in the core library's
    // help system. I am not sure if it is safe to remove. - Brian H
    if("^[a-zA-Z]+://.+".matchRegexp(link) or: (link.first == $/)) {
        // Process a link that goes to a URL outside the help system

        // If there was no link text, set it to be the same as the original link
        if(linkText.isEmpty) {linkText = link};

        // Set the link target to be the link base plus its anchor, if there was one
        linkTarget = if(linkAnchor.isEmpty.not) {linkBase ++ "#" ++ linkAnchor} {linkBase};
    } {
        // Process a link that goes to a URL within the help system

        // The document referred to by this link
        var doc = nil;

        // If the link base is non-empty, it is a link to something in another document
        if(linkBase.isEmpty.not) {

            // Compose the target as being relative to the help system base directory
            linkTarget = baseDir +/+ linkBase;

            // Find the document referred to by this link
            doc = SCDoc.documents[linkBase];

            // If this is an existing document, just add .html to get the target
            if(doc.notNil) {
                linkTarget = linkTarget ++ ".html";
            } {
                // If the document doesn't exist according to SCDoc, check the filesystem
                // to see if the link target is present
                if(File.exists(SCDoc.helpTargetDir +/+ linkBase ++ ".html")) {
                    linkTarget = linkTarget ++ ".html";
                } {
                    // If the link target doesn't exist as an HTML file, check to see if the
                    // raw filepath exists. If it does, do nothing with it -- we're done. If
                    // it doesn't, then consider this a broken link.
                    if(File.exists(SCDoc.helpTargetDir +/+ linkBase).not) {
                        "SCDoc: In %\n"
                        "  Broken link: '%'"
                        .format(currDoc.fullPath, link).warn;
                    };
                };
            };
        } {
            // If the link base was empty, we're linking within the same document, so set
            // the target to be empty. We add the anchor below.
            linkTarget = "";
        };

        // If there was an anchor, add it to the link target
        if(linkAnchor.isEmpty.not) { linkTarget = linkTarget ++ "#" ++ linkAnchor };

        // If there was no label, generate one from the base and/or anchor.
        // FIXME: the anchor's spaces have already been escaped here, which causes issue #2337.
        if(linkText.isEmpty) {

            // If the base was non-empty, generate it by combining the filename and the anchor.
            // Otherwise, if there was an anchor, use that. Otherwise, use "(empty link)"
            if(linkBase.isEmpty.not) {
                linkText = if(doc.notNil) {doc.title} {linkBase.basename};
                if(linkAnchor.isEmpty.not) {
                    linkText = linkText ++ ": " ++ linkAnchor;
                }
            } {
                linkText = if(linkAnchor.isEmpty.not) {linkAnchor} {"(empty link)"};
            };
        };
    };

    // Escape special characters in the link text if requested
    if(escape) { linkText = this.escapeSpecialChars(linkText) };

    // Return a well-formatted <a> tag using the target and link text
    ^"<a href=\"" ++ linkTarget ++ "\">" ++ linkText ++ "</a>";
}

Fixing it: post-fix refactor

i still had a bone to pick with this method. i knew i would immediately forget how it worked as soon as i unloaded it from my brain. writing this a month later, that’s exactly what happened with this oversized version of the method. i recently heard on a podcast that in the speaker’s opinion, the number of times you should have to see a block of code repeated before you pull it into its own routine should be “zero”. i took that approach here and pulled out blocks of logically related code.

through a series of incremental changes that i won’t bore you with (you can check the commit log if you really want to see), it became clear that this code really only needs to do two things:

  • create the link text
  • create the link target

everything else is conditional. there are two main points of inflection: the behavior changes depending on whether the link target is something “external” (i.e., a URL or relative path) or “internal” (within the same document or another document in the help system); the behavior also changes depending on whether escape is true or false. this second dependency is a smell in itself and indicates that some other method should probably be doing the escaping when that’s what it needs, but for now my goal is only to reduce the complexity of this method.

after playing around with the design a bit, i came up with three methods: one generates the link target for an external link; the second, the link target for an internal link; the third makes the link text for an internal link. i didn’t need to create a separate method for an external link’s link text, because that was already a one-liner.

here is the final method in all its simplicity:

*htmlForLink { |link, escape = true|
    var linkBase, linkAnchor, linkText, linkTarget;

    // Get the link base, anchor, and text from the original string
    // Replace them with empty strings if any are nil
    #linkBase, linkAnchor, linkText = link.split($#);
    linkBase = linkBase ? "";
    linkAnchor = linkAnchor ? "";
    linkText = linkText ? "";

    // Check whether the link is a URL or a relative path (starts with a `/`),
    if("^[a-zA-Z]+://.+".matchRegexp(link) or: (link.first == $/)) {
        // Process a link that goes to a URL outside the help system

        // If there was no link text, set it to be the same as the original link
        linkText = if(linkText.isEmpty) { link } { linkText };
        linkTarget = this.prLinkTargetForExternalLink(linkBase, linkAnchor);
    } {
        // Process a link that goes to a URL within the help system
        linkText = this.prLinkTextForInternalLink(linkBase, linkAnchor, linkText);
        linkTarget = this.prLinkTargetForInternalLink(linkBase, linkAnchor, link);
    };

    // Escape special characters in the link text if requested
    if(escape) { linkText = this.escapeSpecialChars(linkText) };

    // Return a well-formatted <a> tag using the target and link text
    ^"<a href=\"" ++ linkTarget ++ "\">" ++ linkText ++ "</a>";
}

this method is infinitely more graspable than the first version. even though there are all the additional comments and empty lines still there, it’s about 30 lines long – shorter than the original.

i am omitting the bodies of the private methods in this post; they contain the logic already presented in older versions of the code above. if you’re curious, you can see the final result here.

Archive

you’ll notice that in addition to solving this issue, i’ve also tried to make life easier for the next person who touches this code. i gave things clearer names, i added explanatory comments, and i documented the research that led me to the fix. and that list of possible inputs i made for myself? they became the perfect way to test out the new version of the method, to confirm that i hadn’t broken anything. there were no tests for this method previously, so i also wrote some edge-case tests to make sure that it would fail gracefully on unexpected inputs like totally empty strings. you can see those tests here.

Postscript

i’m pleased with how responsibly i approached this problem. it would have been easy to do it the way i used to – by tweaking things here and there, running the program, checking if it changed anything, leaving my comments on paper instead of in the file i’m editing. i think i can keep this up in future problem-solving.

coming back to this after a cool-off period, i see that there are some things i could have done better, too.

i could have gone farther with my refactoring, moving out the tacked-on escape parameter to the function, and possibly also moving out the final line that fits the linkTarget and linkText into an a-tag. it might even be possible to handle the logic of string-splitting more effectively. maybe i wouldn’t have had to pass so many arguments into each of my factored-out methods.

i also could have followed the conventions of our project more effectively – i don’t think i ever put my method comment header into any .schelp file, so it won’t be accessible to someone browsing the documentation. i also didn’t really consider whether i was placing my new methods in the best spot, or try to improve the speed of this method like the FIXME asked me to. given how rarely this method is called (if i recall, roughly 500 times while parsing around 1300 documents), it didn’t seem like an issue. regardless, the rendering process in SCDoc is very slow, and next time i ought to at least consider that.

regardless, i’m happy with my work here, and i received some nice compliments on the readability of my code when i made the pull request which made me feel like it wasn’t all in vain.

until next time, here’s my music suggestion: Pacific Ocean, by Dirty Beaches. i recently discovered Dirty Beaches and i’ve really been enjoying their album Stateless.

bye