Friday, June 21, 2019

Summer slurpies, three for a dollar (Variadic functions in Perl 6)

Recently on #perl6 someone came in with an interesting question, which I will merely use as a springboard to describe a particular feature of Perl 6's signatures and to open a larger discussion on module design.

They had a sub that looked like the following:
sub foo(Str $first, Str *@rest) { … }
And were having a bit of trouble calling it.  The code they were using to call it was foo(@strings), which was of course quickly solved on the calling side by adding in a slip: foo(|@strings).  As I look back on it now, I realize that while they understood that Perl 6, like many other languages, needs slurpy parameters to be specially indicated, they were probably used to Perl 5's style of defining signatures, which is based on P5's style of argument passing (where, in effect, all sub calls are done by passing a single array @_).  Thus in Perl 5, given the signature:
sub foo($first, @rest) { … }
One could pass an array of strings, multiple scalar strings, etc.  All would be flattened, and the first one would be assigned to $first and all others (if present) would be assigned to @rest.  Because in Perl 5 you can assume your scalars and arrays get flattened, this works really well.  Because Perl 6's signatures are a bit more complicated, that same default behavior isn't as valuable.  (And actually, that Perl 5 signature would compile just fine in Perl 6, but it would expect two, and only two, arguments on the calling side: a scalar and a positional.)
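To make the difference concrete, here is a minimal sketch of the situation from the question.  The sub bodies and the returned strings are made up for illustration, and I've left the Str constraint off the slurpy so the focus stays on how the arguments bind:

```raku
# Hypothetical stand-in for the sub from the question
sub foo(Str $first, *@rest) {
    "$first + {@rest.elems} more"
}

# The Perl 5 style signature: exactly one scalar and one positional
sub bar($first, @rest) {
    "$first + {@rest.elems} more"
}

my @strings = <a b c>;

# foo(@strings) would fail: the whole array tries to bind to $first
say foo(|@strings);       # a + 2 more
say foo('a', 'b', 'c');   # a + 2 more
say bar('a', <b c>);      # a + 2 more
```

Calling bar('a', 'b', 'c') would fail as well, since that signature has no slurpy to soak up the extra argument.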

Perl 6, like many languages, has slurpies.  But because lists/arrays/etc in both flavors of Perl get special status, the slurpies are a teeny bit more complicated but a lot bit more powerful/expressive.  As you may have guessed by my title, in Perl 6 there are three types of slurpies: *@pour-first, **@chug-now, +@read-the-label.  Okay, so there's a reason they don't let me write the main docs, but you'll see my names kinda make sense and I want to roll with my drinking theme (it's the end of the semester, us educators deserve a cold one this time of year).

  • *@pour-first (flattening slurpy)
    The flattening slurpy doesn't like to drink an entire bottle of liquor all at once.  It'll divide everything up into individual shots (scalars) first.  So if you pass it a scalar, a positional with three items, and another scalar, it will flatten the positional first (as if it had a slip placed in front on the calling side), and the *@pour-first will end up with five elements, all scalars.  If there are nested positionals, they will be flattened.  This matches the behavior of Perl 5.
  • **@chug-now (gulping slurpy)  
    The gulping slurpy just wants to drink.  Whatever you give it, it'll gulp down immediately, regardless of whether it's a shot (scalar) or a whole bottle (array), effectively calling .push for each of the remaining arguments in the call.  So if you pass it a scalar, a positional with three items, and another scalar, **@chug-now will have three items: a scalar, a positional, and another scalar.  This probably is closer to the behavior expected in other languages (although if they're strongly typed you have trouble mixing arrays and single elements).
  • +@read-the-label (connoisseur slurpy)
    The connoisseur slurpy (I'm staking the claim on this term) has a more refined palate.  It will carefully read the label before deciding what it's going to do.  If it's passed multiple arguments, it will function exactly like the gulping slurpy: each one becomes an element of +@read-the-label, because it assumes there's a reason you've given it a shot of one drink and a bottle of another.  If the argument is a single scalar or a single positional, it will put the element(s) into an array.  This is effectively syntactical sugar for the following:
multi sub foo(  @bar) { samewith |@bar }
multi sub foo(**@bar) { ... }
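
If you'd rather see the three side by side, here's a small sketch that just counts how many elements each slurpy ends up with.  The sub names and the @bottle array are mine, purely for illustration:

```raku
sub pour-first(*@drinks)     { @drinks.elems }   # flattening slurpy
sub chug-now(**@drinks)      { @drinks.elems }   # gulping slurpy
sub read-the-label(+@drinks) { @drinks.elems }   # connoisseur slurpy

my @bottle = <rum gin rye>;

# A scalar, a three-item positional, and another scalar
say pour-first('shot', @bottle, 'shot');       # 5
say chug-now('shot', @bottle, 'shot');         # 3
say read-the-label('shot', @bottle, 'shot');   # 3

# A single positional is where the last two differ
say chug-now(@bottle);                         # 1
say read-the-label(@bottle);                   # 3
say read-the-label('shot');                    # 1
```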

Mixing the slurpies

With three different types, we need to think about which one to use when we write subs.  Each one has situations where it is ideal.  The process of deciding which one to use comes from a core Perl (both 5 and 6) philosophy: Do What I Mean.  Because of how arguments are passed, Perl 5 limited how much package developers could do on this front, but Perl 6 allows us (and really forces us) to think hard about which one to choose.

If you're just writing a quick script, it probably won't matter too much as long as it works for you.  My guess is the *@slurpy is used most by Perl 5 veterans, who will scalarize a positional if they don't want it flattened, while the **@slurpy will be used most by those with more experience in other languages, who will use a slip when they need to override the default behavior and flatten.  Almost no one will use +@slurpy by default.

But if you're developing a module, I'd encourage you to think through this stuff a bit more.  To help your module follow the DWIM principle, ask yourself "how would I call this normally?" and refactor if you start realizing you use it differently on a regular basis.  Remember that DWIM is an end-user–focused principle, not a developer-focused one (these of course overlap: a module developer is an end-user with respect to core development, for instance).

For example, if you have a function that can only work on scalar values, you almost certainly want to *@pour-first.  This way, it doesn't matter how someone passes stuff to you, it will just work.  For example, let's say your function pings a bunch of IP addresses.  We can safely assume that if someone passes an array, they intend for us to use the IP addresses inside of it.  As a result, the following is a good example of DWIM:
sub ping(*@ip-addresses) {
  send-pingy-message $_ for @ip-addresses
}

ping($home, coworkers-on-duty(), $office, @family)
No matter what weird combination and arrangement of IPs they throw, it will just work, they'll be happy, so you'll be happy.

But often times you will want to treat those differently.  This is particularly common when handling nested things is either important (and must be specially handled) or needs to not be handled at all.  Imagine a debugging module has a sub that reports back the type of each object it's passed.  If I pass a list, I expect it to say List, not Str Str Str because the list had three strings.  So for it to DWIM, we need:
sub report(**@objects) {
  .WHAT.say for @objects
}

report($foo, @bar, $xyz)
If it had used *@objects, then I would get far more than just the three that I, the user, expect.  I would get frustrated and submit issues on GitHub, and then you'd get frustrated.  But we used **@objects, so the user is happy and you the developer are happy.  Yay.

Lastly, there may be situations where I expect the end user to create their own list of stuff and pass that directly.  These are less common but when they come up, it's really nice to not force them to use a slip.  However, this is only useful if we can assume that, when they're not constructing their own positional, they will give us more than one variable.  You might ask what the logic is there, but actually this is exactly how loop structures work.  When you say:
my @array = 1, 2, 3;
for @array { .say }  # outputs 1  2  3  (three lines)
you assume that each element of the array will be iterated over (but it won't be flattened if nested).  If you pass a scalar, it will do a single loop with that value:
for 42 { .say }   # outputs 42
Granted, that's functionally identical to given, but hey, it works.  If we pass a list of scalars, they are iterated on:
for 1,2,3 { .say }   # outputs 1  2  3  (three lines)
If we passed something mixed like 1,2,<a b c>,3, I think most of us intuitively know that the <a b c> shouldn't be iterated separately, and should be considered a single value just like the numbers.  And that's how it is:
for 1,2,<a b c>,3 { .say }   # outputs 1  2  (a b c)  3  (four lines)
It's a bit crazy just how intuitive the use case is (you've probably never thought about that before!).  But if you develop a module, there's a chance that this is what simply makes the most sense from the standpoint of who'll use your code and it's good to know that it exists.
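
Here's a rough sketch of a sub that borrows that same behavior with a +@ slurpy.  The name each-item and its body are hypothetical; it just joins whatever it received so you can see where the element boundaries landed:

```raku
# Hypothetical sub that treats its arguments the way a for loop would
sub each-item(+@items) {
    @items.map(*.gist).join(' | ')
}

my @array = 1, 2, 3;

say each-item(@array);             # 1 | 2 | 3
say each-item(42);                 # 42
say each-item(1, 2, <a b c>, 3);   # 1 | 2 | (a b c) | 3
```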

Conclusion

As typical Perl 6 users, we rarely have to think about when we pass along scalars or arrays.  That's because the core of Perl 6 has already thought out extensively how we'll most likely use things.  But as more modules get written, I'd encourage module developers to (a) allow slurpies where they make sense (helps keep my code clean) and (b) give the same level of care and attention to their slurpies as the Perl 6 core team did.   For anyone that's not developing code for others, I hope you've learned something and can stop and appreciate some of the work done by people far smarter than me when stuff just works.

Appendix: Slurpy quick reference

Here's a little table to show you what the value of $bar/@bar will be when you pass the following variables (quotation marks removed for clarity)
my $a   = 'a'; 
my $b   = 'b';
my @ijk = 'i', ('j', 'k');
my @xy  = 'x',  'y';
sig \ call    f($a)   f($a,$b)   f(@xy)    f($a,@xy)   f($a,@ijk,@xy)
f($bar)       a       error      (x,y)*    error       error
f(@bar)       (a)     error      (x,y)*    error       error
f(*@bar)      (a)     (a,b)      (x,y)     (a,x,y)     (a,i,j,k,x,y)
f(**@bar)     (a)     (a,b)      ((x,y))   (a,(x,y))   (a,(i,(j,k)),(x,y))
f(+@bar)      (a)     (a,b)      (x,y)     (a,(x,y))   (a,(i,(j,k)),(x,y))

Note: cells marked error result in errors.  Arrays with an asterisk (*) are passed as is, rather than being new arrays.

Friday, May 31, 2019

Perl 6's given (switch on steroids)

Sometimes the best features of Perl — and even more so of Perl 6 — are found by asking yourself, “I wonder if this’ll work?” and then finding that yes, yes it does.

Many languages have a type of switch statement that lets us avoid having a long list of if-then-else.  In JavaScript, we can tidy up things from this:
if (partOfSpeech == 'noun') {
  doNounStuff();
} else if (partOfSpeech == 'verb') { 
  doVerbStuff();
} else if (partOfSpeech == 'adjective') { 
  doAdjectiveStuff();
} else if (partOfSpeech == 'adverb') {
  doAdverbStuff();
}
To this:
switch (partOfSpeech) {
  case 'noun':
    doNounStuff();
    break;
  case 'verb':
    doVerbStuff();
    break;
  case 'adjective':
    doAdjectiveStuff();
    break;
  case 'adverb':
    doAdverbStuff();
    break;
}
Now, because of the way that switches work in many other languages, we aren't necessarily less wordy, because we have to include break (unless we want to fall through, which I generally find to be the exception and not the rule, but YMMV). In Perl 6, the switch statement doesn’t exist and instead there is a much more powerful given statement that has all the functionality of other languages' switches, plus more. I'm going to focus, however, on its use as a switcher. To write the above JavaScript code in Perl 6, we get:
given $part-of-speech {
  when 'noun'      { do-noun-stuff      }
  when 'verb'      { do-verb-stuff      }
  when 'adjective' { do-adjective-stuff }
  when 'adverb'    { do-adverb-stuff    }
}
Now let’s imagine a situation where we might want to use a switch statement inside of another switch statement. For this, we’ll consider needing to determine a greeting message in Spanish oriented towards a given audience. The catch is, our audience could be one person or multiple, we may need to treat them informally or formally, and they may be guys, girls, or a mix thereof. All of these factor into the message we need to give them. Let’s see how we could set our greeting message in JavaScript:
switch (audience.number) {
  case 'singular':
    switch (audience.gender) {
      case 'masculine':
        switch (audience.formality) {
          case 'informal':
            message = '¿Cómo estás mi amigo?';
            break;
          case 'formal':
            message = '¿Cómo está el señor?';
            break;
        }
        break;
      case 'feminine':
        switch (audience.formality) {
          case 'informal':
            message = '¿Cómo estás mi amiga?';
            break;
          case 'formal':
            message = '¿Cómo está la señora?';
            break;
        }
        break;
    }
    break;
  case 'plural':
    switch (audience.gender) {
      case 'masculine':
      case 'mixed':
        switch (audience.formality) {
          case 'informal':
            message = '¿Cómo estáis mis amigos?';
            break;
          case 'formal':
            message = '¿Cómo están los señores?';
            break;
        }
        break;
      case 'feminine':
        switch (audience.formality) {
          case 'informal':
            message = '¿Cómo estáis mis amigas?';
            break;
          case 'formal':
            message = '¿Cómo están las señoras?';
            break;
        }
        break;
    }
    break;
}
YIKES! Now in the case of Spanish, because many of the oppositions are binary, one could argue that an if-then-else, or even some nested ternary operators, might be cleaner, but it still doesn’t generate anything remotely as clean as the Perl 6 code:
my $message;   # declared outside so it is still visible after the given block
given $audience.number, $audience.gender, $audience.formality {
  when 'singular', 'masculine', 'informal' { $message = '¿Cómo estás mi amigo?'    }
  when 'singular', 'masculine',   'formal' { $message = '¿Cómo está el señor?'     }
  when 'singular', 'feminine',  'informal' { $message = '¿Cómo estás mi amiga?'    }
  when 'singular', 'feminine',    'formal' { $message = '¿Cómo está la señora?'    }
  when 'plural',   'masculine', 'informal' { $message = '¿Cómo estáis mis amigos?' }
  when 'plural',   'masculine',   'formal' { $message = '¿Cómo están los señores?' }
  when 'plural',   'feminine',  'informal' { $message = '¿Cómo estáis mis amigas?' }
  when 'plural',   'feminine',    'formal' { $message = '¿Cómo están las señoras?' }
}
I don’t think there’s much question that the Perl 6 version is much more readable and — importantly — maintainable. You can very quickly see which attributes apply to which message. Even more nicely (one of the powerful things of given) is that the given block can return data, so we can greatly simplify the assignment to:
my $message = do given $audience.number, $audience.gender, $audience.formality {
  when 'singular', 'masculine', 'informal' { '¿Cómo estás mi amigo?'    }
  when 'singular', 'masculine',   'formal' { '¿Cómo está el señor?'     }
  when 'singular', 'feminine',  'informal' { '¿Cómo estás mi amiga?'    }
  when 'singular', 'feminine',    'formal' { '¿Cómo está la señora?'    }
  when 'plural',   'masculine', 'informal' { '¿Cómo estáis mis amigos?' }
  when 'plural',   'masculine',   'formal' { '¿Cómo están los señores?' }
  when 'plural',   'mixed',     'informal' { '¿Cómo estáis mis amigos?' }
  when 'plural',   'mixed',       'formal' { '¿Cómo están los señores?' }
  when 'plural',   'feminine',  'informal' { '¿Cómo estáis mis amigas?' }
  when 'plural',   'feminine',    'formal' { '¿Cómo están las señoras?' }
}
So, what’s happening here? In Perl 6, given can take one or more arguments and will try to match each of them to the values for each when block. But we can take advantage of some other features of Perl 6 to do some cooler stuff. Both the masculine and mixed messages are the same when plural, so we can use a junction to simplify it:
my $message = do given $audience.number, $audience.gender, $audience.formality {
  ...
  when 'plural',   'masculine'|'mixed', 'informal' { '¿Cómo estáis mis amigos?' }
  when 'plural',   'masculine'|'mixed',   'formal' { '¿Cómo están los señores?' }
  ...
}
Now those two messages are given when the gender is either masculine or mixed. This can be particularly powerful if you have a large number of values in an array which might not even be possible to mimic in other languages in a switch:
given $baby-name {
  when @cool-names.any   { say 'That’s a cool name for a kid!'    }
  when @wtf-names.any    { say 'They’re going to hate that name!' }
  default                { say 'Eh, I guess it’s alright.'        }
}
One thing I mentioned before is maintainability. Let's imagine for a second that you had written the message select code. Your boss comes to you and says “Hey, actually, we really don’t like using señores/señoras for the formal plural. How about just using ustedes since it's nice and neutral?”. You might think that we need to use a large junction, but if for any when block we don’t care about a value, we can use a Whatever star:
my $message = do given $audience.number, $audience.gender, $audience.formality {
  when 'singular', 'masculine',         'informal' { '¿Cómo estás mi amigo?'    }
  when 'singular', 'masculine',           'formal' { '¿Cómo está el señor?'     }
  when 'singular', 'feminine',          'informal' { '¿Cómo estás mi amiga?'    }
  when 'singular', 'feminine',            'formal' { '¿Cómo está la señora?'   }
  when 'plural',   'masculine'|'mixed', 'informal' { '¿Cómo estáis mis amigos?' }
  when 'plural',   'feminine',          'informal' { '¿Cómo estáis mis amigas?' }
  when 'plural',    *,                    'formal' { '¿Cómo están ustedes?'     }
}
Nice and simple, and the star really emphasizes that we don’t care what the value is there. The star can actually be even more powerful. What if $audience.number gave us an actual number, rather than the grammatical number? That's no problem at all!
my $message = do given $audience.number, $audience.gender, $audience.formality {
  when     1, 'masculine',         'informal' { '¿Cómo estás mi amigo?'    }
  when     1, 'masculine',           'formal' { '¿Cómo está el señor?'     }
  when     1, 'feminine',          'informal' { '¿Cómo estás mi amiga?'    }
  when     1, 'feminine',            'formal' { '¿Cómo está la señora?'   }
  when * > 1, 'masculine'|'mixed', 'informal' { '¿Cómo estáis mis amigos?' }
  when * > 1, 'feminine',          'informal' { '¿Cómo estáis mis amigas?' }
  when * > 1,  *,                    'formal' { '¿Cómo están ustedes?'     }
}
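If you want to poke at what each when is doing, it boils down to smartmatching one list against another, element by element.  Here's a small sketch you can paste into a REPL; the @audience array is a made-up stand-in for the three $audience attributes:

```raku
# Hypothetical stand-in for $audience.number, .gender, .formality
my @audience = 'plural', 'mixed', 'formal';

say @audience ~~ ('plural', 'mixed', 'formal');               # True
say @audience ~~ ('plural', *, 'formal');                     # True
say @audience ~~ ('plural', 'masculine'|'mixed', 'formal');   # True
say @audience ~~ ('singular', *, 'formal');                   # False

# A condition like * > 1 is tested against the element in its position
say (3, 'mixed', 'formal') ~~ (* > 1, *, 'formal');           # True
say (1, 'mixed', 'formal') ~~ (* > 1, *, 'formal');           # False
```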
For the coup de grâce, what if your boss came to you and said, “Now some of our users have complained that they’re getting messages using vosotros, but they don’t use that in Latin America.  Can you give a different message depending on the country?”.  In many other languages, that could mean adding a lot of extra blocks!  Instead, we can just add a new attribute to $audience for their country of origin and in just a few seconds, we have it up and running:
my @vosotros = <Spain EquatorialGuinea WesternSahara>;
my $message = do given $audience.number, $audience.gender, $audience.formality, $audience.country {
  when     1, 'masculine',         'informal', *              { '¿Cómo estás mi amigo?'    }
  when     1, 'masculine',           'formal', *              { '¿Cómo está el señor?'     }
  when     1, 'feminine',          'informal', *              { '¿Cómo estás mi amiga?'    }
  when     1, 'feminine',            'formal', *              { '¿Cómo está la señora?'    }
  when * > 1, 'masculine'|'mixed', 'informal', @vosotros.any  { '¿Cómo estáis mis amigos?' }
  when * > 1, 'feminine',          'informal', @vosotros.any  { '¿Cómo estáis mis amigas?' }
  when * > 1,  *,                          * , *              { '¿Cómo están ustedes?'     }
}
You might think we need to use @vosotros.none to catch the countries that don’t use vosotros (and it’s easy to see when you might want to do something like that), but just like with switch blocks in other languages, the whens are evaluated in order.  So if the space that the masculine / mixed junction adds bothers you as much as it does me, we can actually simplify this even further to:
my @vosotros = <Spain EquatorialGuinea WesternSahara>;
my $message = do given $audience.number, $audience.gender, $audience.formality, $audience.country {
  when     1, 'masculine', 'informal', *              { '¿Cómo estás mi amigo?'    }
  when     1, 'masculine',   'formal', *              { '¿Cómo está el señor?'     }
  when     1, 'feminine',  'informal', *              { '¿Cómo estás mi amiga?'    }
  when     1, 'feminine',    'formal', *              { '¿Cómo está la señora?'    }
  when * > 1, 'feminine',  'informal', @vosotros.any  { '¿Cómo estáis mis amigas?' }
  when * > 1,  *,          'informal', @vosotros.any  { '¿Cómo estáis mis amigos?' }
  when * > 1,  *,                  * , *              { '¿Cómo están ustedes?'     }
}
Imagine the monstrosity of a codeblock that the above code would be in many other languages, which would furthermore obscure the purpose of the code. And yet here the conditions for each message can be very cleanly and clearly spelled out making maintenance a breeze.
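
For anyone who wants to run it, here's the same table wrapped in a sub.  This is only a sketch: the sub name greeting and its four plain parameters are my stand-ins for the $audience object, which I haven't defined anywhere in this post:

```raku
# $number, $gender, $formality and $country stand in for the $audience attributes
sub greeting($number, $gender, $formality, $country) {
    my @vosotros = <Spain EquatorialGuinea WesternSahara>;
    do given $number, $gender, $formality, $country {
        when     1, 'masculine', 'informal', *             { '¿Cómo estás mi amigo?'    }
        when     1, 'masculine',   'formal', *             { '¿Cómo está el señor?'     }
        when     1, 'feminine',  'informal', *             { '¿Cómo estás mi amiga?'    }
        when     1, 'feminine',    'formal', *             { '¿Cómo está la señora?'    }
        when * > 1, 'feminine',  'informal', @vosotros.any { '¿Cómo estáis mis amigas?' }
        when * > 1,  *,          'informal', @vosotros.any { '¿Cómo estáis mis amigos?' }
        when * > 1,  *,                  * , *             { '¿Cómo están ustedes?'     }
    }
}

say greeting(1, 'feminine', 'formal',   'Chile');    # ¿Cómo está la señora?
say greeting(2, 'feminine', 'informal', 'Spain');    # ¿Cómo estáis mis amigas?
say greeting(3, 'mixed',    'informal', 'Spain');    # ¿Cómo estáis mis amigos?
say greeting(3, 'mixed',    'informal', 'Mexico');   # ¿Cómo están ustedes?
```

The last two calls differ only in the country, and the Mexico one falls through to the catch-all line because the whens are tried from top to bottom.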

So what's the lesson here? If you're thinking about having a lot of embedded given or even if-then-else blocks where the same values are being checked, using multiple-argument given statements in Perl 6 can save you a LOT of hassle when refactoring and can also make your code infinitely more readable.

Edit

I also forgot that the when can be postfixed, meaning we can get rid of our brackets (and some of that unruly space for the @vosotros.any):
my @vosotros = <Spain EquatorialGuinea WesternSahara>;
my $message = do given $audience.number, $audience.gender, $audience.formality, $audience.country {
  '¿Cómo estás mi amigo?'     when     1, 'masculine', 'informal', *;
  '¿Cómo está el señor?'      when     1, 'masculine',   'formal', *;
  '¿Cómo estás mi amiga?'     when     1, 'feminine',  'informal', *;
  '¿Cómo está la señora?'     when     1, 'feminine',    'formal', *;
  '¿Cómo estáis mis amigas?'  when * > 1, 'feminine',  'informal', @vosotros.any;
  '¿Cómo estáis mis amigos?'  when * > 1,  *,          'informal', @vosotros.any;
  '¿Cómo están ustedes?'      when * > 1,  *,                  * , *;
}
This to me is the best because it visually puts the actual messages right next to the variable $message. Clear purpose, and devilishly easy to add new messages with different conditions.