Friday, June 21, 2019

Summer slurpies, three for a dollar (Variadic functions in Perl 6)

Recently on #perl6 someone came in with an interesting question, which I will merely use as a springboard for a description of a particular feature of Perl 6's signatures and a larger discussion of module design.

They had a sub that looked like the following:
sub foo(Str $first, Str *@rest) { … }
And were having a bit of trouble calling it.  The code they were using to call it was foo(@strings), which was of course quickly solved on the calling side by adding in a slip: foo(|@strings).  As I look back on it now, I realize that while they understood that Perl 6 needs to specially indicate slurpy parameters like many other languages, they were probably used to Perl 5's style of defining signatures, which is based on P5's style of argument passing (where, in effect, all sub calls are done by passing a single array @_).  Thus in Perl 5, given the signature:
sub foo($first, @rest) { … }
One could pass an array of strings, multiple scalar strings, etc.; all would be flattened, and the first one would be assigned to $first and all others (if present) would be assigned to @rest.  Because in Perl 5 you can assume your scalars and arrays get flattened, this works really well.  Because Perl 6's signatures are a bit more complicated, that same default behavior isn't as valuable.  (And actually, that Perl 5 signature would compile just fine in Perl 6, but it would expect two, and only two, arguments on the calling side: a scalar and a positional.)
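A small sketch of that last point and of the slip fix.  The sub names (two-args, first-and-rest) and the sample data are made up for illustration, and the Str constraints from the original question are left out to keep it short:

```raku
# Hypothetical subs for illustration; only the signatures matter.
sub two-args($first, @rest)        { "$first + {@rest.elems} more" }
sub first-and-rest($first, *@rest) { "$first + {@rest.elems} more" }

my @strings = <alpha beta gamma>;

# Exactly two arguments: a scalar and a positional.
say two-args('alpha', ['beta', 'gamma']);   # alpha + 2 more

# The slip spreads @strings into three separate arguments.
say first-and-rest(|@strings);              # alpha + 2 more

# Without the slip, the whole array binds to $first and @rest stays empty.
say first-and-rest(@strings);               # alpha beta gamma + 0 more
```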

Perl 6, like many languages, has slurpies.  But because lists/arrays/etc in both flavors of Perl get special status, the slurpies are a teeny bit more complicated but a lot bit more powerful/expressive.  As you may have guessed by my title, in Perl 6 there are three types of slurpies: *@pour-first, **@chug-now, +@read-the-label.  Okay, so there's a reason they don't let me write the main docs, but you'll see my names kinda make sense and I want to roll with my drinking theme (it's the end of the semester, us educators deserve a cold one this time of year).

  • *@pour-first (flattening slurpy)
    The flattening slurpy doesn't like to drink an entire bottle of liquor all at once.  It'll divide everything up into individual shots (scalars) first.  So if you pass it a scalar, a positional with three items, and another scalar, it will flatten the positional first (as if it had a slip placed in front on the calling side), and the *@pour-first will end up with five elements, all scalars.  If there are nested positionals, they will be flattened.  This matches the behavior of Perl 5.
  • **@chug-now (gulping slurpy)  
    The gulping slurpy just wants to drink.  Whatever you give it, it'll gulp down immediately, regardless of whether it's a shot (scalar) or a whole bottle (array), effectively calling .push for each of the remaining arguments in the call.  So if you pass it a scalar, a positional with three items, and another scalar, **@chug-now will have three items: a scalar, a positional, and another scalar.  This probably is closer to the behavior expected in other languages (although if they're strongly typed you have trouble mixing arrays and single elements).
  • +@read-the-label (connoisseur slurpy)
    The connoisseur slurpy (I'm staking the claim on this term) has a more refined palate.  It will carefully read the label before deciding what it's going to do.  If it's passed multiple arguments, it will function exactly like the gulping slurpy: each one becomes an element of +@read-the-label, because it assumes there's a reason you've given it a shot of one drink and a bottle of another.  If it's passed a single argument, whether a scalar or a positional, it will put the element(s) into the array.  This is effectively syntactic sugar for the following:
multi sub foo(  @bar) { samewith |@bar }
multi sub foo(**@bar) { ... }
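To compare the three side by side, here is a minimal sketch that just counts how many elements each slurpy ends up with.  The sub names reuse my nicknames and the drinks are made-up sample data:

```raku
# Each sub returns the number of elements its slurpy received.
sub pour-first(*@a)     { @a.elems }   # flattening slurpy
sub chug-now(**@a)      { @a.elems }   # gulping slurpy
sub read-the-label(+@a) { @a.elems }   # connoisseur slurpy

my @bottle = <gin rum rye>;

# A scalar, a positional with three items, and another scalar:
say pour-first('ale', @bottle, 'mead');      # 5
say chug-now('ale', @bottle, 'mead');        # 3
say read-the-label('ale', @bottle, 'mead');  # 3

# A single positional is where the last two differ:
say chug-now(@bottle);                       # 1 (the array itself)
say read-the-label(@bottle);                 # 3 (the array's elements)
say read-the-label('ale');                   # 1
```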

Mixing the slurpies

With three different types, we need to think about which ones to use when we write subs.  Each one has situations where it is ideal.  The process of deciding which one to use comes from a core Perl (both 5 and 6) philosophy: Do What I Mean.  Because of how Perl 5 passes arguments, package developers couldn't do much on this end, but Perl 6 allows us (and really forces us) to think hard about which one to choose.

If you're just writing a quick script, it probably won't matter too much as long as it works for you.  My guess is that the *@slurpy is used most by Perl 5 veterans, who will scalarize a positional if they don't want it flattened, while the **@slurpy will be used most by those with more experience in other languages, using the slip when they need to override the default behavior and flatten.  Almost no one will use +@slurpy by default.
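Both overrides work on the calling side.  A quick sketch, with hypothetical sub names (flatten-all, keep-shape) and sample data:

```raku
sub flatten-all(*@items) { @items.elems }
sub keep-shape(**@items) { @items.elems }

my @pair = 'x', 'y';

# *@ flattens by default; itemizing with $(...) keeps the array whole.
say flatten-all('a', @pair);      # 3
say flatten-all('a', $(@pair));   # 2

# **@ keeps arguments whole by default; a slip flattens on request.
say keep-shape('a', @pair);       # 2
say keep-shape('a', |@pair);      # 3
```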

But if you're developing a module, I'd encourage you to think through this stuff a bit more.  To help your module follow the DWIM principle, ask yourself "how would I call this normally?" and refactor if you start realizing you use it differently on a regular basis.  Remember that DWIM is an end-user–focused principle, not a developer-focused one (these of course overlap: a module developer is an end user with respect to core development, for instance).

For example, if you have a function that can only work on scalar values, you almost certainly want to *@pour-first.  This way, it doesn't matter how someone passes stuff to you, it will just work.  Let's say your function pings a bunch of IP addresses.  We can safely assume that if someone passes an array, they intend for us to use the IP addresses inside of it.  As a result, the following is a good example of DWIM:
sub ping(*@ip-addresses) {
  send-pingy-message $_ for @ip-addresses
}

ping($home, coworkers-on-duty(), $office, @family)
No matter what weird combination and arrangement of IPs they throw, it will just work, they'll be happy, so you'll be happy.

But often times you will want to treat those differently.  This is particularly common when handling nested things is either important (and must be specially handled) or needs to not be handled at all.  Imagine a debugging module has a sub that reports back the type of each object it's passed.  If I pass a list, I expect it to say List, not Str Str Str because the list had three strings.  So for it to DWIM, we need:
sub report(**@objects) {
  .WHAT.say for @objects
}

report($foo, @bar, $xyz)
If it had used *@objects, then I would get far more than just the three that I, the user, expect.  I would get frustrated and submit issues on GitHub, and then you'd get frustrated.  But we used **@objects, so the user is happy and you the developer are happy.  Yay.
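To make the difference concrete, here is a sketch with two hypothetical variants of the reporting sub that return the type names as a string instead of printing them.  The sample values are assumptions:

```raku
# Same body, different slurpy: only the signature changes.
sub types-gulped(**@objects)   { @objects.map(*.^name).join(' ') }
sub types-flattened(*@objects) { @objects.map(*.^name).join(' ') }

my $foo = 'one';
my @bar = <a b c>;
my $xyz = 42;

say types-gulped($foo, @bar, $xyz);     # Str Array Int
say types-flattened($foo, @bar, $xyz);  # Str Str Str Str Int
```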

Lastly, there may be situations where I expect the end user to create their own list of stuff and pass it directly.  These are less common, but when they come up, it's really nice not to force them to use a slip.  However, this is only useful when we can assume that, if they're not constructing their own positional, they will give us more than one variable.  You might ask what the logic is there, but actually this is exactly how loop structures work.  When you say:
my @array = 1, 2, 3;
for @array { .say }  # outputs 1  2  3  (three lines)
you assume that each element of the array will be iterated over (but it won't be flattened if nested).  If you pass a scalar, it will do a single loop with that value:
for 42 { .say }   # outputs 42
Granted, that's functionally identical to given, but hey, it works.  If we pass a list of scalars, they are iterated on:
for 1,2,3 { .say }   # outputs 1  2  3  (three lines)
If we passed something mixed like 1,2,<a b c>,3, I think most of us intuitively know that the <a b c> shouldn't be iterated separately, and should be considered a single value just like the numbers.  And that's how it is:
for 1,2,<a b c>,3 { .say }   # outputs 1  2  (a b c)  3  (four lines)
It's a bit crazy just how intuitive the use case is (you've probably never thought about that before!).  But if you develop a module, there's a chance that this is what simply makes the most sense from the standpoint of who'll use your code and it's good to know that it exists.
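A sub with a +@ slurpy follows the same rule as the loops above.  This is a minimal sketch; each-item is a made-up name, and it joins the gists of its elements so the grouping is visible:

```raku
sub each-item(+@items) { @items.map(*.gist).join(' | ') }

my @array = 1, 2, 3;

say each-item(@array);            # 1 | 2 | 3
say each-item(42);                # 42
say each-item(1, 2, 3);           # 1 | 2 | 3
say each-item(1, 2, <a b c>, 3);  # 1 | 2 | (a b c) | 3
```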

Conclusion

As typical Perl 6 users, we rarely have to think about when we pass along scalars or arrays.  That's because the core of Perl 6 has already thought out extensively how we'll most likely use things.  But as more modules get written, I'd encourage module developers to (a) allow slurpies where they make sense (helps keep my code clean) and (b) give the same level of care and attention to their slurpies as the Perl 6 core team did.  For anyone that's not developing code for others, I hope you've learned something and can stop and appreciate some of the work done by people far smarter than me when stuff just works.

Appendix: Slurpy quick reference

Here's a little table to show you what the value of $bar/@bar will be inside f for each signature (rows) and each call (columns), given the following variables (quotation marks removed for clarity):
my $a   = 'a';
my $b   = 'b';
my @ijk = 'i', ('j', 'k');
my @xy  = 'x',  'y';

            f($a)   f($a,$b)   f(@xy)    f($a,@xy)   f($a,@ijk,@xy)
f($bar)     a       error      (x,y)*    error       error
f(@bar)     error   error      (x,y)*    error       error
f(*@bar)    (a)     (a,b)      (x,y)     (a,x,y)     (a,i,j,k,x,y)
f(**@bar)   (a)     (a,b)      ((x,y))   (a,(x,y))   (a,(i,(j,k)),(x,y))
f(+@bar)    (a)     (a,b)      (x,y)     (a,(x,y))   (a,(i,(j,k)),(x,y))

Note: cells marked "error" fail to bind and result in errors.  Arrays with an asterisk (*) are passed as is, rather than being new arrays.