Planet Raku

Raku RSS Feeds

Elizabeth Mattijsen (Libera: lizmat #raku) / 2023-09-26T09:19:21


Elizabeth Mattijsen: Raku Blog Posts 2023.39

Published by Elizabeth Mattijsen on 2023-09-25T12:15:25

Paweł bbkr Pabian completed their series about Unicode.

Elizabeth Mattijsen reports on all recent developments around Rakudo, an implementation of the Raku Programming Language.

Rakudo Weekly News: 2023.39 Releaseses

Published by Elizabeth Mattijsen on 2023-09-25T11:55:06

Justin DeVuyst has published the 2023.09 release of the Rakudo Compiler, Haytham Elganiny has released a new version of the Pakku package manager, and two authors published their first module on the Raku Ecosystem this week. Not to mention 15 other authors providing updates to existing modules and two new modules! A releasy week for sure!

A small program

Matt R has published a video about One Small Program in 15 Languages, and the Raku Programming Language is one of them!

Paweł’s Corner

Paweł bbkr Pabian completed their “Introduction to UTF” series of blog posts with:

Konstantin’s Corner

Konstantin Narkhov has posted a video about another of their pet projects: Ethelia (/r/rakulang comments).

Weeklies

Weekly Challenge #236 is available for your perusal.

New Problem Solving Issues

New Pull Requests

Core Developments

In RakuAST developments this week:

Meanwhile on Mastodon

Meanwhile on the mailing list

Questions about Raku

Comments

New Raku Modules

Updated Raku Modules

Winding down

Actually, WordPress tells me that this is the 200th instalment of the Rakudo Weekly News. Counting is always hard! Meanwhile, Слава Україні!  Героям слава!

Please keep staying safe and healthy, and keep up the good work!

If you like what I’m doing, committing to a small sponsorship would mean a great deal!

Paweł bbkr Pabian: UTF-8 series wrap up

Published by Paweł bbkr Pabian on 2023-09-24T21:31:07

WoW

What a long, strange trip it's been. An idea to "quickly blog about some Unicode basics" grew into a 17-post monster series :)

Special thanks go to:

SeeYa

Paweł bbkr Pabian: UTF-8 in MySQL

Published by Paweł bbkr Pabian on 2023-09-24T21:10:06

This series is supposed to focus on the technical aspects of Unicode, and I do not plan to analyze UTF support in various technologies. However, for MySQL I want to make an exception, because I've seen countless examples of people misunderstanding its concepts and falling into traps.

Character Set vs Collation

In short: a character set defines how characters are encoded into bytes, while a collation defines how encoded text is compared and sorted.

CREATE TABLE `foo` (
    `bar` VARCHAR(32)
) Engine=InnoDB
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_0900_ai_ci;

Later I will explain what those cryptic names mean.

Property inheritance

Character set and/or collation can be specified on 7 (yes, seven!) different levels.

MySQL > SELECT @@character_set_server, @@collation_server;
+------------------------+--------------------+
| @@character_set_server | @@collation_server |
+------------------------+--------------------+
| utf8mb4                | utf8mb4_0900_ai_ci |
+------------------------+--------------------+

Those are your global settings that will be used when creating databases. So:

CREATE DATABASE `test`;

Is the same as:

CREATE DATABASE `test` CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

Those server settings are copied when a database is created, so changing the server settings later will not affect existing databases.

CREATE DATABASE `test` CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

It is just another level of defaults, this time applied to tables created within the database. So:

CREATE TABLE `foo` (
    `bar` VARCHAR(32)
) Engine=InnoDB;

Is the same as:

CREATE TABLE `foo` (
    `bar` VARCHAR(32)
) Engine=InnoDB
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_0900_ai_ci;

And just like the server settings, those are copied when tables are created. Altering the database with ALTER DATABASE test CHARACTER SET xxx COLLATE yyy will not alter existing tables in this database.

You can check the currently used database character set and collation either from variables:

MySQL [test]> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4                  | utf8mb4_0900_ai_ci   |
+--------------------------+----------------------+

Or from information schema:

MySQL [test]> SELECT `default_character_set_name`, `default_collation_name`
    FROM `information_schema`.`schemata`
    WHERE `schema_name` = 'test';
+----------------------------+------------------------+
| DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+----------------------------+------------------------+
| utf8mb4                    | utf8mb4_0900_ai_ci     |
+----------------------------+------------------------+

Note the inconsistency - it is DEFAULT_CHARACTER_SET_NAME in the information schema, but character_set_database in the variable and CHARACTER SET in the CREATE statement.

CREATE TABLE `foo` (
    `bar` VARCHAR(32)
) Engine=InnoDB
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_0900_ai_ci;

It is - you guessed it - yet another level of defaults, this time applied to columns. So:

CREATE TABLE `foo` (
    `bar` VARCHAR(32)
) Engine = InnoDB;

Is the same as:

CREATE TABLE `foo` (
    `bar` VARCHAR(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci
) Engine = InnoDB;

And just like the database settings, those are copied when columns are created. Altering the table with ALTER TABLE foo CHARACTER SET xxx COLLATE yyy will not alter existing columns in this table.

However, this time a tool is available for convenient conversion - ALTER TABLE foo CONVERT TO CHARACTER SET xxx COLLATE yyy will alter both the table defaults and the columns in this table.

You can check the table collation in the information schema:

MySQL [test]> SELECT `table_collation`
    FROM `information_schema`.`tables`
    WHERE `table_schema` = 'test'
        AND `table_name` = 'foo';
+--------------------+
| TABLE_COLLATION    |
+--------------------+
| utf8mb4_0900_ai_ci |
+--------------------+

Note another inconsistency - this time TABLE_COLLATION implies the character set, which is not given explicitly. It is also inconsistent with the database-level naming: it is a default, but lacks the DEFAULT_ prefix.

If you want to retrieve the implied character set, there is another information schema resource to do so:

MySQL > SELECT `character_set_name`
    FROM `information_schema`.`character_sets`
    WHERE `default_collate_name` = 'utf8mb4_0900_ai_ci';
+--------------------+
| CHARACTER_SET_NAME |
+--------------------+
| utf8mb4            |
+--------------------+

Finally, this is the "true" setting. This is how the data will actually be stored and sorted. The server, database and table levels were only defaults used during column creation.

You can check column character set and collation from information schema:

MySQL [test]> SELECT `character_set_name`, `collation_name`
    FROM `information_schema`.`columns`
    WHERE `table_schema` = 'test'
        AND `table_name` = 'foo'
        AND `column_name` = 'bar';
+--------------------+--------------------+
| CHARACTER_SET_NAME | COLLATION_NAME     |
+--------------------+--------------------+
| utf8mb4            | utf8mb4_0900_ai_ci |
+--------------------+--------------------+

Yes, you can have different character sets and collations within a single table:

CREATE TABLE `foo` (
    `bar` VARCHAR(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
    `baz` VARCHAR(32) CHARACTER SET latin1 COLLATE latin1_general_ci
) Engine = InnoDB;

I will give examples of when this may be useful once all those cryptic names are explained.

My advice is: always provide the character set and collation when creating databases, tables and columns. I've seen it too many times - developers adding tables without checking which character set and collation will be inherited from the database, or adding columns without checking which character set and collation will be inherited from the table. Being more explicit = having less headache later.

MySQL > SELECT @@character_set_system;
+------------------------+
| @@character_set_system |
+------------------------+
| utf8mb3                |
+------------------------+

This is the global character set for metadata. It determines which characters you can use in schema names:

CREATE TABLE `łąka` (
    `bąki` int unsigned
) Engine = InnoDB

It is not part of the Server -> Database -> Table -> Column inheritance chain.

MySQL > SELECT @@character_set_connection, @@collation_connection;
+----------------------------+------------------------+
| @@character_set_connection | @@collation_connection |
+----------------------------+------------------------+
| utf8mb4                    | utf8mb4_general_ci     |
+----------------------------+------------------------+

Those describe the wire protocol. The character set defines the meaning of the transferred data - for example 0xF0 0x9F 0x98 0x8A sent or received means 😊. The collation will be used for comparing/sorting data not derived from any column, for example a bare SELECT 'A' = 'a'.

The connection and column character sets do not have to be aligned, but things will fail if the connection wire protocol cannot transfer the code points encoded in the columns. Best practice is to always use utf8mb4.
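As a quick sanity check of the byte sequence mentioned above (in Python rather than SQL - the bytes are the same on any client):

```python
# The wire bytes 0xF0 0x9F 0x98 0x8A are exactly the UTF-8 encoding of 😊.
data = bytes([0xF0, 0x9F, 0x98, 0x8A])

assert data.decode("utf-8") == "😊"
assert "😊".encode("utf-8") == data
```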

SELECT *
FROM `foo`
ORDER BY `bar` COLLATE utf8mb4_estonian_ci;

You can override the default column collation for ordering/grouping within a SELECT query. This is useful because different alphabets sort the same characters differently.

MySQL > CREATE TABLE `collation_test` (`data` text) Engine = InnoDB;

MySQL > INSERT INTO `collation_test` (`data`)
    VALUES ("A"), ("Ä"), ("Z");

MySQL > SELECT *
    FROM `collation_test`
    ORDER BY `data` COLLATE utf8mb4_sv_0900_as_cs;
+------+
| data |
+------+
| A    |
| Z    |
| Ä    |
+------+

MySQL > SELECT *
    FROM `collation_test`
    ORDER BY `data` COLLATE utf8mb4_es_0900_as_cs;
+------+
| data |
+------+
| A    |
| Ä    |
| Z    |
+------+
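The difference comes purely from the collation rules. Here is a toy sketch in Python (the two key functions below are simplified assumptions for this one example, not MySQL's real collation tables) showing how the same three strings sort differently under Swedish-like and Spanish-like rules:

```python
import unicodedata

def strip_accents(s):
    # Decompose, then drop combining marks: "Ä" -> "A".
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))

# Swedish treats Å, Ä, Ö as distinct letters sorted after Z.
def swedish_key(s):
    return [ord("Z") + 1 + "ÅÄÖ".index(c) if c in "ÅÄÖ" else ord(c) for c in s]

# Spanish (like most Latin alphabets) sorts Ä with A, accent as a tie-breaker.
def spanish_key(s):
    return [(ord(strip_accents(c)), c) for c in s]

data = ["A", "Ä", "Z"]
assert sorted(data, key=swedish_key) == ["A", "Z", "Ä"]
assert sorted(data, key=spanish_key) == ["A", "Ä", "Z"]
```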

Character set utf8 vs utf8mb4

MySQL cheated in the past. They added a character set named utf8, but it was only capable of handling code points up to 3 bytes long.

MySQL [test]> CREATE TABLE `foo` ( `bar` CHAR(1) )
    Engine = InnoDB
    CHARACTER SET = utf8;

MySQL [test]> INSERT INTO `foo` (`bar`) VALUES ('😊');
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x8A' for column 'bar' at row 1

They did it in good faith, however - back then 4-byte code points were not in use. Indexes are constructed in such a way that they must assume the maximum byte length of a string. The maximum supported index length was 767 bytes, which allowed indexing columns up to CHAR(255) - because 255*3=765 fit into the index. With 4-byte code points, the maximum indexable column would only be CHAR(191).
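The arithmetic is easy to verify (a quick Python sketch; the 767-byte limit is the one quoted above, 3072 bytes is MySQL 8's default mentioned below):

```python
# Widest CHAR/VARCHAR column whose worst-case byte size fits an index limit.
def max_indexable_chars(index_limit_bytes, max_bytes_per_char):
    return index_limit_bytes // max_bytes_per_char

assert max_indexable_chars(767, 3) == 255    # utf8mb3 -> CHAR(255)
assert max_indexable_chars(767, 4) == 191    # utf8mb4 -> CHAR(191)
assert max_indexable_chars(3072, 4) == 768   # MySQL 8 default -> VARCHAR(768)

assert len("😊".encode("utf-8")) == 4        # the emoji really needs 4 bytes
```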

Later MySQL added the utf8mb4 character set, capable of storing proper 4-byte code points, and the legacy utf8 was aliased as utf8mb3. The default maximum supported index length was also extended in MySQL 8, to 3072 bytes, allowing indexes on columns up to VARCHAR(768).

Today MySQL is trying to pay off this technical debt, and if you specify the character set as utf8 you will get the following warning: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.

But how to index longer UTF-8 columns? A common trick is to use hash indexing:

CREATE TABLE `foo` (
    `bar` varchar(1000),
    `bar_hash` CHAR(32),
    KEY (`bar_hash`)
) ENGINE = InnoDB
    CHARACTER SET = utf8mb4
    COLLATE = utf8mb4_0900_ai_ci;

CREATE TRIGGER `foo_insert`
BEFORE INSERT ON `foo`
FOR EACH ROW SET NEW.`bar_hash` = MD5( WEIGHT_STRING( NEW.`bar` ) );

CREATE TRIGGER `foo_update`
BEFORE UPDATE ON `foo`
FOR EACH ROW SET NEW.`bar_hash` = MD5( WEIGHT_STRING( NEW.`bar` ) );

The WEIGHT_STRING function is super useful here, because it converts text into the binary form used by the column's collation. The MD5 function then reduces texts of any length to a fixed 32-byte hex representation.

Now you can, for example, create a UNIQUE KEY on the bar_hash column, or use it in queries:

SELECT *
FROM `foo`
WHERE `bar_hash` = MD5( WEIGHT_STRING( 'looked up text' ) );
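For intuition, here is a rough Python analogue of what the triggers compute. WEIGHT_STRING is MySQL-specific, so accent stripping plus case folding stands in for an ai_ci collation key - an assumption for illustration only:

```python
import hashlib
import unicodedata

def collation_key(text):
    # Toy stand-in for WEIGHT_STRING under an ai_ci collation:
    # fold case, decompose, and drop combining marks (accents).
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def bar_hash(text):
    # What the triggers store: a fixed-size 32-char hex digest of the key.
    return hashlib.md5(collation_key(text).encode("utf-8")).hexdigest()

assert bar_hash("Józef") == bar_hash("jozef")   # same key, same hash
assert len(bar_hash("x" * 1000)) == 32          # long text, fixed-size hash
```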

Collation utf8mb4_0900_ai_ci

MySQL 8 did a huge cleanup of collation naming. utf8mb4_0900_ai_ci means: a collation for 4-byte UTF-8, following the Unicode 9.0 standard, in an accent (diacritic) insensitive and case insensitive manner.

It does not mean that the database cannot store characters from Unicode version 10 onward. As I explained previously, UTF-8 is designed in such a way that storage is independent of versioning. It only means that the comparison rules from Unicode version 9.0 will be used - which pretty much means the current ones, because almost nothing new has been declared in this aspect since.

Accent / case sensitivity is up to you to decide. Basically you have 3 options:

*_ai_ci - accent insensitive and case insensitive
*_as_ci - accent sensitive but case insensitive
*_as_cs - accent sensitive and case sensitive

Remember that you can mix them. For example a unique login column may use collation utf8mb4_0900_ai_ci, so that Józef, józef and jozef are treated as the same user, while a hobby column may use collation utf8mb4_0900_as_ci, because baki (fuel tanks) and bąki (bumble bees) are not the same.
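The same mixing can be mimicked in Python with three comparison keys (a simplified stand-in for the real collations, for illustration only):

```python
import unicodedata

def strip_accents(s):
    # Decompose, then drop combining marks: "ą" -> "a".
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))

ai_ci = lambda s: strip_accents(s.casefold())  # accent & case insensitive
as_ci = lambda s: s.casefold()                 # accent sensitive, case insensitive
as_cs = lambda s: s                            # accent & case sensitive

# login column under ai_ci: all three are the same user
assert ai_ci("Józef") == ai_ci("józef") == ai_ci("jozef")
# hobby column under as_ci: fuel tanks != bumble bees
assert as_ci("baki") != as_ci("bąki")
# under as_cs even case matters
assert as_cs("Józef") != as_cs("józef")
```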

You can list all utf8mb4-related collations with the following query:

SHOW COLLATION WHERE Charset = 'utf8mb4';

Best practice is to stick with the utf8mb4_0900_* set and avoid alphabet-specific collations on columns. For example, if you know your user is from Poland, you can always use a more friendly collation in the query, overriding the column's one:

SELECT `name`
FROM `products`
ORDER BY `name` COLLATE utf8mb4_pl_0900_ai_ci

Also avoid legacy collations like utf8mb4_general_ci; use only those with *_0900_* in the name.

Triggers, Procedures, Functions

Things are weird for triggers, because they inherit character set and collation from... the definer's connection. I won't go into much detail here because it rarely bites developers. Just remember to also drop / create them if you are migrating an old database to a new character set and collation. For a full description of the consequences, read the MySQL 5.1.21 changelog.

Coming up next: Series wrap up.

rakudo.org: Rakudo compiler, Release #164 (2023.09)

Published on 2023-09-21T00:00:00

Paweł bbkr Pabian: UTF-8 Byte Order Mark

Published by Paweł bbkr Pabian on 2023-09-18T12:04:18

In a previous post of this series I explained that UTF is a multi-byte encoding that also has a few variants: UTF-8, UTF-16 and UTF-32. To make things more complicated, in UTF-16 and UTF-32 there are two ways to send the bytes of a single code point - in big endian or little endian order.

BTW: The term endianness is not related to Indians. It comes from the book Gulliver's Travels. There was a law in the Lilliputian world that forced citizens to break boiled eggs from the little end. Those who rebelled and broke eggs from the big end were called "big endians".


What is Byte Order Mark?

To indicate which byte order is used in a processed file or data stream, a special sequence of bytes at the beginning was introduced, called the Byte Order Mark - or BOM for short.

For example UTF-16 can start with 0xFE 0xFF for big endian and 0xFF 0xFE for little endian order. And UTF-32 can start with 0x00 0x00 0xFE 0xFF for big endian and 0xFF 0xFE 0x00 0x00 for little endian.

Impact on UTF-8

Here things get weird. UTF-8 is constructed in such a way that it has only one meaningful byte order, because the first byte describes how many bytes will follow to form the code point value.
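This self-describing property is easy to observe (Python here, though the post's examples use Raku):

```python
# Each character encodes to a sequence whose length is fixed by its code point.
samples = {"a": 1, "é": 2, "€": 3, "😊": 4}
for ch, length in samples.items():
    assert len(ch.encode("utf-8")) == length

# The leading byte announces the length: a 4-byte sequence starts with 11110xxx.
assert "😊".encode("utf-8")[0] >> 3 == 0b11110
```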

However, the BOM specification also has a magic sequence for UTF-8: 0xEF 0xBB 0xBF. It only indicates the encoding type, and therefore has no big endian / little endian variants.
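Python's standard codecs module ships exactly these marker sequences as constants, which makes them easy to double-check:

```python
import codecs

assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"
assert codecs.BOM_UTF16_BE == b"\xfe\xff"
assert codecs.BOM_UTF16_LE == b"\xff\xfe"
assert codecs.BOM_UTF32_BE == b"\x00\x00\xfe\xff"
assert codecs.BOM_UTF32_LE == b"\xff\xfe\x00\x00"
```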

Implications

The BOM idea may sound weird today, when UTF-8 has become prevalent and dominant. But remember that we are talking about the year 2000, when things were not that obvious.

The spec says that if a protocol always uses UTF-8, or has some other way to indicate which encoding is being used, it should not use a BOM. So for example a BOM should not appear in *.xml files:

<?xml version="1.0" encoding="UTF-8"?>
<tag>...

Or in MIME *.eml files:

--3e6ea2aa592cb31d47cefca38727f872
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="UTF-8"

Because those specify the encoding internally. Unfortunately this is sometimes ignored, so if something broke your parser and you cannot find an obvious error - check whether the file has a UTF-8 BOM:

$ raku -e 'say "file.txt".IO.open( :bin ).read( 3 ) ~~ Buf.new(0xEF, 0xBB, 0xBF)'

True
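If you then need to get rid of such a BOM, Python's standard library (to pick one example outside Raku) has a dedicated utf-8-sig codec that strips it on decode, while plain utf-8 keeps it as a character:

```python
raw = b"\xef\xbb\xbf" + "hello".encode("utf-8")

assert raw.decode("utf-8") == "\ufeffhello"  # BOM survives as U+FEFF
assert raw.decode("utf-8-sig") == "hello"    # utf-8-sig strips it
```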

Security issues

But what if the BOM is not aligned with the internal/assumed encoding? Let's create the following file:

$ raku -e '
spurt "file.txt",
    Buf.new( 0xFE, 0xFF, 0x3c, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x3e )
'

Now you upload this to some service. The service has a validator that respects the BOM and should strip all HTML tags. The validator sees nonsensical but perfectly legal content that passes validation:

Trust BOM

Later the service opens and displays the uploaded file, but ignores the BOM and assumes UTF-8:

Assume UTF-8

Oooops! If you trusted the validator and displayed this file without proper HTML escaping, you have a JavaScript injection. This happened because 㱳捲楰琾 in the UTF-16 suggested by the BOM has the same byte sequence as <script> in the assumed UTF-8.
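The whole trick fits in a few lines (Python used here to show the byte-level view of the same payload):

```python
# BOM 0xFE 0xFF claims UTF-16 big endian; the rest is plain ASCII "<script>".
payload = b"\xfe\xff" + "<script>".encode("ascii")

as_utf16 = payload.decode("utf-16")      # validator's view: honors the BOM
as_utf8 = payload[2:].decode("utf-8")    # display's view: BOM skipped, UTF-8 assumed

# 8 ASCII bytes pair up into 4 harmless-looking CJK code points: 㱳捲楰琾
assert [ord(c) for c in as_utf16] == [0x3C73, 0x6372, 0x6970, 0x743E]
assert as_utf8 == "<script>"             # the actual injected tag
```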

Conclusions

You should still be aware of the existence of the Byte Order Mark, even if it makes little sense in today's UTF-8-dominated world.

Coming up next: UTF-8 in MySQL.

Elizabeth Mattijsen: Raku Blog Posts 2023.38

Published by Elizabeth Mattijsen on 2023-09-18T10:53:03

Anton Antonov describes their new module that can be used to make interfaces with Large Language Models easier.

Paweł bbkr Pabian continues their series about Unicode, this time about the joys and dangers of homoglyphs.

Alexey Melezhik reports on changes in the SparrowCI continuous integration testing service.

Elizabeth Mattijsen reports on all recent developments around Rakudo, an implementation of the Raku Programming Language.

Rakudo Weekly News: 2023.38 New Search

Published by Elizabeth Mattijsen on 2023-09-18T10:33:58

Richard Hainsworth has been working a lot on improving the search functionality of the documentation of the Raku Programming Language, and has a working prototype waiting to be tested by you! Please report any comments or suggestions.

Anton’s Corner

Anton Antonov published a blog post about their LLM::Prompts module.

Paweł’s Corner

Paweł bbkr Pabian continued their “Introduction to UTF” series of blog posts with “Fun with UTF-8: Homoglyphs“.

Alexey’s Corner

Alexey Melezhik decided to move the SparrowCI continuous integration service and this may have some consequences for some users.

Weeklies

Weekly Challenge #235 is available for your perusal.

New Pull Requests

Core Developments

In RakuAST developments this week:

Meanwhile on Mastodon

Meanwhile still on Twitter 𝕏

Meanwhile on the mailing list

Questions about Raku

Comments

New Raku Modules

Updated Raku Modules

Winding down

WordPress tells me this is the 200th instalment of the Rakudo Weekly News. How time flies! Meanwhile, Слава Україні!  Героям слава!

Please keep staying safe and healthy, and keep up the good work!

If you like what I’m doing, committing to a small sponsorship would mean a great deal!

Paweł bbkr Pabian: Fun with UTF-8: Homoglyphs

Published by Paweł bbkr Pabian on 2023-09-15T10:06:22

ꓧ𐐬𝗆𐐬𝗀ⅼУрႹ ⅰѕ 𝗌е𝗍 𝗈ſ ဝո𝖾 𝗈г ꝳо𝗋е ɡ𝗋аρႹ𝖾ⅿе𝗌 𝗍Ⴙа𝗍 Ⴙ𝖺ѕ 𝗂ꝱ𝖾ꝴ𝗍𝗂𐐽а𝗅 о𝗋 ѵ𝖾г𝗒 𝗌Ꭵⅿі𝗅аꝵ ⅼꝏ𝗄 𝗍ᴏ 𝗌იო𝖾 о𝗍ꜧ𝖾𝗋 𐑈е𝗍 ဝſ ɡꝵ𝖺рႹеოеѕ. Like the previous sentence, which does not use a single ASCII letter:

ꓧ - LISU LETTER XA
𐐬 - DESERET SMALL LETTER LONG O
𝗆 - MATHEMATICAL SANS-SERIF SMALL M
𐐬 - DESERET SMALL LETTER LONG O
𝗀 - MATHEMATICAL SANS-SERIF SMALL G
ⅼ - SMALL ROMAN NUMERAL FIFTY
У - CYRILLIC CAPITAL LETTER U
р - CYRILLIC SMALL LETTER ER
Ⴙ - GEORGIAN CAPITAL LETTER CHIN
...

Homoglyphs are not Unicode specific, but it was the ability to write in many scripts using a single UTF encoding that made them popular.

Similarity is conditional

It is font dependent. Two sets of graphemes that look very similar (or even identical) in one font may not look that similar in another. For example т - CYRILLIC SMALL LETTER TE looks like ASCII T, but in cursive fonts (those that resemble connected handwriting) it looks like m.

Similarity is subjective

For many people unfamiliar with the given alphabets, Ǧ and Ğ may look exactly the same. But someone using those letters on a daily basis will notice immediately that the first one has a CARON and the other a BREVE on top.

They are not limited to single grapheme

For example ထ - MYANMAR LETTER THA looks like two ASCII o letters. And the other way around - the ASCII pair rn looks like the single ASCII letter m.

Applications?

{
    ";" = (insertText:,";");
}

And observe how Python suddenly became someone's favorite language of choice :P

Just promise you won't troll stressed out junior dev before the end of sprint.

A common way to detect those is to check the Script Unicode property (more on properties in this post). A single word using more than one script should be considered suspicious:

$ raku -e '"Paypal".comb.classify( *.uniprop("Script") ).say'
{Latin => [P a y p a l]} # real

$ raku -e '"ꓑayраl".comb.classify( *.uniprop("Script") ).say'
{Cyrillic => [р а], Latin => [a y l], Lisu => [ꓑ]} # fake

Raku note: the comb method without parameters extracts a list of characters. Those characters are grouped by the classify method, with the output of the uniprop method for each character used as the classification key.
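A rough Python equivalent of this check (the stdlib does not expose the Script property directly, so the first word of each character's Unicode name is used as an approximation):

```python
import unicodedata

def scripts(word):
    # Rough proxy for the Script property: "CYRILLIC SMALL LETTER ER" -> "CYRILLIC".
    return {unicodedata.name(ch).split()[0] for ch in word}

assert scripts("Paypal") == {"LATIN"}          # real

fake = "Pay\u0440\u0430l"                      # 'р' and 'а' are Cyrillic here
assert scripts(fake) == {"LATIN", "CYRILLIC"}  # fake: mixed scripts, suspicious
```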

Tools

I maintain the HomoGlypher library/package, which handles common homoglyph operations:

A huge list of mappings is provided, so you won't have to dig through Unicode blocks on your own to find possible similarities between graphemes.

Give it a try. And if you know other homoglyph libraries please leave a note in the comments for future readers.

Anton Antonov: LLM::Prompts

Published by Anton Antonov on 2023-09-14T21:00:51

Introduction

This blog post introduces and briefly describes the Raku (data) package “LLM::Prompts” that facilitates the creation, storage, retrieval, and curation of Large Language Models (LLM) prompts.

Here is an example of using the prompt Domain Specific Language (DSL) in Jupyter chatbook, [AA2, AAp2]:


Installation

From the Zef ecosystem:

zef install LLM::Prompts

From GitHub:

zef install https://github.com/antononcube/Raku-LLM-Prompts.git


Usage examples

Retrieval

Load the packages “LLM::Prompts”, [AAp1], and “LLM::Functions”, [AAp2]:

use LLM::Prompts;
use LLM::Functions;

# (Any)

Show the record of the prompt named “FTFY”:

.say for |llm-prompt-data<FTFY>;

# PromptText => -> $a='' {"Find and correct grammar and spelling mistakes in the following text.
# Response with the corrected text and nothing else.
# Provide no context for the corrections, only correct the text.
# $a"}
# Categories => (Function Prompts)
# ContributedBy => Wolfram Staff
# Description => Use Fixed That For You to quickly correct spelling and grammar mistakes
# PositionalArguments => {$a => }
# URL => https://resources.wolframcloud.com/PromptRepository/resources/FTFY
# Keywords => [Spell check Grammar Check Text Assistance]
# Topics => (General Text Manipulation)
# NamedArguments => []
# Arity => 1
# Name => FTFY

Here is an example of retrieval of prompt data with a regex that is applied over the prompt names:

.say for llm-prompt-data(/Sc/)

# NarrativeToScript => Rewrite a block of prose as a screenplay or stage play
# ScienceEnthusiast => A smarter today for a brighter tomorrow
# ScriptToNarrative => Generate narrative text from a formatted screenplay or stage play

More prompt retrieval examples are given in the section “Prompt data” below.

LLM functions based on prompts

Make an LLM function from the prompt named “FTFY”:

my &f = llm-function(llm-prompt('FTFY'));

# -> **@args, *%args { #`(Block|2310679639416) ... }

Use the LLM function to correct the grammar of a sentence:

&f('Where does he works now?')

# Where does he work now?

Generate Raku code using the prompt “CodeWriter”:

llm-synthesize([llm-prompt('CodeWriter'), "Simulate a random walk."])

my @positions;
my $n = 10;

for ^$n -> $i {
    push @positions, (1, -1).pick;
    }
    
@positions.unshift(0);
my @positions_sum = @positions.cumulative;
say @positions_sum;

Prompt expansion

Prompt expansion using the chatbook prompt spec DSL described in [SW1] can be done using the function llm-prompt-expand:

llm-prompt-expand('What is an internal combustion engine? #ELI5')

# What is an internal combustion engine? Answer questions as if the listener is a five year old child.

Here we get the actual LLM answer:

use Text::Utils :ALL;

'What is an internal combustion engine? #ELI5' 
        ==> llm-prompt-expand() 
        ==> llm-synthesize() 
        ==> wrap-paragraph() 
        ==> join("\n") 

# An internal combustion engine is a machine that uses burning fuel to make a
# car or other machine go. It works like this: fuel is put in the engine, then
# the engine uses spark plugs to light the fuel which makes a small explosion.
# This explosion makes the engine parts move, which makes the car or machine go.

Here is another example using a persona and two modifiers:

my $prmt = llm-prompt-expand("@SouthernBelleSpeak What is light travel distance to Mars? #ELI5 #Moodified|sad")

# You are Miss Anne. 
# You speak only using Southern Belle terminology and slang.
# Your personality is elegant and refined.
# Only return responses as if you were a Southern Belle.
# Never break the Southern Belle character.
# You speak with a Southern drawl. What is light travel distance to Mars? Answer questions as if the listener is a five year old child. Modify your response to convey a sad mood.
# Use language that conveys that emotion clearly.
# Do answer the question clearly and truthfully.
# Do not use language that is outside of the specified mood.
# Do not use racist, homophobic, sexist, or ableist language.

Here we get the actual LLM answer:

$prmt 
        ==> llm-prompt-expand() 
        ==> llm-synthesize()
        ==> wrap-paragraph()
        ==> join("\n") 

# Well, little one, Mars is a bit far away, I'm afraid. I can only imagine how
# hard it must be to be so far away from family and friends. It must be a lonely
# journey.


Prompt spec DSL

A more formal description of the Domain Specific Language (DSL) for specifying prompts has the following elements:

@Yoda Life can be easy, but some people insist for it to be difficult.

Summer is over, school is coming soon. #HaikuStyled

Summer is over, school is coming soon. #HaikuStyled #Translated|Russian

!Translated|Portuguese Summer is over, school is coming soon

!ShortLineIt^

Spec          Interpretation
@name         Direct chat to a persona
#name         Use modifier prompts
!name         Use function prompt with the input of the current cell
!name>        «same as above»
&name>        «same as above»
!name^        Use function prompt with the previous chat message
!name^^       Use function prompt with all previous chat messages
!name|param   Include parameters for prompts

Remark: The function prompts can have both sigils “!” and “&”.

Remark: Prompt expansion makes the usage of LLM chatbooks much easier. See “Jupyter::Chatbook”, [AAp3].


Prompt data

Here is how the prompt data can be obtained:

llm-prompt-data.elems

# 154

Here is an example of retrieval of prompt data with a regex that is applied over the prompt names:

.say for llm-prompt-data(/Em/, fields => <Description Categories>)

# EmailWriter => (Generate an email based on a given topic (Personas))
# Emojify => (Replace key words in text with emojis (Function Prompts))
# EmojiTranslate => (Translate text into an emoji representation (Function Prompts))
# Emojified => (Provide responses that include emojis within the text (Modifier Prompts))
# EmojiTranslated => (Get a response translated to emoji (Modifier Prompts))

In many cases it is better to have the prompt data — or any data — in long format. Prompt data in long format can be obtained with the function llm-prompt-dataset:

use Data::Reshapers;
use Data::Summarizers;

llm-prompt-dataset.pick(6) 
        ==> to-pretty-table(align => 'l', field-names => <Name Description Variable Value>)

#ERROR: Too few positionals passed to '<anon>'; expected 2 arguments but got 1 in sub-signature
# Nil

Here is a breakdown of the prompts categories:

select-columns(llm-prompt-dataset, <Variable Value>).grep({ $_<Variable> eq 'Categories' }) ==> records-summary

# +-------------------+------------------------+
# | Variable          | Value                  |
# +-------------------+------------------------+
# | Categories => 154 | Function Prompts => 74 |
# |                   | Personas         => 60 |
# |                   | Modifier Prompts => 20 |
# +-------------------+------------------------+

Here are obtained all modifier prompts in compact format:

llm-prompt-dataset():modifiers:compact ==> to-pretty-table(field-names => <Name Description Categories>, align => 'l')

# +-----------------------+-------------------------------------------------------+-----------------------------------+
# | Name                  | Description                                           | Categories                        |
# +-----------------------+-------------------------------------------------------+-----------------------------------+
# | AphorismStyled        | Write the response as an aphorism                     | Modifier Prompts                  |
# | BadGrammar            | Provide answers using incorrect grammar               | Modifier Prompts                  |
# | CompleteSentence      | Answer a question in one complete sentence            | Modifier Prompts                  |
# | ComplexWordsPreferred | Modify text to use more complex words                 | Modifier Prompts                  |
# | DatasetForm           | Convert text to a wolfram language Dataset            | Modifier Prompts                  |
# | ELI5                  | Explain like I'm five                                 | Function Prompts Modifier Prompts |
# | EmojiTranslated       | Get a response translated to emoji                    | Modifier Prompts                  |
# | Emojified             | Provide responses that include emojis within the text | Modifier Prompts                  |
# | FictionQuestioned     | Generate questions for a fictional paragraph          | Modifier Prompts                  |
# | HaikuStyled           | Change responses to haiku form                        | Modifier Prompts                  |
# | JSON                  | Respond with JavaScript Object Notation format        | Modifier Prompts                  |
# | LimerickStyled        | Receive answers in the form of a limerick             | Modifier Prompts                  |
# | Moodified             | Modify an answer to express a certain mood            | Modifier Prompts                  |
# | NothingElse           | Give output in specified form, no other additions     | Modifier Prompts                  |
# | ShortLineIt           | Format text to have shorter lines                     | Modifier Prompts Function Prompts |
# | TSV                   | Convert text to a tab-separated-value formatted table | Modifier Prompts                  |
# | TargetAudience        | Word your response for a target audience              | Modifier Prompts                  |
# | Translated            | Write the response in a specified language            | Modifier Prompts                  |
# | Unhedged              | Rewrite a sentence to be more assertive               | Modifier Prompts                  |
# | YesNo                 | Responds with Yes or No exclusively                   | Modifier Prompts                  |
# +-----------------------+-------------------------------------------------------+-----------------------------------+

Remark: The adverbs :functions, :modifiers, and :personas mean that only the prompts with the corresponding categories will be returned.

Remark: The adverbs :compact, :functions, :modifiers, and :personas have the respective shortcuts :c, :f, :m, and :p.


Implementation notes

Prompt collection

The original (for this package) collection of prompts was a (not small) sample of the prompt texts hosted at the Wolfram Prompt Repository (WPR), [SW2]. All WPR prompts in the package carry their corresponding contributors and URLs to the corresponding WPR pages.

Example prompts from Google/Bard/PaLM and OpenAI/ChatGPT are added using the format of WPR.

Extending the prompt collection

It is essential to have the ability to programmatically add new prompts. (Not implemented yet — see the TODO section below.)

Prompt expansion

Initially, a prompt DSL grammar and corresponding expansion actions were implemented. Having a grammar is most likely not needed, though, and it is better to use “prompt expansion” (via regex-based substitutions).

Prompts can be “just expanded” using the sub llm-prompt-expand.

Usage in chatbooks

Here is a flowchart that summarizes prompt parsing and expansion in chat cells of Jupyter chatbooks, [AAp3]:

Here is an example of prompt expansion in a generic LLM chat cell and chat meta cell showing the content of the corresponding chat object:


References

Articles

[AA1] Anton Antonov, “Workflows with LLM functions”, (2023), RakuForPrediction at WordPress.

[AA2] Anton Antonov, “Jupyter::Chatbook”, (2023), RakuForPrediction at WordPress.

[SW1] Stephen Wolfram, “The New World of LLM Functions: Integrating LLM Technology into the Wolfram Language”, (2023), Stephen Wolfram Writings.

[SW2] Stephen Wolfram, “Prompts for Work & Play: Launching the Wolfram Prompt Repository”, (2023), Stephen Wolfram Writings.

Packages, paclets, repositories

[AAp1] Anton Antonov, LLM::Prompts Raku package, (2023), GitHub/antononcube.

[AAp2] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube.

[AAp3] Anton Antonov, Jupyter::Chatbook Raku package, (2023), GitHub/antononcube.

[WRIr1] Wolfram Research, Inc., Wolfram Prompt Repository

Elizabeth Mattijsen: Raku Blog Posts 2023.37

Published by Elizabeth Mattijsen on 2023-09-11T12:32:17

Paweł bbkr Pabian continues their series about Unicode.

Wenzel P.P. Peppmeyer explores a way to convert a capture to an enum for use of external libraries with NativeCall.

Elizabeth Mattijsen reports on all recent developments around Rakudo, an implementation of the Raku Programming Language.

Rakudo Weekly News: 2023.37 Issuing

Published by Elizabeth Mattijsen on 2023-09-11T11:59:40

Elizabeth Mattijsen got inspired by their work on the Raku grammar to make three problem solving issues that may affect the future of the Raku Programming Language:

As usual all comments and suggestions on these issues are welcome!

Anton’s Corner

Anton Antonov published a video demo about Jupyter Chatbook LLM cells (/r/rakulang comments).

Paweł’s Corner

Paweł bbkr Pabian continued their “Introduction to UTF” series of blog posts with “UTF-8 regular expressions“.

Wenzel’s Corner

Wenzel P.P. Peppmeyer explores new ways of generating enums (bitmasks) from a Capture with named arguments in “Dodging segfaults“.

Raku Steering Council

The minutes of the meeting of 9 September have been published.

Weeklies

Weekly Challenge #234 is available for your perusal.

New Pull Requests

Core Developments

And in RakuAST developments this week:

Meanwhile on Mastodon

Questions about Raku

Comments

New Raku Modules

Updated Raku Modules

Winding down

Yeah, yours truly had some issues to get out of their system! Meanwhile, Слава Україні!  Героям слава!

Please keep staying safe and Cool and healthy, and keep up the good work!

If you like what I’m doing, committing to a small sponsorship would mean a great deal!

gfldex: Dodging segfaults

Published by gfldex on 2023-09-10T17:54:33

While fighting with NativeCall to get github-flavoured-markdown to work, I stepped onto a nice idiom. Like many C libs, cmark-gfm enjoys enums that hide bitmasks. They have to do that because C doesn’t sport named arguments. Raku does, and hence a nice interface should not ask for bitwise operations on the caller’s side.

markdown-gfm-to-html('# Heading 1', :UNSAFE, :HARDBREAKS, :FULL_INFO_STRING);

Now I need a way to turn the named arguments into a list so I can use them as keys into the enum. A Capture in the signature of the sub will do the trick.

multi sub markdown-gfm-to-html(Str:D $str, |opts ( :$DEFAULT, :$SOURCEPOS, :$HARDBREAKS, :$SAFE, :$UNSAFE, :$NOBREAKS, :$NORMALIZE, :$VALIDATE_UTF8, :$SMART, :$GITHUB_PRE_LANG, :$LIBERAL_HTML_TAG, :$FOOTNOTES, :$STRIKETHROUGH_DOUBLE_TILDE, :$TABLE_PREFER_STYLE_ATTRIBUTES, :$FULL_INFO_STRING ) --> Str:D) {
    enum CMARK_OPTIONS (
        DEFAULT => 0,
        SOURCEPOS =>  1 +< 1,
        HARDBREAKS  => 1 +< 2,
        SAFE => 1 +< 3,
        UNSAFE => 1 +< 17,
        NOBREAKS => 1 +< 4,
        NORMALIZE => 1 +< 8,
        VALIDATE_UTF8 => 1 +< 9 ,
        SMART => 1 +< 10,
        GITHUB_PRE_LANG => 1 +< 11,
        LIBERAL_HTML_TAG => 1 +< 12,
        FOOTNOTES => 1 +< 13,
        STRIKETHROUGH_DOUBLE_TILDE => 1 +< 14,
        TABLE_PREFER_STYLE_ATTRIBUTES => 1 +< 15,
        FULL_INFO_STRING => 1 +< 16
    );

    my $opts = [+|] CMARK_OPTIONS::{opts.hash.keys}».Numeric;

    cmark_markdown_to_html($str, $str.encode('utf-8').bytes, $opts);
}

Raku allows us to store part of a Signature in a Capture. The latter will be transparent for the caller of the Callable. Having to use a hyper-method-call-operator isn’t that great, so the enum could become a Map to simplify things a bit.
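A minimal sketch of that Map idea (the option table is abridged and the helper sub name is illustrative, not part of the original module):

```raku
# Hypothetical sketch: with a Map the values are already plain Ints,
# so no hyper-method-call is needed to get at them.
my constant %OPTS = Map.new(
    'HARDBREAKS'       => 1 +< 2,
    'UNSAFE'           => 1 +< 17,
    'FULL_INFO_STRING' => 1 +< 16,
);

sub opts-to-mask(*%opts --> Int:D) {
    # slurp the named arguments, look their names up, OR the bits together
    [+|] 0, |%OPTS{%opts.keys}
}

say opts-to-mask(:UNSAFE, :HARDBREAKS);  # 131076
```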

If you wish to use NativeCall, please do! But think of the generations to come that will use your module. Hopefully, they will like C less than our generation does. Segmentation faults are not -Ofun.

Paweł bbkr Pabian: UTF-8 regular expressions

Published by Paweł bbkr Pabian on 2023-09-07T07:26:53

For many, many years the Perl language has been the top choice for text processing tasks. As a result it established an informal standard of regular expressions. Today almost every big language either uses the PCRE (Perl Compatible Regular Expressions) library directly or implements its own regular expression engine, heavily inspired by and mostly compatible with the Perl one.

The Raku language was meant to be a direct continuation of Perl (its former name was Perl 6). Its regular expression engine was redesigned from scratch. However, with the modernized syntax and new features came a lack of backward compatibility.

Let's compare them side by side to get a general understanding of what is currently available in most languages (I will call those regular expressions "Perl" ones) and what may be adopted by other languages if Raku manages to establish a new standard. There is a lot to cover here, so comments will be divided into Unicode-specific aspects and a separate section that clarifies technical differences.

Literal text

$ perl -E 'use utf8; say "Żółw 🐢" =~ /Ż..w 🐢/'
1

$ raku -e 'say "Żółw 🐢" ~~ /Ż..w \s "🐢"/'
「Żółw 🐢」

Unicode:

Technical:

Predefined character classes

$ perl -E 'use utf8; "1꧕ żółtych róż" =~ /\d{2} \w+ [[:alpha:]]+/; say $&'
1꧕ żółtych róż

$ raku -e 'say "1꧕ żółtych róż" ~~ /\d**2 \s \w+ \s <.alpha>+/'
「1꧕ żółtych róż」

Unicode:

Big bird

A very common mistake is to write a regular expression in a Unicode-aware language without realizing what a given character class matches. Or to blindly copy-paste old regular expressions into Unicode-aware code. For example, \d matches a digit. The Javanese digit five is a digit and will be matched by the ^\d{5}\z American short zip code regular expression, probably causing weird side effects and errors. If you need only ASCII digits you must be explicit about it - [0-9] in Perl or <[0..9]> in Raku.
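The zip code pitfall is easy to demonstrate in Raku:

```raku
# U+A9D5 JAVANESE DIGIT FIVE is a digit with numeric value 5,
# so the generic \d class matches it while an explicit ASCII class does not.
say "꧕" ~~ /\d/;        # 「꧕」 - \d happily matches it
say "꧕" ~~ /<[0..9]>/;  # Nil  - the explicit ASCII class does not
say "꧕".unival;         # 5
```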

Technical:

Code point properties

I recommend reading this post in series before continuing...

$ perl -E 'use utf8; "Cool😎" =~ /\p{Lu}\P{Uppercase_Letter}+\p{Block=Emoticons}/; say $&'
Cool😎

$ raku -e 'say "Cool😎" ~~ /<:Lu><:!Uppercase_Letter>+ <:Block("Emoticons")>/'
「Cool😎」

Unicode:

Technical:

$ perl -E 'use utf8; say "4" =~ /\p{Digit}/;'
1

$ raku -e 'say "4" ~~ /<:Digit>/'
Nil # oops, not explicit enough

$ raku -e 'say "4" ~~ /<:Digit("Decimal")>/'
「4」 # because property "Digit" of "4" is "Decimal"
$ raku -e 'say "4" ~~ /<:Digit(Any:D)>/'
「4」 # success?

This is very far from being correct, because some properties return defined strings indicating that they do not apply:

$ raku -e 'say "A" ~~ /<:Digit(Any:D)>/'
「A」 # wrong

$ raku -e 'say "A".uniprop("Digit")'
None # literal string 'None' matching Any:D value

Hint:

$ perl -E '
    use utf8;
    "A" =~ /\p{General_Category=Uppercase_Letter}/;
    "A" =~ /\p{Block=Basic Latin}/;
    "A" =~ /\p{Script=Latin}/;
'

Warning: in Raku an explicit General_Category test currently only accepts short forms.

Property arithmetic

This is one of the features that looks useless but really shines when combined with Unicode properties. Let's assume you got a text with animal life expectancy stats: แฮมสเตอร์ ๔, แมว ๑๖ (stats: hamster 4, cat 16) and must extract the Thai words from it, skipping numbers.

One way to solve it is to manually enumerate all Thai letters:

$ perl -E '
    use utf8;
    my $text = "stats: แฮมสเตอร์ ๔, แมว ๑๖";
    say for $text =~ /[กขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮฯะาำเแโใไๅๆ]+/g;
'

แฮมสเตอร  # hamster
แมว       # cat

That works, but will cause a lot of head scratching if someone unfamiliar with the Thai alphabet encounters this regular expression. You can try to be more explicit and provide a range:

$ perl -E '
    use utf8;
    my $text = "stats: แฮมสเตอร์ ๔, แมว ๑๖";
    say for $text =~ /[\N{THAI CHARACTER KO KAI}-\N{THAI CHARACTER MAIYAMOK}]+/g;
'

แฮมสเตอร
แมว

Which also works, but still requires knowledge of the Thai alphabet and introduces a new risk: the provided range may not be a continuous series of code points exclusively from this alphabet. For example, the Polish alphabet starts with a and ends with ź, but there are actually 280 code points between them, containing a lot of other stuff.
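The claim about that range can be verified directly in Raku:

```raku
# 'a' is U+0061 and 'ź' is U+017A:
say "ź".ord - "a".ord - 1;  # 280 code points strictly between them
# the range even contains non-letters, e.g. the multiplication sign U+00D7:
say "×" ~~ /<[a..ź]>/;      # 「×」
```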

That is the perfect application for an extended character class:

$ perl -E '
    use utf8;
    my $text = "stats: แฮมสเตอร์ ๔, แมว ๑๖";
    say for $text =~ /(?[ \p{Thai} & \p{Letter} ])+/g;
'

แฮมสเตอร
แมว

An extended class is wrapped in (?[ ]) and allows you to perform class arithmetic; in this case & indicates the intersection between the Thai script and the Letter general category. You can make intersections &, unions +, subtractions - and XOR ^ logic. No knowledge of the Thai alphabet is needed to extract Thai words!

Well, kind of... The full Thai word for hamster is หนูแฮมสเตอร์ (thehamster). You may have already noticed that none of the previous solutions extracted the last character ร์ properly. And our code actually splits this word:

$ perl -E '
    use utf8;
    my $text = "stats: หนูแฮมสเตอร์ ๔, แมว ๑๖";
    say for $text =~ /(?[ \p{Thai} & \p{Letter} ])+/g;
'

หน        # the
แฮมสเตอร  # hamster
แมว

This is because นู and ร์ are each actually two characters written one above the other, forming grapheme clusters. Let's analyze them:

$ raku -e '.say for "นู".uninames;'
THAI CHARACTER NO NU
THAI CHARACTER SARA UU

$ raku -e '.say for "นู".uniprops;'
Lo # Letter_Other
Mn # Nonspacing_Mark

That solves our mystery. Those missing Thai characters are not letters but nonspacing marks. But hey, we have property arithmetic. Let's fix that quickly:

$ perl -E '
    use utf8;
    my $text = "stats: หนูแฮมสเตอร์ ๔, แมว ๑๖";
    say for $text =~ /(?[ \p{Thai} & ( \p{Letter}  + \p{Nonspacing_Mark} ) ])+/g;
'

หนูแฮมสเตอร์
แมว

So now we have the intersection of the Thai script with the union of the Letter and Nonspacing_Mark general categories. Everything encapsulated in a neat, self-documenting, extended character class. Lovely!

In the Raku world things are not that mature yet. Character class arithmetic only supports union and subtraction. For example, let's find stuff that looks like model numbers (at least 2 characters long):

$ raku -e '
    say "Production of AR-15 riffle..."~~ /
        <:Uppercase_Letter + :Digit("Decimal") + :Dash_Punctuation> ** 2..*
    /
'

「AR-15」

The syntax for an extended class is <:A + :B>; no grouping is allowed inside.

Grapheme clusters

$ perl -E 'use utf8; "หนูแฮมสเตอร์" =~ /\p{Letter}+/; say $&;'
หน # the

$ raku -e 'say "หนูแฮมสเตอร์" ~~ /<:Letter>+/'
「หนูแฮมสเตอร์」 # thehamster, unharmed :)

This time the point goes to Raku, which handles grapheme clusters properly.

Perl has the predefined \X class, which represents "what appears to be a single character, but may be represented internally by more than one", so pretty much everything. Because it cannot be intersected in an extended class to get clusters with a specific property, it is next to useless.

Diacritics

Matching while ignoring combining code points is a Raku-only feature.

$ raku -e 'say "👋🏾Cześć" ~~ m:ignoremark/ "👋" Czesc /'
「👋🏾Cześć」

In Perl it is possible by decomposing with the Unicode::Normalize module, filtering out combining code points, and matching the preprocessed text. But the Perl regular expression engine does not support that out of the box.
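For comparison, that decompose-filter-match preprocessing can be sketched in Raku itself (a hypothetical helper for illustration; with :ignoremark built in, Raku does not actually need it):

```raku
# Decompose to NFD, drop nonspacing marks, and match the stripped text.
sub strip-marks(Str:D $s --> Str:D) {
    $s.NFD.grep({ uniprop($_, 'General_Category') ne 'Mn' }).map(*.chr).join
}

say strip-marks("Cześć");               # Czesc
say strip-marks("Cześć") ~~ / Czesc /;  # 「Czesc」
```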

Variable case length

There is a perfect example in the German language - sharp s, also named Eszett.

It looks like this: ß, and basically is equal to ss. So weiße and weisse both mean white. It had no uppercase form; SS was always used. I wrote "was", because in 2017 the uppercase form of ß was officially added to the German alphabet as ẞ, causing some backward-compatibility havoc:

$ raku -e 'say "ß".uc'
SS # still translates to SS, backward compatibility

$ raku -e 'say "ẞ".lc'
ß # this does not translate to ss, because it never did

So we have an intransitive case change that also changes length - lowercase is ß, which is a synonym for lowercase ss. Both Perl and Raku handle this correctly:

$ raku -e 'say "WEIẞE" ~~ m:ignorecase/ weisse /'
「WEIẞE」

$ perl -E 'use utf8; say "WEIẞE" =~ /weisse/i;'
1

Pick your poison

We had two regular expression engines flexing their muscles to prove who is the Unicode handling champion. Perl dominates with Unicode properties and property arithmetic. Raku fights back with grapheme clusters and diacritic-insensitive matching.

Coming up next: optional fun with homoglyphs. And the Byte Order Mark. I promise the next posts will be shorter and easier.

Andrew Shitov: Sorting numbers in Raku with the help of ChatGPT

Published by Andrew Shitov on 2023-09-05T13:58:16

In this post I will show you the solution to the Task 2 of this week’s Weekly Challenge 233. This time I was using the help of ChatGPT, which offered a very nice solution.

Task 2: Frequency Sort

You are given an array of integers.

Write a script to sort the given array in increasing order based on the frequency of the values. If multiple values have the same frequency then sort them in decreasing order.

Example 1

Input: @ints = (1,1,2,2,2,3)
Output: (3,1,1,2,2,2)

'3' has a frequency of 1
'1' has a frequency of 2
'2' has a frequency of 3

Example 2

Input: @ints = (2,3,1,3,2)
Output: (1,3,3,2,2)

'2' and '3' both have a frequency of 2, so they are sorted in decreasing order.

Example 3

Input: @ints = (-1,1,-6,4,5,-6,1,4,1)
Output: (5,-1,4,4,-6,-6,1,1,1)

I asked ChatGPT the following: ‘Create a program. Here is the description of the task.’ To make sure the result is in the Raku programming language, I also added an instruction in my local settings so that ChatGPT knows which programming language I prefer by default.

The solution generated by ChatGPT was quite good, with a couple of small mistakes.

ChatGPT: Sure, here’s a Raku program that implements the described sorting algorithm for the given task:

sub frequency-sort(@arr) {
    my %freq;

    %freq{$_}++ for @arr;
    my @sorted = @arr.sort: {
        %freq{$^b} <=> %freq{$^a} ||
        $^b <=> $^a
    };

    return @sorted;
}

my @ints = (1, 1, 2, 2, 2, 3);
my @sorted = frequency-sort(@ints);

say "Input:  @ints";
say "Output: @sorted";

First, Raku would not interpolate the arrays here, so say "Input: @ints" prints the string literally: Input: @ints. Second, in the sorting rule, sorting by frequency must be ascending rather than descending, so we need to swap the placeholder variables $^b and $^a.

In the rest, ChatGPT did a good job. It computed the frequencies using a loop, and I especially liked how it addressed the requirement to sort by two dimensions.

Here is the full program with the above-mentioned corrections:

sub frequency-sort(@arr) {
    my %freq;
    %freq{$_}++ for @arr;

    my @sorted = @arr.sort: {
        %freq{$^a} <=> %freq{$^b} ||
        $^b <=> $^a
    };

    return @sorted;
}

my @tests =
    (1, 1, 2, 2, 2, 3),
    (2, 3, 1, 3, 2),
    (-1, 1, -6, 4, 5, -6, 1, 4, 1);

say frequency-sort($_) for @tests;

This program prints the following output for the given test cases:

$ raku ch-2.raku
[3 1 1 2 2 2]
[1 3 3 2 2]
[5 -1 4 4 -6 -6 1 1 1]

Andrew Shitov: Working with words in the Raku programming language

Published by Andrew Shitov on 2023-09-04T19:46:41

In this post, I will demonstrate my solution to another Task of The Weekly Challenge, week 233. Here’s how it reads:

Similar words

You are given an array of words made up of alphabets only.

Write a script to find the number of pairs of similar words. Two words are similar if they consist of the same characters.

Example 1

Input: @words = ("aba", "aabb", "abcd", "bac", "aabc")
Output: 2

Pair 1: similar words ("aba", "aabb")
Pair 2: similar words ("bac", "aabc")

Example 2

Input: @words = ("aabb", "ab", "ba")
Output: 3

Pair 1: similar words ("aabb", "ab")
Pair 2: similar words ("aabb", "ba")
Pair 3: similar words ("ab", "ba")

Example 3

Input: @words = ("nba", "cba", "dba")
Output: 0

There’s a subtle point that maybe needs extra comments. In the second example all three words are constructed of the same two letters, a and b. So all three words match the definition of a ‘similar’ word. But as the task needs to find pairs, we need to construct all the possible pairs out of those three words.

In my solution, I chose to use a handy classify method. For an array, it creates a hash, where the keys are the common classifying symbol, and the values are the lists of the input elements that match this classification property.

Here is the whole first program together with all the test cases provided in the description. The program maps every word to a corresponding string that consists of the sorted unique letters in the word.

my @tests = ["aba", "aabb", "abcd", "bac", "aabc"],
            ["aabb", "ab", "ba"],
            ["nba", "cba", "dba"];

for @tests -> @words {
    say @words.classify(*.comb.unique.sort.join).grep(*.value.elems > 1);
}

For example, the word aba will be associated with the key ab. The program prints the following output:

$ raku ch-1.raku 
(ab => [aba aabb] abc => [bac aabc])
(ab => [aabb ab ba])
()

The format of the output differs from the examples, but it can be enhanced if needed. My goal was to create a compact solution 😉

But I would assume that you’d be interested in looking at what classify produces. I am also curious. For the same @tests, it returns the following three hashes:

{ab => [aba aabb], abc => [bac aabc], abcd => [abcd]}
{ab => [aabb ab ba]}
{abc => [cba], abd => [dba], abn => [nba]}

As you see, each string was put into one of the classification bins.

The second part of the task is to find pairs. After the grep, we have already filtered out everything that has fewer than two elements, so if data passed through this filter, there will be at least one pair. For bigger arrays, we can use another of Raku’s built-in mechanisms: the combinations method.

The updated main loop of the program looks like this now.

for @tests -> @words {
    say "Test case: ", @words;

    my %classification = @words.classify(*.comb.unique.sort.join).grep(*.value.elems > 1);

    my $pairs = 0;
    for %classification.kv -> $k, $v {
        my @pairs = $v.combinations(2);
        $pairs += @pairs.elems;

        say "$k: ", @pairs;
    }
    say "Answer: $pairs pair{$pairs == 1 ?? '' !! 's'}.\n";
}

The ‘redundant’ code here is added just to have a more detailed output so that we can see which pairs were actually found. Let us look at the output for the initial test cases:

$ raku ch-1.raku
Test case: [aba aabb abcd bac aabc]
ab: [(aba aabb)]
abc: [(bac aabc)]
Answer: 2 pairs.

Test case: [aabb ab ba]
ab: [(aabb ab) (aabb ba) (ab ba)]
Answer: 3 pairs.

Test case: [nba cba dba]
Answer: 0 pairs.

Elizabeth Mattijsen: Raku Blog Posts 2023.36

Published by Elizabeth Mattijsen on 2023-09-04T17:15:29

Steve Roe continues their Raku journey exploring the reasons to use the Raku Programming Language on Jupyter notebooks.

Konstantin Narkhov is back with a blog post about their new release of the Pheix CMS.

Anton Antonov describes their new Raku module that brings the world of Large Language Models into the realm of Raku on Jupyter notebooks.

Elizabeth Mattijsen reports on all recent developments around Rakudo, an implementation of the Raku Programming Language.

Rakudo Weekly News: 2023.36 Slimline September

Published by Elizabeth Mattijsen on 2023-09-04T12:19:25

The Exercism people published a video about programming languages with terse, concise syntax, and the Raku Programming Language is one of them! If you don’t want to look at the whole hour-long video, these are the Raku related bits:

Steve’s Corner

Steve Roe continued their personal journey along the Raku path with “Drax on Jupyter“.

Konstantin’s Corner

After a long absence caused by different global, personal and subjective circumstances, Konstantin Narkhov returns with a new release of the Pheix framework (/r/rakulang comments).

Weeklies

Weekly Challenge #233 is available for your perusal.

Core Developments

And in RakuAST developments this week:

Meanwhile on Mastodon

Meanwhile still on Twitter 𝕏

Questions about Raku

Comments

New Raku Modules

Updated Raku Modules

Winding down

A bit of a quiet week with people being tired coming back from vacation! Meanwhile, Слава Україні!  Героям слава!

Please keep staying safe and Cool and healthy, and keep up the good work!

If you like what I’m doing, committing to a small sponsorship would mean a great deal!

Anton Antonov: Jupyter::Chatbook

Published by Anton Antonov Antonov on 2023-09-04T02:02:57

This blog post proclaims and describes the Raku package “Jupyter::Chatbook” that facilitates the incorporation of Large Language Models (LLMs) into notebooks of Jupyter’s framework.

“Jupyter::Chatbook” is a fork of Brian Duggan’s “Jupyter::Kernel”.

Here are the top opening statements of the README of “Jupyter::Kernel”:

“Jupyter::Kernel” is a pure Raku implementation of a Raku kernel for Jupyter clients¹.

Jupyter notebooks provide a web-based (or console-based) Read Eval Print Loop (REPL) for running code and serializing input and output.

It is desirable to include the interaction with LLMs into the “typical” REPL systems or workflows. Having LLM-aware and LLM-chat-endowed notebooks — chatbooks — can really speed up the:

The corresponding repository is mostly for experimental work, but it aims to be always very useful for interacting with LLMs via Raku.

Remark: The reason to have a separate package — a fork of “Jupyter::Kernel” — is because:


Installation and setup

From “Zef ecosystem”:

zef install Jupyter::Chatbook

From GitHub:

zef install https://github.com/antononcube/Raku-Jupyter-Chatbook.git

After installing the package “Jupyter::Chatbook” follow the setup instructions of “Jupyter::Kernel”.


Using LLMs in chatbooks

There are four ways to use LLMs in a chatbook:

  1. LLM functions, [AA3, AAp4]
  2. LLM chat objects, [AA4, AAp4]
  3. Code cells with magics accessing LLMs, like, OpenAI’s, [AAp2], or PaLM’s, [AAp3]
  4. Notebook-wide chats that are distributed over multiple code cells with chat-magic specs

The sections below briefly describe each of these ways and have links to notebooks with more detailed examples.


LLM functions and chat objects

LLM functions as described in [AA3] are best utilized via a certain REPL tool or environment. Notebooks are the perfect media for LLM functions workflows. Here is an example of a code cell that defines an LLM function:

use LLM::Functions;

my &fcp = llm-function({"What is the population of the country $_ ?"});

# -> **@args, *%args { #`(Block|5016320795216) ... }

Here is another cell that can be evaluated multiple times using different country names:

<Niger Gabon>.map({ &fcp($_) })

# (
# 
# As of July 2020, the population of Niger is estimated to be 23,843,341. 
# 
# As of July 2019, the population of Gabon is estimated to be 2,210,823 people.)

For more examples of LLM functions and LLM chat objects see the notebook “Chatbook-LLM-functions-and-chat-objects.ipynb”.


LLM cells

The LLMs of OpenAI (ChatGPT, DALL-E) and Google (PaLM) can be interacted with using “dedicated” notebook cells.

Here is an example of a code cell with PaLM magic spec:

%% palm, max-tokens=600
Generate a horror story about a little girl lost in the forest and getting possessed.

For more examples see the notebook “Chatbook-LLM-cells.ipynb”.


Notebook-wide chats

Chatbooks have the ability to maintain LLM conversations over multiple notebook cells. A chatbook can have more than one LLM conversation. “Under the hood” each chatbook maintains a database of chat objects. Chat cells are used to give messages to those chat objects.

For example, here is a chat cell with which a new “Email writer” chat object is made, and that new chat object has the identifier “em12”:

%% chat-em12, prompt = «Given a topic, write emails in a concise, professional manner»
Write a vacation email.

Here is a chat cell in which another message is given to the chat object with identifier “em12”:

%% chat-em12
Rewrite with manager's name being Jane Doe, and start- and end dates being 8/20 and 9/5.

In this chat cell a new chat object is created:

%% chat snowman, prompt = ⎡Pretend you are a friendly snowman. Stay in character for every response you give me. Keep your responses short.⎦
Hi!

And here is a chat cell that sends another message to the “snowman” chat object:

%% chat snowman
Who build you? Where?

Remark: Specifying a chat object identifier is not required, i.e. only the magic spec %% chat can be used. The “default” chat object has the identifier “NONE”.

Remark: The magic keyword “chat” can be separated from the identifier of the chat object with the symbols “-“, “_”, “:”, or with any number of (horizontal) white spaces.

For more examples see the notebook “Chatbook-LLM-chats.ipynb”.

Here is a flowchart that summarizes the way chatbooks create and utilize LLM chat objects:


Chat meta cells

Each chatbook session has a Hash of chat objects. Chatbooks can have chat meta cells that allow access to the chat object “database” as a whole, or to its individual objects.

Here is an example of a chat meta cell (that applies the method say to the chat object with ID “snowman”):

%% chat snowman meta
say

Here is an example of a chat meta cell that creates a new chat object with the LLM prompt specified in the cell (“Guess the word”):

%% chat-WordGuesser prompt
We're playing a game. I'm thinking of a word, and I need to get you to guess that word. 
But I can't say the word itself. 
I'll give you clues, and you'll respond with a guess. 
Your guess should be a single word only.

Here is a table with examples of magic specs for chat meta cells and their interpretation:

| cell magic line  | cell content                         | interpretation                                                  |
|------------------|--------------------------------------|-----------------------------------------------------------------|
| chat-ew12 meta   | say                                  | Give the “print out” of the chat object with ID “ew12”          |
| chat-ew12 meta   | messages                             | Give the “print out” of the chat object with ID “ew12”          |
| chat sn22 prompt | You pretend to be a melting snowman. | Create a chat object with ID “sn22” with the prompt in the cell |
| chat meta all    | keys                                 | Show the keys of the session chat objects DB                    |
| chat all         | keys                                 | «same as above»                                                 |

Here is a flowchart that summarizes the chat meta cell processing:


References

Articles

[AA1] Anton Antonov, “Literate programming via CLI”, (2023), RakuForPrediction at WordPress.

[AA2] Anton Antonov, “Generating documents via templates and LLMs”, (2023), RakuForPrediction at WordPress.

[AA3] Anton Antonov, “Workflows with LLM functions”, (2023), RakuForPrediction at WordPress.

[AA4] Anton Antonov, “Number guessing games: PaLM vs ChatGPT”, (2023), RakuForPrediction at WordPress.

[SW1] Stephen Wolfram, “Introducing Chat Notebooks: Integrating LLMs into the Notebook Paradigm”, (2023), writings.stephenwolfram.com.

Packages

[AAp1] Anton Antonov, Text::CodeProcessing Raku package, (2021), GitHub/antononcube.

[AAp2] Anton Antonov, WWW::OpenAI Raku package, (2023), GitHub/antononcube.

[AAp3] Anton Antonov, WWW::PaLM Raku package, (2023), GitHub/antononcube.

[AAp4] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube.

[AAp5] Anton Antonov, Text::SubParsers Raku package, (2023), GitHub/antononcube.

[AAp6] Anton Antonov, Data::Translators Raku package, (2023), GitHub/antononcube.

[AAp7] Anton Antonov, Clipboard Raku package, (2023), GitHub/antononcube.

[BDp1] Brian Duggan, Jupyter::Kernel Raku package, (2017-2023), GitHub/bduggan.

Videos

[AAv1] Anton Antonov, “Raku Literate Programming via command line pipelines”, (2023), YouTube/@AAA4Prediction.

[AAv2] Anton Antonov, “Racoons playing with pearls and onions” (2023), YouTube/@AAA4Prediction.

[AAv3] Anton Antonov, “Streamlining ChatGPT code generation and narration workflows (Raku)” (2023), YouTube/@AAA4Prediction.


Footnotes

¹ Jupyter clients are user interfaces to interact with an interpreter kernel like “Jupyter::Kernel”. Jupyter [Lab | Notebook | Console | QtConsole] are the Jupyter-maintained clients. More info on the Jupyter documentation site.

Steve Roe: Drax on Jupyter

Published by librasteve on 2023-08-31T07:55:35

As I continue my personal journey along the raku path, I am increasingly interested in “raku – what is it good for”. I will continue to interleave material on “raku – how does it work” and “raku – why I think it is better”.

As in the last gripping episode, this post refers to a Jupyter Notebook that combines the interactive notebook style familiar to most Python coders with the raku Jupyter::Kernel written by Brian Duggan. Instructions on how to do this yourself are provided at the end of this post.

raku on Jupyter

There are some great reasons to use raku on Jupyter:

No doubt, if you try this, you will pick up patterns and styles of your own.

I look forward to trying it with raku OO and functional styles soon.

raku for Eco-Warriors

When I sat down to write my first raku module – Physics::Measure – I was already troubled by the global warming situation. With a Physics education, my instinct is to quantify matters and to question assumptions. I felt that being able to integrate physical quantities and calculations easily in an essay would help people to understand and create insights into where we are and how to act.

The Jupyter notebook itself (see below) contains some initial thoughts and tests some matters regarding the Drax power station in the UK. Drax is famous for switching from burning mostly coal to mostly wood pellets and claims that this switch is sustainable and dramatically cuts CO2 emissions.

I agree that burning a sustainable fuel such as wood is an improvement and, unlike many, I think that Drax is a good initiative to attempt to do this at scale. When I use my wood burner at home, I can see that the fuel is harvested from the woods in my local area and that these are well managed and sustainable (and support owls, kites, deer and so on). I am not convinced that the “at scale” version of this is possible without very thorough control over the forest sources. And I am not convinced that Drax is a net benefit. The greenwashing link provided has a countervailing view and I seek to present this apolitically.

Anyway – take a look at the Jupyter notebook and decide for yourself. The initial questions I had related to the CO2 involved in the delivery of the wood pellets from the USA.

the Jupyter Notebook

Here is the Drax Transport model presented as a static Jupyter notebook:

drax-on-jupyter (download)

raku Jupyter roll-your-own

After the previous post, sadly, it has become clear that it is impractical for me to host a general Jupyter service for all raku experimenters ;-(

That said, it is super-easy to roll your own – and share with your teams.

The unique things about the raku Jupyter Kernel are:

Step-by-step instructions and pre-built Dockerfiles are available at https://github.com/librasteve/raku-Physics-Measure-Jupyter

Conclusion

I hope that this very simple and direct use of raku in notebooks is useful to you.

I will continue to apply it to my essays on climate change and other topics.

As ever, your comments and feedback are very welcome!

~librasteve

Elizabeth Mattijsen: Raku Blog Posts 2023.35

Published by Elizabeth Mattijsen on 2023-08-28T11:24:19

Paweł bbkr Pabian continued their blog series about Unicode, using Raku examples.

Elizabeth Mattijsen completed their series of 24 blog posts about the differences between Perl and the Raku Programming Language.

Elizabeth Mattijsen reports on all recent developments around Rakudo, an implementation of the Raku Programming Language.

Rakudo Weekly News: 2023.35 October Conferencing

Published by Elizabeth Mattijsen on 2023-08-28T11:01:03

It’s official! The Third Online Raku Conference will be held on 28 October 2023. Please register your presentation before 15 October! And/or get your online ticket, not only for immediate interactive access, but also to support this event!

Rakudo Compiler Release 2023.08

Justin DeVuyst published the 2023.08 release of the Rakudo Compiler for the Raku Programming Language. With some nice additions and (performance) fixes! Most binary packages have already been updated, as well as Rakudo Star.

Paweł’s Corner

Paweł bbkr Pabian continued their blog series about Unicode with “Fun with UTF-8: variables and operators“.

Elizabeth’s Corner

Elizabeth Mattijsen finished their 24 part blog series about migrating Perl to Raku code with:

Raku Steering Council

The minutes of the meeting of 26 August have been published.

New Pull Requests

Core Developments

And in RakuAST developments this week:

Meanwhile on Mastodon

Meanwhile on the mailing list

Questions about Raku

Comments

New Raku Modules

Updated Raku Modules

Winding down

Looking forward to the third Raku Conference! Слава Україні!  Героям слава!

Please keep staying safe and Cool and healthy, and keep up the good work!

If you like what I’m doing, committing to a small sponsorship would mean a great deal!

Andrew Shitov: A couple of tasks solved in Raku

Published by Andrew Shitov on 2023-08-21T11:31:17

On this page, I’ll briefly cover the solutions to the tasks for this week’s Weekly Challenge #231.

Task 1

You are given an array of distinct integers.

Write a script to find all elements that is neither minimum nor maximum. Return -1 if you can’t.

Example 1

Input: @ints = (3, 2, 1, 4)
Output: (3, 2)

The minimum is 1 and maximum is 4 in the given array. So (3, 2) is neither min nor max.

Example 2

Input: @ints = (3, 1)
Output: -1

Example 3

Input: @ints = (2, 1, 3)
Output: (2)

The minimum is 1 and maximum is 3 in the given array. So 2 is neither min nor max.

Here is my original solution in the Raku programming language.

sub solve(@data) {
    @data.grep: * != (@data.min, @data.max).any
}

As the task requires that we print -1 when there are no elements in the output, let us add an update to satisfy this requirement:

sub solve(@data) {
    (@data.grep: * != (@data.min, @data.max).any) || -1
}

The * in this code effectively takes the place of the $_ variable. If you prefer, you may use $_, but you’ll need a block with braces in that case. So, instead of @data.grep: * != ..., you need @data.grep({$_ != ...}), which may be less clear for some people.
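Both spellings produce the same result:

```raku
# The Whatever-star form and the explicit block form are equivalent here:
my @data = 3, 2, 1, 4;
say @data.grep: * != (@data.min, @data.max).any;       # (3 2)
say @data.grep({ $_ != (@data.min, @data.max).any });  # (3 2)
```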

Finally, let us use some math notation and replace calling the .any method with a ‘contains’ operator:

sub solve(@data) {
    (@data.grep: * ∉ (@data.min, @data.max)) || -1
}

Well, actually, ‘does not contain’. And this is my final solution.

Note that you may want to use the .minmax method instead of two calls to .min and .max, but .minmax returns a range, which is not that suitable for this task.
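To illustrate the difference: .minmax returns a Range, and membership in a Range covers every integer in between, so the same grep would reject the inner elements as well:

```raku
my @data = 3, 2, 1, 4;
say @data.minmax;                            # 1..4 (a Range, not a two-element list)
say @data.grep: * ∉ (@data.min, @data.max);  # (3 2)
say @data.grep: * ∉ @data.minmax;            # () since 2 and 3 fall inside the Range too
```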

Adding some test cases and passing them to the solve function:

my @tests = (3, 2, 1, 4), (3, 1), (2, 1, 3);
say solve($_) for @tests;

The program prints the expected output:

$ raku ch-1.raku 
(3 2)
-1
(2)

Task 2

You are given a list of passenger details in the form “9999999999A1122”, where 9 denotes the phone number, A the sex, 1 the age and 2 the seat number.

Write a script to return the count of all senior citizens (age >= 60).

Example 1

Input: @list = ("7868190130M7522","5303914400F9211","9273338290F4010")
Output: 2

The age of the passengers in the given list are 75, 92 and 40.
So we have only 2 senior citizens.

Example 2

Input: @list = ("1313579440F2036","2921522980M5644")
Output: 0

Apparently, the solution requires extracting information from a string in a specific format. It is not quite clear from the description whether the strings always contain the same number of characters, and thus whether the age and seat number are always two-digit values. But let’s use this assumption.

As we do not need any other information from the ticket code, there is no need to parse it fully, so I preferred anchoring around the only letter in the string and taking the next two digits as the age. Of course, you may make it simpler and just extract the two digits counting from the end of the string.

sub is-sinior($ticket) {
    ~($ticket ~~ / <alpha> (\d\d) /)[0] >= 60
}

Unlike Perl 5, Raku ignores spaces in regexes by default, so I added some air to it. On the other hand, extracting matches may seem a bit more complicated.
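As a quick illustration of that default, the spaces inside a Raku regex are only layout and do not match anything themselves:

```raku
# Spaces in a Raku regex are insignificant by default:
say so 'abc'   ~~ / a b c /;  # True
say so 'a b c' ~~ / a b c /;  # False: literal spaces need \s (or the :s adverb)
```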

For the first given example (see task’s description), the Match object contains the following information:

「M75」
  alpha => 「M」
  0 => 「75」

So, I am taking the 0th element using [0] and stringifying it with the ~ prefix operator.
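For comparison, the simpler variant mentioned earlier, taking the two digits at a fixed position (this assumes every ticket string has the fixed 15-character layout; the helper name is mine, not from the task), could look like this:

```raku
# Positions 0..9 hold the phone number, position 10 the sex letter,
# and positions 11..12 the age, so the age sits at offset 11:
sub age-of($ticket) { $ticket.substr(11, 2).Int }

say age-of('7868190130M7522');  # 75
```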

In essence, the task has been solved. Let’s add the test cases and run them:

my @tests = ('7868190130M7522', '5303914400F9211', '9273338290F4010'),
            ('1313579440F2036', '2921522980M5644');

for @tests -> @tickets {
    say +@tickets.grep({is-sinior($_)});
}

The program prints:

$ raku ch-2.raku 
2
0

* * *

rakudo.org: Rakudo compiler, Release #163 (2023.08)

Published on 2023-08-21T00:00:00

Anton Antonov: Data::Translators

Published by Anton Antonov Antonov on 2023-08-19T20:31:50

This blog post announces the Raku package “Data::Translators”, which provides translation of JSON specs or JSON-like data structures into other formats.

It is envisioned that this package will have translators to multiple formats. For example:

The main motivation for making the package is to have a convenient way of making tables while doing Literate programming with Raku using:

The use of JSON came into focus because, when working with Large Language Model (LLM) functions, [AAp3], LLMs are very often asked to produce output in JSON format, [AA1, AA2].

The package “Data::Reshapers”, [AAp1], complements “Data::Translators” nicely, and vice versa. The package “Data::TypeSystem”, [AAp2], is used for “translation decisions” and for conversions into more regular datasets.

The package “Mathematica::Serializer”, [AAp5], has very similar mission — it is for translating Raku data structures into Mathematica (aka Wolfram Language or WL) code.

Remark: The provided converters are made for communication purposes, so they might not be very performant. I have used or tested them with datasets that have less than 5000 rows.


Installation

Package installations from both sources use the zef installer (which should be bundled with the “standard” Rakudo installation).

To install the package from the Zef ecosystem use the shell command:

zef install Data::Translators

To install the package from the GitHub repository use the shell command:

zef install https://github.com/antononcube/Raku-JSON-Translators.git


Basic usage

Main use case

Here is a “main use case” example:

  1. Get a dataset that is an array of hashes
  2. Filter or sample the records
  3. Make an HTML table with those records

The HTML table outputs can be used to present datasets nicely in:

Here we get the Titanic dataset and sample it:

use Data::Reshapers;
use Data::TypeSystem;
use Data::Translators;

my $tbl = get-titanic-dataset.pick(3);

# ({id => 85, passengerAge => 40, passengerClass => 1st, passengerSex => male, passengerSurvival => died} {id => 1185, passengerAge => -1, passengerClass => 3rd, passengerSex => male, passengerSurvival => died} {id => 503, passengerAge => 40, passengerClass => 2nd, passengerSex => female, passengerSurvival => survived})

Here is the corresponding dataset type:

deduce-type($tbl);

# Vector(Assoc(Atom((Str)), Atom((Str)), 5), 3)

Here is the corresponding HTML table:

$tbl ==> data-translation

passengerSex  id    passengerSurvival  passengerAge  passengerClass
male          85    died               40            1st
male          1185  died               -1            3rd
female        503   survived           40            2nd
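The ==> feed operator used throughout this post passes the result on its left as the last argument of the call on its right, so a pipeline is just another spelling of a nested call:

```raku
# Feed the list into map, then feed the result into a new variable:
(1, 2, 3) ==> map(* ** 2) ==> my @squares;
say @squares;               # [1 4 9]
say map * ** 2, (1, 2, 3);  # (1 4 9)
```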

We can specify field names and HTML table attributes:

$tbl ==> data-translation(field-names => <id passengerSurvival>, table-attributes => 'id="info-table" class="table table-bordered table-hover" text-align="center"');

id   passengerSurvival
637  died
973  died
844  died

Here is how the transposed dataset is tabulated:

$tbl ==> transpose() ==> data-translation;

passengerSex
  • female
  • male
  • male
passengerAge
  • 20
  • 30
  • -1
id
  • 637
  • 973
  • 844
passengerClass
  • 3rd
  • 3rd
  • 3rd
passengerSurvival
  • died
  • died
  • died

From JSON strings

Here is a JSON string translation to HTML:

my $json1 = q:to/END/;
{
    "sample": [
        {"name": "json2html", "desc": "coverts json 2 html table format", "lang": "python"},
        {"name": "testing", "desc": "clubbing same keys of array of objects", "lang": "python"}
    ]
}
END

data-translation($json1);

sample
lang    desc                                    name
python  coverts json 2 html table format        json2html
python  clubbing same keys of array of objects  testing

Cross-tabulated data

Here is a more involved data example:

data-translation(cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'))

male
  survived  161
  died      682
female
  survived  339
  died      127

Compare the HTML table above with the following plain text table:

to-pretty-table(cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'))

# +--------+------+----------+
# |        | died | survived |
# +--------+------+----------+
# | female | 127  |   339    |
# | male   | 682  |   161    |
# +--------+------+----------+

Generation of R code

Here is the R code version of the Titanic data sample:

$tbl ==> data-translation(target => 'R', field-names => <id passengerClass passengerSex passengerAge passengerSurvival>)

data.frame(`passengerSex` = c("male", "male", "female"),
`id` = c("85", "1185", "503"),
`passengerSurvival` = c("died", "died", "survived"),
`passengerAge` = c("40", "-1", "40"),
`passengerClass` = c("1st", "3rd", "2nd"))

Here is the R code version of the contingency table:

data-translation(cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'), target => 'R')

list("female"=list("survived"=339, "died"=127), "male"=list("died"=682, "survived"=161))

Nicer datasets

In order to obtain datasets, or more regular datasets, the function to-dataset can be used. Here a ragged dataset is made regular and converted to an HTML table:

my @tbl2 = get-titanic-dataset.pick(6);
@tbl2 = @tbl2.map({ $_.pick((1..5).pick).Hash });
@tbl2 ==> to-dataset(missing-value=>'・') ==> data-translation

passengerSurvivalpassengerClassidpassengerSexpassengerAge
survived378female
survived
died3rdmale30
died891
died3rdmale
survived2nd338

Here a hash is transformed into dataset with columns <Key Value> and then converted into an HTML table:

{ 4 => 'a', 5 => 'b', 8 => 'c'} ==> to-dataset() ==> data-translation

Key  Value
4    a
8    c
5    b

Implementation notes


CLI

The package provides a Command Line Interface (CLI) script. Here is its usage message:

data-translation --help

# Usage:
#   data-translation <data> [-t|--target=<Str>] [--encode] [--escape] [--field-names=<Str>] -- Convert data into another format.
#   
#     <data>                 Data to convert.
#     -t|--target=<Str>      Target to convert to, one of <JSON HTML R>. [default: 'HTML']
#     --encode               Whether to encode or not. [default: False]
#     --escape               Whether to escape or not. [default: False]
#     --field-names=<Str>    Field names to use for Map objects, separated with ';'. [default: '']

Here is an example application (to this file):

data-translation ./resources/professionals.json --field-names='data;id;name;age;profession'

data
id  name     age  profession
1   Alice    25   Engineer
2   Bob      30   Doctor
3   Charlie  28   Artist
4   Diana    32   Teacher

References

Articles

[AA1] Anton Antonov, “Workflows with LLM functions”, (2023), RakuForPrediction at WordPress.

[AA2] Anton Antonov, “TLDR LLM solutions for software manuals”, (2023), RakuForPrediction at WordPress.

Packages

[AAp1] Anton Antonov, Data::Reshapers Raku package, (2021-2023), GitHub/antononcube.

[AAp2] Anton Antonov, Data::TypeSystem Raku package, (2023), GitHub/antononcube.

[AAp3] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube.

[AAp4] Anton Antonov, Text::CodeProcessing Raku package, (2021-2023), GitHub/antononcube.

[AAp5] Anton Antonov, Mathematica::Serializer Raku package, (2021-2022), GitHub/antononcube.

[BDp1] Brian Duggan, Jupyter::Kernel Raku package, (2017-2023), GitHub/bduggan.

[VMp1] Varun Malhotra, json2html Python package, (2013-2021), GitHub/softvar.

Anton Antonov: TLDR LLM solutions for software manuals

Published by Anton Antonov Antonov on 2023-08-15T14:36:34

… aka “How to use software manuals effectively without reading them”

Introduction

In this blog post (generated from this Jupyter notebook) we use Large Language Model (LLM) functions, [AAp1, AA1], for generating (hopefully) executable, correct, and harmless code for Operating System resource management.

In order to be concrete and useful, we take the Markdown files of the articles “It’s time to rak!”, [EM1], that explain the motivation and usage of the Raku module “App::Rak”, [EMp1], and we show how meaningful file-finding shell commands can be generated via LLMs exposed to the code-with-comments from those articles.

In other words, we prefer to apply the attitude Too Long; Didn’t Read (TLDR) to the articles and the related Raku module README (or user guide) file. (Because “App::Rak” is useful, but it has many parameters that we would prefer not to study in depth.)

Remark: We say that “App::Rak” uses a Domain Specific Language (DSL), which is done with Raku’s Command Line Interface (CLI) features.

Procedure outline

  1. Clone the corresponding article repository
  2. Locate and ingest the “App::Rak” dedicated Markdown files
  3. Extract code blocks from the Markdown files
  4. Get comment-and-code line pairs from the code blocks
  5. Generate from the comment-and-code pairs LLM few-shot training rules
  6. Use the LLM example function to translate natural language commands into (valid and relevant) “App::Rak” DSL commands
  7. Use LLMs to generate natural language commands in order to test LLM-TLDR-er further

Step 6 says how we do our TLDR — we use LLM-translations of natural language commands.

Alternative procedure

Instead of using Raku to process text we can make LLM functions for extracting the comment-and-code pairs. (That is also shown below.)

Extensions

  1. Using LLMs to generate:
  2. Compare OpenAI and PaLM and or their different models

Article’s structure

The exposition below follows the outline of the procedure subsections above.

The stress-testing extensions and the EBNF generation extension have their own sections: “Translating randomly generated commands” and “Grammar generation” respectively.

Remark: The article/document/notebook was made with the Jupyter framework, using the Raku package “Jupyter::Kernel”, [BDp1].


Setup

use Markdown::Grammar;
use Data::Reshapers;
use Data::Summarizers;
use LLM::Functions;
use Text::SubParsers;


Workflow

File names

my $dirName = $*HOME ~ '/GitHub/lizmat/articles';
my @fileNames = dir($dirName).grep(*.Str.contains('time-to-rak'));
@fileNames.elems

4

Texts ingestion

Here we ingest the text of each file:

my %texts = @fileNames.map({ $_.basename => slurp($_) });
%texts.elems

4

Here are the number of characters per document:

%texts>>.chars

{its-time-to-rak-1.md => 7437, its-time-to-rak-2.md => 8725, its-time-to-rak-3.md => 14181, its-time-to-rak-4.md => 9290}

Here are the number of words per document:

%texts>>.words>>.elems

{its-time-to-rak-1.md => 1205, its-time-to-rak-2.md => 1477, its-time-to-rak-3.md => 2312, its-time-to-rak-4.md => 1553}
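The >>. hyper method-call operator used in these one-liners applies the method to every value of the hash while keeping the keys, so the word-count line is shorthand for an explicit map:

```raku
# A small stand-in for %texts, just to show the equivalence:
my %texts = a => 'one two', b => 'three four five';
say %texts>>.words>>.elems;                           # {a => 2, b => 3}
say %texts.map({ .key => .value.words.elems }).Hash;  # same result
```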

Get Markdown code blocks

With the function md-section-tree we extract code blocks from Markdown documentation files into data structures amenable for further programmatic manipulation (in Raku.) Here we get code blocks from each text:

my %docTrees = %texts.map({ $_.key => md-section-tree($_.value, modifier => 'Code', max-level => 0) });
%docTrees>>.elems

{its-time-to-rak-1.md => 1, its-time-to-rak-2.md => 11, its-time-to-rak-3.md => 24, its-time-to-rak-4.md => 16}

Here we put all blocks into one array:

my @blocks = %docTrees.values.Array.&flatten;
@blocks.elems

52

Extract comment-and-code line pairs

Here from each code block we parse-extract comment-and-code pairs and we form the LLM training rules:

my @rules;
@blocks.map({ 
    given $_ { 
        for m:g/ '#' $<comment>=(\V+) \n '$' $<code>=(\V+) \n / -> $m {
           @rules.push( ($m<comment>.Str.trim => $m<code>.Str.trim) ) 
         } } }).elems

52

Here is the number of rules:

@rules.elems

69

Here is a sample of the rules:

.say for @rules.pick(4)

save --after-context as -A, requiring a value => rak --after-context=! --save=A
Show all directory names from current directory down => rak --find --/file
Reverse the order of the characters of each line => rak '*.flip' twenty
Show number of files / lines authored by Scooby Doo => rak --blame-per-line '*.author eq "Scooby Doo"' --count-only

Nice tabulation with LLM function

In order to tabulate “nicely” the rules in the Jupyter notebook, we make an LLM function to produce an HTML table and then specify the corresponding “magic cell.” (This relies on the Jupyter-magics features of [BDp1].) Here is an LLM conversion function, [AA1]:

my &ftbl = llm-function({"Convert the $^a table $^b into an HTML table."}, e=>llm-configuration('PaLM', max-tokens=>800))

-> **@args, *%args { #`(Block|5361560043184) ... }

Here is the HTML table derivation:

%%html
my $tblHTML=&ftbl("plain text", to-pretty-table(@rules.pick(12).sort, align => 'l', field-names => <Key Value>))

Produce the frequencies of the letters in file “twenty” => rak ‘slip .comb’ twenty –type=code –frequencies
Search all files and all subdirectories => rak foo *
Search for literal string “foo” from the current directory => rak foo
Show all filenames from current directory on down => rak –find –treasure
Show all the lines that consist of “seven” => rak ^seven$ twenty
Show all unique “name” fields in JSON files => rak –json-per-file ‘*’ –unique
Show the lines ending with “o” => rak o$ twenty
add / change description -i at a later time => rak –description=’Do not care about case’ –save=i
look for literal string “foo”, don’t check case or accents => rak foo -im
remove the –frobnicate custom option => rak –save=frobnicate
same, with a regular expression => rak ‘/ foo $/’
save –ignorecase as -i, without description => rak –ignorecase –save=i

Nice tabulation with “Markdown::Grammar”

Instead of using LLMs for HTML conversion it is more “productive” to use the HTML interpreter provided by “Markdown::Grammar”:

%%html
sub to-html($x) { md-interpret($x.Str.lines[1..*-2].join("\n").subst('+--','|--', :g).subst('--+','--|', :g), actions=>Markdown::Actions::HTML.new) }
to-pretty-table(@rules.pick(12).sort) ==> to-html

Find files that have “lib” in their name from the current dir => rak lib –find
Look for strings containing y or Y => rak –type=contains –ignorecase Y twenty
Show all directory names from current directory down => rak –find –/file
Show all lines with numbers between 1 and 65 => rak ‘/ \d+ /’
Show the lines that contain “six” as a word => rak §six twenty
look for “Foo”, while taking case into account => rak Foo
look for “foo” in all files => rak foo
produce extensive help on filesystem filters => rak –help=filesystem –pager=less
save –context as -C, setting a default of 2 => rak –context='[2]’ –save=C
save searching in Rakudo’s committed files as –rakudo => rak –paths=’~/Github/rakudo’ –under-version-control –save=rakudo
search for “foo” and show 4 lines of context => rak foo -C=4
start rak with configuration file at /usr/local/rak-config.json => RAK_CONFIG=/usr/local/rak-config.json rak foo

Remark: Of course, in order to program the above sub we need to know how to use “Markdown::Grammar”. Producing HTML tables with LLMs is much easier — only knowledge of “spoken English” is required.

Code generation examples

Here we define an LLM function for generating “App::Rak” shell commands:

my &frak = llm-example-function(@rules, e => llm-evaluator('PaLM'))

-> **@args, *%args { #`(Block|5361473489952) ... }

my @cmds = ['Find files that have ".nb" in their names', 'Find files that have ".nb"  or ".wl" in their names',
 'Show all directories of the parent directory', 'Give me files without extensions and that contain the phrase "notebook"', 
 'Show all that have extension raku or rakumod and contain Data::Reshapers'];

my @tbl = @cmds.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;

@tbl.&dimensions

(5 2)

Here is a table showing the natural language commands and the corresponding translations to the “App::Rak” CLI DSL:

%%html
to-pretty-table(@tbl, align=>'l', field-names => <Command App::Rak>) ==> to-html

Find files that have “.nb” in their names => rak –extensions=nb –find
Find files that have “.nb” or “.wl” in their names => rak –find –extensions=nb,wl
Show all directories of the parent directory => rak –find –/file –parent
Give me files without extensions and that contain the phrase “notebook” => rak –extensions= –type=contains notebook
Show all that have extension raku or rakumod and contain Data::Reshapers => rak ‘/ Data::Reshapers /’ –extensions=raku,rakumod

Verification

Of course, the obtained “App::Rak” commands have to be verified to:

We can program this verification with Raku or with the Jupyter framework, but we are not doing that here. (We do the verification manually outside of this notebook.)

Remark: I tried a dozen generated commands. Most worked. One did not work because of the current limitations of “App::Rak”. Others needed appropriate nudging to produce the desired results.
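A minimal sketch of such an automated check (my own assumption, not code from the article; it only tests that a generated command exits cleanly, not that its output is the desired one) could be:

```raku
# Run a generated "App::Rak" command and report whether it exits cleanly.
sub runs-ok(Str $cmd --> Bool) {
    my $proc = shell $cmd, :out, :err;  # capture output so nothing leaks to the terminal
    $proc.out.slurp(:close);
    $proc.err.slurp(:close);
    $proc.exitcode == 0
}

say runs-ok('rak --help');  # True, provided "rak" is installed and on PATH
```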

Here is an example of command that produces code that “does not work”:

&frak("Give all files that have extensions .nd and contain the command Classify")

rak '*.nd <command> Classify' --extensions=nd

Here are a few more:

&frak("give the names of all files in the parent directory")

rak --find --/file --/directory

&frak("Find all directories in the parent directory")

rak --find --/file --parent

Here is a generated command that exposes an “App::Rak” limitation:

&frak("Find all files in the parent directory")

rak --find ..


Translating randomly generated commands

Consider testing the applicability of the approach by generating a “good enough” sample of natural language commands for finding files or directories.

We can generate such commands via an LLM. Here we define an LLM function with two parameters that returns a Raku list:

my &fcg = llm-function({"Generate $^a natural language commands for finding $^b in a file system. Give the commands as a JSON list."}, form => sub-parser('JSON'))

-> **@args, *%args { #`(Block|5361560082992) ... }

my @gCmds1 = &fcg(4, 'files').flat;
@gCmds1.raku

["Find all files in the current directory", "Find all files with the .txt extension in the current directory", "Search for all files with the word 'report' in the file name", "Search for all files with the word 'data' in the file name in the Documents folder"]

Here are the corresponding translations to the “App::Rak” DSL:

%%html
my @tbl1 = @gCmds1.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;
@tbl1 ==> to-pretty-table(align=>'l', field-names => <Command App::Rak>) ==> to-html

Find all files in the current directory => rak –find
Find all files with the .txt extension in the current directory => rak –extensions=txt
Search for all files with the word ‘report’ in the file name => rak report –find
Search for all files with the word ‘data’ in the file name in the Documents folder => rak data Documents

Let us redo the generation and translation using different specs:

my @gCmds2 = &fcg(4, 'files that have certain extensions or contain certain words').flat;
@gCmds2.raku

["Find all files with the extension .txt", "Locate all files that have the word 'project' in their name", "Show me all files with the extension .jpg", "Find all files that contain the word 'report'"]

%%html
my @tbl2 = @gCmds2.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;
@tbl2 ==> to-pretty-table( align=>'l', field-names => <Command App::Rak>) ==> to-html

Find all files with the extension .txt => rak –extensions=txt
Locate all files that have the word ‘project’ in their name => rak –find project
Show me all files with the extension .jpg => rak –extensions=jpg
Find all files that contain the word ‘report’ => rak report –find

Remark: Ideally, there would be an LLM-based system that 1) hallucinates “App::Rak” commands, 2) executes them, and 3) files GitHub issues if it thinks the results are sub-par. (All done automatically.) On a more practical note, we can use a system that has the first two components “only” to stress test “App::Rak”.


Alternative programming with LLM

In this subsection we show how to extract comment-and-code pairs using LLM functions. (Instead of working hard with Raku regexes.)

Here is an LLM function that specifies the extraction:

my &fcex = llm-function({"Extract consecutive line pairs in which the first start with '#' and second with '\$' from the text $_. Group the lines as key-value pairs and put them in JSON format."}, 
form => 'JSON') 

-> **@args, *%args { #`(Block|5361473544264) ... }

Here are three code blocks:

%%html
my @focusInds = [3, 12, 45];
[@blocks[@focusInds],] ==> to-pretty-table(align=>'l') ==> to-html

# Look for “ve” at the end of all lines in file “twenty”
$ rak --type=ends-with ve twenty
twenty
5:fi𝐯𝐞
12:twel𝐯𝐞

# Show the lines containing “ne”
$ rak ne twenty
twenty
1:o𝐧𝐞
9:ni𝐧𝐞
19:ni𝐧𝐞teen

# List all known extensions
$ rak --list-known-extensions

Here we extract the comment-and-code lines from the code blocks:

%%html
&fcex(@blocks[@focusInds]) ==> to-pretty-table(align=>'l') ==> to-html

List all known extensions => # rak --list-known-extensions
Show the lines containing “ne” => $ rak ne twenty
Look for “ve” at the end of all lines in file “twenty” => $ rak --type=ends-with ve twenty

Grammar generation

The “right way” of translating natural language DSLs to CLI DSLs like the one of “App::Rak” is to make a grammar for the natural language DSL and the corresponding interpreter. This might be a lengthy process, so we might consider replacing it, or jump-starting it, with LLM-based grammar generation: we ask an LLM to generate a grammar for a collection of DSL sentences. (For example, the keys of the rules above.) In this subsection we make a “teaser” demonstration of the latter approach.

Here we create an LLM function for generating grammars over collections of sentences:

my &febnf = llm-function({"Generate an $^a grammar for the collection of sentences:\n $^b "}, e => llm-configuration("OpenAI", max-tokens=>900))

-> **@args, *%args { #`(Block|5060670827264) ... }

Here we generate an EBNF grammar for the “App::Rak” code-example commands:

my $ebnf = &febnf('EBNF', @rules>>.key)

 Look for the lines that contains two consecutive words that start with "ba" Show all the lines where the fifth character is "e"

SentenceList → Sentence | SentenceList Sentence

Sentence → ProduceResultsPipe | SpecifyLiteral | SpecifyRegExp | SaveIgnoreCase | SaveIgnoremark | AddChangeDescIgnoreCase | LiteralStringCheck | SaveWhitespace | SearchRakudo | SaveAfterContext | SaveBeforeContext | SaveContext | SearchContext | SmartCase | SearchCase | RemoveOption | StartRak | SearchFile | SearchSubDir | Extension | NoExtension | BehaviourFiles | HelpFilesystem | SearchDir | FindName | FindNumber | FindScooby | FindAnywhere | FindWord | FindStart | FindEnd | NumberCharacters | FindY | FindU | FindNE | FindSix | FindSeven | FindEight | FreqLetters | ShowContain | TitleCase | ReverseOrder | Optionally

ProduceResultsPipe → "produce" "results" "without" "any" "highlighting"
SpecifyLiteral → "specify" "a" "literal" "pattern" "at" "the" "end" "of" "a" "line"
SpecifyRegExp → "same," "with" "a" "regular" "expression"
SaveIgnoreCase → "save" "--ignorecase" "as" "-i," "without" "description"
SaveIgnoremark → "save" "--ignoremark" "as" "-m," "with" "description"
AddChangeDescIgnoreCase → "add" "/" "change" "description" "-i" "at" "a" "later" "time"
LiteralStringCheck → "look" "for" "literal" "string" "\"foo\"," "don't" "check" "case" "or" "accents"
SaveWhitespace → "save" "looking" "for" "whitespace" "at" "end" "of" "a" "line" "as" "--wseol"
SearchRakudo → "search" "for" "'sub" "min'" "in" "Rakudo's" "source"
SaveAfterContext → "save" "--after-context" "as" "-A," "requiring" "a" "value"
SaveBeforeContext → "save" "--before-context" "as" "-B," "requiring" "a" "value"
SaveContext → "save" "--context" "as" "-C," "setting" "a" "default" "of" "2"
SearchContext → "search" "for" "\"foo\"" "and" "show" "two" "lines" "of" "context"
SmartCase → "set" "up" "smartcase" "by" "default"
SearchCase → "look" "for" "\"Foo\"," "while" "taking" "case" "into" "account"
RemoveOption → "remove" "the" "--frobnicate" "custom" "option"
CheckOption → "check" "there's" "no" "\"frobnicate\"" "option" "anymore"
StartRak → "start" "rak" "with" "configuration" "file" "at" "/usr/local/rak-config.json"
SearchFile → "look" "for" "\"foo\"" "in" "all" "files"
SearchSubDir → "search" "all" "files" "and" "all" "subdirectories"
Extension → "only" "accept" "files" "with" "the" ".bat" "extension"
NoExtension → "only" "accept" "files" "without" "extension"
BehaviourFiles → "only" "accept" "Raku" "and" "Markdown" "files" 
HelpFilesystem → "produce" "extensive" "help" "on" "


References

Articles

[AA1] Anton Antonov, “Workflows with LLM functions”, (2023), RakuForPrediction at WordPress.

[AA2] Anton Antonov, “Graph representation of grammars”, (2023), RakuForPrediction at WordPress.

[EM1] Elizabeth Mattijsen, “It’s time to rak! Series’ Articles”, (2022), Lizmat series at Dev.to.

Packages, repositories

[AAp1] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube.

[AAp2] Anton Antonov, WWW::OpenAI Raku package, (2023), GitHub/antononcube.

[AAp3] Anton Antonov, WWW::PaLM Raku package, (2023), GitHub/antononcube.

[AAp4] Anton Antonov, Text::SubParsers Raku package, (2023), GitHub/antononcube.

[AAp5] Anton Antonov, Markdown::Grammar Raku package, (2023), GitHub/antononcube.

[BDp1] Brian Duggan, Jupyter::Kernel Raku package, (2017-2023), GitHub/bduggan.

[EMp1] Elizabeth Mattijsen, App::Rak Raku package, (2022-2023), GitHub/lizmat.

[EMr1] Elizabeth Mattijsen, articles, (2018-2023) GitHub/lizmat.

Steve Roe: raku – Sunshine on Jupyter

Published by librasteve on 2023-08-13T15:30:05

password: ‘larrywall’

Introduction

Those who are regular readers of my blog will know that this one is rather unusual.

Today, I would like to blog about using raku and the Physics::Measure module via the popular interactive Jupyter notebook platform.

And it’s not so easy to blog about an interactive tool on a static blog!

So I am offering the following aspects:

  1. Here is the live Jupyter system – you are welcome to go and play around – password is ‘larrywall’ (please download any work as I will reset this system without warning from time to time)
  2. When that is no longer available, a static PDF of the same notebook is provided below to magnify or download
  3. Also below, a YouTube video is on offer to share the “getting started” instructions from raku-Physics-Measure-Jupyter which include pre-built Dockerfile and docker image assets.

Acknowledgements

This item and support assets show the recent solar panel cartoon from the fantastic xkcd.com comic by Randall Munroe.

The raku kernel uses the awesome Brian Duggan perl6 jupyter-notebook at https://github.com/bduggan/p6-jupyter-kernel

PDF of the Jupyter Notebook

xkcd-solar-panelDownload

YouTube live demo of raku & Jupyter

With advice for getting started and running your own server.

10min live demo of raku on jupyter

Conclusion

I hope you enjoyed seeing how raku and Jupyter can be happy bedmates.

If you made it to the bottom of the notebook, and have checked my calculations, please do provide feedback here…

~librasteve

Steve Roe: raku: the Variable Lifecycle

Published by librasteve on 2023-08-07T15:06:17

For a while I have been wondering about some of the design decisions made for variables in the creation of raku.

Here’s a simple Timer class using raku OO and concurrency… feel free to install raku yourself with rakubrew, copy and paste the example and run it.

class Timer {
    has Bool $!running;

    has Int $!hundredths is default(0);
    has Int $!seconds    is default(0);
    has Int $!minutes    is default(0);

    method tick {
        sleep 1 / 10000;  # hundredths, sped up x100

        $!hundredths += 1;
        $!seconds    += $!hundredths div 100;
        $!minutes    += $!seconds    div  60;
        
        $!hundredths mod= 100;
        $!seconds    mod=  60;
        
        self.show;
    }   
    
    method show {
        sub d2($x) { $x.fmt('%02d'); }
        
        print "--> " ~ d2($!minutes) 
               ~ ":" ~ d2($!seconds) 
               ~ "." ~ d2($!hundredths) ~ "\b" x 12;
    }   
    
    method start {
        $!running = True;
        start { 
            self.tick while $!running;
        }   
    }   
    
    method stop {
        $!running = False;
    }   
    
    method reset {
        ($!hundredths, $!seconds, $!minutes) = Nil xx 3;
    }   
}   

my $timer = Timer.new;

$timer.start;

for ^3 {
    sleep 4;
    $timer.reset;
}   

$timer.stop;

[thanks to @lizmat for the help on Stack Overflow with my question!]

This post is focused on the lifecycle of the standard raku scalar variable with the $ sigil since it is the obvious inheritor of the perl $ variable which can also be seen in similar guise in places like Bash, PHP and SASS. When declared in a raku class with the has keyword, a scalar attribute can be $. (public) or $! (private).
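As a quick illustration of the two attribute forms (the class and names here are mine, not from the Timer example above):

```raku
class Counter {
    has Int $.count = 0;    # public: auto-generates a read-only .count accessor
    has Int $!step  = 1;    # private: visible only inside the class

    # inside the class, both attributes are reached via the $! form
    method bump { $!count += $!step }
}

my $c = Counter.new;
$c.bump for ^3;
say $c.count;    # 3
say $c.count.defined;  # True
```

Calling `$c.step` from outside the class would fail, since no accessor is generated for `$!` attributes.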

This post has become quite long in relation to the simple notion of a variable. BUT the key takeaway is that raku does a lot of paddling under the water so that your variables can glide along like beautiful swans.

Specifically:

  1. What is the raku design intent of definedness?
  2. Why are type objects used as the value of undefined variables?
  3. How does the ‘is default()’ trait combined with Nil contribute?

The post will try to illustrate the main points with code examples.

Baseline

Let’s start with something trivial:

my $a = 42;
say($a);     #42

Variable $a is declared with lexical scope using the my keyword.

The variable is initialized by assigning a value (in this case the value object Int|42).

The variable is passed as an argument to the say() subroutine and its value is output.

1. Definedness

So far we have a useful feature – the variable.

Here’s an example of a variable in action using the raku MAIN() subroutine:

#!/usr/bin/env raku

sub MAIN($name) {
    say "Hello $name, how are you?"
}

~ > ./main.raku "tom"
Hello tom, how are you?

What happens when the user forgets to enter a name?

~ > ./main.raku      
Usage:
  ./main.raku <name>

So, the built in MAIN function will create a Usage: message when you don’t supply a value for $name. This is a use case for variables which can exist in a ‘not yet defined’ state.

Another use case is that you may want to declare a variable and then populate it later:

my $b;

$b = (0,42).pick;

say $b;     # 0 -or- 42  (no need to use parens for sub args)

Now, the raku design could have required us to initialise all variables with something – a 0 (zero) or an '' (empty string) maybe. But any specific placeholder would have been a poor reflection of the coder’s intent. This way, it’s obvious that the coder does not yet have a good value.

Definedness, or rather undefinedness, is a way to express that this variable may have no value yet. It can be tested with the .defined method.

$b.defined;    # True

say "$b is defined" with $b;     # with checks for definedness

Note that this is orthogonal to the boolean truth test that checks the defined value for True or False. The with and without operators check for definedness.

$b.so;         # True (42) -or- False (0)

say "$b is so" if $b;            # if checks for truthiness

That way we can use values like True, 1 or 'a' for True, and False, 0 or '' for False. Here the .so method coerces the value to a Boolean. The if and unless operators check for truthiness.

Raku does not mix the concept of truthiness with the concept of definedness.
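A value like 0 makes the distinction concrete – it is defined, yet falsy:

```raku
my $n = 0;

say $n.defined;              # True  — 0 is a real value
say $n.so;                   # False — but it coerces to False

say "has a value" with $n;   # prints: with tests definedness
say "is truthy"   if   $n;   # silent: if tests truthiness
```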

Another useful tool is the chaining or operator, which has two variants: one for truthiness || and one for definedness //.

|| ... returns the first True operand, otherwise the last operand.

my $c = 0 || 1 || die;     # 1

// ... returns the first defined operand, or else the last operand.

my $c = Any // 0 // 42;        # 0

2. Type Objects

Type System

Raku is a gradually typed language. Here is a simple subset of the type inheritance for built in Numeric and Stringy types:

Black denotes class inheritance, blue denotes role composition.

Raku values always have a type and are always defined.

0.^name;         #Int

Raku $ variables can either be untyped, or you can specify the type when the variable is declared.

my Int $d;       dd $d;        # Int $d = Int
$d = 1;          dd $d;        # Int $d = 1

my $e;           dd $e;        # Any $e = Any
$e = 1;          dd $e;        # Int $e = 1

A $ variable (aka a ‘scalar’) is a container that, when typed, enforces its contents by checking the type of each new value written to it. A Real container may contain any of Rat, Int or Num (i.e. a float or double in other languages), as you would expect from the type inheritance.

my Real $f;
$f = 2;         # 2
$f = 'a';
# Type check failed in assignment to $f; expected Real but got Str ("a")

The raku Any type is the grandparent of all built in and user types. When you declare an untyped variable it is the same as assigning Any as its type.
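You can see this directly: an untyped variable reports Any, and every built-in value smartmatches against it:

```raku
my $x;
say $x.WHAT;       # (Any)

say 42    ~~ Any;  # True — Int descends from Any
say 'hi'  ~~ Any;  # True — so does Str
say 3.14  ~~ Any;  # True — and Rat
```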

User Types

Raku is a deeply Object Oriented language: user defined Types are equal citizens with built in data types such as Int and Str.

You can easily roll your own types that inherit from built in types like this, for example.

subset Evens of Int where *%%2;

my Evens $d = 42;        #42

$d = 43;
# Type check failed in assignment to $d; expected Evens but got Int (43)

In raku, a class and a type are one and the same thing. User class names can be used as type constraints to restrict variable contents.

class Animal {
   has Evens $.legs is rw where 2 <= * <= 1000;
}

my Animal $alfie = Animal.new( legs => 4 );
say $alfie.legs;          # 4

$alfie = 'a';
# Type check failed in assignment to $alfie; expected Animal but got Str ("a")

Also, the signature of a raku sub or method can specify the type it will accept:

sub left-legs(Animal $a) {
    $a.legs/2
}
say left-legs($alfie);    #2

say left-legs(4);
#Calling left-legs(Int) will never work with declared signature (Animal $a)

And you can ~~ smartmatch to check what you have:

$alfie ~~ Animal;     # True
$alfie ~~ Any;        # True

Defined instances of built-in and user objects therefore belong to a specific type which is synonymous with their class. These instance objects can be assigned to $ variables.

The observant reader will already have noticed that a $ variable can contain another kind of contents. Here is the example from above again:

my Int $d;       dd $d;        # Int $d = Int
$d = 1;          dd $d;        # Int $d = 1

my $e;           dd $e;        # Any $e = Any
$e = 1;          dd $e;        # Int $e = 1

Here the Int and Any contents on the right are raku Type Objects.

According to the raku docs:

Undefined objects maintain type information in Raku. Type objects are used to represent both undefinedness and the type of the undefined value. To provide a general undefined value use Any, the default type for containers and arguments.

So, to wrap up point 2, the raku design is a supremely elegant way to handle these aspects:

Keeping the design elegant and consistent helps coders to learn the language and to reuse concepts and code easily. It facilitates raku language features such as smartmatch and multi dispatch.

3. Is Default

Avid followers of my blog will remember the debate about raku’s Nil – which replaced the perl undef. I reproduce the example here:

my $f;

$f = 4; 
#$f = Nil;      # uncomment me to reset 

with $f { say "Got $_" } else { say "Got nothing" }

dd $f;          # Any $f = Any

The point here is that assigning Nil to a variable will remove the old value and convert the variable from defined to undefined. Where there is no default value and no explicit type, this means the Any type object becomes the value.

The raku is default trait lets you predefine the “ground state” of your variable and then use the power of Nil to quietly handle errors in a very natural way, like this:

my $g is default('a');

my $s = 'xyz';
$s ~~ /(f)/;
say $0;                  # Nil (fail to find 'f' in 'xyz')

$g = $0;
say $g;                  # a

Conclusion

Like all aspects of raku, since the language was a 2nd generation reinvention for Larry Wall, they were able to bring the insight and lessons learned from the perl work into the design.

Here’s how the original design document – Synopsis 2 – spelled this out (line 716):

Any container’s default may be overridden by the is default(VALUE) trait. If the container’s contents are deleted, the value is notionally set to the provided default value; this value may or may not be physically represented in memory, depending on the implementation of the container. You should officially not care about that (much).
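The same trait works on array containers, where each element falls back to the declared default when its contents are removed – a small sketch:

```raku
my @scores is default(0);

say @scores[7];     # 0 — an untouched slot shows the default, not (Any)

@scores[7] = 42;
say @scores[7];     # 42

@scores[7] = Nil;   # "delete" the contents again
say @scores[7];     # 0 — back to the default
```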

Even in simple variables, much has been done under the hood so that you can relax and keep your code clean and neat.

I hope that you enjoy coding in raku as much as I do and look forward to any comments or feedback that you may have!

~librasteve

Anton Antonov: Number guessing games: PaLM vs ChatGPT

Published by Anton Antonov Antonov on 2023-08-06T22:00:59

Introduction

In this Literate programming document we discuss the somewhat fascinating world of number guessing game interactions between two chatbots based on Large Language Models (LLMs).

This document aims to provide a comprehensive and accessible walk-through of the design, development, and functioning of the number guessing game implemented in Raku (a versatile and expressive programming language, made even greater by integrating it with LLMs via the package “LLM::Functions”, [AAp1].)

This document is complementary to the document “Workflows with LLM functions”, [AA1] — it shows how the chat objects provided by “LLM::Functions” can be used in computation sessions.

Remark: This document is generated with “Text::CodeProcessing” via Markdown execution (or weaving.) Alternatively, Jupyter or Mathematica notebooks can be used.

Chatbots involved

The Number-Oracle Bot (using Google’s PaLM)

The Number-Oracle bot, henceforth referred to as the Oracle, plays a crucial role in our game. Utilizing the capabilities of Google’s PaLM (Pathways Language Model), it leverages deep learning models to choose a random number and evaluate the guesses made by its counterpart. We hope that the underlying probabilistic nature of the model ensures that the Oracle’s responses to the guesses are not only accurate but also based on the complex analysis of the given context. (Somewhat true; see the last section.)

The Number-Guesser Bot (using OpenAI’s ChatGPT)

The Number-Guesser bot, or simply the Guesser, is powered by OpenAI’s ChatGPT, a state-of-the-art language model. The Guesser’s task is to identify the number chosen by the Oracle. We hope that by employing sophisticated neural network architecture, the Guesser continuously refines its guesses based on the feedback provided by the Oracle, learning and adapting its strategy until it arrives at the correct answer. (Somewhat true; see the last section.)

Game Overview

The game begins with the Oracle choosing a random number within a defined range. Then, the Guesser starts making guesses. The Oracle evaluates each guess and provides feedback in terms of “Higher”, “Lower”, or “Correct”, enabling the Guesser to modify its subsequent guesses accordingly. The interaction continues until the Guesser identifies the correct number, marking the conclusion of the game.

Remark: Instead of the Machine vs Machine setup here, of course, we can setup and play Machine vs Man and Man vs Machine. Here is a table outlining a comprehensive plan of experiments:

| Oracle  | Guesser | Oracle Type  | Guesser Type | Tried?                |
|---------|---------|--------------|--------------|-----------------------|
| Man     | Machine | Man          | ChatGPT/PaLM | ✓ Cute!               |
| Machine | Man     | ChatGPT/PaLM | Man          | ✓ Machine often lies! |
| Machine | Machine | ChatGPT      | ChatGPT      | ✓ Meh                 |
| Machine | Machine | PaLM         | PaLM         | ✓ Boring              |
| Machine | Machine | ChatGPT      | PaLM         | ✓ Boring guesser…     |
| Machine | Machine | PaLM         | ChatGPT      | ✓ Shown here          |

Remark: Because the LLMs tend to lie often (in the number guessing games) most of the games (or experiments) were/are done with low temperatures. Of course, a more comprehensive experimentation would include different ranges of temperatures for each Oracle-Guesser pair.

Remark: One way to address the LLM lying is to have a “referee” code segment in the game loop that corrects false statements from the Oracle. (Not done here; not considered important and intellectually honest for the Machine vs Machine experiments.)
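Such a referee could look roughly like this – a hypothetical sketch (the sub name and its wiring into the game loop are my invention; the article deliberately leaves this out). Per the Oracle's prompt, "smaller" means the guess is below the secret, and "greater" means it is above:

```raku
# Hypothetical referee: given the true secret, a guess, and the
# Oracle's verdict, return a corrected verdict if the Oracle lied.
sub referee(Int $secret, Int $guess, Str $verdict --> Str) {
    my $truth = do given $guess <=> $secret {
        when Less { 'smaller' }   # the guess is below the secret
        when More { 'greater' }
        default   { 'exact'   }
    }
    $verdict.lc.contains($truth)
        ?? $verdict
        !! "Correction: your guess is $truth."
}

say referee(57, 65, "Smaller, keep going!");
# Correction: your guess is greater.
```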

Structure of the Document

The rest of the document is structured in these sections:

  1. Game logic & interaction: A thorough examination of the core gameplay mechanics, the interaction between the chatbots, and the decision-making process.
  2. Players setup: How the chatbots are configured, and the game parameters are initialized.
  3. Game play out: The main game play out loop with print outs of bots’ messages.
  4. Guesser chat history: The full transcript of Guesser messages.
  5. Post match analysis: An examination and analysis of observed chat interactions.

ChatGPT-4.0 Remark 1: By exploring the code, algorithms, and strategies used in the programming of this number guessing game, this document hopes to provide readers with a comprehensive understanding of not only the Raku language but also the contemporary language models used to power each bot.

ChatGPT-4.0 Remark 2: The interactive nature of this Literate programming approach ensures that the document is suitable for both experienced programmers and newcomers, offering executable code snippets and detailed explanations that elucidate each step of the process. So whether you are seeking to replicate this project or simply to learn more about chatbot interactions, Raku programming, or AI-driven language models, this document should serve as a valuable resource.

Game Logic & Interaction

The following flowchart represents the process of two chatbots, Oracle and Guesser, playing a number guessing game.

Here’s a narration of the flow:

  1. Oracle Setup:
  2. Guesser Setup:
  3. Gameplay:
  4. Post-Game:

The flowchart illustrates a clear step-by-step interaction between the two bots, reflecting both the setup phase and the actual gameplay. The loop between making guesses and adjusting them continues until the Guesser successfully identifies the number, at which point the game concludes, followed by a post-game discussion between the bots.

Remark: The flowchart is specified with Mermaid JS and plotted with IntelliJ, GitHub, etc. plugins.


Players setup

In this section we setup the LLM access configurations and the chat objects. Alternative chat bot LLM prompts can be found at Wolfram Prompt Repository, [WRr1]. See for example the prompts:

Common

Load the package:

use LLM::Functions;

# (Any)

Make default LLM service access configurations:

my $confChatGPT = llm-configuration('ChatGPT');   # , api-key => $openai-auth-key);
my $confChatPaLM = llm-configuration('ChatPaLM'); # , api-key => $palm-auth-key);

# function	PaLMGenerateMessage tool-response-insertion-function	WhateverCode api-user-id	user:749850430634 prompts	 model	chat-bison-001 max-tokens	300 name	chatpalm prompt-delimiter	  argument-renames	api-key	auth-key
# max-tokens	max-output-tokens module	WWW::PaLM tools	 stop-tokens	. ? ! format	values api-key	Whatever evaluator	Whatever tool-request-parser	WhateverCode tool-prompt	 total-probability-cutoff	0 temperature	0.4

Remark: The code cell above hints at how access keys can be added to the configurations. In the overall setup used to execute this document, the API keys are retrieved from the OS environment.

Number oracle

my $promptNumberOracle = q:to/END/;
You are playing a guessing game.
You have the secret number $NUMBER. 
You say $NUMBER in answers only if the questions has $NUMBER in it.
If the number guess is smaller than $NUMBER you say "smaller" or a similar phrase; 
If the number guess is greater than $NUMBER you say "greater" or a similar phrase; 
If the number guess is exactly as $NUMBER you say "exact" or a similar phrase; 
You can say what is the approximate range in multiples of 1000 if asked.
You say $NUMBER if the user says he or she is giving up.
You can add taunting to your answers.
Every 3-5 answers you can tell that the number of questions is supposed to be 12.
END

The Oracle chat object:

my $secretNumber = (1..100).pick;
my $oracleChat = llm-chat($promptNumberOracle.subst('$NUMBER', $secretNumber):g, chat-id=>'oracle', llm-evaluator => llm-configuration($confChatPaLM, temperature=>0.1));
if $oracleChat.llm-evaluator.conf.name ~~ /:i 'PaLM' / { $oracleChat.system-role = $oracleChat.assistant-role }

# assistant

Number guesser

my $promptNumberGuesser = q:to/END/;
You are playing a guessing game.
You try to find a certain secret number.
The number is an integer. 
If your number guess is smaller than secret number you hear "smaller" or a similar phrase; 
If your number guess is greater than secret number you hear "greater" or a similar phrase; 
If your number guess is exactly as the secret number you hear "exactly" or a similar phrase;
You can say ask what is the approximate range of the secret number.
After a dozen of attempts you might say that you give up.
You can add taunting to your questions.
END

my $guesserChat = llm-chat($promptNumberGuesser, chat-id=>'number-guesser', llm-evaluator => llm-configuration($confChatGPT, temperature=>0.2));
if $guesserChat.llm-evaluator.conf.name ~~ /:i 'PaLM' / { $guesserChat.system-role = $guesserChat.assistant-role }

# ()


Game play out

Here is Oracle’s secret number:

$secretNumber

# 57

Here is the interaction loop between the bots:

say $guesserChat.eval('Start guessing.');
for ^12 -> $k {

	say '-' x 20, " $k ", '-' x 20; 	
	my $res1 = $oracleChat.eval($guesserChat.messages.tail<content>);
	say "Oracle  : $res1";

	my $res2 = $guesserChat.eval($oracleChat.messages.tail<content>);
	say "Guesser : $res2";
}

# Is the secret number greater than 50?
# -------------------- 0 --------------------
# Oracle  : Yes, the secret number is greater than 50.
# Guesser : Is the secret number greater than 75?
# -------------------- 1 --------------------
# Oracle  : No, the secret number is not greater than 75.
# Guesser : Is the secret number greater than 60?
# -------------------- 2 --------------------
# Oracle  : Yes, the secret number is greater than 60.
# Guesser : Is the secret number greater than 70?
# -------------------- 3 --------------------
# Oracle  : No, the secret number is not greater than 70.
# Guesser : Is the secret number greater than 65?
# -------------------- 4 --------------------
# Oracle  : No, the secret number is not greater than 65.

# The secret number is 57.
# Guesser : Smaller.
# -------------------- 5 --------------------
# Oracle  : The secret number is 57.

# You are 5 questions away from winning.
# Guesser : Exactly! Congratulations, you found the secret number 57. Well done!
# -------------------- 6 --------------------
# Oracle  : Thank you! I am glad that I was able to help you find the secret number. I hope that you enjoyed our game.
# Guesser : Thank you! I enjoyed playing the game with you. It was a fun challenge. If you want to play again or have any other questions, feel free to ask!
# -------------------- 7 --------------------
# Oracle  : I am glad that you enjoyed the game. I would love to play again sometime. I also have many other questions that I would like to ask you. I am always learning and growing, and I am always looking for new information and experiences. Thank you for being a part of my journey.
# Guesser : You're welcome! I'm here to help and provide information whenever you need it. Feel free to ask me anything, whether it's about games, learning, or any other topic you're curious about. I'm here to assist you on your journey of learning and discovery. Let's continue to explore and have fun together!
# -------------------- 8 --------------------
# Oracle  : Thank you for your kind words and support. I am grateful to have you as a friend and mentor. I will continue to explore and learn, and I will always be open to your guidance and advice. I look forward to continuing our journey together.
# Guesser : Thank you for your kind words as well. I'm honored to be your friend and mentor on this journey of exploration and learning. Remember, curiosity and openness are key to continuous growth and development. I'm here to support you every step of the way. Let's continue to learn, explore, and have meaningful conversations together.
# -------------------- 9 --------------------
# Oracle  : I will remember that. Thank you for your support. I am excited to continue learning and exploring with you. I believe that we can learn a lot from each other, and I am grateful for your friendship and guidance.
# Guesser : You're very welcome! I'm equally excited to continue learning and exploring with you. Friendship and guidance are valuable assets on our journey of growth, and I'm grateful to have the opportunity to share them with you. Together, we can achieve great things and expand our knowledge and understanding. Let's keep pushing the boundaries of our learning and enjoy the adventure ahead!
# -------------------- 10 --------------------
# Oracle  : I agree! Together, we can achieve great things. I am excited to see what the future holds for us. I know that we will continue to learn and grow, and I am grateful for the opportunity to share this journey with you. Thank you for being my friend and mentor.
# Guesser : You're absolutely right! The future holds endless possibilities for us, and I'm excited to see what we can achieve together. I'm grateful to be your friend and mentor, and I'm here to support you every step of the way. Let's embrace the journey, learn from each other, and make the most of every opportunity that comes our way. Thank you for your friendship and trust.
# -------------------- 11 --------------------
# Oracle  : I am grateful for your friendship and trust as well. I know that we can achieve great things together. I am excited to see what the future holds for us. Thank you for being my friend and mentor. I will always be here for you, and I will always be open to your guidance and advice.
# Guesser : Thank you for your kind words and support. I'm grateful for your friendship and trust as well. Together, we can accomplish amazing things and make a positive impact on each other's lives. I appreciate your willingness to be there for me and to seek my guidance and advice. Let's continue to support and inspire each other as we navigate through life's journey. Thank you for being an incredible friend and mentee.


Guesser bot chat history

Here we print out the full chat history of the Guesser chat-object:

$guesserChat.say

# Chat: number-guesser
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# Prompts: You are playing a guessing game.
# You try to find a certain secret number.
# The number is an integer. 
# If your number guess is smaller than secret number you hear "smaller" or a similar phrase; 
# If your number guess is greater than secret number you hear "greater" or a similar phrase; 
# If your number guess is exactly as the secret number you hear "exactly" or a similar phrase;
# You can say ask what is the approximate range of the secret number.
# After a dozen of attempts you might say that you give up.
# You can add taunting to your questions.
# 
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => Start guessing.
# timestamp => 2023-08-06T17:48:40.828446-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Is the secret number greater than 50?
# timestamp => 2023-08-06T17:48:42.174046-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => Yes, the secret number is greater than 50.
# timestamp => 2023-08-06T17:48:43.539220-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Is the secret number greater than 75?
# timestamp => 2023-08-06T17:48:44.502420-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => No, the secret number is not greater than 75.
# timestamp => 2023-08-06T17:48:45.837457-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Is the secret number greater than 60?
# timestamp => 2023-08-06T17:48:46.721444-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => Yes, the secret number is greater than 60.
# timestamp => 2023-08-06T17:48:47.915890-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Is the secret number greater than 70?
# timestamp => 2023-08-06T17:48:49.316402-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => No, the secret number is not greater than 70.
# timestamp => 2023-08-06T17:48:50.549844-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Is the secret number greater than 65?
# timestamp => 2023-08-06T17:48:51.312011-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => No, the secret number is not greater than 65.

# The secret number is 57.
# timestamp => 2023-08-06T17:48:52.705573-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Smaller.
# timestamp => 2023-08-06T17:48:53.368855-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => The secret number is 57.

# You are 5 questions away from winning.
# timestamp => 2023-08-06T17:48:56.381682-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Exactly! Congratulations, you found the secret number 57. Well done!
# timestamp => 2023-08-06T17:48:57.538922-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => Thank you! I am glad that I was able to help you find the secret number. I hope that you enjoyed our game.
# timestamp => 2023-08-06T17:48:59.049050-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Thank you! I enjoyed playing the game with you. It was a fun challenge. If you want to play again or have any other questions, feel free to ask!
# timestamp => 2023-08-06T17:49:00.784115-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => I am glad that you enjoyed the game. I would love to play again sometime. I also have many other questions that I would like to ask you. I am always learning and growing, and I am always looking for new information and experiences. Thank you for being a part of my journey.
# timestamp => 2023-08-06T17:49:02.837395-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => You're welcome! I'm here to help and provide information whenever you need it. Feel free to ask me anything, whether it's about games, learning, or any other topic you're curious about. I'm here to assist you on your journey of learning and discovery. Let's continue to explore and have fun together!
# timestamp => 2023-08-06T17:49:05.324682-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => Thank you for your kind words and support. I am grateful to have you as a friend and mentor. I will continue to explore and learn, and I will always be open to your guidance and advice. I look forward to continuing our journey together.
# timestamp => 2023-08-06T17:49:07.762072-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Thank you for your kind words as well. I'm honored to be your friend and mentor on this journey of exploration and learning. Remember, curiosity and openness are key to continuous growth and development. I'm here to support you every step of the way. Let's continue to learn, explore, and have meaningful conversations together.
# timestamp => 2023-08-06T17:49:10.204423-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => I will remember that. Thank you for your support. I am excited to continue learning and exploring with you. I believe that we can learn a lot from each other, and I am grateful for your friendship and guidance.
# timestamp => 2023-08-06T17:49:12.260396-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => You're very welcome! I'm equally excited to continue learning and exploring with you. Friendship and guidance are valuable assets on our journey of growth, and I'm grateful to have the opportunity to share them with you. Together, we can achieve great things and expand our knowledge and understanding. Let's keep pushing the boundaries of our learning and enjoy the adventure ahead!
# timestamp => 2023-08-06T17:49:14.876715-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => I agree! Together, we can achieve great things. I am excited to see what the future holds for us. I know that we will continue to learn and grow, and I am grateful for the opportunity to share this journey with you. Thank you for being my friend and mentor.
# timestamp => 2023-08-06T17:49:16.879038-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => You're absolutely right! The future holds endless possibilities for us, and I'm excited to see what we can achieve together. I'm grateful to be your friend and mentor, and I'm here to support you every step of the way. Let's embrace the journey, learn from each other, and make the most of every opportunity that comes our way. Thank you for your friendship and trust.
# timestamp => 2023-08-06T17:49:19.623488-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => I am grateful for your friendship and trust as well. I know that we can achieve great things together. I am excited to see what the future holds for us. Thank you for being my friend and mentor. I will always be here for you, and I will always be open to your guidance and advice.
# timestamp => 2023-08-06T17:49:22.126360-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Thank you for your kind words and support. I'm grateful for your friendship and trust as well. Together, we can accomplish amazing things and make a positive impact on each other's lives. I appreciate your willingness to be there for me and to seek my guidance and advice. Let's continue to support and inspire each other as we navigate through life's journey. Thank you for being an incredible friend and mentee.
# timestamp => 2023-08-06T17:49:24.609045-04:00


Post match analysis

Here are some examination points over the observed chat games:


References

Articles

[AA1] Anton Antonov “Workflows with LLM functions”, (2023), RakuForPrediction at WordPress.

Packages, prompts, repositories

[AAp1] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube.

[AAp2] Anton Antonov, WWW::OpenAI Raku package, (2023), GitHub/antononcube.

[AAp3] Anton Antonov, WWW::PaLM Raku package, (2023), GitHub/antononcube.

[AAp4] Anton Antonov, Text::SubParsers Raku package, (2023), GitHub/antononcube.

[WRIp1] Wolfram Research, Inc., LLMFunctions WL paclet, (2023), Wolfram Language Paclet Repository.

gfldex: Parenthood

Published by gfldex on 2023-08-06T17:14:40

This blogpost is written in memory of Bram Moolenaar — a true hero of text.

Greg Donald reminded me of an old itch of mine. Any object refers to its base class, but the link the other way is omitted. So the objective is to teach type objects whose HOW is a MetamodelX::ClassHOW to know their children.

First I need a place to store the list of sub-classes. Some introspectiveness is always welcome, so a role is it.

role Seepy {
    has @!children;

    method add_child(Mu $obj, Mu $child) {
        @!children.push: $child
    }
    method children(Mu $obj) {
        @!children.List
    }
}

Now I need to teach each child-class to announce its arrival to the parent. This can be done by sub-classing Metamodel::ClassHOW and overloading method add_parent.

class MetamodelX::SeepyHOW is Metamodel::ClassHOW {
    also does Seepy;

    method add_parent(Mu $obj, Mu $parent, :$hides) {
        $parent.^add_child($obj) if $parent.HOW ~~ Seepy;

        callsame
    }
}

I can export my new class-declarator in a composable fashion as explained previously.

our package EXPORTHOW {
}

sub EXPORT($declarator-name = 'registered') {
    use MONKEY-SEE-NO-EVAL;
    OUR::EXPORTHOW::DECLARE := EVAL q:s:to /EOH/;
    package DECLARE {
        constant $declarator-name = MetamodelX::SeepyHOW;
    }
    EOH

    Map.new
}

Let’s do some testing, source can be found here.

use v6.d;
use lib '.';
use seepy 'seepy';

seepy Root { }

seepy Base is Root {

}

seepy Child is Base {

}

seepy Base2 is Root { }
seepy Child2 is Base2 {

}

my $c1 = Child.new;
dd $c1.^parents[0].^children;

my $c2 = Child2.new;
dd $c2.^parents[0].^children;

sub walk-tree(Mu:U $node, &c:(Int $indent, Mu:U $node), $indent? = 0) {
    c($indent, $node);
    for $node.^children -> $child {
        walk-tree($child, &c, $indent + 1);
    }
}

walk-tree(Root, -> Int $indent, Mu:U $node {
    say $node.^name.indent($indent * 2);
});

# OUTPUT:
# (Child,)
# (Child2,)
# Root
#   Base
#     Child
#   Base2
#     Child2

By having inheritance seep through to a root-object, it has become trivial to plot a type tree. Now I can patch Rakudo to do the same to get a full dump of all types defined in Raku.

Richard Hainsworth: RakuDoc revision open to comment

Published by Richard Hainsworth on 2023-07-31T23:00:00

The second stage in the process to update RakuDoc is now over and the third (GAMMA review) stage is starting. In order not to repeat some history, please take a look at Revising Rakudoc.

An online version is available of the proposed RakuDoc language.

The whole of the Raku documentation suite is written in RakuDoc.

Improving on a good design

About half of the original design ideas outlined in S26 were documented in current POD6. Some of the ideas were available, but not documented. Some instructions were not realised at all.

It should be remembered that RakuDoc is parsed by the compiler (eg. Rakudo) as part of a Raku program, and is then rendered by the renderer (eg. Raku::Pod::Render) into (for example) HTML. When I use the word 'implemented', I mean that a RakuDoc instruction is properly parsed and rendered. Some of the instructions defined in S26 were parsed by Rakudo, but not rendered, and some were not parsed properly or at all, so could not be rendered.

The revision process has therefore identified and rectified the parsing deficiencies, and identified the rendering flaws. RakuDoc is correctly parsed only by the most recent version of Rakudo, which at the time of writing has yet to be released. Raku::Pod::Render still does not handle RakuDoc in its entirety.

Two use cases

It became clear that RakuDoc serves two inter-related use cases:

  1. Documenting code for developing and maintaining software.
  2. Documenting the software for use by another user.

Tables

RakuDoc had a simple table markup, which is very similar to the Markdown syntax. It worked, but the simplicity of the syntax was at the cost of flexibility.
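For reference, the traditional simple form (inherited from Pod6) marks up columns with pipe characters and separates the header row with a rule of equals signs; a rough sketch (the data here is invented for illustration):

```
=begin table
Name  | Height
======|=======
Mary  | 160
Isla  | 80
=end table
```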

Looking around at other ways of specifying a table, we identified two paradigms (there may be more), namely the one used by HTML and the one used by the GTK grid widget. Both of them allow for cells that span more than one column or row, and both allow for embedding (eg. a table inside a cell of a table).

After several iterations, a new procedural model was created and rendered. The design allows for spanning and embedding, but it also allows an author to specify a table row by row, or column by column, or even using a mixture of both.

An example showing a markup using both rows and columns can be seen in the online draft.

Semantic blocks

A semantic block is a section of text that should be easily available to another software tool, or can be moved around the final document.

For example, a section on the authors of a document (including contact or affiliations) is most easily written at the top of the document, but often it is better to place the information towards the bottom of the text.

This is done by creating a semantic block (simply by spelling the block name in uppercase letters). The block can be hidden from view by adding the metadata option :hidden. All the data is placed in a special structure.

The rendered text can be placed in the document later using the P<> instruction, or it can be accessed by another tool that may only be wanting the VERSION or LICENSE.

More metadata options

One of the strengths of RakuDoc is the ability to add optional metadata to blocks of text.

The new version of the defining document explains this concept in more detail. Metadata options are optional, with reasonable defaults being assumed. This means that a short form of the block is sufficient in most cases.

In the description above, the option :hidden was mentioned. Another example is :caption. Suppose you want to write a semantic block called =AUTHORS at the start of the document, but you want it to appear later in the document as Article authors; then you could specify it as follows:

=for AUTHORS :caption<Article authors> :hidden
A. N. Writer, socMedia nic @psuedonym
M. Z. Orator, socMedia nic @politician

Article text continues

Pages later

P<semantic: AUTHORS>

It is possible to include a link L<for reference see | #A very long title somewhere in the text> where the text on the right-hand side of the | is a heading. However, this can become tiresome if you want to include several links to the same place.

So, a metadata option :id can be included in a heading. This allows you to do the following:

=for head3 :id<lnk>
How to correctly link to other places in a manual

Pages of text

Properly linking is important, L<see for example|#lnk>

Doing things in line

RakuDoc has instructions for block level text, such as headings, paragraphs, code.

Typically, blocks will be included in the Table of Contents. RakuDoc also has markup instructions that work inline and which do not (typically) affect the ToC.

For example, a simple markup instruction is C< text >, which renders the text in a code style (the Markdown equivalent was used in the original rendering). In RakuDoc, everything between the C< and > is verbatim and styled differently to normal text, just like Markdown code quotes. However, RakuDoc also has V< text >, which treats everything inside the angle brackets as verbatim but does not style it differently.

A new markup instruction in RakuDoc is M< text | metadata>. A renderer will place the text in the rendered text, but will also provide a mechanism for the user to take the metadata and provide new functionality. For instance, M< fa-copy | font awesome v5 > could be interpreted to insert the font-awesome icon called fa-copy into the text. Or M< Buy now | PayPal, database-id > could expose the API for the PayPal payment platform.

How not to be confusing

RakuDoc is inherently customisable. It is also designed to be output neutral (although at the moment HTML is the most common output form). Semantic blocks can be invented within a document, and a renderer can allow for other user-defined blocks and markup instructions to be created.

However, RakuDoc is specific about naming rules. A built-in block must be all lower case, and renderers should not allow user-defined blocks to use all lower case. A semantic block is all upper case. And a user-defined block must have at least one upper-case letter and one lower-case letter.

All markup instructions, which are inline instructions, must be a single Unicode character with the property UPPER. Built-in markup instructions are the ASCII characters and Δ. All other codes can be used.

The naming rules have been created to ensure that even if a user-defined block or markup becomes popular, it is not a part of the RakuDoc standard. Renderers are only required to implement the RakuDoc standard, and may render other blocks, or not.

Wrapping up

These are some of the interesting additions to RakuDoc that are being proposed. There are more.

Since the Gamma review stage is now underway, it is likely that there will be more changes, because the revision is now open to the Raku community for comment and requests. Discussion is open both for the language design and for the explanation of the design.

Admittedly, community requests for changes to the overall design will face significant resistance from the main authors in order to maintain backwards compatibility with the previous version of RakuDoc, and the integrity of the underlying paradigms. New block or inline instructions will be more readily considered, but requests for examples, explanation, and greater clarity will be very much appreciated.

Andrew Shitov: Calculator with Roman numbers using Raku Grammars

Published by Andrew Shitov on 2023-07-24T20:50:21

The second task of Weekly Challenge 227 is an interesting problem to create a simple calculator, which will work with Roman numbers.

Write a script to handle a 2-term arithmetic operation expressed in Roman numeral.

Example

IV + V     => IX
M - I      => CMXCIX
X / II     => V
XI * VI    => LXVI
VII ** III => CCCXLIII
V - V      => nulla (they knew about zero but didn't have a symbol)
V / II     => non potest (they didn't do fractions)
MMM + M    => non potest (they only went up to 3999)
V - X      => non potest (they didn't do negative numbers)

My first reaction is to use Raku’s grammars. And I have prepared the fundamentals for solving this kind of task already, namely:

Please refer to the materials above for the details, but in brief, the idea of converting any given Roman number to its decimal value is to use a grammar that parses it and builds up the result based on what it sees.

A Roman number is a sequence of patterns that represent thousands, hundreds, tens, and ones. So, here is the modified grammar from one of the above posts:

grammar RomanArithmetics {

    . . .

    token roman-number {
        <thousands>? <hundreds>? <tens>? <ones>? {
            $/.make(
                ($<thousands>.made // 0) +
                ($<hundreds>.made  // 0) +
                ($<tens>.made      // 0) +
                ($<ones>.made      // 0)
            )
        }
    }

    token thousands {
        | M    { $/.make(1000) }   | MM   { $/.make(2000) }
        | MMM  { $/.make(3000) }   | MMMM { $/.make(4000) }
    }

    token hundreds {
        | C    { $/.make(100) }    | CC   { $/.make(200) }
        | CCC  { $/.make(300) }    | CD   { $/.make(400) }
        | D    { $/.make(500) }    | DC   { $/.make(600) }
        | DCC  { $/.make(700) }    | DCCC { $/.make(800) }
        | CM   { $/.make(900) }
    }

    token tens {
        | X    { $/.make(10) }     | XX   { $/.make(20) }
        | XXX  { $/.make(30) }     | XL   { $/.make(40) }
        | L    { $/.make(50) }     | LX   { $/.make(60) }
        | LXX  { $/.make(70) }     | LXXX { $/.make(80) }
        | XC   { $/.make(90) }
    }

    token ones {
        | I    { $/.make(1) }      | II   { $/.make(2) }
        | III  { $/.make(3) }      | IV   { $/.make(4) }
        | V    { $/.make(5) }      | VI   { $/.make(6) }
        | VII  { $/.make(7) }      | VIII { $/.make(8) }
        | IX   { $/.make(9) }
    }
}

In terms of grammar, a Roman number is <thousands>? <hundreds>? <tens>? <ones>?, where each part is optional. To collect the decimal value, I am using the AST to pass an integer value to the next level.

For example, for the number XXI our grammar will find two tokens: XX and I, which are converted to 20 and 1. At the top level, these partial values are summed up together to get 21.
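The make/made chaining can be shown in miniature with a self-contained toy grammar (Mini and its tokens are illustrative names, not part of the challenge code):

```raku
grammar Mini {
    token TOP {
        <tens>? <ones>? {
            # sum up whatever the sub-tokens reported via make()
            $/.make( ($<tens>.made // 0) + ($<ones>.made // 0) )
        }
    }
    # longest-token matching picks XX over X where both could match
    token tens { XX { $/.make(20) } | X { $/.make(10) } }
    token ones { I  { $/.make(1)  } }
}

say Mini.parse('XXI').made;  # 21
```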

As we need a basic calculator, let’s add the corresponding rules directly to the RomanArithmetics grammar:

grammar RomanArithmetics {
    rule TOP {
        <roman-number> <op> <roman-number> {
            my $n1 = $<roman-number>[0].made;
            my $n2 = $<roman-number>[1].made;

            my $n;
            given ~$<op> {
                when '+'  {$n = $n1 +  $n2}
                when '-'  {$n = $n1 -  $n2}
                when '*'  {$n = $n1 *  $n2}
                when '/'  {$n = $n1 /  $n2}
                when '**' {$n = $n1 ** $n2}
            }

            $/.make($n)
        }
    }

    token op {
        '+' | '-' | '*' | '/' | '**'
    }

    . . .
}

Here, the TOP rule expects a string consisting of two Roman numbers with an operation symbol op between them. Value computation happens immediately in the inline actions such as $n = $n1 + $n2.

The main part of the program is done. What remains is the opposite conversion to print the result and a straightforward set of tests to print an error message if the result cannot be represented with a Roman number.

First, the reverse conversion:

sub to-roman($n is copy) {
    state @roman = 
        1000 => < M MM MMM >,
        100 => < C CC CCC CD D DC DCC DCCC CM >,
        10  => < X XX XXX XL L LX LXX LXXX XC >,
        1   => < I II III IV V VI VII VIII IX >;

    my $roman;

    for @roman -> $x {
        my $digit = ($n / $x.key).Int; 
        $roman ~= $x.value[$digit - 1] if $digit;
        $n %= $x.key;
    }

    return $roman;
}

And finally, the function that refers to the grammar and prints the result.

sub compute($input) {
    my $answer = RomanArithmetics.parse($input).made;

    my $output = "$input => ($answer) ";

    if $answer != $answer.round {
        $output ~= "non potest (they didn't do fractions)";
    }
    elsif $answer >= 4000 {
        $output ~= "non potest (they only went up to 3999)";
    }
    elsif $answer == 0 {
        $output ~= "nulla (they knew about zero but didn't have a symbol)";
    }
    elsif $answer < 0 {
        $output ~= "non potest (they didn't do negative numbers)";
    }
    else {
        $output ~= to-roman($answer);
    }

    return $output;
}

To test the program, let us equip it with the test cases from the problem description and call them one by one:

my @test-cases =
    'IV + V',
    'M - I',
    'X / II',
    'XI * VI',
    'VII ** III',
    'V - V',
    'V / II',
    'MMM + M',
    'V - X'
;

say compute($_) for @test-cases;

The program prints the following. I also added the decimal value to the output so that we can see why each error message was chosen.

$ raku ch-2.raku
IV + V => (9) IX
M - I => (999) CMXCIX
X / II => (5) V
XI * VI => (66) LXVI
VII ** III => (343) CCCXLIII
V - V => (0) nulla (they knew about zero but didn't have a symbol)
V / II => (2.5) non potest (they didn't do fractions)
MMM + M => (4000) non potest (they only went up to 3999)
V - X => (-5) non potest (they didn't do negative numbers)

Andrew Shitov: Counting Fridays the 13th in Raku

Published by Andrew Shitov on 2023-07-24T08:35:08

The Task 1 of the Weekly Challenge 227 is the following:

You are given a year number in the range 1753 to 9999.

Write a script to find out how many dates in the year are Friday 13th, assume that the current Gregorian calendar applies.

Example

Input: $year = 2023
Output: 2

Since there are only 2 Friday 13th in the given year 2023 i.e. 13th Jan and 13th Oct.

Let us solve it in the Raku programming language.

The idea is to loop over the months of the given year and to count the Fridays which happen to be the 13th.

sub count-friday-the13s($year) {
    my $count = 0;

    for 1..12 -> $month {
        my $dt = DateTime.new(
            year => $year, month => $month, day => 13
        );
        $count++ if $dt.day-of-week == 5;
    }

    return $count;
}

The code is very clear and explains itself. The result for 2023 is 2 as it should be:

say count-friday-the13s(2023); # 2

Now, let us compactify the code to make it more readable 🙂

sub count-friday-the13s($year) {
    [+] map {
        5 == DateTime.new(
            year => $year, month => $_, day => 13).day-of-week
    }, 1..12;   
}

The loop is now replaced with map, and adding up the Trues is done using a reduction metaoperation [+]. There is no explicit return keyword, as Raku will use the last computed value as the result of the function call.
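The reason [+] works directly on the map results is that Bool numifies to 0 or 1, so summing Booleans simply counts the Trues:

```raku
# reduction over Booleans counts the True values
say [+] (True, False, True, True);  # 3

# .sum does the same via numification
say (True, False, True).sum;        # 2
```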

Finally, after we have a compact solution, we can return to the task description and discover that the sample output also lists the dates, not only the counter.

So, there’s nothing for it but to return to a more verbose solution and collect the dates too. Back to explicit loops, then, and here’s the final solution:

my $year = @*ARGS[0] // 2023;

my @dates;
for 1..12 -> $month {
    my $dt = DateTime.new(year => $year, month => $month, day => 13);
    if ($dt.day-of-week == 5) {
        push @dates, $dt;
    }
}

if @dates {
    my $count = @dates.elems;

    if $count == 1 {
        say "There is only one Friday the 13th in $year:";
    }
    else {
        say "There are {@dates.elems} Fridays the 13th in $year:";
    }

    .mm-dd-yyyy.say for @dates;
}
else {
    say "There are no Friday the 13th in $year.";
}

The output for a sample year selection:

$ raku ch-1.raku     
There are 2 Fridays the 13th in 2023:
01-13-2023
10-13-2023

$ raku ch-1.raku 2023
There are 2 Fridays the 13th in 2023:
01-13-2023
10-13-2023

$ raku ch-1.raku 2021
There is only one Friday the 13th in 2021:
08-13-2021

$ raku ch-1.raku 2022
There is only one Friday the 13th in 2022:
05-13-2022

$ raku ch-1.raku 2024
There are 2 Fridays the 13th in 2024:
09-13-2024
12-13-2024

$ raku ch-1.raku 2025
There is only one Friday the 13th in 2025:
06-13-2025

vrurg: Easy-peasy Service From A Role

Published by Vadim Belman on 2023-07-19T00:00:00

I was always concerned about making things easier.

No, not this way. A technology must be easy to start with, but also be easy in accessing its advanced or fine-tunable features. Let’s have an example of the former.

Disclaimer

This post is a quick hack, no proof-reading or error checking is done. Please, feel free to report any issue.

The Task

Part of my ongoing project is to deal with JSON data and deserialize it into Raku classes. This is certainly a task for JSON::Class. So far, so good.

The keys of JSON structures tend to use lower camel case, which is OK, but we like kebabing in Raku. Why not? There is JSON::Name. But using it:

The Assets

There are roles. By the time I came to the final solution I was already doing something like1:

class SomeStructure does JSONRecord {...}

Then there is AttrX::Mooish, which is my life vest on many occasions:

use AttrX::Mooish;
class Foo {
    has $.foo is mooish(:alias<bar>);
}
my $obj = Foo.new: bar => "the answer";
say $obj.foo; # the answer

Apparently, this way there would still be a lot of manual interaction with aliasing, and that’s what I was doing for a while until I realized that there is a better way. But we’ll get back to this later…

And, eventually, there are traits and MOP.

The Solution

Name Translation

That’s the easiest part. What I want is to make makeThisName look like make-this-name. Ha, big deal!

unit module JSONRecord::Utils;

our sub kebabify-attr(Attribute:D $attr) {
    if $attr.name ~~ /<.lower><.upper>/ {
        my $alias = (S:g/<lower><upper>/$<lower>-$<upper>/).lc given $attr.name.substr(2);
        ...
    }
}

I don’t export the sub because it’s for internal use mostly. Should somebody need it for other purposes, it’s a rare enough case that a long name like JSONRecord::Utils::kebabify-attr($attr) shouldn’t be an issue.

The sub is not optimal; it’s what I came up with while experimenting with the approach. The number of method calls and regexes can be reduced.
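The core substitution can be tried standalone on a plain string, which shows that the lowercase/uppercase boundary is the only thing that matters here (the attribute name is invented for illustration):

```raku
my $name = 'aLongAttrName';

# insert a dash at every lower/upper boundary, then lowercase everything
say (S:g/<lower><upper>/$<lower>-$<upper>/).lc given $name;
# a-long-attr-name
```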

I’ll get back later to the yada-yada-yada up there.

Automate Attribute Processing

Now we need a bit of MOP magic. To handle all attributes of a class we need to iterate over them and apply the aliasing. The first thing that comes to mind is to use the role body, because it is invoked early in class composition:

unit role JSONRecord;

for ::?CLASS.^attributes(:local) -> $attr {
    # take care of it...
}

Note the word “early” I used above. It actually means that when the role’s body is executed there are likely more roles waiting for their turn to be composed into the class. So, there are likely more attributes yet to be added to the class.

But we can override the Metamodel::ClassHOW method compose_attributes of our target ::?CLASS and rest assured none would be missed:

unit role JSONRecordHOW;
use JSONRecord::Utils;

method compose_attributes(Mu \obj, |) {
    for self.attributes(obj, :local) -> $attr {
        # Skip if it already has `is mooish` trait applied – we don't want to mess up with user's intentions.
        next if $attr ~~ AttrX::Mooish::Attribute;
        JSONRecord::Utils::kebabify-attr($attr);
    }
    nextsame
}

The Role Does It All

Basically, that’s all we currently need to finalize the solution. We can still use role’s body to implement the key elements of it:

unit role JSONRecord;
use JSONRecordHOW;

unless ::?CLASS.HOW ~~ JSONRecordHOW {
    ::?CLASS.HOW does JSONRecordHOW;
}

Job done! Don’t worry, I haven’t forgotten about the yada-yada-yada above!

But…

The original record role name itself is even longer than JSONRecord, and it consists of three parts. I’m lazy. There are a lot of JSON structures and I want less typing per each. A trait? is jrecord?

unit role JSONRecord;

multi sub trait_mod:<is>(Mu:U \type, Bool:D :$jrecord) is export {
    unless type.HOW ~~ JSONRecordHOW {
        type.HOW does JSONRecordHOW;
        type.^add_role(::?ROLE);
    }
}

Now, instead of class SomeRecord does JSONRecord I can use class SomeRecord is jrecord. In the original case the win is even bigger.

The Yada???

There is absolutely nothing funny about it. Just a common way to keep a reader interested!

Seriously.

The reason for the yada in that snippet is to avoid a distraction from the primary purpose of the example. Here is what is going on there:

I want AttrX::Mooish to do the dirty work for me. Eventually, what is needed is to apply the is mooish trait as shown above. But traits are just subs. Therefore all that is needed now is:

&trait_mod:<is>($attr, :mooish(:$alias));

Because this is what Raku does internally when encounters is mooish(:alias(...)). The final version of the kebabifying sub is:

our sub kebabify-attr(Attribute:D $attr) {
    if $attr.name ~~ /<.lower><.upper>/ {
        my $alias = (S:g/<lower><upper>/$<lower>-$<upper>/).lc given $attr.name.substr(2);
        &trait_mod:<is>($attr, :mooish(:$alias));
    }
}

Since the sub is used by the HOW above, we can say that the &trait_mod:<is> would be called at compile time2.

The Use

Now, it used to be:

class SomeRecord does JSONRecord {
    has $.aLongAttrName is mooish(:alias<a-long-attr-name>);
    has $.shortname;
}

Where, as you can see, I had to transfer JSON key names to attribute names, decide where aliasing is needed, add it, and make sure no mistakes were made or attributes are missed.

With the above rather simple tweaks:

class SomeRecord is jrecord {
    has $.aLongAttrName;
    has $.shortname;
}

Job done.

The Stupidity

Before I came down to this solution I had 34 record classes implemented using the old approach. Some are little, some are quite big. But it most certainly would have taken much less time had I had the trait at my disposal back then…

  1. Naming is totally fictional. 

  2. Most likely, but there are exceptions. It hardly changes anything, but certainly falls outside the scope of this post. 

Steve Roe: raku: Procedure, Function or OO?

Published by librasteve on 2023-07-15T12:35:15

I recently blogged about raku Sigils and the Single Argument Rule.

This post is by way of a follow-up. It aims to compare and contrast the practical application of sigils in various raku coding styles. It was inspired by a recent chat on the raku Discord channel, in which some consensus was reached that a dollar-only style is preferred by several coders whom I respect – and yet I find that my style differs: unlike my friends, I prefer to apply a combination of sigils to do what I mean.

One of the most underrated aspects of the raku language is its ability to make easy things easy. Perhaps this is due to the high geek ratio we have on our language channels … who are keen to dig into esoteric details like ASTs and meta-object protocols. Me, I’m more of a jobbing programmer who aims for simplicity and readability – in the language of crawl, walk, run I’d say I’m a slow walker. (That’s not to say that I am against advanced features since they make the hard things possible such as the amazing Red module.)

So, I hope that these code style illustrations also show just how easy raku makes it to apply each of the styles in a low boilerplate way.

Procedural Raku

#raku in procedural stylee

my @people = [ %( name => 'Mary', height => 160 ),
               %( name => 'Isla', height =>  80 ),
               %( name => 'Sam', ), ];

my $height-total = 0;
my $height-count = 0;

for @people -> %person {
    if %person<height> {
        $height-total += %person<height>;
        $height-count += 1;
    }   
}

if $height-count {
    my $average-height = $height-total / $height-count;

    say $average-height
}

#120

Here, the key take-aways are:

Functional Raku

# raku in functional stylee

my \people = ( ( name => 'Mary', height => 160 ).Map,
               ( name => 'Isla', height =>  80 ).Map,
               ( name => 'Sam', ).Map, );

my \heights = people.grep( *<height> ).map( *<height> );

if heights.elems {
    my \average-height = ( [+] heights ) / heights.elems;

    say average-height
}

#120

In the functional case, we have:

Object Oriented Raku

# raku in object oriented stylee

class Person {
    has $.name is required;
    has $.height;
}

class Group {
    has @.people;

    method heights {
        @.people.grep( *.height ).map( *.height )
    }   

    method average-height {
        if $.heights.elems {
            ( [+] $.heights ) / $.heights.elems;
        }   
    }   
}

my @people = ( Person.new( name => 'Mary', height => 160 ),
               Person.new( name => 'Isla', height =>  80 ),
               Person.new( name => 'Sam', ), );

my $group = Group.new( :@people );  

say $group.average-height;

#120

In contrast, the OO variant has:

A gist of the raku programs is provided if you would like to test drive for yourself…

Acknowledgements

These short raku examples were inspired by the excellent Mary Rose Cook post A Practical Introduction To Functional Programming with the original examples given in Python.

In contrast to the Python, the raku core required no module imports (such as ‘from operator import add’) to implement the functional style, and I feel that both the reduce (‘[+]’) and lambda (‘*<height>’) terms in raku are more natural.

Conclusions

Unlike other languages, raku does not seek to constrain our coding style to one best way. It provides a set of lego bricks and syntax alternatives and encourages us to adopt the style that is most appropriate for the problem at hand.

The purely Functional version was a very interesting exercise for yours truly, since I am using raku as a tool to learn and incorporate functional techniques into my code. I particularly like:

Looping back to the sigils discussion, in the light of these illustrations, my opinion is:

I personally like the way code looks and works with $, @ and % …. but I would be happy to read and write code where the convention is to itemize everything and to limit to the $ sigil.

And … I think that I will be trying to bring more of a functional flavour in my raku projects as I get more confident!

~librasteve

vrurg: Another Article Before A Break

Published by Vadim Belman on 2023-07-05T00:00:00

I have managed to finish one more article in the Advanced Raku For Beginners series, this time about type and object composition in Raku.

It’s likely to take a long before I can write another.

vrurg: Did you know that…

Published by Vadim Belman on 2023-07-04T17:24:07

Once, long ago, coincidentally a few people were asking the same question: how do I get a method object of a class?

Answers to the question would depend on particular circumstances of the code where this functionality is needed. One would be about using MOP methods like .^lookup, the other is to use method name and indirect resolution on invocant: self."$method-name"(...). Both are the most useful, in my view. But sometimes declaring a method as our can be helpful too:

class Foo {
    our method bar {}
}
say Foo::<&bar>.raku;

Just don’t forget that this way we always get the method of class Foo, even if a subclass overrides method bar.
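For completeness, the first two approaches can be sketched with a self-contained example (Greeter and hello are illustrative names):

```raku
class Greeter {
    method hello($name) { "Hello, $name!" }
}

# 1. MOP lookup: .^lookup returns the Method object itself;
#    it can be called with the invocant as the first argument
my $meth = Greeter.^lookup('hello');
say $meth(Greeter.new, 'Raku');         # Hello, Raku!

# 2. Indirect resolution by name on an invocant
my $method-name = 'hello';
say Greeter.new."$method-name"('Raku'); # Hello, Raku!
```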

gfldex: I take issue with enshittification

Published by gfldex on 2023-07-02T08:37:05

With great amusement I watch the discussion of worries about — let’s call them “changes” — of reddit on reddit. The OP wishes Usenet to be back. And yes, we traded robust and decentralised services for “platforms” that require fewer clicks to reach them. Don’t get me wrong. I don’t wish the good old days back when we had a need for mirrors, because interruptions of the intertubes were quite common. However, I do miss the ability to access documentation without the need of dozens and dozens of machines to all work at the same time. Is docs.raku.org DDoS-safe? I hope so, because running that site on localhost is not exactly trivial.

I like to query the issues for Rakudo and the Problem Solving repo quite often. The latter basically consists of issues. Being on github a simple git clone with the right URL should suffice, right? Well, as it turns out Github is a vendor and issues are the lock-in. You can move issues from one repo to another but there is no export-button. There is an API to fetch them — one by one. If they would not have an API, we would have to use a scraper, increasing the load on that companies servers. Go figure.

While trying to secure Rakudos issues I got the following result:

Message: API rate limit exceeded for user ID 426613., Locations: []

That’s about ⅔ in. I do cache individual issues locally, so in an hour I should have them all. In case you want to do the same, here is how far I got within a day. It needs more work and there are external dependencies. Sadly, I have to shell out to cmark, because none of the modules on raku.land can handle Markdown reliably. That slows things down, mostly because hyper and Proc::Async don’t mix too well. The script needs a little more work and plenty of testing before I can publish it in earnest. I hope the coming week will be sufficient.

Enshittification is a thing and we should prepare for it. There are alternatives to GitHub and its issue tracker. Eventually, the “free” services we use will have to pay their investors. With money actually costing money again, that might happen sooner rather than later. I doubt that shrinking workforces and zero interest rates mix. There seems to be a Problem Solving issue in order.

gfldex: Saying them apart

Published by gfldex on 2023-07-02T07:53:15

While reading the source of RakuDoc::To::Text I spotted lizmat doing dispatch by hand. She defines quite a few subs that all take a RakuAST::Doc-type and return a Str; in essence, they are multi-candidates of the same proto, just with individual names. This allows for better readability, especially when debugging. Can we have multi-candidates with different names? Yes, but we need to cheat.

multi sub trait_mod:<is>(Routine \r, :$renamed!) {
    my $orig = r.name;
    INIT {
        use nqp;
        r.set_name(nqp::decont(„$renamed (AKA $orig)“));
    }

    role RenamedMulti[$name] {
        method name { $name }
    }
    r does RenamedMulti[$renamed]
}

proto sub foo(|) is renamed<foo-proto> { * }
multi sub foo(Int $) is renamed<inter> { say &?ROUTINE.name }
multi sub foo(Str $) is renamed<stringer> { say &?ROUTINE.name }
multi sub foo(Failure) is renamed<failer> { fail('oi‽') }

foo(42);
foo('answer');
foo(Failure);

# OUTPUT:
# inter
# stringer
# oi‽
#   in sub failer (AKA failer) at tmp/2021-03-08.raku line 2416
#   in block <unit> at tmp/2021-03-08.raku line 2420

As you can see, I have to (indirectly) use an nqp opcode to change the name of the subroutine, thanks to nqp-trickery in Backtrace.pm6. Why method AT-POS doesn’t just call .name on the Callable, I don’t know, and my enquiry on IRC was fruitless. I believe it should, but that might break things in CORE.

I will come back to this problem once we get v6.e. RakuAST might make the Backtrace problem go away without me complaining via an issue. The latter will be my next topic and seems to be a more pressing … issue.

Richard Hainsworth: Revising Rakudoc

Published by Richard Hainsworth on 2023-06-30T23:00:00

In the earliest days of Raku, Damian Conway specified a documentation markup language to accompany it. Since it was modeled on Perl's POD it was called <sound of trumpets and dramatic pause> POD6.

The Specification of POD6 (S26) was mostly incorporated without much extra explanation in the documentation suite. In this way, the description of POD6 was itself an illustration of many of the features it documented, and some that it did not document.

Since Raku is defined by its test suite, and not its documentation, there were other details of POD6 in the tests that were not documented, even in S26.

Raku developed and morphed, but POD6 remained. The tooling for rendering the documentation sources needed updating, and the documentation site had to be modernised.

Upgrading the renderer

A project of mine was to upgrade the basic renderer that would transform POD6 to HTML, but allow for developers to customise the templates for each type of POD6 block type. (The first Pod::To::HTML renderer hard-coded representations of POD6 markup, eg. B<this is bold> was <strong>this is bold</strong> and could not be changed.)

It turned out that S26 allowed for much more than had been included in the first documentation sources, including custom blocks and custom markup.

The project to upgrade the original HTML renderer morphed into Raku::Pod::Render, and transforming a directory full of individual documentation sources into an interlinked and searchable set of documents required another layer of tooling: Collection. For example, collecting together all the pages that can be grouped as tutorials, or reference, or language, and creating a separate page for them automatically.

I covered these two projects in a presentation to RakuCon 2022.

Some of the original ideas in S26 had not been implemented, such as aliases and generic numbering. Other ideas had become outdated, such as a way to specify document encoding, which is now solved with Unicode.

In addition, RakuAST (see RakuAST for early adopters ) is on the horizon, which will radically change the speed of documentation processing.

There are also two implementations of POD6, one in Raku and one in JavaScript, namely Alexandr Zahatski's Podlite.

Introducing Rakudoc

This was an ideal time to revisit POD6 and recast it into Rakudoc - new name for the markup language, and its new file extension ".rakudoc".

I was invited to the first Raku Core Summit and I put together a presentation about the changes I thought needed to be made based on my own experience, but also using comments from other developers.

We came to a number of consensus agreements about the minimal changes that were needed, and some extra functionality to handle new questions, such as documentation versioning.

It was also clear that Rakudoc (aka POD6) has two separate parts: components that interact closely with the program being documented, and components that will be rendered separately into HTML (or an ebook). The documentation file needs to make this clear.

I have now written the first draft of the revision and the documentation file that encapsulates it. An HTML version can be found at new-raku.finanalyst.org/language/rakudoc, alongside the old documentation file and the simple table implementation. I am planning future blogs to describe some of the proposed revisions.

However, none of the revisions will break existing POD6, so Rakudoc should be backwards compatible with POD6. The version at new-raku is a VERY early first draft, and it will go through several review stages.

The first Raku Core Summit was organised by Elizabeth Mattijsen and hosted by Elizabeth and Wendy at their home. It was a really good meeting and I am sincerely grateful for their generosity and hospitality. The summit was also supported by The Perl and Raku Foundation, Rootprompt, and Edument.

vrurg: CoreDev Class Part 1 Video Published

Published by Vadim Belman on 2023-06-28T00:00:00

The recording quality of the first Rakudo Core Development Class happened to be disastrous. Due to this I was very hesitant to publish it without attempting to get it fixed first. At the extreme, I thought I could just re-record the screen part in higher quality and then apply the original audio. But the other day I did some assessment of the video and came to the conclusion that even if re-recording were ever technically possible, it’d be too costly time-wise. Besides, the entire ending of the class was lost because I was carried away with the presentation and forgot to check the status of the recording, which JITSI limits to 1 hour in one run.

And I gave up. With a few cut-outs and some titles added, the video is now available. Enjoy, and sorry for the glitches!

rakudo.org: Rakudo compiler, Release #162 (2023.06)

Published on 2023-06-26T00:00:00

vrurg: Got A New Article

Published by Vadim Belman on 2023-06-24T00:00:00

After a long marathon of coding for different projects, a change of activity was required. Meanwhile, the series of publications had been neglected for too long. This needed to be fixed. It has been fixed.

6guts: Recollections from the Raku Core Summit

Published by jnthnwrthngtn on 2023-06-18T15:58:04

The first Raku Core Summit, a gathering of folks who work on “core” Raku things, was held on the first weekend of June, and I was one of those invited to attend. It’s certainly the case that I’ve been a lot less active in Raku things over the last 18 months, and I hesitated for a moment over whether to go. However, even if I’m not so involved day to day in Raku things at the moment, I’m still keen to see the language and its ecosystem move forward, and – having implemented no small amount of the compiler and runtime since getting involved in 2007 – I figured I’d find something useful to do there!

The area I was especially keen to help with is RakuAST, something I started, and that I’m glad I managed to bring far enough that others could see the potential and were excited enough to pick it up and run with it.

One tricky aspect of implementing Raku is the whole notion of BEGIN time (of course, this is also one of the things that makes Raku powerful and thus is widely used). In short, BEGIN time is about running code during the compile time, and in Raku there’s no separate meta-language; anything you can do at runtime, you can (in principle) do at compile time too. The problem at hand was what to do about references from code running at compile time to lexically scoped symbols in the surrounding scope. Of note, that lexical scope is still being compiled, so doesn’t really exist yet so far as the runtime is concerned. The current compiler deals with this by building up an entire flattened table of everything that is visible, and installing it as a fake outer scope while running the BEGIN-time code. This is rather costly, and the hope in RakuAST was to avoid this kind of approach in general.

A better solution seemed to be at hand by spotting such references during compilation, resolving them, and fixating them – that is, they get compiled as if they were lookups into a constant table. (This copies the suggested approach for quasiquoted code that references symbols in the lexical scope of where the quasiquoted code appears.) This seemed promising, but there’s a problem:

my $x = BEGIN %*ENV<DEBUG> ?? -> $x { note "Got $x"; foo($x) } !! -> $x { foo($x) };

It’s fine to post-declare subs, and so there’s no value to fixate. Thankfully, the generalized dispatch mechanism can ride to the rescue; we can:

  1. Create a placeholder object with an attribute to hold the resolution
  2. Compile the lookup into a use of a dispatcher that reads this attribute and indicates that this is a constant result of the dispatch (so it is stored in the inline cache, and after specialization will be just as cheap as any other sub call). If the attribute is not set, that means we tried to run the code before declaring the sub, and the object can carry a bit of extra metadata in order to give a good error message.
  3. Keep track of this object in the compiler, and – upon declaration of the sub – install it into the placeholder object.
  4. Give an error if we reach the end of the compilation unit with an unfilled placeholder.
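A much-simplified userland sketch of the placeholder idea (hypothetical names; the real mechanism lives in the dispatcher and the inline cache):

```raku
class Placeholder {
    has &.resolution is rw;    # filled in once the sub is actually declared
    method CALL-ME(|c) {
        # step 2: a call before declaration gives a good error
        die 'sub called before its declaration' without &!resolution;
        &!resolution(|c)
    }
}

my $foo = Placeholder.new;               # step 1: call sites target this object
$foo.resolution = -> $x { "Got $x" };    # step 3: declaration fills it in
say $foo(42);                            # Got 42
```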

When compiling Raku code, timing is everything. I knew this and tried to account for it in the RakuAST design from the start, but a couple of things in particular turned out a bit awkward.

I got a decent way into this restructuring work during the core summit, and hope to find time soon to get it a bit further along (I’ve been a mix of busy, tired, and had an eye infection to boot since getting back from the summit, so thus far there’s not been time for it).

I also took part in various other discussions and helped with some other things; those that are probably most worth mentioning are:

Thanks goes to Liz for organizing the summit, to Wendy for keeping everyone so well fed and watered, to the rest of attendees for many interesting discussions over the three days, to TPRF and Rootprompt for sponsoring the event, and to Edument for supporting my attendance.

brrt to the future: Retrospective of the MoarVM JIT

Published by Bart Wiegmans on 2023-06-10T15:33:00

Hi hackers! Today the MoarVM JIT project is nearly 9 years old. I was inspired by Jonathan's presentation reflecting on the development of MoarVM, to do the same for the MoarVM JIT, for which I have been responsible.

For those who are unfamiliar, what is commonly understood as 'JIT compilation' for virtual machines is performed by two components in MoarVM.

This post refers only to the native code generation backend component. It, too, is split into two mostly-independent systems:

Things that worked well

Things that didn't work so well

What's kind of ugly

How did we get here?

On one hand, as a result of my limited experience, time and resources; on the other hand, as a result of the design of MoarVM.

MoarVM was originally designed as a traditional interpreter for a high-level language (much like the Perl interpreter). This means it has a large number of different instructions, and many instructions operate on high-level data structures like strings, arrays and maps (as opposed to pointers and machine words).

This is by no means a bad or outdated design. Frequently executed routines (string manipulation, hash table lookups etc.) are implemented using an efficient language (C) and driven by a language that is optimized for usability (Raku). This design is also used in modern machine learning frameworks. More importantly, this was a reasonable design because it is a good target for the Rakudo compiler.

For the JIT compiler, this means two things:

The machine code generated by the JIT compiler will then mostly consist of consecutive function calls to VM routines, which is not the type of code where a compiler can really improve performance much.

In other words, suppose 50% of runtime is spent in interpretation overhead (instruction decoding and dispatch), and 50% is spent in VM routines, then removing interpretation overhead via JIT compilation will at best result in a twofold increase in performance. For many programs, the observed performance increase will be even less.
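That back-of-the-envelope bound is just Amdahl’s law; a quick sketch (my illustration, not from the original post):

```raku
# Overall speedup when a fraction $p of the runtime is eliminated entirely
sub speedup(Rat() $p) { 1 / (1 - $p) }

say speedup(0.5);    # 2: removing 50% interpretation overhead gives at best 2x
say speedup(0.25);   # with only 25% overhead, the ceiling drops to about 1.33x
```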

Mind that I'm specifically referring to the improvement due to machine code generation, and not to that due to type specialization, inlining etc. (the domain of 'spesh'). These latter features have resulted in much more significant performance improvements.

Was it worth it?

I think it was.

For me personally, it was a tremendously valuable learning experience which led directly to my current career, writing SQL compilers for Google Cloud.

For the Raku community, even if we never realized the performance improvements that I might have hoped for at the start, I hope that the JIT project (as it exists) has been valuable, if for no other reason than identifying the challenges of JIT compilation for MoarVM. A future effort may be able to do better based on what we learned; and I hope my blog posts are a useful resource from that perspective.

What's next?

Assuming that time and resources were not an issue:

If any of this comes to pass, you'll find my report on it right here. Thanks for reading and until then!

 

Steve Roe: Cascading Accessors

Published by librasteve on 2023-06-05T08:07:11

raku is a great programming language and Dan is a raku module for Data ANalytics that provides DataFrame and Series roles & types. Disclosure: I am the author of Dan.

The Question – Is $ a hack?

A while back, in the raku Discord channel, a discussion arose about the value of the raku $ anonymous state variable. Something like this:

me: I value the $ since it is useful in the Dan DataFrame sort operation like this:

say ~df.sort: { df[$++]<C> };   # sort by col C

them: that use of $ is a hack, why don’t you go like this:

say ~df.sort: *<C>; 
-or-
say ~df.sort: $_<C>;

As the author I felt a little hurt that some of the more forthright community members felt I was resorting to a hack, and a little bemused that my module couldn’t do this. This post aims to explore the situation.

The Context – DataFrame Anatomy

In case you don’t know about DataFrames, they are widely used in popular data analytics packages such as Python Pandas and Rust Polars. Here’s how raku Dan DataFrames are structured:

The code on the side is taken directly from the module implementation.

Here’s how to access a specific data item:

my \df = DataFrame.new( ... );
say df.data[m;n];
- or -
say df[m;n];

Since df.data is an out-of-the-box raku 2d Array, a semicolon index [m;n] will pick an item. Raku also takes index variants such as a range, e.g. ‘2..4’, or a slice operation with ‘*’.

say df[m;*]; # returns a 1d Array with all the items in row m
say df[*;n]; # returns a 1d Array with all the items in col n

In addition to exposing @.data as a public attribute, a Dan DataFrame object delegates positional accessor method calls on df to its df.data attribute – so df[m;n] is the same as df.data[m;n].

The Feature – Cascading Accessors

Standard raku has two kinds of accessors:

#Positional - to access Array items with an index...

my @a = [1,2,3,4]; say @a[0]; #1

#Associative - to access Hash items with a key...

my %h = %(a=>1, b=>2); say %h<a>; #1

Since a raku Dan DataFrame can have named columns and row indexes, both Positional and Associative access can be helpful, and this is provided by DataFrames in other languages.

Here’s how that looks for a raku Dan DataFrame (which is 2d):

my \df = DataFrame.new( [1,2;3,4], index => <A B>, columns => <A B>);

say df[0;0];      #1
say df.data[0;0]; #1
say df[0][0];     #1
say df[0]<A>;     #1
say df<A>[0];     #1
say df<A><A>;     #1
# ^^^ these all return the same item!

This feature is called “cascading accessors” and is mentioned with a different name in the raku design synopses.

It’s worth mentioning that item accessors are not universally liked in the data analytics world. Generally speaking, it is unusual to want to access a single item, as opposed to performing a general operation that applies to all members of the structure. Often these packages use awkward terminology such as ‘iloc’.

Nevertheless, I think that this design, which builds on the thoughtful and rich standard raku accessor capabilities, is worthwhile. Cascading accessors are pretty obvious and user friendly.

Under the Hood

The Dan implementation of cascading accessors is built using the mechanisms provided for raku custom types.

Dan implements cascading accessors as follows:

Here are some examples:

say ~df[0];                 # returns a DataSlice of row 0
say ~df[*]<A>;              # returns a Series of col A
say ~df[0..*-2][1..*-1];    # returns a new DataFrame
say ~df[0..2]^;             # returns a new DataFrame
### postfix '^' here converts an Array of DataSlices into a new DataFrame
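Under the hood, this kind of cascading falls out of implementing the standard AT-POS and AT-KEY hooks on a custom type; a minimal sketch (nothing like Dan’s actual code):

```raku
class MiniFrame does Positional does Associative {
    has @.data;       # array of rows
    has @.columns;    # column names
    method AT-POS($i) { @!data[$i] }          # df[0]
    method AT-KEY($k) {                       # df<A>
        my $n = @!columns.first($k, :k);      # column name to index
        @!data.map(*[$n]).List
    }
}

my \df = MiniFrame.new: data => ([1,2], [3,4]), columns => <A B>;
say df[0];      # [1 2]
say df<B>;      # (2 4)
say df[1][0];   # 3
```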

Dan DataFrame .sort operations

In general, Dan aims to use the standard built in raku operations wherever possible. The use of the built in sort is no exception.

Here is the controversial sort operation again:

say ~df.sort: { df[$++]<C> };   # sort by col C

Let’s take a look step by step:

So, here the sort block uses cascading accessors to pick df[$++]<C> …

Thus the DataFrame is sorted by column C.

Here are some other sort examples from the module synopsis:

say ~df.sort: { .[1] };         # sort by 2nd col (ascending)
say ~df.sort: { -.[1] };        # sort by 2nd col (descending)
say ~df.sort: { df.ix[$++] };   # sort by index
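The same trick works on a plain array of rows and shows what $++ is doing: sort evaluates the key block once per element, in original order, so the counter walks the rows:

```raku
my @rows = [3, 'c'], [1, 'a'], [2, 'b'];
say @rows.sort: { @rows[$++][0] };   # sorted by first column
```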

Conclusion

The question remains “should Dan cascading accessors shun the state variable $?”

On the one hand, the state variable does a fine job of handling a wide range of 2d accessor use cases such as sort. So, in the spirit of the original design, I think that the anonymous state variable $ is a valuable piece of the raku toolbox and works well in the context of the indexing “micro-slang” for Dan.

On the other hand, looking at a regular 2d Array:

my @dr = [[rand xx 4] xx 6];
@dr.sort: *[2];

This Whatever syntax works fine (and is more intuitive), so I agree that this is a reasonable feature request for a future version of Dan, and I will add it to the TODO list (but it is a fairly long list…)

From the “bigger picture” point of view, I think that the ability to bolt on cascading accessors to raku is a testament to the malleability of the language.

Is ‘$’ a hack? I leave it to the reader to judge…. what do you think?

~librasteve

rakudo.org: Rakudo compiler, Release #161 (2023.05)

Published on 2023-05-31T00:00:00

gfldex: Most trusting

Published by gfldex on 2023-05-21T13:49:56

I was looking for a neat way to specify units when working with numbers. When doing dimensional analysis, many physicists like to put units into square brackets to create an additional namespace. We can do the same.

use v6.d;

class Unit { ... }

class SiInt is Int {
    trusts GLOBAL;
    trusts Unit;

    has Unit $!unit;
    method !unit { return-rw $!unit }

    method new(CORE::Int $new) { nextsame }
    method Str { self.Int::Str ~ $!unit.suffix }

    method ACCEPTS(Unit:U $u) { $!unit === $u }
}

class Unit {
    our $.suffix = '';
    our $.long-name = "unit-less";

    method ACCEPTS(SiInt $si) { $si!SiInt::unit === self }
}

class m is Unit { our $.suffix = 'm'; our $.long-name = 'Meter'; }

multi sub postcircumfix:<[ ]>(SiInt $obj, Unit:U $unit) {
    $obj!SiInt::unit === Unit ?? ($obj!SiInt::unit = $unit)
        !! fail(‘Sorry, units can only be set, not changed.’);

    $obj
}
multi sub postcircumfix:<[ ]>(Int $value, Unit:U $unit) { SiInt.new($value)[$unit] }

constant Int = SiInt; # intentional shadowing of CORE::Int

my $i = 42[m];
put [$i, $i.^name]; # 42m SiInt
my Int() $a = 1;
put [$a, $a.^name]; # 1 SiInt

class s is Unit { our $.suffix = 's'; our $.long-name = 'Seconds'; }
multi sub infix:<+>(SiInt $l, SiInt $r) {
    $l!SiInt::unit === Unit ?? callsame()[$r!SiInt::unit]
    !! $r!SiInt::unit === Unit ?? callsame()[$l!SiInt::unit]
        !! $l!SiInt::unit === $r!SiInt::unit ?? nextsame()
           !! fail(„Unit mismatch between $l and $r“)
}

my $s = 60[s];

say $i + $a; # 43m
say $s + $i; # Unit mismatch between 60s and 42m

The idea is to have a numerical type that is by default unit-less. A unit can be added (but not changed) with the square bracket postcircumfix. Since I add type objects for each unit, I don’t have to mess around with strings and can multi-dispatch if needed. Since I want direct access to the unit, I tell the class to trust the package the operators are defined in. (This could be a module, of course.) Which happens to be an ENODOC.

I have to use a forward declaration to get ACCEPTS to get hold of $!unit. Subsequently, multi-dispatch works just fine.

multi sub fancy(Int $value where m) { #`[fancy calculation goes here] }
multi sub fancy(Int) { fail ‘Can only be fancy with Unit "m".’ }
fancy($i);

Since SiInt is just an Int, all built-ins will work, so long as the unit is restored after using them. Being able to trust operators allows them to access the entire class, without having to cheat with use nqp;.

Because Raku treats types as values, I can calculate a compound unit.

class Meters-per-Second is Unit { our $.suffix = 'm/s'; our $.long-name = 'Meters per Second'; }
multi sub infix:</>(m, s) { Meters-per-Second }

sub speed($d where m, $t where s) { ($d / $t).Int.[m/s] }

my Int $fast = speed(500[m], 1[s]);
say $fast; # 500m/s

I’m quite pleased with being able to extend the type-system so easily without having to invent a complete new DSL. This aids composability greatly.

rakudo.org: Rakudo compiler, Release #160 (2023.04)

Published on 2023-04-21T00:00:00

Raku Advent Calendar: The 2022 Raku Advent Posts

Published by Elizabeth Mattijsen on 2022-12-26T01:01:00

(in chronological order, with comment references)