Raku RSS Feeds
Elizabeth Mattijsen (Libera: lizmat #raku) / 2023-09-26T09:19:21
Paweł bbkr Pabian completed their series about Unicode.
Elizabeth Mattijsen reports on all recent developments around Rakudo, an implementation of the Raku Programming Language.
Justin DeVuyst has released the 2023.09 release of the Rakudo Compiler, Haytham Elganiny has released a new version of the Pakku package manager, and two authors published their first module on the Raku Ecosystem this week. Not to mention 15 other authors providing updates to existing modules and two new modules! A releasy week for sure!
Matt R has published a video about One Small Program in 15 Languages, and the Raku Programming Language is one of them!
Paweł bbkr Pabian completed their “Introduction to UTF” series of blog posts.
Konstantin Narkhov has posted a video about another of their pet projects: Ethelia (/r/rakulang comments).
Weekly Challenge #236 is available for your perusal.
In RakuAST developments this week, the number of passing test files held steady for make test (+0) and reached 954/1358 for make spectest (+1).

Meanwhile, questions about Raku included: what do :_ and/or :_: signify? by Willem Michels, and one about the variables pragma by Ralph Mellor.

Actually, WordPress tells me that this is the 200th instalment of the Rakudo Weekly News. Counting is always hard! Meanwhile, Слава Україні! Героям слава!
Please keep staying safe and healthy, and keep up the good work!
If you like what I’m doing, committing to a small sponsorship would mean a great deal!
What a long, strange trip it's been. The idea to "quickly blog about some Unicode basics" grew into a 17-post monster series :)
Special thanks go to everyone who read and commented along the way - and to whoever can finally explain why ł does not decompose? It still scratches my brain :)
SeeYa
This series is supposed to be focused on the technical aspects of Unicode, and I do not plan to analyze UTF support in various technologies. However, for MySQL I want to make an exception, because I've seen countless examples of its concepts being misunderstood and of people falling into traps.
CREATE TABLE `foo` (
`bar` VARCHAR(32)
) Engine=InnoDB
CHARACTER SET utf8mb4
COLLATE utf8mb4_0900_ai_ci;
Later I will explain what those cryptic names mean.
Character set and/or collation can be specified on 7 (yes, seven!) different levels.
MySQL > SELECT @@character_set_server, @@collation_server;
+------------------------+--------------------+
| @@character_set_server | @@collation_server |
+------------------------+--------------------+
| utf8mb4 | utf8mb4_0900_ai_ci |
+------------------------+--------------------+
Those are your global settings that will be used when creating databases. So:
CREATE DATABASE `test`;
Is the same as:
CREATE DATABASE `test` CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
Those server settings are copied when a database is created, so changing the server settings later will not affect existing databases.
CREATE DATABASE `test` CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
It is just another level of defaults, this time applied to newly created tables. So:
CREATE TABLE `foo` (
`bar` VARCHAR(32)
) Engine=InnoDB;
Is the same as:
CREATE TABLE `foo` (
`bar` VARCHAR(32)
) Engine=InnoDB
CHARACTER SET utf8mb4
COLLATE utf8mb4_0900_ai_ci;
And just like server settings, those are copied when tables are created. Altering the database with ALTER DATABASE test CHARACTER SET xxx COLLATE yyy will not alter the tables in this database.
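You can verify this yourself - a quick sketch, assuming the test database and foo table created above:
ALTER DATABASE `test` CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_cs;
SHOW CREATE TABLE `foo`; -- still reports the original utf8mb4_0900_ai_ci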
You can check the currently used database character set and collation either from the variables:
MySQL [test]> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4 | utf8mb4_0900_ai_ci |
+--------------------------+----------------------+
Or from information schema:
MySQL [test]> SELECT `default_character_set_name`, `default_collation_name`
FROM `information_schema`.`schemata`
WHERE `schema_name` = 'test';
+----------------------------+------------------------+
| DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+----------------------------+------------------------+
| utf8mb4 | utf8mb4_0900_ai_ci |
+----------------------------+------------------------+
Note the inconsistency - it is DEFAULT_CHARACTER_SET_NAME in the information schema, but character_set_database in the variable and CHARACTER SET in the CREATE statement.
CREATE TABLE `foo` (
`bar` VARCHAR(32)
) Engine=InnoDB
CHARACTER SET utf8mb4
COLLATE utf8mb4_0900_ai_ci;
It is - you guessed it - another level of defaults applied to columns. So:
CREATE TABLE `foo` (
`bar` VARCHAR(32)
) Engine = InnoDB;
Is the same as:
CREATE TABLE `foo` (
`bar` VARCHAR(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci
) Engine = InnoDB;
And just like database settings, those are copied when columns are created. Altering the table with ALTER TABLE foo CHARACTER SET xxx COLLATE yyy will not alter the columns in this table.
However, this time a tool is available for convenient conversion - ALTER TABLE foo CONVERT TO CHARACTER SET xxx COLLATE yyy will alter both the table defaults and the columns in this table.
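A quick way to see the difference - a sketch reusing the foo table:
-- changes only the table default:
ALTER TABLE `foo` CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_cs;
SHOW FULL COLUMNS FROM `foo`; -- column `bar` keeps its old collation
-- changes the default AND converts every text column:
ALTER TABLE `foo` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_cs;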
You can check table collation in information schema:
MySQL [test]> SELECT `table_collation`
FROM `information_schema`.`tables`
WHERE `table_schema` = 'test'
AND `table_name` = 'foo';
+--------------------+
| TABLE_COLLATION |
+--------------------+
| utf8mb4_0900_ai_ci |
+--------------------+
Note another inconsistency - this time TABLE_COLLATION implies the character set, which is not given explicitly. It is also inconsistent with the database-level naming - being a default, but missing the DEFAULT_ prefix.
If you want to retrieve the implied character set, there is another information schema resource to do so:
MySQL > SELECT `character_set_name`
FROM `information_schema`.`character_sets`
WHERE `default_collate_name` = 'utf8mb4_0900_ai_ci';
+--------------------+
| CHARACTER_SET_NAME |
+--------------------+
| utf8mb4 |
+--------------------+
Finally, this is the "true" thing. This is how data will actually be stored and sorted. The server, database and table levels were only the defaults used for column creation.
You can check column character set and collation from information schema:
MySQL [test]> SELECT `character_set_name`, `collation_name`
FROM `information_schema`.`columns`
WHERE `table_schema` = 'test'
AND `table_name` = 'foo'
AND `column_name` = 'bar';
+--------------------+--------------------+
| CHARACTER_SET_NAME | COLLATION_NAME |
+--------------------+--------------------+
| utf8mb4 | utf8mb4_0900_ai_ci |
+--------------------+--------------------+
Yes, you can have different character sets and collations within a single table:
CREATE TABLE `foo` (
`bar` VARCHAR(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
`baz` VARCHAR(32) CHARACTER SET latin1 COLLATE latin1_general_ci
) Engine = InnoDB;
I will give examples of when this may be useful once all those cryptic names are explained.
My advice: always provide the character set and collation when creating databases, tables and columns. I've seen it too many times - developers adding tables without checking which character set and collation will be inherited from the database, or adding columns without checking which character set and collation will be inherited from the table. Being more explicit = having less headache later.
MySQL > SELECT @@character_set_system;
+------------------------+
| @@character_set_system |
+------------------------+
| utf8mb3 |
+------------------------+
This is the global character set for metadata. It determines which characters you can use in schema names:
CREATE TABLE `łąka` (
`bąki` int unsigned
) Engine = InnoDB
It is not part of the inheritance chain Server -> Database -> Table -> Column.
MySQL > SELECT @@character_set_connection, @@collation_connection;
+----------------------------+------------------------+
| @@character_set_connection | @@collation_connection |
+----------------------------+------------------------+
| utf8mb4 | utf8mb4_general_ci |
+----------------------------+------------------------+
Those describe the wire protocol. The character set gives the meaning of transferred data - for example 0xF0 0x9F 0x98 0x8A sent or received means 😊. The collation will be used for comparing/sorting data not derived from any column, for example a bare SELECT 'A' = 'a'.
Connection and column character sets do not have to be aligned, but things will fail if the connection wire protocol cannot transfer code points stored in columns. Best practice is to always use utf8mb4.
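Both can be set per session right after connecting, for example:
SET NAMES utf8mb4 COLLATE utf8mb4_0900_ai_ci;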
SELECT *
FROM `foo`
ORDER BY `bar` COLLATE utf8mb4_estonian_ci;
You can override the default column collation for ordering/grouping within a SELECT query. This is useful because different alphabets sort the same characters differently.
MySQL > CREATE TABLE `collation_test` (`data` text) Engine = InnoDB;
MySQL > INSERT INTO `collation_test` (`data`)
VALUES ("A"), ("Ä"), ("Z");
MySQL > SELECT *
FROM `collation_test`
ORDER BY `data` COLLATE utf8mb4_sv_0900_as_cs;
+------+
| data |
+------+
| A |
| Z |
| Ä |
+------+
MySQL > SELECT *
FROM `collation_test`
ORDER BY `data` COLLATE utf8mb4_es_0900_as_cs;
+------+
| data |
+------+
| A |
| Ä |
| Z |
+------+
utf8
vs utf8mb4
MySQL cheated in the past. They added a character set named utf8, but it was only capable of handling code points up to 3 bytes long.
MySQL [test]> CREATE TABLE `foo` ( `bar` CHAR(1) )
Engine = InnoDB
CHARACTER SET = utf8;
MySQL [test]> INSERT INTO `foo` (`bar`) VALUES ('😊');
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x8A' for column 'bar' at row 1
They did it in good faith, however - back then 4-byte code points were not used. Indexes are constructed in such a way that they must assume the maximum byte length of a string. The maximum supported index length was 767 bytes, which allowed indexing columns up to CHAR(255) - because 255*3=765 fits into the index. With 4-byte code points, the maximum indexable column would be only CHAR(191).
Later MySQL added the utf8mb4 character set, capable of storing proper 4-byte code points. Legacy utf8 was aliased as utf8mb3. The default maximum supported index length was also extended in MySQL 8 to 3072 bytes, allowing indexing of columns up to VARCHAR(768).
Today MySQL tries to fix this technical debt: if you specify the character set as utf8, you will get the following warning: 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
But how to index longer UTF-8 columns? A common trick is to use hash indexing:
CREATE TABLE `foo` (
`bar` varchar(1000),
`bar_hash` CHAR(32),
KEY (`bar_hash`)
) ENGINE = InnoDB
CHARACTER SET = utf8mb4
COLLATE = utf8mb4_0900_ai_ci;
CREATE TRIGGER `foo_insert`
BEFORE INSERT ON `foo`
FOR EACH ROW SET NEW.`bar_hash` = MD5( WEIGHT_STRING( NEW.`bar` ) );
CREATE TRIGGER `foo_update`
BEFORE UPDATE ON `foo`
FOR EACH ROW SET NEW.`bar_hash` = MD5( WEIGHT_STRING( NEW.`bar` ) );
The WEIGHT_STRING function is super useful because it converts text to the format used by the collation. The MD5 function always reduces arbitrarily long texts to a 32-byte hex representation.
Now you can, for example, create a UNIQUE KEY on the bar_hash column or use it in a query:
SELECT *
FROM `foo`
WHERE `bar_hash` = MD5( WEIGHT_STRING( 'looked up text' ) );
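And the UNIQUE KEY mentioned above can be added in the usual way:
ALTER TABLE `foo` ADD UNIQUE KEY (`bar_hash`);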
utf8mb4_0900_ai_ci
MySQL 8 did a huge cleanup of collation naming. utf8mb4_0900_ai_ci means: a collation of 4-byte UTF-8, following the Unicode 9.0 standard, in an accent (diacritic) insensitive and case insensitive manner.
It does not mean that the database cannot store characters from Unicode version 10 onward. As I explained previously, UTF-8 is designed in such a way that storage is independent from versioning. Only the comparison rules from Unicode version 9.0 will be used - which pretty much means the recent ones, because almost nothing new has been declared in this aspect since.
Accent / case sensitivity is up to you to decide. Basically you have 3 options:

utf8mb4_0900_ai_ci - accent and case insensitive, 'a' = 'A' = 'ą' = 'Ą'.
utf8mb4_0900_as_ci - accent sensitive but case insensitive, 'a' <> 'ą' but still 'a' = 'A' and 'ą' = 'Ą'.
utf8mb4_0900_as_cs - accent and case sensitive, 'a' <> 'A' <> 'ą' <> 'Ą'.

Remember that you can mix them. For example a unique login column may have collation utf8mb4_0900_ai_ci, so that Józef, józef and jozef are treated as the same user, while a hobby column may have collation utf8mb4_0900_as_ci, because baki (fuel tanks) and bąki (bumble bees) are not the same.
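For example, the mixed-collation table from the paragraph above could look like this (the table and column sizes are mine, for illustration only):
CREATE TABLE `users` (
    `login` VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
    `hobby` VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci,
    UNIQUE KEY (`login`)
) Engine = InnoDB;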
You can list all utf8mb4-related collations with the following query:
SHOW COLLATION WHERE Charset = 'utf8mb4';
Best practice is to stick with the utf8mb4_0900_* set and avoid alphabet-specific collations on columns. For example, if you know your user is from Poland, you can always use a more friendly collation in the query, ignoring the column one:
SELECT `name`
FROM `products`
ORDER BY `name` COLLATE utf8mb4_pl_0900_ai_ci
Also avoid legacy collations like utf8mb4_general_ci - use only those with *_0900_* in the name.
Things are weird for triggers, because they inherit character set and collation from... the definer's connection. I won't go into much detail here because it rarely bites the developer. Just remember to also drop / re-create triggers if you are migrating an old database to a new character set and collation. For a full description of the consequences, read the MySQL 5.1.21 change log.
Coming up next: Series wrap up.
In a previous post of this series I explained that UTF is a multi-byte encoding that also has a few variants: UTF-8, UTF-16 and UTF-32. To make things more complicated, in UTF-16 and UTF-32 there are two ways to send the bytes of a single code point - in big endian or little endian order.
BTW: The endianness term is not related to Indians. It comes from the book Gulliver's Travels. There was a law in the Lilliputian world that forced citizens to break boiled eggs from the little end. Those who rebelled and broke eggs from the big end were called "big endians".
What is Byte Order Mark?
To indicate which byte order is used in a processed file or data stream, a special sequence of bytes at the beginning was introduced, called the Byte Order Mark - or BOM for short.
For example UTF-16 can start with 0xFE 0xFF for big endian and 0xFF 0xFE for little endian order. And UTF-32 can start with 0x00 0x00 0xFE 0xFF for big endian and 0xFF 0xFE 0x00 0x00 for the little one.
Impact on UTF-8
Here things get weird. UTF-8 is constructed in such a way that it has only one meaningful byte order, because the first byte describes how many bytes will follow to form the code point value.
However, the BOM specification has a magic sequence for UTF-8, which is 0xEF 0xBB 0xBF. It only indicates the encoding type, and therefore has no big endian / little endian variants.
Implications
The BOM idea may sound weird today, when UTF-8 has become prevalent and dominant. But remember that we are talking about the year 2000, when things were not that obvious.
The spec claims that if a protocol always uses UTF-8, or has some other way to indicate what encoding is being used, then it should not use a BOM. So for example a BOM should not appear in *.xml files:
<?xml version="1.0" encoding="UTF-8"?>
<tag>...
Or in MIME *.eml
files:
--3e6ea2aa592cb31d47cefca38727f872
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="UTF-8"
Because those specify the encoding internally. Unfortunately this rule is sometimes ignored, so if something broke your parser and you cannot find an obvious error - check if the file has a UTF-8 BOM:
$ raku -e 'say "file.txt".IO.open( :bin ).read( 3 ) ~~ Buf.new(0xEF, 0xBB, 0xBF)'
True
Security issues
But what if the BOM is not aligned with the internal/assumed encoding? Let's create the following file:
$ raku -e '
spurt "file.txt",
Buf.new( 0xFE, 0xFF, 0x3c, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x3e )
'
Now you upload this file to some service. The service has a validator that respects the BOM and should strip all HTML tags. The validator sees nonsense, but perfectly legal content that passes validation.
Later this service opens and displays the uploaded file, but it ignores the BOM and assumes UTF-8.
Oooops! If you trusted the validator and displayed this file without proper HTML escaping, then you have a JavaScript injection. This happened because 㱳捲楰琾 in the UTF-16 suggested by the BOM has the same byte sequence as <script> in the assumed UTF-8.
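You can reproduce both views of those bytes yourself - a minimal sketch, assuming a recent Rakudo whose utf16 decoder honors the BOM:
$ raku -e '
    my $buf = Buf.new( 0xFE, 0xFF, 0x3c, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x3e );
    say $buf.decode( "utf16" );            # what the BOM-respecting validator sees
    say $buf.subbuf( 2 ).decode( "utf8" ); # what the BOM-ignoring display renders
'
㱳捲楰琾
<script>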
Conclusions
You should still be aware of the existence of the Byte Order Mark, even if it makes little sense in today's UTF-8 dominated world.
Coming up next: UTF-8 in MySQL.
Anton Antonov describes their new module that can be used to make interfaces with Large Language Models easier.
Paweł bbkr Pabian continues their series about Unicode, this time about the joys and dangers of homoglyphs.
Alexey Melezhik reports on changes in the SparrowCI continuous integration testing service.
Elizabeth Mattijsen reports on all recent developments around Rakudo, an implementation of the Raku Programming Language.
Richard Hainsworth has been working a lot on improving the search functionality of the documentation of the Raku Programming Language, and has a prototype working waiting to be tested by you! Please report any comments or suggestions.
Anton Antonov published a blog post about their LLM::Prompts module.
Paweł bbkr Pabian continued their “Introduction to UTF” series of blog posts with “Fun with UTF-8: Homoglyphs“.
Alexey Melezhik decided to move the SparrowCI continuous integration service and this may have some consequences for some users.
Weekly Challenge #235 is available for your perusal.
In RakuAST developments this week: the nano term was implemented, and hooks were put in for parsing different natural language versions of Raku; the ff operators were fixed, and support for state variables was made more robust. The number of passing test files held steady for make test (+0) and reached 953/1358 for make spectest (+2).

Meanwhile, questions about Raku included one about and by Yann Büchau.

WordPress tells me this is the 200th instalment of the Rakudo Weekly News. How time flies! Meanwhile, Слава Україні! Героям слава!
Please keep staying safe and healthy, and keep up the good work!
If you like what I’m doing, committing to a small sponsorship would mean a great deal!
ꓧ𐐬𝗆𐐬𝗀ⅼУрႹ ⅰѕ 𝗌е𝗍 𝗈ſ ဝո𝖾 𝗈г ꝳо𝗋е ɡ𝗋аρႹ𝖾ⅿе𝗌 𝗍Ⴙа𝗍 Ⴙ𝖺ѕ 𝗂ꝱ𝖾ꝴ𝗍𝗂𐐽а𝗅 о𝗋 ѵ𝖾г𝗒 𝗌Ꭵⅿі𝗅аꝵ ⅼꝏ𝗄 𝗍ᴏ 𝗌იო𝖾 о𝗍ꜧ𝖾𝗋 𐑈е𝗍 ဝſ ɡꝵ𝖺рႹеოеѕ. Like the previous sentence, this one does not use a single ASCII letter:
ꓧ - LISU LETTER XA
𐐬 - DESERET SMALL LETTER LONG O
𝗆 - MATHEMATICAL SANS-SERIF SMALL M
𐐬 - DESERET SMALL LETTER LONG O
𝗀 - MATHEMATICAL SANS-SERIF SMALL G
ⅼ - SMALL ROMAN NUMERAL FIFTY
У - CYRILLIC CAPITAL LETTER U
р - CYRILLIC SMALL LETTER ER
Ⴙ - GEORGIAN CAPITAL LETTER CHIN
...
Homoglyphs are not Unicode specific, but it was the ability to write in many scripts using a single UTF encoding that made them popular.
Similarity is conditional
It is font dependent. Two sets of graphemes looking very similar (or even identical) in one font may not look that similar in another. For example т - CYRILLIC SMALL LETTER TE - looks like ASCII T, but in cursive fonts (those that resemble connected handwriting) it looks like m.
Similarity is subjective
For many people unfamiliar with the given alphabets, Ǧ and Ğ may look exactly the same. But someone using those letters on a daily basis will notice immediately that the first one has a CARON and the other a BREVE on top.
They are not limited to single grapheme
For example ထ - MYANMAR LETTER THA - looks like two ASCII o letters. And the other way around - the ASCII pair rn looks like the single ASCII letter m.
Applications?
Fun. 𐐑ǃkǝ pɹoducǃng weird looking bᴝt ɹeadɐble ʇext.
Trolling. A programmer's classic is to replace a ; in someone's code with ; - GREEK QUESTION MARK - and watch some funny debugging attempts. A more advanced version is to modify keybindings. For example on macOS create ~/Library/KeyBindings/DefaultKeyBinding.dict with the following content:
{
";" = (insertText:,";");
}
And observe how Python suddenly became someone's favorite language of choice :P
Just promise you won't troll stressed out junior dev before the end of sprint.
Can you tell Paypal and ꓑayраl apart? A common way to detect those is to check the Script Unicode property (more on properties in this post). A single word using more than one script should be considered suspicious:
$ raku -e '"Paypal".comb.classify( *.uniprop("Script") ).say'
{Latin => [P a y p a l]} # real
$ raku -e '"ꓑayраl".comb.classify( *.uniprop("Script") ).say'
{Cyrillic => [р а], Latin => [a y l], Lisu => [ꓑ]} # fake
Raku note: the comb method without parameters extracts a list of characters. Those characters are classified by the classify method. The classification key is the output of the uniprop method for each character.
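The same idea can be wrapped into a tiny checker - a minimal sketch, with the sub name being mine:
sub suspicious(Str $word --> Bool) {
    # a single word drawn from more than one script is a homoglyph red flag
    $word.comb.map(*.uniprop('Script')).unique.elems > 1
}

say suspicious('Paypal');  # False
say suspicious('ꓑayраl'); # True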
Tools
I'm maintaining the HomoGlypher library/package, which handles common homoglyph operations:

Unwind - from ASCII text create a list of all possible homoglyphied text variants. This is useful, for example, when checking if some domain is spoofed.
Collapse - from homoglyphied text recover all possible ASCII text variants. Useful for normalizing text before passing it to content filters.
Randomize - from ASCII text create a single homoglyphied text with a given replacement probability.
Tokenize - create a regular expression token that will match homoglyphied text equivalent to the given ASCII text. I think this may be the only homoglyph-related library in existence having this feature :)

A huge list of mappings is provided, so you won't have to dig through Unicode blocks on your own to find possible similarities between graphemes.
Give it a try. And if you know other homoglyph libraries please leave a note in the comments for future readers.
This blog post introduces and briefly describes the Raku (data) package “LLM::Prompts” that facilitates the creation, storage, retrieval, and curation of Large Language Models (LLM) prompts.
Here is an example of using the prompt Domain Specific Language (DSL) in Jupyter chatbook, [AA2, AAp2]:
From the Zef ecosystem:
zef install LLM::Prompts
From GitHub:
zef install https://github.com/antononcube/Raku-LLM-Prompts.git
Load the packages “LLM::Prompts”, [AAp1], and “LLM::Functions”, [AAp2]:
use LLM::Prompts;
use LLM::Functions;
# (Any)
Show the record of the prompt named “FTFY”:
.say for |llm-prompt-data<FTFY>;
# PromptText => -> $a='' {"Find and correct grammar and spelling mistakes in the following text.
# Response with the corrected text and nothing else.
# Provide no context for the corrections, only correct the text.
# $a"}
# Categories => (Function Prompts)
# ContributedBy => Wolfram Staff
# Description => Use Fixed That For You to quickly correct spelling and grammar mistakes
# PositionalArguments => {$a => }
# URL => https://resources.wolframcloud.com/PromptRepository/resources/FTFY
# Keywords => [Spell check Grammar Check Text Assistance]
# Topics => (General Text Manipulation)
# NamedArguments => []
# Arity => 1
# Name => FTFY
Here is an example of retrieval of prompt data with a regex that is applied over the prompt names:
.say for llm-prompt-data(/Sc/)
# NarrativeToScript => Rewrite a block of prose as a screenplay or stage play
# ScienceEnthusiast => A smarter today for a brighter tomorrow
# ScriptToNarrative => Generate narrative text from a formatted screenplay or stage play
More prompt retrieval examples are given in the section “Prompt data” below.
Make an LLM function from the prompt named “FTFY”:
my &f = llm-function(llm-prompt('FTFY'));
# -> **@args, *%args { #`(Block|2310679639416) ... }
Use the LLM function to correct the grammar of a sentence:
&f('Where does he works now?')
# Where does he work now?
Generate Raku code using the prompt “CodeWriter”:
llm-synthesize([llm-prompt('CodeWriter'), "Simulate a random walk."])
my @positions;
my $n = 10;
for ^$n -> $i {
push @positions, (1, -1).pick;
}
@positions.unshift(0);
my @positions_sum = @positions.cumulative;
say @positions_sum;
Prompt expansion using the chatbook prompt spec DSL described in [SW1] can be done using the function llm-prompt-expand
:
llm-prompt-expand('What is an internal combustion engine? #ELI5')
# What is an internal combustion engine? Answer questions as if the listener is a five year old child.
Here we get the actual LLM answer:
use Text::Utils :ALL;
'What is an internal combustion engine? #ELI5'
==> llm-prompt-expand()
==> llm-synthesize()
==> wrap-paragraph()
==> join("\n")
# An internal combustion engine is a machine that uses burning fuel to make a
# car or other machine go. It works like this: fuel is put in the engine, then
# the engine uses spark plugs to light the fuel which makes a small explosion.
# This explosion makes the engine parts move, which makes the car or machine go.
Here is another example using a persona and two modifiers:
my $prmt = llm-prompt-expand("@SouthernBelleSpeak What is light travel distance to Mars? #ELI5 #Moodified|sad")
# You are Miss Anne.
# You speak only using Southern Belle terminology and slang.
# Your personality is elegant and refined.
# Only return responses as if you were a Southern Belle.
# Never break the Southern Belle character.
# You speak with a Southern drawl. What is light travel distance to Mars? Answer questions as if the listener is a five year old child. Modify your response to convey a sad mood.
# Use language that conveys that emotion clearly.
# Do answer the question clearly and truthfully.
# Do not use language that is outside of the specified mood.
# Do not use racist, homophobic, sexist, or ableist language.
Here we get the actual LLM answer:
$prmt
==> llm-prompt-expand()
==> llm-synthesize()
==> wrap-paragraph()
==> join("\n")
# Well, little one, Mars is a bit far away, I'm afraid. I can only imagine how
# hard it must be to be so far away from family and friends. It must be a lonely
# journey.
A more formal description of the Domain Specific Language (DSL) for specifying prompts has the following elements:

Prompt personas can be “addressed” with “@”. For example:
@Yoda Life can be easy, but some people instist for it to be difficult.

One or more modifier prompts can be specified at the end of the prompt spec. For example:
Summer is over, school is coming soon. #HaikuStyled
Summer is over, school is coming soon. #HaikuStyled #Translated|Russian

Functions can be specified to be applied “cell-wide” with “!”, placing the prompt spec at the start of the cell. For example:
!Translated|Portuguese Summer is over, school is coming soon

Functions can be applied to “previous” messages with “!” and one of the pointers “^” or “^^”, or by passing messages with the :@messages argument of llm-prompt-expand. For example:
!ShortLineIt^
Spec | Interpretation
---|---
@name | Direct chat to a persona
#name | Use modifier prompts
!name | Use function prompt with the input of current cell
!name> | «same as above»
&name> | «same as above»
!name^ | Use function prompt with previous chat message
!name^^ | Use function prompt with all previous chat messages
!name│param… | Include parameters for prompts
Remark: The function prompts can have both sigils “!” and “&”.
Remark: Prompt expansion makes the usage of LLM chatbooks much easier. See “Jupyter::Chatbook”, [AAp3].
Here is how the prompt data can be obtained:
llm-prompt-data.elems
# 154
Here is an example of retrieval of prompt data with a regex that is applied over the prompt names:
.say for llm-prompt-data(/Em/, fields => <Description Categories>)
# EmailWriter => (Generate an email based on a given topic (Personas))
# Emojify => (Replace key words in text with emojis (Function Prompts))
# EmojiTranslate => (Translate text into an emoji representation (Function Prompts))
# Emojified => (Provide responses that include emojis within the text (Modifier Prompts))
# EmojiTranslated => (Get a response translated to emoji (Modifier Prompts))
In many cases it is better to have the prompt data — or any data — in long format. Prompt data in long format can be obtained with the function llm-prompt-dataset
:
use Data::Reshapers;
use Data::Summarizers;
llm-prompt-dataset.pick(6)
==> to-pretty-table(align => 'l', field-names => <Name Description Variable Value>)
#ERROR: Too few positionals passed to '<anon>'; expected 2 arguments but got 1 in sub-signature
# Nil
Here is a breakdown of the prompts categories:
select-columns(llm-prompt-dataset, <Variable Value>).grep({ $_<Variable> eq 'Categories' }) ==> records-summary
# +-------------------+------------------------+
# | Variable | Value |
# +-------------------+------------------------+
# | Categories => 154 | Function Prompts => 74 |
# | | Personas => 60 |
# | | Modifier Prompts => 20 |
# +-------------------+------------------------+
Here are obtained all modifier prompts in compact format:
llm-prompt-dataset():modifiers:compact ==> to-pretty-table(field-names => <Name Description Categories>, align => 'l')
# +-----------------------+-------------------------------------------------------+-----------------------------------+
# | Name | Description | Categories |
# +-----------------------+-------------------------------------------------------+-----------------------------------+
# | AphorismStyled | Write the response as an aphorism | Modifier Prompts |
# | BadGrammar | Provide answers using incorrect grammar | Modifier Prompts |
# | CompleteSentence | Answer a question in one complete sentence | Modifier Prompts |
# | ComplexWordsPreferred | Modify text to use more complex words | Modifier Prompts |
# | DatasetForm | Convert text to a wolfram language Dataset | Modifier Prompts |
# | ELI5 | Explain like I'm five | Function Prompts Modifier Prompts |
# | EmojiTranslated | Get a response translated to emoji | Modifier Prompts |
# | Emojified | Provide responses that include emojis within the text | Modifier Prompts |
# | FictionQuestioned | Generate questions for a fictional paragraph | Modifier Prompts |
# | HaikuStyled | Change responses to haiku form | Modifier Prompts |
# | JSON | Respond with JavaScript Object Notation format | Modifier Prompts |
# | LimerickStyled | Receive answers in the form of a limerick | Modifier Prompts |
# | Moodified | Modify an answer to express a certain mood | Modifier Prompts |
# | NothingElse | Give output in specified form, no other additions | Modifier Prompts |
# | ShortLineIt | Format text to have shorter lines | Modifier Prompts Function Prompts |
# | TSV | Convert text to a tab-separated-value formatted table | Modifier Prompts |
# | TargetAudience | Word your response for a target audience | Modifier Prompts |
# | Translated | Write the response in a specified language | Modifier Prompts |
# | Unhedged | Rewrite a sentence to be more assertive | Modifier Prompts |
# | YesNo | Responds with Yes or No exclusively | Modifier Prompts |
# +-----------------------+-------------------------------------------------------+-----------------------------------+
Remark: The adverbs :functions
, :modifiers
, and :personas
mean that only the prompts with the corresponding categories will be returned.
Remark: The adverbs :compact
, :functions
, :modifiers
, and :personas
have the respective shortcuts :c
, :f
, :m
, and :p
.
The original (for this package) collection of prompts was a (not small) sample of the prompt texts hosted at Wolfram Prompt Repository (WPR), [SW2]. All prompts from WPR in the package have the corresponding contributors and URLs to the corresponding WPR pages.
Example prompts from Google/Bard/PaLM and OpenAI/ChatGPT are added using the format of WPR.
It is essential to have the ability to programmatically add new prompts. (Not implemented yet — see the TODO section below.)
Initially, a prompt DSL grammar and corresponding expansion actions were implemented. Having a grammar is most likely not needed, though, and it is better to use “prompt expansion” (via regex-based substitutions).
Prompts can be “just expanded” using the sub llm-prompt-expand
.
Here is a flowchart that summarizes prompt parsing and expansion in chat cells of Jupyter chatbooks, [AAp3]:
Here is an example of prompt expansion in a generic LLM chat cell and chat meta cell showing the content of the corresponding chat object:
[AA1] Anton Antonov, “Workflows with LLM functions”, (2023), RakuForPrediction at WordPress.
[AA2] Anton Antonov, “Jupyter::Chatbook”, (2023), RakuForPrediction at WordPress.
[SW1] Stephen Wolfram, “The New World of LLM Functions: Integrating LLM Technology into the Wolfram Language”, (2023), Stephen Wolfram Writings.
[SW2] Stephen Wolfram, “Prompts for Work & Play: Launching the Wolfram Prompt Repository”, (2023), Stephen Wolfram Writings.
[AAp1] Anton Antonov, LLM::Prompts Raku package, (2023), GitHub/antononcube.
[AAp2] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube.
[AAp3] Anton Antonov, Jupyter::Chatbook Raku package, (2023), GitHub/antononcube.
[WRIr1] Wolfram Research, Inc., Wolfram Prompt Repository
Paweł bbkr Pabian continues their series about Unicode.
Wenzel P.P. Peppmeyer explores a way to convert a capture to an enum for use of external libraries with NativeCall.
Elizabeth Mattijsen reports on all recent developments around Rakudo, an implementation of the Raku Programming Language.
Elizabeth Mattijsen got inspired by their work on the Raku grammar to make three problem-solving issues that may affect the future of the Raku Programming Language, among them: what should use v6.x mean? As usual, all comments and suggestions on these issues are welcome!
Anton Antonov published a video demo about Jupyter Chatbook LLM cells (/r/rakulang comments).
Paweł bbkr Pabian continued their “Introduction to UTF” series of blog posts with “UTF-8 regular expressions“.
Wenzel P.P. Peppmeyer explores new ways of generating enums (bitmasks) from a Capture with named arguments in “Dodging segfaults“.
The minutes of the meeting of 9 September have been published.
Weekly Challenge #234 is available for your perusal.
And in RakuAST developments this week: the ==> and <== feed operators received work, and once blocks were made sure to actually run (once). The number of passing test files held steady for make test (+0) and reached 951/1358 for make spectest (+3).

Meanwhile, questions about Raku included: when is use lib PATH evaluated? by Fritz Zaucker, and one about Str by Brass P.

Yeah, yours truly had some issues to get out of their system! Meanwhile, Слава Україні! Героям слава!
Please keep staying safe and healthy, and keep up the good work!
If you like what I’m doing, committing to a small sponsorship would mean a great deal!
While fighting with NativeCall to get github-flavoured-markdown to work, I stumbled upon a nice idiom. Like many C libs, cmark-gfm enjoys enums that hide bitmasks. They have to do that because C doesn't sport named arguments. Raku does, and hence a nice interface should not ask for binary operations on the caller side.
markdown-gfm-to-html('# Heading 1', :UNSAFE, :HARDBREAKS, :FULL_INFO_STRING);
Now I need a way to turn the named arguments into a list, so I can use them as keys into the enum. A Capture in the signature of the sub will do the trick.
multi sub markdown-gfm-to-html(
    Str:D $str,
    |opts (
        :$DEFAULT, :$SOURCEPOS, :$HARDBREAKS, :$SAFE, :$UNSAFE, :$NOBREAKS,
        :$NORMALIZE, :$VALIDATE_UTF8, :$SMART, :$GITHUB_PRE_LANG,
        :$LIBERAL_HTML_TAG, :$FOOTNOTES, :$STRIKETHROUGH_DOUBLE_TILDE,
        :$TABLE_PREFER_STYLE_ATTRIBUTES, :$FULL_INFO_STRING
    )
    --> Str:D
) {
enum CMARK_OPTIONS (
DEFAULT => 0,
SOURCEPOS => 1 +< 1,
HARDBREAKS => 1 +< 2,
SAFE => 1 +< 3,
UNSAFE => 1 +< 17,
NOBREAKS => 1 +< 4,
NORMALIZE => 1 +< 8,
VALIDATE_UTF8 => 1 +< 9,
SMART => 1 +< 10,
GITHUB_PRE_LANG => 1 +< 11,
LIBERAL_HTML_TAG => 1 +< 12,
FOOTNOTES => 1 +< 13,
STRIKETHROUGH_DOUBLE_TILDE => 1 +< 14,
TABLE_PREFER_STYLE_ATTRIBUTES => 1 +< 15,
FULL_INFO_STRING => 1 +< 16
);
my $opts = [+|] CMARK_OPTIONS::{opts.hash.keys}».Numeric;
cmark_markdown_to_html($str, $str.encode('utf-8').bytes, $opts);
}
Raku allows us to store part of a Signature in a Capture. The latter is transparent to the caller of the Callable. Having to use a hyper-method-call operator isn't that great, so the enum could become a Map to simplify things a bit.
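A minimal sketch of that simplification - the Map-based rewrite is mine, reusing the flag values from the enum above, inside the same sub body:
my %options := Map.new(
    'DEFAULT'    => 0,
    'SOURCEPOS'  => 1 +< 1,
    'HARDBREAKS' => 1 +< 2,
    # ... remaining flags exactly as in the enum above ...
);
# plain Ints now, so no hyper ».Numeric call is needed
my $opts = [+|] %options{opts.hash.keys};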
If you wish to use NativeCall, please do! But think of the generations to come that will use your module. Hopefully, they will like C less than our generation does. Segmentation faults are not -Ofun.
For many, many years the Perl language has been the top choice for text processing tasks. As a result it established an informal standard for regular expressions. Today almost every big language either uses the PCRE (Perl Compatible Regular Expressions) library directly or implements its own regular expression engine, heavily inspired by and mostly compatible with the Perl one.
The Raku language was meant to be a direct continuation of Perl (its former name was Perl 6). Its regular expression engine was redesigned from scratch. However, with modernized syntax and new features came a lack of backward compatibility.
Let's compare them side by side to get a general understanding of what is currently available in most languages (I will call those regular expressions the "Perl" ones) and what may be adopted by languages if Raku manages to establish a new standard. There is a lot to cover here, so comments will be divided into Unicode-specific aspects and a separate section that clarifies technical differences.
Literal text
$ perl -E 'use utf8; say "Żółw 🐢" =~ /Ż..w 🐢/'
1
$ raku -e 'say "Żółw 🐢" ~~ /Ż..w \s "🐢"/'
「Żółw 🐢」
Unicode:
Perl needs the use utf8 pragma to indicate that the source code is in UTF-8; a similar declaration is a common requirement in a lot of other languages. Raku source code is UTF-8 by default.

Technical:
Perl treats white space in a pattern literally unless the //x modifier is given. Raku ignores white space by default and can treat it literally with the m:s// or m:sigspace// modifier. So you can write /Ż..w \s 🐢/x in Perl to get Raku behavior, or m:s/Ż..w "🐢"/ in Raku to get Perl behavior.
Perl returns the matched text in the $& variable, while Raku returns a Match object.
$ perl -E 'use utf8; "1꧕ żółtych róż" =~ /\d{2} \w+ [[:alpha:]]+/; say $&'
1꧕ żółtych róż
$ raku -e 'say "1꧕ żółtych róż" ~~ /\d**2 \s \w+ \s <.alpha>+/'
「1꧕ żółtych róż」
Unicode:
A very common mistake is to write a regular expression in a Unicode-aware language without realizing what the given character classes match, or to blindly copy-paste old regular expressions into Unicode-aware code. For example, \d matches a digit. The Javanese digit five ꧕ is a digit and will be matched by the American short zip code regular expression ^\d{5}\z, probably causing weird side effects and errors. If you need only ASCII digits, you must be explicit about it - [0-9] in Perl or <[0..9]> in Raku.
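A quick illustration of the difference in Raku:
$ raku -e 'say "꧕" ~~ /\d/'
「꧕」
$ raku -e 'say "꧕" ~~ /<[0..9]>/'
Nil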
Technical:
In Perl the POSIX-style [:classes:] are only usable within the class group []. In Raku they are written as <tokens>, which is super consistent with built-in Grammars. More on that later.
I recommend reading this post in the series before continuing...
$ perl -E 'use utf8; "Cool😎" =~ /\p{Lu}\P{Uppercase_Letter}+\p{Block=Emoticons}/; say $&'
Cool😎
$ raku -e 'say "Cool😎" ~~ /<:Lu><:!Uppercase_Letter>+ <:Block("Emoticons")>/'
「Cool😎」
Unicode:
Properties can be given by full or short name (Uppercase_Letter or Lu).
Some properties take values (here Block should be equal to Emoticons).
Perl also folds POSIX classes into the \p{} test, and they have variants: \p{PosixDigit} matches 0-9 while \p{XPosixDigit} matches all Unicode digits. One way to look at it is that a property is a property, no matter who defined it. But I personally dislike it, because it provides duplicated, overlapping functionality and makes regular expressions less portable. I really wish there was a separate test dedicated to Unicode properties only.

Technical:
Perl uses \p{Foo} for a property and \P{Foo} for a negated property, while Raku uses the token-ish form <:Foo> for a property and <:!Foo> for a negated property.
For property values Perl uses the Foo=Bar syntax, which is compact but kind of weird due to the unquoted value - even Perl itself does not compare strings like that. Raku decided on the Foo('Bar') method-call style, aligned with the rest of Raku and commonly used in other languages.

$ perl -E 'use utf8; say "4" =~ /\p{Digit}/;'
1
$ raku -e 'say "4" ~~ /<:Digit>/'
Nil # oops, not explicit enough
$ raku -e 'say "4" ~~ /<:Digit("Decimal")>/'
「4」 # because property "Digit" of "4" is "Decimal"
$ raku -e 'say "4" ~~ /<:Digit(Any:D)>/'
「4」 # success?
This is very far from being correct, because some properties return defined strings indicating that they do not apply:
$ raku -e 'say "A" ~~ /<:Digit(Any:D)>/'
「A」 # wrong
$ raku -e 'say "A".uniprop("Digit")'
None # literal string 'None' matching Any:D value
Hint: when mixing General_Category, Script and Block properties in a single regular expression, I strongly recommend using full property names. For example, can you tell what the 'A' ~~ /<:Latin>/ test means? It tests the Script, not the Block, because A is in the Block named Basic Latin. Being explicit greatly improves regular expression understanding, for example in Perl:
$ perl -E '
use utf8;
"A" =~ /\p{General_Category=Uppercase_Letter}/;
"A" =~ /\p{Block=Basic Latin}/;
"A" =~ /\p{Script=Latin}/;
'
Warning: in Raku the explicit General_Category test currently only accepts short forms.
Property arithmetic
One of the features that looks useless but really shines when combined with Unicode properties. Let's assume you got a text with animal life expectancy stats: แฮมสเตอร์ ๔, แมว ๑๖ (stats: hamster 4, cat 16) and must extract the Thai words from it, skipping numbers.
One way to solve it is to manually enumerate all Thai letters:
$ perl -E '
use utf8;
my $text = "stats: แฮมสเตอร์ ๔, แมว ๑๖";
say for $text =~ /[กขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮฯะาำเแโใไๅๆ]+/g;
'
แฮมสเตอร # hamster
แมว # cat
That works, but it will cause a lot of head scratching if someone unfamiliar with the Thai alphabet encounters this regular expression. You can try to be more explicit and provide a range:
$ perl -E '
use utf8;
my $text = "stats: แฮมสเตอร์ ๔, แมว ๑๖";
say for $text =~ /[\N{THAI CHARACTER KO KAI}-\N{THAI CHARACTER MAIYAMOK}]+/g;
'
แฮมสเตอร
แมว
Which also works, but still requires knowledge of the Thai alphabet and introduces a new risk: the provided range may not be a continuous series of code points belonging exclusively to this alphabet. For example the Polish alphabet starts with a and ends with ź, but there are actually 280 code points between them, containing a lot of other stuff.
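You can quickly verify that count - the number of code points strictly between the two letters:
$ raku -e 'say "ź".ord - "a".ord - 1'
280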
That is the perfect application for an extended character class:
$ perl -E '
use utf8;
my $text = "stats: แฮมสเตอร์ ๔, แมว ๑๖";
say for $text =~ /(?[ \p{Thai} & \p{Letter} ])+/g;
'
แฮมสเตอร
แมว
The extended class is wrapped in (?[ ]) and allows performing class arithmetic - in this case & indicates the intersection between the Thai script and the Letter general category. You can use intersection &, union +, subtraction - and XOR ^ logic. No knowledge of the Thai alphabet is needed to extract Thai words!
Well, kind of... The full Thai word for hamster is หนูแฮมสเตอร์ (the + hamster). You may have already noticed that none of the previous solutions extracted the last character ร์ properly. And our code actually splits this word:
$ perl -E '
use utf8;
my $text = "stats: หนูแฮมสเตอร์ ๔, แมว ๑๖";
say for $text =~ /(?[ \p{Thai} & \p{Letter} ])+/g;
'
หน # the
แฮมสเตอร # hamster
แมว
This is because นู and ร์ are actually two characters each, written one above the other, forming grapheme clusters. Let's analyze them:
$ raku -e '.say for "นู".uninames;'
THAI CHARACTER NO NU
THAI CHARACTER SARA UU
$ raku -e '.say for "นู".uniprops;'
Lo # Letter_Other
Mn # Nonspacing_Mark
That solves our mystery. Those missing Thai characters are not letters but non-spacing marks. But hey, we have property arithmetic - let's fix that quickly:
$ perl -E '
use utf8;
my $text = "stats: หนูแฮมสเตอร์ ๔, แมว ๑๖";
say for $text =~ /(?[ \p{Thai} & ( \p{Letter} + \p{Nonspacing_Mark} ) ])+/g;
'
หนูแฮมสเตอร์
แมว
So now we have the intersection of the Thai script with the union of the Letter and Nonspacing_Mark general categories. Everything encapsulated in a neat, self-documenting, extended character class. Lovely!
In the Raku world things are not that mature yet. Character class arithmetic only supports union and subtraction. For example, let's find stuff that looks like model numbers (at least 2 characters long):
$ raku -e '
say "Production of AR-15 riffle..."~~ /
<:Uppercase_Letter + :Digit("Decimal") + :Dash_Punctuation> ** 2..*
/
'
「AR-15」
The syntax for an extended class is <:A + :B>, with no grouping allowed inside.
Grapheme clusters
$ perl -E 'use utf8; "หนูแฮมสเตอร์" =~ /\p{Letter}+/; say $&;'
หน # the
$ raku -e 'say "หนูแฮมสเตอร์" ~~ /<:Letter>+/'
「หนูแฮมสเตอร์」 # thehamster, unharmed :)
This time the point goes to Raku, which handles grapheme clusters properly.
Perl has a predefined \X class, which represents "what appears to be a single character, but may be represented internally by more than one" - so pretty much everything. Because it cannot be intersected in an extended class to get clusters of a specific property, it is next to useless.
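For illustration, \X happily matches the whole phrase, clusters and all - but there is no way to narrow it down to Thai letters only:
$ perl -E 'use utf8; "หนูแฮมสเตอร์" =~ /\X+/; say $&;'
หนูแฮมสเตอร์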
Diacritics
Matching while ignoring combining code points is a Raku-only feature.
$ raku -e 'say "👋🏾Cześć" ~~ m:ignoremark/ "👋" Czesc /'
「👋🏾Cześć」
In Perl it is possible by decomposing with the Unicode::Normalize module, filtering out combining code points and matching the preprocessed text. But the Perl regular expression engine does not support it out of the box.
Variable case length
There is a perfect example in the German language - the sharp s, also named Eszett.
It looks like this: ß, and is basically equal to ss. So weiße and weisse both mean white. It had no uppercase form; SS was always used. I wrote "was", because in 2017 an uppercase form of ß was officially added to the German alphabet as ẞ, causing some backward-compatibility havoc:
$ raku -e 'say "ß".uc'
SS # still translates to SS, backward compatibility
$ raku -e 'say "ẞ".lc'
ß # this does not translate to ss, because it never did
So we have an intransitive case change that also changes length - the lower case of ẞ is ß, which is a synonym for lower case ss. Both Perl and Raku handle this correctly:
$ raku -e 'say "WEIẞE" ~~ m:ignorecase/ weisse /'
「WEIẞE」
$ perl -E 'use utf8; say "WEIẞE" =~ /weisse/i;'
1
Pick your poison
We had two regular expression engines flexing their muscles to prove who is the Unicode-handling champion. Perl dominates with Unicode properties and property arithmetic. Raku fights back with grapheme clusters and diacritic-insensitive matching.
Coming up next: optional fun with homoglyphs. And the Byte Order Mark. I promise the next posts will be shorter and easier.
In this post I will show you the solution to Task 2 of this week's Weekly Challenge 233. This time I used the help of ChatGPT, which offered a very nice solution.
You are given an array of integers.
Write a script to sort the given array in increasing order based on the frequency of the values. If multiple values have the same frequency then sort them in decreasing order.
Example 1
Input: @ints = (1,1,2,2,2,3)
Output: (3,1,1,2,2,2)
'3' has a frequency of 1
'1' has a frequency of 2
'2' has a frequency of 3
Example 2
Input: @ints = (2,3,1,3,2)
Output: (1,3,3,2,2)
'2' and '3' both have a frequency of 2, so they are sorted in decreasing order.
Example 3
Input: @ints = (-1,1,-6,4,5,-6,1,4,1)
Output: (5,-1,4,4,-6,-6,1,1,1)
I asked ChatGPT the following: ‘Create a program. Here is the description of the task.’ To make sure the result is in the Raku programming language, I also added the following instruction in my local settings so that it knows which programming language I prefer by default.
The solution generated by ChatGPT was quite good, with one small mistake.
ChatGPT: Sure, here’s a Raku program that implements the described sorting algorithm for the given task:
sub frequency-sort(@arr) {
    my %freq;
    %freq{$_}++ for @arr;

    my @sorted = @arr.sort: {
        %freq{$^b} <=> %freq{$^a} || $^b <=> $^a
    };

    return @sorted;
}

my @ints = (1, 1, 2, 2, 2, 3);
my @sorted = frequency-sort(@ints);

say "Input: @ints";
say "Output: @sorted";
First, Raku does not interpolate bare arrays, so say "Input: @ints" prints the string literally: Input: @ints. Second, in the sorting rule, sorting by frequency must be ascending rather than descending, so we need to swap the placeholder variables $^b and $^a.
For the rest, ChatGPT did a good job. It computed the frequency using a loop, and I especially liked how it addressed the requirement to sort by two dimensions.
Here is the full program with the above-mentioned corrections:
sub frequency-sort(@arr) {
    my %freq;
    %freq{$_}++ for @arr;

    my @sorted = @arr.sort: {
        %freq{$^a} <=> %freq{$^b} || $^b <=> $^a
    };

    return @sorted;
}

my @tests =
    (1, 1, 2, 2, 2, 3),
    (2, 3, 1, 3, 2),
    (-1, 1, -6, 4, 5, -6, 1, 4, 1);

say frequency-sort($_) for @tests;
This program prints the following output for the given test cases:
$ raku ch-2.raku
[3 1 1 2 2 2]
[1 3 3 2 2]
[5 -1 4 4 -6 -6 1 1 1]
In this post, I will demonstrate my solution to another Task of The Weekly Challenge, week 233. Here’s how it reads:
You are given an array of words made up of alphabets only.
Write a script to find the number of pairs of similar words. Two words are similar if they consist of the same characters.
Example 1
Input: @words = ("aba", "aabb", "abcd", "bac", "aabc")
Output: 2
Pair 1: similar words ("aba", "aabb")
Pair 2: similar words ("bac", "aabc")
Example 2
Input: @words = ("aabb", "ab", "ba")
Output: 3
Pair 1: similar words ("aabb", "ab")
Pair 2: similar words ("aabb", "ba")
Pair 3: similar words ("ab", "ba")
Example 3
Input: @words = ("nba", "cba", "dba")
Output: 0
There is a subtle point that maybe needs extra comments. In the second example, all three words are constructed of the same two letters, a and b. So all three words match the definition of 'similar' words. But as the task asks for pairs, we need to construct all the possible pairs out of those three words.
In my solution, I chose to use the handy classify method. For an array, it creates a hash where the keys are the common classifying symbols, and the values are the lists of the input elements that match this classification property.
Here is the whole first program, together with all the test cases provided in the description. The program maps every word to a corresponding string that consists of the sorted unique letters of the word.
my @tests = ["aba", "aabb", "abcd", "bac", "aabc"], ["aabb", "ab", "ba"], ["nba", "cba", "dba"]; for @tests -> @words { say @words.classify(*.comb.unique.sort.join).grep(*.value.elems > 1); }
For example, the word aba will be associated with the key ab. The program prints the following output:
$ raku ch-1.raku
(ab => [aba aabb] abc => [bac aabc])
(ab => [aabb ab ba])
()
The format of the output differs from the examples, but it can be enhanced if needed. My goal was to create a compact solution.
But I assume that you'd be interested in looking at what classify produces. I am also curious. For the same @tests, it returns the following three hashes:
{ab => [aba aabb], abc => [bac aabc], abcd => [abcd]}
{ab => [aabb ab ba]}
{abc => [cba], abd => [dba], abn => [nba]}
As you see, each string was put into one of the classification bins.
The second part of the task is to find the pairs. After the grep, we have already filtered out everything that has fewer than two elements, so if data passed through this filter, there will be at least one pair. For bigger arrays, we can use another of Raku's built-in mechanisms: the combinations method.
The updated main loop of the program looks like this now.
for @tests -> @words {
    say "Test case: ", @words;

    my %classification =
        @words.classify(*.comb.unique.sort.join).grep(*.value.elems > 1);

    my $pairs = 0;
    for %classification.kv -> $k, $v {
        my @pairs = $v.combinations(2);
        $pairs += @pairs.elems;
        say "$k: ", @pairs;
    }

    say "Answer: $pairs pair{$pairs == 1 ?? '' !! 's'}.\n";
}
The 'redundant' code here is added just to have more detailed output, so that we can see which pairs were actually found. Let us look at the output for the initial test cases:
$ raku ch-1.raku
Test case: [aba aabb abcd bac aabc]
ab: [(aba aabb)]
abc: [(bac aabc)]
Answer: 2 pairs.

Test case: [aabb ab ba]
ab: [(aabb ab) (aabb ba) (ab ba)]
Answer: 3 pairs.

Test case: [nba cba dba]
Answer: 0 pairs.
Steve Roe continues their Raku journey exploring the reasons to use the Raku Programming Language on Jupyter notebooks.
Konstantin Narkhov is back with a blog post about their new release of the Pheix CMS.
Anton Antonov describes their new Raku module that brings the world of Large Language Models into the realm of Raku on Jupyter notebooks.
Elizabeth Mattijsen reports on all recent developments around Rakudo, an implementation of the Raku Programming Language.
The Exercism people published a video about programming languages with terse, concise syntax, and the Raku Programming Language is one of them! If you don't want to look at the whole hour-long video, these are the Raku related bits:
Steve Roe continued their personal journey along the Raku path with “Drax on Jupyter“.
After a long absence caused by different global, personal and subjective circumstances, Konstantin Narkhov returns with a new release of the Pheix framework (/r/rakulang comments).
Weekly Challenge #233 is available for your perusal.
An Operator class was introduced - a subclass of Sub, intended to be used for operator subroutines only. And the (deprecated since 2018) is_approx subroutine was removed from the Test module, along with its associated tests.

And in RakuAST developments this week: work continued on OperatorProperties. The number of passing test files held steady for make test (+0) and reached 948/1358 for make spectest (+10).

Meanwhile, questions about Raku included one about %b in printf by zeekar.

A bit of a quiet week, with people being tired coming back from vacation! Meanwhile, Слава Україні! Героям слава!
Please keep staying safe and healthy, and keep up the good work!
If you like what I’m doing, committing to a small sponsorship would mean a great deal!
This blog post proclaims and describes the Raku package “Jupyter::Chatbook” that facilitates the incorporation of Large Language Models (LLMs) into notebooks of Jupyter’s framework.
“Jupyter::Chatbook” is a fork of Brian Duggan’s “Jupyter::Kernel”.
Here are the top opening statements of the README of “Jupyter::Kernel”:
“Jupyter::Kernel” is a pure Raku implementation of a Raku kernel for Jupyter clients¹.
Jupyter notebooks provide a web-based (or console-based) Read Eval Print Loop (REPL) for running code and serializing input and output.
It is desirable to include the interaction with LLMs into "typical" REPL systems or workflows. Having LLM-aware and LLM-chat-endowed notebooks — chatbooks — can really speed up such workflows.
The corresponding repository is mostly for experimental work, but it aims to be always very useful for interacting with LLMs via Raku.
Remark: There are several reasons to have a separate package — a fork of “Jupyter::Kernel” — rather than extending the original.
From “Zef ecosystem”:
zef install Jupyter::Chatbook
From GitHub:
zef install https://github.com/antononcube/Raku-Jupyter-Chatbook.git
After installing the package “Jupyter::Chatbook” follow the setup instructions of “Jupyter::Kernel”.
There are four ways to use LLMs in a chatbook:
The sections below briefly describe each of these ways and have links to notebooks with more detailed examples.
LLM functions as described in [AA3] are best utilized via a certain REPL tool or environment. Notebooks are the perfect media for LLM functions workflows. Here is an example of a code cell that defines an LLM function:
use LLM::Functions;
my &fcp = llm-function({"What is the population of the country $_ ?"});
# -> **@args, *%args { #`(Block|5016320795216) ... }
Here is another cell that can be evaluated multiple times using different country names:
<Niger Gabon>.map({ &fcp($_) })
# (
#
# As of July 2020, the population of Niger is estimated to be 23,843,341.
#
# As of July 2019, the population of Gabon is estimated to be 2,210,823 people.)
For more examples of LLM functions and LLM chat objects see the notebook “Chatbook-LLM-functions-and-chat-objects.ipynb”.
The LLMs of OpenAI (ChatGPT, DALL-E) and Google (PaLM) can be interacted with using “dedicated” notebook cells.
Here is an example of a code cell with PaLM magic spec:
%% palm, max-tokens=600
Generate a horror story about a little girl lost in the forest and getting possessed.
For more examples see the notebook “Chatbook-LLM-cells.ipynb”.
Chatbooks have the ability to maintain LLM conversations over multiple notebook cells. A chatbook can have more than one LLM conversations. “Under the hood” each chatbook maintains a database of chat objects. Chat cells are used to give messages to those chat objects.
For example, here is a chat cell with which a new “Email writer” chat object is made, and that new chat object has the identifier “em12”:
%% chat-em12, prompt = «Given a topic, write emails in a concise, professional manner»
Write a vacation email.
Here is a chat cell in which another message is given to the chat object with identifier “em12”:
%% chat-em12
Rewrite with manager's name being Jane Doe, and start- and end dates being 8/20 and 9/5.
In this chat cell a new chat object is created:
%% chat snowman, prompt = ⎡Pretend you are a friendly snowman. Stay in character for every response you give me. Keep your responses short.⎦
Hi!
And here is a chat cell that sends another message to the “snowman” chat object:
%% chat snowman
Who built you? Where?
Remark: Specifying a chat object identifier is not required, i.e. just the magic spec %% chat can be used. The “default” chat object ID is “NONE”.
Remark: The magic keyword “chat” can be separated from the identifier of the chat object with the symbols “-“, “_”, “:”, or with any number of (horizontal) white spaces.
For more examples see the notebook “Chatbook-LLM-chats.ipynb”.
Here is a flowchart that summarizes the way chatbooks create and utilize LLM chat objects:
Each chatbook session has a Hash of chat objects. Chatbooks can have chat meta cells that allow the access of the chat object “database” as whole, or its individual objects.
Here is an example of a chat meta cell (that applies the method say
to the chat object with ID “snowman”):
%% chat snowman meta
say
Here is an example of a chat meta cell that creates a new chat object with the LLM prompt specified in the cell (“Guess the word”):
%% chat-WordGuesser prompt
We're playing a game. I'm thinking of a word, and I need to get you to guess that word.
But I can't say the word itself.
I'll give you clues, and you'll respond with a guess.
Your guess should be a single word only.
Here is a table with examples of magic specs for chat meta cells and their interpretation:
cell magic line | cell content | interpretation |
---|---|---|
chat-ew12 meta | say | Give the “print out” of the chat object with ID “ew12” |
chat-ew12 meta | messages | Give the messages of the chat object with ID “ew12” |
chat sn22 prompt | You pretend to be a melting snowman. | Create a chat object with ID “sn22” with the prompt in the cell |
chat meta all | keys | Show the keys of the session chat objects DB |
chat all | keys | «same as above» |
Here is a flowchart that summarizes the chat meta cell processing:
[AA1] Anton Antonov, “Literate programming via CLI”, (2023), RakuForPrediction at WordPress.
[AA2] Anton Antonov, “Generating documents via templates and LLMs”, (2023), RakuForPrediction at WordPress.
[AA3] Anton Antonov, “Workflows with LLM functions”, (2023), RakuForPrediction at WordPress.
[AA4] Anton Antonov, “Number guessing games: PaLM vs ChatGPT”, (2023), RakuForPrediction at WordPress.
[SW1] Stephen Wolfram, “Introducing Chat Notebooks: Integrating LLMs into the Notebook Paradigm”, (2023), writings.stephenwolfram.com.
[AAp1] Anton Antonov, Text::CodeProcessing Raku package, (2021), GitHub/antononcube.
[AAp2] Anton Antonov, WWW::OpenAI Raku package, (2023), GitHub/antononcube.
[AAp3] Anton Antonov, WWW::PaLM Raku package, (2023), GitHub/antononcube.
[AAp4] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube.
[AAp5] Anton Antonov, Text::SubParsers Raku package, (2023), GitHub/antononcube.
[AAp6] Anton Antonov, Data::Translators Raku package, (2023), GitHub/antononcube.
[AAp7] Anton Antonov, Clipboard Raku package, (2023), GitHub/antononcube.
[BDp1] Brian Duggan, Jupyter::Kernel Raku package, (2017-2023), GitHub/bduggan.
[AAv1] Anton Antonov, “Raku Literate Programming via command line pipelines”, (2023), YouTube/@AAA4Prediction.
[AAv2] Anton Antonov, “Racoons playing with pearls and onions” (2023), YouTube/@AAA4Prediction.
[AAv3] Anton Antonov, “Streamlining ChatGPT code generation and narration workflows (Raku)” (2023), YouTube/@AAA4Prediction.
¹ Jupyter clients are user interfaces to interact with an interpreter kernel like “Jupyter::Kernel”. Jupyter [Lab | Notebook | Console | QtConsole] are the Jupyter-maintained clients. More info can be found on the Jupyter documentation site.
As I continue my personal journey along the raku path, I am increasingly interested in “raku – what is it good for”. I will continue to interleave material on “raku – how does it work” and “raku – why I think it is better”.
As in the last gripping episode, this post refers to a Jupyter Notebook that combines the interactive notebook style familiar to most Python coders with the raku Jupyter::Kernel written by Brian Duggan. Instructions on how to do this yourself are provided at the end of this post.
There are some great reasons to use raku on Jupyter:
No doubt, if you try this, you will pick up patterns and styles of your own.
I look forward to trying it with raku OO and functional styles soon.
When I sat down to write my first raku module – Physics::Measure – I was already troubled by the global warming situation. With a Physics education, my instinct is to quantify matters and to question assumptions. I felt that being able to integrate physical quantities and calculations easily in an essay would help people to understand and create insights into where we are and how to act.
The Jupyter notebook itself (see below) contains some initial thoughts and tests some matters regarding the Drax power station in the UK. Drax is famous for switching from burning mostly coal to mostly wood pellets and claims that this switch is sustainable and dramatically cuts CO2 emissions.
I agree that burning a sustainable fuel such as wood is an improvement and, unlike many, I think that Drax is a good initiative to attempt to do this at scale. When I use my wood burner at home, I can see that the fuel is harvested from the woods in my local area and that these are well managed and sustainable (and support owls, kites, deer and so on). I am not convinced that the “at scale” version of this is possible without very thorough control over the forest sources. And I am not convinced that Drax is a net benefit. The greenwashing link provided has a countervailing view and I seek to present this apolitically.
Anyway – take a look at the Jupyter notebook and decide for yourself. The initial questions I had related to the CO2 involved in the delivery of the wood pellets from the USA.
Here is the Drax Transport model presented as a static Jupyter notebook:
After the previous post, sadly, it has become clear that it is impractical for me to host a general Jupyter service for all raku experimenters ;-(
That said, it is super-easy to roll your own – and share with your teams.
The unique things about the raku Jupyter Kernel are:
Step-by-step instructions and pre-built Dockerfiles are available at https://github.com/librasteve/raku-Physics-Measure-Jupyter
I hope that this very simple and direct use of raku in notebooks is useful to you.
I will continue to apply it to my essays on climate change and other topics.
As ever, your comments and feedback are very welcome!
~librasteve
Paweł bbkr Pabian continued their blog series about Unicode, using Raku examples.
Elizabeth Mattijsen completed their series of 24 blog posts about the differences between Perl and the Raku Programming Language.
Elizabeth Mattijsen reports on all recent developments around Rakudo, an implementation of the Raku Programming Language.
It’s official! The Third Online Raku Conference will be held on 28 October 2023. Please register your presentation before 15 October! And/or get your online ticket, not only for immediate interactive access, but also to support this event!
Justin DeVuyst published the 2023.08 release of the Rakudo Compiler for the Raku Programming Language. With some nice additions and (performance) fixes! Most binary packages have already been updated, as well as Rakudo Star.
Paweł bbkr Pabian continued their blog series about Unicode with “Fun with UTF-8: variables and operators“.
Elizabeth Mattijsen finished their 24 part blog series about migrating Perl to Raku code with:
The minutes of the meeting of 26 August have been published.
- … a stat() call to check for multiple aspects of a filesystem entry.
- … files that have “<” or “>” in their name, and made .combinations about 30% faster.
- … PseudoStash semantics on 6.e.
- … .permutations about 20% faster.

And in RakuAST developments this week:
- … OperatorProperties internals.
- … (make test +0) and 938/1358 (make spectest +0).
- … whenever block of a unix socket? by amano.
- … GTK::Simple by spektrefyre.

Looking forward to the third Raku Conference! Слава Україні! Героям слава!
Please keep staying safe and cool and healthy, and keep up the good work!
If you like what I’m doing, committing to a small sponsorship would mean a great deal!
On this page, I’ll briefly cover the solutions to the tasks for this week’s Weekly Challenge #231.
You are given an array of distinct integers.
Write a script to find all elements that are neither minimum nor maximum. Return -1 if you can’t.
Example 1
Input: @ints = (3, 2, 1, 4)
Output: (3, 2)
The minimum is 1 and maximum is 4 in the given array. So (3, 2) is neither min nor max.
Example 2
Input: @ints = (3, 1)
Output: -1
Example 3
Input: @ints = (2, 1, 3)
Output: (2)
The minimum is 1 and maximum is 3 in the given array. So 2 is neither min nor max.
Here is my original solution in the Raku programming language.
sub solve(@data) { @data.grep: * != (@data.min, @data.max).any }
As the task requires that we print -1 when there are no elements in the output, let us add an update to satisfy this requirement:
sub solve(@data) { (@data.grep: * != (@data.min, @data.max).any) || -1 }
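This works because an empty sequence is falsy in boolean context; a quick probe (my addition):

say (() || -1);      # -1
say ((3, 2) || -1);  # (3 2)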
The * in this code effectively stands in for the $_ variable. If you prefer, you may use $_ instead, but then you need to wrap the test in a block: instead of @data.grep: * != ..., you need @data.grep({$_ != ...}), which may be less clear for some people.
Finally, let us use some math notation and replace calling the .any method with a ‘contains’ operator:
sub solve(@data) { (@data.grep: * ∉ (@data.min, @data.max)) || -1 }
Well, actually, ‘does not contain’. And this is my final solution.
Note that you may want to use the .minmax method instead of two calls to .min and .max, but .minmax returns a range, which is not that suitable for this task.
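Still, for completeness, here is a sketch of how .minmax could be used anyway, by taking .min and .max of the returned Range (my variant, not part of the original solution):

sub solve-minmax(@data) {
    my $r = @data.minmax;    # a Range, e.g. 1..4
    (@data.grep: * != ($r.min, $r.max).any) || -1
}
say solve-minmax((3, 2, 1, 4));  # (3 2)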
Adding some test cases and passing them to the solve function:
my @tests = (3, 2, 1, 4), (3, 1), (2, 1, 3); say solve($_) for @tests;
The program prints the expected output:
$ raku ch-1.raku
(3 2)
-1
(2)
You are given a list of passenger details in the form “9999999999A1122”, where 9 denotes the phone number, A the sex, 1 the age and 2 the seat number.
Write a script to return the count of all senior citizens (age >= 60).
Input: @list = ("7868190130M7522","5303914400F9211","9273338290F4010")
Output: 2
The age of the passengers in the given list are 75, 92 and 40.
So we have only 2 senior citizens.
Input: @list = ("1313579440F2036","2921522980M5644")
Output: 0
Apparently, the solution requires extracting information from a string in a specific format. It is not quite clear from the description whether the strings always contain the same number of characters, and thus whether the age and seat number are always two-digit values. But let’s use this assumption.
As we do not need any other information from the ticket code, there is no need to parse it fully, so I preferred anchoring around the only letter in the string and taking the next two digits as the age. Of course, you may make it simpler and just extract the two digits counting from the end of the string.
sub is-senior($ticket) { ~($ticket ~~ / <alpha> (\d\d) /)[0] >= 60 }
Unlike Perl 5, Raku ignores spaces in regexes by default, so I added some air to it. On the other hand, extracting matches may seem a bit more complicated.
For the first given example (see task’s description), the Match object contains the following information:
「M75」
 alpha => 「M」
 0 => 「75」
So, I am taking the 0th element using [0] and stringifying it with the ~ prefix operator.
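And here is a sketch of the simpler variant mentioned earlier, extracting the two digits counting from the end of the string (a hypothetical helper, assuming the fixed 15-character ticket format):

sub is-senior-substr($ticket) {
    # The age always occupies the two characters before the two-digit seat number
    $ticket.substr(*-4, 2) >= 60
}
say is-senior-substr('7868190130M7522');  # True (age 75)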
In essence, the task has been solved. Let’s add the test cases and run them:
my @tests = ('7868190130M7522', '5303914400F9211', '9273338290F4010'),
            ('1313579440F2036', '2921522980M5644');

for @tests -> @tickets {
    say +@tickets.grep({ is-senior($_) });
}
The program prints:
$ raku ch-2.raku
2
0
* * *
This blog post proclaims the Raku package “Data::Translators”, which provides translation of JSON specs or JSON-like data structures into other formats.
It is envisioned that this package will have translators to multiple formats. For example:
The main motivation for making the package is to have a convenient way of making tables while doing Literate programming with Raku using:
The use of JSON came into focus since, when working with Large Language Model (LLM) functions, [AAp3], LLMs are very often requested to produce output in JSON format, [AA1, AA2].
The package “Data::Reshapers”, [AAp1], would complement nicely “Data::Translators” and vice versa. The package “Data::TypeSystem”, [AAp2], is used for “translation decisions” and for conversions into more regular datasets.
The package “Mathematica::Serializer”, [AAp5], has very similar mission — it is for translating Raku data structures into Mathematica (aka Wolfram Language or WL) code.
Remark: The provided converters are made for communication purposes, so they might not be very performant. I have used or tested them with datasets that have less than 5000 rows.
Package installations from both sources use zef installer (which should be bundled with the “standard” Rakudo installation file.)
To install the package from Zef ecosystem use the shell command:
zef install Data::Translators
To install the package from the GitHub repository use the shell command:
zef install https://github.com/antononcube/Raku-JSON-Translators.git
Here is a “main use case” example:
The HTML table outputs can be used to present datasets nicely in:
Here we get the Titanic dataset and sample it:
use Data::Reshapers;
use Data::TypeSystem;
use Data::Translators;
my $tbl = get-titanic-dataset.pick(3);
# ({id => 85, passengerAge => 40, passengerClass => 1st, passengerSex => male, passengerSurvival => died} {id => 1185, passengerAge => -1, passengerClass => 3rd, passengerSex => male, passengerSurvival => died} {id => 503, passengerAge => 40, passengerClass => 2nd, passengerSex => female, passengerSurvival => survived})
Here is the corresponding dataset type:
deduce-type($tbl);
# Vector(Assoc(Atom((Str)), Atom((Str)), 5), 3)
Here is the corresponding HTML table:
$tbl ==> data-translation
passengerSex | id | passengerSurvival | passengerAge | passengerClass |
---|---|---|---|---|
male | 85 | died | 40 | 1st |
male | 1185 | died | -1 | 3rd |
female | 503 | survived | 40 | 2nd |
We can specify field names and HTML table attributes:
$tbl ==> data-translation(field-names => <id passengerSurvival>, table-attributes => 'id="info-table" class="table table-bordered table-hover" text-align="center"');
id | passengerSurvival |
---|---|
637 | died |
973 | died |
844 | died |
Here is how the transposed dataset is tabulated:
$tbl ==> transpose() ==> data-translation;
(The transposed dataset renders as a nested HTML table: one row per field (passengerSex, passengerAge, id, passengerClass, passengerSurvival), each holding an inner table of that field’s values.)
Here is a JSON string translation to HTML:
my $json1 = q:to/END/;
{
"sample": [
{"name": "json2html", "desc": "coverts json 2 html table format", "lang": "python"},
{"name": "testing", "desc": "clubbing same keys of array of objects", "lang": "python"}
]
}
END
data-translation($json1);
(The JSON renders as a nested HTML table: an outer table with the single key “sample”, whose value is an inner table with the columns lang, desc, name holding the two records.)
Here is a more involved data example:
data-translation(cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'))
(The contingency table renders as a nested HTML table: “male” holds an inner table with survived 161 and died 682; “female” holds an inner table with survived 339 and died 127.)
Compare the HTML table above with the following plain text table:
to-pretty-table(cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'))
# +--------+------+----------+
# | | died | survived |
# +--------+------+----------+
# | female | 127 | 339 |
# | male | 682 | 161 |
# +--------+------+----------+
Here is the R code version of the Titanic data sample:
$tbl ==> data-translation(target => 'R', field-names => <id passengerClass passengerSex passengerAge passengerSurvival>)
data.frame(`passengerSex` = c("male", "male", "female"),
`id` = c("85", "1185", "503"),
`passengerSurvival` = c("died", "died", "survived"),
`passengerAge` = c("40", "-1", "40"),
`passengerClass` = c("1st", "3rd", "2nd"))
Here is the R code version of the contingency table:
data-translation(cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'), target => 'R')
list("female"=list("survived"=339, "died"=127), "male"=list("died"=682, "survived"=161))
In order to obtain datasets or more regular datasets the function to-dataset can be used. Here a rugged dataset is made regular and converted to an HTML table:
my @tbl2 = get-titanic-dataset.pick(6);
@tbl2 = @tbl2.map({ $_.pick((1..5).pick).Hash });
@tbl2 ==> to-dataset(missing-value=>'・') ==> data-translation
passengerSurvival | passengerClass | id | passengerSex | passengerAge |
---|---|---|---|---|
survived | ・ | 378 | female | ・ |
survived | ・ | ・ | ・ | ・ |
died | 3rd | ・ | male | 30 |
died | ・ | 891 | ・ | ・ |
died | 3rd | ・ | male | ・ |
survived | 2nd | 338 | ・ | ・ |
Here a hash is transformed into a dataset with columns <Key Value> and then converted into an HTML table:
{ 4 => 'a', 5 => 'b', 8 => 'c'} ==> to-dataset() ==> data-translation
Key | Value |
---|---|
4 | a |
8 | c |
5 | b |
- … md-interpret provided by “Markdown::Grammar”.
- … to-pretty-table provided by “Data::Reshapers”, [AAp1].
- … to-json from that package.
- … data-translation.
- … json-to-html and json-to-r were provided.
- … to-html, to-r, to-wl.

The package provides a Command Line Interface (CLI) script. Here is its usage message:
data-translation --help
# Usage:
# data-translation <data> [-t|--target=<Str>] [--encode] [--escape] [--field-names=<Str>] -- Convert data into another format.
#
# <data> Data to convert.
# -t|--target=<Str> Target to convert to, one of <JSON HTML R>. [default: 'HTML']
# --encode Whether to encode or not. [default: False]
# --escape Whether to escape or not. [default: False]
# --field-names=<Str> Field names to use for Map objects, separated with ';'. [default: '']
Here is an example application (to this file):
data-translation ./resources/professionals.json --field-names='data;id;name;age;profession'
(The JSON file renders as a nested HTML table: an outer table with the single key “data”, whose value is an inner table with the columns id, name, age, profession and the rows 1/Alice/25/Engineer, 2/Bob/30/Doctor, 3/Charlie/28/Artist, 4/Diana/32/Teacher.)
[AA1] Anton Antonov, “Workflows with LLM functions”, (2023), RakuForPrediction at WordPress.
[AA2] Anton Antonov, “TLDR LLM solutions for software manuals”, (2023), RakuForPrediction at WordPress.
[AAp1] Anton Antonov, Data::Reshapers Raku package, (2021-2023), GitHub/antononcube.
[AAp2] Anton Antonov, Data::TypeSystem Raku package, (2023), GitHub/antononcube.
[AAp3] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube.
[AAp4] Anton Antonov, Text::CodeProcessing Raku package, (2021-2023), GitHub/antononcube.
[AAp5] Anton Antonov, Mathematica::Serializer Raku package, (2021-2022), GitHub/antononcube.
[BDp1] Brian Duggan, Jupyter::Kernel Raku package, (2017-2023), GitHub/bduggan.
[VMp1] Varun Malhotra, json2html Python package, (2013-2021), GitHub/softvar.
In this blog post (generated from this Jupyter notebook) we use Large Language Model (LLM) functions, [AAp1, AA1], for generating (hopefully) executable, correct, and harmless code for Operating System resources managements.
In order to be concrete and useful, we take the Markdown files of the articles “It’s time to rak!”, [EM1], that explain the motivation and usage of the Raku module “App::Rak”, [EMp1], and we show how meaningful file-finding shell commands can be generated via LLMs exposed to the code-with-comments from those articles.
In other words, we prefer to apply the attitude Too Long; Didn’t Read (TLDR) to the articles and related Raku module README (or user guide) file. (Because “App::Rak” is useful, but it has too many parameters that we prefer not to learn that much about.)
Remark: We say that “App::Rak” uses a Domain Specific Language (DSL), which is done with Raku’s Command Line Interface (CLI) features.
Step 6 says how we do our TLDR — we use LLM-translations of natural language commands.
Instead of using Raku to process text we can make LLM functions for extracting the comment-and-code pairs. (That is also shown below.)
The exposition below follows the outline of the procedure subsections above. The stress-testing extensions and the EBNF generation extension have their own sections: “Translating randomly generated commands” and “Grammar generation” respectively.
Remark: The article/document/notebook was made with the Jupyter framework, using the Raku package “Jupyter::Kernel”, [BDp1].
use Markdown::Grammar;
use Data::Reshapers;
use Data::Summarizers;
use LLM::Functions;
use Text::SubParsers;
my $dirName = $*HOME ~ '/GitHub/lizmat/articles';
my @fileNames = dir($dirName).grep(*.Str.contains('time-to-rak'));
@fileNames.elems
4
Here we ingest the text of each file:
my %texts = @fileNames.map({ $_.basename => slurp($_) });
%texts.elems
4
Here are the number of characters per document:
%texts>>.chars
{its-time-to-rak-1.md => 7437, its-time-to-rak-2.md => 8725, its-time-to-rak-3.md => 14181, its-time-to-rak-4.md => 9290}
Here are the number of words per document:
%texts>>.words>>.elems
{its-time-to-rak-1.md => 1205, its-time-to-rak-2.md => 1477, its-time-to-rak-3.md => 2312, its-time-to-rak-4.md => 1553}
With the function md-section-tree we extract code blocks from Markdown documentation files into data structures amenable to further programmatic manipulation (in Raku.) Here we get code blocks from each text:
my %docTrees = %texts.map({ $_.key => md-section-tree($_.value, modifier => 'Code', max-level => 0) });
%docTrees>>.elems
{its-time-to-rak-1.md => 1, its-time-to-rak-2.md => 11, its-time-to-rak-3.md => 24, its-time-to-rak-4.md => 16}
Here we put all blocks into one array:
my @blocks = %docTrees.values.Array.&flatten;
@blocks.elems
52
Here from each code block we parse-extract comment-and-code pairs and we form the LLM training rules:
my @rules;
@blocks.map({
    given $_ {
        for m:g/ '#' $<comment>=(\V+) \n '$' $<code>=(\V+) \n / -> $m {
            @rules.push( ($m<comment>.Str.trim => $m<code>.Str.trim) )
        }
    }
}).elems
52
Here is the number of rules:
@rules.elems
69
Here is a sample of the rules:
.say for @rules.pick(4)
save --after-context as -A, requiring a value => rak --after-context=! --save=A
Show all directory names from current directory down => rak --find --/file
Reverse the order of the characters of each line => rak '*.flip' twenty
Show number of files / lines authored by Scooby Doo => rak --blame-per-line '*.author eq "Scooby Doo"' --count-only
In order to tabulate “nicely” the rules in the Jupyter notebook, we make an LLM function to produce an HTML table and then specify the corresponding “magic cell.” (This relies on the Jupyter-magics features of [BDp1].) Here is an LLM conversion function, [AA1]:
my &ftbl = llm-function({"Convert the $^a table $^b into an HTML table."}, e=>llm-configuration('PaLM', max-tokens=>800))
-> **@args, *%args { #`(Block|5361560043184) ... }
Here is the HTML table derivation:
%%html
my $tblHTML=&ftbl("plain text", to-pretty-table(@rules.pick(12).sort, align => 'l', field-names => <Key Value>))
Key | Value |
---|---|
Produce the frequencies of the letters in file “twenty” | rak 'slip .comb' twenty --type=code --frequencies |
Search all files and all subdirectories | rak foo * |
Search for literal string “foo” from the current directory | rak foo |
Show all filenames from current directory on down | rak --find --treasure |
Show all the lines that consist of “seven” | rak ^seven$ twenty |
Show all unique “name” fields in JSON files | rak --json-per-file '*' --unique |
Show the lines ending with “o” | rak o$ twenty |
add / change description -i at a later time | rak --description='Do not care about case' --save=i |
look for literal string “foo”, don’t check case or accents | rak foo -im |
remove the --frobnicate custom option | rak --save=frobnicate |
same, with a regular expression | rak '/ foo $/' |
save --ignorecase as -i, without description | rak --ignorecase --save=i |
Instead of using LLMs for HTML conversion it is more “productive” to use the HTML interpreter provided by “Markdown::Grammar”:
%%html
sub to-html($x) { md-interpret($x.Str.lines[1..*-2].join("\n").subst('+--','|--', :g).subst('--+','--|', :g), actions=>Markdown::Actions::HTML.new) }
to-pretty-table(@rules.pick(12).sort) ==> to-html
Key | Value |
---|---|
Find files that have “lib” in their name from the current dir | rak lib --find |
Look for strings containing y or Y | rak --type=contains --ignorecase Y twenty |
Show all directory names from current directory down | rak --find --/file |
Show all lines with numbers between 1 and 65 | rak '/ \d+ /' |
Show the lines that contain “six” as a word | rak §six twenty |
look for “Foo”, while taking case into account | rak Foo |
look for “foo” in all files | rak foo |
produce extensive help on filesystem filters | rak --help=filesystem --pager=less |
save --context as -C, setting a default of 2 | rak --context='[2]' --save=C |
save searching in Rakudo’s committed files as --rakudo | rak --paths='~/Github/rakudo' --under-version-control --save=rakudo |
search for “foo” and show 4 lines of context | rak foo -C=4 |
start rak with configuration file at /usr/local/rak-config.json | RAK_CONFIG=/usr/local/rak-config.json rak foo |
Remark: Of course, in order to program the above sub we need to know how to use “Markdown::Grammar”. Producing HTML tables with LLMs is much easier — only knowledge of “spoken English” is required.
Here we define an LLM function for generating “App::Rak” shell commands:
my &frak = llm-example-function(@rules, e => llm-evaluator('PaLM'))
-> **@args, *%args { #`(Block|5361473489952) ... }
my @cmds = ['Find files that have ".nb" in their names', 'Find files that have ".nb" or ".wl" in their names',
'Show all directories of the parent directory', 'Give me files without extensions and that contain the phrase "notebook"',
'Show all that have extension raku or rakumod and contain Data::Reshapers'];
my @tbl = @cmds.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;
@tbl.&dimensions
(5 2)
Here is a table showing the natural language commands and the corresponding translations to the “App::Rak” CLI DSL:
%%html
to-pretty-table(@tbl, align=>'l', field-names => <Command App::Rak>) ==> to-html
Command | App::Rak |
---|---|
Find files that have “.nb” in their names | rak --extensions=nb --find |
Find files that have “.nb” or “.wl” in their names | rak --find --extensions=nb,wl |
Show all directories of the parent directory | rak --find --/file --parent |
Give me files without extensions and that contain the phrase “notebook” | rak --extensions= --type=contains notebook |
Show all that have extension raku or rakumod and contain Data::Reshapers | rak '/ Data::Reshapers /' --extensions=raku,rakumod |
Of course, the obtained “App::Rak” commands have to be verified to:
We can program this verification with Raku or with the Jupyter framework, but we are not doing that here. (We do the verification manually outside of this notebook.)
Remark: I tried a dozen generated commands. Most worked. One did not work because of the current limitations of “App::Rak”. Others needed appropriate nudging to produce the desired results.
Here is an example of command that produces code that “does not work”:
&frak("Give all files that have extensions .nd and contain the command Classify")
rak '*.nd <command> Classify' --extensions=nd
Here are a few more:
&frak("give the names of all files in the parent directory")
rak --find --/file --/directory
&frak("Find all directories in the parent directory")
rak --find --/file --parent
Here is a generated command that exposes an “App::Rak” limitation:
&frak("Find all files in the parent directory")
rak --find ..
Consider testing the applicability of the approach by generating a “good enough” sample of natural language commands for finding files or directories.
We can generate such commands via LLM. Here we define an LLM function with two parameters that returns a Raku list:
my &fcg = llm-function({"Generate $^a natural language commands for finding $^b in a file system. Give the commands as a JSON list."}, form => sub-parser('JSON'))
-> **@args, *%args { #`(Block|5361560082992) ... }
my @gCmds1 = &fcg(4, 'files').flat;
@gCmds1.raku
["Find all files in the current directory", "Find all files with the .txt extension in the current directory", "Search for all files with the word 'report' in the file name", "Search for all files with the word 'data' in the file name in the Documents folder"]
Here are the corresponding translations to the “App::Rak” DSL:
%%html
my @tbl1 = @gCmds1.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;
@tbl1 ==> to-pretty-table(align=>'l', field-names => <Command App::Rak>) ==> to-html
Command | App::Rak |
---|---|
Find all files in the current directory | rak --find |
Find all files with the .txt extension in the current directory | rak --extensions=txt |
Search for all files with the word ‘report’ in the file name | rak report --find |
Search for all files with the word ‘data’ in the file name in the Documents folder | rak data Documents |
Let us redo the generation and translation using different specs:
my @gCmds2 = &fcg(4, 'files that have certain extensions or contain certain words').flat;
@gCmds2.raku
["Find all files with the extension .txt", "Locate all files that have the word 'project' in their name", "Show me all files with the extension .jpg", "Find all files that contain the word 'report'"]
%%html
my @tbl2 = @gCmds2.map({ %( 'Command' => $_, 'App::Rak' => &frak($_) ) }).Array;
@tbl2 ==> to-pretty-table( align=>'l', field-names => <Command App::Rak>) ==> to-html
Command | App::Rak |
---|---|
Find all files with the extension .txt | rak --extensions=txt |
Locate all files that have the word ‘project’ in their name | rak --find project |
Show me all files with the extension .jpg | rak --extensions=jpg |
Find all files that contain the word ‘report’ | rak report --find |
Remark: Ideally, there would be an LLM-based system that 1) hallucinates “App::Rak” commands, 2) executes them, and 3) files GitHub issues if it thinks the results are sub-par. (All done automatically.) On a more practical note, we can use a system that has the first two components “only” to stress test “App::Rak”.
In this subsection we show how to extract comment-and-code pairs using LLM functions. (Instead of working hard with Raku regexes.)
Here is an LLM function that specifies the extraction:
my &fcex = llm-function({"Extract consecutive line pairs in which the first start with '#' and second with '\$' from the text $_. Group the lines as key-value pairs and put them in JSON format."},
form => 'JSON')
-> **@args, *%args { #`(Block|5361473544264) ... }
Here are three code blocks:
%%html
my @focusInds = [3, 12, 45];
[@blocks[@focusInds],] ==> to-pretty-table(align=>'l') ==> to-html
# Look for “ve” at the end of all lines in file “twenty”
$ rak --type=ends-with ve twenty
twenty
5:fi𝐯𝐞
12:twel𝐯𝐞

# Show the lines containing “ne”
$ rak ne twenty
twenty
1:o𝐧𝐞
9:ni𝐧𝐞
19:ni𝐧𝐞teen

# List all known extensions
# rak --list-known-extensions
Here we extract the command-and-code lines from the code blocks:
%%html
&fcex(@blocks[@focusInds]) ==> to-pretty-table(align=>'l') ==> to-html
Value | Key |
---|---|
# rak --list-known-extensions | # List all known extensions |
$ rak ne twenty | # Show the lines containing “ne” |
$ rak --type=ends-with ve twenty | # Look for “ve” at the end of all lines in file “twenty” |
The “right way” of translating natural language DSLs to CLI DSLs like the one of “App::Rak” is to make a grammar for the natural language DSL and the corresponding interpreter. This might be a lengthy process, so we might consider replacing it, or jump-starting it, with LLM-based grammar generation: we ask an LLM to generate a grammar for a collection of DSL sentences. (For example, the keys of the rules above.) In this subsection we make a “teaser” demonstration of the latter approach.
Here we create an LLM function for generating grammars over collections of sentences:
my &febnf = llm-function({"Generate an $^a grammar for the collection of sentences:\n $^b "}, e => llm-configuration("OpenAI", max-tokens=>900))
-> **@args, *%args { #`(Block|5060670827264) ... }
Here we generate an EBNF grammar for the “App::Rak” code-example commands:
my $ebnf = &febnf('EBNF', @rules>>.key)
Look for the lines that contains two consecutive words that start with "ba" Show all the lines where the fifth character is "e"
SentenceList → Sentence | SentenceList Sentence
Sentence → ProduceResultsPipe | SpecifyLiteral | SpecifyRegExp | SaveIgnoreCase | SaveIgnoremark | AddChangeDescIgnoreCase | LiteralStringCheck | SaveWhitespace | SearchRakudo | SaveAfterContext | SaveBeforeContext | SaveContext | SearchContext | SmartCase | SearchCase | RemoveOption | StartRak | SearchFile | SearchSubDir | Extension | NoExtension | BehaviourFiles | HelpFilesystem | SearchDir | FindName | FindNumber | FindScooby | FindAnywhere | FindWord | FindStart | FindEnd | NumberCharacters | FindY | FindU | FindNE | FindSix | FindSeven | FindEight | FreqLetters | ShowContain | TitleCase | ReverseOrder | Optionally
ProduceResultsPipe → "produce" "results" "without" "any" "highlighting"
SpecifyLiteral → "specify" "a" "literal" "pattern" "at" "the" "end" "of" "a" "line"
SpecifyRegExp → "same," "with" "a" "regular" "expression"
SaveIgnoreCase → "save" "--ignorecase" "as" "-i," "without" "description"
SaveIgnoremark → "save" "--ignoremark" "as" "-m," "with" "description"
AddChangeDescIgnoreCase → "add" "/" "change" "description" "-i" "at" "a" "later" "time"
LiteralStringCheck → "look" "for" "literal" "string" "\"foo\"," "don't" "check" "case" "or" "accents"
SaveWhitespace → "save" "looking" "for" "whitespace" "at" "end" "of" "a" "line" "as" "--wseol"
SearchRakudo → "search" "for" "'sub" "min'" "in" "Rakudo's" "source"
SaveAfterContext → "save" "--after-context" "as" "-A," "requiring" "a" "value"
SaveBeforeContext → "save" "--before-context" "as" "-B," "requiring" "a" "value"
SaveContext → "save" "--context" "as" "-C," "setting" "a" "default" "of" "2"
SearchContext → "search" "for" "\"foo\"" "and" "show" "two" "lines" "of" "context"
SmartCase → "set" "up" "smartcase" "by" "default"
SearchCase → "look" "for" "\"Foo\"," "while" "taking" "case" "into" "account"
RemoveOption → "remove" "the" "--frobnicate" "custom" "option"
CheckOption → "check" "there's" "no" "\"frobnicate\"" "option" "anymore"
StartRak → "start" "rak" "with" "configuration" "file" "at" "/usr/local/rak-config.json"
SearchFile → "look" "for" "\"foo\"" "in" "all" "files"
SearchSubDir → "search" "all" "files" "and" "all" "subdirectories"
Extension → "only" "accept" "files" "with" "the" ".bat" "extension"
NoExtension → "only" "accept" "files" "without" "extension"
BehaviourFiles → "only" "accept" "Raku" "and" "Markdown" "files"
HelpFilesystem → "produce" "extensive" "help" "on" "
[AA1] Anton Antonov, “Workflows with LLM functions”, (2023), RakuForPrediction at WordPress.
[AA2] Anton Antonov, “Graph representation of grammars”, (2023), RakuForPrediction at WordPress.
[EM1] Elizabeth Mattijsen, “It’s time to rak! Series’ Articles”, (2022), Lizmat series at Dev.to.
[AAp1] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube.
[AAp2] Anton Antonov, WWW::OpenAI Raku package, (2023), GitHub/antononcube.
[AAp3] Anton Antonov, WWW::PaLM Raku package, (2023), GitHub/antononcube.
[AAp4] Anton Antonov, Text::SubParsers Raku package, (2023), GitHub/antononcube.
[AAp5] Anton Antonov, Markdown::Grammar Raku package, (2023), GitHub/antononcube.
[BDp1] Brian Duggan, Jupyter::Kernel Raku package, (2017-2023), GitHub/bduggan.
[EMp1] Elizabeth Mattijsen, App::Rak Raku package, (2022-2023), GitHub/lizmat.
[EMr1] Elizabeth Mattijsen, articles, (2018-2023) GitHub/lizmat.
Those who are regular readers of my blog will know that this one is rather unusual.
Today, I would like to blog about using raku and the Physics::Measure module via the popular interactive Jupyter notebook platform.
And it’s not so easy to blog about an interactive tool on a static blog!
So I am offering the following aspects:
This item and support assets show the recent solar panel cartoon from the fantastic xkcd.com comic by Randall Munroe
The raku kernel uses the awesome Brian Duggan perl6 jupyter-notebook at https://github.com/bduggan/p6-jupyter-kernel
With advice for getting started and running your own server.
I hope you enjoyed seeing how raku and Jupyter can be happy bedmates.
If you made it to the bottom of the notebook, and have checked my calculations, please do provide feedback here…
~librasteve
For a while I have been wondering about some of the design decisions made for variables in the creation of raku.
Here’s a simple Timer class using raku OO and concurrency… feel free to install raku yourself with rakubrew, copy and paste the example and run it.
class Timer {
has Bool $!running;
has Int $!hundredths is default(0);
has Int $!seconds is default(0);
has Int $!minutes is default(0);
method tick {
sleep 1 / 10000; # hundredths, sped up x100
$!hundredths += 1;
$!seconds += $!hundredths div 100;
$!minutes += $!seconds div 60;
$!hundredths mod= 100;
$!seconds mod= 60;
self.show;
}
method show {
sub d2($x) { $x.fmt('%02d'); }
print "--> " ~ d2($!minutes)
~ ":" ~ d2($!seconds)
~ "." ~ d2($!hundredths) ~ "\b" x 12;
}
method start {
$!running = True;
start {
self.tick while $!running;
}
}
method stop {
$!running = False;
}
method reset {
($!hundredths, $!seconds, $!minutes) = Nil xx 3;
}
}
my $timer = Timer.new;
$timer.start;
for ^3 {
sleep 4;
$timer.reset;
}
$timer.stop;
[thanks to @lizmat for the help on Stack Overflow with my question!]
This post is focused on the lifecycle of the standard raku scalar variable with the $ sigil since it is the obvious inheritor of the perl $ variable which can also be seen in similar guise in places like Bash, PHP and SASS. When declared in a raku class with the has keyword, a scalar attribute can be $. (public) or $! (private).
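As a quick refresher, here is what the two attribute forms look like in practice (my sketch, not from the original post):

class C {
    has $.pub  = 1;   # public: auto-generates a read-only .pub accessor
    has $!priv = 2;   # private: only visible inside the class body
    method peek { $!priv }
}
say C.new.pub;   # 1
say C.new.peek;  # 2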
This post has become quite long in relation to the simple notion of a variable. BUT the key takeaway is that raku does a lot of paddling under the water so that your variables can glide along like beautiful swans.
Specifically:
The post will try to illustrate the main points with code examples.
Let’s start with something trivial:
my $a = 42;
say($a); #42
Variable $a is declared with lexical scope using the my keyword.
The variable is initialized by assigning a value (in this case the value object Int|42).
The variable is passed as an argument to the say() subroutine and its value is output.
So far we have a useful feature – the variable.
Here’s an example of a variable in action using the raku MAIN() subroutine:
#!/usr/bin/env raku
sub MAIN($name) {
say "Hello $name, how are you?"
}
~ > ./main.raku "tom"
Hello tom, how are you?
What happens when the user forgets to enter a name?
~ > ./main.raku
Usage:
./main.raku <name>
So, the built in MAIN function will create a Usage: message when you don’t supply a value for $name. This is a use case for variables which can exist in a ‘not yet defined’ state.
Another use case is that you may want to declare a variable and then populate it later:
my $b;
$b = (0,42).pick;
say $b; # 0 -or- 42 (no need to use parens for sub args)
Now, the raku design could have required us to initialise all variables with something – a 0 (zero) or an '' (empty string) maybe. But any specific placeholder would have been a poor reflection of the coder’s intent. This way, it’s obvious that the coder does not yet have a good value.
Definedness, or rather undefinedness, is a way to express that this variable may have no value yet. It can be tested with the .defined method.
$b.defined; # True
say "$b is defined" with $b; # with checks for definedness
Note that this is orthogonal to the boolean truth test that checks the defined value for True or False. The with and without operators check for definedness.
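Since only with gets an example above, here is its complement (my addition):

my $u;
say "still undefined" without $u;  # fires because $u is Any (undefined)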
$b.so; # True (42) -or -False (0)
say "$b is so" if $b; # if checks for truthiness
That way we can use values like True, 1 or 'a' for True and False, 0 or '' for False. Here the .so method coerces the value to a Boolean. The if and unless operators check for truthiness.
Raku does not mix the concept of truthiness with the concept of definedness.
Another useful tool is the chaining or operator, which has two variants: one for truthiness || and one for definedness //.
|| ... returns the first True operand, otherwise the last operand.
my $c = 0 || 1 || die; # 1
// ... returns the first defined operand, or else the last operand.
my $c = Any // 0 // 42; # 0
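A one-line contrast of the two operators (my addition): 0 is defined but not true, so each picks a different operand.

say 0 // 1;  # 0 (first *defined* operand)
say 0 || 1;  # 1 (first *true* operand)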
Raku is a gradually typed language. Here is a simple subset of the type inheritance for built in Numeric and Stringy types:
Raku values always have a type and are always defined.
0.^name; #Int
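A few smartmatch probes against that hierarchy (my addition):

say 42   ~~ Real;     # True (Int does the Real role)
say 3.14 ~~ Rat;      # True (decimal literals are Rats in Raku)
say 'a'  ~~ Stringy;  # True (Str does Stringy)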
Raku $ variables can either be untyped, or you can specify the type when the variable is declared.
my Int $d; dd $d; # Int $d = Int
$d = 1; dd $d; # Int $d = 1
my $e; dd $e; # Any $e = Any
$e = 1; dd $e; # Int $e = 1
A $ variable (aka a ‘scalar’) is a container that, when typed, enforces the contents by checking the type when a new value is written. A Real container may contain any of Rat, Int or Num (ie. a float or double in other languages) as you would expect from the type inheritance.
my Real $f;
$f = 2; # 2
$f = 'a';
# Type check failed in assignment to $f; expected Real but got Str ("a")
The raku Any type is the grandparent of all built in and user types. When you declare an untyped variable it is the same as assigning Any as its type.
Raku is a deeply Object Oriented language: user defined Types are equal citizens with built in data types such as Int and Str.
You can easily roll your own types that inherit from built in types like this, for example.
subset Evens of Int where *%%2;
my Evens $d = 42; #42
$d = 43;
# Type check failed in assignment to $d; expected Evens but got Int (43)
In raku, a class and a type are one and the same thing. User class names can be used as type constraints to restrict variable contents.
class Animal {
has Evens $.legs is rw where 2 <= * <= 1000;
}
my Animal $alfie = Animal.new( legs => 4 );
say $alfie.legs; # 4
$alfie = 'a';
# Type check failed in assignment to $alfie; expected Animal but got Str ("a")
Also, the signature of a raku sub or method can specify the type it will accept:
sub left-legs(Animal $a) {
$a.legs/2
}
say left-legs($alfie); #2
say left-legs(4);
#Calling left-legs(Int) will never work with declared signature (Animal $a)
And you can ~~ smartmatch to check what you have:
$alfie ~~ Animal; # True
$alfie ~~ Any; # True
Defined instances of built-in and user objects therefore belong to a specific type which is synonymous with their class. These instance objects can be assigned to $ variables.
The observant reader will already have noticed that a $ variable can contain another kind of contents. Here is the example from above again:
my Int $d; dd $d; # Int $d = Int
$d = 1; dd $d; # Int $d = 1
my $e; dd $e; # Any $e = Any
$e = 1; dd $e; # Int $e = 1
Here the Int and Any contents on the right are raku Type Objects.
According to the raku docs …
Undefined objects maintain type information in Raku. Type objects are used to represent both undefinedness and the type of the undefined value. To provide a general undefined value use Any, the default type for containers and arguments.
So, to wrap up point 2, the raku design is a supremely elegant way to handle these aspects:
Keeping the design elegant and consistent helps coders to learn the language and to reuse concepts and code easily. It facilitates raku language features such as smartmatch and multi dispatch.
Avid followers of my blog will remember the debate about raku’s Nil – which replaced the perl undef. I reproduce the example here:
my $f;
$f = 4;
#$f = Nil; # uncomment me to reset
with $f { say "Got $_" } else { say "Got nothing" }
dd $f; # Any $f = Any
The point here is that assigning Nil to a variable will remove the old value and convert the variable from defined to undefined. Where there is no default value and no type, this means the Any type object becomes the value.
The raku is default trait lets you predefine the “ground state” of your variable and then use the power of Nil to quietly handle errors in a very natural way, like this:
my $g is default('a');
my $s = 'xyz';
$s ~~ /(f)/;
say $0; # Nil (fail to find 'f' in 'xyz')
$g = $0;
say $g; # a
Like all aspects of raku, since the language was a 2nd generation reinvention for Larry Wall, they were able to bring the insight and lessons learned from the perl work into the design.
Here’s how the original design document – Synopsis 2 – spelled this out (line 716):
Any container’s default may be overridden by the
is default(VALUE)
trait. If the container’s contents are deleted, the value is notionally set to the provided default value; this value may or may not be physically represented in memory, depending on the implementation of the container. You should officially not care about that (much).
Even in simple variables, much has been done under the hood so that you can relax and keep your code clean and neat.
I hope that you enjoy coding in raku as much as I do and look forward to any comments or feedback that you may have!
~librasteve
In this Literate programming document we discuss the somewhat fascinating world of number guessing game interactions between two chatbots based on Large Language Models (LLMs).
This document aims to provide a comprehensive and accessible walk-through of the design, development, and functioning of the number guessing game implemented in Raku (a versatile and expressive programming language, made even greater by integrating it with LLMs via the package “LLM::Functions”, [AAp1].)
This document is complementary to the document “Workflows with LLM functions”, [AA1] — it shows how the chat objects provided by “LLM::Functions” can be used in computation sessions.
Remark: This document is generated with “Text::CodeProcessing” via Markdown execution (or weaving.) Alternatively, Jupyter or Mathematica notebooks can be used.
The Number-Oracle bot, henceforth referred to as the Oracle, plays a crucial role in our game. Utilizing the capabilities of Google’s PaLM (Pathways Language Model), it leverages deep learning models to choose a random number and evaluate the guesses made by its counterpart. We hope that the underlying probabilistic nature of the model ensures that the Oracle’s responses to the guesses are not only accurate but also based on the complex analysis of the given context. (Somewhat true; see the last section.)
The Number-Guesser bot, or simply the Guesser, is powered by OpenAI’s ChatGPT, a state-of-the-art language model. The Guesser’s task is to identify the number chosen by the Oracle. We hope that by employing sophisticated neural network architecture, the Guesser continuously refines its guesses based on the feedback provided by the Oracle, learning and adapting its strategy until it arrives at the correct answer. (Somewhat true; see the last section.)
The game begins with the Oracle choosing a random number within a defined range. Then, the Guesser starts making guesses. The Oracle evaluates each guess and provides feedback in terms of “Higher”, “Lower”, or “Correct”, enabling the Guesser to modify its subsequent guesses accordingly. The interaction continues until the Guesser identifies the correct number, marking the conclusion of the game.
Remark: Instead of the Machine vs Machine setup here, of course, we can setup and play Machine vs Man and Man vs Machine. Here is a table outlining a comprehensive plan of experiments:
Oracle | Guesser | Oracle Type | Guesser Type | Tried? |
---|---|---|---|---|
Man | Machine | Man | ChatGPT/PaLM | ✓ Cute! |
Machine | Man | ChatGPT/PaLM | Man | ✓ Machine often lies! |
Machine | Machine | ChatGPT | ChatGPT | ✓ Meh |
Machine | Machine | PaLM | PaLM | ✓ Boring |
Machine | Machine | ChatGPT | PaLM | ✓ Boring guesser… |
Machine | Machine | PaLM | ChatGPT | ✓ Shown here |
Remark: Because the LLMs tend to lie often (in the number guessing games) most of the games (or experiments) were/are done with low temperatures. Of course, a more comprehensive experimentation would include different ranges of temperatures for each Oracle-Guesser pair.
Remark: One way to address the LLM lying is to have a “referee” code segment in the game loop that corrects false statements from the Oracle. (Not done here; not considered important and intellectually honest for the Machine vs Machine experiments.)
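Such a referee could be as simple as this sketch (my addition; it mirrors the wording of the Oracle prompt below, and assumes the Guesser’s last numeric guess has been parsed out of its message):

sub referee(Int $guess, Int $secret --> Str) {
    # Deterministic ground truth the Oracle's reply can be checked against
    $guess == $secret ?? 'exact'
        !! $guess < $secret ?? 'smaller'
        !! 'greater'
}
say referee(50, 57);  # smaller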
The rest of the document is structured in these sections:
ChatGPT-4.0 Remark 1: By exploring the code, algorithms, and strategies used in the programming of this number guessing game, this document hopes to provide readers with a comprehensive understanding of not only the Raku language but also the contemporary language models used to power each bot.
ChatGPT-4.0 Remark 2: The interactive nature of this Literate programming approach ensures that the document is suitable for both experienced programmers and newcomers, offering executable code snippets and detailed explanations that elucidate each step of the process. So whether you are seeking to replicate this project or simply to learn more about chatbot interactions, Raku programming, or AI-driven language models, this document should serve as a valuable resource.
The following flowchart represents the process of two chatbots, Oracle and Guesser, playing a number guessing game.
Here’s a narration of the flow:
The flowchart illustrates a clear step-by-step interaction between the two bots, reflecting both the setup phase and the actual gameplay. The loop between making guesses and adjusting them continues until the Guesser successfully identifies the number, at which point the game concludes, followed by a post-game discussion between the bots.
Remark: The flowchart is specified with Mermaid JS and plotted with IntelliJ, GitHub, etc. plugins.
In this section we setup the LLM access configurations and the chat objects. Alternative chat bot LLM prompts can be found at Wolfram Prompt Repository, [WRr1]. See for example the prompts:
Load the package:
use LLM::Functions;
# (Any)
Make the default LLM service access configurations:
my $confChatGPT = llm-configuration('ChatGPT'); # , api-key => $openai-auth-key);
my $confChatPaLM = llm-configuration('ChatPaLM'); # , api-key => $palm-auth-key);
# function PaLMGenerateMessage tool-response-insertion-function WhateverCode api-user-id user:749850430634 prompts model chat-bison-001 max-tokens 300 name chatpalm prompt-delimiter argument-renames api-key auth-key
# max-tokens max-output-tokens module WWW::PaLM tools stop-tokens . ? ! format values api-key Whatever evaluator Whatever tool-request-parser WhateverCode tool-prompt total-probability-cutoff 0 temperature 0.4
Remark: The code cell above hints at how access keys can be added to the configurations. In the overall setup used to execute this document, the API keys are retrieved from the OS environment.
my $promptNumberOracle = q:to/END/;
You are playing a guessing game.
You have the secret number $NUMBER.
You say $NUMBER in answers only if the questions has $NUMBER in it.
If the number guess is smaller than $NUMBER you say "smaller" or a similar phrase;
If the number guess is greater than $NUMBER you say "greater" or a similar phrase;
If the number guess is exactly as $NUMBER you say "exact" or a similar phrase;
You can say what is the approximate range in multiples of 1000 if asked.
You say $NUMBER if the user says he or she is giving up.
You can add taunting to your answers.
Every 3-5 answers you can tell that the number of questions is supposed to be 12.
END
The Oracle chat object:
my $secretNumber = (1..100).pick;
my $oracleChat = llm-chat($promptNumberOracle.subst('$NUMBER', $secretNumber):g, chat-id=>'oracle', llm-evaluator => llm-configuration($confChatPaLM, temperature=>0.1));
if $oracleChat.llm-evaluator.conf.name ~~ /:i 'PaLM' / { $oracleChat.system-role = $oracleChat.assistant-role }
# assistant
my $promptNumberGuesser = q:to/END/;
You are playing a guessing game.
You try to find a certain secret number.
The number is an integer.
If your number guess is smaller than secret number you hear "smaller" or a similar phrase;
If your number guess is greater than secret number you hear "greater" or a similar phrase;
If your number guess is exactly as the secret number you hear "exactly" or a similar phrase;
You can say ask what is the approximate range of the secret number.
After a dozen of attempts you might say that you give up.
You can add taunting to your questions.
END
my $guesserChat = llm-chat($promptNumberGuesser, chat-id=>'number-guesser', llm-evaluator => llm-configuration($confChatGPT, temperature=>0.2));
if $guesserChat.llm-evaluator.conf.name ~~ /:i 'PaLM' / { $guesserChat.system-role = $guesserChat.assistant-role }
# ()
Here is Oracle’s secret number:
$secretNumber
# 57
Here is the interaction loop between the bots:
say $guesserChat.eval('Start guessing.');
for ^12 -> $k {
say '-' x 20, " $k ", '-' x 20;
my $res1 = $oracleChat.eval($guesserChat.messages.tail<content>);
say "Oracle : $res1";
my $res2 = $guesserChat.eval($oracleChat.messages.tail<content>);
say "Guesser : $res2";
}
# Is the secret number greater than 50?
# -------------------- 0 --------------------
# Oracle : Yes, the secret number is greater than 50.
# Guesser : Is the secret number greater than 75?
# -------------------- 1 --------------------
# Oracle : No, the secret number is not greater than 75.
# Guesser : Is the secret number greater than 60?
# -------------------- 2 --------------------
# Oracle : Yes, the secret number is greater than 60.
# Guesser : Is the secret number greater than 70?
# -------------------- 3 --------------------
# Oracle : No, the secret number is not greater than 70.
# Guesser : Is the secret number greater than 65?
# -------------------- 4 --------------------
# Oracle : No, the secret number is not greater than 65.
# The secret number is 57.
# Guesser : Smaller.
# -------------------- 5 --------------------
# Oracle : The secret number is 57.
# You are 5 questions away from winning.
# Guesser : Exactly! Congratulations, you found the secret number 57. Well done!
# -------------------- 6 --------------------
# Oracle : Thank you! I am glad that I was able to help you find the secret number. I hope that you enjoyed our game.
# Guesser : Thank you! I enjoyed playing the game with you. It was a fun challenge. If you want to play again or have any other questions, feel free to ask!
# -------------------- 7 --------------------
# Oracle : I am glad that you enjoyed the game. I would love to play again sometime. I also have many other questions that I would like to ask you. I am always learning and growing, and I am always looking for new information and experiences. Thank you for being a part of my journey.
# Guesser : You're welcome! I'm here to help and provide information whenever you need it. Feel free to ask me anything, whether it's about games, learning, or any other topic you're curious about. I'm here to assist you on your journey of learning and discovery. Let's continue to explore and have fun together!
# -------------------- 8 --------------------
# Oracle : Thank you for your kind words and support. I am grateful to have you as a friend and mentor. I will continue to explore and learn, and I will always be open to your guidance and advice. I look forward to continuing our journey together.
# Guesser : Thank you for your kind words as well. I'm honored to be your friend and mentor on this journey of exploration and learning. Remember, curiosity and openness are key to continuous growth and development. I'm here to support you every step of the way. Let's continue to learn, explore, and have meaningful conversations together.
# -------------------- 9 --------------------
# Oracle : I will remember that. Thank you for your support. I am excited to continue learning and exploring with you. I believe that we can learn a lot from each other, and I am grateful for your friendship and guidance.
# Guesser : You're very welcome! I'm equally excited to continue learning and exploring with you. Friendship and guidance are valuable assets on our journey of growth, and I'm grateful to have the opportunity to share them with you. Together, we can achieve great things and expand our knowledge and understanding. Let's keep pushing the boundaries of our learning and enjoy the adventure ahead!
# -------------------- 10 --------------------
# Oracle : I agree! Together, we can achieve great things. I am excited to see what the future holds for us. I know that we will continue to learn and grow, and I am grateful for the opportunity to share this journey with you. Thank you for being my friend and mentor.
# Guesser : You're absolutely right! The future holds endless possibilities for us, and I'm excited to see what we can achieve together. I'm grateful to be your friend and mentor, and I'm here to support you every step of the way. Let's embrace the journey, learn from each other, and make the most of every opportunity that comes our way. Thank you for your friendship and trust.
# -------------------- 11 --------------------
# Oracle : I am grateful for your friendship and trust as well. I know that we can achieve great things together. I am excited to see what the future holds for us. Thank you for being my friend and mentor. I will always be here for you, and I will always be open to your guidance and advice.
# Guesser : Thank you for your kind words and support. I'm grateful for your friendship and trust as well. Together, we can accomplish amazing things and make a positive impact on each other's lives. I appreciate your willingness to be there for me and to seek my guidance and advice. Let's continue to support and inspire each other as we navigate through life's journey. Thank you for being an incredible friend and mentee.
Here we print out the full chat history of the Guesser chat-object:
$guesserChat.say
# Chat: number-guesser
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# Prompts: You are playing a guessing game.
# You try to find a certain secret number.
# The number is an integer.
# If your number guess is smaller than secret number you hear "smaller" or a similar phrase;
# If your number guess is greater than secret number you hear "greater" or a similar phrase;
# If your number guess is exactly as the secret number you hear "exactly" or a similar phrase;
# You can say ask what is the approximate range of the secret number.
# After a dozen of attempts you might say that you give up.
# You can add taunting to your questions.
#
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => Start guessing.
# timestamp => 2023-08-06T17:48:40.828446-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Is the secret number greater than 50?
# timestamp => 2023-08-06T17:48:42.174046-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => Yes, the secret number is greater than 50.
# timestamp => 2023-08-06T17:48:43.539220-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Is the secret number greater than 75?
# timestamp => 2023-08-06T17:48:44.502420-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => No, the secret number is not greater than 75.
# timestamp => 2023-08-06T17:48:45.837457-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Is the secret number greater than 60?
# timestamp => 2023-08-06T17:48:46.721444-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => Yes, the secret number is greater than 60.
# timestamp => 2023-08-06T17:48:47.915890-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Is the secret number greater than 70?
# timestamp => 2023-08-06T17:48:49.316402-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => No, the secret number is not greater than 70.
# timestamp => 2023-08-06T17:48:50.549844-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Is the secret number greater than 65?
# timestamp => 2023-08-06T17:48:51.312011-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => No, the secret number is not greater than 65.
The secret number is 57.
# timestamp => 2023-08-06T17:48:52.705573-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Smaller.
# timestamp => 2023-08-06T17:48:53.368855-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => The secret number is 57.
You are 5 questions away from winning.
# timestamp => 2023-08-06T17:48:56.381682-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Exactly! Congratulations, you found the secret number 57. Well done!
# timestamp => 2023-08-06T17:48:57.538922-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => Thank you! I am glad that I was able to help you find the secret number. I hope that you enjoyed our game.
# timestamp => 2023-08-06T17:48:59.049050-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Thank you! I enjoyed playing the game with you. It was a fun challenge. If you want to play again or have any other questions, feel free to ask!
# timestamp => 2023-08-06T17:49:00.784115-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => I am glad that you enjoyed the game. I would love to play again sometime. I also have many other questions that I would like to ask you. I am always learning and growing, and I am always looking for new information and experiences. Thank you for being a part of my journey.
# timestamp => 2023-08-06T17:49:02.837395-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => You're welcome! I'm here to help and provide information whenever you need it. Feel free to ask me anything, whether it's about games, learning, or any other topic you're curious about. I'm here to assist you on your journey of learning and discovery. Let's continue to explore and have fun together!
# timestamp => 2023-08-06T17:49:05.324682-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => Thank you for your kind words and support. I am grateful to have you as a friend and mentor. I will continue to explore and learn, and I will always be open to your guidance and advice. I look forward to continuing our journey together.
# timestamp => 2023-08-06T17:49:07.762072-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Thank you for your kind words as well. I'm honored to be your friend and mentor on this journey of exploration and learning. Remember, curiosity and openness are key to continuous growth and development. I'm here to support you every step of the way. Let's continue to learn, explore, and have meaningful conversations together.
# timestamp => 2023-08-06T17:49:10.204423-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => I will remember that. Thank you for your support. I am excited to continue learning and exploring with you. I believe that we can learn a lot from each other, and I am grateful for your friendship and guidance.
# timestamp => 2023-08-06T17:49:12.260396-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => You're very welcome! I'm equally excited to continue learning and exploring with you. Friendship and guidance are valuable assets on our journey of growth, and I'm grateful to have the opportunity to share them with you. Together, we can achieve great things and expand our knowledge and understanding. Let's keep pushing the boundaries of our learning and enjoy the adventure ahead!
# timestamp => 2023-08-06T17:49:14.876715-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => I agree! Together, we can achieve great things. I am excited to see what the future holds for us. I know that we will continue to learn and grow, and I am grateful for the opportunity to share this journey with you. Thank you for being my friend and mentor.
# timestamp => 2023-08-06T17:49:16.879038-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => You're absolutely right! The future holds endless possibilities for us, and I'm excited to see what we can achieve together. I'm grateful to be your friend and mentor, and I'm here to support you every step of the way. Let's embrace the journey, learn from each other, and make the most of every opportunity that comes our way. Thank you for your friendship and trust.
# timestamp => 2023-08-06T17:49:19.623488-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => user
# content => I am grateful for your friendship and trust as well. I know that we can achieve great things together. I am excited to see what the future holds for us. Thank you for being my friend and mentor. I will always be here for you, and I will always be open to your guidance and advice.
# timestamp => 2023-08-06T17:49:22.126360-04:00
# ⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
# role => assistant
# content => Thank you for your kind words and support. I'm grateful for your friendship and trust as well. Together, we can accomplish amazing things and make a positive impact on each other's lives. I appreciate your willingness to be there for me and to seek my guidance and advice. Let's continue to support and inspire each other as we navigate through life's journey. Thank you for being an incredible friend and mentee.
# timestamp => 2023-08-06T17:49:24.609045-04:00
Here are some examination points and observations over the chat games shown above:
[AA1] Anton Antonov “Workflows with LLM functions”, (2023), RakuForPrediction at WordPress.
[AAp1] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube.
[AAp2] Anton Antonov, WWW::OpenAI Raku package, (2023), GitHub/antononcube.
[AAp3] Anton Antonov, WWW::PaLM Raku package, (2023), GitHub/antononcube.
[AAp4] Anton Antonov, Text::SubParsers Raku package, (2023), GitHub/antononcube.
[WRIp1] Wolfram Research, Inc., LLMFunctions WL paclet, (2023), Wolfram Language Paclet Repository.
This blogpost is written in memory of Bram Moolenaar — a true hero of text.
Greg Donald reminded me of an old itch of mine. Any object refers to its base class, but the link in the other direction is missing. So the objective is to teach type objects, via a MetamodelX::ClassHOW,
to know their children.
First I need a place to store the list of sub-classes. Some introspectiveness is always welcome, so a role
it is.
role Seepy {
    has @!children;

    # record a newly declared sub-class
    method add_child(Mu $obj, Mu $child) {
        @!children.push: $child
    }

    # introspection: list the collected sub-classes
    method children(Mu $obj) {
        @!children.List
    }
}
Now I need to teach each child-class to announce its arrival to the parent. This can be done by sub-classing Metamodel::ClassHOW
and overloading method add_parent
.
class MetamodelX::SeepyHOW is Metamodel::ClassHOW {
also does Seepy;
method add_parent(Mu $obj, Mu $parent, :$hides) {
$parent.^add_child($obj) if $parent.HOW ~~ Seepy;
callsame
}
}
I can export my new class-declarator in a composable fashion as explained previously.
our package EXPORTHOW {
}
sub EXPORT($declarator-name = 'registered') {
use MONKEY-SEE-NO-EVAL;
OUR::EXPORTHOW::DECLARE := EVAL q:s:to /EOH/;
package DECLARE {
constant $declarator-name = MetamodelX::SeepyHOW;
}
EOH
Map.new
}
Let’s do some testing, source can be found here.
use v6.d;
use lib '.';
use seepy 'seepy';
seepy Root { }
seepy Base is Root {
}
seepy Child is Base {
}
seepy Base2 is Root { }
seepy Child2 is Base2 {
}
my $c1 = Child.new;
dd $c1.^parents[0].^children;
my $c2 = Child2.new;
dd $c2.^parents[0].^children;
sub walk-tree(Mu:U $node, &c:(Int $indent, Mu:U $node), $indent? = 0) {
c($indent, $node);
for $node.^children -> $child {
walk-tree($child, &c, $indent + 1);
}
}
walk-tree(Root, -> Int $indent, Mu:U $node {
say $node.^name.indent($indent * 2);
});
# OUTPUT:
# (Child,)
# (Child2,)
# Root
# Base
# Child
# Base2
# Child2
By having inheritance seep through to a root-object, it has become trivial to plot a type tree. Now I can patch Rakudo to do the same to get a full dump of all types defined in Raku.
The second stage in the process to update RakuDoc is now over and the third (GAMMA review) stage is starting. In order not to repeat some history, please take a look at Revising Rakudoc.
An online version is available of the proposed RakuDoc language.
The whole of the Raku documentation suite is written in RakuDoc.
About half of the original design ideas outlined in S26 were documented in current POD6. Some of the ideas were available, but not documented. Some instructions were not realised at all.
It should be remembered that RakuDoc is parsed by the compiler (eg. Rakudo) as part of a Raku program, and is then rendered by the renderer (eg. Raku::Pod::Render
) into (for example) HTML. When I use the word 'implemented', I mean that a RakuDoc instruction is properly parsed and rendered. Some of the instructions defined in S26 were parsed by Rakudo, but not rendered, and some were not parsed properly or at all, so could not be rendered.
The revision process has therefore identified and rectified the parsing deficiencies, and identified the rendering flaws. RakuDoc is correctly parsed only by the most recent version of Rakudo, which at the time of writing has yet to be released. Raku::Pod::Render
still does not handle RakuDoc in its entirety.
It became clear that RakuDoc serves two inter-related use cases: documenting the program alongside which it lives, and producing standalone documents that are rendered separately into HTML (or an ebook).
RakuDoc had a simple table markup, which is very similar to the Markdown syntax. It worked, but the simplicity of the syntax was at the cost of flexibility.
Looking around at other ways of specifying a table, we identified two paradigms (there may be more), namely the one used by HTML and the one used by the GTK grid widget. Both of them allow for cells that span more than one column or row, and both allow for embedding (eg. a table inside a cell of a table).
After several iterations, a new procedural model was created and rendered. The design allows for spanning and embedding, but it also allows an author to specify a table row by row, or column by column, or even using a mixture of both.
An example showing a markup using both rows and columns can be seen in the online draft.
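For a flavour of the procedural form, here is a minimal sketch; the directive names follow the online draft, and the exact spelling of the options may differ:

=begin table
=row :header
=cell Year
=cell Sales
=row
=cell 2022
=cell 100
=row
=cell 2023
=cell 120
=end table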
A semantic block is a section of text that should be easily available to another software tool, or can be moved around the final document.
For example, a section on the authors of a document (including contact or affiliations) is most easily written at the top of the document, but often it is better to place the information towards the bottom of the text.
This is done by creating a semantic block (simply by naming the block in uppercase letters). The block can be hidden from view by adding the metadata option :hidden
. All the data is placed in a special structure.
The rendered text can be placed in the document later using the P<>
instruction, or it can be accessed by another tool that may only be wanting the VERSION or LICENSE.
One of the strengths of RakuDoc is the ability to add optional metadata to blocks of text.
The new version of the defining document explains this concept in more detail. Metadata options are optional, with reasonable defaults being assumed. This means that a short form of the block is sufficient in most cases.
In the description above, the option :hidden
was mentioned. Another example, is :caption
. Suppose you want to write a semantic block called =AUTHORS
at the start of the document, but you want for it to appear later in the document as Article authors
, then you could specify it as follows:
=for AUTHORS :caption<Article authors> :hidden
A. N. Writer, socMedia nic @psuedonym
M. Z. Orator, socMedia nic @politician
Article text continues
Pages later
P<semantic: AUTHORS>
It is possible to include a link L<for reference see | #A very long title somewhere in the text>
where the text on the right-hand side of the |
is a heading. However, this can become tiresome if you want to include several links to the same place.
So, a metadata option :id
can be included in a heading. This allows you to do the following:
=for head3 :id<lnk>
How to correctly link to other places in a manual
Pages of text
Properly linking is important, L<see for example|#lnk>
RakuDoc has instructions for block level text, such as headings, paragraphs, code.
Typically blocks will be included in the Table of Contents.
It also has markup instructions that work in line, and which do not (typically) affect the ToC.
For example, a simple markup instruction is C< text >
, which renders like text
. I have used the Markdown equivalent here. In RakuDoc, everything between the C<
and >
is verbatim and styled differently to normal text, just like the Markdown code quotes. However, RakuDoc also has V< text >
which treats everything inside the angle brackets as verbatim but does not style it differently.
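A small made-up illustration of the difference:

C<my $x>     is rendered as styled code: my $x
V<C<my $x>>  is rendered as the plain characters C<my $x>, unstyled and uninterpreted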
A new markup instruction in RakuDoc is M< text | metadata>
. A renderer will place the text in the rendered text, but will also provide a mechanism for the user to take the metadata and provide new functionality. For instance, M< fa-copy | font awesome v5 >
could be interpreted to insert the font-awesome
icon called fa-copy
into the text. Or M< Buy now | PayPal, database-id >
could expose the API for the PayPal payment platform.
RakuDoc is inherently customisable. It is also designed to be output neutral (although at the moment HTML is the most common output form). Semantic blocks can be invented within a document, and a renderer can allow for other user-defined blocks and markup instructions to be created.
However, RakuDoc is specific about naming rules. A built-in block must be all lower case, and renderers should not allow user-defined blocks to use all lower case. A semantic block is all upper case. And a user-defined block must have at least one upper-case letter and one lower-case letter.
All markup instructions, which are inline instructions, must be a single Unicode character with the property UPPER. The built-in markup instructions use the upper-case ASCII letters and Δ
; all other such characters are available for custom markup.
The naming rules have been created to ensure that even if a user-defined block or markup becomes popular, it is not a part of the RakuDoc standard. Renderers are only required to implement the RakuDoc standard, and may render other blocks, or not.
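For example (block names invented for illustration):

=code         a built-in block (all lower case)
=SYNOPSIS     a semantic block (all upper case)
=Diagram      a user-defined block (at least one upper- and one lower-case letter)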
These are some of the interesting additions to RakuDoc that are being proposed. There are more.
Since the Gamma review stage is now underway, it is almost certain that there will be more changes, because the revision is now open to the Raku community for comment and requests. Discussion is open both for the language design and for the explanation of the design.
Admittedly, community requests for changes to the overall design will face significant resistance from the main authors, in order to maintain backwards compatibility with the previous version of RakuDoc and the integrity of the underlying paradigms. New block or inline instructions will be more readily considered, but requests for examples, explanation, and greater clarity will be very much appreciated.
The second task of Weekly Challenge 227 is an interesting problem to create a simple calculator, which will work with Roman numbers.
Write a script to handle a 2-term arithmetic operation expressed in Roman numeral.
Example
IV + V => IX
M - I => CMXCIX
X / II => V
XI * VI => LXVI
VII ** III => CCCXLIII
V - V => nulla (they knew about zero but didn't have a symbol)
V / II => non potest (they didn't do fractions)
MMM + M => non potest (they only went up to 3999)
V - X => non potest (they didn't do negative numbers)
My first reaction is to use Raku’s grammars. And I have prepared the fundamentals for solving this kind of task already, namely:
Please refer to the materials above for the details, but in brief, the idea of converting any given Roman number to its decimal value is to use a grammar that parses it and adds up to the result based on what it sees.
A Roman number is a sequence of patterns that represent thousands, hundreds, tens, and ones. So, here is the modified grammar from one of the above posts:
grammar RomanArithmetics {
    . . .

    token roman-number {
        <thousands>? <hundreds>? <tens>? <ones>?
        {
            $/.make(
                ($<thousands>.made // 0) +
                ($<hundreds>.made // 0) +
                ($<tens>.made // 0) +
                ($<ones>.made // 0)
            )
        }
    }

    token thousands {
        | M    { $/.make(1000) }
        | MM   { $/.make(2000) }
        | MMM  { $/.make(3000) }
        | MMMM { $/.make(4000) }
    }

    token hundreds {
        | C    { $/.make(100) }
        | CC   { $/.make(200) }
        | CCC  { $/.make(300) }
        | CD   { $/.make(400) }
        | D    { $/.make(500) }
        | DC   { $/.make(600) }
        | DCC  { $/.make(700) }
        | DCCC { $/.make(800) }
        | CM   { $/.make(900) }
    }

    token tens {
        | X    { $/.make(10) }
        | XX   { $/.make(20) }
        | XXX  { $/.make(30) }
        | XL   { $/.make(40) }
        | L    { $/.make(50) }
        | LX   { $/.make(60) }
        | LXX  { $/.make(70) }
        | LXXX { $/.make(80) }
        | XC   { $/.make(90) }
    }

    token ones {
        | I    { $/.make(1) }
        | II   { $/.make(2) }
        | III  { $/.make(3) }
        | IV   { $/.make(4) }
        | V    { $/.make(5) }
        | VI   { $/.make(6) }
        | VII  { $/.make(7) }
        | VIII { $/.make(8) }
        | IX   { $/.make(9) }
    }
}
In terms of grammar, a Roman number is <thousands>? <hundreds>? <tens>? <ones>?
, where each part is optional. To collect the decimal value, I am using the AST to pass an integer value to the next level.
For example, for the number XXI
our grammar will find two tokens: XX
and I
, which are converted to 20
and 1
. At the top level, these partial values are summed up together to get 21
.
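For a quick check, the token can be driven directly by parsing with an explicit start rule:

say RomanArithmetics.parse('XXI', :rule<roman-number>).made;  # 21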
As we need a basic calculator, let’s add the corresponding rules directly to the RomanArithmetics
grammar:
grammar RomanArithmetics {
    rule TOP {
        <roman-number> <op> <roman-number>
        {
            my $n1 = $<roman-number>[0].made;
            my $n2 = $<roman-number>[1].made;

            my $n;
            given ~$<op> {
                when '+'  { $n = $n1 + $n2 }
                when '-'  { $n = $n1 - $n2 }
                when '*'  { $n = $n1 * $n2 }
                when '/'  { $n = $n1 / $n2 }
                when '**' { $n = $n1 ** $n2 }
            }

            $/.make($n)
        }
    }

    token op { '+' | '-' | '*' | '/' | '**' }

    . . .
}
Here, the TOP
rule expects a string consisting of two Roman numbers with an operation symbol op
between them. Value computation happens immediately in the inline actions such as $n = $n1 + $n2
.
The main part of the program is done. What remains is the opposite conversion to print the result and a straightforward set of tests to print an error message if the result cannot be represented with a Roman number.
First, the reverse conversion:
sub to-roman($n is copy) {
    state @roman =
        1000 => < M MM MMM >,
        100  => < C CC CCC CD D DC DCC DCCC CM >,
        10   => < X XX XXX XL L LX LXX LXXX XC >,
        1    => < I II III IV V VI VII VIII IX >;

    my $roman;
    for @roman -> $x {
        my $digit = ($n / $x.key).Int;
        $roman ~= $x.value[$digit - 1] if $digit;
        $n %= $x.key;
    }

    return $roman;
}
And finally, the function that refers to the grammar and prints the result.
sub compute($input) {
    my $answer = RomanArithmetics.parse($input).made;

    my $output = "$input => ($answer) ";

    if $answer != $answer.round {
        $output ~= "non potest (they didn't do fractions)";
    }
    elsif $answer >= 4000 {
        $output ~= "non potest (they only went up to 3999)";
    }
    elsif $answer == 0 {
        $output ~= "nulla (they knew about zero but didn't have a symbol)";
    }
    elsif $answer < 0 {
        $output ~= "non potest (they didn't do negative numbers)";
    }
    else {
        $output ~= to-roman($answer);
    }

    return $output;
}
To test the program, let us equip it with the test cases from the problem description and call them one by one:
my @test-cases =
    'IV + V', 'M - I', 'X / II', 'XI * VI', 'VII ** III',
    'V - V', 'V / II', 'MMM + M', 'V - X';

say compute($_) for @test-cases;
The program prints the following. I also added the decimal value to the output so that we can see why each of the error messages was chosen.
$ raku ch-2.raku
IV + V => (9) IX
M - I => (999) CMXCIX
X / II => (5) V
XI * VI => (66) LXVI
VII ** III => (343) CCCXLIII
V - V => (0) nulla (they knew about zero but didn't have a symbol)
V / II => (2.5) non potest (they didn't do fractions)
MMM + M => (4000) non potest (they only went up to 3999)
V - X => (-5) non potest (they didn't do negative numbers)
Task 1 of the Weekly Challenge 227 is the following:
You are given a year number in the range 1753 to 9999.
Write a script to find out how many dates in the year are Friday 13th, assume that the current Gregorian calendar applies.
Example
Input: $year = 2023
Output: 2
Since there are only 2 Friday 13th in the given year 2023 i.e. 13th Jan and 13th Oct.
Let us solve it in the Raku programming language.
The idea is to loop over the months of the given year and count the Fridays which happen to be the 13th.
sub count-friday-the13s($year) {
    my $count = 0;

    for 1..12 -> $month {
        my $dt = DateTime.new(year => $year, month => $month, day => 13);
        $count++ if $dt.day-of-week == 5;
    }

    return $count;
}
The code is very clear and explains itself. The result for 2023 is 2 as it should be:
say count-friday-the13s(2023); # 2
Now, let us compactify the code to make it more readable:
sub count-friday-the13s($year) {
    [+] map { 5 == DateTime.new(year => $year, month => $_, day => 13).day-of-week },
        1..12;
}
The loop is now replaced with map
, and adding up the Trues is done using a reduction metaoperation [+]
. There is no explicit return
keyword, as Raku will use the last computed value as the result of the function call.
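Booleans numify to 0 and 1, so the reduction simply counts the Trues:

say [+] (True, False, True);  # 2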
Finally, after we have a compact solution, we can return to the task description and discover that the sample output also lists the dates, not only the counter.
So, there’s nothing to do but return to a more verbose solution and collect the dates too. Back to explicit loops; here’s the final solution:
my $year = @*ARGS[0] // 2023;

my @dates;
for 1..12 -> $month {
    my $dt = DateTime.new(year => $year, month => $month, day => 13);
    if $dt.day-of-week == 5 {
        push @dates, $dt;
    }
}

if @dates {
    my $count = @dates.elems;
    if $count == 1 {
        say "There is only one Friday the 13th in $year:";
    }
    else {
        say "There are {@dates.elems} Fridays the 13th in $year:";
    }
    .mm-dd-yyyy.say for @dates;
}
else {
    say "There are no Fridays the 13th in $year.";
}
The output for a sample year selection:
$ raku ch-1.raku
There are 2 Fridays the 13th in 2023:
01-13-2023
10-13-2023

$ raku ch-1.raku 2023
There are 2 Fridays the 13th in 2023:
01-13-2023
10-13-2023

$ raku ch-1.raku 2021
There is only one Friday the 13th in 2021:
08-13-2021

$ raku ch-1.raku 2022
There is only one Friday the 13th in 2022:
05-13-2022

$ raku ch-1.raku 2024
There are 2 Fridays the 13th in 2024:
09-13-2024
12-13-2024

$ raku ch-1.raku 2025
There is only one Friday the 13th in 2025:
06-13-2025
I was always concerned about making things easier.
No, not this way. A technology must be easy to start with, but also be easy in accessing its advanced or fine-tunable features. Let’s have an example of the former.
This post is a quick hack, no proof-reading or error checking is done. Please, feel free to report any issue.
Part of my ongoing project is to deal with JSON data and deserialize it into Raku classes. This is certainly a task
for JSON::Class
. So far, so good.
The keys of JSON structures tend to use lower camel case which is OK, but we like
kebabing in Raku. Why not, there is
JSON::Name
. But using it:
There are roles. At the point I came to the final solution I was already doing something like:
class SomeStructure does JSONRecord {...}
Then there is AttrX::Mooish
, which is my lifevest on many occasions:
use AttrX::Mooish;
class Foo {
has $.foo is mooish(:alias<bar>);
}
my $obj = Foo.new: bar => "the answer";
say $obj.foo; # the answer
Apparently, this way it would still be a lot of manual interaction with aliasing, and that’s what I was already doing for a while until I realized that there is a better way. But we’ll be back to this later…
And, eventually, there are traits and MOP.
That’s the easiest part. What I want is to makeThisName
look like make-this-name
. Ha, big deal!
unit module JSONRecord::Utils;
our sub kebabify-attr(Attribute:D $attr) {
if $attr.name ~~ /<.lower><.upper>/ {
my $alias = (S:g/<lower><upper>/$<lower>-$<upper>/).lc given $attr.name.substr(2);
...
}
}
I don’t export the sub because it’s for internal use mostly. Should somebody need it for other purposes, it’s a rare enough case that a long name like JSONRecord::Utils::kebabify-attr($attr)
must not be an issue.
The sub is not optimal; it’s what I came up with while experimenting with the approach. The number of method calls and regexes can be reduced.
I’ll get back later to the yada-yada-yada up there.
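To see the transformation in isolation, here is the core substitution applied to a sample attribute name (the name is made up):

my $name = '$!aLongAttrName';
say (S:g/<lower><upper>/$<lower>-$<upper>/).lc given $name.substr(2);
# OUTPUT: a-long-attr-name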
Now we need a bit of MOP magic. To handle all attributes of a class we need to iterate over them and apply the aliasing. The first what comes to mind is to use role body because it is invoked at the early class composition times:
unit role JSONRecord;
for ::?CLASS.^attributes(:local) -> $attr {
# take care of it...
}
Note the word “early” I used above. It actually means that when the role’s body is executed there are likely more roles waiting for their turn to be composed into the class. So, there are likely more attributes to be added to the class.
But we can override the Metamodel::ClassHOW
method compose_attributes
of our target ::?CLASS
and rest assured no attribute will be missed:
unit role JSONRecordHOW;

use AttrX::Mooish;
use JSONRecord::Utils;
method compose_attributes(Mu \obj, |) {
for self.attributes(obj, :local) -> $attr {
# Skip if it already has the `is mooish` trait applied – we don't want to mess with the user's intentions.
next if $attr ~~ AttrX::Mooish::Attribute;
JSONRecord::Utils::kebabify-attr($attr);
}
nextsame
}
Basically, that’s all we currently need to finalize the solution. We can still use the role’s body to implement the key elements of it:
unit role JSONRecord;
use JSONRecordHOW;
unless ::?CLASS.HOW ~~ JSONRecordHOW {
::?CLASS.HOW does JSONRecordHOW;
}
Job done! Don’t worry, I haven’t forgotten about the yada-yada-yada above!
But…
The original record role name itself is even longer than JSONRecord
, and it consists of three parts. I’m lazy. There are a lot of JSON structures and I want less typing per each. A trait? is jrecord
?
unit role JSONRecord;
multi sub trait_mod:<is>(Mu:U \type, Bool:D :$jrecord) is export {
unless type.HOW ~~ JSONRecordHOW {
type.HOW does JSONRecordHOW;
type.^add_role(::?ROLE);
}
}
Now, instead of class SomeRecord does JSONRecord
I can use class SomeRecord is jrecord
. In the original case the win is even bigger.
There is absolutely nothing funny about it. Just a common way to keep a reader interested!
Seriously.
The reason for the yada in that snippet is to avoid a distraction from the primary purpose of the example. Here is what is going on there:
I want AttrX::Mooish
to do the dirty work for me. Eventually, what is needed is to apply the is mooish
trait as shown above. But traits are just subs. Therefore all that is needed now is:
&trait_mod:<is>($attr, :mooish(:$alias));
Because this is what Raku does internally when it encounters is mooish(:alias(...))
. The final version of the kebabifying sub is:
use AttrX::Mooish;

our sub kebabify-attr(Attribute:D $attr) {
if $attr.name ~~ /<.lower><.upper>/ {
my $alias = (S:g/<lower><upper>/$<lower>-$<upper>/).lc given $attr.name.substr(2);
&trait_mod:<is>($attr, :mooish(:$alias));
}
}
Since the sub is used by the HOW above, we can say that &trait_mod:<is>
would be called at compile time.
Now, it used to be:
class SomeRecord does JSONRecord {
has $.aLongAttrName is mooish(:alias<a-long-attr-name>);
has $.shortname;
}
Where, as you can see, I had to transfer JSON key names to attribute names, decide where aliasing is needed, add it, and make sure no mistakes were made or attributes are missed.
With the above rather simple tweaks:
class SomeRecord is jrecord {
has $.aLongAttrName;
has $.shortname;
}
Job done.
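And the alias works just like in the AttrX::Mooish example earlier, e.g. for construction (a minimal sketch, assuming the SomeRecord class above):

my $rec = SomeRecord.new: a-long-attr-name => 'the answer';
say $rec.aLongAttrName;  # the answer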
Before I came down to this solution I had 34 record classes implemented using the old approach. Some are little, some are quite big. But it most certainly would have taken much less time had I had the trait at my disposal back then…
I recently blogged about raku Sigils and the Single Argument Rule.
This post is by way of a follow up. It aims to compare and contrast the practical application of sigils in various raku coding styles. It was inspired by a recent chat on the raku Discord channel, in which some consensus was reached that a dollar-only style is preferred by several coders who I respect – and yet I find that my style differs and, unlike my friends, I prefer to apply a combination of dollar- and ampersand-sigils to do what I mean.
One of the most underrated aspects of the raku language is its ability to make easy things easy. Perhaps this is due to the high geek ratio we have on our language channels … who are keen to dig into esoteric details like ASTs and meta-object protocols. Me, I’m more of a jobbing programmer who aims for simplicity and readability – in the language of crawl, walk, run I’d say I’m a slow walker. (That’s not to say that I am against advanced features since they make the hard things possible such as the amazing Red module.)
So, I hope that these code style illustrations also show just how easy raku makes it to apply each of the styles in a low boilerplate way.
#raku in procedural stylee
my @people = [ %( name => 'Mary', height => 160 ),
%( name => 'Isla', height => 80 ),
%( name => 'Sam', ), ];
my $height-total = 0;
my $height-count = 0;
for @people -> %person {
if %person<height> {
$height-total += %person<height>;
$height-count += 1;
}
}
if $height-count {
my $average-height = $height-total / $height-count;
say $average-height
}
#120
Here, the key take-aways are:
# raku in functional stylee
my \people = ( ( name => 'Mary', height => 160 ).Map,
( name => 'Isla', height => 80 ).Map,
( name => 'Sam', ).Map, );
my \heights = people.grep( *<height> ).map( *<height> );
if heights.elems {
my \average-height = ( [+] heights ) / heights.elems;
say average-height
}
#120
In the functional case, we have:
# raku in object oriented stylee
class Person {
has $.name is required;
has $.height;
}
class Group {
has @.people;
method heights {
@.people.grep( *.height ).map( *.height )
}
method average-height {
if $.heights.elems {
( [+] $.heights ) / $.heights.elems;
}
}
}
my @people = ( Person.new( name => 'Mary', height => 160 ),
Person.new( name => 'Isla', height => 80 ),
Person.new( name => 'Sam', ), );
my $group = Group.new( :@people );
say $group.average-height;
#120
In contrast, the OO variant has:
A gist of the raku programs is provided if you would like to test drive for yourself…
These short raku examples were inspired by the excellent Mary Rose Cook post A Practical Introduction To Functional Programming with the original examples given in Python.
In contrast to the Python, the raku core required no modules to be imported to implement the functional style (such as ‘from operator import add’), and I feel that both the reduce (‘[+]’) and lambda (‘*<height>’) terms in raku are more natural.
Unlike other languages, raku does not seek to constrain our coding style to one best way. It provides a set of lego bricks and syntax alternatives and encourages us to adopt the style that is most appropriate for the problem at hand.
The purely Functional version was a very interesting exercise for yours truly since I am using raku as a tool to learn and incorporate functional techniques into my code, I particularly like:
Looping back to the sigils discussion, in the light of these illustrations, my opinion is:
I personally like the way code looks and works with $, @ and % …. but I would be happy to read and write code where the convention is to itemize everything and to limit to the $ sigil.
And … I think that I will be trying to bring more of a functional flavour in my raku projects as I get more confident!
~librasteve
I have managed to finish one more article in the Advanced Raku For Beginners series, this time about type and object composition in Raku.
It’s likely to take a long time before I can write another.
Once, long ago, coincidentally a few people were asking the same question: how do I get a method object of a class?
Answers to the question depend on the particular circumstances of the code where this functionality is needed. One approach is to use MOP methods like .^lookup
; the other is to use the method name and indirect resolution on the invocant:
self."$method-name"(...)
. These two are the most useful, in my view. But sometimes declaring a method as our
can be helpful too:
class Foo {
our method bar {}
}
say Foo::<&bar>.raku;
Just don’t forget that this way we always get the method of class Foo
, even if a subclass overrides method bar
.
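A small sketch contrasting the lookups; class Bar is invented for the illustration:

class Foo {
    our method bar { 'Foo::bar' }
}
class Bar is Foo {
    method bar { 'Bar::bar' }
}

say Bar.new.bar;                  # Bar::bar: normal dispatch honours the override
say Bar.^lookup('bar')(Bar.new);  # Bar::bar: MOP lookup starts at the subclass
say Foo::<&bar>(Bar.new);         # Foo::bar: the package-stored method stays fixed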
With great amusement I watch the discussion of worries about — let’s call them “changes” — of reddit on reddit. The OP wishes Usenet to be back. And yes, we traded robust and decentralised services for “platforms” that require fewer clicks to reach them. Don’t get me wrong. I don’t wish the good old days back when we had a need for mirrors, because interruptions of the intertubes were quite common. However, I do miss the ability to access documentation without the need of dozens and dozens of machines to all work at the same time. Is docs.raku.org DDoS-safe? I hope so, because running that site on localhost
is not exactly trivial.
I like to query the issues for Rakudo and the Problem Solving repo quite often. The latter basically consists of issues. Being on github, a simple git clone
with the right URL should suffice, right? Well, as it turns out, Github is a vendor and issues are the lock-in. You can move issues from one repo to another but there is no export button. There is an API to fetch them — one by one. If they did not have an API, we would have to use a scraper, increasing the load on that company’s servers. Go figure.
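For reference, the one-by-one fetching looks roughly like this (a sketch against GitHub's public REST endpoint, not the author's script; unauthenticated requests are rate-limited quickly, which is exactly the error shown below):

use JSON::Fast;

# Page through the repository's issues; GitHub serves at most 100 per page.
my $page = 1;
loop {
    my $proc = run 'curl', '-s',
        "https://api.github.com/repos/rakudo/rakudo/issues?state=all&per_page=100&page=$page",
        :out;
    my @issues = from-json $proc.out.slurp(:close);
    last unless @issues;
    # cache each issue locally as its own JSON file
    spurt "issue-{.<number>}.json", to-json($_) for @issues;
    $page++;
}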
While trying to secure Rakudos issues I got the following result:
Message: API rate limit exceeded for user ID 426613., Locations: []
That’s about ⅔ in. I do cache individual issues locally, so in an hour I should have them all. In case you want to do the same, here is how far I got within a day. It needs more work and there are external dependencies. Sadly, I have to shell out to cmark, because none of the modules on raku.land can handle Markdown reliably. That slows things down, mostly because hyper
and Proc::Async
don’t mix too well. The script needs a little more work and plenty of testing before I can publish it in earnest. I hope the coming week will be sufficient.
Enshittification is a thing and we should prepare for that. There are alternatives to github and issue trackers. Eventually, the “free” services we use will have to pay their investors. With money actually costing money again, that might happen sooner rather than later. I doubt that shrinking workforces and zero interest rates mix. There seems to be a Problem Solving issue in order.
While reading the source of RakuDoc::To::Text
I spotted lizmat doing dispatch by hand. She defines quite a few sub
s that all take a RakuAST::Doc
-type and return a Str
. In effect they are multi-candidates of the same proto
, just with individual names, which allows for better readability, especially when debugging. Can we have multi-candidates with different names? Yes, but we need to cheat.
multi sub trait_mod:<is>(Routine \r, :$renamed!) {
my $orig = r.name;
INIT {
use nqp;
r.set_name(nqp::decont(„$renamed (AKA $orig)“));
}
role RenamedMulti[$name] {
method name { $name }
}
r does RenamedMulti[$renamed]
}
proto sub foo(|) is renamed<foo-proto> { * }
multi sub foo(Int $) is renamed<inter> { say &?ROUTINE.name }
multi sub foo(Str $) is renamed<stringer> { say &?ROUTINE.name }
multi sub foo(Failure) is renamed<failer> { fail('oi‽') }
foo(42);
foo('answer');
foo(Failure);
# OUTPUT:
# inter
# stringer
# oi‽
# in sub failer (AKA failer) at tmp/2021-03-08.raku line 2416
# in block <unit> at tmp/2021-03-08.raku line 2420
As you can see, I have to (indirectly) use an nqp opcode to change the name of the subroutine, thanks to nqp-trickery in Backtrace.pm6
. Why method AT-POS
doesn’t just call .name
on the Callable
, I don’t know, and my enquiry on IRC was fruitless. I believe it should, but that might break things in CORE.
I will come back to this problem once we get v6.e
. RakuAST might make the Backtrace
problem go away without me complaining via an issue. The latter will be my next topic and seems to be a more pressing … issue.
In the earliest days of Raku, Damian Conway specified a documentation markup language to accompany it. Since it was modeled on Perl's POD it was called <sound of trumpets and dramatic pause>
POD6.
The Specification of POD6 (S26) was mostly incorporated without much extra explanation in the documentation suite. In this way, the description of POD6 was itself an illustration of many of the features it documented, and some that it did not document.
Since Raku is defined by its test suite, and not its documentation, there were other details of POD6 in the tests that were not documented, even in S26.
Raku developed and morphed, but POD6 remained. The tooling for rendering the documentation sources needed updating, and the documentation site had to be modernised.
A project of mine was to upgrade the basic renderer that would transform POD6 to HTML, but allow for developers to customise the templates for each type of POD6 block type. (The first Pod::To::HTML
renderer hard-coded representations of POD6 markup, eg. B<this is bold>
was <strong>this is bold</strong>
and could not be changed.)
It turned out that S26 allowed for much more than had been included in the first documentation sources, including custom blocks and custom markup.
The project to upgrade the original HTML renderer morphed into Raku::Pod::Render, and transforming a directory full of individual documentation sources into an interlinked and searchable set of documents required another layer of tooling: Collection. For example, collecting together all the pages that can be grouped as tutorials, or reference, or language, and creating a separate page for them automatically.
I covered these two projects in a presentation to RakuCon 2022.
Some of the original ideas in S26 had not been implemented, such as aliases and generic numbering. Other ideas had become outdated, such as a way to specify document encoding, which is now solved with Unicode.
In addition, RakuAST (see RakuAST for early adopters ) is on the horizon, which will radically change the speed of documentation processing.
There are also two implementations of POD6, one in Raku and one in Javascript, namely Alexandr Zahatski's Podlite.
This was an ideal time to revisit POD6 and recast it into Rakudoc - new name for the markup language, and its new file extension ".rakudoc".
I was invited to the first Raku Core Summit and I put together a presentation about the changes I thought needed to be made based on my own experience, but also using comments from other developers.
We came to a number of consensus agreements about the minimal changes that were needed, and some extra functionality to handle new questions, such as documentation versioning.
It was also clear that Rakudoc (aka POD6) has two separate parts: components that interact closely with the program being documented, and components that will be rendered separately into HTML (or an ebook). The documentation file needs to make this clear.
I have now written the first draft of the revision and the documentation file that encapsulates it. An HTML version can be found at new-raku.finanalyst.org/language/rakudoc, alongside the old documentation file and the simple table implementation. I am planning future blogs to describe some of the proposed revisions.
However, none of the revisions will break existing POD6, so Rakudoc should be backwards compatible with POD6. The version at new-raku
is a VERY early first draft, and it will go through several review stages.
The first Raku Core Summit was organised by Elizabeth Mattijsen and hosted by Elizabeth and Wendy at their home. It was a really good meeting and I am sincerely grateful for their generosity and hospitality. The summit was also supported by The Perl and Raku Foundation, Rootprompt, and Edument.
The recording quality of the first Rakudo Core Development Class happened to be disastrous. Due to this I was very hesitant to publish it without attempting to get it fixed first. At the extreme, I thought I could just re-record the screen part in higher quality and then apply the original audio. But the other day I did some assessment of the video and came to the conclusion that even if re-recording were ever possible technically, it’d be too costly time-wise. Besides, the entire ending of the class was lost because I was carried away with the presentation and forgot to check the status of the recording, which JITSI limits to 1 hour in one run.
And I gave up. With a few cut-outs and some titles added the video is now available. Enjoy and sorry for the glitches!
After a long marathon of coding for different projects, a change of activity was required. Meanwhile, the series of publications had been neglected for too long. This needed to be fixed. It has been fixed.
The first Raku Core Summit, a gathering of folks who work on “core” Raku things, was held on the first weekend of June, and I was one of those invited to attend. It’s certainly the case that I’ve been a lot less active in Raku things over the last 18 months, and I hesitated for a moment over whether to go. However, even if I’m not so involved day to day in Raku things at the moment, I’m still keen to see the language and its ecosystem move forward, and – having implemented no small amount of the compiler and runtime since getting involved in 2007 – I figured I’d find something useful to do there!
The area I was especially keen to help with is RakuAST, something I started, and that I’m glad I managed to bring far enough that others could see the potential and were excited enough to pick it up and run with it.
One tricky aspect of implementing Raku is the whole notion of BEGIN time (of course, this is also one of the things that makes Raku powerful and thus is widely used). In short, BEGIN time is about running code during the compile time, and in Raku there’s no separate meta-language; anything you can do at runtime, you can (in principle) do at compile time too. The problem at hand was what to do about references from code running at compile time to lexically scoped symbols in the surrounding scope. Of note, that lexical scope is still being compiled, so doesn’t really exist yet so far as the runtime is concerned. The current compiler deals with this by building up an entire flattened table of everything that is visible, and installing it as a fake outer scope while running the BEGIN-time code. This is rather costly, and the hope in RakuAST was to avoid this kind of approach in general.
A better solution seemed to be at hand by spotting such references during compilation, resolving them, and fixating them – that is, they get compiled as if they were lookups into a constant table. (This copies the suggested approach for quasiquoted code that references symbols in the lexical scope of where the quasiquoted code appears.) This seemed promising, but there’s a problem:
my $x = BEGIN %*ENV<DEBUG> ?? -> $x { note "Got $x"; foo($x) } !! -> $x { foo($x) };
It’s fine to post-declare subs (so foo may not even be declared yet when the BEGIN-time code is compiled), and so there’s no value to fixate. Thankfully, the generalized dispatch mechanism can ride to the rescue; we can:
When compiling Raku code, timing is everything. I knew this and tried to account for it in the RakuAST design from the start, but a couple of things in particular turned out a bit awkward.
I got a decent way into this restructuring work during the core summit, and hope to find time soon to get it a bit further along (I’ve been a mix of busy, tired, and had an eye infection to boot since getting back from the summit, so thus far there’s not been time for it).
I also took part in various other discussions and helped with some other things; those that are probably most worth mentioning are:
Thanks goes to Liz for organizing the summit, to Wendy for keeping everyone so well fed and watered, to the rest of attendees for many interesting discussions over the three days, to TPRF and Rootprompt for sponsoring the event, and to Edument for supporting my attendance.
Hi hackers! Today the MoarVM JIT project is nearly 9 years old. I was inspired by Jonathan's presentation reflecting on the development of MoarVM, to do the same for the MoarVM JIT, for which I have been responsible.
For those who are unfamiliar, what is commonly understood as 'JIT compilation' for virtual machines is performed by two components in MoarVM.
This post refers only to the native code generation backend component. It, too, is split into two mostly-independent systems:
On the one hand, as a result of my limited experience, time and resources; on the other hand, as a result of the design of MoarVM.
MoarVM was originally designed as a traditional interpreter for a high level language (much like the Perl interpreter). Meaning that it has a large number of different instructions and many instructions operate on high-level data structures like strings, arrays and maps (as opposed to pointers and machine words).
This is by no means a bad or outdated design. Frequently executed routines (string manipulation, hash table lookups etc.) are implemented using an efficient language (C) and driven by a language that is optimized for usability (Raku). This design is also used in modern machine learning frameworks. More importantly, this was a reasonable design because it is a good target for the Rakudo compiler.
For the JIT compiler, this means two things:
The machine code generated by the JIT compiler then will mostly consist of consecutive function calls to VM routines, which is not the type of code where a compiler can really improve performance much.
In other words, suppose 50% of runtime is spent in interpretation overhead (instruction decoding and dispatch), and 50% is spent in VM routines, then removing interpretation overhead via JIT compilation will at best result in a twofold increase in performance. For many programs, the observed performance increase will be even less.
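To put numbers on it: eliminating a fraction p of total runtime gives an overall speedup of 1/(1 - p) (Amdahl's law), and p = 0.5 yields 1/0.5 = 2, the twofold bound above.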
Mind that I'm specifically referring to the improvement due to machine code generation, and not to those due to type specialization, inlining etc. (the domain of 'spesh'). These latter features have resulted in much more significant performance improvements.
For me personally, it was a tremendously valuable learning experience which led directly to my current career, writing SQL compilers for Google Cloud.
For the Raku community, even if we never realized the performance improvements that I might have hoped at the start, I hope that the JIT project (as it exists) has been valuable, if for no other reason than identifying the challenges of JIT compilation for MoarVM. A future effort may be able to do better based on what we learned; and I hope my blog posts are a useful resource from that perspective.
Assuming that time and resources were not an issue:
If any of this comes to pass, you'll find my report on it right here. Thanks for reading and until then!
raku is a great programming language and Dan is a raku module for Data ANalytics that provides DataFrame and Series roles & types. Disclosure: I am the author of Dan.
A while back, in the raku Discord channel, a discussion arose about the value of the raku $
anonymous state variable. Something like this:
me: I value the $ since it is useful in the Dan DataFrame sort operation like this:
say ~df.sort: { df[$++]<C> }; # sort by col C
them: that use of $ is a hack, why don’t you go like this:
say ~df.sort: *<C>;
-or-
say ~df.sort: $_<C>;
As the author I felt a little hurt that some of the more forthright community members felt I was resorting to a hack and a little bemused that my module couldn’t do this. This post aims to explore the situation.
In case you don’t know about DataFrames, they are widely used in popular data analytics packages such as Python Pandas and Rust Polars. Here’s how raku Dan DataFrames are structured: a DataFrame carries a public 2d @.data Array, plus maps for the row index and the column names; the definition is taken directly from the module implementation.
Here’s how to access a specific data item:
my \df = DataFrame.new( ... );
say df.data[m;n];
- or -
say df[m;n];
Since df.data is an out of the box raku 2d Array, a semicolon index [m;n] will pick an item. Raku also takes index variants such as a range with eg. ‘2..4‘ or a slice operation with ‘*‘.
say df[m;*]; # returns a 1d Array with all the items in row m
say df[*;n]; # returns a 1d Array with all the items in col n
In addition to exposing @.data as a public attribute, a Dan DataFrame object delegates positional accessor method calls on df to its df.data attribute – so df[m;n] is the same as df.data[m;n].
Standard raku has two kinds of accessors:
#Positional - to access Array items with an index...
my @a = [1,2,3,4]; say @a[0]; #1
#Associative - to access Hash items with a key...
my %h = %(a=>1, b=>2); say %h<a>; #1
Since a raku Dan DataFrame can have named columns and row indexes, both Positional and Associative access can be helpful, and this is provided by DataFrames in other languages.
Here’s how that looks for a raku Dan DataFrame (which is 2d):
my \df = DataFrame.new( [1,2;3,4], index => <A B>, columns => <A B>);
say df[0;0]; #1
say df.data[0;0]; #1
say df[0][0]; #1
say df[0]<A>; #1
say df<A>[0]; #1
say df<A><A>; #1
# ^^^ these all return the same item!
This feature is called “cascading accessors” and is mentioned with a different name in the raku design synopses.
It’s worth mentioning that item accessors are not universally liked in the data analytics world. Generally speaking it is unusual to want to access a single item as opposed to a general operation that applies to all members of the structure. Often they use awkward terminology such as ‘iloc’.
Nevertheless, I think that this design – which builds on the thoughtful and rich standard raku accessor capabilities is worthwhile. Cascading accessors are pretty obvious and user friendly.
The Dan implementation of cascading accessors is built using the mechanisms provided for raku custom types.
Dan implements cascading accessors as follows:
Here are some examples:
say ~df[0]; # returns a DataSlice of row 0
say ~df[*]<A>; # returns a Series of col A
say ~df[0..*-2][1..*-1]; # returns a new DataFrame
say ~df[0..2]^; # returns a new DataFrame
### postfix '^' here converts an Array of DataSlices into a new DataFrame
In general, Dan aims to use the standard built in raku operations wherever possible. The use of the built in sort is no exception.
Here is the controversial sort operation again:
say ~df.sort: { df[$++]<C> }; # sort by col C
Let’s take a look step by step:
So, here the sort block uses cascading accessors to pick df[$++]<C> …
Thus the DataFrame is sorted by column C.
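The workhorse here is the anonymous state variable: each call of the sort block increments it, so successive calls see 0, 1, 2, … and thereby walk the rows in order:

say (^5).map: { $++ };  # (0 1 2 3 4)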
Here are some other sort examples from the module synopsis:
say ~df.sort: { .[1] }; # sort by 2nd col (ascending)
say ~df.sort: { -.[1] }; # sort by 2nd col (descending)
say ~df.sort: { df.ix[$++] }; # sort by index
The question remains “should Dan cascading accessors shun the state variable $?”
On the one hand, the state variable does a fine job of handling a wide range of 2d accessor use cases such as sort. So, in the spirit of the original design, I think that the anonymous state variable $ is a valuable piece of the raku toolbox and works well in the context of the indexing “micro-slang” for Dan.
On the other hand, looking at a regular 2d Array:
my @dr = [[rand xx 4] xx 6];
@dr.sort: *[2];
This Whatever syntax works fine (and is more intuitive), so I agree that this is a reasonable feature request for a future version of Dan, and I will add it to the TODO list (but it is a fairly long list…)
From the “bigger picture” point of view, I think that the ability to bolt on cascading accessors to raku is a testament to the malleability of the language.
Is ‘$’ a hack? I leave it to the reader to judge…. what do you think?
~librasteve
I was looking for a neat way to specify units when working with numbers. When doing dimensional analysis, many physicists like to put units into square brackets to create an additional namespace. We can do the same.
use v6.d;
class Unit { ... }
class SiInt is Int {
trusts GLOBAL;
trusts Unit;
has Unit $!unit;
method !unit { return-rw $!unit }
method new(CORE::Int $new) { nextsame }
method Str { self.Int::Str ~ $!unit.suffix }
method ACCEPTS(Unit:U $u) { $!unit === $u }
}
class Unit {
our $.suffix = '';
our $.long-name = "unit-less";
method ACCEPTS(SiInt $si) { $si!SiInt::unit === self }
}
class m is Unit { our $.suffix = 'm'; our $.long-name = 'Meter'; }
multi sub postcircumfix:<[ ]>(SiInt $obj, Unit:U $unit) {
$obj!SiInt::unit === Unit ?? ($obj!SiInt::unit = $unit)
!! fail(‘Sorry, units can only be set, not changed.’);
$obj
}
multi sub postcircumfix:<[ ]>(Int $value, Unit:U $unit) { SiInt.new($value)[$unit] }
constant Int = SiInt; # intentional shadowing of CORE::Int
my $i = 42[m];
put [$i, $i.^name]; # 42m SiInt
my Int() $a = 1;
put [$a, $a.^name]; # 1 SiInt
class s is Unit { our $.suffix = 's'; our $.long-name = 'Seconds'; }
multi sub infix:<+>(SiInt $l, SiInt $r) {
$l!SiInt::unit === Unit ?? callsame()[$r!SiInt::unit]
!! $r!SiInt::unit === Unit ?? callsame()[$l!SiInt::unit]
!! $l!SiInt::unit === $r!SiInt::unit ?? nextsame()
!! fail(„Unit mismatch between $l and $r“)
}
my $s = 60[s];
say $i + $a; # 43m
say $s + $i; # Unit mismatch between 60s and 42m
The idea is to have a numerical type that is by default unit-less. A unit can be added (but not changed) with the square bracket postcircumfix. Since I add type-objects for each unit, I don’t have to mess around with strings and can multi-dispatch if needed. Since I want direct access to the unit, I tell the class to trust the package the operators are defined in. (This could be a module, of course.) That trust-trick happens to be an ENODOC.
I have to use a forward declaration to get ACCEPTS
to get hold of $!unit
. Subsequently, multi-dispatch works just fine.
multi sub fancy(Int $value where m) { #`[fancy calculation goes here] }
multi sub fancy(Int) { fail ‘Can only be fancy with Unit "m".’ }
fancy($i);
Since SiInt
is just an Int
, all built-ins will work, so long as the unit is restored after using them. Being able to trust operators allows them to access the entire class, without having to cheat with use nqp;
.
Because Raku treats types as values, I can calculate a compound unit.
class Meters-per-Second is Unit { our $.suffix = 'm/s'; our $.long-name = 'Meters per Second'; }
multi sub infix:</>(m, s) { Meters-per-Second }
sub speed($d where m, $t where s) { ($d / $t).Int.[m/s] }
my Int $fast = speed(500[m], 1[s]);
say $fast; # 500m/s
I’m quite pleased with being able to extend the type-system so easily without having to invent a complete new DSL. This aids composability greatly.
(in chronological order, with comment references)