Unicode has grown explosively over the last decade: most text today is in Unicode, and some corpora are 100% Unicode. This tutorial shows Perl programmers how to write scripts that handle Unicode data reliably.
¡ƨdləɥ ƨᴉɥʇ ədoɥ puɐ ʻλɐp əɔᴉu ɐ əʌɐɥ ʻʞɔnl poo⅁
http://training.perl.com/OSCON2011/index.html
Pod::S5 by Tom Linden, which in turn uses S5 by Eric Meyer. Slideshow controls appear if you hover near the bottom right corner.
that will somehow “enable Unicode by default” in your code.
unichars, uninames, uniprops
charnames pragma:
unicore/html_alias.pl
tcgrep, ucsort, unifmt, unilook, uniquote (a better cat -v or od), uniwc
nfc, nfd, nfkc, nfkd, plus nfcheck
uc, lc, tc, plus titulate;
unifont, leo, unicaps, uninarrow, uniwide, unisubs, unisupers
es-sort, hantest, havshpx, hypertest, nunez (should be called núñez or even better, búsqueda-libre, but for filesys issues)
$ export PERL_UNICODE=CSD    # but see below
$ export LESS=MQeicsnf
$ alias pc='perl5.14.0 -Mcharnames=:full,:short,latin,greek -E'
$ alias ug=uninames
$ alias um='ucsort | less -r'
% alias um 'ucsort | less -r'
% alias ug uninames
% alias pc 'perl5.14.0 -Mcharnames=:full,:short,latin,greek -E'
% setenv PERL_UNICODE CSD
% setenv LESS MQeicsnf
Here are simple examples to play with:
% pc 'say "\N{long s} \N{ae} \N{Omega} \N{omega} \N{UPWARDS ARROW}"'
% echo exempli gratia | tc | unifont
% tcgrep -n '\P{ASCII}' pue.pod | uniquote -x | less
% tcgrep -n '\P{ASCII}' pue.pod | uniquote -v | less
% echo you are not expected to read this | leo
% echo these are small caps | tc | unicaps
% echo supers | unisupers
% echo 123a | unisubs
Here are more interesting examples to play with:
% uninames latin ligat
% uninames SIGN -arabic
% uninames arrow -combining
% uniprops 'MICRO SIGN'
% uniprops -a 2010
% unichars '\pL' '\p{Greek}'
% unichars '\pL' '\p{Greek}' | um
% unichars '\p{Age=6.0}' | um
% unichars -gasn NUM
% unilook glob
% unilook /glob
% unilook -v '\pM'
% unilook -v '\N{acute}'
% unilook -v whom
% unilook -vpn run
PERL_UNICODE envariable to AS. This makes all 🐪 scripts decode @ARGV as UTF-8 strings, and sets the encoding of all three of stdin, stdout, and stderr to UTF-8. You may have to turn it off at times, though. I don't recommend D.
do
hickey), prominently assert that you are running perl version 5.12
or better via:
use v5.12;    # minimal for unicode_strings feature
use v5.14;    # optimal for unicode_strings feature
use utf8;
use strict;
use warnings;
use warnings qw( FATAL utf8 );
utf8 warning class comprises three subwarnings (nonchar, surrogate, and non_unicode), which you may sometimes wish to control separately.
no warnings "non_unicode";
:std adds in STDIN, STDOUT, and STDERR. This critical step implicitly decodes incoming data and encodes outgoing data as UTF-8.
use open qw( :encoding(UTF-8) :std );
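To make the implicit decoding and encoding visible, here is a minimal sketch that attaches the same :encoding(UTF-8) layer by hand to two in-memory filehandles instead of relying on the pragma. The variable names are illustrative only and do not come from the talk.

```perl
use strict;
use warnings;

# Write a four-character string through an encoding layer.
my $buffer = "";
open(my $out, ">:encoding(UTF-8)", \$buffer)
    || die "can't open in-memory output: $!";
print {$out} "caf\x{E9}";                  # "cafe" with an acute: 4 characters
close($out) || die "can't close in-memory output: $!";

# $buffer now holds encoded bytes: U+00E9 became the two bytes C3 A9.
my $byte_count = length($buffer);

# Reading back through the same layer decodes the bytes into characters.
open(my $in, "<:encoding(UTF-8)", \$buffer)
    || die "can't open in-memory input: $!";
my $text = <$in>;
close($in) || die "can't close in-memory input: $!";
my $char_count = length($text);

print "$byte_count bytes in the buffer, $char_count characters in Perl\n";
```

The pragma form simply makes this layer the default for every handle opened afterwards, plus the three standard handles when :std is given.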
\N{CHARNAME}.
use charnames qw( :full );
DATA handle, you must explicitly set its encoding. If you want this to be UTF-8, then say:
binmode(DATA, ":encoding(UTF-8)");
#! line is debatable. Consider it a shortcut for whatever you need on your system: I try not to inflict perlrun's eval exec hack on people.
open pragma from working correctly if you've also used the autodie pragma:
https://rt.cpan.org/Public/Bug/Display.html?id=54777
Don't take my #! here too seriously; it has Issues.
#!/usr/bin/env perl

use v5.14;
use utf8;
use strict;
use autodie;
use warnings;
use warnings  qw< FATAL utf8 >;
use open      qw< :std :encoding(UTF-8) >;
use charnames qw< :full >;
use feature   qw< unicode_strings >;
The first of these is almost always needed; the rest, not so much.
use Unicode::Normalize qw< NFD NFC >;
use Encode             qw< encode decode >;
use Carp               qw< carp croak confess cluck >;
use File::Basename     qw< basename >;

$0 = basename($0);    # shorter messages
Don't make $| hot if you have a lot of output on STDOUT.
binmode(DATA, ":encoding(UTF-8)");

# This works like perl -CA: note that it
# assumes your terminal is set to use UTF-8
if (grep /\P{ASCII}/ => @ARGV) {
    @ARGV = map { decode("UTF-8", $_) } @ARGV;
}

$| = 1;    # comment out for performance

END { close STDOUT }
This avoids compile-time “bugs” in the pragma:
# XXX: use warnings FATAL => "all";
local $SIG{__DIE__} = sub {
    confess "Uncaught exception: @_" unless $^S;
};
local $SIG{__WARN__} = sub {
    if ($^S) { cluck   "Trapped warning: @_" }
    else     { confess "Deadly warning: @_"  }
};
I use this on normal CLI filters:
if (@ARGV == 0 && -t STDIN && -t STDERR) {
    print STDERR "$0: reading input from tty, type ^D for EOF...\n";
}

while (<>) {
    chomp;
    $_ = NFD($_);
    ...
} continue {
    say NFC($_);
}

__END__
PERL_UNICODE, which I have set to "SA". That's equivalent to running with the -CSA command-line option. Possible values are
0 = turn off all flags (that's a DIGIT ZERO)
I = STDIN is assumed to be in UTF-8
O = STDOUT will be in UTF-8
E = STDERR will be in UTF-8
S = I + O + E
i = UTF-8 is the default PerlIO layer for input streams
o = UTF-8 is the default PerlIO layer for output streams
D = i + o
A = the @ARGV elements are expected to be strings encoded in UTF-8
L = makes "IOEioA" conditional on the locale environment variables (LC_ALL, LC_CTYPE, and LANG, in order of decreasing precedence): if the variables indicate UTF-8, then the selected "IOEioA" are in effect.
use feature "unicode_strings";
v5.14
utf8
feature
charnames
open
re "/flags"
encoding::warnings
bytes
encoding
locale
Specify Unicode literals any of these ways:
As literal UTF-8 under the recommended utf8 pragma, allowing you to write "à contre-cœur", "Ångström", or "👪 💗 🐪" directly.
As wicked “magic numbers” like chr(0x1F4A9), "\x{2639}", or "\N{U+A0}".
Using the charnames pragma and the \N{CHARNAME} construct, strings like "\N{LATIN SMALL LETTER A WITH GRAVE} contre-c\N{LATIN SMALL LIGATURE OE}ur", "A\N{COMBINING RING ABOVE}ngstro\N{COMBINING DIAERESIS}m", and "\N{FAMILY} \N{GROWING HEART} \N{DROMEDARY CAMEL}".
:full, :short, SCRIPTNAME, or :alias.
charnames::string_vianame(name) for runtime lookup of either a character name or a named character sequence, returning its string representation
charnames::vianame(name) for runtime lookup of a character name (but not a named character sequence) to get its ordinal value (code point)
charnames::viacode(code) for runtime lookup of a code point to get its Unicode name.
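A short sketch of those three runtime lookups, using GREEK SMALL LETTER ALPHA as an arbitrary example character. It assumes perl 5.14 or later, where string_vianame is available.

```perl
use strict;
use warnings;
use charnames ":full";

# Name -> code point (an integer).
my $code = charnames::vianame("GREEK SMALL LETTER ALPHA");

# Name -> the character itself, as a one-character string.
my $char = charnames::string_vianame("GREEK SMALL LETTER ALPHA");

# Code point -> official Unicode name.
my $name = charnames::viacode(0x3B1);

printf "U+%04X is %s\n", $code, $name;
```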
use charnames ":full";
print "\N{GREEK CAPITAL LETTER DELTA} is delta.\n";
# Δ is delta.

use charnames ':short';
print "\N{greek:Delta} is an upper-case delta.\n";
# Δ is an upper-case delta.

use charnames qw(cyrillic greek);
print "Sigmata are \N{Sigma}, \N{sigma}, and \N{final sigma}.\n";
# Sigmata are Σ, σ, and ς.
print "\N{Be} and \N{be} are Cyrillic B's.\n";
# Б and б are Cyrillic B's.
:alias and a hash:
use charnames ":full", ":alias" => {
    e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE",
    E_ACUTE => "LATIN CAPITAL LETTER E WITH ACUTE",
};
print "I'll have the \N{e_ACUTE}touff\N{e_ACUTE}e.\n";
# I'll have the étouffée.
:alias and a string looks for a corresponding file to require from unicore/, which must be a subdirectory under your @INC path. For example, :html would look for a file named unicore/html_alias.pl.
use charnames ":alias" => ":html";
print "\N{frac14} and \N{frac12} are \N{frac34}.\n";
# ¼ and ½ are ¾.
Key core modules for Unicode are:
Encode
Unicode::Normalize
Unicode::Collate
Unicode::Collate::Locale
Unicode::UCD
DBM_Filter::utf8
Encode module is most often used implicitly: it's loaded automatically whenever you pass an :encoding(ENC) argument to binmode or to open.
binmode(STDIN, ":encoding(cp1252)")
    || die "can't binmode STDIN: $!";
open(OUTPUT, "> :raw :encoding(UTF-16LE) :crlf", $filename)
    || die "can't open $filename: $!";
print OUTPUT while <STDIN>;
close(OUTPUT) || die "couldn't close $filename: $!";
close(STDIN)  || die "couldn't close STDIN: $!";
Encode module provides functions for when you need to manually decode incoming data and to manually encode outgoing data.
encode, decode, and find_encoding.
use Encode qw< find_encoding >;
for my $alias (qw< utf8 UTF-8 utf16le >) {
    my $obj  = find_encoding($alias);
    my $name = $obj ? $obj->name() : "UNKNOWN";
    printf "%-8s is really %s.\n", $alias, $name;
}
# utf8     is really utf8.
# UTF-8    is really utf-8-strict.
# utf16le  is really UTF-16LE.
% byte2uni -a -e nextstep | less
The MacRoman encoding is a bit weird:
use charnames qw< :full >;
use Encode (
    "decode",    # $unicode = decode("scheme", $bytes);
    "encode",    # $bytes   = encode("scheme", $unicode);
);
my $permil = "\N{PER MILLE SIGN}";
printf "A permille %s is U+%vX in Unicode", $permil, $permil;
my $bytes = encode("macroman", $permil);
printf " but is 0x%vX in Macroman\n", $bytes;
# A permille ‰ is U+2030 in Unicode but is 0xE4 in Macroman
The MacRoman encoding is still a bit weird:
use charnames qw< :full >;
use Encode (
    "decode",    # $unicode = decode("scheme", $bytes);
    "encode",    # $bytes   = encode("scheme", $unicode);
);
my $byte = chr(0x8E);
my $char = decode("macroman", $byte);
printf "An %vX in MacRoman is %vX in Unicode\n", $byte, $char;
printf "Which is really a %s\n", charnames::viacode(ord $char);
# An 8E in MacRoman is E9 in Unicode
# Which is really a LATIN SMALL LETTER E WITH ACUTE
Unicode::Normalize module.
while (<>) {
    chomp;
    $_ = NFD($_);
    ...
} continue {
    say NFC($_);
}
Just as one example, consider all these variants of a Latin small letter o with tilde:
N | Glyph | NFC? | NFD? | Perl literal | Code Points |
---|---|---|---|---|---|
1 | õ | ✓ | ✗ | "\x{F5}" | LATIN SMALL LETTER O WITH TILDE |
2 | õ | ✗ | ✓ | "o\x{303}" | LATIN SMALL LETTER O, COMBINING TILDE |
3 | ȭ | ✓ | ✗ | "\x{22D}" | LATIN SMALL LETTER O WITH TILDE AND MACRON |
4 | ȭ | ✗ | ✗ | "\x{F5}\x{304}" | LATIN SMALL LETTER O WITH TILDE, COMBINING MACRON |
5 | ȭ | ✗ | ✓ | "o\x{303}\x{304}" | LATIN SMALL LETTER O, COMBINING TILDE, COMBINING MACRON |
6 | ō̃ | ✗ | ✓ | "o\x{304}\x{303}" | LATIN SMALL LETTER O, COMBINING MACRON, COMBINING TILDE |
7 | ō̃ | ✓ | ✗ | "\x{14D}\x{303}" | LATIN SMALL LETTER O WITH MACRON, COMBINING TILDE |
eq to get 1 & 2, 3-5, and 6 & 7 to each respectively test equal to one another.
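A small sketch of that grouping, assuming each of the seven strings from the table is run through NFD before being compared with eq. The array and flag names are illustrative only.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# The seven variants from the table, in the same order (index 0 is row 1).
my @variants = (
    "\x{F5}",
    "o\x{303}",
    "\x{22D}",
    "\x{F5}\x{304}",
    "o\x{303}\x{304}",
    "o\x{304}\x{303}",
    "\x{14D}\x{303}",
);

# Normalize once, then compare the normalized forms.
my @nfd = map { NFD($_) } @variants;

my $rows_1_2_equal = $nfd[0] eq $nfd[1] ? 1 : 0;
my $rows_3_5_equal = ($nfd[2] eq $nfd[3] && $nfd[3] eq $nfd[4]) ? 1 : 0;
my $rows_6_7_equal = $nfd[5] eq $nfd[6] ? 1 : 0;

# Tilde-then-macron and macron-then-tilde remain distinct strings, because
# both marks have the same combining class and so are never reordered.
my $rows_5_6_equal = $nfd[4] eq $nfd[5] ? 1 : 0;

print "1=2: $rows_1_2_equal, 3=4=5: $rows_3_5_equal, ",
      "6=7: $rows_6_7_equal, 5=6: $rows_5_6_equal\n";
```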
In a regex, all 7 of those will be completely matched by \X, an extended grapheme cluster. Yes, but now what? I'm afraid this is where it stops being easy. NFD is assumed and required for the following to work:
/^o/ reports that all 7 start with an o.
/^o\N{COMBINING TILDE}/ reports that 1-5 start with an o and a tilde, but that misses 6 & 7.
/^o\pM*?\N{COMBINING TILDE}/ to get all 7 matching.
\p{Grapheme_Extend} instead of \pM, and, were there any, using \p{Grapheme_Base} instead of \PM):
$o_tilde_rx = qr{ o \pM *? \N{COMBINING TILDE} \pM* }x;
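The following sketch checks a pattern of that shape against the seven variants after NFD. It spells COMBINING TILDE as \x{303} so that it runs without the charnames pragma, and it adds a ^ anchor; both are choices made for this illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

my @variants = (
    "\x{F5}", "o\x{303}", "\x{22D}", "\x{F5}\x{304}",
    "o\x{303}\x{304}", "o\x{304}\x{303}", "\x{14D}\x{303}",
);

# An o, then as few marks as possible, then a tilde, then any trailing marks.
my $o_tilde_rx = qr{ ^ o \pM*? \x{303} \pM* }x;

# Naive version: the tilde must come directly after the o.
my $naive_rx = qr{ ^ o \x{303} }x;

my $full_hits  = grep { NFD($_) =~ $o_tilde_rx } @variants;
my $naive_hits = grep { NFD($_) =~ $naive_rx   } @variants;

print "mark-tolerant pattern matched $full_hits of 7\n";
print "naive pattern matched $naive_hits of 7\n";
```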
eq, ne, le, gt, cmp, sort, &c &c.
Unicode::Collate module. It's super-fancy, so I'll just show the simplest approaches here.
@a = sort @b, just swap that code out for this and all will be well:
use Unicode::Collate;
@sorted = Unicode::Collate::->new->sort(@unsorted);
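To see why the swap matters, here is a minimal sketch contrasting the builtin sort, which compares code points, with the default Unicode::Collate ordering. The word list is made up for the example.

```perl
use strict;
use warnings;
use Unicode::Collate;

# One capitalized word and one word that starts with U+00E9 (e with acute).
my @words = ("zebra", "Apple", "\x{E9}clair", "banana");

# Builtin sort: plain code point order, so "A" comes first and U+00E9 last.
my @by_codepoint = sort @words;

# Unicode::Collate: alphabetic order, with case and accents as tie-breakers.
my @by_uca = Unicode::Collate->new->sort(@words);

# Print code points rather than raw text so no output encoding is needed.
for my $list (\@by_codepoint, \@by_uca) {
    print join(", ", map { sprintf "U+%04X...", ord } @$list), "\n";
}
```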
Unicode::Collate::Locale module for national sorts.
use Unicode::Collate::Locale;
state $coll = new Unicode::Collate::Locale::
                  locale => "fr",
                  # lots of other parameters possible here
              ;
my @bons_mots = $coll->sort(our @mots);
@srecs = sort {
    $b->{AGE} <=> $a->{AGE}        # oldest first
              ||
    $a->{NAME} cmp $b->{NAME}
} @recs;
getSortKey method:
my $collator = Unicode::Collate::->new();
for my $rec (@recs) {
    $rec->{NAME_key} = $collator->getSortKey($rec->{NAME});
}
@srecs = sort {
    $b->{AGE} <=> $a->{AGE}        # oldest first
              ||
    $a->{NAME_key} cmp $b->{NAME_key}
} @recs;
These are its literal Getopt::Long arguments:
# collator constructor options
--backwards-levels=i
--collation-level|level|l=i
--katakana-before-hiragana
--normalization|n=s
--override-CJK=s
--override-Hangul=s
--preprocess|P=s
--upper-before-lower|u
--variable=s
# program specific options
--case-insensitive|insensitive|i
--input-encoding|e=s
--locale|L=s
--paragraph|p
--reverse-fields|last
--reverse-output|r
--right-to-left|reverse-input
CPAN modules for handling Unicode include:
Unicode::LineBreak, which includes Unicode::GCString. These respectively solve “the format problem” and “the printf problem”.
Unicode::Casing for things like lc ΣΤΙΓΜΑΣ → στιγμας in Greek, or uc i → İ in the Turkic languages.
Unicode::Unihan, and if you liked the last one, you might want to look into Lingua::JA::Romanize::Japanese, Lingua::KO::Hangul::Util, Lingua::KO::Romanize::Hangul, Lingua::ZH::Romanize::Pinyin, &c.
Unicode::Stringprep
LATIN SMALL LETTER O WITH STROKE has no decomposition to something with an o in it.
Unicode::Collate does count o, õ, and ø as the same letter, normally. Not in Swedish or Hungarian, though.
LATIN SMALL LETTER ETH to anything with a d in it, but the UCA treats them as the same letter. Er, except in Icelandic (the "is" locale), where d and ð are now different letters in their own right.
\x{E6}, or oe & œ \x{153}? Those aren't casefolds of each other as occurs with ij and ĳ \x{133}, and there's no useful decomposition, either. But Unicode::Collate will treat them alike.
\x{E4} or a\x{308}) are the same. No kidding.
state $coll = new Unicode::Collate::Locale::
                  locale => "de__phonebook",
              ;
if ($coll->eq($a, $b)) { ... }
FixString.pm module and my es-sort, núñez, and unilook tools.
\p{Lu} is almost as wrong as code that uses [A-Za-z]. You need to use \p{Upper} instead, because \p{Lowercase} (≡ \p{Lower}) is different from \p{Lowercase_Letter} (≡ \p{Ll}) by 159 code points:
% unichars '\p{Lowercase}' '\P{Lowercase_Letter}'
% unichars '\p{Lower}' '\P{Ll}'    # same but easier to type
[a-zA-Z] is even worse. And it can't use \pL or \p{Letter}; it needs to use \p{Alphabetic}. Not all alphabetics are letters:
% unichars -a '\p{alphabetic}' '\P{Letter}' | wc -l    # 1006 code points
/[\$\@%]\w+/, then you have a problem (or two).
/[\$\@%]\p{IDS}\p{IDC}*/
\h and \v, depending. And you should never use \s to mean all possible Unicode whitespace.
\s does not mean [\h\v]. These both tell the same tale:
% unichars '\S' '[\v\h]'
 ---- U+000B LINE TABULATION
% unichars '\S' '\p{space}'
 ---- U+000B LINE TABULATION
\n for a line boundary, or even \r\n, then you are doing it wrong.
\R. It means (?:\r\n|\v).
% unichars '\R'
 ---- U+000A LINE FEED (LF)
 ---- U+000B LINE TABULATION
 ---- U+000C FORM FEED (FF)
 ---- U+000D CARRIAGE RETURN (CR)
 ---- U+0085 NEXT LINE (NEL)
 ---- U+2028 LINE SEPARATOR
 ---- U+2029 PARAGRAPH SEPARATOR
my $slurpy = `cat somefile`;    # pretend I didn't do this :)
$slurpy =~ s/\R/\n/g;           # convert Unicode linebreaks
People make millions of broken assumptions about Unicode. Until they understand these things, their 🐪 code will be broken. Look for these Unicode antipatterns:
$/ to something that will work with any valid line separator is wrong. \R only works in patterns.
lc(uc($s)) eq $s or uc(lc($s)) eq $s, is completely broken and wrong. Consider that uc("σ") and uc("ς") are both "Σ", but lc("Σ") cannot possibly return both of those.
"ª" is a lowercase letter with no uppercase. Kinda.
"ᵃ" and "ᴬ" are Cased letters, they casemap only to themselves. Both are Lowercase, and Letters, but they are not Lowercase_Letters.
\p{Lowercase_Letter}, despite being both \p{Letter}s and \p{Lowercase}. They're \p{Modifier_Letter}s, actually. Honest.
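Both casing points above can be checked directly. This is a minimal sketch with illustrative variable names; it relies only on the builtin uc and lc and on Unicode property classes.

```perl
use strict;
use warnings;

# Both lowercase sigmas share one uppercase form, so lc(uc(...)) cannot
# restore a final sigma.
my $sigma       = "\x{3C3}";    # GREEK SMALL LETTER SIGMA
my $final_sigma = "\x{3C2}";    # GREEK SMALL LETTER FINAL SIGMA
my $same_upper  = uc($sigma) eq uc($final_sigma)         ? 1 : 0;
my $round_trips = lc(uc($final_sigma)) eq $final_sigma   ? 1 : 0;

# MODIFIER LETTER SMALL A is Lowercase and a Letter, yet its general
# category is Modifier_Letter, not Lowercase_Letter.
my $mod_a        = "\x{1D43}";
my $is_lowercase = $mod_a =~ /^\p{Lowercase}$/        ? 1 : 0;
my $is_ll        = $mod_a =~ /^\p{Lowercase_Letter}$/ ? 1 : 0;
my $is_lm        = $mod_a =~ /^\p{Modifier_Letter}$/  ? 1 : 0;
my $maps_to_self = uc($mod_a) eq $mod_a               ? 1 : 0;

print "same uppercase: $same_upper, round trip survives: $round_trips\n";
print "Lowercase: $is_lowercase, Ll: $is_ll, Lm: $is_lm, ",
      "uc is identity: $maps_to_self\n";
```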
% unichars -gas 'grep { length > 1 } lc, ucfirst, uc'
% unichars -gas 'uc ne ucfirst'
Case is broken.
Beyond just letters, it turns out that numbers, symbols, and even marks have case.
% unichars -gas '\PL' '\p{Cased}'
Cased code point always gives a different code point is broken.
This shows there are 1299 un-case-changing cased code points:
% unichars -gas '\p{Cased}' '[^\p{CWL}\p{CWT}\p{CWU}]'
y/\000-\177/\200-\377/ is broken and wrong.
Try tr[\0-\x{10_FFFF}][\x{20_0000}-\x{30_FFFF}] if you dare.
Marks to get “ASCII” letters is evil and rude.
\p{Diacritic} and marks \p{Mark} are the same thing is broken.
% unichars -gas '\p{mark}' '\P{DIACRITIC}'    # 1068 code points
% unichars -gas '\P{MARK}' '\p{diacritic}'    # 209 code points
\p{GC=Dash_Punctuation} covers as much as the binary property \p{Dash=Yes} is broken.
% unichars -gas '\p{Dash}'
\p{Mark} characters take up zero print columns is broken.
% unichars -gas '\pM' '\P{BC=NSM}'
\X can match is wrong.
\X can never start with a \p{Mark} character is wrong.
\X can never hold two non-\p{Mark} characters is wrong.
"\x{FFFF}" is wrong.
"\xC0\x80" is UTF-8 is broken and wrong.
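The overlong sequence "\xC0\x80" can be tested against the strict "UTF-8" encoding in Encode. A minimal sketch: the helper try_strict_decode is hypothetical, written only for this example, and returns undef when decoding fails.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: strict decode that yields undef instead of dying.
sub try_strict_decode {
    my ($bytes) = @_;    # a copy, since decode may modify its argument
    return eval { decode("UTF-8", $bytes, Encode::FB_CROAK) };
}

my $overlong = try_strict_decode("\xC0\x80");    # overlong encoding of U+0000
my $valid    = try_strict_decode("\xC3\xA9");    # U+00E9

print "C0 80 ", (defined $overlong ? "decoded" : "was rejected"), "\n";
print "C3 A9 ", (defined $valid    ? "decoded" : "was rejected"), "\n";
```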
% perl -Mcharnames=:full -E 'say "\N{RLE}", "12 < 345 < 6789"'
6789 > 345 > 12
% perl -Mcharnames=:full -E 'say "\N{RLO}", "12 < 345 < 6789"'
9876 > 543 > 21
X and then character Y, that those will show up as XY is wrong. Sometimes they don't.
\p{Math} code points are visible characters is wrong.
\w contains only letters, digits, and underscores is wrong, unless you use the /a modifier or use re "/a";
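A quick sketch of what \w accepts beyond ASCII letters, digits, and underscore, and how the /a modifier (perl 5.14 or later) narrows it. The sample characters are chosen for illustration.

```perl
use strict;
use warnings;

my $delta    = "\x{3B4}";     # GREEK SMALL LETTER DELTA
my $acute    = "\x{301}";     # COMBINING ACUTE ACCENT, a Mark
my $undertie = "\x{203F}";    # UNDERTIE, Connector_Punctuation

# Under Unicode rules, \w covers letters, marks, digits, and connectors.
my $delta_is_word    = $delta    =~ /^\w$/ ? 1 : 0;
my $acute_is_word    = $acute    =~ /^\w$/ ? 1 : 0;
my $undertie_is_word = $undertie =~ /^\w$/ ? 1 : 0;

# With /a, \w means only [A-Za-z0-9_].
my $delta_is_ascii_word = $delta =~ /^\w$/a ? 1 : 0;

print "delta: $delta_is_word, acute: $acute_is_word, ",
      "undertie: $undertie_is_word, delta under /a: $delta_is_ascii_word\n";
```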
^ and ~ are punctuation marks is wrong.
ใ
, โ
, ใจ
, โจ
, & โข
contain any letters in
them is wrong โ except in NFKD:
% unichars -gas '\pS' 'NFKD =~ /\p{Latin}/' | ucsort | less -r
\p{InLatin} is the same as \p{Latin} is heinously broken.
\p{InLatin} is almost ever useful is almost certainly wrong.
$FIRST_LETTER as the first letter in some alphabet and $LAST_LETTER as the last letter in that same alphabet, writing [${FIRST_LETTER}-${LAST_LETTER}] has any meaning whatsoever is almost always completely broken and wrong and meaningless.
? is broken, stupid, braindead, and runs contrary to the standard recommendation, which says not to do that! So don't.
printf widths to pad and justify Unicode data is broken and wrong. Use Unicode::GCString to count columns.
ls or readdir on its enclosing directory, you'll actually find that file with the name you created it under is buggy, broken, and wrong. Stop being surprised by this!
/s/i can match only "S" or "s" is broken and wrong. You'd be surprised!
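The surprise is easy to reproduce. In this minimal sketch, an ASCII-looking case-insensitive pattern matches two non-ASCII characters, and the /aa modifier (perl 5.14 or later) switches that off.

```perl
use strict;
use warnings;

my $long_s = "\x{17F}";     # LATIN SMALL LETTER LONG S
my $kelvin = "\x{212A}";    # KELVIN SIGN

# Case-insensitive matching uses Unicode case folding on these strings.
my $s_matches_long_s = $long_s =~ /^s$/i ? 1 : 0;
my $k_matches_kelvin = $kelvin =~ /^k$/i ? 1 : 0;

# /aa forbids an ASCII character from matching a non-ASCII one under /i.
my $s_matches_with_aa = $long_s =~ /^s$/iaa ? 1 : 0;

print "/s/i vs LONG S: $s_matches_long_s, /k/i vs KELVIN SIGN: ",
      "$k_matches_kelvin, /s/iaa vs LONG S: $s_matches_with_aa\n";
```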
\PM\pM to find grapheme clusters instead of using \X is broken and wrong.
文字化け (mojibake)