.\" Automatically generated by Pod::Man 2.22 (Pod::Simple 3.13)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. nr % 0
. rr F
.\}
.el \{\
. de IX
..
.\}
.\" ========================================================================
.\"
.IX Title "Locale::MakePhrase::OSDC2004 3"
.TH Locale::MakePhrase::OSDC2004 3 "2006-03-20" "perl v5.10.1" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Locale::MakePhrase::OSDC2004
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
This \s-1POD\s0 is a conference paper for the OpenSource Developers
Conference 2004 (Melbourne, Australia).
.PP
The information that follows is a description of a technique which is in
use within the Locale::MakePhrase module.
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
Language localisation of applications (i.e. internationalisation of text
strings) can be a complicated problem. Existing solutions
are often based around enumerating or objectifying the message, thus
allowing the output mechanism to display the appropriate string.
Alternatively, we can use a text string as a \fBkey\fR to a mechanism
which returns the language-specific string.
.PP
Most translation systems are loosely based around one of these concepts.
For example, the \fBC\fR library implements the \f(CW\*(C`catgets\*(C'\fR function
(among many locale-related functions), which takes a 'message number'
and returns a string based on the language. The \f(CW\*(C`gettext\*(C'\fR function
implements a similar mechanism, using the original text string as the key.
.PP
The following information describes a Perl module which implements a
(possibly new?) technique which I have termed \fBlinguistic rule evaluation\fR,
i.e. language rules which can be evaluated at run-time. Using this
technique, it is possible to determine which phrase should be output
in the target language, given the current input phrase.
.PP
Note that this module still requires a linguist to mark up the application
(in the appropriate language/dialect), but it provides a more
sophisticated set of tools (than, say, gettext) so that when some
text gets displayed, it will more accurately reflect the application
context.
.PP
For further information on the complexities of localising application
strings, please read Locale::Maketext::TPJ13.
.SH "How would you or I speak a phrase in a second language?"
.IX Header "How would you or I speak a phrase in a second language?"
Many people speak more than one language. When a person wants to
translate a phrase from one language to another, they will usually do
something like:
.IP "1." 3
Think of the phrase that they want to say, usually in the language
that they speak most often.
.IP "2." 3
Work out what it is they are trying to say; that is,
determine the context/meaning of the phrase.
.IP "3." 3
Speak that information in the second language.
.PP
The important point is that the information conveyed by the phrase is what
is being translated, and this is determined by all of the various
pieces of information surrounding the phrase (such as the geographical
region).
.SH "How hard is it really?"
.IX Header "How hard is it really?"
Luckily for us, linguists already have a pretty good idea of how to
translate any given phrase into a second language. Actually, so does
anyone who can speak more than one language...
.PP
The general philosophy is that phrases which don't have any
\&'fill in the blanks' (as in \*(L"Please choose some _\|_\|_\*(R") are
relatively simple to translate; it's generally
just a matter of knowing the language/region (as in en_au).
.PP
However, the 'fill in the blanks' phrases are substantially
more complicated, as we have to handle singular, plural, duality, zero,
etc. on a per-language basis.
.PP
But more importantly, each blank that needs to be filled needs to
be tested to understand what information it is actually trying
to convey. For example:
.Sp
.RS 2
Let's say that the English phrase is \*(L"Please select _\|_\|_ files\*(R", where
the blank entry is a number. And let's assume that we would like to
display the correct output phrase which matches the 'meaning' of the
phrase for all possible values of the blank.
.Sp
Now, if the blank has a value of zero then ideally we would like
to be able to display \*(L"Don't select any files\*(R". To output this phrase
we need to evaluate the value of the blank; if the value is non-zero,
then we would want to display something else.
.RE
.PP
This test/evaluation needs to happen at run-time, as the value of the
blank is not known until just before we output the message; and this
example is just for English \- let's extrapolate this a little bit...
.Sp
.RS 2
What if the blank has a value of one? Ideally we want to output the
phrase \*(L"Only select a single file\*(R".
.Sp
What if the value is two? Should we output \*(L"Please select 2 files\*(R"
or should we output \*(L"Please select two files\*(R"?
.Sp
What happens if the value is really big? What happens if it is negative?
.RE
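.PP
A minimal sketch of the run-time tests described above, hard-coded for
English only. It is written in Python purely for illustration and is not
part of Locale::MakePhrase; the function name and the decision to reject
negative values are assumptions, since the text leaves that case open.
.PP
.Vb 12
```python
def select_files_phrase(count):
    """Pick an English phrase for the blank in "Please select ___ files"."""
    if count < 0:
        # Assumption: the text leaves the negative case as an open question.
        raise ValueError("a negative file count has no sensible phrase")
    if count == 0:
        return "Don't select any files"
    if count == 1:
        return "Only select a single file"
    return "Please select %d files" % count
```
.Ve
.PP
Every one of these branches is specific to English, which is why
hard-coding them in the application does not carry over to other languages.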
.PP
This example is applicable for English. What about the next language?
Do any of these tests/evaluations apply? If so, how many of these
tests are common to all languages?
.IP "Note:" 6
.IX Item "Note:"
I have just said that the no-fill-in-the-blanks case is relatively simple
to translate. However, this ignores the fact that phrases in
some languages also have gender, age, seniority and other properties
that should be taken into account. This is the subject of further
study.
.SH "Don't assume a property of one language is applicable to others."
.IX Header "Don't assume a property of one language is applicable to others."
The previous example highlighted the number of phrases that need
evaluation for English. Assuming that other languages
have similar properties is simply a mistake, and no single
person is capable of understanding the nuances of every language.
Thus it is pointless to even try to make a property of one language
also apply to another.
.PP
Let's look at an example \- Chinese vs English.
.Sp
.RS 2
When translating a phrase with numbers, in most cases the Chinese
phrase won't change for the singular vs plural cases.
.Sp
Whereas English requires two separate phrases, one for singular
and one for plural.
.RE
.SH "What is a linguistic rule?"
.IX Header "What is a linguistic rule?"
Now that we have discussed how you or I would translate a phrase,
let's explain the concept of a \fBlinguistic rule\fR.
.PP
A linguistic rule contains the properties needed to encapsulate the technique
of interpreting the meaning of a phrase. When we want to translate
a given phrase into another language, we select the most appropriate
rule from many rules. Choosing the most suitable rule is the job
of the \fBlinguistic rule evaluation\fR engine.
.PP
A rule has the following properties:
.IP "language" 2
.IX Item "language"
This is an \s-1RFC3066\s0 language tag (eg 'en' or 'en_au').
.IP "key" 2
.IX Item "key"
This is the phrase that is used as the base input phrase. This will
most likely be in the language of the programmer (eg English).
.IP "translation" 2
.IX Item "translation"
The output phrase written in the appropriate language.
.IP "expression" 2
.IX Item "expression"
If the phrase contains variables, this is the expression that is used
to determine if this output phrase should be the phrase that is
chosen.
.IP "priority" 2
.IX Item "priority"
In some circumstances, there may be multiple expressions which
evaluate to true. The priority is used to determine which expression
to evaluate first.
.PP
A linguistic rule, from a programmer's point of view, is a struct which
contains enough information to enable us to implement a
process equivalent to that of a linguist.
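.PP
As a rough illustration of such a struct, the five properties could be
modelled as below. This is a Python sketch, not the module's Perl data
structure; the class name, the field types and the priority values are
assumptions made for the example.
.PP
.Vb 19
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    language: str              # language tag, e.g. "en" or "en_au"
    key: str                   # base input phrase, as written by the programmer
    translation: str           # output phrase in the target language
    expression: Optional[str]  # test on the arguments, or None if there is none
    priority: int = 0          # ordering hint when several expressions are true


# Hypothetical rules for one key; higher priority is evaluated first.
KEY = "Please select [_1] files"
rules = [
    Rule("en", KEY, "Don't select any files", "_1 == 0", 2),
    Rule("en", KEY, "Only select a single file", "_1 == 1", 1),
    Rule("en", KEY, "Please select [_1] files", None),
]
```
.Ve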
.SH "Text syntax"
.IX Header "Text syntax"
Before we describe some of the details, we should explain the syntax
of the application text.
.PP
Whenever we want an application value to take part in the phrase, we
use the syntax:
.PP
.Vb 1
\& "This is some phrase, with a [_1] value that is to be run\-time evaluated"
.Ve
.PP
The square brackets indicate that a program value is going to be
passed to the translation engine. Some application strings don't have
any program arguments, while others will have many.
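.PP
A minimal sketch of how such placeholders might be filled in, assuming
that [_1] refers to the first argument, [_2] to the second, and so on.
The function name is hypothetical and the code is illustrative Python,
not the module's implementation.
.PP
.Vb 9
```python
def apply_arguments(text, *args):
    """Replace each [_N] placeholder with the N-th argument (1-based)."""
    for position, value in enumerate(args, start=1):
        # Caveat of this sketch: a value that itself contains a later
        # placeholder such as "[_2]" would be substituted again.
        text = text.replace("[_%d]" % position, str(value))
    return text
```
.Ve
.PP
Because each placeholder carries its own number, a translation is free to
use the arguments in a different order from the original phrase.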
.SH "Expression syntax"
.IX Header "Expression syntax"
The syntax of an expression is of the form:
.PP
.Vb 1
\& _X op val
.Ve
.PP
where:
.PP
.Vb 4
\& X \- number of the application argument; the underscore indicates
\& that this is an argument, not a literal value
\& op \- evaluation operator
\& val \- the value to be tested against
.Ve
.PP
Some examples of expressions, for English:
.PP
.Vb 3
\& _1 == 0
\& _2 > 1
\& left(_3,5) eq "house"
.Ve
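.PP
The simple numeric form of an expression could be evaluated along the
following lines. This Python sketch assumes whitespace-separated tokens
and a small set of comparison operators chosen for the example; string
operators such as eq and functions such as left() are deliberately left
out, and the module's real parser may differ.
.PP
.Vb 22
```python
import operator

# Operators accepted by this sketch; the module's real operator set may differ.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def evaluate(expression, args):
    """Evaluate a simple "_X op val" expression against the argument list."""
    name, op, value = expression.split()
    if not name.startswith("_"):
        raise ValueError("left-hand side must be an argument such as _1")
    argument = args[int(name[1:]) - 1]  # _1 is the first argument
    return OPERATORS[op](float(argument), float(value))
```
.Ve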
.SH "Linguistic rule evaluation"
.IX Header "Linguistic rule evaluation"
To summarise, the engine implements the following:
.IP "1." 3
Find all language rules where the key matches the input phrase, for
the corresponding language tag. Note that the implementation supports
the concept of fallback languages (eg: 'en_au' falls back to 'en').
The linguistic rules for the fallback languages which match the key,
are also retrieved.
.IP "2." 3
Sort the rules based on a combination of the priority, the language
tag (eg 'en_au' has higher precedence than 'en') and whether a non-null
expression exists. Rules with no expression have the lowest priority.
.IP "3." 3
Evaluate the expression from each rule, starting with the highest
priority. The first rule to evaluate to \f(CW\*(C`true\*(C'\fR is chosen.
.IP "4." 3
Apply the arguments in-place, to the selected rule's translated value.
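.PP
The four steps above can be sketched as follows. This is illustrative
Python rather than the module's Perl code. Each rule is a plain dict, the
expression is represented by a callable stored under the hypothetical
key "test", and the exact sort weighting and the behaviour when no rule
matches are assumptions.
.PP
.Vb 41
```python
def fallback_chain(tag):
    """Return the tag followed by its fallbacks, e.g. en_au -> [en_au, en]."""
    parts = tag.split("_")
    return ["_".join(parts[:n]) for n in range(len(parts), 0, -1)]


def make_phrase(rules, tag, key, *args):
    """Select a translation for key and fill in its arguments."""
    chain = fallback_chain(tag)
    # Step 1: rules whose key matches, for the tag or one of its fallbacks.
    matches = [r for r in rules if r["key"] == key and r["language"] in chain]
    # Step 2: rules with a test first, then higher priority, then the more
    # specific language tag. This weighting is an assumption.
    matches.sort(key=lambda r: (r["test"] is None,
                                -r["priority"],
                                chain.index(r["language"])))
    # Step 3: the first rule whose test passes wins; no test always passes.
    chosen = key  # assumption: with no matching rule, use the input phrase
    for rule in matches:
        if rule["test"] is None or rule["test"](*args):
            chosen = rule["translation"]
            break
    # Step 4: substitute the arguments into the chosen translation.
    for position, value in enumerate(args, start=1):
        chosen = chosen.replace("[_%d]" % position, str(value))
    return chosen


KEY = "Please select [_1] files"
RULES = [
    {"language": "en", "key": KEY, "priority": 0, "test": None,
     "translation": "Please select [_1] files"},
    {"language": "en", "key": KEY, "priority": 1, "test": lambda n: n == 0,
     "translation": "Don't select any files"},
    {"language": "en_au", "key": KEY, "priority": 1, "test": lambda n: n == 1,
     "translation": "Only select a single file"},
]
```
.Ve
.PP
With these hypothetical rules, a request for 'en_au' with a value of one
picks the dialect-specific rule, while the same request for 'en' falls
through to the rule that has no expression.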
.SH "Features"
.IX Header "Features"
.IP "\(bu" 2
Support an arbitrary number of blanks to fill.
.IP "\(bu" 2
Be able to swap the ordering of the blanks, ie positional argument 2
needs to be able to be the first blank to fill.
.IP "\(bu" 2
Allow translations in dialects of a language to be output, in
preference to the corresponding translation in the base language.
.IP "\(bu" 2
Support multiple types of backing stores, eg: single file for all
languages, a file per language or a database.
.SH "Example"
.IX Header "Example"
The Locale::MakePhrase tarball contains test cases. These are
used as working examples...
.SH "Further development"
.IX Header "Further development"
Gender... Age...
.Sp
.RS 2
As an example, let's say that we were talking about a person,
specifically a female child. In Italian the term used would be
\&'bambina'; for a male child it is 'bambino'. Thus in this case,
the context surrounding the phrase will include the age and gender
of the child.
.RE
.PP
How do we handle this? Future development may revolve around the
support of gender, age and seniority.
.PP
Each of these three properties needs to be considered from the point
of view of the speaker as well as the receiver. Since the speaker
is simply a computer, one possible scenario is to pass the age and
gender of the user, as arguments to the constructor of the translation
instance.