!set p2=!item 2 of $special_parm
!if $p2!=$empty
!read help
/symtext
/list.phtml
!if $stylecnt > 0
!endif
!endif
!set test=!defof style_exists in symtext/$module_language/$p2/def
!if $test=yes
!changeto symtext/$module_language/$p2/help.phtml
!endif
!endif
<h2>Symtext documentation</h2>
!href cmd
=help
&special_parm
=symtext
,list List of symtext styles
.
<p>
Symtext is a natural language parsing syntax. It is designed to make it
easier to identify different ways to say the same thing in natural language,
and its main purpose is for the recognition of freely typed or composed
short text answers to exercises.
<p>
Recognition of free text answers is difficult due to the following facts.
<ul>
<li>Different context requires different tolerance and precision. A language
exercise cannot tolerate spelling or grammar error, which may not be the
case for a mathematical exercise.
</li>
<li>Natural language often allows many different ways to say the same thing,
between "A or B" and "B or A", "Paul is older than Bill" and "Bill is
younger than Paul", "x and y are similar" and "x is similar to y", or even
"this costs too much" and "it is too expensive".
</li>
<li>Typing errors are common in freely typed text. In many cases, typing
errors should be tolerated. But before an unknown word, it is difficult for
the software to tell whether it is a typing error or a bad answer.
</li>
</ul>
In view of the above, the design of symtext has incorporated the following
features.
<ul>
<li>A nestable syntax allowing the identification of various language
alternatives (different ways to say the same thing).
</li>
<li
>Macro dictionaries can be
defined to help improve the human readability
of the matching rules.
</li>
<li>User-definable multiple dictionaries that can be used for various text
analysis purposes.
</li>
<li>Designated portions of the text can be output for further processing.
</li>
<li>
It is based on user-definable styles, with different styles defining
different dictionaries and macros. So they can be used to deal with
different context.
</li>
<li
>Language scope can be delimited by declaring the
list of allowed words
.
Text containing words not in the
list can be considered to be out
-scope and
be sent back for rephrasing, instead of being rejected as bad answer. A
correct use of this feature can solve most of the problems related to typing
errors and unexpected answers.
</li>
</ul>
<hr><h3>
How it works</h3>
<p>
Symtext deals with the problem of comparing two sentences. The first is the
<em>sample</em> which is typically the answer given to an exercise. It is
compared to the second sentence, the <em>tester</em>, which is typically the
good answer as declared by the author of the exercise.
</p><p>
The sample must be plain text in natural language, while the tester may
contain <em>symtext rules</em> allowing it to <em>match</em> various samples
that are considered to have the same meaning. Such various ways to say the
same thing are alternatives in the natural language. The scope of the
acceptable alternatives depends on the context of the exercise, therefore
must be precisely
defined by the author
. Symtext is designed to allow
authors to make such definitions.
</p><p>
Symtext rules are word based, that is, it only compares words. A word is a
chain of alphabetic characters or digits delimited by spaces or special
symbols (parentheses, quotes, punctuations etc.). Any special symbol is
considered
as a word by itself
. And symtext does not
count the number of
space characters between two words: any chain of consecutive space
characters will be reduced to one space.
</p><p>
A set of basic
<em
>builtin rules
</em
> are
defined in the symtext syntax
. For
example, the rule <span class="tt">[Iperm:x,and,y]</span> matches both samples "x and y"
and "y and x". Rules can be nested:
<pre>neither [Aperm:[Alt:I,me,we,us],nor,our teacher]</pre>
matches the following 8 cases.
</p><p>
"neither I nor our teacher", "neither our teacher nor I", "neither me nor
our teacher", "neither our teacher nor me", "neither we nor our teacher",
"neither our teacher nor we", "neither us nor our teacher", "neither our
teacher nor us".
</p><p>
In general applications, a context <em>style</em> can be declared before
making the comparison. A style is a set of dictionaries and options. These
include pre-transformation dictionaries that can be used for example to
identify singular and plural words before comparison, a macro dictionary
that can simplify the writing of tester rules and make it more readable, and
user-definable dictionaries for various other purposes.
!href cmd
=help
&special_parm
=symtext
,list List of styles
.
</p><p>
For example
, a
<em
>positional macro
</em
> <span
class="tt">_divides
</span
> can be
defined in
the macro dictionary, so that the tester <span class="tt">x _divides [y + z]</span> will
match the following samples.
</p><p>
"x divides y + z", "x is a factor of y + z", "y + z is divisible by x", "y +
z is a multiple of x".
</p><p>
Note here that such a macro is positional, so that the string "y + z" must
be enclosed in a pair of brackets to make them look as one word for the
macro. Otherwise it will rather match things like "y is a multiple of x +
z", which clearly is wrong.
</p><p>
This example shows that the final power of the syntax depends primarily on the
construction of the macro dictionary (which will vary from style to style).
</p><p>
The tester is a text string containing ordinary words, matching rules and
positional macros. An ordinary word is simply compared with the word at the
corresponding position in the sample, while matching rules and macros can
match multiple possibilities in the sample.
</p><p>
Before comparison takes place, words in both the sample and the tester may
first be transformed in order to identify small differences that one wants
to ignore, such as upper and lower cases, singular and plural nouns etc.
</p><p>
Unlike regular expression, symtext match occurs only if the tester matches
the whole sample. Match does not occur if the tester only matches a part of
the sample. However, wildcard rules can be included in the tester if part of
the sample needs to be ignored.
</p>
<hr><h3>Details of the syntax</h3>
<b>Definitions</b>. <ul>
<li>A <em>tstring</em> is a succession of <em>atoms</em>.
</li><li>An <em>atom</em> is either a <em>word</em>, a <em>bracket block</em> or
a positional macro name.
</li
><li
>A
<em
>word
</em
> is either a
list of consecutive alphanumerical
characters or a single special character. In the first case, the word is
delimited by either spaces or non-alphanumerical characters.
</li><li>A <em>bracket block</em> is a substring enclosed by a pair of brackets.
It can be either a tstring, or a <em>matching rule</em>.
</li><li>A <em>positional macro</em> is a word (macro name) preceded by the
underline character
. The macro name must be
defined in the macro dictionary
,
otherwise the whole atom will be treated as an ordinary word.
</li></ul>
<p>
A
<em
>matching rule
</em
> may be either builtin or
defined in the style macro
dictionary. It must be enclosed by a pair of brackets, and the first
character must be alphabetic. If the first character is upper-case, it is
builtin. otherwise it is a macro.
</p><p>
Syntax of the matching rule: <span class="tt">[rule_name:parameters]</span>.
<em>rule_name</em> must start with the first character of the block, it must
be a valid rule name, and the colon must immediately follow the name (no
spaces inserted). Otherwise the block will be treated as a normal tstring
rather than a rule.
</p><p>
<em
>Parameters
</em
> is a comma
-separated
list of strings
. Each parameter can
be a tstring itself (hence can contain nested subrules), except in some
special cases of builtin rules where some of the parameter has a special
meaning, e.g. the first parameter of the rule <em>Pick</em> must be a
positive integer.
</p><p>
There are also two special bracket blocks that are in fact simplifications
of builtin matching rules:
<ul><li>
<span class="tt">[A|B|C]</span> is equivalent to <span class="tt">[Alt:A,B,C]</span>. For this
reason, the character <span class="tt">|</span> is reserved. To have it matched, write
<span class="tt">[|]</span> (or <span class="tt">[Alt:|]</span>).
</li><li>
<span class="tt">[**]</span> is equivalent to <span class="tt">[Wild:**]</span>, <span class="tt">[* *]</span> is
equivalent to <span class="tt">[Wild:* *]</span>, etc. A block falls into this category if
the first character is a '<span class="tt">*</span>'.
</ul>
<hr><h3>Builtin rules</h3>
<p>
A builtin rule is a matching rule where the first character of the name is
upper-case.
</p><p>
Any parameter may include the comma character, as long as it is enclosed by
a pair of parentheses or brackets.
</p>
!read tabletheme
!set wims_backslash_insmath=yes
$table_header
$table_hdtr
<th>name</th>
<th><small>Number of<br>parameters</small></th>
<th>Effect</th>
<th>Detail</th>
</tr>
$table_tr
<td>Alt</td>
<td align=middle>\(>= 1)</td>
<td>Matches any one of the parameters.</td>
<td><span class="tt">[Alt:a,b,c d]</span> matches "a", "b" or "c d".</td>
</tr>
$table_tr
<td>Aperm</td>
<td align=middle>\(>= 3)</td>
<td>"And" styled permutation.</td>
<td><span class="tt">[Aperm:[,],and,A,B,C]</span> matches "A, B and C", "B, A and C", etc.
The order of parameters 3 and up is arbitrary, and the first two parameters
are used to insert between them: parameter 1 is inserted except for the
last insertion where parameter 2 is inserted.
</td></tr>
$table_tr
<td>Apick</td>
<td align=middle>\(>= 4)</td>
<td>"And" styled arbitrary selection.</td>
<td><span class="tt">[Apick:3,[,],and,A,B,C,D,E]</span> matches "B, E and A", "C, A and E",
etc. Parameter 1 must be an integer and gives the number of items to pick.
</td></tr>
$table_tr
<td>Dic</td>
<td align=middle>\(1)</td>
<td>Dictionary check</td>
<td><span class="tt">[Dic:wordtype transitive verb]</span> matches any word or group of
words that is
defined in the dictionary
"wordtype", with a definition that
contains an item "transitive verb".
Note. No word transformation is performed on the parameter of this rule.
</td></tr>
$table_tr
<td>Dperm</td>
<td align=middle>\(4)</td>
<td>Dependent permutation: parameters to match depend on the sample.</td>
<td><span class="tt">[Dperm:a,b,c,d]</span> matches either "a b c" or "c d a", but nothing
else. For example, <br/>
<span class="tt">[Dperm:x,beats,y,is beaten by]</span> matches either "x beats y" or "y is
beaten by x". Or in French,
<br/>
<span class="tt">il [Dperm:,y,est allé,à Paris]</span> matches either "il y est allé" or
"il est allé à Paris".
</td></tr>
$table_tr
<td>Ins</td>
<td align=middle>\(>= 3)</td>
<td>Arbitrary insertion of parameter 1.</td>
<td><span class="tt">[Ins:A,B,C,D,E]</span> matches "B A C D E", "B C A D E", "B C D A E".
Parameter 2 and up must be matched in the given order, while parameter 1 may
find its place anywhere between them. <p>
To match
"A B C D", "B A C D", "B C A D" and
"B C D A", put two
empty
parameters: <span class="tt">[Ins:A,,B,C,D,]</span>.
</p></td></tr>
$table_tr
<td>Iperm</td>
<td align=middle>\(3)</td>
<td>Inter-permutation.</td>
<td><span class="tt">[Iperm:Bill,and,Alice]</span> matches "Bill and Alice" and "Alice and
Bill". But not the three words in any other order.
</td></tr>
$table_tr
<td>M</td>
<td align=middle>\(1)</td>
<td>Shared macro.</td>
<td>The content (any tstring) of the macro can be shared with other calls (with
the same content
). This is mainly designed
for the macros
file, with the aim of
reducing the size of compiled ruleset. Moreover, Shared macros can be self-nested
(while non-shared ones cannot).
</td></tr>
$table_tr
<td>Neg</td>
<td align=middle>\(1)</td>
<td>Logical match negation.</td>
<td>This rule returns match if the sample does not match its parameter, and
vice versa. <p>
In the first
case, the rule matches the
empty string in the sample
.
</p></td></tr>
$table_tr
<td>Nomatch</td>
<td align=middle>\(0)</td>
<td>This is a synonym of <span class="tt">None</span>.</td>
<td>
</td></tr>
$table_tr
<td>None</td>
<td align=middle>\(0)</td>
<td>Matches nothing.</td>
<td><span class="tt">[None:]</span> always returns no match.</td>
</tr>
$table_tr
<td>Not</td>
<td align=middle>\(1)</td>
<td>This is a synonym of <span class="tt">Neg</span>.</td>
<td>
</td></tr>
$table_tr
<td>Opick</td>
<td align=middle>\(>= 2)</td>
<td>Matches an ordered subset of given number of parameters.</td>
<td>This rule is as <span class="tt">Pick</span>, except that it only matches subsets that
are in the same order as that given in the parameters.
</td></tr>
$table_tr
<td>Out</td>
<td align=middle>\(2)</td>
<td>Match plus output</td>
<td>The first parameter is a variable name, and the second parameter can be
any combination of words, subrules and macros. If match occurs for the
second parameter, the matching text will be put as a value of the variable
and output. <p>
Example. <span class="tt">[Out:myvar,[*]]</span> matches any single word, and if the
matched word is "myword" (in the sample), the match output contains a string
"myvar=myword" that can be parsed to know what word the user has entered in
this location.
</p>
</td></tr>
$table_tr
<td>Perm</td>
<td align=middle>\(>= 2)</td>
<td>Matches all the parameters in arbitrary order.</td>
<td><span class="tt">[Perm:x,y,z]</span> matches "x y z", "y x z", "z x y" etc.
</td></tr>
$table_tr
<td>Pick</td>
<td align=middle>\(>= 2)</td>
<td>Matches a subset of given number of parameters in any order.</td>
<td>The first parameter must be a positive integer n. The rule matches any
subset of n parameters within the rest, in any order. <br/>
Example: <span class="tt">[Pick:2,a,b,c,d]</span> matches "a b", "d b", "c a" etc. <br/>
<span class="tt">[Pick:3,x,y,z]</span> is equivalent to <span class="tt">[Perm:x,y,z]</span>. <br/>
<span class="tt">[Pick:1,a,b,c,d]</span> is equivalent to <span class="tt">[Alt:a,b,c,d]</span>.
<p>
Extensions: <span class="tt">[Pick:+2,...]</span> matches any subset of at least 2
parameters. <br/>
<span class="tt">[Pick:-3,...]</span> matches any subset of at most 3 parameters (including
</p>
<p>
Known bug: repetition of the same parameter is not recognized.
<span class="tt">[Pick:2,a,b,c,d]</span> does not match "a c c".
</p>
</td></tr>
$table_tr
<td>Rep</td>
<td align=middle>\(>= 1)</td>
<td>Matches an arbitrary number (at least one) of parameters in any order and
with any repetition.</td>
<td><span class="tt">[Rep:0,1]</span> matches "0 1", "1", "0 1 0 0 1 1 0", etc.
</td></tr>
$table_tr
<td>W</td>
<td align=middle>\(0 or 1)</td>
<td
>Matches words in a
list.</td
>
<td
>This rule matches the
next word
if it appears somewhere in the tester or
if it is a word given in the parameter. <p>
If this rule is put in the last tester line, words in all the tester lines
$table_tr
<td>Wild</td>
<td align=middle>\(1)</td>
<td>Wildcard word match.</td>
<td>The unique parameter must be composed of words "*", "**", and/or "**n"
where n is a positive number. The first matches any single word, the second
matches 0 or any number of words, and the third matches from 0 to n
arbitrary words. For example, <br/>
<span class="tt">[Wild:* * **3]</span> matches between 2 to 5 words (inclusive).
</td></tr>
$table_end
<hr><h3>Construction of styles</h3>
<p>
A style corresponds to a directory and its contents. Under WIMS, the style
can either be shared among all modules in the public_html/scripts/symtext
directory, or be special to one module, in the module's directory.
</p><p>
The style must contain an index file, named <span class="tt">def</span>. It defines the
basic configuration choices of the style. Every line of the file is a
definition under the format <span class="tt">name=value</span>.
</p><p>
The <span class="tt">def</span> file must contain a definition <span class="tt">style_exists=yes</span>,
otherwise the existence of the style will not be recognized. All the rest is
optional.
</p><p>
It may contain a definition of <em>option</em>, that lists option words that
will always be activated for the style.
</p><p>
It can also define general dictionaries using the name
<em>dictionaries</em>. The value must be a list of words, each corresponding
to a dictionary file in the style. The number of general dictionaries is
limited.
</p><p>
For each general dictionary, a variable NAME_unknown can be defined (where
NAME should be replaced by the dictionary name), which tells how a word
should be treated if it is not found in the dictionary (unknown). The value
may be <span class="tt">delete</span> (default) which means the unknown word should be
replaced by an empty string; <span class="tt">leave</span> which will return the unknown
word unchanged; or anything else. In the last case, the value will be used
to replace the unknown word.
</p><p>
There may also be three dictionary files with reserved names:
<span class="tt">suffix</span>, <span class="tt">trans</span> and <span class="tt">macros</span>. All dictionaries are
line dictionaries, with each line in the format <span class="tt">name:definition</span>. Names
must be sorted (using the special program <span class="tt">dicsort</span> in the WIMS
package). All of the dictionaries are optional.
</p><p>
Both name and definition may contain space characters. However, except macro
definitions there is no transformation after the dictionary is read, so only
single space characters should be used. The name field should start and end
with non-space characters. Multiple definitions with a same name will give
unpredictable result.
</p><p>
The <em>suffix</em> dictionary is a very special one, that is used to
transform word suffixes before any other transformation. It is easy to
understand except that in the name field, the suffixes are defined in
reverse order.
</p><p>
The <em>trans</em> dictionary is used for word replacements after suffix
transformation. Both dictionaries will be consulted before any string
comparison takes place. For example, if we want to identify nouns under
singular and plural forms, we can first use the <em>suffix</em> dictionary
to transform plural nouns into singular suffix if they obey a general suffix
rule; then for nouns with special plural forms, the <em>trans</em>
dictionary can be used to transform them.
</p><p>
Both the <em>suffix</em> and <em>trans</em> dictionaries must be constructed
to be <em>order 1 stable</em>, that is, if an already transformed string is
resubmitted to the dictionary, no further transformation will take place.
</p><p>
The <em>macros</em> dictionary contains both definitions for positional
macros and macro rules. The former have names starting with the underline
character, while the latter starts with lower case letters.
</p><p>
The definition of a macro is a tstring that will be used to replace the
macro. Macros can be nested, that is, the definition of a macro may contain
calls to other macros, in any order. However, infinite nesting loops will
result in an error.
</p><p>
In order to preserve consistency for positional macros, the definition of
any macro must be composed of exactly one atom.
</p><p>
Macro definitions may contain parameters. For this purpose, the character
<span class="tt">@</span> has a special meaning in a macro definition. When invoked, it
must be followe by an integer. And the character together with the following
integer will be replaced by a macro parameter during macro expansion.
</p><p>
For a rule macro, <span class="tt">@1</span> means the first parameter, <span class="tt">@2</span> means
the second parameter, etc. It is an error if the macro is invoked in a
tstring without giving enough parameters.
</p><p>
For a positional macro, <span class="tt">@1</span> designates the first atom following the
macro, while <span class="tt">@-1</span> designates the first atom preceding the macro, etc.
It is also an error if a positional macro is inserted in a tstring without
enough atoms before or after it as required by its definition.
</p><p>
Care must be taken to the point that a macro parameter may result in several
atoms after expansion. This is not a problem unless the macro definition
contains a positional macro. In case where it is necessary to ensure the
position of parameters, one can enclose the parameter by a pair of brackets,
such as <span class="tt">[@1]</span>.
</p><p>
A general dictionary has the same syntax as the reserved ones. In this case,
the definition field can be a comma-separated list of items. These
dictionaries are used via the <span class="tt">Dic</span> builtin rule, which gives a match
if one of the items in the definition is equal to the value given in the
parameter of the rule.
</p>
<hr><h3>The command line program</h3>
The command line program <em>symtext</em> is specially built for WIMS, so that
all the input data are sent through environment variables. It can also be
used as a standalone program, but in this case it is better that a wrapper
script be used to put the input-output into a more *nix flavor.
$table_header
<caption>List of environment parameters</caption>
$table_hdtr<th>Name</th>
<th>Value</th>
<th>Comments</th>
</tr>
$table_tr
<td>wims_exec_parm</td>
<td>The main data input</td>
<td>A multi-line string.
<ul>
<li>Line 1: command followed by options. Valid commands:
<ul>
<li><span class="tt">match</span> check matching.
</li>
<li><span class="tt">debug</span> check matching with debug information.
</li>
</ul>
</li>
<li>Line 2: The sample.</li>
<li>Line 3 and up: Each line is a tester.</li>
</ul>
<p>
Lines can also be delimited by the semi-colon. For this reason, semi-colons
must be protected by parentheses in both the sample and the tester.
</p><p>
Options have the same syntax as in the style option definition. With one
more possible definition here: <span class="tt">style=[the_name_of_style]</span>.
</td></tr>
$table_tr
<td>module_dir</td>
<td>Directory to current module</td>
<td>Automatically defined if called by WIMS. If this variable is undefined,
then w_symtext must give the complete path of the style.
</td></tr>
$table_tr
<td>w_module_language</td>
<td>Language</td>
<td>Only used when called by WIMS. Can be overrun by the "language=" option.
</td></tr>
$table_end
<p>
Options have two origins: either from the environment variable
<em>w_symtext_option</em> or from the <em>def</em> file of the style. The two
have the same syntax.
</p>
!set option_data=!trim \
alnumly,word,Transform everything non-alphabetic and non-digit into space.\
alphaonly,word,Transform everything non-alphabetic into space.\
deaccent,word,Remove accents from letters before comparison.\
debug,word,Output debug information to stderr.\
language,value,A two-letter language code.\
matchall,word,Match every line of the tester, instead of stopping after the first match.\
nocase,word,Fold both texts to lower case before comparison.\
nocs,word,Replace computer-oriented characters by spaces (<span class="tt">_&$#\\@~</span>)\
nomath,word,Replace mathematical operators by spaces (<tt>+-*/=|%<>()_</tt>)\
noparentheses,word,Replace parentheses by spaces (<span class="tt">()[]{}</span>)\
nopunct,word,Replace puncuation characters by spaces (<span class="tt">.,;:?!"</span>) except the dot as a decimal point.\
noquote,word,Replace quoting characters by spaces (<span class="tt">`'"</span>)\
reaccent,word,Allow composition of accented letters using special characters.\
style,value,The style,, only valid in the environment parameter.\
$table_header
<caption>List of options</caption>
$table_hdtr
<th>Name</th>
<th>Nature</th>
<th>Meaning</th>
</tr>
!set n=!linecnt $option_data
!for i=1 to $n
!set l=!line $i of $option_data
!distribute item $l into name,nature,meaning
$table_tr
<td>$name</td>
<td>$nature</td>
<td>$meaning
!next i
$table_end
</p><p>
<b>Program output</b>. The output is empty if no match is found.
!set error_data=!nonempty lines \
bad_command,Invalid command in the input.\
bad_dictionary,Non-existing dictionary specified.\
bad_macro,Bad macro name.\
bad_macro_position,Positional macro placed in the tester where pre- or post-parameters cannot be found.\
bad_pickcnt,Invalid first parameter for Pick.\
block_overflow,Too many rules and parameters defined in the tester (before or after macro expansion).\
duplication_in_dictionary,A name is defined twice in the indicated dictionary (in the style).\
file_too_long,File size exceeded limit.\
level_overflow,Too much nesting; probably an internal bug.\
list_overflow,A rule contains too many parameters.\
macro_level_overflow,Too many recursive macro definitions. Usually it is an infinite loop in the macro dictionary.\
name_too_long,Macro or variable name exceeded length limit.\
string_too_long,String length limit exceeded.\
style_not_found,Inexisting style specified.\
syntax_error,Syntax error in a macro or rule.\
tag_overflow,Tester expansion is too complicated.\
too_many_dictionaries,The number of dictionaries declared in the style has exceeded limit.\
unknown_cmd,Unknown matching rule name.\
unmatched_parentheses,Unmatched parentheses or brackets.\
unsorted_dictionary,The indicated dictionary (in the style) is in bad order.\
wrong_parmcnt,A matching rule has a number of parameters that does not meet its definition.\
$table_header
<caption>Error messages</caption>
$table_hdtr
<th>Message</th>
<th>Meaning</th>
</tr>
!set n=!linecnt $error_data
!for i=1 to $n
!set l=!line $i of $error_data
!distribute items $l into msg,mean
$table_tr
<td class="tt">$msg</td>
<td>$mean</td>
</tr>
!next i
$table_end