!set p2=!item 2 of $special_parm
!if $p2!=$empty
 !if $p2=list
  !read help/symtext/list.phtml
  !if $stylecnt > 0
   !exit
  !endif
 !endif
 !set test=!defof style_exists in symtext/$module_language/$p2/def
 !if $test=yes
  !changeto symtext/$module_language/$p2/help.phtml
 !endif
!endif

<h2>Symtext documentation</h2>

!href cmd=help&special_parm=symtext,list List of symtext styles
.
<p>

22
Symtext is a natural language parsing syntax. It is designed to make it
23
easier to identify different ways to say the same thing in natural language,
24
and its main purpose is for the recognition of freely typed or composed
25
short text answers to exercises.
26
<p>
27
 
28
Recognition of free text answers is difficult due to the following facts.
29
<ul>
30
 
31
<li>Different context requires different tolerance and precision. A language
32
exercise cannot tolerate spelling or grammar error, which may not be the
33
case for a mathematical exercise.
6250 bpr 34
</li>
20 reyssat 35
<li>Natural language often allows many different ways to say the same thing,
36
between "A or B" and "B or A", "Paul is older than Bill" and "Bill is
37
younger than Paul", "x and y are similar" and "x is similar to y", or even
38
"this costs too much" and "it is too expensive".
6250 bpr 39
</li>
20 reyssat 40
<li>Typing errors are common in freely typed text. In many cases, typing
41
errors should be tolerated. But before an unknown word, it is difficult for
42
the software to tell whether it is a typing error or a bad answer.
6250 bpr 43
</li>
20 reyssat 44
</ul>
45
 
In view of the above, the design of symtext incorporates the following
features.

<ul>
<li>A nestable syntax allowing the identification of various language
alternatives (different ways to say the same thing).
</li>
<li>Macro dictionaries can be defined to improve the human readability
of the matching rules.
</li>
<li>Multiple user-definable dictionaries can be used for various text
analysis purposes.
</li>
<li>Designated portions of the text can be output for further processing.
</li>
<li>
It is based on user-definable styles, each defining its own dictionaries and
macros, so that different styles can deal with different contexts.
</li>
<li>The language scope can be delimited by declaring the list of allowed words.
Text containing words not in the list can be considered out of scope and
sent back for rephrasing, instead of being rejected as a bad answer. Used
correctly, this feature can solve most of the problems related to typing
errors and unexpected answers.
</li>
</ul>

<hr /><h3>How it works</h3>
<p>
Symtext deals with the problem of comparing two sentences. The first is the
<em>sample</em>, which is typically the answer given to an exercise. It is
compared to the second sentence, the <em>tester</em>, which is typically the
correct answer as declared by the author of the exercise.
</p><p>

The sample must be plain text in natural language, while the tester may
contain <em>symtext rules</em> allowing it to <em>match</em> various samples
that are considered to have the same meaning. These various ways of saying the
same thing are alternatives in natural language. The scope of acceptable
alternatives depends on the context of the exercise and therefore
must be precisely defined by the author. Symtext is designed to let
authors make such definitions.
</p><p>

Symtext rules are word based, that is, they only compare words. A word is a
chain of alphabetic characters or digits delimited by spaces or special
symbols (parentheses, quotes, punctuation marks, etc.). Any special symbol is
considered a word by itself. Symtext does not count the number of
space characters between two words: any chain of consecutive space
characters is reduced to one space.
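</p><p>
The word splitting described above can be illustrated with the following
Python sketch. It is only an approximation written for this page, not the
symtext implementation: it assumes that "alphanumeric" means what Python's
<span class="tt">str.isalnum</span> accepts, and it applies to plain samples only
(tester syntax such as brackets and macro names is not handled).
</p>

```python
def tokenize(text: str) -> list[str]:
    """Split text into symtext-style words (illustrative approximation)."""
    words: list[str] = []
    current = ""
    for ch in text:
        if ch.isalnum():
            current += ch          # extend the current alphanumeric word
            continue
        if current:
            words.append(current)  # a space or special symbol ends the word
            current = ""
        if not ch.isspace():
            words.append(ch)       # each special symbol is a word by itself
    if current:
        words.append(current)
    return words
```

<p>
For instance, joining <span class="tt">tokenize("it  costs   too much!")</span> with
single spaces gives "it costs too much !": the runs of spaces disappear and the
exclamation mark becomes a word of its own.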
</p><p>

A set of basic <em>builtin rules</em> is defined in the symtext syntax. For
example, the rule <span class="tt">[Iperm:x,and,y]</span> matches both samples "x and y"
and "y and x". Rules can be nested:
<pre>neither [Iperm:[Alt:I,me,we,us],nor,our teacher]</pre>
matches the following 8 cases:
</p><p>

"neither I nor our teacher", "neither our teacher nor I", "neither me nor
our teacher", "neither our teacher nor me", "neither we nor our teacher",
"neither our teacher nor we", "neither us nor our teacher", "neither our
teacher nor us".
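</p><p>
The following Python sketch enumerates the eight strings above. It only models
the two rules involved, as sets of accepted strings: an alternative (as in
<span class="tt">Alt</span>) and a two-sided permutation around a middle word
(as in <span class="tt">[Iperm:x,and,y]</span>). The helper names are made up for this
illustration, and symtext itself matches the sample against the rules rather
than listing every accepted string.
</p>

```python
from itertools import product

def lit(text: str) -> set[str]:
    """An ordinary word or group of words: accepts exactly itself."""
    return {text}

def alt(*choices: set[str]) -> set[str]:
    """Accepts whatever any one of the choices accepts."""
    accepted: set[str] = set()
    for choice in choices:
        accepted.update(choice)
    return accepted

def seq(*parts: set[str]) -> set[str]:
    """Consecutive atoms, joined by single spaces."""
    return {" ".join(words) for words in product(*parts)}

def inter_perm(left: set[str], middle: set[str], right: set[str]) -> set[str]:
    """'left middle right' or 'right middle left'."""
    return seq(left, middle, right) | seq(right, middle, left)

pronoun = alt(lit("I"), lit("me"), lit("we"), lit("us"))
tester = seq(lit("neither"), inter_perm(pronoun, lit("nor"), lit("our teacher")))
for sample in sorted(tester):
    print(sample)
```

<p>
Listing all accepted strings is only practical for small finite rules; wildcard
rules, for instance, accept infinitely many samples.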
</p><p>

In general applications, a context <em>style</em> can be declared before
making the comparison. A style is a set of dictionaries and options. These
include pre-transformation dictionaries, which can be used for example to
identify singular and plural words before comparison; a macro dictionary,
which can simplify the writing of tester rules and make them more readable; and
user-definable dictionaries for various other purposes.
!href cmd=help&special_parm=symtext,list List of styles
.
</p><p>

For example, a <em>positional macro</em> <span class="tt">_divides</span> can be defined in
the macro dictionary, so that the tester <span class="tt">x _divides [y + z]</span>
matches the following samples:
</p><p>

"x divides y + z", "x is a factor of y + z", "y + z is divisible by x", "y +
z is a multiple of x".
</p><p>

Note that such a macro is positional, so the string "y + z" must
be enclosed in a pair of brackets to make it look like a single word to the
macro. Otherwise the tester would rather match things like "y is a multiple of x +
z", which is clearly wrong.
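</p><p>
To see why the brackets matter, here is a Python sketch of positional macro
substitution on a list of atoms. The definition given to <span class="tt">_divides</span>
is a guess written for this example (a real style may define it differently),
and only the <span class="tt">@-1</span> and <span class="tt">@1</span> parameters are handled.
</p>

```python
# Hypothetical macro dictionary entry; a real style may define _divides differently.
MACROS = {
    "_divides": "[@-1 divides @1|@-1 is a factor of @1"
                "|@1 is divisible by @-1|@1 is a multiple of @-1]",
}

def expand_positional(atoms: list[str]) -> list[str]:
    """Replace each positional macro and its neighbouring atoms by the macro body."""
    out = list(atoms)
    i = 0
    while i < len(out):
        body = MACROS.get(out[i])
        if body is None:
            i += 1
            continue
        if i == 0 or i + 1 >= len(out):
            raise ValueError("bad_macro_position")  # no atom before or after
        before, after = out[i - 1], out[i + 1]
        body = body.replace("@-1", before).replace("@1", after)
        out[i - 1 : i + 2] = [body]  # three atoms become one; rescan from index i
    return out

grouped = expand_positional(["x", "_divides", "[y + z]"])
ungrouped = expand_positional(["x", "_divides", "y", "+", "z"])
```

<p>
With the brackets, the whole of "y + z" lands in every alternative; without
them, the trailing "+ z" stays outside the macro, which is how a tester could
end up accepting "y is a multiple of x + z".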
</p><p>

This example shows that the power of the syntax depends primarily on the
construction of the macro dictionary (which varies from style to style).
</p><p>

The tester is a text string containing ordinary words, matching rules and
positional macros. An ordinary word is simply compared with the word at the
corresponding position in the sample, while matching rules and macros can
match multiple possibilities in the sample.
</p><p>

Before comparison takes place, words in both the sample and the tester may
first be transformed in order to ignore small differences,
such as upper and lower case, singular and plural nouns, etc.
</p><p>

Unlike a regular expression, a symtext tester matches only if it matches
the whole sample; there is no match if the tester matches only a part of
the sample. However, wildcard rules can be included in the tester if part of
the sample needs to be ignored.
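</p><p>
The whole-sample requirement, and the way wildcards relax it, can be sketched
in Python as follows. Both tester and sample are assumed to be already split
into words, and the tester may only contain ordinary words and the wildcard
words *, ** and **n described for the Wild rule below; all other rules are
left out of this illustration.
</p>

```python
from functools import lru_cache

def full_match(tester: list[str], sample: list[str]) -> bool:
    """True only if the tester accounts for every word of the sample."""

    @lru_cache(maxsize=None)
    def match_from(t: int, s: int) -> bool:
        if t == len(tester):
            return s == len(sample)  # leftover sample words mean no match
        word = tester[t]
        remaining = len(sample) - s
        if word == "*":  # exactly one arbitrary word
            return remaining > 0 and match_from(t + 1, s + 1)
        if word.startswith("**"):  # "**": any number of words; "**n": at most n
            limit = int(word[2:]) if len(word) > 2 else remaining
            return any(match_from(t + 1, s + k) for k in range(min(limit, remaining) + 1))
        return remaining > 0 and sample[s] == word and match_from(t + 1, s + 1)

    return match_from(0, 0)
```

<p>
Here the tester "x divides y" does not match the sample "x divides y too",
while "x divides y **" does.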
</p>

<hr /><h3>Details of the syntax</h3>

<b>Definitions</b>. <ul>
<li>A <em>tstring</em> is a succession of <em>atoms</em>.
</li><li>An <em>atom</em> is either a <em>word</em>, a <em>bracket block</em> or
a positional macro name.
</li><li>A <em>word</em> is either a list of consecutive alphanumeric
characters or a single special character. In the first case, the word is
delimited by either spaces or non-alphanumeric characters.
</li><li>A <em>bracket block</em> is a substring enclosed by a pair of brackets.
It can be either a tstring or a <em>matching rule</em>.
</li><li>A <em>positional macro</em> is a word (the macro name) preceded by the
underscore character. The macro name must be defined in the macro dictionary,
otherwise the whole atom is treated as an ordinary word.
</li></ul>
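<p>
These definitions can be turned into a small tokenizer for tstrings. The Python
sketch below is an illustration written for this page: it returns (kind, text)
pairs, keeps each bracket block whole (nested brackets included) without
looking inside it, and treats an underscore-prefixed name as a positional macro
only when it appears in the given set of macro names.
</p>

```python
def atoms(tstring: str, macro_names: set[str]) -> list[tuple[str, str]]:
    """Split a tstring into (kind, text) atoms: 'word', 'block' or 'macro'."""
    result: list[tuple[str, str]] = []
    i, n = 0, len(tstring)
    while i < n:
        ch = tstring[i]
        if ch.isspace():
            i += 1
        elif ch == "[":
            depth, j = 0, i
            while j < n:  # find the bracket closing this block
                if tstring[j] == "[":
                    depth += 1
                elif tstring[j] == "]":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            if j == n:
                raise ValueError("unmatched_parentheses")
            result.append(("block", tstring[i : j + 1]))
            i = j + 1
        elif ch.isalnum() or (ch == "_" and i + 1 < n and tstring[i + 1].isalnum()):
            j = i + 1
            while j < n and tstring[j].isalnum():
                j += 1
            text = tstring[i:j]
            is_macro = text.startswith("_") and text in macro_names
            result.append(("macro" if is_macro else "word", text))
            i = j
        else:
            result.append(("word", ch))  # a single special character
            i += 1
    return result
```

<p>
An undefined name such as <span class="tt">_foo</span> comes back as an ordinary word,
as the definition above requires.
</p>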
<p>
A <em>matching rule</em> may be either builtin or defined in the style macro
dictionary. It must be enclosed by a pair of brackets, and its first
character must be alphabetic. If the first character is upper-case, the rule is
builtin; otherwise it is a macro.
</p><p>

Syntax of a matching rule: <span class="tt">[rule_name:parameters]</span>.
<em>rule_name</em> must start at the first character of the block and be a
valid rule name, and the colon must immediately follow the name (no
spaces in between). Otherwise the block is treated as a normal tstring
rather than a rule.
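</p><p>
The following Python sketch classifies a bracket block according to these
conventions. It is an illustration only: the set of builtin names is copied
from the table of builtin rules below, the macro rule names would come from the
style's macros dictionary, and the assumption that rule names consist of
letters, digits and underscores is ours.
</p>

```python
BUILTIN = {"Alt", "Aperm", "Apick", "Dic", "Dperm", "Ins", "Iperm", "M", "Neg",
           "Nomatch", "None", "Not", "Opick", "Out", "Perm", "Pick", "Rep", "W", "Wild"}

def classify_block(block: str, macro_rules: set[str]) -> str:
    """Return 'builtin', 'macro rule' or 'tstring' for a block like '[Alt:a,b]'."""
    inner = block[1:-1]  # drop the enclosing brackets
    name = ""
    for ch in inner:
        if not (ch.isalnum() or ch == "_"):
            break
        name += ch
    # The name must start the block, begin with a letter and be followed directly by ':'.
    if not name or not name[0].isalpha() or inner[len(name) : len(name) + 1] != ":":
        return "tstring"
    if name[0].isupper():
        return "builtin" if name in BUILTIN else "tstring"
    return "macro rule" if name in macro_rules else "tstring"
```

<p>
This sketch follows the paragraph above and falls back to a tstring for unknown
names; the error list at the end of this page also has an
<span class="tt">unknown_cmd</span> message, so the real program may report such names instead.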
</p><p>

<em>Parameters</em> is a comma-separated list of strings. Each parameter can
be a tstring itself (hence can contain nested subrules), except in some
special cases of builtin rules where some parameters have a special
meaning; e.g. the first parameter of the rule <em>Pick</em> must be a
positive integer.
</p><p>

There are also two special bracket blocks that are in fact shorthands
for builtin matching rules:
<ul><li>
<span class="tt">[A|B|C]</span> is equivalent to <span class="tt">[Alt:A,B,C]</span>. For this
reason, the character <span class="tt">|</span> is reserved. To have it matched, write
<span class="tt">[|]</span> (or <span class="tt">[Alt:|]</span>).
</li><li>
<span class="tt">[**]</span> is equivalent to <span class="tt">[Wild:**]</span>, <span class="tt">[* *]</span> is
equivalent to <span class="tt">[Wild:* *]</span>, etc. A block falls into this category if
its first character is a '<span class="tt">*</span>'.
</li>
</ul>
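<p>
A Python sketch of how these two shorthand blocks could be rewritten into
ordinary rules. It splits on | only outside nested brackets, is meant for
blocks that are not already written as <span class="tt">[rule_name:...]</span>, and
assumes that the alternatives contain no top-level comma (which would otherwise
be read as a parameter separator). It is an illustration, not the parser used
by symtext.
</p>

```python
def desugar(block: str) -> str:
    """Rewrite '[A|B|C]' as '[Alt:A,B,C]' and '[**]'-style blocks as '[Wild:...]'."""
    inner = block[1:-1]
    if inner == "|":
        return "[Alt:|]"  # the reserved bar, matched literally
    if inner.startswith("*"):
        return f"[Wild:{inner}]"
    parts: list[str] = []
    depth, current = 0, ""
    for ch in inner:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "|" and depth == 0:  # only top-level bars separate alternatives
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    if len(parts) > 1:
        return "[Alt:" + ",".join(parts) + "]"
    return block  # not a shorthand: leave it unchanged
```

<p>
Nested shorthands such as the inner block of <span class="tt">[a|[b|c]]</span> are left
as they are and would be rewritten when that inner block is processed in turn.
</p>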

<hr /><h3>Builtin rules</h3>
<p>
A builtin rule is a matching rule whose name starts with an upper-case
letter.
</p><p>

Any parameter may include the comma character, as long as it is enclosed by
a pair of parentheses or brackets.
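</p><p>
Splitting a rule into its name and parameters therefore has to ignore commas
nested inside parentheses or brackets. The Python sketch below does this with a
depth counter and also checks the first parameter of Pick (including the +n
and -n extensions described in the table below). It is a simplified
illustration with names of our own choosing, and it does not verify that the
brackets are balanced.
</p>

```python
def parse_rule(block: str) -> tuple[str, list[str]]:
    """Split '[name:p1,p2,...]' into its name and parameter list."""
    name, _, rest = block[1:-1].partition(":")
    params: list[str] = []
    depth, current = 0, ""
    for ch in rest:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:  # only top-level commas separate parameters
            params.append(current)
            current = ""
        else:
            current += ch
    params.append(current)
    if name == "Pick":  # first parameter: n, +n or -n
        count = params[0]
        digits = count[1:] if count[:1] in ("+", "-") else count
        if not digits.isdigit() or int(digits) == 0:
            raise ValueError("bad_pickcnt")
    return name, params
```

<p>
Empty parameters are kept, which matters for rules such as Ins.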
</p>

!read tabletheme
!set wims_backslash_insmath=yes
$table_header
$table_hdtr
<th>Name</th>
<th><small>Number of<br />parameters</small></th>
<th>Effect</th>
<th>Detail</th>
</tr>
$table_tr
<td>Alt</td>
<td align=middle>\(>= 1)</td>
<td>Matches any one of the parameters.</td>
<td><span class="tt">[Alt:a,b,c d]</span> matches "a", "b" or "c d".</td>
</tr>
$table_tr
<td>Aperm</td>
<td align=middle>\(>= 3)</td>
<td>"And"-styled permutation.</td>
<td><span class="tt">[Aperm:[,],and,A,B,C]</span> matches "A, B and C", "B, A and C", etc.
Parameters 3 and up may appear in any order, and the first two parameters
are separators inserted between them: parameter 1 is inserted between
consecutive items, except before the last item, where parameter 2 is inserted.
</td></tr>
$table_tr
<td>Apick</td>
<td align=middle>\(>= 4)</td>
<td>"And"-styled arbitrary selection.</td>
<td><span class="tt">[Apick:3,[,],and,A,B,C,D,E]</span> matches "B, E and A", "C, A and E",
etc. Parameter 1 must be an integer and gives the number of items to pick.
</td></tr>

$table_tr
<td>Dic</td>
<td align=middle>\(1)</td>
<td>Dictionary check.</td>
<td><span class="tt">[Dic:wordtype transitive verb]</span> matches any word or group of
words that is defined in the dictionary "wordtype" with a definition that
contains the item "transitive verb".

Note: no word transformation is performed on the parameter of this rule.
</td></tr>
$table_tr
<td>Dperm</td>
<td align=middle>\(4)</td>
<td>Dependent permutation: the parameters to match depend on the sample.</td>
<td><span class="tt">[Dperm:a,b,c,d]</span> matches either "a b c" or "c d a", but nothing
else. For example, <br/>
<span class="tt">[Dperm:x,beats,y,is beaten by]</span> matches either "x beats y" or "y is
beaten by x". Or in French,
<br/>
<span class="tt">il [Dperm:,y,est allé,à Paris]</span> matches either "il y est allé" or
"il est allé à Paris".
</td></tr>

$table_tr
<td>Ins</td>
<td align=middle>\(>= 3)</td>
<td>Arbitrary insertion of parameter 1.</td>
<td><span class="tt">[Ins:A,B,C,D,E]</span> matches "B A C D E", "B C A D E" and "B C D A E".
Parameters 2 and up must be matched in the given order, while parameter 1 may
be placed anywhere between them. <p>

To match "A B C D", "B A C D", "B C A D" and "B C D A", add two empty
parameters: <span class="tt">[Ins:A,,B,C,D,]</span>.
</p></td></tr>
$table_tr
<td>Iperm</td>
<td align=middle>\(3)</td>
<td>Inter-permutation.</td>
<td><span class="tt">[Iperm:Bill,and,Alice]</span> matches "Bill and Alice" and "Alice and
Bill", but not the three words in any other order.
</td></tr>
$table_tr
<td>M</td>
<td align=middle>\(1)</td>
<td>Shared macro.</td>
<td>The content (any tstring) of the macro can be shared with other calls having
the same content. This is mainly designed for the macros file, with the aim of
reducing the size of the compiled ruleset. Moreover, shared macros can be self-nested
(while non-shared ones cannot).
</td></tr>
$table_tr
<td>Neg</td>
<td align=middle>\(1)</td>
<td>Logical match negation.</td>
<td>This rule returns a match if the sample does not match its parameter, and
vice versa. <p>
In the first case, the rule matches the empty string in the sample.
</p></td></tr>
$table_tr
<td>Nomatch</td>
<td align=middle>\(0)</td>
<td>This is a synonym of <span class="tt">None</span>.</td>
<td>
</td></tr>
$table_tr
<td>None</td>
<td align=middle>\(0)</td>
<td>Matches nothing.</td>
<td><span class="tt">[None:]</span> always returns no match.</td>
</tr>
$table_tr
<td>Not</td>
<td align=middle>\(1)</td>
<td>This is a synonym of <span class="tt">Neg</span>.</td>
<td>
</td></tr>
$table_tr
<td>Opick</td>
<td align=middle>\(>= 2)</td>
<td>Matches an ordered subset of a given number of parameters.</td>
<td>This rule works like <span class="tt">Pick</span>, except that it only matches subsets that
are in the same order as that given in the parameters.
</td></tr>
$table_tr
<td>Out</td>
<td align=middle>\(2)</td>
<td>Match plus output.</td>
<td>The first parameter is a variable name, and the second parameter can be
any combination of words, subrules and macros. If a match occurs for the
second parameter, the matching text is assigned to the variable
and output. <p>

Example: <span class="tt">[Out:myvar,[*]]</span> matches any single word, and if the
matched word is "myword" (in the sample), the match output contains the string
"myvar=myword", which can be parsed to know which word the user has entered at
this location.
</p>
</td></tr>
$table_tr
<td>Perm</td>
<td align=middle>\(>= 2)</td>
<td>Matches all the parameters in arbitrary order.</td>
<td><span class="tt">[Perm:x,y,z]</span> matches "x y z", "y x z", "z x y", etc.
</td></tr>
$table_tr
<td>Pick</td>
<td align=middle>\(>= 2)</td>
<td>Matches a subset of a given number of parameters, in any order.</td>
<td>The first parameter must be a positive integer n. The rule matches any
subset of n parameters among the others, in any order. <br/>
Example: <span class="tt">[Pick:2,a,b,c,d]</span> matches "a b", "d b", "c a", etc. <br/>
<span class="tt">[Pick:3,x,y,z]</span> is equivalent to <span class="tt">[Perm:x,y,z]</span>. <br/>
<span class="tt">[Pick:1,a,b,c,d]</span> is equivalent to <span class="tt">[Alt:a,b,c,d]</span>.

<p>
Extensions: <span class="tt">[Pick:+2,...]</span> matches any subset of at least 2
parameters. <br/>
<span class="tt">[Pick:-3,...]</span> matches any subset of at most 3 parameters (including
the empty subset).
</p>
<p>
Known bug: repetition of the same parameter is not recognized.
<span class="tt">[Pick:2,a,b,c,d]</span> does not match "a c c".
</p>
</td></tr>
$table_tr
<td>Rep</td>
<td align=middle>\(>= 1)</td>
<td>Matches an arbitrary number (at least one) of parameters in any order and
with any repetition.</td>
<td><span class="tt">[Rep:0,1]</span> matches "0 1", "1", "0 1 0 0 1 1 0", etc.
</td></tr>
$table_tr
<td>W</td>
<td align=middle>\(0 or 1)</td>
<td>Matches words in a list.</td>
<td>This rule matches the next word if it appears somewhere in the tester or
if it is a word given in the parameter. <p>

If this rule is put in the last tester line, words in all the tester lines
count.
</p></td></tr>
$table_tr
<td>Wild</td>
<td align=middle>\(1)</td>
<td>Wildcard word match.</td>
<td>The unique parameter must be composed of the words "*", "**" and/or "**n",
where n is a positive integer. The first matches any single word, the second
matches any number of words (including none), and the third matches from 0 to n
arbitrary words. For example, <br/>
<span class="tt">[Wild:* * **3]</span> matches from 2 to 5 words (inclusive).
</td></tr>
$table_end
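<p>
For finite parameters, several of these rules can be read as descriptions of
sets of accepted word sequences. The Python sketch below lists those sets for
Perm, Pick (with its +n and -n extensions), Opick, Aperm and Ins, when every
parameter is a single plain string. It is meant to clarify the definitions in
the table, not to reproduce how symtext matches; words are joined by single
spaces, so a separator such as the comma appears as a separate word.
</p>

```python
from itertools import combinations, permutations

def perm(items: list[str]) -> set[str]:
    """[Perm:...]: all the items, in any order."""
    return {" ".join(p) for p in permutations(items)}

def pick(count: str, items: list[str]) -> set[str]:
    """[Pick:n,...], [Pick:+n,...] (at least n) and [Pick:-n,...] (at most n)."""
    if count.startswith("+"):
        sizes = range(int(count[1:]), len(items) + 1)
    elif count.startswith("-"):
        sizes = range(0, int(count[1:]) + 1)  # includes the empty subset
    else:
        sizes = range(int(count), int(count) + 1)
    accepted: set[str] = set()
    for k in sizes:
        for subset in combinations(items, k):
            accepted |= perm(list(subset))
    return accepted

def opick(count: int, items: list[str]) -> set[str]:
    """[Opick:n,...]: like Pick, but keeping the order of the parameters."""
    return {" ".join(c) for c in combinations(items, count)}

def aperm(sep: str, last: str, items: list[str]) -> set[str]:
    """[Aperm:sep,last,...]: items in any order, 'last' before the final item."""
    accepted = set()
    for p in permutations(items):
        words = [p[0]]
        for k in range(1, len(p)):
            words += [last if k == len(p) - 1 else sep, p[k]]
        accepted.add(" ".join(words))
    return accepted

def ins(first: str, rest: list[str]) -> set[str]:
    """[Ins:first,...]: 'first' inserted into any gap between the other parameters."""
    accepted = set()
    for gap in range(1, len(rest)):
        words = rest[:gap] + [first] + rest[gap:]
        accepted.add(" ".join(w for w in words if w))  # empty parameters vanish
    return accepted
```

<p>
For example, <span class="tt">ins("A", ["", "B", "C", "D", ""])</span> reproduces the four
samples given above for <span class="tt">[Ins:A,,B,C,D,]</span>.
</p>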

<hr /><h3>Construction of styles</h3>
<p>
A style corresponds to a directory and its contents. Under WIMS, a style
can either be shared among all modules, in the public_html/scripts/symtext
directory, or be specific to one module, in the module's directory.
</p><p>

The style must contain an index file named <span class="tt">def</span>. It defines the
basic configuration choices of the style. Every line of the file is a
definition in the format <span class="tt">name=value</span>.
</p><p>

The <span class="tt">def</span> file must contain the definition <span class="tt">style_exists=yes</span>,
otherwise the existence of the style will not be recognized. Everything else is
optional.
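</p><p>
Reading such a file amounts to collecting name=value pairs. In the Python
sketch below, the sample file content is invented for illustration (including
the dictionary name wordtype and the space-separated option words), and
details such as comments or continuation lines, which this page does not
describe, are ignored.
</p>

```python
SAMPLE_DEF = """style_exists=yes
option=nocase deaccent
dictionaries=wordtype
wordtype_unknown=leave
"""

def read_def(text: str) -> dict[str, str]:
    """Collect the name=value definitions of a style's def file."""
    definitions: dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # lines without '=' are not definitions
            definitions[name.strip()] = value.strip()
    return definitions

def style_exists(definitions: dict[str, str]) -> bool:
    """A style is only recognized when it declares style_exists=yes."""
    return definitions.get("style_exists") == "yes"

defs = read_def(SAMPLE_DEF)
```

<p>
A file lacking the style_exists=yes line would make the style invisible, even if
all its other definitions were correct.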
</p><p>

It may contain a definition of <em>option</em>, listing option words that
are always activated for the style.
</p><p>

It can also define general dictionaries using the name
<em>dictionaries</em>. The value must be a list of words, each corresponding
to a dictionary file in the style. The number of general dictionaries is
limited.

</p><p>

For each general dictionary, a variable NAME_unknown can be defined (where
NAME should be replaced by the dictionary name), which tells how a word
should be treated if it is not found in the dictionary (unknown). The value
may be <span class="tt">delete</span> (default), which means the unknown word is
replaced by an empty string; <span class="tt">leave</span>, which returns the unknown
word unchanged; or anything else, in which case the value is used
to replace the unknown word.
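</p><p>
The three possible values of NAME_unknown can be summarized by this Python
sketch of a dictionary lookup; the function name and the plain dict used as a
dictionary are illustrative assumptions, not part of symtext.
</p>

```python
def look_up(word: str, dictionary: dict[str, str], unknown: str = "delete") -> str:
    """Translate a word through a general dictionary, applying NAME_unknown."""
    if word in dictionary:
        return dictionary[word]
    if unknown == "delete":  # the default: the unknown word disappears
        return ""
    if unknown == "leave":   # the unknown word is returned unchanged
        return word
    return unknown           # any other value replaces the unknown word
```

<p>
With unknown set to "?", for instance, every word missing from the dictionary
would be replaced by a question mark.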

</p><p>

There may also be three dictionary files with reserved names:
<span class="tt">suffix</span>, <span class="tt">trans</span> and <span class="tt">macros</span>. All dictionaries are
line dictionaries, with each line in the format <span class="tt">name:definition</span>. Names
must be sorted (using the special program <span class="tt">dicsort</span> in the WIMS
package). All of the dictionaries are optional.

</p><p>

Both name and definition may contain space characters. However, except for macro
definitions, no transformation is applied after the dictionary is read, so only
single space characters should be used. The name field should start and end
with non-space characters. Multiple definitions with the same name give
unpredictable results.

</p><p>

The <em>suffix</em> dictionary is a very special one, which is used to
transform word suffixes before any other transformation. It is easy to
understand, except that in the name field the suffixes are written in
reverse order.
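</p><p>
Storing suffixes reversed means that the name of an entry is a prefix of the
reversed word. The Python sketch below uses two invented entries ("ies" becomes
"y", "s" is dropped) and assumes that the longest matching suffix wins; the
actual selection rule of symtext and the sorted-file lookup enabled by dicsort
are not reproduced here.
</p>

```python
# Hypothetical suffix dictionary: names are reversed suffixes.
SUFFIX = {
    "sei": "y",  # "...ies" -> "...y"  (ponies -> pony)
    "s": "",     # "...s"   -> "..."   (cats -> cat)
}

def apply_suffix(word: str, suffix_dic: dict[str, str]) -> str:
    """Rewrite the longest suffix of word found (reversed) in suffix_dic."""
    reversed_word = word[::-1]
    for length in range(len(word), 0, -1):  # try the longest suffix first
        key = reversed_word[:length]
        if key in suffix_dic:
            return word[: len(word) - length] + suffix_dic[key]
    return word
```

<p>
Such a broad rule also shortens words like "bus", so real suffix dictionaries
need more careful entries.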

</p><p>

The <em>trans</em> dictionary is used for word replacements after suffix
transformation. Both dictionaries are consulted before any string
comparison takes place. For example, to identify nouns in their
singular and plural forms, we can first use the <em>suffix</em> dictionary
to transform plural nouns into their singular form if they obey a general suffix
rule; then the <em>trans</em> dictionary can be used to transform nouns
with irregular plural forms.

</p><p>

Both the <em>suffix</em> and <em>trans</em> dictionaries must be constructed
to be <em>order 1 stable</em>, that is, if an already transformed string is
resubmitted to the dictionary, no further transformation takes place.
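</p><p>
For a trans dictionary, order 1 stability can be checked mechanically: no
definition may itself be a name whose definition differs from it. The Python
sketch below performs this check on a plain dict; the entries are invented
examples, and a suffix dictionary would need a check on the suffix rewriting
instead.
</p>

```python
def trans(word: str, dic: dict[str, str]) -> str:
    """Replace a word by its definition, or keep it if it is not a name."""
    return dic.get(word, word)

def unstable_entries(dic: dict[str, str]) -> list[str]:
    """Names whose result would be transformed again if resubmitted."""
    return [name for name, value in dic.items() if trans(value, dic) != value]

stable = {"mice": "mouse", "children": "child"}
unstable = {"geese": "goose", "goose": "bird"}
```

<p>
In the second dictionary, "geese" becomes "goose" and then "bird" if resubmitted,
so that entry breaks the requirement.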

</p><p>

The <em>macros</em> dictionary contains definitions of both positional
macros and macro rules. The former have names starting with the underscore
character, while the latter start with lower-case letters.

</p><p>

The definition of a macro is a tstring that replaces the
macro. Macros can be nested, that is, the definition of a macro may contain
calls to other macros, in any order. However, infinite nesting loops
result in an error.

</p><p>

In order to preserve consistency for positional macros, the definition of
any macro must be composed of exactly one atom.

</p><p>

Macro definitions may contain parameters. For this purpose, the character
<span class="tt">@</span> has a special meaning in a macro definition: it
must be followed by an integer, and the character together with the following
integer is replaced by a macro parameter during macro expansion.

</p><p>

For a rule macro, <span class="tt">@1</span> means the first parameter, <span class="tt">@2</span> means
the second parameter, etc. It is an error if the macro is invoked in a
tstring without enough parameters.

</p><p>

For a positional macro, <span class="tt">@1</span> designates the first atom following the
macro, while <span class="tt">@-1</span> designates the first atom preceding the macro, etc.
It is also an error if a positional macro is inserted in a tstring without
as many atoms before or after it as its definition requires.

</p><p>

Note that a macro parameter may result in several
atoms after expansion. This is not a problem unless the macro definition
contains a positional macro. Where it is necessary to ensure the
position of parameters, one can enclose the parameter in a pair of brackets,
such as <span class="tt">[@1]</span>.
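</p><p>
Parameter substitution for a rule macro can be sketched in Python as follows.
The macro and its definition are invented for this example, which echoes the
"older/younger" sentence pair quoted at the top of this page; positional macros
and nested macro calls are not handled.
</p>

```python
import re

# Hypothetical rule macro from a macros dictionary, invoked as [older:Paul,Bill].
OLDER = "[@1 is older than @2|@2 is younger than @1]"

def expand_rule_macro(definition: str, params: list[str]) -> str:
    """Replace @1, @2, ... in a rule macro definition by the given parameters."""

    def substitute(m: re.Match) -> str:
        index = int(m.group(1))
        if index < 1 or index > len(params):
            raise ValueError(f"macro parameter @{index} is missing")
        return params[index - 1]

    return re.sub(r"@(-?[0-9]+)", substitute, definition)
```

<p>
Passing a multi-word parameter as "[my brother]" rather than "my brother" keeps it
a single atom after substitution, which is the precaution described above.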

</p><p>

A general dictionary has the same syntax as the reserved ones. In this case,
the definition field can be a comma-separated list of items. These
dictionaries are used via the <span class="tt">Dic</span> builtin rule, which gives a match
if one of the items in the definition is equal to the value given in the
parameter of the rule.
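</p><p>
The Dic rule can be sketched in Python as below. The dictionary contents are
invented, and so is the assumption, suggested by the example
<span class="tt">[Dic:wordtype transitive verb]</span> in the table of builtin rules, that
the first word of the parameter is the dictionary name and the rest is the item
to look for.
</p>

```python
# Hypothetical general dictionary "wordtype": each definition is a comma-separated list.
WORDTYPE = {
    "eat": "transitive verb,intransitive verb",
    "sleep": "intransitive verb",
    "apple": "noun",
}

def dic_rule(parameter: str, dictionaries: dict[str, dict[str, str]], words: str) -> bool:
    """Match words against [Dic:parameter]."""
    dic_name, _, item = parameter.partition(" ")
    definition = dictionaries[dic_name].get(words)  # a KeyError here stands for bad_dictionary
    if definition is None:
        return False
    return item in [part.strip() for part in definition.split(",")]
```

<p>
A word absent from the dictionary simply gives no match in this sketch.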

</p>

<hr /><h3>The command line program</h3>

The command line program <em>symtext</em> is specially built for WIMS, so that
all the input data are passed through environment variables. It can also be
used as a standalone program, but in this case it is better to use a wrapper
script that gives the input and output a more Unix-like form.

$table_header
<caption>List of environment parameters</caption>
$table_hdtr<th>Name</th>
<th>Value</th>
<th>Comments</th>
</tr>
$table_tr
<td>wims_exec_parm</td>
<td>The main data input</td>
<td>A multi-line string.
<ul>
<li>Line 1: command followed by options. Valid commands:
 <ul>
  <li><span class="tt">match</span>: check matching.
  </li>
  <li><span class="tt">debug</span>: check matching with debug information.
  </li>
  </ul>
  </li>
<li>Line 2: the sample.</li>
<li>Line 3 and up: each line is a tester.</li>
</ul>
<p>
Lines can also be delimited by semicolons. For this reason, semicolons
must be protected by parentheses in both the sample and the tester.

</p><p>
Options have the same syntax as in the style option definition, with one
additional possible definition here: <span class="tt">style=[the_name_of_style]</span>.
</p></td></tr>

$table_tr
<td>module_dir</td>
<td>Directory of the current module</td>
<td>Automatically defined if called by WIMS. If this variable is undefined,
then w_symtext must give the complete path of the style.
</td></tr>

$table_tr
<td>w_module_language</td>
<td>Language</td>
<td>Only used when called by WIMS. Can be overridden by the "language=" option.
</td></tr>

$table_end
<p>

Options have two origins: either the environment variable
<em>w_symtext_option</em> or the <em>def</em> file of the style. The two
have the same syntax.

</p>
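<p>
When symtext is used outside WIMS, a wrapper mainly has to fill these
environment variables. The Python sketch below builds wims_exec_parm with
semicolons as line separators and runs the program with subprocess. The path to
the binary is up to the caller, and both the space-separated option syntax and
writing a protected semicolon as (;) are assumptions to check against your
installation.
</p>

```python
import os
import subprocess

def build_exec_parm(command: str, options: list[str], sample: str, testers: list[str]) -> str:
    """Line 1: command and options; line 2: the sample; then one tester per line."""

    def protect(text: str) -> str:
        return text.replace(";", "(;)")  # assumed way of protecting a semicolon

    lines = [" ".join([command, *options]), protect(sample)]
    lines += [protect(tester) for tester in testers]
    return ";".join(lines)  # the semicolon is accepted as a line delimiter

def run_symtext(binary: str, exec_parm: str, module_dir: str = "") -> str:
    """Run the program; an empty output means that no match was found."""
    env = dict(os.environ, wims_exec_parm=exec_parm)
    if module_dir:
        env["module_dir"] = module_dir
    result = subprocess.run([binary], env=env, capture_output=True, text=True, check=False)
    return result.stdout.strip()
```

<p>
Real line breaks would work as well; semicolons are used here only because this
page documents them as an alternative delimiter.
</p>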

!set option_data=!trim \
alnumly,word,Transform everything non-alphabetic and non-digit into spaces.\
alphaonly,word,Transform everything non-alphabetic into spaces.\
deaccent,word,Remove accents from letters before comparison.\
debug,word,Output debug information to stderr.\
language,value,A two-letter language code.\
matchall,word,Match every line of the tester&#44; instead of stopping after the first match.\
nocase,word,Fold both texts to lower case before comparison.\
nocs,word,Replace computer-oriented characters by spaces (<span class="tt">_&$#\\@~</span>)\
nomath,word,Replace mathematical operators by spaces (<span class="tt">+-*/=|%&lt;&gt;()_</span>)\
noparentheses,word,Replace parentheses by spaces (<span class="tt">()[]{}</span>)\
nopunct,word,Replace punctuation characters by spaces (<span class="tt">.&#44;;:?!"</span>)&#44; except the dot used as a decimal point.\
noquote,word,Replace quoting characters by spaces (<span class="tt">`'"</span>)\
reaccent,word,Allow composition of accented letters using special characters.\
style,value,The style&#44; only valid in the environment parameter.\


$table_header
<caption>List of options</caption>
$table_hdtr
<th>Name</th>
<th>Nature</th>
<th>Meaning</th>
</tr>

!set n=!linecnt $option_data
!for i=1 to $n
 !set l=!line $i of $option_data
 !distribute item $l into name,nature,meaning
 $table_tr
 <td>$name</td>
 <td>$nature</td>
 <td>$meaning</td></tr>
!next i

$table_end
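<p>
The effect of a few of these options on a text can be approximated in Python as
below. This is not the symtext code: the order in which the options are applied
is our assumption, accent removal relies on Unicode decomposition, and only
nocase, deaccent, alnumly, alphaonly and noparentheses are covered.
</p>

```python
import unicodedata

def preprocess(text: str, options: set[str]) -> str:
    """Approximate a few symtext options (application order assumed)."""
    if "nocase" in options:
        text = text.lower()
    if "deaccent" in options:
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if "alnumly" in options:
        text = "".join(ch if ch.isalnum() else " " for ch in text)
    if "alphaonly" in options:
        text = "".join(ch if ch.isalpha() else " " for ch in text)
    if "noparentheses" in options:
        text = "".join(" " if ch in "()[]{}" else ch for ch in text)
    return " ".join(text.split())  # consecutive spaces count as one
```

<p>
For example, with nocase, deaccent and noparentheses, "Élève  (Paris)" becomes
"eleve paris".
</p>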

<p>

<b>Program output</b>. The output is empty if no match is found.
</p>

!set error_data=!nonempty lines \
bad_command,Invalid command in the input.\
bad_dictionary,Non-existing dictionary specified.\
bad_macro,Bad macro name.\
bad_macro_position,Positional macro placed in the tester where pre- or post-parameters cannot be found.\
bad_pickcnt,Invalid first parameter for Pick.\
block_overflow,Too many rules and parameters defined in the tester (before or after macro expansion).\
duplication_in_dictionary,A name is defined twice in the indicated dictionary (in the style).\
file_too_long,File size exceeded limit.\
level_overflow,Too much nesting; probably an internal bug.\
list_overflow,A rule contains too many parameters.\
macro_level_overflow,Too many levels of recursive macro expansion; usually an infinite loop in the macro dictionary.\
name_too_long,Macro or variable name exceeded length limit.\
string_too_long,String length limit exceeded.\
style_not_found,Non-existent style specified.\
syntax_error,Syntax error in a macro or rule.\
tag_overflow,Tester expansion is too complicated.\
too_many_dictionaries,The number of dictionaries declared in the style has exceeded the limit.\
unknown_cmd,Unknown matching rule name.\
unmatched_parentheses,Unmatched parentheses or brackets.\
unsorted_dictionary,The indicated dictionary (in the style) is in bad order.\
wrong_parmcnt,A matching rule has a number of parameters that does not meet its definition.\


$table_header
<caption>Error messages</caption>
$table_hdtr
<th>Message</th>
<th>Meaning</th>
</tr>

!set n=!linecnt $error_data
!for i=1 to $n
 !set l=!line $i of $error_data
 !distribute items $l into msg,mean
 $table_tr
 <td class="tt">$msg</td>
 <td>$mean</td>
 </tr>
!next i
$table_end