Subversion Repositories wimsdev

Rev 6876 Rev 6879
Line 18... Line 18...
18
 
18
 
19
(the scripts must be run in the order given here, as some files
19
(the scripts must be run in the order given here, as some files
20
created on earlier stages are used in subsequent stages). In general
20
created on earlier stages are used in subsequent stages). In general
21
the whole process is run by the script ~/bin/mkindex.
21
the whole process is run by the script ~/bin/mkindex.
22
 
22
 
23
* Firstly a series of 3 perl scripts (mkdomain,mkwgrp,modindclass), 
23
* Firstly a series of 3 perl scripts (mkdomain, mkwgrp, modindclass), 
24
that ~/bin/mkindex.sh calls via ~/public_html/bases/sys/mkindex.sh : 
24
that ~/bin/mkindex.sh calls via ~/public_html/bases/sys/mkindex.sh : 
25
 
25
 
26
- the program ~/public_html/bases/sys/mkdomain.pl creates the lists
26
- the program ~/public_html/bases/sys/mkdomain.pl creates the lists
27
  of domains from the graph in domain/domain with its translations
27
  of domains from the graph in domain/domain with its translations
28
  (domain/domain.$lang) and in json format (english) to be used for
28
  (domain/domain.$lang) and in json format (english) to be used for
29
  completion in modtool properties
29
  completion in modtool properties; it also creates domain/domaindic.xx
-
 
30
  to be used as a dictionary in modind and in the search engine
30
 
31
 
31
- the perl program ~/public_html/bases/sys/mkwgrp.pl reads the INDEX
32
- the perl program ~/public_html/bases/sys/mkwgrp.pl reads the INDEX
32
  files of all the modules on the site and generates 
33
  files of all the modules on the site and generates 
33
 
34
 
34
  - keywords (in format .json) to be used for completion in the search
35
  - keywords (in format .json) to be used for completion in the search
Line 42... Line 43...
42
  Some files are created in keywords as keywords/algebra.fr.tmp, but
43
  Some files are created in keywords as keywords/algebra.fr.tmp, but
43
  not used for the moment. The keywords in these "keywords file" are
44
  not used for the moment. The keywords in these "keywords file" are
44
  exactly those in the variable keywords (or keywords_$lang if it
45
  exactly those in the variable keywords (or keywords_$lang if it
45
  exists), doing it with the following rules: taking keywords_$lang if
46
  exists), doing it with the following rules: taking keywords_$lang if
46
  it exists, or keywords (whether or not it is a $lang-module).
47
  it exists, or keywords (whether or not it is a $lang-module).
-
 
48
  It also adds the language version of the domains (see domain/domain.xx).
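
The keyword selection rule above can be sketched as follows. This is an illustration only, not the mkwgrp.pl code: the function name select_keywords and the dictionary representation of the INDEX variables are hypothetical, and an empty keywords_$lang is assumed to count as "does not exist".

```python
def select_keywords(index_vars, lang):
    """Return keywords_<lang> when it is defined and non-empty, else keywords.

    index_vars is a hypothetical dict holding the variables of one INDEX file.
    """
    localized = index_vars.get(f"keywords_{lang}", "").strip()
    if localized:
        return localized
    # Fallback: the generic variable, whatever the language of the module.
    return index_vars.get("keywords", "").strip()


if __name__ == "__main__":
    index_vars = {
        "keywords": "vector, linear combination",
        "keywords_fr": "vecteur, combinaison lineaire",
    }
    print(select_keywords(index_vars, "fr"))  # localized variable is used
    print(select_keywords(index_vars, "it"))  # falls back to keywords
```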
47
 
49
 
48
- the program ~/public_html/bases/sys/modindclass.pl creates the lists
50
- the program ~/public_html/bases/sys/modindclass.pl creates the lists
49
  of keywords coming from the example classes in
51
  of keywords coming from the example classes in
50
  ~/public_html/bases/class as well as the files author,
52
  ~/public_html/bases/class as well as the files author,
51
  description, language, level, title (no ranking is done).
53
  description, language, level, title (no ranking is done).
-
 
54
 
-
 
55
Be careful: to be used as a dictionary, a file must be sorted with the command
-
 
56
  bin/dicsort (for example domaindic).
52
 
57
 
53
* Secondly the binary program "modind" (compiled from ~/src/Misc/modind.c) reads 
58
* Secondly the binary program "modind" (compiled from ~/src/Misc/modind.c) reads 
54
 
59
 
55
  -- the INDEX files of all the modules on the site 
60
  -- the INDEX files of all the modules on the site 
56
  -- the auxiliary files in ~/public_html/bases/sys/ (see description
61
  -- the auxiliary files in ~/public_html/bases/sys/ (see description
57
     below)
62
     below)
Line 66... Line 71...
66
  ranking of the site's modules. The modules are classified according
71
  ranking of the site's modules. The modules are classified according
67
  to their types: A=all (except sheet and classes), D=document, O=OEF,
72
  to their types: A=all (except sheet and classes), D=document, O=OEF,
68
  X=exercise, T= tool, R=recreation, M= data module.
73
  X=exercise, T= tool, R=recreation, M= data module.
69
 
74
 
70
  To do that, "modind" uses some dictionaries in
75
  To do that, "modind" uses some dictionaries in
71
  ~/public_html/bases/sys/ (as suffix.$search_lang, wgrp, ...)
76
  ~/public_html/bases/sys/ (as suffix.xx, wgrp, domaindic.xx ...)
72
 
77
 
73
  -- separately "modind" reads also the files in
78
  -- separately "modind" reads also the files in
74
  ~/public_html/bases/sys/sheet and does the same kind of work
79
  ~/public_html/bases/sys/sheet and does the same kind of work.
75
 
80
 
76
 
81
 
77
2) use of index files       
82
2) use of index files       
78
===========================
83
===========================
79
The script ~/public_html/modules/home/search.proc (called by the
84
The script ~/public_html/modules/home/search.proc (called by the
Line 98... Line 103...
98
The files in this directory ~/public_html/bases/sys/ are automatically
103
The files in this directory ~/public_html/bases/sys/ are automatically
99
generated (on install) by the corresponding ".src" file in the "src"
104
generated (on install) by the corresponding ".src" file in the "src"
100
subdirectory, if it exists.
105
subdirectory, if it exists.
101
 
106
 
102
If any of the files described below is omitted, then the corresponding
107
If any of the files described below is omitted, then the corresponding
103
feature in the corresponding language is disabled. E.g. the files
108
feature in the corresponding language is disabled.
104
words.fr/words.fr.src and suffix.fr/suffix.fr.src will be/have been
-
 
105
deleted in order to make the search engine correctly working.
-
 
106
 
109
 
107
  In version < 4.05c, if there is no file words.$lang, the file
110
  In version < 4.05c, if there is no file words.$lang, the file
108
  suffix.$lang was not used (correction in Misc/translator.c to check
111
  suffix.$lang was not used (correction in Misc/translator.c to check
109
  in other situations). 
112
  in other situations). 
110
  The group words were badly treated when the
113
  The group words were badly treated when the words were already in 
111
  words were already in the title, properties, etc. because of
114
  the title, properties, etc. because of
112
  some option unknown_type=unk_delete in modind.c but it has other consequences
115
  some option unknown_type=unk_delete in modind.c but it has other consequences
113
  so it is not the situation.
116
  so it is not the situation.
114
  I think that I will put again the suffix.fr again (but one must now really
-
 
115
  check it : do we want that capital and capitale are the same, which is
-
 
116
  the case for the moment).
-
 
117
 
117
 
118
, will be done by the script in the stable release if we are OK)
118
, will be done by the script in the stable release if we are OK)
119
 
119
 
120
Syntax: the lines for most of these files are in the form
120
Syntax: the lines for most of these files are in the form
121
 
121
 
Line 126... Line 126...
126
=============================================================
126
=============================================================
127
 
127
 
128
Files
128
Files
129
=====
129
=====
130
 
130
 
131
words.$search_lang : correct misprints in the search words
131
words.xx : correct misprints in the search words
132
(used both by "mkindex" and "search.proc"). 
132
(used both by "mkindex" and "search.proc"). 
133
 
133
 
134
E.g. if the file words.en contains the line
134
E.g. if the file words.en contains the line
135
 
135
 
136
==
136
==
Line 149... Line 149...
149
Note: the file words.en is used by the module tool/wcalc.en (see
149
Note: the file words.en is used by the module tool/wcalc.en (see
150
~/public_html/modules/tool/wcalc.en/dic )
150
~/public_html/modules/tool/wcalc.en/dic )
151
 
151
 
152
=====================
152
=====================
153
 
153
 
154
suffix.$search_lang : process common suffixes in the search words
154
suffix.xx : process common suffixes in the search words
155
(used both by "mkindex" and "search.proc"). 
155
(used both by "mkindex" and "search.proc"). 
156
 
156
 
157
E.g. if the file suffix.en contains the line
157
E.g. if the file suffix.en contains the line
158
 
158
 
159
==
159
==
160
ertem:meter
160
ertem:meter
161
==
161
==
162
 
162
 
163
then any word ending in "metre" ("ertem" the other way round) is
163
then any word ending in "metre" ("ertem" the other way round) is
164
substituted by the corresponding one ending in "meter" (kilometre -->
164
substituted by the corresponding one ending in "meter" (kilometre -->
165
kilometer)
165
kilometer)
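
The suffix rule above can be illustrated with a minimal sketch. It is not the code of Misc/translator.c: the function names are hypothetical, and it assumes one "reversed_suffix:replacement" rule per line, with the first matching rule winning.

```python
def parse_suffix_rules(lines):
    """Turn lines such as 'ertem:meter' into (suffix, replacement) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if ":" not in line:
            continue
        reversed_suffix, replacement = line.split(":", 1)
        if not reversed_suffix:
            continue
        # The left-hand side is stored the other way round: 'ertem' is 'metre'.
        rules.append((reversed_suffix[::-1], replacement))
    return rules


def apply_suffix_rules(word, rules):
    """Replace the ending of word according to the first matching rule."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


if __name__ == "__main__":
    rules = parse_suffix_rules(["ertem:meter"])
    print(apply_suffix_rules("kilometre", rules))
```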
166
 
166
 
167
Note: suffix.fr was deleted because it caused the search engine/the
167
Note: suffix.fr was deleted because it caused the search engine/the
168
keyword completion not to work properly. The site manager can
168
keyword completion not to work properly. The site manager can
169
reactivate the functionality by adding the file again.
169
reactivate the functionality by adding the file again.
170
 
170
 
171
=====================
171
=====================
172
 
172
 
173
wgrp/wgrp.$search_lang : groups of words
173
wgrp/wgrp.xx : groups of words
174
(these files are automatically generated, and used by "mkindex")
174
(these files are automatically generated, and used by "mkindex")
175
 
175
 
176
E.g. if the file wgrp/wgrp.en contains the line
176
E.g. if the file wgrp/wgrp.en contains the line
177
 
177
 
178
==
178
==
Line 204... Line 204...
204
 
204
 
205
(in the corresponding language file)
205
(in the corresponding language file)
206
 
206
 
207
NOTE: problems occur when a string contains the apostrophe "'"
207
NOTE: problems occur when a string contains the apostrophe "'"
208
(e.g. "algorithme d'euclide")
208
(e.g. "algorithme d'euclide")
209
 
209
 
-
 
210
=====================
-
 
211
 
-
 
212
domaindic.xx
-
 
213
 
-
 
214
uses the files domain/domain.xx to replace the localized ("language") domain names with
-
 
215
  their English (technical) form.
-
 
216
 
210
=====================
217
=====================
211
 
218
 
212
indignore.$search_lang : ignored words
219
indignore.xx : ignored words
213
(used by "mkindex")
220
(used by "mkindex")
214
 
221
 
215
All the words listed in the file are ignored by the search engine. 
222
All the words listed in the file are ignored by the search engine. 
216
 
223
 
217
=====================
224
=====================
218
 
225
 
219
abuse.$search_lang : swearwords to be ignored by the search engine
226
abuse.xx : swearwords to be ignored by the search engine
220
(used by ??)
227
(used by ??)
221
 
228
 
222
=====================
229
=====================
223
 
230
 
224
andor.$search_lang : conjunctions ("and", "or") to be ignored by the 
231
andor.xx : conjunctions ("and", "or") to be ignored by the 
225
search engine
232
search engine
226
 
233
 
227
The file andor.xx is mentioned in src/insmath.c (processing logic
234
The file andor.xx is mentioned in src/insmath.c (processing logic
228
statements in math formulas) but this is for the moment used by no
235
statements in math formulas) but this is for the moment used by no
229
modules (to be used, one must have insmath_logic=yes which do not
236
modules (to be used, one must have insmath_logic=yes which do not
Line 280... Line 287...
280
module (1003) but contain only reference to the corresponding
287
module (1003) but contain only reference to the corresponding
281
translated module (1002 resp 2004). --> HELP there is no A.cn file!!
288
translated module (1002 resp 2004). --> HELP there is no A.cn file!!
282
 
289
 
283
The file A.en contains the following lines related to this module.
290
The file A.en contains the following lines related to this module.
284
 
291
 
-
 
292
?2 or ?4 is the ranking
285
?? (...?2 is the ranking, why do we sometimes have ....?4 )
293
It is a weight -- see name of variable in modind.c -- 
286
(ER : It is a weight -- see name of variable in modind.c -- giving more importance to the title words : 4 if the word appears in the module title, 2 otherwise)
294
giving more importance to the title words : 4 if the word appears 
-
 
295
in the module title, 2 otherwise
287
 
296
 
288
2d:1003?2                           from description and description_it
297
2d:1003?2                           from description and description_it
289
algebra:1003?2			    from domain
298
algebra:1003?2			    from domain
290
bersaglio:1003?2		    from keywords_it
299
bersaglio:1003?2		    from keywords_it
291
click:1003?2			    from description
300
click:1003?2			    from description
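
Entries such as 2d:1003?2 can be read with a short sketch like the following. It assumes that a line holds exactly one entry of the form word:serial?weight; the type IndexEntry and the function parse_index_entry are hypothetical names, and the "from ..." annotations shown above are comments of this README, not part of the file.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    word: str     # indexed word, possibly containing spaces
    serial: int   # serial number of the module
    weight: int   # 4 for a title word, 2 otherwise


def parse_index_entry(line):
    """Parse one 'word:serial?weight' line into an IndexEntry."""
    # Split on the last ':' so that the word itself is kept whole.
    word, _, rest = line.strip().rpartition(":")
    serial, _, weight = rest.partition("?")
    return IndexEntry(word.strip(), int(serial), int(weight))


if __name__ == "__main__":
    print(parse_index_entry("2d:1003?2"))
    print(parse_index_entry("algebra:1003?2"))
```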
Line 350... Line 359...
350
2d:					
359
2d:					
351
algebraisch:			directive "algebra:algebraisch" in words.nl
360
algebraisch:			directive "algebra:algebraisch" in words.nl
352
bersaglio:			
361
bersaglio:			
353
clicking:			directive "click:clicking" in words.nl
362
clicking:			directive "click:clicking" in words.nl
354
combinaison:			"combination:combinaison" in words.nl
363
combinaison:			"combination:combinaison" in words.nl
355
combinazione:
364
combinazione:
356
combinazione lineare:
365
combinazione lineare:
357
gang:
366
gang:
358
levelh4:
367
levelh4:
359
levelh5:
368
levelh5:
360
levelh6:
369
levelh6:
361
levelu1:
370
levelu1:
362
levelu2:
371
levelu2:
363
lineare:
372
lineare:
364
linearly:			"linear:linearly" in words.nl
373
linearly:			"linear:linearly" in words.nl
365
niet: 				"on:niet" in words.nl
374
niet: 				"on:niet" in words.nl
366
ofwel:				"of:ofwel"
375
ofwel:				"of:ofwel"
367
shooting:			"shoot:shooting"
376
shooting:			"shoot:shooting"
368
vector:
377
vector:
Line 447... Line 456...
447
 
456
 
448
- write author,description,language,etc. information in each corresponding file
457
- write author,description,language,etc. information in each corresponding file
449
  bases/site2/author|description|language|...
458
  bases/site2/author|description|language|...
450
 
459
 
451
- normalizes data (suppress uppercase, accents, apostrophe, plural) 
460
- normalizes data (suppress uppercase, accents, apostrophe, plural) 
452
  according to dictionary, to get normalized author,description, title, etc.
461
  according to dictionary domaindic, then maindic with suffix, to get normalized 
-
 
462
  author, description, title, etc.
453
  This is done in the loop for(i=0;i<trcnt;i++){...}
463
  This is done in the loop for(i=0;i<trcnt;i++){...}
454
 
464
 
455
- transforms the (normalized) title into words (change commas to spaces) 
465
- transforms the (normalized) title into words (change commas to spaces) 
456
  and for each word, appends it with weight 4 using function appenditem.
466
  and for each word, appends it with weight 4 using function appenditem.
457
  the variables are the word itself, the current language treated, the serial number of module,
467
  the variables are the word itself, the current language treated, the serial number of module,
458
  the weight=4, and the module language. 
468
  the weight=4, and the module language. 
459
 
469
 
460
- puts all the information other than the title (description, keywords, foreign titles, author...) 
470
- puts all the information other than the title (description, keywords, foreign titles, author...) 
461
  in a buffer, transforms it into words and appends this as above except that weight=2.
471
  in a buffer, transforms it into words and appends this as above except that weight=2.
462
 
-
 
463
  BUG ? : in this process, i_keywords_fr is used twice, probably the first one should be i_keywords_en, to be checked.
-
 
464
 
472
 
465
- the 2 preceding points (treatment of title and other info) are repeated with the difference
473
- the 2 preceding points (treatment of title and other info) are repeated with the difference
466
  that the transformation into words is replaced by a translation : 
474
  that the transformation into words is replaced by a translation : 
467
  the commas are kept, but some usual words are deleted.
475
  the commas are kept, but some usual words are deleted.
468
  BUG ? : Another difference is that part of "other information than title" is missing, 
476
  BUG ? : Another difference is that part of "other information than title" is missing, 
469
          for instance the foreign titles, require, author.
477
          for instance the foreign titles, require, author.
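
The weighting described in the points above (title words with weight 4, all other information with weight 2) can be sketched as follows. This is a simplified illustration, not the modind.c code: the function names are hypothetical, the normalization only handles uppercase, accents and apostrophes (no dictionary, no plural handling), and the language arguments of appenditem are left out.

```python
import unicodedata

TITLE_WEIGHT = 4   # word taken from the module title
OTHER_WEIGHT = 2   # word taken from any other field


def normalize(text):
    """Lowercase the text, strip accents and turn apostrophes into spaces."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_accents.replace("'", " ")


def to_words(text):
    """Change commas to spaces and split the normalized text into words."""
    return normalize(text).replace(",", " ").split()


def index_module(serial, title, other_fields):
    """Return (word, serial, weight) triples for one module."""
    entries = [(word, serial, TITLE_WEIGHT) for word in to_words(title)]
    # Everything other than the title goes into one buffer.
    buffer = " ".join(other_fields)
    entries.extend((word, serial, OTHER_WEIGHT) for word in to_words(buffer))
    return entries


if __name__ == "__main__":
    for entry in index_module(1003, "Tir Vectoriel", ["algèbre, vecteur"]):
        print(entry)
```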
470
 
478
 
471
ER : I don't know why the process is repeated : should look at appenditem to see where it is appended, maybe the second time is somewhere else.
479
ER : I don't know why the process is repeated : should look at appenditem 
-
 
480
to see where it is appended, maybe the second time is somewhere else.
472
 
481
 
473
 
482
 
474
===============================
483
===============================