Rev 6879 | Rev 7690 | ||
---|---|---|---|
Line 1... | Line 1... | ||
1 | WIMS' search engine and als |
1 | WIMS' search engine and als |
2 | =========================== |
2 | =========================== |
3 | 3 | ||
4 | WIMS' search engine works in two stages: |
4 | WIMS' search engine works in two stages: |
5 | 5 | ||
6 | 1) update of index files when server data is changed (module added...), |
6 | 1) update of index files when server data is changed (module added...), |
7 | typically once a day. |
7 | typically once a day. |
8 | 2) use of index files at each user's request to find some activities |
8 | 2) use of index files at each user's request to find some activities |
9 | 9 | ||
10 | 10 | ||
11 | Here are some details : |
11 | Here are some details : |
12 | 12 | ||
13 | 1) update of index files |
13 | 1) update of index files |
14 | =========================== |
14 | =========================== |
15 | A series of scripts creates a set of auxiliary files (generally |
15 | A series of scripts creates a set of auxiliary files (generally |
16 | stored in ~/public_html/bases/sys/, see description further down) and |
16 | stored in ~/public_html/bases/sys/, see description further down) and |
17 | a list of "keywords" (stored in ~/public_html/bases/site/). |
17 | a list of "keywords" (stored in ~/public_html/bases/site/). |
18 | 18 | ||
19 | (the scripts must be run in the order given here, as some files |
19 | (the scripts must be run in the order given here, as some files |
20 | created on earlier stages are used in subsequent stages). In general |
20 | created on earlier stages are used in subsequent stages). In general |
21 | the whole process is run by the script ~/bin/mkindex. |
21 | the whole process is run by the script ~/bin/mkindex. |
22 | 22 | ||
23 | * Firstly a series of 3 perl scripts (mkdomain, mkwgrp, modindclass), |
23 | * Firstly a series of 3 perl scripts (mkdomain, mkwgrp, modindclass), |
24 | that ~/bin/mkindex |
24 | that ~/bin/mkindex calls via ~/public_html/bases/sys/mkindex.sh : |
25 | 25 | ||
26 | - the program ~/public_html/bases/sys/mkdomain.pl creates the lists |
26 | - the program ~/public_html/bases/sys/mkdomain.pl creates the lists |
27 | of domains from the graph in domain/domain with its translations |
27 | of domains from the graph in domain/domain with its translations |
28 | (domain/domain.$lang) and in JSON format (English) to be used for |
28 | (domain/domain.$lang) and in JSON format (English) to be used for |
29 | completion in modtool properties ; it also creates domain/domaindic.xx, |
29 | completion in modtool properties ; it also creates domain/domaindic.xx, |
30 | to be used as a dictionary in modind and in the search engine |
30 | to be used as a dictionary in modind and in the search engine |
31 | 31 | ||
32 | - the perl program ~/public_html/bases/sys/mkwgrp.pl reads the INDEX |
32 | - the perl program ~/public_html/bases/sys/mkwgrp.pl reads the INDEX |
33 | files of all the modules on the site and generates |
33 | files of all the modules on the site and generates |
34 | 34 | ||
35 | - keywords (in .json format, to be used for completion in the search |
35 | - keywords (in .json format, to be used for completion in the search |
36 | engine) |
36 | engine) |
37 | - the files in wgrp |
37 | - the files in wgrp |
38 | 38 | ||
Line 53... | Line 53... | ||
53 | description, language, level, title (no ranking is done). |
53 | description, language, level, title (no ranking is done). |
54 | 54 | ||
55 | Be careful : to be used as a dictionary, a file must be sorted with the command |
55 | Be careful : to be used as a dictionary, a file must be sorted with the command |
56 | bin/dicsort (for example domaindic). |
56 | bin/dicsort (for example domaindic). |
57 | 57 | ||
58 | * Secondly the binary program "modind" (compiled from ~/src/Misc/modind.c) reads |
58 | * Secondly the binary program "modind" (compiled from ~/src/Misc/modind.c) reads |
59 | 59 | ||
60 | -- the INDEX files of all the modules on the site |
60 | -- the INDEX files of all the modules on the site |
61 | -- the auxiliary files in ~/public_html/bases/sys/ (see description |
61 | -- the auxiliary files in ~/public_html/bases/sys/ (see description |
62 | below) |
62 | below) |
63 | 63 | ||
64 | and produces keywords lists stored in ~wims/public_html/bases/site : |
64 | and produces keywords lists stored in ~wims/public_html/bases/site : |
65 | they contain the words (or word groups) coming from the variable |
65 | they contain the words (or word groups) coming from the variable |
Line 77... | Line 77... | ||
77 | 77 | ||
78 | -- separately, "modind" also reads the files in |
78 | -- separately, "modind" also reads the files in |
79 | ~/public_html/bases/sys/sheet and does the same kind of work. |
79 | ~/public_html/bases/sys/sheet and does the same kind of work. |
80 | 80 | ||
81 | 81 | ||
82 | 2) use of index files |
82 | 2) use of index files |
83 | =========================== |
83 | =========================== |
84 | The script ~/public_html/modules/home/search.proc (called by the |
84 | The script ~/public_html/modules/home/search.proc (called by the |
85 | "Search" form) reads the lists above, does the actual search in these |
85 | "Search" form) reads the lists above, does the actual search in these |
86 | lists and displays the modules found. It also reads the files of |
86 | lists and displays the modules found. It also reads the files of |
87 | ~/public_html/bases/sys/class and ~/public_html/bases/sys/sheets |
87 | ~/public_html/bases/sys/class and ~/public_html/bases/sys/sheets |
Line 107... | Line 107... | ||
107 | If any of the files described below is omitted, then the corresponding |
107 | If any of the files described below is omitted, then the corresponding |
108 | feature in the corresponding language is disabled. |
108 | feature in the corresponding language is disabled. |
109 | 109 | ||
110 | In versions < 4.05c, if there was no file words.$lang, the file |
110 | In versions < 4.05c, if there was no file words.$lang, the file |
111 | suffix.$lang was not used (correction in Misc/translator.c to check |
111 | suffix.$lang was not used (correction in Misc/translator.c to check |
112 | in other situations). |
112 | in other situations). |
113 | The word groups were badly treated when the words were already in |
113 | The word groups were badly treated when the words were already in |
114 | the title, properties, etc. because of |
114 | the title, properties, etc. because of |
115 | some option unknown_type=unk_delete in modind.c but it has other consequences |
115 | some option unknown_type=unk_delete in modind.c but it has other consequences |
116 | so it is not the situation. |
116 | so it is not the situation. |
117 | 117 | ||
118 | , will be done by the script in the stable release if we are OK) |
118 | , will be done by the script in the stable release if we are OK) |
Line 127... | Line 127... | ||
127 | 127 | ||
128 | Files |
128 | Files |
129 | ===== |
129 | ===== |
130 | 130 | ||
131 | words.xx : corrects misprints in the search words |
131 | words.xx : corrects misprints in the search words |
132 | (used both by "mkindex" and "search.proc"). |
132 | (used both by "mkindex" and "search.proc"). |
133 | 133 | ||
134 | E.g. if the file words.en contains the line |
134 | E.g. if the file words.en contains the line |
135 | 135 | ||
136 | == |
136 | == |
137 | analytical:analytic |
137 | analytical:analytic |
138 | == |
138 | == |
Line 150... | Line 150... | ||
150 | ~/public_html/modules/tool/wcalc.en/dic ) |
150 | ~/public_html/modules/tool/wcalc.en/dic ) |
151 | 151 | ||
152 | ===================== |
152 | ===================== |
153 | 153 | ||
154 | suffix.xx : processes common suffixes in the search words |
154 | suffix.xx : processes common suffixes in the search words |
155 | (used both by "mkindex" and "search.proc"). |
155 | (used both by "mkindex" and "search.proc"). |
156 | 156 | ||
157 | E.g. if the file suffix.en contains the line |
157 | E.g. if the file suffix.en contains the line |
158 | 158 | ||
159 | == |
159 | == |
160 | ertem:meter |
160 | ertem:meter |
Line 186... | Line 186... | ||
186 | would return both the modules containing the word "affine" and the |
186 | would return both the modules containing the word "affine" and the |
187 | modules containing the word "geometry"). |
187 | modules containing the word "geometry"). |
188 | 188 | ||
189 | The "wgrp" files are now generated from the modules' keywords by the |
189 | The "wgrp" files are now generated from the modules' keywords by the |
190 | script ~/public_html/bases/sys/mkwgrp.pl : whenever a module contains |
190 | script ~/public_html/bases/sys/mkwgrp.pl : whenever a module contains |
191 | multi-word keywords, such keywords are added to the wgrp files. |
191 | multi-word keywords, such keywords are added to the wgrp files. |
192 | 192 | ||
193 | E.g. tool/algebra/smallgroup.fr/INDEX contains the line |
193 | E.g. tool/algebra/smallgroup.fr/INDEX contains the line |
194 | 194 | ||
195 | keywords=group, finite group, order, subgroup, conjugacy class, center, normal subgroup, subgroup lattice |
195 | keywords=group, finite group, order, subgroup, conjugacy class, center, normal subgroup, subgroup lattice |
196 | 196 | ||
197 | so for each of the groups of words between two commas the |
197 | so for each of the groups of words between two commas the |
198 | corresponding groups of words are created |
198 | corresponding groups of words are created |
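
The selection of multi-word keywords described above can be sketched as follows. This is only an illustration in Python, not the code of mkwgrp.pl: the function name multiword_keywords is hypothetical, and the format in which the groups are written to the wgrp files is not shown here.

```python
def multiword_keywords(index_line):
    """Return the multi-word keywords of a `keywords=` line from an INDEX file.

    Hypothetical helper: it only mimics the selection rule described in the
    text (keep the comma-separated groups made of several words).
    """
    name, _, value = index_line.partition("=")
    if name.strip() != "keywords":
        return []
    # Split on commas and collapse any run of whitespace inside each group.
    groups = [" ".join(group.split()) for group in value.split(",")]
    # A group is a "word group" when it still contains a space.
    return [group for group in groups if " " in group]


line = ("keywords=group, finite group, order, subgroup, conjugacy class, "
        "center, normal subgroup, subgroup lattice")
print(multiword_keywords(line))
```

Single-word keywords such as "group" or "order" are left out, since only groups of words are relevant for the wgrp files.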
Line 209... | Line 209... | ||
209 | 209 | ||
210 | ===================== |
210 | ===================== |
211 | 211 | ||
212 | domaindic.xx |
212 | domaindic.xx |
213 | 213 | ||
214 | use the files domain/domain.xx to replace the " |
214 | use the files domain/domain.xx to replace the "language" domain in the |
215 | English/technical way. |
215 | English/technical way. |
216 | 216 | ||
217 | ===================== |
217 | ===================== |
218 | 218 | ||
219 | indignore.xx : ignored words |
219 | indignore.xx : ignored words |
220 | (used by "mkindex") |
220 | (used by "mkindex") |
221 | 221 | ||
222 | All the words listed in the file are ignored by the search engine. |
222 | All the words listed in the file are ignored by the search engine. |
223 | 223 | ||
224 | ===================== |
224 | ===================== |
225 | 225 | ||
226 | abuse.xx : swearwords to be ignored by the search engine |
226 | abuse.xx : swearwords to be ignored by the search engine |
227 | (used by ??) |
227 | (used by ??) |
228 | 228 | ||
229 | ===================== |
229 | ===================== |
230 | 230 | ||
231 | andor.xx : conjunctions ("and", "or") to be ignored by the |
231 | andor.xx : conjunctions ("and", "or") to be ignored by the |
232 | search engine |
232 | search engine |
233 | 233 | ||
234 | The file andor.xx is mentioned in src/insmath.c (processing logic |
234 | The file andor.xx is mentioned in src/insmath.c (processing logic |
235 | statements in math formulas) but this is for the moment used by no |
235 | statements in math formulas) but this is for the moment used by no |
236 | modules (to be used, one must have insmath_logic=yes, which does not |
236 | modules (to be used, one must have insmath_logic=yes, which does not |
Line 252... | Line 252... | ||
252 | 252 | ||
253 | As this is an exercise module it is indexed in the lists A.$lang (All) |
253 | As this is an exercise module it is indexed in the lists A.$lang (All) |
254 | and X.$lang (eXercise). |
254 | and X.$lang (eXercise). |
255 | 255 | ||
256 | This is a multilanguage module (main language "en", translation |
256 | This is a multilanguage module (main language "en", translation |
257 | language "it"). |
257 | language "it"). |
258 | 258 | ||
259 | The index file contains the following (nonempty) lines |
259 | The index file contains the following (nonempty) lines |
260 | 260 | ||
261 | title=Vector shoot |
261 | title=Vector shoot |
262 | description=click on a linear combination of 2D vectors. |
262 | description=click on a linear combination of 2D vectors. |
Line 288... | Line 288... | ||
288 | translated module (1002 resp 2004). --> HELP there is no A.cn file!! |
288 | translated module (1002 resp 2004). --> HELP there is no A.cn file!! |
289 | 289 | ||
290 | The file A.en contains the following lines related to this module. |
290 | The file A.en contains the following lines related to this module. |
291 | 291 | ||
292 | ?2 or ?4 is the ranking |
292 | ?2 or ?4 is the ranking |
293 | It is a weight -- see name of variable in modind.c -- |
293 | It is a weight -- see name of variable in modind.c -- |
294 | giving more importance to the title words : 4 if the word appears |
294 | giving more importance to the title words : 4 if the word appears |
295 | in the module title, 2 otherwise |
295 | in the module title, 2 otherwise |
296 | 296 | ||
297 | 2d:1003?2 from description and description_it |
297 | 2d:1003?2 from description and description_it |
298 | algebra:1003?2 from domain |
298 | algebra:1003?2 from domain |
299 | bersaglio:1003?2 from keywords_it |
299 | bersaglio:1003?2 from keywords_it |
Line 301... | Line 301... | ||
301 | combination:1003?2 from description (_not_ from keywords) |
301 | combination:1003?2 from description (_not_ from keywords) |
302 | combinazione:1003?2 from description_it |
302 | combinazione:1003?2 from description_it |
303 | combinazione lineare:1003?2 from keywords + wgrp.en |
303 | combinazione lineare:1003?2 from keywords + wgrp.en |
304 | gang:1003?2 from author |
304 | gang:1003?2 from author |
305 | levelh4:1003?2 from level=h4 (and so on) |
305 | levelh4:1003?2 from level=h4 (and so on) |
306 | levelh5:1003?2 |
306 | levelh5:1003?2 |
307 | levelh6:1003?2 |
307 | levelh6:1003?2 |
308 | levelu1:1003?2 |
308 | levelu1:1003?2 |
309 | levelu2:1003?2 |
309 | levelu2:1003?2 |
310 | linear:1003?2 from description |
310 | linear:1003?2 from description |
311 | linear algebra:1003?2 from keywords |
311 | linear algebra:1003?2 from keywords |
312 | linear combination:1003?2 from keywords |
312 | linear combination:1003?2 from keywords |
313 | lineare:1003?2 from description_it |
313 | lineare:1003?2 from description_it |
314 | shoot:1003?4 from title |
314 | shoot:1003?4 from title |
315 | vector:1003?4 from title + description |
315 | vector:1003?4 from title + description |
316 | (vectors --> vector because of |
316 | (vectors --> vector because of |
317 | directive "sr:r" in suffix.en) |
317 | directive "sr:r" in suffix.en) |
318 | vettore:1003?2 from keywords_it |
318 | vettore:1003?2 from keywords_it |
319 | xiao:1003?2 from author |
319 | xiao:1003?2 from author |
320 | 320 | ||
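
To make the line format above concrete, here is a minimal Python sketch that splits an entry of the form word:serial?weight into its three parts. It assumes one module reference per line, exactly as in the listing, and that the "from ..." annotations are comments of this document and not part of the file; the function name parse_index_line is hypothetical.

```python
def parse_index_line(line):
    """Split 'word:serial?weight' into (word, serial, weight).

    Assumption: the word (or word group) never contains ':' after its last
    character, so the last ':' separates it from the module reference.
    """
    word, _, reference = line.strip().rpartition(":")
    serial, _, weight = reference.partition("?")
    return word, int(serial), int(weight)


for entry in ["shoot:1003?4", "linear combination:1003?2"]:
    print(parse_index_line(entry))
```

With such a triple, the entries found for a search word can be ordered by weight, so that a module whose title contains the word (weight 4) comes before a module that only mentions it elsewhere (weight 2).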
321 | The file A.it contains the following lines related to this module. |
321 | The file A.it contains the following lines related to this module. |
Line 341... | Line 341... | ||
341 | linear algebra:1003?2 |
341 | linear algebra:1003?2 |
342 | linear combination:1003?2 |
342 | linear combination:1003?2 |
343 | lineare:1003?2 |
343 | lineare:1003?2 |
344 | shoot:1003?4 |
344 | shoot:1003?4 |
345 | vector:1003?4 |
345 | vector:1003?4 |
346 | vectors:1003?2 no corresponding entry in A.en because |
346 | vectors:1003?2 no corresponding entry in A.en because |
347 | of directive in suffix.en |
347 | of directive in suffix.en |
348 | vettore:1003?2 |
348 | vettore:1003?2 |
349 | xiao:1003?2 |
349 | xiao:1003?2 |
350 | 350 | ||
351 | NOTE: title_it is missing from the index: you cannot find the module |
351 | NOTE: title_it is missing from the index: you cannot find the module |
Line 354... | Line 354... | ||
354 | The file A.$lang for languages different from the above contains lines |
354 | The file A.$lang for languages different from the above contains lines |
355 | related to this module. |
355 | related to this module. |
356 | 356 | ||
357 | E.g. A.nl |
357 | E.g. A.nl |
358 | 358 | ||
359 | 2d: |
359 | 2d: |
360 | algebraisch: directive "algebra:algebraisch" in words.nl |
360 | algebraisch: directive "algebra:algebraisch" in words.nl |
361 | bersaglio: |
361 | bersaglio: |
362 | clicking: directive "click:clicking" in words.nl |
362 | clicking: directive "click:clicking" in words.nl |
363 | combinaison: "combination:combinaison" in words.nl |
363 | combinaison: "combination:combinaison" in words.nl |
364 | combinazione: |
364 | combinazione: |
365 | combinazione lineare: |
365 | combinazione lineare: |
366 | gang: |
366 | gang: |
Line 430... | Line 430... | ||
430 | (so only these are in the completion list) |
430 | (so only these are in the completion list) |
431 | - modind.c creates files A.$lang etc., which are based on the words of the keywords, |
431 | - modind.c creates files A.$lang etc., which are based on the words of the keywords, |
432 | title, description. Not all of them are in the "completion list", |
432 | title, description. Not all of them are in the "completion list", |
433 | but they can still be typed and found by the search engine. |
433 | but they can still be typed and found by the search engine. |
434 | 434 | ||
435 | 435 | ||
436 | 436 | ||
437 | Technical things about modind.c (ER. just to avoid forgetting work in progress) |
437 | Technical things about modind.c (ER. just to avoid forgetting work in progress) |
438 | =============================== |
438 | =============================== |
439 | 439 | ||
440 | The tasks are done in this order : |
440 | The tasks are done in this order : |
441 | 441 | ||
442 | - prep() : * replaces if possible the default language list (defined at top of file) |
442 | - prep() : * replaces if possible the default language list (defined at top of file) |
443 | by the list of languages installed on the server. |
443 | by the list of languages installed on the server. |
444 | * gets the list of all modules prepared by a previous script |
444 | * gets the list of all modules prepared by a previous script |
445 | * opens files bases/site2/author|description|language|... |
445 | * opens files bases/site2/author|description|language|... |
Line 450... | Line 450... | ||
450 | 450 | ||
451 | - sprep(), sheets() : the same for sheets. |
451 | - sprep(), sheets() : the same for sheets. |
452 | 452 | ||
453 | 453 | ||
454 | 454 | ||
455 | Extracting information from one module for a given language (function onemodule) : |
455 | Extracting information from one module for a given language (function onemodule) : |
456 | 456 | ||
457 | - write author,description,language,etc. information in each corresponding file |
457 | - write author,description,language,etc. information in each corresponding file |
458 | bases/site2/author|description|language|... |
458 | bases/site2/author|description|language|... |
459 | 459 | ||
460 | - normalizes data (removes uppercase, accents, apostrophes, plurals) |
460 | - normalizes data (removes uppercase, accents, apostrophes, plurals) |
461 | according to dictionary domaindic, then maindic with suffix, to get normalized |
461 | according to dictionary domaindic, then maindic with suffix, to get normalized |
462 | author, description, title, etc. |
462 | author, description, title, etc. |
463 | This is done in the loop for(i=0;i<trcnt;i++){...} |
463 | This is done in the loop for(i=0;i<trcnt;i++){...} |
464 | 464 | ||
465 | - transforms the (normalized) title into words (change commas to spaces) |
465 | - transforms the (normalized) title into words (change commas to spaces) |
466 | and for each word, appends it with weight 4 using function appenditem. |
466 | and for each word, appends it with weight 4 using function appenditem. |
467 | The arguments are the word itself, the language currently being processed, the serial number of the module, |
467 | The arguments are the word itself, the language currently being processed, the serial number of the module, |
468 | the weight (4), and the module language. |
468 | the weight (4), and the module language. |
469 | 469 | ||
470 | - puts all information other than the title (description, keywords, foreign titles, author...) |
470 | - puts all information other than the title (description, keywords, foreign titles, author...) |
471 | in a buffer, transforms it into words and appends these as above, except that weight=2. |
471 | in a buffer, transforms it into words and appends these as above, except that weight=2. |
472 | 472 | ||
473 | - the 2 preceding points (treatment of title and other info) are repeated with the difference |
473 | - the 2 preceding points (treatment of title and other info) are repeated with the difference |
474 | that the transformation into words is replaced by a translation : |
474 | that the transformation into words is replaced by a translation : |
475 | the commas are kept, but some usual words are deleted. |
475 | the commas are kept, but some usual words are deleted. |
476 | BUG ? : Another difference is that part of "other information than title" is missing, |
476 | BUG ? : Another difference is that part of "other information than title" is missing, |
477 | for instance the foreign titles, require, author. |
477 | for instance the foreign titles, require, author. |
478 | 478 | ||
479 | ER : I don't know why the process is repeated : should look at appenditem |
479 | ER : I don't know why the process is repeated : should look at appenditem |
480 | to see where it is appended, maybe the second time is somewhere else. |
480 | to see where it is appended, maybe the second time is somewhere else. |
481 | 481 | ||
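
The first pass of onemodule described above (normalize, cut into words, append with weight 4 for the title and weight 2 for everything else) can be sketched in Python as follows. This is not a transcription of modind.c: the names normalize, to_words and index_entries are hypothetical, the dictionaries (domaindic, maindic, suffix) are not applied, and letting the title weight win when a word appears in both places is an assumption.

```python
import re
import unicodedata

TITLE_WEIGHT = 4   # word found in the module title
OTHER_WEIGHT = 2   # word found in any other field


def normalize(text):
    """Lowercase, strip accents, and replace apostrophes by spaces."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    without_accents = "".join(
        char for char in decomposed if not unicodedata.combining(char)
    )
    return without_accents.replace("'", " ")


def to_words(text):
    """Cut a normalized field into words (commas and punctuation separate words)."""
    return re.findall(r"[a-z0-9]+", normalize(text))


def index_entries(serial, title, other_fields):
    """Return 'word:serial?weight' entries for one module.

    Assumption: when a word occurs both in the title and elsewhere,
    only the title weight is kept.
    """
    weights = {}
    for field in other_fields:
        for word in to_words(field):
            weights[word] = OTHER_WEIGHT
    for word in to_words(title):
        weights[word] = TITLE_WEIGHT
    return sorted(f"{word}:{serial}?{weight}" for word, weight in weights.items())


entries = index_entries(
    1003,
    "Vector shoot",
    ["click on a linear combination of 2D vectors."],
)
print("\n".join(entries))
```

Because no suffix or ignore-list processing is applied in this sketch, it still emits "vectors" and small words such as "of", which the real index would normalize or drop.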
482 | 482 | ||
483 | =============================== |
483 | =============================== |