Rev 6876 | Rev 6879 | ||
---|---|---|---|
Line 18... | Line 18... | ||
18 | 18 | ||
19 | (the scripts must be run in the order given here, as some files |
19 | (the scripts must be run in the order given here, as some files |
20 | created on earlier stages are used in subsequent stages). In general |
20 | created on earlier stages are used in subsequent stages). In general |
21 | the whole process is run by the script ~/bin/mkindex. |
21 | the whole process is run by the script ~/bin/mkindex. |
22 | 22 | ||
23 | * Firstly a series of 3 perl scripts ( |
23 | * Firstly a series of 3 perl scripts (mkdomain, mkwgrp, modindclass), |
24 | that ~/bin/mkindex.sh calls via ~/public_html/bases/sys/mkindex.sh : |
24 | that ~/bin/mkindex.sh calls via ~/public_html/bases/sys/mkindex.sh : |
25 | 25 | ||
26 | - the program ~/public_html/bases/sys/mkdomain.pl creates the lists |
26 | - the program ~/public_html/bases/sys/mkdomain.pl creates the lists |
27 | of domains from the graph in domain/domain with its translations |
27 | of domains from the graph in domain/domain with its translations |
28 | (domain/domain.$lang) and in json format (english) to be used for |
28 | (domain/domain.$lang) and in json format (english) to be used for |
29 | completion in modtool properties |
29 | completion in modtool properties; it also creates domain/domaindic.xx |
- | 30 | to be used as a dictionary in modind and in the search engine |
|
30 | 31 | ||
31 | - the perl program ~/public_html/bases/sys/mkwgrp.pl reads the INDEX |
32 | - the perl program ~/public_html/bases/sys/mkwgrp.pl reads the INDEX |
32 | files of all the modules on the site and generates |
33 | files of all the modules on the site and generates |
33 | 34 | ||
34 | - keywords (in format .json) to be used for completion in the search |
35 | - keywords (in format .json) to be used for completion in the search |
Line 42... | Line 43... | ||
42 | Some files are created in keywords as keywords/algebra.fr.tmp, but |
43 | Some files are created in keywords as keywords/algebra.fr.tmp, but |
43 | not used for the moment. The keywords in these "keywords file" are |
44 | not used for the moment. The keywords in these "keywords file" are |
44 | exactly those in the variable keywords (or keywords_$lang if it |
45 | exactly those in the variable keywords (or keywords_$lang if it |
45 | exists), doing it with the following rules: taking keywords_$lang if |
46 | exists), doing it with the following rules: taking keywords_$lang if |
46 | it exists, or keywords (whether or not it is a $lang-module). |
47 | it exists, or keywords (whether or not it is a $lang-module). |
- | 48 | It also adds the language version of the domains (see domain/domain.xx). |
|
47 | 49 | ||
48 | - the program ~/public_html/bases/sys/modindclass.pl creates the lists |
50 | - the program ~/public_html/bases/sys/modindclass.pl creates the lists |
49 | of keywords coming from the example classes in |
51 | of keywords coming from the example classes in |
50 | ~/public_html/bases/class as well as the files author, |
52 | ~/public_html/bases/class as well as the files author, |
51 | description, language, level, title (no ranking is done). |
53 | description, language, level, title (no ranking is done). |
- | 54 | ||
- | 55 | Be careful: to be used as a dictionary, a file must be sorted with the command |

- | 56 | bin/dicsort (for example domaindic). |
|
52 | 57 | ||
53 | * Secondly the binary program "modind" (compiled from ~/src/Misc/modind.c) reads |
58 | * Secondly the binary program "modind" (compiled from ~/src/Misc/modind.c) reads |
54 | 59 | ||
55 | -- the INDEX files of all the modules on the site |
60 | -- the INDEX files of all the modules on the site |
56 | -- the auxiliary files in ~/public_html/bases/sys/ (see description |
61 | -- the auxiliary files in ~/public_html/bases/sys/ (see description |
57 | below) |
62 | below) |
Line 66... | Line 71... | ||
66 | ranking of the site's modules. The modules are classified according |
71 | ranking of the site's modules. The modules are classified according |
67 | to their types: A=all (except sheet and classes), D=document, O=OEF, |
72 | to their types: A=all (except sheet and classes), D=document, O=OEF, |
68 | X=exercise, T= tool, R=recreation, M= data module. |
73 | X=exercise, T= tool, R=recreation, M= data module. |
69 | 74 | ||
70 | To do that, "modind" uses some dictionaries in |
75 | To do that, "modind" uses some dictionaries in |
71 | ~/public_html/bases/sys/ (as suffix. |
76 | ~/public_html/bases/sys/ (as suffix.xx, wgrp, domaindic.xx ...) |
72 | 77 | ||
73 | -- separately, "modind" also reads the files in |
78 | -- separately, "modind" also reads the files in |
74 | ~/public_html/bases/sys/sheet and does the same kind of work |
79 | ~/public_html/bases/sys/sheet and does the same kind of work. |
75 | 80 | ||
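The keyword selection rule stated earlier for mkwgrp.pl (take keywords_$lang if it exists, otherwise keywords) can be sketched as follows. This is only an illustration in Python, not the actual Perl script: the dict representation of an INDEX file, the comma-separated value format and the function name select_keywords are assumptions.

```python
def select_keywords(index_vars, lang):
    """Return the keywords of a module for the language `lang`.

    `index_vars` is a hypothetical dict holding the variables read from a
    module's INDEX file; values are assumed to be comma-separated strings.
    """
    key = f"keywords_{lang}"
    # Rule from the text: keywords_$lang if it exists, otherwise keywords,
    # whether or not the module itself is a $lang-module.
    raw = index_vars[key] if key in index_vars else index_vars.get("keywords", "")
    return [kw.strip() for kw in raw.split(",") if kw.strip()]
```

For a module defining both variables, the language-specific one is returned; a module defining only keywords falls back to it for every language.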
76 | 81 | ||
77 | 2) use of index files |
82 | 2) use of index files |
78 | =========================== |
83 | =========================== |
79 | The script ~/public_html/modules/home/search.proc (called by the |
84 | The script ~/public_html/modules/home/search.proc (called by the |
Line 98... | Line 103... | ||
98 | The files in this directory ~/public_html/bases/sys/ are automatically |
103 | The files in this directory ~/public_html/bases/sys/ are automatically |
99 | generated (on install) by the corresponding ".src" file in the "src" |
104 | generated (on install) by the corresponding ".src" file in the "src" |
100 | subdirectory, if it exists. |
105 | subdirectory, if it exists. |
101 | 106 | ||
102 | If any of the files described below is omitted, then the corresponding |
107 | If any of the files described below is omitted, then the corresponding |
103 | feature in the corresponding language is disabled. |
108 | feature in the corresponding language is disabled. |
104 | words.fr/words.fr.src and suffix.fr/suffix.fr.src will be/have been |
- | |
105 | deleted in order to make the search engine correctly working. |
- | |
106 | 109 | ||
107 | In version < 4.05c, if there is no file words.$lang, the file |
110 | In version < 4.05c, if there is no file words.$lang, the file |
108 | suffix.$lang was not used (correction in Misc/translator.c to check |
111 | suffix.$lang was not used (correction in Misc/translator.c to check |
109 | in other situations). |
112 | in other situations). |
110 | The group words were badly treated when the |
113 | The group words were badly treated when the words were already in |
111 |
|
114 | the title, properties, etc. because of |
112 | some option unknown_type=unk_delete in modind.c but it has other consequences |
115 | some option unknown_type=unk_delete in modind.c but it has other consequences |
113 | so it is not the situation. |
116 | so it is not the situation. |
114 | I think that I will put again the suffix.fr again (but one must now really |
- | |
115 | check it : do we want that capital and capitale are the same, which is |
- | |
116 | the case for the moment). |
- | |
117 | 117 | ||
118 | , will be done by the script in the stable release if we are OK) |
118 | , will be done by the script in the stable release if we are OK) |
119 | 119 | ||
120 | Syntax: the lines for most of these files are in the form |
120 | Syntax: the lines for most of these files are in the form |
121 | 121 | ||
Line 126... | Line 126... | ||
126 | ============================================================= |
126 | ============================================================= |
127 | 127 | ||
128 | Files |
128 | Files |
129 | ===== |
129 | ===== |
130 | 130 | ||
131 | words. |
131 | words.xx : correct misprints in the search words |
132 | (used both by "mkindex" and "search.proc"). |
132 | (used both by "mkindex" and "search.proc"). |
133 | 133 | ||
134 | E.g. if the file words.en contains the line |
134 | E.g. if the file words.en contains the line |
135 | 135 | ||
136 | == |
136 | == |
Line 149... | Line 149... | ||
149 | Note: the file words.en is used by the module tool/wcalc.en (see |
149 | Note: the file words.en is used by the module tool/wcalc.en (see |
150 | ~/public_html/modules/tool/wcalc.en/dic ) |
150 | ~/public_html/modules/tool/wcalc.en/dic ) |
151 | 151 | ||
152 | ===================== |
152 | ===================== |
153 | 153 | ||
154 | suffix. |
154 | suffix.xx : process common suffixes in the search words |
155 | (used both by "mkindex" and "search.proc"). |
155 | (used both by "mkindex" and "search.proc"). |
156 | 156 | ||
157 | E.g. if the file suffix.en contains the line |
157 | E.g. if the file suffix.en contains the line |
158 | 158 | ||
159 | == |
159 | == |
160 | ertem:meter |
160 | ertem:meter |
161 | == |
161 | == |
162 | 162 | ||
163 | then any word ending in "metre" ("ertem" the other way round) is |
163 | then any word ending in "metre" ("ertem" the other way round) is |
164 | substituted by the corresponding one ending in "meter" (kilometre --> |
164 | substituted by the corresponding one ending in "meter" (kilometre --> |
165 | kilometer) |
165 | kilometer) |
166 | 166 | ||
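The suffix rule above can be illustrated by a minimal Python sketch. It assumes that each line has the form reversed_suffix:replacement, as in the "ertem:meter" example; the function names and the "longest suffix first" tie-break are assumptions, and the real processing is done in C (see Misc/translator.c).

```python
def load_suffix_rules(lines):
    """Parse 'reversed_suffix:replacement' lines into (suffix, replacement) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip empty or malformed lines
        reversed_suffix, replacement = line.split(":", 1)
        # The suffix is stored "the other way round" in the file.
        rules.append((reversed_suffix[::-1], replacement))
    return rules


def apply_suffix(word, rules):
    """Replace the ending of `word` according to the first matching rule."""
    # Assumption: when several rules match, the longest suffix wins.
    for suffix, replacement in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word
```

With the single rule "ertem:meter", apply_suffix("kilometre", rules) gives "kilometer", and a word without a matching ending is returned unchanged.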
167 | Note: suffix.fr was deleted because it caused the search engine/the |
167 | Note: suffix.fr was deleted because it caused the search engine/the |
168 | keyword completion not to work properly. The site manager can |
168 | keyword completion not to work properly. The site manager can |
169 | reactivate the functionality by adding the file again. |
169 | reactivate the functionality by adding the file again. |
170 | 170 | ||
171 | ===================== |
171 | ===================== |
172 | 172 | ||
173 | wgrp/wgrp. |
173 | wgrp/wgrp.xx : groups of words |
174 | (these files are automatically generated, and used by "mkindex") |
174 | (these files are automatically generated, and used by "mkindex") |
175 | 175 | ||
176 | E.g. if the file wgrp/wgrp.en contains the line |
176 | E.g. if the file wgrp/wgrp.en contains the line |
177 | 177 | ||
178 | == |
178 | == |
Line 204... | Line 204... | ||
204 | 204 | ||
205 | (in the corresponding language file) |
205 | (in the corresponding language file) |
206 | 206 | ||
207 | NOTE: problems arise when a string contains the apostrophe "'" |
207 | NOTE: problems arise when a string contains the apostrophe "'" |
208 | (e.g. "algorithme d'euclide") |
208 | (e.g. "algorithme d'euclide") |
209 | 209 | ||
- | 210 | ===================== |
|
- | 211 | ||
- | 212 | domaindic.xx |
|
- | 213 | ||
- | 214 | uses the files domain/domain.xx to replace the "language" (translated) domain names by |

- | 215 | their English/technical form. |
|
- | 216 | ||
210 | ===================== |
217 | ===================== |
211 | 218 | ||
212 | indignore. |
219 | indignore.xx : ignored words |
213 | (used by "mkindex") |
220 | (used by "mkindex") |
214 | 221 | ||
215 | All the words listed in the file are ignored by the search engine. |
222 | All the words listed in the file are ignored by the search engine. |
216 | 223 | ||
217 | ===================== |
224 | ===================== |
218 | 225 | ||
219 | abuse. |
226 | abuse.xx : swearwords to be ignored by the search engine |
220 | (used by ??) |
227 | (used by ??) |
221 | 228 | ||
222 | ===================== |
229 | ===================== |
223 | 230 | ||
224 | andor. |
231 | andor.xx : conjunctions ("and", "or") to be ignored by the |
225 | search engine |
232 | search engine |
226 | 233 | ||
227 | The file andor.xx is mentioned in src/insmath.c (processing logic |
234 | The file andor.xx is mentioned in src/insmath.c (processing logic |
228 | statements in math formulas) but this is for the moment used by no |
235 | statements in math formulas) but this is for the moment used by no |
229 | modules (to be used, one must have insmath_logic=yes which does not |
236 | modules (to be used, one must have insmath_logic=yes which does not |
Line 280... | Line 287... | ||
280 | module (1003) but contain only reference to the corresponding |
287 | module (1003) but contain only reference to the corresponding |
281 | translated module (1002 resp 2004). --> HELP there is no A.cn file!! |
288 | translated module (1002 resp 2004). --> HELP there is no A.cn file!! |
282 | 289 | ||
283 | The file A.en contains the following lines related to this module. |
290 | The file A.en contains the following lines related to this module. |
284 | 291 | ||
- | 292 | ?2 or ?4 is the ranking |
|
285 |
|
293 | It is a weight -- see name of variable in modind.c -- |
286 |
|
294 | giving more importance to the title words : 4 if the word appears |
- | 295 | in the module title, 2 otherwise |
|
287 | 296 | ||
288 | 2d:1003?2 from description and description_it |
297 | 2d:1003?2 from description and description_it |
289 | algebra:1003?2 from domain |
298 | algebra:1003?2 from domain |
290 | bersaglio:1003?2 from keywords_it |
299 | bersaglio:1003?2 from keywords_it |
291 | click:1003?2 from description |
300 | click:1003?2 from description |
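Entries such as those listed above can be read with a short Python sketch. It assumes one word:serial?weight entry per line, as in this excerpt (the trailing "from ..." remarks are annotations of this document, not part of the file); a real file may be laid out differently, and the function names are hypothetical.

```python
def parse_index_entry(entry):
    """Split 'word:serial?weight' into (word, serial, weight)."""
    # Split on the last ':' so that multi-word keys are kept intact.
    word, _, rest = entry.strip().rpartition(":")
    serial, _, weight = rest.partition("?")
    return word, int(serial), int(weight)


def weight_origin(weight):
    """Interpret the weight as described in the text: 4 = title word, 2 = other."""
    return "title" if weight == 4 else "other"
```

For example, parse_index_entry("algebra:1003?2") yields the word "algebra", the module serial number 1003 and the weight 2, i.e. a word that does not come from the module title.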
Line 350... | Line 359... | ||
350 | 2d: |
359 | 2d: |
351 | algebraisch: directive "algebra:algebraisch" in words.nl |
360 | algebraisch: directive "algebra:algebraisch" in words.nl |
352 | bersaglio: |
361 | bersaglio: |
353 | clicking: directive "click:clicking" in words.nl |
362 | clicking: directive "click:clicking" in words.nl |
354 | combinaison: "combination:combinaison" in words.nl |
363 | combinaison: "combination:combinaison" in words.nl |
355 | combinazione: |
364 | combinazione: |
356 | combinazione lineare: |
365 | combinazione lineare: |
357 | gang: |
366 | gang: |
358 | levelh4: |
367 | levelh4: |
359 | levelh5: |
368 | levelh5: |
360 | levelh6: |
369 | levelh6: |
361 | levelu1: |
370 | levelu1: |
362 | levelu2: |
371 | levelu2: |
363 | lineare: |
372 | lineare: |
364 | linearly: "linear:linearly" in words.nl |
373 | linearly: "linear:linearly" in words.nl |
365 | niet: "on:niet" in words.nl |
374 | niet: "on:niet" in words.nl |
366 | ofwel: "of:ofwel" |
375 | ofwel: "of:ofwel" |
367 | shooting: "shoot:shooting" |
376 | shooting: "shoot:shooting" |
368 | vector: |
377 | vector: |
Line 447... | Line 456... | ||
447 | 456 | ||
448 | - write author,description,language,etc. information in each corresponding file |
457 | - write author,description,language,etc. information in each corresponding file |
449 | bases/site2/author|description|language|... |
458 | bases/site2/author|description|language|... |
450 | 459 | ||
451 | - normalizes data (suppress uppercase, accents, apostrophe, plural) |
460 | - normalizes data (suppress uppercase, accents, apostrophe, plural) |
452 | according to |
461 | according to dictionary domaindic, then maindic with suffix, to get normalized |
- | 462 | author, description, title, etc. |
|
453 | This is done in the loop for(i=0;i<trcnt;i++){...} |
463 | This is done in the loop for(i=0;i<trcnt;i++){...} |
454 | 464 | ||
455 | - transforms the (normalized) title into words (change commas to spaces) |
465 | - transforms the (normalized) title into words (change commas to spaces) |
456 | and for each word, appends it with weight 4 using function appenditem. |
466 | and for each word, appends it with weight 4 using function appenditem. |
457 | the variables are the word itself, the current language treated, the serial number of module, |
467 | the variables are the word itself, the current language treated, the serial number of module, |
458 | the weight=4, and the module language. |
468 | the weight=4, and the module language. |
459 | 469 | ||
460 | - puts all information other than the title (description, keywords, foreign titles, author...) |
470 | - puts all information other than the title (description, keywords, foreign titles, author...) |
461 | in a buffer, transforms it into words and appends them as above, except that weight=2. |
471 | in a buffer, transforms it into words and appends them as above, except that weight=2. |
462 | - | ||
463 | BUG ? : in this process, i_keywords_fr is used twice, probably the first one should be i_keywords_en, to be checked. |
- | |
464 | 472 | ||
465 | - the 2 preceding points (treatment of title and other info) are repeated with the difference |
473 | - the 2 preceding points (treatment of title and other info) are repeated with the difference |
466 | that the transformation into words is replaced by a translation : |
474 | that the transformation into words is replaced by a translation : |
467 | the commas are kept, but some usual words are deleted. |
475 | the commas are kept, but some usual words are deleted. |
468 | BUG ? : Another difference is that part of "other information than title" is missing, |
476 | BUG ? : Another difference is that part of "other information than title" is missing, |
469 | for instance the foreign titles, require, author. |
477 | for instance the foreign titles, require, author. |
470 | 478 | ||
471 | ER : I don't know why the process is repeated : should look at appenditem |
479 | ER : I don't know why the process is repeated : should look at appenditem |
- | 480 | to see where it is appended, maybe the second time is somewhere else. |
|
472 | 481 | ||
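The weighting steps described above (title words with weight 4, all other information with weight 2) can be sketched in Python as follows. This is not the code of modind.c: the function name index_module is hypothetical, normalization is left out, and keeping the title weight when a word occurs in both places is an assumption based on the rule "4 if the word appears in the module title, 2 otherwise".

```python
TITLE_WEIGHT = 4  # weight for words of the (normalized) title
OTHER_WEIGHT = 2  # weight for words of description, keywords, author, ...


def to_words(text):
    """Transform a field into words (commas are changed to spaces)."""
    return text.replace(",", " ").split()


def index_module(serial, title, other_fields):
    """Return sorted 'word:serial?weight' entries for one module."""
    weights = {}
    for word in to_words(title):
        weights[word] = TITLE_WEIGHT
    for field in other_fields:
        for word in to_words(field):
            # Assumption: a word already seen in the title keeps weight 4.
            weights.setdefault(word, OTHER_WEIGHT)
    return [f"{word}:{serial}?{weight}" for word, weight in sorted(weights.items())]
```

Calling index_module(1003, "shooting game", ["click on the target", "game"]) gives weight 4 to "shooting" and "game" and weight 2 to the remaining words.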
473 | 482 | ||
474 | =============================== |
483 | =============================== |