The dictionary is about 17,000 entries, as would be counted in an ordinary dictionary. This expands to more than twice that number of individual stems (the count that the program displays at startup), and may generate many hundreds of thousands of "words" that one can construct over all the declensions and conjugations. But remember that this is a very modest, student-size dictionary. Kidd's Collins Latin Gem, a breast-pocket (8 by 11 cm.) edition (which even has English-to-Latin), contains about 17,000 Latin entries. The ultimate 2100-page Oxford Latin Dictionary has about 34,000 entries, excluding proper names (and it has lots of those). The point of this early version is to provide a tool to help in simple translations for a beginning Latin student or amateur.
A few hundred prefixes and suffixes further enlarge the range. These will generate tens of thousands of additional words - some of which are recognized Latin words, some are perfectly reasonable words which were never used by Cicero or Caesar but might have been used by Augustine or some monk at Jarrow, and some are nonsense.
I decided to automate an elementary-level Latin vocabulary list. As a first stage, I have produced a computer program that will analyze a Latin word and give the various possible interpretations (case, person, gender, tense, mood, etc.), within the limitations of its dictionary. This might be the first step to a full parsing system, but, although just a development tool, it is useful by itself.
The present set of inflections is fairly complete (although it may not be perfect). The dictionary is, of course, limited. The purpose of this dictionary in the overall scheme was to have a variety of words and types, from which the algorithms for the codes could be developed, and with which they could be exercised.
While developing this initial implementation, based on different sources, I learned (or relearned) something that I had overlooked at the beginning. Latin courses, and even very large Latin dictionaries, are put together under very strict ground rules. Some dictionary might be based exclusively on "Classical" (200 BC - 200 AD) texts; it might have every word that appears in every surviving writing of Cicero, but nothing much before or since. Such a dictionary will be inadequate for translating medieval theological or scientific texts. In another example, one textbook might use Caesar as its main source of readings (my high school texts did), while another might avoid Caesar and all military writings (either for pacifist reasons, or just because the author had taught Caesar for 30 years and had grown bored with going over the same material, year after year). One can imagine that the selection of words in such different texts would differ considerably; moreover, even with the same words, the meanings attached would be different. This presents a problem in the development of a dictionary for general use.
One way to proceed would be to have a separate dictionary for each era and application (or a universal dictionary with tags to indicate the appropriate application and meaning for each word). With such an arrangement one would not be offered inappropriate or improbable interpretations. The present system has such a tag mechanism, but it is not yet fully exploited.
The Version 1.8 dictionary may be found to be of fairly general use for the introductory student; it has the easy words that every text uses. It also has a goodly number of adverbs, prepositions, and conjunctions, which are not as sensitive to application as are the nouns and verbs. The system also tests a number of prefixes and suffixes, if the raw word cannot be found. This allows an interpretation of many of the words otherwise unknown. The result of this analysis is fairly straightforward in most cases, is accurate but esoteric in some, and for about 1 in 10 it gives an answer that has no relation to the normal dictionary meaning.
With this facility, and a 12000 word dictionary, trials on some tested classical texts have given hit rates of 97%, and better, excluding proper names. (There are few proper names in the dictionary.) (I am an old soldier and seem to have in the dictionary every possible word for attack or destroy. The system is near perfect for Caesar.) The question arises, what hit rate can be expected for a general dictionary? Classical Latin dictionaries have no references to the terminology of Christian theology. The legal documents and deeds of the Middle Ages are a challenge of jargon and abbreviations. These areas require special knowledge and vocabulary, but even there the ability to handle the non-specialized words is a large part of the effort.
In some distributions, the system allows the inclusion of specialized vocabulary (for instance a SPEcial dictionary for medieval words not in most dictionaries), and the opportunity for the user to add additional words "on the fly" to a DICT.LOC.
The program is probably much larger than is necessary for the present application. It is still in development, but some effort has now been put into optimization.
This is a Shareware program, which means it is proper to copy it and pass it on to your friends. Consider it a developmental item for which there is at this time no charge. However, it is Copyrighted (c), so don't try to sell it as your own without at least telling me.
This version is distributed without obligation, but the developer would appreciate comments and suggestions.
William A Whitaker
PO Box 3036
McLean VA 22103-3036
USA
whitaker@erols.com
^
With the input of a word, or several words in a line, the program returns information about the possible accidence, if it can find an agreeable stem in its dictionary.
amo
am.o V 1 1 PRES ACTIVE IND 1 S X
GEN> love, like; fall in love with; be fond of; have a tendency to
To support this method, an INFLECT.SEC data file was constructed containing possible Latin endings encoded by a structure that identifies the part of speech, declension, conjugation, gender, person, number, etc. This is a pure computer encoding for a "brute force" search. No sophisticated knowledge of Latin is used at this point. Rules of thumb (e.g., the fact, always noted early in any Latin course, that a neuter noun has the same ending in the nominative and accusative, with a final -a in the plural) are not used in the search. However, it is convenient to combine several identical endings with a general encoding (e.g., the endings of the perfect tenses are the same for all verbs, and are so encoded, not replicated for every conjugation and variant).
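The brute-force search described above can be sketched as follows. This is only an illustration in Python, not the program's Ada source: the tables ENDINGS and STEMS and the function analyze are hypothetical stand-ins, and the real INFLECT.SEC and dictionary files use their own record formats. The sketch tries every split of the word into stem and ending and keeps the pairs whose part of speech and declension or conjugation agree.

```python
# Hypothetical miniature tables; the real INFLECT.SEC and DICTFILE.GEN formats differ.
ENDINGS = [
    # (ending, part of speech, declension/conjugation, form)
    ("o",  "V", 1, "PRES ACTIVE IND 1 S"),
    ("as", "V", 1, "PRES ACTIVE IND 2 S"),
    ("as", "N", 1, "ACC P"),
    ("um", "N", 4, "ACC S"),
]
STEMS = [
    # (stem, part of speech, declension/conjugation, meaning)
    ("am",   "V", 1, "love, like"),
    ("port", "V", 1, "carry, bring"),
    ("port", "N", 1, "gate, entrance"),
    ("port", "N", 4, "port, harbor"),
]


def analyze(word):
    """Try every stem/ending split and keep the pairs whose codes agree."""
    hits = []
    for cut in range(1, len(word)):
        stem, ending = word[:cut], word[cut:]
        for e_text, e_pos, e_class, form in ENDINGS:
            if e_text != ending:
                continue
            for s_text, s_pos, s_class, meaning in STEMS:
                if (s_text, s_pos, s_class) == (stem, e_pos, e_class):
                    hits.append((f"{stem}.{ending}", e_pos, form, meaning))
    return hits


for hit in analyze("portas"):
    print(hit)
```

With these toy tables, "portas" is reported both as a verb and as a noun, because no knowledge of context is applied to choose between them.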
Many of the distinguishing differences identifying conjugations come from the voiced length of stem vowels (e.g., between the present, imperfect and future tenses of a third conjugation I-stem verb and a fourth conjugation verb). These aural differences, the features that make Latin "sound right" to one who speaks it, are lost entirely in the analysis of written endings.
The endings for the verb conjugations are the result of trying to minimize the number of individual endings records, while yet keeping the structure of the inflections data file fairly readable. There is no claim that the resulting arrangement is consonant with any grammarian's view of Latin, nor should it be examined from that viewpoint. While it started from the conjugations in text books, it can only be viewed as some fuzzy intermediate step along a path to a mathematically minimal number of encoded verb endings. Later versions of the program might improve the system.
There are some egregious liberties taken in the encoding. With the inclusion of two present stems, the third conjugation I-stem verbs may share the endings of the regular third conjugation. The fourth conjugation has disappeared altogether, and is represented as a somewhat modified variant of the third conjugation (3, 4)! There is an artificial fifth conjugation for esse and others, and a sixth for eo.
As an example, a verb ending record has the structure:
Thus, the entry for the ending appropriate to "amo" is:
V 1 1 PRES IND ACTIVE 1 S X 1 o
KIND is not often used with the verb endings, but is part of the record for convenience elsewhere. For verbs, the KIND has not yet been exploited significantly, except for DEP and IMPERS.
The rest of the elements are straightforward and generally use the abbreviations that are common in any Latin text. An X or 0 represents the "don't know" or "don't care" for enumeration or numeric types. Details are documented below in the CODES section.
A verb dictionary record has the structure:
Thus, an entry corresponding to "amo amare amavi amatus" is:
am am amav amat V 1 1 X X X like, love
(The dangling X X is used to encode information about the time in which this word is found and the subject area. There are not yet enough details in the dictionary to allow exploitation of this information.)
Endings may not uniquely determine the stem, and therefore the meaning. "portas" could be the accusative plural of "gate", or the second person, singular, present indicative active of "carry". In both cases the stem is "port", as it is for "portus", "harbor". All possibilities are reported.
portas
port.as V 1 1 PRES IND ACTIVE 2 S X
> carry, bring
port.as N 1 1 ACC P F T
> gate, entrance; city gates; door; avenue;
And note that the same stem (port) has other uses.
portum
port.um N 4 1 ACC S M T
> port, harbor; refuge, haven, place of refuge
PLEASE NOTE: It is certainly possible for the program to find a valid Latin construction that fits the input word and to have that interpretation be entirely wrong in the context. It is even possible to interpret a number, in Roman numerals, as a word! (But the number would be reported also.)
For the case of defective verbs, the process does not necessarily have to be precise. Since the purpose is only to translate from Latin, even if there are unused forms included in the algorithm, these will not come up in any real Latin text. The endings for the verb conjugations are the result of trying to minimize the number of individual endings records, while keeping the structure of the base INFLECTIONS data file fairly readable.
In general the program will try to construct a match with the inflections and the dictionaries. There are a number of specific checks to reject certain mathematically correct combinations that do not appear in the language, but these checks are relatively few. The philosophy has been to allow a generous interpretation. A remark in a text or dictionary that a particular form does not exist must be tempered with the realization that the author probably means that it has not been observed in the surviving classical literature. This body of reference is minuscule compared to the total use of Latin, even limited to the classical period. Who is to say that further texts would not turn up such an example, even if it might not have been approved of by Cicero? It is also possible that such reasonable, if "improper", constructs might occur in later writings by less educated, or just different, authors. Certainly English shows this sort of variation over time.
If the exact stem is not found in the dictionary, there are rules for the construction of words which any student would try. The simplest situation is a known stem to which a prefix or suffix has been attached. The method used by the program (if DO_FIXES is on) is to try any fixes that fit, to see if their removal results in an identifiable remainder. Then the meaning is mechanically constructed from the meaning of the fix and the stem. The user may need to interpret with a more conventional English usage. This technique improves the performance significantly. However, in about 40% of the instances in which there is a hit, the derivation is correct but the interpretation takes some imagination. In something less than 10% of the cases, the inferred fix is just wrong, so the user must take some care to see if the interpretation makes any sense.
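The removal of fixes can be sketched roughly as below. This is a Python illustration under stated assumptions, not the program's code: PREFIXES, SUFFIXES, STEMS and try_fixes are hypothetical names with toy contents, the sketch works on stems with the inflection already removed, and the real ADDONS. file has its own format. It strips each fix that fits and, if the remainder is a known stem, glues the two meanings together mechanically.

```python
# Hypothetical toy tables; ADDONS. and the dictionary files use their own formats.
PREFIXES = {"re": "back, again", "trans": "across"}
SUFFIXES = {"itat": "-ness, -ity"}
STEMS = {"port": "carry, bring", "clar": "clear, bright"}


def try_fixes(stem):
    """Return (fix, remainder, constructed meaning) for each fix whose
    removal leaves a stem that is in the dictionary."""
    results = []
    for prefix, sense in PREFIXES.items():
        rest = stem[len(prefix):]
        if stem.startswith(prefix) and rest in STEMS:
            results.append((prefix + "+", rest, f"{sense} + {STEMS[rest]}"))
    for suffix, sense in SUFFIXES.items():
        rest = stem[:len(stem) - len(suffix)]
        if stem.endswith(suffix) and rest in STEMS:
            results.append(("+" + suffix, rest, f"{STEMS[rest]} + {sense}"))
    return results


print(try_fixes("transport"))
print(try_fixes("claritat"))
```

The constructed meaning ("across + carry, bring") is deliberately literal; as the text says, the user still has to turn it into conventional English and judge whether it makes sense.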
This method is complicated by the tendency for prefixes to be modified upon attachment (ab+fero => aufero, sub+fero => suffero). The program takes some such instances into account. Ideally, one should look inside the stem for identifiable fragments. One would like to start with the smallest possible stem, and that is most frequently the correct one. While it is mathematically possible that the stem of "actorum" is "actor" with the common inflection "um", no intuitive first semester Latin student would fail to opt for the genitive plural "orum", and probably be right. To first order, the procedure ignores such hints and reports this word in both forms, as well as a verb participle. However, it can use certain generally applicable rules, like the superlative characteristic "issim", to further guess.
Likewise, suffixes are tried. In addition, there is the capability to examine the word for such common techniques as syncope, the omission of the "ve" or "vi" in certain verb perfect forms (audivissem => audissem). These techniques ("tricks") are primitive in the present version, and might be replaced by more powerful procedures in later versions.
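A minimal sketch of the syncope check, in Python and under stated assumptions: KNOWN_FORMS is a hypothetical stand-in for a full run of the candidate through the inflections and dictionary, and undo_syncope is an illustrative name, not a procedure of the program. The idea is simply to reinsert a dropped "vi" or "ve" at each position and see whether the result is recognized.

```python
# Hypothetical stand-in for a full dictionary and inflection lookup.
KNOWN_FORMS = {"audivissem", "amaverunt"}


def undo_syncope(word):
    """Reinsert a dropped 'vi' or 've' at each position and keep the
    candidates that are recognized."""
    found = []
    for i in range(1, len(word)):
        for lost in ("vi", "ve"):
            candidate = word[:i] + lost + word[i:]
            if candidate in KNOWN_FORMS and candidate not in found:
                found.append(candidate)
    return found


print(undo_syncope("audissem"))
print(undo_syncope("amarunt"))
```

Each candidate costs a full lookup in a real system, which is why this kind of trick can slow the processing of a long text.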
If the dictionary cannot identify a matching stem, it may be possible to derive a stem from "nearby" stems (an adverb from an adjective is the most common example) and infer a meaning. If all else fails, a portion of the dictionary alphabetically around the word could be listed, from which the user can draw in making a guess; this, too, is not available in this version.
The program is written in Ada, and is machine independent. Source is available for compiling onto other machines.
The WORDS program, Version 1.8, accompanying this file should run on a PC under DOS, with any monitor. Simply copy the files into a subdirectory of a hard disk and call WORDS. Earlier versions would also work directly off a floppy, though much more slowly.
There are a number of files associated with the program. These must be in the subdirectory of the program, and the program must be run from that subdirectory.
WORDS.EXE is the executable program.
INFLECT.SEC holds the encoded inflection records.
DICTFILE.GEN contains the stems of the GENERAL dictionary.
MEANFILE.GEN contains the meanings of the GENERAL dictionary entries.
INDXFILE.GEN contains a set of indexes, breaking the DICTFILE into pieces.
There may also be a set of files for a SPECIAL (.SPE) dictionary of the same structure as the GENERAL dictionary.
A LOCAL dictionary may also be used. This is a limited dictionary of a different form, which is human readable and writeable. The knowledgeable user can augment and modify it on-line. It would consist of the file DICT.LOC.
UNIQUES. contains certain words which regular processing does not get.
ADDONS. contains the set of prefixes, suffixes and enclitics (-que, -ve) and the like.
Other files may be generated by the program, so run it in a configuration that allows the creation of files.
All these files are necessary to run the program (except the optional dictionaries SPE and LOC). This excess of files is a consequence of the present developmental nature of the program. The files are very simple, almost human-readable. Presumably, a later version could condense and encode them. Nevertheless, beyond the original COPY, the user need not worry about them.
In addition, there are files that the program may produce on request. All of these share the name WORD, with various extensions, and they are all ASCII text files which can be viewed and processed with an ordinary editor. The casual user probably does not want to get involved with these. WORD.OUT will record the whole output; WORD.UNK will list only the words the program is unable to interpret. These outputs are turned on through the PARAMETERS mechanism.
PARAMETERS may be set while running the program by inputting a line containing the question mark as the first character. Alternatively, WORD.MOD contains the MODES that can be set by CHANGE_PARAMETERS. If this file does not exist, the default modes will be used. The file may be produced or changed when changing parameters. It can also be modified, if the user is sufficiently confident, with an editor, or deleted, thereby reverting to defaults.
WORD.OUT is the file produced if the user requests, in CHANGE_PARAMETERS, output to a file. This output can be used for later manipulation with a text editor, especially when the input was a text file of some length. If the parameter UNKNOWNS_ONLY is set, the output serves as a sort of a Latin spell checker. Those words it cannot match may just not be in the dictionary, but they may be typos. A WORD.UNK file of unknowns can be generated.
The program will no longer run off a floppy disk since the dictionary is now too large. It must be run in a hard disk subdirectory. The files are self-extracting by running MAK8WORD.EXE.
To start the program, in the subdirectory that contains all the files, type
WORDS. A setup procedure will execute, processing files. Then the program will
ask for a word to be keyed in. Input the word and give a [return]. One can
input a whole line at a time, but only one line, since the [return] ends the
input. A '?' character input will permit the user to set modes, for example to
prevent the process from trying prefixes and suffixes to get a match on an item
unknown to the dictionary, or to put output to a file. Within CHANGE_PARAMETERS,
the ? character calls help for each entry.
Two successive [return]s with no text will terminate the program (except
in text being read from an @ disk file).
The syncopated form of the perfect often drops the 'v' and loses the vowel.
An initial 'a' followed by a double letter often is used for an 'ad' prefix,
likewise an initial 'ad' prefix is often replaced by an 'a' followed by a double
letter.
An initial 'i' followed by a double letter often is used for an 'in' prefix,
likewise an initial 'in' prefix is often replaced by an 'i' followed by a double
letter.
A leading 'inp' could be an 'imp'.
A leading 'obt' could be an 'opt'.
An initial 'har...' or 'hal...' may be rendered by an 'ar' or 'al', likewise
the dictionary entry may have 'ar'/'al' and the trial word begin with 'ha...'.
An initial 'c' could be a 'k', or the dictionary entry uses 'c' for 'k'.
A nonterminal 'ae' is often rendered by an 'e'.
An initial 'E' can replace an 'Ae'.
An "iis..." beginning some forms of "eo" may be contracted to "is...".
A nonterminal 'ii' is often replaced by just 'i'; including 'ji', since in
this program and dictionary all 'j' are made 'i'.
A 'cl' could be a 'cul'.
A 'vul' could be a 'vol'.
Various manipulations of 'u' and 'v' are possible:
An additional provision is the attempt to recognize and display the value of
Roman numerals, and combinations of appropriate letters that do not parse
conventionally to a value but are probably ill-formed Roman numerals.
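As an illustration of the Roman numeral provision, here is a small Python sketch. It is an assumption-laden example, not the program's algorithm: roman_value is a hypothetical name, and the simple subtractive rule used here also assigns a value to ill-formed strings such as "IIII" rather than rejecting them.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_value(text):
    """Return the value of a string of Roman numeral letters,
    or None if any other letter occurs."""
    letters = text.upper()
    if not letters or any(ch not in VALUES for ch in letters):
        return None
    total = 0
    # Pair each letter with its successor; the last letter is paired with a blank.
    for ch, nxt in zip(letters, letters[1:] + " "):
        value = VALUES[ch]
        # A smaller numeral before a larger one is subtracted (IV = 4, XC = 90).
        if nxt in VALUES and VALUES[nxt] > value:
            total -= value
        else:
            total += value
    return total


print(roman_value("VII"))
print(roman_value("amo"))
```

Because the check looks only at the letters, a short word made entirely of numeral letters also receives a value, which is why a number and a word can both be reported for the same input.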
Various combinations of these tricks are attempted, and each try that results
in a possible hit is run against the full dictionary, which can make these
efforts time consuming. That is a good reason to make the dictionary as large as
possible, rather than counting on a smaller number of roots and doing the
maximum word formation.
Finally, while the program can succeed on a word that requires two or three
of these tricks to work in combination, there are limits. Some words for which
all the modifications are supported will fail, if there are just too many. In
fact, it is probably better that that be the case, otherwise one will generate
too many false positives. Testing so far does not seem to show excessive zeal on
the part of the program, but the user should examine the results, especially
when several tricks are involved.
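A few of the letter substitutions listed above can be sketched in Python as follows. The tables and the function trick_candidates are hypothetical and cover only a handful of the tricks; the program's own procedure, its ordering, and its combination of several tricks are not reproduced here. Each candidate spelling would then have to be run against the dictionary in the usual way.

```python
# A few of the substitutions listed above, applied at the start of a word.
INITIAL_TRICKS = [("inp", "imp"), ("obt", "opt"), ("e", "ae")]
# Substitutions tried anywhere in the word.
INTERNAL_TRICKS = [("cl", "cul"), ("vul", "vol")]


def trick_candidates(word):
    """Return alternative spellings produced by applying one trick once."""
    out = []
    for old, new in INITIAL_TRICKS:
        if word.startswith(old):
            out.append(new + word[len(old):])
    for old, new in INTERNAL_TRICKS:
        start = word.find(old)
        while start != -1:
            out.append(word[:start] + new + word[start + len(old):])
            start = word.find(old, start + 1)
    return out


print(trick_candidates("periclum"))
print(trick_candidates("inpono"))
print(trick_candidates("equus"))
```

The last call shows how a false positive can arise: "equus" is a perfectly good word in its own right, yet the initial e/ae trick also proposes "aequus", so a system should apply tricks only when the plain lookup has failed.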
^
Examples
Here are some annotated examples of output. Read through them
and you will get a good idea of the system. The present version may not match
these examples exactly - things are changing - but the principle is there.
agricolarum
agricol.arum N 1 1 GEN P M P
> farmer
Here we have the simple first declension noun, and a unique
interpretation. The "1 1" means it is first declension, with variant 1. This is
an internal coding of the program, and may not correspond exactly with the
grammatical numbering. The "N" means it is a noun. It is the form for genitive
(GEN), plural (1st 'P'). The stem is masculine (M) and represents a person (2nd
'P'). The stem is given as "agricol" and the ending is "arum". The stem is
normal in this case, but is a product of the program, and may not always
correspond to conventional usage.
feminae
femin.ae N 1 1 GEN S F P
femin.ae N 1 1 DAT S F P
femin.ae N 1 1 NOM P F P
femin.ae N 1 1 VOC P F P
> woman
Here we have a word that has several possible interpretations in case and
number (Singular and Plural). The gender is Feminine. Presumably, the user can
examine the adjoining words and reduce the set of possibilities. Maybe the
program will take care of this in some future version.
cornu
corn.u N 4 2 NOM S N T
corn.u N 4 2 DAT S N T
corn.u N 4 2 ACC S N T
corn.u N 4 2 ABL S N T
> horn (of an animal); horn, trumpet; wing of an attacking army
Here is an example of another declension and a second variant. The
Masculine (-us) nouns of the declension (fructus) are "4 1" and the Neuter (-u)
nouns are coded as "4 2". This word is neuter (2nd N) and represents a thing
(T).
ego
ego PRON 5 1 NOM S C PERS
> I, me; myself
A pronoun is much like a noun. The gender is common (C), that is, it may
be masculine or feminine. It is a personal (PERS) pronoun.
illud
ill.ud PRON 6 1 NOM S N ADJECT
ill.ud PRON 6 1 ACC S N ADJECT
GEN> that; those (pl.); also DEMONST
Here we have an adjectival (ADJECT) and demonstrative (DEMONST) pronoun.
hic
hic ADV POS
GEN> here, in this place
h.ic PRON 3 1 NOM S M ADJECT
GEN> this; these (pl.); also DEMONST
In this case there is an adjectival/demonstrative pronoun, or it may be an
adverb. The POS means that the comparison of the adverb is positive.
bonum
bon.um N 2 2 NOM S N T
bon.um N 2 2 ACC S N T
> good thing, profit, advantage; goods (pl.), possessions
bon.um ADJ 1 1 NOM S N POS
bon.um ADJ 1 1 ACC S M POS
bon.um ADJ 1 1 ACC S N POS
bon.um ADJ 1 1 VOC S N POS
> good, honest, brave, noble; better; best
Here we have an adjective, but it might also be a noun. The interpretation
of the adjective says that it is POSitive, but note that there are meanings for
COMParative and SUPERlative also on the line. Check the comparison value before
deciding.
facile
facile ADV POS
> easily, readily
facil.e ADJ 3 2 NOM S N POS
facil.e ADJ 3 2 ACC S N POS
facil.e ADJ 3 2 VOC S N POS
> easy, easy to do, without difficulty, ready, quick, good natured, courteo
Here is an adjective or an adverb. Although they are related in meaning,
they are different words.
acerrimus
acerrim.us ADJ 3 2 NOM S M SUPER
> sharp, bitter, pointed, piercing, shrill; sagacious, keen; severe, vigoro
Here we have an adjective in the SUPERlative. The meanings are all
POSitive and the user must add the -est by himself.
optime
optim.e ADJ 1 1 VOC S M SUPER
> good, honest, brave, noble; better; best
optime ADV SUPER
> well, very, quite, rightly, agreeably, cheaply, in good, style; better; b
Here is an adjective or an adverb; both are SUPERlative.
monuissemus
monu.issemus V 2 1 PLUP ACTIVE SUB 1 P X
GEN> remind, advise, warn; teach; admonish; foretell
Here is a verb for which the form is PLUPerfect, ACTIVE, SUBjunctive, 1st
person, Plural. It is 2nd conjugation, variant 1.
amat
am.at V 1 1 PRES ACTIVE IND 3 S X
GEN> love, like; fall in love with; be fond of; have a tendency to
Another regular verb, PRESent, ACTIVE, INDicative.
amatus
amat.us VPAR 1 1 NOM S M PERF PASSIVE PPL
GEN> love, like; fall in love with; be fond of; have a tendency to
Here we have the PERFect, PASSIVE ParticiPLe, in the NOMinative, Singular,
Masculine.
amatu
amat.u SUPINE 1 1 ABL S X
GEN> love, like; fall in love with; be fond of; have a tendency to
Here is the SUPINE of the verb in the ABLative Singular.
orietur
ori.etur V 3 4 FUT PASSIVE IND 3 S DEP
GEN> rise, arise; spring from, appear; be descended; begin, proceed, originate
For DEPonent verbs the passive form is to be translated as if it were
active voice.
ab
ab PREP ABL
> by, from, away from
Here is a PREPosition that takes an ABLative object.
sine
sin.e V 3 1 PRES ACTIVE IMP 2 S X
> allow, permit
sine PREP ABL
> without
Here is a PREPosition that might also be a Verb.
contra
contra PREP ACC
> against, opposite; facing; contrary to, in reply to
contra ADV POS
> in opposition, in turn; opposite, on the contrary
Here is a PREPosition that might also be an ADVerb. This is a very common
situation, with the meanings being much the same.
et
et CONJ
> and, and even; also, even; (et ... et = both ... and)
Here is a straight CONJunction.
vae
vae INTERJ
> alas, woe, ah; oh dear; (Vae, puto deus fio.)
Here is a straight INTERJection.
septem
septem NUM 2 0 X X X CARD 7
GEN> seven
VII
vii NUM 2 0 X X X CARD 7
XXX> 7 as a ROMAN NUMERAL
Two ways of expressing a numeral.
^
Tricks
There are a number of situations in Latin writing where certain
modifications or conventions regularly are found. While often found, these are
not the normal classical forms. If a conventional match is not found, the
program may be instructed to TRY_TRICKS. Below is a partial list of current
tricks.
^
Codes
For completeness, the codes used in the output are listed here as
Ada statements. Not all the facilities implied by these values are developed yet
in the program or the dictionary. This list is only for Version 1.0. Later
versions will likely be somewhat different. This may make their dictionaries
incompatible with the present program.
type PART_OF_SPEECH_TYPE is (
   X,         -- Default, "don't care"
   N,
   PRON,
   PACK,      -- PRON with TACKON (-cum,...)
   ADJ,
   ADV,
   V,
   VPAR,      -- Uses Verb stems, no DICT entries
   SUPINE,    -- Uses Verb stems, no DICT entries
   PREP,
   CONJ,
   INTERJ,
   NUM,
   PREFIX,    -- Purely artificial, for computer convenience
   SUFFIX,    -- Purely artificial, for computer convenience
   TACKON);   -- Purely artificial, for computer convenience

type GENDER_TYPE is (X, M, F, N, C);   -- C = Common (M or F)

type CASE_TYPE is (X, NOM, GEN, DAT, ACC, ABL, VOC, LOC);

type NUMBER_TYPE is (X, S, P);

type PERSON_TYPE is range 0..3;

type COMPARISON_TYPE is (POS, COMP, SUPER, X);

type TENSE_TYPE is (
   X,
   PRES,
   IMP,       -- Imperfect (tense)
   FUT,
   PERF,
   PLUP,
   FUTP);

type MOOD_TYPE is (
   X,
   IND,
   SUB,
   IMP,       -- Imperative (mood)
   INF,
   PPL);

type VOICE_TYPE is (X, ACTIVE, PASSIVE);

type NOUN_KIND_TYPE is (
   X,         -- unknown, nondescript
   N,         -- proper Name
   L,         -- Locale, country, city
   W,         -- a place Where
   P,         -- a Person type
   T);        -- a Thing

type PRONOUN_KIND_TYPE is (X, PERS, REL, REFLEX,
                           DEMONS, INTERR, INDEF, ADJECT);

type VERB_KIND_TYPE is (X, TO_BE, IMPERS, GEN, DAT, ABL,
                        TRANS, INTRANS,
                        DEP, SEMIDEP, PERFDEF);

type NUMERAL_KIND_TYPE is (X, CARD, ORD, DIST, ADVERB);
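To show how these codes read in practice, here is a small Python sketch that expands a noun output line into words. It is only an illustration: the lookup tables restate the codes listed above, describe_noun is a hypothetical helper, and the field order is assumed from the noun examples in this document (form, N, declension, variant, case, number, gender, kind).

```python
CASES = {"X": "unknown", "NOM": "nominative", "GEN": "genitive",
         "DAT": "dative", "ACC": "accusative", "ABL": "ablative",
         "VOC": "vocative", "LOC": "locative"}
NUMBERS = {"X": "unknown", "S": "singular", "P": "plural"}
GENDERS = {"X": "unknown", "M": "masculine", "F": "feminine",
           "N": "neuter", "C": "common"}
NOUN_KINDS = {"X": "unknown", "N": "proper name", "L": "locale",
              "W": "place where", "P": "person", "T": "thing"}


def describe_noun(line):
    """Expand a noun line such as 'agricol.arum N 1 1 GEN P M P'."""
    form, pos, decl, variant, case, number, gender, kind = line.split()
    if pos != "N":
        raise ValueError("this sketch only handles noun lines")
    stem, _, ending = form.partition(".")
    return {
        "stem": stem,
        "ending": ending,
        "declension": int(decl),
        "variant": int(variant),
        "case": CASES[case],
        "number": NUMBERS[number],
        "gender": GENDERS[gender],
        "kind": NOUN_KINDS[kind],
    }


print(describe_noun("agricol.arum N 1 1 GEN P M P"))
```

Note that the same letter can carry different meanings by position: in the example line the first P is the number (plural) and the second P is the noun kind (person).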
^
Help for Parameters
This section lists the help available in
CHANGE_PARAMETERS. Mode parameters are displayed with their current values,
either the values read from WORD.MOD, or the default values if there is no
valid mode file. The user can accept the current value by giving a [return].
HAVE_OUTPUT_FILE_HELP :
This option instructs the program to create a file which can hold the
output for later study, otherwise the results are just displayed on
the screen. The output file is named WORD.OUT.
This means that one run will necessarily overwrite a previous run,
unless the previous results are renamed or copied to a file of another
name. Using this output file slows the program, especially if it is
being executed from a floppy; just having it will not matter much.
The default is N(o), since this prevents the program from overwriting
previous work unintentionally. Y(es) creates the output file.
WRITE_OUTPUT_TO_FILE_HELP :
This option instructs the program, when HAVE_OUTPUT_FILE is on, to
write results to the file WORD.OUT.
This option may be turned on and off during running of the program,
thereby capturing only certain desired results.
If the option HAVE_OUTPUT_FILE is off, the user will not be given a
chance to turn this one on. Default is N(o).
DO_UNKNOWNS_ONLY_HELP :
This option instructs the program to only output those words that it
cannot resolve. Of course, it has to do processing on all words, but
those that are found (with prefix/suffix, if that option is on) will
be ignored. The purpose of this option is to allow a quick look to
determine if the dictionary and process is going to do an acceptable
job on the current text. It also allows the user to assemble a list
of unknown words to look up manually, and perhaps augment the system
dictionary. For those purposes, the system is usually run with the
MINIMIZE_OUTPUT option, just producing a list. Another use is to run
without MINIMIZE to an output file. This gives a list of the input
text with the unknown words, by line. This functions as a spelling
checker for Latin. The default is N(o).
WRITE_UNKNOWNS_TO_FILE_HELP :
This option instructs the program to write all unresolved words to a
UNKNOWNS file named WORD.UNK.
With this option on, the file of unknowns is written, even though
the main output contains both known and unknown (unresolved) words.
One may wish to save the unknowns for later analysis, testing, or to
form the basis for dictionary additions. When this option is turned
on, the UNKNOWNS file is written, destroying any file from a previous
run. However, the write may be turned on and off during a single run
without destroying the information written in that run.
This option is for specialized use, so its default is N(o).
DO_FIXES_HELP :
This option instructs the program, when it is unable to find a proper
match in the dictionary, to attach various prefixes and suffixes and
try again. This effort is successful in about a quarter of the cases
which would otherwise give UNKNOWN results, or so it seems in limited
tests. For those cases in which a result is produced, about half give
easily interpreted output; many of the rest are etymologically true,
but not necessarily obvious; about a tenth give entirely spurious
derivations. The user must proceed with caution.
The default choice is Y(es), since the results are generally useful.
This processing can be turned off with the choice of N(o).
DO_ONLY_FIXES_HELP :
This option instructs the program to ignore the normal dictionary
search and to go direct to attach various prefixes and suffixes before
processing. This is a pure research tool. It allows one to examine
the coverage of pure stems and dictionary primary compositions.
This option is only available if DO_FIXES is turned on.
This is entirely a development and research tool, not to be used in
conventional translation situations, so the default choice is N(o).
This processing can be turned on with the choice of Y(es).
DO_FIXES_ANYWAY_HELP :
This option instructs the program to do both the normal dictionary
search and then process for the various prefixes and suffixes too.
This is a pure research tool allowing one to consider the possibility
of strange constructions, even in the presence of conventional
results, e.g., alte => deeply (ADV), but al+t+e => wing+ed (ADJ VOC)
(If multiple suffixes were supported this could also be wing+ed+ly.)
This option is only available if DO_FIXES is turned on.
This is entirely a development and research tool, not to be used in
conventional translation situations, so the default choice is N(o).
This processing can be turned on with the choice of Y(es).
------ PRESENTLY NOT IMPLEMENTED ------
IGNORE_UNKNOWN_NAMES_HELP :
This option instructs the program to assume that any capitalized word
longer than three letters is a proper name. As no dictionary can be
expected to account for many proper names, many such occur that would
be called UNKNOWN. This contaminates the output in most cases, and
it is often convenient to ignore these spurious UNKNOWN hits. This
option implements that mode, and calls such words proper names. Of
course, any proper names that are in the dictionary are handled in the
normal way. The default is Y(es).
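The rule stated above can be sketched in a few lines of Python. This is
only an illustration, not the program's own code; the function name
looks_like_proper_name is hypothetical.

```python
def looks_like_proper_name(word: str) -> bool:
    """Apply the stated rule: capitalized and longer than three letters."""
    return len(word) > 3 and word[0].isupper()


if __name__ == "__main__":
    for word in ["Caesar", "Leo", "arma"]:
        print(word, looks_like_proper_name(word))
```

Note that the rule as stated would also catch an ordinary word that is
capitalized only because it begins a sentence.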
V_TO_U_HELP :
This option instructs the program to assume that the input text holds
to the convention of the new Oxford Latin Dictionary which rejects the
character 'v'. In all places where one might have a 'v', 'u' is used,
with the exception of capital letters in which case only 'V' appears,
for both 'U' and 'V'. If this mode is set No, the program will still
try to make the substitution as a TRICK if no other match is found.
However, if the 'u' convention is used throughout the text, processing
will go much faster if the mode is set. The default is N(o).
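One way to picture the u/v convention is to fold both spellings to a
single comparison key before matching. The Python sketch below assumes
that approach; it is not taken from the program, and the name fold_uv is
hypothetical.

```python
def fold_uv(word: str) -> str:
    """Return a comparison key in which 'u' and 'v' are not distinguished."""
    # Lowercasing first turns both capital 'V' and capital 'U' into
    # lowercase letters, so the capital-letter rule needs no special case.
    return word.lower().replace("v", "u")


if __name__ == "__main__":
    # A text in the 'u' convention and one that keeps 'v' give the same key.
    print(fold_uv("uinum") == fold_uv("vinum"))
    print(fold_uv("Vlixes") == fold_uv("Ulixes"))
```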
DO_TRICKS_HELP :
This option instructs the program, when it is unable to find a proper
match in the dictionary, and after various prefixes and suffixes, to
try every dirty Latin trick it can think of, mainly common letter
replacements like cl -> cul, vul -> vol, ads -> ass, inp -> imp, etc.
Together these tricks are useful, but may give false positives (>10%).
The default choice is Y(es), since the results are sometimes useful,
but expensive. This processing is turned off with the choice of N(o).
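The letter replacements listed above can be sketched as a candidate
generator in Python. This is an illustration only: the names TRICKS and
trick_candidates are hypothetical, the table holds just the four
replacements quoted in the text, and applying them left to right to the
input word is an assumption. Each candidate would still have to be looked
up in the dictionary, which is where the cost and the false positives
come from.

```python
# Only the four replacements named in the help text; the real list is longer.
TRICKS = [("cl", "cul"), ("vul", "vol"), ("ads", "ass"), ("inp", "imp")]


def trick_candidates(word: str) -> list[str]:
    """Return respellings of word, one replacement at one position each."""
    candidates = []
    for old, new in TRICKS:
        start = word.find(old)
        while start != -1:
            candidate = word[:start] + new + word[start + len(old):]
            if candidate not in candidates:
                candidates.append(candidate)
            start = word.find(old, start + 1)
    return candidates


if __name__ == "__main__":
    print(trick_candidates("saeclum"))
    print(trick_candidates("inpius"))
```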
DO_SYNCOPE_HELP :
This option instructs the program to postulate that syncope of
perfect stem verbs may have occurred (e.g., aver -> ar in the perfect),
and to try various possibilities for the insertion of a removed 'v'.
To do this it has to fully process the modified candidates, which can
have a considerable impact on the speed of processing a large file.
However, this trick seldom produces a false positive, and syncope is
very common in Latin (first year texts excepted). Default is Y(es).
This lengthy processing is turned off with the choice of N(o).
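The 'aver -> ar' case mentioned above can be sketched in Python as
follows. This is an assumption-laden illustration, not the program's
algorithm: it handles only that one pattern, the function names are
hypothetical, and a plain set of known forms stands in for the full
processing of each candidate.

```python
def syncope_candidates(word: str) -> list[str]:
    """Expand each 'ar' in word back to 'aver', one position at a time."""
    candidates = []
    start = word.find("ar")
    while start != -1:
        candidates.append(word[:start] + "aver" + word[start + 2:])
        start = word.find("ar", start + 1)
    return candidates


def restore_syncope(word: str, known_forms: set[str]) -> list[str]:
    """Keep only the candidates that the stand-in lookup recognizes."""
    return [c for c in syncope_candidates(word) if c in known_forms]


if __name__ == "__main__":
    known = {"amaverunt"}  # hypothetical stand-in for full word analysis
    print(restore_syncope("amarunt", known))
    print(restore_syncope("arma", known))
```

Every candidate has to go through the lookup, including useless ones such
as the expansion of "arma", which is why the option slows processing.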
DO_COMPOUNDS_HELP :
This option instructs the program to look ahead for the verb TO_BE (or
iri) when it finds a verb participle, with the expectation of finding
a compound perfect tense or periphrastic. The default choice is Y(es).
This processing is turned off with the choice of N(o).
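The look-ahead can be pictured with a short Python sketch. It is only an
illustration under stated assumptions: the two word sets are tiny
hypothetical samples rather than the program's tables, and checking only
the immediately following word is an assumption about how far the program
looks.

```python
# Hypothetical samples: a few present forms of 'to be', plus 'iri'.
AUXILIARIES = {"sum", "es", "est", "sumus", "estis", "sunt", "iri"}
# Hypothetical sample of forms that would be recognized as participles.
PARTICIPLES = {"amatus", "amata", "amatum"}


def find_compounds(words: list[str]) -> list[tuple[str, str]]:
    """Pair each participle with a following auxiliary, if there is one."""
    compounds = []
    for i in range(len(words) - 1):
        if words[i] in PARTICIPLES and words[i + 1] in AUXILIARIES:
            compounds.append((words[i], words[i + 1]))
    return compounds


if __name__ == "__main__":
    print(find_compounds("puella amata est".split()))
```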
DO_EXAMPLES_HELP :
This option instructs the program to provide examples of usage of the
cases/tenses/etc. that were constructed. The default choice is N(o).
This produces lengthy output and is turned on with the choice Y(es).
DO_ONLY_MEANINGS_HELP :
This option instructs the program to only output the MEANING for a
word, and omit the inflection details. This is primarily used in
analyzing new dictionary material, comparing with the existing.
However, it may be of use for the translator who knows almost all of
the words and just needs a little reminder for a few.
The default choice is N(o), but it can be turned on with a Y(es).
DO_STEMS_FOR_UNKNOWN_HELP :
This option instructs the program, when it is unable to find a proper
match in the dictionary, and after various prefixes and suffixes, to
try even dirtier tricks, specifically to try all the dictionary stems
that it finds that fit the letters, independent of whether the endings
match the parts of speech to which the stems are assigned. This will
catch a substantive for which only the ADJ stem appears in dictionary,
an ADJ for which there is only a N stem, etc. It will also list the
various endings that match the end of the input word. A certain
amount of weeding has been done, so only reasonably common endings
are quoted, and these are lumped together masking declension, etc.
Only N, ADJ, and V endings are given, LOC and VOC omitted, etc.
The user can then make his own judgement. This option should
probably only be used with individual UNKNOWN words, and off-line
from full translations, therefore the default choice is N(o).
This processing can be turned on with the choice of Y(es).
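The stem matching described above can be sketched in Python. This is an
illustration only: the STEMS and ENDINGS tables are small hypothetical
samples, and the function name is invented. The point is that a stem is
reported whenever it fits the letters and the remainder is a listed
ending, whatever part of speech the stem belongs to.

```python
# Hypothetical samples; the program's tables are far larger.
STEMS = {"bon": "ADJ", "reg": "N", "am": "V"}
ENDINGS = {"a", "i", "is", "o", "um", "orum"}


def stems_for_unknown(word: str) -> list[tuple[str, str, str]]:
    """List (stem, part of speech, ending) for every stem that fits word."""
    hits = []
    for stem, part_of_speech in STEMS.items():
        if word.startswith(stem):
            ending = word[len(stem):]
            # No check that the ending suits the stem's part of speech.
            if ending in ENDINGS:
                hits.append((stem, part_of_speech, ending))
    return hits


if __name__ == "__main__":
    # A substantive use of 'bonum' is caught through the ADJ stem.
    print(stems_for_unknown("bonum"))
```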
INCLUDE_UNKNOWN_CONTEXT_HELP :
This option instructs the program, when writing to an UNKNOWNS file,
to put out the whole context of the UNKNOWN (the whole input line on
which the UNKNOWN was found). This is appropriate for processing
large text files in which it is expected that there will be relatively
few UNKNOWNS. The main use at the moment is to provide display
of the input line on the output file in the case of UNKNOWNS_ONLY.
With NO you get a minimal list of unknown words, as opposed to a more
frequently desirable listing of the text of the input file with the
unknowns put after each line. The default is Y(es).
MINIMIZE_OUTPUT_HELP :
This option instructs the program to minimize the output. This is a
somewhat flexible term, but the use of this option will probably lead
to less output. The default is Y(es).
TRIM_OUTPUT_HELP :
This option instructs the program to remove from the output list of
possible constructs those which are least likely. At the present
stage, there is not much trimming; however, if the program grows more
powerful this may be a very useful option. Nevertheless, there is no
absolute assurance that the items removed are not correct, just that
they are statistically less likely (e.g., vocatives or locatives in
certain situations). Since little is now done, the default is Y(es).
SAVE_PARAMETERS_HELP :
This option instructs the program to save the current parameters, as
just established by the user, in a file WORD.MOD. If such a file
exists, the program will load those parameters at the start. If no
such file can be found in the current subdirectory, the program will
start with a default set of parameters. Since this parameter file is
human-readable ASCII, it may also be created with a text editor. If
the file found has been improperly created, is in the wrong format, or
otherwise uninterpretable by the program, it will be ignored and the
default parameters used, until a proper parameter file is written by
the program. Since one may want to make temporary changes during a
run, but revert to the usual set, the default is N(o).
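The load-or-fall-back behavior described above can be sketched in Python.
The layout of WORD.MOD is not given here, so the format below (one
"NAME Y" or "NAME N" pair per line) is purely an assumption, as are the
function names. The defaults shown are the ones stated in the help text
for those four options.

```python
from pathlib import Path

# Defaults as stated in the help text for these four options.
DEFAULTS = {
    "DO_TRICKS": True,
    "DO_SYNCOPE": True,
    "DO_COMPOUNDS": True,
    "DO_EXAMPLES": False,
}


def parse_parameters(text: str) -> dict[str, bool]:
    """Parse the assumed 'NAME Y|N' lines; on any bad line use the defaults."""
    params = dict(DEFAULTS)
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split()
        if (len(fields) != 2 or fields[0] not in DEFAULTS
                or fields[1] not in ("Y", "N")):
            return dict(DEFAULTS)  # uninterpretable file: ignore it entirely
        params[fields[0]] = fields[1] == "Y"
    return params


def load_parameters(path: str = "WORD.MOD") -> dict[str, bool]:
    """Load parameters from path, or return the defaults if it is unusable."""
    try:
        text = Path(path).read_text(encoding="ascii")
    except (OSError, UnicodeDecodeError):
        return dict(DEFAULTS)
    return parse_parameters(text)


if __name__ == "__main__":
    print(load_parameters())
```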