This manual describes version 0.1 of FreeDict-Editor.
Copyright © 2005 Michael Bunk <micha@luetzschena.de>
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License (GFDL), Version 1.1 or any later version published by the Free Software Foundation with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. You can find a copy of the GFDL at this link or in the file COPYING-DOCS distributed with this manual.
This manual is part of a collection of GNOME manuals distributed under the GFDL. If you want to distribute this manual separately from the collection, you can do so by adding a copy of the license to the manual, as described in section 6 of the license.
Many of the names used by companies to distinguish their products and services are claimed as trademarks. Where those names appear in any GNOME documentation, and the members of the GNOME Documentation Project are made aware of those trademarks, then the names are in capital letters or initial capital letters.
DOCUMENT AND MODIFIED VERSIONS OF THE DOCUMENT ARE PROVIDED UNDER THE TERMS OF THE GNU FREE DOCUMENTATION LICENSE WITH THE FURTHER UNDERSTANDING THAT:
DOCUMENT IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES THAT THE DOCUMENT OR MODIFIED VERSION OF THE DOCUMENT IS FREE OF DEFECTS MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE QUALITY, ACCURACY, AND PERFORMANCE OF THE DOCUMENT OR MODIFIED VERSION OF THE DOCUMENT IS WITH YOU. SHOULD ANY DOCUMENT OR MODIFIED VERSION PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE INITIAL WRITER, AUTHOR OR ANY CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR OR CORRECTION. THIS DISCLAIMER OF WARRANTY CONSTITUTES AN ESSENTIAL PART OF THIS LICENSE. NO USE OF ANY DOCUMENT OR MODIFIED VERSION OF THE DOCUMENT IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER; AND
UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE, SHALL THE AUTHOR, INITIAL WRITER, ANY CONTRIBUTOR, OR ANY DISTRIBUTOR OF THE DOCUMENT OR MODIFIED VERSION OF THE DOCUMENT, OR ANY SUPPLIER OF ANY OF SUCH PARTIES, BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER DAMAGES OR LOSSES ARISING OUT OF OR RELATING TO USE OF THE DOCUMENT AND MODIFIED VERSIONS OF THE DOCUMENT, EVEN IF SUCH PARTY SHALL HAVE BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES.
Feedback
To report a bug or make a suggestion regarding the FreeDict-Editor
application or this manual, use the FreeDict
Bug Tracking System, provided by SourceForge, using the
category FreeDict-Editor.
| Revision History | |
|---|---|
| Revision FreeDict-Editor Manual V0.1 | 2005-03-22 |
| FreeDict | |
The FreeDict-Editor enables you to create and edit dictionaries. It provides the following features:
Partial support for the Text Encoding Initiative eXtensible Markup Language, Version 4, Chapter 12 "Print Dictionaries" file format (TEI P4 XML). The generated file is always valid.
XPath support to flexibly select entries to be edited.
An easy-to-use Form View for editing entries, with a fallback to plain XML for entries that do not match the structure of the form.
Support for entering pronunciation in IPA characters as well as some hints on entering special characters under X11.
HTML preview of a formatted entry using a user-configurable XSLT stylesheet.
When adding entries, the workflow helps you avoid duplicate entries for the same word.
Integrated sanity checks and a spell checker support high-quality authoring.
Ability to abort a never-ending XPath evaluation (this is not something to take for granted!).
FreeDict-Editor has the following software requirements:
GNU/Linux operating system
libgtkhtml (it doesn't seem to have its own homepage)
libgnomeui-2.0, libbonoboui-2.0 and gconf-2.0 - these are all GNOME libraries
The following additional software is recommended, but a build should proceed without it:
FreeDict-Editor requires a lot of memory. The virtual memory usage of a FreeDict-Editor process as shown by the ps command is as follows:
This section mostly assumes that you have already started FreeDict-Editor.
Before you can work on a dictionary, you have to open one. You can open a TEI file in the following ways:
This is not worth mentioning, since it is a standard dialog.
But one thing deserves mentioning, even though it is not specific to this application: the standard GTK+ file dialog supports filename completion with the TAB key. I always considered the GTK+ standard file dialog inferior to others (e.g. those from KDE or M$ Windoze) because it lacks favorite locations and makes directory traversal cumbersome. Filename completion makes up for that a bit.
As of GTK+ 2.4, GtkFileChooser is available, which supports favorite locations.
To start FreeDict-Editor and open a TEI file from a command line, type the following command, then press Return:
freedict-editor filename.tei
where filename.tei is the name of
the dictionary file you want to open. You can specify only one
file to open.
The following line might come in handy in your
~/.bashrc, because it enables filename
completion of TEI files when you have already entered the
command freedict-editor:
complete -G '*.tei' -X '!&*' freedict-editor
You can also drag a TEI file from another application, such as the Nautilus file manager, to the FreeDict-Editor window to open it.
Do not drop it on the list of matching entries; the application does not handle that correctly. This is a bug that should be fixed.
If the mime-info files were installed in the right location, double-clicking a TEI file in the Nautilus file manager should open it with FreeDict-Editor by default.
When you start FreeDict-Editor, the following window is displayed.
The FreeDict-Editor window contains the following elements:
These are standard GUI elements that need little explanation. Remember that a toolbar generally offers only a subset of the functions reachable via the menubar, for faster access.
These are two input fields. The XPath Template contains a
template. It may contain the string %s, which
is replaced by the contents of the Select Input Field.
To use a literal % character in
the template, escape it by preceding it with
another % character.
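The substitution rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the code FreeDict-Editor uses; the function name expand_template and the example template are assumptions made for this sketch.

```python
def expand_template(template: str, selection: str) -> str:
    """Replace %s with the selection and %% with a literal percent sign."""
    result = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "s":
                # %s stands for the contents of the Select Input Field
                result.append(selection)
                i += 2
                continue
            if nxt == "%":
                # %% is the escape for a single literal % character
                result.append("%")
                i += 2
                continue
        # any other character is copied unchanged
        result.append(ch)
        i += 1
    return "".join(result)


# Hypothetical template and input, chosen only for illustration.
template = "//entry[form/orth[contains(., '%s')]]"
print(expand_template(template, "house"))
```

Note that this sketch inserts the selection verbatim. A selection that itself contains a quote character would change the structure of the resulting XPath expression.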
After every change of the Select Input Field, the list shows the headwords of the entries that matched the XPath expression, limited to 50 entries. You can double-click any of them to open it in the Entry Editor.
This is the heart of the application. The editor can work in two modes. If the entry matches a certain format (XXX document that format!), it can be edited by filling in the fields of a form. Otherwise, the plain XML source has to be edited.
To enter characters of the International Phonetic Alphabet, the IPA Input Method of GTK+ can be used. The global GTK+ Input Method is automatically switched to IPA mode when the Pronunciation field is entered. The user's previously selected input method is restored when the Pronunciation field is left.
To manually choose a GTK+ Input Method, right-click any input field except the Pronunciation field (since that is handled automatically), e.g. the Orthography field in the Form View of the Entry Editor. Then choose your desired input method from those offered in the Input Methods submenu of the context menu.
You can now use the following key sequences to compose IPA characters:
Table 2. Key Sequences of the IPA Input Method
| Key Sequence | IPA Character | Character Name |
|---|---|---|
| GDK_ampersand | ɣ | LATIN SMALL LETTER GAMMA |
| GDK_apostrophe | ˈ | MODIFIER LETTER VERTICAL LINE |
| GDK_slash + GDK_apostrophe | ˊ | MODIFIER LETTER ACUTE ACCENT |
| GDK_slash + GDK_slash | / | SOLIDUS |
| GDK_slash + GDK_3 | ɛ | LATIN SMALL LETTER OPEN E |
| GDK_slash + GDK_A | ɒ | LATIN LETTER TURNED ALPHA |
| GDK_slash + GDK_R | ʁ | LATIN LETTER SMALL CAPITAL INVERTED R |
| GDK_slash + GDK_a | ɐ | LATIN SMALL LETTER TURNED A |
| GDK_slash + GDK_c | ɔ | LATIN SMALL LETTER OPEN O |
| GDK_slash + GDK_e | ə | LATIN SMALL LETTER SCHWA |
| GDK_slash + GDK_h | ɥ | LATIN SMALL LETTER TURNED H |
| GDK_slash + GDK_m | ɯ | LATIN SMALL LETTER TURNED M |
| GDK_slash + GDK_r | ɹ | LATIN SMALL LETTER TURNED R |
| GDK_slash + GDK_v | ʌ | LATIN SMALL LETTER TURNED V |
| GDK_slash + GDK_w | ʍ | LATIN SMALL LETTER TURNED W |
| GDK_slash + GDK_y | ʎ | LATIN SMALL LETTER TURNED Y |
| GDK_3 | ʒ | LATIN SMALL LETTER EZH |
| GDK_colon | ː | MODIFIER LETTER TRIANGULAR COLON |
| GDK_A | ɑ | LATIN SMALL LETTER ALPHA |
| GDK_E | ɛ | LATIN SMALL LETTER OPEN E |
| GDK_I | ɪ | LATIN LETTER SMALL CAPITAL I |
| GDK_L | ʟ | LATIN LETTER SMALL CAPITAL L |
| GDK_M | ʍ | LATIN SMALL LETTER TURNED W |
| GDK_O | O | LATIN LETTER SMALL CAPITAL OE |
| GDK_O + GDK_E | ɶ | LATIN LETTER SMALL CAPITAL OE |
| GDK_R | ʀ | LATIN LETTER SMALL CAPITAL R |
| GDK_U | ʊ | LATIN SMALL LETTER UPSILON |
| GDK_Y | ʏ | LATIN LETTER SMALL CAPITAL Y |
| GDK_grave | ˌ | MODIFIER LETTER LOW VERTICAL LINE |
| GDK_a | a | LATIN SMALL LETTER A |
| GDK_a + GDK_e | æ | LATIN SMALL LETTER AE |
| GDK_c | c | LATIN SMALL LETTER C |
| GDK_c + GDK_comma | ç | LATIN SMALL LETTER C WITH CEDILLA |
| GDK_d | d | LATIN SMALL LETTER E |
| GDK_d + GDK_apostrophe | d | LATIN SMALL LETTER D |
| GDK_d + GDK_h | ð | LATIN SMALL LETTER ETH |
| GDK_e | e | LATIN SMALL LETTER E |
| GDK_e + GDK_minus | ɚ | LATIN SMALL LETTER SCHWA WITH HOOK |
| GDK_e + GDK_bar | ɚ | LATIN SMALL LETTER SCHWA WITH HOOK |
| GDK_g | g | LATIN SMALL LETTER G |
| GDK_g + GDK_n | ɲ | LATIN SMALL LETTER N WITH LEFT HOOK |
| GDK_i | i | LATIN SMALL LETTER I |
| GDK_i + GDK_minus | ɨ | LATIN SMALL LETTER I WITH STROKE |
| GDK_n | n | LATIN SMALL LETTER N |
| GDK_n + GDK_g | ŋ | LATIN SMALL LETTER ENG |
| GDK_o | o | LATIN SMALL LETTER O |
| GDK_o + GDK_minus | ɵ | LATIN LETTER BARRED O |
| GDK_o + GDK_slash | ø | LATIN SMALL LETTER O WITH STROKE |
| GDK_o + GDK_e | œ | LATIN SMALL LIGATURE OE |
| GDK_o + GDK_bar | ɑ | LATIN SMALL LETTER ALPHA |
| GDK_s | s | LATIN SMALL LETTER_ESH |
| GDK_s + GDK_h | ʃ | LATIN SMALL LETTER ESH |
| GDK_t | t | LATIN SMALL LETTER T |
| GDK_t + GDK_h | θ | GREEK SMALL LETTER THETA |
| GDK_u | u | LATIN SMALL LETTER U |
| GDK_u + GDK_minus | ʉ | LATIN LETTER U BAR |
| GDK_z | z | LATIN SMALL LETTER Z |
| GDK_z + GDK_h | ʒ | LATIN SMALL LETTER EZH |
| GDK_bar + GDK_o | ɒ | LATIN LETTER TURNED ALPHA |
| GDK_asciitilde | ̃ | COMBINING TILDE |
These sequences were extracted from the file
modules/input/imipa.c in the GTK+ 2.6.1
distribution. The lines GDK_O / O / LATIN LETTER SMALL
CAPITAL OE, GDK_d / d / LATIN SMALL LETTER
E and GDK_s / s / LATIN SMALL
LETTER_ESH are obviously wrong. That is a mistake in
imipa.c that I should have reported.
Other applications supporting IPA Characters are
Yudit and Emacs.
Please refer to files like
/usr/share/yudit/src/ASCII-IPA.kmap and
/usr/share/xemacs/mule-packages/etc/mule-ucs/reldata/uipa.el
for inspiration in case the GTK+ IPA Input Method lacks any character.
The ultimate reference regarding IPA characters is the Unicode Standard. The character range 0x250-0x2af alone covers at least 85 characters, so some might be missing here! You can look at the Unicode Code Charts for Symbols.
In gtk/gtkimcontextsimple.c it is documented
that you can enter Unicode characters by holding down Shift + Control
and entering the hexadecimal code of the Unicode character you want
to enter.
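If you are unsure which hexadecimal code belongs to which character, you can check it outside the editor. The following Python sketch uses only the standard unicodedata module; the helper name describe is an assumption made for this example.

```python
import unicodedata


def describe(hex_code: str) -> tuple[str, str]:
    """Return the character for a hexadecimal code point and its Unicode name."""
    char = chr(int(hex_code, 16))
    return char, unicodedata.name(char)


# A few code points from the IPA-related ranges mentioned above.
for code in ("0259", "0283", "02D0"):
    char, name = describe(code)
    print(f"U+{code} {char} {name}")
```

The same lookup is a quick way to compare a character against the names listed in Table 2.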
This functionality is not provided by FreeDict-Editor itself. But many languages require accented and other special characters to be entered, hence this word of advice.
One way to enter special characters is to use GTK+ Input Methods. In FreeDict-Editor this way is mainly used for entering IPA Characters, but it can be used to enter other characters as well. See the section called “How to Enter IPA Characters”.
In the X Window System, the X Keyboard Extension (XKB) is responsible for translating numeric scancodes sent from the keyboard into symbolic keysyms that are handed over to the application. XKB can also compose characters out of combining (among them diacritical) marks and base characters. For the compose function to work, a key with the compose function is required, as well as "dead" keyboard keys. Those dead keys will hold the combining marks and will not produce a character immediately after being pressed. They are composed with other characters.
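The idea of composing a base character with a combining mark can be illustrated in Unicode terms. The Python sketch below uses Unicode normalization from the standard library; it does not reproduce how X performs the composition, and the function name compose is an assumption made for this example.

```python
import unicodedata


def compose(base: str, combining_mark: str) -> str:
    """Combine a base character and a combining mark into one precomposed character."""
    return unicodedata.normalize("NFC", base + combining_mark)


COMBINING_ACUTE = "\u0301"  # the mark a dead_acute key stands for
COMBINING_TILDE = "\u0303"  # the mark a dead_tilde key stands for

print(compose("e", COMBINING_ACUTE))
print(compose("n", COMBINING_TILDE))
```

Pressing a dead key followed by a base character has a comparable effect: the two key presses together yield a single accented character.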
The most authoritative source for XKB is "The X Keyboard
Extension: Protocol Specification" that could be found at
/mnt/hdb2/cygwin/usr/x11r6/lib/x11/doc/PostScript/XKBproto.PS
on my system.
An exemplary keyboard layout with dead keys is
latin. The following table shows by which key combinations
the dead keys are reachable (using the "basic" section of that file;
there are other sections for example for a German keyboard model). The
information presented here was extracted from
/etc/X11/xkb/symbols/pc/latin. The file comments
say it is Revision 1.4 from 2003/01/26. The above path is valid for my
SuSE 9.0 system and may differ on yours.
Table 3. Dead keys of the latin keyboard layout
| Keysym without any modifier / Keycap | Keysym with Shift modifier only / Keycap | Keysym with ISO_Level3_Shift modifier only | Keysym with Shift+ISO_Level3_Shift modifiers |
|---|---|---|---|
| equal / = | plus / + | dead_cedilla | dead_ogonek |
| bracketleft / [ | braceleft / { | dead_diaeresis | dead_abovering |
| bracketright / ] | braceright / } | dead_tilde | dead_macron |
| semicolon / ; | colon / : | dead_acute | dead_doubleacute |
| apostrophe / ' | quotedbl / " | dead_circumflex | dead_caron |
| backslash / \ | bar / \| | dead_grave | dead_breve |
| slash / / | question / ? | dead_belowdot | dead_abovedot |
Another keyboard layout promising many dead keys is
us_intl. But its disadvantage is that it
interferes with the multi layout concept because it uses group 2
instead of shift levels 3 and 4.
For us_intl to be useful, you must have an
easy way to switch to group 2, eg. by defining the right Alt key to
function as Mode_switch.
The setxkbmap command can be used to configure XKB. For example, enter the following command into an xterm:
setxkbmap latin,de -option compose:ralt -option grp:alt_shift_toggle
The compose:ralt option puts the compose
function on the right Alt key. That key will produce
the LevelThree keysym, which will be interpreted
as ISO_Level3_Shift modifier.
The above command also sets up a multi-layout configuration. Initially, the latin layout is active. Using the group toggle keys Shift+Left Alt it is possible to switch permanently between the two layouts. Permanently here means that the layout switched to stays enabled even after the toggle keys are released.
An alternative to toggling is to switch between layouts only temporarily. The Mode_switch key can do this by selecting the next group while it is pressed. Thus, it acts like another modifier.
You can use the command setxkbmap -print to
find out the current xkb settings.
Another nice thing of XKB is its ability to show the current
keyboard configuration using the current keyboard geometry. Enter the
following command:
xkbprint -color -lg group -ll shiftlevel :0 - | gv -seascape -
group must be replaced by the keyboard
group to be shown (1-4). shiftlevel must
also be replaced by a number from 1-4 - out of which 1 (the default)
and 3 are most useful. The first part of this command produces a
PostScript file. To view it with the second part of this command, the
gv command must be available, ie. the
ghostview application must be
installed.
Another way of configuring the XKB extension is to specify the XKB
options in the InputDevice Section of the XF86Config file.
The manual page keyboard(4) describes how to specify XKB
options there (But it does not explicitly mention how to specify
multiple values for XkbOptions. xf86config
separates values by comma). XXX extend keyboard(4) man page. An example is:
# from /etc/X11/XF86Config
Section "InputDevice"
  Driver       "Keyboard"
  Identifier   "Keyboard[0]"
  Option       "AutoRepeat" "200 39"
  Option       "Protocol" "Standard"
  Option       "XkbLayout" "latin,de"
  Option       "XkbModel" "pc104"
  Option       "XkbRules" "xfree86"
  Option       "XkbVariant" "xfree86"
  Option       "XkbOptions" "grp:alt_shift_toggle,grp:switch"
EndSection
Before editing entries, you have to select them from all the entries in the currently open dictionary. To select entries, you write an XPath expression, which the computer evaluates. The result is a set of matching entries.
The XPath language is very powerful and because of that it can be complicated.
FreeDict-Editor has two input fields to support you in entering XPath expressions for
selecting entries. The Input Field XPath Template lets you enter
the actual XPath expression. In it you are allowed to write %s,
which will be replaced by the contents of the Select Input Field.
Normally you will modify the Select Input Field much more often than
the template.
When an XPath evaluation takes too much time (this happens especially when you use
the // operator), you can abort it. To do so, just click
the Abort XPath Evaluation button.
FreeDict-Editor extends the functions available in XPath expressions. For this it uses the
namespace prefix fd for the namespace
http://freedict.org/freedict-editor.
Currently only one XPath extension function exists:
bool unbalanced-braces(nodeset)
checks the string representation of each node in the nodeset and returns true() when any node's string representation has unbalanced braces.
It is used in the sanity check for unbalanced braces.
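What such a check does can be sketched in Python. This is an illustration only and not the implementation used by FreeDict-Editor. In particular, the set of brace characters and the requirement that braces nest properly are assumptions of this sketch, and the strings passed in stand for the string representations of the nodes.

```python
# Assumption: round, square and curly braces are checked.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def is_unbalanced(text: str) -> bool:
    """Return True if the braces in text do not pair up properly."""
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # a closing brace must match the most recent opening brace
            if not stack or stack.pop() != PAIRS[ch]:
                return True
    # leftover opening braces were never closed
    return bool(stack)


def unbalanced_braces(node_strings) -> bool:
    """Return True when any of the given strings has unbalanced braces."""
    return any(is_unbalanced(s) for s in node_strings)


print(unbalanced_braces(["to run (fast)", "house"]))
print(unbalanced_braces(["house", "to run (fast"]))
```

The first call reports no problem, while the second one flags the entry whose opening brace is never closed.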
One might wonder what a spell checker is doing in a dictionary editor. Most new dictionaries will be developed for languages which don't have a spellchecking wordlist yet. But just consider the other language of the dictionary. Mostly it will be English, for which good spellchecking lists do exist! Also, you could generate a spellchecking wordlist from the headwords and check the example sentences with it!
So much for the motivation. Just one number: in my Khasi-English dictionary of 2280 entries I had 10 spelling mistakes.
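Generating a wordlist from the headwords could look roughly like the following Python sketch. It assumes that headwords are stored in orth elements without an XML namespace, as is usual for TEI P4 dictionary files. The sample document and the function name headword_list are made up for this example.

```python
import xml.etree.ElementTree as ET

# Made-up sample data; a real dictionary file would be read from disk instead.
SAMPLE_TEI = """\
<TEI.2>
  <text>
    <body>
      <entry><form><orth>house</orth></form></entry>
      <entry><form><orth>apple tree</orth></form></entry>
      <entry><form><orth>house</orth></form></entry>
    </body>
  </text>
</TEI.2>
"""


def headword_list(tei_source: str) -> list[str]:
    """Collect the distinct words of all headwords, sorted alphabetically."""
    root = ET.fromstring(tei_source)
    words = set()
    for orth in root.iter("orth"):
        text = "".join(orth.itertext()).strip()
        # multi-word headwords contribute each of their words
        words.update(text.split())
    return sorted(words)


# Print one word per line, a common layout for plain wordlists.
for word in headword_list(SAMPLE_TEI):
    print(word)
```

Whether such a list can be fed directly to a particular spell checker depends on the wordlist format that spell checker expects.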
To configure FreeDict-Editor, choose → . The Preferences dialog contains the following tabbed sections:
Parameter Entities (i.e. entities for use in DTDs) in the internal DTD subset are dereferenced (i.e. replaced by their replacement text); you see the result after saving and reloading.
General Entities (i.e. entities for use in the document content) are not handled transparently by XSLT. They are replaced by their replacement text and should be avoided from the beginning.
Memory leaks are likely.
Some deprecated and/or private fields and functions of GTK+, libgnomeui and libbonoboui are used.
This section contains links to sources of further information.
This document assumes that you are familiar with the FreeDict HOWTO
"Dictionary Editor and Browser" as presented in "Lexical Databases in XML" by Pavel Smrz and Martin Povolny, Faculty of Informatics, Masaryk University Brno, Botanicka, 68a, 602 00 Brno, Czech Republic, E-mail: {smrz,xpovolny}@fi.muni.cz (XXX add link) - DEB is a system that might be more suited to large projects than FreeDict-Editor
SIL Dictionary Development Program - XXX quote from its website
FreeDict-Editor was written by Michael Bunk
<micha@luetzschena.de>. To find more information, please
visit freedict.org.
To report a bug or make a suggestion regarding this application or this manual, follow the directions in the Feedback paragraph (XXX link to releaseinfo/legalnotice - but that legal notice actually only belongs to this doc).
This program is distributed under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. A copy of this license can be found at this link, or in the file COPYING included with the source code of this program.