FreeDict-Editor Manual V0.1

Michael Bunk

FreeDict

This manual describes version 0.1 of FreeDict-Editor.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License (GFDL), Version 1.1 or any later version published by the Free Software Foundation with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. You can find a copy of the GFDL at this link or in the file COPYING-DOCS distributed with this manual.

This manual is part of a collection of GNOME manuals distributed under the GFDL. If you want to distribute this manual separately from the collection, you can do so by adding a copy of the license to the manual, as described in section 6 of the license.

Many of the names used by companies to distinguish their products and services are claimed as trademarks. Where those names appear in any GNOME documentation, and the members of the GNOME Documentation Project are made aware of those trademarks, then the names are in capital letters or initial capital letters.

DOCUMENT AND MODIFIED VERSIONS OF THE DOCUMENT ARE PROVIDED UNDER THE TERMS OF THE GNU FREE DOCUMENTATION LICENSE WITH THE FURTHER UNDERSTANDING THAT:

  1. DOCUMENT IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES THAT THE DOCUMENT OR MODIFIED VERSION OF THE DOCUMENT IS FREE OF DEFECTS MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE QUALITY, ACCURACY, AND PERFORMANCE OF THE DOCUMENT OR MODIFIED VERSION OF THE DOCUMENT IS WITH YOU. SHOULD ANY DOCUMENT OR MODIFIED VERSION PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE INITIAL WRITER, AUTHOR OR ANY CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR OR CORRECTION. THIS DISCLAIMER OF WARRANTY CONSTITUTES AN ESSENTIAL PART OF THIS LICENSE. NO USE OF ANY DOCUMENT OR MODIFIED VERSION OF THE DOCUMENT IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER; AND

  2. UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE, SHALL THE AUTHOR, INITIAL WRITER, ANY CONTRIBUTOR, OR ANY DISTRIBUTOR OF THE DOCUMENT OR MODIFIED VERSION OF THE DOCUMENT, OR ANY SUPPLIER OF ANY OF SUCH PARTIES, BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER DAMAGES OR LOSSES ARISING OUT OF OR RELATING TO USE OF THE DOCUMENT AND MODIFIED VERSIONS OF THE DOCUMENT, EVEN IF SUCH PARTY SHALL HAVE BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES.

Feedback

To report a bug or make a suggestion regarding the FreeDict-Editor application or this manual, use the FreeDict Bug Tracking System, provided by SourceForge, using the category FreeDict-Editor.

Revision History
Revision FreeDict-Editor Manual V0.1    2005-03-22

Michael Bunk

FreeDict


Table of Contents

Introduction
Requirements
Getting Started
  Opening a TEI dictionary
  When You Start FreeDict-Editor
Usage
  How to Enter IPA Characters
  How to Enter Special Characters
  How to Select Entries
  How to Spell Check
Settings
  General
  Entry Editor
Known Bugs and Limitations
Links
About FreeDict-Editor

Introduction

The FreeDict-Editor enables you to create and edit dictionaries. It provides the following features:

  • Partial support for the Text Encoding Initiative eXtensible Markup Language, Version 4, Chapter 12 "Print Dictionaries" file format (TEI P4 XML). The generated file is always valid.

  • XPath support to flexibly select entries to be edited.

  • Easy, user-friendly Form View for editing entries, with a fallback to plain XML for entries that do not match the structure of the Form.

  • Support for entering pronunciation in IPA characters as well as some hints on entering special characters under X11.

  • HTML preview of a formatted entry using a user-configurable XSLT stylesheet.

  • When adding entries, the workflow ensures that duplicate entries for the same word are avoided.

  • Integrated sanity checks and a spell checker support high-quality authoring.

  • Ability to abort a never-ending XPath evaluation (this is not a matter of course!).

Requirements

FreeDict-Editor has the following software requirements:

The following additional software is recommended, but a build should proceed without it:

FreeDict-Editor requires a lot of memory. The virtual memory usage of a FreeDict-Editor process as shown by the ps command is as follows:

Table 1. FreeDict-Editor Memory Usage

Entries    Virtual Memory Usage in MB
0          11
8 000      30
80 000     140
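
To get a rough idea of the memory needed for a dictionary of a different size, the three measurements can be interpolated. The following is only a sketch: it assumes that memory grows roughly linearly between (and beyond) the measured points, which the table suggests but does not prove, and the function name estimate_mb is made up for this illustration.

```python
# Measurements from Table 1: (number of entries, virtual memory in MB).
MEASURED = [(0, 11), (8_000, 30), (80_000, 140)]


def estimate_mb(entries: int) -> float:
    """Estimate virtual memory usage by piecewise linear interpolation.

    Assumption: growth is linear within each segment; sizes beyond the
    last measurement are extrapolated with the slope of the last segment.
    """
    if entries < 0:
        raise ValueError("number of entries must not be negative")
    # Walk the segments; stop at the first one that contains `entries`.
    # If none does, the loop variables keep the values of the last segment.
    for (x0, y0), (x1, y1) in zip(MEASURED, MEASURED[1:]):
        if entries <= x1:
            break
    return y0 + (entries - x0) * (y1 - y0) / (x1 - x0)


if __name__ == "__main__":
    for n in (0, 8_000, 44_000, 80_000):
        print(n, estimate_mb(n))
```

The numbers were taken on one particular system, so treat any estimate as an order of magnitude rather than a guarantee.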

Getting Started

This section mostly assumes that you have already started FreeDict-Editor.

Opening a TEI dictionary

Before you can work on a dictionary, you have to open one. You can open a TEI file in the following ways:

Open File Dialog

This is a standard dialog and needs little explanation.

One thing deserves mention, even though it is not specific to this application: the standard GTK+ file dialog supports filename completion with the Tab key. I always considered the standard GTK+ file dialog inferior to others (e.g. those of KDE or Microsoft Windows) because it lacks favorite locations and makes directory traversal cumbersome. Filename completion makes up for this somewhat.

Note

As of GTK+ 2.4, GtkFileChooser is available, which supports favorite locations.

Command line

To start FreeDict-Editor and open a TEI file from a command line, type the following command, then press Return:

freedict-editor filename.tei

where filename.tei is the name of the dictionary file you want to open. You can specify only one file to open.

Tip

The following line might come in handy in your ~/.bashrc, because it enables filename completion of TEI files when you have already entered the command freedict-editor:

complete -G '*.tei' -X '!&*' freedict-editor

Drag-And-Drop

You can also drag a TEI file from another application such as a file manager like Nautilus to the FreeDict-Editor window to open it.

Warning

Do not drop the file on the list of matching entries; this is not handled correctly. This is a bug that should be fixed.

Double-click

If the mime-info files were installed in the right location, double-clicking a TEI file in the Nautilus file manager should open it with FreeDict-Editor by default.

When You Start FreeDict-Editor

When you start FreeDict-Editor, the following window is displayed.

The FreeDict-Editor window contains the following elements:

Menubar, Toolbar, Statusbar

These are standard GUI elements that need little explanation. Remember that a toolbar generally offers only a subset of the functions reachable via the Menubar, for faster access.

XPath Matching Controls

These are two input fields. The XPath Template field holds a template for the XPath expression. The template may contain the string %s, which is replaced by the contents of the Select Input Field. To use a literal % character in the template, escape it by preceding it with another % character.
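
The substitution rule can be sketched as follows. This is an illustration of the rule as described here, not the application's actual code; the function name expand_template and the sample template are made up.

```python
def expand_template(template: str, selection: str) -> str:
    """Replace %s by `selection` and %% by a literal % character.

    Any other use of % is copied unchanged (an assumption; the manual
    does not say how such input is treated).
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "s":
                out.append(selection)
                i += 2
                continue
            if nxt == "%":
                out.append("%")
                i += 2
                continue
        out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical template: entries whose headword starts with the selection.
    template = "//entry[starts-with(form/orth, '%s')]"
    print(expand_template(template, "hou"))
    print(expand_template("//entry[contains(., '100%%')]", "ignored"))
```

A template without %s is returned unchanged, so a fixed XPath expression can be used as a template as well.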

Matching Entries

After every change to the Select Input Field, the list shows the headwords of the entries that matched the XPath expression, limited to 50 entries. You can double-click any of them to open it in the Entry Editor.
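
The following sketch shows the idea of listing headwords with a cap of 50. It is not the application's implementation: it uses Python's xml.etree.ElementTree, which supports only a small subset of XPath, so the matching itself is done by a plain prefix test in Python. The element names entry, form and orth follow the TEI dictionary structure; the generated sample data and the function name matching_headwords are made up.

```python
import xml.etree.ElementTree as ET

MAX_SHOWN = 50  # the list of matching entries is limited to 50


def matching_headwords(root, prefix, limit=MAX_SHOWN):
    """Return the headwords of entries whose first <orth> starts with prefix."""
    headwords = []
    for entry in root.iter("entry"):
        orth = entry.findtext("form/orth", default="")
        if orth.startswith(prefix):
            headwords.append(orth)
            if len(headwords) == limit:
                break  # stop early once the cap is reached
    return headwords


# Hypothetical miniature dictionary body with 120 generated entries.
body = ET.Element("body")
for i in range(120):
    entry = ET.SubElement(body, "entry")
    form = ET.SubElement(entry, "form")
    ET.SubElement(form, "orth").text = f"word{i:03d}"

if __name__ == "__main__":
    print(len(matching_headwords(body, "word")))
    print(matching_headwords(body, "word11"))
```

Stopping at the cap keeps the list responsive even when a short selection matches most of a large dictionary.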

Entry Editor

This is the heart of the application. The editor can operate in two modes. If the entry matches a certain format (XXX document that format!), it can be edited by filling in the fields of a Form. Otherwise, the plain XML source has to be edited.

Usage

How to Enter IPA Characters

To enter characters of the International Phonetic Alphabet, the IPA Input Method of GTK+ can be used. The global GTK+ Input Method is automatically switched to IPA mode when the Pronunciation field is entered. The user's previously selected input method is restored when the Pronunciation field is left.

To choose a GTK+ Input Method manually, right-click on any input field except the Pronunciation field (since that one is handled automatically), e.g. the Orthography field in the Form View of the Entry Editor. Choose the desired input method from those offered in the Input Methods submenu.

You can now use the following key sequences to compose IPA characters:

Table 2. Key Sequences of the IPA Input Method

Key Sequence                 IPA Character  Character Name
GDK_ampersand                ɣ              LATIN SMALL LETTER GAMMA
GDK_apostrophe               ˈ              MODIFIER LETTER VERTICAL LINE
GDK_slash + GDK_apostrophe   ˊ              MODIFIER LETTER ACUTE ACCENT
GDK_slash + GDK_slash        /              SOLIDUS
GDK_slash + GDK_3            ɛ              LATIN SMALL LETTER OPEN E
GDK_slash + GDK_A            ɒ              LATIN LETTER TURNED ALPHA
GDK_slash + GDK_R            ʁ              LATIN LETTER SMALL CAPITAL INVERTED R
GDK_slash + GDK_a            ɐ              LATIN SMALL LETTER TURNED A
GDK_slash + GDK_c            ɔ              LATIN SMALL LETTER OPEN O
GDK_slash + GDK_e            ə              LATIN SMALL LETTER SCHWA
GDK_slash + GDK_h            ɥ              LATIN SMALL LETTER TURNED H
GDK_slash + GDK_m            ɯ              LATIN SMALL LETTER TURNED M
GDK_slash + GDK_r            ɹ              LATIN SMALL LETTER TURNED R
GDK_slash + GDK_v            ʌ              LATIN SMALL LETTER TURNED V
GDK_slash + GDK_w            ʍ              LATIN SMALL LETTER TURNED W
GDK_slash + GDK_y            ʎ              LATIN SMALL LETTER TURNED Y
GDK_3                        ʒ              LATIN SMALL LETTER EZH
GDK_colon                    ː              MODIFIER LETTER TRIANGULAR COLON
GDK_A                        ɑ              LATIN SMALL LETTER ALPHA
GDK_E                        ɛ              LATIN SMALL LETTER OPEN E
GDK_I                        ɪ              LATIN LETTER SMALL CAPITAL I
GDK_L                        ʟ              LATIN LETTER SMALL CAPITAL L
GDK_M                        ʍ              LATIN SMALL LETTER TURNED W
GDK_O                        O              LATIN LETTER SMALL CAPITAL OE
GDK_O + GDK_E                ɶ              LATIN LETTER SMALL CAPITAL OE
GDK_R                        ʀ              LATIN LETTER SMALL CAPITAL R
GDK_U                        ʊ              LATIN SMALL LETTER UPSILON
GDK_Y                        ʏ              LATIN LETTER SMALL CAPITAL Y
GDK_grave                    ˌ              MODIFIER LETTER LOW VERTICAL LINE
GDK_a                        a              LATIN SMALL LETTER A
GDK_a + GDK_e                æ              LATIN SMALL LETTER AE
GDK_c                        c              LATIN SMALL LETTER C
GDK_c + GDK_comma            ç              LATIN SMALL LETTER C WITH CEDILLA
GDK_d                        d              LATIN SMALL LETTER E
GDK_d + GDK_apostrophe       d              LATIN SMALL LETTER D
GDK_d + GDK_h                ð              LATIN SMALL LETTER ETH
GDK_e                        e              LATIN SMALL LETTER E
GDK_e + GDK_minus            ɚ              LATIN SMALL LETTER SCHWA WITH HOOK
GDK_e + GDK_bar              ɚ              LATIN SMALL LETTER SCHWA WITH HOOK
GDK_g                        g              LATIN SMALL LETTER G
GDK_g + GDK_n                ɲ              LATIN SMALL LETTER N WITH LEFT HOOK
GDK_i                        i              LATIN SMALL LETTER I
GDK_i + GDK_minus            ɨ              LATIN SMALL LETTER I WITH STROKE
GDK_n                        n              LATIN SMALL LETTER N
GDK_n + GDK_g                ŋ              LATIN SMALL LETTER ENG
GDK_o                        o              LATIN SMALL LETTER O
GDK_o + GDK_minus            ɵ              LATIN LETTER BARRED O
GDK_o + GDK_slash            ø              LATIN SMALL LETTER O WITH STROKE
GDK_o + GDK_e                œ              LATIN SMALL LIGATURE OE
GDK_o + GDK_bar              ɑ              LATIN SMALL LETTER ALPHA
GDK_s                        s              LATIN SMALL LETTER_ESH
GDK_s + GDK_h                ʃ              LATIN SMALL LETTER ESH
GDK_t                        t              LATIN SMALL LETTER T
GDK_t + GDK_h                θ              GREEK SMALL LETTER THETA
GDK_u                        u              LATIN SMALL LETTER U
GDK_u + GDK_minus            ʉ              LATIN LETTER U BAR
GDK_z                        z              LATIN SMALL LETTER Z
GDK_z + GDK_h                ʒ              LATIN SMALL LETTER EZH
GDK_bar + GDK_o              ɒ              LATIN LETTER TURNED ALPHA
GDK_asciitilde               ◌̃              COMBINING TILDE
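
To preview what a run of keystrokes would produce, a few of the sequences above can be put into a lookup table. This is only a rough sketch: the real input method works interactively on key events, whereas this function simply prefers the longest matching sequence in a finished string. The keys are written as the typed characters rather than as GDK key names, only an excerpt of the table is included, and the names SEQUENCES and to_ipa are made up.

```python
# Excerpt of Table 2, keyed by the characters typed.
SEQUENCES = {
    "sh": "ʃ", "zh": "ʒ", "th": "θ", "dh": "ð", "ng": "ŋ", "ae": "æ",
    "/e": "ə", "/c": "ɔ", "/v": "ʌ",
    "I": "ɪ", "U": "ʊ", "E": "ɛ", "A": "ɑ",
    "3": "ʒ", ":": "ː", "'": "ˈ", "`": "ˌ",
}
LONGEST = max(len(keys) for keys in SEQUENCES)


def to_ipa(typed: str) -> str:
    """Convert typed keys to IPA, preferring the longest matching sequence."""
    out = []
    i = 0
    while i < len(typed):
        for length in range(LONGEST, 0, -1):
            chunk = typed[i:i + length]
            if chunk in SEQUENCES:
                out.append(SEQUENCES[chunk])
                i += len(chunk)
                break
        else:
            # No sequence starts here: pass the character through unchanged.
            out.append(typed[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_ipa("fIsh"))
    print(to_ipa("'/ebaUt"))
```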


Note

These sequences were extracted from the file modules/input/imipa.c in the GTK+ 2.6.1 distribution. The lines GDK_O / O / LATIN LETTER SMALL CAPITAL OE, GDK_d / d / LATIN SMALL LETTER E and GDK_s / s / LATIN SMALL LETTER_ESH are obviously wrong. That is a mistake in imipa.c that I should have reported.

Note

Other applications supporting IPA Characters are Yudit and Emacs. Please refer to files like /usr/share/yudit/src/ASCII-IPA.kmap and /usr/share/xemacs/mule-packages/etc/mule-ucs/reldata/uipa.el for inspiration in case the GTK+ IPA Input Method lacks any character.

The ultimate reference for IPA characters is the Unicode Standard. The character range 0x250-0x2af alone covers at least 85 characters, so some may be missing here! You can look at the Unicode Code Charts for Symbols.

Tip

In gtk/gtkimcontextsimple.c it is documented that you can enter Unicode characters by holding down Shift + Control and entering the hexadecimal code of the Unicode character you want to enter.
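
If you do not know the hexadecimal code of a character, you can look it up with a few lines of Python. The sketch below uses only the standard library; the helper name hex_code is made up.

```python
import unicodedata


def hex_code(char: str) -> str:
    """Return the code point of a single character as four or more hex digits."""
    return f"{ord(char):04x}"


if __name__ == "__main__":
    # Print code point and Unicode name for a few IPA characters.
    for ch in "əʃŋ":
        print(ch, hex_code(ch), unicodedata.name(ch))
    # The reverse direction: from a hexadecimal code to the character.
    print(chr(int("0259", 16)))
```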

How to Enter Special Characters

This functionality is not provided by FreeDict-Editor itself. But many languages require accented and other special characters to be entered, hence this word of advice.

One way to enter special characters is to use GTK+ Input Methods. In FreeDict-Editor this way is mainly used for entering IPA Characters, but it can be used to enter other characters as well. See the section called “How to Enter IPA Characters”.

In the X Window System, the X Keyboard Extension (XKB) is responsible for translating numeric scancodes sent from the keyboard into symbolic keysyms that are handed over to the application. XKB can also compose characters out of combining (among them diacritical) marks and base characters. For the compose function to work, a key with the compose function is required, as well as "dead" keyboard keys. Those dead keys will hold the combining marks and will not produce a character immediately after being pressed. They are composed with other characters.

Note

The most authoritative source for XKB is "The X Keyboard Extension: Protocol Specification" that could be found at /mnt/hdb2/cygwin/usr/x11r6/lib/x11/doc/PostScript/XKBproto.PS on my system.

Keyboard Layouts

An example of a keyboard layout with dead keys is latin. The following table shows the key combinations by which the dead keys are reachable (using the "basic" section of that file; there are other sections, for example for a German keyboard model). The information presented here was extracted from /etc/X11/xkb/symbols/pc/latin. The file comments say it is Revision 1.4 from 2003/01/26. The above path is valid for my SuSE 9.0 system and may differ on yours.

Table 3. Dead keys of the latin keyboard layout

No modifier (keysym / keycap)  Shift (keysym / keycap)  ISO_Level3_Shift  Shift+ISO_Level3_Shift
equal / =                      plus / +                 dead_cedilla      dead_ogonek
bracketleft / [                braceleft / {            dead_diaeresis    dead_abovering
bracketright / ]               braceright / }           dead_tilde        dead_macron
semicolon / ;                  colon / :                dead_acute        dead_doubleacute
apostrophe / '                 quotedbl / "             dead_circumflex   dead_caron
backslash / \                  bar / |                  dead_grave        dead_breve
slash / /                      question / ?             dead_belowdot     dead_abovedot

Warning

Another keyboard layout promising many dead keys is us_intl. But its disadvantage is that it interferes with the multi layout concept because it uses group 2 instead of shift levels 3 and 4.

For us_intl to be useful, you must have an easy way to switch to group 2, e.g. by defining the right Alt key to function as Mode_switch.

Configuring the X Keyboard Extension

The setxkbmap command can be used to configure XKB. For example, enter the following command into an xterm:

setxkbmap latin,de -option compose:ralt -option grp:alt_shift_toggle

The compose:ralt option puts the compose function on the right Alt key. That key will produce the LevelThree keysym, which will be interpreted as ISO_Level3_Shift modifier.

Note

The above command also sets up a multi-layout configuration. Initially, the latin layout is active. Using the group toggle keys Shift+Left Alt, it is possible to switch permanently between the two layouts. Permanently here means that the layout switched to stays enabled even after the toggle keys are released.

An alternative to toggling is to switch between layouts only temporarily. The Mode_switch key can do this by selecting the next group while it is pressed. Thus, it acts like another modifier.

Tip

You can use the command setxkbmap -print to find out the current XKB settings.

Tip

Another nice feature of XKB is its ability to show the current keyboard configuration using the current keyboard geometry. Enter the following command:

xkbprint -color -lg group -ll shiftlevel :0 - | gv -seascape -

Replace group with the number of the keyboard group to be shown (1-4). Replace shiftlevel with a number from 1 to 4 as well; 1 (the default) and 3 are the most useful. The first part of the command produces a PostScript file. To view it with the second part, the gv command must be available, i.e. the ghostview application must be installed.

Another way of configuring XKB is to specify the XKB options in the InputDevice section of the XF86Config file. The manual page keyboard(4) describes how to specify XKB options there, but it does not explicitly mention how to specify multiple values for XkbOptions: the values are separated by commas. XXX extend keyboard(4) man page. An example is:

# from /etc/X11/XF86Config
Section "InputDevice"
  Driver       "Keyboard"
  Identifier   "Keyboard[0]"
  Option       "AutoRepeat" "200 39"
  Option       "Protocol" "Standard"
  Option       "XkbLayout" "latin,de"
  Option       "XkbModel" "pc104"
  Option       "XkbRules" "xfree86"
  Option       "XkbVariant" "xfree86"
  Option       "XkbOptions" "grp:alt_shift_toggle,grp:switch"
EndSection

How to Select Entries

Before editing entries, you have to select them from all the entries in the currently open dictionary. To select entries, you write an XPath expression, which the computer evaluates. The result is a set of matching entries.

The XPath language is very powerful and because of that it can be complicated.

FreeDict-Editor has two input fields to support you in entering XPath expressions for selecting entries. The XPath Template input field lets you enter the actual XPath expression. In it you may write %s, which will be replaced by the contents of the Select Input Field. Normally you will modify the Select Input Field much more often than the template.

When an XPath evaluation takes too much time (this happens especially when you use the // operator), you can abort it by clicking the Abort XPath Evaluation button.

FreeDict-Editor extends the functions available in XPath expressions. For this it uses the namespace prefix fd for the namespace http://freedict.org/freedict-editor.

Currently only one XPath extension function exists:

bool unbalanced-braces(nodeset)

Checks the string representation of each node in the nodeset and returns true() when the braces in any node's string representation are unbalanced.

It is used in the sanity check for unbalanced braces.
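
The logic of such a check can be sketched in Python. This is not the extension function's real code: which kinds of braces the real function considers is not stated here, so the set of round, square and curly brackets below is an assumption, and the nodeset is represented simply as a list of strings.

```python
# Assumption: round, square and curly brackets are all checked.
CLOSING_TO_OPENING = {")": "(", "]": "[", "}": "{"}
OPENING = set(CLOSING_TO_OPENING.values())


def is_unbalanced(text: str) -> bool:
    """Return True when the brackets in text do not pair up correctly."""
    stack = []
    for ch in text:
        if ch in OPENING:
            stack.append(ch)
        elif ch in CLOSING_TO_OPENING:
            # A closing bracket must match the most recent opening one.
            if not stack or stack.pop() != CLOSING_TO_OPENING[ch]:
                return True
    return bool(stack)  # leftover opening brackets are unbalanced too


def unbalanced_braces(node_strings) -> bool:
    """Mimic the described semantics: True if any string is unbalanced."""
    return any(is_unbalanced(s) for s in node_strings)


if __name__ == "__main__":
    print(unbalanced_braces(["house (building)", "dog"]))
    print(unbalanced_braces(["house (building", "dog"]))
```

As with the extension function, an empty nodeset yields false, because there is no node that could be unbalanced.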

How to Spell Check

One might wonder what a spell checker is doing in a dictionary editor. Most new dictionaries will be developed for languages that do not have a spell-checking wordlist yet. But consider the other language of the dictionary. Usually it will be English, for which good spell-checking wordlists do exist. You could also generate a spell-checking wordlist from the headwords and check the example sentences with it.

So much for the motivation. Just one number: in my Khasi-English dictionary of 2280 entries, I found 10 spelling mistakes.
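
The idea of checking example sentences against the headwords can be sketched as follows. This is an illustration, not a feature description: it assumes that headwords are stored in orth elements and example sentences in q elements inside eg, the sample data is invented, and the function names are made up.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample data; the second example sentence contains a typo.
SAMPLE = """<body>
  <entry><form><orth>ing</orth></form>
    <sense><eg><q>ka ing</q></eg></sense></entry>
  <entry><form><orth>ka</orth></form>
    <sense><eg><q>ka iing</q></eg></sense></entry>
</body>"""


def headword_list(root):
    """Collect all headwords as a lower-cased set of words."""
    return {o.text.strip().lower() for o in root.iter("orth") if o.text}


def unknown_words(root, words):
    """Return the words in example sentences that are not in the wordlist."""
    unknown = []
    for q in root.iter("q"):
        for token in re.findall(r"\w+", q.text or ""):
            if token.lower() not in words:
                unknown.append(token)
    return unknown


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(unknown_words(root, headword_list(root)))
```

Inflected forms that never occur as headwords would be reported as well, so the result is a list of candidates to review rather than a list of certain mistakes.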

Settings

To configure FreeDict-Editor, choose Edit → Preferences. The Preferences dialog contains the following tabbed sections:

General

(guilabel)

(description)

Entry Editor

This section remains unwritten. This paragraph just keeps DOCBOOK validators happy.

Known Bugs and Limitations

  • Parameter Entities (i.e. entities for use in DTDs) in the internal DTD subset are dereferenced (i.e. replaced by their replacement text). You see the result after saving and reloading.

  • General Entities (i.e. entities for use in the document content) are not handled transparently by XSLT. They are replaced by their replacement text and should be avoided from the beginning.

  • Memory leaks are likely.

  • Some deprecated and/or private fields/functions of GTK+, libgnomeui and libbonoboui are used.

Links

This section contains links to sources of further information.

  • This document assumes that you are familiar with the FreeDict HOWTO

  • "Dictionary Editor and Browser" as presented in "Lexical Databases in XML" by Pavel Smrz and Martin Povolny, Faculty of Informatics, Masaryk University Brno, Botanicka, 68a, 602 00 Brno, Czech Republic, E-mail: {smrz,xpovolny}@fi.muni.cz (XXX add link) - DEB is a system that might be more suited to large projects than FreeDict-Editor

  • SIL Dictionary Development Program - XXX quote from its website

About FreeDict-Editor

FreeDict-Editor was written by Michael Bunk. To find more information, please visit freedict.org.

To report a bug or make a suggestion regarding this application or this manual, follow the directions in the Feedback paragraph (XXX link to releaseinfo/legalnotice - but that legal notice actually only belongs to this doc).

This program is distributed under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. A copy of this license can be found at this link, or in the file COPYING included with the source code of this program.