All rights reserved. Copyright (C) 1996-2005 by NARITA Tomio
Last modified at Jan.16th,2004.
lv - a Powerful Multilingual File Viewer / Grep
The latest version is ver 4.51.
All rights reserved. Copyright (C) 1996-2005 by NARITA Tomio. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
See also GNU General Public License Version 2.
MSDOS installation:
Before building lv, you need to install the LSI C-86 compiler (a limited freeware version of LSI C-86 intended for trial use).
The MSDOS version of lv outputs ANSI escape sequences directly, without regard to termcap or terminfo. You will probably need an ANSI escape sequence driver such as ``ANSI.SYS'' (or a more sophisticated one) on MSDOS, including the DOS prompt on MS-Windows. Since Windows NT does not seem to provide such a driver for the DOS prompt by default, please check the driver configuration if lv fails to handle the terminal correctly.
Or, using redirect or pipe-line:
Compressed files with the suffix ``gz'', ``z'', ``GZ'', or ``Z'' are expanded by lv using zcat (1), and those with the suffix ``bz2'' or ``BZ2'' using bzcat (1). Please install versions of zcat and bzcat that can expand all of them.
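The suffix rule above can be sketched as a small lookup. This is only an illustration in Python, not lv's actual C code; the table `EXPANDERS` and the function `expander_for` are hypothetical names, and the sketch assumes the suffix is the part of the file name after the last dot.

```python
import os

# Hypothetical table: the suffixes named in the text, mapped to the
# external command that would expand them.
EXPANDERS = {
    ".gz": "zcat", ".z": "zcat", ".GZ": "zcat", ".Z": "zcat",
    ".bz2": "bzcat", ".BZ2": "bzcat",
}

def expander_for(path):
    """Return the expander command for `path`, or None for a plain file."""
    suffix = os.path.splitext(path)[1]  # text from the last dot onward
    return EXPANDERS.get(suffix)
```

A caller would run the returned command on the file and read its standard output in place of the file itself.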
When standard output is connected not to an ordinary terminal but to a redirect or pipe-line, lv works as a coding-system or code-points conversion filter like nkf (1) and tcs (1).
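The terminal check described above can be illustrated with a minimal Python sketch. It is not taken from lv's source; `choose_mode` is a hypothetical helper, and the only real API used is the stream's `isatty()` method.

```python
import io
import sys

def choose_mode(stream):
    """Return 'viewer' for a terminal, 'filter' for a redirect or pipe."""
    return "viewer" if stream.isatty() else "filter"

# An in-memory stream stands in for a pipe here; it is never a terminal.
mode_for_pipe = choose_mode(io.StringIO())
```

Calling `choose_mode(sys.stdout)` in a script would report which of the two behaviours applies to the current invocation.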
lv also works like grep (1) when it is invoked under another name, lgrep. Please install a symbolic (or hard) link named lgrep that points to lv (1). Alternatively, the lgrep functionality is also turned on by the option '-g'. lgrep is used as follows:
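Dispatching on the program name, as described above, might look like the following Python sketch. It is an assumption about the general technique rather than lv's code; `grep_mode` is a hypothetical name, and the check for '-g' is deliberately naive (it does not handle options combined into one argument).

```python
import os

def grep_mode(argv):
    """True if invoked as 'lgrep' (via a link) or with the -g option."""
    name = os.path.basename(argv[0])  # strip any directory part
    return name == "lgrep" or "-g" in argv[1:]
```

With a link installed, `grep_mode(["/usr/local/bin/lgrep", "pattern", "file"])` is true even though no option was given.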
The coding-system of grep_pattern can be specified as ``keyboard coding system'' (see below).
iso-2022-cn, -jp, -kr can be converted into euc-china or -taiwan, euc-japan, euc-korea, respectively (and vice versa). shift-jis uses the same internal code-points as iso-2022-jp and euc-japan.
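To illustrate why these translations need no mapping table, the following Python sketch re-encodes a single JIS X 0208 code point in all three Japanese coding systems using byte arithmetic only. The function names are hypothetical and this is not lv's implementation; the standard Python codecs are used in the last lines only to cross-check the results.

```python
def jis_to_iso2022jp(j1, j2):
    """Wrap the 7-bit code point in JIS X 0208 designation escapes."""
    return b"\x1b$B" + bytes([j1, j2]) + b"\x1b(B"

def jis_to_euc(j1, j2):
    """euc-japan: the same code point with the high bit set on both bytes."""
    return bytes([j1 | 0x80, j2 | 0x80])

def jis_to_sjis(j1, j2):
    """shift-jis: the same code point folded into the Shift JIS byte ranges."""
    s1 = ((j1 - 0x21) >> 1) + 0x81
    if s1 > 0x9F:
        s1 += 0x40
    if j1 & 1:                 # odd row: first half of the trail-byte range
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1            # skip 0x7F
    else:                      # even row: second half
        s2 = j2 + 0x7E
    return bytes([s1, s2])

# Cross-check with code point 0x4C6E: all three forms decode alike.
same = {
    jis_to_iso2022jp(0x4C, 0x6E).decode("iso2022_jp"),
    jis_to_euc(0x4C, 0x6E).decode("euc_jp"),
    jis_to_sjis(0x4C, 0x6E).decode("shift_jis"),
}
```

The set `same` ends up with a single element, which is the point of the paragraph above: only the coding system changes, not the code point.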
Since big5 characters can be converted into CNS 11643-1992 with negligible incompleteness, big5 streams can be translated into iso-2022-cn or euc-taiwan (and vice versa) with code-points conversion. Note that iso-2022-cn here refers only to the CNS sequences, not the GB ones. Remember that lv cannot translate big5 into GB directly.
The search function of lv may not work correctly when lv additionally performs ``code-points'' conversion (not ``coding-system'' translation), because the visible code and the internal code differ from each other. lv tries to avoid this problem by converting the charsets of search patterns automatically, but this function is not always perfect.
Configurations are overridden in the following order, if present. Command line options are always read last.
Examples:
The following keys have special meanings in the keyboard input:
| | G0 | G1 | G2 | G3 |
---|---|---|---|---|
Designation | ASCII | GB 2312-80, CNS 11643-1992 Plane 1, ISO-IR-165 | CNS 11643-1992 Plane 2 | CNS 11643-1992 Plane 3..7 |

| | G0 | G1 | G2 | G3 |
---|---|---|---|---|
Designation | ASCII | GB 2312-80 | not used | not used |

| | G0 | G1 | G2 | G3 |
---|---|---|---|---|
Designation | ASCII | JIS X 0208 | JIS X 0201 Katakana | JIS X 0212 |

| | G0 | G1 | G2 | G3 |
---|---|---|---|---|
Designation | ASCII | KS C 5601-1987 | not used | not used |

| | G0 | G1 | G2 | G3 |
---|---|---|---|---|
Designation | ASCII | CNS 11643 Plane 1 | CNS 11643 Plane 2-7 | not used |
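As an illustration of how such designations are used, the Python sketch below splits an EUC byte stream laid out like the table with JIS X 0208 in G1, JIS X 0201 Katakana in G2, and JIS X 0212 in G3. It relies on the usual EUC convention that G2 is invoked by the single shift SS2 (0x8E) and G3 by SS3 (0x8F). `split_euc_japan` is a hypothetical name, and the sketch assumes well-formed input and does no validation.

```python
SS2, SS3 = 0x8E, 0x8F  # single-shift bytes that invoke G2 and G3

def split_euc_japan(data):
    """Return a list of (graphic_set, raw_bytes), one entry per character."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                       # G0: one ASCII byte
            out.append(("G0", data[i:i + 1]))
            i += 1
        elif b == SS2:                     # G2: SS2 + one katakana byte
            out.append(("G2", data[i + 1:i + 2]))
            i += 2
        elif b == SS3:                     # G3: SS3 + two JIS X 0212 bytes
            out.append(("G3", data[i + 1:i + 3]))
            i += 3
        else:                              # G1: two JIS X 0208 bytes
            out.append(("G1", data[i:i + 2]))
            i += 2
    return out
```

The other EUC tables differ only in which charsets sit behind G1 to G3 and in how many bytes each one takes.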
lv can convert character codesets between Unicode and the following charsets: GB 2312-80, JIS X 0208, JIS X 0212, KSC 5601-1987, Big Five, CNS 11643-1992 Plane 1-2, and ISO 8859-1..16.
Currently lv's mapping table is based on Unicode 1.1.
Encoding | Charset used for mapping from Unicode |
---|---|
iso-2022-cn | GB 2312-80 (primary), CNS 11643-1992 (secondary), (ISO 8859-*) |
iso-2022-jp | JIS X0208, JIS X0212, JIS X0201, (ISO 8859-*) |
iso-2022-kr | KSC 5601-1987, (ISO 8859-*) |
euc-china | GB 2312-80 |
euc-japan | JIS X0208, JIS X0212, JIS X0201 |
euc-korea | KSC 5601-1987 |
euc-taiwan | CNS 11643-1992 Plane 1-2 |
shift-jis | JIS X0208, JIS X0201 |
big5 | Big Five |
When you output Unicode CJK unified ideographs through iso-2022-cn, GB 2312-80 is used primarily, and the rest which are not included in GB are mapped into CNS 11643-1992.
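The primary/secondary rule can be sketched as a simple priority search. The Python standard library has no CNS 11643 codec, so this sketch uses the `big5` codec as a stand-in for the secondary charset; that substitution is an assumption made purely for illustration, and `pick_charset` is a hypothetical name, not part of lv.

```python
def pick_charset(ch, order=("gb2312", "big5")):
    """Return the first charset in `order` that can represent `ch`, else None.

    NOTE: "big5" is only a stand-in for CNS 11643-1992 in this sketch.
    """
    for name in order:
        try:
            ch.encode(name)
            return name
        except UnicodeEncodeError:
            continue
    return None
```

An ideograph present in GB 2312-80 is claimed by the primary charset; one that is absent there falls through to the secondary.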
Note that euc-japan and shift-jis are mutually exclusive for decoding.
Invalid characters that cause an error state under the specified coding system may be partially ignored. If such a character is printable, it will be output as a control character.
If you don't specify any input coding system, that is, when auto-select is specified, lv will select input coding system automatically.
When an 8-bit code is found during file loading and the input coding system is auto-select (its entity is iso-2022-kr), lv examines ``the first line that contains the first 8-bit code''. Then lv tries several 8-bit decodings as below:
The coding system checking results are examined in the following order:
If a text contains only EUC code points, it is hard to identify which language the EUC coding system represents. So lv provides a default EUC coding system, used when lv has to choose the input coding system from among the EUCs. The default EUC coding system is set by the option -D (euc-japan in the Japanese version of lv).
You can toggle coding systems even while viewing a file with the run-time commands `t' and `T', which traverse all coding systems implemented in lv. In addition, you can toggle HZ decoding mode with C-t on demand.
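The ambiguity among EUCs, and the role of a default, can be shown with a short Python sketch. The candidate list, its order, and the names `decodes` and `choose_euc` are assumptions for illustration; lv's real checks are not reproduced here.

```python
# Hypothetical candidate list; Python codec names stand in for lv's.
EUC_CANDIDATES = ("euc_jp", "euc_kr", "gb2312")

def decodes(data, codec):
    """True if `data` is a valid byte sequence under `codec`."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

def choose_euc(data, default="euc_jp"):
    """Prefer `default` when it fits; otherwise the first candidate that does."""
    valid = [c for c in EUC_CANDIDATES if decodes(data, c)]
    if default in valid:
        return default
    return valid[0] if valid else None

# Korean text encoded as EUC-KR is also a valid EUC-JP byte sequence.
korean = "\ud55c\uad6d\uc5b4".encode("euc_kr")
```

Because `korean` is valid under more than one EUC, the bytes alone cannot settle the question, so the result simply follows whatever default the user configured.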
You should remember that the auto-selection mechanism of LV works incorrectly in some cases. Especially, if a text contains only JIS X 0201 Katakana in shift-jis, it will be misinterpreted as euc-japan.
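The katakana case can be reproduced with a small trial-decoding sketch in Python. The trial order (euc-japan first, then shift-jis) is an assumption chosen to mirror the failure described above, not lv's documented order, and `guess` is a hypothetical name.

```python
def guess(data):
    """Return the first codec that decodes `data` without error, else None."""
    for codec in ("euc_jp", "shift_jis"):
        try:
            data.decode(codec)
            return codec
        except UnicodeDecodeError:
            pass
    return None

# Two half-width katakana (JIS X 0201) encoded in shift-jis: one byte each.
kana = "\uff71\uff72".encode("shift_jis")

# Ordinary kanji text encoded in shift-jis.
kanji = "\u65e5\u672c\u8a9e".encode("shift_jis")
```

The two katakana bytes happen to form one valid two-byte euc-japan character, so `guess(kana)` picks the wrong coding system, while `guess(kanji)` is rejected by the euc-japan decoder and falls through to shift-jis.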
If the result of auto selection is incorrect and you know the input coding system, please set it by the option -I, which disables auto selection.
| | MSDOS | UNIX |
---|---|---|
Input: | auto-select | auto-select |
Keyboard: | shift-jis | iso-2022-jp |
Output: | shift-jis | iso-2022-jp |
Pathname: | shift-jis | iso-2022-jp |
Default EUC: | euc-japan | euc-japan |
To change the defaults above, please modify lv.c. However, these coding systems can also be specified through options.
These charsets are merely recognized by lv; whether they can actually be displayed depends on your terminal's capability.
Conversely, you can handle charsets not listed above as latin-1 when an 8-bit coding system is displayed on an 8-bit terminal (provided that there is no code conversion and each character occupies one column).
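The reason latin-1 works as a pass-through can be seen in a few lines of Python. This is a sketch of the general property only, not of lv's code, and `passthrough` is a hypothetical name: latin-1 assigns exactly one character to every byte value, so decoding and re-encoding returns the original bytes unchanged.

```python
def passthrough(data):
    """Round-trip bytes through latin-1 without altering them."""
    text = data.decode("latin-1")  # never fails: one character per byte
    return text.encode("latin-1")

all_bytes = bytes(range(256))
unchanged = passthrough(all_bytes) == all_bytes
```

The terminal then interprets the untouched bytes in its own 8-bit charset, which is why the trick only holds when each character is a single byte and a single column.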
野村さん (nomu@ipl.mech.nagoya-u.ac.jp)
石塚さん (ishizuka@db.is.kyushu-u.ac.jp)
野中さん (nona@in.it.okayama-u.ac.jp)
松原さん (moody@osk.threewebnet.or.jp)
村井さん (murai@geophys.hokudai.ac.jp)