The translation of a LaTeX source file into HTML consists of loading the style package tex4ht.sty into the source file, choosing the desired options for the translation, compiling the source into dvi code with the native LaTeX engine, and postprocessing the outcome with the tex4ht and t4ht programs (see overview).
The htlatex command runs a script that invokes the different steps of the process, without user intervention. The command takes the form
htlatex filename "options1" "options2" "options3"
where the first set of options is for the tex4ht.sty and *.4ht style files, the second option is for the tex4ht postprocessor, and the third set for the t4ht postprocessor. If not empty, the second option should be a path, from the root directory ht-fonts of the hypertext fonts to a subdirectory. For instance,
In addition, the command requests a breakup of the output into separate web pages, in accordance with the two top sectioning levels of the document. Moreover, it asks for a listing in the log file of the information available for the style files in use.
Documents requiring the combination of Latin, Greek, and Hebrew are probably best served, by the commonly available browsers, when compiled to Latin and Greek in iso-8859-7, with the Hebrew content translated to unicode or pictures. For instance,
htlatex foo "html,iso-8859-7,RL2LR,rl2lr" "unicode/hebrew/"
htlatex foo "html,iso-8859-7,pic-RL"
The first package parameter is distinguished in that it may refer to ‘html’, to ‘xhtml’, or to a user-provided configuration file; other values are ignored. An extension ‘cfg’ is assumed, if the file name is provided without an extension.
Quite a few variants of the htlatex script are included in the distribution, and many others can be easily tailored.
The output of TeX4ht can be easily broken. Hence, it is very important to validate the outcome.
TeX4ht doesn’t offer a built-in parser to verify the correctness of the outcome. However, external validator(s) can quite easily be integrated into the compilation process.
Most applications might require the knowledge of just a few additional simple features of TeX4ht, if any. Hence, it is strongly advised to check the output obtained from the default configuration, before trying to work with other settings.
The remainder of this document provides much more than that, with an eye directed toward users that want to customize their outcome. Therefore, the reader is encouraged to skim the information provided below for acquiring a general understanding of the system, leaving the tedious learning of the details to when the need arises.
To keep with the spirit of LaTeX and hypertext, in which style is assumed to be separated from content, the users are encouraged to avoid inserting TeX4ht code into their source files. Instead, they should place their modifications, to the default settings, within private configuration files to be loaded by htlatex-like commands.
The following are some of the more useful underlying commands of TeX4ht.
1 | \HCode{...} |
2 | \HPage{anchor}content\EndHPage{} |
3 | \Link[target-file arguments]{target-loc}{cur-loc}anchor\EndLink |
4 | \ifHtml... \else... \fi |
5 | \ifOption{...}{true-part}{false-part} |
A non-leading package parameter ‘1’, ‘2’, ‘3’, or ‘4’, in \usepackage, asks for a tree-structured set of files, reflecting the sectioning of the document to the specified depth. Sequential prev-next links within the hierarchy, instead of the default hierarchical ones, can be requested with the ‘next’ parameter. The parameter ‘sections+’ creates titles for the sectioning commands that link to the tables of contents.
Finer control is possible with the following commands.
1 | \CutAt{at-unit,until-unit-1,until-unit-2,...} |
2 | \tableofcontents[unit-1,unit-2,...] |
3 | \TocAt{at-unit,unit-1,unit-2,...,/until-unit-1,/until-unit-2,...} |
4 | \ConfigureToc{unit} {before-mark} {before-title} {before-page-number} {at-end} |
5 | \Configure{tableofcontents} {before-toc} {end-of-toc} {after-toc} {before-nonindented-par} {before-indented-par} |
6 | \Configure{TocAt} {before-toc} {after-toc} |
7 | \Configure{TocAt*} {before-toc} {after-toc} |
8 | \Configure{unit} {top} {bottom} {before-title} {after-title} |
9 | \Configure{CutAt} {unit} {before-button} {after-button} |
10 | \Configure{+CutAt} {unit} {before-button} {after-button} |
11 | \NewSection\unit {mark-for-toc} |
12 | \Configure{crosslinks} {left-delimiter} {right-delimiter} {next} {prev} {prev-tail} {front} {tail} {up} |
13 | \Configure{crosslinks+} {before-top-links} {after-top-links} {before-bottom-links} {after-bottom-links} |
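As an illustration of the argument order of \Configure{crosslinks}, the following sketch relabels the navigation links. The label texts are arbitrary choices, and the fragment is assumed to live in a private configuration file of a document that is split into several pages (for instance, with the ‘2’ parameter).

```latex
% Arguments, in the order listed above:
% left-delimiter, right-delimiter, next, prev, prev-tail, front, tail, up.
\Configure{crosslinks}
  {[}{]}
  {Next}{Previous}{Previous (end)}
  {Top}{Bottom}{Up}
```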
Tables with \multicolumn entries need a few LaTeX compilations to stabilize.
1 | \Configure{table} {before-tbl} {after-tbl} {before-row} {after-row} {before-entry} {after-entry} |
The appearances of lists and \begin-\end environments are configured with the following commands.
1 | \ConfigureList{list-name} {before-list} {after-list} {before-label} {after-label} |
2 | \ConfigureEnv{environment-name} {before-environment} {after-environment} {before-list} {after-list} |
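A minimal sketch of \ConfigureEnv, assuming that the document defines an environment named sidebar (a hypothetical name) and that a div element is the desired wrapper. Depending on the content, the paragraph commands \IgnorePar and \EndP may also be needed to keep the paragraph tags balanced.

```latex
% Arguments: environment-name, before-environment, after-environment,
% before-list, after-list (the last two are left empty here).
\ConfigureEnv{sidebar}
  {\HCode{<div class="sidebar">}}
  {\HCode{</div>}}
  {}{}
```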
The next command imports external pictures, and the two commands that follow request pictorial representations for local content. The attributes, and the replacement parameters with their enclosing rectangular brackets, are optional.
1 | \Picture[replacement-for-text-browsers]{file-name attributes} |
2 | \Picture+[replacement-for-text-browsers]{file-name attributes}content\EndPicture |
3 | \Picture*[replacement-for-text-browsers]{file-name attributes}content\EndPicture |
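The following sketch shows one way the first and third forms might be used. The file name logo.png, the width attribute, and the replacement texts are placeholders; leaving the file-name argument empty in the second command, so that the system picks a name for the generated picture, is an assumption to be checked against the installed version.

```latex
% Import an external picture; the bracketed text serves text browsers.
\Picture[Project logo]{logo.png width="120"}

% Request a pictorial representation for local content.
\Picture*[A framed remark]{}\fbox{Handle with care}\EndPicture
```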
In the default setting, the math environments ‘\(...\)’, and the display math environments ‘\[...\]’ and ‘$$...$$’, request pictorial representations for their content. On the other hand, the math environments ‘$...$’ ask for no special treatment. Simple features like mathematical symbols, subscripts, and superscripts, are translated into html, and more complex entities like roots and fractions are translated into pictures (example).
1 | \Configure{[]} {before$$at-start} {at-end$$after} \Configure{()}{before$at-start}{at-end$after} \Configure{$$}{before}{after}{at-start} \Configure{$}{before}{after}{at-start} |
2 | \Configure{SUB}{before}{after} \Configure{SUP}{before}{after} \Configure{SUBSUP}{before}{between}{after} |
3 | no_, no^ |
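A sketch of how the subscript and superscript hooks might be filled, following the argument lists above. The choice of the sub and sup elements is an illustration only and may differ from the default configuration in use.

```latex
\Configure{SUB}{\HCode{<sub>}}{\HCode{</sub>}}
\Configure{SUP}{\HCode{<sup>}}{\HCode{</sup>}}
% SUBSUP takes three parts: before, between, after.
\Configure{SUBSUP}
  {\HCode{<sub>}}
  {\HCode{</sub><sup>}}
  {\HCode{</sup>}}
```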
The insertions of code at paragraph breaks are controlled by the following commands.
1 | \Configure{HtmlPar} {noindent-P} {indent-P} {from-noindent-P} {from-indent-P} \EndP |
2 | \IgnorePar |
3 | \ShowPar |
4 | \IgnoreIndent |
5 | \ShowIndent |
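A hedged sketch of a paragraph configuration in the spirit of the defaults. The class names are illustrative, and the actual default setting of a given installation may differ.

```latex
\Configure{HtmlPar}
  {\EndP\HCode{<p class="noindent">}}   % noindent-P
  {\EndP\HCode{<p class="indent">}}     % indent-P
  {\HCode{</p>}}                        % from-noindent-P
  {\HCode{</p>}}                        % from-indent-P
```

Here \EndP closes a paragraph that is still open before a new one is started, so that the opening and closing tags stay paired.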
Scripts similar to htlatex are available for the different modes of output under support. The outcome of the translations should be checked by validators for proper syntax. Typically, with the presence of validators, errors are easy to detect and correct, but they require human intervention.
In particular, it might be worthwhile to notice some of the more common sources of problems for MathML.
Cascading style sheets attach presentations to the content of hypertext pages, in a manner similar to the way that ‘.sty’ files define the presentations for the content of source LaTeX files. TeX4ht produces a CSS file for each document that is translated to HTML transitional 4.0 code. The following are related commands.
1 | \Css{content} |
2 | \Css content\EndCss |
3 | \CssFile[list-of-css-files]content\EndCssFile |
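For instance, the first form might be used as follows to add rules to the generated CSS file. The selectors and property values are arbitrary examples, and the commands are assumed to be placed in a private configuration file.

```latex
\Css{body { max-width: 40em; margin: 0 auto; }}
\Css{p.indent { text-indent: 1.5em; }}
```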
TeX4ht has elaborate machinery for handling fonts, through special virtual hypertext fonts stored in ‘.htf’ files. Instead of providing a design for each symbol, as is the case in standard fonts, the virtual fonts provide a content for each symbol. The following commands offer some control, from within the source LaTeX documents, over the content provided to the symbols.
1 | \NoFonts |
2 | \EndNoFonts |
3 | \Configure{htf} {class} {delimiter} {template-1} {template-2} {template-3} {template-4} {template-5} {template-6} {template-7} |
4 | \Configure{htf-sty} {class/font} {CSS-instructions} |
The htf fonts might request pictorial representations for symbols. In such cases, the sizes of the pictures depend on the sizes of the TeX fonts in use. Size changes through the \magnification command should be made before loading the tex4ht.sty package.
The design of a virtual hypertext font might take some labor, but it does not require too much sophistication.
Literate programming is a discipline that promotes the writing of programs the way one explains them to human beings. ProTeX is a literate programming system fully implemented in terms of TeX, and it is compatible with LaTeX and other TeX-based systems. TeX4ht, and ProTeX itself, are examples of applications written in ProTeX.
1 | \input ProTex.sty \AlProTex{extension,<<<>>>,list,title,escape-character} |
2 | \<title\><<< code fragment >>> |
3 | `<title`> |
4 | \OutputCode\<...\> |
Scripts produce the content in verbatim format with no decorations.
1 | \ScriptEnv{environment} {prefix} {postfix} |
2 | \ScriptCommand{\command} {prefix} {postfix} |
3 | \JavaScript...\EndJavaScript |
Source TeX files are treated in a manner similar to the way LaTeX source files are treated, with the obvious restriction that only TeX commands are allowed. In particular, the \usepackage command is not valid in TeX. The counterpart of the htlatex system command is called httex, and it takes a similar format.
The htlatex command implies a loading of an implicit or an explicit configuration file when the command \begin{document} is encountered. The httex command, on the other hand, requires the insertion of the code ‘\csname tex4ht\endcsname’ into the source TeX file, at the location where the implicit or explicit configuration file is to be loaded (example).
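A minimal sketch of a plain TeX source prepared for httex. The file name foo.tex and the content are placeholders. When the file is compiled by TeX alone, the inserted code is equivalent to \relax and has no effect.

```latex
% foo.tex, a plain TeX source (hypothetical file name)
\csname tex4ht\endcsname   % under httex, the configuration file is loaded here
\centerline{\bf A short note}
Some text with a formula $a^2 + b^2 = c^2$.
\bye
```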
The configuration files for TeX are similar to those for LaTeX, with the only exception of not including the ‘\begin{document}’ instruction.
The compilation of sources that explicitly include the configuration files can be invoked with a command of the form ‘ht tex filename’ (example).
The following package options are available for TeX only.
1 | plain- |
2 | pic-eqalign |
A \TableOfContents command, similar to the generalized command of \tableofcontents offered to LaTeX, is also provided for TeX.
Much of the look and feel of TeX4ht is achieved through configurable hooks which are defined with the following commands.
1 | \NewConfigure{name}[i]{body} |
2 | \Configure{name}{parameter-1}...{parameter-i} |
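The two commands might work together as in the following sketch. The hook name note and the helper macros \NoteOpen and \NoteClose are hypothetical names introduced for illustration, and the code assumes that TeX4ht has already been loaded (for instance, that it appears in a private configuration file).

```latex
% Declare a hook named `note' that takes two parameters and stores
% them in helper macros.
\NewConfigure{note}[2]{\def\NoteOpen{#1}\def\NoteClose{#2}}

% Seed the hook into a macro of the document.
\newcommand\note[1]{\NoteOpen #1\NoteClose}

% Attach a meaning to the hook.
\Configure{note}{\HCode{<span class="note">}}{\HCode{</span>}}
```

Keeping the declaration separate from the configuration lets the same macro receive different meanings for different output formats.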
For help configuring hooks already seeded in the system, compile the source files in use with the ‘info’ option active and review the information in log files. Much of the information in the log files may also be obtained by running ‘xhlatex mktex4ht’ and reviewing the entries in the outcome page ‘mktex4ht.html => index => mktex4ht’.
The following features can come in handy for tailoring markup in LaTeX documents.
1 | Package parameter ‘0.0’ |
2 | Parameter ‘hooks’ |
3 | Option ‘hooks+’ |
4 | Package parameter ‘edit’ \Tg<...> \Tg</...> \Tg<.../> |
5 | \Configure{edit} {before} {after} \Configure{hooks} {before} {after} {}{} |
6 | \Configure<...>{before}{after} \Configure</...>{before}{after} \Configure<.../>{before}{after} |
7 | \Configure<...>-{replacement} \Configure</...>-{replacement} \Configure<.../>-{replacement} |
8 | Package parameter ‘edit+’
This parameter is a generalization of the ‘edit’ parameter, which introduces configuration information into the log file. |
9 | Package parameter ‘verify’ \Verify...\EndVerify |
10 | Package parameter ‘verify+’ |
The command \usepackage{tex4ht} implicitly assumes a private configuration file of the following form.
\Preamble{html}\begin{document}\EndPreamble
Similarly, a command of the form ‘\usepackage[html,option,option,...]{tex4ht}’ implicitly assumes a file of the following form.
\Preamble{html,option,option,...}\begin{document}\EndPreamble
On the other hand, a command of the form ‘\usepackage[file,options]{tex4ht}’ assumes a configuration file obeying the following format (example). The extension ‘cfg’ is assumed for names of configuration files that are listed without their extension.
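A minimal sketch of what such a file might contain, modeled on the implicit forms shown above. The file name mycfg.cfg and the CSS rule are placeholders, and the comments mark where other configurations would be expected to go.

```latex
% mycfg.cfg (hypothetical file name), requested with
%   \usepackage[mycfg,html]{tex4ht}
\Preamble{html}
  % configurations that must precede the document go here
\begin{document}
  % configurations that rely on the loaded system go here
  \Css{body { margin: 2em; }}
\EndPreamble
```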
One can avoid using configuration files, by including their implicit and explicit content within the source files. In such a case, the ‘\begin{document}’ of the source file should be replaced with a code segment of the following format (example).
1 | no-halign |
2 | pictex |
3 | jpg, png |
A compilation starts by opening ‘tex4ht.sty’ and loading a fraction of its code. The main purpose of this phase is to request the loading of the system at a later time (for instance, upon reaching \begin{document}). The motivation for the late loading is to allow TeX4ht to collect as much information as possible about the environment requested by the source file, and help the system reshape that environment with minimal interference from elsewhere.
The system uses two kinds of (4ht) configuration files. The files of the first kind mainly seed hooks into the macros loaded by the source file (for instance, latex.4ht, fontmath.4ht, and article.4ht). The files of the second kind mainly attach meaning to the hooks (for instance, html4.4ht, unicode.4ht, and mathml.4ht).
Different source files may request the loading of different style files and in different orders. The hook seeding files are loaded in response to the loading of the style files, and in a compatible order. Since the different style files may redefine the syntax and semantics of macros, TeX4ht follows a similar route of defining and redefining the hooks and their meanings.
The meaning attaching files are normally requested through option names introduced in the tex4ht.4ht system file. The user may add option names, and redefine old ones, within a new file named tex4ht.usr.
A new ‘tex4ht.usr’ file should group references to *.4ht configuration files under arbitrarily chosen option names. For that purpose, \Configure commands similar to those provided in tex4ht.4ht should be employed.
Variants of the htlatex-like scripts may be produced in the following manner.