Here's a little program (javahtmlizer.jar, source code) that converts Java source to HTML. To use it, just do:
java -classpath javahtmlizer.jar HTMLizer path/to/my/Source.java > out.html
Here's an example of the output. Nice, eh?
Some notes on the program:
The JavaCC "grammar" (here) is just a tokenizer - no parsing, no abstract syntax tree building. Why bother?
Most of the time you'll see whitespace being skipped. But for the purposes of this task, whitespace is important -
for example, spaces need to be converted to character entities. So this grammar creates
full-fledged tokens from spaces, tabs, and line feed characters. The only SKIP token is for carriage return
characters.
Notice that there are no SPECIAL_TOKEN definitions - comments, like whitespace, are significant for this task.
The actual HTMLization of the code is split between a COMMON_TOKEN_ACTION block and Token.java. I experimented with various ways of doing things - first I did the conversion inside HTMLizer.java, then I moved it entirely into Token.java, then I moved it into lexical actions following each token, and finally I settled on the current approach.
This program requires Java 1.5, which is nice since autoboxing and generics let us do nice things like having a Set<Integer> that contains all the token kinds.
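To make the Set<Integer> point concrete, here is a minimal sketch of the idea. The class name, the constant names and values, and the choice to group keyword kinds are all assumptions for illustration; the real token kinds are the int constants JavaCC generates from the grammar.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins for the int constants that JavaCC generates
// from a grammar; the real names and values will differ.
class TokenKindsSketch {
    static final int ABSTRACT = 10;
    static final int CLASS = 11;
    static final int RETURN = 12;
    static final int IDENTIFIER = 40;

    // Generics give us a typed set; autoboxing lets us add plain ints.
    static final Set<Integer> KEYWORDS = new HashSet<Integer>();
    static {
        KEYWORDS.add(ABSTRACT);
        KEYWORDS.add(CLASS);
        KEYWORDS.add(RETURN);
    }

    // The int argument is boxed to an Integer for the lookup.
    static boolean isKeyword(int kind) {
        return KEYWORDS.contains(kind);
    }
}
```

Before Java 1.5 the same thing would have needed an untyped Set and explicit new Integer(kind) wrapping at every call site.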
Note the limitations that only having a tokenizer imposes - we can't use specific colors for method invocations, class names, and so on, since those are syntax-level concepts rather than token-level concepts. Still, it does pretty well.
I added several utility methods to Token.java to assist in HTMLization - spanify, translateWhitespace, and htmlEscape. This was pretty straightforward since JavaCC won't overwrite this file if it already exists. Putting these functions here also makes them easy to unit test - if I had written unit tests, of course. :-)
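For a rough idea of what helpers like these might look like, here is a sketch. Only the three method names come from the notes above; the signatures, the standalone class, the CSS class argument, the four-space tab width, and the exact entities emitted are my assumptions, not the contents of the actual Token.java.

```java
// A sketch only: in the real program these live in Token.java and
// probably operate on the token's own image and kind fields.
class TokenHtmlSketch {

    // Replace the characters that have special meaning in HTML.
    static String htmlEscape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Map a single whitespace token image to something a browser
    // will render; anything else is returned unchanged.
    static String translateWhitespace(String image) {
        if (image.equals(" "))  return "&nbsp;";
        if (image.equals("\t")) return "&nbsp;&nbsp;&nbsp;&nbsp;"; // assumed tab width of 4
        if (image.equals("\n")) return "<br/>\n";
        return image;
    }

    // Wrap an escaped token image in a span so CSS can color it.
    static String spanify(String cssClass, String image) {
        return "<span class=\"" + cssClass + "\">" + htmlEscape(image) + "</span>";
    }
}
```

Because these are plain string-in, string-out functions, a unit test is just a handful of calls such as spanify("keyword", "return") compared against the expected markup.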
This program is BSD licensed; if you find it helpful or interesting, pick up my JavaCC book!