The Canadian Well Logging Society (CWLS) does log analysis of well data. The well logs are specified in the LAS format, which is more or less a fixed-length data file with various sections. Here's an example file and here's a representative snippet:
#MNEM .UNIT  Data                            Description of
#---- -----  ------------------------------  -----------------
#
STRT .F      10180.0000                      :START DEPTH
STOP .F      10414.0000                      :STOP DEPTH
STEP .F      1.0000                          :STEP
NULL .       -999.25                         :NULL VALUE
COMP .       Cramer Oil                      :COMPANY
WELL .       #36-16 State                    :WELL
LOC  .       SE SE 36-47N-71W                :LOCATION
CNTY .       Campbell                        :COUNTY
I've written a small JJTree grammar (JJDoc'd HTML for the grammar is here) that parses the LAS format and forms it into an abstract syntax tree (AST). Here's an example of the structure that this JJTree grammar produces:
>DataFile
> VersionSection
>  VersionLine
>  WrapLine
>  CreationLine
> WellInformationSection
>  StartLine
>  StopLine
>  StepLine
>  NullLine
This could be useful in all sorts of ways when combined with a program that traverses the AST and does something with the appropriate nodes. For example, you could graph the resistance as the well depth changes, display the temperature changes, show well locations on a map, or use this parser as a front end for moving the well data into a relational database.
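As a rough illustration of that kind of traversal, here is a minimal, self-contained Java sketch. It does not use the classes JJTree generates: `AstNode`, `collect`, `dump`, and `sample` are hypothetical stand-ins, and a real program would walk the generated nodes instead (for example through the `jjtGetNumChildren()` and `jjtGetChild(int)` methods of JJTree's `Node` interface). The sketch builds a small tree shaped like the one above and gathers every node with a given name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a JJTree node: a name plus child nodes.
class AstNode {
    final String name;
    final List<AstNode> children = new ArrayList<>();

    AstNode(String name, AstNode... kids) {
        this.name = name;
        for (AstNode kid : kids) {
            children.add(kid);
        }
    }

    // Depth-first walk that collects every node whose name matches.
    static List<AstNode> collect(AstNode root, String wanted) {
        List<AstNode> found = new ArrayList<>();
        walk(root, wanted, found);
        return found;
    }

    private static void walk(AstNode node, String wanted, List<AstNode> found) {
        if (node.name.equals(wanted)) {
            found.add(node);
        }
        for (AstNode child : node.children) {
            walk(child, wanted, found);
        }
    }

    // Renders the tree one node per line, adding one space per level.
    static String dump(AstNode node, String prefix) {
        StringBuilder out = new StringBuilder(prefix + node.name + "\n");
        for (AstNode child : node.children) {
            out.append(dump(child, prefix + " "));
        }
        return out.toString();
    }

    // Hand-built tree mirroring the example structure (illustrative only).
    static AstNode sample() {
        return new AstNode("DataFile",
            new AstNode("VersionSection",
                new AstNode("VersionLine"),
                new AstNode("WrapLine"),
                new AstNode("CreationLine")),
            new AstNode("WellInformationSection",
                new AstNode("StartLine"),
                new AstNode("StopLine"),
                new AstNode("StepLine"),
                new AstNode("NullLine")));
    }

    public static void main(String[] args) {
        AstNode root = sample();
        System.out.print(dump(root, ">"));
        System.out.println(collect(root, "StartLine").size() + " StartLine node(s)");
    }
}
```

In a real application the body of `walk` is where the interesting work would go, such as reading a depth value out of a node and handing it to a plotting or database layer.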
Here's the grammar and some supporting code (las.jar, source code); to use it, just do:
java -classpath las.jar LAS some_data_file.las
Some notes on the grammar:
private static int[][] fieldDefinitions = {
    // each row of field widths (in characters) is followed by the token kinds for those fields
    {8, 7, 32, 28},
    {NAME, UNIT, DATA, DESCRIPTION},
    {12, 12, 12, 12, 12},
    {DEPTH, DELTA_T, RESISTENCE, SP, GR}
};
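To make the idea of a width table concrete, here is a small plain-Java sketch (not code from the grammar) that cuts one line into fields using an array of widths. The class name `FixedWidthFields`, the method `split`, and the sample line are all assumptions for illustration; the widths are copied from the first row of the table above. Fields that run past the end of the line come back short or empty rather than causing an error.

```java
import java.util.ArrayList;
import java.util.List;

class FixedWidthFields {
    // Widths copied from the first row of fieldDefinitions above.
    static final int[] WIDTHS = {8, 7, 32, 28};

    // Cuts the line into consecutive fields of the given widths and trims padding.
    static List<String> split(String line, int[] widths) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (int width : widths) {
            // Clamp both ends so a short line yields short or empty fields.
            int start = Math.min(pos, line.length());
            int end = Math.min(pos + width, line.length());
            fields.add(line.substring(start, end).trim());
            pos += width;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up line, padded to the widths above purely for the demo.
        String line = String.format("%-8s%-7s%-32s%-28s",
            "STRT", ".F", "10180.0000", ":START DEPTH");
        System.out.println(split(line, WIDTHS));
        // prints [STRT, .F, 10180.0000, :START DEPTH]
    }
}
```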
The grammar creates Token objects "manually". That is, rather than attempting to specify a token definition, it uses a catch-all token definition: ~[]. Then it reads in the proper number of characters and appends those to the Token object's image field. Next it sets the Token.kind field to the proper integer value. So we still let the tokenizer create the Token objects and link them together, but we don't have to use a separate lexical state for each token.

The AST has a separate node for each kind of line: TownshipLine, GammaRayLine, and so forth. But all those nodes consume the same sequence of tokens, so at first I had four token definitions repeated a few dozen times in the grammar. This seemed ugly, so I refactored it into a DefaultData production. This removes the duplication, and since DefaultData is a #void production, it doesn't clutter up the AST. If you use this grammar you'll probably want to enhance DefaultData so that it captures the tokens and saves their images in a field of the parent production.

This program is BSD licensed; if you find it helpful or interesting, pick up my JavaCC book!