%!

% POSTSCRIPT LOG FILE INTERPRETER UTILITIES
% ===================================================================
%            by Don Lancaster     February 28, 1997
% ===================================================================
% Copyright c 1997 by Don Lancaster and Synergetics, Box 809,
% Thatcher AZ, 85552. (520) 428-4073 www.tinaja.com don@tinaja.com
% All commercial rights and all electronic media rights are
% *fully* reserved. Linking welcome. Reposting is expressly forbidden.
% ===================================================================
% These utilities are intended to be sent to the Adobe Acrobat
% Distiller 3.01 by modifying this file with an editor, dragging,
% and dropping. GhostScript use is possible but is not supported.
% ===================================================================
% WARNING: Preliminary and possibly buggy code. For advanced users only.
% ===================================================================

% This is yet another series of PostScript-as-language examples.

% The routines are intended to read a disk based Apache web server
% log file, reformat it in a simplified form at 20:1 compression
% and extract ongoing custom info not normally provided by commercial
% web analyzer software, such as watching the popularity of a new
% file as it moves up through the ranks.

% The process is also often faster and takes up less memory and
% disk space.

% The initial version is presently limited to 255 days and 255 files.
% It is easily extended. All accesses in a single day are presently
% considered one session. An access crossing midnight is two sessions.
% Max log file size depends on available RAM, and is probably currently
% around 25 megs.

%%%%%%%% (A) FILE COMPRESSION AND CONVERSION UTILITY %%%%%%%%%%%%

% /convertfile reads the Apache web server log text file and
% converts it into three dictionaries.

% /filedict is the first dictionary. Each entry is the unique
% name of an accessed file and defines an increasing integer.
% Thus, the third file entry might be the equivalent of

%      /myfile.zorch 3 def

% This dictionary allows substitution of a single ASCII string
% character or its numeric equivalent for a full filename.

% /datedict is the second dictionary. Each entry is a unique 24 hour
% day, consisting of a name and an increasing integer. Thus, the third
% date entry might be the equivalent of

%      /07/Sep/1997 3 def

% This dictionary allows substitution of a single ASCII string
% character or its numeric equivalent for a date.

%     NOTE: PostScript will apparently allow slashes inside of a
%           name string, but ONLY if that name is *always* written by
%           a string followed by a cvn operator. And only if the name
%           is *always* read by a cvn cvx exec sequence. The legality
%           of this sneaky stunt is assumed.

% /visitordict is the main dictionary. There is one name entry for
% each unique visitor. Each entry consists of an array of two strings,
% a filename string and a date string, such as
%
%      /sbijok.yrac.net [(abcd)(ABCD)] def
%
% "a" is the numeric 1-255 pointer to the first filename accessed.
% That access took place on day "A". Compression of approximately 20:1
% comes about since each user appears only once and each file access
% consists of two characters. The length of either string is the number
% of total file accesses for a user. The number of DIFFERENT entries
% in the second string is the number of user sessions per month.

% Note that this dictionary inherently "self sorts", eliminating
% the interleaved multi user records in the original file.

% This is the high level routine that reads the log file and
% converts it into a compressed PostScript -visitordict- dictionary...

/visitordict 1 dict def        % default dictionaries expand upon use
/filedict 1 dict def
/rfiledict 1 dict def
/datedict 1 dict def
/rdatedict 1 dict def

/curfile (dummy) def
/progress 0 def                % progress marker

% high level code to convert log file into PostScript dictionaries

/convertfile { (starting logfile conversion) print flush
   logfilename (r) file /source exch def
   {source 1000 string readline
      {procline}{procline exit} ifelse
   } loop
   makerdicts
} def

% procline unifies the line processing tasks into one job.
% Lines of five characters or less are discarded.

/procline {dup length 5 gt
   {parseline validate addtofile markprogress}
   {pop} ifelse                % drop a short line from the stack
} def

/markprogress {/progress progress 1 add store
   progress 3000 mod 0 eq {(.) print flush} if} def

% /parseline continues the log line separation, isolating the customer
% data, date data, file data, and access data

/parseline { /ok true store
   ( - - [) search {
      /curcust exch store pop               % save current visitor
      ( "GET ) search
         {/curfulldate exch store pop
          ( ) search
             {/curfilename exch store pop
              /curdregs exch store}
             {pop /ok false store} ifelse}  % short line error
         {pop /ok false store} ifelse       % might be NULL or HEAD or error
   }{pop /ok false store} ifelse
} def

% /validate checks for a wanted new entry. A filename shorter than
% two characters is treated as the home page. Must be code 200 (valid)
% or 304 (cached) access. File must be in include list of non-zero
% length or not in exclude list. Proc exits with /ok set to true or
% false. Lines that already failed parsing are skipped.

/validate { ok {
   curfilename length 2 lt {/curfilename (home page) store} if
   curdregs ( 200 ) search {pop pop pop true}{pop false} ifelse
   curdregs ( 304 ) search {pop pop pop true}{pop false} ifelse
   or not {/ok false store} if
   curfilename filterfilename
   } if
} def

% addtofile first adds an approved filename to filedict. It then adds
% a day-only date to datedict. It then adds the user to visitordict.
% Finally, the filenum and datenum are added as data into the data
% strings in visitordict.

/addtofile { ok {
   filedict curfilename cvn known
      {filedict curfilename cvn get }               % strange lets /'s
      {filedict dup length 1 add curfilename cvn exch put
       filedict length} ifelse
   /filenum exch store                               % save file pointer

   curfulldate (:) search
      {exch pop exch pop /curdaydate exch store}
      {date_reading_error} ifelse

   datedict curdaydate cvn known
      {datedict curdaydate cvn get }                % strange lets /'s
      {datedict dup length 1 add curdaydate cvn exch put
       datedict length} ifelse
   /datenum exch store                               % save date pointer

   visitordict curcust cvn known not
      {visitordict curcust cvn [()()] put} if       % stuff visitor

   visitordict curcust cvn get dup 0 get             % get old string
   (X) dup filenum 0 exch put mergestr 0 exch put

   visitordict curcust cvn get dup 1 get             % get old string
   (X) dup datenum 0 exch put mergestr 1 exch put
   } if
} def

% /makerdicts inverts the date dictionary and the file dictionary
% providing for much faster reporting access

/makerdicts { (inverting dictionaries) print flush
   filedict rfiledict invertdict       % invert file dict
   datedict rdatedict invertdict       % invert date dict
} def

% /reportactivity goes through visitordict and generates a list of
% users and the files they accessed in time order. Note: this can
% later be greatly enhanced. Reporting is also best written to a
% disk file rather than the error as is temporarily done here.

/reportactivity{ (started report) print flush
   /visitorcount 0 store
   /repeatvisitor 0 store
   reportfilename (w) file /outfile exch def
   (\n\nUser paths through http://www.tinaja.com\n)
      outfile exch writestring

   visitordict {
      exch 100 string cvs (\n\n) exch mergestr (:) mergestr
         outfile exch writestring
      dup 1 get /datehold exch store
      /datecount 0 store
      datehold 0 get /dayhold exch store
      0 get {
         datehold datecount get /daytry exch store
         daytry dayhold ne {
            (\n ===============) outfile exch writestring
            /dayhold daytry store
            /repeatvisitor repeatvisitor 1 add store
         } if
         rfiledict exch get (\n ) exch mergestr
            outfile exch writestring
         /datecount datecount 1 add store
      } forall
      /visitorcount visitorcount 1 add store
   } forall

   (\n\nTotal visitors: ) visitorcount 10 string cvs mergestr
      outfile exch writestring
   (\nRepeat visits: ) repeatvisitor 10 string cvs mergestr
      (\n\n) mergestr outfile exch writestring
   outfile closefile                   % flush and close the report
} def

% mergestr merges the two top stack strings into one top stack string

/mergestr {2 copy length exch length add string dup dup 4 3 roll
   4 index length exch putinterval 3 1 roll exch 0 exch putinterval} def

%%%%%%%% (F) DICTIONARY INVERSION UTILITY %%%%%%%%%%%%%%%%

% Inverting a dictionary greatly speeds up letting you enter
% a *value* to find a *key*. Ferinstance, if /zorch 6 def
% goes into a dictionary, entering a 6 in the reverse dict
% retrieves the original name. This is handy to undo the
% compact string representations of dates and filenames.

% Inversion is based on the fact that a PostScript dictionary
% really just relates paired values of ANY non-null object type.

% The method is STRICTLY restricted to dictionaries where
% each definition value is UNIQUE and no nulls are present.
% Inversion is safest used when the definition values consist
% solely of sequential integers.

% enter with -frontwardsdict- -backwardsdict- on stack with
% -backwardsdict- defined but empty. Code is amazingly short...

/invertdict {begin {exch 100 string cvs def} forall end} def

%%%%%%%% (G) FILE EXCLUSION/INCLUSION FILTER %%%%%%%%%%%%%%

% /filterfilename accepts a filename and then reads an
% /includenamelist and an /excludenamelist array. It sets /ok
% to false if the filename is rejected. At present, matches are on
% "contains". This is later modifiable for "starts with" or
% "ends with". Array format is [(namefrag1)...(namefragn)]
% and each array is read only if its length is greater than zero.
% Thus, an empty include or exclude list includes everything.

/includenamelist [] def                % empty default of [(n1)(n2)..(nn)]
/excludenamelist [(.gif)(.jpg)] def    % default skips image files

/filterfilename {/ffnhold exch store
   false includenamelist length 0 gt {
      includenamelist {ffnhold exch search
         {pop pop pop pop true} {pop} ifelse} forall} if
   true excludenamelist length 0 gt {
      excludenamelist {ffnhold exch search
         {pop pop pop pop false} {pop} ifelse} forall} if
   or not {/ok false store} if
} def

% ////////////// demo - remove or alter before reuse ////////////////

% WARNING: Be sure to use "\\" when you mean "\" in any file string!

/logfilename (c:\\Windows\\Desktop\\Zeketo~1\\bbbbb) def
/reportfilename (c:\\Windows\\Desktop\\PSLOGG~1\\report) def

convertfile
reportactivity

% Simple demo currently should return report file something like...

%%
%%
%%  cgigw.cgi.com:
%%     /linkpdf1.html
%%     /demo/linkdemo.pdf
%%     /demo/linkdemo.pdf
%%     /linkpdf1.html
%%
%%  pm12-ppp29.gis.net:
%%     home page
%%     /tinaja01.html
%%     /glib/gramtram.pdf
%%     ===============    % this is a different day repeat visit marker
%%     /index.html
%%     /libry01.html
%%     /other01.html
%%     /third01.html
%%     /whtnu01.html
%%     /text/bannlast.html
%%     /webwb01.html
%%
%%  whx-ca14-01.ix.netcom.com:
%%     home page
%%     /libry01.html
%%     /hack01.html
%%     /dtekwb01.html
%%
%%  192.38.39.254:
%%     home page
%%     home page
%%     home page
%%     home page
%%
%%  etc...
%%
%%  Total visitors: 4576
%%  Repeat visits: 785
%%

% %%%%%%%%%%%%%%% END POSTSCRIPT LOG READER UTILITIES %%%%%%%%%%%%%%%

% Consulting services available on concepts shown.
% ===================================================================
% Copyright c 1997 by Don Lancaster and Synergetics, Box 809,
% Thatcher AZ, 85552. (520) 428-4073 www.tinaja.com don@tinaja.com
% All commercial rights and all electronic media rights are
% *fully* reserved. Linking welcome. Reposting expressly forbidden.
% ===================================================================
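
% ///////////// illustrative sketch - safe to delete ////////////////

% What follows is only a minimal sketch of two ideas used above: the
% one-character-per-access encoding, and a slash-bearing name made
% with cvn and then recovered through an inverted dictionary. The
% names /sketchcode, /sketchdict and /rsketchdict are hypothetical
% and are used nowhere else in this file. The filenames and pointer
% values are made-up sample data, not real log results.

/sketchcode 1 string def           % one character holds one pointer
sketchcode 0 3 put                 % store an assumed file pointer of 3
sketchcode 0 get                   % read it back as an integer
10 string cvs print (\n) print flush

/sketchdict 2 dict def             % assumed forward dictionary
/rsketchdict 2 dict def            % assumed reverse dictionary, empty
sketchdict (/index.html) cvn 1 put      % names with slashes need cvn
sketchdict (/libry01.html) cvn 2 put

rsketchdict begin                  % same inversion idea as /invertdict
   sketchdict {exch 100 string cvs def} forall
end

rsketchdict 2 get print (\n) print flush     % look up pointer 2

% The integer keys in the reverse dictionary are what let a report
% routine turn each stored character code back into a filename.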