%!
% POSTSCRIPT & DISTILLER FILE MANIPULATION TOOLS
% ==============================================
% by Don Lancaster

% Copyright c. 1996 by Don Lancaster and Synergetics, Box 809,
% Thatcher AZ, 85552 (520) 428-4073. synergetics@tinaja.com
% All commercial rights and all electronic media rights *fully*
% reserved. Reposting is expressly forbidden.

% Further support on www.tinaja.com
% Consulting services available via don@tinaja.com
% ====================================

% The PostScript general purpose computing language is ideal for
% manipulating disk based files, either inside a printer that has
% a companion hard disk, or by using Distiller or GhostScript on a host.

% Here is a collection of tools and program fragments that show
% some of the possibilities. Read the tools first in an editor,
% then isolate them into individual routines. Finally, customize
% each routine for your own use.

% These examples were initially aimed at extracting useful information
% from web activity log files. Some bugs or limitations may remain.

% Be sure to use "\\" when you mean "\" in all filename strings!

% These utilities are greatly enhanced through the Gonzo PostScript
% utilities in GONZO20.PS on the PostScript library shelf of www.tinaja.com

% Here's a typical web log file entry line...

%   la30-120.compuglotch.com - - [22/Aug/1996:00:27:52 -0700]
%   "GET /greenstrap.gif HTTP/1.0" 200 7263

% Everything up to the first space is the approximate user address. But
% note that a user might be, say -dialin123- one time and -dialin126-
% the next. And that a file might be reloaded in a few seconds or a few
% minutes. And that the country of numeric entries is hard to extract.

% The bracketed portion is the time and date. The time starts with the first
% colon. The -0700 is the offset of the server's local time from GMT.

% The quoted portion is the file requested or other activity asked for.
% The next to the last number is the transaction result, with "200"
% meaning "file sent". The final number is how many bytes got transferred.

% For a crude extraction, grab everything up to the first space for the
% user. Grab everything in brackets for the date. Grab everything in
% quotes for the activity. Then reduce the substrings from there.


%%%%%%%%%%%%%% (A) CHECK AVAILABLE DISK SPACE %%%%%%%%%%%%%%%

statusdict begin
   diskonline dup == flush              % report whether a disk is online
   {diskstatus == == flush} if          % and its status only if it is
end


%%%%%%%%%%%%%%%%%% (B) CATALOG DISK %%%%%%%%%%%%%%%%%%%%%%%%%%

% This is crude code that needs improvement...

/scratchstring 50 string def            % must hold the longest filename

(%*) {print (\r) print flush}           % print each matching filename
scratchstring filenameforall


%%%%%%%%%%%%%% (C) WRITE A FILE TO DISK %%%%%%%%%%%%%%%%%%%%

% Uncomment, rename, and PREPEND the following to a text file to be
% stored on a laser printer's hard disk...

% Note: any "\" in the filename string MUST be entered as "\\".

%% /filename (filenamehere) def
%%
%% filename status
%% {4 {pop} repeat filename deletefile} if
%% /mfn filename (w) file def
%% /buffer 1024 string def
%% /bf {{currentfile buffer readstring pop dup length 0 eq
%%      {pop mfn closefile exit}{mfn exch writestring}ifelse
%%     } loop}def
%% bf


%%%%%%%%%%%%%%%%% (D) VERIFY PRESENCE OF FILE %%%%%%%%%%%%%%%

/filename (xxxxx) def

filename status dup true ne {== flush}{== == == == == flush} ifelse

% returns false or pages bytes access creation true


%%%%%%%%%%%%%%% (E) READ A FILE ONE BYTE AT A TIME %%%%%%%%%%%

/strrx (X) def                          % temp char stash
/filename (xxxxx) def

filename (r) file /myworkfile exch def  % name of opened file

{ myworkfile read
     { strrx exch 0 exch put            % character to stash
       strrx print flush                % reporter example
     }
     {myworkfile closefile exit} ifelse
} loop

% code resumes here.
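
%%%%%%%%%%%%%% SKETCH: SPLIT ONE LOG LINE INTO ITS FIELDS %%%%%%%%%%%%%%

% A minimal sketch of the crude extraction described in the introduction:
% everything up to the first space is the user, the bracketed part is the
% date, and the quoted part is the activity. It works on a literal sample
% line, so no disk file is needed. The names -sampleline- and -between-
% are assumptions made up for this sketch and are not used elsewhere.
% (\n) is the line ending here; substitute (\r) if your host prefers it.

/sampleline (la30-120.compuglotch.com - - [22/Aug/1996:00:27:52 -0700] \
"GET /greenstrap.gif HTTP/1.0" 200 7263) def

% -between- returns the text between two delimiters...
%     string open close between -> substring true   (both delimiters found)
%                               -> false            (either one missing)

/between {
   /closedelim exch def /opendelim exch def
   opendelim search
      { pop pop                         % keep everything after the opener
        closedelim search
           { exch pop exch pop true }   % keep everything before the closer
           { pop false } ifelse
      }
      { pop false } ifelse
} def

sampleline ( ) search                   % user is everything before first space
   { (user:    ) print print (\n) print pop pop }
   { pop } ifelse

sampleline ([) (]) between { (date:    ) print print (\n) print } if
sampleline (") (") between { (request: ) print print (\n) print } if
flush

% This should report the user as la30-120.compuglotch.com, the date as
% 22/Aug/1996:00:27:52 -0700, and the request as GET /greenstrap.gif HTTP/1.0
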

%%%%%%%%%%%%%%% (F) READ A FILE ONE LINE AT A TIME %%%%%%%%%%%

/strrx 256 string def                   % temp line stash
/filename (xxxxx) def

filename (r) file /myworkfile exch def  % name of opened file

{ myworkfile strrx readline
     {                                  % read line to here
       print (\r\r) print flush         % reporter example
     }
     {pop myworkfile closefile exit} ifelse   % drop leftover string at end of file
} loop

% code resumes here.


%%%%%%%%%%%%%% (G) EXTRACT EACH FULL DATE FROM A LOG FILE %%%%%%%%%%%%%%

/grabdate { linehold ([) search
     { pop pop                          % keep everything after the opening bracket
       (]) search
          {exch pop exch pop            % keep everything before the closing bracket
           == flush }
          {pop} ifelse
     }
     {pop} ifelse
} def

% use example

/strrx 256 string def                   % temp line stash
/filename (xxxxx) def

filename (r) file /myworkfile exch def  % name of opened file

{myworkfile strrx readline
     {/linehold exch store grabdate}
     {pop myworkfile closefile exit} ifelse   % drop leftover string at end of file
} loop

% code resumes here.


%%%%%%%%%%%%%% (H) EXTRACT DAY DATE ONLY FROM A LOG FILE %%%%%%%%%%%%%%

/grabdateonly { linehold ([) search
     { pop pop                          % keep everything after the opening bracket
       (:) search
          {exch pop exch pop            % keep everything before the colon
           == flush }
          {pop} ifelse
     }
     {pop} ifelse
} def

% use example

/strrx 256 string def                   % temp line stash
/filename (xxxxx) def

filename (r) file /myworkfile exch def  % name of opened file

{myworkfile strrx readline
     { /linehold exch store grabdateonly}
     {pop myworkfile closefile exit} ifelse   % drop leftover string at end of file
} loop

% code resumes here.


%%%%%%%%%%%%%% (I) EXTRACT UNIQUE DATES ONLY FROM A LOG FILE %%%%%%%%%%%%%%

% This can be expanded to let you handle each day's hits separately.
% Assumes that dates are in strict numeric order.

/currentdate (X) def
/hitstoday 0 def

/grabuniquedateonly { linehold ([) search
     { pop pop                          % keep everything after the opening bracket
       (:) search
          {exch pop exch pop            % keep everything before the colon
           dup currentdate eq
              { pop}
              {dup 20 string cvs        % DEREFERENCE string pointer!!!
               /currentdate exch store  % leaves date for next line
               == flush } ifelse
          }
          {pop} ifelse}
     {pop} ifelse} def

% use example

/strrx 256 string def                   % temp line stash
/filename (xxxxx) def

filename (r) file /myworkfile exch def  % name of opened file

{myworkfile strrx readline
     { /linehold exch store grabuniquedateonly}
     {pop myworkfile closefile exit} ifelse   % drop leftover string at end of file
} loop

% code resumes here.


%%%%%%%%%%%%%% (J) COUNT ALL HITS ON A GIVEN DAY %%%%%%%%%%%%%%

% This can be expanded to let you handle each day's hits separately.
% Assumes that dates are in strict numeric order.

/currentdate (X) def
/hitstoday 0 def

/inchits {/hitstoday hitstoday 1 add store} def

/reportolddate { currentdate (X) ne
     {currentdate print ( had ) print
      hitstoday 20 string cvs print
      ( hits.\r) print flush} if
} def

/grabdate&counthits { linehold ([) search
     { pop pop                          % keep everything after the opening bracket
       (:) search
          {exch pop exch pop            % keep everything before the colon
           inchits                      % add one to hit counter
           dup currentdate eq
              {pop}
              {20 string cvs            % DEREFERENCE string pointer!!!
               /newcurrentdate exch store         % save new date
               /hitstoday hitstoday 1 sub store   % this hit belongs to the new date
               reportolddate
               /hitstoday 1 store                 % so the new date starts at one hit
               /currentdate newcurrentdate 20 string cvs store  % DEREF again
              } ifelse
          }
          {pop} ifelse
     }
     {pop} ifelse
} def

% use example

/strrx 256 string def                   % temp line stash
/filename (xxxxx) def

filename (r) file /myworkfile exch def  % name of opened file

{myworkfile strrx readline
     {/linehold exch store grabdate&counthits}
     {pop myworkfile closefile exit} ifelse   % drop leftover string at end of file
} loop

reportolddate                           % for last date string

% code resumes here.


%%%%%%%%%%%%%% (K) REPORT ALL HITS ON A GIVEN DAY %%%%%%%%%%%%%%

% This is just a stepping stone to more advanced procs.

/currentdate (X) def
/hitstoday 0 def

/inchits {/hitstoday hitstoday 1 add store} def

/reportolddate { currentdate (X) ne
     {currentdate print ( had ) print
      hitstoday 20 string cvs print
      ( hits.\r) print flush} if
} def

/processfilename { ( ) search
     { exch pop exch pop == flush}      % filename ends with space
     {pop} ifelse                       % no space found, so drop the string
} def

/reportgotfile { linebeyonddate ("GET ) search
     {pop pop processfilename }
     {pop} ifelse
} def

/grabdate&counthits { linehold ([) search
     { pop pop                          % keep everything after the opening bracket
       (:) search
          {exch pop exch                % keep everything before the colon
           /linebeyonddate exch store   % save file info for further use
           inchits                      % add one to hit counter
           dup currentdate eq
              {pop reportgotfile }
              {20 string cvs            % DEREFERENCE string pointer!!!
               /newcurrentdate exch store         % save new date
               /hitstoday hitstoday 1 sub store   % this hit belongs to the new date
               reportolddate
               /hitstoday 1 store                 % so the new date starts at one hit
               /currentdate newcurrentdate 20 string cvs store  % DEREF again
               (\rProcessing ) print currentdate
               print (:\r) print flush
               reportgotfile
              } ifelse
          }
          {pop} ifelse
     }
     {pop} ifelse
} def

% use example

/strrx 256 string def                   % temp line stash
/filename (xxxxx) def

(starting processing of ) print filename print (\r\r) print flush

filename (r) file /myworkfile exch def  % name of opened file

{myworkfile strrx readline
     {/linehold exch store grabdate&counthits}
     {pop myworkfile closefile exit} ifelse   % drop leftover string at end of file
} loop

reportolddate                           % for last date string

% code resumes here


%%%%%% (L) REPORT SELECTED SUBDIRECTORY HITS ON A GIVEN DAY %%%%%%%%%%%

% Notes: reports all log entries, even if made by the same url within a few
% minutes of each other, and even if not all bytes transferred.
%
% -chosensubdirectory- can also target an individual file.

/chosensubdirectory (/glib) def         % select your subdirectory or file
/currentdate (X) def
/hitstoday 0 def
/filteredhitstoday 0 def

/inchits {/hitstoday hitstoday 1 add store} def
/incfhits {/filteredhitstoday filteredhitstoday 1 add store} def

/reportolddate { currentdate (X) ne
     {currentdate print ( had ) print
      hitstoday 20 string cvs print
      ( total hits and ) print
      filteredhitstoday 20 string cvs print
      ( filtered hits.\r) print flush} if
} def

/processfilename { ( ) search
     {exch pop exch pop                 % filename ends with space
      chosensubdirectory search
         {pop pop == flush incfhits}
         {pop} ifelse
     }
     {pop} ifelse                       % no space found, so drop the string
} def

/reportgotfile {linebeyonddate ("GET ) search
     {pop pop 100 string cvs processfilename }
     {pop} ifelse} def

/grabdate&counthits { linehold ([) search
     { pop pop                          % keep everything after the opening bracket
       (:) search
          {exch pop exch                % keep everything before the colon
           200 string cvs               % DEREFERENCE string!!
           /linebeyonddate exch store   % save file info for further use
           inchits                      % add one to hit counter
           dup currentdate eq
              { pop reportgotfile }
              {30 string cvs            % DEREFERENCE string pointer!!!
               /newcurrentdate exch store         % save new date
               /hitstoday hitstoday 1 sub store   % this hit belongs to the new date
               reportolddate
               /hitstoday 1 store                 % so the new date starts at one hit
               /filteredhitstoday 0 store
               /currentdate newcurrentdate 30 string cvs store  % DEREF again
               (\rProcessing ) print currentdate
               print (:\r) print flush
               reportgotfile
              } ifelse
          }
          {pop} ifelse
     }
     {pop} ifelse
} def

% use example

/strrx 256 string def                   % temp line stash
/filename (xxxxx) def

(starting processing of ) print filename print (\r\r) print flush

filename (r) file /myworkfile exch def  % name of opened file

{myworkfile strrx readline
     {/linehold exch store grabdate&counthits}
     {pop myworkfile closefile exit} ifelse   % drop leftover string at end of file
} loop

reportolddate                           % for last date string

% code resumes here.


%%%%%% (M) REPORT TOTAL SELECTED SUBDIRECTORY OR FILE HITS %%%%%%%%%%%

% This differs from (L) in that an array of filenames and their totals is
% created for the entire log file. The format is /totalstash [[(f1) f1count]
% [(f2) f2count] ...
% [(fn) fncount] -null- -null- -null- ] def

% /reporttotals reports the contents of the final file counts...

/reporttotals {
   0 1 usednames 1 sub {/posn exch store     % for the list
      totalstash posn get dup 0 get print    % get name
      ( had a total of ) print               % and then
      1 get 10 string cvs print              % get count
      ( hits.\r) print flush                 % and report
   } for
} def

% demo--

%% /totalstash [[(alpha)43][(bravo)27][(charlie)14][(delta)51]] def
%% /usednames 4 def
%% reporttotals

% /addfhit adds one count to a known stash entry or creates a new one

/addfhit { /gotmatch false store             % set match flag
   25 string cvs /curfile exch store         % grab filename
   0 1 usednames 1 sub {/posn2 exch store    % test for presence
      totalstash posn2 get 0 get curfile eq
         {totalstash posn2 get dup
          1 get 1 add 1 exch                 % bump count if present
          put /gotmatch true store exit      % and stop search
         } if
   } for
   gotmatch not                              % create new entry
      { totalstash usednames [curfile 1] put % two element array
        /usednames usednames 1 add store } if   % bump name count
} def

% /inittotalaccum sets up a blank totals array before a log file is read...

/inittotalaccum {
   /maxfiles 100 def                         % maximum anticipated filenames
   /totalstash [maxfiles {null} repeat] def  % null array
   /usednames 0 def
} def

% demo...

%% inittotalaccum
%% (aa) addfhit
%% (aa) addfhit
%% (aa) addfhit
%% (bb) addfhit
%% (aa) addfhit
%% (aa) addfhit
%% (aa) addfhit
%% (cc) addfhit
%% (bb) addfhit
%% reporttotals
%%

% and the main code

/chosensubdirectory (/glib) def         % select your subdirectory or file
/currentdate (X) def                    % running date stash
/hitstoday 0 def                        % total hit accumulator
/filteredhitstoday 0 def                % subdirectory hit accumulator

/inchits {/hitstoday hitstoday 1 add store} def
/incfhits {/filteredhitstoday filteredhitstoday 1 add store} def

% /reportolddate reports the total and subdirectory hits for each day.
% Usually just after the date changes.
% The initial date is (X) and is unreported.

/reportolddate { currentdate (X) ne
     {currentdate print ( had ) print
      hitstoday 20 string cvs print
      ( total hits and ) print
      filteredhitstoday 20 string cvs print
      ( filtered hits.\r) print flush} if
} def

% /processfilename strips the filename out of the remainder of the log file
% line by looking for the first trailing space.

/processfilename { ( ) search
     {exch pop exch pop                 % filename ends with space
      chosensubdirectory search
         {pop pop                       % split out filename
          addfhit                       % add to list
          incfhits                      % bump hit count
         }{pop} ifelse}
     {pop} ifelse                       % no space present, so drop the string
} def

% /isolategotfile pulls out log file lines that start with "GET...

/isolategotfile { linebeyonddate ("GET ) search
     {pop pop 100 string cvs            % DEREFERENCE string!!
      processfilename }
     {pop} ifelse                       % and process line
} def

% /grabdate&counthits isolates the day date and increments the hit counter.

% Note that PostScript strings are often defined by pointers to other strings
% and can wildly change in any complex program. Dereferencing a string
% with a ... 30 string cvs ... or whatever guarantees the new string
% content is uniquely defined and will not unexpectedly change later.

/grabdate&counthits { linehold ([) search
     { pop pop                          % keep everything after the opening bracket
       (:) search
          {exch pop exch                % keep everything before the colon
           200 string cvs               % DEREFERENCE string!!
           /linebeyonddate exch store   % save file info for further use
           inchits                      % add one to hit counter
           dup currentdate eq           % is this same date as before
              { pop isolategotfile }    % yes, get filename info only
              { 30 string cvs           % DEREFERENCE string pointer!!!
               /newcurrentdate exch store         % save new date
               /hitstoday hitstoday 1 sub store   % this hit belongs to the new date
               reportolddate                      % report previous day hits
               /hitstoday 1 store                 % reset hit counters, counting this hit
               /filteredhitstoday 0 store
               /currentdate newcurrentdate 30 string cvs store  % DEREF again
               (\rProcessing ) print currentdate  % print status message
               print (:\r) print flush
               isolategotfile                     % get filename info
              } ifelse
          }
          {pop} ifelse
     }
     {pop} ifelse
} def

% use example

% start exec loop

/strrx 256 string def                   % temp line stash
/filename (xxxxx) def                   % disk name of log file

(starting processing of ) print filename print (\r\r) print flush

filename (r) file /myworkfile exch def  % name of opened file
inittotalaccum                          % clear counters

{myworkfile strrx readline
     {/linehold                         % read log file a line at a time
      exch store                        % save log line
      grabdate&counthits}               % check date and process further
     {pop myworkfile closefile exit} ifelse   % drop leftover, exit on end of log file
} loop

reportolddate                           % for last date string completed
reporttotals                            % totals for file
(done.) print flush                     % status

% code resumes here.

% showpage or rest of program

% showpage


%%%%%% (N) IMPROVED REPORT TOTAL SELECTED SUBDIRECTORY OR FILE HITS %%%%%%%%

% Added features to (M) include sorting the reports, a total file report,
% and a minimum hit clip.

% This PostScript utility can be sent to Acrobat Distiller 3.0 to
% read a web log file, calculate hits and filtered hits by date,
% and report the results.

% All processing is in RAM and only a single hard disk pass is required.

% To use, change the log filename and your filter subdirectory name below.
% Send to Acrobat DISTILLER. Read generated .LOG file. Throw away .PDF file.

% An array of filenames and their totals is
% created for the entire log file. The format is /totalstash [[(f1) f1count]
% [(f2) f2count] ... [(fn) fncount] -null- -null- -null- ] def

% /reporttotals reports the contents of the final file counts...

/reporttotals { return flush                 % pretty print
   (Total filtered hits for entire log file:) print
   return return flush                       % header
   totalstash bubblesortx /stotstash exch store   % sort highest first
   0 1 usednames 1 sub {/posn exch store     % for the list
      stotstash posn get 1 get minhits ge {  % ignore dregs
         stotstash posn get dup 0 get print  % get name
         ( had a total of ) print            % and then
         1 get 10 string cvs print           % get count
         ( hits.) print return flush         % and report
      } if                                   % dregs
   } for                                     % names
} def

% demo--

%% /totalstash [[(alpha)43][(bravo)27][(charlie)14][(delta)51]] def
%% /usednames 4 def
%% reporttotals

% /addfhit adds one count to a known stash entry or creates a new one

/addfhit { /gotmatch false store             % set match flag
   127 string cvs /curfile exch store        % grab filename
   0 1 usednames 1 sub {/posn2 exch store    % test for presence
      totalstash posn2 get 0 get curfile eq
         {totalstash posn2 get dup
          1 get 1 add 1 exch                 % bump count if present
          put /gotmatch true store exit      % and stop search
         } if
   } for
   gotmatch not                              % create new entry
      { totalstash usednames [curfile 1] put % two element array
        /usednames usednames 1 add store } if   % bump name count
} def

% /inittotalaccum sets up a blank totals array before a log file is read...

/inittotalaccum {
   /maxfiles 100 def                         % maximum anticipated filenames
   /totalstash [maxfiles {null} repeat] def  % null array
   /usednames 0 def
   /hitsforall 0 def                         % global total hit counter
   /filteredhitsforall 0 def                 % global filtered hit counter
} def

% demo...

%% inittotalaccum
%% (aa) addfhit
%% (aa) addfhit
%% (aa) addfhit
%% (bb) addfhit
%% (aa) addfhit
%% (aa) addfhit
%% (aa) addfhit
%% (cc) addfhit
%% (bb) addfhit
%% reporttotals
%%

% and the main code

/chosensubdirectory (/glib) def         % select your subdirectory or file
/minhits 0 def
/currentdate (X) def                    % running date stash
/hitstoday 0 def                        % total hit accumulator
/filteredhitstoday 0 def                % subdirectory hit accumulator

/inchits {/hitstoday hitstoday 1 add store} def
/incfhits {/filteredhitstoday filteredhitstoday 1 add store} def

% /bubblesortx accepts an array in /totalstash format and rearranges it so
% the most used files go to the left...

/bubblesortx { 0 usednames getinterval          % strip the nulls
   mark                                         % mark the stack
   exch aload pop counttomark /idx exch def     % count the elements
   { idx 1 le {exit} if                         % nothing left to sort
     0 1 idx 1 sub {pop                         % don't need pointer
        2 copy                                  % check stack top pair
        1 get exch 1 get exch                   % look at second array element
        lt {exch} if idx 1 roll} for
     idx 1 roll                                 % blurp
     /idx idx 1 sub store                       % one less element
   } loop ]
} def

% /reportolddate reports the total and subdirectory hits for each day.
% Usually just after the date changes.
% The initial date is (X) and is unreported.

/reportolddate { currentdate (X) ne
     {currentdate print ( had ) print
      hitstoday 20 string cvs print
      ( total hits and ) print
      filteredhitstoday 20 string cvs print
      ( ") print chosensubdirectory print
      (" filtered hits.) print return
      (This is a hit ratio of ) print
      filteredhitstoday hitstoday div 100 mul   % optional find percentage
      100 mul cvi 100 div                       % change to percent format
      5 string cvs print (%.) print return      % and report
      flush} if
} def

/reportalldates {return (The entire log file had ) print
   hitsforall 20 string cvs print
   ( total hits and ) print
   filteredhitsforall 20 string cvs print
   ( ") print chosensubdirectory print
   (" filtered hits.)
   print return
   (This is a hit ratio of ) print
   filteredhitsforall hitsforall div 100 mul    % optional find percentage
   100 mul cvi 100 div                          % change to percent format
   5 string cvs print (%.) print return         % and report
   flush} def

% /processfilename strips the filename out of the remainder of the log file
% line by looking for the first trailing space.

/processfilename { ( ) search
     {exch pop exch pop                 % filename ends with space
      chosensubdirectory search
         {pop pop                       % split out filename
          addfhit                       % add to list
          incfhits                      % bump hit count
         }{pop} ifelse}
     {pop} ifelse                       % no space present, so drop the string
} def

% /isolategotfile pulls out log file lines that start with "GET...

/isolategotfile { linebeyonddate ("GET ) search
     {pop pop 127 string cvs            % DEREFERENCE string!!
      processfilename }
     {pop} ifelse                       % and process line
} def

% /grabdate&counthits isolates the day date and increments the hit counter.

% Note that PostScript strings are often defined by pointers to other strings
% and can wildly change in any complex program. Dereferencing a string
% with a ... 30 string cvs ... or whatever guarantees the new string
% content is uniquely defined and will not unexpectedly change later.

/grabdate&counthits { linehold ([) search
     { pop pop                          % keep everything after the opening bracket
       (:) search
          {exch pop exch                % keep everything before the colon
           200 string cvs               % DEREFERENCE string!!
           /linebeyonddate exch store   % save file info for further use
           inchits                      % add one to hit counter
           dup currentdate eq           % is this same date as before
              { pop isolategotfile }    % yes, get filename info only
              { 30 string cvs           % DEREFERENCE string pointer!!!
               /newcurrentdate exch store         % save new date
               /hitstoday hitstoday 1 sub store   % this hit belongs to the new date
               reportolddate                      % report previous day hits
               /hitsforall hitsforall hitstoday add store   % update totals
               /filteredhitsforall filteredhitsforall filteredhitstoday add store
               /hitstoday 1 store                 % reset hit counters, counting this hit
               /filteredhitstoday 0 store
               /currentdate newcurrentdate 30 string cvs store  % DEREF again
               return (Processing ) print currentdate   % print status message
               print (:) print return flush
               isolategotfile                     % get filename info
              } ifelse
          }
          {pop} ifelse
     }
     {pop} ifelse
} def


%%%%%%%%%%%%%%%%%%% USE EXAMPLE %%%%%%%%%%%%%%%%%%%%

% IMPORTANT: Enter "\\" any time you mean "\" in the filename!!!

/filename (c:\\Windows\\Desktop\\Zeketo~1\\webtrac.006) def   % disk log file

/chosensubdirectory (/glib) def         % select subdirectory or file
                                        % must start with "/"
/minhits 3 def                          % minimum reportable hits
/crlf true def                          % ibm format?

/strrx 256 string def                   % temp line stash
/return {(\r) print crlf {(\n) print} if} def

(starting processing of ) print filename print return return flush

filename (r) file /myworkfile exch def  % name of opened file
inittotalaccum                          % clear counters

{myworkfile strrx readline
     {/linehold                         % read log file a line at a time
      exch store                        % save log line
      grabdate&counthits}               % check date and process further
     {pop myworkfile closefile exit} ifelse   % drop leftover, exit on end of log file
} loop

reportolddate                           % for last date string completed
/hitsforall hitsforall hitstoday add store      % include last date in totals
/filteredhitsforall filteredhitsforall filteredhitstoday add store
reportalldates                          % global hit sums
reporttotals                            % totals for file
return return (done.) print flush       % status

% code resumes here

% showpage or rest of program

showpage

% ====================================
% Copyright c. 1996 by Don Lancaster and Synergetics, Box 809,
% Thatcher AZ, 85552 (520) 428-4073. synergetics@tinaja.com
% All commercial rights and all electronic media rights *fully*
% reserved. Reposting is expressly forbidden.

% Further support on www.tinaja.com
% Consulting services available via don@tinaja.com
% ====================================
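
%%%%%%%%%%%%%% SKETCH: WHY STRINGS GET DEREFERENCED %%%%%%%%%%%%%%

% A minimal sketch of the string pointer problem that the DEREFERENCE
% comments above guard against. Substrings handed back by -readline-,
% -search-, or -getinterval- share storage with the buffer they came from,
% so a saved date can silently change when the buffer is reused for the
% next line. Copying with ... 20 string cvs ... makes a private copy.
% The names -linebuf-, -shared-, and -private- are assumptions made up
% for this sketch and are not used elsewhere.

/linebuf 11 string def
linebuf 0 (22/Aug/1996) putinterval         % pretend this line was just read

/shared  linebuf 0 11 getinterval def       % still points into linebuf
/private linebuf 20 string cvs def          % independent copy of the text

linebuf 0 (23/Aug/1996) putinterval         % pretend the next line was read

shared  print (\n) print                    % prints 23/Aug/1996
private print (\n) print                    % prints 22/Aug/1996
flush
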