Reading from a file


Compared with writing to a file, reading from a file is a much more involved process. As with writing, a stream is opened and connected to the file, but then the file contents must be read line by line until the end of the file is reached. In general we will not know in advance how many lines there are. Once the data has been read, it needs to be converted or conditioned into a format that the application can use. This implies that:

  • We know something about the format of the data being supplied
  • The data is supplied in that known format
This is often the Achilles heel of applications that take data from the outside world, and any effort spent making this interface as robust as possible is time well spent.

A general-purpose function to read data from a text file may look something like this:

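(defun read-file (file)
  "Return the lines of FILE as a list of strings."
  (let ((result nil))
    (with-open-file (str file :direction :input)
      ;; Read one line per iteration until READ-LINE returns the EOF marker.
      (do ((line (read-line str nil 'eof)
                 (read-line str nil 'eof)))
          ((eql line 'eof) result)
        (setq result (append result (list line)))))))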

We have introduced the CL macro do, used here within the body of with-open-file.

do accepts a list of variable specifications, each of the form (variable init-form step-form). The init-form is evaluated once, before the first iteration, to give the variable its starting value; the step-form is evaluated at the end of each iteration to give the variable its next value. In our case both forms call the CL function read-line. Its first argument is the stream supplied by with-open-file; its second argument, here nil, specifies that reaching the end of the file should not signal an error; and its third argument, here the symbol eof, is the value returned instead when the end of the file is reached. Each line returned by read-line is bound to the variable line and appended to the result variable.

The second list in do contains the end test and the result form. The end test is checked before every iteration (here, iteration stops once line holds the end-of-file value eof), and when it is true, do returns the value of the result form, in this case result.
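Because the end test is checked before the first iteration as well, an empty file simply produces nil. The sketch below applies the same do / read-line pattern to any input stream, so it can be tried in the REPL without a file by using with-input-from-string. The name read-lines-from-stream is hypothetical, and it swaps in push plus a single nreverse in place of repeated append, which re-copies the whole list each time a line is added.

```lisp
;; Hypothetical helper: collect every line of an input STREAM into a list.
;; Same DO / READ-LINE pattern as READ-FILE, but PUSH + NREVERSE replaces
;; APPEND so each line is added in constant time.
(defun read-lines-from-stream (stream)
  (let ((result nil))
    (do ((line (read-line stream nil 'eof)     ; init form: first line
               (read-line stream nil 'eof)))   ; step form: each later line
        ((eql line 'eof)                       ; end test, checked before every pass
         (nreverse result))                    ; returned at end of stream
      (push line result))))

;; Usage with an in-memory stream, so no file is needed:
(with-input-from-string (s (format nil "Box Width 3~%Box Length 6~%"))
  (read-lines-from-stream s))
;; => ("Box Width 3" "Box Length 6")
```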

Evaluating this function against the file created in the previous topic gives the following:

GDL-USER> (read-file "c:/temp/report.txt")

("Box Width 3" "Box Length 6" "Box Height 4" "Box Center 0.0,0.0,0.0" "Box Volume 72")

The next task when reading a data file is generally to parse it into a format that our application can use. The following function uses our read-file function to do that and returns an appropriate plist:

(defun import-data (file)
  (let* ((raw-data (read-file file))
         ;; split each line into words at runs of whitespace
         (res (mapcar #'(lambda (a) (glisp:split-regexp "\\s+" a)) raw-data))
         ;; discard the first word; build a plist of keyword / value-string pairs
         (res-1 (mapcan #'(lambda (a) (list (make-keyword (second a)) (third a))) res))
         ;; keep only the keywords from the plist
         (keywords (remove-if-not #'keywordp res-1))
         (r nil))
    (dolist (k keywords r)
      (cond ((or (eq k :width) (eq k :length) (eq k :height))
             ;; dimensions become numbers
             (setq r (append r (list k (read-safe-string (getf res-1 k))))))
            ((eq k :center)
             ;; "x,y,z" becomes a 3D point
             (let ((co-ord (glisp:split-regexp "," (getf res-1 k))))
               (setq r (append r
                               (list k (make-point (read-safe-string (first co-ord))
                                                   (read-safe-string (second co-ord))
                                                   (read-safe-string (third co-ord))))))))))))

Working through the function

  • We first read the data into a variable raw-data. This is a list of strings, where each string is a separate line from the file
  • Then we use the function glisp:split-regexp to split each string into separate words, breaking wherever there are one or more whitespace characters (the regular expression \s+, written "\\s+" because the backslash must be doubled inside a Lisp string)
  • Because we know the structure of the data, we can safely discard the first word of each line ("Box") and convert the second word into a keyword. Because mapcan concatenates the two-element lists built for each line, the local variable res-1 becomes a plist of keywords and values, although each value is still a string
  • We need to convert the values into the correct data types, and the conversion depends on which value we are considering: width, length and height are numbers, whilst center is a point (vector)
  • Finally, we use the keywords to decide how to process each value and build the returned plist. Any other keyword, such as :volume, matches neither cond clause and is simply dropped. A key point to note is that we do not use the CL function read-from-string to convert the strings: with *read-eval* true (the default), the #. reader macro allows text in the file to execute arbitrary code at read time. Instead we use the GendL function read-safe-string, which guards against this, to convert the number strings into numbers

Evaluating the function in the REPL gives:

GDL-USER> (import-data "c:/temp/report.txt")

(:WIDTH 3 :LENGTH 6 :HEIGHT 4 :CENTER #(0.0 0.0 0.0))
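For readers without GendL loaded, the splitting and keyword steps of import-data can be sketched in portable Common Lisp. The helpers split-on-whitespace and keywordize are hypothetical stand-ins for glisp:split-regexp "\\s+" and GendL's make-keyword; this splitter simply ignores leading and trailing whitespace, and the values are left as strings, exactly as in res-1.

```lisp
;; Hypothetical stand-in for (glisp:split-regexp "\\s+" line):
;; split LINE on runs of whitespace, ignoring empty pieces.
(defun split-on-whitespace (line)
  (let ((words nil)
        (start nil))                     ; index where the current word began
    (loop for i from 0 to (length line)
          ;; treat the position past the end as whitespace to flush the last word
          for ch = (if (< i (length line)) (char line i) #\Space)
          do (if (member ch '(#\Space #\Tab #\Return #\Newline))
                 (when start
                   (push (subseq line start i) words)
                   (setf start nil))
                 (unless start (setf start i))))
    (nreverse words)))

;; Hypothetical stand-in for GendL's make-keyword.
(defun keywordize (string)
  (intern (string-upcase string) :keyword))

;; Discard the first word of each line and build a plist of
;; keyword / value-string pairs, as res-1 does in import-data.
(defun lines->string-plist (lines)
  (mapcan (lambda (line)
            (let ((words (split-on-whitespace line)))
              (list (keywordize (second words)) (third words))))
          lines))

(lines->string-plist '("Box Width 3" "Box Center 0.0,0.0,0.0"))
;; => (:WIDTH "3" :CENTER "0.0,0.0,0.0")
```

mapcan is safe here because each lambda call returns a freshly consed list, so its destructive concatenation does not touch shared structure.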

One final point to note about the example code: there is virtually no error handling. Because the data file is one we generated automatically, and so is fully under our control, this is probably acceptable; in general, however, interface functions like the ones shown above need to be extremely robust.
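As a starting point for that robustness, the sketch below converts one value string and signals a descriptive error for anything that is not a single real number. parse-real-safely is a hypothetical helper, not GendL's read-safe-string: it binds *read-eval* to nil so that #. signals a reader error instead of running code, catches malformed or empty input, and rejects trailing text such as "3 boxes".

```lisp
;; Hypothetical helper (not GendL's read-safe-string): return the real number
;; written in STRING, or signal an error describing the bad value.
(defun parse-real-safely (string)
  (multiple-value-bind (value end)
      (let ((*read-eval* nil))            ; "#.(...)" now signals a reader error
        (handler-case (read-from-string string)
          (error () (values nil 0))))     ; malformed or empty input
    (if (and (realp value)
             ;; reject trailing text after the number, e.g. "3 boxes"
             (every (lambda (ch) (member ch '(#\Space #\Tab)))
                    (subseq string end)))
        value
        (error "Expected a number, got ~S" string))))

(parse-real-safely "72")
;; => 72
```

A caller such as import-data could then wrap each conversion in handler-case and report which line of the file was bad, rather than failing deep inside the reader; for example, (parse-real-safely "#.(+ 1 2)") signals an error instead of evaluating (+ 1 2).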