12.5 Importing Environments from Text Files

To aid in converting environments from the old PDP software to the format used in PDP++, and more generally for importing training and testing data stored in plain text files, we have provided functions on the Environment that read and write `.pat' files in the old PDP format. These functions are called ReadOldPDP and WriteOldPDP.

The format that these functions read and write is very simple: a sequence of numbers, with an optional event name at the beginning of a line. When reading a file, ReadOldPDP simply reads numbers sequentially for each pattern in each event, so the layout of the numbers on the lines is not critical. If the optional name is used, it must appear at the beginning of the line that starts a new event.
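
The sequential reading described above can be sketched in Python as follows. This is a minimal illustration, not the PDP++ implementation: the function names are hypothetical, n_vals_total stands for the total number of values per event (3 for the XOR example below), and the rule that a leading token which does not parse as a number is the event name is an assumption about how the optional name is recognized. Comments and subgroups are ignored here.

```python
def _is_number(token):
    """True if token parses as a number (used to spot an optional event name)."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def read_events(text, n_vals_total):
    """Read numbers sequentially until each event has n_vals_total values.

    Line breaks are not significant, except that an optional name is only
    recognized as the first token of a line that starts a new event
    (assumption: a name is any token that does not parse as a number).
    Returns a list of (name_or_None, values) pairs.
    """
    events = []
    name, values = None, []
    for line in text.splitlines():
        tokens = line.split()
        # Only look for a name when no values of the current event are read yet.
        if tokens and not values and not _is_number(tokens[0]):
            name = tokens.pop(0)
        for tok in tokens:
            values.append(float(tok))
            if len(values) == n_vals_total:
                events.append((name, values))
                name, values = None, []
    return events


xor = "p00 0 0 0\np01 0 1 1\np10 1 0 1\np11 1 1 0\n"
for name, vals in read_events(xor, 3):
    print(name, vals)
```

Because values are counted rather than lines, an event such as "p10 1 0" followed by a line containing just "1" is read the same way as "p10 1 0 1".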

For example, in the old PDP software, the "xor.pat" file for the XOR example looks like this:

p00 0 0 0
p01 0 1 1
p10 1 0 1
p11 1 1 0

It is critical that the EventSpec and its constituent PatternSpecs (see section 12.2 Events, Patterns and their Specs) be configured in advance for the correct number of values in the pattern file. For the above example, the EventSpec would contain two PatternSpecs, which would look like:

PatternSpec[0] {
   type = INPUT;
   to_layer = FIRST;
   n_vals = 2;
};

PatternSpec[1] {
   type = TARGET;
   to_layer = LAST;
   n_vals = 1;
};

With this configuration, the first two values of each event (n_vals = 2) are read into the first (input) pattern, and the third value (n_vals = 1) is read into the last (target) pattern, which is applied to the output layer.
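
How the values of one event are divided among its patterns can be sketched as below. The function name is hypothetical, and n_vals_per_pattern is an illustrative parameter standing in for the n_vals fields of the PatternSpecs, listed in order.

```python
def split_into_patterns(values, n_vals_per_pattern):
    """Divide one event's flat list of values among its patterns, in order.

    n_vals_per_pattern mirrors the n_vals field of each PatternSpec,
    e.g. [2, 1] for the XOR input and target patterns above.
    """
    if len(values) != sum(n_vals_per_pattern):
        raise ValueError(
            f"expected {sum(n_vals_per_pattern)} values, got {len(values)}")
    patterns, start = [], 0
    for n in n_vals_per_pattern:
        patterns.append(values[start:start + n])
        start += n
    return patterns


# The "p01 0 1 1" event: the input pattern gets [0, 1], the target gets [1].
print(split_into_patterns([0, 1, 1], [2, 1]))  # [[0, 1], [1]]
```

The length check makes a mismatch between the file and the PatternSpecs fail loudly, which is exactly the misconfiguration the paragraph above warns about.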

ReadOldPDP also allows comments in `.pat' files: it skips over lines beginning with # or //. Furthermore, the values for an event can be split across several lines, since ReadOldPDP keeps reading numbers until it has the right number for each pattern.
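
A minimal sketch of the comment-skipping and multi-line behavior follows. It assumes unnamed events and treats only lines whose very first characters are # or // as comments; the helper names are hypothetical, and a trailing, incomplete event is silently dropped in this sketch.

```python
import itertools


def tokens_without_comments(lines):
    """Yield whitespace-separated tokens, skipping lines that begin with # or //."""
    for line in lines:
        if line.startswith(("#", "//")):
            continue
        yield from line.split()


def read_patterns(lines, n_vals_per_pattern):
    """Fill each pattern of each event with the next n values, wherever
    they fall on the lines. Assumes unnamed events; a trailing incomplete
    event is dropped."""
    tokens = tokens_without_comments(lines)
    events = []
    while True:
        event = []
        for n in n_vals_per_pattern:
            # islice consumes the next n tokens from the shared generator.
            pattern = [float(t) for t in itertools.islice(tokens, n)]
            if len(pattern) < n:
                return events  # ran out of numbers
            event.append(pattern)
        events.append(event)


lines = ["# XOR, one event split over two lines", "0 1", "1", "// another comment", "1 1 0"]
print(read_patterns(lines, [2, 1]))  # [[[0.0, 1.0], [1.0]], [[1.0, 1.0], [0.0]]]
```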

There is a special set of comments you can use to control the creation and organization of subgroups of events. To start a new subgroup, put the comment # startgroup before the pattern lines for the events in the subgroup. When you are done with a subgroup, put the comment # endgroup after the pattern lines of its last event. For example, a file defining 2 groups of 3 events might look like this:

# startgroup
p01 0 0 0
p02 0 1 1
p03 0 1 0
# endgroup
# startgroup
p11 1 0 1
p12 1 1 0
p13 1 1 1
# endgroup
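
The subgroup comments could be handled along the lines of the sketch below, which collects event lines into one list per subgroup. It is a hypothetical illustration rather than the ReadOldPDP code: it does not support nested subgroups, it keeps an unterminated subgroup at end of file, and it accepts both "# endgroup" and "#endgroup", which may not match what ReadOldPDP itself accepts.

```python
def read_grouped(lines):
    """Collect event lines into subgroups delimited by '# startgroup' and
    '# endgroup'. Returns a list whose items are either event lines
    (outside any subgroup) or lists of event lines (one per subgroup).
    Nested subgroups are not supported in this sketch."""
    top = []
    current = None  # the open subgroup, or None when outside one
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Tolerate both "# endgroup" and "#endgroup".
            directive = line.lstrip("#").strip()
            if directive == "startgroup":
                current = []
            elif directive == "endgroup" and current is not None:
                top.append(current)
                current = None
            continue  # other '#' lines are plain comments
        if line.startswith("//"):
            continue
        if current is not None:
            current.append(line)
        else:
            top.append(line)
    if current is not None:
        top.append(current)  # keep a subgroup left open at end of file
    return top


example = ["# startgroup", "p01 0 0 0", "p02 0 1 1", "# endgroup",
           "# startgroup", "p11 1 0 1", "#endgroup "]
print(read_grouped(example))  # [['p01 0 0 0', 'p02 0 1 1'], ['p11 1 0 1']]
```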

WriteOldPDP simply produces a file in the above format for all of the events in the environment on which it is called. This is useful for exporting events to other programs, or for converting patterns into a different type of environment. For example, if events were originally created in a regular Environment but you now want to associate a frequency with each of them, you can use WriteOldPDP to save the events to a file, and then use ReadOldPDP to read them into a FreqEnv, which allows a frequency to be attached to each event.
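
The writing direction is simpler. The sketch below formats a flat list of (name, values) pairs as lines of a `.pat' file; the function name and the use of Python's "g" number format are illustrative choices, not a description of exactly what WriteOldPDP emits.

```python
def write_old_pdp(events):
    """Format (name, values) pairs as old-PDP .pat lines: an optional
    name followed by the values, separated by single spaces."""
    lines = []
    for name, values in events:
        fields = ([name] if name else []) + [format(v, "g") for v in values]
        lines.append(" ".join(fields))
    return "".join(line + "\n" for line in lines)


print(write_old_pdp([("p00", [0.0, 0.0, 0.0]), ("p01", [0.0, 1.0, 1.0])]), end="")
```

The "g" format prints 1.0 as "1", matching the integer-looking values of the XOR file, but it keeps only six significant digits; values that need more precision would call for a different format such as repr.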

For Environments that are more complicated than a simple list of events (e.g., if groups of events are used), CSS can be used to import text files of these events. Example code for reading events structured into subgroups is included in the distribution as `css/include/read_event_gps.css', and can serve as a starting point for reading other formats. The key function that makes writing such readers in CSS easy is ReadLine, which reads one line of data from a file and puts it into an array of strings that can then be manipulated, converted into numbers, and so on. This is much like the way the `awk' utility splits each input line into fields.
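
For readers more familiar with awk or Python than CSS, the line-splitting approach looks roughly like this. read_line is a hypothetical Python stand-in for CSS ReadLine, not its actual signature, and the "<group> <event> <value> ..." input format is invented purely for illustration.

```python
import io


def read_line(stream):
    """Return the next line of stream as a list of whitespace-separated
    fields, or None at end of file. A hypothetical stand-in for CSS ReadLine."""
    line = stream.readline()
    if line == "":
        return None  # readline() returns "" only at end of file
    return line.split()


def events_by_group(stream):
    """Group events by their first field (hypothetical format:
    '<group> <event> <value> ...'), much as an awk script keys on $1."""
    groups = {}
    while (fields := read_line(stream)) is not None:
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank, malformed, and comment lines
        group, event, *values = fields
        groups.setdefault(group, []).append((event, [float(v) for v in values]))
    return groups


data = io.StringIO("g1 p01 0 0 0\ng1 p02 0 1 1\ng2 p11 1 0 1\n")
print(sorted(events_by_group(data)))  # ['g1', 'g2']
```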

The read_event_gps.css example assumes that it will be loaded into a Script object in a project, with three s_args values that set the parameters of the expected format. These parameters could instead be placed at the top of the data file and read in from there at the start.