REDUNDANCY AND LINEAR LOGICAL NETS Seymour Papert National Physical Laboratory 1. Linear Logical Nets In this paper we shall discuss, in a general form, a class of logical computers known under such names a "neuron-nets", "random nets", "perceptron", "conditional probability computers", etc. Our purpose: to show that in a certain sense they function by exploiting redundancy and are thus related to the "infallible" networks reported at this symposium. All these systems have a set of input variables which we shall assume, for simplicity, to be binary. Il.. · In' signal is an n-vector S= (I1. . . In) determined by the values of the input variable at a particular time. The purpose of the system is to recognize or distinguish between certain classes of input signals. We denote classes of signals by the Greek Σ, when necessary, with subscripts such as Σ,Σ2 · There are one or more output variable; which can take two values, or 0, and which take one or the other of these values say " 2 as a function of the input signal, ♫ = F(S) = F(I,···.In). In other words the system computes logical (i.e. Boolian) functions of the input variables. In order to recognize the class of signals the function F must be such that F(S)=1 for S in E and F(S)= 0 for S in the complementary set or vice versa, and clearly there is only one such function. For discrimination, however, there will in general be many functions which are as good as one another and we shall see that this fact is of some importance in the theory of these systems. In effect, if we want to discriminate between classes Σ, and Ɛ2, we need a function F such that F(S)= 1 for S in 2, and F(S)= 0 for S in while the value of F(S) can be 0 or1 for S in neither Σ, 2 or 2. The interesting systems are those which can compute more than one logical function, so we suppose that of parameters which determine the there is a set x1 Xm function to be computed. We note by X=(x1. . . xm) the vector quantity representing all these parameters and by Fx (S) the logical function computed when the parameters have the value X. A particular interest attaches to systems which can set their parameters automatically through a feedback which tells them whether they are making the desired response or not to each signal. Any given system, which we shall call repertoire (B), of logical functions i.e. B, has a definite (B) is the set of all logical functions of the form Fx (S) where X runs over all possible values. In principle (B) could be the whole set, say of all possible Boolian functions of n variables. In fact this is impossible for a real hardware system for n at all big. No matter how the computations in B are carried out, the numbers of possible states of B is bounded by tm where m is the number of parameters and t is the number of effective values each parameter car. take. Now contains 22h aan function and if all (B), we should have to these are to be in the repertoire have tm at least equal to 22 i.e. m bigger than 2 log2t. Thus for n = 100, even if t = 250, m would have to be at least 210050 1.e. about 294 ! universal logical computer partial logical computer. We shall call the system, B, a Our concern here is with partial computers either as practical devices (eg for pattern recognition) or as models of brain functioning (if, indeed, there is a distinction between the two motivations) which satisfy certain conditions inspired by ideas about the nature of brain functioning; in particular: simple components with properties inspired by neurones, limited interconnectivity, highly parallel operation and, feed-back self-programming. It would be interesting to formalize these conditions and develop a general theory of systems which satisfy them. In the absence of such a theory, we shall discuss the problems involved in the synthesis of a particular type of system, which we shall call semi-linear net. The mathematical structure of these systems is as follows: The system first transforms the input variables I into a set of new variables 1,...1 where [1 = F1 (11 n In In) is a fixed Boolian function independent of the parameter X of The term linear net would be appropriate for the special case where the set {7}defined in the next paragraph coincides with the set {i}· the system. The outputs are a set ♪...♫、 of binary variables each of which is a linear Boolian function of the variables 1........1k. The notion of linear Boolian function is implicit in much work on this subject and there is a pressing need for the development of a mathematical theory of these functions which can be defined thus: a logical function F(y1 · m) of the binary variables y is linear if there m i=l Yixi 70. The that: F(y1 ...m) = 1 if and only if following systems are all cases of linear logical nets: (i) The conditional probability computers studied by A.M. Uttley (ii) a generalization of the conditional probability computer discussed independently by Minsky and Selfridge and by myself at the 1960 London Information Theory Symposium (iii) the later versions of the perceptron studied by Rosenblatt and his collaborators (iv) the McCulloch "threshold neurons" and the "infallible nets" discussed by the McCulloch group at this Symposium. (We shall return to comment on the relationship between systems such as the perceptron and the "infallible nets" which exists in their mathematical structure despite enormous differences in motivation.) 2. Some Comparative Psychology of Nets and People The crucial fact in the synthesis of any logical computer is in a that universality is impossible as soon as n exceeds a small limit and in the present context this manifests itself particularly clear-cut way through the theorem that the number of linear logical functions tends to zero as a fraction of the total number of Boolian functions on n variables. (A simple proof is given in note (1) at the end of the paper.) This means that any given net will be able to be made to discriminate between only a very small fraction of possible pairs (E,, E, ) of sets of input signals. The compensating fact, which makes the theory of these nets interesting, is that one might know in advance something about the discriminations it is likely to be called on to carry out, and design it accordingly. This prior information might be of two sorts which we shall call qualitative and quantitative. The first of these is obvious and has been frequently mentioned, so we shall refer to it only in relation to the other, which, as far as we know, has not been discussed explicitly in the literature and arises in the following way: Suppose we wish to design a simplified "animal" which will have to learn to respond differently to two classes of external signals. If we know in advance what the classes are, there is no problem. Suppose, then, that we know nothing about the two classes except that they will each contain at most No signals. What sort of linear net would we need? Before we can give any answer we have to make some assumptions about the sense organs of our "animal", say that it needs a retina of √n by √ |