Изображения страниц
PDF
EPUB

INTEGRAL GEOMETRY AS A TOOL IN PATTERN PERCEPTION

Albert B. J. Novikoff
Stanford Research Institute

This paper will be largely didactic, explaining what is, in a sense, a well-known part of the mathematical literature, but a part of the mathematical literature that I hope isn't so well known that I am telling people only things they already know. I hope therefore, you forgive me if I take what may seem to be a patronizing tack in the didactic portion, but I would rather be clear about a few simple things than address myself to profundities with only a lick and a promise. Then there is a portion that is not purely didactic, namely my proposal that this chapter of the mathematical literature may be of use in designing pattern recognition or pattern discrimination devices.

This work all began because of an attempt to clarify a detail concerning the behavior of Rosenblatt's Perceptron. I was told that the way in which the Perceptron would distinguish between, say, a circle and a square of the same area was, roughly speaking, to have in its own internal modus operandi a replica of, say, the circle with which it would go hunting around the retina, checking for overlap, and that in some sense circles prefer circles to squares of the same area, with regard to the amount of overlap. Some one had given me this idea--I don't say Rosenblatt, because I myself have never been able to read the Rosenblatt reports and rely on "filters" to acquire their content, and in fact one of these "filters" was

in error.

Now I smelled a rat concerning that statement and looked into integral geometry as the subject in which to find out precisely what is true about the overlap between circles and circles as opposed to the overlap between circles and squares. I did indeed find the rat, but the body of the rat is not anywhere in the Perceptron. That at any rate is how this inquiry all began. The device I propose has a very specific and modest end in view. We suppose that there is a prescribed alphabet of patterns, "characters, which are to be recognized, and that the designer of the device is very well aware of what this alphabet is. He has at his disposal as much geometrical description as he requires. However, he doesn't know the location and orientation of the pattern on the retina of the device. He wants to be able to discriminate, to be able to tell which of the alphabet has arisen, and my remarks concern how to design such a device.

Now I will give an instance of the kind of technique of pattern recognition which I will employ repeatedly. This example is chosen chiefly because it's intuitive; it rests on a theorem of integral geometry, as does

everything else that I will say, but this particular example rests on a theorem which is conceivably popular and well known, although I don't think that most of the other theorems I exploit are popular. This example is, in fact, not especially simple to fit in the framework of integral geometry as a whole, but taken by itself it is a very simple one to understand.

Suppose that the two patterns to be recognized are both an infinite grid of infinitely extended parallel lines and the grid width of one pattern d and the grid width of the other pattern is d'. Assume d < d' (Fig 1). How might we go about recognizing which of these two patterns is presented when we don't know the orientation of the pattern? We recall now a famous theorem which goes back to 1760, in common with a lot of other bright ideas, and which is usually associated with the name of Buffon, under the name of the "Buffon needle problem." Stated in the customary language of geometric probability, this states that if a "needle," that is, an oriented line segment, of length, smaller than the grid width, or at least no larger than the grid width, is tossed at random on a grid, the probability that the needle intersects a grid line (and doesn't just fall between), is (2/π)(l/a). Then the designer of the device picks a needle whose length is the smaller grid width d, tosses it repeatedly and averages the number of times the grid has crossed the line. If he gets a number which closely resembles 2/7, it was the smaller grid width he was dealing with. If he gets a number which approximates (2/π) (a/a1) or which is suspiciously less than 2/π, he will report, "I was looking at the coarser grid."

This method, roughly described, is independent of orientation and location of the grid. If the method of looking is itself independent of orientation and translation, the object being looked at can suffer orientations and translations.

We could actually outline a precise statistical test of hypotheses that the designer would go through in order to make his decision. I will omit these hypothesis-testing and decision-theoretic questions. The above summarizes the idea of how a designer could in principle make such a decision.

Now observe the following facts. One, I have already observed. The method is independent of orientation and translation. Second, the designer should be able to make some estimate of how many tosses of the needle are required to get an answer of a given reliability. This, however, is one of the statistical questions that I will not address myself to further, but answers--sensible statements--can be made on this question. (Unfortunately, some of these sensible statements require the use of higher moments of a random variable, not their first moments, and for many of the theorems I am going to describe, only first moments are available; however, this simply indicates there are some outstanding, useful problems to be solved; I don't think it is a major defect in my proposal). The most important thing to notice is that this method requires the knowledge of a theorem. We know the theorem of Buffon, which tells us what number to expect when we know what the pattern is and when we toss a needle upon it, i.e. 2/TT and (2/17) (à/à1). The more such theorems are available, the more recognition devices are possible, sensitive to more and more interesting patterns.

P

d'

THE "NEEDLE"

Figure 1

For example, one need not toss a needle, that is, an oriented segment, of length d. One can toss an arbitrary oriented, rectifiable curve of length d, for example, a circle of radius d/2 T, and again, the probability of intersection is just what it would have been for a needle, strange to say, so the flexibility of device mechanisms varies with the amount of theorems we possess. I am making a case, in other words, for storing a large number of theorems of this character.

Another remark. In this case there was no "retina" upon which the character was displayed (or rather the retina was, unrealistically, the whole plane). In general, recognition devices have sort of edge effects because the character or pattern will be presented in a certain finite location. How do we take that into account? I will have more to say about this later. To construct a physical device which performs pattern discrimination according to this idea, you want a device which can, first of all, toss a needle at random. It should be able to tally intersections. Now questions as to speed of repetition and the ability to tally, I regard as questions of an engineering nature and I will say no more about them, but what it means to toss a needle at random is not an intrinsically clear notion without further remarks.

I will conclude this introductory portion by giving an instance of the perils of loose reasoning in the realm of geometric probability and this instance is, again, offered solely to orient the listener. It too, is very well known, but I think it will serve to warn you why we must be clear about the notion of tossing objects at random before we can proceed to design devices based on that notion. This example is known as the Bertrand paradox, and is of mid-nineteenth century origin. We wish to toss a chord at random on a circle, and we ask, what is the probability that the chord should have a length which exceeds the length of the inscribed equilateral triangle. Of course, there are many inscribed equilateral triangles, but the side of an equilateral triangle inscribed in the circle is independent of the choice of the equilateral triangle. The problem has the following solution, according to some: By reasons of symmetry you may regard one end of the chord as fixed so that it is only the location of the second end point which determines the chord. The circle can be trisected into three arcs, each of which is equally likely when we toss a chord at random (Fig. 2 a). If the second end point lies in either of the two adjacent arcs the chord is not a "successful" one, but in the opposite arc it is. The three arcs being equally likely, the probability is one-third.

The second solution is this. By reasons of symmetry we may regard the angle of the chord as fixed and proceed to examine only one family of parallel chords, taking as fixed the common diameter that they are perpendicular to. We ask what fraction of this family is successful (Fig. 2 b). In this case an elementary study of what an equilateral triangle inscribed in a circle looks like, shows that if you quadrisect that diameter, the chords lying in the extreme two-fourths, whereas those crossing the central two-fourths are "successful." Thus, of the four equally likely divisions of the diameter, two of them are successful and two are not. So the probability of success is one-half.

« ПредыдущаяПродолжить »