the system to “bootstrap” the subsequent task evolution.
In the first task the controller is required to fly the helicopter along a predefined, randomly generated set of waypoints. The fitness is based on progress along the path so defined; a waypoint was deemed to be visited when the centre of the helicopter came within 1 foot of it. The waypoints were placed at a mean distance of 17.5 feet from each other. To encourage a straight path between waypoints, the fitness was reduced if the helicopter deviated from the path. Additional penalties were applied for deviations from the correct altitude and heading. The complete fitness function is:
$$ f = \sum_{i=0}^{N} \left( w_h P_{chain} - |z - z_{next}| - |\psi - \psi_{ref}| \right) \qquad (1) $$
1 One of the authors, who can fly the Hirobo helicopter very competently, has consistently failed to control the simulated model for more than a few seconds of simulated time.
Here N is the number of timesteps allowed to execute the task (N = 1000 was used for evolution), and z_next and ψ_ref are respectively the altitude of the next waypoint and the fixed reference heading. The factor w_h is equal to one if the helicopter is on the shortest path between waypoints and decays as the cube of the orthogonal distance from it.
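As an illustration only, one summand of the fitness in equation (1) could be computed as in the sketch below. Several points are assumptions and not taken from the text: the equation is read as a weighted progress term minus the altitude and heading penalties, the decay of w_h is modelled as 1 / (1 + d³), the argument `progress` stands in for P_chain (whose exact definition is not given here), and all function names are hypothetical.

```python
import math


def orthogonal_distance(p, a, b):
    """Distance in feet from point p to the segment joining waypoints a and b."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    ab_len2 = sum(c * c for c in ab)
    if ab_len2 == 0.0:
        # Degenerate segment: both waypoints coincide.
        return math.dist(p, a)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    s = sum(ap[i] * ab[i] for i in range(3)) / ab_len2
    s = max(0.0, min(1.0, s))
    closest = [a[i] + s * ab[i] for i in range(3)]
    return math.dist(p, closest)


def path_weight(d):
    """ASSUMED form of w_h: 1 on the path, decaying with the cube of d."""
    return 1.0 / (1.0 + d ** 3)


def step_fitness(progress, d, z, z_next, psi, psi_ref):
    """One summand of equation (1) under the assumptions stated above.

    `progress` is a placeholder for P_chain. The heading error is taken
    as a plain difference, with no angle wrapping.
    """
    return path_weight(d) * progress - abs(z - z_next) - abs(psi - psi_ref)


def waypoint_reached(p, waypoint, radius_ft=1.0):
    """A waypoint counts as visited within 1 foot of the helicopter centre."""
    return math.dist(p, waypoint) <= radius_ft
```

Summing `step_fitness` over the timesteps of a run would give the total fitness; a helicopter exactly on the path with no altitude or heading error simply accumulates its progress term.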
The inputs to the network are the helicopter attitudes φ, θ, ψ, the rotational speeds p, q, r, the linear speeds u, v, w (both sets of speeds are expressed in the helicopter body reference frame), and the relative distance to the next waypoint ∆x, ∆y, ∆z.
A sample of the trajectory flown by the best controller during a test run is shown in figure 3. As we can see, the evolved controller exhibits the ability to fly correctly through the predesigned waypoint chain. Regardless of the relative distance or the position between the waypoints, the path is very smooth.
A second task was devised with the Reynolds flocking rules in mind. At the beginning of the task the helicopter is started in a hovering position, and a randomly generated increment in velocity (expressed as its three components in the helicopter frame of reference) is requested. Each velocity increment can take a value in the range [−3.5, 3.5] ft/s; if the sum of the current velocity and the increment exceeds 10 ft/s, a cut-off is applied to guarantee a requested speed consistent with the helicopter’s capabilities. Along with the requested change in velocity, the required duration of the change is also specified. This is also a random value, in the range [12, 350] timesteps. The fitness is simply the squared magnitude of the error between the commanded and the instantaneous helicopter velocity:
$$ f = \sum_{i=0}^{N} \left\| v(t) - v_c(t) \right\|_2^2 \qquad (2) $$
Where N is the number of timesteps allowed to execute the task (for evolution N = 1000 was used), v(t) is the velocity of the helicopter in the body frame coordinates at time t and vc(t) is the velocity commanded at the same time instant.
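The command generation and the error measure of the second task can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: increments and durations are drawn uniformly, the 10 ft/s cut-off is applied per component, each increment is added to the previous command and not to the measured velocity, and the error is a plain sum of squared norms over timesteps. All names are hypothetical.

```python
import random

MAX_SPEED = 10.0       # ft/s, cut-off on the requested speed
MAX_INCREMENT = 3.5    # ft/s, bound on each velocity increment
MIN_STEPS, MAX_STEPS = 12, 350  # duration of a command, in timesteps


def next_command(current_cmd, rng):
    """Draw a new commanded velocity and the number of timesteps it lasts."""
    cmd = []
    for c in current_cmd:
        target = c + rng.uniform(-MAX_INCREMENT, MAX_INCREMENT)
        # ASSUMPTION: the cut-off is applied to each component separately.
        cmd.append(max(-MAX_SPEED, min(MAX_SPEED, target)))
    duration = rng.randint(MIN_STEPS, MAX_STEPS)
    return tuple(cmd), duration


def command_schedule(n_steps, rng):
    """Commanded velocity for every timestep, starting from hover."""
    schedule = []
    cmd = (0.0, 0.0, 0.0)
    while len(schedule) < n_steps:
        cmd, duration = next_command(cmd, rng)
        schedule.extend([cmd] * duration)
    return schedule[:n_steps]


def velocity_error(actual, commanded):
    """Sum over timesteps of the squared error between v and v_c."""
    return sum(
        sum((a - c) ** 2 for a, c in zip(v, vc))
        for v, vc in zip(actual, commanded)
    )
```

A run of N = 1000 timesteps would call `command_schedule(1000, random.Random(seed))` once and then compare the recorded body-frame velocities against it with `velocity_error`.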
The inputs to the network are the helicopter attitudes φ, θ, ψ and rotational speeds p, q, r, the linear speeds in the body reference frame u, v, v_z, and the difference between the commanded speeds and the actual speeds ∆u, ∆v, ∆v_z.
Sample plots of the difference between the commanded value of the velocity and the actual velocity are displayed in figure 4. It is clear that the helicopter speed varies in accordance with the request; unsurprisingly, a steady state error is present. It is noteworthy that the responses to the input steps do not show any signs of instability, and that a clear coupling between the longitudinal and lateral speeds is present. Future work will add complexity in the network structure to attempt to compensate for the coupling.