Title | Robot Learning from Human Teachers |
---|---|
Author | Sonia Chernova |
Genre | Computer Hardware |
Series | Synthesis Lectures on Artificial Intelligence and Machine Learning |
Publisher | Computer Hardware |
Year of publication | 0 |
ISBN | 9781681731797 |
Accurately sensing the teacher’s actions is critical for the success of this approach. Traditionally, many techniques have relied on instrumenting the teacher’s body with sensors, including motion capture systems and inertial sensors. Ijspeert et al. [114, 115] use a Sarcos SenSuit worn by the user to simultaneously record 35-DoF motion. The recorded joint angles were used to teach a 30-DoF humanoid to drum, reach, draw patterns, and perform tennis swings (Figure 3.3(a)). This work is extended in [184] to walking patterns. The same device, supplemented with Hall sensors, is used by Billard et al. to teach a humanoid robot to manipulate boxes in sequence [29]. In later work, Calinon and Billard combine demonstrations executed by a human teacher via wearable motion sensors with kinesthetic teaching [50].
Wearable sensors and other specialized recording devices provide a high degree of accuracy in the observations. However, the need for such equipment limits the adoption of these learning methods beyond research laboratories and niche applications. A number of approaches have therefore been designed to use only camera data. One of the earliest works in this area was the 1994 paper by Kuniyoshi et al. [152], in which a robot extracts the action sequence, then infers and executes a task plan, based on observations of a human hand demonstrating a blocks assembly task. Another example is the work of Bentivegna et al. [25], in which a 37-DoF humanoid learns to play air hockey by tracking the position of the human opponent’s paddle (Figure 3.3(b)). Visual markers are also often used to improve the quality of visual information, such as in [30], where reaching patterns are taught to a simulated humanoid. Markers are similarly used to optically track human motion in [122, 123, 259] and to teach manipulation [209] and motion sequences [10]. In recent years, the availability of low-cost depth sensors (e.g., Microsoft Kinect) and their associated body pose tracking methods has made such sensors a practical source of input data for LfD methods that rely on external observations of the teacher (e.g., [79]).
Related to the learning-by-observation problem, several works focus exclusively on the perceptual-motor mapping problem of LfD: in order to imitate, the robot has to map a sensed experience to a corresponding motor output. This is often treated as a supervised learning problem, in which the robot is given several sensory observations of a particular motor action. Demiris and Hayes use forward models as the mechanism to solve the dual task of recognizing and generating actions [80]. Mataric and Jenkins suggest behavior primitives as a useful action representation mechanism for imitation [122]. In their work on facial imitation, Breazeal et al. use an imitation game to facilitate learning the sensory-motor mapping from facial features tracked with a camera to the robot’s facial motors. In a turn-taking interaction, the human first imitates the robot as it performs a series of its primitive actions, which teaches the robot the mapping; the robot is then able to imitate the human [37].
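To make the supervised-learning framing concrete, the following is a minimal sketch rather than the method of any of the works cited above. It assumes that each sensed experience is a short feature vector and that each motor output is a named primitive, and it substitutes a one-nearest-neighbor lookup for whatever learner a real system would use. The function name, the feature values, and the primitive labels are all hypothetical.

```python
import math


def nearest_neighbor_motor_command(observation, examples):
    """Return the motor command paired with the closest stored observation.

    `examples` is a list of (sensed_features, motor_command) pairs, i.e. the
    labeled sensory observations gathered while the mapping was being taught.
    """
    if not examples:
        raise ValueError("need at least one labeled example")
    closest = min(examples, key=lambda pair: math.dist(pair[0], observation))
    return closest[1]


# Hypothetical training pairs: 2-D tracked features -> motor primitive label.
examples = [
    ((0.0, 0.0), "neutral"),
    ((1.0, 0.2), "smile"),
    ((-0.8, 0.9), "raise_brows"),
]

# A new sensed observation is mapped to the primitive of its nearest example.
command = nearest_neighbor_motor_command((0.9, 0.1), examples)
```

In this toy setting, the labeled pairs play the role of the data collected during an interaction such as the imitation game, and any supervised learner could replace the nearest-neighbor lookup.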
Finally, observations can also focus on the effects of the teacher’s actions instead of the action movements themselves. Tracking the trajectories of the objects being manipulated by the teacher, as in [249], can enable the robot to infer the desired task model and to generate a plan that imitates the observed behavior.
3.4 LEARNING FROM CRITIQUE
The approaches described in the above sections capture demonstrations in the form of state-action pairs, relying on the human’s ability to directly perform the task through one of the many possible interaction methods. While this is one of the most common demonstration techniques, other forms of input also exist in addition to, or in place of, such methods.
In learning from critique or shaping, the robot practices the task, often selecting actions through exploration, while the teacher provides feedback to indicate the desirability of the exhibited behavior. The idea of shaping is borrowed from psychology, in which behavioral shaping is defined as a training procedure that uses reinforcement to condition the desired behavior in a human or animal [234]. During training, the reward signal is initially used to reinforce any tendency towards the correct behavior, but is gradually changed to reward successively more difficult elements of the task.
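The gradual tightening of the reward criterion can be sketched as follows. This is an illustration under stated assumptions, not a procedure taken from the cited literature: the behavior is assumed to be summarized by a single scalar error relative to the desired behavior, and the stage tolerances are hypothetical values chosen only to show the idea.

```python
def shaping_reward(error, stage, tolerances=(1.0, 0.5, 0.1)):
    """Return 1.0 if the behavior is close enough for the current stage.

    Early stages use a loose tolerance, so any tendency towards the desired
    behavior is reinforced. Later stages use tighter tolerances, so only
    more accurate behavior earns the reward.
    """
    # Stages beyond the last defined tolerance reuse the tightest one.
    tolerance = tolerances[min(stage, len(tolerances) - 1)]
    return 1.0 if error <= tolerance else 0.0


# The same behavior (error 0.7) is rewarded early in training but not later.
early = shaping_reward(0.7, stage=0)
later = shaping_reward(0.7, stage=1)
```

In an interactive setting, the teacher decides when to move to the next stage; here the `stage` argument stands in for that decision.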
Figure 3.5: A robot learning from critique provided by the user through a hand-held remote [138].
Shaping methods with human-controlled rewards have been successfully demonstrated in a variety of software agent applications [33, 135, 252] as well as robots [129, 138, 242]. Most of the developed techniques extend traditional Reinforcement Learning (RL) frameworks [245]. A common approach is to let the human directly control the reward signal to the agent [91, 119, 138, 241]. For example, in Figure 3.4, the human trainer provides positive and negative reward feedback via a hand-held remote in order to train the robot to perform the desired behavior [138].
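As a rough sketch of what letting the human directly control the reward signal can look like, the code below uses tabular Q-learning, in which the reward passed to the update comes from a trainer rather than from a predefined reward function. It is not the algorithm of any cited system. The class name, the states and actions, the hyperparameters, and the `simulated_trainer` function (a stand-in for button presses on a remote) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class HumanRewardQLearner:
    """Tabular Q-learning where each reward is supplied by a trainer."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)      # (state, action) -> estimated value
        self.rng = random.Random(seed)   # seeded for repeatable exploration

    def choose(self, state):
        # Epsilon-greedy: explore occasionally, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, human_reward, next_state):
        # Standard Q-learning update; only the source of the reward differs.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = human_reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def simulated_trainer(state, action):
    """Hypothetical trainer: positive feedback for 'sit', negative otherwise."""
    return 1.0 if action == "sit" else -1.0


learner = HumanRewardQLearner(actions=["sit", "spin"])
for _ in range(50):
    action = learner.choose("cue")
    reward = simulated_trainer("cue", action)
    learner.update("cue", action, reward, "done")
```

After these interactions, the learner values the rewarded action more highly than the penalized one in the `"cue"` state. In a real system, `simulated_trainer` would be replaced by whatever interface delivers the teacher's feedback.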