Online Learning for the Control of Human Standing via Spinal Cord Stimulation
Many applications, such as recommender systems and experimental design, require decisions to be made online. Each decision yields a stochastic reward drawn from an initially unknown distribution, and new decisions are made based on the rewards observed so far. To maximize the total reward, one must balance exploring different strategies against exploiting the strategies that currently appear optimal within a given set. This exploration-exploitation trade-off underlies a number of clinical neural engineering problems, including brain-computer interfaces, deep brain stimulation, and spinal cord injury therapy. In these systems, complex electronic and computational systems interact with the human central nervous system. A critical issue is how to control these systems to produce results that are optimal under some measure: for example, efficiently decoding the user's intention in a brain-computer interface, or delivering temporally and spatially specific stimulation in deep brain stimulation. This dissertation is motivated by electrical spinal cord stimulation with high-dimensional inputs. The stimulation is applied to promote the function and rehabilitation of the neural circuitry remaining below a spinal cord injury, and to enable complex motor behaviors such as stepping and standing. To allow these stimuli to be tuned carefully for each patient, the electrode arrays that deliver them have become increasingly sophisticated, with a corresponding increase in the number of free parameters over which the stimuli must be optimized. Since the number of possible stimuli grows exponentially with the number of electrodes, an algorithmic method of selecting stimuli is necessary, particularly when feedback is expensive to obtain.
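To make the exponential growth concrete, the sketch below counts electrode configurations under an assumption not stated in this abstract: each electrode is independently set to one of three states (for instance anode, cathode, or off), so an array of n electrodes admits 3**n configurations before amplitude or frequency are even considered. The state count and the function name `num_configurations` are hypothetical and chosen only for illustration.

```python
def num_configurations(n_electrodes: int, states_per_electrode: int = 3) -> int:
    """Count stimulus configurations when each electrode independently takes
    one of `states_per_electrode` states (assumed here: anode, cathode, off)."""
    return states_per_electrode ** n_electrodes


# Each additional electrode multiplies the search space by the number of states.
for n in (4, 8, 16):
    print(n, num_configurations(n))
```

Under this assumption, a hypothetical 16-electrode array already yields 43,046,721 configurations, far more than could ever be tested one by one when each evaluation requires a patient in the loop.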
In many online learning settings, particularly those that involve human feedback, reliable feedback is often limited to pairwise preferences rather than real-valued measurements. Examples include implicit or subjective feedback in information retrieval and recommender systems, such as clicks on search results, and subjective feedback on the quality of recommended care. Even when real-valued feedback is available, we sometimes require that the sampled function values exceed a prespecified safety threshold, a requirement that existing algorithms fail to meet. Examples include medical applications, where patients' comfort must be guaranteed; recommender systems that aim to avoid user dissatisfaction; and robotic control, where one seeks to avoid controls that cause physical harm to the platform.
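One common way to respect such a safety threshold is to sample only candidates whose pessimistic estimate clears it. The sketch below assumes that a surrogate model has already produced a posterior mean `mu` and standard deviation `sigma` for each candidate; these arrays, the confidence multiplier `beta`, the threshold `h`, and the helper names are hypothetical. It is a simplified illustration of the lower-confidence-bound idea, not the full safe optimization algorithm developed in this dissertation.

```python
import numpy as np


def safe_candidates(mu: np.ndarray, sigma: np.ndarray, h: float, beta: float = 2.0) -> np.ndarray:
    """Return indices of candidates whose lower confidence bound
    mu - beta * sigma is at least the safety threshold h."""
    lower = mu - beta * sigma
    return np.flatnonzero(lower >= h)


def pick_next(mu: np.ndarray, sigma: np.ndarray, h: float, beta: float = 2.0):
    """Among candidates deemed safe, pick the most uncertain one
    (pure exploration within the safe set). Return None if none is safe."""
    safe = safe_candidates(mu, sigma, h, beta)
    if safe.size == 0:
        return None
    return int(safe[np.argmax(sigma[safe])])


# Hypothetical posterior estimates for four candidate stimuli.
mu = np.array([1.0, 0.4, 0.9, 1.5])
sigma = np.array([0.1, 0.1, 0.3, 0.2])
print(safe_candidates(mu, sigma, h=0.5))  # candidates 0 and 3 pass the check
print(pick_next(mu, sigma, h=0.5))
```

Choosing the most uncertain safe point is only one possible rule; a practical method would also trade off expanding the safe set against maximizing the reward.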
This dissertation provides online learning algorithms for several specific online decision-making problems. \selfsparring optimizes the cumulative reward with relative feedback. RankComparison handles ranking feedback. \safeopt addresses optimization with real-valued feedback under safety constraints. \cduel is designed specifically for spinal cord injury therapy. A variant of \cduel was implemented in closed-loop human experiments, controlling which epidural stimulating electrodes are used in the spinal cords of spinal cord injury (SCI) patients. The results were compared with concurrent stimulus tuning carried out by a human experimenter. These experiments show that the algorithm is at least as effective as the human experimenter, suggesting that it can be applied to the more challenging problems of enabling and optimizing complex, sensory-dependent behaviors, such as stepping and standing, in SCI patients.
To obtain reliable quantitative measurements beyond pairwise comparisons, this work also evaluates the standing behavior of paralyzed patients under spinal cord stimulation, and it demonstrates the potential of quantifying the quality of bipedal standing automatically.