INTRODUCTION
Microwave cavity filters are in great demand in the satellite communication, mobile communication, radar, and electronic countermeasures industries. The rapid growth of the mobile communication industry in particular makes it pressing to remove the bottlenecks restricting cavity filter production. Because of imprecision in the manufacturing and assembly processes, each cavity filter must be carefully tuned by a professional technician before leaving the factory. This is done by manually turning the tuning screws and adjusting the screw nuts until the scattering characteristics (S-parameters) shown on the vector network analyzer meet the design specifications (Fig. 1).
Fig. 1. Manual tuning of cavity filters by human experts.
The complicated, nonlinear relationship between the S-parameters and the screw positions makes the tuning process difficult and time-consuming. The tuning speed depends not only on the specific type and structure of the filter, but also on the skill of the human operator: depending on the technician's experience and strategy, tuning a single filter can take from several minutes to several hours. Production efficiency is therefore severely restricted by the shortage of experienced technicians, who are expensive to train and employ.
The automation of cavity filter tuning has thus become particularly important and urgent. On the one hand, manual tuning consumes a great deal of time and slows down production. On the other hand, fewer and fewer young people are willing to do such dull and repetitive work.
Hence, we propose a new tuning method based on the reward-and-penalty mechanism at the core of reinforcement learning (RL). A single-layer feedforward neural network is built to model the value function for Q-learning.
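As an illustration of what such a value-function model might look like, the sketch below implements a small feedforward Q-network in NumPy (interpreted here as one hidden layer) that maps an S-parameter state vector to one Q-value per discrete screw action. The dimensions, activation, and learning rate are illustrative assumptions, not the settings used in [1].

```python
import numpy as np

class QNetwork:
    """Small feedforward approximation of Q(s, a).

    Maps an S-parameter state vector to one Q-value per discrete tuning
    action (e.g. turn a given screw by a fixed increment). All sizes
    below are illustrative assumptions.
    """

    def __init__(self, state_dim=64, n_actions=8, hidden=32, lr=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, size=(state_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, size=(hidden, n_actions))
        self.b2 = np.zeros(n_actions)
        self.lr = lr

    def forward(self, s):
        """Return the Q-values of all actions for state vector s."""
        self._h = np.tanh(s @ self.w1 + self.b1)   # hidden activations, cached for backprop
        return self._h @ self.w2 + self.b2

    def update(self, s, a, target):
        """One gradient-descent step on (target - Q(s, a))^2."""
        q = self.forward(s)
        td_error = q[a] - target                    # d(loss)/d(q[a]), up to a factor of 2
        one_hot = np.eye(len(q))[a]
        # Output layer: only the column of action a receives a gradient.
        grad_w2 = np.outer(self._h, one_hot) * td_error
        grad_b2 = one_hot * td_error
        # Hidden tanh layer.
        dh = self.w2[:, a] * td_error * (1.0 - self._h ** 2)
        grad_w1 = np.outer(s, dh)
        grad_b1 = dh
        for p, g in ((self.w1, grad_w1), (self.b1, grad_b1),
                     (self.w2, grad_w2), (self.b2, grad_b2)):
            p -= self.lr * g
```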
PUBLICATIONS
The project is described in the following publications:
[1] Reinforcement Learning Approach to Learning Human Experience in Tuning Cavity Filters, IEEE ROBIO 2015. (code, pdf)
[2] Intelligent Tuning Algorithm of Cavity Filter and Tuning Method Using Same, Patent No. CN105680827A.
INTELLIGENT TUNING BASED ON Q-LEARNING
Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Filter tuning works similarly: the technician adjusts the screw positions according to the changing S-parameters until the response satisfies the requirements (shown in Fig. 1). We therefore learn a model based on reinforcement learning to tune the filter.
Fig. 2. The process of filter tuning.
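To make the analogy concrete, the tuning task can be framed as the environment interface sketched below, where the state is the measured S-parameter vector, an action turns one screw by a fixed increment, and the reward indicates whether the response meets the specification. The class, method names, and action encoding are hypothetical; a real implementation would wrap the screw-driving hardware and the network analyzer.

```python
class FilterTuningEnv:
    """Hypothetical interface for cavity filter tuning as an RL environment.

    state  : vector of S-parameter magnitudes sampled from the network analyzer
    action : index encoding (screw id, turn direction) for one fixed increment
    reward : positive once the response meets the design specification, and a
             small negative value otherwise so shorter tuning sequences score higher
    """

    def __init__(self, n_screws=6, n_freq_points=64):
        self.n_actions = 2 * n_screws          # each screw: clockwise / counter-clockwise
        self.n_freq_points = n_freq_points

    def reset(self):
        """Detune the filter to a random start and return the measured state."""
        raise NotImplementedError("wrap the screw driver and network analyzer here")

    def step(self, action):
        """Apply one screw increment, re-measure, return (next_state, reward, done)."""
        raise NotImplementedError("wrap the screw driver and network analyzer here")
```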
Algorithm
- Initialize replay memory D to capacity N
- Initialize action-value function Q with random weights w and biases b
- Initialize target action-value function \(Q_{target}\) with weights \(w_{t} = w\)
- For episode = 1, M do
- Initialize state \(s_{1}\) with the measured S-parameters
- For t = 1, T do
- With probability \(\epsilon\) select a random action \(a_{t}\); otherwise select \(a_{t} = \arg\max_{a} Q(s_{t}, a; w, b)\)
- Execute action \(a_{t}\) on the filter and observe reward \(r_{t}\) and the new S-parameters \(s_{t+1}\)
- Store transition \((s_{t}, a_{t}, r_{t}, s_{t+1})\) in D
- Sample random minibatch of transitions from D
- For each sampled transition compute \( y_{t} = r_{t} + \gamma \max_{a_{t+1}} Q_{target}(s_{t+1}, a_{t+1}; w_{t}) \), or \( y_{t} = r_{t} \) if \(s_{t+1}\) is terminal
- Perform a gradient descent step on \( (y_{t} - Q(s_{t}, a_{t}; w, b))^{2} \) to update w, b
- Every C steps reset \(Q_{target} = Q\), i.e. \(w_{t} \leftarrow w\)
- end for
- end for
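A minimal sketch of this procedure, reusing the QNetwork and FilterTuningEnv classes sketched above, might look as follows. All hyper-parameters (replay capacity, \(\epsilon\), discount factor \(\gamma\), minibatch size, target-update period) are illustrative assumptions, not the values used in our experiments.

```python
import copy
import random
from collections import deque

import numpy as np

# Illustrative hyper-parameters (assumptions, not the paper's values).
CAPACITY, EPISODES, MAX_STEPS = 10_000, 100, 1000
EPSILON, GAMMA, BATCH, TARGET_EVERY = 0.1, 0.9, 32, 100

env = FilterTuningEnv()                                   # environment sketched above
q_net = QNetwork(state_dim=env.n_freq_points, n_actions=env.n_actions)
target_net = copy.deepcopy(q_net)                         # Q_target with weights w_t = w
replay = deque(maxlen=CAPACITY)                           # replay memory D of capacity N

step_count = 0
for episode in range(EPISODES):
    state = env.reset()                                   # initial S-parameter measurement
    for t in range(MAX_STEPS):
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            action = random.randrange(env.n_actions)
        else:
            action = int(np.argmax(q_net.forward(state)))

        next_state, reward, done = env.step(action)
        replay.append((state, action, reward, next_state, done))

        # Learn from a random minibatch of stored transitions.
        if len(replay) >= BATCH:
            for s, a, r, s_next, terminal in random.sample(list(replay), BATCH):
                y = r if terminal else r + GAMMA * np.max(target_net.forward(s_next))
                q_net.update(s, a, y)                     # step on (y - Q(s, a; w, b))^2

        # Periodically refresh the target network.
        step_count += 1
        if step_count % TARGET_EVERY == 0:
            target_net = copy.deepcopy(q_net)

        state = next_state
        if done:
            break
```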
RESULTS
We trained the Q-network for 100 epochs, each with a maximum of 1000 tuning steps. At the beginning of training, the Q-network struggled to reach the desired state and failed to tune the filter within the 1000-step limit. As training proceeded, the number of steps needed for a successful tuning gradually decreased and converged to about 100. After 100 training epochs, we tested the trained Q-network on 100 random initial states: the success rate reached 95%, and in most cases the filter was tuned within 50 steps.