'Jump' by AI - Jack's Ship

Recently, one friend recommanded me an article about using AI to play a popular html5 game on Wechat called “Jump”. The destination of the game is to make the chessman jump from one table to another by touching the screen with right timing.

The basic idea about the solution is shown as below steps.

Step 1: Making a screenshot of the game by using adb tools.
Step 2: Getting chessman’s position.
Step 3: Getting target talbe position.
Step 4: Calculate distances between chessman and target table and calculate the timing of the touching.
Step 5: Sending adb command to finish the jumping.

The key challenge is how to get the position of chessman and target table accurately. The article gives 2 solutions.

The first solution is quite simple. It is using traditional template match of multi-scale images based on opencv. Here is an article about details of this methods.

def multi_scale_search(pivot, screen, range=0.3, num=10):
    H, W = screen.shape[:2]
    h, w = pivot.shape[:2]

    found = None
    for scale in np.linspace(1-range, 1+range, num)[::-1]:
        resized = cv2.resize(screen, (int(W * scale), int(H * scale)))
        r = W / float(resized.shape[1])
        if resized.shape[0] < h or resized.shape[1] < w:
            break
        res = cv2.matchTemplate(resized, pivot, cv2.TM_CCOEFF_NORMED)

        loc = np.where(res >= res.max())
        pos_h, pos_w = list(zip(*loc))[0]

        if found is None or res.max() > found[-1]:
            found = (pos_h, pos_w, r, res.max())

    if found is None: return (0,0,0,0,0)
    pos_h, pos_w, r, score = found
    start_h, start_w = int(pos_h * r), int(pos_w * r)
    end_h, end_w = int((pos_h + h) * r), int((pos_w + w) * r)
    return [start_h, start_w, end_h, end_w, score]

The second solution is more interesting and worth to learn. It uses a machine learing algorithmn called “CNN(Convolutional Neural Networks) Corase-to-Fine” to search the position of target table. Here is an article about this aglorithmn using on image classification. However, from the code, I found they are different definitions. The game’s searching aglorithmn is not so complicated as the one in article. It is just using 2 layer CNN models, one called “coarse” for searching an approximate postion of target table and use it to create an 320*320 cropping image of the target table, and using “fine” model to search this cropping image to get a accurate position of center of target table. About CNN’s introduction and implimentation by TensorFlow, please read here.

Comparing this two solutions, the first is easy to understand and implimentation, but the 2nd one is more accurate and faster. And it is also an good example to learn about the development of machine learning and computer vision by Python, opencv and TensorFlow.