I'm writing a Python program that implements basic Q-learning in a simple 1D environment with 5 states (0 to 4). Here's my code:
```python
learning_rate, discount_factor, epsilon, num_episodes = map(float, input().split())
num_episodes = int(num_episodes)
num_states = 5
num_actions = 2
terminal_state = 4
q_table = [[0.0 for _ in range(num_actions)] for _ in range(num_states)]
for _ in range(num_episodes):
    current_state = 0
    for _ in range(10):  # fixed steps per episode
        action = 1
        if current_state < terminal_state:
            next_state = current_state + 1
            reward = 0.0
        else:
            next_state = terminal_state
            reward = 1.0  # keep rewarding at terminal
        max_next_q = max(q_table[next_state])
        old_q = q_table[current_state][action]
        q_table[current_state][action] = old_q + learning_rate * (
            reward + discount_factor * max_next_q - old_q
        )
        current_state = next_state
print(q_table)
```
Here are the errors:
Testing with file:
┌─────────────┬─────────────────┬──────────────────┬──────────────────┬─────────┐
│ Test Case   │ Input           │ Expected         │ Your Output      │ Result  │
├─────────────┼─────────────────┼──────────────────┼──────────────────┼─────────┤
│ Test case 1 │ 0.1 0.9 0.1 100 │ [[0.0, 4.6], […  │ [[0.0, 6.560999… │ ❌ FAIL │
├─────────────┼─────────────────┼──────────────────┼──────────────────┼─────────┤
│ Test case 2 │ 0.1 0.9 0.5 10  │ [[0.0, 0.1], [0… │ [[0.0, 0.006517… │ ❌ FAIL │
├─────────────┼─────────────────┼──────────────────┼──────────────────┼─────────┤
│ Test case 3 │ 0.0 0.9 0.1 100 │ [[0.0, 0.0], [0… │ [[0.0, 0.0], [0… │ ✅ PASS │
├─────────────┼─────────────────┼──────────────────┼──────────────────┼─────────┤
│ Test case 4 │ 0.1 0.9 0.0 100 │ [[0.0, 4.6], [-… │ [[0.0, 6.560999… │ ❌ FAIL │
├─────────────┼─────────────────┼──────────────────┼──────────────────┼─────────┤
│ Test case 5 │ 0.1 0.9 1.0 100 │ [[0.0, 2.0], [-… │ [[0.0, 6.560999… │ ❌ FAIL │
└─────────────┴─────────────────┴──────────────────┴──────────────────┴─────────┘
Summary: 1/5 tests passed (success rate: 20%)
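Two bugs in the code above explain most of these failures. First, `epsilon` is read in but never used: `action = 1` is hardcoded, so there is no epsilon-greedy exploration, which is why test cases with different epsilon values (0.0, 0.1, 0.5, 1.0) all produce the same output. Second, the agent collects reward 1.0 on *every* step it sits at the terminal state, not once on arrival; the terminal self-loop then converges to 1/(1 − 0.9) = 10, and the value backs up as 0.9⁴ · 10 = 6.561, which is exactly the 6.560999… in the failing output. Below is a minimal sketch of a conventional epsilon-greedy Q-learning loop for this environment. The grader's exact environment spec isn't shown (the expected outputs even contain negative values, hinting at a step or action penalty I can't see), so the dynamics here (action 0 stays, action 1 moves right), the reward scheme (1.0 only on the transition into state 4), and the `q_learning` function name are all my assumptions, not the assignment's definition:

```python
import random

def q_learning(learning_rate, discount_factor, epsilon, num_episodes, seed=None):
    # Hypothetical helper; the assignment's exact spec is not shown.
    rng = random.Random(seed)
    num_states, num_actions, terminal_state = 5, 2, 4
    q_table = [[0.0] * num_actions for _ in range(num_states)]
    for _ in range(num_episodes):
        state = 0
        for _ in range(10):  # cap on steps per episode
            if state == terminal_state:
                break  # end the episode once the terminal state is reached
            # epsilon-greedy: explore with probability epsilon, otherwise greedy
            if rng.random() < epsilon:
                action = rng.randrange(num_actions)
            else:
                action = max(range(num_actions), key=lambda a: q_table[state][a])
            # assumed dynamics: action 1 moves right, action 0 stays put
            next_state = state + 1 if action == 1 else state
            # assumed reward: 1.0 only on the transition INTO the terminal state
            reward = 1.0 if next_state == terminal_state else 0.0
            old_q = q_table[state][action]
            max_next_q = max(q_table[next_state])
            q_table[state][action] = old_q + learning_rate * (
                reward + discount_factor * max_next_q - old_q
            )
            state = next_state
    return q_table

print(q_learning(0.1, 0.9, 0.1, 100))
```

With this structure, a learning rate of 0 leaves the table at zeros (matching the one passing test), and different epsilon values actually change the result because exploration changes which transitions get updated.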