Multi-line non-deterministic Turing simulator

Python Methods and Functions

The simulator is written in Python 3 and uses all the power and expressiveness of this programming language, combining object-oriented and functional programming methods.

The organization is as follows. First, TM is introduced informally, highlighting its many applications in theoretical CS. Then the formal definitions of the base model and the multi-line option are provided. Finally, the design and implementation of the simulator is presented, examples of its use and execution are given.




Introduction

ТМ — these are abstract automata developed by Alan Turing in 1936 to explore the frontiers of computation. TM can calculate functions according to simple rules.

TM — it is a primitive computational model with 3 components:

  • Memory: I / O tape divided into individual cells in which characters are stored. The ribbon has a left cell, but it is not limited to the right, so there is no limit on the length of the lines it can store.
  • A control box with a finite set of states and a ribbon head that points to the current cell and can be moved to the left or to the right during computation.
  • The program is stored in the final control, which controls the computation of the machine.

The TM operation consists of three stages:

  1. Initialization. An input string of length N is loaded into the first N cells of the tape. The rest of the infinitely many cells contain a special character called empty. The machine returns to its original state.
  2. Calculus. Each calculation step includes:
    • Reading the character in the current cell (scanned by the tape head).
    • Following the rules defined by the program for the combination of the current state and the character read. The rules are called transitions or movements and consist of (a) writing a new character in the current cell, (b) switching to a new state, and (c) optionally moving the head one cell to the left or right.
  3. Improvement. The calculation stops when there is no rule for the current state and symbol. If the machine is in a final state, TM takes an input string. If the current state is not final, TM rejects the entered string. Note that not all TMs reach this stage, because it is quite possible that TM never stops at a given input, entering an infinite loop.

TM have many applications in theoretical computer science and are closely related to the theory of formal language. TM — these are language recognizers that adopt the highest class of the Chomsky language hierarchy: type 0 languages ​​generated by unrestricted grammars. They are also language converters: given an input string from one language, TM can evaluate an output string from the same or a different language. This capability allows TMs to compute functions whose inputs and outputs are encoded as strings of a formal language, for example, binary numbers viewed as a set of strings in the alphabet {0, 1}.

Church-Turing claims that TMs are capable of computing any a function that can be expressed by the algorithm. This means that TM are actually universal computers, which, being abstract mathematical devices, do not suffer from the limitations of physical time and space. If this thesis is correct, as many computer scientists believe, the fact that Turing discovered that there are functions that cannot be computed by TM implies that there are functions that cannot be algorithmically computed by any computer in the past, present, or the future.

TM have also been very important in the study of computational complexity and one of the central unsolved problems in CS and mathematics: the P vs NP problem. TM are a convenient hardware-independent model for analyzing the computational complexity of algorithms in terms of the number of steps performed (time complexity) or the number of scanned cells (spatial complexity) during computation.




Formal definition: base model

Turing Machine (TM) is a 7-tuple where:

  • finite non-empty set of states.
  • this is a finite non-empty set of characters called the strip alphabet.
  • this is the input alphabet.
  • is a jump or next move function that displays pairs of status characters in a subset states of triplets, symbol, head direction (left, right or stop).
  • this is the initial state.
  • is the final receiving state.
  • this is an empty character

At each stage of the calculation, a TM can be described by an instant description (ID). The id is ternary where is the actual state, the string contained in the cells to the left of the cell scanned by the machine and the string contained in the current cell and other cells to the right of the ribbon head until the cell starting an infinite sequence of spaces.

Binary relation links two identifiers and is defined as follows, for all and and :

  • then and only then
  • then and only then
  • then and only then
  • then and only then
  • then and only then
  • then and only then

Allow to be transitive and reflective closure , i.e. applying zero or more transitions between identifiers. Then the language recognized by is defined as: ,

If for all states and ribbon symbols , has a maximum of one element, which is called TM deterministic. If there are transitions with more than one choice, TM is non-deterministic.

The sequence of identifiers of deterministic TMs is linear. For non-deterministic TM, it forms a computation tree . Nondeterminism can be thought of as if the machine created its own replicas that run in parallel. This useful analogy will be used by our simulator.

At first glance, we might think that non-deterministic TMs are more powerful than deterministic TMs because they are able to “guess” the correct path. But this is not the case: the DTM is only a special case of the NDTM, and any NDTM can be converted to a DTM. Thus, they have the same processing power.

In fact, several variants of TM have been proposed: with a double-sided endless tape, with several tracks, without a stop option, etc. the same processing power as the base model. They recognize the same class of languages.

In the next section, we present a very useful option: multi-line non-deterministic TM.




Formal definition: multitape TM

TM Multitape have multiple I / O tapes with independent heads. This option does not increase the computational power of the original, but, as we will see, it can simplify the construction of TM using auxiliary tapes.

A K-tape TM is a 7-tuple where all the elements are the same as in the base TM, except for the transition function, which is a mapping , It maps the read-state character pairs to subsets of the new write-state pairs + directions.

For example, the following two-part TM computes the sum of the numbers stored in unary notation in the first tape. The first tape contains factors: sequences of 1 separated by 0, which are natural numbers. The machine writes all 1s to tape 2, calculating the sum of all factors.

Formally let where is defined as follows:

  • Skip all 0s:
  • Copy 1 to tape 2:
  • Stop when space is reached:



Stopping problem

TM may not stop for some inputs ... For example, consider the TM with ,

The halting problem means that it is unresolvable to check if an arbitrary TM will halt on a given input line. This problem has profound implications because it shows that there are problems that cannot be calculated by TM, and if the Church-Turing thesis is correct, it means that no algorithm can solve these problems.

This is very bad news for the TM simulator, because it means that the simulator can go into an infinite loop.

We cannot completely avoid this problem, but we can solve its limited form. Consider the case of non-deterministic TM, where there are branches of the computation tree that enter an infinite loop and grow forever where others reach their final state. In this case, the simulator should stop accepting the input string. But if we traverse the tree using depth-first search (DFS), the simulator gets stuck when it enters one of the infinite branches. To avoid this, the simulator will traverse the computation tree using Breadth First Search (BFS). BFS — it is a graph traversal strategy that examines all children of a branch before proceeding to their descendants.




NDTM Multitape Simulator in Python

In this section, we present a simulator for non-deterministic TMs with multiple feeds written in Python.

The simulator consists of two classes: a ribbon class and an NDTM class.

Ribbon instances contain a list of currently scanned cells and a pointer to the ribbon head and provide the following operations:

  • readSymbol (): returns the character scanned by the head, or a space if the head is in the last cell scanned
  • writeSymbol (): replaces the character scanned by the head with another. If the heading is in the last scanned cells, adds a character to the end of the character list.
  • moveHead (): moves the head one position to the left (-1), right (1), or no positions (0).
  • clone (): Creates a copy or a copy of the tape. This will be very useful for simulating non-determinism.

NDTM instances have the following attributes:

  • start and end states.
  • current state.
  • list of feeds (Tape objects).
  • navigation dictionary.

The navigation function is implemented using a dictionary whose keys are — tuples (state, read_symbols), and values ​​— lists of tuples (new_state, move). For example, a TM that adds numbers in the unary notation introduced earlier would be represented as:

 {('q0', (' 1', '#')): [('q0', (( '1',' R'), ('1',' R')))], ('q0', (' 0', '#')): [('q0', ((' 0', 'R'), (' # ',' S')))], ('q0', (' # ',' # ')): [(' q1', (('#', 'S') , ('#', 'S')))]} 

Note that Python's representation is very similar to the mathematical definition of a transition function, thanks to Python's generic data structures such as dictionaries, tuples, and lists. A subclass of dict, defaultdict, from the standard collections module, is used to ease the burden of initialization.

NDTM objects contain methods for reading the current character set on tapes, for adding, getting and navigating, and for making copies of the current TM.

The main NDTM method is accept (). Its argument is an input string and returns an NDTM object if any branch of the computation tree reaches an accepting state, or None if none of the branches reaches. It traverses the computation tree using Breadth First Search (BFS) to allow computation to stop if any branch reaches an accepting state. BFS uses a queue to keep track of pending branches. Deque Python from Collections module is used to get O (1) performance on queue operations. The algorithm looks like this:

 Add the TM instance to the queue While queue is not empty: Fetch the first TM from the queue If there is no transition for the current state and read symbols: If the TM is in a final state: return TM Else: If the transition is nondeterministic: Create replicas of the TM and add them to the queue Execute the transition in the current TM and add it to the queue 

Finally, the NDTM class has methods for printing a TM representation as a collection of instant descriptions and analyzing a TM specification from a file. As usual, this I / O is the most cumbersome part of the simulator.

The spec files have the following syntax

% HEADER: mandatory start_state final_state blank number_of_tapes% TRANSITIONS state read_symbols new_state write_symbol, move write_symbol , move ... 

Lines starting with '%' are considered comments. For example, a TM that adds numbers in unary notation has the following spec file:

% HEADER q0 q1 # 2% TRANSITIONS q0 1, # q0 1, R 1, R q0 0, # q0 0, R #, S q0 #, # q1 #, S #, S 

States and characters can be any string that does not contain spaces or commas.

The simulator can be run from a Python session to examining the output configuration. For example, if the previous file was saved with the name "2sum.tm":

from NDTM import NDTM

tm = NDTM.parse ( '2sum.tm' )

print (tm.accepts ( '11011101' ))

The output shows that the simulator has generated the sum of units on tape # 1:

  Output:  q1: ['1',' 1', '0',' 1', '1',' 1', '0',' 1'] ['#'] q1: ['1',' 1', '1',' 1', '1',' 1' ] ['#'] 

The output shows the contents of the two tapes, the positions of the heads (the second list of each tape), and the final state of the TM.


Simulator source code

Excluding I / O code and comments, the simulator is less than 100 lines of code. This is a testament to the power and economy of Python. It's object oriented, but also uses functional constructs like lists.

####
# NDTM.py: Non-deterministic Turing Machine Simulator
# By David Gil del Rosal ([email protected])
#### from collections import defaultdict, deque

 

class Tape:

# Constructor. Sets an empty character,

# load string and feed head position

def __ init __ ( self , blank, string = ' ', head = 0 ):

self . blank = blank

self . loadString (string, head)

 

  # Loads a new line and sets the ribbon head

def loadString ( self , string, head):

self . symbols = list (string)

self . head = head

 

# Returns the character in the current cell or a space

# if head is at the beginning of infinite spaces

def readSymbol ( sel f ):

if self . head & lt; len ( self .symbols):

return self . symbols [ self . head]

else :

return self . blank

 

# Записывает символ в текущей ячейке, расширяя

 # список при необходимости

 def writeSymbol( self, symbol):

 if self.head <len(self.symbols):

 self.symbols[self.head] = symbol

 else:

 self.symbols.append(symbol)

  

 # Перемещает голову влево (-1), остается (0) или вправо (1)

 def moveHead(self, direction):

 if direction == 'L': inc = -1

 elif direction == 'R': inc = 1

 else: inc = 0

 self.head+= inc

  

 # Создает новую ленту с теми же атрибутами, что и эта

 def clone(self):

 return Tape(self. blank, self . symbols, self . head)

 

# String representation of the feed

  def __ str __ ( self ):

return str ( self . symbols [: self . head]) +

  str ( self . symbols [ self . head:])

 

 

class NDTM:

# Constructor. Sets the start and end states and

# feed units

def __ init __ ( self , start, final, blank = '#' , ntapes = 1 ):

self . start = self . state = start

self . final = final

  self . tapes = [Tape (blank) for _ in range (ntapes)]

self . trans = defaultdict ( list )

 

# Puts TM to the initial state and loads the input

# line in the first feed

  def restart ( self , string):

self . state = self . start

self . tapes [ 0 ]. loadString (string, 0 )

for tape in self . tapes [ 1 :]:

tape .loadString ('', 0 )

 

  # Returns a tuple with the current characters read

def readSymbols ( self ):

return tuple (tape.readSymbol ( ) for tape in self . tapes)

  

# Add record to transaction table

def addTrans ( self , state , read_sym, new_state, moves):

  self . trans [(state, read_sym)]. append ((new_state, moves))

 

# Returns a transaction that matches

# current state and read characters, or None if not

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              def getTrans ( self ):

  key = ( self . state, self . readSymbols ())

  return self . trans [key] if key in self . trans else None

 

# Performs a transaction, updating state and

  # feeds. Returns a TM object to resolve chaining

def execTrans ( self , trans):

  self . state, moves = trans

f



Get Solution for free from DataCamp guru