Abstract:
A fault-tolerant duplex computer system is disclosed that improves the accuracy of continued processing by collecting trouble information without stopping duplex operation. The CPUs (112, 122), memories (113, 123), and IO processors (114, 124) of the systems (110, 120) report a reparable trouble to the fault diagnosis processors (116, 126) when a generated trouble can be repaired, and an irreparable trouble when it cannot. When an out-of-sync situation is confirmed, that situation is reported as well. A fault monitoring section (130) updates the reparable trouble information (131) of the relevant system when a reparable trouble is received, and the irreparable trouble information (132) of the relevant system when an irreparable trouble is received. Upon reception of the out-of-sync situation, a synchronous processing instruction is issued that sets the system with less trouble information as the active system and the system with more trouble information as the standby system.
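The active/standby selection described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `TroubleInfo` and `FaultMonitor`, the use of a simple sum of reparable and irreparable counts as the "amount of trouble information", and the tie-breaking rule (the first-listed system wins) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TroubleInfo:
    """Per-system trouble counters (hypothetical representation)."""
    reparable: int = 0
    irreparable: int = 0

    def total(self) -> int:
        # Assumption: the amount of trouble information is the plain sum.
        return self.reparable + self.irreparable


class FaultMonitor:
    """Hypothetical stand-in for the fault monitoring section (130)."""

    def __init__(self, system_ids):
        self.info = {sid: TroubleInfo() for sid in system_ids}

    def report_reparable(self, sid):
        self.info[sid].reparable += 1

    def report_irreparable(self, sid):
        self.info[sid].irreparable += 1

    def on_out_of_sync(self):
        """Return (active, standby): least trouble first, most trouble last."""
        # sorted() is stable, so ties keep the order given to __init__.
        ranked = sorted(self.info, key=lambda sid: self.info[sid].total())
        return ranked[0], ranked[-1]


monitor = FaultMonitor([110, 120])
monitor.report_reparable(110)
monitor.report_reparable(110)
monitor.report_irreparable(120)
active, standby = monitor.on_out_of_sync()
print(active, standby)  # 120 110
```

A real system might weight irreparable troubles more heavily than reparable ones; the abstract does not say how the two kinds of information are combined.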
Abstract:
A computing device may be joined to a cluster by discovering the device, determining whether the device is eligible to join the cluster, configuring the device, and assigning the device a cluster role. A device may be assigned to act as a cluster master, backup master, active device, standby device, or another role. The cluster master may be configured to assign tasks, such as network flow processing, to the cluster devices. The cluster master and backup master may maintain global, run-time synchronization data pertaining to each of the network flows, shared resources, cluster configuration, and the like. The devices within the cluster may monitor one another. Monitoring may include transmitting status messages comprising indicators of device health to the other devices in the cluster. In the event a device satisfies failover conditions, a failover operation may be performed to replace the device with a standby device.
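The monitoring and failover behaviour described in this abstract can be sketched in Python. This is a sketch under stated assumptions, not the claimed method: the `Device` fields, the `needs_failover` and `run_failover` helpers, the three-second timeout, and the rule that a device fails when it reports itself unhealthy or stops sending status messages are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    role: str                # "active", "standby", or "failed"
    last_seen: float = 0.0   # time of the last status message received
    healthy: bool = True     # health indicator carried by that message


def needs_failover(dev, now, timeout=3.0):
    # Assumed failover condition: bad health indicator or stale status.
    return (not dev.healthy) or (now - dev.last_seen > timeout)


def run_failover(devices, now, timeout=3.0):
    """Replace each failed active device with a usable standby.

    Returns a list of (failed_name, replacement_name) pairs.
    """
    standbys = [d for d in devices
                if d.role == "standby" and not needs_failover(d, now, timeout)]
    swaps = []
    for dev in devices:
        if dev.role == "active" and needs_failover(dev, now, timeout) and standbys:
            replacement = standbys.pop(0)
            replacement.role = "active"
            dev.role = "failed"
            swaps.append((dev.name, replacement.name))
    return swaps


cluster = [
    Device("a", "active", last_seen=0.0),
    Device("b", "active", last_seen=9.0),
    Device("s", "standby", last_seen=9.5),
]
print(run_failover(cluster, now=10.0))  # [('a', 's')]
```

The sketch leaves out the master and backup-master roles and the synchronization data they maintain.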
Abstract:
A distributed computing system can achieve consensus while introducing fewer message delays by using an algorithm that allows the constituent devices to vote on functions received directly from one or more clients. If a conflict occurs, a leader device from among the devices can be selected such that the leader device already knows of the other devices' previous votes, and can determine an appropriate function to propose, using an immediately subsequent proposal number, without performing the first phase of the Paxos algorithm. Alternatively, each device can independently determine, by using the same repeatable mechanism used by a leader device, what function the leader device would propose, and can then vote for that function using the immediately subsequent proposal number. If the devices' votes again result in a conflict, the Paxos algorithm can be used, or additional iterations can be performed prior to resorting to the Paxos algorithm.
Abstract:
In a network of computer nodes, a directory service provides both the physical location of directory information around the network and the directory information itself in a single data structure. This single data structure is distributed throughout the network, and continuously redistributed, so as to create a directory service that is both more flexible, and more robust, than prior art directory services.
Abstract:
A method and device are provided for monitoring a distributed system made up of a plurality of users connected by a bus system, in which at least some of the users are provided as monitoring users. The process data of at least one monitored user are stored in data areas of memory units of the bus system to which the monitoring users have access, and the process data are evaluated by the monitoring users.
Abstract:
A distributed computing system can be operated in a fault tolerant manner using a collection of auxiliary computing devices and more main computing devices than the number of faults the system can tolerate. A quorum of all of the main computing devices can be used. In the event of a failure, an alternative quorum from a selected set of quorums, comprising at least one main computing device and some or all of the auxiliary computing devices, can be used to complete pending operations and to select a new set of quorums. Alternatively, another state machine, comprising at least one main computing device and some or all of the auxiliary computing devices, can select a new quorum comprising the currently operating main computing devices, and the new quorum can then complete pending operations and can continue to select proposals using the proposal number assigned by the other state machine.
Abstract:
An electronic module is provided. The module includes a first logic device having at least two processors and a first comparator, and a second logic device having at least one processor and a second comparator. Each of the at least two processors is coupled to each of the first and second comparators. The first and second comparators operate as a distributed comparator system. Each comparator independently identifies faults in the processors.
Abstract:
Byzantine Agreement requires a set of parties in a distributed system to agree on a value even if some parties are corrupted. A method is disclosed for achieving agreement among participating network devices in an asynchronous network that makes use of cryptography, specifically threshold digital signatures and a distributed coin-tossing protocol.
Abstract:
A system and method for synchronizing a plurality of main processors are disclosed. At a first time, and in response to a first time reference, a first rendezvous signal is sent from a first of the plurality of main processors to a second of the plurality of main processors. At a second time, and in response to a second time reference, a second rendezvous signal is sent from the second of the plurality of main processors to the first of the plurality of main processors. After the first rendezvous signal is received by the second of the plurality of main processors and the second rendezvous signal is received by the first of the plurality of main processors, substantially simultaneous scanning of control information is initiated by the first and second of the plurality of main processors. In variations, a difference between the first and second times signals a fault condition.
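The fault check mentioned in the last sentence of this abstract can be sketched as a comparison of the two rendezvous times. The sketch below is illustrative only: `RendezvousResult`, `rendezvous`, and the `max_skew` tolerance are hypothetical names, and signal transmission delay is ignored, so scanning is assumed to start as soon as the later signal has been sent.

```python
from dataclasses import dataclass


@dataclass
class RendezvousResult:
    scan_start: float  # earliest time both processors hold both signals
    skew: float        # difference between the first and second times
    fault: bool        # True when the skew exceeds the allowed tolerance


def rendezvous(t_first, t_second, max_skew):
    """Evaluate one rendezvous between two main processors.

    t_first:  time the first processor sent its rendezvous signal
    t_second: time the second processor sent its rendezvous signal
    max_skew: assumed tolerance beyond which the difference is a fault
    """
    skew = abs(t_first - t_second)
    return RendezvousResult(
        scan_start=max(t_first, t_second),
        skew=skew,
        fault=skew > max_skew,
    )


print(rendezvous(10.0, 10.5, max_skew=1.0).fault)  # False
print(rendezvous(10.0, 12.0, max_skew=1.0).fault)  # True
```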
Abstract:
A method and apparatus are disclosed that provide improved security in distributed-environment voting. At least three voting processors running a voting algorithm are connected to a local area network (LAN) and exchange their individually determined results of a process application. Each result is committed to an interface module, where it is checked, authenticated, and buffered. The allotted time for receiving and buffering committed results is constrained by a first timed interval within the interface module. The first timed interval may be reset several times. The allotted time for checking and comparing the committed results from each processor is constrained by a second timed interval within each voting processor. A majority vote of the authenticated committed results is formed once all necessary iterations of both the first and second timed intervals are completed. Enhanced security is thereby afforded to the overall voting process, which yields a majority vote that is correct despite the introduction of errors associated with faulty or hostile processors.
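The authenticate-then-vote step in this abstract can be sketched in Python. The abstract does not say how results are authenticated, so the sketch assumes a per-processor shared key and an HMAC-SHA256 tag. The helper names `sign` and `majority_vote` are hypothetical, and the two timed intervals are omitted.

```python
import hashlib
import hmac
from collections import Counter


def sign(key: bytes, value: str) -> bytes:
    # Assumed authentication scheme: HMAC-SHA256 over the result value.
    return hmac.new(key, value.encode(), hashlib.sha256).digest()


def majority_vote(committed, keys):
    """Return the value backed by a strict majority of processors, or None.

    committed: iterable of (processor_id, value, tag) tuples
    keys:      dict mapping processor_id to its shared key
    """
    authenticated = {}
    for pid, value, tag in committed:
        # Drop results from unknown processors or with an invalid tag.
        if pid in keys and hmac.compare_digest(tag, sign(keys[pid], value)):
            authenticated[pid] = value  # at most one result per processor
    if not authenticated:
        return None
    value, count = Counter(authenticated.values()).most_common(1)[0]
    return value if count > len(keys) // 2 else None


keys = {"p1": b"k1", "p2": b"k2", "p3": b"k3"}
results = [
    ("p1", "42", sign(b"k1", "42")),
    ("p2", "42", sign(b"k2", "42")),
    ("p3", "7", sign(b"k3", "7")),   # faulty processor, outvoted
]
print(majority_vote(results, keys))  # 42
```

The majority is counted against the total number of voting processors, so a forged or missing result cannot lower the threshold.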