An IVR For Beginners

Building IVR Applications the Easy Way

The Problem

Architecting a server that answers telephone calls in a scalable way is hard. Before a single line of application code can be written, thousands of lines of scaffolding are necessary.

Mastering the various technologies - telephony, waveform audio, speech, etc. - takes time. For the server to run smoothly, every input and output operation needs to be performed asynchronously. Because these operations - telephony, audio, media file, speech, database, keypad input - are so different, each has its own way of announcing its completion. Keeping track of everything that is in progress or pending in the server is not easy. Propagating events from one subsystem to another so that, for example, the text-to-speech engine stops synthesizing text when the caller hangs up is tricky business.

Further, it would not be at all unusual for the server to be passing text through the text-to-speech engine on one line, recognizing the caller's spoken input on a second, detecting keypad input on a third, streaming a media file on a fourth, recording a message on a fifth, transferring a call on a sixth, answering a call on a seventh and disconnecting a call on an eighth. Performing such disparate tasks concurrently and efficiently requires that substantial amounts of time be spent on the threading and I/O models of the server.

On the one hand, the language best suited to implementing such a complicated server is C++, because it provides so much flexibility and the greatest access to the services of the host operating system. On the other hand, C++ is not always the best choice for building the business logic of an IVR application.

The Solution

We solve this problem by delivering a pre-built server which exists solely to perform the tasks common to all IVR servers. Out of the box, it knows how to answer telephone calls, stream and record media files, route audio through text-to-speech and automatic speech recognition engines and detect keypad input. It provides a plug-in architecture so that the task of building an IVR application is reduced to building a module which handles the voice user interface for that application. Its plug-ins are written from the perspective of a single call, so the application developer need not be concerned with the thorny issues of scalability, concurrency or synchronization. And while the server is written in C++ for maximum performance, it permits applications to be written in the language most appropriate to the application's domain.

One of the most challenging aspects of building a scalable server is managing the resources required to automate the handling of a telephone call. Our server deals with that issue by moving the responsibility for resource management from the application to the server.

In every language or environment that the server supports, there is a function or method named Answer() that bounds the period during which resources are tracked. When the server calls it, the server is announcing to the application that a call has arrived on a line. Until the application returns from Answer(), the line is the exclusive property of the application. The application never has to allocate the line (the server does that before Answer() is called) or free it (that happens automatically when the function returns). In addition, applications never see "handles" to devices, files, engines, grammars, etc. Resources are allocated when used and freed automatically by the server when Answer() returns. This, for example, is a complete IVR application written in C#:

using IVRForBeginners;

public class CSharpIVRApp : NetClientCall
{
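	// Called by the server when a call arrives on the line deployed to this application.
	public void Answer()
	{
		// Call handling goes here. The server frees the line when this method returns.
	}
}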

Notice that nowhere in the application is there any mention of the line whose calls are to be answered. That's because the server provides a deployment wizard which allows the application developer to simply point at the line that is to be associated with an application. The deployment wizard simplifies applications in other ways too. For example, it allows a developer to declare the barge-in behavior and to specify the synthetic voice to be used by an application without writing any code.

Notice too that there is no visible code in the application which opens the line, closes it, detects an incoming call, or hangs up the line. The server does all of those things automatically and at the proper time.

Further, exactly what it does to the line device depends on whether the application is deployed to an ordinary telephone line (also known as a "port") or to a special type of line known as a route point. In the case of a route point, the server accepts the call, finds a free port, redirects the call to that port and answers the new call there. That done, it dispatches the application which was deployed against the route point. When the application completes, the allocated port is returned to the list of free ports. This allows the developer to build an application that answers calls on one hundred lines as easily as one that handles a single line.
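
The route point behavior described above can be pictured with a small sketch. This is not the server's actual implementation: the names PortPool, PortLease, AnswerFn and dispatchFromRoutePoint are hypothetical, and the telephony steps (accepting, redirecting and answering the call) are reduced to comments. It only illustrates the idea of taking a port from a free list, running the application's Answer() against it, and returning the port automatically when Answer() returns.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// All names in this sketch are hypothetical; none is the product's API.

// The list of free ports that a route point draws from.
class PortPool {
public:
    explicit PortPool(std::vector<int> ports)
        : free_(std::move(ports)), capacity_(free_.size()) {}

    // Hands out a free port. Returns false if every port is busy.
    bool acquire(int& port) {
        if (free_.empty()) return false;
        port = free_.back();
        free_.pop_back();
        return true;
    }

    // Puts a port back on the free list.
    void release(int port) {
        assert(free_.size() < capacity_);  // never more free ports than exist
        free_.push_back(port);
    }

    std::size_t freeCount() const { return free_.size(); }

private:
    std::vector<int> free_;
    std::size_t capacity_;
};

// Holds one port for the duration of a call and returns it to the
// pool when the lease goes out of scope.
class PortLease {
public:
    PortLease(PortPool& pool, int port) : pool_(pool), port_(port) {}
    ~PortLease() { pool_.release(port_); }
    PortLease(const PortLease&) = delete;
    PortLease& operator=(const PortLease&) = delete;
    int port() const { return port_; }

private:
    PortPool& pool_;
    int port_;
};

// Stand-in for the Answer() method of a deployed application.
using AnswerFn = std::function<void(int port)>;

// Handles one call arriving on a route point.
// Returns false if no port was free to take the call.
bool dispatchFromRoutePoint(PortPool& pool, const AnswerFn& answer) {
    int port = 0;
    if (!pool.acquire(port)) return false;
    PortLease lease(pool, port);
    // A real server would redirect the call to the port and answer it here.
    answer(lease.port());
    return true;  // the lease destructor returns the port to the pool
}
```

Tying the release to the lifetime of the lease means the port goes back on the free list however the application's handler exits, including by throwing an exception, which mirrors the promise that the application never frees the line itself.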

Getting Started

We suggest that you view the video tutorials before you start building an application or diving into the help text here. After you have done that, open the topic in the left pane that corresponds to the language or environment that you will use to build your application.