
How applications interact with the Kinect sensor
The Kinect for Windows SDK works as an interface between the Kinect device and your application. When an application needs to access the sensor, it sends an API call to the driver; the Kinect driver controls all access to sensor data. For a granular look at how the application interfaces with the sensor, refer to the following diagram:
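The call path above can be sketched as a small conceptual model. This is not the real SDK: every class and method name here is a hypothetical stand-in, used only to show that the application talks to an API layer, the API layer talks to the driver, and only the driver touches the sensor.

```python
# Conceptual sketch (NOT the real Kinect SDK): models the
# application -> SDK API -> driver -> sensor call path.
# All names here are hypothetical stand-ins.

class Sensor:
    """Stands in for the physical Kinect hardware."""
    def read_color_frame(self):
        return "raw color frame bytes"

class Driver:
    """Controls access to sensor data, like the Kinect driver does."""
    def __init__(self, sensor):
        self._sensor = sensor

    def fetch_color_frame(self):
        return self._sensor.read_color_frame()

class SdkApi:
    """The layer an application actually calls."""
    def __init__(self, driver):
        self._driver = driver

    def get_color_frame(self):
        # The API call is forwarded to the driver; the application
        # never talks to the hardware directly.
        return self._driver.fetch_color_frame()

api = SdkApi(Driver(Sensor()))
print(api.get_color_frame())  # → raw color frame bytes
```

The point of the indirection is the one the text makes: because the driver mediates every request, it can arbitrate access to the sensor no matter which API the application calls.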

The installed Kinect drivers sit alongside the system device-driver components and communicate with them. The drivers stream the video and audio data from the sensor and return it to the application. They also expose the Kinect microphone array as a default audio device and allow the array to interact with the Windows default speech recognition engine. Another part of the Kinect device driver controls the USB hub on the connected sensor as well.
Understanding the classification of SDK APIs
To understand the functionality of the different APIs and know where to use them, it helps to have a clear view of how they work. We can classify the SDK libraries into the following two categories:
- Those controlling and accessing Kinect sensors
- Those accessing microphones and controlling audio
The first category deals with the sensor directly: capturing the color, infrared, and depth data streams, tracking human skeletons, and taking control of sensor initialization. A set of APIs in this category talks directly to the sensor hardware, whereas a few others process the data that is captured from the sensor.
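Sensor initialization in this category typically lets the application choose which data streams to turn on. The sketch below is loosely modelled on that idea using option flags; the flag names, values, and `MockSensor` class are all hypothetical, not part of the SDK.

```python
# Conceptual sketch: sensor initialization with option flags,
# loosely modelled on how an application selects which streams
# to enable. Flag names and values are hypothetical.

USES_COLOR = 0x1
USES_DEPTH = 0x2
USES_SKELETON = 0x4

class MockSensor:
    def __init__(self):
        self.enabled = 0

    def initialize(self, flags):
        # The sensor-control APIs take charge of initialization,
        # turning on only the requested data streams.
        self.enabled = flags

    def is_enabled(self, flag):
        return bool(self.enabled & flag)

sensor = MockSensor()
sensor.initialize(USES_COLOR | USES_SKELETON)
print(sensor.is_enabled(USES_COLOR))  # → True (requested)
print(sensor.is_enabled(USES_DEPTH))  # → False (not requested)
```

Combining flags with a bitwise OR is a common pattern in native sensor APIs: one initialization call declares every capability the application intends to use.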
On the other hand, the audio APIs control the Kinect microphone array and help to capture the audio stream from the sensor, control the sound source, enable speech recognition, and so on. The following diagram shows a top-level API classification based on the type of work each API performs:

We can also describe the SDK APIs as Natural User Interface (NUI) APIs, which retrieve the data from the depth sensor and color camera and capture the audio data stream. Several APIs are written on top of the NUI APIs, such as those that retrieve sensor information by reading sensor details and those that track human skeletons based on the depth data stream returned from the sensor.
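This layering, a higher-level API built on top of a lower-level NUI-style call, can be sketched as follows. Both functions are hypothetical stand-ins invented for illustration; the real skeleton-tracking pipeline is far more involved than picking a nearest point.

```python
# Conceptual sketch of API layering (NOT real SDK calls):
# a higher-level helper built on top of a lower-level,
# NUI-style data-retrieval function.

def nui_get_depth_frame():
    """Low-level stand-in: returns raw depth data from the sensor.
    Values here are hard-coded depth readings in millimetres."""
    return [500, 510, 4000, 4010]

def track_nearest_point(depth_frame):
    """Higher-level stand-in layered on the depth call, analogous
    to skeleton tracking being built on the depth data stream."""
    return min(depth_frame)

frame = nui_get_depth_frame()
print(track_nearest_point(frame))  # → 500
```

The design point is that the higher layer never touches the sensor itself; it consumes whatever the NUI layer returns, which is why such APIs can be written purely on top of the NUI APIs.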