What is a voice chip? What does it do?

What is a voice chip?
Definition: a voice chip converts a voice signal into digital data by sampling, stores the data in the IC's ROM, and then uses on-chip circuitry to convert the stored data back into a voice signal.

Voice chips fall into two categories by output method: PWM output and DAC output. With PWM output, the volume is not continuously adjustable and the chip cannot be connected to a standard amplifier; most voice chips currently on the market use PWM output. The other type uses DAC output with internal EQ; these chips offer continuously adjustable, digitally controlled volume and can be connected to an external amplifier.
In a standard voice chip, playback is essentially the DAC process, while the ADC process (voice signal sampling, compression, and EQ) is performed on a computer.
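
As a rough illustration of the PWM output path, an unsigned sample can be mapped to a duty cycle, with the average output level following the sample value. This is only a sketch: the 8-bit sample width and the name `pwm_duty` are assumptions made for illustration, not details of any particular chip.

```python
def pwm_duty(sample: int, bits: int = 8) -> float:
    """Map an unsigned PCM sample to a PWM duty cycle between 0.0 and 1.0."""
    full_scale = (1 << bits) - 1  # 255 for 8-bit samples (assumed width)
    if not 0 <= sample <= full_scale:
        raise ValueError("sample out of range")
    return sample / full_scale


if __name__ == "__main__":
    # Minimum, mid-scale, and maximum sample values.
    for s in (0x00, 0x80, 0xFF):
        print(f"{s:#04x} -> duty {pwm_duty(s):.3f}")
```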

A recording chip includes both the ADC and DAC processes, and the chip itself performs both: voice data acquisition, analysis, compression, storage, and playback.
ADC = Analog-to-Digital Converter
DAC = Digital-to-Analog Converter

Sound quality depends on the number of bits in the ADC and DAC. For example: 20 seconds to 340 seconds, the lowest is from 10 seconds to 340 seconds.

As the name suggests, a voice chip is a chip related to voice. Here, voice means electronically stored sound. Any chip that can produce sound is a voice chip, commonly known as a sound chip; more accurately, it should be called a Voice IC. Within the voice chip family, chips can be divided into two types according to the kind of sound: Speech ICs and Music ICs. This is the professional way of distinguishing voice chips.

2. Quantitative representation of voice signals: (Classification: Voice chip and Music chip)
(a) Introduction to "Voice chip":
(1) Quantization of voice signals
Sampling rate (f), bit number (n), baud rate (T)
Sampling: converting the analog voice signal into a digital signal.
Sampling rate: the number of samples taken per second. With 8-bit samples, this equals the number of bytes per second.
Baud rate: the number of bits sampled per second, more precisely called the bit rate and measured in bps (bits per second). It equals the sampling rate multiplied by the number of bits per sample (T = f × n) and directly determines the sound quality.
Number of sampling bits: the number of binary bits used for each sample. Unless otherwise specified, sound is sampled at 8 bits, with values from 00H to FFH and 80H representing silence.
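
The relationships above can be sketched in a few lines of Python. The function names and the input range of -1.0 to 1.0 are assumptions chosen for illustration; only the 8-bit range 00H to FFH, the 80H silence level, and the product of sampling rate and bit number come from the text.

```python
import math


def bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bits per second: sampling rate times bits per sample (T = f * n)."""
    return sample_rate_hz * bits_per_sample


def storage_bytes(sample_rate_hz: int, bits_per_sample: int, seconds: float) -> int:
    """Bytes needed to store uncompressed audio of the given length."""
    return math.ceil(bit_rate(sample_rate_hz, bits_per_sample) * seconds / 8)


def quantize_u8(amplitude: float) -> int:
    """Map an amplitude in [-1.0, 1.0] to an unsigned 8-bit code (0x80 = silence)."""
    clipped = max(-1.0, min(1.0, amplitude))
    return min(0xFF, round(clipped * 128) + 0x80)


if __name__ == "__main__":
    print(bit_rate(8000, 8))           # bit rate of 8 kHz, 8-bit audio
    print(storage_bytes(8000, 8, 10))  # bytes for 10 seconds, uncompressed
    print(hex(quantize_u8(0.0)))       # code for silence
```

Storage grows linearly with both the sampling rate and the bit number, which is why the two are chosen together against the available ROM.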

(2) Sampling rate
Nyquist sampling theorem: To restore the original signal from its samples without distortion, the sampling frequency must be greater than twice the highest frequency in the signal. When the sampling frequency is less than twice the highest frequency in the spectrum, the spectrum aliases; when it is greater, no aliasing occurs.
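
To make the aliasing condition concrete, the sketch below computes where a pure tone appears after sampling. The helper names `satisfies_nyquist` and `alias_frequency` are hypothetical, and the model assumes an ideal sampler with no anti-aliasing filter in front of it.

```python
def satisfies_nyquist(f_max_hz: float, f_sample_hz: float) -> bool:
    """True when the sampling frequency exceeds twice the highest signal frequency."""
    return f_sample_hz > 2 * f_max_hz


def alias_frequency(f_signal_hz: float, f_sample_hz: float) -> float:
    """Apparent frequency of a pure tone after ideal sampling at f_sample_hz."""
    folded = f_signal_hz % f_sample_hz
    # Frequencies above half the sampling rate fold back toward zero.
    return min(folded, f_sample_hz - folded)


if __name__ == "__main__":
    fs = 8000
    for tone in (3000, 5000, 9000):
        print(tone, satisfies_nyquist(tone, fs), alias_frequency(tone, fs))
```

With 8 kHz sampling, a 5 kHz tone becomes indistinguishable from a 3 kHz tone, so it must be filtered out before sampling.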

Audible sound spans roughly 20 Hz to 20 kHz, while ordinary speech occupies about 3 kHz or less. This is why CD audio uses 44.1 kHz sampling at 16 bits: the rate is slightly more than twice 20 kHz. For more demanding material, such as musical instruments, 48 kHz at 24 bits can also be used, but it is not mainstream.

For ordinary voice ICs, a sampling rate of up to 16 kHz is generally sufficient. For speech, 8 kHz (telephone quality) or around 6 kHz is typical; below 6 kHz the result is relatively poor. On a microcontroller, a higher sampling rate means a faster timer interrupt rate, which can interfere with monitoring and detecting other signals, so the trade-off must be weighed carefully.
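
The cost of a higher sampling rate on a microcontroller can be estimated by how little time is left between timer interrupts. The sketch below assumes one interrupt per sample and a hypothetical 8 MHz CPU clock; both are illustrative assumptions, not figures from a specific part.

```python
def sample_period_us(sample_rate_hz: int) -> float:
    """Time between consecutive samples (timer interrupts), in microseconds."""
    return 1_000_000 / sample_rate_hz


def cycles_per_sample(cpu_clock_hz: int, sample_rate_hz: int) -> int:
    """CPU clock cycles available between two sample interrupts."""
    return cpu_clock_hz // sample_rate_hz


if __name__ == "__main__":
    CPU_CLOCK_HZ = 8_000_000  # hypothetical microcontroller clock
    for rate in (6_000, 8_000, 16_000):
        print(rate, sample_period_us(rate), cycles_per_sample(CPU_CLOCK_HZ, rate))
```

Doubling the sampling rate halves the cycle budget per sample, and everything else the firmware does (key scanning, signal detection) has to fit in what remains.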

What is the Function of a Voice Chip?
A voice chip primarily adds voice announcement capability to a product; in essence, it plays back sound.
Common applications include voice prompts for small-appliance buttons, safety alarms, and truck announcements such as "Left Turn, Caution!" Voice chips are widely used.

There are two main types of voice chips:
1. The first type is the OTP (one-time programmable) voice chip, mostly supplied in an SOP8 package. It has a few commonly used voice prompts built in and is programmed at the factory, so its content cannot be changed afterwards. The content is either written with a dedicated programmer or patterned during wafer fabrication (mask programming). This type typically requires large order quantities, offers limited flexibility, and delivers only average playback quality.

2. The second type is the MP3 voice chip. It supports MP3 decoding, a significant step forward from OTP voice chips. Over its USB interface, the chip presents its flash storage as a virtual USB flash drive, which makes updating the audio very convenient: simply copy the audio file onto the chip as you would onto a USB flash drive. It can also decode and play audio from an external TF card, SD card, or USB flash drive, delivering detailed, high-quality sound.

Voice Chip Development Trends
Trend 1: Customization, Low Power Consumption, High Performance, and End-to-End Intelligence
With their customization, low power consumption, high performance, end-to-end intelligence, and cost advantages, voice chips are poised to take a significant market share and become the bridge between people and the cloud. Driven by companies such as Amazon, Alibaba, and Xiaomi, smart speaker sales have grown explosively worldwide, and voice chip shipments are expected to surge as well. Voice chips have developed through three stages: general-purpose combination chips, voice chips, and voice AI chips. In the initial general-purpose stage, long R&D cycles and high R&D costs meant that no products built on dedicated voice chips emerged before the market reached significant scale. In addition, in the early days of voice interaction, smart devices had yet to sell in significant volume. Even when practitioners saw the opportunity, the initial investment needed to develop a mature voice chip made large-scale adoption in these early smart devices impractical, so they relied on other chips as a transitional step.

Trend 2: AI Voice Chips Poised for Growth
With the launch of Huawei's Kirin 970 and Apple's A11, AI chips became a hot topic in the industry. AI chips, also known as AI accelerators or computing cards, are modules designed specifically to handle the bulk of the computation in AI applications (non-computational tasks remain on the CPU), which enables on-device intelligence.

Currently, in smart speakers and other smart devices alike, most of the intelligence runs in the cloud. However, cloud processing adds latency to voice interaction, the need for a network connection limits where devices can be used, and sending data off the device creates data and privacy risks. To lift these restrictions on usage scenarios and provide a better user experience, edge intelligence has become a trend, and voice AI chips have emerged along with it.