Speaker recognition is the identification of a person from characteristics of their voice (voice biometrics). It is also called voice recognition. There is a difference between speaker recognition (recognizing who is speaking) and speech recognition (recognizing what is being said). These two terms are frequently confused, and "voice recognition" can be used for both. In addition, there is a difference between the act of authentication (commonly referred to as speaker verification or speaker authentication) and identification. Finally, there is a difference between speaker recognition (recognizing who is speaking) and speaker diarisation (recognizing when the same speaker is speaking). Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice, or it can be used to authenticate or verify the identity of a speaker as part of a security process.
Speaker recognition has a history dating back some four decades and uses the acoustic features of speech that have been found to differ between individuals. These acoustic patterns reflect both anatomy (e.g., the size and shape of the throat and mouth) and learned behavioral patterns (e.g., voice pitch, speaking style). Speaker verification has earned speaker recognition its classification as a "behavioral biometric".
Verification versus identification
There are two major applications of speaker recognition technologies and methodologies. If the speaker claims to be of a certain identity and the voice is used to verify this claim, this is called verification or authentication. Identification, on the other hand, is the task of determining an unknown speaker's identity. In a sense, speaker verification is a 1:1 match, where one speaker's voice is matched to one template (also called a "voice print" or "voice model"), whereas speaker identification is a 1:N match, where the voice is compared against N templates.
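The difference between the two matching modes can be sketched as follows. This is a minimal illustration that assumes each voice print has already been reduced to a fixed-length feature vector and that cosine similarity with a threshold of 0.8 serves as the scoring rule. The function names, vectors, speaker names and threshold are all hypothetical and do not come from any particular system.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def verify(utterance, claimed_template, threshold=0.8):
    """1:1 match: accept or reject a single identity claim (hypothetical threshold)."""
    return cosine(utterance, claimed_template) >= threshold

def identify(utterance, templates):
    """1:N match: return the enrolled speaker whose template scores highest."""
    return max(templates, key=lambda name: cosine(utterance, templates[name]))

# Hypothetical enrolled templates and an incoming utterance.
templates = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
probe = [0.9, 0.1, 0.2]
print(verify(probe, templates["alice"]))  # True
print(identify(probe, templates))          # alice
```

Verification needs only the claimed template. Identification must score every one of the N templates, and its answer is a best match rather than a yes/no decision.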
From a security perspective, identification is different from verification. For example, presenting your passport at border control is a verification process: the agent compares your face to the picture in the document. Conversely, a police officer comparing a sketch of an assailant against a database of previously documented criminals to find the closest match is an identification process.
Speaker verification is usually employed as a "gatekeeper" to provide access to a secure system (e.g., telephone banking). These systems operate with the user's knowledge and typically require their cooperation. Speaker identification systems can also be implemented covertly, without the user's knowledge, to identify the speakers in a discussion, alert automated systems to speaker changes, check whether a user is already enrolled in a system, and so on.
In forensic applications, it is common to first perform a speaker identification process to create a list of "best matches" and then perform a series of verification processes to determine a conclusive match.
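The two-stage forensic workflow can be sketched like this. First, an identification pass ranks every enrolled template and keeps a shortlist. Then a verification pass tests each shortlisted candidate against a stricter threshold. The sketch assumes fixed-length feature vectors scored by cosine similarity. The shortlist size `k`, the 0.9 threshold and the template data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def shortlist(probe, templates, k):
    """Identification step: rank all N templates and keep the k best matches."""
    ranked = sorted(templates, key=lambda n: cosine(probe, templates[n]), reverse=True)
    return ranked[:k]

def forensic_match(probe, templates, k=3, threshold=0.9):
    """Verification step: keep only shortlisted candidates that pass a strict threshold."""
    return [n for n in shortlist(probe, templates, k)
            if cosine(probe, templates[n]) >= threshold]

# Hypothetical database of documented speakers and a questioned recording.
templates = {"s1": [1.0, 0.0], "s2": [0.8, 0.6], "s3": [0.0, 1.0], "s4": [-1.0, 0.0]}
questioned = [1.0, 0.0]
print(shortlist(questioned, templates, 2))                    # ['s1', 's2']
print(forensic_match(questioned, templates, k=2, threshold=0.9))  # ['s1']
```

The shortlist keeps the expensive verification stage small. In practice, a forensic expert would examine each surviving candidate rather than trust a single score.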
Variants of speaker recognition
Each speaker recognition system has two phases: enrollment and verification. During enrollment, the speaker's voice is recorded and typically a number of features are extracted to form a voice print, template, or model. In the verification phase, a speech sample or "utterance" is compared against a previously created voice print. For identification systems, the utterance is compared against multiple voice prints to determine the best match, whereas verification systems compare an utterance against a single voice print. Because verification involves only one comparison rather than N, it is faster than identification.
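The two phases can be illustrated with a minimal sketch. It assumes feature extraction has already turned each enrollment recording into a fixed-length vector. Enrollment then averages those vectors into one length-normalized voice print, and verification scores a new utterance against it by cosine similarity. Averaging is only one simple way to build a template and is an assumption here. The helper names and data are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def enroll(samples):
    """Enrollment phase: average several feature vectors into one voice print."""
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    return normalize(mean)

def score(utterance, voice_print):
    """Verification phase: cosine similarity of a new utterance to the stored print."""
    u = normalize(utterance)
    return sum(a * b for a, b in zip(u, voice_print))

# Hypothetical feature vectors from two enrollment recordings.
voice_print = enroll([[1.0, 0.5], [3.0, -0.5]])
print(voice_print)                     # [1.0, 0.0]
print(score([3.0, 4.0], voice_print))  # about 0.6
```

Only the compact voice print needs to be stored after enrollment, not the original recordings.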
Speaker recognition systems fall into two categories: text-dependent and text-independent.
Text-Dependent:
If the text must be the same for enrollment and verification, this is called text-dependent recognition. In a text-dependent system, prompts can either be common across all speakers (e.g., a common passphrase) or unique. In addition, shared secrets (e.g., passwords and PINs) or knowledge-based information can be used to create multi-factor authentication scenarios.
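A text-dependent check can be combined with a shared secret into a multi-factor decision, as in this minimal sketch. It assumes that a speech recognizer has already produced `transcript` and that a speaker model has already produced `voice_score`. Neither component is shown. The function name, the 0.8 threshold and the PIN handling are hypothetical.

```python
import hmac

def text_dependent_accept(transcript, expected_phrase, voice_score,
                          pin_entered, pin_on_file, threshold=0.8):
    """Accept only if the passphrase, the voice and the PIN all check out."""
    # Text-dependent: the spoken words must match the enrolled passphrase.
    phrase_ok = transcript.strip().lower() == expected_phrase.strip().lower()
    # Voice: the (externally computed) speaker score must clear the threshold.
    voice_ok = voice_score >= threshold
    # Shared secret: constant-time comparison avoids leaking timing information.
    pin_ok = hmac.compare_digest(pin_entered, pin_on_file)
    return phrase_ok and voice_ok and pin_ok

print(text_dependent_accept("My voice is my password", "my voice is my password",
                            0.91, "1234", "1234"))  # True
```

Requiring all three factors means that a recording of the passphrase alone, or a stolen PIN alone, is not enough.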
Text-Independent:
Text-independent systems are most often used for speaker identification, as they require very little, if any, cooperation from the speaker. In this case, the text during enrollment and test is different. In fact, enrollment may happen without the user's knowledge, as is the case for many forensic applications. Because text-independent technologies do not compare what was said at enrollment and verification, verification applications tend to also use speech recognition to determine what the user is saying at the point of authentication.
In text-independent systems, both acoustic and speech analysis techniques are used.
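Speech recognition can be paired with a text-independent speaker model at the point of authentication, as in this sketch. Prompting the caller with a fresh random digit string is an assumption here and is only one possible design: it means a pre-recorded sample is unlikely to contain the requested words. The transcript is assumed to come from an external speech recognizer, and `voice_score` from an external speaker model. The function names and the 0.8 threshold are hypothetical.

```python
import secrets

def make_prompt(n_digits=6):
    """Generate a random digit string for the caller to read aloud."""
    return " ".join(str(secrets.randbelow(10)) for _ in range(n_digits))

def text_independent_accept(prompt, transcript, voice_score, threshold=0.8):
    """Accept only if the recognized words match the prompt AND the voice matches."""
    said_ok = transcript.split() == prompt.split()  # what was said (speech recognition)
    voice_ok = voice_score >= threshold             # who said it (speaker model)
    return said_ok and voice_ok

prompt = make_prompt()
print(prompt)  # e.g. a string such as "4 0 7 1 9 3"
```

The speaker model itself does not care which digits were spoken. The transcript check is what ties the utterance to this particular authentication attempt.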
Technology
Speaker recognition is a pattern recognition problem. The various technologies used to process and store voice prints include frequency estimation, hidden Markov models, Gaussian mixture models, pattern matching algorithms, neural networks, matrix representation, vector quantization, and decision trees. Some systems also use "anti-speaker" techniques, such as cohort models and world models. Spectral features are predominantly used to represent speaker characteristics.
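A Gaussian mixture model can be combined with a world model (an "anti-speaker" model) by scoring an utterance as the average per-frame log-likelihood ratio between the two. This is a minimal sketch of that idea, sometimes called GMM-UBM scoring. It uses diagonal covariances and hand-picked parameters. Training the mixtures (typically by expectation-maximization) and extracting spectral feature frames are not shown, and the model parameters below are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def log_gmm(x, gmm):
    """Log likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    # Log-sum-exp keeps the mixture sum numerically stable.
    return top + math.log(sum(math.exp(t - top) for t in terms))

def llr_score(frames, speaker_gmm, world_gmm):
    """Average per-frame log-likelihood ratio: speaker model vs. world model."""
    return sum(log_gmm(f, speaker_gmm) - log_gmm(f, world_gmm)
               for f in frames) / len(frames)

# Hypothetical models over 2-dimensional feature frames.
speaker_gmm = [(1.0, [1.0, 1.0], [0.5, 0.5])]
world_gmm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
genuine = [[1.0, 1.1], [0.9, 1.0]]
impostor = [[3.0, 3.0]]
print(llr_score(genuine, speaker_gmm, world_gmm) > 0)   # True
print(llr_score(impostor, speaker_gmm, world_gmm) > 0)  # False
```

Normalizing against the world model means the score measures how much better the claimed speaker explains the audio than speakers in general do, rather than how likely the audio is in absolute terms.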
Ambient noise levels can impede the collection of both the initial and subsequent voice samples. Noise reduction algorithms can be used to improve accuracy, but incorrect application can have the opposite effect. Performance degradation can result from changes in behavioral attributes of the voice and from enrolling on one telephone and verifying on another ("cross-channel"). Integration with two-factor authentication products is expected to increase. Voice changes due to aging may affect system performance over time. Some systems adapt the speaker models after each successful verification to capture such long-term changes in the voice, though there is debate about the overall security impact of automated adaptation.
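Adapting a model after successful verification can be sketched as a gated exponential moving average. The stored template moves a small step toward each utterance that passed verification and is left unchanged otherwise. The threshold and adaptation rate are hypothetical, and real systems may adapt model parameters quite differently.

```python
def adapt_template(template, utterance, score, threshold=0.8, rate=0.1):
    """Blend a successfully verified utterance into the stored template.

    Rejected attempts (score below the hypothetical threshold) never change
    the model; accepted ones move it a fraction `rate` toward the utterance.
    """
    if score < threshold:
        return template
    return [(1 - rate) * t + rate * u for t, u in zip(template, utterance)]

print(adapt_template([1.0, 0.0], [0.0, 1.0], score=0.5))  # [1.0, 0.0]
print(adapt_template([1.0, 0.0], [0.0, 1.0], score=0.9))  # approximately [0.9, 0.1]
```

This sketch also shows why adaptation is debated. An impostor who is accepted even once pulls the template toward their own voice, and a series of such acceptances could gradually drift it further.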
Capture of the biometric is seen as non-invasive. The technology traditionally uses existing microphones and voice transmission technology, allowing recognition over long distances via ordinary telephones (wired or wireless).
Voice identification from digitally recorded audio and from analog recordings uses electronic measurements as well as critical listening skills, which must be applied by a forensic expert for the identification to be accurate.
Applications
The first international patent was filed in 1983. It came out of telecommunications research at CSELT (Italy) by Michele Cavazza and Alberto Ciaramella and was intended as a basis both for future telecommunication services to end customers and for improving noise-reduction techniques across the network.
In May 2013 it was announced that Barclays Wealth was to use passive speaker recognition to verify the identity of telephone customers within 30 seconds of normal conversation. The system had been developed by the voice recognition company Nuance (which in 2011 acquired Loquendo, CSELT's own spin-off for speech technology), the company behind Apple's Siri technology. A verified voiceprint was to be used to identify callers to the system, and the system was to be rolled out across the company in the future.
The private banking division of Barclays was the first financial services firm to deploy voice biometrics as the primary means of authenticating customers to its call centers. 93% of customer users rated the system at "9 out of 10" for speed, ease of use, and security.
Since then, Nuance Voice Biometrics solutions have been deployed by several financial institutions, including Banco Santander, Royal Bank of Canada, Tangerine Bank, and Manulife.
In August 2014 GoVivace Inc. deployed a speaker identification system that allowed its telecom industry clients to positively identify an individual among millions of speakers using just one sample of that person's recorded voice.
Speaker recognition may also be used in criminal investigations, such as those into the 2014 executions of, among others, James Foley and Steven Sotloff.
In February 2016 the UK high-street bank HSBC and its internet-based retail bank First Direct announced that they would offer 15 million customers biometric banking software to access online and phone accounts using their fingerprint or voice.
See also
- AI effect
- Applications of artificial intelligence
- Speaker diarisation
- Speech recognition
- Voice changer
Lists
- List of emerging technologies
- Outline of artificial intelligence
External links
- Circumventing Voice Authentication: the PLA Radio podcast featured a simple way to fool rudimentary voice authentication systems.
- Speaker recognition - Scholarpedia
- The benefits of voice recognition and challenges in access control
Software
- bob.bio.spear
- ALIZE
Source of the article : Wikipedia