Solutions to Empower Analysts
The conventional cycle for gaining intelligence from speech or text entails (a) language experts listening to hundreds of intercepts, (b) language identification, (c) assessing intelligence value through manual, labor-intensive transcription and translation, and (d) intelligence collection and collation. The cycle is inefficient, expensive, and error-prone, and its inherent delays may render the gathered intelligence of little or no value.
A structured approach to intelligence gathering would be:
- Preliminary automated investigation: analyze target audio for its worth through filtering and supporting technologies. (Sound Engineering)
- Initial analysis of worthy audio to arrive at audio of interest. (Voice Biometrics)
- Detailed content analysis to pinpoint sections of interest within shortlisted audio. (Speech Analytics)
- Investigation into multiple audio recordings for comprehensive intelligence mapping and to aid government agencies in investigations. (Speech Forensics)
- Speech to text for archives, records, and circulation. (Transcription)
- Translation of text into the desired language for uniformity, dissemination, archiving, and further analytics. (Text Translation)
Voice Filtering and Support
Language-, accent-, and channel-independent solutions that filter out 40 percent of poor-quality audio.
Speech Quality Estimation
Measures the quality of speech.
Voice Activity Detection
Detects the parts of the audio that contain voice.
Speaker Diarization
Separates multiple speakers in an audio recording automatically.
Speaker Age Estimation
Estimates the speaker's age group.
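To illustrate what Voice Activity Detection produces, the following is a minimal sketch of an energy-based detector. It is not the product's method, which is not described here. The function names, the 20 ms frame length, and the 0.1 energy threshold are all assumptions chosen for illustration, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math

def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's RMS energy."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return energies

def detect_voice(samples, sample_rate=8000, frame_ms=20, threshold=0.1):
    """Return (start_sec, end_sec) spans whose RMS energy exceeds the threshold.

    threshold and frame_ms are illustrative assumptions, not product settings.
    """
    frame_len = sample_rate * frame_ms // 1000
    energies = frame_energies(samples, frame_len)
    spans, span_start = [], None
    for i, energy in enumerate(energies):
        t = i * frame_ms / 1000
        if energy > threshold and span_start is None:
            span_start = t                      # a voiced span begins
        elif energy <= threshold and span_start is not None:
            spans.append((span_start, t))       # the voiced span ends
            span_start = None
    if span_start is not None:                  # audio ended while still voiced
        spans.append((span_start, len(energies) * frame_ms / 1000))
    return spans

# Half a second of silence, half a second of a 200 Hz tone, half a second of silence.
silence = [0.0] * 4000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(4000)]
spans = detect_voice(silence + tone + silence)
```

Only the spans returned by such a detector would need to be passed on to later stages, which is how a filtering step can reduce the volume of audio an analyst has to review.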
Voice Biometrics
Language-, accent-, and channel-independent solutions that facilitate accurate identification and search functions.
Powered by state-of-the-art deep neural networks (DNN), the solutions work by comparing the unique characteristics of a human voice (a voiceprint).
Language Identification
- Automatically detect the spoken language and dialect.
- Filter the audio for further processing by language-dependent analytics technologies.
- Well over 70 pre-trained language models provided.
- Users can easily train the tool for any language of interest or improve and customize the supplied pre-trained models.
Gender Identification
- Pre-filter audio files by identifying the speaker's gender (male/female).
Speaker Identification
- Search for and recognize a speaker automatically based on the uniqueness of their voice.
- Recognition works by comparing voiceprints against a database of suspects.
- The suspect database can be built and improved dynamically.
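Voiceprint comparison against a database can be sketched as a nearest-match search over fixed-length vectors. The example below is a simplified illustration, not the product's algorithm: it assumes voiceprints are already available as numeric vectors, uses cosine similarity as the comparison score, and uses a hypothetical acceptance threshold of 0.8. The speaker labels and vector values are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify_speaker(probe, database, threshold=0.8):
    """Return (speaker_id, score) for the best match, or (None, score) if below threshold.

    The threshold is an illustrative assumption, not a product setting.
    """
    best_id, best_score = None, -1.0
    for speaker_id, voiceprint in database.items():
        score = cosine_similarity(probe, voiceprint)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score

# Hypothetical three-dimensional voiceprints; real ones have far more dimensions.
database = {
    "suspect_a": [0.9, 0.1, 0.2],
    "suspect_b": [0.1, 0.8, 0.5],
}
database["suspect_c"] = [0.2, 0.2, 0.9]   # the database can grow at any time

match, score = identify_speaker([0.85, 0.15, 0.25], database)
```

Because the database is a plain mapping from identifier to voiceprint, adding or replacing an entry takes effect on the next search, which mirrors the idea of a database that is built up dynamically.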
Voice Analytics
Cutting-edge, language-dependent solutions for advanced analysis.
Keyword Spotting
- Automatically detect specified keywords in speech.
- Discover related audio content.
- Powered by deep neural network and acoustic-based algorithms.
- Automatically generates pronunciations of each specified keyword.
- Pronunciations (phonemes) form the basis of the search.
- Provision to add pronunciation variants for each keyword or phrase.
- Over 20 languages supported.
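The idea of searching by pronunciation, with several variants per keyword, can be shown with a short sketch. It performs exact matching over a phoneme sequence, which is a deliberate simplification: a real keyword spotter would score acoustic likelihoods rather than require exact matches. The function name, the phoneme symbols, and the sample data are assumptions for illustration.

```python
def find_keyword(phonemes, variants):
    """Return sorted (start_index, variant) hits where any variant occurs in the stream."""
    hits = []
    for variant in variants:
        n = len(variant)
        for i in range(len(phonemes) - n + 1):
            if phonemes[i:i + n] == variant:
                hits.append((i, tuple(variant)))
    return sorted(hits)

# Two hypothetical pronunciation variants for the keyword "tomato".
tomato_variants = [
    ["t", "ah", "m", "ey", "t", "ow"],
    ["t", "ah", "m", "aa", "t", "ow"],
]

# Hypothetical phoneme recognizer output for one recording.
stream = ["ay", "l", "ay", "k", "t", "ah", "m", "aa", "t", "ow", "s"]

hits = find_keyword(stream, tomato_variants)
```

Adding a variant to the list widens the search without touching the audio again, since the search runs over phonemes rather than over the waveform.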
Phoneme Recognition
- Convert (transcribe) speech recordings and standard orthography into phoneme symbols for further use by the Keyword Spotting module.
- Correct possible mistakes to further improve the generated phonemes.
Time Analysis Extraction
- Applicable to two-channel recordings.
- Extract information about the conversation flow in an audio recording.
- Identify reaction times, cross talk, and speaker responses in the two channels.
Waveform Denoiser
- Automatically remove noise from audio and improve the audibility of speech for analysts.
- Focused on better audibility to the human ear.
- Trained on various kinds of noise.
Voice Forensics
Salient Features:
- Independent of language, accent, text, and channel.
- 1:1 speaker comparison.
- 1:N speaker identification for more complex cases.
- Diarization for ease of working with audio recordings containing multiple speakers.
- Search and visualization of the same phoneme sequences across audio files through a phoneme recognizer.
- Measures accuracy on the user's own data sets for evaluation purposes.
- Enables waveform editing with tools such as a spectrum panel and voice activity detection.
- Compatible with a wide range of audio sources: GSM/CDMA, 3G, VoIP, landlines, etc.
Input requirements:
- Signal formats: WAV or RAW (8- or 16-bit linear coding), with A-law, Mu-law, or PCM encoding, at a sampling rate of 8 kHz or higher
- Minimum speech signal duration for enrollment: 20 seconds
- Minimum speech signal duration for identification: 3 seconds
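The numeric limits above lend themselves to a simple pre-check before a recording is submitted. The sketch below is only an illustration of such a check: the function name, its parameters, and the messages are hypothetical, and the speech duration is assumed to have been measured already.

```python
MIN_SAMPLE_RATE_HZ = 8000
ALLOWED_BIT_DEPTHS = (8, 16)
MIN_SPEECH_SECONDS = {"enrollment": 20.0, "identification": 3.0}

def check_input(sample_rate_hz, bits_per_sample, speech_seconds, purpose):
    """Return a list of problems; an empty list means the recording passes.

    purpose must be "enrollment" or "identification".
    """
    problems = []
    if sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        problems.append(
            f"sampling rate {sample_rate_hz} Hz is below {MIN_SAMPLE_RATE_HZ} Hz"
        )
    if bits_per_sample not in ALLOWED_BIT_DEPTHS:
        problems.append(f"{bits_per_sample}-bit samples are not supported")
    required = MIN_SPEECH_SECONDS[purpose]
    if speech_seconds < required:
        problems.append(
            f"{speech_seconds} s of speech is less than the {required} s "
            f"needed for {purpose}"
        )
    return problems

# A 5-second, 8 kHz, 16-bit recording is long enough to identify a speaker
# but too short to enroll one.
ok_for_identification = check_input(8000, 16, 5.0, "identification")
too_short_for_enrollment = check_input(8000, 16, 5.0, "enrollment")
```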
Outputs:
- Scoring as a likelihood ratio (LR) or log-likelihood ratio (LLR), with a verbal presentation of results
- Graphic presentation of the likelihood ratio (LR)
- Detailed report output (an automatically generated expert opinion template) for presenting results to a court or an investigation team
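The relationship between LR and LLR can be made concrete with a small sketch. The LR compares how probable the evidence is if both recordings come from the same speaker with how probable it is if they come from different speakers, and the LLR is its logarithm. The sketch assumes the natural logarithm (some reports use base 10 instead), and the verbal wording is a hypothetical illustration, not the product's reporting scale.

```python
import math

def log_likelihood_ratio(p_same, p_different):
    """LLR = ln( P(evidence | same speaker) / P(evidence | different speakers) )."""
    return math.log(p_same / p_different)

def verbal_summary(llr):
    """Direction of support only; the wording is illustrative, not a standard scale."""
    if llr > 0:
        return "evidence supports the same-speaker hypothesis"
    if llr < 0:
        return "evidence supports the different-speaker hypothesis"
    return "evidence is neutral"

# Hypothetical probabilities: the evidence is 40 times more likely
# under the same-speaker hypothesis, so LR = 40 and LLR = ln(40).
llr = log_likelihood_ratio(0.08, 0.002)
summary = verbal_summary(llr)
```

An LR of 1 (LLR of 0) means the evidence does not favor either hypothesis, which is why the sign of the LLR gives the direction of support.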
Speech Transcription
Dissemination/Archive in print:
- Convert speech into plain text automatically.
- Search for the topic of interest instantly.
- Quickly annotate the speech content of call recordings by combining Language Identification and Speech to Text technologies.
- The language of a recording is detected automatically, and the speech is transcribed accordingly.
- Annotations with a confidence level for each word are generated.
- Words with a low confidence level are highlighted so that an operator can correct them manually.
- If a natural language processing (NLP) layer is implemented, a proposed synopsis is also created. The synopsis and the generated annotations are then either sent directly to the operator responsible for that language or first run through an offline translation layer.
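The per-word confidence levels described above can drive the highlighting directly. The sketch below is a minimal illustration, assuming the transcript arrives as (word, confidence) pairs with confidence between 0 and 1. The 0.6 cutoff and the double-bracket marking are assumptions for illustration, not the product's behavior.

```python
def highlight_low_confidence(words, threshold=0.6):
    """Join words into one line, wrapping any word below the threshold in [[ ]]."""
    return " ".join(
        f"[[{text}]]" if confidence < threshold else text
        for text, confidence in words
    )

# Hypothetical transcript output: (word, confidence) pairs.
transcript = [("meet", 0.95), ("at", 0.90), ("harbor", 0.41), ("tonight", 0.88)]

marked = highlight_low_confidence(transcript)
```

An operator reviewing the marked line only needs to listen to the audio around the bracketed words rather than the whole recording.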
Pre-processing of audio for translation or other text-domain analytics of audio intercepts:
- Frees users from dependence on cloud-based solutions, making transcription cost-effective while ensuring user data security.
- Over 90 languages supported, with high-quality punctuation.
- Faster response due to minimal data exchange latency.
- Caters to varying operational needs through scalable deployment on a local PC or on the organization's intranet or extranet.
- Easily integrates with user applications or services.
Text Translation
- Cost vis-à-vis accuracy requirements.
- Cost vis-à-vis security expectations, with flexible deployment as an on-premise (behind the organization's firewall) or cloud-based solution.
- Scalability from a PC-based solution to a high-end on-premise machine translation server, which may be configured and sized according to the number of users, machines, volume of data, etc.
- Ease of integration into any Information Retrieval (IR) or communication system to facilitate multilingual information retrieval and Document Exploitation (DOCEX).
- Ease of integration with customers' OSINT and COMINT platforms using standard APIs and partner connectors.
- Additional value-added features such as:
  - Translation of:
    - Manual text input.
    - Text and video files.
    - Websites.
  - Integration with:
    - Automatic Speech Recognition (ASR) systems to translate voice data.
    - Optical Character Recognition (OCR) systems for image data translation.
  - Language identification.
  - Domain and topic classification.
  - Named-entity recognition.