LEXISTEMS SensibleSpeech: AI-based speech recognition

The speech recognition solution that understands people and data.

Welcome to a new generation of speech recognition.

By voice-enabling everyone, on virtually anything -- web & mobile apps, devices, vehicles, industrial systems -- SensibleSpeech augments products, empowers users and delivers immediate ROI.

As a software API or server appliance, SensibleSpeech brings plug & play voice to any application and runs on any infrastructure, including exclusively on your own.

As middleware, SensibleSpeech brings voice to any device or piece of equipment, freeing operators' hands and movements. It works with or *without* network connection, making it ideal for military and medical requirements.

Because SensibleSpeech is GAFAM-free, it can be trained on specific business vocabularies, keeps production costs under control, secures data end-to-end and complies with GDPR and all other privacy regulations.

The time for voice is now.

Discover why adopting SensibleSpeech is the most sensible decision you can make today...

SensibleSpeech's critical features

Functional
- Voice-enables any application or device
- Works with or without network connection
- Available with generic or industry-specific models (e.g. Medical)
- 100% adaptable to corporate or industry vocabularies
- Live / offline / batch transcriptions by drag & drop
- Downloadable audio and text transcripts, from live audio or media files
- Directly pluggable to vertical-specific NLUs, including LEXISTEMS'
- 100% customizable in the functional and lexical areas.
Technical
- Available as a software, middleware or hardware solution (device or API server appliance)
- Available in 3 modes: remote API, partially or fully embedded (cloud / edge)
- API mode installable on any infrastructure, including 100% on-premises
- Embedded mode compatible with the most constrained hardware platforms and equipment
- Comes with optimized, fully customizable frontend user interfaces
- Ethical architecture.
Eco-responsibility
- Eco-responsible from design to production use
Availability and support
- Available and supported worldwide through the SENSIBLE Alliance.
GAFAM-free and therefore
- ...sensibly cost-effective via subscription or one-off pricing
- ...fully secured and compliant with all Data and Privacy regulations, end to end.

SensibleSpeech solves vocal pain points

Everyone wants voice in their apps and devices.

SensibleSpeech delivers it today, all inclusive, for a fraction of the cost.

Time

Problem: You don't have time to add voice to your products.

Solution: SensibleSpeech comes ready to run. For web or mobile apps, it typically takes just a few additional, non-disruptive lines of UI code. For devices, standard middlewares cut integration times to a matter of days.

Costs

Problem: Existing vocal solutions cost a fortune to run in production.

Solution: Affordability is the second pillar of SensibleSpeech's ROI. Sensible pricing plans, whatever the number of supported languages. No hidden costs.

Security

Problem: Your data, apps and/or devices are sensitive.

Solution: In API or server appliance mode, SensibleSpeech can be deployed 100% on premises or on your own infrastructure. Secured protocols protect network traffic end-to-end, with complete shielding from network access. In device or middleware mode, SensibleSpeech can work *without* network connection.

Business specificity

Problem: Your organization and/or industry use specific vocabularies.

Solution: Contrary to the competition, SensibleSpeech can be trained on the most specific vocabularies, regardless of languages. Business specificity comes in addition to general-purpose models.

Custom processing

Problem: You need additional processing along with voice control.

Solution: Like all LEXISTEMS solutions, SensibleSpeech can be augmented with NLUs (including LEXISTEMS' meaning-based ones), custom plugins or black boxes. Name it, we'll connect it.

More solved pain points...

Multilingualism

Problem: Your users speak different native languages.

Solution: SensibleSpeech supports several simultaneous languages and can be trained on any number of new languages. Production costs are *not* multiplied by the number of supported languages.

Scalability

Problem: All your users and customers want it.

Solution: As an API, server appliance, device or middleware, SensibleSpeech scales horizontally to as many users as desired.

Compliance

Problem: GDPR, DPA, CCPA... Your organization is legally bound and compliance is your professional responsibility.

Solution: SensibleSpeech complies by design with privacy and other business-specific data regulations, current and future. Nothing to worry about on this front either.

Examples of SensibleSpeech use cases

Prudential - Business-oriented record / dictate

Problems solved - This project illustrates how live recording or dictation is one of the most demanding speech recognition applications. As a reminder, recording applies automatic punctuation; in dictation, punctuation is left to the speaker. In both cases, speech recognition must support the infinite variety of human expression, understand business-specific vocabularies and be as resilient as possible to conditions like accents, ambient noise, etc. Designed for recording or dictation in several languages, SensibleSpeech meets these requirements. It is available with general or vertical-specific models (e.g. Medical & life sciences, see the screen capture at the bottom of this page) and can be trained on custom / corporate specificities. With every SensibleSpeech application, audio recordings and text transcripts can be downloaded for legal and archiving purposes.

Enedis Cinke - Voice enabling an existing application

Problems solved - This project (already presented with LEXISTEMS SensibleSearch) is a good example of how SensibleSpeech easily augments existing applications. It took just a few lines of frontend code to enable vocal control. In this case, the application has two roles: first, it collects information on power incidents (or other customer requests) and assigns response teams based on location and internal planning. Then, it predicts different timings, including arrival on site and duration of the repair work given the specific incident or request conditions. By voice-enabling all input fields, the time to collect information and give customers an estimated repair time is cut in half. With SensibleSpeech, voice enablement works on all browsers and mobile terminals, giving employees consistent efficiency regardless of the application's context of use.
Note: this project combines SensibleSpeech and SensibleSearch.

Banorte - A smart bank officer assistant

Problems solved - In this ongoing project with Banorte, Mexico's second-largest bank, SensibleSpeech is used to listen to conversations between customers and branch officers. Whenever certain predefined topics arise in the conversation, SensibleSpeech records their time and occurrence, and automatically fetches relevant forms, brochures and other customer care documents in real time. The officer can therefore respond immediately to the customer's needs. This provides Banorte's employees with all necessary resources on a potentially infinite range of customer requests, promotes products or services based on seasonal sales strategies and helps Banorte optimize costs on training and internal communication. Conversations and commented transcripts are available for download and other legal requirements. They can be automatically converted into CRM records.
Note: this project combines SensibleSpeech and SensibleSearch.
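
As an illustration of the topic-triggered behavior described above, here is a minimal Python sketch. It is not the Banorte implementation: the topic catalogue, file names and function names are hypothetical, the transcript is assumed to arrive as timestamped text segments, and a naive phrase match stands in for whatever NLU the real system uses.

```python
# Hypothetical topic catalogue: trigger phrases and the documents to surface.
TOPICS = {
    "mortgage": {
        "phrases": ("mortgage", "home loan"),
        "documents": ("mortgage_brochure.pdf", "loan_application_form.pdf"),
    },
    "credit card": {
        "phrases": ("credit card",),
        "documents": ("card_offers.pdf",),
    },
}


def spot_topics(segments, topics=TOPICS):
    """Return {topic: [start times in seconds]} for timestamped transcript segments."""
    occurrences = {}
    for start_seconds, text in segments:
        lowered = text.lower()
        for topic, spec in topics.items():
            # Naive substring match (assumption); a real system would use an NLU.
            if any(phrase in lowered for phrase in spec["phrases"]):
                occurrences.setdefault(topic, []).append(start_seconds)
    return occurrences


def documents_for(occurrences, topics=TOPICS):
    """List the documents attached to every detected topic, in detection order."""
    return [doc for topic in occurrences for doc in topics[topic]["documents"]]


# Example transcript segments: (start time in seconds, recognized text).
segments = [
    (12.5, "I would like to ask about a home loan"),
    (40.0, "and maybe a new credit card"),
    (95.2, "back to the mortgage rates"),
]
hits = spot_topics(segments)
print(hits)
print(documents_for(hits))
```

Each topic maps to the list of times at which it came up, so the number of occurrences is simply the length of that list.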

Your SensibleSpeech ROI

Not all speech recognition solutions are created equal.

Here is why SensibleSpeech is the soundest choice ROI-wise:

Immediate

SensibleSpeech is ready to deploy, wherever and however you want. No complex setup. Time‑to‑voice is immediate.

Customizable

Applications? Devices? Cloud? Edge? SensibleSpeech adapts to your requirements and can be fully customized in the functional and lexical areas.

Cost-effective

SensibleSpeech's pricing options match your budget, with predictable pricing and no hidden costs.

LEXISTEMS SensibleSpeech: guaranteed ROI on your Speech-to-Text applications.

Elastic

SensibleSpeech scales along with your user base and product lineup. Adopt it once and expand with it.

Secure

In API or server appliance mode, SensibleSpeech runs entirely on your infrastructure through secured network protocols. In embedded mode, it is fully functional without network connection.

Compliant

SensibleSpeech enforces all Data and Privacy regulations, current and future.

Benchmarks

WER metrics^[1]

System	English (LibriSpeech)	English (Common Voice)	French (Common Voice)	French (MLS)	Spanish (Common Voice)	Spanish (MLS)
Amazon Transcribe	16.21%	9.55%	26.56%	25.48%	13.65%	11.96%
Google STT	13.36%	33.45%	37.07%	29.21%	16.25%	20.80%
IBM Watson	16.57%	35.94%	35.00%	52.00%	21.26%	25.50%
Microsoft Azure STT	11.57%	7.27%	12.76%	49.86%	7.60%	30.00%
SensibleSpeech	4.86%	11.34%	4.33%	5.82%	6.36%	7.40%

WER, or Word Error Rate, is a standard metric for comparing speech recognition systems. It measures the proportion of words in the reference transcript that were left out, misrecognized or wrongly inserted during audio recognition.

MER metrics^[1]

System	English (LibriSpeech)	English (Common Voice)	French (Common Voice)	French (MLS)	Spanish (Common Voice)	Spanish (MLS)
Amazon Transcribe	15.12%	8.57%	24.97%	24.35%	12.69%	11.32%
Google STT	12.87%	32.30%	36.40%	28.71%	15.76%	20.13%
IBM Watson	15.55%	32.77%	33.01%	51.20%	19.98%	24.83%
Microsoft Azure STT	11.29%	6.99%	12.28%	49.38%	7.30%	29.83%
SensibleSpeech	4.68%	11.29%	4.29%	5.79%	6.18%	7.20%

MER, or Match Error Rate, is the probability of a given match (between speech and recognition) being incorrect.

WIL metrics^[1]

System	English (LibriSpeech)	English (Common Voice)	French (Common Voice)	French (MLS)	Spanish (Common Voice)	Spanish (MLS)
Amazon Transcribe	23.71%	12.59%	34.59%	37.07%	19.03%	17.85%
Google STT	19.62%	39.09%	44.16%	40.28%	21.86%	28.20%
IBM Watson	24.41%	43.78%	45.13%	72.06%	26.43%	38.01%
Microsoft Azure STT	15.41%	10.51%	18.51%	55.44%	10.87%	33.70%
SensibleSpeech	7.75%	2.08%	7.28%	8.28%	10.20%	11.90%

WIL, or Word Information Lost, is a probabilistic approximation to the proportion of word information not being preserved during recognition.

Given the specificities of speech recognition, no single metric is enough in itself to define performance. While WER has become the de facto standard, real-world use shows that it is affected by ambient noise, accents, crosstalk and rare words. MER and WIL were introduced to compensate for these flaws. In this respect, WIL is a finer-grained measurement of recognition exactness.
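
For readers who want to see how the three metrics relate, the Python sketch below derives them from a single word-level alignment between a reference transcript and the recognized text. It is illustrative only and is not the LEXISTEMS benchmarking script: the class and function names are assumptions, no text normalization is applied, and it uses the commonly cited definitions WER = (S+D+I)/(H+S+D), MER = (S+D+I)/(H+S+D+I) and WIL = 1 - H²/(reference length × hypothesis length), where H, S, D and I are hits, substitutions, deletions and insertions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    hits: int
    substitutions: int
    deletions: int
    insertions: int


def align_counts(reference: str, hypothesis: str) -> Counts:
    """Count H/S/D/I along a minimum word-level edit-distance alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # Each cell is (edit_distance, hits, substitutions, deletions, insertions).
    prev = [(j, 0, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        curr = [(i, 0, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            d, h, s, dl, ins = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (d, h + 1, s, dl, ins)
            else:
                diagonal = (d + 1, h, s + 1, dl, ins)
            d, h, s, dl, ins = prev[j]
            deletion = (d + 1, h, s, dl + 1, ins)
            d, h, s, dl, ins = curr[j - 1]
            insertion = (d + 1, h, s, dl, ins + 1)
            # Smallest distance wins; ties are broken in favor of more hits.
            curr.append(min(diagonal, deletion, insertion,
                            key=lambda cell: (cell[0], -cell[1])))
        prev = curr
    _, h, s, dl, ins = prev[-1]
    return Counts(h, s, dl, ins)


def error_rates(c: Counts) -> dict:
    """Return WER, MER and WIL as fractions (multiply by 100 for percentages)."""
    ref_len = c.hits + c.substitutions + c.deletions
    hyp_len = c.hits + c.substitutions + c.insertions
    if ref_len == 0:
        raise ValueError("reference transcript is empty")
    errors = c.substitutions + c.deletions + c.insertions
    wil = 1.0 if hyp_len == 0 else 1.0 - (c.hits ** 2) / (ref_len * hyp_len)
    return {
        "wer": errors / ref_len,
        "mer": errors / (ref_len + c.insertions),
        "wil": wil,
    }


counts = align_counts("turn the lights on", "turn all the light on")
rates = error_rates(counts)
# prints: WER 0.50  MER 0.40  WIL 0.55
print(f"WER {rates['wer']:.2f}  MER {rates['mer']:.2f}  WIL {rates['wil']:.2f}")
```

In this example the recognizer inserted one word and substituted another, which is why WER and MER differ: MER counts the insertion in its denominator, WER does not.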

It is an industry practice to evaluate speech recognition systems on audio datasets that are themselves standardized and publicly available. Among them, LibriSpeech is the most used worldwide, but it is only available for English. Common Voice and MLS are good substitutes for non-English evaluations.

The values above were collected by LEXISTEMS via regular subscriptions to the services, using automated benchmarking scripts.
These scripts are available on request.

[1] September 2022 update.

Technical requirements

SensibleSpeech API

SensibleSpeech is a green-code, highly optimized API, and its technical requirements are correspondingly modest.

Deployment - For maximum deployment flexibility, the SensibleSpeech API is delivered as a fully functional, self-sufficient Docker image with customizable volumes and shielding from network access. As such, SensibleSpeech fits right into any orchestration / production environment, and scales infinitely with state-of-the-art security.

Monitoring - SensibleSpeech supports all custom or standard platforms (Kibana, Grafana, Prometheus...) with async, non-blocking interactions. An optional dedicated monitoring platform is available, based on Kibana.
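
To make "async, non-blocking" concrete, here is one common way such an interaction can be structured, sketched in Python with asyncio. This is an assumption about the general pattern, not a description of SensibleSpeech internals: the `MetricsEmitter` class, its methods and the `sink` callback are all hypothetical names, and the sink stands in for whatever monitoring backend is used.

```python
import asyncio


class MetricsEmitter:
    """Buffers monitoring events so that request handling never waits on the backend."""

    def __init__(self, sink, maxsize: int = 1000):
        self._queue = asyncio.Queue(maxsize=maxsize)
        self._sink = sink          # hypothetical async callable that ships one event
        self.dropped = 0           # events discarded because the buffer was full

    def emit(self, event: dict) -> None:
        """Called from the request path: returns immediately, never awaits."""
        try:
            self._queue.put_nowait(event)
        except asyncio.QueueFull:
            self.dropped += 1

    async def run(self) -> None:
        """Background task: forwards buffered events to the sink one by one."""
        while True:
            event = await self._queue.get()
            await self._sink(event)
            self._queue.task_done()

    async def drain(self) -> None:
        """Wait until every buffered event has been forwarded."""
        await self._queue.join()


async def demo():
    received = []

    async def sink(event):
        received.append(event)

    emitter = MetricsEmitter(sink, maxsize=2)
    worker = asyncio.create_task(emitter.run())
    for n in range(3):
        emitter.emit({"transcription_id": n})  # third event exceeds the buffer
    await emitter.drain()
    worker.cancel()
    return received, emitter.dropped


received, dropped = asyncio.run(demo())
print(received, dropped)
```

The design choice shown here is to drop events when the buffer is full rather than slow down transcription; other policies (blocking, sampling) are equally possible.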

Hardware platforms - Depending on the specificities of your SensibleSpeech API, the servers to host it may or may not require a compute GPU. Other hardware specifications are of commodity grade (e.g. Intel Xeon E / AMD Epyc 7F32+, 32 GB RAM) unless concurrent transaction volumes or custom features require otherwise.

Please consult us for details.

SensibleSpeech Server Appliances

SensibleSpeech server appliances enclose a SensibleSpeech API within a dedicated server. They come ready to plug and serve speech transcripts.

Custom configurations - SensibleSpeech server appliances are typically built to order according to customers' production requirements, especially user base, estimated concurrent queries and optional custom NLU features.

Vendor agnostic - SensibleSpeech server appliances can be designed from any mix of OEM sources. We favor cost-effectiveness, scale provisioning and energy efficiency with recommended architectures, but any setup is possible.

Please consult us for details.

SensibleSpeech Devices

Fully functional, autonomous SensibleSpeech devices are available based on standard platforms like Raspberry Pi 4, NVIDIA Jetson Nano and ASUS Tinker Board.

Input/Output, peripheral connectivity, networkless operation - SensibleSpeech devices include audio input connectors or optional integrated microphones, as well as video output connectors or optional integrated screens. They also feature GPIO (General Purpose Input/Output) interfaces allowing them to drive any connected peripheral or equipment, by wire or wirelessly, with or *without* network connection. Logically, technical requirements vary on a use case basis. For specific projects or integration on other platforms, please consult us.

SensibleSpeech Middlewares

SensibleSpeech middlewares are designed to easily voice-enable vehicles, OEM devices and industrial systems.

Business-specific customization - As SensibleSpeech devices demonstrate, SensibleSpeech middlewares integrate into virtually any OEM system, even those with basic hardware specifications. Feature availability depends on these specifications, but typically speech recognition is complemented by AI models and NLU customized for the equipment's vertical and functional standards (e.g. DMX or Digital Multiplex for the media production lighting industry). In other words, SensibleSpeech-powered equipment becomes 100% voice operable, giving operators freedom of hands and movement.

Systems assignment, networkless operation - SensibleSpeech middlewares can be configured to drive either a single piece of equipment (X systems = X middlewares each with a specific system ID) or a group of equipment (X systems = 1/X middlewares with routing IDs). SensibleSpeech middlewares work with or *without* network connection depending on the specifics of the production environment and data security requirements.
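
The two assignment modes described above can be pictured with a small Python sketch. Everything in it is hypothetical (the `Middleware` class, the `dispatch` method, the IDs and the string-returning handlers): it only illustrates the difference between one middleware per system and one shared middleware that routes commands by ID.

```python
from typing import Callable

Handler = Callable[[str], str]  # takes a recognized command, returns an acknowledgement


class Middleware:
    """Hypothetical middleware that forwards recognized commands to systems by ID."""

    def __init__(self, routes: dict):
        self.routes = routes  # maps a system or routing ID to its handler

    def dispatch(self, target_id: str, command: str) -> str:
        if target_id not in self.routes:
            raise LookupError(f"no system registered under ID {target_id!r}")
        return self.routes[target_id](command)


def make_system(name: str) -> Handler:
    """Stand-in for a real piece of equipment: just echoes what it received."""
    return lambda command: f"{name} <- {command}"


# Mode 1: one middleware per system, each bound to a single system ID.
dedicated = {
    system_id: Middleware({system_id: make_system(system_id)})
    for system_id in ("press-1", "press-2")
}

# Mode 2: one middleware shared by a group, selecting the target by routing ID.
shared = Middleware({
    routing_id: make_system(routing_id)
    for routing_id in ("lamp-a", "lamp-b", "lamp-c")
})

print(dedicated["press-1"].dispatch("press-1", "stop"))
print(shared.dispatch("lamp-b", "dim to 50%"))
```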