# Custom Speech Synthesizer

The speech synthesizer is a text-to-speech service that converts text into sounds approximating human speech. Combined with the ReadAloud feature, it provides a text-to-speech function that can read page contents aloud.

Depending on the speech technology, the generated sounds may be somewhat stilted and artificial, or may sound very much like the voice of a real person.

To demonstrate how different text-to-speech technologies can be used with Foxit PDF SDK for Web, the section Customize PDFTextToSpeechSynthesis uses the browser-native Web Speech API as an example, and the section Integrating with 3rd Party TTS Service uses the Google Cloud Text-to-Speech API.

# Speech Synthesizer APIs

# PDFTextToSpeechSynthesis Interface Specification

interface PDFTextToSpeechSynthesis {
    status: PDFTextToSpeechSynthesisStatus;
    supported(): boolean;
    pause(): void;
    resume(): void;
    stop(): void;
    play(utterances: IterableIterator<Promise<PDFTextToSpeechUtterance>>, options?: ReadAloudOptions): Promise<void>;
    updateOptions(options: Partial<ReadAloudOptions>): void;
}

# 1. status Property

The status property holds the current read-aloud state. Its type is the following enum:

enum PDFTextToSpeechSynthesisStatus {
    playing, paused, stopped,
}

The default value is stopped.

# 2. supported(): boolean Method

This method detects whether PDFTextToSpeechSynthesis is supported in the current client environment. If a 3rd party speech service running in the background produces the audio, you only need to check whether the HTML <audio> element is supported on the client side.

Note: The client here could be a browser or another runtime such as Electron or Apache Cordova.

Code Example:

class CustomPDFTextToSpeechSynthesis {
    supported(): boolean {
        return typeof window.HTMLAudioElement === 'function';
    }
    // .... other methods
}

# 3. pause(), resume() and stop() Methods

These methods control the read-aloud state. They let PDFTextToSpeechSynthesis pause, resume, and stop the voice media, and each of them should update the status property accordingly.
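
As a minimal sketch of how these three methods might keep `status` in sync, the class below wraps a hypothetical `player` object (anything exposing `pause()`, `play()` and `stop()`, for example a thin wrapper around an audio element). The `Status` object, the `markPlaying()` helper, and the `player` shape are assumptions made for illustration and are not part of the SDK.

```javascript
// Stand-in for PDFTextToSpeechSynthesisStatus (assumption: the real enum comes from the SDK).
const Status = Object.freeze({ playing: 'playing', paused: 'paused', stopped: 'stopped' });

class StatusTrackingSynthesis {
    // `player` is a hypothetical object with pause(), play() and stop() methods.
    constructor(player) {
        this.player = player;
        this.status = Status.stopped; // default state
    }
    // Hypothetical helper: play() would call this once playback actually starts.
    markPlaying() {
        this.status = Status.playing;
    }
    pause() {
        if (this.status !== Status.playing) return; // nothing to pause
        this.player.pause();
        this.status = Status.paused;
    }
    resume() {
        if (this.status !== Status.paused) return; // only a paused reading can resume
        this.player.play();
        this.status = Status.playing;
    }
    stop() {
        if (this.status === Status.stopped) return; // already stopped
        this.player.stop();
        this.status = Status.stopped;
    }
}
```

Guarding each transition on the current status keeps repeated calls (for example, two consecutive `pause()` clicks) from touching the player twice.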

# 4. updateOptions(options: Partial<ReadAloudOptions>) Method

This method updates the options of PDFTextToSpeechSynthesis while it is reading aloud, for example to change the voice volume.
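
The following sketch shows one way an `updateOptions` implementation could merge a partial options object and apply a new volume immediately. The class name, the `mergeReadAloudOptions` helper, and the `audio` object (anything with a writable `volume` property) are hypothetical; only the `volume`, `rate` and `pitch` option names are taken from this guide.

```javascript
// Merge a partial options object into the current options, skipping undefined values.
function mergeReadAloudOptions(current, partial) {
    const merged = Object.assign({}, current);
    for (const key of Object.keys(partial || {})) {
        if (partial[key] !== undefined) {
            merged[key] = partial[key];
        }
    }
    return merged;
}

class OptionsAwareSynthesis {
    // `audio` is any object with a writable `volume` property, e.g. an <audio> element.
    constructor(audio, initialOptions) {
        this.audio = audio;
        this.options = initialOptions || {};
    }
    updateOptions(partial) {
        this.options = mergeReadAloudOptions(this.options, partial);
        const volume = this.options.volume;
        // An audio element only accepts volume values in [0, 1].
        if (typeof volume === 'number' && volume >= 0 && volume <= 1) {
            this.audio.volume = volume;
        }
    }
}
```

Options that cannot be changed mid-playback (for example, a voice baked into already-synthesized audio) would simply be stored and picked up by the next utterance.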

# 5. play(utterances: IterableIterator<Promise<PDFTextToSpeechUtterance>>, options?: ReadAloudOptions): Promise<void> Method

Parameter Description:

  1. utterances: An IterableIterator whose items are Promises that resolve to PDFTextToSpeechUtterance objects. Each utterance contains the text to be read along with its page number and coordinate information. The iterator can be consumed with for...of (awaiting each item) or with for await...of.
  2. options: An optional parameter that contains the playback speed, pitch, and volume, plus the 'external' parameter, where 'external' is the parameter object passed to the third party speech synthesizer service.
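
To illustrate how the `utterances` parameter can be consumed, here is a small sketch that walks an iterator of promises and hands each text to a `speak` callback. The generator `fakeUtterances` stands in for the iterator the SDK would pass to `play()`, and the `text` and `pageIndex` field names are assumptions; check the PDFTextToSpeechUtterance reference for the actual shape.

```javascript
// Stand-in for the iterator passed to play(): a generator that yields promises.
// The `text` and `pageIndex` field names are assumed for this sketch.
function* fakeUtterances() {
    yield Promise.resolve({ text: 'Hello', pageIndex: 0 });
    yield Promise.resolve({ text: 'world', pageIndex: 0 });
}

// Consume the iterator one promise at a time and pass each text to `speak`.
// `shouldStop` is an optional callback that lets the caller abort between utterances.
async function playAll(utterances, speak, shouldStop) {
    for (const pending of utterances) {
        if (shouldStop && shouldStop()) break;
        const utterance = await pending; // each item is a Promise
        await speak(utterance.text);     // wait until this text has been spoken
    }
}
```

Awaiting `speak` before pulling the next item keeps utterances from overlapping, and checking `shouldStop` on every iteration gives `stop()` a natural place to interrupt the loop.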

# Customize PDFTextToSpeechSynthesis

# Method 1: Implement PDFTextToSpeechSynthesis interface

Notice: This demo is supported only in Chrome, Firefox, and Chromium-based Edge.

# Method 2: Using AbstractPDFTextToSpeechSynthesis to customize the speech synthesizer

# The Difference Between PDFTextToSpeechSynthesis and AbstractPDFTextToSpeechSynthesis

Method 1 customizes the speech synthesizer by implementing the PDFTextToSpeechSynthesis interface. You must manage state changes manually and iterate over the 'utterances' list with for await...of. Each item in the list is a text block obtained from a PDFPage. In some cases a text block contains only part of a word or sentence, so text blocks need to be merged into complete words and sentences for better speech synthesis. This merging can be done in the play() method.
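
The merging step mentioned for Method 1 could look roughly like the sketch below, which buffers text blocks until the buffer ends with sentence-final punctuation. The `text` field name and the punctuation rule are assumptions of this example; a production implementation would likely need language-specific segmentation, since a rule this simple also splits on abbreviations such as "e.g.".

```javascript
// A sentence is considered complete when the buffer ends with '.', '!', '?'
// or their full-width counterparts, optionally followed by whitespace.
const SENTENCE_END = /[.!?。！？]\s*$/;

// Merge text blocks into sentences. Blocks are concatenated as-is, so any
// spacing between words must already be present in the block text.
async function* mergeIntoSentences(utterances) {
    let buffer = '';
    for await (const utterance of utterances) {
        buffer += utterance.text; // `text` is an assumed field name
        if (SENTENCE_END.test(buffer)) {
            yield buffer.trim();
            buffer = '';
        }
    }
    if (buffer.trim()) {
        yield buffer.trim(); // flush a trailing fragment that has no punctuation
    }
}
```

Inside `play()`, the merged sentences could then be consumed with `for await (const sentence of mergeIntoSentences(utterances))` and passed to the speech engine one at a time.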

Method 2 customizes the speech synthesizer by extending the AbstractPDFTextToSpeechSynthesis abstract class. You do not need to manage state or iterate over the utterances list manually, but you must correctly call window.SpeechSynthesisUtterance to generate and play speech based on the received text and parameters. AbstractPDFTextToSpeechSynthesis merges the received text blocks automatically. However, it is currently hard to guarantee that the merged text blocks form complete words or sentences in every language environment, so if you need each word and sentence to be read correctly, Method 1 is recommended.
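
For Method 2, the main work is turning the received text and parameters into a speech utterance. The helper below is a sketch of that mapping: it copies options onto an utterance object and clamps them to the ranges the Web Speech API defines for `rate`, `pitch` and `volume`. The option names mirror the client code in this guide; the helper names `clamp` and `applyOptions` are hypothetical, and the function accepts any object so it can be exercised outside a browser.

```javascript
// Clamp a number into [min, max].
function clamp(value, min, max) {
    return Math.min(max, Math.max(min, value));
}

// Copy read-aloud options onto an utterance. In a browser the utterance would
// be `new window.SpeechSynthesisUtterance(text)`; any plain object works here.
function applyOptions(utterance, options) {
    if (!options) return utterance;
    if (typeof options.rate === 'number') {
        utterance.rate = clamp(options.rate, 0.1, 10); // Web Speech API rate range
    }
    if (typeof options.pitch === 'number') {
        utterance.pitch = clamp(options.pitch, 0, 2);  // Web Speech API pitch range
    }
    if (typeof options.volume === 'number') {
        utterance.volume = clamp(options.volume, 0, 1); // Web Speech API volume range
    }
    if (typeof options.lang === 'string') {
        utterance.lang = options.lang;
    }
    return utterance;
}
```

In a browser, the result would typically be handed to `window.speechSynthesis.speak(...)`, with the utterance's `onend` and `onerror` handlers used to settle the promise that reports when the text has finished playing.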

# Integrating with 3rd Party TTS Service

We take @google-cloud/text-to-speech as an example in this section.

# Server

To get started with the Google Cloud Text-to-Speech server library in your preferred programming language, refer to https://cloud.google.com/text-to-speech/docs/quickstarts.
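
As a rough sketch of the server side, the function below turns the query parameters sent by the client into a request object of the shape accepted by the `synthesizeSpeech` method of the Node.js `@google-cloud/text-to-speech` client. The query parameter names (`text`, `rate`, `pitch`, `lang`, `voice`) are assumptions about your own server API, the default language is arbitrary, and the pitch conversion is purely illustrative. Verify the numeric limits against the current Google Cloud reference before relying on them.

```javascript
// Build a synthesizeSpeech request from parsed query parameters.
// All query parameter names here are assumptions about your server API.
function buildSynthesisRequest(query) {
    const request = {
        input: { text: String(query.text || '') },
        voice: { languageCode: query.lang || 'en-US' }, // arbitrary default
        audioConfig: { audioEncoding: 'MP3' }
    };
    if (query.voice) {
        request.voice.name = query.voice;
    }
    const rate = Number(query.rate);
    if (query.rate !== undefined && !Number.isNaN(rate)) {
        // speakingRate limits assumed to be 0.25..4.0; check the current reference.
        request.audioConfig.speakingRate = Math.min(4, Math.max(0.25, rate));
    }
    const pitch = Number(query.pitch);
    if (query.pitch !== undefined && !Number.isNaN(pitch)) {
        // Illustrative mapping only (an assumption): treat 1 as neutral on a
        // 0..2 scale and spread it over a -20..20 semitone range.
        request.audioConfig.pitch = Math.min(20, Math.max(-20, (pitch - 1) * 20));
    }
    return request;
}
```

A request handler would then pass this object to the client's `synthesizeSpeech` call and return the resulting audio content to the browser with an audio content type, so that the client can load it as a blob.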

# Client

var readAloud = UIExtension.PDFViewCtrl.readAloud;
var PDFTextToSpeechSynthesisStatus = readAloud.PDFTextToSpeechSynthesisStatus;
var AbstractPDFTextToSpeechSynthesis = readAloud.AbstractPDFTextToSpeechSynthesis;
var SPEECH_SYNTHESIS_URL = '<server url>'; // the server API address

var ThirdpartyPDFTextToSpeechSynthesis = AbstractPDFTextToSpeechSynthesis.extend({
    init: function() {
        this.audioElement = null;
    },
    supported: function() {
        return typeof window.HTMLAudioElement === 'function' && document.createElement('audio') instanceof window.HTMLAudioElement;
    },
    doPause: function() {
        if(this.audioElement) {
            this.audioElement.pause();
        }
    },
    doStop: function() {
        if(this.audioElement) {
            this.audioElement.pause();
            this.audioElement.currentTime = 0;
            this.audioElement = null;
        }
    },
    doResume: function() {
        if(this.audioElement) {
            this.audioElement.play();
        }
    },
    onCurrentPlayingOptionsUpdated: function() {
        if(!this.audioElement) {
            return;
        }
        var options = this.currentPlayingOptions;
        if (this.status === PDFTextToSpeechSynthesisStatus.playing) {
            if(options.volume >= 0 && options.volume <= 1) {
                this.audioElement.volume = options.volume;
            }
        }
    },
    speakText: function(text, options) {
        var audioElement = document.createElement('audio');
        this.audioElement = audioElement;
        if(options.volume >= 0 && options.volume <= 1) {
            audioElement.volume = options.volume;
        }
        return this.speechSynthesis(text, options).then(function(src) {
            return new Promise(function(resolve, reject) {
                audioElement.src = src;
                audioElement.onended = function() {
                    resolve();
                };
                audioElement.onabort = function() {
                    resolve();
                };
                audioElement.onerror = function(e) {
                    reject(e);
                };
                audioElement.play();
            }).finally(function() {
                URL.revokeObjectURL(src);
            });
        });
    },
    // If your server API uses a different request method or parameter format, adjust the following implementation accordingly.
    speechSynthesis: function(text, options) {
        var url = SPEECH_SYNTHESIS_URL + '?' + this.buildURIQueries(text, options);
        return fetch(url).then(function(response) {
            if(response.status >= 400) {
                // response.json() already resolves to a parsed object, so no JSON.parse is needed
                return response.json().then(function(json) {
                    return Promise.reject(json.error);
                });
            }
            return response.blob();
        }).then(function (blob) {
            return URL.createObjectURL(blob);
        });
    },
    buildURIQueries: function(text, options) {
        var queries = [
            'text=' + encodeURIComponent(text)
        ];
        if(!options) {
            return queries.join('&');
        }
        if(typeof options.rate === 'number') {
            queries.push( 'rate=' + options.rate );
        }
        if(typeof options.pitch === 'number') {
            queries.push('pitch=' + options.pitch);
        }
        if(typeof options.lang === 'string') {
            queries.push('lang=' + encodeURIComponent(options.lang));
        }
        if(typeof options.voice === 'string') {
            queries.push('voice=' + encodeURIComponent(options.voice));
        }
        if(typeof options.external !== 'undefined') {
            queries.push('external=' + encodeURIComponent(JSON.stringify(options.external)));
        }
        return queries.join('&');
    }
});

# Using the custom speech synthesizer:

pdfui.getReadAloudService().then(function(service) {
    service.set(new ThirdpartyPDFTextToSpeechSynthesis());
});