o �J�h0k�@s$ddlZddlZddlZddlmZmZmZm Z m Z ddlmZddl mZmZmZmZddlmZmZddlmZddlmZddlZddlTGd d �d e�ZGdd�de�ZGd d�de�ZGdd�de�ZGdd�de�Z Gdd�de�Z!Gdd�de�Z"Gdd�de�Z#Gdd�de�Z$dS)�N)�Optional�Dict�List�Union� NamedTuple)�Query)� BaseModel�Field�field_validator� ConfigDict)� Translate�gettext)�Enum)�deepcopy)�*c@seZdZdZdZdZdS)�WhisperImpl�whisperzfaster-whisperZinsanely_fast_whisperN)�__name__� __module__�__qualname__�WHISPER�FASTER_WHISPER�INSANELY_FAST_WHISPER�rr�DC:\pinokio\api\whisper-webui.git\app\modules\whisper\data_classes.pyrsrc@s4eZdZUeddd�Zeeed<eddd�Zeeed<eddd�Z ee ed<edd d�Zeeed <eddd�Z eeed<edd d�Zeeeed<eddd�Zeeed<eddd�Zeeed<eddd�Zeeed<eddd�Zeeed<eddd�Zeeded<edejjfdd��ZdS)�SegmentNzIncremental id for the segment��default�description�idz&Seek of the segment from chunked audio�seekz!Transcription text of the segment�textzStart time of the segment�startzEnd time of the segment�endzList of token IDs�tokensz,Temperature used during the decoding process�temperaturez%Average log probability of the tokens�avg_logprobz Compression ratio of the segment�compression_ratioz Probability that it's not speech�no_speech_probz&List of words contained in the segment�Word�words�segc CsR|jdurdd�|jD�}nd}||j|j|j|j|j|j|j|j|j |j |d�S)NcSs$g|]}t|j|j|j|jd��qS))r"r#�word�probability)r)r"r#r,r-)�.0�wrrr� <listcomp>&s��z/Segment.from_faster_whisper.<locals>.<listcomp>)rr r!r"r#r$r%r&r'r(r*)r*rr r!r"r#r$r%r&r'r()�clsr+r*rrr�from_faster_whisper"s$ � �zSegment.from_faster_whisper)rrrr rr�int�__annotations__r r!�strr"�floatr#r$rr%r&r'r(r*�classmethod�faster_whisper� transcriberr2rrrrrs �rc@sneZdZUeddd�Zeeed<eddd�Zeeed<eddd�Z ee ed<eddd�Zeeed <dS) r)NzStart time of the wordrr"r#z Word textr,zProbability of the wordr-)rrrr r"rr6r4r#r,r5r-rrrrr)@s r)c@sHeZdZedd�Zdefdd�Zdefdd�Ze deddfd d ��Z dS)� BaseParamsr)�protected_namespaces�returncCs|��S�N)� model_dump��selfrrr�to_dictJszBaseParams.to_dictcCst|��Sr=)�listr>�valuesr?rrr�to_listMszBaseParams.to_list� data_listcCs&t|j��}|ditt||��S)Nr)rB�model_fields�keys�dict�zip)r1rE�field_namesrrr� from_listPszBaseParams.from_listN)rrrr�model_configrrArrDr7rKrrrrr:Gs r:c@s�eZdZUdZeddd�Zeed<edddd d �Ze ed<edd dd�Z eed<ee d�d dd�Ze ed<edd dd�Z eed<edd dd�Zeed<ed deedeejjjfdd��ZdS)!� VadParamsz#Voice Activity Detection parametersFz>Enable voice activity detection to filter out non-speech partsr� vad_filter��?��?zUSpeech threshold for Silero VAD. Probabilities above this value are considered speech�r�ge�ler� threshold��rz3Final speech chunks shorter than this are discarded�rrSr�min_speech_duration_ms�infz,Maximum duration of speech chunks in seconds�r�gtr�max_speech_duration_si�z.Minimum silence duration between speech chunks�min_silence_duration_msi�z+Padding added to each side of speech chunks� speech_pad_msN�defaultsr<c Cs�tjtd�|�d|jdj�dtd�d�tjdddd |�d |jd j�dd�tjd d|�d|jdj�dd�tjd|�dt�dd�tjdd|�d|jdj�dd�tjdd|�d|jdj�dd�gS)NzEnable Silero VAD FilterrNTz-Enable this to transcribe only detected voice��label�value�interactive�inforPrQ�{�G�z�?zSpeech ThresholdrUz.Lower it to be more sensitive to small sounds.)�minimum�maximum�steprarbrdzMinimum Speech Duration (ms)rrXz9Final speech chunks shorter than this time are thrown out)ra� precisionrbrdzMaximum Speech Duration (s)r\z/Maximum duration of speech chunks in "seconds".�rarbrdzMinimum Silence Duration (ms)r]zGIn the end of each speech chunk wait for this time before separating itzSpeech Padding (ms)r^z5Final speech chunks are padded by this time each side) �gr�Checkbox�_�get� __fields__r�Slider�NumberZGRADIO_NONE_NUMBER_MAX)r1r_rrr�to_gradio_inputsvs@�� zVadParams.to_gradio_inputsr=)rrr�__doc__r rN�boolr4rUr6rXr3r\r]r^r7rrrrk� components�base� FormComponentrrrrrrrMXs> ��(rMc@s�eZdZUdZeddd�Zeed<eddd�Ze ed<ed d d�Z e ed<edd d�Zeed<e dde ede ede e deejjjfdd��ZdS)�DiarizationParamszSpeaker diarization parametersFzEnable speaker diarizationr� is_diarize�cudaz Device to run Diarization model.�diarization_device�z5Hugging Face token for downloading diarization models�hf_tokenTz3Offload Diarization model after Speaker diarization�enable_offloadNr_�available_devices�devicer<c Cs�tjtd�|�d|jdj�d�tjtd�|durgd�n||�d|�d�tjtd�|�d |jd j�td �d�tjtd�|�d |jd j�d�gS)NzEnable Diarizationry�rarb�Device��cpurz�xpur��ra�choicesrbzHuggingFace Tokenr}z9This is only needed the first time you download the modelrj�Offload sub model when finishedr~)rkrlrmrnror�Dropdown�Textbox)r1r_rr�rrrrr�s&� ��z"DiarizationParams.to_gradio_inputs)NNN)rrrrsr ryrtr4r{r5r}r~r7rrrrkrurvrwrrrrrrrx�s2 ��rxc @s�eZdZUdZeddd�Zeed<eddd�Ze ed<ed d d�Z e ed<edd dd�Zeed<eddd�Z eed<eddd�Zeed<e ddeedeedee deedeejjjf dd��ZdS)�BGMSeparationParamsz&Background music separation parametersFz"Enable background music separationr�is_separate_bgm�UVR-MDX-NET-Inst_HQ_4zUVR model size�uvr_model_sizerzzDevice to run UVR model.� uvr_device�r�Segment size for UVR modelrZ�segment_sizez%Whether to save separated audio files� save_fileTz%Offload UVR model after transcriptionr~Nr_rr��available_modelsr<cCs�tjtd�|�d|jdj�dtd�d�tjtd�|dur!ddgn||�d |jd j�d �tjtd�|dur:gd�n||�d |�d �tjd|�d|jdj�ddd�tjtd�|�d|jdj�d�tjtd�|�d|jdj�d�gS)Nz&Enable Background Music Remover Filterr�Tz*Enabling this will remove background musicr`�Modelr�zUVR-MDX-NET-Inst_3r�r�r�r�r�zSegment Sizer�rr��rarbrirdzSave separated files to outputr�r�r�r~)rkrlrmrnrorr�rq)r1r_rr�r�rrr�to_gradio_input�sF�� z#BGMSeparationParams.to_gradio_input)NNNN)rrrrsr r�rtr4r�r5r�r�r3r�r~r7rrrrkrurvrwr�rrrrr��sJ ��r�c@s`eZdZUdZeddd�Zeed<eddd�Ze eed<ed d d�Z eed<edd dd�Ze ed<eddd�Zeed<eddddd�Zeed<eddd�Zeed<edd dd�Ze ed<eddd d!�Zeed"<ed#d$d�Zeed%<ed&ddd'd�Zeed(<edd)d�Ze eed*<eddd+d�Zeed,<ed-dd.d!�Zeed/<eddd0d!�Zeed1<eddd2d!�Zeed3<eddd4d�Ze ed5<edd6d�Ze eed7<ed#d8d�Zeed9<ed:gd;d�Ze ee e efed<<eddd=d�Z!eed><ed d?d�Z"eed@<edAdBd�Z#e eedC<edDdEd�Z$e eedF<eddGd�Z%e e edH<edIdJd�Z&e e edK<eddLd�Z'e eedM<eddNd�Z(e eedO<ed&dPd�Z)e eedQ<ed ddRd!�Z*e edS<edTddUd!�Z+e edV<ed#dWd�Z,eedX<e-d�dYdZ��Z.e-d<�d[d\��Z/e0 # ded]e e1d^e ed_e ed`e e dae e dbe e de efdcdd��Z2dS)f� WhisperParamszWhisper parameterszlarge-v2zWhisper model sizer� model_sizeNz)Source language of the file to transcribe�langFz&Translate speech to English end-to-end�is_translate��Beam size for decodingrW� beam_sizeg��7Threshold for average log probability of sampled tokens�log_prob_thresholdg333333�?rPrQ�Threshold for detecting silencerR�no_speech_threshold�float16�"Computation type for transcription�compute_type�"Number of candidates when sampling�best_ofr�Beam search patience factorrZ�patienceT�-Use previous output as prompt for next window�condition_on_previous_textrO�*Temperature threshold for resetting prompt�prompt_reset_on_temperature�Initial prompt for first window�initial_prompt�Temperature for samplingr%g333333@�$Threshold for gzip compression ratio�compression_ratio_threshold�Exponential length penalty�length_penalty�Penalty for repeated tokens�repetition_penalty�%Size of n-grams to prevent repetition�no_repeat_ngram_size�Prefix text for first window�prefix�+Suppress blank outputs at start of sampling�suppress_blank��Token IDs to suppress�suppress_tokens�Maximum initial timestamp�max_initial_timestamp�Extract word-level timestamps�word_timestampsu"'“¿([{-�$Punctuations to merge with next word�prepend_punctuationsu"'.。,，!！?？:：”)]}、�(Punctuations to merge with previous word�append_punctuations�&Maximum number of new tokens per chunk�max_new_tokens��#Length of audio segments in seconds�chunk_length�@Threshold for skipping silent periods in hallucination detection�hallucination_silence_threshold�#Hotwords/hint phrases for the model�hotwords�,Threshold for language detection probability�language_detection_threshold�)Number of segments for language detection�language_detection_segments��Batch size for processing� batch_sizez)Offload Whisper model after transcriptionr~cCs ddlm}||��krdS|S)Nr)�AUTOMATIC_DETECTION)�modules.utils.constantsr��unwrap)r1�vr�rrr� validate_langWszWhisperParams.validate_langc Csrddl}z!t|t�r|�|�}t|t�std��|WSt|t�r#|WSWdSty8}ztd|��d}~ww)Nrz<Invalid Suppress Tokens. The value must be type of List[int]z>Invalid Suppress Tokens. The value must be type of List[int]: )�ast� isinstancer5�literal_evalrB� ValueError� Exception)r1r�r�r��errr�validate_supress_tokens\s ��z%WhisperParams.validate_supress_tokensr_� only_advanced�whisper_typer��available_langs�available_compute_typescCsL|durtjjn|��}g}|sD|tjtd�||�d|j dj �d�tjtd�||�dt�d�tjtd�|�d|j dj �d�g7}|tj d |�d |j d j �ddd �tj d|�d|j dj �dd�tj d|�d|j dj �dd�tjd|dur~gd�n||�d|�dd�tj d|�d|j dj �ddd �tj d|�d|j dj �dd�tjd |�d!|j d!j �d"d�tjd#|�d$|j d$j �dd%d&d'd(�tjd)|�d*t�d+d�tjd,|�d-|j d-j �d.d&d/d0d1�tj d2|�d3|j d3j �d4d�g7}tj d5|�d6|j d6j �d7d�tj d8|�d9|j d9j �d:d�tj d;|�d<|j d<j �dd=d �tjd>|�d?t�d@d�tjdA|�dB|j dBj �dCd�tjdD|�dEdF�dGd�tj dH|�dI|j dIj �dJd�tjdK|�dL|j dLj �dMd�tjdN|�dO|j dOj �dPd�tjdQ|�dR|j dRj �dSd�tj dT|�dUt�ddVd �tj dW|�dX|j dXj �ddYd �tj dZ|�d[t�d\d�tjd]|�d^|j d^j �d_d�tj d`|�dat�dbd�tj dc|�dd|j ddj �dded �g} tj df|�dg|j dgj �ddhd �g} |tjjk�r�| D]}di|_�q�|tjjk�r | D]}di|_�q|| | 7}|tjtdj�|�dk|j dkj �d�g7}|S)lNr�r�r��Languager�zTranslate to English?r�r�z Beam Sizer�rr�r�zLog Probability Thresholdr�r�rjzNo Speech Thresholdr�r�zCompute Type)r��int8�int16r�r�)rar�rbrdzBest Ofr�r�ZPatiencer�r�zCondition On Previous Textr�r�zPrompt Reset On Temperaturer�r�rer�)rarbrfrgrhrdzInitial Promptr�r�ZTemperaturer%rPrQr�)rarbrfrhrgrdzCompression Ratio Thresholdr�r�zLength Penaltyr�r�zRepetition Penaltyr�r�zNo Repeat N-gram Sizer�r�ZPrefixr�r�zSuppress Blankr�r�zSuppress Tokensr�z[-1]r�zMax Initial Timestampr�r�zWord Timestampsr�r�zPrepend Punctuationsr�r�zAppend Punctuationsr�r�zMax New Tokensr�r�zChunk Length (s)r�r�z%Hallucination Silence Threshold (sec)r�r�ZHotwordsr�r�zLanguage Detection Thresholdr�r�zLanguage Detection Segmentsr�r�z Batch Sizer�r�Fr�r~)rrrb�strip�lowerrkr�rmrnrorr�rlrqrpr�ZGRADIO_NONE_STRZGRADIO_NONE_NUMBER_MIN�visibler)r1r_r�r�r�r�r�r��inputsZfaster_whisper_inputsZinsanely_fast_whisper_inputs�input_componentrrrrrjs� � �� F�� [�� zWhisperParams.to_gradio_inputs)NTNNNNN)3rrrrsr r�r5r4r�rr�rtr�r3r�r6r�r�r�r�r�r�r�r%r�r�r�r�r�r�r�rrr�r�r�r�r�r�r�r�r�r�r�r~r r�r�r7rrrrrrrr�s� ��&�� r�c@s�eZdZUdZeed�Zeed<eed�Z eed<ee d�Ze ed<eed�Z eed<defdd �Zdefd d�Zededdfd d��ZdS)�TranscriptionPipelineParamsz!Transcription pipeline parameters)�default_factoryr�vad�diarization�bgm_separationr<cCs*|j��|j��|j��|j��d�}|S)N�rr�r�r�)rrAr�r�r�)r@�datarrrrAKs�z#TranscriptionPipelineParams.to_dictcCs8|j��}|j��}|j��}|j��}||||S)a Convert data class to the list because I have to pass the parameters as a list in the gradio. Related Gradio issue: https://github.com/gradio-app/gradio/issues/2471 See more about Gradio pre-processing: https://www.gradio.app/docs/components )rrDr�r�r�)r@�whisper_list�vad_list�diarization_list�bgm_sep_listrrrrDTs z#TranscriptionPipelineParams.to_list� pipeline_listcCs�t|�}|dttj��}|ttj�d�}|dttj��}|ttj�d�}|dttj��}|ttj�d�}|dttj��}tt�|�t�|�t�|�t�|�d�S)z=Convert list to the data class again to use it in a function.rNr�) r�lenr�r4rMrxr�r�rK)r�rEr�r�r�r�rrrrK`s�z%TranscriptionPipelineParams.from_listN)rrrrsr r�rr4rMr�rxr�r�r�rrArrD�staticmethodrKrrrrr�Ds r�)%�faster_whisper.transcriber8�gradiork�torch�typingrrrrr�fastapir�pydanticrr r r�gradio_i18nrr rm�enumr�copyr�yamlr�rrr)r:rMrxr�r�r�rrrr�<module>s,+C(?D