o �J�h0k�@s$ddlZddlZddlZddlmZmZmZm Z m Z ddl m Z ddl mZmZmZmZddlmZmZddlmZddlmZddlZddlTGd d �d e�ZGd d �d e�ZGd d�de�ZGdd�de�ZGdd�de�Z Gdd�de�Z!Gdd�de�Z"Gdd�de�Z#Gdd�de�Z$dS)�N)�Optional�Dict�List�Union� NamedTuple)�Query)� BaseModel�Field�field_validator� ConfigDict)� Translate�gettext)�Enum)�deepcopy)�*c@seZdZdZdZdZdS)� WhisperImpl�whisperzfaster-whisperZinsanely_fast_whisperN)�__name__� __module__� __qualname__�WHISPER�FASTER_WHISPER�INSANELY_FAST_WHISPER�rr�DC:\pinokio\api\whisper-webui.git\app\modules\whisper\data_classes.pyrsrc@s4eZdZUeddd�Zeeed<eddd�Zeeed<eddd�Z ee ed<edd d�Z ee ed <edd d�Z ee ed <edd d�Zeeeed<eddd�Zee ed<eddd�Zee ed<eddd�Zee ed<eddd�Zee ed<eddd�Zeeded<edejjfdd��ZdS)�SegmentNzIncremental id for the segment��default� description�idz&Seek of the segment from chunked audio�seekz!Transcription text of the segment�textzStart time of the segment�startzEnd time of the segment�endzList of token IDs�tokensz,Temperature used during the decoding process� temperaturez%Average log probability of the tokens� avg_logprobz Compression ratio of the segment�compression_ratioz Probability that it's not speech�no_speech_probz&List of words contained in the segment�Word�words�segc CsR|jdurdd�|jD�}nd}||j|j|j|j|j|j|j|j|j |j |d� S)NcSs$g|]}t|j|j|j|jd��qS))r"r#�word� probability)r)r"r#r,r-)�.0�wrrr� <listcomp>&s���z/Segment.from_faster_whisper.<locals>.<listcomp>) rr r!r"r#r$r%r&r'r(r*) r*rr r!r"r#r$r%r&r'r()�clsr+r*rrr�from_faster_whisper"s$ � �zSegment.from_faster_whisper)rrrr rr�int�__annotations__r r!�strr"�floatr#r$rr%r&r'r(r*� classmethod�faster_whisper� transcriberr2rrrrrs �rc@sneZdZUeddd�Zeeed<eddd�Zeeed<eddd�Z ee ed<eddd�Z eeed <dS) r)NzStart time of the wordrr"r#z Word textr,zProbability of the wordr-) rrrr r"rr6r4r#r,r5r-rrrrr)@s r)c@sHeZdZedd�Zdefdd�Zdefdd�Ze deddfd d ��Z d S) � BaseParamsr)�protected_namespaces�returncCs|��S�N)� model_dump��selfrrr�to_dictJszBaseParams.to_dictcCst|�����Sr=)�listr>�valuesr?rrr�to_listMszBaseParams.to_list� 
data_listcCs&t|j���}|ditt||����S)Nr)rB� model_fields�keys�dict�zip)r1rE� field_namesrrr� from_listPszBaseParams.from_listN) rrrr � model_configrrArrDr7rKrrrrr:Gs  r:c@s�eZdZUdZeddd�Zeed<edddd d �Ze ed <ed d dd�Z e ed<ee d�d dd�Z e ed<edd dd�Z e ed<edd dd�Ze ed<ed deedeejjjfdd��ZdS)!� VadParamsz#Voice Activity Detection parametersFz>Enable voice activity detection to filter out non-speech partsr� vad_filter��?���?zUSpeech threshold for Silero VAD. Probabilities above this value are considered speech�r�ge�ler� threshold��rz3Final speech chunks shorter than this are discarded�rrSr�min_speech_duration_ms�infz,Maximum duration of speech chunks in seconds�r�gtr�max_speech_duration_si�z.Minimum silence duration between speech chunks�min_silence_duration_msi�z+Padding added to each side of speech chunks� speech_pad_msN�defaultsr<c Cs�tjtd�|�d|jdj�dtd�d�tjdddd |�d |jd j�d d �tjd d|�d|jdj�dd�tjd|�dt�dd�tjdd|�d|jdj�dd�tjdd|�d|jdj�dd�gS)NzEnable Silero VAD FilterrNTz-Enable this to transcribe only detected voice��label�value� interactive�inforPrQ�{�G�z�?zSpeech ThresholdrUz.Lower it to be more sensitive to small sounds.)�minimum�maximum�steprarbrdzMinimum Speech Duration (ms)rrXz9Final speech chunks shorter than this time are thrown out)ra� precisionrbrdzMaximum Speech Duration (s)r\z/Maximum duration of speech chunks in "seconds".�rarbrdzMinimum Silence Duration (ms)r]zGIn the end of each speech chunk wait for this time before separating itzSpeech Padding (ms)r^z5Final speech chunks are padded by this time each side) �gr�Checkbox�_�get� __fields__r�Slider�NumberZGRADIO_NONE_NUMBER_MAX)r1r_rrr�to_gradio_inputsvs@��� ����zVadParams.to_gradio_inputsr=)rrr�__doc__r rN�boolr4rUr6rXr3r\r]r^r7rrrrk� components�base� FormComponentrrrrrrrMXs> �����(rMc @s�eZdZUdZeddd�Zeed<eddd�Ze ed<ed d d�Z e ed <ed d d�Z eed<e   dde ede ede e deejjjfdd��ZdS)�DiarizationParamszSpeaker diarization parametersFzEnable speaker diarizationr� is_diarize�cudaz Device to run Diarization 
model.�diarization_device�z5Hugging Face token for downloading diarization models�hf_tokenTz3Offload Diarization model after Speaker diarization�enable_offloadNr_�available_devices�devicer<c Cs�tjtd�|�d|jdj�d�tjtd�|durgd�n||�d|�d�tjtd�|�d |jd j�td �d �tjtd �|�d |jd j�d�gS)NzEnable Diarizationry�rarb�Device��cpurz�xpur��ra�choicesrbzHuggingFace Tokenr}z9This is only needed the first time you download the modelrj�Offload sub model when finishedr~)rkrlrmrnror�Dropdown�Textbox)r1r_rr�rrrrr�s&� ����z"DiarizationParams.to_gradio_inputs)NNN)rrrrsr ryrtr4r{r5r}r~r7rrrrkrurvrwrrrrrrrx�s2 ������ �rxc @s�eZdZUdZeddd�Zeed<eddd�Ze ed<ed d d�Z e ed <ed d dd�Z e ed<eddd�Z eed<eddd�Zeed<e    ddeedeedee deedeejjjf dd��ZdS)�BGMSeparationParamsz&Background music separation parametersFz"Enable background music separationr�is_separate_bgm�UVR-MDX-NET-Inst_HQ_4zUVR model size�uvr_model_sizerzzDevice to run UVR model.� uvr_device�r�Segment size for UVR modelrZ� segment_sizez%Whether to save separated audio files� save_fileTz%Offload UVR model after transcriptionr~Nr_rr��available_modelsr<c Cs�tjtd�|�d|jdj�dtd�d�tjtd�|dur!ddgn||�d |jd j�d �tjtd �|dur:gd �n||�d |�d �tjd|�d|jdj�ddd�tjtd�|�d|jdj�d�tjtd�|�d|jdj�d�gS)Nz&Enable Background Music Remover Filterr�Tz*Enabling this will remove background musicr`�Modelr�zUVR-MDX-NET-Inst_3r�r�r�r�r�z Segment Sizer�rr��rarbrirdzSave separated files to outputr�r�r�r~)rkrlrmrnrorr�rq)r1r_rr�r�rrr�to_gradio_input�sF���� �����z#BGMSeparationParams.to_gradio_input)NNNN)rrrrsr r�rtr4r�r5r�r�r3r�r~r7rrrrkrurvrwr�rrrrr��sJ ��������� �r�c@s`eZdZUdZeddd�Zeed<eddd�Ze eed<ed d d�Z e ed <ed d dd�Z e ed<eddd�Zeed<eddddd�Zeed<eddd�Zeed<ed d dd�Ze ed<eddd d!�Zeed"<ed#d$d�Ze ed%<ed&ddd'd�Zeed(<edd)d�Ze eed*<eddd+d�Zeed,<ed-dd.d!�Zeed/<eddd0d!�Zeed1<eddd2d!�Zeed3<eddd4d�Ze ed5<edd6d�Ze eed7<ed#d8d�Ze ed9<ed:gd;d�Ze ee e efed<<eddd=d�Z!eed><ed d?d�Z"e ed@<edAdBd�Z#e eedC<edDdEd�Z$e eedF<eddGd�Z%e e edH<edIdJd�Z&e e edK<eddLd�Z'e eedM<eddNd�Z(e eedO<ed&dPd�Z)e eedQ<ed 
ddRd!�Z*e edS<edTddUd!�Z+e edV<ed#dWd�Z,e edX<e-d�dYdZ��Z.e-d<�d[d\��Z/e0  #     ded]e e1d^e e d_e ed`e e dae e dbe e de efdcdd��Z2dS)f� WhisperParamszWhisper parameterszlarge-v2zWhisper model sizer� model_sizeNz)Source language of the file to transcribe�langFz&Translate speech to English end-to-end� is_translate���Beam size for decodingrW� beam_sizeg��7Threshold for average log probability of sampled tokens�log_prob_thresholdg333333�?rPrQ�Threshold for detecting silencerR�no_speech_threshold�float16�"Computation type for transcription� compute_type�"Number of candidates when sampling�best_ofr�Beam search patience factorrZ�patienceT�-Use previous output as prompt for next window�condition_on_previous_textrO�*Temperature threshold for resetting prompt�prompt_reset_on_temperature�Initial prompt for first window�initial_prompt�Temperature for samplingr%g333333@�$Threshold for gzip compression ratio�compression_ratio_threshold�Exponential length penalty�length_penalty�Penalty for repeated tokens�repetition_penalty�%Size of n-grams to prevent repetition�no_repeat_ngram_size�Prefix text for first window�prefix�+Suppress blank outputs at start of sampling�suppress_blank������Token IDs to suppress�suppress_tokens�Maximum initial timestamp�max_initial_timestamp�Extract word-level timestamps�word_timestampsu "'“¿([{-�$Punctuations to merge with next word�prepend_punctuationsu"'.。,,!!??::”)]}、�(Punctuations to merge with previous word�append_punctuations�&Maximum number of new tokens per chunk�max_new_tokens��#Length of audio segments in seconds� chunk_length�@Threshold for skipping silent periods in hallucination detection�hallucination_silence_threshold�#Hotwords/hint phrases for the model�hotwords�,Threshold for language detection probability�language_detection_threshold�)Number of segments for language detection�language_detection_segments��Batch size for processing� batch_sizez)Offload Whisper model after transcriptionr~cCs 
ddlm}||��krdS|S)Nr)�AUTOMATIC_DETECTION)�modules.utils.constantsr��unwrap)r1�vr�rrr� validate_langWs zWhisperParams.validate_langc Csrddl}z!t|t�r|�|�}t|t�std��|WSt|t�r#|WSWdSty8}ztd|����d}~ww)Nrz<Invalid Suppress Tokens. The value must be type of List[int]z>Invalid Suppress Tokens. The value must be type of List[int]: )�ast� isinstancer5� literal_evalrB� ValueError� Exception)r1r�r�r��errr�validate_supress_tokens\s    ���z%WhisperParams.validate_supress_tokensr_� only_advanced� whisper_typer��available_langs�available_compute_typesc CsL|durtjjn|����}g}|sD|tjtd�||�d|j dj �d�tjtd�||�dt �d�tj td�|�d|j dj �d�g7}|tj d |�d |j d j �d d d �tj d|�d|j dj �dd�tj d|�d|j dj �dd�tjd|dur~gd�n||�d|�dd�tj d|�d|j dj �d dd �tj d|�d|j dj �dd�tj d |�d!|j d!j �d"d�tjd#|�d$|j d$j �d d%d&d'd(�tjd)|�d*t�d+d�tjd,|�d-|j d-j �d.d&d/d0d1�tj d2|�d3|j d3j �d4d�g 7}tj d5|�d6|j d6j �d7d�tj d8|�d9|j d9j �d:d�tj d;|�d<|j d<j �d d=d �tjd>|�d?t�d@d�tj dA|�dB|j dBj �dCd�tjdD|�dEdF�dGd�tj dH|�dI|j dIj �dJd�tj dK|�dL|j dLj �dMd�tjdN|�dO|j dOj �dPd�tjdQ|�dR|j dRj �dSd�tj dT|�dUt�d dVd �tj dW|�dX|j dXj �d dYd �tj dZ|�d[t�d\d�tjd]|�d^|j d^j �d_d�tj d`|�dat�dbd�tj dc|�dd|j ddj �d ded �g} tj df|�dg|j dgj �d dhd �g} |tjjk�r�| D]} di| _�q�|tjjk�r | D]} di| _�q|| | 7}|tj tdj�|�dk|j dkj �d�g7}|S)lNr�r�r��Languager�zTranslate to English?r�r�z Beam Sizer�rr�r�zLog Probability Thresholdr�r�rjzNo Speech Thresholdr�r�z Compute Type)r��int8�int16r�r�)rar�rbrdzBest Ofr�r�ZPatiencer�r�zCondition On Previous Textr�r�zPrompt Reset On Temperaturer�r�rer�)rarbrfrgrhrdzInitial Promptr�r�Z Temperaturer%rPrQr�)rarbrfrhrgrdzCompression Ratio Thresholdr�r�zLength Penaltyr�r�zRepetition Penaltyr�r�zNo Repeat N-gram Sizer�r�ZPrefixr�r�zSuppress Blankr�r�zSuppress Tokensr�z[-1]r�zMax Initial Timestampr�r�zWord Timestampsr�r�zPrepend Punctuationsr�r�zAppend Punctuationsr�r�zMax New Tokensr�r�zChunk Length (s)r�r�z%Hallucination Silence Threshold (sec)r�r�ZHotwordsr�r�zLanguage Detection Thresholdr�r�zLanguage Detection Segmentsr�r�z 
Batch Sizer�r�Fr�r~)rrrb�strip�lowerrkr�rmrnrorr�rlrqrpr�ZGRADIO_NONE_STRZGRADIO_NONE_NUMBER_MIN�visibler) r1r_r�r�r�r�r�r��inputsZfaster_whisper_inputsZinsanely_fast_whisper_inputs�input_componentrrrrrjs� � ������ ���� ��  �� ���F��� �� ����� ������� ���[��    ��zWhisperParams.to_gradio_inputs)NTNNNNN)3rrrrsr r�r5r4r�rr�rtr�r3r�r6r�r�r�r�r�r�r�r%r�r�r�r�r�r�r�rrr�r�r�r�r�r�r�r�r�r�r�r~r r�r�r7rrrrrrrr�s� �������&�������   ��������r�c@s�eZdZUdZeed�Zeed<eed�Z eed<ee d�Z e ed<ee d�Z e ed<defdd �Zdefd d �Zed eddfd d��ZdS)�TranscriptionPipelineParamsz!Transcription pipeline parameters)�default_factoryr�vad� diarization�bgm_separationr<cCs*|j��|j��|j��|j��d�}|S)N�rr�r�r�)rrAr�r�r�)r@�datarrrrAKs �z#TranscriptionPipelineParams.to_dictcCs8|j��}|j��}|j��}|j��}||||S)a Convert data class to the list because I have to pass the parameters as a list in the gradio. Related Gradio issue: https://github.com/gradio-app/gradio/issues/2471 See more about Gradio pre-processing: https://www.gradio.app/docs/components )rrDr�r�r�)r@� whisper_list�vad_list�diarization_list� bgm_sep_listrrrrDTs    z#TranscriptionPipelineParams.to_list� pipeline_listcCs�t|�}|dttj��}|ttj�d�}|dttj��}|ttj�d�}|dttj��}|ttj�d�}|dttj��}tt�|�t�|�t�|�t�|�d�S)z=Convert list to the data class again to use it in a function.rNr�) r�lenr�r4rMrxr�r�rK)r�rEr�r�r�r�rrrrK`s�z%TranscriptionPipelineParams.from_listN)rrrrsr r�rr4rMr�rxr�r�r�rrArrD� staticmethodrKrrrrr�Ds   r�)%�faster_whisper.transcriber8�gradiork�torch�typingrrrrr�fastapir�pydanticrr r r � gradio_i18nr r rm�enumr�copyr�yamlr�rrr)r:rMrxr�r�r�rrrr�<module>s,   +C(?D