Update the manifest statistics of the L subset of wenetspeech (#731)
csukuangfj authored Dec 4, 2022
1 parent c25c8c6 commit bd7fa22
Showing 1 changed file with 19 additions and 0 deletions.
19 changes: 19 additions & 0 deletions egs/wenetspeech/ASR/local/display_manifest_statistics.py
@@ -33,6 +33,7 @@ def main():
paths = [
"./data/fbank/cuts_S.jsonl.gz",
"./data/fbank/cuts_M.jsonl.gz",
"./data/fbank/cuts_L.jsonl.gz",
"./data/fbank/cuts_DEV.jsonl.gz",
"./data/fbank/cuts_TEST_NET.jsonl.gz",
"./data/fbank/cuts_TEST_MEETING.jsonl.gz",
@@ -48,6 +49,24 @@ def main():
main()

"""
Starting display the statistics for ./data/fbank/cuts_L.jsonl.gz
Cuts count: 43874235
Total duration (hours): 30217.3
Speech duration (hours): 30217.3 (100.0%)
***
Duration statistics (seconds):
mean 2.5
std 1.7
min 0.2
25% 1.4
50% 2.0
75% 3.0
99% 8.4
99.5% 9.1
99.9% 15.4
max 405.1
Starting display the statistics for ./data/fbank/cuts_S.jsonl.gz
Duration statistics (seconds):
mean 2.4
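As a quick consistency check on the figures above, 30217.3 hours spread over 43874235 cuts is about 2.48 seconds per cut, which agrees with the reported mean of 2.5. The sketch below shows one way the fields in such a summary could be computed from a plain list of cut durations. It is not the code the script runs: the helper names `percentile` and `duration_statistics` are hypothetical, and the use of a population standard deviation and linearly interpolated percentiles is an assumption made for illustration.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    # Fractional index of the requested percentile within the sorted data.
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def duration_statistics(durations):
    """Summarize cut durations (in seconds) with the same fields as the listing.

    Assumption: population standard deviation and interpolated percentiles;
    the real script may compute these differently.
    """
    values = sorted(durations)
    stats = {
        "count": len(values),
        "total_hours": sum(values) / 3600.0,
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "min": values[0],
        "max": values[-1],
    }
    for q in (25, 50, 75, 99, 99.5, 99.9):
        stats[f"{q}%"] = percentile(values, q)
    return stats


if __name__ == "__main__":
    # Hypothetical durations in seconds, not taken from wenetspeech.
    example = [0.2, 1.4, 2.0, 3.0, 8.4]
    for name, value in duration_statistics(example).items():
        print(f"{name}\t{value:.1f}")
```

Sorting once and reusing the sorted list keeps each percentile lookup cheap; for a manifest with tens of millions of cuts, a vectorized or streaming approach would likely be preferable to a Python list.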