James Taylor
f706689a56
extract_item_info: Don't extract author, author_id, etc. for channel items
...
Philosophically, a channel doesn't create itself.
2019-12-24 13:11:21 -08:00
James Taylor
3200d66d88
Fix extract_approx_int not working for non-approx ints, make extract_int more robust
...
For example, "354 subscribers" wasn't being extracted correctly by extract_approx_int.
Make extract_approx_int and extract_int only extract integers that are standalone words.
So e.g. 342 will not be extracted from internetuser342.
2019-12-24 13:07:12 -08:00
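The word-only rule described in the commit above could be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the name extract_int_sketch, the regex, and the comma handling are all hypothetical.

```python
import re

# Hypothetical sketch, not the real extract_int.
# \b demands a word boundary on both sides of the digits, so digits that are
# glued to letters (as in "internetuser342") never match.
_INT_RE = re.compile(r'\b(\d[\d,]*)\b')

def extract_int_sketch(text):
    """Return the first standalone integer in text, or None if there is none."""
    match = _INT_RE.search(text)
    if match is None:
        return None
    # Assumption: commas are thousands separators and can be dropped.
    return int(match.group(1).replace(',', ''))

print(extract_int_sketch('354 subscribers'))   # 354
print(extract_int_sketch('internetuser342'))   # None
```

The word boundary is what keeps usernames and similar tokens from being mistaken for counts.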
James Taylor
7a6bcb6128
Rewrite channel extraction with proper error handling and new extraction names. Extract subscriber_count correctly.
...
Don't just shove English strings into info['stats']. Actually give semantic names for the stats.
2019-12-21 15:45:01 -08:00
James Taylor
3936310e7e
Fix extract_approx_int. Fixes incorrect subscriber count on channels.
...
It wasn't working because decimals such as 15.1M weren't considered, so it was extracting "1M" instead of "15.1M".
2019-12-21 15:44:03 -08:00
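The decimal bug described in the commit above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the name extract_approx_int_sketch, the suffix table, and the regex are hypothetical and are not taken from the repository.

```python
import re
from decimal import Decimal

# Hypothetical suffix table; the real code may support other abbreviations.
_SUFFIXES = {'k': 10**3, 'm': 10**6, 'b': 10**9}

# The optional (?:\.\d+)? group is the key part: without it, "15.1M" would be
# read starting from the digit after the decimal point, giving "1M".
_APPROX_RE = re.compile(r'\b(\d+(?:\.\d+)?)([kmb])?\b', re.IGNORECASE)

def extract_approx_int_sketch(text):
    """Return the first abbreviated or plain count in text as an int, or None."""
    match = _APPROX_RE.search(text)
    if match is None:
        return None
    number, suffix = match.groups()
    multiplier = _SUFFIXES[suffix.lower()] if suffix else 1
    # Decimal avoids binary floating-point error in the multiplication.
    return int(Decimal(number) * multiplier)

print(extract_approx_int_sketch('15.1M subscribers'))  # 15100000
print(extract_approx_int_sketch('354 subscribers'))    # 354
```

Making the suffix optional also covers plain counts such as "354 subscribers", the case a later commit in this log addresses.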
James Taylor
a9f67d4630
Fix regression: date extraction broken. Move constants to correct file in yt_data_extract
2019-12-20 18:48:40 -08:00
James Taylor
4a3529df95
Extraction: Move stuff around in files and put underscores in front of internal helper function names
...
Move get_captions_url in watch_extraction to bottom next to other exported, public functions
2019-12-19 20:12:37 -08:00
James Taylor
d1d908d5b1
Extraction: Move html post processing stuff from yt_data_extract to util
2019-12-19 19:48:53 -08:00
James Taylor
76376b29a0
Extraction: Split yt_data_extract.py into multiple files
2019-12-19 19:29:47 -08:00