Parsing HTML Tables with BS4
I've been trying different methods of scraping data from this site (http://nflcombineresults.com/nflcombinedata.php?year=1999&pos=WR&college=) and can't seem to get any of them to work. I've tried playing with the indices, but can't get that to work either. I think I've tried too many things at this point, so if someone could point me in the right direction I would really appreciate it.
I would like to pull all of the information and export it to a .csv file, but at this point I'm just trying to get the name and position to print to get started.
Here's my code:
    import urllib2
    from bs4 import BeautifulSoup
    import re

    url = ('http://nflcombineresults.com/nflcombinedata.php?year=1999&pos=&college=')
    page = urllib2.urlopen(url).read()

    soup = BeautifulSoup(page)
    table = soup.find('table')

    for row in table.findAll('tr')[0:]:
        col = row.findAll('tr')
        name = col[1].string
        position = col[3].string
        player = (name, position)
        print "|".join(player)
Here's the error I'm getting: line 14, in name = col[1].string IndexError: list index out of range.
--UPDATE--
Ok, I've made a little progress. It now allows me to go from start to finish, but it requires knowing how many rows are in the table. How would I get it to just go through them until the end? Updated Code:
    import urllib2
    from bs4 import BeautifulSoup
    import re

    url = ('http://nflcombineresults.com/nflcombinedata.php?year=1999&pos=&college=')
    page = urllib2.urlopen(url).read()

    soup = BeautifulSoup(page)
    table = soup.find('table')

    for row in table.findAll('tr')[1:250]:
        col = row.findAll('td')
        name = col[1].getText()
        position = col[3].getText()
        player = (name, position)
        print "|".join(player)
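One way to walk every row without knowing the row count is to drop the upper slice bound and skip any row that has too few cells. The sketch below is an illustration under stated assumptions, not a tested answer for this particular site: it is written for Python 3 (the code above is Python 2), it assumes each row has already been reduced to a list of cell strings (for example with [td.getText() for td in row.findAll('td')]), and the helper names name_and_position and to_csv as well as the sample rows are made up. The column indices 1 and 3 are the ones used in the question.

```python
import csv
import io


def name_and_position(rows, name_idx=1, pos_idx=3):
    """Yield (name, position) for every row that has enough cells."""
    needed = max(name_idx, pos_idx) + 1
    for cells in rows:
        # A row with too few cells (e.g. a header row that yields no <td>
        # cells) is skipped instead of raising IndexError.
        if len(cells) < needed:
            continue
        yield cells[name_idx].strip(), cells[pos_idx].strip()


def to_csv(pairs):
    """Serialize (name, position) pairs to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "position"])
    writer.writerows(pairs)
    return buf.getvalue()


# Made-up sample rows standing in for the scraped table.
sample_rows = [
    [],                                        # header row: no <td> cells
    ["1999", "Player A", "College X", "WR"],
    ["1999", "Player B", "College Y", "QB"],
]

csv_text = to_csv(name_and_position(sample_rows))
print(csv_text)
```

Because the generator consumes whatever rows it is given, the same loop works for a table of any length; writing to a real file instead of a string would only need the StringIO buffer swapped for an opened file.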
Source: https://stackoverflow.com/questions/22078620
Accepted answer
In Audacity you have to check the 'Use custom mix' radio button in the Import/Export section of the preferences. This will let you export multi-channel files, and manually assign tracks to channels.
Other than that, plain old .wav works fine for this.
But you can also use SoX to create the files in a more automated manner.
Manually you can combine (or 'merge' as it's referred to in the documentation) five distinct files into a single five-channel file like this:
sox -M chan1.wav chan2.wav chan3.wav chan4.wav chan5.wav multi.wav
To automate the process I put together a short Bash routine for producing a multichannel file with staggered test tones:
    NUM=5    # Number of channels
    LEN=2    # Length of each test tone, in seconds
    OVL=0.5  # Overlap between test tones, in seconds

    # A one-channel base file containing simple white noise.
    # faded at both end with a quarter wave envelope to ensure
    # smooth equal power transitions
    sox -n -b 24 -c 1 out.wav synth $LEN whitenoise fade q $OVL -0 $OVL

    # Instead of white noise you can for example make a 1kHz tone
    # like this:
    # sox -n -b 24 -c 1 out.wav synth $LEN sine 1k fade q $OVL -0 $OVL
    # Or a sweep from 10Hz to 10kHz like this:
    # sox -n -b 24 -c 1 out.wav synth $LEN sine 10-10k fade q $OVL -0 $OVL

    # Produces a sequence of the number of seconds each channel
    # shall be padded with
    SEQ=$(for ((i=1; i<=NUM; i++))
        do
            echo "$i 1 - [$LEN $OVL -]x * p" | dc  # reverse-Polish arithmetic
        done)
    echo $SEQ

    # Padding the base file to various degrees and saving them separately
    for j in $SEQ
    do
        sox -c 1 out.wav outpad${j}.wav pad $j
    done

    # Finding the just-produced individual files
    FIL=$(ls | grep ^outpad)

    # Merging the individual files into a single multi-channel file
    sox -M $FIL multi.wav
    rm $FIL  # removing the individual files

    # Producing a multi-channel waveform plot
    ffmpeg -i multi.wav -y -filter_complex "showwavespic=s=2400x900:split_channels=1" -frames:v 1 waveform.png

    # displaying the waveform plot
    open waveform.png
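The dc expression in the script is the least obvious step: for channel i it evaluates (i - 1) * (LEN - OVL), the number of seconds of silence to pad in front of that channel. As a rough illustration only, the same arithmetic can be written in Python; the function name pad_offsets is made up here and is not part of SoX or the original script.

```python
def pad_offsets(num_channels, tone_len, overlap):
    """Return the leading pad, in seconds, for each channel.

    Channel i (1-based) is delayed by (i - 1) * (tone_len - overlap),
    which is the same value the dc expression in the script computes.
    """
    step = tone_len - overlap
    return [i * step for i in range(num_channels)]


# Same values as NUM=5, LEN=2, OVL=0.5 in the script.
offsets = pad_offsets(5, 2, 0.5)
print(offsets)
```

With those values each channel starts 1.5 seconds after the previous one, so consecutive test tones overlap by the 0.5 seconds set in OVL.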
As the waveform plot shows, the result is a single file with five channels, each with the same content, just shifted in time.
More on reverse-Polish arithmetic using dc: http://wiki.bash-hackers.org/howto/calculate-dc
More on displaying waveforms using ffmpeg: https://trac.ffmpeg.org/wiki/Waveform