2006 年 07 月 10 日, 星期一
用oracle来背单词
看英文文档的时候,总是遇到很多生词,那些是重点记住,那些属于无所谓的呢?我的想法是把文档里所有单词入库,只出现一次的,就不用背的,记得记不得就无所谓了,从而过滤一大P单词。
本文使用oracle的sqlldr,utl_file和shell脚本实现,简单,容易实现。
1,把文档单词入库:
1)手工拷贝整个电子文档(比如PDF文档),放入一个文本文件,如a.txt(直接拷贝完接,不用做其他处理)。
2)执行脚本(脚本内容见后文) 命令入库单词:%sh read_word.sh 文本文件名 表名 数据库连接字符串
比如%sh read_word.sh d32643.txt abc bigboar/bigboar@db121
2,统计单词输出:
比如:
以出现频率排序:
select name,count(*) from abc group by name order by count(*) desc;
select name,count(*) from abc group by name order by count(*) asc;
先以字母顺序abcd.....xyz排序,在字母里再以出现频率排序
select name,count(*) from abc group by name order by name asc,count(*) desc;
过滤只出现一次的单词
select name,count(*) from abc group by name having count(*) > 1 order by count(*) desc;
select name,count(*) from abc group by name having count(*) > 1 order by count(*) asc;
select name,count(*) from abc group by name having count(*) > 1 order by name asc,count(*) desc;
由于名词复数,动词过去式等,并不能智能识别,我的感觉是先按字母排序,在字母里按出现频率排序,不对单词进行过滤这样比较好点。
3,把单词输出成文本
SQL>create or replace directory exp_dir as '/tmp';
SQL>grant read, write on directory exp_dir to bigboar;
然后执行下面这个存储过程,把单词输出到文本/tmp/abc.txt文件,执行完了,去看文件就可以了,到此,大功告成,把很熟练的单词删掉,剩下的就是自己可以背的单词了。
create or replace procedure test01
is
fhandle utl_file.file_type;
l_name varchar2(100);
l_cnt int;
cursor cur_name
is select name,count(*) as cnt
from abc
group by name
order by name asc,cnt desc;
begin
fhandle := utl_file.fopen('EXP_DIR', 'abc.txt', 'w');
open cur_name;
loop
fetch cur_name into l_name,l_cnt;
exit when cur_name%notfound;
utl_file.put_line(fhandle ,l_name||' '||l_cnt);
end loop;
utl_file.fclose(fhandle);
close cur_name;
end;
[root@bigboar bigboar]# cat read_word.sh
#!/bin/sh
#
# usage: %sh read_word.sh filename tabname dbconn
#
ffile="$1"
tname=$2
dbconn_str=$3
sed 's/[0-9,%@:.{<()>}_?"''=]/ /g' $ffile > .${ffile}.temp
> .${ffile}.format
for I in `cat .${ffile}.temp`
do
echo $I >> .${ffile}.format
done
lfname=.${ffile}.format
echo "load data" > .${ffile}.ctl
echo "infile '$lfname'" >> .${ffile}.ctl
echo "append into table temp_${tname}" >> .${ffile}.ctl
echo "fields terminated by ''" >> .${ffile}.ctl
echo "(name)" >> .${ffile}.ctl
sqlplus $dbconn_str <
drop table $tname;
create table temp_${tname} (name varchar2(100));
!sqlldr $dbconn_str control=.${ffile}.ctl
commit;
delete from temp_${tname}
where
name like '%[%'
or name like '%]%'
or name like '%-%'
or name like '%?%' escape '?';
create table $tname as select lower(name) name from temp_${tname};
drop table temp_${tname};
EOF
rm .${ffile}.format
rm .${ffile}.ctl
rm .${ffile}.temp
rm .${ffile}
[root@bigboar bigboar]#