diskprices.org html to csv converter
Quick and dirty shell script to extract and convert data from diskprices.org.
Note: the data are not clean, and not all fields are correct.
Keys
Affiliate_Link
Capacity
Condition
Date
Form_Factor
Name
Price
Price_per_GB
Price_per_TB
Tech
Technology
Type
Warranty
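To check which of these keys a given snapshot actually contains, the header line can be split on the field delimiter. This is a minimal sketch, assuming the CSV files are semicolon-separated with the keys on the first line; list_keys is a hypothetical helper name, not part of the repository.

```shell
#!/bin/sh
# list_keys FILE...: print the sorted, de-duplicated header fields
# of one or more CSV files.
# Assumption: fields are separated by ';' and line 1 is the header.
list_keys() {
    for f in "$@"; do
        head -n 1 "$f"
    done | tr ';' '\n' | sort -u
}
```

Called as list_keys *.csv, it prints one key per line across all snapshots.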
Date
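# Prepend a Date column in place: "Date" on the header line,
# the file name (without .csv) on every other line.
ls *.csv \
  | sed 's/\.csv$//' \
  | xargs -I%i sh -c 'sed -i -Ee "1s/(.*)/Date;\1/" -e "2,\$s/(.*)/%i;\1/" %i.csv'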
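The pipeline above edits every CSV in place. To try the transformation on a single file without modifying it, the same idea can be sketched with awk writing to stdout. Here add_date is a hypothetical helper, and the file name is assumed to be the snapshot date (for example 2023-01-31.csv).

```shell
#!/bin/sh
# add_date FILE.csv: print FILE.csv to stdout with a leading Date column.
# Assumption: the file name without ".csv" is the value to use as date.
add_date() {
    date_value=$(basename "$1" .csv)
    awk -v d="$date_value" '
        NR == 1 { print "Date;" $0; next }  # header line gets the key
                { print d ";" $0 }          # data lines get the value
    ' "$1"
}
```

For example, add_date 2023-01-31.csv > with_date.csv leaves the original file untouched.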
JSON
Quick and dirty way to convert the CSV files to JSON:
# Convert every CSV except the "full" ones, 8 jobs in parallel.
ls *.csv \
  | grep -v full \
  | sed 's/\.csv$//' \
  | xargs -I%i -P8 sh -c 'cat %i.csv | perl ../to_json.pl > ../us.json/%i.json'
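The to_json.pl script itself is not shown here. As a rough idea of what such a conversion involves, the following is a stand-in sketch in awk, not the repository's Perl script, and its output format may differ. It assumes semicolon-separated input with the keys on line 1, no quoted fields containing ';' and no control characters in the values. csv_to_json is a hypothetical name.

```shell
#!/bin/sh
# csv_to_json: read semicolon-separated CSV on stdin and print a JSON
# array of objects on stdout. Every value is emitted as a JSON string.
# Assumptions: line 1 holds the keys; fields contain no ';' or newlines.
csv_to_json() {
    awk -F';' '
        # Escape backslashes and double quotes for a JSON string.
        function esc(s,    out, i, c) {
            out = ""
            for (i = 1; i <= length(s); i++) {
                c = substr(s, i, 1)
                if (c == "\\" || c == "\"") out = out "\\"
                out = out c
            }
            return out
        }
        # Header line: remember the keys.
        NR == 1 { for (i = 1; i <= NF; i++) key[i] = esc($i); n = NF; next }
        # Data line: emit one object, comma-separated from the previous one.
        {
            printf "%s{", (NR == 2 ? "[" : ",\n ")
            for (i = 1; i <= n; i++)
                printf "%s\"%s\":\"%s\"", (i > 1 ? "," : ""), key[i], esc($i)
            printf "}"
        }
        END { print (NR > 1 ? "]" : "[]") }
    '
}
```

It reads from stdin like the Perl script does in the pipeline above, for example csv_to_json < input.csv > output.json (file names are placeholders).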
Cronjob
A cronjob can also be installed to snapshot the website on the Internet Archive every day, one locale per hour.
# archive diskprices
0 1 * * * ${HOME}/bin/archive.sh https://diskprices.com > /dev/null
0 2 * * * ${HOME}/bin/archive.sh https://diskprices.com/?locale=uk > /dev/null
0 3 * * * ${HOME}/bin/archive.sh https://diskprices.com/?locale=de > /dev/null
0 4 * * * ${HOME}/bin/archive.sh https://diskprices.com/?locale=es > /dev/null
0 5 * * * ${HOME}/bin/archive.sh https://diskprices.com/?locale=fr > /dev/null
0 6 * * * ${HOME}/bin/archive.sh https://diskprices.com/?locale=ca > /dev/null
0 7 * * * ${HOME}/bin/archive.sh https://diskprices.com/?locale=au > /dev/null
0 8 * * * ${HOME}/bin/archive.sh https://diskprices.com/?locale=in > /dev/null
0 9 * * * ${HOME}/bin/archive.sh https://diskprices.com/?locale=se > /dev/null
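Here is the script, written on OpenBSD. On systems without jot, replace the jot call with another random-number command, for example shuf -i 1-3600 -n 1 from GNU coreutils (seq alone does not produce random numbers).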
#!/bin/sh
######################################################################
# dirty way to archive website
######################################################################
# Sleep for a random delay unless SLEEP is set to a non-empty value.
if [ -z "${SLEEP}" ]
then
  sleep $(jot -r 1 3600)
fi
# Ask the Wayback Machine to save the URL given as first argument.
curl -vvL "https://web.archive.org/save/${1}"
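Since jot is not installed everywhere, the random delay can also be computed with awk, which is available on both OpenBSD and Linux. This is a minimal sketch; random_delay is a hypothetical helper name, and the range of 1 to 3600 seconds is an assumption about the intended delay.

```shell
#!/bin/sh
# random_delay MAX: print a pseudo-random integer between 1 and MAX.
# srand() seeds from the time of day, so two calls within the same
# second return the same value; good enough for spreading cron jobs.
random_delay() {
    awk -v max="$1" 'BEGIN { srand(); print int(rand() * max) + 1 }'
}

# Usage inside archive.sh (same opt-out as the original script):
#   [ -z "${SLEEP}" ] && sleep "$(random_delay 3600)"
```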