Motivation
I (34 years old) have femoroacetabular impingement. According to Wikipedia, it may affect the hip joint in young and middle-aged adults and occurs when the ball-shaped femoral head rubs abnormally or does not permit a normal range of motion in the acetabular socket.
The only long-term solution is surgery and, like everyone else, I’d like the best doctor to “open” my hip. The problem is that it’s hard to find doctor reviews from real people, even with Google, and when you do find a forum where people talk about their operations, it’s even harder to find specific information.
Because of that, I decided to create a small scraper that searches for every comment in which a doctor is mentioned and saves the results to a JSON or CSV file for reading.
Overview
I have open sourced it (in a very alpha version), which makes it possible to add more resources for other illnesses or syndromes. Right now it only searches for a given Spanish doctor for femoroacetabular impingement.
Are you ready to operate, Doctor? - I’d love to, but first I have to perform surgery.
- Free software: MIT license
- Documentation: https://zoidberg.readthedocs.io.
The goal of Zoidberg is to help people find useful information about a surgeon. The information you should provide to Zoidberg is:
- Country (e.g. ES for Spain)
- Doctor (e.g. Margalet)
- Area (e.g. traumatologia)
- Illness (e.g. femoroacetabular)
- Output (csv or json file)
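To make the five inputs concrete, here is a minimal sketch of how they could be grouped and validated before a search. The SearchRequest class, its field names and the two-letter country check are assumptions made for illustration; they are not taken from the Zoidberg code base.

```python
from dataclasses import dataclass


# Hypothetical container for the five inputs listed above.
# Names are illustrative, not Zoidberg's actual API.
@dataclass(frozen=True)
class SearchRequest:
    country: str   # e.g. "ES" for Spain
    doctor: str    # e.g. "Margalet"
    area: str      # e.g. "traumatologia"
    illness: str   # e.g. "femoroacetabular"
    output: str    # "csv" or "json"

    def __post_init__(self):
        # Only the two output formats mentioned in the post are accepted.
        if self.output not in ("csv", "json"):
            raise ValueError(f"output must be 'csv' or 'json', got {self.output!r}")
        # Assumption: countries are given as two-letter codes such as "ES".
        if len(self.country) != 2 or not self.country.isalpha():
            raise ValueError(f"country must be a two-letter code, got {self.country!r}")


request = SearchRequest("ES", "Margalet", "traumatologia", "femoroacetabular", "json")
```

Failing early on a bad output format or country code keeps the crawl from starting with parameters that cannot produce a result.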
Technology
I used Python 3 (it will probably work in Python 2 as well, although I have not tested it) and Scrapy, an open source scraping framework. I also used the cookiecutter command-line utility to create the Python package. There are plenty of templates for every type of project, such as standard Python (the one I used), Django and others. Another good resource is Jeff Knupp's guide, “Open Sourcing a Python Project the Right Way”.
How to install it
or clone from:
How to use it
From a project:
Or from CLI (Zoidberg will ask you several questions in order to find the doctor reviews):
How it works
The Zoidberg project is organized in the following tree:
- zoidberg.py: the file that holds the Zoidberg class, which runs a CrawlerRunner with the user parameters (country, doctor, area, illness, output and path).
- scraper/: a Scrapy project named scraper.
- scraper/db: holds the country JSON files with the database of Internet forums (domains) and the URLs for each illness.
- scraper/pipelines.py: cleans the information scraped by the spider and writes it to the selected output file.
- scraper/settings.py: default settings of the Scrapy spider, with AUTOTHROTTLE_ENABLED = True for responsible scraping.
- scraper/spiders/: holds the spiders, organized by country. All spiders inherit from ZoidbergSpider.
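As an illustration of the pipeline step described above (clean the scraped comment, then write it to the chosen output file), here is a rough sketch. It is not the code in scraper/pipelines.py: the class name DoctorCommentPipeline and the item fields author, text and url are assumptions. The method names open_spider, process_item and close_spider follow Scrapy's item pipeline convention, but the sketch uses only the standard library so it can be run without Scrapy installed.

```python
import csv
import json


class DoctorCommentPipeline:
    """Hypothetical pipeline: keeps comments that mention a doctor."""

    FIELDS = ["author", "text", "url"]  # assumed item fields

    def __init__(self, doctor, output_path, output_format="json"):
        self.doctor = doctor.lower()
        self.output_path = output_path
        self.output_format = output_format
        self.items = []

    def open_spider(self, spider):
        # Start each crawl with an empty result list.
        self.items = []

    def process_item(self, item, spider):
        # Clean: collapse runs of whitespace and newlines into single spaces.
        text = " ".join(str(item.get("text", "")).split())
        # Keep only comments in which the doctor's name appears.
        # (A real Scrapy pipeline would typically raise DropItem otherwise.)
        if self.doctor in text.lower():
            self.items.append({
                "author": item.get("author", ""),
                "text": text,
                "url": item.get("url", ""),
            })
        return item

    def close_spider(self, spider):
        # Write everything collected during the crawl in the chosen format.
        if self.output_format == "json":
            with open(self.output_path, "w", encoding="utf-8") as fh:
                json.dump(self.items, fh, ensure_ascii=False, indent=2)
        else:
            with open(self.output_path, "w", encoding="utf-8", newline="") as fh:
                writer = csv.DictWriter(fh, fieldnames=self.FIELDS)
                writer.writeheader()
                writer.writerows(self.items)
```

Collecting items in memory and writing once at the end keeps the sketch short; for large crawls, writing rows as they arrive would use less memory.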
Next steps
Lots of work still to do…
- CLI: Change argparse to Click (click.pocoo.org).
- Get a list of countries available.
- Get a list of areas available for a country.
- Get a list of illnesses available for an area.
- Add keywords of illnesses for search.
- Search a doctor for every area or illness.
- Comprehensive tests
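As a possible starting point for the "list of countries available" item, here is a small sketch. It assumes that scraper/db contains one JSON file per country, named after its country code (for example es.json); that file naming is a guess, since the post only says the directory holds the country JSON files. The function name available_countries is also hypothetical.

```python
from pathlib import Path


def available_countries(db_dir):
    """Return sorted upper-case country codes found in db_dir.

    Assumption: each country is stored as '<code>.json', e.g. 'es.json'.
    """
    return sorted(path.stem.upper() for path in Path(db_dir).glob("*.json"))
```

Calling available_countries("scraper/db") would then return something like a list of codes, which the CLI could show to the user before asking for a country.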
Conclusions
Zoidberg is still a child lobster of a scraper that needs a lot of improvement, but it works perfectly for the purpose it was first conceived for: finding a good doctor for my hip arthroscopy.
Margalet is supposed to be the best femoroacetabular specialist in Spain, but after using Zoidberg I realized that lots of people are complaining about his new technique, called “out inside”. I found no complaints about Pérez Carro, my new hip doctor.
If you understand Spanish, you can see the differences Zoidberg found between the doctors: