I’ll continue to add code, scripts, data, links, or other useful items for researchers to this page over time. Code and programs are provided as-is. Always back up your data.

Weibo Data Collection

If you’re interested in using the Weibo API to collect your own data, see the documentation (Chinese).

The script I relied on to collect my data sample is available on the GWU Gelman Library Scholarly Technology Group’s GitHub page here. As that page notes, you will need your own account, API key, etc. from Weibo. Thanks again to Dan Chudnov of the STG for his help on this project.

PDF Extraction Routine

This is a small Python script that I wrote for handling text in PDF files. It’s designed to be run from a Linux command line. It automatically extracts the raw text from every PDF file in the target directory and saves each file’s content to a new, individual .txt file; it can also combine the text of multiple files into a single large text file. I use this to convert media articles in bulk so that I can analyze them using other software.

Please see instructions within the file itself for usage and options. Download by clicking here.
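
For readers who just want a feel for the workflow, here is a minimal sketch of the same idea. It is not the downloadable script itself. It assumes the external pdftotext command (from poppler-utils) is installed, and the function names and the combined_name parameter are hypothetical names chosen for illustration.

```python
import subprocess
from pathlib import Path


def pdftotext_extract(pdf_path):
    """Return the raw text of one PDF.

    Assumption: the external `pdftotext` tool is installed and on PATH.
    Passing "-" as the output file makes it write the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def convert_directory(target_dir, extract=pdftotext_extract, combined_name=None):
    """Write one .txt file per PDF found in target_dir.

    If combined_name is given, also write every file's text into a single
    combined file of that name inside target_dir.
    Returns the list of individual .txt paths that were written.
    """
    target = Path(target_dir)
    written = []
    combined_parts = []

    # Sort so the order of the combined file is predictable.
    for pdf_path in sorted(target.glob("*.pdf")):
        text = extract(pdf_path)
        txt_path = pdf_path.with_suffix(".txt")
        txt_path.write_text(text, encoding="utf-8")
        written.append(txt_path)
        combined_parts.append(text)

    if combined_name is not None:
        combined_path = target / combined_name
        combined_path.write_text("\n".join(combined_parts), encoding="utf-8")

    return written
```

Calling convert_directory("articles", combined_name="all_articles.txt") would, under those assumptions, leave one .txt file next to each PDF in the articles directory plus a combined all_articles.txt. The extract argument is kept separate so that a different PDF library can be swapped in without touching the rest.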

What I Use

In the course of grad school, I’ve found a set of software applications that have been critical to my research work. Below are a few recommendations based on my own experience, presented in the hope of saving others time and effort. Everything’s free unless otherwise noted.

  • Note-taking: Microsoft OneNote
    This excellent cross-platform program lets you take free-form notes. Advantages include an extremely flexible system for manual organization, search with various scopes, and more advanced features such as searching within audio recordings and handwriting recognition on tablets. Absolutely essential.
  • Reference Management: Zotero
    There are many reference management products. In my opinion, Zotero is the easiest for general use. You can use it in the browser or as a regular desktop program. The desktop client syncs using a free account (only if you want to), keeping all of your computers up to date.
  • Text Manipulation: Notepad++
    Working with text data can call for some specialized tools. Have you ever tried to open a file with hundreds of thousands of lines? Wanted to find and replace across multiple documents simultaneously? Notepad++ does that and a lot more while remaining very lightweight.
  • Scripting Language: Python
    For smaller jobs, it just doesn’t get much easier than Python. I’m hardly an expert, but invest a few days with a solid tutorial or online class and you’ll be coding up all sorts of handy things in no time. A great place to start if you’re totally new to programming or computer-aided analysis.