Rough Guide to rural data collection with ODK
This post has three purposes, which I think overlap sufficiently to combine them:
- A User Guide for the system that we developed for UNICEF, IDS and RuralNet Zambia
- A Developers' Guide for anyone wishing to build something similar
- Notes on lessons learned that may assist future implementers
Project goals
Automate the data entry part of a long paper-based survey by replacing the paper forms with electronic devices.
Hardware and application selection
The survey has several long and complex questions, and long sets of multiple-choice answers. The data collection needs to be done in dusty rural Zambia, and the devices might need to be used for a full day without power. Collected data should be sent wirelessly to a secure data repository at some time after collection.
Text entry is required for many fields. That means either a real keyboard with keys, or a sufficiently large touch screen to type comfortably on. Use of the device camera, and presentation of reports and graphs on the same device, might be required in future.
Two possible hardware platforms were identified:
- Tablet laptops with touch screens
- Tablet mobile devices (iPad or Android tablet)
The available software options that we identified were:
- EpiSurveyor (Java J2ME, partly closed source, we have used before and fixed bugs)
- OpenXdata (Java J2ME, open source, developed and supported by an Aptivate alumnus among others)
- Open Data Kit (ODK) (Android, open source, active community)
- Bespoke online/offline survey in HTML5
We chose ODK over a bespoke system because of the limited time available for development, and the ability to easily take photos and record GPS coordinates using the device's hardware.
Of the available Android tablet devices, we chose the Samsung Galaxy Tab for the pilot project, due to its high quality construction. For future projects we would probably use a lower cost device; see the lessons learned for details.
Form creation
Since the survey is quite long (about 230 questions), we wanted an easy way to enter the questions. The ODK application requires the form to be in XForms format. We identified the following tools for creating XForms:
- More visually appealing
- All available options presented visually (types of controls, groups, etc.)
- Less likely to make a mistake and produce an invalid form
- Cumbersome user interface slows down data entry
You can download the conversion tools, and the Excel spreadsheet with the completed questionnaire as we delivered it to RuralNet, here. RuralNet staff, please use the latest version of the spreadsheet that you can find locally. To use the tools, you will need to download and install Python 2.7 and Java (JRE). Then download the tools as a ZIP file and extract it somewhere. I recommend that you keep the master copy of the spreadsheet in Dropbox to ensure that it's backed up, and it's always clear what the latest version is.
For help in building surveys using XLS2XForm, please see the documentation. In addition to the question types listed there, we have used the following shortcuts, which also work in this customised version of XLS2XForm:
- text is short for add text prompt (a text field, such as a person's name)
- note is short for add note prompt (a read-only field, providing additional information for the user)
- time is a time field without a date (for example, survey start and end times)
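As an illustration of how such shortcuts can be expanded, here is a minimal sketch. It is not the code from the customised XLS2XForm; the function name `expand_question_type` and the lookup table are hypothetical, and only the two shortcuts that have a long form in the list above are included.

```python
# Shorthand question types from the list above, mapped to their long forms.
# "time" has no long form in the list, so it is passed through unchanged.
SHORTCUTS = {
    "text": "add text prompt",
    "note": "add note prompt",
}


def expand_question_type(cell):
    """Return the long-form question type for one spreadsheet cell (hypothetical helper)."""
    cleaned = cell.strip()
    # Look the shortcut up case-insensitively; anything unknown is returned as typed.
    return SHORTCUTS.get(cleaned.lower(), cleaned)
```

For example, `expand_question_type("note")` gives the long form, while a full question type typed out by hand is left alone.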
Run the build_and_validate.py script by double-clicking on it. If it works, it will show the message "Success!"; otherwise it will show an error message, usually caused by a mistake in the Excel spreadsheet. If it works, it will create (or replace) the file called zambia-ranq-round3.xml in the same directory. If your spreadsheet has a different name, you can create a shortcut to call build_and_validate_custom.py with the name of the spreadsheet on the command line.
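If you would rather script this than create a shortcut, something like the following sketch might do. It assumes that build_and_validate_custom.py takes the spreadsheet name as its only argument and that it exits with a non-zero status on failure; neither is confirmed here. The spreadsheet name in the usage note is made up.

```python
import subprocess
import sys


def build_command(spreadsheet, script="build_and_validate_custom.py"):
    """Return the argument list that runs the build script on one spreadsheet."""
    # sys.executable is the Python interpreter running this sketch.
    return [sys.executable, script, spreadsheet]


def run_build(spreadsheet):
    """Run the build script; True means it exited with status 0 (an assumption)."""
    return subprocess.call(build_command(spreadsheet)) == 0
```

Calling `run_build("my-survey.xls")` from the directory that contains the tools would then be equivalent to the shortcut.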
Software components
ODK Aggregate is the software that powers the Internet server. It is a repository for blank forms (designs) and completed forms (data). Our server is located at http://partimob.appspot.com/. This server is currently paid for by us, and will need to transfer to RuralNet at some point.
ODK Collect is the application that runs on the device, and users interact with it to complete the survey. It's essentially a user interface for XForms. It can download blank forms (designs) from an ODK Aggregate server, and upload completed forms (data) to the Aggregate server as well.
ODK Briefcase is the software that downloads completed forms (data) from the Aggregate server and converts them into CSV (spreadsheet) format, which can be loaded into
Customised ODK Collect
We are using a custom version of ODK Collect. You can download the source code for it here, or the compiled application here. You can also find it in the ZIP file download. If you prefer, you can use the latest official version of ODK Collect. The two are compatible, but our version adds the following useful features:
- Use supplied login and password by default to save a round trip and a prompt.
- Add keyboard navigation, useful for form filling on android-x86 because the mouse interface is pretty clunky.
- Restore ability to modify completed and submitted forms on the device, which was removed from the official version in 1.1.7.
- Improved error messages and progress indication during form uploads.
- Allow setting the instance name on the first page of the survey.
- Allow saving incomplete surveys on required questions (in case a survey is interrupted; almost all of our questions are required).
There are several ways to install ODK Collect on the device:
- Download it from the Android Market (official version only, not our customised version)
- Copy the APK file onto a microSD card, insert the card into the device, and use the My Files application to find and open it from the SD card.
- Attach the USB cable from the device to a computer, enable mass storage mode on the device, and on the computer, drag and drop the APK file onto the device's internal memory, then use the My Files application to find and open it.
- Attach the USB cable from the device to a computer, and use ADB's install command to install the APK file.
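The ADB method can be scripted when there are many tablets to set up. This is a rough sketch that assumes the `adb` tool from the Android SDK is on the PATH and that the device is attached with USB debugging enabled. The APK file name in the usage note is a placeholder, not the real name of our build.

```python
import subprocess


def adb_install_command(apk_path, reinstall=True):
    """Build the adb command line that installs one APK file."""
    cmd = ["adb", "install"]
    if reinstall:
        # -r replaces an existing installation and keeps its data.
        cmd.append("-r")
    cmd.append(apk_path)
    return cmd


def install_apk(apk_path):
    """Run adb install; True means adb exited with status 0."""
    return subprocess.call(adb_install_command(apk_path)) == 0
```

For example, `install_apk("collect.apk")` would install (or reinstall) that file on the attached device.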
It's also useful to remove all the other junk from the desktop. For each icon and widget on the desktop, press and hold it with your finger for a few seconds, until the trashcan icon appears, then drag your finger to the trashcan and release it there.
Form management on the device
There are several ways to put blank forms (designs) onto the tablets:
- Download them from the ODK Aggregate server using ODK Collect.
- Copy them onto a microSD card, insert the card into the device, and use the My Files application to copy them from the SD card to the /sdcard/odk/forms directory.
- Attach the USB cable from the device to a computer, enable mass storage mode on the device, and on the computer, drag and drop the form into the /sdcard/odk/forms directory.
- Attach the USB cable from the device to a computer, and use ADB or DDMS to push the file onto the device, into the /sdcard/odk/forms directory.
Similarly there are several ways to copy completed forms (data) off the device:
- Upload them to the ODK Aggregate server using ODK Collect.
- Use the My Files application to copy them from /sdcard/odk/instances to a microSD card, then remove the card and connect it to the computer, and drop the files into the ODK Briefcase data directory.
- Attach the USB cable from the device to a computer, enable mass storage mode on the device, and on the computer, drag and drop the files from the /sdcard/odk/instances directory to the ODK Briefcase data directory.
- Attach the USB cable from the device to a computer, and use ADB or DDMS to pull the file from the device's /sdcard/odk/instances directory to the ODK Briefcase data directory.
Since the Aggregate server is on the Internet, uploading or downloading forms through it requires that the device have Internet access. So it needs either a valid SIM card installed with credit and a data bundle, or a connection to a WiFi network. We had many problems with using SIM cards for data, so WiFi is preferred if possible.
The directories mentioned above will not exist until ODK Collect is installed on the device and run for the first time. Forms downloaded from the Aggregate server will also be placed in the /sdcard/odk/forms directory. Forms completed on the device will be placed in the /sdcard/odk/instances directory.
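The ADB variants of both transfers can be sketched as follows. The two device directories are the ones named above. The helper names are hypothetical, `adb` is assumed to be on the PATH, and exactly where pulled instances belong inside the Briefcase data directory is not covered here.

```python
import os
import posixpath
import subprocess

# Device directories created by ODK Collect on first run (see above).
FORMS_DIR = "/sdcard/odk/forms"
INSTANCES_DIR = "/sdcard/odk/instances"


def push_form_command(local_form):
    """Build the adb command that copies one blank form onto the device."""
    name = os.path.basename(local_form)
    # Device paths always use forward slashes, whatever the computer's OS.
    return ["adb", "push", local_form, posixpath.join(FORMS_DIR, name)]


def pull_instances_command(local_dir):
    """Build the adb command that copies all completed forms off the device."""
    return ["adb", "pull", INSTANCES_DIR, local_dir]


def run(cmd):
    """Run one adb command; True means adb exited with status 0."""
    return subprocess.call(cmd) == 0
```

For example, `run(push_form_command("zambia-ranq-round3.xml"))` pushes the form built earlier, and `run(pull_instances_command("instances-backup"))` copies the completed forms into a local directory of your choosing.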
Configuring ODK Collect
Collect needs to know the details of the ODK Aggregate server to log into it, download blank forms and upload completed forms.
Open the ODK Collect application, press the Settings button and click on Change Settings. Click on URL and enter https://partimob.appspot.com. Similarly, complete the Username and Password using the details that you've been given by the Aggregate server operator, or the account that you've created on the Aggregate server. This account should only have Data Collector permissions, no more. Press the Back key to get back to the main menu of ODK Collect.
Downloading forms using ODK Collect
Open ODK Collect on the device, and click on the Get Blank Form button. Collect will try to log into the Aggregate server using the details that you've provided, and get a list of forms on the server that have the Downloadable box ticked. This is on by default for newly uploaded forms.
Tick the box next to all the forms that you want to download, and click on the Get Selected button.
Filling forms on the device
Open ODK Collect on the device, and click on the Fill Blank Form button. All the forms in the device's /sdcard/odk/forms directory should be listed. Choose the form that you want to complete.
You will see an introductory screen showing how to move between questions by swiping your finger across the screen, from right to left or left to right. This screen has a text box at the bottom, which you can use to name the form that you're completing. Naming forms is useful if your data collection is interrupted and you need to resume it later. It's much easier to identify the form using its name, rather than opening it and flicking through to find some identifying information. You might name the form based on the household code that you're surveying.
Depending on your answers to some questions, others may be hidden, or their text might change.
At the end of the form there is another chance to Name this form, and a tickbox to Mark form as finalized. Before you can upload the form to the Aggregate server, this box must be ticked, and you must press the Save Form and Exit button. Otherwise Collect will consider that the form is incomplete.
Sending completed forms to Aggregate
Open ODK Collect on the device, and click on the Send Finalized Form button on the main menu. Tick the box next to all the forms that you want to upload to Aggregate, and click on Send Selected. After the upload is complete, you should see the Upload Results message. Every form should have "Success" next to it, otherwise it was not sent successfully.
Downloading forms using Briefcase
We are using a customised version of ODK Briefcase with the following changes:
- Fix the export of repeated groups, which before only worked for the first row (issue 461).
- Shorten exported column names, to allow the CSV file to be imported into Access.
- Allow the server name, username and password to be provided on the command line (or via a shortcut).
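To give an idea of what shortening column names involves, here is an illustrative sketch. It is not the code from our customised Briefcase. It relies on the Access limit of 64 characters for a field name, and on my assumption that the end of a long exported name is the most useful part to keep.

```python
ACCESS_MAX_FIELD_NAME = 64  # Access field names may be at most 64 characters


def shorten_column_names(names, limit=ACCESS_MAX_FIELD_NAME):
    """Shorten each name to at most `limit` characters, keeping names unique."""
    seen = set()
    result = []
    for name in names:
        # Keep the tail of the name (assumption: it holds the question name).
        short = name[-limit:]
        candidate = short
        counter = 1
        while candidate in seen:
            # On a clash, trim further and add a numeric suffix.
            suffix = "_{0}".format(counter)
            candidate = short[:limit - len(suffix)] + suffix
            counter += 1
        seen.add(candidate)
        result.append(candidate)
    return result
```

Names that already fit are returned unchanged, so the function can be applied to a whole header row.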
To download the completed forms, open Briefcase by double-clicking on the briefcase-1.0-jar-with-dependencies.jar file. On the Transfer tab, click on the Connect button. For the URL, enter https://partimob.appspot.com, and for the user name and password, give the details of an ODK Aggregate account with Data Viewer permissions.
Then you should see a list of forms appear under the heading Forms to Transfer. Tick the box next to the one that your users have been completing, and then click on the Transfer button. If you do this after all the completed forms (data) have been submitted to the ODK Aggregate server, you will not need to do it again for that form template (design).
Now switch to the Transform tab and see if the form appears in the Form list. If it doesn't, then exit and restart the Briefcase application (issue 464).
For Output Type, choose .csv and media files. For Output Directory, choose the directory where you'd like to save the CSV files. Note that any previous files exported to that directory from the same form will be overwritten without warning, even if they have been modified (cleaned). Click on the Output button to write the CSV files.
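Because the export overwrites without warning, it may be worth copying the output directory somewhere safe before clicking on Output. The sketch below is one way to do that. The function name and the naming scheme for the backup directory are my own invention, not part of Briefcase.

```python
import os
import shutil
import time


def backup_directory(output_dir, stamp=None):
    """Copy output_dir to a timestamped sibling directory and return its path."""
    if not os.path.isdir(output_dir):
        return None  # nothing has been exported yet, so nothing to protect
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    backup_dir = "{0}-backup-{1}".format(output_dir.rstrip("/\\"), stamp)
    # copytree raises an error rather than overwriting an existing backup.
    shutil.copytree(output_dir, backup_dir)
    return backup_dir
```

Run it on the Output Directory before each export; any cleaned files then survive in the backup copy.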
Cleaning data in Excel
You can find the Excel spreadsheet that we use for data storage and cleaning here. Note that Excel is a long way from the best way to store and manipulate data like this. Microsoft Access would be far more appropriate. Yet again I wish there was a sufficiently powerful open source alternative desktop application to Access, allowing ordinary people (not developers) to develop and maintain their databases themselves.
Because the spreadsheet contains cleaned data, which is "better" than the raw data which is included in the CSV export, we don't want to overwrite existing rows. For the main section of the questionnaire (the so-called Single Responses) you can include only the new data like this:
- Open the main spreadsheet and switch to the Single Responses tab
- Highlight all rows from 3 down to the bottom, and Sort them by the SubmissionDate column.
- Note the last submission date on this spreadsheet.
- Open the newly exported CSV file for the single responses (something like RANQ-2011-Round-4-v5.csv).
- Sort this file by the SubmissionDate column as well.
- Highlight and copy all the rows whose submission date is later (more recent) than the last one in the main spreadsheet.
- Paste them at the bottom of the Single Responses tab of the main spreadsheet, below the other data.
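The steps above amount to keeping only the rows whose SubmissionDate is later than the last one already in the main spreadsheet. Here is a minimal sketch of that filter. The date format is an assumption (check what your export actually contains and adjust DATE_FORMAT), and the function names are hypothetical.

```python
import csv
from datetime import datetime

# Assumption: adjust this to match the dates in your exported CSV file.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_date(text, date_format=DATE_FORMAT):
    """Turn one SubmissionDate cell into a datetime."""
    return datetime.strptime(text.strip(), date_format)


def rows_newer_than(rows, last_date, column="SubmissionDate", date_format=DATE_FORMAT):
    """Return the rows submitted after last_date, oldest first."""
    newer = [row for row in rows if parse_date(row[column], date_format) > last_date]
    return sorted(newer, key=lambda row: parse_date(row[column], date_format))


def new_rows_from_csv(path, last_date):
    """Read an exported CSV file and return only the rows not yet in the main sheet."""
    with open(path, newline="") as handle:
        return rows_newer_than(list(csv.DictReader(handle)), last_date)
```

The rows returned are the ones to paste below the existing data; rows already in the main spreadsheet, which may have been cleaned, are never touched.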
You can then check and clean the data by viewing and modifying it in Excel. Note that each sheet has one or two columns at the end, which are filled by formulae that look up values from the Single Responses sheet, such as the Household Code.
Using the Android x86 Emulator
To be written.
Lessons learned
The actual aim of the project was never clear, because all the stakeholders wanted different things. But if it was to help our partners work more efficiently, then we could have attacked other parts of the process that would have yielded bigger improvements more quickly, such as helping with the output processing (data analysis and report writing) rather than data collection. If the project had incorporated a systems analyst early on, we would have identified and addressed these needs better.
Nobody wanted to be the product owner, to take responsibility for setting priorities for the project. We normally refuse to do this ourselves, because we see our role as assisting someone else to achieve their goals, and that person will need to maintain ownership of the project after our work is done. But in this case we had no choice but to become the product owners, because we couldn't function without one.
We found that workflow elicitation was useful for generating user stories in the agile sense, but we did not have a clear map of the workflow during the development process.
During development we collectively discounted the need for data cleaning, because we thought, incorrectly, that all the errors were introduced during transcription. Data cleaning might have been less necessary if we had collected more data, since the errors would tend to average out. But it was still essential and not planned for, and there was no workflow to make it happen, so we had to hack something together in the field.
We had difficulty actually purchasing equipment in country. Samsung Zambia could not accept credit card payments over the phone. In the end, we had to bring large amounts of cash into Zambia and change it locally. This resulted in late and risky procurement of the equipment.
We received disappointing service from the retailer of the tablets, including repeated failure to deliver the tablets to their shop for purchase despite prior agreements and assurances, and supplying us with devices in unsealed boxes. We suspect that some of the tablets had been used as demonstration models in shops, or returned by other customers.
The supplier's warranty was only one week, after which we would have to return the tablets to Samsung in China.
In future we would:
- Ensure equipment is new (in sealed boxes)
- Have a trusted in-country agent pick up and test the equipment -- this agent must take responsibility for functioning goods
- Ensure that equipment is in warranty, and if it fails tests it must be returned
We expected that Samsung would supply high quality, reliable tablets at a high price. We were disappointed with their reliability and performance. We could have spent much less money on the tablets for a similar level of performance, and had more spares.
We had only a limited number of hardware devices, and our user experience developer did not even have access to an Android phone or an emulator.
Two tablets had hardware problems: short battery life and failure to connect to the mobile network. These were discovered in the field, too late to return the devices to the supplier for replacement.
We had reports from enumerators that the touch screen became less sensitive as the battery drained.
Most tablets failed to submit data over the wireless network at all. We had many problems with data communications in Mufulira, which we expected, but we thought we would at least be able to submit small amounts of data wirelessly, and this turned out not to work.
The survey instrument itself was unclear, complex, and not designed for electronic use. The wording of the questions was unclear and we had many arguments over it. Some questions would have benefited from having a calculator embedded, to help with subtraction and conversion of units, such as calculating change in land area owned by a household.
We had many problems with the form logic, in particular skipping questions depending on the answers to previous questions. This was very difficult to get right, and took us a long time, especially when using XLS2XForm to input the survey. It would have been easier if we could have visualised the flows through the form. We suspect that we did not have time to provide enough training for the local staff to be comfortable maintaining the survey in the XLS2XForm spreadsheet by themselves.
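Even a plain-text listing of which questions depend on which would have helped us here. The sketch below shows the idea. It assumes that skip conditions refer to earlier questions as `${name}`, as current XLSForm relevance expressions do; the customised XLS2XForm that we used may write them differently. The question names in the usage note are invented.

```python
import re

# Matches ${question_name} references inside a skip (relevance) condition.
REFERENCE = re.compile(r"\$\{([^}]+)\}")


def skip_dependencies(questions):
    """Map each question name to the questions its skip condition refers to.

    `questions` is a list of (name, condition) pairs; condition may be empty.
    """
    return {name: REFERENCE.findall(condition or "") for name, condition in questions}


def describe_flow(questions):
    """Return one human-readable line per question, in form order."""
    lines = []
    for name, condition in questions:
        if condition:
            lines.append("{0} shown only if: {1}".format(name, condition))
        else:
            lines.append("{0} always shown".format(name))
    return lines
```

Given `[("owns_land", ""), ("land_area", "${owns_land} = 'yes'")]`, the first function reports that land_area depends on owns_land, which is the kind of overview that is hard to get from the spreadsheet alone.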
We had problems with ODK Collect crashing many times while enumerators were entering data. This always resulted in loss of the completed survey, unlike paper forms. We debugged and patched several bugs in Collect in the field. This required us to have an Android and ODK development environment already set up, because there was no way to download that software in Mufulira. At least we were able to fix the bugs, as this was open source software; some other products would have left us powerless.
Enumerators reported that Collect would sometimes register a different response than intended. Perhaps touching the screen in the wrong place, between questions, or an unclear/lazy touch, might activate the wrong answer. I noticed several times that Collect did not register an answer at all, and I found myself repeatedly jabbing at the screen in frustration. I put this down to hardware problems with the tablets.
For some questions, a grid with three columns (Question, Yes and No checkboxes) would have been a faster way to enter data than repeatedly touching an answer and then dragging left to flip pages. This was not a component that we had available to us in Collect.
We had intended to load the survey output into Microsoft Access for data processing and storage for future use. However we were not able to make the data load into Access successfully. ODK Briefcase outputs CSV files with field names (column headings) too long for Access to import. Importing 20 separate CSV files into Access was very painful. I strongly recommend building much better integration between ODK and Access in future. We have an open question as to whether Access is the best data storage and management solution, but it would fit the needs and fit within the comfort zone of the project members who would be carrying this process forward in future.
We identified the need to check that we could import multiple spreadsheets into SPSS during the development phase, but we ran out of time and did not actually complete this task. When we got around to processing the data, we discovered that it was not possible, which seriously disrupted our assumptions and plans for the data analysis.
In addition, our backup plan to process the data and produce graphs using Google Fusion Tables also failed, because Aggregate was unable to export the data successfully to Fusion Tables, and editing (cleaning) the data online turned out to be too difficult (complex, inconvenient and awkward).
All the enumerators reported that they would prefer to use tablets rather than paper forms in future.
We also realised that interviewing people can create social change by encouraging them to think in new directions, and question previous assumptions about how things worked. Being listened to is also empowering.
One of our enumerators took photos of the people he interviewed, using the tablet camera, without even being asked! Well done that man. I think the photos were one of the most valuable outputs from the project, more than the data collected. Politicians, like all of us, can relate to photos; they tell a much more powerful and personal story than numbers.
I would really like to see technology used to empower local people to reach out to their politicians and hold them to account.