This repository contains the trained Random Forest (RF) model for predicting the digital literacy of individuals using the best 7-item survey module (referred to as the platform-neutral module) described in the paper "Validated Digital Literacy Measures for Populations with Low Levels of Internet Experiences," The Journal of Engineering in Economic Development (Dev Eng), 2023.
- Paper Authors: Dr. Ayesha Ali, Dr. Agha Ali Raza, and Dr. Ihsan Ayyub Qazi (LUMS)
- R code: The repository provides the trained Random Forest (RF) model for use in the R language
- Note: For any questions or comments, please email ihsan.qazi@lums.edu.pk
The repository contains the following files:
- "DL_model.rmd": R Markdown file for making predictions from the trained model. It contains an example.
- "rf_model.rds": Trained RF model
The 7 items/questions in the survey module and their response options are as follows:
- Are you able to search/google things online? [Response Options: Yes(1); No(0)]
How familiar are you with the following computer and Internet-related items? Please choose a number between 1 and 5, where 1 represents no understanding and 5 represents full understanding of the item:
- Internet [Response Options: 1-5]
- Browser [Response Options: 1-5]
- PDF [Response Options: 1-5]
- Bookmark [Response Options: 1-5]
- URL [Response Options: 1-5]
- Torrent [Response Options: 1-5]
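
The response ranges above can be checked before any prediction is made. The following is a minimal sketch in R, assuming the answers are stored under the variable names listed in the Model Card; the helper `check_responses` is hypothetical and is not part of this repository.

```r
# One respondent's answers, in the order the questions are asked above.
# The names are assumed to match the input names in the Model Card.
responses <- c(search = 1, term_internet = 5, term_browser = 4,
               term_pdf = 3, term_bookmark = 2, term_url = 3,
               term_torrent = 2)

# Hypothetical helper: TRUE when every answer lies in its allowed range
# (0 or 1 for the search question, 1 to 5 for the familiarity items).
check_responses <- function(r) {
  familiarity <- r[names(r) != "search"]
  r[["search"]] %in% c(0, 1) && all(familiarity %in% 1:5)
}

check_responses(responses)
```

A check like this catches out-of-range answers (for example, a familiarity rating of 6) before they reach the model.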
Model Card
- Input: one or more observations, where each observation corresponds to responses to the 7 questions above
- Input Order: (term_pdf, term_internet, term_browser, term_bookmark, term_url, search, term_torrent)
- Example input: (3, 5, 4, 2, 3, 1, 2)
- Output: for each observation the model predicts a digital literacy score between 0 and 1
- Model: A random forest with 100,000 trees, trained with the randomForest package in R. All other hyperparameters were left at their default values.
- Model Performance: R^2 on out-of-bag (OOB) samples was 0.8 and mean squared error (MSE) was 0.019
- Data: The model was trained on a sample of 143 individuals from Pakistan with different levels of digital literacy (please refer to the paper for a detailed description of the model)
- Suitability: The model is best suited for populations comprising a high proportion of individuals with low levels of digital literacy.
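
As a rough illustration of the input format described in the Model Card, the sketch below builds the example input as a one-row data frame and, if the model file is present, passes it to `predict`. It assumes that `rf_model.rds` holds an object fitted with the randomForest package and that the model expects numeric columns with exactly these names; neither assumption is verified here, and `make_input` is a hypothetical helper. The example in "DL_model.rmd" remains the reference usage.

```r
# Column names in the input order given in the Model Card.
feature_names <- c("term_pdf", "term_internet", "term_browser",
                   "term_bookmark", "term_url", "search", "term_torrent")

# Hypothetical helper: turn a flat vector of responses (7 values per
# respondent, in the order above) into a data frame with one row each.
make_input <- function(values) {
  obs <- as.data.frame(matrix(values, ncol = length(feature_names),
                              byrow = TRUE))
  names(obs) <- feature_names
  obs
}

# The example input from the Model Card.
obs <- make_input(c(3, 5, 4, 2, 3, 1, 2))

# Predict only when the model file and the randomForest package are
# available; the column names and types are assumed to match training.
if (file.exists("rf_model.rds") &&
    requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  rf_model <- readRDS("rf_model.rds")
  score <- predict(rf_model, newdata = obs)
  print(score)  # predicted digital literacy score for each row
}
```

Several respondents can be scored at once by passing a longer vector to `make_input`, 7 values per respondent.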