Juan M. Banda

About Me

Main information about me

Hello, my name is Juan M. Banda and I am currently an assistant professor of computer science at Georgia State University. In my research lab, Panacea Lab, we aim to build machine learning, computer vision, and NLP methods that help generate insights from multi-modal, large-scale data sources. With applications to precision medicine, medical informatics, astroinformatics, and other domains, our work addresses domain-specific problems with data science methods and practices.

As an engineer at heart and in practice for the last 20 years, I have used Python, Bash, ontologies, and NLP tools to build pipelines to annotate over 68 million clinical notes. I have built custom ETLs to map over 8 million patient electronic health records, from 4 institutions, to common data models (OMOP) for large-scale analytics and machine learning purposes. I have designed pipelines, databases, and processes to build research infrastructure for my current and previous labs. I have used R, SQL, MATLAB, Perl, Java, JavaScript, and other languages to acquire, clean, and operationalize data from multiple sources. I have mined over 9 billion tweets for NLP tasks. In my earlier days, I built content-based image retrieval systems for NASA’s SDO mission, with the capacity to process and index over 40,000 images daily and to provide computer vision-aided similarity search for images. I started my engineering days designing and developing point-of-sale systems written in Visual Basic.

Apart from my technical skills, I have strong communication and writing skills (over 70 refereed publications) and management skills (I have managed over 40 employees and 23 students). Driven by the desire to improve patient outcomes and medical care, and to build things that change people’s lives, I am committed to releasing all my work under open-source licenses, following the FAIR data sharing principles.

I am an active collaborator in the Observational Health Data Sciences and Informatics (OHDSI) community, and my work has been funded by the Department of Veterans Affairs and the National Institute on Aging, as well as NASA, NSF, and NIH.

Technical Skills

Some of the skills here are grouped by areas for brevity. As a generalist, I have experience with many programming languages and tools, from things like COBOL to Business Objects.

Databases (MySQL, SQL Server, PostgreSQL, Oracle, MongoDB) - 17 years of experience (95%)
Data Science (NLP pipelines, Visualization, Reporting) - 16 years of experience (92%)
Data Engineering (Custom ETLs, Data Pipelines, Bash, Pandas, SQL) - 14 years of experience (80%)
Machine Learning (TensorFlow, Scikit-learn, Theano, caret, WEKA) - 12 years of experience (80%)
Scientific Programming Languages (R, MATLAB, Maple) - 12 years of experience (75%)
General Programming Languages (Python, Java, Visual Basic) - 12 years of experience (75%)
Cloud Environments (AWS, Azure, BigQuery) - 8 years of experience (65%)


List of peer-reviewed publications:

  2021


  1. A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research - An International Collaboration, JM Banda, R Tekumalla, G Wang, J Yu, T Liu, Y Ding, E Artemova, E Tutubalina, G Chowell, Epidemiologia, 2021, 2, pp. 315-324
  2. A Biomedically oriented automatically annotated Twitter COVID-19 Dataset, LAR Hernandez, TJ Callahan, and JM Banda, Genomics & Informatics, 2021;19(3):e21
  3. Negative Perception of the COVID-19 Pandemic Is Dropping: Evidence From Twitter Posts, AN Vargas, A Maier, MBR Vallim, JM Banda, VM Preciado, Front. Psychol. 12 (2021) 4067
  4. Pulse of the Pandemic: Iterative Topic Filtering for Clinical Information Extraction from Social Media, J Wu, V Sivaraman, D Kumar, JM Banda, D Sontag, J. Biomed. Inform. (2021) 103844
  5. Transmission dynamics and forecasts of the COVID-19 pandemic in Mexico, March-December 2020, A Tariq, JM Banda, P Skums, S Dahal, C Castillo-Garsow, B Espinoza, et al., PLoS ONE, 16(7): e0254826, 2021
  6. Changes in Public Response Associated With Various COVID-19 Restrictions in Ontario, Canada: Observational Infoveillance Study Using Social Media Time Series Data, A Chum, A Nielsen, Z Bellows, E Farrell, P Durette, JM Banda, G Cupchik, J Med Internet Res, 2021;23(8):e28716.
  7. Characterizing all-cause excess mortality patterns during COVID-19 pandemic in Mexico, S Dahal, JM Banda, AI Bento, et al. BMC Infect Dis. 21, 432 (2021)
  8. ACE: the Advanced Cohort Engine for searching longitudinal patient records, A Callahan, V Polony, JD Posada, JM Banda, S Gombar, NH Shah, Journal of the American Medical Informatics Association, Volume 28, Issue 7, July 2021, Pages 1468-1479.
  9. A Minimal Information Model for Potential Drug-Drug Interactions, H Hochheiser, X Jing, EA Garcia, S Ayvaz, R Sahay, M Dumontier, JM Banda, O Beya, M Brochhausen, E Draper, S Habiel, O Hassanzadeh, M Herrero-Zazo, B Hocum, J Horn, B LeBaron, DC Malone, O Nytro, T Reese, K Romagnoli, J Schneider, LY Zhang, and RD Boyce, Front. Pharmacol., 08 March 2021.
  10. Normalizing Clinical Document Titles to LOINC Document Ontology: an Initial Study, X Zuo, J Li, B Zhao, Y Zhou, X Dong, J Duke, K Natarajan, G Hripcsak, NH Shah, JM Banda, R Reeves, T Miller, H Xu, AMIA Annu Symp Proc. 2021, Jan 25;2021:1441-1450.
  11. An ensemble approach for classification and extraction of drug mentions in Tweets, LAR Hernandez, RC Srinivasa, JM Banda, Proceedings of the BioCreative VII Challenge Evaluation Workshop pp. 221-226. 2021
  12. An Enhanced Approach to Identify and Extract Medication Mentions in Tweets via Weak Supervision, R Tekumalla and JM Banda, Proceedings of the BioCreative VII Challenge Evaluation Workshop pp. 201-206. 2021
  13. A Pharmacovigilance Application of Social Media Mining: An Ensemble Approach for Automated Classification and Extraction of Drug Mentions in Tweets, LAR Hernandez, RC Srinivasa, JM Banda, NeurIPS 2021 Workshop LatinX in AI, 2021.
  14. Overview of the sixth social media mining for health applications (#SMM4H) shared tasks at NAACL 2021, A Magge, A Klein, A Miranda-Escalada, M Ali Al-Garadi, I Alimova, Z Miftahutdinov, E Farre, S Lima Lopez, I Flores, K O'Connor, D Weissenbacher, E Tutubalina, A Sarker, JM Banda, M Krallinger, G Gonzalez-Hernandez, Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task, Association for Computational Linguistics, Stroudsburg, PA, USA, 2021.
  15. Characterization of Anonymous Physician Perspectives on COVID-19 Using Social Media Data, KJ Sullivan, M Burden, A Keniston, JM Banda, LE Hunter, Biocomputing 2021, pp. 95-106 (2021)
  2020

  17. Risk of depression, suicide and psychosis with hydroxychloroquine treatment for rheumatoid arthritis: a multinational network cohort study, JCE Lane, J Weaver, K Kostka, T Duarte-Salles, MTF Abrahao, H Alghoul, O Alser, ..., JM Banda, et al. Rheumatology, December 2020.
  18. Characterizing drug mentions in COVID-19 Twitter Chatter, R Tekumalla and JM Banda, Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020
  19. Deep phenotyping of 34,128 adult patients hospitalised with COVID-19 in an international network study, E Burn, SC You, AG Sena, K Kostka, H Abedtash, MTF Abrahao, A Alberga, H Alghoul, O Alser, TM Alshammari, M Aragon, C Areia, JM Banda, et al. Nature Communications 11, 5009, 2020.
  20. Risk of hydroxychloroquine alone and in combination with azithromycin in the treatment of rheumatoid arthritis: a multinational, retrospective study, JCE Lane, J Weaver, K Kostka, T Duarte-Salles, MTF Abrahao, H Alghoul, O Alser, ..., JM Banda, et al. The Lancet Rheumatology, August 2020.
  21. Clinical decision support tool for phototherapy initiation in preterm infants, Y Arain, JM Banda, J Faulkenberry, VK Bhutani, J Palma, Journal of Perinatology, 2020.
  22. Mining Archive.org Twitter Stream Grab for Pharmacovigilance Research Gold, R Tekumalla, JR Asl, and JM Banda, Proceedings of the International AAAI Conference on Web and Social Media, 14, (1), pages 909-917, 2020.
  23. Social Media Mining Toolkit (SMMT), R Tekumalla and JM Banda. Genomics & Informatics, 18, (2), 2020.
  24. Development and Validation of Phenotype Classifiers across Multiple Sites in the Observational Health Sciences and Informatics (OHDSI) Network, M Kashyap, M Seneviratne, JM Banda, T Falconer, B Ryu, S Yoo, et al. Journal of the American Medical Informatics Association, 27, (6), pages 877-883, 2020.
  25. Ten simple rules to run a successful BioHackathon, L Garcia, E Antezana, A Garcia, E Bolton, R Jimenez, P Prins, JM Banda, T. Katayama. PLOS Computational Biology, 16, (5), 2020.
  2019

  27. Finding missed cases of familial hypercholesterolemia in health systems using machine learning, JM Banda, A Sarraju, F Abbasi, J Parizo, M Pariani, H Ison, E Briskin, et al. npj Digital Medicine, 2, (1), pages 23-23, 2019.
  28. Precision screening for familial hypercholesterolaemia: a machine learning study applied to electronic health encounter data, KD Myers, JW Knowles, D Staszak, MD Shapiro, W Howard, M Yadava, ..., JM Banda, et al. The Lancet Digital Health, 1, (8), pages 393-402, 2019.
  29. Assessing the potential impact of vector-borne disease transmission following heavy rainfall events: a mathematical framework, G Chowell, K Mizumoto, JM Banda, S Poccia, and C Perrings. Philosophical Transactions of the Royal Society B, 374, (1775), 2019.
  30. Fully connecting the Observational Health Data Science and Informatics (OHDSI) initiative with the world of linked open data, JM Banda. Genomics & Informatics, 17, (2), 2019.
  31. Extending Achilles Heel Data Quality Tool with New Rules Informed by Multi-Site Data Quality Comparison, V Huser, X Li, Z Zhang, S Jung, RW Park, JM Banda, H Razzaghi, A Londhe, et al. Studies in health technology and informatics, 264, pages 1488-1489, 2019.
  32. Web- and ontology-based annotation and retrieval of geoscience photomicrographs with ontologies and machine learning, HA Babaie, A Davarpanah, and JM Banda. AGU Fall Meeting 2019, IN32B, 2019.
  33. Solar Event Tracking with Deep Regression Networks: A Proof of Concept Evaluation, TT Sarker, and JM Banda. 2019 IEEE International Conference on Big Data (Big Data), pages 4942-4949, 2019.
  2018

  35. Advances in electronic phenotyping: from rule-based definitions to machine learning models, JM Banda, M Seneviratne, T Hernandez-Boussard, and NH Shah. Annual review of biomedical data science, 1, pages 53-68, 2018.
  36. Association of Hemoglobin A1c Levels With Use of Sulfonylureas, Dipeptidyl Peptidase 4 Inhibitors, and Thiazolidinediones in Patients With Type 2 Diabetes Treated With Metformin, R Vashisht, K Jung, A Schuler, JM Banda, RW Park, S Jin, L Li, JT Dudley, et al. JAMA Network Open, 1, (4), e181755, 2018.
  37. Nanopublications: A growing resource of provenance-centric scientific linked data, T Kuhn, A Merono-Penuela, A Malic, JH Poelen, AH Hurlbert, EC Ortiz, et al. In 2018 IEEE 14th International Conference on e-Science (e-Science), pages 83-92, 2018.
  38. Advancing the use of the ISBT-128 Coding System in electronic health records to monitor blood transfusion prevalence in the United States, J Obidi, K Chada, J Gruber, G Dores, E Storch, A Williams, JM Banda, et al. Transfusion 58, 167A-167A, 2018.
  39. Roadmap for Reliable Ensemble Forecasting of the Sun-Earth System, G Nita, R Angryk, B Aydin, JM Banda, T Bastian, T Berger, V Bindi, et al. arXiv preprint arXiv:1810.08728, 2018.
  40. Transfusion Trends of Whole Blood and Blood Components in Electronic Health Records and Claims Databases, K Chada, J Obidi, J Gruber, G Dores, E Storch, A Williams, JM Banda, et al. Transfusion 58, 91A-91A, 2018.
  41. Scalable Electronic Phenotyping For Studying Patient Comorbidities, AY Ling, E Alsentzer, J Chen, JM Banda, S Tamang, and E Minty. AMIA Annual Symposium Proceedings, 2018, pages 740-740, 2018.
  42. Identifying Cases of Metastatic Prostate Cancer Using Machine Learning on Electronic Health Records, MG Seneviratne, JM Banda, JD Brooks, NH Shah, et al. AMIA Annual Symposium Proceedings, 2018, pages 1498-1498, 2018.
  2017

  44. Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network, JM Banda, Y Halpern, D Sontag, and NH Shah. AMIA Summits on Translational Science Proceedings, 2017, pages 48-48, 2017.
  45. Risk of angioedema associated with levetiracetam compared with phenytoin: Findings of the observational health data sciences and informatics research network, JD Duke, PB Ryan, MA Suchard, G Hripcsak, P Jin, C Reich, MS Schwalm, JM Banda, et al. Epilepsia, 58, (8), pages e101-e106, 2017.
  46. Solar event classification using deep convolutional neural networks, A Kucuk, JM Banda, and RA Angryk. International Conference on Artificial Intelligence and Soft Computing, pages 118-130, 2017.
  47. A large-scale solar dynamics observatory image dataset for computer vision applications, A Kucuk, JM Banda, and RA Angryk. Nature Scientific data, 4, pages 170096-170096, 2017.
  2016

  49. Characterizing treatment pathways at scale using the OHDSI network, G Hripcsak, PB Ryan, JD Duke, NH Shah, RW Park, V Huser, MA Suchard, JM Banda, et al. Proceedings of the National Academy of Sciences, 113, (27), pages 7329-7336, 2016.
  50. Learning statistical models of phenotypes using noisy labeled training data, V Agarwal, T Podchiyska, JM Banda, V Goel, TI Leung, EP Minty, et al. Journal of the American Medical Informatics Association, 23, (6), pages 1166-1173, 2016.
  51. A curated and standardized adverse drug event resource to accelerate drug safety research, JM Banda, L Evans, RS Vanguri, NP Tatonetti, PB Ryan, and NH Shah. Nature Scientific data, 3, pages 160026-160026, 2016.
  52. Feasibility of prioritizing drug–drug-event associations found in electronic health records, JM Banda, A Callahan, R Winnenburg, HR Strasberg, A Cami, BY Reis, et al. Drug safety, 39, (1), pages 45-57, 2016.
  53. Mining At Most Top-K% Spatiotemporal Co-occurrence Patterns in Datasets with Extended Spatial Representations, KG Pillai, RA Angryk, JM Banda, D Kempton, B Aydin, and PC Martens. ACM Transactions on Spatial Algorithms and Systems (TSAS), 2, (3), pages 10-10, 2016.
  54. Solar Data Mining at Georgia State University, R Angryk, PC Martens, M Schuh, B Aydin, D Kempton, JM Banda, R Ma, et al. AGU Fall Meeting Abstracts, 2016.
  2015

  56. Provenance-centered dataset of drug-drug interactions, JM Banda, T Kuhn, NH Shah, and M Dumontier. International Semantic Web Conference, pages 293-300, 2015.
  57. On visualization techniques for solar data mining, MA Schuh, JM Banda, T Wylie, P McInerney, KG Pillai, and RA Angryk. Astronomy and computing, 10, pages 32-42, 2015.
  58. Regional content-based image retrieval for solar images: Traditional versus modern methods, JM Banda, and RA Angryk. Astronomy and computing, 13, pages 108-116, 2015.
  59. Unsupervised learning techniques for detection of regions of interest in Solar Images, JM Banda, and RA Angryk. 2015 IEEE International Conference on Data Mining Workshop (ICDMW), pages 582-588, 2015.
  2014

  61. A Comparative Evaluation of Automated Solar Filament Detection, MA Schuh, JM Banda, P Bernasconi, and RA Angryk. Solar Physics, 289, (7), pages 2503-2524, 2014.
  62. Spatiotemporal co-occurrence rules, KG Pillai, RA Angryk, JM Banda, T Wylie, and MA Schuh. New Trends in Databases and Information Systems, pages 27-35, 2014.
  63. When too similar is bad: A practical example of the solar dynamics observatory content-based image-retrieval system, JM Banda, MA Schuh, T Wylie, P McInerney, and RA Angryk. New Trends in Databases and Information Systems, pages 87-95, 2014.
  64. Big data new frontiers: mining, search and management of massive repositories of solar image data and solar events, JM Banda, MA Schuh, RA Angryk, KG Pillai, and P McInerney. New Trends in Databases and Information Systems, pages 151-158, 2014.
  65. Large-scale region-based multimedia retrieval for solar images, JM Banda, and RA Angryk. Lecture Notes in Computer Science: Artificial Intelligence and Soft …, 2014.
  66. Scalable solar image retrieval with lucene, JM Banda, and RA Angryk. 2014 IEEE International Conference on Big Data (Big Data), pages 11-17, 2014.
  67. Image retrieval on compressed images: Can we tell the difference?, JM Banda, RA Angryk, MA Schuh, and PC Martens. 2014 4th International Conference on Image Processing Theory, Tools and …, 2014.
  2013

  69. A large-scale solar image dataset with labeled event regions, MA Schuh, RA Angryk, KG Pillai, JM Banda, and PC Martens. 2013 IEEE International Conference on Image Processing, pages 4349-4353, 2013.
  70. On dimensionality reduction for indexing and retrieval of large-scale solar image data, JM Banda, RA Angryk, and PCH Martens. Solar Physics, 283, (1), pages 113-141, 2013.
  71. Steps Toward a Large-Scale Solar Image Data Analysis to Differentiate Solar Phenomena, JM Banda, RA Angryk, and PCH Martens. Solar Physics, 2013.
  72. Region-based Querying of Solar Data Using Descriptor Signatures, JM Banda, C Liu, and RA Angryk. 2013 IEEE 13th International Conference on Data Mining Workshops, pages 1-7, 2013.
  73. A comprehensive study of idistance partitioning strategies for knn queries and high-dimensional data indexing, MA Schuh, T Wylie, JM Banda, and RA Angryk. British National Conference on Databases, pages 238-252, 2013.
  74. Imagefarmer: introducing a data mining framework for the creation of large-scale content-based image retrieval systems, JM Banda, RA Angryk, and PC Martens. International Journal of Computer Applications, 79, (13), 2013.
  75. On Using SIFT Descriptors for Image Parameter Evaluation, PM McInerney, JM Banda, and RA Angryk. 2013 IEEE 13th International Conference on Data Mining Workshops, pages 32-39, 2013.
  76. Introducing the first publicly available content-based image-retrieval system for the solar dynamics observatory mission, MA Schuh, JM Banda, RA Angryk, and P Martens. AAS/SPD Meeting 44, #100.97, 2013.
  77. Extending high-dimensional indexing techniques pyramid and iminmax (θ): lessons learned, KG Pillai, L Sturlaugson, JM Banda, and RA Angryk. British National Conference on Databases, pages 253-267, 2013.
  2012


  79. Spatio-temporal co-occurrence pattern mining in data sets with evolving regions, KG Pillai, RA Angryk, JM Banda, MA Schuh, and T Wylie. 2012 IEEE 12th International Conference on Data Mining Workshops, pages 805-812, 2012.
  80. Quantitative comparison of linear and non-linear dimensionality reduction techniques for solar image archives, JM Banda, RA Angryk, and PC Martens. Twenty-Fifth International FLAIRS Conference, 2012.
  81. Content-based Image Retrieval For Solar Physics: First Steps And A Practical Demonstration, JM Banda, RA Angryk, and PCH Martens. American Astronomical Society Meeting Abstracts#, 220, pages 220-220, 2012.

  82. Supporting Solar Physics Research via Data Mining, RA Angryk, JM Banda, MA Schuh, KG Ganesan Pillai, H Tosun, and PCH Martens. American Astronomical Society Meeting Abstracts#, 220, pages 220-220, 2012.
  2011

  84. On the surprisingly accurate transfer of image parameters between medical and solar images, JM Banda, RA Angryk, and PC Martens. 2011 18th IEEE International Conference on Image Processing, pages 3669-3672, 2011.
  85. Framework for creating large-scale content-based image retrieval system (CBIR) for solar data analysis, JM Banda (adviser: RA Angryk). Montana State University, 1, (1), 2011.
  2010

  87. An experimental evaluation of popular image parameters for monochromatic solar image categorization, JM Banda, and R Angryk. Twenty-Third International FLAIRS Conference, 2010.
  88. Selection of image parameters as the first step towards creating a CBIR system for the solar dynamics observatory, JM Banda, and RA Angryk. 2010 International Conference on Digital Image Computing: Techniques and …, 2010.
  89. Usage of Dissimilarity Measures and Multidimensional Scaling for Large Scale Solar Data Analysis, JM Banda, and RA Angryk. CIDU, pages 189-203, 2010.
  2009

  91. On the effectiveness of fuzzy clustering as a data discretization technique for large-scale classification of solar images, JM Banda, and RA Angryk. 2009 IEEE International Conference on Fuzzy Systems, pages 2019-2024, 2009.


Some videos of talks and slide decks from previous presentations

Stanford Human-Centered Artificial Intelligence Weekly Seminar with Juan Banda Co-Hosted by SAGE Center - February 2, 2022

Are Phenotyping Algorithms Fair for Underrepresented Minorities Within Older Adults? - Columbia DBMI Seminar - 2021

Leveraging APHRODITE to identify bias in statistical phenotyping algorithms - OHDSI Symposium 2021 - Lightning Talk - 2021

Characterizing the COVID-19 epidemic with NLP - ICLR 2020 - LatinX in AI Workshop - 2020

Building tools and frameworks for large-scale social media mining: Creating data infrastructure for COVID-19 research - Democratizing Artificial Intelligence Research, Education, and Technologies - dar.ai - 2020

Some of my older slide decks can be found here

I give around 10-12 talks a year, so for a full list, please contact me.

If you are interested in having me talk to your lab or organization about my work, please contact me.

Contact Me

Get in touch with me