Juan M. Banda

About Me

Main information about me

Hello, my name is Juan M. Banda, and I am currently an assistant professor of computer science at Georgia State University. In my research lab, Panacea Lab, we aim to build machine learning, computer vision, and NLP methods that help generate insights from large-scale, multi-modal data sources. With applications in precision medicine, medical informatics, astroinformatics, and other domains, our work addresses domain-specific problems with data science methods and practices.

An engineer at heart and in practice for the last 20 years, I have used Python, Bash, ontologies, and NLP tools to build pipelines that annotated over 68 million clinical notes. I have built custom ETLs to map over 8 million patient electronic health records, from 4 institutions, to the OMOP common data model for large-scale analytics and machine learning. I have designed pipelines, databases, and processes to build research infrastructure for my current and previous labs. I have used R, SQL, MATLAB, Perl, Java, JavaScript, and other languages to acquire, clean, and operationalize data from multiple sources. I have mined over 9 billion tweets for NLP tasks to gain insights from them. In my earlier days, I built content-based image retrieval systems for NASA’s SDO mission, with the capacity to process and index over 40,000 images daily and to provide computer vision-aided similarity search for images. I started my engineering days designing and developing point-of-sale systems written in Visual Basic.

Apart from my technical skills, I have strong communication and writing skills (over 50 refereed publications) and management skills (I have managed over 40 employees and 20 students). Driven by the desire to improve patient outcomes and medical care, and to build things that change people’s lives, I am committed to releasing all my work under open-source licenses, following the FAIR data sharing principles.

Technical Skills

The skills below are grouped by area for brevity. As a generalist, I have experience with many programming languages and tools, ranging from COBOL to Business Objects.

Databases (MySQL, SQL Server, PostgreSQL, Oracle, MongoDB) - 15 years of experience (95%)
Data Science (NLP pipelines, Visualization, Reporting) - 14 years of experience (92%)
Data Engineering (Custom ETLs, Data Pipelines, Bash, Pandas, SQL) - 12 years of experience (80%)
Machine Learning (TensorFlow, Scikit-learn, Theano, caret, WEKA) - 10 years of experience (80%)
Scientific Programming Languages (R, MATLAB, Maple) - 10 years of experience (75%)
General Programming Languages (Python, Java, Visual Basic) - 10 years of experience (75%)
Cloud Environments (AWS, Azure, BigQuery) - 6 years of experience (65%)

In the News

My work has been mentioned in the following outlets

Under construction!!!

For now, some of the news references can be found here


List of peer-reviewed publications:


2020

  1. Clinical decision support tool for phototherapy initiation in preterm infants, Y Arain, JM Banda, J Faulkenberry, VK Bhutani, J Palma. Journal of Perinatology, 2020.
  2. Mining Archive.org Twitter Stream Grab for Pharmacovigilance Research Gold, R Tekumalla, JR Asl, and JM Banda. Proceedings of the International AAAI Conference on Web and Social Media, 14, (1), pages 909-917, 2020.
  3. Social Media Mining Toolkit (SMMT), R Tekumalla and JM Banda. Genomics & Informatics, 18, (2), 2020.
  4. Development and Validation of Phenotype Classifiers across Multiple Sites in the Observational Health Sciences and Informatics (OHDSI) Network, M Kashyap, M Seneviratne, JM Banda, T Falconer, B Ryu, S Yoo, et al. Journal of the American Medical Informatics Association, 27, (6), pages 877-883, 2020.
  5. Ten simple rules to run a successful BioHackathon, L Garcia, E Antezana, A Garcia, E Bolton, R Jimenez, P Prins, JM Banda, T Katayama. PLOS Computational Biology, 16, (5), 2020.

2019

  6. Finding missed cases of familial hypercholesterolemia in health systems using machine learning, JM Banda, A Sarraju, F Abbasi, J Parizo, M Pariani, H Ison, E Briskin, et al. npj Digital Medicine, 2, (1), pages 23-23, 2019.
  7. Precision screening for familial hypercholesterolaemia: a machine learning study applied to electronic health encounter data, KD Myers, JW Knowles, D Staszak, MD Shapiro, W Howard, M Yadava, ..., JM Banda, et al. The Lancet Digital Health, 1, (8), pages 393-402, 2019.
  8. Assessing the potential impact of vector-borne disease transmission following heavy rainfall events: a mathematical framework, G Chowell, K Mizumoto, JM Banda, S Poccia, and C Perrings. Philosophical Transactions of the Royal Society B, 374, (1775), 2019.
  9. Fully connecting the Observational Health Data Science and Informatics (OHDSI) initiative with the world of linked open data, JM Banda. Genomics & Informatics, 17, (2), 2019.
  10. Extending Achilles Heel Data Quality Tool with New Rules Informed by Multi-Site Data Quality Comparison, V Huser, X Li, Z Zhang, S Jung, RW Park, JM Banda, H Razzaghi, A Londhe, et al. Studies in Health Technology and Informatics, 264, pages 1488-1489, 2019.
  11. Web- and ontology-based annotation and retrieval of geoscience photomicrographs with ontologies and machine learning, HA Babaie, A Davarpanah, and JM Banda. AGU Fall Meeting 2019, IN32B, 2019.
  12. Solar Event Tracking with Deep Regression Networks: A Proof of Concept Evaluation, TT Sarker and JM Banda. 2019 IEEE International Conference on Big Data (Big Data), pages 4942-4949, 2019.

2018

  13. Advances in electronic phenotyping: from rule-based definitions to machine learning models, JM Banda, M Seneviratne, T Hernandez-Boussard, and NH Shah. Annual Review of Biomedical Data Science, 1, pages 53-68, 2018.
  14. Association of Hemoglobin A1c Levels With Use of Sulfonylureas, Dipeptidyl Peptidase 4 Inhibitors, and Thiazolidinediones in Patients With Type 2 Diabetes Treated With Metformin, R Vashisht, K Jung, A Schuler, JM Banda, RW Park, S Jin, L Li, JT Dudley, et al. JAMA Network Open, 1, (4), e181755, 2018.
  15. Nanopublications: A growing resource of provenance-centric scientific linked data, T Kuhn, A Merono-Penuela, A Malic, JH Poelen, AH Hurlbert, EC Ortiz, et al. 2018 IEEE 14th International Conference on e-Science (e-Science), pages 83-92, 2018.
  16. Advancing the use of the ISBT-128 Coding System in electronic health records to monitor blood transfusion prevalence in the United States, J Obidi, K Chada, J Gruber, G Dores, E Storch, A Williams, JM Banda, et al. Transfusion, 58, pages 167A-167A, 2018.
  17. Roadmap for Reliable Ensemble Forecasting of the Sun-Earth System, G Nita, R Angryk, B Aydin, JM Banda, T Bastian, T Berger, V Bindi, et al. arXiv preprint arXiv:1810.08728, 2018.
  18. Transfusion Trends of Whole Blood and Blood Components in Electronic Health Records and Claims Databases, K Chada, J Obidi, J Gruber, G Dores, E Storch, A Williams, JM Banda, et al. Transfusion, 58, pages 91A-91A, 2018.
  19. Scalable Electronic Phenotyping For Studying Patient Comorbidities, AY Ling, E Alsentzer, J Chen, JM Banda, S Tamang, and E Minty. AMIA Annual Symposium Proceedings, 2018, pages 740-740, 2018.
  20. Identifying Cases of Metastatic Prostate Cancer Using Machine Learning on Electronic Health Records, MG Seneviratne, JM Banda, JD Brooks, NH Shah, et al. AMIA Annual Symposium Proceedings, 2018, pages 1498-1498, 2018.

2017

  21. Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network, JM Banda, Y Halpern, D Sontag, and NH Shah. AMIA Summits on Translational Science Proceedings, 2017, pages 48-48, 2017.
  22. Risk of angioedema associated with levetiracetam compared with phenytoin: Findings of the observational health data sciences and informatics research network, JD Duke, PB Ryan, MA Suchard, G Hripcsak, P Jin, C Reich, MS Schwalm, JM Banda, et al. Epilepsia, 58, (8), pages e101-e106, 2017.
  23. Solar event classification using deep convolutional neural networks, A Kucuk, JM Banda, and RA Angryk. International Conference on Artificial Intelligence and Soft Computing, pages 118-130, 2017.
  24. A large-scale solar dynamics observatory image dataset for computer vision applications, A Kucuk, JM Banda, and RA Angryk. Nature Scientific Data, 4, pages 170096-170096, 2017.

2016

  25. Characterizing treatment pathways at scale using the OHDSI network, G Hripcsak, PB Ryan, JD Duke, NH Shah, RW Park, V Huser, MA Suchard, JM Banda, et al. Proceedings of the National Academy of Sciences, 113, (27), pages 7329-7336, 2016.
  26. Learning statistical models of phenotypes using noisy labeled training data, V Agarwal, T Podchiyska, JM Banda, V Goel, TI Leung, EP Minty, et al. Journal of the American Medical Informatics Association, 23, (6), pages 1166-1173, 2016.
  27. A curated and standardized adverse drug event resource to accelerate drug safety research, JM Banda, L Evans, RS Vanguri, NP Tatonetti, PB Ryan, and NH Shah. Nature Scientific Data, 3, pages 160026-160026, 2016.
  28. Feasibility of prioritizing drug–drug-event associations found in electronic health records, JM Banda, A Callahan, R Winnenburg, HR Strasberg, A Cami, BY Reis, et al. Drug Safety, 39, (1), pages 45-57, 2016.
  29. Mining At Most Top-K% Spatiotemporal Co-occurrence Patterns in Datasets with Extended Spatial Representations, KG Pillai, RA Angryk, JM Banda, D Kempton, B Aydin, and PC Martens. ACM Transactions on Spatial Algorithms and Systems (TSAS), 2, (3), pages 10-10, 2016.
  30. Solar Data Mining at Georgia State University, R Angryk, PC Martens, M Schuh, B Aydin, D Kempton, JM Banda, R Ma, et al. AGU Fall Meeting Abstracts, 2016.

2015

  31. Provenance-centered dataset of drug-drug interactions, JM Banda, T Kuhn, NH Shah, and M Dumontier. International Semantic Web Conference, pages 293-300, 2015.
  32. On visualization techniques for solar data mining, MA Schuh, JM Banda, T Wylie, P McInerney, KG Pillai, and RA Angryk. Astronomy and Computing, 10, pages 32-42, 2015.
  33. Regional content-based image retrieval for solar images: Traditional versus modern methods, JM Banda and RA Angryk. Astronomy and Computing, 13, pages 108-116, 2015.
  34. Unsupervised learning techniques for detection of regions of interest in Solar Images, JM Banda and RA Angryk. 2015 IEEE International Conference on Data Mining Workshop (ICDMW), pages 582-588, 2015.

2014

  35. A Comparative Evaluation of Automated Solar Filament Detection, MA Schuh, JM Banda, P Bernasconi, and RA Angryk. Solar Physics, 289, (7), pages 2503-2524, 2014.
  36. Spatiotemporal co-occurrence rules, KG Pillai, RA Angryk, JM Banda, T Wylie, and MA Schuh. New Trends in Databases and Information Systems, pages 27-35, 2014.
  37. When too similar is bad: A practical example of the solar dynamics observatory content-based image-retrieval system, JM Banda, MA Schuh, T Wylie, P McInerney, and RA Angryk. New Trends in Databases and Information Systems, pages 87-95, 2014.
  38. Big data new frontiers: mining, search and management of massive repositories of solar image data and solar events, JM Banda, MA Schuh, RA Angryk, KG Pillai, and P McInerney. New Trends in Databases and Information Systems, pages 151-158, 2014.
  39. Large-scale region-based multimedia retrieval for solar images, JM Banda and RA Angryk. Lecture Notes in Computer Science: Artificial Intelligence and Soft …, 2014.
  40. Scalable solar image retrieval with lucene, JM Banda and RA Angryk. 2014 IEEE International Conference on Big Data (Big Data), pages 11-17, 2014.
  41. Image retrieval on compressed images: Can we tell the difference?, JM Banda, RA Angryk, MA Schuh, and PC Martens. 2014 4th International Conference on Image Processing Theory, Tools and …, 2014.

2013

  42. A large-scale solar image dataset with labeled event regions, MA Schuh, RA Angryk, KG Pillai, JM Banda, and PC Martens. 2013 IEEE International Conference on Image Processing, pages 4349-4353, 2013.
  43. On dimensionality reduction for indexing and retrieval of large-scale solar image data, JM Banda, RA Angryk, and PCH Martens. Solar Physics, 283, (1), pages 113-141, 2013.
  44. Steps Toward a Large-Scale Solar Image Data Analysis to Differentiate Solar Phenomena, JM Banda, RA Angryk, and PCH Martens. Solar Physics, 2013.
  45. Region-based Querying of Solar Data Using Descriptor Signatures, JM Banda, C Liu, and RA Angryk. 2013 IEEE 13th International Conference on Data Mining Workshops, pages 1-7, 2013.
  46. A comprehensive study of idistance partitioning strategies for knn queries and high-dimensional data indexing, MA Schuh, T Wylie, JM Banda, and RA Angryk. British National Conference on Databases, pages 238-252, 2013.
  47. Imagefarmer: introducing a data mining framework for the creation of large-scale content-based image retrieval systems, JM Banda, RA Angryk, and PC Martens. International Journal of Computer Applications, 79, (13), 2013.
  48. On Using SIFT Descriptors for Image Parameter Evaluation, PM McInerney, JM Banda, and RA Angryk. 2013 IEEE 13th International Conference on Data Mining Workshops, pages 32-39, 2013.
  49. Introducing the first publicly available content-based image-retrieval system for the solar dynamics observatory mission, MA Schuh, JM Banda, RA Angryk, and P Martens. AAS/SPD Meeting 44, #100.97, 2013.
  50. Extending high-dimensional indexing techniques pyramid and iminmax (θ): lessons learned, KG Pillai, L Sturlaugson, JM Banda, and RA Angryk. British National Conference on Databases, pages 253-267, 2013.

2012

  51. Spatio-temporal co-occurrence pattern mining in data sets with evolving regions, KG Pillai, RA Angryk, JM Banda, MA Schuh, and T Wylie. 2012 IEEE 12th International Conference on Data Mining Workshops, pages 805-812, 2012.
  52. Quantitative comparison of linear and non-linear dimensionality reduction techniques for solar image archives, JM Banda, RA Angryk, and PC Martens. Twenty-Fifth International FLAIRS Conference, 2012.
  53. Content-based Image Retrieval For Solar Physics: First Steps And A Practical Demonstration, JM Banda, RA Angryk, and PCH Martens. American Astronomical Society Meeting Abstracts, #220, 2012.
  54. Supporting Solar Physics Research via Data Mining, RA Angryk, JM Banda, MA Schuh, KG Ganesan Pillai, H Tosun, and PCH Martens. American Astronomical Society Meeting Abstracts, #220, 2012.

2011

  55. On the surprisingly accurate transfer of image parameters between medical and solar images, JM Banda, RA Angryk, and PC Martens. 2011 18th IEEE International Conference on Image Processing, pages 3669-3672, 2011.
  56. Framework for creating large-scale content-based image retrieval system (CBIR) for solar data analysis, JM Banda and RA Angryk (adviser). Montana State University, 1, (1), 2011.

2010

  57. An experimental evaluation of popular image parameters for monochromatic solar image categorization, JM Banda and R Angryk. Twenty-Third International FLAIRS Conference, 2010.
  58. Selection of image parameters as the first step towards creating a CBIR system for the solar dynamics observatory, JM Banda and RA Angryk. 2010 International Conference on Digital Image Computing: Techniques and …, 2010.
  59. Usage of Dissimilarity Measures and Multidimensional Scaling for Large Scale Solar Data Analysis, JM Banda and RA Angryk. CIDU, pages 189-203, 2010.

2009

  60. On the effectiveness of fuzzy clustering as a data discretization technique for large-scale classification of solar images, JM Banda and RA Angryk. 2009 IEEE International Conference on Fuzzy Systems, pages 2019-2024, 2009.


Some videos of talks and slide decks from previous presentations

Under construction!!!

For now, some of my slide decks can be found here

Contact Me

Get in touch with me