Muhammad Monjurul Karim

112 More Hall · University of Washington · Seattle · WA 98195 · (609) 787-9233 · mmkarim@uw.edu

I am a Postdoctoral Scholar at the University of Washington. I received my Ph.D. in Civil Engineering from Stony Brook University and my M.S. in Systems Engineering from Missouri University of Science and Technology.

My research goal is to bring visual perception to real-world applications through artificial intelligence. I am interested in solving large-scale visual recognition and prediction problems by developing novel deep learning approaches. My research experience includes, but is not limited to, traffic accident anticipation, risk localization, object detection and segmentation, tracking, visual attention, and remote sensing.

Please find my CV.


Research Projects

Risky object localization in driving videos

Abstract: Detecting dangerous traffic agents in videos captured by vehicle-mounted dashboard cameras (dashcams) is essential to facilitate safe navigation in a complex environment. Accident-related videos are only a small portion of driving video big data, and the transient pre-accident processes are highly dynamic and complex. Moreover, risky and non-risky traffic agents can be similar in appearance. Together, these factors make risky object localization in driving videos particularly challenging. To this end, we developed an attention-guided multistream feature fusion network (AM-Net) to localize dangerous traffic agents from dashcam videos. Two Gated Recurrent Unit (GRU) networks use object bounding box and optical flow features extracted from consecutive video frames to capture spatio-temporal cues for distinguishing dangerous traffic agents. An attention module coupled with the GRUs learns to attend to the traffic agents relevant to an accident. Fusing the two streams of features, AM-Net predicts the riskiness scores of traffic agents in the video. In support of this study, we also introduced a benchmark dataset called Risky Object Localization (ROL). Project Link
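The two-stream idea can be sketched in a few lines of PyTorch: one GRU encodes bounding-box trajectories, another encodes optical-flow features, and an agent-level attention weights the fused features before scoring. The layer sizes, names, and wiring below are illustrative assumptions, not the published AM-Net configuration (see the project code for that).

```python
import torch
import torch.nn as nn

class AMNetSketch(nn.Module):
    """Two-stream feature fusion with agent-level attention (illustrative)."""
    def __init__(self, box_dim=4, flow_dim=512, hidden=256):
        super().__init__()
        self.box_gru = nn.GRU(box_dim, hidden, batch_first=True)    # bounding-box stream
        self.flow_gru = nn.GRU(flow_dim, hidden, batch_first=True)  # optical-flow stream
        self.attn = nn.Linear(2 * hidden, 1)  # scores each agent's relevance to an accident
        self.head = nn.Linear(2 * hidden, 1)  # maps fused features to a riskiness score

    def forward(self, boxes, flows):
        # boxes: (agents, frames, box_dim); flows: (agents, frames, flow_dim)
        hb, _ = self.box_gru(boxes)
        hf, _ = self.flow_gru(flows)
        h = torch.cat([hb[:, -1], hf[:, -1]], dim=-1)  # fuse the two streams
        w = torch.softmax(self.attn(h), dim=0)         # attend over traffic agents
        return torch.sigmoid(self.head(w * h)).squeeze(-1)  # per-agent riskiness

# Toy usage: 5 tracked agents over 20 frames (assumed feature sizes).
scores = AMNetSketch()(torch.randn(5, 20, 4), torch.randn(5, 20, 512))
```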

This research is supported by the National Science Foundation (NSF).

Predicting future traffic accidents with a vehicle-mounted camera


Abstract: The rapid advancement of sensor technologies and artificial intelligence is creating new opportunities for traffic safety enhancement. Dashboard cameras (dashcams) have been widely deployed on both human-driven and automated vehicles. A computational intelligence model that can accurately and promptly predict accidents from dashcam video will improve preparedness for accident prevention. The spatial-temporal interaction of traffic agents is complex, and visual cues for predicting a future accident are embedded deeply in dashcam video data. Therefore, the early anticipation of traffic accidents remains a challenge. Inspired by humans' attention behavior in visually perceiving accident risks, this paper proposes a Dynamic Spatial-Temporal Attention (DSTA) network for early accident anticipation from dashcam videos. The DSTA network learns to select discriminative temporal segments of a video sequence with a Dynamic Temporal Attention (DTA) module. It also learns to focus on the informative spatial regions of frames with a Dynamic Spatial Attention (DSA) module. A Gated Recurrent Unit (GRU) network is trained jointly with the attention modules to predict the probability of a future accident.
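The interplay of the two attention modules with the recurrent backbone can be illustrated with a simplified sketch: spatial attention pools region features within each frame, and temporal attention re-weights the recent hidden states before the accident probability is read out. All sizes, names, and wiring below are assumptions for illustration, not the published DSTA configuration.

```python
import torch
import torch.nn as nn

class DSTASketch(nn.Module):
    """Spatial attention over frame regions plus temporal attention over
    recent GRU states; all sizes are illustrative."""
    def __init__(self, feat_dim=512, hidden=256, horizon=8):
        super().__init__()
        self.spatial = nn.Linear(feat_dim, 1)       # DSA: weight regions in a frame
        self.temporal = nn.Linear(hidden, horizon)  # DTA: weight recent time steps
        self.gru = nn.GRUCell(feat_dim, hidden)
        self.head = nn.Linear(hidden, 1)
        self.horizon = horizon

    def forward(self, frames):
        # frames: (T, R, D) region features for T frames, R regions each
        h = frames.new_zeros(1, self.gru.hidden_size)
        history, probs = [], []
        for t in range(frames.size(0)):
            a = torch.softmax(self.spatial(frames[t]), dim=0)  # spatial attention
            x = (a * frames[t]).sum(dim=0, keepdim=True)       # attended frame feature
            h = self.gru(x, h)
            history.append(h)
            recent = torch.cat(history[-self.horizon:], dim=0)
            w = torch.softmax(self.temporal(h)[:, :recent.size(0)], dim=-1)
            probs.append(torch.sigmoid(self.head(w @ recent)))  # temporal attention
        return torch.cat(probs).squeeze(-1)  # accident probability per frame

# Toy usage: 30 frames, each with 10 region features of size 512.
p = DSTASketch()(torch.randn(30, 10, 512))
```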

This research is supported by the National Science Foundation (NSF).

Explainable Artificial Intelligence for Traffic Accident Prediction


Abstract: Traffic accident anticipation is a vital function of Automated Driving Systems (ADSs) for providing a safety-guaranteed driving experience. An accident anticipation model aims to predict accidents promptly and accurately before they occur. Existing Artificial Intelligence (AI) models of accident anticipation lack a human-interpretable explanation of their decision-making. Although these models perform well, they remain a black box to ADS users, making it difficult to earn their trust. To this end, this paper presents a Gated Recurrent Unit (GRU) network that learns spatio-temporal relational features for the early anticipation of traffic accidents from dashcam video data. A post-hoc attention mechanism named Grad-CAM is integrated into the network to generate saliency maps as the visual explanation of the accident anticipation decision. An eye tracker captures human eye fixation points for generating human attention maps. The explainability of network-generated saliency maps is evaluated in comparison to human attention maps.
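Grad-CAM itself is model-agnostic: it weights a convolutional layer's activation maps by the gradient of the prediction score and sums them into a saliency map. The sketch below applies it to a generic, randomly initialized backbone as a stand-in; the layer choice and the scalar score are placeholders, not the paper's actual network.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

# Hook a late convolutional block to capture activations and gradients.
model = resnet50(weights=None).eval()  # placeholder backbone, not the paper's network
acts, grads = {}, {}
model.layer4.register_forward_hook(lambda m, i, o: acts.update(v=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224)  # stand-in for one dashcam frame
score = model(x).max()           # proxy for the accident-anticipation score
score.backward()

w = grads["v"].mean(dim=(2, 3), keepdim=True)  # gradient-derived channel weights
cam = F.relu((w * acts["v"]).sum(dim=1))       # weighted sum of activation maps
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # saliency map in [0, 1]
```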

This research is supported by the National Science Foundation (NSF).

Bridge Inspection Video Data Analysis for Data-driven Asset Management


Abstract: Inspection of transportation infrastructure, such as bridges, is an important step towards preserving and rehabilitating the infrastructure to extend its service life. The advancement of mobile robotic technology has made it possible to rapidly collect a large amount of inspection video data. Yet, the data are mainly images of complex scenes, wherein a bridge of various structural elements mixes with a cluttered background. Assisting bridge inspectors in extracting structural elements of bridges from the big complex video data, and sorting them out by classes, will prepare inspectors for the element-wise inspection to determine the condition of bridges. This paper is motivated to develop an assistive intelligence model for segmenting multiclass bridge elements from inspection videos captured by an aerial inspection platform. First, with a small initial training dataset labeled by inspectors, a Mask Region-based Convolutional Neural Network (Mask R-CNN) pre-trained on a large public dataset was transferred to the new task of multiclass bridge element segmentation. Then, a temporal coherence analysis attempts to recover false negative detections by the transferred network. Finally, a semi-supervised self-training (S3T) algorithm was developed, which incorporates inspectors' domain knowledge into the intelligence model by engaging them in refining the network iteratively.

Project Link
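The transfer step maps onto the standard torchvision recipe: load a COCO-pre-trained Mask R-CNN and swap its box and mask heads for the new label set. The class count below is an assumed example; the S3T loop then alternates between pseudo-labeling new frames with this network and having inspectors refine the confident predictions before retraining.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 5  # assumed: background + four bridge element classes
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box head so it classifies the new bridge element labels.
in_feat = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)

# Replace the mask head so it segments the new classes as well.
in_ch = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_ch, 256, num_classes)
```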

This research is supported by the INSPIRE University Transportation Center (http://inspire-utc.mst.edu).

Vision sensor-based deep neural networks for complex driving scene analysis in support of crash risk assessment and prevention


Abstract: To assist human drivers and autonomous vehicles in assessing crash risks, driving scene analysis using dashcams on vehicles and deep learning algorithms is of paramount importance. Although these technologies are increasingly available, driving scene analysis for this purpose remains a challenge, mainly due to the lack of annotated large image datasets for analyzing crash risk indicators and crash likelihood, and the lack of an effective method to extract the required information from complex driving scenes. To fill the gap, this paper develops a scene analysis system. The Multi-Net of the system includes two multi-task neural networks that perform scene classification to provide four labels for each scene. The system also combines DeepLab v3 and YOLO v3 to detect and locate risky pedestrians and the nearest vehicles. All identified information provides situational awareness to autonomous vehicles or human drivers for identifying crash risks in the surrounding traffic. To address the scarcity of annotated image datasets for studying traffic crashes, two completely new datasets have been developed by this paper and made available to the public; they proved effective in training the proposed deep neural networks. The paper further evaluates the performance of the Multi-Net and the efficiency of the developed system, and illustrates comprehensive scene analysis with representative examples. Results demonstrate the effectiveness of the developed system and datasets for driving scene analysis, as well as their supportiveness for crash risk assessment and crash prevention. Project Link
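As one small illustration of the detection stage's post-processing, the sketch below picks the nearest vehicle from YOLO-style detections, using the bounding box's bottom edge as a rough distance proxy in a forward-facing dashcam view. The detection tuple format, class names, and threshold are assumptions, not the system's actual interface.

```python
# detections: (label, confidence, (x1, y1, x2, y2)) tuples, assumed format
def nearest_vehicle(detections, vehicle_classes=("car", "truck", "bus")):
    vehicles = [d for d in detections
                if d[0] in vehicle_classes and d[1] > 0.5]  # confident vehicles only
    # In a forward-facing view, the box whose bottom edge (y2) sits lowest
    # in the image is a rough proxy for the nearest vehicle.
    return max(vehicles, key=lambda d: d[2][3], default=None)

print(nearest_vehicle([("car", 0.9, (100, 80, 180, 160)),
                       ("truck", 0.8, (300, 120, 420, 260))]))  # -> the truck
```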

This research is supported by the MATC University Transportation Center (http://matc.unl.edu/).

Other Projects

A Driving Simulator-Based Study for Evaluating the Safe Development of Autonomous Truck-Mounted Attenuator Vehicles


Description: Developed a driving simulation using the Blender game engine to collect data from drivers and better understand the impact of deploying an ATMA (Autonomous Truck-Mounted Attenuator).


Object detection and tracking using Mask R-CNN and temporal coherence


Description: An implementation of object detection and tracking in manufacturing plants. The model uses Mask R-CNN, based on a Feature Pyramid Network (FPN) with a ResNet-50 backbone, for the initial segmentation. To give the detection results temporal consistency, a two-stage detection threshold boosts weak detections in a frame by referring to objects with high detection scores in neighboring frames, as sketched below. | Project
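A minimal sketch of that two-stage idea: a detection passes either on its own high score, or on a lower score backed by a confident, overlapping detection in a neighboring frame. The IoU matching rule and thresholds below are illustrative, not the project's exact values.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-8)

def keep(det, neighbors, high=0.9, low=0.5):
    # det: (score, box); neighbors: detections from adjacent frames
    score, box = det
    if score >= high:
        return True                   # stage 1: strong detection stands alone
    return score >= low and any(      # stage 2: weak but temporally supported
        s >= high and iou(box, b) > 0.5 for s, b in neighbors)

# A weak detection (0.6) is rescued by an overlapping strong one nearby.
print(keep((0.6, (10, 10, 50, 50)), [(0.95, (12, 11, 52, 49))]))  # True
```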

This research is supported by the National Science Foundation (NSF).

Publications

Journal Papers

1. Karim, M.M., Qin, R., Wang, Y. (2024). Fusion-GRU: A deep learning model for future bounding box prediction of traffic agents in risky driving videos. Transportation Research Record. DOI

2. Karim, M.M., Qin, R., Yin, Z. (2023). An attention-guided multistream feature fusion network for localization of risky objects in driving videos. IEEE Transactions on Intelligent Vehicles. doi: 10.1109/TIV.2023.3275543. Preprint | Code

3. Karim, M.M., Li, Y., Qin, R., Yin, Z. (2022). A dynamic spatial-temporal attention network for early anticipation of traffic accidents. IEEE Transactions on Intelligent Transportation Systems, 23(7), 9590-9600. Preprint | Code

4. Karim, M.M., Li, Y., Qin, R. (2022). Towards explainable artificial intelligence (XAI) for early anticipation of traffic accidents. Transportation Research Record, 2676(6), 743-755. Preprint | Code

5. Karim, M.M., Qin, R., Yin, Z., Chen, G. (2021). A semi-supervised self-training method to develop assistive intelligence for segmenting multiclass bridge elements from inspection videos. Structural Health Monitoring, 21(3), 835-852. Preprint | Code

6. Li, Y., Karim, M.M., Qin, R., Sun, Z., Wang, Z., Yin, Z. (2021). Crash report data analysis for creating scenario-wise, spatio-temporal attention guidance to support computer vision-based perception of fatal crash risks. Accident Analysis and Prevention, 151, 105962. Preprint

Peer-Reviewed Conference Papers

1. Karim, M.M., Li, Y., Qin, R., Yin, Z. (2021). A system of vision sensor based deep neural networks for complex driving scene analysis in support of crash risk assessment and prevention. The 100th Transportation Research Board (TRB) Annual Meeting, Virtual Meeting, January 5-29, 2021. Preprint | Code

2. Karim, M.M., Dagli, C.H. (2020). SoS meta-architecture selection for infrastructure inspection system using aerial drones. In Proceedings of the 15th IEEE International Symposium on System of Systems Engineering (SoSE 2020), Budapest, Hungary, June 2-4, 2020. DOI

3. Karim, M.M., Dagli, C.H., Qin, R. (2019). Modeling and simulation of a robotic bridge inspection system. In Proceedings of the 2019 Complex Adaptive Systems Conference (CAS'19), Malvern, PA, November 13-15, 2019. DOI

4. Karim, M.M., Doell, D., Lingard, R., Yin, Z., Leu, M.C., Qin, R. (2019). A region-based deep learning algorithm for detecting and tracking objects in manufacturing plants. In Proceedings of the 25th International Conference on Production Research (ICPR'19), Chicago, IL, August 9-14, 2019. DOI | Code

Technical Reports

1. Qin, R., Chen, G., Long, S.K., Yin, Z., Louis, S., Karim, M.M., Zhao, T. (2020). A training framework of robotic operation and image analysis for decision-making in bridge inspection and preservation (Technical Report INSPIRE-006). USDOT INSPIRE University Transportation Center. Website

2. Qin, R., Yin, Z., Karim, M.M., Li, Y., Wang, Z. (2020). Crash prediction and avoidance by identifying and evaluating risk factors from onboard cameras (Technical Report 25-1121-0005-135-2). USDOT MATC University Transportation Center. Download


Teaching

Instructor

Course: CET 590 (Traffic Systems Operations)
CET 590 is a graduate course offered by the Department of Civil and Environmental Engineering at the University of Washington. As the instructor of this course, I taught traffic control system concepts, components, and algorithms. Major topics include traffic control systems, timing plan design, traffic flow characteristics, driver behavior modeling, vehicle-actuated programming, and simulation. The Vissim traffic simulation package is also taught in this course for evaluating the performance of traffic operation plans.
Fall 2023


Co-Instructor

Course: CIV 555 (Analytics for Engineering Systems)
CIV 555 is a graduate course offered by the Civil Engineering Department at Stony Brook University. I was a co-instructor for this course with Dr. Ruwen Qin in the Fall 2021 and Fall 2022 semesters. I taught Neural Networks and Deep Learning.
Fall 2021, Fall 2022


Guest Lecturer

Course: CIV 355 (Data Analytics for Civil Engineering Systems)
CIV 355 is an undergraduate course offered by the Civil Engineering Department at Stony Brook University. I delivered a lecture introducing neural networks to undergraduate students.
Spring 2022

Awards & Certifications

  • 2022 CIV Research Merit Award, Stony Brook University
  • Honorable Mention - Graduate Student Research Symposium Competition organized by the Department of Civil Engineering, Stony Brook University (2022)
  • Finalist - Student Competition organized by the ASCE T&DI Technical Committee on Artificial Intelligence (2021)
  • 2nd Place - INSPIRE UTC Graduate Student Poster Competition (2020)
  • 2nd Place - Intelligent Systems Center Poster Competition, Missouri University of Science & Technology (2019)
  • 2nd Place - Best Paper Award, Complex Adaptive Systems Conference (2019)
