Given a year of EHR data for patients without Diabetes, we predict which patients will be diagnosed with Diabetes in the next year.
Data Scientist A
Allergy This table contains a list of allergies recorded for a patient.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | AllergyGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for the patient allergy record. |
FK | PatientGuid | UNIQUEIDENTIFIER | Identifier for the patient. |
AllergyType | NVARCHAR(100) NOT NULL | Substance to twhich the patient has shown allergy. Values: Animal Dander(Contact); Baker's Yeast; Dairy; Dust(Inhaled); Dust Mites (Contact); Egg; Insect Stings (Contact); Latex(Contact); Medication; Melons, Bananas, Cucumbers (Ragweed Pollen); Mold (inhaled); Nickel (Contact); Other nuts; Peanut; Pollen (inhalled); Seafood; Shellfish; Soy: Wheat | |
Startyear | SMALLINT NOT NULL | Year of onset for the allergy. | |
ReactionName | NVARCHAR(100) NULL | Allergic reaction reported by the patient. Values: Bloating/gas; Bradycardia; Chest Pain;Conjunctivitis; Cough; Diarrhea; Difficulty speaking or swallowing; Dizziness/Lightheadedness; Facial swelling; Hives; Irregular Heartbeat; Itchiness; Loss of consciousness; Nausea; Pain/cramping; Patchy swellingskin; Rash - generalized; Rash - localized; Respiratory Distress; Runny nose; Shortness of breath; Tachycardia; Tongue swelling; Vomiting; Wheezing | |
Severityname | NVARCHAR(100) NULL | Severity of the patient allergy. Values: Very Mild; Mild; Modest; Severe. | |
MedicationNdcCode | NVARCHAR(50) NULL | National Drug Code (NDC) identifier for medication taken by the patient for the allergy. | |
UserGuid | UNIQUEIDENTIFIER NOT NULL | Identifier for the provider who recorded the allergy. |
Condition Reference table containing valid patient conditions.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | ConditionGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for a condition. |
Code | NVARCHAR(50)NOT NULL | Code for the condition. | |
Name | NVARCHAR(100) NOT NULL | Description of the condition. |
Diagnosis This table contains a list of diagnoses for a patient.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | DiagnosisGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for a patient diagnosis. |
FK | PatientGuid | UNIQUEIDENTIFIER NOT NULL | Identifier for the patient. |
ICD9Code | NVARCHAR(50) NOT NULL | ICD9 code for the diagnosis. (See http://en.wikipedia.org/wiki/List_of_ICD-9_codes) | |
DiagnosisDescription | NVARCHAR(256) NOT NULL | Description of ICD9 code. | |
StartYear | SMALLINT NULL | Year for the onset of the diagnosis. | |
StopYear | SMALLINT NULL | Year for the end of the diagnosis. | |
Acute | BIT NULL | Is the diagnosis an acute condition? If not acute, then the condition is considered chronic. Values: 1 = Acute; 0 = Chronic. | |
UserGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for the provider who recorded the diagnosis. |
Immunization This table contains a list of immunizations for a patient.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | ImmunizationGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for the patient immunization record. |
FK | PatientGuid | UNIQUEIDENTIFIER NOT NULL | Identifier for the patient. |
VaccineName | NVARCHAR(256) NULL | Name of the vaccine administered to the patient. | |
AdministeredYear | SMALLINT NULL | Year the vaccine was administered. | |
CvxCode | NVARCHAR(100) NULL | CVX code for the administered vaccine (CDC). | |
UserGuid | UNIQUEIDENTIFIER NOT NULL | Identifier for the provider who recorded the immunization. |
LabObservation This table contains lab test observations for a lab panel.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | LabObservationGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for the lab test observation. |
FK | LabPanelGuid | UNIQUEIDENTIFIER NULL | Identifier for the lab test panel. |
HL7Identifier | NVARCHAR(255) NOT NULL | Code for the lab test observation, as provided by the lab facility. | |
HL7Text | NVARCHAR(255) NOT NULL | Name of the lab test observation, as provided by the lab facility. | |
HL7CodingSystem | NVARCHAR(255) NOT NULL | Observation coding system used for the lab testobservation code in column HL7Identifier.Values: LN = LOINC. | |
IsLoinc | BIT NOT NULL | [Not used] | |
ObservationValue | NVARCHAR(255) NOT NULL | Value for the lab test observation. | |
IsValidValue | BIT NOT NULL | [Not used] | |
Units | NVARCHAR(255) NULL | Units of measure for the lab test observation value. | |
ReferenceRange | NVARCHAR(255) NULL | Range for normal lab test observation value. | |
AbnormalFlags | NVARCHAR(255) NULL | Flags for abnormal lab test observation value. Values: Abnormal Result; Abnormal; Above Normal High; Alert High; Alert Low; Below Normal Low; Panic High; Panic Low; UKNOWN. | |
ResultStatus | NVARCHAR(255) NULL | Status of the lab test observation. Values: Corrected; Final; Incomplete; Not Performed; Preliminary; UKNOWN; UNKNOWN. | |
ObservationYear | SMALLINT NULL | Year the lab test observation was performed. | |
ObservationMethod | NVARCHAR(255) NULL | [Not used] | |
UserGuid | UNIQUEIDENTIFIER NULL | Identifier for the provider for the lab test result. | |
IsAbnormalValue | BIT NULL | Indicator whether the lab test observation is abnormal. | |
Sequence | INT NULL | Sequence of the lab test observation for the lab test panel. |
LabPanel This table contains lab test panels reported in a patient lab test result.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | LabPanelGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for the lab test panel performed. |
FK | LabResultGuid | UNIQUEIDENTIFIER NOT NULL | Identifier for the lab test result. |
PanelName | NVARCHAR(255) NULL | Name of the lab test panel. | |
ObservationYear | SMALLINT NULL | Year the lab test panel was performed. | |
DangerCode | NVARCHAR(255) NULL | [Not used] | |
Status | NVARCHAR(255) NULL | Status of the lab test panel. Values: Cancelled; Corrected; Final; Incomplete; Not Performed; Preliminary; UKNOWN; UNKNOWN. | |
Sequence | INT NULL | Sequence of the lab test panel within the lab test result. |
LabResult This table contains patient lab test results received from a lab facility.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | LabResultGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for the lab test result. |
UserGuid | UNIQUEIDENTIFIER NULL | Identifier for the provider who ordered the lab test. | |
FK | PatientGuid | UNIQUEIDENTIFIER NULL | Identifier for the patient. |
FK | TranscriptGuid | UNIQUEIDENTIFIER NULL | Patient visit transcript for the lab test order. |
PracticeGuid | UNIQUEIDENTIFIER NULL | Identifier for the provider's medical practice. | |
FacilityGuid | UNIQUEIDENTIFIER NULL | Identifier for the facility performing the lab test. | |
ReportYear | SMALLINT NULL | Year the lab result was created. | |
FK | AncestorLabResultGuid | UNIQUEIDENTIFIER NULL | Prior lab test result to which this result is related. |
Medication This table contains the medication history for a patient.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | MedicationGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for a patient medication. |
FK | PatientGuid | UNIQUEIDENTIFIER NOT NULL | Identifier for the patient taking the medication. |
NdcCode | NVARCHAR(50) NULL | NDC code for the medication. (See http://www.fda.gov/drugs/informationondrugs/ucm142438.htm) | |
MedicationName | NVARCHAR(256) NULL | Name of the medication. | |
MedicationStrength | NVARCHAR(50) NULL | Strength of the medication. | |
Schedule | NVARCHAR(50) NULL | Controlled substance schedule. (See http://www.deadiversion.usdoj.gov/schedules/index.html) | |
FK | DiagnosisGuid | UNIQUEIDENTIFIER NULL | Identifier for the diagnosis that the provider linked to the medication. |
UserGuid | UNIQUEIDENTIFIER NOT NULL | Identifier of the provider who added the medication. |
Patient This table contains a header record for each patient.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | PatientGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for a patient record. |
Gender | NVARCHAR(1) NOT NULL | Patient gender. Values: M = Male; F = Female. | |
YearOfBirth | SMALLINT NOT NULL | Patient's year of birth. | |
State | NVARCHAR(2) NOT NULL | State abbreviation for the state in which the patient lives. | |
PracticeGuid | UNIQUEIDENTIFIER NOT NULL | Identifier for the practice whose providers are seeing the patient. |
PatientCondition This table lists conditions for a patient.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | PatientConditionGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for a patient condition record. |
FK | PatientGuid | UNIQUEIDENTIFIER NULL | Identifier for the patient with the condition. |
FK | ConditionGuid | UNIQUEIDENTIFIER NULL | Identifier of the condition. |
CreatedYear | SMALLINT NOT NULL | Year the provider added the condition to the electronic health record. |
PatientSmokingStatus This table contains changes in smoking status for each patient,by year.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | PatientSmokingStatusGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for the patient smoking status record. |
FK | PatientGuid | UNIQUEIDENTIFIER NULL | Identifier for the patient. |
FK | SmokingStatusGuid | UNIQUEIDENTIFIER NULL | Identifier for the smoking status. |
EffectiveYear | SMALLINT NULL | Starting year for the smoking status change. |
Prescription This table contains prescription records for a patient. A prescription is a written direction from the provider for administering medication.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | PrescriptionGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for a prescription. |
FK | PatientGuid | UNIQUEIDENTIFIER NOT NULL | Identifier for the patient. |
FK | MedicationGuid | UNIQUEIDENTIFIER NULL | Identifier for the medication that the doctor prescribed. |
PrescriptionYear | SMALLINT NULL | Year in which the prescription was made. | |
Quantity | NVARCHAR(50) NOT NULL | Number of units of the medication prescribed. | |
NumberOfRefills | NVARCHAR(50) NULL | Number of refills included in the prescription. | |
RefillAsNeeded | BIT NULL | Can the patient refills as needed? Values: 1 = Yes, 0 = No. | |
GenericAllowed | BIT NULL | Can a generic drug be substituted? Values: 1 = Yes, 0 = No. | |
UserGuid | UNIQUEIDENTIFIER NOT NULL | Identifier of the provider who created the prescription. |
SmokingStatus Reference table containing valid values for patient smoking status.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | SmokingStatusGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for the smoking status record. |
Description | NVARCHAR(255) NULL | Description of the smoking status. | |
NISTcode | INT NULL | Corresponding NIST code value for smoking status, as specified in HHS HIT 45 CFR ยง170.302(g). |
Transcript This table contains visit transcript records for a patient. Each patient visit is documented by a visit transcript record.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | TranscriptGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for a visit transcript. |
FK | PatientGuid | UNIQUEIDENTIFIER NOT NULL | Identifier for the patient. |
VisitYear | SMALLINT NOT NULL | Year of the visit for which the transcript was created. | |
Height | FLOAT NULL | Patient height (inches). | |
Weight | FLOAT NULL | Patient weight (lbs). | |
BMI | FLOAT NULL | Body mass index. | |
SystolicBP | SMALLINT NULL | Systolic blood pressure. | |
DiastolicBP | SMALLINT NULL | Diastolic blood pressure. | |
RespiratoryRate | SMALLINT NULL | Respiratory rate (breaths per minute). | |
HeartRate | SMALLINT NULL | Heart rate (beats per minute). | |
Temperature | FLOAT NULL | Temperature as entered by the provider (Expected as degrees Fahrenheit; a very small subset in Centigrade). | |
PhysicianSpecialty | NVARCHAR(256) NOT NULL | Specialty of the provider seeing the patient. | |
UserGuid | UNIQUEIDENTIFIER NOT NULL | Identifier of the provider recording the visit transcript. |
TranscriptAllergy This table contains a list of allergies recorded per visit transcript.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | TranscriptAllergyGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for the visit transcript allergy record. |
FK | TranscriptGuid | UNIQUEIDENTIFIER NOT NULL | Identifier for the visit transcript. |
FK | AllergyGuid | UNIQUEIDENTIFIER NOT NULL | Identifier for the allergy assigned to the visit transcript. |
DisplayOrder | INT NOT NULL | Sequence that the allergy appears in the visit transcript. |
TranscriptDiagnosis This table contains a list of diagnoses per visit transcript.
Key | Column Name | Data Type | Description |
---|---|---|---|
PK | TranscriptDiagnosisGuid | UNIQUEIDENTIFIER NOT NULL | Unique identifier for a visit transcript diagnosis record. |
FK | TranscriptGuid | UNIQUEIDENTIFIER NULL | Identifier for the visit transcript. |
FK | DiagnosisGuid | UNIQUEIDENTIFIER NOT NULL | Identifier for the diagnosis assigned to the visit transcript. |
OrderBy | INT NOT NULL | Sequence that the diagnosis appears in the visit transcript. Sometimes indicates the severity or importance of the diagnosis in relation to other diagnoses in the same visit transcript. |
Examine the surface properties of the acquired data.
===================================================================================================
Data Scientist B
Table | Description |
---|---|
Allergy | This table contains a list of allergies recorded for a patient. |
Condition | Reference table containing valid patient conditions. |
Diagnosis | This table contains a list of diagnoses for a patient. |
Immunization | This table contains a list of immunizations for a patient. |
LabObservation | This table contains lab test observations for a lab panel. |
LabPanel | This table contains lab test panels reported in a patient lab test result. |
LabResult | This table contains patient lab test results received from a lab facility. |
Medication | This table contains the medication history for a patient. |
Patient | This table contains a header record for each patient. |
PatientCondition | This table lists conditions for a patient. |
PatientSmokingStatus | This table contains changes in smoking status for each patient,by year. |
Prescription | This table contains prescription records for a patient. A prescription is a written direction from the provider for administering medication. |
SmokingStatus | Reference table containing valid values for patient smoking status. |
Transcript | This table contains visit transcript records for a patient. Each patient visit is documented by a visit transcript record. |
TranscriptAllergy | This table contains a list of allergies recorded per visit transcript. |
TranscriptDiagnosis | This table contains a list of diagnoses per visit transcript. |
Examine the surface properties of the acquired data.
Data Quality Output
(a) Continuous Features
Feature | Count | % Miss. | Card. | Min. | 1st Qrt. | Mean | Median | 3rd. Qrt. | Max | |
---|---|---|---|---|---|---|---|---|---|---|
RangeSystolicBP | 14,926 | 0% | 171 | 0.00 | 4.00 | 12.46 | 11.00 | 20.00 | 94.00 | 10.6657 |
RangeDiastolicBP | 14,926 | 0% | 493 | 0.00 | 2.00 | 8.25 | 8.00 | 12.00 | 52.00 | 6.903738 |
HighLowBP | 14,926 | 0% | 78 | 0.00 | 41.00 | 49.08 | 48.00 | 55.00 | 125.00 | 10.99835 |
RespiratoryRateMaxT | 14,926 | 2.0% | 36 | 0.00 | 0.00 | 12.27 | 16.00 | 18.00 | 80.00 | 8.244918 |
RespiratoryRateMedian | 14,926 | 0.0% | 28 | 0.00 | 0.00 | 11.81 | 16.00 | 18.00 | 79.00 | 7.909366 |
RangeBMI | 14,926 | 0.0% | 40 | 0.00 | 0.00 | 0.9779 | 0.6359 | 1.3643 | 14.4170 | 1.219054 |
(b) Categorical Features
Feature | Count | % Miss. | Card. | Mode | Mode Freq. | Mode % | 2nd Mode | 2nd Mode Freq. | 2nd Mode % |
---|---|---|---|---|---|---|---|---|---|
Gender | 14,926 | 0% | 2 | F | 8,554 | 57.3% | M | 6,372 | 42.7% |
PracticeGuid | 14,926 | 0% | 28 | 4D27688B-C925-4513-9CF9-8D281ACC6712 | 867 | 5.8% | 5639C194-706D-49FE-AF69-D4F36752893C | 740 | 5.0% |
Underweight Valid Patients(valid weight <85)
Category: Underweight | PatientGuid |
---|---|
'3C8D1326-7D79-4350-89D8-1271A34C62E3' | |
'3066A99F-3B26-44A3-95C3-349FCCEB303B' | |
'3E917FBC-C42E-4948-A834-EA999131D1A0' | |
'812E6BE3-BDBB-4FD2-AA86-7E4FCEA33431' | |
'8B6688AB-E183-4C32-8FE6-29B9D8865687' | |
'A71E42F0-385F-4102-BE27-FB55ED06D3AE' | |
'D70EDC3E-24D7-44E7-993A-501D620E999A' | |
'D70EDC3E-24D7-44E7-993A-501D620E999A' | |
'BB0CC274-240A-4C02-A764-5070CC514A87' | |
'D0703010-9CF8-426C-BD1B-61E8C53D0338' | |
'096AA3A2-3306-4F9A-9082-6B479C2BB5AA' | |
'148B6713-1D0D-4DE4-8EE5-E6D7988088ED' |
Overweight valid patients (valid weight >320)
Category: Overweight | PatientGuid |
---|---|
'04477471-FAFE-45AE-9EF8-6EC94B94789C' | |
'0D3EF269-0675-444A-9F45-6FBCF2046CCD' | |
'100ECEEB-8759-4D59-AC23-A86697DD4113' | |
'10AA6386-B0A5-4F6B-B495-DBC588042E4D' | |
'17DCFE68-7A86-473C-A589-11B0FC256BA2' | |
'1B0556CF-72DF-480E-8429-DF61C252F16F' | |
'25B9E447-F7E3-479C-8B72-E6D76CA3182D' | |
'2A767E8A-1E47-4555-B11A-8A507B48DD64' | |
'2C2F654C-FC57-4E06-9E41-9AEE559A7D6E' | |
'2CC737E9-98B8-48FE-A744-F75770C59218' | |
'2EDC74B4-360A-4114-AB6C-BF0689AF2C70' | |
'3F8A2228-0DF3-4FAF-8C26-F04A9DD14C36' | |
'50BABFE9-D522-425D-86A0-170300961A46' | |
'53C120E7-11B2-40F2-AC09-DA209DFA2CBE' | |
'5883AF5C-513F-4F2A-B204-762AAF6963E6' | |
'61B927CE-1EB3-4464-B843-4BA4943BA027' | |
'71047BE4-1639-4F34-8BE9-5AA89281F9C7' | |
'763112F7-6F43-43D7-B745-75C0506E4197' | |
'7CA6C507-CCA6-4CF2-880E-F11EBECAD928' | |
'7E3B9F76-C55F-46B9-897D-E5C4DB645CF0' | |
'83190041-69EC-4D8C-8486-DF5C99CB1844' | |
'87F932AE-370E-49B8-8B77-EE866EB04267' | |
'91A9EECB-093F-4DA7-9F18-4C803E5F94B4' | |
'92626713-16CE-4473-8523-ADCDB86A7DC8' | |
'A1E6ED8A-68AC-45F1-B622-28833D2ECC37' | |
'A42807AE-854E-409B-83E0-6B2389E4DE44' | |
'A53C6FBB-730C-4F4C-A128-6B5F3120BCE1' |
The below graph indicates data characteristics that suggest further examination of interesting data subsets:
The data is complete. It does contain an insignificant amount of errors and there are some missing values as well, however they do not occur that often.