Friday, 20 March 2009

How to identify the encoding format of the text file?

When we reading data from ANSI formatted or Unicode formatted text file using OLEDB, the schema.ini file has to be changed accordingly to read data from the text file. The default format is UTF8. If the encoding format of the text file is “ANSI/UTF8”, we do not need to specify the character set in the schema.ini file. If the encoding format of the text file is “Unicode”, we should specify the characterset=Unicode in the schema.ini file to read the non-English characters.

How to identify or find the encoding format of a text file at runtime?

FileInfo file = new FileInfo(@"C:\TestFile.txt"); StreamReader streamReader = file.OpenText(); streamReader.ReadLine(); string encodingFormat = streamReader.CurrentEncoding.EncodingName;

If you use Quick Watch window in VS.Net 2005, you will see as below


Wednesday, 11 March 2009

How to read data from Unicode formatted text file and import to Data Table using .Net

Scenario: When we want to read and display data from normal text file (ANSI Encoding) from .Net code, it does properly. But when we try to read and display the data from Unicode formatted text file (that contains characters other than English such as Tamil, Chinese, Japanese…), it will not get displayed properly. Boxed characters will be displayed in the UI such as Datagrid, DataList…etc

To overcome this problem, we can create one schema.ini file to specify the character set of the text file.

How to create a schema.ini file manually:

Write the following code in a text file


[Filename.txt]

Format=TabDelimited
CharacterSet=Unicode

FileName - specify the FileName that contains Unicode characters.
Format – specify anything such as comma or custom as per your requirement

Save this file as “schema.ini” file.
And this file should be placed in the location where the Unicode text file exists.

Now, when our .Net code read this text file, it will get the format specified in the schema.ini file automatically. We do not need to map this schema.ini file into our .Net code.

How to create a schema.ini file programmatically:

string schema = Path.Combine(file.DirectoryName, "Schema.ini");
if (!File.Exists(schema))
{
using (StreamWriter writer = new StreamWriter(schema))
{
writer.WriteLine(string.Format(CultureInfo.InvariantCulture, "[{0}]", file.Name));
writer.WriteLine(string.Format(CultureInfo.InvariantCulture, "Format={0}", delimitedFormat));
writer.WriteLine(string.Format("CharacterSet=Unicode"));
}
}

Read and store the data from the text file:

After creating a schema.ini file (either manually or programmatically), use OLEDB provider to load data from the text file to data table.

The sample code as follows:

using (
OleDbConnection con =
new OleDbConnection(
string.Format(
@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=""{0}""; Extended Properties='text;{1}HDR={2};';",
file.DirectoryName, “FMT=TabDelimited”,"Yes")))
{
using (OleDbCommand cmd = new OleDbCommand(string.Format
("SELECT * FROM [{0}]", file.Name), con))
{
con.Open();

// Using a DataTable to process the data
using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
{
importedData = new DataTable();
adp.Fill(importedData);

}
}
}